dgl.sampling.sample_neighbors

dgl.sampling.sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False, copy_ndata=True, copy_edata=True, _dist_training=False, exclude_edges=None, output_device=None)[source]

Sample neighboring edges of the given nodes and return the induced subgraph.

For each node, a number of inbound (or outbound when edge_dir == 'out') edges will be randomly chosen. The graph returned will then contain all the nodes in the original graph, but only the sampled edges.

Node/edge features are not preserved. The original IDs of the sampled edges are stored as the dgl.EID feature in the returned graph.

GPU sampling is supported for this function. Refer to 6.8 Using GPU for Neighborhood Sampling for more details.

Parameters:
  • g (DGLGraph) – The graph. Can be either on CPU or GPU.

  • nodes (tensor or dict) –

    Node IDs to sample neighbors from.

    This argument can take a single ID tensor or a dictionary of node types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.

  • fanout (int or dict[etype, int]) –

    The number of edges to be sampled for each node on each edge type.

    This argument can take a single int or a dictionary of edge types and ints. If a single int is given, DGL will sample this number of edges for each node for every edge type.

    If -1 is given for a single edge type, all the neighboring edges with that edge type and non-zero probability will be selected.

  • edge_dir (str, optional) –

    Determines whether to sample inbound or outbound edges.

    Can take either in for inbound edges or out for outbound edges.

  • prob (str, optional) –

    Feature name used as the (unnormalized) probabilities associated with each neighboring edge of a node. The feature must have only one element for each edge.

    The features must be non-negative floats or boolean. Otherwise, the result will be undefined.

  • exclude_edges (tensor or dict) –

    Edge IDs to exclude during sampling neighbors for the seed nodes.

    This argument can take a single ID tensor or a dictionary of edge types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.

  • replace (bool, optional) – If True, sample with replacement.

  • copy_ndata (bool, optional) –

    If True, the node features of the new graph are copied from the original graph. If False, the new graph will not have any node features.

    (Default: True)

  • copy_edata (bool, optional) –

    If True, the edge features of the new graph are copied from the original graph. If False, the new graph will not have any edge features.

    (Default: True)

  • _dist_training (bool, optional) –

    Internal argument. Do not use.

    (Default: False)

  • output_device (Framework-specific device context object, optional) – The output device. Default is the same as the input graph.

Returns:

A sampled subgraph containing only the sampled neighboring edges.

Return type:

DGLGraph

Notes

If copy_ndata or copy_edata is True, same tensors are used as the node or edge features of the original graph and the new graph. As a result, users should avoid performing in-place operations on the node features of the new graph to avoid feature corruption.

Examples

Assume that you have the following graph

>>> g = dgl.graph(([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]))

And the weights

>>> g.edata['prob'] = torch.FloatTensor([0., 1., 0., 1., 0., 1.])

To sample one inbound edge for node 0 and node 1:

>>> sg = dgl.sampling.sample_neighbors(g, [0, 1], 1)
>>> sg.edges(order='eid')
(tensor([1, 0]), tensor([0, 1]))
>>> sg.edata[dgl.EID]
tensor([2, 0])

To sample one inbound edge for node 0 and node 1 with probability in edge feature prob:

>>> sg = dgl.sampling.sample_neighbors(g, [0, 1], 1, prob='prob')
>>> sg.edges(order='eid')
(tensor([2, 1]), tensor([0, 1]))

With fanout greater than the number of actual neighbors and without replacement, DGL will take all neighbors instead:

>>> sg = dgl.sampling.sample_neighbors(g, [0, 1], 3)
>>> sg.edges(order='eid')
(tensor([1, 2, 0, 1]), tensor([0, 0, 1, 1]))

To exclude certain EID’s during sampling for the seed nodes:

>>> g = dgl.graph(([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]))
>>> g_edges = g.all_edges(form='all')``
(tensor([0, 0, 1, 1, 2, 2]), tensor([1, 2, 0, 1, 2, 0]), tensor([0, 1, 2, 3, 4, 5]))
>>> sg = dgl.sampling.sample_neighbors(g, [0, 1], 3, exclude_edges=[0, 1, 2])
>>> sg.all_edges(form='all')
(tensor([2, 1]), tensor([0, 1]), tensor([0, 1]))
>>> sg.has_edges_between(g_edges[0][:3],g_edges[1][:3])
tensor([False, False, False])
>>> g = dgl.heterograph({
...   ('drug', 'interacts', 'drug'): ([0, 0, 1, 1, 3, 2], [1, 2, 0, 1, 2, 0]),
...   ('drug', 'interacts', 'gene'): ([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]),
...   ('drug', 'treats', 'disease'): ([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0])})
>>> g_edges = g.all_edges(form='all', etype=('drug', 'interacts', 'drug'))
(tensor([0, 0, 1, 1, 3, 2]), tensor([1, 2, 0, 1, 2, 0]), tensor([0, 1, 2, 3, 4, 5]))
>>> excluded_edges  = {('drug', 'interacts', 'drug'): g_edges[2][:3]}
>>> sg = dgl.sampling.sample_neighbors(g, {'drug':[0, 1]}, 3, exclude_edges=excluded_edges)
>>> sg.all_edges(form='all', etype=('drug', 'interacts', 'drug'))
(tensor([2, 1]), tensor([0, 1]), tensor([0, 1]))
>>> sg.has_edges_between(g_edges[0][:3],g_edges[1][:3],etype=('drug', 'interacts', 'drug'))
tensor([False, False, False])