Graph samplers

dgl.contrib.sampling.sampler.NeighborSampler(g, batch_size, expand_factor, num_hops=1, neighbor_type='in', node_prob=None, seed_nodes=None, shuffle=False, num_workers=1, max_subgraph_size=None, return_seed_id=False)

Create a sampler that samples neighborhoods of seed vertices.

Note

This method currently supports only the MXNet backend. Set the “DGLBACKEND” environment variable to “mxnet”.
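One way to set the variable from Python is sketched below. It assumes the backend is chosen when dgl is first imported, so the assignment has to come before that import. The import itself is left commented out so the snippet runs without DGL installed.

```python
import os

# Select the MXNet backend. Assumption: DGL reads this variable at import
# time, so it must be set before the first `import dgl`.
os.environ["DGLBACKEND"] = "mxnet"

# import dgl  # uncomment in an environment where DGL and MXNet are installed
```

Exporting the variable in the shell before starting Python has the same effect and avoids depending on import order.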

This creates a subgraph data loader that samples subgraphs from the input graph using neighbor sampling. The sampling method is implemented in C for efficiency.

A subgraph grows from a seed vertex. It contains the sampled neighbors of the seed vertex and the edges that connect those neighbors to the seed vertex. When the number of hops is k (>1), neighbors are sampled from the k-hop neighborhood. In this case, the sampled edges are the ones that connect each source node to its sampled neighbors.
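The growth process described above can be illustrated with a small pure-Python sketch. This is not DGL's implementation: the function name `sample_neighborhood`, the adjacency-dictionary graph format, and the choice to sample uniformly over in-edges are all assumptions made for illustration.

```python
import random


def sample_neighborhood(in_neighbors, seeds, num_neighbors, num_hops=1, rng=None):
    """Grow a subgraph from seed vertices by sampling in-neighbors hop by hop.

    in_neighbors maps a vertex to the list of vertices with an edge into it
    (a hypothetical graph format, not a DGLGraph).
    Returns the set of sampled vertices and the set of sampled (src, dst) edges.
    """
    rng = rng or random.Random()
    nodes = set(seeds)
    edges = set()
    frontier = list(seeds)
    for _ in range(num_hops):
        next_frontier = []
        for dst in frontier:
            candidates = in_neighbors.get(dst, [])
            # Never ask for more neighbors than the vertex has.
            k = min(num_neighbors, len(candidates))
            for src in rng.sample(candidates, k):
                edges.add((src, dst))
                if src not in nodes:
                    nodes.add(src)
                    next_frontier.append(src)
        # The next hop expands only the vertices discovered in this hop.
        frontier = next_frontier
    return nodes, edges


# Toy graph: vertex -> list of in-neighbors.
toy_graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: [0]}
one_hop = sample_neighborhood(toy_graph, [0], num_neighbors=10, num_hops=1)
two_hop = sample_neighborhood(toy_graph, [0], num_neighbors=10, num_hops=2)
```

Because `num_neighbors` exceeds every degree in the toy graph, both calls keep all neighbors and the result is deterministic. With a smaller `num_neighbors`, each vertex contributes a random subset of its in-edges.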

The subgraph loader returns a list of subgraphs and a dictionary of additional information about the subgraphs. The size of the subgraph list is the number of workers. The dictionary contains:

‘seeds’: a list of 1D tensors of seed IDs, if return_seed_id is True.

Parameters:
    g – The DGLGraph from which subgraphs are sampled.
    batch_size – The number of subgraphs in a batch.
    expand_factor – The number of neighbors sampled from the neighbor list of a vertex. The value of this parameter can be:
        an integer: the number of neighbors sampled from a neighbor list.
        a floating-point number: the ratio of sampled neighbors in a neighbor list.
        a string: a common way of calculating the number of sampled neighbors, e.g., ‘sqrt(deg)’.
    num_hops – The size of the neighborhood from which vertices are sampled.
    neighbor_type – The type of edges whose neighbors are sampled. “in” means the neighbors on the in-edges, “out” means the neighbors on the out-edges, and “both” means the neighbors on both types of edges.
    node_prob – The probability that a neighbor node is sampled, as a 1D Tensor. None means uniform sampling. Otherwise, the number of elements should be the same as the number of vertices in the graph.
    seed_nodes – A list of nodes from which subgraphs are sampled. If it’s None, the seed vertices are all vertices in the graph.
    shuffle – Whether the sampled subgraphs are shuffled.
    num_workers – The number of worker threads that sample subgraphs in parallel.
    max_subgraph_size – The maximal subgraph size in terms of the number of nodes. GPU doesn’t support very large subgraphs.
    return_seed_id – Whether to return seed IDs along with the subgraphs. The seed IDs are in the parent graph.
Returns:	The iterator returns a list of batched subgraphs and a dictionary of additional information about the subgraphs.
Return type:	A subgraph iterator
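The three accepted forms of expand_factor can be read as a rule that maps a vertex degree to a neighbor count. The sketch below is one possible reading, not the library's code: the helper name `num_sampled_neighbors`, rounding down, and capping the result at the degree are all assumptions, and ‘sqrt(deg)’ is the only string form named in the description above.

```python
import math


def num_sampled_neighbors(expand_factor, degree):
    """Hypothetical helper: turn an expand_factor value into a neighbor count."""
    if isinstance(expand_factor, str):
        if expand_factor != "sqrt(deg)":
            raise ValueError("unsupported expand_factor string: " + expand_factor)
        # Assumption: round the square root of the degree down.
        count = int(math.sqrt(degree))
    elif isinstance(expand_factor, float):
        # Assumption: a float is a ratio of the neighbor list, rounded down.
        count = int(expand_factor * degree)
    elif isinstance(expand_factor, int):
        count = expand_factor
    else:
        raise TypeError("expand_factor must be an int, a float, or a string")
    # Assumption: a vertex cannot contribute more neighbors than it has.
    return max(0, min(count, degree))


fixed = num_sampled_neighbors(2, 10)
ratio = num_sampled_neighbors(0.5, 10)
root = num_sampled_neighbors("sqrt(deg)", 16)
```

Note that the type decides the meaning: the integer 1 asks for one neighbor, while the float 1.0 asks for the whole neighbor list.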