Graph samplers
dgl.contrib.sampling.sampler.NeighborSampler(g, batch_size, expand_factor, num_hops=1, neighbor_type='in', node_prob=None, seed_nodes=None, shuffle=False, num_workers=1, max_subgraph_size=None, return_seed_id=False)
Create a sampler that samples neighborhoods.
Note
This method currently supports only the MXNet backend. Set the “DGLBACKEND” environment variable to “mxnet”.
This creates a subgraph data loader that samples subgraphs from the input graph with neighbor sampling. The sampling method is implemented in C and can perform sampling very efficiently.
Each subgraph grows from a seed vertex. It contains the sampled neighbors of the seed vertex as well as the edges that connect those neighbor nodes with the seed node. When the number of hops is k (> 1), the neighbors are sampled from the k-hop neighborhood. In this case, the sampled edges are the ones that connect the source nodes with their sampled neighbor nodes.
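The growth process described above can be sketched in plain Python. This is only an illustration of the idea, not DGL's C implementation: the function name `sample_neighborhood`, the adjacency-dictionary input, and the fixed integer `fanout` are all assumptions made for this example, and it samples in-neighbors uniformly.

```python
import random


def sample_neighborhood(in_neighbors, seed, fanout, num_hops=1, rng=None):
    """Hypothetical helper: grow a subgraph from `seed` by sampling
    up to `fanout` in-neighbors of every frontier node, hop by hop.

    in_neighbors maps a node ID to the list of its in-neighbor IDs.
    Returns the set of sampled nodes and the set of (src, dst) edges.
    """
    rng = rng or random.Random(0)
    nodes = {seed}
    edges = set()
    frontier = [seed]
    for _ in range(num_hops):
        next_frontier = []
        for dst in frontier:
            candidates = in_neighbors.get(dst, [])
            # Uniform sampling without replacement, capped by the degree.
            picked = rng.sample(candidates, min(fanout, len(candidates)))
            for src in picked:
                edges.add((src, dst))
                if src not in nodes:
                    nodes.add(src)
                    next_frontier.append(src)
        frontier = next_frontier
    return nodes, edges


# Toy graph: node 0 has in-neighbors 1, 2 and 3, and so on.
toy_graph = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: [], 4: [], 5: []}
sub_nodes, sub_edges = sample_neighborhood(toy_graph, seed=0, fanout=2, num_hops=2)
print(sorted(sub_nodes), sorted(sub_edges))
```

With `num_hops=1` the result holds only the seed, its sampled neighbors, and the edges between them. Larger values extend the frontier one hop at a time, so every edge connects a node to one of its own sampled neighbors.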
The subgraph loader returns a list of subgraphs and a dictionary of additional information about the subgraphs. The size of the subgraph list is the number of workers. The dictionary contains:
- ‘seeds’: a list of 1D tensors of seed IDs, if return_seed_id is True.
Parameters:
- g – The DGLGraph from which subgraphs are sampled.
- batch_size – The number of subgraphs in a batch.
- expand_factor – The number of neighbors sampled from the neighbor list of a vertex. The value of this parameter can be:
  - an integer: the number of neighbors sampled from a neighbor list.
  - a floating-point number: the ratio of the sampled neighbors in a neighbor list.
  - a string: one of some common ways of calculating the number of sampled neighbors, e.g., ‘sqrt(deg)’.
- num_hops – The size of the neighborhood from which vertices are sampled.
- neighbor_type – Indicates the neighbors on different types of edges. “in” means the neighbors on the in-edges, “out” means the neighbors on the out-edges, and “both” means the neighbors on both types of edges.
- node_prob – The probability that a neighbor node is sampled, given as a 1D tensor. None means uniform sampling. Otherwise, the number of elements should be the same as the number of vertices in the graph.
- seed_nodes – A list of nodes from which subgraphs are sampled. If it’s None, the seed vertices are all vertices in the graph.
- shuffle – Indicates whether the sampled subgraphs are shuffled.
- num_workers – The number of worker threads that sample subgraphs in parallel.
- max_subgraph_size – The maximal subgraph size in terms of the number of nodes. GPU doesn’t support very large subgraphs.
- return_seed_id – Indicates whether to return seed IDs along with the subgraphs. The seed IDs are in the parent graph.
Returns: The iterator returns a list of batched subgraphs and a dictionary of additional information about the subgraphs.
Return type: A subgraph iterator
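The three accepted forms of expand_factor can be illustrated with a small standalone sketch. The helper `num_sampled_neighbors` is hypothetical and not part of DGL. Rounding down for the ratio and square-root forms, and capping the result at the vertex degree, are assumptions of this example rather than documented behavior.

```python
import math


def num_sampled_neighbors(expand_factor, degree):
    """Hypothetical helper: turn an expand_factor value into a neighbor
    count for a vertex that has `degree` neighbors."""
    if isinstance(expand_factor, str):
        if expand_factor != "sqrt(deg)":
            raise ValueError("unsupported expand_factor: %r" % expand_factor)
        count = int(math.sqrt(degree))  # assumed to round down
    elif isinstance(expand_factor, float):
        count = int(degree * expand_factor)  # ratio of the neighbor list
    elif isinstance(expand_factor, int):
        count = expand_factor  # absolute number of neighbors
    else:
        raise TypeError("expand_factor must be an int, a float or a str")
    # A vertex cannot contribute more neighbors than it has.
    return max(0, min(count, degree))


for factor in (3, 0.5, "sqrt(deg)"):
    print(factor, num_sampled_neighbors(factor, degree=16))
```

An integer gives a fixed budget per vertex, while the ratio and string forms scale the budget with the size of each neighbor list.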