MiniBatch
- class dgl.graphbolt.MiniBatch(seed_nodes: Tensor | Dict[str, Tensor] | None = None, node_pairs: Tuple[Tensor, Tensor] | Dict[str, Tuple[Tensor, Tensor]] | None = None, labels: Tensor | Dict[str, Tensor] | None = None, seeds: Tensor | Dict[str, Tensor] | None = None, indexes: Tensor | Dict[str, Tensor] | None = None, negative_srcs: Tensor | Dict[str, Tensor] | None = None, negative_dsts: Tensor | Dict[str, Tensor] | None = None, sampled_subgraphs: List[SampledSubgraph] | None = None, input_nodes: Tensor | Dict[str, Tensor] | None = None, node_features: Dict[str, Tensor] | Dict[Tuple[str, str], Tensor] | None = None, edge_features: List[Dict[str, Tensor] | Dict[Tuple[str, str], Tensor]] | None = None, compacted_node_pairs: Tuple[Tensor, Tensor] | Dict[str, Tuple[Tensor, Tensor]] | None = None, compacted_seeds: Tensor | Dict[str, Tensor] | None = None, compacted_negative_srcs: Tensor | Dict[str, Tensor] | None = None, compacted_negative_dsts: Tensor | Dict[str, Tensor] | None = None)[source]
Bases: object
A composite data class for the data structures used in GraphBolt.
It is designed to facilitate the exchange of data among different components involved in processing data. The purpose of this class is to unify the representation of input and output data across different stages, ensuring consistency and ease of use throughout the loading process.
- node_ids() Tensor | Dict[str, Tensor] [source]
A representation of input nodes in the outermost layer. Contains all nodes in the sampled_subgraphs.
  - If input_nodes is a tensor: the graph is homogeneous.
  - If input_nodes is a dictionary: the keys should be node types and the values should be the corresponding heterogeneous node IDs.
- set_edge_features(edge_features: List[Dict[str, Tensor] | Dict[Tuple[str, str], Tensor]]) None [source]
Set edge features.
- set_node_features(node_features: Dict[str, Tensor] | Dict[Tuple[str, str], Tensor]) None [source]
Set node features.
- to_pyg_data()[source]
Construct a PyG Data object from the MiniBatch. This function supports only node classification tasks on a homogeneous graph, and the number of features cannot be more than one.
- property blocks
Extracts DGL blocks from the MiniBatch to construct a graph structure and ID mappings.
- compacted_negative_dsts: Tensor | Dict[str, Tensor] = None
Representation of compacted nodes corresponding to ‘negative_dsts’, where all node ids inside are compacted.
- compacted_negative_srcs: Tensor | Dict[str, Tensor] = None
Representation of compacted nodes corresponding to ‘negative_srcs’, where all node ids inside are compacted.
- compacted_node_pairs: Tuple[Tensor, Tensor] | Dict[str, Tuple[Tensor, Tensor]] = None
Representation of compacted node pairs corresponding to ‘node_pairs’, where all node ids inside are compacted.
- compacted_seeds: Tensor | Dict[str, Tensor] = None
Representation of compacted seeds corresponding to ‘seeds’, where all node ids inside are compacted.
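As a rough illustration of what "compacted" means for the fields above, here is a minimal sketch rather than GraphBolt's implementation. It assumes that compaction relabels each original node ID with its position in a list of unique nodes. Plain Python lists stand in for tensors, and `compact_ids` is a hypothetical helper introduced only for this example.

```python
def compact_ids(unique_nodes, original_ids):
    """Relabel original node IDs as positions in `unique_nodes`.

    Hypothetical helper for illustration; lists stand in for tensors.
    """
    # Map each original ID to its local (compacted) index.
    local_index = {node: i for i, node in enumerate(unique_nodes)}
    return [local_index[node] for node in original_ids]


# Assumed example data: original IDs of the nodes in one mini-batch.
unique_nodes = [10, 42, 7, 99]
seeds = [42, 99]
compacted_seeds = compact_ids(unique_nodes, seeds)
print(compacted_seeds)
```

Under this assumption, compacted IDs are only meaningful relative to the mini-batch they were computed for.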
- edge_features: List[Dict[str, Tensor] | Dict[Tuple[str, str], Tensor]] = None
Edge features associated with the ‘sampled_subgraphs’.
  - If keys are single strings: the graph is homogeneous, and the keys are feature names.
  - If keys are tuples: the graph is heterogeneous, and the keys are tuples of ‘(edge_type, feature_name)’. Note that the edge type is a single string of the format ‘str:str:str’.
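The key convention for edge features can be sketched in plain Python. `describe_feature_key` is a hypothetical helper, not part of GraphBolt, and reading the three colon-separated parts of the edge type as source type, relation, and destination type is an assumption made for this example.

```python
def describe_feature_key(key):
    """Classify an edge-feature key by the convention described above.

    Hypothetical helper for illustration only.
    """
    if isinstance(key, str):
        # Homogeneous graph: the key is just the feature name.
        return {"graph": "homogeneous", "feature": key}
    # Heterogeneous graph: the key is (edge_type, feature_name).
    edge_type, feature = key
    # Assumption: the three parts are source type, relation, destination type.
    src_type, relation, dst_type = edge_type.split(":")
    return {
        "graph": "heterogeneous",
        "feature": feature,
        "src_type": src_type,
        "relation": relation,
        "dst_type": dst_type,
    }


print(describe_feature_key("weight"))
print(describe_feature_key(("user:follows:user", "weight")))
```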
- indexes: Tensor | Dict[str, Tensor] = None
Indexes associated with seed nodes / node pairs in the graph, which indicate the query to which each seed node / node pair belongs.
  - If indexes is a tensor: the graph is homogeneous. The value should be the query corresponding to the given ‘seed_nodes’ or ‘node_pairs’.
  - If indexes is a dictionary: the graph is heterogeneous. The keys should be node or edge types and the values should be the queries corresponding to the given ‘seed_nodes’ or ‘node_pairs’. For each key, indexes are consecutive integers starting from zero.
- input_nodes: Tensor | Dict[str, Tensor] = None
A representation of input nodes in the outermost layer. Contains all nodes in the ‘sampled_subgraphs’.
  - If input_nodes is a tensor: the graph is homogeneous.
  - If input_nodes is a dictionary: the keys should be node types and the values should be the corresponding heterogeneous node IDs.
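Many fields of this class follow the same tensor-or-dictionary convention, so downstream code often has to branch on it. The following is a minimal sketch under stated assumptions: plain lists stand in for tensors, and `count_input_nodes` is a hypothetical helper rather than a GraphBolt function.

```python
def count_input_nodes(input_nodes):
    """Count input nodes for either graph kind.

    Hypothetical helper; lists stand in for 1-D tensors of node IDs.
    """
    if isinstance(input_nodes, dict):
        # Heterogeneous graph: one ID collection per node type.
        return {ntype: len(ids) for ntype, ids in input_nodes.items()}
    # Homogeneous graph: a single collection of node IDs.
    return len(input_nodes)


homogeneous = [0, 3, 5, 8]
heterogeneous = {"user": [0, 1, 2], "item": [4, 7]}
print(count_input_nodes(homogeneous))
print(count_input_nodes(heterogeneous))
```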
- labels: Tensor | Dict[str, Tensor] = None
Labels associated with seed nodes / node pairs in the graph.
  - If labels is a tensor: the graph is homogeneous. The value should be the labels corresponding to the given ‘seed_nodes’ or ‘node_pairs’.
  - If labels is a dictionary: the keys should be node or edge types and the values should be the labels corresponding to the given ‘seed_nodes’ or ‘node_pairs’.
- negative_dsts: Tensor | Dict[str, Tensor] = None
Representation of negative samples for the tail nodes in the link prediction task.
  - If negative_dsts is a tensor: it indicates a homogeneous graph.
  - If negative_dsts is a dictionary: the key should be the edge type, and the value should correspond to the negative samples for tail nodes of the given type.
- property negative_node_pairs
negative_node_pairs is a representation of negative graphs used for evaluating or computing loss in link prediction tasks.
  - If negative_node_pairs is a tuple: it indicates a homogeneous graph containing two tensors representing source-destination node pairs.
  - If negative_node_pairs is a dictionary: the keys should be edge types, and the values should be tuples of tensors representing node pairs of the given type.
- negative_srcs: Tensor | Dict[str, Tensor] = None
Representation of negative samples for the head nodes in the link prediction task.
  - If negative_srcs is a tensor: it indicates a homogeneous graph.
  - If negative_srcs is a dictionary: the key should be the edge type, and the value should correspond to the negative samples for head nodes of the given type.
- node_features: Dict[str, Tensor] | Dict[Tuple[str, str], Tensor] = None
A representation of node features.
  - If keys are single strings: the graph is homogeneous, and the keys are feature names.
  - If keys are tuples: the graph is heterogeneous, and the keys are tuples of ‘(node_type, feature_name)’.
- node_pairs: Tuple[Tensor, Tensor] | Dict[str, Tuple[Tensor, Tensor]] = None
Representation of seed node pairs utilized in link prediction tasks.
  - If node_pairs is a tuple: it indicates a homogeneous graph where each tuple contains two tensors representing source-destination node pairs.
  - If node_pairs is a dictionary: the keys should be edge types, and the values should be tuples of tensors representing node pairs of the given type.
- property node_pairs_with_labels
Get a node pair tensor and a label tensor from the MiniBatch. They are used for evaluating or computing loss. For a homogeneous graph, it returns (node_pairs, labels); for a heterogeneous graph, node_pairs and labels are both dicts keyed by etype.
  - If it’s a link prediction task, node_pairs contains both negative and positive node pairs, and labels consists of 0s and 1s indicating whether the corresponding node pair is negative or positive.
  - If it’s an edge classification task, this property directly returns compacted_node_pairs for each etype and the corresponding labels.
  - Otherwise it returns None.
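The link prediction case can be illustrated with a small sketch for a homogeneous graph. This is not the library's implementation: lists stand in for tensors, `pairs_with_labels` is a hypothetical helper, and placing positive pairs before negative pairs is an assumption made for this example.

```python
def pairs_with_labels(positive_pairs, negative_pairs):
    """Merge positive and negative (src, dst) pairs and build 0/1 labels.

    Hypothetical helper; each argument is a (src_list, dst_list) tuple.
    """
    pos_src, pos_dst = positive_pairs
    neg_src, neg_dst = negative_pairs
    # Assumption: positives come first, followed by negatives.
    src = list(pos_src) + list(neg_src)
    dst = list(pos_dst) + list(neg_dst)
    # Label 1 marks a positive pair, label 0 a negative pair.
    labels = [1] * len(pos_src) + [0] * len(neg_src)
    return (src, dst), labels


positive = ([0, 1], [2, 3])
negative = ([0, 1, 0], [4, 5, 6])
(node_src, node_dst), labels = pairs_with_labels(positive, negative)
print(node_src, node_dst, labels)
```

The resulting labels line up one-to-one with the node pairs, which is what a binary loss over predicted edge scores would need.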
- property positive_node_pairs
positive_node_pairs is a representation of positive graphs used for evaluating or computing loss in link prediction tasks.
  - If positive_node_pairs is a tuple: it indicates a homogeneous graph containing two tensors representing source-destination node pairs.
  - If positive_node_pairs is a dictionary: the keys should be edge types, and the values should be tuples of tensors representing node pairs of the given type.
- sampled_subgraphs: List[SampledSubgraph] = None
A list of ‘SampledSubgraph’s, each one corresponding to one layer, representing a subset of a larger graph structure.
- seed_nodes: Tensor | Dict[str, Tensor] = None
Representation of seed nodes used for sampling in the graph.
  - If seed_nodes is a tensor: the graph is homogeneous.
  - If seed_nodes is a dictionary: the keys should be node types and the values should be the corresponding heterogeneous node IDs.
- seeds: Tensor | Dict[str, Tensor] = None
Representation of seed items utilized in node classification tasks, link prediction tasks and hyperlink tasks.
  - If seeds is a tensor: the seeds originate from a homogeneous graph. It can be either a 1-dimensional or 2-dimensional tensor:
    - 1-dimensional tensor: each element directly represents a seed node within the graph.
    - 2-dimensional tensor: each row designates a seed item, which can encompass various entities such as edges, hyperlinks, or other graph components depending on the specific context.
  - If seeds is a dictionary: the seeds originate from a heterogeneous graph. The keys should be edge or node types, and each value should be a tensor, which can be either a 1-dimensional or 2-dimensional tensor:
    - 1-dimensional tensor: each element directly represents a seed node of the given type within the graph.
    - 2-dimensional tensor: each row designates a seed item of the given type, which can encompass various entities such as edges, hyperlinks, or other graph components depending on the specific context.
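The four shapes that seeds can take are easier to see in a small sketch. Nested Python lists stand in for 1-dimensional and 2-dimensional tensors here, and `seed_kind` is a hypothetical helper introduced only for illustration.

```python
def seed_kind(seeds):
    """Describe seeds as 1-D (seed nodes) or 2-D (seed items).

    Hypothetical helper; nested lists stand in for tensors.
    """
    def kind(value):
        # A list of rows stands in for a 2-dimensional tensor.
        if value and isinstance(value[0], (list, tuple)):
            return "2-D: each row is a seed item"
        return "1-D: each element is a seed node"

    if isinstance(seeds, dict):
        # Heterogeneous graph: classify the seeds of each type separately.
        return {type_name: kind(value) for type_name, value in seeds.items()}
    # Homogeneous graph: a single collection of seeds.
    return kind(seeds)


print(seed_kind([3, 8, 11]))
print(seed_kind([[0, 2], [1, 3]]))
print(seed_kind({"user": [0, 1], "user:follows:user": [[0, 1], [1, 2]]}))
```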