dgl.sampling.pack_traces

dgl.sampling.pack_traces(traces, types)

Pack the padded traces returned by random_walk() into a concatenated array. The padding values (-1) are removed, and the length and offset of each trace are returned along with the concatenated node ID and node type arrays.

Parameters
  • traces (Tensor) – A 2-dimensional node ID tensor. Must be on CPU and either int32 or int64.

  • types (Tensor) – A 1-dimensional node type ID tensor. Must be on CPU and either int32 or int64.

Returns

  • concat_vids (Tensor) – An array of all node IDs, concatenated with the padding values removed.

  • concat_types (Tensor) – An array of node types, one for each node in concat_vids. Has the same length as concat_vids.

  • lengths (Tensor) – Length of each trace in the original traces tensor.

  • offsets (Tensor) – Offset of each trace from the original traces tensor within the new concatenated tensor.

Notes

The returned tensors are on CPU.
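The packing described above can be sketched in plain Python. This is only an illustration of the semantics on nested lists, not DGL's implementation: the helper name pack_traces_reference is hypothetical, and the input values are written as lists rather than tensors so that the sketch runs without DGL or PyTorch.

```python
def pack_traces_reference(traces, types, pad=-1):
    """Illustrative re-implementation of the packing semantics on lists.

    traces: list of equal-length rows of node IDs, padded with `pad`.
    types:  list of node type IDs, one per column of `traces`.
    """
    concat_vids, concat_types, lengths, offsets = [], [], [], []
    for row in traces:
        # A trace starts where the previously packed entries end.
        offsets.append(len(concat_vids))
        kept = 0
        for vid, ntype in zip(row, types):
            if vid == pad:  # drop padding entries
                continue
            concat_vids.append(vid)
            concat_types.append(ntype)
            kept += 1
        lengths.append(kept)
    return concat_vids, concat_types, lengths, offsets


traces = [[0, 1, -1, -1, -1, -1, -1],
          [0, 1, 1, 3, 0, 0, 0]]
types = [0, 0, 1, 0, 0, 1, 0]
vids, vtypes, lengths, offsets = pack_traces_reference(traces, types)
print(vids)     # [0, 1, 0, 1, 1, 3, 0, 0, 0]
print(vtypes)   # [0, 0, 0, 0, 1, 0, 0, 1, 0]
print(lengths)  # [2, 7]
print(offsets)  # [0, 2]
```

Each offset is the running total of the preceding lengths, which is why the two can be used interchangeably to locate a trace in the concatenated arrays.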

Examples

>>> import torch
>>> import dgl
>>> g2 = dgl.heterograph({
...     ('user', 'follow', 'user'): ([0, 1, 1, 2, 3], [1, 2, 3, 0, 0]),
...     ('user', 'view', 'item'): ([0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 2, 1]),
...     ('item', 'viewed-by', 'user'): ([0, 1, 1, 2, 2, 1], [0, 0, 1, 2, 3, 3])})
>>> traces, types = dgl.sampling.random_walk(
...     g2, [0, 0], metapath=['follow', 'view', 'viewed-by'] * 2,
...     restart_prob=torch.FloatTensor([0, 0.5, 0, 0, 0.5, 0]))
>>> traces, types
(tensor([[ 0,  1, -1, -1, -1, -1, -1],
         [ 0,  1,  1,  3,  0,  0,  0]]), tensor([0, 0, 1, 0, 0, 1, 0]))
>>> concat_vids, concat_types, lengths, offsets = dgl.sampling.pack_traces(traces, types)
>>> concat_vids
tensor([0, 1, 0, 1, 1, 3, 0, 0, 0])
>>> concat_types
tensor([0, 0, 0, 0, 1, 0, 0, 1, 0])
>>> lengths
tensor([2, 7])
>>> offsets
tensor([0, 2])

The first tensor, concat_vids, is the concatenation of all paths, i.e. the flattened traces array with all padding values (-1) excluded.

The second tensor, concat_types, holds the node type ID of each corresponding node in the first tensor.

The third and fourth tensors indicate the length and the offset of each path. With these tensors, the i-th random walk path can be obtained as follows:

>>> vids = concat_vids.split(lengths.tolist())
>>> vtypes = concat_types.split(lengths.tolist())
>>> vids[1], vtypes[1]
(tensor([0, 1, 1, 3, 0, 0, 0]), tensor([0, 0, 1, 0, 0, 1, 0]))
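
When only a single path is needed, the offsets give an alternative to split(): trace i occupies positions offsets[i] up to, but not including, offsets[i] + lengths[i] in the concatenated arrays. The sketch below illustrates that index arithmetic on plain Python lists holding the values from the example above, so it runs without DGL. The helper name get_trace is hypothetical, and the same slice expression is assumed to carry over to the returned tensors once the bounds are converted to Python integers.

```python
# Values copied from the example output above, written as lists.
concat_vids = [0, 1, 0, 1, 1, 3, 0, 0, 0]
concat_types = [0, 0, 0, 0, 1, 0, 0, 1, 0]
lengths = [2, 7]
offsets = [0, 2]


def get_trace(i):
    """Return (node IDs, node types) of the i-th trace by slicing."""
    start = offsets[i]
    stop = start + lengths[i]
    return concat_vids[start:stop], concat_types[start:stop]


print(get_trace(0))  # ([0, 1], [0, 0])
print(get_trace(1))  # ([0, 1, 1, 3, 0, 0, 0], [0, 0, 1, 0, 0, 1, 0])
```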