# CPU Best Practices

This chapter provides best practices for setting up your environment to get
the best training and inference performance on the CPU.

## Intel

### Hyper-threading

For GNN workloads, the suggested default setting for best performance is to
turn off hyper-threading. Hyper-threading can be disabled in the BIOS [#f1]_
or at the operating-system level [#f2]_ [#f3]_.

### Alternative memory allocators

Alternative memory allocators, such as *tcmalloc*, can provide significant
performance improvements through more efficient memory usage and reduced
overhead from unnecessary allocations and deallocations. *tcmalloc* uses
thread-local caches to reduce thread-synchronization overhead, lowers lock
contention with spinlocks and per-thread arenas, and categorizes memory
allocations by size to reduce fragmentation.

To take advantage of the optimizations *tcmalloc* provides, install it on
your system (on Ubuntu, *tcmalloc* is included in the libgoogle-perftools4
package) and add the shared library to the `LD_PRELOAD` environment variable:

```shell
export LD_PRELOAD=/lib/x86_64-linux-gnu/libtcmalloc.so.4:$LD_PRELOAD
```

### OpenMP settings

As `OpenMP` is the default parallel backend, we can control performance,
including sampling and training, via `dgl.utils.set_num_threads()`.

If the number of OpenMP threads is not set and `num_workers` in the
dataloader is set to 0, the OpenMP runtime typically uses all available CPU
cores by default. This works well for most cases and is also the default
behavior in DGL.

If `num_workers` in the dataloader is set to a value greater than 0, the
number of OpenMP threads is set to **1** for each worker process. This is
the default behavior in PyTorch.
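As an illustration of this division of labor, here is a minimal sketch with a
hypothetical helper (not a DGL API) that picks a thread count for the main
process; in a DGL program the resulting value would be applied via
`dgl.utils.set_num_threads()`:

```python
import os

def main_process_omp_threads(num_workers: int) -> int:
    """Hypothetical helper (not a DGL API): pick an OpenMP thread count
    for the main process given the dataloader's ``num_workers``.

    Each worker process runs single-threaded (the PyTorch default), so
    we leave one core per worker and use the rest for compute.
    """
    cores = os.cpu_count() or 1
    num_threads = max(1, cores - num_workers)
    # In a DGL program, pass this value to dgl.utils.set_num_threads().
    return num_threads

# With num_workers=0, the main process keeps every core.
assert main_process_omp_threads(0) == (os.cpu_count() or 1)
```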
In this case, we can set the number of OpenMP threads in the main process to
the number of available CPU cores.

Performance tuning is highly dependent on the workload and hardware
configuration. We recommend trying different settings and choosing the one
that works best for your own case.

**Dataloader CPU affinity**
This feature is currently available for `dgl.dataloading.DataLoader` only;
it is not yet available for the dataloaders in `dgl.graphbolt`.
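To illustrate what CPU-affinity pinning does under the hood, here is a
minimal Linux-only sketch using only the Python standard library
(`os.sched_setaffinity`); it is a conceptual demonstration, not the DGL
dataloader API itself:

```python
import os

def pin_process_to_cores(cores):
    """Pin the calling process to the given CPU core ids (Linux-only).

    A dataloader's CPU-affinity feature does something similar for each
    of its worker processes, so that sampler workers and the main
    compute process do not compete for the same cores.
    """
    os.sched_setaffinity(0, set(cores))  # 0 means the current process
    return os.sched_getaffinity(0)

# Example: restrict this process to core 0, then restore the old mask.
original_mask = os.sched_getaffinity(0)
assert pin_process_to_cores([0]) == {0}
os.sched_setaffinity(0, original_mask)
```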