.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/atomistic/4-profiling.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_atomistic_4-profiling.py: Profiling your models ===================== .. py:currentmodule:: metatensor.torch.atomistic Do you feel like your model is too slow? Do you want to make it faster? Instead of guessing which part of the code is responsible for any slowdown, you should profile your code to learn how much time is spent in each function and where to focus any optimization efforts. In this tutorial you'll learn how to profile your model using PyTorch profiler, how to read the output of the profiler, and how to add your own labels for new functions/steps in your model forward function. .. GENERATED FROM PYTHON SOURCE LINES 16-35 .. code-block:: Python from typing import Dict, List, Optional import ase.build import matplotlib.pyplot as plt import numpy as np import torch from metatensor.torch import Labels, TensorBlock, TensorMap from metatensor.torch.atomistic import ( MetatensorAtomisticModel, ModelCapabilities, ModelMetadata, ModelOutput, System, ) from metatensor.torch.atomistic.ase_calculator import MetatensorCalculator .. GENERATED FROM PYTHON SOURCE LINES 36-39 When profiling your code, it is important to run the model on a representative system to ensure you are actually exercising the behavior of your model at the right scale. Here we'll use a relatively large system with many atoms. .. GENERATED FROM PYTHON SOURCE LINES 40-46 .. code-block:: Python primitive = ase.build.bulk(name="C", crystalstructure="diamond", a=3.567) atoms = ase.build.make_supercell(primitive, 10 * np.eye(3)) print(f"We have {len(atoms)} atoms in our system") .. rst-class:: sphx-glr-script-out .. code-block:: none We have 2000 atoms in our system .. GENERATED FROM PYTHON SOURCE LINES 47-55 We will use the same ``HarmonicModel`` as in the :ref:`previous tutorial ` as our machine learning potential. .. raw:: html
Click to see the definition of HarmonicModel .. GENERATED FROM PYTHON SOURCE LINES 56-130 .. code-block:: Python class HarmonicModel(torch.nn.Module): def __init__(self, force_constant: float, equilibrium_positions: torch.Tensor): """Create an ``HarmonicModel``. :param force_constant: force constant, in ``energy unit / (length unit)^2`` :param equilibrium_positions: torch tensor with shape ``n x 3``, containing the equilibrium positions of all atoms """ super().__init__() assert force_constant > 0 self.force_constant = force_constant self.equilibrium_positions = equilibrium_positions def forward( self, systems: List[System], outputs: Dict[str, ModelOutput], selected_atoms: Optional[Labels], ) -> Dict[str, TensorMap]: if list(outputs.keys()) != ["energy"]: raise ValueError( "this model can only compute 'energy', but `outputs` contains other " f"keys: {', '.join(outputs.keys())}" ) # we don't want to worry about selected_atoms yet if selected_atoms is not None: raise NotImplementedError("selected_atoms is not implemented") if outputs["energy"].per_atom: raise NotImplementedError("per atom energy is not implemented") # compute the energy for each system by adding together the energy for each atom energy = torch.zeros((len(systems), 1), dtype=systems[0].positions.dtype) for i, system in enumerate(systems): assert len(system) == self.equilibrium_positions.shape[0] r0 = self.equilibrium_positions energy[i] += torch.sum(self.force_constant * (system.positions - r0) ** 2) # add metadata to the output block = TensorBlock( values=energy, samples=Labels("system", torch.arange(len(systems)).reshape(-1, 1)), components=[], properties=Labels("energy", torch.tensor([[0]])), ) return { "energy": TensorMap(keys=Labels("_", torch.tensor([[0]])), blocks=[block]) } model = HarmonicModel( force_constant=3.14159265358979323846, equilibrium_positions=torch.tensor(atoms.positions), ) capabilities = ModelCapabilities( outputs={ "energy": ModelOutput(quantity="energy", unit="eV", per_atom=False), }, atomic_types=[6], interaction_range=0.0, length_unit="Angstrom", supported_devices=["cpu"], dtype="float32", ) metadata = ModelMetadata() wrapper = MetatensorAtomisticModel(model.eval(), metadata, capabilities) wrapper.export("exported-model.pt") .. rst-class:: sphx-glr-script-out .. code-block:: none /home/runner/work/metatensor/metatensor/python/examples/atomistic/4-profiling.py:128: DeprecationWarning: `export()` is deprecated, use `save()` instead wrapper.export("exported-model.pt") .. GENERATED FROM PYTHON SOURCE LINES 131-135 .. raw:: html
.. GENERATED FROM PYTHON SOURCE LINES 139-141 If you are trying to profile your own model, you can start here and create a ``MetatensorCalculator`` with your own model. .. GENERATED FROM PYTHON SOURCE LINES 142-146 .. code-block:: Python atoms.calc = MetatensorCalculator("exported-model.pt") .. GENERATED FROM PYTHON SOURCE LINES 147-149 Before trying to profile the code, it is a good idea to run it a couple of times to allow torch to warmup internally. .. GENERATED FROM PYTHON SOURCE LINES 150-154 .. code-block:: Python atoms.get_forces() atoms.get_potential_energy() .. rst-class:: sphx-glr-script-out .. code-block:: none np.float64(3.770593615115558e-09) .. GENERATED FROM PYTHON SOURCE LINES 155-161 Profiling energy calculation ---------------------------- Now we can run code using :py:func:`torch.profiler.profile` to collect statistic on how long each function takes to run. We randomize the positions to force ASE to recompute the energy of the system .. GENERATED FROM PYTHON SOURCE LINES 162-169 .. code-block:: Python atoms.positions += np.random.rand(*atoms.positions.shape) with torch.profiler.profile() as energy_profiler: atoms.get_potential_energy() print(energy_profiler.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)) .. rst-class:: sphx-glr-script-out .. code-block:: none -------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls -------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ASECalculator::compute_neighbors 21.41% 290.356us 24.93% 338.085us 338.085us 1 ASECalculator::prepare_inputs 20.80% 282.050us 28.81% 390.682us 390.682us 1 ASECalculator::sum_energies 8.44% 114.435us 8.44% 114.435us 114.435us 1 Model::forward 5.93% 80.460us 17.01% 230.683us 230.683us 1 ASECalculator::convert_outputs 4.11% 55.726us 5.44% 73.749us 73.749us 1 aten::copy_ 3.48% 47.250us 3.48% 47.250us 3.635us 13 aten::sort 2.91% 39.394us 5.31% 71.996us 71.996us 1 aten::_to_copy 2.15% 29.148us 5.51% 74.662us 6.787us 11 aten::arange 1.95% 26.459us 3.95% 53.591us 13.398us 4 forward 1.88% 25.530us 31.05% 421.050us 421.050us 1 -------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 1.356ms .. GENERATED FROM PYTHON SOURCE LINES 170-195 There are a couple of interesting things to see here. First the total runtime of the code is shown in the bottom; and then the most costly functions are visible on top, one line per function. For each function, ``Self CPU`` refers to the time spent in this function **excluding** any called functions; and ``CPU total`` refers to the time spent in this function, **including** called functions. For more options to record operations and display the output, please refer to the `official documentation for PyTorch profiler `_. Here, ``Model::forward`` indicates the time taken by your model's ``forward()``. Anything starting with ``aten::`` comes from operations on torch tensors, typically with the same function name as the corresponding torch functions (e.g. ``aten::arange`` is :py:func:`torch.arange`). We can also see some internal functions from metatensor, with the name staring with ``MetatensorAtomisticModel::`` for :py:class:`MetatensorAtomisticModel`; and ``ASECalculator::`` for :py:class:`ase_calculator.MetatensorCalculator`. If you want to see more details on the internal steps taken by your model, you can add :py:func:`torch.profiler.record_function` (https://pytorch.org/docs/stable/generated/torch.autograd.profiler.record_function.html) inside your model code to give names to different steps in the calculation. This is how we are internally adding names such as ``Model::forward`` or ``ASECalculator::prepare_inputs`` above. .. GENERATED FROM PYTHON SOURCE LINES 198-203 Profiling forces calculation ---------------------------- Let's now do the same, but computing the forces for this system. This mean we should now see some time spent in the ``backward()`` function, on top of everything else. .. GENERATED FROM PYTHON SOURCE LINES 204-212 .. code-block:: Python atoms.positions += np.random.rand(*atoms.positions.shape) with torch.profiler.profile() as forces_profiler: atoms.get_forces() print(forces_profiler.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)) .. rst-class:: sphx-glr-script-out .. code-block:: none ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch::jit::(anonymous namespace)::DifferentiableGra... 27.65% 750.067us 31.12% 844.133us 844.133us 1 Model::forward 15.58% 422.666us 19.88% 539.172us 539.172us 1 ASECalculator::prepare_inputs 12.00% 325.353us 18.35% 497.714us 497.714us 1 ASECalculator::convert_outputs 6.06% 164.290us 8.06% 218.640us 218.640us 1 ASECalculator::run_backward 5.59% 151.504us 42.46% 1.152ms 1.152ms 1 aten::mm 3.93% 106.481us 3.96% 107.302us 26.825us 4 ASECalculator::sum_energies 2.68% 72.817us 2.68% 72.817us 72.817us 1 aten::copy_ 2.52% 68.328us 2.52% 68.328us 2.971us 23 ASECalculator::compute_neighbors 2.36% 64.000us 4.39% 119.103us 119.103us 1 aten::mul 1.60% 43.451us 2.07% 56.215us 14.054us 4 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 2.712ms .. GENERATED FROM PYTHON SOURCE LINES 213-214 Let's visualize this data in an other way: .. GENERATED FROM PYTHON SOURCE LINES 215-239 .. code-block:: Python events = forces_profiler.key_averages() events = sorted(events, key=lambda u: u.self_cpu_time_total, reverse=True) total_cpu_time = sum(map(lambda u: u.self_cpu_time_total, events)) bottom = 0.0 for event in events: self_time = event.self_cpu_time_total name = event.key if len(name) > 30: name = name[:12] + "[...]" + name[-12:] if self_time > 0.03 * total_cpu_time: plt.bar(0, self_time, bottom=bottom, label=name) bottom += self_time else: plt.bar(0, total_cpu_time - bottom, bottom=bottom, label="others") break plt.legend() plt.xticks([]) plt.xlim(0, 1) plt.ylabel("self time / µs") plt.show() .. image-sg:: /examples/atomistic/images/sphx_glr_4-profiling_001.png :alt: 4 profiling :srcset: /examples/atomistic/images/sphx_glr_4-profiling_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.236 seconds) .. _sphx_glr_download_examples_atomistic_4-profiling.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 4-profiling.ipynb <4-profiling.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 4-profiling.py <4-profiling.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: 4-profiling.zip <4-profiling.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_