.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/atomistic/3-profiling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_atomistic_3-profiling.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_atomistic_3-profiling.py:


Profiling your models
=====================

.. py:currentmodule:: metatensor.torch.atomistic

Do you feel like your model is too slow? Do you want to make it faster? Instead of
guessing which part of the code is responsible for any slowdown, you should profile your
code to learn how much time is spent in each function and where to focus any
optimization efforts.

In this tutorial you'll learn how to profile your model using the PyTorch profiler, how
to read the output of the profiler, and how to add your own labels for new
functions/steps in your model's forward function.

.. GENERATED FROM PYTHON SOURCE LINES 16-35

.. code-block:: Python


    from typing import Dict, List, Optional

    import ase.build
    import matplotlib.pyplot as plt
    import numpy as np
    import torch

    from metatensor.torch import Labels, TensorBlock, TensorMap
    from metatensor.torch.atomistic import (
        MetatensorAtomisticModel,
        ModelCapabilities,
        ModelMetadata,
        ModelOutput,
        System,
    )
    from metatensor.torch.atomistic.ase_calculator import MetatensorCalculator


.. GENERATED FROM PYTHON SOURCE LINES 36-39

When profiling your code, it is important to run the model on a representative system
to ensure you are actually exercising the behavior of your model at the right scale.
Here we'll use a relatively large system with many atoms.

.. GENERATED FROM PYTHON SOURCE LINES 40-46

.. code-block:: Python


    primitive = ase.build.bulk(name="C", crystalstructure="diamond", a=3.567)
    atoms = ase.build.make_supercell(primitive, 10 * np.eye(3))
    print(f"We have {len(atoms)} atoms in our system")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    We have 2000 atoms in our system


.. GENERATED FROM PYTHON SOURCE LINES 47-55

We will use the same ``HarmonicModel`` as in the :ref:`previous tutorial
<atomistic-tutorial-md>` as our machine learning potential.

.. raw:: html

    <details>
    <summary>Click to see the definition of HarmonicModel</summary>


.. GENERATED FROM PYTHON SOURCE LINES 56-128

.. code-block:: Python


    class HarmonicModel(torch.nn.Module):
        def __init__(self, force_constant: float, equilibrium_positions: torch.Tensor):
            """Create an ``HarmonicModel``.

            :param force_constant: force constant, in ``energy unit / (length unit)^2``
            :param equilibrium_positions: torch tensor with shape ``n x 3``, containing the
                equilibrium positions of all atoms
            """
            super().__init__()
            assert force_constant > 0
            self.force_constant = force_constant
            self.equilibrium_positions = equilibrium_positions

        def forward(
            self,
            systems: List[System],
            outputs: Dict[str, ModelOutput],
            selected_atoms: Optional[Labels],
        ) -> Dict[str, TensorMap]:
            # if the model user did not request an energy calculation, we have nothing to do
            if "energy" not in outputs:
                return {}

            # we don't want to worry about selected_atoms yet
            if selected_atoms is not None:
                raise NotImplementedError("selected_atoms is not implemented")

            if outputs["energy"].per_atom:
                raise NotImplementedError("per atom energy is not implemented")

            # compute the energy for each system by adding together the energy for each atom
            energy = torch.zeros((len(systems), 1), dtype=systems[0].positions.dtype)
            for i, system in enumerate(systems):
                assert len(system) == self.equilibrium_positions.shape[0]
                r0 = self.equilibrium_positions
                energy[i] += torch.sum(self.force_constant * (system.positions - r0) ** 2)

            # add metadata to the output
            block = TensorBlock(
                values=energy,
                samples=Labels("system", torch.arange(len(systems)).reshape(-1, 1)),
                components=[],
                properties=Labels("energy", torch.tensor([[0]])),
            )
            return {
                "energy": TensorMap(keys=Labels("_", torch.tensor([[0]])), blocks=[block])
            }


    model = HarmonicModel(
        force_constant=3.14159265358979323846,
        equilibrium_positions=torch.tensor(atoms.positions),
    )

    capabilities = ModelCapabilities(
        outputs={
            "energy": ModelOutput(quantity="energy", unit="eV", per_atom=False),
        },
        atomic_types=[6],
        interaction_range=0.0,
        length_unit="Angstrom",
        supported_devices=["cpu"],
        dtype="float32",
    )

    metadata = ModelMetadata()
    wrapper = MetatensorAtomisticModel(model.eval(), metadata, capabilities)

    wrapper.save("exported-model.pt")


.. GENERATED FROM PYTHON SOURCE LINES 129-133

.. raw:: html

    </details>


.. GENERATED FROM PYTHON SOURCE LINES 137-139

If you are trying to profile your own model, you can start here and create a
``MetatensorCalculator`` with your own model.

.. GENERATED FROM PYTHON SOURCE LINES 140-144

.. code-block:: Python


    atoms.calc = MetatensorCalculator("exported-model.pt")


.. GENERATED FROM PYTHON SOURCE LINES 145-147

Before trying to profile the code, it is a good idea to run it a couple of times to
allow torch to warm up internally.

.. GENERATED FROM PYTHON SOURCE LINES 148-152

.. code-block:: Python


    atoms.get_forces()
    atoms.get_potential_energy()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    3.770593615115558e-09
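
As a rough illustration of why this warm-up matters, the sketch below times repeated
calls to a callable whose first call pays a one-time setup cost. ``time_calls`` and
``LazyModel`` are hypothetical names introduced for this example only; the ``sleep``
merely stands in for whatever one-time work a real model does on its first call.

```python
import time


def time_calls(function, n_calls=5):
    """Return the wall-clock duration (in seconds) of ``n_calls`` successive calls."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        function()
        durations.append(time.perf_counter() - start)
    return durations


class LazyModel:
    """Hypothetical stand-in for a model with a one-time initialization cost."""

    def __init__(self):
        self.initialized = False
        self.n_calls = 0

    def __call__(self):
        if not self.initialized:
            # stands in for one-time work such as compilation or allocation
            time.sleep(0.05)
            self.initialized = True
        self.n_calls += 1
        return sum(i * i for i in range(1000))


lazy_model = LazyModel()
durations = time_calls(lazy_model, n_calls=5)
for i, duration in enumerate(durations):
    print(f"call {i}: {duration * 1e3:.3f} ms")
```

If the first call is much slower than the following ones, profiling it would mostly
measure the setup cost rather than the steady-state behavior of the model.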


.. GENERATED FROM PYTHON SOURCE LINES 153-159

Profiling energy calculation
----------------------------

Now we can run code using :py:func:`torch.profiler.profile` to collect statistics on
how long each function takes to run. We randomize the positions to force ASE to
recompute the energy of the system.

.. GENERATED FROM PYTHON SOURCE LINES 160-167

.. code-block:: Python


    atoms.positions += np.random.rand(*atoms.positions.shape)
    with torch.profiler.profile() as energy_profiler:
        atoms.get_potential_energy()

    print(energy_profiler.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    --------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
    --------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                        Model::forward        44.64%     387.000us        56.75%     492.000us     492.000us             1  
                         ASECalculator::prepare_inputs        20.18%     175.000us        23.64%     205.000us     205.000us             1  
                        ASECalculator::convert_outputs         9.34%      81.000us        10.73%      93.000us      46.500us             2  
                      ASECalculator::compute_neighbors         3.11%      27.000us         3.11%      27.000us      27.000us             1  
                                        aten::_to_copy         2.88%      25.000us         5.42%      47.000us       5.222us             9  
         MetatensorAtomisticModel::convert_units_input         2.65%      23.000us         3.00%      26.000us      26.000us             1  
                                             aten::sum         1.96%      17.000us         1.96%      17.000us      17.000us             1  
                                           aten::copy_         1.85%      16.000us         1.85%      16.000us       1.778us             9  
                                              aten::to         1.73%      15.000us         6.46%      56.000us       3.500us            16  
                                          aten::arange         1.50%      13.000us         2.65%      23.000us      11.500us             2  
    --------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
    Self CPU time total: 867.000us


.. GENERATED FROM PYTHON SOURCE LINES 168-193

There are a couple of interesting things to see here. First, the total runtime of the
code is shown at the bottom; then, the most costly functions are listed at the top,
one line per function. For each function, ``Self CPU`` refers to the time spent in
this function **excluding** any called functions; and ``CPU total`` refers to the time
spent in this function, **including** called functions.
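
To make the distinction concrete, the numbers from the ``Model::forward`` row above
can be combined by hand. This is plain arithmetic on the values printed in the table,
not a call into the profiler.

```python
# values copied from the profiler table above, in microseconds
self_cpu = 387.0  # "Self CPU" of Model::forward
cpu_total = 492.0  # "CPU total" of Model::forward
total_runtime = 867.0  # "Self CPU time total" at the bottom of the table

# time spent in the functions called by Model::forward
time_in_callees = cpu_total - self_cpu

# the percentage columns are relative to the total runtime
self_percent = 100 * self_cpu / total_runtime
total_percent = 100 * cpu_total / total_runtime

print(f"time in called functions: {time_in_callees:.0f} us")
print(f"Self CPU %: {self_percent:.2f}%, CPU total %: {total_percent:.2f}%")
```

The two percentages reproduce the ``Self CPU %`` and ``CPU total %`` columns of the
same row (44.64% and 56.75%).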

For more options to record operations and display the output, please refer to the
`official documentation for PyTorch profiler
<https://pytorch.org/docs/stable/profiler.html>`_.
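
As a sketch of a few of these options, the snippet below profiles a small stand-alone
computation (not the exported model from this tutorial), records the input shapes of
each operator, and sorts the table by total time instead of self time. The
``harmonic_energy`` function is a hypothetical stand-in that reuses the formula of
``HarmonicModel``.

```python
import os
import tempfile

import torch


def harmonic_energy(positions, equilibrium, force_constant=3.14):
    # hypothetical stand-in for a model, same formula as HarmonicModel
    return torch.sum(force_constant * (positions - equilibrium) ** 2)


equilibrium = torch.zeros(2000, 3)
positions = torch.rand(2000, 3)

# record_shapes=True also stores the shapes of the operator inputs
with torch.profiler.profile(record_shapes=True) as profiler:
    harmonic_energy(positions, equilibrium)

# group calls to the same operator by input shape, sort by total CPU time
averages = profiler.key_averages(group_by_input_shape=True)
print(averages.table(sort_by="cpu_time_total", row_limit=5))

# the raw trace can also be saved to a file and opened in a trace viewer
trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
profiler.export_chrome_trace(trace_path)
```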

Here, ``Model::forward`` indicates the time taken by your model's ``forward()``.
Anything starting with ``aten::`` comes from operations on torch tensors, typically
with the same function name as the corresponding torch functions (e.g.
``aten::arange`` is :py:func:`torch.arange`). We can also see some internal functions
from metatensor, with the name starting with ``MetatensorAtomisticModel::`` for
:py:class:`MetatensorAtomisticModel`; and ``ASECalculator::`` for
:py:class:`ase_calculator.MetatensorCalculator`.

If you want to see more details on the internal steps taken by your model, you can add
:py:func:`torch.profiler.record_function`
(https://pytorch.org/docs/stable/generated/torch.autograd.profiler.record_function.html)
inside your model code to give names to different steps in the calculation. This is
how we are internally adding names such as ``Model::forward`` or
``ASECalculator::prepare_inputs`` above.
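
A minimal sketch of this, assuming a simplified eager-mode module rather than a full
metatensor model: ``LabeledHarmonic`` and the two label names are made up for this
example, and the module takes a plain positions tensor instead of a list of
``System``.

```python
import torch
from torch.profiler import profile, record_function


class LabeledHarmonic(torch.nn.Module):
    """Hypothetical eager-mode model, used only to illustrate custom labels."""

    def __init__(self, force_constant, equilibrium_positions):
        super().__init__()
        self.force_constant = force_constant
        self.equilibrium_positions = equilibrium_positions

    def forward(self, positions):
        # each `record_function` block shows up as its own row in the table
        with record_function("LabeledHarmonic::displacements"):
            displacements = positions - self.equilibrium_positions

        with record_function("LabeledHarmonic::energy"):
            energy = torch.sum(self.force_constant * displacements**2)

        return energy


model = LabeledHarmonic(3.14, torch.zeros(2000, 3))
positions = torch.rand(2000, 3)

with profile() as profiler:
    model(positions)

print(profiler.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

The two labels should then appear in the table next to the ``aten::`` operations they
contain, which makes it easier to tell which step of ``forward()`` is the most
expensive.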


.. GENERATED FROM PYTHON SOURCE LINES 196-201

Profiling forces calculation
----------------------------

Let's now do the same, but computing the forces for this system. This means we should
now see some time spent in the ``backward()`` function, on top of everything else.

.. GENERATED FROM PYTHON SOURCE LINES 202-210

.. code-block:: Python


    atoms.positions += np.random.rand(*atoms.positions.shape)
    with torch.profiler.profile() as forces_profiler:
        atoms.get_forces()

    print(forces_profiler.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
    torch::jit::(anonymous namespace)::DifferentiableGra...        37.51%     719.000us        45.17%     866.000us     866.000us             1  
                                             Model::forward        18.68%     358.000us        27.33%     524.000us     524.000us             1  
                              ASECalculator::prepare_inputs         7.98%     153.000us         9.02%     173.000us     173.000us             1  
                             ASECalculator::convert_outputs         5.43%     104.000us         6.36%     122.000us      61.000us             2  
                                                  aten::mul         5.27%     101.000us         5.58%     107.000us      26.750us             4  
                                ASECalculator::run_backward         5.27%     101.000us        54.20%       1.039ms       1.039ms             1  
                                                aten::copy_         2.82%      54.000us         2.82%      54.000us       3.600us            15  
                                              <backward op>         2.03%      39.000us         7.67%     147.000us     147.000us             1  
                                                  aten::pow         1.77%      34.000us         2.61%      50.000us      25.000us             2  
                                             aten::_to_copy         1.46%      28.000us         3.76%      72.000us       6.000us            12  
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
    Self CPU time total: 1.917ms
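
The same experiment can be approximated without ASE, as a rough stand-alone sketch.
This is a simplification: it runs in eager mode rather than on the exported model, so
the backward pass will appear under different names than in the table above, and the
``Sketch::`` labels are made up for this example. For the harmonic energy
:math:`E = k \sum (r - r_0)^2`, the forces are :math:`F = -2 k (r - r_0)`, which gives
a simple check on the result.

```python
import torch

force_constant = 3.14
equilibrium = torch.zeros(2000, 3, dtype=torch.float64)
positions = torch.rand(2000, 3, dtype=torch.float64, requires_grad=True)

with torch.profiler.profile() as profiler:
    with torch.profiler.record_function("Sketch::forward"):
        energy = torch.sum(force_constant * (positions - equilibrium) ** 2)

    with torch.profiler.record_function("Sketch::backward"):
        energy.backward()

# forces are the negative gradient of the energy w.r.t. the positions
forces = -positions.grad

# analytical forces for a harmonic potential, used as a sanity check
expected = -2 * force_constant * (positions.detach() - equilibrium)
assert torch.allclose(forces, expected)

print(profiler.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```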


.. GENERATED FROM PYTHON SOURCE LINES 211-212

Let's visualize this data in another way:

.. GENERATED FROM PYTHON SOURCE LINES 213-237

.. code-block:: Python


    events = forces_profiler.key_averages()
    events = sorted(events, key=lambda u: u.self_cpu_time_total, reverse=True)
    total_cpu_time = sum(map(lambda u: u.self_cpu_time_total, events))

    bottom = 0.0
    for event in events:
        self_time = event.self_cpu_time_total
        name = event.key
        if len(name) > 30:
            name = name[:12] + "[...]" + name[-12:]

        if self_time > 0.03 * total_cpu_time:
            plt.bar(0, self_time, bottom=bottom, label=name)
            bottom += self_time
        else:
            plt.bar(0, total_cpu_time - bottom, bottom=bottom, label="others")
            break

    plt.legend()
    plt.xticks([])
    plt.xlim(0, 1)
    plt.ylabel("self time / µs")
    plt.show()


.. image-sg:: /examples/atomistic/images/sphx_glr_3-profiling_001.png
   :alt: 3 profiling
   :srcset: /examples/atomistic/images/sphx_glr_3-profiling_001.png
   :class: sphx-glr-single-img
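
Another possible summary, sketched here on the numbers printed in the forces table
above, is to group the self time by the prefix of each name. The ``rows`` list is
copied by hand from that table; with a live profiler, the same loop could presumably
run over ``forces_profiler.key_averages()`` using ``event.key`` and
``event.self_cpu_time_total``, as in the plotting code.

```python
from collections import defaultdict

# (name, self CPU time in microseconds), copied from the forces table above
rows = [
    ("torch::jit::(anonymous namespace)::DifferentiableGra...", 719.0),
    ("Model::forward", 358.0),
    ("ASECalculator::prepare_inputs", 153.0),
    ("ASECalculator::convert_outputs", 104.0),
    ("aten::mul", 101.0),
    ("ASECalculator::run_backward", 101.0),
    ("aten::copy_", 54.0),
    ("<backward op>", 39.0),
    ("aten::pow", 34.0),
    ("aten::_to_copy", 28.0),
]

self_time_by_prefix = defaultdict(float)
for name, self_time in rows:
    # "aten::mul" -> "aten"; names without "::" are kept as they are
    prefix = name.split("::")[0]
    self_time_by_prefix[prefix] += self_time

ranked = sorted(self_time_by_prefix.items(), key=lambda item: item[1], reverse=True)
for prefix, self_time in ranked:
    print(f"{prefix:>15}: {self_time:7.1f} us")
```

Only the ten rows shown in the table are included, so the groups do not add up to the
full 1.917ms reported at the bottom of the table.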


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.283 seconds)


.. _sphx_glr_download_examples_atomistic_3-profiling.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 3-profiling.ipynb <3-profiling.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 3-profiling.py <3-profiling.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_