.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/core/2-handling-sparsity.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_core_2-handling-sparsity.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_core_2-handling-sparsity.py:


.. _core-tutorial-sparsity:

Handling sparsity
=================

The one sentence introduction to metatensor mentions that it is a
"self-describing **sparse** tensor data format". The :ref:`previous tutorial
<core-tutorial-first-steps>` explained the self-describing part of the format,
and in this tutorial we will explore what makes metatensor a sparse format; and
how to remove this sparsity when required.

Like in the :ref:`previous tutorial <core-tutorial-first-steps>`, we will load the data
we need from a file. The code used to generate this file can be found below:

.. details:: Show the code used to generate the :file:`radial-spectrum.mts` file

    ..

        The data was generated with `featomic`_, a package to compute atomistic
        representations for machine learning applications.

        .. _featomic: https://metatensor.github.io/featomic/latest/index.html

        .. literalinclude:: radial-spectrum.py.example
            :language: python

The file contains a representation of two molecules called the radial spectrum. The atom
:math:`i` is represented by the radial spectrum :math:`R_i^\alpha`, which is an
expansion of the neighbor density :math:`\rho_i^\alpha(r)` on a set of radial basis
functions :math:`f_n(r)`

.. math::

    R_i^\alpha(n) = \int f_n(r) \rho_i(r) dr

The density :math:`\rho_i^\alpha(r)` associated with all neighbors of species
:math:`\alpha` for the atom :math:`i` (where each neighbor is replaced with a
Gaussian function centered on the neighbor's coordinates :math:`g(r_{ij})`) is
defined as:

.. math::

    \rho_i^\alpha(r) = \sum_{j \in \text{ neighborhood of i }} g(r_{ij})
        \delta_{\alpha_j,\alpha}


The exact mathematical details above don't matter too much for this tutorial,
the main point being that this representation treats each atomic species as an
independent quantity, effectively using the neighboring species :math:`\alpha`
for `one-hot encoding`_.

.. _one-hot encoding: https://en.wikipedia.org/wiki/One-hot

.. py:currentmodule:: metatensor

.. GENERATED FROM PYTHON SOURCE LINES 61-70

.. code-block:: Python


    import ase
    import ase.visualize.plot
    import matplotlib.pyplot as plt
    import numpy as np

    import metatensor as mts


.. GENERATED FROM PYTHON SOURCE LINES 71-73

We will work on the radial spectrum representation of three molecules in our system:
a carbon monoxide molecule, an oxygen molecule and a nitrogen molecule.

.. GENERATED FROM PYTHON SOURCE LINES 74-85

.. code-block:: Python


    atoms = ase.Atoms(
        "COO2N2",
        positions=[(0, 0, 0), (1.2, 0, 0), (0, 6, 0), (1.1, 6, 0), (6, 0, 0), (7.3, 0, 0)],
    )

    fig, ax = plt.subplots(figsize=(3, 3))
    ase.visualize.plot.plot_atoms(atoms, ax)
    ax.set_axis_off()
    plt.show()


.. image-sg:: /examples/core/images/sphx_glr_2-handling-sparsity_001.png
   :alt: 2 handling sparsity
   :srcset: /examples/core/images/sphx_glr_2-handling-sparsity_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 86-93

Sparsity in ``TensorMap``
-------------------------

The radial spectrum representation has two keys: ``central_species`` indicating the
species of the central atom (atom :math:`i` in the equations); and
``neighbor_type`` indicating the species of the neighboring atoms (atom :math:`j`
in the equations)

.. GENERATED FROM PYTHON SOURCE LINES 94-99

.. code-block:: Python


    radial_spectrum = mts.load("radial-spectrum.mts")

    print(radial_spectrum)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorMap with 5 blocks
    keys: center_type  neighbor_type
               6             6
               6             8
               7             7
               8             6
               8             8


.. GENERATED FROM PYTHON SOURCE LINES 100-112

This shows the first level of sparsity in ``TensorMap``: block sparsity.

Out of all possible combinations of ``central_species`` and ``neighbor_type``, some
are missing such as ``central_species=7, neighbor_type=8``. This is because we are
using a spherical cutoff of 2.5 Å, and as such there are no oxygen neighbor atoms
close enough to the nitrogen centers. This means that all the corresponding radial
spectrum coefficients :math:`R_i^\alpha(n)` will be zero (since the neighbor density
:math:`\rho_i^\alpha(r)` is zero everywhere).

Instead of wasting memory by storing all of these zeros explicitly, we simply
avoid creating the corresponding blocks from the get-go and save a lot of
memory!

.. GENERATED FROM PYTHON SOURCE LINES 116-118

Let's now look at the block containing the representation for oxygen centers and
carbon neighbors:

.. GENERATED FROM PYTHON SOURCE LINES 119-122

.. code-block:: Python


    block = radial_spectrum.block(center_type=8, neighbor_type=6)


.. GENERATED FROM PYTHON SOURCE LINES 123-125

Naively, this block should contain samples for all oxygen atoms (since
``center_type=8``); in practice we only have a single sample!

.. GENERATED FROM PYTHON SOURCE LINES 126-129

.. code-block:: Python


    print(block.samples)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        system  atom
          0      1
    )


.. GENERATED FROM PYTHON SOURCE LINES 130-137

There is a second level of sparsity here, using a format related to the
`coordinate sparse arrays (COO format) <COO_>`_. Since there is only one
oxygen atom with carbon neighbors, we only include this atom in the samples,
and the density/radial spectrum coefficient for all the other oxygen atoms is
assumed to be zero.

.. _COO: https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)

.. GENERATED FROM PYTHON SOURCE LINES 141-154

Making the data dense again
---------------------------

Sometimes, we might have to use data in a sparse metatensor format with code
that does not understands this sparsity pattern. One solution is to convert
the data to a dense format, making the zeros explicit.
Metatensor provides functionalities to convert sparse data to a dense format
for the keys sparsity; and metadata to convert to a dense format for sample
sparsity.

First, the sample sparsity can be removed block by block by creating a new
array full of zeros, and copying the data according to the indices in
``block.samples``

.. GENERATED FROM PYTHON SOURCE LINES 155-164

.. code-block:: Python


    dense_block_data = np.zeros((len(atoms), block.values.shape[1]))

    # only copy the non-zero data stored in the block
    dense_block_data[block.samples["atom"]] = block.values

    print(dense_block_data)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[0.         0.         0.        ]
     [0.05052215 0.13111562 0.01025378]
     [0.         0.         0.        ]
     [0.         0.         0.        ]
     [0.         0.         0.        ]
     [0.         0.         0.        ]]


.. GENERATED FROM PYTHON SOURCE LINES 165-173

Alternatively, we can undo the keys sparsity with
:py:meth:`TensorMap.keys_to_samples` and :py:meth:`TensorMap.keys_to_properties`,
which merge multiple blocks along the samples or properties dimensions respectively.

Which one of these functions to call will depend on the data you are handling.
Typically, one-hot encoding (the ``neighbor_types`` key here) should be merged
along the properties dimension; and keys that define subsets of the samples
(``center_type``) should be merged along the samples dimension.

.. GENERATED FROM PYTHON SOURCE LINES 174-178

.. code-block:: Python


    dense_radial_spectrum = radial_spectrum.keys_to_samples("center_type")
    dense_radial_spectrum = dense_radial_spectrum.keys_to_properties("neighbor_type")


.. GENERATED FROM PYTHON SOURCE LINES 179-181

After calling these two functions, we now have a :py:class:`TensorMap` with a single
block and no keys:

.. GENERATED FROM PYTHON SOURCE LINES 182-187

.. code-block:: Python


    print(dense_radial_spectrum)

    block = dense_radial_spectrum.block()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorMap with 1 blocks
    keys: _
          0


.. GENERATED FROM PYTHON SOURCE LINES 188-190

We can see that the resulting dense data array contains a lot of zeros (and has a well
defined block-sparse structure):

.. GENERATED FROM PYTHON SOURCE LINES 191-195

.. code-block:: Python


    with np.printoptions(precision=3):
        print(block.values)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[ 0.556 -0.282  0.016  0.051  0.131  0.01   0.     0.     0.   ]
     [ 0.051  0.131  0.01   0.556 -0.282  0.016  0.     0.     0.   ]
     [ 0.     0.     0.     0.633 -0.152  0.017  0.     0.     0.   ]
     [ 0.     0.     0.     0.633 -0.152  0.017  0.     0.     0.   ]
     [ 0.     0.     0.     0.     0.     0.     0.586 -0.157  0.036]
     [ 0.     0.     0.     0.     0.     0.     0.586 -0.157  0.036]]


.. GENERATED FROM PYTHON SOURCE LINES 196-199

By using the metadata attached to the block, we can understand which part of the data
is zero and why. For example, the lower-right corner of the array corresponds to
the nitrogen atoms (the last two samples):

.. GENERATED FROM PYTHON SOURCE LINES 200-203

.. code-block:: Python


    print(block.samples.print(max_entries=-1))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

     system  atom  center_type
       0      0         6
       0      1         8
       0      2         8
       0      3         8
       0      4         7
       0      5         7


.. GENERATED FROM PYTHON SOURCE LINES 204-206

And these two bottom rows are zero everywhere, except in the part representing the
nitrogen neighbor density:

.. GENERATED FROM PYTHON SOURCE LINES 207-209

.. code-block:: Python


    print(block.properties.print(max_entries=-1))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

     neighbor_type  n
           6        0
           6        1
           6        2
           8        0
           8        1
           8        2
           7        0
           7        1
           7        2


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.035 seconds)


.. _sphx_glr_download_examples_core_2-handling-sparsity.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 2-handling-sparsity.ipynb <2-handling-sparsity.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 2-handling-sparsity.py <2-handling-sparsity.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 2-handling-sparsity.zip <2-handling-sparsity.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_