.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/core/2-handling-sparsity.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_core_2-handling-sparsity.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_core_2-handling-sparsity.py:


.. _core-tutorial-sparsity:

Handling sparsity
=================

The one sentence introduction to metatensor mentions that this is a "self-describing
**sparse** tensor data format". The :ref:`previous tutorial <core-tutorial-first-steps>`
explained the self-describing part of the format, and in this tutorial we will explore
what makes metatensor a sparse format; and how to remove the corresponding sparsity when
required.

Like in the :ref:`previous tutorial <core-tutorial-first-steps>`, we will load the data
we need from a file. The code used to generate this file can be found below:

.. details:: Show the code used to generate the :file:`radial-spectrum.mts` file

    ..

        The data was generated with `featomic`_, a package to compute atomistic
        representations for machine learning applications.

        .. _featomic: https://metatensor.github.io/featomic/latest/index.html

        .. literalinclude:: radial-spectrum.py.example
            :language: python

The file contains a representation of two molecules called the radial spectrum. The atom
:math:`i` is represented by the radial spectrum :math:`R_i^\alpha`, which is an
expansion of the neighbor density :math:`\rho_i^\alpha(r)` on a set of radial basis
functions :math:`f_n(r)`

.. math::

    R_i^\alpha(n) = \int f_n(r) \rho_i(r) dr

The density :math:`\rho_i^\alpha(r)` associated with all neighbors of species
:math:`\alpha` of the atom :math:`i` (each neighbor is replaced with a Gaussian function
centered on the neighbor :math:`g(r_{ij})`) is defined as:

.. math::

    \rho_i^\alpha(r) = \sum_{j \in \text{ neighborhood of i }} g(r_{ij})
        \delta_{\alpha_j,\alpha}


The exact mathematical details above don't matter too much for this tutorial, the main
point being that this representation treats atomic species as completely independent,
effectively using the neighbor species :math:`\alpha` for `one-hot encoding`_.

.. _one-hot encoding: https://en.wikipedia.org/wiki/One-hot

.. py:currentmodule:: metatensor

.. GENERATED FROM PYTHON SOURCE LINES 59-68

.. code-block:: Python


    import ase
    import ase.visualize.plot
    import matplotlib.pyplot as plt
    import numpy as np

    import metatensor as mts


.. GENERATED FROM PYTHON SOURCE LINES 69-71

We will work on the radial spectrum representation of three molecules in our system:
a carbon monoxide, an oxygen molecule and a nitrogen molecule.

.. GENERATED FROM PYTHON SOURCE LINES 72-83

.. code-block:: Python


    atoms = ase.Atoms(
        "COO2N2",
        positions=[(0, 0, 0), (1.2, 0, 0), (0, 6, 0), (1.1, 6, 0), (6, 0, 0), (7.3, 0, 0)],
    )

    fig, ax = plt.subplots(figsize=(3, 3))
    ase.visualize.plot.plot_atoms(atoms, ax)
    ax.set_axis_off()
    plt.show()


.. image-sg:: /examples/core/images/sphx_glr_2-handling-sparsity_001.png
   :alt: 2 handling sparsity
   :srcset: /examples/core/images/sphx_glr_2-handling-sparsity_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 84-91

Sparsity in ``TensorMap``
-------------------------

The radial spectrum representation has two keys: ``central_species`` indicating the
species of the central atom (atom :math:`i` in the equations); and
``neighbor_type`` indicating the species of the neighboring atoms (atom :math:`j`
in the equations)

.. GENERATED FROM PYTHON SOURCE LINES 92-97

.. code-block:: Python


    radial_spectrum = mts.load("radial-spectrum.mts")

    print(radial_spectrum)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorMap with 5 blocks
    keys: center_type  neighbor_type
               6             6
               6             8
               7             7
               8             6
               8             8


.. GENERATED FROM PYTHON SOURCE LINES 98-109

This shows the first level of sparsity in ``TensorMap``: block sparsity.

Out of all possible combinations of ``central_species`` and ``neighbor_type``, some
are missing such as ``central_species=7, neighbor_type=8``. This is because we are
using a spherical cutoff of 2.5 Å, and as such there are no oxygen neighbor atoms
close enough to the nitrogen centers. This means that all the corresponding radial
spectrum coefficients :math:`R_i^\alpha(n)` will be zero (since the neighbor density
:math:`\rho_i^\alpha(r)` is zero everywhere).

Instead of wasting memory space by storing all of these zeros explicitly, we simply
avoid creating the corresponding blocks from the get-go and save a lot of memory!

.. GENERATED FROM PYTHON SOURCE LINES 113-115

Let's now look at the block containing the representation for oxygen centers and
carbon neighbors:

.. GENERATED FROM PYTHON SOURCE LINES 116-119

.. code-block:: Python


    block = radial_spectrum.block(center_type=8, neighbor_type=6)


.. GENERATED FROM PYTHON SOURCE LINES 120-122

Naively, this block should contain samples for all oxygen atoms (since
``center_type=8``); in practice we only have a single sample!

.. GENERATED FROM PYTHON SOURCE LINES 123-126

.. code-block:: Python


    print(block.samples)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        system  atom
          0      1
    )


.. GENERATED FROM PYTHON SOURCE LINES 127-134

There is a second level of sparsity happening here, using a format related to
`coordinate sparse arrays (COO format) <COO_>`_. Since there is only one oxygen atom
with carbon neighbors, we only include this atom in the samples, and the
density/radial spectrum coefficient for all the other oxygen atoms is assumed to be
zero.

.. _COO: https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)

.. GENERATED FROM PYTHON SOURCE LINES 138-149

Making the data dense again
---------------------------

Sometimes, we might have to use data in a sparse metatensor format with code that does
not understands this sparsity. One solution is to convert the data to a dense format,
making the zeros explicit as much as possible. Metatensor provides functionalities to
convert sparse data to a dense format for the keys sparsity; and metadata to convert
to a dense format for sample sparsity.

First, the sample sparsity can be removed block by block by creating a new array full
of zeros, and copying the data according to the indices in ``block.samples``

.. GENERATED FROM PYTHON SOURCE LINES 150-159

.. code-block:: Python


    dense_block_data = np.zeros((len(atoms), block.values.shape[1]))

    # only copy the non-zero data stored in the block
    dense_block_data[block.samples["atom"]] = block.values

    print(dense_block_data)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[0.         0.         0.        ]
     [0.05052215 0.13111562 0.01025378]
     [0.         0.         0.        ]
     [0.         0.         0.        ]
     [0.         0.         0.        ]
     [0.         0.         0.        ]]


.. GENERATED FROM PYTHON SOURCE LINES 160-168

Alternatively, we can undo the keys sparsity with
:py:meth:`TensorMap.keys_to_samples` and :py:meth:`TensorMap.keys_to_properties`,
which merge multiple blocks along the samples or properties dimensions respectively.

Which one of these functions to call will depend on the data you are handling.
Typically, one-hot encoding (the ``neighbor_types`` key here) should be merged
along the properties dimension; and keys that define subsets of the samples
(``center_type``) should be merged along the samples dimension.

.. GENERATED FROM PYTHON SOURCE LINES 169-173

.. code-block:: Python


    dense_radial_spectrum = radial_spectrum.keys_to_samples("center_type")
    dense_radial_spectrum = dense_radial_spectrum.keys_to_properties("neighbor_type")


.. GENERATED FROM PYTHON SOURCE LINES 174-176

After calling these two functions, we now have a :py:class:`TensorMap` with a single
block and no keys:

.. GENERATED FROM PYTHON SOURCE LINES 177-182

.. code-block:: Python


    print(dense_radial_spectrum)

    block = dense_radial_spectrum.block()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorMap with 1 blocks
    keys: _
          0


.. GENERATED FROM PYTHON SOURCE LINES 183-185

We can see that the resulting dense data array contains a lot of zeros (and has a well
defined block-sparse structure):

.. GENERATED FROM PYTHON SOURCE LINES 186-190

.. code-block:: Python


    with np.printoptions(precision=3):
        print(block.values)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[ 0.556 -0.282  0.016  0.051  0.131  0.01   0.     0.     0.   ]
     [ 0.051  0.131  0.01   0.556 -0.282  0.016  0.     0.     0.   ]
     [ 0.     0.     0.     0.633 -0.152  0.017  0.     0.     0.   ]
     [ 0.     0.     0.     0.633 -0.152  0.017  0.     0.     0.   ]
     [ 0.     0.     0.     0.     0.     0.     0.586 -0.157  0.036]
     [ 0.     0.     0.     0.     0.     0.     0.586 -0.157  0.036]]


.. GENERATED FROM PYTHON SOURCE LINES 191-194

And using the metadata attached to the block, we can understand which part of the data
is zero and why. For example, the lower-right corner of the array corresponds to
nitrogen atoms (the last two samples):

.. GENERATED FROM PYTHON SOURCE LINES 195-198

.. code-block:: Python


    print(block.samples.print(max_entries=-1))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

     system  atom  center_type
       0      0         6
       0      1         8
       0      2         8
       0      3         8
       0      4         7
       0      5         7


.. GENERATED FROM PYTHON SOURCE LINES 199-201

And these two bottom rows are zero everywhere, except in the part representing the
nitrogen neighbor density:

.. GENERATED FROM PYTHON SOURCE LINES 202-204

.. code-block:: Python


    print(block.properties.print(max_entries=-1))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

     neighbor_type  n
           6        0
           6        1
           6        2
           8        0
           8        1
           8        2
           7        0
           7        1
           7        2


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.035 seconds)


.. _sphx_glr_download_examples_core_2-handling-sparsity.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 2-handling-sparsity.ipynb <2-handling-sparsity.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 2-handling-sparsity.py <2-handling-sparsity.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 2-handling-sparsity.zip <2-handling-sparsity.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_