.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/core/2-handling-sparsity.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_core_2-handling-sparsity.py: .. _core-tutorial-sparsity: Handling sparsity ================= The one sentence introduction to metatensor mentions that this is a "self-describing **sparse** tensor data format". The :ref:`previous tutorial ` explained the self-describing part of the format, and in this tutorial we will explore what makes metatensor a sparse format; and how to remove the corresponding sparsity when required. Like in the :ref:`previous tutorial `, we will load the data we need from a file. The code used to generate this file can be found below: .. details:: Show the code used to generate the :file:`radial-spectrum.npz` file .. The data was generated with `featomic`_, a package to compute atomistic representations for machine learning applications. .. _featomic: https://metatensor.github.io/featomic/latest/index.html .. literalinclude:: radial-spectrum.py.example :language: python The file contains a representation of two molecules called the radial spectrum. The atom :math:`i` is represented by the radial spectrum :math:`R_i^\alpha`, which is an expansion of the neighbor density :math:`\rho_i^\alpha(r)` on a set of radial basis functions :math:`f_n(r)` .. math:: R_i^\alpha(n) = \int f_n(r) \rho_i(r) dr The density :math:`\rho_i^\alpha(r)` associated with all neighbors of species :math:`\alpha` of the atom :math:`i` (each neighbor is replaced with a Gaussian function centered on the neighbor :math:`g(r_{ij})`) is defined as: .. math:: \rho_i^\alpha(r) = \sum_{j \in \text{ neighborhood of i }} g(r_{ij}) \delta_{\alpha_j,\alpha} The exact mathematical details above don't matter too much for this tutorial, the main point being that this representation treats atomic species as completely independent, effectively using the neighbor species :math:`\alpha` for `one-hot encoding`_. .. _one-hot encoding: https://en.wikipedia.org/wiki/One-hot .. py:currentmodule:: metatensor .. GENERATED FROM PYTHON SOURCE LINES 59-68 .. code-block:: Python import ase import ase.visualize.plot import matplotlib.pyplot as plt import numpy as np import metatensor .. GENERATED FROM PYTHON SOURCE LINES 69-71 We will work on the radial spectrum representation of three molecules in our system: a carbon monoxide, an oxygen molecule and a nitrogen molecule. .. GENERATED FROM PYTHON SOURCE LINES 72-83 .. code-block:: Python atoms = ase.Atoms( "COO2N2", positions=[(0, 0, 0), (1.2, 0, 0), (0, 6, 0), (1.1, 6, 0), (6, 0, 0), (7.3, 0, 0)], ) fig, ax = plt.subplots(figsize=(3, 3)) ase.visualize.plot.plot_atoms(atoms, ax) ax.set_axis_off() plt.show() .. image-sg:: /examples/core/images/sphx_glr_2-handling-sparsity_001.png :alt: 2 handling sparsity :srcset: /examples/core/images/sphx_glr_2-handling-sparsity_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 84-91 Sparsity in ``TensorMap`` ------------------------- The radial spectrum representation has two keys: ``central_species`` indicating the species of the central atom (atom :math:`i` in the equations); and ``neighbor_type`` indicating the species of the neighboring atoms (atom :math:`j` in the equations) .. GENERATED FROM PYTHON SOURCE LINES 92-97 .. code-block:: Python radial_spectrum = metatensor.load("radial-spectrum.npz") print(radial_spectrum) .. rst-class:: sphx-glr-script-out .. code-block:: none TensorMap with 5 blocks keys: center_type neighbor_type 6 6 6 8 7 7 8 6 8 8 .. GENERATED FROM PYTHON SOURCE LINES 98-109 This shows the first level of sparsity in ``TensorMap``: block sparsity. Out of all possible combinations of ``central_species`` and ``neighbor_type``, some are missing such as ``central_species=7, neighbor_type=8``. This is because we are using a spherical cutoff of 2.5 Å, and as such there are no oxygen neighbor atoms close enough to the nitrogen centers. This means that all the corresponding radial spectrum coefficients :math:`R_i^\alpha(n)` will be zero (since the neighbor density :math:`\rho_i^\alpha(r)` is zero everywhere). Instead of wasting memory space by storing all of these zeros explicitly, we simply avoid creating the corresponding blocks from the get-go and save a lot of memory! .. GENERATED FROM PYTHON SOURCE LINES 113-115 Let's now look at the block containing the representation for oxygen centers and carbon neighbors: .. GENERATED FROM PYTHON SOURCE LINES 116-119 .. code-block:: Python block = radial_spectrum.block(center_type=8, neighbor_type=6) .. GENERATED FROM PYTHON SOURCE LINES 120-122 Naively, this block should contain samples for all oxygen atoms (since ``center_type=8``); in practice we only have a single sample! .. GENERATED FROM PYTHON SOURCE LINES 123-126 .. code-block:: Python print(block.samples) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( system atom 0 1 ) .. GENERATED FROM PYTHON SOURCE LINES 127-134 There is a second level of sparsity happening here, using a format related to `coordinate sparse arrays (COO format) `_. Since there is only one oxygen atom with carbon neighbors, we only include this atom in the samples, and the density/radial spectrum coefficient for all the other oxygen atoms is assumed to be zero. .. _COO: https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO) .. GENERATED FROM PYTHON SOURCE LINES 138-149 Making the data dense again --------------------------- Sometimes, we might have to use data in a sparse metatensor format with code that does not understands this sparsity. One solution is to convert the data to a dense format, making the zeros explicit as much as possible. Metatensor provides functionalities to convert sparse data to a dense format for the keys sparsity; and metadata to convert to a dense format for sample sparsity. First, the sample sparsity can be removed block by block by creating a new array full of zeros, and copying the data according to the indices in ``block.samples`` .. GENERATED FROM PYTHON SOURCE LINES 150-159 .. code-block:: Python dense_block_data = np.zeros((len(atoms), block.values.shape[1])) # only copy the non-zero data stored in the block dense_block_data[block.samples["atom"]] = block.values print(dense_block_data) .. rst-class:: sphx-glr-script-out .. code-block:: none [[0. 0. 0. ] [0.05052215 0.13111562 0.01025378] [0. 0. 0. ] [0. 0. 0. ] [0. 0. 0. ] [0. 0. 0. ]] .. GENERATED FROM PYTHON SOURCE LINES 160-168 Alternatively, we can undo the keys sparsity with :py:meth:`TensorMap.keys_to_samples` and :py:meth:`TensorMap.keys_to_properties`, which merge multiple blocks along the samples or properties dimensions respectively. Which one of these functions to call will depend on the data you are handling. Typically, one-hot encoding (the ``neighbor_types`` key here) should be merged along the properties dimension; and keys that define subsets of the samples (``center_type``) should be merged along the samples dimension. .. GENERATED FROM PYTHON SOURCE LINES 169-173 .. code-block:: Python dense_radial_spectrum = radial_spectrum.keys_to_samples("center_type") dense_radial_spectrum = dense_radial_spectrum.keys_to_properties("neighbor_type") .. GENERATED FROM PYTHON SOURCE LINES 174-176 After calling these two functions, we now have a :py:class:`TensorMap` with a single block and no keys: .. GENERATED FROM PYTHON SOURCE LINES 177-182 .. code-block:: Python print(dense_radial_spectrum) block = dense_radial_spectrum.block() .. rst-class:: sphx-glr-script-out .. code-block:: none TensorMap with 1 blocks keys: _ 0 .. GENERATED FROM PYTHON SOURCE LINES 183-185 We can see that the resulting dense data array contains a lot of zeros (and has a well defined block-sparse structure): .. GENERATED FROM PYTHON SOURCE LINES 186-190 .. code-block:: Python with np.printoptions(precision=3): print(block.values) .. rst-class:: sphx-glr-script-out .. code-block:: none [[ 0.556 -0.282 0.016 0.051 0.131 0.01 0. 0. 0. ] [ 0.051 0.131 0.01 0.556 -0.282 0.016 0. 0. 0. ] [ 0. 0. 0. 0.633 -0.152 0.017 0. 0. 0. ] [ 0. 0. 0. 0.633 -0.152 0.017 0. 0. 0. ] [ 0. 0. 0. 0. 0. 0. 0.586 -0.157 0.036] [ 0. 0. 0. 0. 0. 0. 0.586 -0.157 0.036]] .. GENERATED FROM PYTHON SOURCE LINES 191-194 And using the metadata attached to the block, we can understand which part of the data is zero and why. For example, the lower-right corner of the array corresponds to nitrogen atoms (the last two samples): .. GENERATED FROM PYTHON SOURCE LINES 195-198 .. code-block:: Python print(block.samples.print(max_entries=-1)) .. rst-class:: sphx-glr-script-out .. code-block:: none system atom center_type 0 0 6 0 1 8 0 2 8 0 3 8 0 4 7 0 5 7 .. GENERATED FROM PYTHON SOURCE LINES 199-201 And these two bottom rows are zero everywhere, except in the part representing the nitrogen neighbor density: .. GENERATED FROM PYTHON SOURCE LINES 202-204 .. code-block:: Python print(block.properties.print(max_entries=-1)) .. rst-class:: sphx-glr-script-out .. code-block:: none neighbor_type n 6 0 6 1 6 2 8 0 8 1 8 2 7 0 7 1 7 2 .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.034 seconds) .. _sphx_glr_download_examples_core_2-handling-sparsity.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 2-handling-sparsity.ipynb <2-handling-sparsity.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 2-handling-sparsity.py <2-handling-sparsity.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: 2-handling-sparsity.zip <2-handling-sparsity.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_