.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/core/2-handling-sparsity.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_core_2-handling-sparsity.py: .. _core-tutorial-sparsity: Handling sparsity ================= The one sentence introduction to metatensor mentions that it is a "self-describing **sparse** tensor data format". The :ref:`previous tutorial ` explained the self-describing part of the format, and in this tutorial we will explore what makes metatensor a sparse format; and how to remove this sparsity when required. Like in the :ref:`previous tutorial `, we will load the data we need from a file. The code used to generate this file can be found below: .. details:: Show the code used to generate the :file:`radial-spectrum.mts` file .. The data was generated with `featomic`_, a package to compute atomistic representations for machine learning applications. .. _featomic: https://metatensor.github.io/featomic/latest/index.html .. literalinclude:: radial-spectrum.py.example :language: python The file contains a representation of two molecules called the radial spectrum. The atom :math:`i` is represented by the radial spectrum :math:`R_i^\alpha`, which is an expansion of the neighbor density :math:`\rho_i^\alpha(r)` on a set of radial basis functions :math:`f_n(r)` .. math:: R_i^\alpha(n) = \int f_n(r) \rho_i(r) dr The density :math:`\rho_i^\alpha(r)` associated with all neighbors of species :math:`\alpha` for the atom :math:`i` (where each neighbor is replaced with a Gaussian function centered on the neighbor's coordinates :math:`g(r_{ij})`) is defined as: .. math:: \rho_i^\alpha(r) = \sum_{j \in \text{ neighborhood of i }} g(r_{ij}) \delta_{\alpha_j,\alpha} The exact mathematical details above don't matter too much for this tutorial, the main point being that this representation treats each atomic species as an independent quantity, effectively using the neighboring species :math:`\alpha` for `one-hot encoding`_. .. _one-hot encoding: https://en.wikipedia.org/wiki/One-hot .. py:currentmodule:: metatensor .. GENERATED FROM PYTHON SOURCE LINES 61-70 .. code-block:: Python import ase import ase.visualize.plot import matplotlib.pyplot as plt import numpy as np import metatensor as mts .. GENERATED FROM PYTHON SOURCE LINES 71-73 We will work on the radial spectrum representation of three molecules in our system: a carbon monoxide molecule, an oxygen molecule and a nitrogen molecule. .. GENERATED FROM PYTHON SOURCE LINES 74-85 .. code-block:: Python atoms = ase.Atoms( "COO2N2", positions=[(0, 0, 0), (1.2, 0, 0), (0, 6, 0), (1.1, 6, 0), (6, 0, 0), (7.3, 0, 0)], ) fig, ax = plt.subplots(figsize=(3, 3)) ase.visualize.plot.plot_atoms(atoms, ax) ax.set_axis_off() plt.show() .. image-sg:: /examples/core/images/sphx_glr_2-handling-sparsity_001.png :alt: 2 handling sparsity :srcset: /examples/core/images/sphx_glr_2-handling-sparsity_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 86-93 Sparsity in ``TensorMap`` ------------------------- The radial spectrum representation has two keys: ``central_species`` indicating the species of the central atom (atom :math:`i` in the equations); and ``neighbor_type`` indicating the species of the neighboring atoms (atom :math:`j` in the equations) .. GENERATED FROM PYTHON SOURCE LINES 94-99 .. code-block:: Python radial_spectrum = mts.load("radial-spectrum.mts") print(radial_spectrum) .. rst-class:: sphx-glr-script-out .. code-block:: none TensorMap with 5 blocks keys: center_type neighbor_type 6 6 6 8 7 7 8 6 8 8 .. GENERATED FROM PYTHON SOURCE LINES 100-112 This shows the first level of sparsity in ``TensorMap``: block sparsity. Out of all possible combinations of ``central_species`` and ``neighbor_type``, some are missing such as ``central_species=7, neighbor_type=8``. This is because we are using a spherical cutoff of 2.5 Å, and as such there are no oxygen neighbor atoms close enough to the nitrogen centers. This means that all the corresponding radial spectrum coefficients :math:`R_i^\alpha(n)` will be zero (since the neighbor density :math:`\rho_i^\alpha(r)` is zero everywhere). Instead of wasting memory by storing all of these zeros explicitly, we simply avoid creating the corresponding blocks from the get-go and save a lot of memory! .. GENERATED FROM PYTHON SOURCE LINES 116-118 Let's now look at the block containing the representation for oxygen centers and carbon neighbors: .. GENERATED FROM PYTHON SOURCE LINES 119-122 .. code-block:: Python block = radial_spectrum.block(center_type=8, neighbor_type=6) .. GENERATED FROM PYTHON SOURCE LINES 123-125 Naively, this block should contain samples for all oxygen atoms (since ``center_type=8``); in practice we only have a single sample! .. GENERATED FROM PYTHON SOURCE LINES 126-129 .. code-block:: Python print(block.samples) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( system atom 0 1 ) .. GENERATED FROM PYTHON SOURCE LINES 130-137 There is a second level of sparsity here, using a format related to the `coordinate sparse arrays (COO format) `_. Since there is only one oxygen atom with carbon neighbors, we only include this atom in the samples, and the density/radial spectrum coefficient for all the other oxygen atoms is assumed to be zero. .. _COO: https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO) .. GENERATED FROM PYTHON SOURCE LINES 141-154 Making the data dense again --------------------------- Sometimes, we might have to use data in a sparse metatensor format with code that does not understands this sparsity pattern. One solution is to convert the data to a dense format, making the zeros explicit. Metatensor provides functionalities to convert sparse data to a dense format for the keys sparsity; and metadata to convert to a dense format for sample sparsity. First, the sample sparsity can be removed block by block by creating a new array full of zeros, and copying the data according to the indices in ``block.samples`` .. GENERATED FROM PYTHON SOURCE LINES 155-164 .. code-block:: Python dense_block_data = np.zeros((len(atoms), block.values.shape[1])) # only copy the non-zero data stored in the block dense_block_data[block.samples["atom"]] = block.values print(dense_block_data) .. rst-class:: sphx-glr-script-out .. code-block:: none [[0. 0. 0. ] [0.05052215 0.13111562 0.01025378] [0. 0. 0. ] [0. 0. 0. ] [0. 0. 0. ] [0. 0. 0. ]] .. GENERATED FROM PYTHON SOURCE LINES 165-173 Alternatively, we can undo the keys sparsity with :py:meth:`TensorMap.keys_to_samples` and :py:meth:`TensorMap.keys_to_properties`, which merge multiple blocks along the samples or properties dimensions respectively. Which one of these functions to call will depend on the data you are handling. Typically, one-hot encoding (the ``neighbor_types`` key here) should be merged along the properties dimension; and keys that define subsets of the samples (``center_type``) should be merged along the samples dimension. .. GENERATED FROM PYTHON SOURCE LINES 174-178 .. code-block:: Python dense_radial_spectrum = radial_spectrum.keys_to_samples("center_type") dense_radial_spectrum = dense_radial_spectrum.keys_to_properties("neighbor_type") .. GENERATED FROM PYTHON SOURCE LINES 179-181 After calling these two functions, we now have a :py:class:`TensorMap` with a single block and no keys: .. GENERATED FROM PYTHON SOURCE LINES 182-187 .. code-block:: Python print(dense_radial_spectrum) block = dense_radial_spectrum.block() .. rst-class:: sphx-glr-script-out .. code-block:: none TensorMap with 1 blocks keys: _ 0 .. GENERATED FROM PYTHON SOURCE LINES 188-190 We can see that the resulting dense data array contains a lot of zeros (and has a well defined block-sparse structure): .. GENERATED FROM PYTHON SOURCE LINES 191-195 .. code-block:: Python with np.printoptions(precision=3): print(block.values) .. rst-class:: sphx-glr-script-out .. code-block:: none [[ 0.556 -0.282 0.016 0.051 0.131 0.01 0. 0. 0. ] [ 0.051 0.131 0.01 0.556 -0.282 0.016 0. 0. 0. ] [ 0. 0. 0. 0.633 -0.152 0.017 0. 0. 0. ] [ 0. 0. 0. 0.633 -0.152 0.017 0. 0. 0. ] [ 0. 0. 0. 0. 0. 0. 0.586 -0.157 0.036] [ 0. 0. 0. 0. 0. 0. 0.586 -0.157 0.036]] .. GENERATED FROM PYTHON SOURCE LINES 196-199 By using the metadata attached to the block, we can understand which part of the data is zero and why. For example, the lower-right corner of the array corresponds to the nitrogen atoms (the last two samples): .. GENERATED FROM PYTHON SOURCE LINES 200-203 .. code-block:: Python print(block.samples.print(max_entries=-1)) .. rst-class:: sphx-glr-script-out .. code-block:: none system atom center_type 0 0 6 0 1 8 0 2 8 0 3 8 0 4 7 0 5 7 .. GENERATED FROM PYTHON SOURCE LINES 204-206 And these two bottom rows are zero everywhere, except in the part representing the nitrogen neighbor density: .. GENERATED FROM PYTHON SOURCE LINES 207-209 .. code-block:: Python print(block.properties.print(max_entries=-1)) .. rst-class:: sphx-glr-script-out .. code-block:: none neighbor_type n 6 0 6 1 6 2 8 0 8 1 8 2 7 0 7 1 7 2 .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.035 seconds) .. _sphx_glr_download_examples_core_2-handling-sparsity.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 2-handling-sparsity.ipynb <2-handling-sparsity.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 2-handling-sparsity.py <2-handling-sparsity.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: 2-handling-sparsity.zip <2-handling-sparsity.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_