.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/core/1-first-steps.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_core_1-first-steps.py: .. _core-tutorial-first-steps: First steps with metatensor =========================== .. |CO2| replace:: CO\ :sub:`2` This tutorial explores how data is stored inside metatensor's ``TensorMap``, and how to access the associated metadata. This is a companion to the :ref:`core classes overview ` page of this documentation, presenting the same concepts with code examples. To this end, we will need some data in metatensor format, which for the sake of simplicity will be loaded from a file. The code used to generate this file can be found below: .. details:: Show the code used to generate the :download:`spherical-expansion.npz` file, or use the link to download it .. The data was generated with `rascaline`_, a package to compute atomistic representations for machine learning applications. .. _rascaline: https://luthaf.fr/rascaline/latest/index.html .. literalinclude:: spherical-expansion.py.example :language: python The :py:class:`TensorMap` stored in the file contains a machine learning representation (the spherical expansion) of all the atoms in a |CO2| molecule. You don't need to know anything the spherical expansion to follow this tutorial! .. py:currentmodule:: metatensor .. GENERATED FROM PYTHON SOURCE LINES 41-49 .. code-block:: Python import ase import ase.visualize.plot import matplotlib.pyplot as plt import metatensor .. GENERATED FROM PYTHON SOURCE LINES 50-51 For reference, we are working with a representation of this |CO2| molecule: .. GENERATED FROM PYTHON SOURCE LINES 52-64 .. code-block:: Python co2 = ase.Atoms( "CO2", positions=[(0, 0, 0), (-0.2, -0.65, 0.94), (0.2, 0.65, -0.94)], ) fig, ax = plt.subplots(figsize=(3, 3)) ase.visualize.plot.plot_atoms(co2, ax) ax.set_axis_off() plt.show() .. image-sg:: /examples/core/images/sphx_glr_1-first-steps_001.png :alt: 1 first steps :srcset: /examples/core/images/sphx_glr_1-first-steps_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 65-70 The main entry point: ``TensorMap`` ----------------------------------- We'll start by loading our data with :py:func:`metatensor.load`. The ``tensor`` returned by this function is a :py:class:`TensorMap`, the core class of metatensor. .. GENERATED FROM PYTHON SOURCE LINES 71-76 .. code-block:: Python tensor = metatensor.load("spherical-expansion.npz") print(type(tensor)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 78-80 Looking at the tensor tells us that it is composed of 12 blocks, each associated with a key: .. GENERATED FROM PYTHON SOURCE LINES 81-84 .. code-block:: Python print(tensor) .. rst-class:: sphx-glr-script-out .. code-block:: none TensorMap with 12 blocks keys: o3_lambda o3_sigma center_type neighbor_type 0 1 6 6 1 1 6 6 2 1 6 6 0 1 6 8 1 1 6 8 2 1 6 8 0 1 8 6 1 1 8 6 2 1 8 6 0 1 8 8 1 1 8 8 2 1 8 8 .. GENERATED FROM PYTHON SOURCE LINES 85-106 We can see that here, the keys of the :py:class:`TensorMap` have four named *dimensions*. Two of these are used to describe the behavior of the data under spatial transformations (rotations and inversions in the O3 group): - ``o3_lambda``, indicating the character of o3 irreducible representation this block is following. In general, a block with ``o3_lambda=3`` will transform under rotations like a ``l=3`` spherical harmonics. - ``o3_sigma``, which describe the behavior of the data under inversion symmetry. Here all blocks have ``o3_sigma=1``, meaning we only have data with the usual inversion symmetry (``o3_sigma=-1`` would be used for pseudo-tensors); And the other two are related to the composition of the system: - ``center_type`` represents the atomic type of the central atom in consideration. For |CO2|, we have both carbons (type 6) and oxygens (type 8); - ``neighbor_type`` represents the atomic type of the neighbor atoms considered by the machine learning representation, in this case it takes the values 6 and 8 as well. These keys can be accessed with :py:attr:`TensorMap.keys`, and they are an instance of the :py:class:`Labels` class: .. GENERATED FROM PYTHON SOURCE LINES 107-112 .. code-block:: Python keys = tensor.keys print(type(keys)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 113-124 ``Labels`` to store metadata ---------------------------- One of the :ref:`main goals of metatensor ` is to be able to store both data and metadata together. We've just encountered the first example of this metadata as the :py:class:`TensorMap` keys! In general, most metadata will be stored in the :py:class:`Labels` class. Let's explore this class a bit. As already mentioned, :py:class:`Labels` can have multiple dimensions, and each dimension has a name. We can look at all the dimension names simultaneously with :py:func:`Labels.names`: .. GENERATED FROM PYTHON SOURCE LINES 125-127 .. code-block:: Python print(keys.names) .. rst-class:: sphx-glr-script-out .. code-block:: none ['o3_lambda', 'o3_sigma', 'center_type', 'neighbor_type'] .. GENERATED FROM PYTHON SOURCE LINES 128-130 :py:class:`Labels` then contains multiple entries, each entry being described by a set of integer values, one for each dimension of the labels. .. GENERATED FROM PYTHON SOURCE LINES 131-134 .. code-block:: Python print(keys.values) .. rst-class:: sphx-glr-script-out .. code-block:: none [[0 1 6 6] [1 1 6 6] [2 1 6 6] [0 1 6 8] [1 1 6 8] [2 1 6 8] [0 1 8 6] [1 1 8 6] [2 1 8 6] [0 1 8 8] [1 1 8 8] [2 1 8 8]] .. GENERATED FROM PYTHON SOURCE LINES 135-137 We can access all the values taken by a given dimension/column in the labels with :py:func:`Labels.column` or by indexing with a string: .. GENERATED FROM PYTHON SOURCE LINES 138-141 .. code-block:: Python print(keys["o3_lambda"]) .. rst-class:: sphx-glr-script-out .. code-block:: none [0 1 2 0 1 2 0 1 2 0 1 2] .. GENERATED FROM PYTHON SOURCE LINES 143-146 .. code-block:: Python print(keys.column("center_type")) .. rst-class:: sphx-glr-script-out .. code-block:: none [6 6 6 6 6 6 8 8 8 8 8 8] .. GENERATED FROM PYTHON SOURCE LINES 147-149 We can also access individual entries in the labels by iterating over them or indexing with an integer: .. GENERATED FROM PYTHON SOURCE LINES 150-160 .. code-block:: Python print("Entries with o3_lambda=2:") for entry in keys: if entry["o3_lambda"] == 2: print(" ", entry) print("\nEntry at index 3:") print(" ", keys[3]) .. rst-class:: sphx-glr-script-out .. code-block:: none Entries with o3_lambda=2: LabelsEntry(o3_lambda=2, o3_sigma=1, center_type=6, neighbor_type=6) LabelsEntry(o3_lambda=2, o3_sigma=1, center_type=6, neighbor_type=8) LabelsEntry(o3_lambda=2, o3_sigma=1, center_type=8, neighbor_type=6) LabelsEntry(o3_lambda=2, o3_sigma=1, center_type=8, neighbor_type=8) Entry at index 3: LabelsEntry(o3_lambda=0, o3_sigma=1, center_type=6, neighbor_type=8) .. GENERATED FROM PYTHON SOURCE LINES 161-168 ``TensorBlock`` to store the data --------------------------------- Each entry in the :py:attr:`TensorMap.keys` is associated with a :py:class:`TensorBlock`, which contains the actual data and some additional metadata. We can extract the block from a key by indexing our :py:class:`TensorMap`, or with the :py:func:`TensorMap.block` .. GENERATED FROM PYTHON SOURCE LINES 169-177 .. code-block:: Python # this is equivalent to `block = tensor[tensor.keys[0]]` block = tensor[0] block = tensor.block(o3_lambda=1, center_type=8, neighbor_type=6) print(block) .. rst-class:: sphx-glr-script-out .. code-block:: none TensorBlock samples (2): ['system', 'atom'] components (3): ['o3_mu'] properties (5): ['n'] gradients: ['positions'] .. GENERATED FROM PYTHON SOURCE LINES 178-187 Each block contains some data, stored inside the :py:attr:`TensorBlock.values`. Here, the values contains the different coefficients of the spherical expansion, i.e. our atomistic machine learning representation. The problem with this array is that we do not know what the different numbers correspond to: different libraries might be using different convention and storage order, and one has to read documentation carefully if they want to use this kind of data. Metatensor helps by making this data self-describing; by attaching metadata to each element of the array indicating what exactly we are working with. .. GENERATED FROM PYTHON SOURCE LINES 188-191 .. code-block:: Python print(block.values) .. rst-class:: sphx-glr-script-out .. code-block:: none [[[ 2.41688320e-02 1.37159979e-01 4.01218353e-02 -1.59115730e-04 3.03056007e-04] [-3.49518493e-02 -1.98354431e-01 -5.80223464e-02 2.30105825e-04 -4.38265610e-04] [ 7.43656369e-03 4.22030705e-02 1.23451801e-02 -4.89586862e-05 9.32480021e-05]] [[-2.41688320e-02 -1.37159979e-01 -4.01218353e-02 1.59115730e-04 -3.03056007e-04] [ 3.49518493e-02 1.98354431e-01 5.80223464e-02 -2.30105825e-04 4.38265610e-04] [-7.43656369e-03 -4.22030705e-02 -1.23451801e-02 4.89586862e-05 -9.32480021e-05]]] .. GENERATED FROM PYTHON SOURCE LINES 192-195 The metadata is attached to the different array axes, and stored in :py:class:`Labels`. The array must have at least two axes but can have more if required. Here, we have three: .. GENERATED FROM PYTHON SOURCE LINES 196-199 .. code-block:: Python print(block.values.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (2, 3, 5) .. GENERATED FROM PYTHON SOURCE LINES 200-209 The **first** dimension of the ``values`` array is described by the :py:attr:`TensorBlock.samples` labels, and correspond to **what** is being described. This follows the usual convention in machine learning, using the different rows of the array to store separate samples/observations. Here, since we are working with a per-atom representation, the samples contain the index of the structure and atomic center in this structure. Since we are looking at a block for ``center_type=8``, we have two samples, one for each oxygen atom in our single |CO2| molecule. .. GENERATED FROM PYTHON SOURCE LINES 210-213 .. code-block:: Python print(block.samples) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( system atom 0 1 0 2 ) .. GENERATED FROM PYTHON SOURCE LINES 214-218 The **last** dimension of the ``values`` array is described by the :py:attr:`TensorBlock.properties` labels, and correspond to **how** we are describing our subject. Here, we are using a radial basis, indexed by an integer ``n``: .. GENERATED FROM PYTHON SOURCE LINES 219-222 .. code-block:: Python print(repr(block.properties)) .. rst-class:: sphx-glr-script-out .. code-block:: none Labels( n 0 1 2 3 4 ) .. GENERATED FROM PYTHON SOURCE LINES 223-228 Finally, each **intermediate** dimension of the ``values`` array is described by one set of :py:attr:`TensorBlock.components` labels. These dimensions correspond to one or more *vectorial components* in the data. Here the only component corresponds to the different :math:`m` number in spherical harmonics :math:`Y_l^m`, going from -1 to 1 since we are looking at the block for ``o3_lambda = 1``: .. GENERATED FROM PYTHON SOURCE LINES 229-232 .. code-block:: Python print(block.components) .. rst-class:: sphx-glr-script-out .. code-block:: none [Labels( o3_mu -1 0 1 )] .. GENERATED FROM PYTHON SOURCE LINES 233-240 All this metadata allow us to know exactly what each entry in the ``values`` corresponds to. For example, we can see that the value at position ``(1, 0, 3)`` corresponds to: - the center at index 2 inside the structure at index 0; - the ``m=-1`` part of the spherical harmonics; - the coefficients on the ``n=3`` radial basis function. .. GENERATED FROM PYTHON SOURCE LINES 241-248 .. code-block:: Python print("value =", block.values[1, 0, 3]) print("sample =", block.samples[1]) print("component =", block.components[0][0]) print("property =", block.properties[3]) .. rst-class:: sphx-glr-script-out .. code-block:: none value = 0.00015911573016680835 sample = LabelsEntry(system=0, atom=2) component = LabelsEntry(o3_mu=-1) property = LabelsEntry(n=3) .. GENERATED FROM PYTHON SOURCE LINES 249-258 Wrapping it up -------------- .. figure:: /../static/images/TensorMap.* :width: 400px :align: center Illustration of the structure of a :py:class:`TensorMap`, with multiple keys and blocks. .. GENERATED FROM PYTHON SOURCE LINES 261-283 To summarize this tutorial, we saw that a :py:class:`TensorMap` contains multiple :py:class:`TensorBlock`, each associated with a key. The key describes the block, and what kind of data will be found inside. The blocks contains the actual data, and multiple set of metadata, one for each axis of the data array. - The rows are described by ``samples`` labels, which describe **what** is being stored; - the (generalized) columns are described by ``properties``, which describe **how** the data is being represented; - Additional axes of the array correspond to vectorial ``components`` in the data. All the metadata is stored inside :py:class:`Labels`, where each entry is described by the integer values is takes along some named dimensions. For a more visual approach to this data organization, you can also read the :ref:`core classes overview `. We have learned how metatensor organizes its data, and what makes it a "self describing data format". In the :ref:`next tutorial `, we will explore what makes metatensor :py:class:`TensorMap` a "sparse data format". .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 1.372 seconds) .. _sphx_glr_download_examples_core_1-first-steps.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 1-first-steps.ipynb <1-first-steps.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 1-first-steps.py <1-first-steps.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: 1-first-steps.zip <1-first-steps.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_