Managing gradients#
Another big difference between metatensor and other more generic data storage formats is its ability to store and manipulate one or more gradients of the data together with the data itself. In this tutorial, we will see how and where gradients are stored, and what one can do with them.
Note
Metatensor supports multiple ways of managing gradients: explicit forward mode gradients, presented in this tutorial; and implicit backward mode gradients (only when the data is stored in torch.Tensor). The two can also be mixed, to compute backward mode gradients of explicit forward mode gradients when training a model with gradient data (forces and virial).
In general, the explicit forward gradients presented here are mainly relevant if you are working within the numpy ecosystem; the implicit backward gradients are more interesting if you are working in the PyTorch ecosystem.
The code used to generate spherical-expansion.npz is in the first tutorial, and the code for radial-spectrum.npz is shown in the second. Notice how in both cases the data was computed with gradients=["positions"], meaning the gradients with respect to atomic positions are included.
import metatensor
from metatensor import TensorBlock, TensorMap
Amazing gradients and where to find them#
In the first tutorial, we have seen how metatensor stores data and metadata together inside TensorBlock, and groups multiple blocks to form a full TensorMap. To refresh our memory, let’s load some data (the radial spectrum from the sparsity tutorial). It is a tensor map with two dimensions for the keys, and 5 blocks:
radial_spectrum = metatensor.load("radial-spectrum.npz")
print(radial_spectrum)
TensorMap with 5 blocks
keys: species_center species_neighbor
6 6
6 8
7 7
8 6
8 8
If we look at one of the blocks, we can see that it contains gradients with respect to "positions":
block = radial_spectrum.block(species_center=7, species_neighbor=7)
print(block)
TensorBlock
samples (2): ['structure', 'center']
components (): []
properties (3): ['n']
gradients: ['positions']
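The list of gradients defined on a block can also be queried programmatically, for example with the gradients_list method of TensorBlock:
# list the gradient parameters stored in this block
print(block.gradients_list())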
Gradients are stored inside a normal TensorBlock, with their own set of metadata in the samples, components and properties dimensions.
gradient = block.gradient("positions")
print(gradient)
Gradient TensorBlock ('positions')
samples (4): ['sample', 'structure', 'atom']
components (3): ['direction']
properties (3): ['n']
gradients: None
The samples are different from those of the values block (the block this gradient is attached to): there is a first "sample" dimension, followed by a pair of indices (structure, atom).
The "sample" dimension is always present in gradients, and indicates which of the samples in the values block we are taking the gradient of. Here, the first row of the gradients contains the gradient of the first sample in the values with respect to the position of atom 4; while the last row of the gradients contains the gradient of the second row of the values with respect to the position of atom 5.
print(gradient.samples)
Labels(
sample structure atom
0 0 4
0 0 5
1 0 4
1 0 5
)
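For example, the "sample" column can be used directly to pick, for every row of the gradients, the matching row of the values; a minimal sketch:
# index the values with the "sample" column of the gradient samples: this gives
# one row of values per row of gradients, here an array of shape (4, 3)
matching_values = block.values[gradient.samples["sample"]]
print(matching_values.shape)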
Re-using the notation from the previous tutorial, the values contain \(\rho_i\), for a given atomic center \(i\).
print(block.samples)
Labels(
structure center
0 4
0 5
)
If we look at the samples for the values, we can express the four samples in this gradient block as
\(\nabla_4 \rho_4\): gradient of the representation of atom 4 with respect to the position of atom 4;
\(\nabla_5 \rho_4\): gradient of the representation of atom 4 with respect to the position of atom 5;
\(\nabla_4 \rho_5\): gradient of the representation of atom 5 with respect to the position of atom 4;
\(\nabla_5 \rho_5\): gradient of the representation of atom 5 with respect to the position of atom 5.
You’ll notice that some combinations of atoms are missing here: there is no gradient of \(\rho_4\) with respect to the positions of atoms 0, 1, 2, etc. This is another instance of the data sparsity that metatensor enables: only the non-zero gradients are actually stored in memory.
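We can check this directly: only atoms 4 and 5 ever appear in the "atom" column of the gradient samples, and gradients with respect to any other atom are implicitly zero and not stored.
# the only atoms appearing in the gradient samples are the two nitrogen atoms
print(gradient.samples["atom"])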
The gradient block can also differ from the values block in its components: here the values have no components, but the gradients have one, representing the x/y/z cartesian direction of the gradient with respect to positions.
print(gradient.components)
[Labels(
direction
0
1
2
)]
Finally, the gradient properties are guaranteed to be the same as the values properties.
print(block.properties == gradient.properties)
True
The gradient block also contains the data for the gradient, in the values attribute. Here the gradients are zero everywhere except in the x direction, because in the original input the N2 molecule was oriented along the x axis.
print(gradient.values)
[[[ 0.17209336 0.08442732 -0.09967984]
[ 0. 0. 0. ]
[ 0. 0. 0. ]]
[[-0.17209336 -0.08442732 0.09967984]
[ 0. 0. 0. ]
[ 0. 0. 0. ]]
[[ 0.17209336 0.08442732 -0.09967984]
[ 0. 0. 0. ]
[ 0. 0. 0. ]]
[[-0.17209336 -0.08442732 0.09967984]
[ 0. 0. 0. ]
[ 0. 0. 0. ]]]
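As a quick sanity check, the shape of this array matches the metadata above: one entry per gradient sample, per direction component and per property.
# (number of gradient samples, direction components, properties) = (4, 3, 3) here
print(gradient.values.shape)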
What if the values have components?#
We have seen that the gradient samples are related to the values samples through the sample dimension; and that the gradients are allowed to have custom components. You might be wondering what happens if the values already have some components!
Let’s load such an example, the spherical expansion from the first steps tutorial:
spherical_expansion = metatensor.load("spherical-expansion.npz")
print(spherical_expansion)
TensorMap with 12 blocks
keys: spherical_harmonics_l species_center species_neighbor
0 6 6
1 6 6
2 6 6
0 6 8
1 6 8
2 6 8
0 8 6
1 8 6
2 8 6
0 8 8
1 8 8
2 8 8
In the TensorMap above, the value blocks already have a set of components corresponding to the \(m\) index of spherical harmonics:
block = spherical_expansion.block(2)
print(block.components)
[Labels(
spherical_harmonics_m
-2
-1
0
1
2
)]
If we look at the gradients with respect to positions, we see that they contain two sets of components: the same direction component as in the radial spectrum example earlier; and the same spherical_harmonics_m as the values.
gradient = block.gradient("positions")
print(gradient)
Gradient TensorBlock ('positions')
samples (1): ['sample', 'structure', 'atom']
components (3, 5): ['direction', 'spherical_harmonics_m']
properties (5): ['n']
gradients: None
print("first set of components:", gradient.components[0])
print("second set of components:", gradient.components[1])
first set of components: Labels(
direction
0
1
2
)
second set of components: Labels(
spherical_harmonics_m
-2
-1
...
1
2
)
In general, the gradient blocks are allowed to have additional components when compared to the values, but these extra components must come first, and are followed by the same set of components as the values.
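The shapes of the arrays follow the same rule; a small sketch comparing the values and gradients of the block loaded above:
# values are (samples, spherical_harmonics_m, properties), while the gradients are
# (samples, direction, spherical_harmonics_m, properties): the extra "direction"
# component comes first
print("values shape:  ", block.values.shape)
print("gradient shape:", gradient.values.shape)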
Using gradients in calculations#
Now that we know about gradient storage in metatensor, we should try to compute a new set of values and their corresponding gradients.
Let’s compute the square of the radial spectrum, \(h(r) = \rho^2(r)\), and the corresponding gradients with respect to atomic positions. The chain rule tells us that the gradient should be
\[\nabla h(r) = 2\, \rho(r) \, \nabla \rho(r)\]
Since the calculation can happen block by block, let’s define a function to compute a new \(h(r)\) block:
def compute_square(block: TensorBlock) -> TensorBlock:
    # compute the new values
    new_values = block.values**2

    # store the new values in a block
    new_block = TensorBlock(
        values=new_values,
        samples=block.samples,
        components=block.components,
        properties=block.properties,
    )

    # compute the new gradient
    gradient = block.gradient("positions")

    # `block.values[gradient.samples["sample"]]` gives us an array with a shape
    # compatible with `gradient.values`, using the right row in the values to
    # compute a given row of the gradients. ``None`` creates an additional
    # dimension to match the components of the gradients.
    broadcasted_values = block.values[gradient.samples["sample"], None, :]
    new_gradient_values = 2.0 * broadcasted_values * gradient.values

    new_gradient = TensorBlock(
        values=new_gradient_values,
        samples=gradient.samples,
        components=gradient.components,
        properties=gradient.properties,
    )

    # store the gradient in the new block
    new_block.add_gradient("positions", new_gradient)

    return new_block
One issue when applying the equation above blindly is that block.values (i.e. \(\rho(r)\)) and gradient.values (i.e. \(\nabla \rho(r)\)) have different shapes. Fortunately, we already know how to match them: gradient.samples["sample"] contains the indices of block.values matching each row of gradient.values.
gradient = radial_spectrum.block(2).gradient("positions")
print(gradient.samples["sample"])
[0 0 1 1]
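As a small sketch of the broadcasting trick used in compute_square, indexing the values with this column (and adding a None axis for the components) produces an array that broadcasts against the gradient values:
# shape (4, 1, 3), which broadcasts against the (4, 3, 3) gradient values
values = radial_spectrum.block(2).values
print(values[gradient.samples["sample"], None, :].shape)
print(gradient.values.shape)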
We can now apply this function to all the blocks, and reconstruct a new TensorMap:
blocks = [compute_square(block) for block in radial_spectrum.blocks()]
squared = TensorMap(keys=radial_spectrum.keys, blocks=blocks)
squared has the same shape and sparsity pattern as radial_spectrum, but contains different values:
print(squared)
TensorMap with 5 blocks
keys: species_center species_neighbor
6 6
6 8
7 7
8 6
8 8
rs_block = radial_spectrum.block(2)
squared_block = squared.block(2)
print("radial_spectrum block:", rs_block)
print("square block:", squared_block)
radial_spectrum block: TensorBlock
samples (2): ['structure', 'center']
components (): []
properties (3): ['n']
gradients: ['positions']
square block: TensorBlock
samples (2): ['structure', 'center']
components (): []
properties (3): ['n']
gradients: ['positions']
print("radial_spectrum values:", rs_block.values)
print("square values:", squared_block.values)
radial_spectrum values: [[ 0.58644107 -0.15682146 0.03612176]
[ 0.58644107 -0.15682146 0.03612176]]
square values: [[0.34391313 0.02459297 0.00130478]
[0.34391313 0.02459297 0.00130478]]
print("radial_spectrum gradient:", rs_block.gradient("positions").values[:, 0])
print("square gradient:", squared_block.gradient("positions").values[:, 0])
radial_spectrum gradient: [[ 0.17209336 0.08442732 -0.09967984]
[-0.17209336 -0.08442732 0.09967984]
[ 0.17209336 0.08442732 -0.09967984]
[-0.17209336 -0.08442732 0.09967984]]
square gradient: [[ 0.20184523 -0.02648003 -0.00720122]
[-0.20184523 0.02648003 0.00720122]
[ 0.20184523 -0.02648003 -0.00720122]
[-0.20184523 0.02648003 0.00720122]]
Tip
We provide many functions that operate on TensorMap and TensorBlock as part of the metatensor-operations module (installed by default with the main metatensor package). These operations already support the different sparsity levels of metatensor, as well as explicit forward gradients. In general, you will not have to write the kind of code shown in this tutorial yourself, and should use the corresponding operation instead.
For example, squared from this tutorial can be calculated with:
squared = metatensor.multiply(radial_spectrum, radial_spectrum)
# alternatively
squared = metatensor.pow(radial_spectrum, 2)
squared_operations = metatensor.multiply(radial_spectrum, radial_spectrum)
print(metatensor.equal(squared_operations, squared))
True
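Other operations propagate the explicit forward gradients in the same way; for example, a small sketch using metatensor.add:
# the gradients of a sum are the sums of the gradients, so `doubled` carries
# "positions" gradients that are twice those of `radial_spectrum`
doubled = metatensor.add(radial_spectrum, radial_spectrum)
print(doubled.block(2).gradient("positions").values[:, 0])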
Total running time of the script: (0 minutes 0.022 seconds)