Operations and PyTorch
PyTorch is a very popular framework for machine learning, providing multiple tools to make writing and training models easier. There are two ways to use the operations with PyTorch:

1. Using the pure Python version of metatensor, one can store values in a metatensor.TensorBlock using torch.Tensor. In this case, all operations are compatible with torch autograd (automatic gradient tracking and differentiation). This allows you to train models from Python, but not to export the models to run without the Python interpreter. When running a model with the pure Python version of metatensor, you should use the operations from metatensor.<operation_name>.

2. Using the TorchScript version of metatensor, one can also compile the Python code to TorchScript and then run the model without a Python interpreter. This is particularly useful to export and then use an already trained model, for example to run molecular simulations. If you want to do this, you should use the classes and operations from metatensor.torch, i.e. metatensor.torch.TensorMap and the operations from metatensor.torch.<operation_name>. All the operations are available in the metatensor.torch module.
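As a minimal sketch of the TorchScript path, the snippet below compiles a plain function of torch.Tensor values with torch.jit.script. It does not use metatensor itself: scale_and_sum is a hypothetical stand-in for a model built from the operations, used only to show that the compiled artifact can be saved and later executed without the Python interpreter.

```python
import torch


def scale_and_sum(x: torch.Tensor, factor: float) -> torch.Tensor:
    # Toy transformation standing in for a model built from
    # metatensor.torch operations
    return (x * factor).sum()


# Compile to TorchScript; the resulting ScriptFunction can be saved
# with torch.jit.save(...) and executed later from C++ or from a
# molecular simulation engine, without a Python interpreter.
scripted = torch.jit.script(scale_and_sum)

x = torch.tensor([1.0, 2.0, 3.0])
# The compiled version computes the same result as the Python one
assert torch.equal(scripted(x, 2.0), scale_and_sum(x, 2.0))
```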
Handling of gradients in the operations
There are two ways in which the gradients of some values can be computed with metatensor operations. Let's consider for example an operation that takes one TensorMap \(X\) and returns some transformation of that TensorMap, \(y = f(X)\).

- If you are using torch.Tensor as arrays, either with the pure Python (metatensor) or TorchScript (metatensor.torch) backend, then all transformations will be recorded in the computational graph of the output data. This means that if y_block.values.requires_grad is True, y_block.values.grad_fn will be set for all blocks in \(y\), and calling y_block.values.backward() will propagate the gradient through the transformations applied by the operation.

- If your input TensorMap \(X\) contains explicit gradients, stored in x_block.gradient(<parameter>) for all blocks, the operation will forward-propagate these gradients (or raise an error if it cannot do so), and the output blocks will contain the same set of explicit gradients, now holding the gradients of \(y\) with respect to the same parameters.
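Both mechanisms can be sketched with plain torch.Tensor values. This is a toy illustration, not metatensor's actual API: the transformation \(y = f(X) = \sum_i x_i^2\), the array values, and the stored gradient dx are all assumptions made for the example.

```python
import torch

# 1) Implicit gradients via torch autograd: the computational graph
#    records every transformation applied to the values.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()            # stand-in for an operation y = f(X)
assert y.grad_fn is not None  # the graph was recorded
y.backward()                  # d(sum(x^2))/dx = 2x
assert torch.equal(x.grad, torch.tensor([2.0, 4.0, 6.0]))

# 2) Explicit gradients: dx plays the role of x_block.gradient(...),
#    holding dX/dp for some external parameter p. The operation
#    forward-propagates it with the chain rule: dy/dp = f'(x) . dx
x_plain = torch.tensor([1.0, 2.0, 3.0])
dx = torch.tensor([0.1, 0.1, 0.1])   # hypothetical stored gradient
dy = (2 * x_plain * dx).sum()        # chain rule for y = sum(x^2)
```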
These two methods can be used together: you can store explicit gradients in x_block.gradient(<parameter>) using torch.Tensor, forward-propagate these gradients to some final quantity, compute a loss taking into account the gradients of this quantity (e.g. \(\ell = |y - y^\text{ref}|^2 + |\nabla y - \nabla y^\text{ref}|^2\)), and finally call backward on \(\ell\). This allows training a model on the gradients of a quantity while replacing a double backward propagation with a single forward and a single backward propagation of gradients.
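A minimal sketch of this combined scheme, again with plain torch tensors rather than metatensor's API. The linear model \(y = w x\), its forward-propagated gradient \(\nabla y = w \, \nabla x\), and all the reference values are assumptions chosen for the example:

```python
import torch

# Trainable model parameter; x are input values and dx their explicit
# gradients with respect to some external parameter (e.g. positions).
w = torch.tensor(1.5, requires_grad=True)
x = torch.tensor([1.0, 2.0])
dx = torch.tensor([0.5, -0.5])

# Forward pass: y = w * x, so the explicit gradients are
# forward-propagated as dy = w * dx (chain rule for a linear model)
y = w * x
dy = w * dx

y_ref = torch.tensor([2.0, 4.0])    # reference values
dy_ref = torch.tensor([1.0, -1.0])  # reference gradients

# The loss includes both values and gradients; a single backward pass
# then gives d(loss)/dw, with no double backward propagation needed.
loss = ((y - y_ref) ** 2).sum() + ((dy - dy_ref) ** 2).sum()
loss.backward()
assert w.grad is not None
```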