LLPR¶
The LLPR architecture is a “wrapper” architecture that enables cheap uncertainty
quantification via the last-layer prediction rigidity (LLPR) approach proposed by Bigi
et al [1]. It is compatible with the following metatrain models constructed from
NN-based architectures: PET and SOAP-BPNN.
This implementation further allows the user to perform gradient-based tuning of the ensemble weights sampled from the LLPR formalism, which can lead to improved uncertainty estimates. Target gradients (e.g., forces and stresses) are not yet used in this tuning.
Note that the uncertainties computed with this implementation are returned as standard deviations, and not variances.
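To make the sampling of ensemble weights a bit more explicit (this is a sketch of the formalism, not a description of implementation details), ensemble members can be drawn from a Gaussian centered on the trained last-layer weights, whose covariance is the calibrated inverse covariance of the last-layer features, i.e. the same matrix that appears in the LLPR formula quoted further down this page:
\[\boldsymbol{\mathrm{w}}^{(k)} \sim \mathcal{N}\left(\boldsymbol{\mathrm{w}}, \, \alpha^2 (\boldsymbol{\mathrm{F}}^{\mathrm{T}} \boldsymbol{\mathrm{F}} + \varsigma^2 \boldsymbol{\mathrm{I}})^{-1}\right)\]so that the spread of the corresponding ensemble predictions reproduces the LLPR variance \(\sigma^2_\star\).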
Additional outputs¶
In addition to the outputs already available from the wrapped model, the LLPR architecture can also output the following additional quantities:
mtt::aux::{target}_uncertainty: The uncertainty (standard deviation) for a given target, computed with the LLPR approach.
mtt::aux::{target}_ensemble: The ensemble predictions for a given target, computed with the LLPR approach.
For example, for a target named energy, these outputs are exposed as mtt::aux::energy_uncertainty and mtt::aux::energy_ensemble.
Installation¶
To install this architecture along with the metatrain package, run:
pip install metatrain[llpr]
where the square brackets indicate that you want to install the optional
dependencies required for llpr.
Default Hyperparameters¶
The description of all the hyperparameters used in llpr is provided
further down this page. However, here we provide you with a yaml file containing all
the default hyperparameters, which might be convenient as a starting point to
create your own hyperparameter files:
architecture:
  name: llpr
  model:
    num_ensemble_members: {}
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 8
    regularizer: null
    model_checkpoint: null
    loss: gaussian_nll_ensemble
    num_epochs: null
    train_all_parameters: false
    warmup_fraction: 0.01
    learning_rate: 0.0003
    weight_decay: null
    log_interval: 1
    checkpoint_interval: 100
    per_structure_targets: []
    num_workers: null
    log_mae: false
    log_separate_blocks: false
    best_model_metric: loss
    grad_clip_norm: 1.0
    batch_atom_bounds:
    - null
    - null
    calibration_method: absolute_residuals
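As a concrete starting point, a minimal options file that wraps an existing checkpoint and only performs the covariance computation and calibration (no ensemble weight training, since num_epochs stays null) could look as follows. The checkpoint path is a placeholder, and the dataset sections are only sketched:

architecture:
  name: llpr
  training:
    model_checkpoint: model.ckpt  # placeholder: checkpoint of the model to wrap
    batch_size: 8
    calibration_method: absolute_residuals

# The usual training_set / validation_set / test_set sections required by any
# metatrain run must still be provided.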
Model hyperparameters¶
The parameters that go under the architecture.model section of the config file
are the following:
Trainer hyperparameters¶
The parameters that go under the architecture.training section of the config file
are the following:
- TrainerHypers.batch_size: int = 8¶
This defines the batch size used in the computation of last-layer features, covariance matrix, etc.
- TrainerHypers.regularizer: float | None = None¶
This is the regularizer value \(\varsigma\) that is used in applying Eq. 24 of Bigi et al [1]:
\[\sigma^2_\star = \alpha^2 \boldsymbol{\mathrm{f}}^{\mathrm{T}}_\star (\boldsymbol{\mathrm{F}}^{\mathrm{T}} \boldsymbol{\mathrm{F}} + \varsigma^2 \boldsymbol{\mathrm{I}})^{-1} \boldsymbol{\mathrm{f}}_\star\]If set to null, the internal routine will determine the smallest regularizer value that guarantees numerical stability in the matrix inversion. Having exposed the formula here, we also note that the training routine of the LLPR wrapper model finds the ideal global calibration factor \(\alpha\).
- TrainerHypers.model_checkpoint: str | None = None¶
This should provide the path to the checkpoint of the model for which the user wants to perform UQ based on the LLPR approach. Note that the model architecture must comply with the requirement that the last-layer features are exposed under the convention defined by metatrain.
- TrainerHypers.loss: str | dict[str, LossSpecification] = 'gaussian_nll_ensemble'¶
This section describes the loss function to be used during LLPR ensemble weight training. We strongly suggest only using ensemble-specific loss functions, i.e. one of “gaussian_nll_ensemble”, “gaussian_crps_ensemble”, or “empirical_crps_ensemble”. Please refer to the Loss functions documentation for more details on the remaining hyperparameters.
- TrainerHypers.num_epochs: int | None = None¶
Number of epochs for which the LLPR ensemble weight training should take place. If set to null, only the LLPR covariance matrix computation and calibration will be performed, without ensemble weight training.
- TrainerHypers.train_all_parameters: bool = False¶
Whether to train all parameters of the LLPR-wrapped model, or only the ensemble weights. If true, all parameters will be trained, including those of the base model. If false, only the last-layer ensemble weights will be trained. Note that training all parameters (i.e., setting this flag to true) will potentially change the uncertainty estimates given by the LLPR through the uncertainty outputs (because the last-layer features will change). In that case, only uncertainties calculated as standard deviations over the ensemble members (ensemble outputs) will be meaningful.
- TrainerHypers.warmup_fraction: float = 0.01¶
Fraction of training steps used for learning rate warmup.
- TrainerHypers.num_workers: int | None = None¶
Number of workers for data loading. If not provided, it is set automatically.
- TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'loss'¶
Metric used to select the best checkpoint (e.g., rmse_prod).
- TrainerHypers.grad_clip_norm: float = 1.0¶
Maximum gradient norm used for gradient clipping (a value of inf disables clipping).
- TrainerHypers.batch_atom_bounds: list[int | None] = [None, None]¶
Bounds for the number of atoms per batch as [min, max]. Batches with atom counts outside these bounds will be skipped during training. Use None for either value to disable that bound. This is useful for preventing out-of-memory errors and ensuring consistent computational load. Default: [None, None].
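As an illustration (the numeric bound below is arbitrary and only meant as an example), skipping unusually large batches could be configured as:

architecture:
  name: llpr
  training:
    batch_atom_bounds:
    - null   # no lower bound
    - 1000   # skip batches containing more than 1000 atoms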
- TrainerHypers.calibration_method: Literal['absolute_residuals', 'squared_residuals', 'crps'] = 'absolute_residuals'¶
This determines how to calculate the calibration factor \(\alpha\) in Eq. 24 of Bigi et al [1]:
\[\sigma^2_\star = \alpha^2 \boldsymbol{\mathrm{f}}^{\mathrm{T}}_\star (\boldsymbol{\mathrm{F}}^{\mathrm{T}} \boldsymbol{\mathrm{F}} + \varsigma^2 \boldsymbol{\mathrm{I}})^{-1} \boldsymbol{\mathrm{f}}_\star\]In any case, a Gaussian error distribution is assumed. If set to squared_residuals, the calibration factor is computed by minimizing the negative log-likelihood. If set to absolute_residuals, the calibration factor is computed from the mean absolute error assuming Gaussian errors. The latter choice is more robust to outliers and we recommend it for large and/or uncurated datasets. If set to crps, the continuous ranked probability score (CRPS) is minimized to find the calibration factor. You might want to use this option if you then want to train the ensemble weights using a CRPS loss.
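To make the three options more concrete, plausible forms of the corresponding estimators under the Gaussian assumption are sketched below (the actual implementation may differ in details such as which dataset is used and how per-atom quantities are handled). Writing \(r_i = y_i - \hat{y}_i\) for the residuals and \(\tilde{\sigma}_i\) for the uncalibrated LLPR standard deviations (Eq. 24 with \(\alpha = 1\)):
\[\text{squared\_residuals:} \quad \alpha^2 = \frac{1}{N} \sum_i \frac{r_i^2}{\tilde{\sigma}_i^2},\]which minimizes the Gaussian negative log-likelihood with respect to a global scale;
\[\text{absolute\_residuals:} \quad \alpha = \sqrt{\frac{\pi}{2}} \, \frac{1}{N} \sum_i \frac{|r_i|}{\tilde{\sigma}_i},\]obtained by matching the mean absolute error of a Gaussian, for which \(\mathbb{E}|r| = \sigma \sqrt{2/\pi}\). For crps, \(\alpha\) is instead obtained by numerically minimizing the average CRPS of the predictive distributions \(\mathcal{N}(\hat{y}_i, \alpha^2 \tilde{\sigma}_i^2)\).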