LLPR¶
The LLPR architecture is a “wrapper” architecture that enables cheap uncertainty
quantification (UQ) via the last-layer prediction rigidity (LLPR) approach proposed
by Bigi et al. [1]. It is compatible with the following
metatrain models constructed from NN-based architectures: PET and SOAP-BPNN.
Implementing the LLPR as a separate architecture within metatrain
allows users to compute uncertainties without dealing with the fine details
of the LLPR implementation.
This implementation also allows the user to perform gradient-based tuning of the ensemble weights sampled from the LLPR formalism, which can lead to improved uncertainty estimates. Gradients (e.g. forces and stresses) are not yet used in this implementation of the LLPR.
Note that the uncertainties computed with this implementation are returned as standard deviations, not variances.
Installation¶
To install this architecture along with the metatrain package, run:
pip install metatrain[llpr]
where the square brackets indicate that you want to install the optional
dependencies required for llpr.
Default Hyperparameters¶
The description of all the hyperparameters used in llpr is provided
further down this page. However, here we provide you with a yaml file containing all
the default hyperparameters, which might be convenient as a starting point to
create your own hyperparameter files:
architecture:
  name: llpr
  model:
    num_ensemble_members: {}
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 8
    regularizer: null
    model_checkpoint: null
    loss: ensemble_nll
    num_epochs: null
    train_all_parameters: false
    warmup_fraction: 0.01
    learning_rate: 0.0003
    weight_decay: null
    log_interval: 1
    checkpoint_interval: 100
    per_structure_targets: []
    num_workers: null
    log_mae: false
    log_separate_blocks: false
    best_model_metric: loss
    grad_clip_norm: 1.0
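For example, a minimal options file that wraps an existing checkpoint might look like the following sketch (the checkpoint path is a placeholder, not a file shipped with metatrain):

```yaml
architecture:
  name: llpr
  training:
    model_checkpoint: model.ckpt  # placeholder: checkpoint of a PET or SOAP-BPNN model
    num_epochs: null              # null: covariance computation and calibration only,
                                  # no ensemble weight training
```

All hyperparameters not listed explicitly fall back to the defaults shown above.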
Model hyperparameters¶
The parameters that go under the architecture.model section of the config file
are the following:
Trainer hyperparameters¶
The parameters that go under the architecture.training section of the config file
are the following:
- TrainerHypers.batch_size: int = 8¶
This defines the batch size used in the computation of last-layer features, covariance matrix, etc.
- TrainerHypers.regularizer: float | None = None¶
This is the regularizer value \(\varsigma\) used when applying Eq. 24 of Bigi et al. [1]:
\[\sigma^2_\star = \alpha^2 \boldsymbol{\mathrm{f}}^{\mathrm{T}}_\star (\boldsymbol{\mathrm{F}}^{\mathrm{T}} \boldsymbol{\mathrm{F}} + \varsigma^2 \boldsymbol{\mathrm{I}})^{-1} \boldsymbol{\mathrm{f}}_\star\]If set to
null, the internal routine determines the smallest regularizer value that guarantees numerical stability of the matrix inversion. Note that the training routine of the LLPR wrapper model also finds the global calibration factor \(\alpha\).
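The formula above can be sketched with plain NumPy. This is an illustration of Eq. 24 only, not metatrain's internal implementation; the array names and sizes are arbitrary:

```python
import numpy as np

# Illustrative sketch of Eq. 24: sigma^2 = alpha^2 * f^T (F^T F + varsigma^2 I)^{-1} f
rng = np.random.default_rng(0)
n_train, d = 100, 8
F = rng.normal(size=(n_train, d))   # last-layer features of the training set
f = rng.normal(size=d)              # last-layer features of a test sample
alpha, varsigma = 1.0, 1e-2         # calibration factor and regularizer

cov = F.T @ F + varsigma**2 * np.eye(d)          # regularized covariance matrix
sigma2 = alpha**2 * f @ np.linalg.solve(cov, f)  # predicted variance for the test sample
sigma = np.sqrt(sigma2)                          # metatrain reports standard deviations
```

Because the regularized covariance is positive definite, the predicted variance is always positive; taking its square root gives the standard deviation that metatrain reports.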
- TrainerHypers.model_checkpoint: str | None = None¶
This is the path to the checkpoint of the model for which the user wants to perform UQ with the LLPR approach. Note that the model architecture must expose its last-layer features under the convention defined by metatrain.
- TrainerHypers.loss: str | dict[str, LossSpecification] = 'ensemble_nll'¶
This section describes the loss function to be used during LLPR ensemble weight calibration. We strongly suggest using only the “ensemble_nll” loss. See Loss functions for details on the remaining loss hyperparameters.
- TrainerHypers.num_epochs: int | None = None¶
Number of epochs for the LLPR ensemble weight calibration. If set to
null, only the LLPR covariance matrix computation and calibration are performed, without ensemble weight training.
- TrainerHypers.train_all_parameters: bool = False¶
Whether to train all parameters of the LLPR-wrapped model, or only the ensemble weights. If
true, all parameters will be trained, including those of the base model. If false, only the last-layer ensemble weights will be trained. Note that training all parameters (i.e., setting this flag to true) will potentially change the uncertainty estimates given by the LLPR through the uncertainty outputs (because the last-layer features will change). In that case, only uncertainties calculated as standard deviations over the ensemble members (ensemble outputs) will be meaningful.
- TrainerHypers.warmup_fraction: float = 0.01¶
Fraction of training steps used for learning rate warmup.
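To illustrate the semantics of this fraction, a linear warmup schedule can be sketched as follows. This is an assumption for illustration only; metatrain's actual scheduler may differ in shape and decay behavior:

```python
def lr_at_step(step, total_steps, base_lr=3e-4, warmup_fraction=0.01):
    """Illustrative linear warmup: ramp the learning rate up over the first
    warmup_fraction of training steps, then hold it at base_lr."""
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

With the defaults above and 1000 total steps, the first 10 steps ramp the learning rate from a small value up to 0.0003.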
- TrainerHypers.num_workers: int | None = None¶
Number of workers for data loading. If not provided, it is set automatically.