LLPR

The LLPR architecture is a “wrapper” architecture that enables cheap uncertainty quantification via the last-layer prediction rigidity (LLPR) approach proposed by Bigi et al. [1]. It is compatible with the following NN-based metatrain architectures: PET and SOAP-BPNN.

This implementation further allows the user to perform gradient-based tuning of the ensemble weights sampled from the LLPR formalism, which can lead to improved uncertainty estimates. Gradients of the targets (e.g. forces and stresses) are not yet used.

Note that the uncertainties computed with this implementation are returned as standard deviations, and not variances.

Additional outputs

In addition to the outputs already available from the wrapped model, the LLPR architecture can output the following additional quantity:

Installation

To install this architecture along with the metatrain package, run:

pip install metatrain[llpr]

where the square brackets indicate that you want to install the optional dependencies required for llpr.

Default Hyperparameters

All the hyperparameters used in llpr are described further down this page. However, here we provide a YAML file containing all the default hyperparameters, which might be convenient as a starting point for creating your own hyperparameter files:

architecture:
  name: llpr
  model:
    num_ensemble_members: {}
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 8
    regularizer: null
    model_checkpoint: null
    loss: gaussian_nll_ensemble
    num_epochs: null
    train_all_parameters: false
    warmup_fraction: 0.01
    learning_rate: 0.0003
    weight_decay: null
    log_interval: 1
    checkpoint_interval: 100
    per_structure_targets: []
    num_workers: null
    log_mae: false
    log_separate_blocks: false
    best_model_metric: loss
    grad_clip_norm: 1.0
    batch_atom_bounds:
    - null
    - null
    calibration_method: absolute_residuals

Model hyperparameters

The parameters that go under the architecture.model section of the config file are the following:

ModelHypers.num_ensemble_members: dict[str, int] = {}

Number of ensemble members for each target property for which LLPR ensembles should be generated. No ensembles will be generated for targets which are not listed.
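
For example, the following model section requests a 128-member LLPR ensemble for an energy target (the target name and ensemble size are only illustrative; use the targets defined in your options file):

architecture:
  name: llpr
  model:
    num_ensemble_members:
      energy: 128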

Trainer hyperparameters

The parameters that go under the architecture.training section of the config file are the following:

TrainerHypers.distributed: bool = False

Whether to use distributed training

TrainerHypers.distributed_port: int = 39591

Port for distributed communication among processes

TrainerHypers.batch_size: int = 8

This defines the batch size used in the computation of last-layer features, covariance matrix, etc.

TrainerHypers.regularizer: float | None = None

This is the regularizer value \(\varsigma\) that is used when applying Eq. 24 of Bigi et al. [1]:

\[\sigma^2_\star = \alpha^2 \boldsymbol{\mathrm{f}}^{\mathrm{T}}_\star (\boldsymbol{\mathrm{F}}^{\mathrm{T}} \boldsymbol{\mathrm{F}} + \varsigma^2 \boldsymbol{\mathrm{I}})^{-1} \boldsymbol{\mathrm{f}}_\star\]

If set to null, the internal routine will determine the smallest regularizer value that guarantees numerical stability of the matrix inversion. Note also that the training routine of the LLPR wrapper model determines the optimal global calibration factor \(\alpha\).
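
For example, the regularizer can be set explicitly instead of relying on the automatic selection (the value below is purely illustrative):

architecture:
  name: llpr
  training:
    regularizer: 1.0e-6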

TrainerHypers.model_checkpoint: str | None = None

This should be the path to the checkpoint of the model for which the user wants to perform UQ with the LLPR approach. Note that the model architecture must expose its last-layer features following the convention defined by metatrain.
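
For example, to wrap a previously trained model checkpoint (the path is a placeholder):

architecture:
  name: llpr
  training:
    model_checkpoint: path/to/model.ckpt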

TrainerHypers.loss: str | dict[str, LossSpecification] = 'gaussian_nll_ensemble'

This section describes the loss function to be used during LLPR ensemble weight training. We strongly suggest using only ensemble-specific loss functions, i.e. one of “gaussian_nll_ensemble”, “gaussian_crps_ensemble”, or “empirical_crps_ensemble”. Please refer to the Loss functions documentation for more details on the remaining hyperparameters.

TrainerHypers.num_epochs: int | None = None

Number of epochs for which the LLPR ensemble weight training should take place. If set to null, only the LLPR covariance matrix computation and calibration will be performed, without ensemble weight training.
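
As a sketch, the following training section enables 100 epochs of ensemble weight training with a CRPS-based ensemble loss (the values are illustrative and should be combined with the model_checkpoint entry shown above):

architecture:
  name: llpr
  training:
    num_epochs: 100
    loss: gaussian_crps_ensemble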

TrainerHypers.train_all_parameters: bool = False

Whether to train all parameters of the LLPR-wrapped model, or only the ensemble weights. If true, all parameters will be trained, including those of the base model. If false, only the last-layer ensemble weights will be trained. Note that training all parameters (i.e., setting this flag to true) will potentially change the uncertainty estimates given by the LLPR through the uncertainty outputs (because the last-layer features will change). In that case, only uncertainties calculated as standard deviations over the ensemble members (ensemble outputs) will be meaningful.

TrainerHypers.warmup_fraction: float = 0.01

Fraction of training steps used for learning rate warmup.

TrainerHypers.learning_rate: float = 0.0003

Learning rate.

TrainerHypers.weight_decay: float | None = None

Weight decay for the optimizer.

TrainerHypers.log_interval: int = 1

Interval to log metrics.

TrainerHypers.checkpoint_interval: int = 100

Interval to save checkpoints.

TrainerHypers.per_structure_targets: list[str] = []

Targets to calculate per-structure losses.

TrainerHypers.num_workers: int | None = None

Number of workers for data loading. If not provided, it is set automatically.

TrainerHypers.log_mae: bool = False

Log MAE alongside RMSE

TrainerHypers.log_separate_blocks: bool = False

Log per-block error.

TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'loss'

Metric used to select best checkpoint (e.g., rmse_prod)

TrainerHypers.grad_clip_norm: float = 1.0

Maximum gradient norm used for gradient clipping.

TrainerHypers.batch_atom_bounds: list[int | None] = [None, None]

Bounds for the number of atoms per batch as [min, max]. Batches with atom counts outside these bounds will be skipped during training. Use None for either value to disable that bound. This is useful for preventing out-of-memory errors and ensuring consistent computational load. Default: [None, None].
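
For example, to skip batches with more than 1000 atoms while leaving the lower bound disabled (the threshold is illustrative):

architecture:
  name: llpr
  training:
    batch_atom_bounds:
    - null
    - 1000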

TrainerHypers.calibration_method: Literal['absolute_residuals', 'squared_residuals', 'crps'] = 'absolute_residuals'

This determines how the calibration factor \(\alpha\) in Eq. 24 of Bigi et al. [1] is calculated:

\[\sigma^2_\star = \alpha^2 \boldsymbol{\mathrm{f}}^{\mathrm{T}}_\star (\boldsymbol{\mathrm{F}}^{\mathrm{T}} \boldsymbol{\mathrm{F}} + \varsigma^2 \boldsymbol{\mathrm{I}})^{-1} \boldsymbol{\mathrm{f}}_\star\]

In all cases, a Gaussian error distribution is assumed. If set to squared_residuals, the calibration factor is computed by minimizing the negative log-likelihood. If set to absolute_residuals, the calibration factor is computed from the mean absolute error, assuming Gaussian errors; this choice is more robust to outliers, and we recommend it for large and/or uncurated datasets. If set to crps, the continuous ranked probability score (CRPS) is minimized to find the calibration factor; this option is useful if you then want to train the ensemble weights using a CRPS loss.
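
As an illustration of the squared_residuals option (a sketch of the standard result, assuming \(\alpha\) simply rescales all predicted variances, and not a verbatim transcription of the implementation): minimizing the Gaussian negative log-likelihood of the calibration-set errors \(e_i\) with uncalibrated variances \(\tilde{\sigma}^2_i\) with respect to \(\alpha\) yields

\[\alpha^2 = \frac{1}{N} \sum_{i=1}^{N} \frac{e_i^2}{\tilde{\sigma}_i^2}\]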

References