LLPR

The LLPR architecture is a “wrapper” architecture that enables cheap uncertainty quantification (UQ) via the last-layer prediction rigidity (LLPR) approach proposed by Bigi et al. [1] It is compatible with the following metatrain models constructed from NN-based architectures: PET and SOAP-BPNN. Implementing the LLPR as a separate architecture within metatrain allows users to compute uncertainties without dealing with the fine details of the LLPR implementation.

This implementation further allows the user to perform gradient-based tuning of the ensemble weights sampled from the LLPR formalism, which can lead to improved uncertainty estimates. Gradients (e.g. forces and stresses) are not yet used in this implementation of the LLPR.

Note that the uncertainties computed with this implementation are returned as standard deviations, and not variances.

Installation

To install this architecture along with the metatrain package, run:

pip install metatrain[llpr]

where the square brackets indicate that you want to install the optional dependencies required for llpr.

Default Hyperparameters

The description of all the hyperparameters used in llpr is provided further down this page. However, here we provide a yaml file containing all the default hyperparameters, which can serve as a convenient starting point for creating your own hyperparameter files:

architecture:
  name: llpr
  model:
    num_ensemble_members: {}
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 8
    regularizer: null
    model_checkpoint: null
    loss: ensemble_nll
    num_epochs: null
    train_all_parameters: false
    warmup_fraction: 0.01
    learning_rate: 0.0003
    weight_decay: null
    log_interval: 1
    checkpoint_interval: 100
    per_structure_targets: []
    num_workers: null
    log_mae: false
    log_separate_blocks: false
    best_model_metric: loss
    grad_clip_norm: 1.0

Model hyperparameters

The parameters that go under the architecture.model section of the config file are the following:

ModelHypers.num_ensemble_members: dict[str, int] = {}

Number of ensemble members for each target property for which LLPR ensembles should be constructed. No ensembles will be constructed for targets which are not listed.
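For example, to request a 128-member LLPR ensemble for an energy target, the model section could look as follows (the target name “energy” and the member count are assumptions for illustration; use the target names defined in your own training options):

```yaml
architecture:
  name: llpr
  model:
    num_ensemble_members:
      energy: 128  # build 128 ensemble members for the "energy" target
```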

Trainer hyperparameters

The parameters that go under the architecture.training section of the config file are the following:

TrainerHypers.distributed: bool = False

Whether to use distributed training.

TrainerHypers.distributed_port: int = 39591

Port for distributed communication among processes.

TrainerHypers.batch_size: int = 8

This defines the batch size used in the computation of last-layer features, covariance matrix, etc.

TrainerHypers.regularizer: float | None = None

This is the regularizer value \(\varsigma\) that is used in applying Eq. 24 of Bigi et al [1]:

\[\sigma^2_\star = \alpha^2 \boldsymbol{\mathrm{f}}^{\mathrm{T}}_\star (\boldsymbol{\mathrm{F}}^{\mathrm{T}} \boldsymbol{\mathrm{F}} + \varsigma^2 \boldsymbol{\mathrm{I}})^{-1} \boldsymbol{\mathrm{f}}_\star\]

If set to null, the internal routine determines the smallest regularizer value that guarantees numerical stability of the matrix inversion. Having exposed the formula here, we also note that the training routine of the LLPR wrapper model finds the optimal global calibration factor \(\alpha\).
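As a minimal numerical sketch of the formula above (random matrices stand in for real last-layer features; all names and values are purely illustrative and are not metatrain's API):

```python
import numpy as np

rng = np.random.default_rng(0)

n_train, n_features = 100, 16
F = rng.normal(size=(n_train, n_features))  # last-layer features of the training set
f_star = rng.normal(size=n_features)        # last-layer features of one test structure

alpha = 1.0      # global calibration factor (found during calibration)
varsigma = 1e-2  # regularizer value

# Eq. 24: sigma^2_* = alpha^2 f*^T (F^T F + varsigma^2 I)^{-1} f*
cov = F.T @ F + varsigma**2 * np.eye(n_features)
var_star = alpha**2 * (f_star @ np.linalg.solve(cov, f_star))
sigma_star = np.sqrt(var_star)  # reported as a standard deviation, not a variance
```

Solving the linear system instead of forming the explicit inverse is the numerically preferable way to evaluate this expression.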

TrainerHypers.model_checkpoint: str | None = None

Path to the checkpoint of the model for which the user wants to perform UQ based on the LLPR approach. Note that the model architecture must comply with the requirement that the last-layer features are exposed under the convention defined by metatrain.
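A minimal options fragment wrapping a previously trained model might look like this (the checkpoint path is a placeholder, not a file shipped with metatrain):

```yaml
architecture:
  name: llpr
  training:
    model_checkpoint: model.ckpt  # placeholder path to a trained PET or SOAP-BPNN checkpoint
```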

TrainerHypers.loss: str | dict[str, LossSpecification] = 'ensemble_nll'

This section describes the loss function to be used during LLPR ensemble weight calibration. We strongly suggest using only the “ensemble_nll” loss. See Loss functions for more details on the remaining hyperparameters.

TrainerHypers.num_epochs: int | None = None

Number of epochs for which the LLPR ensemble weight calibration should take place. If set to null, only the LLPR covariance matrix computation and calibration will be performed, without ensemble weight training.

TrainerHypers.train_all_parameters: bool = False

Whether to train all parameters of the LLPR-wrapped model, or only the ensemble weights. If true, all parameters will be trained, including those of the base model. If false, only the last-layer ensemble weights will be trained. Note that training all parameters (i.e., setting this flag to true) will potentially change the uncertainty estimates given by the LLPR through the uncertainty outputs (because the last-layer features will change). In that case, only uncertainties calculated as standard deviations over the ensemble members (ensemble outputs) will be meaningful.
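The ensemble-based uncertainty mentioned above is simply the standard deviation over the ensemble members' predictions. As a sketch (shapes and values are illustrative, not metatrain output):

```python
import numpy as np

# Hypothetical predictions: one row per ensemble member, one column per structure.
ensemble_preds = np.array([
    [1.02, -0.51, 3.10],
    [0.98, -0.49, 3.05],
    [1.00, -0.50, 3.15],
])

mean_pred = ensemble_preds.mean(axis=0)   # ensemble mean prediction
uncertainty = ensemble_preds.std(axis=0)  # per-structure uncertainty (standard deviation)
```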

TrainerHypers.warmup_fraction: float = 0.01

Fraction of training steps used for learning rate warmup.

TrainerHypers.learning_rate: float = 0.0003

Learning rate.

TrainerHypers.weight_decay: float | None = None

Weight decay applied by the optimizer during ensemble weight calibration. If set to null, no weight decay is applied.
TrainerHypers.log_interval: int = 1

Interval to log metrics.

TrainerHypers.checkpoint_interval: int = 100

Interval to save checkpoints.

TrainerHypers.per_structure_targets: list[str] = []

Targets to calculate per-structure losses.

TrainerHypers.num_workers: int | None = None

Number of workers for data loading. If not provided, it is set automatically.

TrainerHypers.log_mae: bool = False

Log MAE alongside RMSE.

TrainerHypers.log_separate_blocks: bool = False

Log per-block error.

TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'loss'

Metric used to select the best checkpoint (e.g., rmse_prod).

TrainerHypers.grad_clip_norm: float = 1.0

Maximum gradient norm used for gradient clipping.

References