PhACE (Experimental)


PhACE is a physics-inspired equivariant neural network architecture. Compared to, for example, MACE and GRACE, it uses a geometrically motivated basis and a fast, elegant tensor product implementation. The tensor product used in PhACE leverages an equivariant representation that differs from the typical spherical one. You can read more about it here: https://pubs.acs.org/doi/10.1021/acs.jpclett.4c02376.

Installation

To install this architecture along with the metatrain package, run:

pip install metatrain[phace]

where the square brackets indicate that you want to install the optional dependencies required for phace.

Additional outputs

  • features: the internal PhACE features, taken before the different heads for each target.

  • mtt::aux::{target}_last_layer_features: The features for a given target, taken before the last linear layer of the corresponding head.

Default Hyperparameters

The description of all the hyperparameters used in phace is provided further down this page. However, here we provide a yaml file containing all the default hyperparameters, which might be convenient as a starting point for your own hyperparameter files:

architecture:
  name: experimental.phace
  model:
    num_tensor_products: 6
    num_gnn_layers: 3
    cutoff: 8.0
    num_neighbors_adaptive: 16
    cutoff_width: 1.0
    num_element_channels: 128
    force_rectangular: false
    radial_basis:
      max_eigenvalue: 25.0
      element_scale: 0.7
      mlp_depth: 3
      mlp_expansion_ratio: 4
    initial_scaling: 1.0
    message_scaling: 0.1
    final_scaling: 1.0
    use_sphericart: false
    mlp_head_num_layers: 1
    mlp_head_expansion_ratio: 4
    tensor_product_expansion_ratio: 2
    heads: {}
    zbl: false
  training:
    compile: false
    distributed: false
    distributed_port: 39591
    batch_size: 8
    num_epochs: 1000
    learning_rate: 0.003
    warmup_fraction: 0.01
    gradient_clipping: 1.0
    ema_decay: 0.999
    log_interval: 1
    checkpoint_interval: 25
    scale_targets: true
    atomic_baseline: {}
    fixed_scaling_weights: {}
    num_workers: null
    per_structure_targets: []
    log_separate_blocks: false
    log_mae: false
    best_model_metric: rmse_prod
    loss: mse

Tuning hyperparameters

The default hyperparameters above will work well in most cases, but they may not be optimal for your specific use case. There are a good number of parameters to tune, both for the model and the trainer. Here, we provide a list of the parameters that are in general the most important (in decreasing order of importance) for the PhACE architecture:

ModelHypers.radial_basis: RadialBasisHypers = {'element_scale': 0.7, 'max_eigenvalue': 25.0, 'mlp_depth': 3, 'mlp_expansion_ratio': 4}

Hyperparameters for the radial basis functions.

Raising ``max_eigenvalue`` from its default will increase the number of spherical irreducible representations (irreps) used in the model, which can improve accuracy at the cost of computational efficiency. Increasing this value will also increase the number of radial basis functions (and therefore internal features) used for each irrep.

ModelHypers.num_element_channels: int = 128

Number of channels per element.

This determines the size of the embedding used to encode the atomic species, and it increases or decreases the size of the internal features used in the model.

TrainerHypers.num_epochs: int = 1000

Number of epochs to train the model.

A larger number of epochs might lead to better accuracy. In general, if you see that the validation metrics are not much worse than the training ones at the end of training, it might be a good idea to increase this value.

TrainerHypers.batch_size: int = 8

Batch size for training.

Decrease this value if you run into out-of-memory errors during training. You can try to increase it if your structures are very small (less than 20 atoms) and you have a good GPU.

ModelHypers.num_gnn_layers: int = 3

Number of GNN layers.

Increasing this value might increase the accuracy of the model (especially on larger datasets), at the expense of computational efficiency.

TrainerHypers.learning_rate: float = 0.003

Learning rate for the optimizer.

You can try to increase this value (e.g., to 0.01) if training is stable and slow or decrease it (e.g., to 0.001 or less) if you see divergence in the first few epochs and/or instabilities.

ModelHypers.cutoff: float = 8.0

Cutoff radius for neighbor search.

This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.

ModelHypers.force_rectangular: bool = False

Makes the number of channels per irrep the same.

This might improve accuracy with a limited increase in computational cost.
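As a concrete illustration, an options file that overrides only a handful of the parameters above might look as follows (the values are arbitrary examples chosen to show the structure, not tuned recommendations):

```yaml
architecture:
  name: experimental.phace
  model:
    num_element_channels: 256   # wider internal features
    num_gnn_layers: 4           # deeper message passing
    cutoff: 6.0                 # shorter-ranged, faster model
    radial_basis:
      max_eigenvalue: 35.0      # more irreps and radial functions
  training:
    num_epochs: 2000
    batch_size: 16
    learning_rate: 0.001
```

Any parameter not listed falls back to the defaults shown earlier.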

Model hyperparameters

The parameters that go under the architecture.model section of the config file are the following:

ModelHypers.num_tensor_products: int = 6

Number of tensor products per GNN layer.

ModelHypers.num_gnn_layers: int = 3

Number of GNN layers.

Increasing this value might increase the accuracy of the model (especially on larger datasets), at the expense of computational efficiency.

ModelHypers.cutoff: float = 8.0

Cutoff radius for neighbor search.

This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.

ModelHypers.num_neighbors_adaptive: int | None = 16

Target number of neighbors for the adaptive cutoff scheme.

This parameter activates the adaptive cutoff functionality. Each atomic environment gets its own cutoff, chosen such that the number of neighbors is approximately equal to this value. This can be useful to obtain a more uniform number of neighbors per atom, especially in sparse systems. Setting it to None disables this feature and uses all neighbors within the fixed cutoff radius.
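The idea can be sketched roughly as follows. This is an illustrative NumPy sketch, not the actual PhACE implementation: each atom's cutoff is taken as the distance to its k-th nearest neighbor, clamped to the fixed maximum cutoff.

```python
import numpy as np

def adaptive_cutoffs(positions, max_cutoff=8.0, target_neighbors=16):
    """Illustrative sketch: one cutoff per atom, equal to the distance
    to its target_neighbors-th nearest neighbor, clamped to max_cutoff."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # an atom is not its own neighbor
    sorted_dists = np.sort(dists, axis=1)
    k = min(target_neighbors, positions.shape[0] - 1)
    per_atom = sorted_dists[:, k - 1]  # distance to the k-th neighbor
    return np.minimum(per_atom, max_cutoff)

# Example: 5 atoms on a line with spacing 1.0, targeting 2 neighbors.
# Edge atoms get a larger cutoff (2.0) than interior atoms (1.0).
pos = np.array([[float(i), 0.0, 0.0] for i in range(5)])
cutoffs = adaptive_cutoffs(pos, max_cutoff=8.0, target_neighbors=2)
```

Note that the real implementation also smooths the cutoff (see cutoff_width) so that the model remains continuous as atoms move.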

ModelHypers.cutoff_width: float = 1.0

Width of the cutoff smoothing function.

ModelHypers.num_element_channels: int = 128

Number of channels per element.

This determines the size of the embedding used to encode the atomic species, and it increases or decreases the size of the internal features used in the model.

ModelHypers.force_rectangular: bool = False

Makes the number of channels per irrep the same.

This might improve accuracy with a limited increase in computational cost.

ModelHypers.radial_basis: RadialBasisHypers = {'element_scale': 0.7, 'max_eigenvalue': 25.0, 'mlp_depth': 3, 'mlp_expansion_ratio': 4}

Hyperparameters for the radial basis functions.

Raising ``max_eigenvalue`` from its default will increase the number of spherical irreducible representations (irreps) used in the model, which can improve accuracy at the cost of computational efficiency. Increasing this value will also increase the number of radial basis functions (and therefore internal features) used for each irrep.

ModelHypers.initial_scaling: float = 1.0

Scaling for the initial features.

ModelHypers.message_scaling: float = 0.1

Scaling for message passing.

ModelHypers.final_scaling: float = 1.0

Final scaling factor applied to the model outputs.

ModelHypers.use_sphericart: bool = False

Whether to use the sphericart library to compute the spherical harmonics.

ModelHypers.mlp_head_num_layers: int = 1

Number of layers in the heads for MLP heads.

ModelHypers.mlp_head_expansion_ratio: int = 4

Expansion ratio for the hidden layers of the MLP head.

ModelHypers.tensor_product_expansion_ratio: int = 2

Expansion ratio for the tensor product iterations.

ModelHypers.heads: dict[str, Literal['linear', 'mlp']] = {}

Heads to use in the model, with options being “linear” or “mlp”.

ModelHypers.zbl: bool = False

Whether to use the ZBL potential in the model.

Trainer hyperparameters

The parameters that go under the architecture.trainer section of the config file are the following:

TrainerHypers.compile: bool = False

Whether to use torch.compile during training.

This can lead to significant speedups, but it will cause a compilation step at the beginning of training which might take up to 5-10 minutes, mainly depending on max_eigenvalue. Note that this option does not work at the moment with adaptive cutoffs.

TrainerHypers.distributed: bool = False

Whether to use distributed training.

TrainerHypers.distributed_port: int = 39591

Port for DDP communication.

TrainerHypers.batch_size: int = 8

Batch size for training.

Decrease this value if you run into out-of-memory errors during training. You can try to increase it if your structures are very small (less than 20 atoms) and you have a good GPU.

TrainerHypers.num_epochs: int = 1000

Number of epochs to train the model.

A larger number of epochs might lead to better accuracy. In general, if you see that the validation metrics are not much worse than the training ones at the end of training, it might be a good idea to increase this value.

TrainerHypers.learning_rate: float = 0.003

Learning rate for the optimizer.

You can try to increase this value (e.g., to 0.01) if training is stable and slow or decrease it (e.g., to 0.001 or less) if you see divergence in the first few epochs and/or instabilities.

TrainerHypers.warmup_fraction: float = 0.01

Fraction of training steps for learning rate warmup.

TrainerHypers.gradient_clipping: float | None = 1.0

Gradient clipping value. If None, no clipping is applied.

TrainerHypers.ema_decay: float | None = 0.999

Decay factor for exponential moving average of model parameters. If None, EMA is not used.

TrainerHypers.log_interval: int = 1

Interval to log metrics during training.

TrainerHypers.checkpoint_interval: int = 25

Interval to save model checkpoints.

TrainerHypers.scale_targets: bool = True

Whether to scale targets during training.

TrainerHypers.atomic_baseline: dict[str, float | dict[int, float]] = {}

The baselines for each target.

By default, metatrain will fit a linear model (CompositionModel) to compute the least squares baseline for each atomic species for each target.

However, this hyperparameter allows you to provide your own baselines. The value of the hyperparameter should be a dictionary where the keys are the target names, and the values are either (1) a single baseline to be used for all atomic types, or (2) a dictionary mapping atomic types to their baselines. For example:

  • atomic_baseline: {"energy": {1: -0.5, 6: -10.0}} will fix the energy baseline for hydrogen (Z=1) to -0.5 and for carbon (Z=6) to -10.0, while fitting the baselines for the energy of all other atomic types, as well as fitting the baselines for all other targets.

  • atomic_baseline: {"energy": -5.0} will fix the energy baseline for all atomic types to -5.0.

  • atomic_baseline: {"mtt::dos": 0.0} sets the baseline for the “mtt::dos” target to 0.0, effectively disabling the atomic baseline for that target.

This atomic baseline is subtracted from the targets during training, which avoids the need for the main model to learn atomic contributions and likely makes training easier. When the model is used in evaluation mode, the atomic baseline is automatically added on top of the model predictions.
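In an options file, the first example from the bullet list above would be written as follows (placement under architecture.training mirrors the default file shown earlier):

```yaml
architecture:
  name: experimental.phace
  training:
    atomic_baseline:
      energy:
        1: -0.5     # hydrogen baseline, fixed
        6: -10.0    # carbon baseline, fixed
```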

Note

This atomic baseline is a per-atom contribution. Therefore, if the property you are predicting is a sum over all atoms (e.g., total energy), the contribution of the atomic baseline to the total property will be the atomic baseline multiplied by the number of atoms of that type in the structure.
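For instance, with the hypothetical baselines from the earlier example ({1: -0.5, 6: -10.0}), a methane-like structure (one carbon, four hydrogens) would get a total-energy baseline of -12.0:

```python
# Hypothetical per-atom baselines, as in the earlier example:
baselines = {1: -0.5, 6: -10.0}   # H (Z=1) and C (Z=6)

# A methane-like structure: one C and four H
atomic_numbers = [6, 1, 1, 1, 1]

# For a summed property (e.g. total energy), the baseline contribution
# is the sum of the per-atom baselines:
total_baseline = sum(baselines[z] for z in atomic_numbers)
# -10.0 + 4 * (-0.5) == -12.0
```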

Note

If a MACE model is loaded through the mace_model hyperparameter, the atomic baselines in the MACE model are used by default for the target indicated in mace_head_target. If you want to override them, you need to set explicitly the baselines for that target in this hyperparameter.

TrainerHypers.fixed_scaling_weights: dict[str, float | dict[int, float]] = {}

Fixed scaling weights for the model.

TrainerHypers.num_workers: int | None = None

Number of workers for data loading.

TrainerHypers.per_structure_targets: list[str] = []

List of targets to calculate per-structure losses.

TrainerHypers.log_separate_blocks: bool = False

Whether to log per-block error during training.

TrainerHypers.log_mae: bool = False

Whether to log MAE alongside RMSE during training.

TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'rmse_prod'

Metric used to select the best model checkpoint.

TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'

Loss function used for training.

References