DPA3 (Experimental)

Maintained by @HaoZeke.

This is an interface to the DPA3 (Deep Potential Attention 3) architecture [1] implemented in deepmd-kit.

DPA3 extends the DPA series with a Line Graph representation and the RepFlow framework, enabling richer many-body interactions through joint edge-angle message passing. See the paper and the deepmd-kit documentation for further details.

Note

The type_map required by deepmd-kit is derived automatically from the atomic numbers present in the dataset; it is not a user-facing hyperparameter.

Installation

To install this architecture along with the metatrain package, run:

pip install metatrain[dpa3]

where the square brackets indicate that you want to install the optional dependencies required for dpa3.
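Once installed, the architecture is selected through metatrain's usual options.yaml. The sketch below is a minimal, hedged example: the dataset path and the energy key are placeholders, and the training_set layout follows metatrain's general conventions rather than anything DPA3-specific.

```yaml
# Minimal options.yaml sketch; dataset.xyz and the "energy" key are placeholders.
architecture:
  name: experimental.dpa3

training_set:
  systems:
    read_from: dataset.xyz
  targets:
    energy:
      key: energy

validation_set: 0.1
test_set: 0.1
```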

Default Hyperparameters

All hyperparameters used in dpa3 are described further down this page. As a convenient starting point for your own hyperparameter files, the full set of defaults is given here as a yaml file:

architecture:
  name: experimental.dpa3
  model:
    dpa3_model: null
    descriptor:
      type: dpa3
      repflow:
        n_dim: 128
        e_dim: 64
        a_dim: 32
        nlayers: 6
        e_rcut: 6.0
        e_rcut_smth: 5.3
        e_sel: 1200
        a_rcut: 4.0
        a_rcut_smth: 3.5
        a_sel: 300
        axis_neuron: 4
        skip_stat: true
        a_compress_rate: 1
        a_compress_e_rate: 2
        a_compress_use_split: true
        update_angle: true
        update_style: res_residual
        update_residual: 0.1
        update_residual_init: const
        smooth_edge_update: true
        use_dynamic_sel: true
        sel_reduce_factor: 10.0
      activation_function: custom_silu:10.0
      use_tebd_bias: false
      precision: 32
      concat_output_tebd: false
    fitting_net:
      neuron:
      - 240
      - 240
      - 240
      resnet_dt: true
      seed: 1
      precision: 32
      activation_function: custom_silu:10.0
      type: ener
      numb_fparam: 0
      numb_aparam: 0
      dim_case_embd: 0
      trainable: true
      rcond: null
      atom_ener: []
      use_aparam_as_mask: false
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 8
    num_epochs: 100
    learning_rate: 0.001
    scheduler_patience: 100
    scheduler_factor: 0.8
    log_interval: 1
    checkpoint_interval: 100
    scale_targets: true
    fixed_composition_weights: {}
    per_structure_targets: []
    log_mae: false
    log_separate_blocks: false
    best_model_metric: rmse_prod
    loss: mse

Tuning hyperparameters

The most impactful hyperparameters (roughly in decreasing order of importance):

ModelHypers.descriptor: DescriptorHypers

Descriptor configuration (RepFlow block and related settings). The full default values are listed under Model hyperparameters below.

TrainerHypers.learning_rate: float = 0.001

Learning rate.

TrainerHypers.batch_size: int = 8

The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.

Increasing descriptor.repflow.nlayers typically improves accuracy at the cost of training time. descriptor.repflow.e_rcut controls the interaction range and should be chosen based on the physical system. Reduce e_sel and a_sel for faster iteration on small systems.
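As a sketch of such an override file for a small system (the values are illustrative, not recommendations), note that any keys left unspecified keep the defaults shown above:

```yaml
architecture:
  name: experimental.dpa3
  model:
    descriptor:
      repflow:
        nlayers: 4       # fewer layers for faster iteration
        e_rcut: 5.0      # shorter pair cutoff for a small system
        e_rcut_smth: 4.3
        e_sel: 400       # fewer edge neighbors
        a_sel: 100       # fewer angle neighbors
  training:
    batch_size: 16
```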

Using a pretrained model

Set dpa3_model to a deepmd-kit model file to fine-tune from pretrained weights instead of training from scratch:

architecture:
  name: experimental.dpa3
  model:
    dpa3_model: path/to/deepmd-model.pt

Energy biases and standard deviations are extracted from the loaded model and handed to metatrain’s composition model and scaler automatically.

Model hyperparameters

The parameters that go under the architecture.model section of the config file are the following:

ModelHypers.dpa3_model: str | None = None

Path to a pretrained DPA3 model file (deepmd-kit checkpoint or saved Module). When provided, the model weights are loaded from this file instead of being initialised from scratch. Energy biases and standard deviations stored in the deepmd-kit model are extracted and handed to metatrain’s CompositionModel and Scaler so that fine-tuning starts from the pretrained values.

ModelHypers.descriptor: DescriptorHypers = {'activation_function': 'custom_silu:10.0', 'concat_output_tebd': False, 'precision': 32, 'repflow': {'a_compress_e_rate': 2, 'a_compress_rate': 1, 'a_compress_use_split': True, 'a_dim': 32, 'a_rcut': 4.0, 'a_rcut_smth': 3.5, 'a_sel': 300, 'axis_neuron': 4, 'e_dim': 64, 'e_rcut': 6.0, 'e_rcut_smth': 5.3, 'e_sel': 1200, 'n_dim': 128, 'nlayers': 6, 'sel_reduce_factor': 10.0, 'skip_stat': True, 'smooth_edge_update': True, 'update_angle': True, 'update_residual': 0.1, 'update_residual_init': 'const', 'update_style': 'res_residual', 'use_dynamic_sel': True}, 'type': 'dpa3', 'use_tebd_bias': False}

Descriptor configuration (RepFlow block and related settings).

ModelHypers.fitting_net: FittingNetHypers = {'activation_function': 'custom_silu:10.0', 'atom_ener': [], 'dim_case_embd': 0, 'neuron': [240, 240, 240], 'numb_aparam': 0, 'numb_fparam': 0, 'precision': 32, 'rcond': None, 'resnet_dt': True, 'seed': 1, 'trainable': True, 'type': 'ener', 'use_aparam_as_mask': False}

Fitting network configuration.

with the following definitions needed to fully understand some of the parameters:

class metatrain.experimental.dpa3.documentation.DescriptorHypers[source]

Descriptor hyperparameters wrapping the RepFlow block.

type: str = 'dpa3'

Descriptor type identifier used by deepmd-kit.

repflow: RepflowHypers = {'a_compress_e_rate': 2, 'a_compress_rate': 1, 'a_compress_use_split': True, 'a_dim': 32, 'a_rcut': 4.0, 'a_rcut_smth': 3.5, 'a_sel': 300, 'axis_neuron': 4, 'e_dim': 64, 'e_rcut': 6.0, 'e_rcut_smth': 5.3, 'e_sel': 1200, 'n_dim': 128, 'nlayers': 6, 'sel_reduce_factor': 10.0, 'skip_stat': True, 'smooth_edge_update': True, 'update_angle': True, 'update_residual': 0.1, 'update_residual_init': 'const', 'update_style': 'res_residual', 'use_dynamic_sel': True}

RepFlow block parameters.

activation_function: str = 'custom_silu:10.0'

Activation function. Format: "name" or "name:param". Supported names include "tanh", "gelu", "custom_silu".
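For instance (the value after the colon is the function's parameter, whose exact meaning follows deepmd-kit's definition of the activation):

```yaml
activation_function: custom_silu:10.0   # "name:param" form
# or simply a bare name:
# activation_function: tanh
```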

use_tebd_bias: bool = False

Add bias to the type embedding.

precision: int = 32

Floating-point precision for the descriptor (32 or 64). This controls the internal precision of deepmd-kit's descriptor computation. It can be set independently of fitting_net.precision for mixed-precision training; for uniform precision, set both to the same value.
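A mixed-precision sketch, assuming the two precision keys can simply be set to different values:

```yaml
architecture:
  name: experimental.dpa3
  model:
    descriptor:
      precision: 32    # descriptor in single precision
    fitting_net:
      precision: 64    # fitting network in double precision
```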

concat_output_tebd: bool = False

Concatenate type embedding to descriptor output.

class metatrain.experimental.dpa3.documentation.RepflowHypers[source]

RepFlow descriptor block parameters.

n_dim: int = 128

Node feature dimension.

e_dim: int = 64

Edge feature dimension.

a_dim: int = 32

Angle feature dimension.

nlayers: int = 6

Number of RepFlow interaction layers.

e_rcut: float = 6.0

Edge (pair) cutoff radius in length units.

e_rcut_smth: float = 5.3

Start of cosine smoothing for the edge cutoff.

e_sel: int = 1200

Maximum number of edge neighbors per atom.

a_rcut: float = 4.0

Angle (triplet) cutoff radius in length units.

a_rcut_smth: float = 3.5

Start of cosine smoothing for the angle cutoff.

a_sel: int = 300

Maximum number of angle neighbors per atom.

axis_neuron: int = 4

Number of axis neurons in the embedding network.

skip_stat: bool = True

Skip statistics computation (use pretrained stats).

a_compress_rate: int = 1

Compression rate for angle features.

a_compress_e_rate: int = 2

Compression rate for angle-edge features.

a_compress_use_split: bool = True

Use split compression for angle features.

update_angle: bool = True

Update angle features at each layer.

update_style: str = 'res_residual'

Residual update style. Options: "res_residual", "res_avg".

update_residual: float = 0.1

Residual scaling factor for updates.

update_residual_init: str = 'const'

Initialisation for the residual scaling. Options: "const", "norm".

smooth_edge_update: bool = True

Apply smooth cutoff function to edge updates.

use_dynamic_sel: bool = True

Dynamically adjust neighbor selection at runtime.

sel_reduce_factor: float = 10.0

Reduction factor for dynamic neighbor selection.

class metatrain.experimental.dpa3.documentation.FittingNetHypers[source]

Fitting network hyperparameters.

neuron: list[int] = [240, 240, 240]

Hidden layer sizes for the fitting network.

resnet_dt: bool = True

Use a ResNet-style time step in each hidden layer.

seed: int = 1

Random seed for weight initialisation.

precision: int = 32

Floating-point precision for the fitting network (32 or 64). Can differ from descriptor.precision for mixed-precision training.

activation_function: str = 'custom_silu:10.0'

Activation function (same format as the descriptor).

type: str = 'ener'

Fitting type. "ener" for energy fitting.

numb_fparam: int = 0

Number of frame-level parameters.

numb_aparam: int = 0

Number of atom-level parameters.

dim_case_embd: int = 0

Dimension of the case embedding (multi-task).

trainable: bool = True

Whether fitting network weights are trainable.

rcond: float | None = None

Cutoff for pseudo-inverse in linear fitting.

atom_ener: list[float] = []

Per-type atomic energy offsets.

use_aparam_as_mask: bool = False

Treat atom-level parameters as a mask.

Trainer hyperparameters

The parameters that go under the architecture.trainer section of the config file are the following:

TrainerHypers.distributed: bool = False

Whether to use distributed training.

TrainerHypers.distributed_port: int = 39591

Port for DDP communication.

TrainerHypers.batch_size: int = 8

The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.

TrainerHypers.num_epochs: int = 100

Number of epochs.

TrainerHypers.learning_rate: float = 0.001

Learning rate.

TrainerHypers.scheduler_patience: int = 100

Number of epochs with no improvement before reducing the learning rate.

TrainerHypers.scheduler_factor: float = 0.8

Factor by which the learning rate is reduced on plateau.

TrainerHypers.log_interval: int = 1

Interval to log metrics.

TrainerHypers.checkpoint_interval: int = 100

Interval to save checkpoints.

TrainerHypers.scale_targets: bool = True

Normalize targets to unit std during training.

TrainerHypers.fixed_composition_weights: dict[str, float | dict[int, float]] = {}

Weights for atomic contributions.

This is passed to the fixed_weights argument of CompositionModel.train_model, see its documentation to understand exactly what to pass here.
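Based on the declared type (dict[str, float | dict[int, float]]), a plausible sketch maps a target name to per-atomic-number weights; the numeric values below are invented placeholders, and CompositionModel.train_model's documentation remains authoritative:

```yaml
training:
  fixed_composition_weights:
    energy:
      1: -0.5      # H (placeholder value)
      8: -75.0     # O (placeholder value)
```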

TrainerHypers.per_structure_targets: list[str] = []

Targets to calculate per-structure losses.

TrainerHypers.log_mae: bool = False

Log MAE alongside RMSE.

TrainerHypers.log_separate_blocks: bool = False

Log per-block error.

TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'rmse_prod'

Metric used to select best checkpoint (e.g., rmse_prod).

TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'

This section describes the loss function to be used. See the Loss functions for more details.

References