NanoPET (deprecated)

Warning

This is a deprecated model. You should not use it for anything important, and support for it will be removed in future versions of metatrain. Please use the PET model instead.

Installation

To install this architecture along with the metatrain package, run:

pip install metatrain[nanopet]

where the square brackets indicate that you want to install the optional dependencies required for nanopet. Note that in some shells (such as zsh) the brackets must be quoted: pip install "metatrain[nanopet]".

Default Hyperparameters

All the hyperparameters used in nanopet are described further down this page. As a convenient starting point for creating your own hyperparameter files, here is a yaml file containing all the default hyperparameters:

architecture:
  name: deprecated.nanopet
  model:
    cutoff: 5.0
    cutoff_width: 0.5
    d_pet: 128
    num_heads: 4
    num_attention_layers: 2
    num_gnn_layers: 2
    heads: {}
    zbl: false
    long_range:
      enable: false
      use_ewald: false
      smearing: 1.4
      kspace_resolution: 1.33
      interpolation_nodes: 5
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 16
    num_epochs: 10000
    learning_rate: 0.0003
    scheduler_patience: 100
    scheduler_factor: 0.8
    log_interval: 10
    checkpoint_interval: 100
    scale_targets: true
    fixed_composition_weights: {}
    remove_composition_contribution: true
    fixed_scaling_weights: {}
    per_structure_targets: []
    num_workers: null
    log_mae: false
    log_separate_blocks: false
    best_model_metric: rmse_prod
    loss: mse

Model hyperparameters

The parameters that go under the architecture.model section of the config file are the following:

ModelHypers.cutoff: float = 5.0

Cutoff radius for neighbor search.

This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.

ModelHypers.cutoff_width: float = 0.5

Width of the smoothing function at the cutoff.

ModelHypers.d_pet: int = 128

Dimension of the edge features.

This hyperparameter controls the width of the neural network. In general, increasing it might lead to better accuracy, especially on larger datasets, at the cost of increased training and evaluation time.

ModelHypers.num_heads: int = 4

Attention heads per attention layer.

ModelHypers.num_attention_layers: int = 2

The number of attention layers in each layer of the graph neural network. Depending on the dataset, increasing this hyperparameter might lead to better accuracy, at the cost of increased training and evaluation time.

ModelHypers.num_gnn_layers: int = 2

The number of graph neural network layers.

In general, decreasing this hyperparameter to 1 will lead to much faster models, at the expense of accuracy. Increasing it may or may not lead to better accuracy, depending on the dataset, at the cost of increased training and evaluation time.

ModelHypers.heads: dict[str, Literal['linear', 'mlp']] = {}

The type of head (“linear” or “mlp”) to use for each target (e.g. heads: {"energy": "linear", "mtt::dipole": "mlp"}). All omitted targets will use an MLP (multi-layer perceptron) head. MLP heads consist of two hidden layers with dimensionality d_pet.
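In a full configuration file, the inline example above corresponds to a fragment like the following (the target names are the illustrative ones from the example):

```yaml
architecture:
  name: deprecated.nanopet
  model:
    heads:
      energy: linear        # simple linear readout for the energy
      "mtt::dipole": mlp    # two-hidden-layer MLP head of width d_pet
```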

ModelHypers.zbl: bool = False

Use the ZBL potential for short-range repulsion.

ModelHypers.long_range: LongRangeHypers = {'enable': False, 'interpolation_nodes': 5, 'kspace_resolution': 1.33, 'smearing': 1.4, 'use_ewald': False}

Long-range Coulomb interactions parameters.
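As a sketch, enabling long-range interactions while keeping the other defaults might look like this (see the default hyperparameters above for the fields and their default values):

```yaml
architecture:
  name: deprecated.nanopet
  model:
    long_range:
      enable: true            # turn on long-range Coulomb interactions
      use_ewald: false        # keep the default k-space method
      smearing: 1.4
      kspace_resolution: 1.33
      interpolation_nodes: 5
```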

Trainer hyperparameters

The parameters that go under the architecture.training section of the config file are the following:

TrainerHypers.distributed: bool = False

Whether to use distributed training.

TrainerHypers.distributed_port: int = 39591

Port for DDP communication.

TrainerHypers.batch_size: int = 16

The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.

TrainerHypers.num_epochs: int = 10000

Number of epochs.

TrainerHypers.learning_rate: float = 0.0003

Learning rate.

TrainerHypers.scheduler_patience: int = 100

Patience for the learning rate scheduler.

TrainerHypers.scheduler_factor: float = 0.8

Factor to reduce the learning rate by.

TrainerHypers.log_interval: int = 10

Interval to log metrics.

TrainerHypers.checkpoint_interval: int = 100

Interval to save checkpoints.

TrainerHypers.scale_targets: bool = True

Normalize targets to unit std during training.

TrainerHypers.fixed_composition_weights: dict[str, dict[int, float]] = {}

Weights for atomic contributions.

This is passed to the fixed_weights argument of CompositionModel.train_model; see its documentation to understand exactly what to pass here.
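As a sketch of the expected structure (a mapping from target name to per-atomic-number weights, matching the dict[str, dict[int, float]] type above; the numerical values below are made up for illustration):

```yaml
architecture:
  name: deprecated.nanopet
  training:
    fixed_composition_weights:
      energy:
        1: -13.6     # hypothetical weight for H (atomic number 1)
        8: -2040.0   # hypothetical weight for O (atomic number 8)
```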

TrainerHypers.remove_composition_contribution: bool = True

Whether to remove the atomic composition contribution from the targets by fitting a linear model to the training data before training the neural network.

TrainerHypers.fixed_scaling_weights: dict[str, float | dict[int, float]] = {}

Weights for target scaling.

This is passed to the fixed_weights argument of Scaler.train_model; see its documentation to understand exactly what to pass here.

TrainerHypers.per_structure_targets: list[str] = []

Targets to calculate per-structure losses.

TrainerHypers.num_workers: int | None = None

Number of workers for data loading. If not provided, it is set automatically.

TrainerHypers.log_mae: bool = False

Log the MAE alongside the RMSE.

TrainerHypers.log_separate_blocks: bool = False

Log per-block error.

TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'rmse_prod'

Metric used to select the best checkpoint (e.g., rmse_prod).

TrainerHypers.loss: str | dict[str, LossSpecification | str] = 'mse'

This section describes the loss function to be used. See the Loss functions documentation for more details.
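The simplest form is a single string (as in the default loss: mse above), applied to every target. Per the type signature, a per-target mapping is also accepted; the fragment below is an illustrative sketch with hypothetical target names, and the exact LossSpecification schema is given in the Loss functions documentation:

```yaml
architecture:
  name: deprecated.nanopet
  training:
    loss:
      energy: mse         # per-target loss types
      "mtt::dipole": mae
```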
