PET
PET is a cleaner, more user-friendly reimplementation of the original
PET model [1]. It is designed for better
modularity and maintainability, while preserving compatibility with the original
PET implementation in metatrain. It also adds new features such as long-range
features, an improved fine-tuning implementation, the ability to train on
arbitrary targets, and faster inference thanks to fast attention.
Installation
To install this architecture along with the metatrain package, run:
pip install metatrain[pet]
where the square brackets indicate that you want to install the optional
dependencies required for pet.
Default Hyperparameters
The description of all the hyperparameters used in pet is provided
further down this page. However, here we provide you with a yaml file containing all
the default hyperparameters, which might be convenient as a starting point to
create your own hyperparameter files:
architecture:
  name: pet
  model:
    cutoff: 4.5
    cutoff_width: 0.2
    d_pet: 128
    d_head: 128
    d_node: 256
    d_feedforward: 256
    num_heads: 8
    num_attention_layers: 2
    num_gnn_layers: 2
    normalization: RMSNorm
    activation: SwiGLU
    transformer_type: PreLN
    featurizer_type: feedforward
    zbl: false
    long_range:
      enable: false
      use_ewald: false
      smearing: 1.4
      kspace_resolution: 1.33
      interpolation_nodes: 5
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 16
    num_epochs: 1000
    warmup_fraction: 0.01
    learning_rate: 0.0001
    weight_decay: null
    log_interval: 1
    checkpoint_interval: 100
    scale_targets: true
    fixed_composition_weights: {}
    remove_composition_contribution: true
    fixed_scaling_weights: {}
    per_structure_targets: []
    num_workers: null
    log_mae: true
    log_separate_blocks: false
    best_model_metric: mae_prod
    grad_clip_norm: 1.0
    loss: mse
    finetune:
      read_from: null
      method: full
      config: {}
      inherit_heads: {}
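One way to think about a custom hyperparameter file is as a set of overrides on top of these defaults. The sketch below illustrates that idea with plain Python dictionaries. `merge_overrides` is a hypothetical helper written for this example, not a metatrain function, and the assumption that omitted keys keep their default values should be checked against the metatrain documentation.

```python
import copy

# A small subset of the defaults listed above, written as a nested dict.
DEFAULTS = {
    "architecture": {
        "name": "pet",
        "model": {"cutoff": 4.5, "d_pet": 128, "num_gnn_layers": 2},
        "training": {"batch_size": 16, "learning_rate": 1e-4, "num_epochs": 1000},
    }
}


def merge_overrides(defaults, overrides):
    """Return a copy of ``defaults`` with ``overrides`` applied recursively.

    Hypothetical helper for illustration; it is not part of metatrain.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: descend instead of replacing wholesale.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Only the values that differ from the defaults are written down.
user_options = {
    "architecture": {
        "model": {"cutoff": 5.0},
        "training": {"batch_size": 8},
    }
}

options = merge_overrides(DEFAULTS, user_options)
print(options["architecture"]["model"])
print(options["architecture"]["training"])
```

The merge is recursive so that overriding `cutoff` does not discard the other keys of the `model` section.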
Tuning hyperparameters
The default hyperparameters above will work well in most cases, but they may not be optimal for your specific dataset. There are a good number of parameters to tune, both for the model and the trainer. Since seeing them all for the first time can be overwhelming, here is a list of the parameters that are generally the most important (in decreasing order of importance):
- ModelHypers.cutoff: float = 4.5
Cutoff radius for neighbor search.
This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.
- TrainerHypers.learning_rate: float = 0.0001
Learning rate.
- TrainerHypers.batch_size: int = 16
The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.
- ModelHypers.d_pet: int = 128
Dimension of the edge features.
This hyperparameter controls the width of the neural network. In general, increasing it might lead to better accuracy, especially on larger datasets, at the cost of increased training and evaluation time.
- ModelHypers.d_node: int = 256
Dimension of the node features.
Increasing this hyperparameter might lead to better accuracy, with a relatively small increase in inference time.
- ModelHypers.num_gnn_layers: int = 2
The number of graph neural network layers.
In general, decreasing this hyperparameter to 1 will lead to much faster models, at the expense of accuracy. Increasing it may or may not lead to better accuracy, depending on the dataset, at the cost of increased training and evaluation time.
- ModelHypers.num_attention_layers: int = 2
The number of attention layers in each layer of the graph neural network. Depending on the dataset, increasing this hyperparameter might lead to better accuracy, at the cost of increased training and evaluation time.
- TrainerHypers.loss: str | dict[str, LossSpecification | str] = 'mse'
This section describes the loss function to be used. See the Loss functions documentation for more details.
- ModelHypers.long_range: LongRangeHypers = {'enable': False, 'interpolation_nodes': 5, 'kspace_resolution': 1.33, 'smearing': 1.4, 'use_ewald': False}
Long-range Coulomb interactions parameters.
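When exploring the parameters in this list, it can help to generate one configuration per combination of candidate values. The following is a minimal sketch using only the standard library. The candidate values are purely illustrative, `build_configs` is a hypothetical helper, and the nesting under `model` and `training` follows the default YAML file shown earlier.

```python
import itertools

# Candidate values for two of the hyperparameters listed above.
# The specific values are illustrative, not recommendations.
search_space = {
    ("model", "cutoff"): [4.0, 4.5, 5.0],
    ("training", "learning_rate"): [1e-4, 3e-4],
}


def build_configs(space):
    """Yield one nested options dict per combination of values in ``space``."""
    keys = list(space)
    for values in itertools.product(*(space[key] for key in keys)):
        config = {"architecture": {"name": "pet", "model": {}, "training": {}}}
        for (section, name), value in zip(keys, values):
            config["architecture"][section][name] = value
        yield config


configs = list(build_configs(search_space))
for config in configs:
    print(config["architecture"]["model"], config["architecture"]["training"])
```

Each resulting dictionary contains only the overridden values and could be written out as its own hyperparameter file.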
Model hyperparameters
The parameters that go under the architecture.model section of the config file
are the following:
- ModelHypers.cutoff: float = 4.5
Cutoff radius for neighbor search.
This should be set to a value beyond which most interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.
- ModelHypers.d_pet: int = 128
Dimension of the edge features.
This hyperparameter controls the width of the neural network. In general, increasing it might lead to better accuracy, especially on larger datasets, at the cost of increased training and evaluation time.
- ModelHypers.d_node: int = 256
Dimension of the node features.
Increasing this hyperparameter might lead to better accuracy, with a relatively small increase in inference time.
- ModelHypers.num_attention_layers: int = 2
The number of attention layers in each layer of the graph neural network. Depending on the dataset, increasing this hyperparameter might lead to better accuracy, at the cost of increased training and evaluation time.
- ModelHypers.num_gnn_layers: int = 2
The number of graph neural network layers.
In general, decreasing this hyperparameter to 1 will lead to much faster models, at the expense of accuracy. Increasing it may or may not lead to better accuracy, depending on the dataset, at the cost of increased training and evaluation time.
- ModelHypers.transformer_type: Literal['PreLN', 'PostLN'] = 'PreLN'
The order in which the layer normalization and attention are applied in a transformer block. Available options are
PreLN (normalization before attention) and PostLN (normalization after attention).
- ModelHypers.featurizer_type: Literal['residual', 'feedforward'] = 'feedforward'
Implementation of the featurizer of the model to use. Available options are
residual (the original featurizer from the PET paper, which uses residual connections at each GNN layer for readout) and feedforward (a modern version that uses the last representation after all GNN iterations for readout). Additionally, the feedforward version uses a bidirectional flow of features during the message-passing iterations, which encourages the features flowing from atom i to atom j to differ from those flowing from atom j to atom i.
- ModelHypers.long_range: LongRangeHypers = {'enable': False, 'interpolation_nodes': 5, 'kspace_resolution': 1.33, 'smearing': 1.4, 'use_ewald': False}
Long-range Coulomb interactions parameters.
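Because several of these options accept only a fixed set of values, a quick check before launching a long training run can catch typos early. The sketch below is a hypothetical validator, not part of metatrain. The allowed strings come from the type annotations above, while the positivity checks are an assumption based on what the parameters mean.

```python
# Allowed string values, taken from the Literal annotations above.
ALLOWED = {
    "transformer_type": {"PreLN", "PostLN"},
    "featurizer_type": {"residual", "feedforward"},
}
# Assumption: sizes and layer counts must be strictly positive integers.
POSITIVE_INTS = ("d_pet", "d_node", "num_attention_layers", "num_gnn_layers")


def check_model_hypers(model):
    """Return a list of human-readable problems found in ``model``.

    Keys that are absent are skipped, since they would use their defaults.
    """
    problems = []
    cutoff = model.get("cutoff")
    if cutoff is not None:
        is_number = isinstance(cutoff, (int, float)) and not isinstance(cutoff, bool)
        if not (is_number and cutoff > 0):
            problems.append(f"cutoff must be a positive number, got {cutoff!r}")
    for name in POSITIVE_INTS:
        value = model.get(name)
        if value is None:
            continue
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not (is_int and value > 0):
            problems.append(f"{name} must be a positive integer, got {value!r}")
    for name, allowed in ALLOWED.items():
        value = model.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return problems


good = check_model_hypers({"cutoff": 4.5, "transformer_type": "PreLN"})
bad = check_model_hypers({"transformer_type": "preln", "num_gnn_layers": 0})
print(good)
for problem in bad:
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it possible to report every issue in a file at once.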
Trainer hyperparameters
The parameters that go under the architecture.training section of the config file
are the following:
- TrainerHypers.batch_size: int = 16
The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.
- TrainerHypers.warmup_fraction: float = 0.01
Fraction of training steps used for learning rate warmup.
- TrainerHypers.fixed_composition_weights: dict[str, dict[int, float]] = {}
Weights for atomic contributions.
This is passed to the
fixed_weights argument of CompositionModel.train_model; see its documentation to understand exactly what to pass here.
- TrainerHypers.remove_composition_contribution: bool = True
Whether to remove the atomic composition contribution from the targets by fitting a linear model to the training data before training the neural network.
- TrainerHypers.fixed_scaling_weights: dict[str, float | dict[int, float]] = {}
Weights for target scaling.
This is passed to the
fixed_weights argument of Scaler.train_model; see its documentation to understand exactly what to pass here.
- TrainerHypers.num_workers: int | None = None
Number of workers for data loading. If not provided, it is set automatically.
- TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'mae_prod'
Metric used to select the best checkpoint. Available options are rmse_prod, mae_prod, and loss.
- TrainerHypers.grad_clip_norm: float = 1.0
Maximum gradient norm used for gradient clipping. The default is 1.0.
- TrainerHypers.loss: str | dict[str, LossSpecification | str] = 'mse'
This section describes the loss function to be used. See the Loss functions documentation for more details.
- TrainerHypers.finetune: NoFinetuneHypers | FullFinetuneHypers | LoRaFinetuneHypers | HeadsFinetuneHypers = {'config': {}, 'inherit_heads': {}, 'method': 'full', 'read_from': None}
Parameters for fine-tuning trained PET models.
See Fine-tune a pre-trained model for more details.
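To get a feel for how warmup_fraction, batch_size, and num_epochs interact, the following sketch estimates how many optimizer steps the warmup would cover. It rests on several assumptions that the text above does not confirm: single-process training, a last partial batch that is kept rather than dropped, a linear ramp during warmup, and a constant learning rate afterwards (the actual schedule may differ). Both functions are hypothetical helpers.

```python
import math


def warmup_steps(num_samples, batch_size, num_epochs, warmup_fraction):
    """Return (warmup step count, total step count) for a training run.

    Assumes one optimizer step per batch and that the last, possibly
    smaller, batch of each epoch is kept.
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    return round(warmup_fraction * total_steps), total_steps


def warmup_lr(step, base_lr, n_warmup):
    """Learning rate at ``step`` (0-based) under an assumed linear warmup."""
    if n_warmup == 0 or step >= n_warmup:
        # After warmup this sketch simply holds the base learning rate.
        return base_lr
    return base_lr * (step + 1) / n_warmup


# Default trainer values with a hypothetical training set of 1600 structures.
n_warmup, total = warmup_steps(
    num_samples=1600, batch_size=16, num_epochs=1000, warmup_fraction=0.01
)
print(n_warmup, total)
print(warmup_lr(0, 1e-4, n_warmup), warmup_lr(n_warmup, 1e-4, n_warmup))
```

With these numbers the warmup spans 1000 of the 100000 steps, so halving batch_size would double both counts while leaving the fraction unchanged.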