MTP usage examples - skw32/skw32.github.io GitHub Wiki


UPDATE: There has been a new MLIP release since these instructions were written. When following the instructions below, check the updated manual here and the latest MLIP paper here.


0. Installation

See the 'Compilation recipes' page for instructions on installing mlp on the Irene and summer clusters. Once it is installed, it is useful to add mlp to your PATH by putting the following line in your .bashrc:

export PATH=path_to_mlp_bin:$PATH
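If you prefer not to edit .bashrc by hand, the sketch below appends that export line only when it is not already present, so running it twice is harmless. It is a hypothetical helper (add_mlp_to_path is not part of mlp), and the bin directory you pass is a placeholder for wherever your build put the mlp binary.

```shell
#!/bin/bash
# Hypothetical helper (not part of mlp): add the mlp bin directory to PATH in an
# rc file, appending the export line only if that exact line is not already there.
# Usage: add_mlp_to_path /path/to/mlp/bin [rcfile]   (rcfile defaults to ~/.bashrc)
add_mlp_to_path() {
    local bindir=$1
    local rc=${2:-$HOME/.bashrc}
    # \$PATH stays literal so it is expanded when the rc file is sourced, not now.
    local line="export PATH=${bindir}:\$PATH"
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet.
    if ! grep -qxF "$line" "$rc" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc"
    fi
}
```

After running it, open a new shell (or 'source ~/.bashrc') and check with 'command -v mlp' that the binary is found.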

Documentation for mlp can be obtained with:

mlp help

mlp list --> lists the available commands

mlp help [command] --> shows the options for the given command and their default values


1. Train a potential using a data set

Input files:

  • .cfg file for the structures in the training data
  • (8,16,20,...)g.mtp: an untrained potential of chosen basis set size
  • or a pre-trained potential that you wish to train further with additional data (note that the cfg file must still include the structures from the previous training set if you do this)
  • submission script

Outputs:

  • Trained.mtp_
  • training.out, or whichever file the output is redirected to in the submission script (contains the RMSE etc. during training)

Example submission script for Irene for this task:

#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 43200
#MSUB -A gen10765
#MSUB -r C12oh2tdvolCon

#MSUB -m work

module purge
module load lammps tbb
module load intel/17.0.6.256 mpi/openmpi/2.0.4 tbb

# mlp train potential.mtp train_set.cfg [options]:
#  trains potential.mtp on the training set from train_set.cfg

ccc_mprun -n 48 mlp train 08g.mtp train_all_670.cfg > training.out

# Training options are added as arguments to the line above, e.g.:
# ccc_mprun -n 48 mlp train 08g.mtp train_all_670.cfg --energy-weight=0.9 > training.out
# For the full list of options and their default values, use 'mlp help train'

Untrained potentials

Select an untrained potential to start from among those distributed with the MTP package, e.g. 16g.mtp, 20g.mtp or 24g.mtp. A higher number implies a larger basis set; the basis-set size should be tested to see which potential gives the best predictive performance. Potentials with larger basis sets are slower to train but can improve performance. Note that for small data sets, large basis sets may also result in overfitting.
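One way to run that basis-set test is to train one potential per size, each in its own directory (so the default Trained.mtp_ outputs do not overwrite each other), and then run mlp calc-errors on the same test set for each. The sketch below assumes the untrained files follow the <size>g.mtp naming used above and sit in a directory given by UNTRAINED_DIR, and that mlp writes Trained.mtp_ to the working directory as in the examples on this page. The names scan_basis_sets, abspath, UNTRAINED_DIR and LAUNCH are illustrative only, not part of mlp.

```shell
#!/bin/bash
# Sketch: train one MTP per basis-set size and score each on the same test set.
# Assumptions: untrained files are named <size>g.mtp and live in $UNTRAINED_DIR
# (default: current directory); mlp writes Trained.mtp_ to the working directory.

# Print the absolute path of a file (avoids depending on GNU realpath).
abspath() {
    (cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd)" "$(basename "$1")")
}

# Usage: scan_basis_sets TRAIN_CFG TEST_CFG SIZE [SIZE...]
scan_basis_sets() {
    local train_cfg test_cfg size pot
    train_cfg=$(abspath "$1") || return 1
    test_cfg=$(abspath "$2") || return 1
    shift 2
    # MPI launcher prefix; LAUNCH="" runs serially. Default mirrors the Irene scripts.
    local launch=${LAUNCH-"ccc_mprun -n 48"}
    for size in "$@"; do
        pot=$(abspath "${UNTRAINED_DIR:-.}/${size}g.mtp") || return 1
        mkdir -p "basis_${size}"
        # Subshell: the cd does not leak, and each size gets its own Trained.mtp_.
        (
            cd "basis_${size}" || exit 1
            $launch mlp train "$pot" "$train_cfg" > training.out
            $launch mlp calc-errors Trained.mtp_ "$test_cfg" > errors.out
        )
    done
}
```

For example, inside a submission script like the one above: 'UNTRAINED_DIR=/path/to/untrained_potentials scan_basis_sets train_all_670.cfg test_C12oh2td_115.cfg 08 16 20 24', then compare the basis_*/errors.out files. The sizes run one after another within a single job's time limit, so for large data sets one job per size may be more practical.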

cfg file

This file can be produced automatically from a data set using the convert-cfg feature of mlp. An example submission script for this task (with 3 options) is:

#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 43200
#MSUB -A gen10765
#MSUB -r mlp_prep
#MSUB -m work

module purge
module load lammps tbb

# Directory in which to write the cfg files (adjust as needed)
outDir=.
# Note: '>>' below appends, so delete any old CoMn3O4_alloy.cfg before re-running

# Paths to the OUTCARs of all structures you wish to add to your cfg file
for outcar in /ccc/work/cont003/gen10765/wallaces/Projects/Co_xMn_{3-x}O_4/cleanedSets_ABC/labelled/*/con*/OUTCAR
do
    # Option 1: Add all structures from the relaxation
    mlp convert-cfg --input-format=vasp-outcar "$outcar" "$outDir/tmp.cfg"
    # Option 2: Add only the last structure
    #mlp convert-cfg --last --input-format=vasp-outcar "$outcar" "$outDir/tmp.cfg"
    # Option 3: Add only the first structure (check with 'mlp help convert-cfg' that your version supports --first)
    #mlp convert-cfg --first --input-format=vasp-outcar "$outcar" "$outDir/tmp.cfg"

    cat "$outDir/tmp.cfg" >> "$outDir/CoMn3O4_alloy.cfg"
done

However, if you wish to modify the generation of the cfg file in any way, a customisable Jupyter notebook for this task can be found here.
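After the loop finishes, a quick sanity check is to count the configurations in the combined file: in the MLIP cfg format each configuration starts with a BEGIN_CFG line (and ends with END_CFG). count_cfgs is a hypothetical helper name.

```shell
#!/bin/bash
# Count configurations in an MLIP .cfg file: each one opens with a BEGIN_CFG line.
# Usage: count_cfgs CoMn3O4_alloy.cfg
count_cfgs() {
    grep -c '^[[:space:]]*BEGIN_CFG' "$1"
}
```

With Option 2 (--last), the count should normally equal the number of OUTCARs matched by the loop; a smaller count suggests that some OUTCARs were not converted.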


2. Calculate energies, forces, stresses (EFS) of a test set

Caution! If you want the order of structures in your output to match that of your input .cfg file, you should not run in parallel (the submission script below runs in serial).

Input files:

  • Trained potential from step 1 (usually named Trained.mtp_)
  • .cfg file for the structures in the test set
  • submission script
  • mlip.ini

Outputs:

  • cfg file (with the name and output directory specified in mlip.ini; 'out/testSet_EFSbyMTP.cfg' in the e.g. below)

A small bash script to extract only the energies from this cfg file:

#!/bin/bash

# Specify cfg file to extract from
inp='testSet_EFSbyMTP.cfg'
# Specify name for energy output file
out='orderedMTP.dat'

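# grep prints each "Energy" line, the value below it and a "--" separator;
# the two sed filters (GNU sed 'first~step' syntax) then drop the "Energy" and "--" lines
grep -A1 Energy "$inp" | sed -n '1~3!p' | sed -n '0~2!p' > "$out"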
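The grep/sed pipeline relies on GNU sed ('first~step' addresses are not available in BSD/macOS sed) and on every 'Energy' match being followed by exactly one value line and a '--' separator. A sketch of a single-awk alternative, assuming the usual cfg layout in which the energy value sits on the line directly below a line containing only 'Energy'; extract_energies is a hypothetical name.

```shell
#!/bin/bash
# Print one energy per configuration, in file order: the first field of the
# line that follows each bare "Energy" header in an MLIP .cfg file.
# Usage: extract_energies testSet_EFSbyMTP.cfg > orderedMTP.dat
extract_energies() {
    awk '/^[ \t]*Energy[ \t]*$/ {
        if ((getline nextline) > 0) { split(nextline, f); print f[1] }
    }' "$1"
}
```

For a standard cfg file, 'extract_energies testSet_EFSbyMTP.cfg > orderedMTP.dat' should give the same one-value-per-line file as the script above.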

Example mlip.ini for this task:

mlip						mtpr			# <string>	MLIP type: "MTP" or "void"
	mlip:load-from			Trained.mtp_	# <string> 	Filename with MTP. If not specified driver operates directly with Ab-Initio model (without additional routines)
calculate-efs			TRUE			# <bool> 	Enables/disables EFS calculation by MTP (disabled learning may be useful in pure learning/selection regime for best performance)
write-cfgs				out/testSet_EFSbyMTP.cfg	# <string>	File for writing all processed configurations. No configuration recording if not specified
log				stdout			# <string>	Where to write MLIP log. No logging if not specified; "stdout" and "stderr" correspond to the standard output streams; otherwise the log will be written to a file with that name

Example submission script for Irene for this task:

#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 43200
#MSUB -A gen10765
#MSUB -r C12oh2tdVolCon

#MSUB -m work

module purge
module load lammps tbb
module load intel/17.0.6.256 mpi/openmpi/2.0.4 tbb


mkdir -p ./out
# Body:
# This calculates energies, forces and stresses (EFS) with Trained.mtp_ (set in mlip.ini)
# for all configurations in test_C12oh2td_115.cfg, running in serial (no ccc_mprun) so the
# output order matches the input, and saves the configurations with this data to out/testSet_EFSbyMTP.cfg
mlp run mlip.ini --filename=test_C12oh2td_115.cfg --log=stdout
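Because this run is serial, the configurations in out/testSet_EFSbyMTP.cfg come out in the same order as in the input cfg, so the MTP energies can be paired one by one with the reference (DFT) energies stored in test_C12oh2td_115.cfg. The sketch below does that pairing in awk and prints the RMSE of the total energy per configuration, in the units of the cfg file. energy_rmse is a hypothetical helper, and its result is not necessarily the same error measure that mlp calc-errors reports in step 3, which may normalise differently.

```shell
#!/bin/bash
# Compare per-configuration energies in two .cfg files that list the same
# structures in the same order (reference first, MTP prediction second).
# Usage: energy_rmse test_C12oh2td_115.cfg out/testSet_EFSbyMTP.cfg
energy_rmse() {
    awk '
        # The value follows each bare "Energy" header; FNR == NR only in the first file.
        /^[ \t]*Energy[ \t]*$/ {
            if ((getline nextline) > 0) {
                split(nextline, f)
                if (FNR == NR) ref[++n] = f[1]; else pred[++m] = f[1]
            }
        }
        END {
            if (n == 0 || n != m) {
                printf "energy count mismatch: %d vs %d\n", n, m > "/dev/stderr"
                exit 1
            }
            for (i = 1; i <= n; i++) { d = pred[i] - ref[i]; s += d * d }
            printf "%d configurations, energy RMSE = %.6g\n", n, sqrt(s / n)
        }' "$1" "$2"
}
```

The function exits with an error if the two files contain different numbers of energies, which usually means the structures cannot be paired by order.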


3. Check errors (RMSE, etc.) for predictions on a test set

Input files:

  • Trained potential from step 1 (usually named Trained.mtp_)
  • .cfg file for the structures in the test set
  • submission script
  • mlip.ini

Outputs:

  • output file that the submission script redirects to ('out' in the e.g. below)

Example mlip.ini for this task:

abinitio    void # Defines Ab-initio model, if void EFS data should be provided (Used if driver provides EFS data with configurations. No additional EFS calculation is performed)
mlip		mtp			# <string>	MLIP type: "MTP" or "void"
	mlip:load-from          Trained.mtp_		# <string> 	Filename with MTP. If not specified driver operates directly with Ab-Initio model (without additional routines)
    mlip:fit				FALSE			# <bool> 	Enables/disables MTP learning
    mlip:check_errors       errors.log
    mlip:write_cfgs         record.cfgs
driver						1			# <0-2> 	Defines the configuration driver. Makes no sense if external driver is attached
	# 1 - read configurations from database file
	driver:cfg-reader:filename		test_C12oh2td_115.cfg		# <string>	Configuration file name 
	driver:cfg-reader:log			stdout			# <string> 	Where to write reading log. 

Example submission script for Irene for this task:

#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 43200
#MSUB -A gen10765
#MSUB -r ConlyCheckErrors

#MSUB -m work

module purge
module load lammps tbb
module load intel/17.0.6.256 mpi/openmpi/2.0.4 tbb

# mlp calc-errors pot.mtp db.cfg:
# calculates errors of "pot.mtp" on the database "db.cfg"

ccc_mprun -n 48 mlp calc-errors Trained.mtp_ test_C12oh2td_115.cfg > out

4. Active selection of structures from a data set (to calculate and add to a training set)

This is used with a potential already trained on some data to determine whether structures from a new data set would require the potential to extrapolate, and hence should be calculated with DFT and added to the training set. The selection 'threshold' can be tuned in the mlip.ini file; it is commonly set between 2.0 and 10.0 and must always be > 1.0.

Note: it is the structures in the 'diff.cfg' produced by this procedure that should be calculated with DFT and added to the training set.

Input files:

  • Trained potential from step 1 (usually named Trained.mtp_)
  • .cfg file for the structures in the data set
  • submission script
  • mlip.ini

Outputs:

  • cfg file containing all structures selected from the data set ('diff.cfg' from mlp select-add in the e.g. below; 'out/selected.cfg' with the older mlp run route configured in mlip.ini)

Example mlip.ini for this task:

Note: it is important to select 'mtpr' not 'mtp' for mlip on the top line to use the more recent version of the software (for multicomponent systems).

mlip                        mtpr            # <string>  MLIP type: "MTP" or "void"
    mlip:load-from              Trained.mtp_        # <string>  Filename with MTP. If not specified driver operates directly with Ab-Initio model (without additional routines)
select              TRUE            # <bool>    Activates/deactivates selection (active learning engine)
    select:site-en-weight   0.0         # <double>  Weight for site energy equations in selection procedure
    select:energy-weight    1.0         # <double>  Weight for energy equation in selection procedure
    select:force-weight 0.0         # <double>  Weight for forces equations in selection procedure
    select:stress-weight    0.0         # <double>  Weight for stresses equations in selection procedure
    select:threshold        1.0001        # <double>  Selection threshold - maximum allowed extrapolation level
    select:save-selected    out/selected.cfg    # <string>  Selected configurations will be saved in this file after selection is complete. No configuration saving if not specified
    select:efs-ignored      TRUE                    # <bool>    Indicates that driver actually does not need EFS to be calculated (e.g. in fitting scenario). "TRUE" value may speed up processing by skipping some extra EFS calculations 
    select:log          stdout          # <string>  Where to write selection log. No logging if not specified; "stdout" and "stderr" correspond to the standard output streams; otherwise the log will be written to a file with that name

Example submission script for Irene for this task:

#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 86000
#MSUB -A gen10765
#MSUB -r activeTest

#MSUB -m work

module purge
module load lammps tbb
module load intel/17.0.6.256 mpi/openmpi/2.0.4 tbb


# Preamble, common for all examples
TMP_DIR=./out
mkdir -p $TMP_DIR


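# !! OUTDATED !! Use the select-add command below instead
# Body:
# This reads test_C12oh2td_115.cfg, selects configurations with active learning (by energy equation),
# and saves the selected configurations to out/selected.cfg
#mlp run mlip.ini --filename=test_C12oh2td_115.cfg --log=stdout

# mlp select-add pot.mtp train.cfg new.cfg diff.cfg:
# selects the configurations from new.cfg that should be added to train.cfg and writes them to diff.cfg
mlp select-add Trained.mtp_ train_all_820.cfg test_C12oh2td_115.cfg diff.cfg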
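Once the structures in diff.cfg have been calculated with DFT and converted back to cfg format (e.g. with the mlp convert-cfg loop from step 1), they are appended to the existing training set, which must still contain the previous structures (see step 1). Below is a sketch of that merge with a configuration-count check; grow_training_set and the file names in the usage line are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: append newly labelled configurations to an existing
# training set and report/check the configuration counts.
# Usage: grow_training_set OLD_TRAIN.cfg NEW_LABELLED.cfg OUT.cfg
grow_training_set() {
    local old=$1 new=$2 out=$3
    local n_old n_new n_out
    # Refuse to overwrite an input (the redirection would truncate it before it is read).
    if [ "$out" = "$old" ] || [ "$out" = "$new" ]; then
        echo "grow_training_set: output must differ from both inputs" >&2
        return 1
    fi
    # The echo inserts a newline in case OLD lacks a trailing one.
    { cat "$old"; echo; cat "$new"; } > "$out"
    # Each configuration in a cfg file starts with a BEGIN_CFG line.
    n_old=$(grep -c 'BEGIN_CFG' "$old")
    n_new=$(grep -c 'BEGIN_CFG' "$new")
    n_out=$(grep -c 'BEGIN_CFG' "$out")
    echo "$n_old + $n_new = $n_out configurations written to $out"
    [ "$n_out" -eq $((n_old + n_new)) ]
}
```

For example, 'grow_training_set train_all_820.cfg diff_labelled.cfg train_all_new.cfg', followed by retraining as in step 1, e.g. starting from the pre-trained potential: 'ccc_mprun -n 48 mlp train Trained.mtp_ train_all_new.cfg > training.out'.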
