MTP usage examples - skw32/skw32.github.io GitHub Wiki
UPDATE: There has been a new MLIP release since these instructions were written. When using the instructions below, check the updated manual here and see the latest MLIP paper here.
See the 'Compilation recipes' page for instructions on installing mlp on the Irene and summer clusters. Once installed, it is useful to add mlp to your path by putting the following line in your .bashrc:
export PATH=path_to_mlp_bin:$PATH
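If your .bashrc is sourced more than once (e.g. in nested shells or job scripts), the export line above prepends the same directory each time. Below is a minimal sketch of a guard that only adds the directory if it is not already on PATH. The function name add_to_path is hypothetical, and path_to_mlp_bin is still a placeholder for your own install location.

```bash
#!/bin/bash
# Hypothetical helper for ~/.bashrc: prepend a directory to PATH only once.
add_to_path() {
    local dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;              # already on PATH: do nothing
        *) PATH="$dir:$PATH" ;;     # otherwise prepend it
    esac
    export PATH
}

# Replace path_to_mlp_bin with the bin directory of your mlp installation
add_to_path path_to_mlp_bin
```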
Documentation for mlp can be obtained using:
mlp help
mlp list --> to see a list of possible commands
mlp help [command] --> shows options for each command and states the default values
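Because `mlp help [command]` states the default values, it can be handy to keep a copy of the help text for the commands used on this page next to your runs, so you have a record of the defaults for the mlp version you used. This is a minimal sketch; the function name save_mlp_help, the default output directory and the command list are assumptions.

```bash
#!/bin/bash
# Hypothetical helper: save 'mlp help <command>' for each command used on
# this page into <outdir>/<command>.txt (outdir defaults to ./mlp_help).
save_mlp_help() {
    local outdir="${1:-mlp_help}"
    local cmd
    mkdir -p "$outdir"
    for cmd in train convert-cfg run calc-errors select-add; do
        mlp help "$cmd" > "$outdir/$cmd.txt"
    done
}
```

Usage: `save_mlp_help mlp_help` after mlp is on your PATH.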
Inputs:
- .cfg file for the structures in the training data
- (8,16,20,...)g.mtp: an untrained potential of the chosen basis set size
- or a pre-trained potential you wish to train further with additional data (note that the structures from the previous training set must still be included in the cfg file if you do this)
- submission script
Outputs:
- Trained.mtp_
- training.out, or whichever file output is redirected to in the submission script (contains the RMSE etc. during training)
#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 43200
#MSUB -A gen10765
#MSUB -r C12oh2tdvolCon
#MSUB -m work
module purge
module load lammps tbb
module load intel/17.0.6.256 mpi/openmpi/2.0.4 tbb
# mlp train potential.mtp train_set.cfg [options]:
# trains potential.mtp on the training set from train_set.cfg
ccc_mprun -n 48 mlp train 08g.mtp train_all_670.cfg > training.out
# Options during training should be added as args to line above, e.g.:
# ccc_mprun -n 48 mlp train 08g.mtp train_all_670.cfg --energy-weight=0.9 > training.out
# For the full list of options and to see the default values, use 'mlp help train'
Select an untrained potential to start with from those distributed with the MTP package, e.g. 16g.mtp, 20g.mtp or 24g.mtp. A higher number implies a larger basis set; the best size should be found by testing which potential gives the best predictive performance. Potentials with larger basis sets are slower to train but can improve performance. Note that for small data sets, large basis sets may also result in overfitting.
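The basis-set comparison recommended above can be scripted. The following is a minimal sketch, assuming the untrained .mtp files and the training .cfg are readable from the current directory and that mlp train writes Trained.mtp_ into the directory it is run from. The function name scan_basis_sets and the scan_* directory names are hypothetical. On Irene the mlp call would be prefixed with ccc_mprun -n 48, as in the script above.

```bash
#!/bin/bash
# Hypothetical helper: train several untrained potentials (basis-set sizes)
# on the same training set, each in its own directory, so that the
# Trained.mtp_ and training.out files can be compared afterwards.
scan_basis_sets() {
    local train_cfg="$1"
    shift
    local pot name dir
    for pot in "$@"; do
        name=$(basename "$pot" .mtp)            # e.g. 16g.mtp -> 16g
        dir="scan_$name"
        mkdir -p "$dir"
        cp "$pot" "$train_cfg" "$dir"/          # copy inputs into the run directory
        # Subshell: the cd does not change the caller's working directory.
        # On Irene, prefix mlp with: ccc_mprun -n 48
        ( cd "$dir" && mlp train "$(basename "$pot")" "$(basename "$train_cfg")" > training.out )
    done
}
```

For example, `scan_basis_sets train_all_670.cfg 16g.mtp 20g.mtp 24g.mtp` trains three potentials one after another. With large basis sets this may need more wall time than a single job allows, in which case submitting one job per potential is the safer choice.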
The .cfg file can be produced automatically from VASP OUTCAR files using the convert-cfg command of mlp. An example submission script for this task, with three options, is:
#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 43200
#MSUB -A gen10765
#MSUB -r mlp_prep
#MSUB -m work
module purge
module load lammps tbb
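# Directory for the output cfg files (set to your own path)
outDir=path_to_output_dir
mkdir -p $outDir
# Directories storing all OUTCARs of structures you wish to add to your cfg file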
for dir in /ccc/work/cont003/gen10765/wallaces/Projects/Co_xMn_{3-x}O_4/cleanedSets_ABC/labelled/*/con*/OUTCAR
do
# Option 1: Add all structures during relaxation
mlp convert-cfg --input-format=vasp-outcar $dir $outDir/tmp.cfg
# Option 2: Add only last structure
#mlp convert-cfg --last --input-format=vasp-outcar $dir $outDir/tmp.cfg
# Option 3: Add only first structure (check with 'mlp help convert-cfg' that your mlp version supports --first)
#mlp convert-cfg --first --input-format=vasp-outcar $dir $outDir/tmp.cfg
cat $outDir/tmp.cfg >> $outDir/CoMn3O4_alloy.cfg
done
However, if you wish to modify the generation of the cfg file in any way, a customisable Jupyter notebook for this task can be found here.
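Before training, it can be worth checking how many structures ended up in the combined file, e.g. to confirm that option 1, 2 or 3 behaved as expected. Each configuration in the .cfg format is enclosed between BEGIN_CFG and END_CFG, so a minimal check is to count the BEGIN_CFG markers. The function name count_cfgs is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print "<number of configurations> <file>" for each
# .cfg file given, by counting BEGIN_CFG lines.
count_cfgs() {
    local f
    for f in "$@"; do
        printf '%s %s\n' "$(grep -c 'BEGIN_CFG' "$f")" "$f"
    done
}
```

Usage: `count_cfgs "$outDir/CoMn3O4_alloy.cfg"`.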
Caution! If you want the order of structures in your output to match that of your input .cfg file, do not run in parallel (the submission script below runs in serial).
Inputs:
- Trained potential from step 1 (usually named Trained.mtp_)
- .cfg file for the structures in the test set
- submission script
- mlip.ini
Output:
- cfg file (with the name and output directory specified in mlip.ini, 'out/testSet_EFSbyMTP.cfg' in the example below)
A small bash script to extract just the energies from this cfg file:
#!/bin/bash
# Specify cfg file to extract from
inp='testSet_EFSbyMTP.cfg'
# Specify name for energy output file
out='orderedMTP.dat'
grep -A1 Energy $inp | sed -n '1~3!p' | sed -n '0~2!p' > $out
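The grep/sed pipeline relies on the energy value sitting on the line after each Energy keyword. The same extraction can be written with awk, and pairing it with the energies in the reference (DFT) test-set file gives a quick comparison. This is a sketch that assumes both files contain the same configurations in the same order (see the caution above about running in serial); the function names are hypothetical, and it compares total energies, not energies per atom.

```bash
#!/bin/bash
# Print the value on the line following each "Energy" keyword of a .cfg file.
extract_energies() {
    # getline reads the next line; its first field is the energy value
    awk '$1 == "Energy" { getline; print $1 }' "$1"
}

# Compare energies of two .cfg files with identical configuration order.
# Prints: <number of configurations> <MAE> <RMSE> of (E_mtp - E_ref).
compare_energies() {
    local ref="$1" mtp="$2" tmp_ref tmp_mtp
    tmp_ref=$(mktemp)
    tmp_mtp=$(mktemp)
    extract_energies "$ref" > "$tmp_ref"
    extract_energies "$mtp" > "$tmp_mtp"
    paste "$tmp_ref" "$tmp_mtp" | awk '
        { d = $2 - $1; sum_abs += (d < 0 ? -d : d); sum_sq += d * d; n++ }
        END { if (n > 0) printf "%d %.6f %.6f\n", n, sum_abs / n, sqrt(sum_sq / n) }'
    rm -f "$tmp_ref" "$tmp_mtp"
}
```

Usage: `compare_energies test_C12oh2td_115.cfg out/testSet_EFSbyMTP.cfg`.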
mlip mtpr # <string> MLIP type: "MTP" or "void"
mlip:load-from Trained.mtp_ # <string> Filename with MTP. If not specified driver operates directly with Ab-Initio model (without additional routines)
calculate-efs TRUE # <bool> Enables/disables EFS calculation by MTP (disabled learning may be useful in pure learning/selection regime for best performance)
write-cfgs out/testSet_EFSbyMTP.cfg # <string> File for writing all processed configurations. No configuration recording if not specified
log stdout # <string> Where to write MLIP log. No logging if not specified; "stdout" and "stderr" correspond to the standard output streams; otherwise log will be output to file with that name
#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 43200
#MSUB -A gen10765
#MSUB -r C12oh2tdVolCon
#MSUB -m work
module purge
module load lammps tbb
module load intel/17.0.6.256 mpi/openmpi/2.0.4 tbb
mkdir -p ./out
# Body:
# This calculates energy, forces and stresses (EFS) with Trained.mtp_ (set in mlip.ini)
# for all configurations in test_C12oh2td_115.cfg,
# and saves configurations with this data to out/testSet_EFSbyMTP.cfg
mlp run mlip.ini --filename=test_C12oh2td_115.cfg --log=stdout
Inputs:
- Trained potential from step 1 (usually named Trained.mtp_)
- .cfg file for the structures in the test set
- submission script
- mlip.ini
Output:
- the file that output is redirected to in the submission script ('out' in the example below)
abinitio void # Defines Ab-initio model, if void EFS data should be provided (Used if driver provides EFS data with configurations. No additional EFS calculation is performed)
mlip mtp # <string> MLIP type: "MTP" or "void"
mlip:load-from Trained.mtp_ # <string> Filename with MTP. If not specified driver operates directly with Ab-Initio model (without additional routines)
mlip:fit FALSE # <bool> Enables/disables MTP learning
mlip:check_errors errors.log
mlip:write_cfgs record.cfgs
driver 1 # <0-2> Defines the configuration driver. Makes no sense if external driver is attached
# 1 - read configurations from database file
driver:cfg-reader:filename test_C12oh2td_115.cfg # <string> Configuration file name
driver:cfg-reader:log stdout # <string> Where to write reading log.
#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 43200
#MSUB -A gen10765
#MSUB -r ConlyCheckErrors
#MSUB -m work
module purge
module load lammps tbb
module load intel/17.0.6.256 mpi/openmpi/2.0.4 tbb
# mlp calc-errors pot.mtp db.cfg:
# calculates errors of "pot.mtp" on the database "db.cfg"
ccc_mprun -n 48 mlp calc-errors Trained.mtp_ test_C12oh2td_115.cfg > out
This is used with a potential already trained on some data to determine whether structures from a new data set would require the potential to extrapolate, and hence should be calculated with DFT and added to the training set. The 'threshold' can be tuned in the mlip.ini file and is commonly set between 2.0 and 10.0 (it must always be > 1.0).
Note: it is the structures in the 'diff.cfg' produced by this procedure that should be added to the training set.
Inputs:
- Trained potential from step 1 (usually named Trained.mtp_)
- .cfg file for the structures in the data set
- submission script
- mlip.ini
Output:
- cfg file ('out/selected.cfg' in the example below) containing all structures selected from the data set
Note: it is important to select 'mtpr' rather than 'mtp' for mlip on the top line of mlip.ini, to use the more recent version of the software (for multicomponent systems).
mlip mtpr # <string> MLIP type: "MTP" or "void"
mlip:load-from Trained.mtp_ # <string> Filename with MTP. If not specified driver operates directly with Ab-Initio model (without additional routines)
select TRUE # <bool> Activates/deactivates selection (active learning engine)
select:site-en-weight 0.0 # <double> Weight for site energy equations in selection procedure
select:energy-weight 1.0 # <double> Weight for energy equation in selection procedure
select:force-weight 0.0 # <double> Weight for forces equations in selection procedure
select:stress-weight 0.0 # <double> Weight for stresses equations in selection procedure
select:threshold 1.0001 # <double> Selection threshold - maximum allowed extrapolation level
select:save-selected out/selected.cfg # <string> Selected configurations will be saved in this file after selection is complete. No configuration saving if not specified
select:efs-ignored TRUE # <bool> Indicates that driver actually does not need EFS to be calculated (e.g. in fitting scenario). "TRUE" value may speed up processing by skipping some extra EFS calculations
select:log stdout # <string> Where to write selection log. No logging if not specified; "stdout" and "stderr" correspond to the standard output streams; otherwise log will be output to file with that name
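Since the selection threshold must always be greater than 1.0, a small helper that writes this mlip.ini with a chosen threshold and refuses invalid values can prevent silent mistakes when trying several thresholds. This is a sketch that reuses the keys of the file above (comments dropped); the function name write_select_ini and its argument order are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: write a selection mlip.ini with the given threshold.
# Arguments: threshold [potential file] [output ini file]
write_select_ini() {
    local threshold="$1" pot="${2:-Trained.mtp_}" out="${3:-mlip.ini}"
    # awk does the floating-point comparison, which bash cannot do natively
    if ! awk -v t="$threshold" 'BEGIN { exit !(t + 0 > 1.0) }'; then
        echo "error: threshold must be > 1.0 (got '$threshold')" >&2
        return 1
    fi
    cat > "$out" <<EOF
mlip mtpr
mlip:load-from $pot
select TRUE
select:site-en-weight 0.0
select:energy-weight 1.0
select:force-weight 0.0
select:stress-weight 0.0
select:threshold $threshold
select:save-selected out/selected.cfg
select:efs-ignored TRUE
select:log stdout
EOF
}
```

Usage: `write_select_ini 2.0` writes mlip.ini for Trained.mtp_ with a threshold of 2.0.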
#!/bin/bash
#MSUB -q skylake
#MSUB -n 48
#MSUB -T 86000
#MSUB -A gen10765
#MSUB -r activeTest
#MSUB -m work
module purge
module load lammps tbb
module load intel/17.0.6.256 mpi/openmpi/2.0.4 tbb
# Preamble, common for all examples
TMP_DIR=./out
mkdir -p $TMP_DIR
# !! OUTDATED!! Use command below instead
# Body:
# This reads test_C12oh2td_115.cfg, selects the configurations with active learning (by energy equation),
# and saves the selected configurations to out/selected.cfg
#mlp run mlip.ini --filename=test_C12oh2td_115.cfg --log=stdout
mlp select-add Trained.mtp_ train_all_820.cfg test_C12oh2td_115.cfg diff.cfg
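As noted above, the structures in diff.cfg are the ones to add to the training set. Below is a sketch of the bookkeeping for one active-learning iteration. It assumes diff.cfg already carries DFT energies and forces (true here, since the new data set is DFT-labelled; otherwise they must be calculated first, e.g. via the convert-cfg loop earlier on this page). It keeps all previous structures, as required when training a pre-trained potential further. The function name, the training_iter.out log and the .prev backup name are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: append diff.cfg to the training set and retrain,
# starting from the current potential.
# Arguments: old_train.cfg diff.cfg new_train.cfg [potential]
extend_and_retrain() {
    local old_train="$1" diff_cfg="$2" new_train="$3" pot="${4:-Trained.mtp_}"
    local n_old n_new
    n_old=$(grep -c 'BEGIN_CFG' "$old_train")
    n_new=$(grep -c 'BEGIN_CFG' "$diff_cfg")
    if [ "$n_new" -eq 0 ]; then
        echo "No structures selected; training set unchanged."
        return 0
    fi
    # Keep all previous structures and add the selected ones
    cat "$old_train" "$diff_cfg" > "$new_train"
    echo "Training set: $n_old + $n_new = $((n_old + n_new)) configurations"
    # Back up the current potential, since training writes Trained.mtp_
    cp "$pot" "$pot.prev"
    # On Irene, prefix with: ccc_mprun -n 48
    mlp train "$pot" "$new_train" > training_iter.out
}
```

Usage: `extend_and_retrain train_all_820.cfg diff.cfg train_new.cfg Trained.mtp_`.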