VSCode & Cheminformatics - BNNLab/BN_Group_Wiki GitHub Wiki
1. Download and install Anaconda via AppsAnywhere
- Ensure you have a VPN connection via Pulse Secure if you are not using a university computer.
- Navigate to AppsAnywhere.
- Search for Anaconda Python and press Launch.
- Once launched, install with the default settings.
- You do not have to install Jupyter or PyCharm as part of the installation, and you do not need to open the program after it has been installed.
2. VS Code Setup Guide for Python & Jupyter
Step 1. Install Python & Jupyter Extensions
- Open VS Code.
- Go to Extensions (Ctrl+Shift+X).
- Search for and install:
  - Python (Microsoft)
  - Jupyter (Microsoft)
Step 2. Set Up Anaconda as the Default Python Interpreter
- Open VS Code and press Ctrl+Shift+P to open the Command Palette.
- Type "Python: Select Interpreter" and select it.
- Choose the Anaconda environment (it should have conda in the path).
If the Anaconda interpreter is missing:
- Open a terminal (Ctrl+`) and run:
conda init
- Restart VS Code and repeat the above steps.
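To confirm which interpreter VS Code has actually selected, a short check can be run in the integrated terminal or in a notebook cell. This is a minimal sketch using only the standard library. The helper name `describe_interpreter` is hypothetical, and the presence of a `conda-meta` folder is used here as a heuristic for a conda-managed environment rather than an official test.

```python
import os
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter that is running."""
    prefix = sys.prefix
    return {
        "executable": sys.executable,
        "prefix": prefix,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Heuristic (assumption): conda environments contain a conda-meta folder.
        "looks_like_conda": os.path.isdir(os.path.join(prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the executable path does not point into the Anaconda installation, repeat the interpreter selection steps.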
Step 3. Install & Configure flake8 (Linting) and black (Formatting)
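Install flake8 and black
Run the following command inside your Conda environment:
conda install -c conda-forge black flake8
Enable Linting with flake8
- Open VS Code Settings (Ctrl+,).
- Search for "python.linting.enabled" and enable it.
- Search for "python.linting.flake8Enabled" and enable it.
- If these settings do not appear (recent versions of the Python extension have moved linting into separate extensions), install the Flake8 extension (Microsoft) from the Extensions view instead.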
Alternatively, add the following to your settings.json:
{
"python.linting.enabled": true,
"python.linting.flake8Enabled": true
}
Enable Auto-Formatting with black
- Go to Settings (Ctrl+,).
- Search for "Editor: Default Formatter" and select Black Formatter. If it is not listed, install the Black Formatter extension (Microsoft) from the Extensions view first.
- Enable format-on-save by searching for "Editor: Format On Save" and checking the box.
Alternatively, add this to settings.json:
{
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.formatOnSave": true
}
Step 4. Verify the Setup
- Open a Python file in VS Code.
- Press Ctrl+S to save the file and check that black auto-formats your code.
- Introduce a small mistake in your code and check if flake8 highlights it.
- Open a Jupyter Notebook (.ipynb) and ensure it runs correctly.
Your VS Code is now set up for Python development with Anaconda, Jupyter, flake8, and black.
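A small throwaway file makes the verification step easier to carry out. The sketch below is deliberately untidy: the file name and function are hypothetical, the unused import is there for flake8 to flag (pyflakes code F401), and the cramped spacing is there for black to rewrite on save.

```python
# check_setup.py (hypothetical file name) - deliberately untidy test file
import os  # never used below, so flake8 should underline it (F401)


def add(a,b):  # on save, black should change this to "def add(a, b):"
    return a+b  # ...and this to "return a + b"


print(add(1, 2))
```

If saving the file leaves the spacing unchanged, or the unused import is not underlined, revisit the formatter and linter settings.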
3. RDKit
The best way to use RDKit is through Anaconda. You can create a conda environment with the RDKit Python package installed in a few commands. Open a terminal or command prompt (on Windows, press the Windows key, type "cmd", then press Enter) and ensure that Anaconda is activated by typing:
conda activate
If Conda is installed properly, you should see the text (base) before the directory path. Working in the base environment is not recommended; create a new environment instead. To do this, use the create subcommand followed by the --name (or -n) flag and the environment name, e.g. "MChem_env". If you need a specific Python version (lower than the latest available), add it as a package specification such as python=3.8 (there is no leading --). If packages come from particular channels (repositories), such as conda-forge for RDKit, specify them with the -c flag.
The full command to create a conda environment named MChem_env with the latest Python version and RDKit installed is:
conda create -n MChem_env -c conda-forge rdkit
To activate the newly created environment, use:
conda activate MChem_env
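From inside Python, the active environment can be double-checked before importing anything. The sketch below assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` environment variable; the helper name `active_conda_env` is hypothetical.

```python
import os


def active_conda_env():
    """Return the name of the active conda environment, or None if unset."""
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    env = active_conda_env()
    if env is None:
        print("No conda environment appears to be active.")
    elif env == "base":
        print("The base environment is active; switch to a project environment.")
    else:
        print(f"Active conda environment: {env}")
```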
Once inside the environment, you can verify that RDKit is installed by opening Python and running:
from rdkit import Chem
print(Chem.MolFromSmiles("CCO"))
If RDKit is installed correctly, the print statement should show an RDKit Mol object (something like <rdkit.Chem.rdchem.Mol object at 0x...>) rather than None.
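The same check can be wrapped in a small script that also reports the installed version. This is a sketch rather than an official RDKit test: `rdkit_status` is a hypothetical helper, and it returns None when RDKit cannot be imported instead of raising an error.

```python
import importlib.util


def rdkit_status(smiles="CCO"):
    """Return (version, canonical SMILES), or None if RDKit is not installed."""
    if importlib.util.find_spec("rdkit") is None:
        return None
    import rdkit
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)  # returns None for an invalid SMILES
    if mol is None:
        raise ValueError(f"RDKit could not parse {smiles!r}")
    return rdkit.__version__, Chem.MolToSmiles(mol)


if __name__ == "__main__":
    status = rdkit_status()
    if status is None:
        print("RDKit is not importable in this environment.")
    else:
        print("RDKit version:", status[0])
        print("Canonical SMILES:", status[1])
```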
For more information on RDKit and its functionalities, visit the official documentation, with a more in-depth guide on set-up here.
The RDKit blog is also a useful source of information for newer RDKit functions such as fingerprint generators.
1. SMILES code
SMILES (Simplified Molecular Input Line Entry System) is a chemical notation that represents the structure of molecules and complexes as a line of text readable by a wide range of software. Generic SMILES are simple representations containing information on atoms and bonds; isomeric SMILES additionally carry isotopic and chiral specifications.
Each non-hydrogen atom is written as its atomic symbol, and single, double, triple, and aromatic bonds are represented by the symbols -, =, #, and :, respectively (single and aromatic bonds are usually left implicit). For more information on the structure of SMILES code see:
https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
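A few familiar molecules illustrate the notation. The sketch below uses only the standard library; `count_explicit_bonds` is a hypothetical helper that tallies the bond symbols written out in a SMILES string, skipping bracketed atoms such as [O-] where - denotes a charge. It illustrates the symbols and is not a SMILES parser.

```python
from typing import Dict

# Well-known SMILES strings; lowercase letters mark aromatic atoms.
EXAMPLES = {
    "ethanol": "CCO",
    "ethene": "C=C",
    "ethyne": "C#C",
    "acetic acid": "CC(=O)O",
    "acetate": "CC(=O)[O-]",
    "benzene": "c1ccccc1",
}


def count_explicit_bonds(smiles: str) -> Dict[str, int]:
    """Count bond symbols that are written explicitly in a SMILES string."""
    counts = {"-": 0, "=": 0, "#": 0, ":": 0}
    in_bracket = False
    for char in smiles:
        if char == "[":
            in_bracket = True
        elif char == "]":
            in_bracket = False
        elif not in_bracket and char in counts:
            counts[char] += 1
    return counts


if __name__ == "__main__":
    for name, smiles in EXAMPLES.items():
        print(f"{name:12s} {smiles:12s} {count_explicit_bonds(smiles)}")
```

Ethanol and benzene report no explicit bond symbols because single and aromatic bonds are normally left implicit.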
SMILES can be generated from chemical names or other structure representations using several Python packages available through Anaconda, including RDKit, Chemical Identifier Resolver (via CIRpy), PubChemPy and Pybel (OpenBabel).
SMILES code can then be converted to a wide range of other representations, including XYZ coordinates, 2D images, and InChIKeys. See:
https://www.rdkit.org/docs/GettingStartedInPython.html
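As an illustration of these conversions, the sketch below turns one SMILES string into a canonical SMILES, a molecular formula, and an XYZ block using RDKit. The function name `smiles_to_outputs` and the fixed random seed are choices made here, not part of RDKit. The 3D coordinates come from a quick embedding without further geometry optimisation, so they should be treated as a starting geometry only.

```python
try:
    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem.rdMolDescriptors import CalcMolFormula
except ImportError:  # RDKit is not installed in the active environment
    Chem = None


def smiles_to_outputs(smiles):
    """Convert a SMILES string to a few common representations."""
    if Chem is None:
        raise RuntimeError("RDKit is required: activate the RDKit environment")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles!r}")

    # 3D coordinates need explicit hydrogens and an embedded conformer.
    mol_h = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol_h, randomSeed=42) < 0:
        raise RuntimeError("3D embedding failed")

    return {
        "canonical_smiles": Chem.MolToSmiles(mol),
        "formula": CalcMolFormula(mol),
        "xyz": Chem.MolToXYZBlock(mol_h),
    }


if __name__ == "__main__":
    if Chem is not None:
        print(smiles_to_outputs("CCO")["xyz"])
```

A 2D image can be written in a similar way with `rdkit.Chem.Draw.MolToFile(mol, "ethanol.png")`, where the output file name is a placeholder.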
2. InChIKey
InChIKeys are fixed-length string representations of molecules, derived by hashing the InChI identifier. In practice each molecule produces a unique key, which makes InChIKeys useful for identifying unique molecules or indexing databases.
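The sketch below shows the indexing idea: two different SMILES spellings of ethanol should give the same key. It assumes an RDKit build with InChI support (`Chem.MolToInchiKey`). The helper names and the regular expression for the standard 27-character key layout are illustrative choices.

```python
import re

try:
    from rdkit import Chem
except ImportError:  # RDKit is not installed in the active environment
    Chem = None

# Standard key layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the layout of a standard InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


def inchikey_from_smiles(smiles):
    """Return the InChIKey for a SMILES string (requires RDKit with InChI)."""
    if Chem is None:
        raise RuntimeError("RDKit is required: activate the RDKit environment")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    return Chem.MolToInchiKey(mol)


if __name__ == "__main__":
    if Chem is not None:
        # "CCO" and "OCC" both describe ethanol, so the keys should match.
        print(inchikey_from_smiles("CCO") == inchikey_from_smiles("OCC"))
```

Because the key does not depend on how the SMILES was written, it can serve as a dictionary key or database index when removing duplicate molecules.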
3. CCDC ConQuest
Extracting metal complexes:
- Perform a CSD search as usual.
- Make sure the structures you want to export are ticked.
- Go to File > Export data as...
- Select the desired molecular format (e.g. SMILES or mol2).
- Make sure both the "Keep heaviest component" box and the "one molecule per file" box are ticked.
- Save the data.
This exports the largest component only, which should be the metal complex, excluding solvents and counter ions.
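Once exported, the one-molecule-per-file output can be gathered into a single dictionary for further processing. The sketch below uses only the standard library and makes several assumptions that should be checked against the actual export: that the files carry a `.smi` extension, that each file holds the SMILES string as the first whitespace-separated token on its first non-empty line, and that the file stem is a usable identifier. The function name `load_exported_smiles` and the folder name are hypothetical.

```python
from pathlib import Path


def load_exported_smiles(folder, pattern="*.smi"):
    """Map each file stem to the first token found in that file."""
    records = {}
    for path in sorted(Path(folder).glob(pattern)):
        for line in path.read_text().splitlines():
            tokens = line.split()
            if tokens:  # skip blank lines
                records[path.stem] = tokens[0]
                break
    return records


if __name__ == "__main__":
    # "csd_export" is a placeholder for the folder chosen in ConQuest.
    for identifier, smiles in load_exported_smiles("csd_export").items():
        print(identifier, smiles)
```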