Lesson 7 Molecules - joslynnlee/CHEM-454 GitHub Wiki

How do we get structures?

There are many different file formats. We learned about the Protein Data Bank for larger biomolecules such as DNA and proteins; these structures use the .pdb format.

For small molecules, we learned about the .sdf, .mol and .mol2 formats. When we begin to use molecular modeling tools like AMBER, CHARMM and Maestro, structures can be imported in different ways.

SMILES and InChI (from Molecular Modeling Basics)

Good reference paper with more background: (https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-4-22)

The simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings. The file extension is .smi.

The IUPAC International Chemical Identifier (InChI) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web. This doesn't have a file extension.

Example: Ethane
SMILES: CC
InChI: InChI=1S/C2H6/c1-2/h1-2H3

SMILES is pretty straightforward: ethane is two C atoms singly bonded to each other. Two atoms next to each other imply a single bond, and programs that read SMILES strings are smart enough to know that each C atom needs 3 H atoms to fill the valence shell.

InChI requires a little more explanation: "1S" is version 1 of Standard InChI (more about that later). C2H6 is the molecular formula and tells you that the first two atoms are C atoms. c1-2 means that atoms 1 and 2 are connected. h1-2H3 means that atoms 1 and 2 each have 3 H atoms. Programs that read InChI strings are smart enough to know that this means that the two C atoms are connected by a single bond.
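The layer-by-layer reading above can be sketched in a few lines of Python. This is only an illustration for simple InChI strings like the ones in this lesson: the function name split_inchi_layers is made up for this example, and the sketch assumes a formula is present and that no layer prefix repeats (which is not true for every InChI).

```python
def split_inchi_layers(inchi):
    """Split a simple InChI string into version, formula, and prefixed layers.

    Hypothetical helper for illustration; assumes a formula layer is present
    and that each one-letter layer prefix appears at most once.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: " + inchi)
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            # Remaining layers start with a one-letter prefix such as c or h.
            layers[part[0]] = part[1:]
    return layers


ethane = split_inchi_layers("InChI=1S/C2H6/c1-2/h1-2H3")
print(ethane["version"])  # 1S
print(ethane["formula"])  # C2H6
print(ethane["c"])        # 1-2   (atom 1 is connected to atom 2)
print(ethane["h"])        # 1-2H3 (atoms 1 and 2 each carry 3 H)
```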

Example 2: Ethene and ethyne
SMILES: C=C and C#C
InChI: InChI=1S/C2H4/c1-2/h1-2H2 and InChI=1S/C2H2/c1-2/h1-2H

SMILES: double and triple bonds are indicated by "=" and "#", respectively. InChI: the bond order is not written explicitly; instead the hydrogen layer changes, so "CH2" and "CH" groups are indicated by "H2" and "H", respectively.
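To see how a program can fill in the missing H atoms, here is a toy sketch. It is not how a real SMILES parser works in full: it assumes an unbranched, carbon-only SMILES with no rings, charges or brackets, and the function name implicit_hydrogens is invented for this example. Each carbon gets 4 minus the sum of its bond orders.

```python
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def implicit_hydrogens(smiles):
    """Return the implicit H count for each carbon in a simple SMILES chain.

    Toy example: only handles unbranched chains of C atoms joined by
    single (implied or "-"), double ("=") or triple ("#") bonds.
    """
    used = []     # valence already used by bonds, one entry per carbon
    pending = 1   # order of the bond to the next atom (single by default)
    for ch in smiles:
        if ch == "C":
            if used:
                used[-1] += pending   # bond counts for the previous carbon
                used.append(pending)  # and for the new carbon
            else:
                used.append(0)        # first atom has no bond yet
            pending = 1
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        else:
            raise ValueError("unsupported SMILES character: " + ch)
    return [4 - u for u in used]


print(implicit_hydrogens("CC"))   # [3, 3] ethane
print(implicit_hydrogens("C=C"))  # [2, 2] ethene
print(implicit_hydrogens("C#C"))  # [1, 1] ethyne
```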

Example 3: cis- and trans-2-butene
SMILES: C/C=C\C and C/C=C/C
InChI: InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3- and InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+
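In SMILES, the "/" and "\" marks on either side of the double bond carry the cis/trans information. The sketch below reads them for the simplest case only. It assumes the SMILES has exactly one double bond written in the form A/C=C/B or A/C=C\B (one mark on each side); the function name double_bond_geometry is invented for this example.

```python
def double_bond_geometry(smiles):
    """Classify a simple SMILES with one marked double bond as cis or trans.

    Toy example: expects one directional mark before the "=" and one
    after it. Matching marks mean trans; different marks mean cis.
    """
    left, sep, right = smiles.partition("=")
    if not sep:
        raise ValueError("no double bond found in: " + smiles)
    left_marks = [ch for ch in left if ch in "/\\"]
    right_marks = [ch for ch in right if ch in "/\\"]
    if len(left_marks) != 1 or len(right_marks) != 1:
        raise ValueError("expected one directional mark on each side")
    return "trans" if left_marks[0] == right_marks[0] else "cis"


# Note: a backslash has to be written twice inside a normal Python string.
print(double_bond_geometry("C/C=C\\C"))  # cis
print(double_bond_geometry("C/C=C/C"))   # trans
```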

Direct download from Protein Data Bank

You can go directly to the crystallography data in the Protein Data Bank to get structures. These range from DNA strands and peptides to ligands and proteins.
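If you already know the four-character PDB ID, the file can also be fetched with a short script. This is a minimal sketch: it assumes the RCSB file server address https://files.rcsb.org/download/ and uses entry 1BNA (a DNA dodecamer) only as an example. The function names are made up for this lesson.

```python
import urllib.request


def pdb_download_url(pdb_id):
    """Build the download address for a .pdb file from a 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a 4-character PDB ID, got: " + pdb_id)
    return "https://files.rcsb.org/download/" + pdb_id + ".pdb"


def download_pdb(pdb_id, out_path):
    """Download the .pdb file to out_path (needs an internet connection)."""
    urllib.request.urlretrieve(pdb_download_url(pdb_id), out_path)
    return out_path


print(pdb_download_url("1bna"))
# To actually fetch the file, call for example:
# download_pdb("1bna", "1bna.pdb")
```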

Download Avogadro

Go to the website: https://avogadro.cc/
Click on Download, which will take you to SourceForge.
Run the downloaded installer to install Avogadro on your desktop.

Building a peptide in Avogadro

Go to the Build menu, move down to Insert, and choose Peptide.
A window will pop up where you can select amino acids, the stereochemistry (amino acids in nature are L), and the N- and C-terminal groups. The peptide is built in order from the N terminus to the C terminus.
Build a short tetramer by selecting the following amino acids: Gly-Asp-Arg-Gly
Click on the white finger pointer icon to rotate the molecule. Identify the N-terminal end and C-terminal end in your structure.
Rebuild the short tetramer with the same amino acids, this time with charged N-terminal and C-terminal ends.
These charged ends can change the 3D shape of these types of molecules.
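The N-to-C ordering used when building the peptide can be checked with a short script. This is only a sketch: the function name parse_peptide is made up, and it assumes the sequence is written with standard three-letter codes separated by hyphens, N terminus first.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_peptide(sequence):
    """Read a hyphen-separated peptide written from N terminus to C terminus."""
    residues = [name.strip().capitalize() for name in sequence.split("-")]
    one_letter = "".join(THREE_TO_ONE[name] for name in residues)
    return {
        "n_terminus": residues[0],   # first residue carries the free amine
        "c_terminus": residues[-1],  # last residue carries the free carboxyl
        "one_letter": one_letter,
    }


peptide = parse_peptide("Gly-Asp-Arg-Gly")
print(peptide["one_letter"])  # GDRG
print(peptide["n_terminus"])  # Gly
print(peptide["c_terminus"])  # Gly
```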

Building a DNA or RNA strand in Avogadro

Go to the Build menu, move down to Insert, and choose DNA/RNA. Here you can choose between a single strand or a double strand. The sequence goes from the 5' end with the phosphate to the 3' end with the free hydroxyl.
Type in the sequence: AAGGTT and select double strand.
Looking at the structure, find the 5' and 3' ends. You can add labels to give the name of the molecule.
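When you select double strand, the second strand is the base-paired partner of the sequence you typed, running in the opposite direction. A small sketch (the function names are made up, and it assumes a DNA sequence containing only A, T, G and C) shows what that partner strand looks like for AAGGTT:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(sequence):
    """Return the paired bases in the same left-to-right order (reads 3' to 5')."""
    return "".join(COMPLEMENT[base] for base in sequence.upper())


def reverse_complement(sequence):
    """Return the partner strand written in the usual 5' to 3' direction."""
    return complement_strand(sequence)[::-1]


typed = "AAGGTT"
print("5'-" + typed + "-3'")
print("3'-" + complement_strand(typed) + "-5'")   # 3'-TTCCAA-5'
print("5'-" + reverse_complement(typed) + "-3'")  # 5'-AACCTT-3'
```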