Preprocessing
MolecularGraph.jl version: 0.17.1
This tutorial covers the following preprocessing strategies:
Remove hydrogen vertices
Extract molecules of interest
Standardize charges
Dealing with resonance structure
Customize property updater
Public databases (e.g. PubChem, ChEMBL) and flat file databases (e.g. SDFile) use different formats, so their records cannot always be used for your analysis as is. For example,
whether hydrogens are explicitly written or omitted
whether salt and water molecules are included in the molecular graph or provided as metadata
representation of resonance structure (e.g. diazo group; [C-]-[N+]#N <-> C-[N+]=[N-])
charges that depend on the condition (powder, dissolved, or physiological)
Notebook output: data directory "_data"; helper fetch_mol! (generic function with 1 method); example file "_data/Cefditoren Pivoxil.mol"
Remove hydrogen vertices
SDFiles downloaded from PubChem include explicit hydrogen vertices. In practice, hydrogens that are not important are often removed from molecular graphs for simplicity.
remove_hydrogens!(mol)
removes hydrogen vertices that are not important (no charge, no unpaired electron, no specific isotope composition and not involved in stereochemistry).
remove_all_hydrogens!(mol)
removes all hydrogen vertices.
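The difference between the two functions can be checked by counting vertices before and after. The following is a minimal sketch, assuming that smilestomol keeps bracketed [H] atoms as explicit vertices and that nv from Graphs.jl works on the molecule object; the methanol SMILES is only an illustrative input, not part of the tutorial data.

```julia
using Graphs            # nv
using MolecularGraph    # smilestomol, remove_hydrogens!, remove_all_hydrogens!

# Methanol written with every hydrogen as an explicit vertex (illustrative input).
smiles = "[H]C([H])([H])O[H]"
mol_a = smilestomol(smiles)
mol_b = smilestomol(smiles)
n_before = nv(mol_a)

# Drop only the hydrogens considered unimportant.
remove_hydrogens!(mol_a)
# Drop every hydrogen vertex unconditionally.
remove_all_hydrogens!(mol_b)

println("vertices before:             ", n_before)
println("after remove_hydrogens!:     ", nv(mol_a))
println("after remove_all_hydrogens!: ", nv(mol_b))
```

For a molecule with isotope-labelled or stereo-relevant hydrogens, the two counts would be expected to differ, because remove_hydrogens! keeps those vertices.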
Extract molecules of interest
connected_components(mol)
returns the connected components, i.e. the sets of vertices that make up each individual molecule in the molecular graph object.
Notebook output (vertex indices, as displayed): 1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 14, 15, 16, 17
To extract the molecule of interest, you can iterate over the connected components and apply
induced_subgraph(mol, vertices)
to extract the molecules and filter them one by one. Or simply
extract_largest_component!(mol)
can be used. This removes vertices that do not belong to the largest component (the connected component with the most vertices) from the graph.
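Both approaches can be sketched on a small salt. The example below is illustrative only: the sodium acetate SMILES is an assumed input, and it assumes that induced_subgraph follows the Graphs.jl convention of returning the subgraph together with a vertex map.

```julia
using Graphs            # connected_components, induced_subgraph, nv
using MolecularGraph    # smilestomol, extract_largest_component!

# Sodium acetate: two disconnected components in one molecular graph (illustrative input).
mol = smilestomol("[Na+].CC(=O)[O-]")
comps = connected_components(mol)
println("number of components: ", length(comps))

# Manual route: inspect each component and keep the one with the most vertices.
largest = comps[argmax(length.(comps))]
sub, vmap = induced_subgraph(mol, largest)
println("vertices in largest component: ", nv(sub))

# Shortcut: strip everything except the largest component, in place.
extract_largest_component!(mol)
println("vertices after extract_largest_component!: ", nv(mol))
```

The manual route is useful when the molecule of interest is not the largest component, for example when the filter should depend on composition rather than size.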
Standardize charges
protonate_acids!(mol)
removes charges on oxo/thio acid anions.
deprotonate_oniums!(mol)
removes charges on ammonium/oxonium cations.
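As a rough illustration, the per-atom charges can be printed before and after standardization. This sketch assumes the atom_charge descriptor is available and reflects the edited molecule; the acetate and methylammonium SMILES are illustrative inputs only.

```julia
using MolecularGraph    # smilestomol, atom_charge, protonate_acids!, deprotonate_oniums!

# Acetate anion and methylammonium cation (illustrative inputs).
acid = smilestomol("CC(=O)[O-]")
onium = smilestomol("C[NH3+]")

println("acetate charges before:        ", atom_charge(acid))
protonate_acids!(acid)      # neutralize the oxo acid anion
println("acetate charges after:         ", atom_charge(acid))

println("methylammonium charges before: ", atom_charge(onium))
deprotonate_oniums!(onium)  # neutralize the ammonium cation
println("methylammonium charges after:  ", atom_charge(onium))
```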
Dealing with resonance structure
Substructure match methods in this library compare atom symbols and the number of $\pi$ electrons, so in many cases you don't have to care about differences in how a resonance structure is drawn.
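One way to see this in practice is to compare the two diazo resonance forms mentioned earlier. The sketch below assumes has_exact_match accepts two molecules parsed from SMILES; the diazomethane strings are illustrative, and the result is printed rather than asserted, since it depends on how the library counts $\pi$ electrons for each form.

```julia
using MolecularGraph    # smilestomol, has_exact_match

# Two ways of drawing diazomethane (illustrative inputs).
form_a = smilestomol("[CH2-][N+]#N")
form_b = smilestomol("C=[N+]=[N-]")

# Structure match between the two drawings; no particular outcome is assumed here.
matched = has_exact_match(form_a, form_b)
println("resonance forms match: ", matched)
```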