and Search Systems
Sheffield, UK
Purpose of my 7 lectures
[Structure diagram: tyrosine]
OC(=O)C(N)CC1=CC=C(O)C=C1
Simplified SMILES encoding rules
Full rules:
http://www.daylight.com/smiles/smiles-intro.html
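As a small illustration of how much a SMILES string carries, the sketch below counts the heavy atoms in the tyrosine SMILES shown above. It is a simplification under stated assumptions: the helper name `heavy_atom_counts` is hypothetical, and the pattern only recognises unbracketed atoms from the organic subset (B, C, N, O, P, S, F, Cl, Br, I and their aromatic lower-case forms). Bracket atoms such as [NH4+] are not handled.

```python
import re
from collections import Counter

# Assumption: only unbracketed organic-subset atoms appear in the string.
# Two-letter symbols (Cl, Br) are listed first so they win over C and B.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element in a simple SMILES string."""
    return Counter(symbol.capitalize() for symbol in ATOM_PATTERN.findall(smiles))

tyrosine = "OC(=O)C(N)CC1=CC=C(O)C=C1"
counts = heavy_atom_counts(tyrosine)
total = sum(counts.values())
```

For this string the total agrees with the 13 atoms listed in the MDL connection table for tyrosine.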
Other line notations
[Structure diagram: tyrosine with numbered atoms]
ROSDAL (Beilstein)
Representation Of Structure Diagram Arranged Linearly
1O-2=3O,2-4-5N,4-6-7=-12-7,10-13O
Sybyl Line Notation (Tripos)
OHC(=O)CH(NH2)CH2C[1]=CHCH=C(OH)CH=CH@1
Wiswesser Line Notation (WLN) (obsolete)
QVYZ1R DQ
Connection Tables (CTs)
usually redundant
every bond shown twice, once for each atom
implemented as array of records
record for each atom might store
atomic type
hydrogen count
formal charge
2D display co-ordinates
bonds to neighbouring atoms
etc.
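The redundant form described above can be sketched as an array of records. This is only an illustrative layout, not any particular file format: the class name `AtomRecord`, its field names, and the helper `add_bond` are all hypothetical. The example stores the carboxylic acid fragment of tyrosine.

```python
from dataclasses import dataclass, field

@dataclass
class AtomRecord:
    symbol: str                      # atomic type
    h_count: int = 0                 # hydrogen count
    charge: int = 0                  # formal charge
    xy: tuple = (0.0, 0.0)           # 2D display co-ordinates
    bonds: dict = field(default_factory=dict)  # neighbour index -> bond order

def add_bond(table, i, j, order):
    """Record the bond in both atoms' records (the redundant representation)."""
    table[i].bonds[j] = order
    table[j].bonds[i] = order

# Carboxylic acid group: atom 0 is C, atom 1 is the carbonyl O, atom 2 is the OH oxygen.
table = [AtomRecord("C"), AtomRecord("O"), AtomRecord("O", h_count=1)]
add_bond(table, 0, 1, 2)  # C=O
add_bond(table, 0, 2, 1)  # C-OH

# Two bonds, but four stored entries, because every bond appears twice.
bond_entries = sum(len(atom.bonds) for atom in table)
```

Storing each bond twice costs space but lets the neighbours of any atom be read directly from its own record.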
MDL Connection Table
Header Block
data on molecule name and file origin
counts of atoms and bonds etc.
Tyrosine
-ISIS- 08220120432D
13 13 0 0 0 0 0 0 0 0999 V2000
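A minimal sketch of reading the counts line follows. In the V2000 format the counts line is made of fixed-width three-character fields followed by the version stamp, so it is sliced by column rather than split on spaces. The extracted line above has lost its original spacing, so the sample string here is rebuilt on that assumption; the function name `parse_counts_line` is hypothetical.

```python
def parse_counts_line(line):
    """Read atom count, bond count and version from a V2000 counts line."""
    n_atoms = int(line[0:3])        # columns 1-3: number of atoms
    n_bonds = int(line[3:6])        # columns 4-6: number of bonds
    version = line[33:39].strip()   # columns 34-39: version stamp
    return n_atoms, n_bonds, version

# Assumption: the tyrosine counts line restored to fixed-width layout.
counts_line = " 13 13" + "  0" * 8 + "999 V2000"
n_atoms, n_bonds, version = parse_counts_line(counts_line)
```

Slicing by column matters because adjacent fields can run together, as the "0999" in the line above shows.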
MDL Connection Table
Atoms block
one line per atom
specifies X,Y,Z-coords, atom symbol, isotope, charge,
stereo code etc.
0.2459 -1.4736 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5815 -1.4724 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.9944 -2.1872 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5810 -2.9037 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.2495 -2.9008 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.6586 -2.1854 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4836 -2.1830 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.9042 -2.1792 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.1027 -2.1870 0.0000 C 0 0 3 0 0 0 0 0 0 0 0 0
-3.1359 -1.1516 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-3.9070 -2.1847 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.4070 -2.6845 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-4.4989 -1.5618 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
MDL Connection Table
Bonds Block
one line per bond (each bond shown once)
specifies row numbers for atoms, and codes for bond
type, bond stereochemistry etc.
1 2 2 0 0 0 0
6 7 1 0 0 0 0
3 4 2 0 0 0 0
3 8 1 0 0 0 0
4 5 1 0 0 0 0
9 10 1 0 0 0 0
2 3 1 0 0 0 0
9 11 1 0 0 0 0
5 6 2 0 0 0 0
11 12 1 0 0 0 0
6 1 1 0 0 0 0
11 13 2 0 0 0 0
8 9 1 0 0 0 0
M END
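The bonds block above lists each bond once. The sketch below reads those lines and expands them into the redundant, per-atom form, where every bond is stored under both of its atoms. The function names `read_bonds` and `neighbours` are hypothetical, and the lines are split on whitespace because that is how they appear here; a real molfile uses fixed-width columns.

```python
BOND_BLOCK = """\
1 2 2 0 0 0 0
6 7 1 0 0 0 0
3 4 2 0 0 0 0
3 8 1 0 0 0 0
4 5 1 0 0 0 0
9 10 1 0 0 0 0
2 3 1 0 0 0 0
9 11 1 0 0 0 0
5 6 2 0 0 0 0
11 12 1 0 0 0 0
6 1 1 0 0 0 0
11 13 2 0 0 0 0
8 9 1 0 0 0 0
"""

def read_bonds(text):
    """Return (atom_row, atom_row, bond_type) for each bond line."""
    bonds = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            first, second, bond_type = (int(x) for x in fields[:3])
            bonds.append((first, second, bond_type))
    return bonds

def neighbours(bonds):
    """Expand a once-per-bond list into a per-atom neighbour table."""
    table = {}
    for first, second, bond_type in bonds:
        table.setdefault(first, {})[second] = bond_type
        table.setdefault(second, {})[first] = bond_type
    return table

bonds = read_bonds(BOND_BLOCK)
table = neighbours(bonds)
```

Here `table[9]` holds the three neighbours of atom 9, the carbon marked with a stereo code in the atoms block.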
Standard Connection Table Formats
branch of mathematics
particularly useful in chemical informatics
and in computer science generally
study of graphs, which consist of
a set of nodes
a set of edges joining pairs of nodes
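The definition above can be sketched directly: atoms become labelled nodes and bonds become edges, here using the tyrosine atoms and bonds from the MDL connection table with bond types ignored. The function name `is_connected` is hypothetical. The ring count uses the standard result that a connected graph has (edges - nodes + 1) independent cycles.

```python
from collections import deque

# Nodes: atom row number -> element symbol (from the atoms block).
nodes = {1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "O",
         8: "C", 9: "C", 10: "N", 11: "C", 12: "O", 13: "O"}

# Edges: unordered pairs of nodes (from the bonds block).
edges = {frozenset(pair) for pair in [
    (1, 2), (6, 7), (3, 4), (3, 8), (4, 5), (9, 10), (2, 3),
    (9, 11), (5, 6), (11, 12), (6, 1), (11, 13), (8, 9)]}

def is_connected(nodes, edges):
    """Breadth-first search: True if every node is reachable from the first."""
    adjacent = {n: set() for n in nodes}
    for edge in edges:
        a, b = tuple(edge)
        adjacent[a].add(b)
        adjacent[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in adjacent[current] - seen:
            seen.add(other)
            queue.append(other)
    return len(seen) == len(nodes)

connected = is_connected(nodes, edges)
ring_count = len(edges) - len(nodes) + 1  # valid because the graph is connected
```

For tyrosine this gives a single connected graph with one ring, the six-membered benzene ring.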
Properties of graphs
[Structure diagram showing CH2, O, H2N, CH and OH groups]
Structure Diagrams as Graphs