
Chemometrical Methods in Expert Systems for the Molecular Structure Elucidation.

Mikhail Elyashberg Advanced Chemistry Development (ACD), Moscow-Toronto

Pioneering works. DENDRAL system (Stanford)

Joshua Lederberg

Edward Feigenbaum

Carl Djerassi

J. Lederberg, E.A. Feigenbaum, C. Djerassi. Application of Artificial Intelligence to Chemical Inference. I. The Number of Possible Organic Compounds. Acyclic Structures Containing C, H, O and N. J. Am. Chem. Soc., 1968, V. 91, P. 2973

Pioneering works. CASE system (Arizona)

Morton Munk

D.B. Nelson, M.E. Munk, K.B. Gash, D.L. Herald. Alanylactinobicyclone. An Application of Computer Techniques to Structure Elucidation. J. Org. Chem., 1969, V. 34, P. 3800

Pioneering works. CHEMICS system (Japan)

Shin-Ichi Sasaki

S.I.Sasaki, H.Abe, T.Ouki, M.Sakamoto and S.Ochiai. Automated Structure Elucidation of Several Kinds of Aliphatic and Alicyclic Compounds. Analytical Chemistry, 1968, V. 40, p. 2220

Pioneering works. STREC system (Moscow)

M.E. Elyashberg, L.A. Gribov Formal logical interpretation of IR spectra using characteristic frequencies. Zhurn. Appl. Spectrosc. (J. Appl. Spectrosc.) 8, 1968, 998.

Molecule as a machine for coding the structural information.


X-rays → 3D MODEL
Stream of electrons → MASS SPECTRUM
IR/VIS radiation → IR/RAMAN SPECTRA
Radio frequency + magnetic field → NMR SPECTRUM

Number of isomers of some natural products


Molecular formula: number of isomers
C10H17Br2ClO2: 50,502,293
C15H22O2: 138,136,211,624
C15H20O1: 37,568,150,635
C12H12O3: 68,930,547,646
C13H20O3: 14,431,269,166
C11H12N2O2: 3·10^11 < n < 10^12

~43 mln. are real

[Figure: structural formulas of the corresponding natural products.]

Properties of an isomer set


Isomer numbers for molecules of medium size are comparable with Avogadro's number (~6·10^23). Though the number of isomers is huge, the isomers corresponding to a given molecular formula make up a finite, countable set.

General strategy of Computer-Aided Structure Elucidation (CASE)

Elimination of superfluous isomers from the full set by imposing different structural constraints.
Sources of structural constraints: spectra, a priori information (sample origin, chemical rules, etc.)
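
The elimination strategy stated above can be pictured as successive filtering of a candidate set. The following is only a minimal sketch: the candidate representation (a dictionary of feature counts) and the two constraint predicates are hypothetical illustrations, not part of any CASE program.

```python
from typing import Callable, Dict, Iterable, List

Candidate = Dict[str, int]                 # hypothetical feature summary of one isomer
Constraint = Callable[[Candidate], bool]   # returns True if the isomer is allowed


def eliminate(candidates: Iterable[Candidate],
              constraints: List[Constraint]) -> List[Candidate]:
    """Keep only the candidates that satisfy every structural constraint."""
    return [c for c in candidates if all(check(c) for check in constraints)]


# Toy isomer set for one molecular formula (the feature counts are invented).
isomers = [
    {"id": 1, "carbonyl": 1, "hydroxyl": 0, "rings": 1},
    {"id": 2, "carbonyl": 0, "hydroxyl": 2, "rings": 1},
    {"id": 3, "carbonyl": 1, "hydroxyl": 1, "rings": 3},
]

constraints = [
    lambda c: c["carbonyl"] >= 1,   # e.g. a constraint derived from a spectrum
    lambda c: c["rings"] <= 2,      # e.g. a priori knowledge about the sample
]

survivors = eliminate(isomers, constraints)
print([c["id"] for c in survivors])
```

Each added constraint can only shrink the surviving set, which is why a wrong constraint carries the risk of discarding the correct structure.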

Direct problems

Structures
Molecular Formulae

Nominal mass

Inverse problems

NMR, IR/Raman, MS. Molecular Formula

Spectrum-Structure Correlations

Selection of fragments. Generation of fragment sets

Structure Generation from atoms and fragments. Structural and Spectral Filtering of isomers

Spectrum prediction for candidate structures Choice of the most probable structure

The most probable structure

Separate section

Computer Techniques and Optimization

Prof. Jean-Thomas Clerc (1934-1998)

Chemometrics in Analytical Chemistry, CAC-1996 Tarragona, Spain

Prominent scientist and vivid person.

Achievements of the "Storm and Stress" period were generalized in monographs:

L.A. Gribov, M.E. Elyashberg. Computer-Assisted Identification of Organic Molecules by their Molecular Spectra. 1979. Monographic review.
M.E. Elyashberg, L.A. Gribov, V.V. Serov. Molecular Spectral Analysis and Computer. Nauka, Moscow, 1980.
N.A.B. Gray. Computer-Assisted Structure Elucidation. Wiley, N.Y., 1986.

Examples of structures identified with the aid of the X-PERT program.

[Figure: structural formulas of the identified compounds.]

Development of NMR techniques

1986

2006

Direct H-C correlations (HSQC)

Interaction between H and C atoms through one bond (1J C-H correlations).

[Figure: HSQC spectrum. F2 axis: 1H chemical shift (ppm); F1 axis: 13C chemical shift (ppm). A cross-peak links H-1 with C-1.]

1H-1H correlations (COSY)

Proton interaction through three bonds (3J H-H correlations).

[Figure: COSY spectrum. F1 and F2 axes: 1H chemical shift (ppm). Cross-peaks link H-1 and H-2, the protons attached to the neighboring carbons C1 and C2.]

Long-range 13C-1H correlations (HMBC).

Interaction between 13C and 1H nuclei through two and three bonds.

[Figure: HMBC spectrum. F2 axis: 1H chemical shift (ppm); F1 axis: 13C chemical shift (ppm). Cross-peaks link protons H-i and H-k with carbon C-1.]

HMBC peaks corresponding to 2- and 3-bond correlations are indistinguishable!

Ratio Nobs/Ntheor of correlations for COSY and HMBC

[Figure: two charts of the ratio versus problem number. COSY: ratio axis from 0 to 2.5, problems 1 to 113. HMBC: ratio axis from 0 to 1.4, problems 1 to 239.]

Nuclear Overhauser Effect (NOE)


Interaction between 1H-1 and 1H-2 when they are separated in space by r < 5 Å. The NOE gives rise to NOESY/ROESY 2D NMR spectra.

[Figure: two protons on different carbons, close in space but separated by n bonds.]

Structural interpretation of 2D NMR spectra. Main axioms


COSY: If a peak (H-1, H-2) is observed in COSY, then the molecule contains the chemical bond

(C-1)-(C-2).

HMBC: If a peak (H-1, C-2) is observed in HMBC, then atoms C-1 and C-2 are separated in the structure by ONE or TWO chemical bonds:

(C-1)-(C-2) or (C-1)-(X)-(C-2), X=C, O, N

NOESY: If a peak (H-1, H-2) is observed in NOESY (ROESY), then the distance between H-1 and H-2 in space is less than 5 Å.
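
The COSY and HMBC axioms above translate each observed peak into a bound on the number of bonds between two skeletal atoms. A minimal sketch of that translation follows; the atom labels, the `ATTACHED` table (which carbon carries which proton, as HSQC would supply) and the candidate bond list are hypothetical. The NOESY axiom concerns distance in space and is not covered here.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Bond = Tuple[str, str]
Constraint = Tuple[str, str, int, int]   # (atom_a, atom_b, min_bonds, max_bonds)

# Hypothetical HSQC result: the carbon that carries each proton.
ATTACHED: Dict[str, str] = {"H-1": "C-1", "H-2": "C-2"}


def cosy(h_a: str, h_b: str) -> Constraint:
    """COSY axiom: the carbons bearing the two protons are directly bonded."""
    return (ATTACHED[h_a], ATTACHED[h_b], 1, 1)


def hmbc(h: str, carbon: str) -> Constraint:
    """HMBC axiom: the carbon bearing h and the given carbon are 1 or 2 bonds apart."""
    return (ATTACHED[h], carbon, 1, 2)


def bond_distance(bonds: List[Bond], start: str, goal: str) -> Optional[int]:
    """Shortest number of bonds between two atoms (breadth-first search)."""
    graph: Dict[str, set] = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return seen[atom]
        for nxt in graph.get(atom, ()):
            if nxt not in seen:
                seen[nxt] = seen[atom] + 1
                queue.append(nxt)
    return None   # the atoms are not connected


def satisfied(bonds: List[Bond], constraint: Constraint) -> bool:
    a, b, lo, hi = constraint
    d = bond_distance(bonds, a, b)
    return d is not None and lo <= d <= hi


# Hypothetical candidate skeleton: C-1 - C-2 - O-1 - C-3
candidate = [("C-1", "C-2"), ("C-2", "O-1"), ("O-1", "C-3")]
print(satisfied(candidate, cosy("H-1", "H-2")))   # C-1 and C-2 are bonded
print(satisfied(candidate, hmbc("H-2", "C-3")))   # C-2 and C-3 are two bonds apart
print(satisfied(candidate, hmbc("H-1", "C-3")))   # C-1 and C-3 are three bonds apart
```

A candidate that fails any such check would be rejected, provided the axioms really hold for the data.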

Interpretation of the Structure Elucidation problem in terms of an axiomatic theory.


Creation of the set of axioms and hypotheses necessary for solution of a given problem is equivalent to creation of some particular axiomatic theory. To obtain a valid solution to the problem (i.e. manageable output file containing the correct structure) the set of axioms must be true, complete and consistent.

Example of an expert system based on 1D and 2D NMR data.


STRUCTURE ELUCIDATOR
Advanced Chemistry Development Ltd., Moscow-Toronto

K.A. Blinov, D. Carlson, M.E. Elyashberg et al. Magn. Reson. Chem. 2003, 41, 359-372.
M.E. Elyashberg, K.A. Blinov, S.G. Molodtsov et al. J. Chem. Inf. Model. 2004, 44, 771-792.

Knowledge of Structure Elucidator


Factual knowledge: database of structures (280,000) and fragments (1.7 mln) with assigned NMR spectra (subspectra).
Axiomatic knowledge: correlation tables for spectral structure filtering by NMR and IR spectra; Atom Property Correlation Table (APCT), which is used for setting atom hybridization and the possibility of neighboring heteroatoms.

Distribution of 1.7 million fragments by number of skeletal atoms (max=16) and number of carbon atoms (max=10)

[Figure: two histograms of the number of fragments (DB 1.7 mln), one versus the number of skeletal atoms and one versus the number of carbon atoms.]

Of the fragments selected by the program from the 13C spectrum, usually 10 to 100 are actually present in the molecule under investigation.

Checking Knowledge reliability.


98% of 280 000 structures passed checking by the Spectral Filter and Atom Property Correlation Tables. 99.8% of 17 000 natural products passed the same verification.

The risk of losing the correct structure is minimal.

Spectral data input 2D peak coordinates


HMBC peak table

C-C connectivities.
Table of HMBC connectivities

Molecular Connectivity Diagram (MCD) of the unknown compound.

Structure Generation combined with Structural and Spectral Filtering


Internal Badlist
User Badlist
User Goodlist
Geometry
Rings: Obligatory, Forbidden
Bredt's Rule
Maximum Match Factor
Filter Tolerance: Tight, Medium, Loose

Output Structural File:


Number of structures, k = 3. Structure Generation Time, tg = 0.6
[Figure: structural formulas of the three generated structures.]

Selection of the Preferable Structure


1. Chemical shift calculation for all structures of the output file. Removing duplicate structures.
2. Structure ranking in ascending order of the average chemical shift deviation, d, found between calculated and experimental spectra. The structure having the minimum d value is declared the most probable.

Methods of 13C and 1H spectrum prediction:
1. Fragment-based approach
2. Method of increments (PLS)
3. Artificial neural nets
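
The ranking step described in this section can be sketched as follows. This is only an illustration: it assumes that d is the mean absolute difference between calculated and experimental shifts and that the shifts are already assigned atom by atom. The candidate names and shift values are invented.

```python
from typing import Dict, List, Tuple


def average_deviation(calculated: List[float], experimental: List[float]) -> float:
    """Mean absolute difference between calculated and experimental shifts (ppm)."""
    if len(calculated) != len(experimental):
        raise ValueError("spectra must contain the same number of assigned shifts")
    return sum(abs(c - e) for c, e in zip(calculated, experimental)) / len(experimental)


def rank(candidates: Dict[str, List[float]],
         experimental: List[float]) -> List[Tuple[str, float]]:
    """Return (name, d) pairs in ascending order of d; the first is the most probable."""
    scored = [(name, average_deviation(shifts, experimental))
              for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


experimental = [20.0, 70.0, 170.0]            # invented experimental 13C shifts
candidates = {
    "A": [21.0, 69.0, 171.0],                 # invented predicted shifts
    "B": [25.0, 75.0, 160.0],
}

for name, d in rank(candidates, experimental):
    print(f"{name}: d = {d:.3f} ppm")
```

Ranking several prediction methods side by side, as the slides do, would simply mean calling `rank` once per method.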

Recently, the speed and accuracy of the incremental approach were significantly improved.

Speed: 6000-10000 13C chemical shifts per second. For molecules of 20-30 carbon atoms: 200-400 spectra per second.
Accuracy: average chemical shift deviation of 1.8 ppm.

Y.D. Smurnyy, K.A. Blinov, T.S.Churanova, M.E. Elyashberg, A.J. Williams. J. Chem. Inf. Model. 2008, 48, 128-134

The ranked output file, r(all)=1

Rank (ID)   dA(13C)  dI(13C)  dN(13C)  dA(1H)  dI(1H)  dN(1H)  dA Complex
1 (ID:1)    0.415    1.789    1.650    0.058   0.134   0.144   3.085
2 (ID:3)    0.912    2.291    2.261    0.172   0.257   0.273   4.992
3 (ID:2)    0.992    2.267    2.329    0.173   0.255   0.276   5.084

[Figure: structural formulas of the three ranked structures.]

The higher speed and accuracy of chemical shift prediction influenced the system strategy.

Then:
The output file had to be minimal. To achieve this, severe constraints (axioms) had to be introduced. Consequence: a great risk of losing the correct structure.

Now:
The structural file may contain 10^5 or more structures (tcalc=5-10 min). Severe constraints may be removed. Solutions have become more reliable.

Acceleration of Structure Generation


The Structure Generation algorithm first produces substructures, which are then complemented by new bonds until full structures are generated. We suggested that fast 13C chemical shift prediction for incomplete structures would prevent the generation of structural branches that contradict the experimental 13C NMR spectrum. The expected result: significant acceleration of Structure Generation.
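
The idea of rejecting incomplete structures early can be illustrated with a toy depth-first search. This is a sketch under strong simplifying assumptions and not the generator used in the program: a "structure" is just a sequence of building units, each unit has a constant hypothetical predicted shift (`PREDICTED`), and a partial candidate is abandoned as soon as one of its predicted shifts cannot be paired with a distinct experimental peak within a tolerance.

```python
from typing import Dict, List, Tuple

# Hypothetical constant predictions per building unit (ppm); a real predictor
# would depend on the environment of each atom.
PREDICTED: Dict[str, float] = {"CH3": 15.0, "CH2": 30.0, "O-CH": 70.0, "C=O": 200.0}


def consistent(units: List[str], experimental: List[float], tol: float) -> bool:
    """True if each predicted shift can be paired with a distinct observed peak."""
    free = sorted(experimental)
    for shift in sorted(PREDICTED[u] for u in units):
        match = next((p for p in free if abs(p - shift) <= tol), None)
        if match is None:
            return False
        free.remove(match)
    return True


def generate(n_units: int, experimental: List[float],
             tol: float = 2.0) -> Tuple[List[Tuple[str, ...]], int]:
    """Depth-first growth of partial candidates with early rejection."""
    results: List[Tuple[str, ...]] = []
    visited = 0

    def extend(partial: List[str]) -> None:
        nonlocal visited
        visited += 1
        if not consistent(partial, experimental, tol):
            return                      # the whole branch is abandoned here
        if len(partial) == n_units:
            results.append(tuple(partial))
            return
        for unit in PREDICTED:
            extend(partial + [unit])

    extend([])
    return results, visited


results, visited = generate(3, [14.2, 29.5, 71.0])
print(len(results), "complete candidates;", visited, "nodes visited")
```

Without the early check the same search would visit every node of the full tree (85 nodes for four units and depth three), so the saving grows quickly with the size of the problem.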

StrucEluc as a checker of structural hypotheses. Example 1


Original structure: found from 2D NMR.
Revised structure: found by 2D NMR and DFT calculations of the 13C spectrum.

[Figure: the original and the revised structure, each annotated with the same set of 13C chemical shifts (ppm): 15.00, 15.20, 21.00, 27.60, 30.10, 46.00, 54.50, 66.60, 71.90, 77.40, 107.00, 131.70, 145.00, 173.00, 198.60.]

W.-G. Kim et al. Org. Lett., 2004, 6, 823-826; W. Steglich et al. Org. Lett., 2004, 6, 3175-3177; A. Bagno et al. Chem. Eur. J. 2006, 12, 5514-5525

The top of the ranked output file found by StrucEluc: k = 37176 → (Filter) 149 → (Remove Dupl.) 135, tg = 1 m 40 s

Rank (ID)    Structure   dN(13C)
1 (ID:27)    Revised     2.167
2 (ID:30)                2.650
3 (ID:60)                3.075
4 (ID:140)   Original    3.079

[Figure: structural formulas of the four top-ranked structures.]

StrucEluc as a checker of structural hypotheses. Example 2


M=262, C16H10N2O2

[Figure: atoms of the compound with 13C chemical shifts (ppm) printed next to them: 124.81, 124.81, 128.02, 128.07, 128.07, 128.68, 128.73, 128.73, 128.76, 128.80, 134.68, 134.68, 134.68, 138.16, 140.92, 144.42.]

A. Balandina et al., J. Mol. Struct. 791, 2006, 77-81

Structural Hypotheses to be checked by DFT 13C prediction

[Figure: structural formulas of the hypotheses, including charged (zwitterionic) forms.]

C16H12N2O2

Results of 13C chemical shift predictions by DFT calculations

Struc.   R2       rms     a      sd      MAD
A        0.4586   11.62   1.39   12.06   11.39
B        0.1458   13.80   0.76   14.32   12.93
C        0.9768   1.16    0.95   1.20    7.03
D        0.2231   20.56   1.45   21.33   13.06
E        0.5744   8.89    1.33   9.22    8.92
F        0.0115   21.14   0.30   21.94   13.10

Molecular Connectivity Diagram


M=262, C16H10N2O2

[Figure: Molecular Connectivity Diagram with 13C chemical shifts (ppm) printed at the carbon atoms.]

A. Balandina et al., J. Mol. Struct. 791, 2006, 77-81

Solution to the problem by Structure Elucidator


Structure Generation and Filtering: k = 247 → (Filter) 16 → (Duplicates) 4, tg = 1 s 434 ms

Rank (ID)   dI(13C)  dN(13C)
1 (ID:14)   1.416    1.809
2 (ID:1)    6.003    4.054
3 (ID:8)    7.309    6.313
4 (ID:3)    9.275    7.866

[Figure: structural formulas of the four ranked structures; one of them is marked "Expected by authors".]

Linear Regression data for Correct structure


dQ=6.929, dI=1.416, dN=1.809

Data   R      R-square   Adj. R-square
INC    0.97   9.36E-01   9.32E-01
NN     0.95   8.95E-01   8.88E-01
QM     0.96   9.27E-01   9.22E-01

0.9768

Example 3. Inconsistent structural hypotheses were checked by DFT calculations.


Measured accurate mass produced MF = C27H22N4O3

[Figure: part of the atom set with 13C chemical shifts (ppm): CH 121.89, CH 117.31, CH 126.91, C 122.16, CH 115.82, C 129.71, CH2 38.33, C 154.90, CH2 40.92, CH2 44.18, C 157.77, CH2 61.63; two N atoms.]

A. Balandina et al. Rus. Chem. Bul., Int. Ed., 2006, 55, 2256-2264

Proposed structures for C27H22N4O3 which were checked by DFT calculations

[Figure: structural formulas of the proposed structures; one of them is marked "Correct".]

Proposed structures with different MFs which were checked by DFT calculations.
Experimental MF = C27H22N4O3

[Figure: structural formulas of proposed structures with molecular formulas C27H23N4O2 (three structures) and C27H22N4O2. Annotations in the figure: "Doublet!" and "154 ~sp3!".]

Structure Generation was run from MCD

[Figure: Molecular Connectivity Diagram. 13C chemical shifts (ppm) of the 27 carbon atoms: 38.33, 40.92, 44.18, 61.63, 115.82, 117.31, 117.38, 121.89, 122.16, 126.91, 127.30, 127.30, 127.85, 128.77, 128.77, 129.16, 129.16, 129.34, 129.34, 129.71, 129.94, 131.61, 132.72, 143.70, 144.01, 154.90, 157.77.]

Result: k=4425, tg=0 s 891 ms

Rank (ID)   dI(13C)  dN(13C)
1 (ID:43)   2.740    2.278
2 (ID:38)   3.310    3.747
3 (ID:17)   3.535    4.208
4 (ID:30)   3.756    3.955
5 (ID:28)   3.799    4.171
6 (ID:25)   3.838    3.924

[Figure: structural formulas of the six top-ranked structures with 13C chemical shift assignments.]

Nonstandard correlations (NSCs)


[Figure: examples of nonstandard correlations in COSY (a=1) and in HMBC (a=1 and a=2).]

If the axioms on correlation length are violated, the data become contradictory.

Automatic removing contradictions from 2D NMR data. Case when a=1.


1. Logical analysis of the integrated 2D NMR data is performed. Atoms at which nonstandard connectivities may be present are detected.
2. All connectivities at suspicious atoms are lengthened by one bond (a=1). Structure Generation is performed from the modified connectivity set.
S.G.Molodtsov, M.E.Elyashberg, K.A. Blinov et al. J. Chem. Inf. Model. 2004, 44, 1737-1751.

Example of molecule with many NSCs of extreme lengths (a=2-3).

[Figure: structural formula with skeletal atoms numbered 1 to 22.]

m=15, a=1-3

Fuzzy Structure Generation. General approach.


N: total number of correlations in 2D NMR data.
m: number of connectivities to be lengthened.
a: number of bonds by which connectivities should be lengthened.
All possible combinations of m connectivities out of N, C(N, m), are produced and logically analyzed. Unrealistic (useless) combinations are removed. Structure generation is performed from each of the remaining combinations at the given a.

M.E.Elyashberg, K.A.Blinov, S.G.Molodtsov et al. J. Chem. Inf. Model. 2007, 47, 1053-1066

Modes of Fuzzy Structure Generation


The program allows 6 modes of Fuzzy Structure Generation. The safest mode: connectivity lengthening is replaced by connectivity removal (symbolized as a=x) at m<15. This mode allows solving problems for which the 2D NMR data contain an unknown number of NSCs of unknown lengths.

Example. 15 NSCs, m=15, a=3

The safest mode: {m<15, a=x}
40,225,345,056 combinations are theoretically possible.
10,637,725 connectivity combinations were used during Structure Generation.
Solution: k=28289; tg=24 min; r=1

[Figure: structural formula with skeletal atoms numbered 1 to 22.]

10.6 mln structure generation attempts were made!
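
The size of the combinatorial search in Fuzzy Structure Generation can be checked with a few lines of Python. The slides do not state N for this example. The value N = 40 used below is an assumption, chosen because the number of ways to pick 15 connectivities out of 40 coincides with the 40,225,345,056 combinations quoted above. The correlation labels in the toy list are invented.

```python
import math
from itertools import combinations
from typing import Iterator, List, Tuple


def count_combinations(n_correlations: int, m: int) -> int:
    """Number of ways to choose m connectivities out of n_correlations."""
    return math.comb(n_correlations, m)


def connectivity_subsets(correlations: List[str], m: int) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of m connectivities to be lengthened or removed."""
    return combinations(correlations, m)


# Assumed N = 40 (not given in the source), m = 15.
total = count_combinations(40, 15)
print(f"{total:,}")   # 40,225,345,056

# Toy enumeration with four invented correlations and m = 2.
toy = ["H1-C5", "H2-C7", "H3-C9", "H4-C2"]
for subset in connectivity_subsets(toy, 2):
    print(subset)
```

The gap between the theoretical count and the 10,637,725 combinations actually used reflects the logical analysis step that discards useless combinations before generation.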

About 15 000 of ~200 000 natural products possess symmetry.

Peculiarities of structure generation for symmetric molecules from 2D NMR data had not been investigated.

[Figure: structural formula of a symmetric molecule containing Br, OH, NH and NH2 groups.]

Structure generation was stopped after 44(!) h of program running.

A new structure generation algorithm reveals symmetry in NMR data. The algorithm is capable of adjusting automatically to the generation of symmetric molecules.

Example: C44H72O16, n=60


There are 2 NSCs in HMBC.
FUZZY STRUCTURE GENERATION:

m=0-15, a=x. RESULT: k=5304174139; tg=4 m 30 s; r=1

[Figure: structural formula of the elucidated compound.]

Ionic structures
[Figure: structural formulas of ionic structures with positively charged N and O atoms.]

Properties of information obtained from 2D NMR data


Information is fuzzy by nature (2 or 3 bonds between H and C in HMBC).
Not all possible correlations are observed in spectra, i.e. information is incomplete.
The presence of nonstandard correlations frequently makes information contradictory.
The number of NSCs and their lengths are unknown.
Signal overlapping leads to the appearance of ambiguous correlations.
Information is also indefinite.

"... O, if you knew from which rubbish Poetry grows" (Anna Akhmatova)

To overcome the lack of information, Database Fragments (1.7 mln) and/or User Fragments are used.
Introduction of fragments is necessary IF:
1. The number of observed 2D NMR correlations is markedly smaller than the theoretically expected one.
2. There is a deficit of hydrogen atoms. As a result, even the theoretically expected number of correlations is too small.
Taking this into account, an algorithm for implanting fragments into the MCD was developed.

Example of Fragment Usage. Symmetric molecule C56H78O12S1,n=69


Number of correlations is small.

[Figure: Molecular Connectivity Diagram of the symmetric molecule with 1H chemical shifts (ppm) printed at the atoms.]

Ashwagandhanolide

Fragments were found in DB from 13C NMR search. Number of Found Fragments L=5524.
Fragment # 1 17222

Mol.

Frag.

Solution
960 MCDs were created from fragment #1.
Structure Generation from 960 MCDs: k = 960 → 24 → 6, tg = 29 m 30 s

Ashwagandhanolide. Output file.

Rank (ID)   dA(13C) (v.10.05)   dI(13C)   dN(13C)
1 (ID:17)   2.340               2.777     2.584
2 (ID:16)   2.556               2.810     2.847
3 (ID:12)   3.013               3.091     3.625
4 (ID:8)    2.757               3.330     3.024
5 (ID:1)    3.367               3.466     3.215
6 (ID:9)    2.976               3.487     3.590

[Figure: structural formulas of the six ranked structures.]

C42H28O10, n=52. Common Mode, k = 8 → 1, t=8
[Figure: structural formula.]

C44H51NO18, n=63, n(NSC)=8, L=4845, n(MCD)=188, k=1, t=4 min
[Figure: structural formula.]

C43H69NO12, n=56. Common Mode, k=1, t=4 sec
[Figure: structural formula.]

C52H80N8O8S, n=69, L=13 934, n(MCD)=12, k=49912143; t=6 m, r=1
[Figure: structural formula.]

C62H92O28, n=90. Common Mode, k=514059; t=9 m 32 s, r=1
[Figure: structural formula.]

C79H131N3O20, n=102. Common Mode, k=134749835, t=16 m 34 s, r=1
[Figure: structural formula.]

Typical examples of medium size structures elucidated by using StrucEluc.


[Figure: structural formulas of several medium-size compounds.]

The use of fragments is not a panacea for all cases. Possible causes of failure:
Large fragments capable of helping to solve the problem are absent from the DB of the system.
Appropriate fragments are found or introduced by the chemist, but the number of possible shift assignments is so huge (more than 100 million) that CPU resources fail (combinatorial explosion).
The number of MCDs created by the program is huge. The structure generation CPU time becomes unacceptable.

[Figure: structural formula annotated with 13C chemical shifts (ppm).]

C30H28O11, DBE=17

Region of signals from AR and =: 17 singlets (>C<), 5 doublets (>CH-)

To introduce the 1,2,3,4,5-AR fragment, it is necessary to check 4 mln different shift assignments to the carbon atoms of the fragment.
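
One way to see where a number of this order may come from is sketched below. The counting scheme is an assumption, not stated in the slides: the five substituted ring carbons are assigned to distinct singlets chosen in order from the 17 available, and the single CH carbon to one of the 5 doublets, ignoring any symmetry of the fragment.

```python
import math

n_singlets = 17     # >C< signals in the aromatic / double-bond region
n_doublets = 5      # >CH- signals in the same region
n_quaternary = 5    # substituted carbons of a 1,2,3,4,5-substituted ring (assumed)

# Ordered choice of 5 distinct singlets, times the choice of one doublet.
assignments = math.perm(n_singlets, n_quaternary) * n_doublets
print(f"{assignments:,}")   # 3,712,800
```

Under these assumptions the count is about 3.7 million, which is consistent with the figure of roughly 4 mln quoted above.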

Between two combinatorial explosions


An attempt at structure generation from free atoms (Common Mode) leads to a combinatorial explosion (too many structures).
Introducing large fragments to overcome the explosion leads to another combinatorial explosion (too many assignments).
In this situation a User Database can help.

Alkaloids of cryptolepine series showing deficit of hydrogen atoms


Cryptolepicarboline: C27H18N4, n=31, ncycl=7, DBE=21
Cryptospirolepine: C34H24N4O, n=39, ncycl=9, DBE=25

[Figure: structural formulas of the two alkaloids.]
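
The DBE (double bond equivalent) values quoted in these slides can be reproduced from the molecular formulas with the standard relation DBE = C - H/2 + N/2 + 1, in which divalent atoms such as O and S do not contribute. The function below is a small illustrative helper, not part of the program described here.

```python
def dbe(c: int, h: int, n: int = 0, halogens: int = 0) -> float:
    """Double bond equivalents; divalent atoms (O, S) do not enter the formula."""
    return c - (h + halogens) / 2 + n / 2 + 1


print(dbe(27, 18, n=4))   # cryptolepicarboline, C27H18N4 -> 21.0
print(dbe(34, 24, n=4))   # cryptospirolepine, C34H24N4O -> 25.0
```

A high DBE combined with few hydrogens is exactly the "deficit of hydrogen atoms" situation in which few 2D NMR correlations can be expected.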

Alkaloids of the cryptolepine series for which signals in 13C and 1H NMR are assigned.

[Figure: structural formulas of the alkaloids.]

User Fragment Data Base (UDB) was created. UDB contains 342 fragments.

Both structures were successfully elucidated with UDB



Structure elucidation of cryptospirolepine degradation product.


Sample of this compound was stored by Gary Martin (Pharmacia Inc., USA) in a sealed tube in his garage for 10 years.
[Figure: structural formula of cryptospirolepine.]

LC chromatogram of degradation products (26 peaks).

[Figure: LC chromatogram with retention times from about 7.6 to 34.8; peaks DP-1 (35 %) and DP-2 (16 %) are marked.]

DP-2 separation and spectra registration were performed by several groups in the USA.
DP-1 (35%, 1.1 mg), DP-2 (16%, 200 µg).
DP-2: solution of 100 µg in 150 µl of D-DMSO; ampoule 3 mm, =25; HSQC (17 h), HMBC (17 h), 1H-15N HMBC (72 h; sensitivity to 15N is 50 times lower than to 13C), ROESY.
Found from MS and MS/MS: MH+=479, C32H22N4O

DP-2. Solution to the problem.


From MS/MS: C32H22N4O
101 fragments were selected from the UDB by 13C NMR.
1376 MCDs were created from the fragments.

Structure generation from 1376 MCDs. Results: k=78575, tgen = 6 min.

First 8 structures of ranked output file.


Rank (ID)    dA(13C)         dF(13C)         dA(1H)
1 (ID:113)   2.849 (4.434)   4.873 (7.561)   0.271 (0.460)
2 (ID:38)    4.754 (7.446)   5.009 (7.766)   0.552 (0.743)
3 (ID:114)   5.012 (7.787)   5.733 (9.436)   0.443 (0.592)
4 (ID:119)   5.431 (9.206)   5.415 (9.269)   0.526 (0.680)
5 (ID:461)   5.981 (8.981)   6.074 (8.656)   0.612 (0.815)
6 (ID:90)    6.190 (9.610)   5.893 (8.972)   0.525 (0.666)
7 (ID:422)   6.334 (8.688)   5.740 (8.326)   0.630 (0.846)
8 (ID:93)    6.397 (9.617)   5.654 (8.658)   0.672 (0.878)

[Figure: structural formulas of the eight ranked structures.]

COST OF THE VICTORY


Martin, G. E.; Hadden, B. D.; Russell, C. E.; Kaluzny, D. J.; Guido, J.E.; Duholke, W.K; Stiemsma, B. A.; Thamann, T. J.; Crouch, R. C.; Blinov, K. A.; Elyashberg, M. E.; Martirosian, E. R.; Molodtsov, S.G.; Williams, A. J.; Schiff, P. L. Jr. Identification of Degradants of a Complex Alkaloid Using NMR Cryoprobe Technology and ACD/Structure Elucidator. J. Het. Chem. 2002, 39, 1241-1250 .

[Illustration: Ilya Repin. Barge Haulers on the Volga. 1872]

TC-6. The greatest challenge for CASE systems

Gary Martin's group isolated an unknown alkaloid of the cryptolepine series, TC-6. Martin, a prominent expert in NMR and structure elucidation, had been unable to determine the structure of this compound for 10 years (since the 1990s).

The solution was found using StrucEluc in interactive mode. The initial MCD was transformed into the final one by a spectroscopist during 12 hours of program operation.

SOLUTION: k=353266, tgen=2 s

The first 8 structures of the output file.


Rank (ID)    dA(13C)         dF(13C)         dA(1H)          d(MS)
1 (ID:86)    3.540 (4.507)   5.703 (6.914)   0.351 (0.507)   0.905
2 (ID:85)    4.449 (6.855)   5.189 (7.377)   0.381 (0.553)   0.846
3 (ID:36)    4.570 (6.296)   6.181 (7.982)   0.375 (0.565)   0.905
4 (ID:334)   4.793 (6.089)   6.073 (7.424)   0.406 (0.544)   0.905
5 (ID:35)    5.342 (8.180)   5.645 (8.486)   0.415 (0.617)   0.905
6 (ID:25)    5.385 (7.437)   5.704 (7.960)   0.509 (0.684)   0.751
7 (ID:232)   5.424 (7.277)   5.496 (7.263)   0.566 (0.814)   0.751
8 (ID:92)    5.442 (7.451)   6.371 (9.112)   0.492 (0.659)   0.905

[Figure: structural formulas of the eight structures.]

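The ROESY spectrum provided the first criterion for the choice of the correct structure (r < 5 Å).

[Figure: two alternative structures, one expected to give 1 ROESY peak and the other 2 peaks; CH3-H distances of 2.5, 2.5, 2.5 and 5.9 are marked.]

Only one CH3-H peak was observed!

The two strongest peaks in the MS are 232 and 217. 232+217=M
Second criterion: each peak can be assigned to the upper or the lower part of the molecule.

[Figure: two alternative assignments of m/z=232 and m/z=217 to the two parts of the molecule.]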

Top of the output file

Rank (ID)     dA(13C)         dF(13C)         dA(1H)          d(MS)
1 (ID:86)     3.540 (4.507)   5.703 (6.914)   0.351 (0.507)   0.905
2 (ID:85)     4.449 (6.855)   5.189 (7.377)   0.381 (0.553)   0.846
3 (ID:36)     4.570 (6.296)   6.181 (7.982)   0.375 (0.565)   0.905
4 (ID:334)    4.793 (6.089)   6.073 (7.424)   0.406 (0.544)   0.905
5 (ID:35)     5.342 (8.180)   5.645 (8.486)   0.415 (0.617)   0.905
6 (ID:25)     5.385 (7.437)   5.704 (7.960)   0.509 (0.684)   0.751
7 (ID:232)    5.424 (7.277)   5.496 (7.263)   0.566 (0.814)   0.751
8 (ID:92)     5.442 (7.451)   6.371 (9.112)   0.492 (0.659)   0.905
9 (ID:41)     5.485 (6.919)   6.405 (7.624)   0.703 (0.996)   0.751
10 (ID:179)   5.487 (7.038)   5.658 (7.071)   0.377 (0.533)   0.905
11 (ID:84)    5.676 (7.254)   6.083 (7.563)   0.675 (0.980)   0.751
12 (ID:231)   5.679 (7.714)   5.266 (7.129)   0.573 (0.797)   0.751

[Figure: structural formulas of the 12 structures; structure #2 is labelled TC-6, the other structures carry ROE or MS labels.]

Only structure #2 meets the MS and ROESY constraints.

The most probable structure of TC-6

[Figure: structural formula with the two parts corresponding to m/z 232 and 217 marked.]

C31H20N4, n=35, DBE=24, ncycl=8

Blinov, K.A.; Elyashberg, M.E.; Martirosian, E.R.; Molodtsov et al. Magn. Reson. Chem., 2003, 41, 577-584

For the first time, the application of an expert system (ES) made it possible to solve a structural problem that a prominent expert in NMR spectroscopy and structure elucidation had failed to solve.

One more challenge...


MW = 1515.38 Da for (M+H)+
Raw spectra:
1D: 13C NMR, 13C NMR DEPT, 1H NMR
2D: 1H/13C HSQC, 1H/13C HMBC, 1H/1H COSY, 1H/1H TOCSY
From 13C NMR: C69
From 1H NMR and 1H/13C HSQC: H66

Fuzzy Structure Generation m=0-15, mg=2, a=1; k=164104, t=30 sec


[Figure: structural formula of the elucidated compound.]

C69 H66 O13 N18 S5 n=106

Determination of relative stereochemistry of identified structures.


Biological activity of substances depends on their stereochemistry.

StrucEluc was enhanced by an algorithm for determining the most probable relative stereochemistry of rigid structures. Stereochemistry is determined using NOESY/ROESY data. For structures having more than 7 stereocenters, geometry optimization is performed by means of a Genetic Algorithm (GA).

Brevetoxin B
Number of stereocenters: N=23
Number of stereoisomers: ~8,400,000
CPU time necessary for optimizing the geometry of all 8.4 mln stereoisomers: ~1 month
The configuration of all 23 stereocenters was correctly determined by the GA in 2 h 50 m.

[Figure: structural formula of brevetoxin B.]
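
The stereoisomer count quoted for brevetoxin B matches 2^N for N = 23, i.e. two configurations per stereocenter with no reduction for symmetry. A quick check follows; the 30-day month used to estimate the time per stereoisomer is an assumption made here for illustration.

```python
n_stereocenters = 23

# Two possible configurations per stereocenter.
n_stereoisomers = 2 ** n_stereocenters
print(n_stereoisomers)                      # 8388608, i.e. about 8.4 million

# Rough time budget per stereoisomer if all were optimized in ~1 month
# (a 30-day month is assumed).
seconds_per_month = 30 * 24 * 3600
print(round(seconds_per_month / n_stereoisomers, 2))   # 0.31 seconds each
```

The exponential growth with N is the reason exhaustive optimization stops being practical beyond a handful of stereocenters and a heuristic search such as a GA is used instead.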

3D model against X-ray structure


The X-ray crystal structure of brevetoxin B (yellow) and the 3D model of the best stereoisomer from the final pool (blue) of the stereochemistry determination system are superimposed.

Y. D. Smurnyy, M. E. Elyashberg, K. A. Blinov et al. Tetrahedron, 2005, 61, 9980-9989

Efficiency of Structure Elucidator

System efficiency was proved by structure elucidation of ~300 natural products.
Continually solving new, complicated problems is the basis for the creation and further development of Structure Elucidator.

Other CASE systems


SESAMI (USA)
CISOC-SES (USA)
LSD (France)
COCON (Germany)
SENECA (Germany)

None of these systems has a database of structures and fragments with assigned NMR spectra.
None of them can cope with nonstandard correlations; only ideal 2D NMR data can be processed.
Some of these systems are used by their authors.
M.E. Elyashberg, A.J. Williams and G.E. Martin. Computer-Assisted Structure Verification and Elucidation Tools in NMR-Based Structure Elucidation. Progress in NMR Spectroscopy, 2008, No 2. Monographic review.

StrucEluc is used in ca. 100 organizations in many countries.


Pfizer, Roche, Eli Lilly, Novartis, AstraZeneca, Merck, Bayer, Mitsubishi Chemical, Shell Chimie, Samsung Electronics,
Schering-Plough, Microbial Screening Technologies, Crompton Corporation, MNL Pharma, Fujisawa Pharm. Co, Amgen Inc, Sankyo Co. Ltd, Astellas Pharma Inc, Biovitrum AB, NCI-FRED CANCER, INOVACIA SWEDEN, Janssen Pharm.

Expert system as a kernel of research center

It can be expected that an expert system similar to Structure Elucidator will serve as the kernel of a research center devoted to molecular structure elucidation and investigation.

Expert systems like StrucEluc will come into widespread use within the next 5-10 years. They will become a routine tool in laboratories engaged in spectroscopy, organic chemistry, chemistry of natural products and analytical chemistry.

Structure Elucidator Team

Sergey Molodtsov, Mikhail Elyashberg, Tatiyana Churanova, Kirill Blinov
