There are five options for structure input format, as follows:
SMILES code was developed by David Weininger
(D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31-36, 1988) to provide
a string code for the input of molecular structure. See this reference
and subsequent papers for a description of the SMILES code and for
techniques for creating SMILES codes for molecular structures.
The following two structures illustrate the application of SMILES code.
Essentially, the chemical graph is reduced to a tree (noncyclic) graph
by removing one bond for each ring; the atoms between which the bond was
broken are labeled with a number. Branches are enclosed in parentheses.
(For more information on SMILES, see Chapter 3 of the Daylight Theory Manual.)
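The ring-opening rule described above can be illustrated with a short Python sketch. It is not part of Molconn-Z, OELIB, or the Daylight Toolkit; the function name ring_closures and the simplified tokenizer are assumptions made for illustration, and only the common organic-subset atoms, bracket atoms, and ring labels (single digits and %nn) are recognized. The sketch reports which pairs of atoms were joined by the bond that was removed to open each ring.

```python
import re

# Simplified tokenizer (an assumption, not a full SMILES grammar):
# group 1 = an atom, group 2 = a ring-closure label, group 3 = anything else
# (bond symbols, parentheses for branches, and so on).
TOKEN = re.compile(r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI*]|[bcnops])|(%\d\d|\d)|(.)")


def ring_closures(smiles):
    """Return (atom_i, atom_j) index pairs joined by ring-closure labels."""
    open_labels = {}  # label number -> index of the atom that opened it
    pairs = []
    atom_index = -1   # index of the most recently read atom
    for atom, label, _other in TOKEN.findall(smiles):
        if atom:
            atom_index += 1
        elif label:
            key = int(label.lstrip("%"))
            if key in open_labels:
                # Second occurrence of the label: restore the broken bond.
                pairs.append((open_labels.pop(key), atom_index))
            else:
                # First occurrence: remember where the ring was opened.
                open_labels[key] = atom_index
    if open_labels:
        raise ValueError(f"unclosed ring labels: {sorted(open_labels)}")
    return pairs
```

For benzene written as c1ccccc1, ring_closures returns [(0, 5)]: the first and sixth atoms carry the label 1, so the removed ring bond lies between them.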
The SGI, SUN, and LINUX versions of standalone Molconn-Z can also read and decode SMILES files with the Daylight Toolkit SMILES interpreter instead of the OELIB SMILES interpreter. This optional feature requires a run-time SMILES Toolkit license from Daylight Chemical Information Systems, Inc. The file format is the same for Daylight SMILES and OELIB SMILES. Each record of a SMILES file, which is generally named with the .smi extension, is simply a SMILES string followed by the molecule name, separated by a space. There is no file termination code. This format matches what the Daylight database software supports, so it is a useful option for sites that already have large databases encoded this way. The other potential advantage is that the Daylight Toolkit is the de facto standard for interpreting SMILES codes, which could be a consideration for those who plan to work with large, complex SMILES libraries on UNIX computers. In our experience, OELIB has misinterpreted molecular structure in several cases (see Molconn-Z 4.10 Release Notes).
EduSoft includes two demo SMILES files: demo1.smi is a single molecule, benzene, and demo2.smi is a database file that contains 100 structures. Note the molecule name/identifier is at the end of each line, in this case the CAS Registry Number.
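The record layout of a .smi file (a SMILES string, a space, then the molecule name) is simple enough to read with a few lines of Python. The following is a minimal sketch under that assumption only; the function name read_smi is hypothetical and is not part of Molconn-Z. It splits each record on the first run of whitespace, so a name that itself contains spaces is kept whole.

```python
def read_smi(lines):
    """Parse .smi records into a list of (smiles, name) tuples.

    `lines` is any iterable of strings, for example an open file.
    Blank lines are skipped; a record without a name gets an empty name.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split once: everything after the SMILES string is the name.
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1] if len(parts) > 1 else ""
        records.append((smiles, name))
    return records
```

A typical call would be read_smi(open("demo2.smi")), which for the supplied demo file should give one (SMILES, CAS Registry Number) pair per structure.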
The File demo2.smi as Supplied With All Versions of Molconn-Z Software:
OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C(=O)COC(=O)C 50-03-3 BrC43C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3Br)(CCC(C4)Cl)C 5337-45-1 O(C5C(C4C(C3C(C1(C(C2C(CC1)(CCC2C(=C)C)C)CC3)C)(CC4)C)(CC5)C)(C)C)C(=O)c6ccccc6 1617-69-2 OC1C(C4C(CC1)(C3=C(C2(C(C(CC2)C(CCCC(C)C)C)(CC3=O)C)C)C(=O)C4)C)(C)C 5346-40-7 OC4CC3C(C2C(C1C(C(CC1)C(O)C)(CC2)C)CC3)(CC4)C 80-92-2 OC4C(C3C(C2C(C1(C(C(CC1)C(CCC(=O)O)C)(CC2=O)C)C)CC3)(CC4)C)(C)C 5346-42-9 OC4C(C3C(C2C(C1(C(C(CC1)C(CCC(=O)O)C)(CC2=O)C)C)C(=O)C3)(CC4)C)(C)C 5399-41-7 O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 470-03-1 O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 470-01-9 O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 126-19-2 O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 126-18-1 O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 77-60-1 OC4CCC3(C2C(C1C(C(CC1)C(=O)C)(CC2)C)CC=C3C4)C 145-13-1 OC4CCC3(C2C(C1C(C(CC1)C(=O)C)(CC2)C)CC=C3C4)C 566-63-2 ClC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C 910-31-6 O4C(=O)C3C51C(CCC(C1)OC(=O)C)(C2=CCC6(C(C2(C3C4=O)C=C5)CCC6C(=O)C)C)C 25495-42-5 OC2C3C(C1C(C(CC1)C(=O)C)(C2)C)CCC4=CC(=O)CCC43C 600-57-7 OC2C3C(C1C(C(CC1)C(=O)C)(C2)C)CCC4=CC(=O)CCC43C 80-75-1 OC4C3(C(C2C(C1CCC(=O)C=C1CC2)CC3)CC4)C 434-22-0 O(C)C(=O)C=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(=O)C3)CC4)C 1474-15-3 OC1C3C(C2C(C1)(C(=CCOC(=O)C)CC2)C)CCC4=CC(=O)CCC43C 5327-59-3 O(C4C1(C(C3C(CC1)c2c(cc(cc2)O)CC3)CC4)C)C(=O)CCC5CCCC5 313-06-4 OC(=O)C(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C 5327-60-6 OC4(C3(C(C2CCC1=CC(=O)CCC1(C2=CC3)C)CC4)C)C 1039-17-4 OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C 1807-02-9 OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C 1043-10-3 O1C32C1CC5(C(C2CCC4=CC(=O)CCC43C)CCC5(O)C)C 1042-33-7 O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC)OC7CC6(C(C5C(C3(C(C(CC3)C4=CC(=O)OC4)(CC5)C)O)CC6)(CC7)C=O)O)C 560-53-2 OC5C(C4C(C3C(C1(C(C2C(CC1)(CCC2C(=C)C)CO)CC3)C)(CC4)C)(CC5)C)(C)C 473-98-3 BrC(=C(C)C)CCC(C1C4(C(CC1)(C3=C(C2(C(C(C(CC2)O)(C)C)CC3)C)CC4)C)C)C 50719-45-4 
OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 17608-41-2 OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 516-92-7 OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 80-97-7 OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 360-68-9 C1(CCCC1)C2CCCCC2 1606-08-2 C43(C(C2C(C1(CCC=CC1=CC2)C)CC3)CCC4C(CCCC(C)C)C)C 747-90-0 C1(C(CCC1)C)C2CCCCC2 5405-90-3 S(=O)(=O)(NC(=O)CCC(C4C3(C(C2C(C1(C(CC(CC1)O)CC2)C)CC3O)CC4)C)C)c5ccc(cc5)N 5407-24-9 OC3C4(C(C1C(C2(C(CC1O)CC(CC2)O)C)C3)CCC4C(CCC(=O)O)C)C 81-25-4 N(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)C 1865-62-9 OC4(C3(C(C2C(C1(C(=CC(=O)C=C1)CC2)C)C(=O)C3)CC4C)C)C(=O)CO 1247-42-3 [N+](=O)([O-])c1c(ccc(c1)[N+](=O)[O-])NN=C5CCC4(C3C(C2C(C(CC2)O)(CC3)C)CCC4=C5)C 2347-93-5 N6C1C(OC5(C1C)CCC4C3C(C2(CCC(CC2=CC3)O)C)C(=O)C4=C5C)CC(C6)C 469-59-0 O1C(C(C(C(C1C)O)O)O)OC6CCC5(C4C(C2(C(C(CC2)C3=COC(=O)C=C3)(CC4)C)O)CCC5=C6)C 466-06-8 O1C(CC(C(C1C)O)OC)OC6CC5(C(C4C(C2(C(C(CC2)C3=CC(=O)OC3)(CC4)C)O)CC5)(CC6)C=O)O 508-77-0 O1C(C(C(C(C1CO)O)O)O)OC6CCC5(C4C(C2(C(C(CC2)C3=COC(=O)C=C3)(CC4)C)O)(CC(C5=C6)OC(=O)C)O)C 507-60-8 N81C(C(C7(C(C1)C6(C(C5C2(OC3(C(C2(CCC3OC(=O)c4cc(c(cc4)OC)OC)C)CC5)O)C6)(CC7O)O)O)O)(O)C)CCC(C8)C 71-62-5 O1C(C(C(C(C1CO)O)O)O)OC2C(C(OC(C2O)C)OC7CCC6(C5C(C3(C(C(CC3)C4=COC(=O)C=C4)(CC5)C)O)CCC6=C7)C)O 124-99-2 N71C(C(C6C(C1)C5C(C4C2(OC3(C(C2(CCC3OC(=O)C(O)(CC)C)C)C(C4OC(=O)C)OC(=O)C)O)C5)(C(C6O)OC(=O)C(CC)C)O)(O)C)CCC(C7)C 143-57-7 N71C(C(C6C(C1)C5C(C4C2(OC3(C(C2(CCC3OC(=O)C(O)(C(O)C)C)C)C(C4OC(=O)C)OC(=O)C)O)C5)(C(C6O)OC(=O)C(CC)C)O)(O)C)CCC(C7)C 124-97-0 O1C(CC(C(C1C)OC2OC(C(C(C2)O)O)C)O)OC3C(OC(CC3O)OC8CC7C(C6C(C4(C(C(CC4)C5=CC(=O)OC5)(CC6)C)O)CC7)(CC8)C)C 71-63-6 O1C(C(C(C(C1COC2OC(C(C(C2O)O)O)CO)O)O)O)OC3C(OC(CC3OC)OC8CC7(C(C6C(C4(C(C(CC4)C5=CC(=O)OC5)(CC6)C)O)CC7)(CC8)C=O)O)C 33279-57-1 O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C7C(C5(C(C(CC5)C6=CC(=O)OC6)(CC7)C)O)CC8)(CC9)C)C)C)C 17575-20-1 
O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C5C(C6(C(C(C5)O)(C(CC6)C7=CC(=O)OC7)C)O)CC8)(CC9)C)C)C)C 17575-22-3 O1C=C(C=CC1=O)C5C4(C(C3(C(C2(CCC(C=C2C(C3)OC(=O)C)O)C)CC4)O)(CC5)O)C 507-59-5 O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C7C(C5(C(C(C(C5)O)C6=CC(=O)OC6)(CC7)C)O)CC8)(CC9)C)C)C)C 17575-21-2 OC4CCC3(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC=C3C4)C 83-48-7 OC4CCC3(C2C(C1C(C(CC1)C(CCC(C(C)C)CC)C)(CC2)C)CC=C3C4)C 83-46-5 OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C=C 1235-98-9 O(C4CC3C(C2C(C1C(C(CC1)(O)C(=O)C)(CC2)C)CC3)(CC4)C)C(=O)C 5456-44-0 O(C4CCC3(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC=C3C4)C)C(=O)C 4651-48-3 O(C4CC3C(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC3)(CC4)C)C(=O)C 13010-52-1 O(C6CC5C(C4C(C1C(C(CC1)C(CC=C(c2ccccc2)c3ccccc3)C)(CC4)C)CC5)(CC6)C)C(=O)C 4144-29-0 OC4CC3C(C2C(C1C(C(CC1)C(CCC(=O)OC)C)(CC2)C)CC3)(CC4)C 15074-01-8 OC4CC3C(C2C(C1C(C(CC1)C(CCC(=O)OC)C)(CC2)C)CC3)(CC4)C 1249-75-8 OC4(C3(C(C1C(C2(C(CC1)CC(=O)CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)C 3751-02-8 OC4(C3(C(C1C(C2(C(CC1)CC(=O)CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)C 1499-59-8 OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)CCCCCCCCCCCCCCCCC 5432-63-3 OC(=O)CCC(C4C3(C(C1C(C2(C(CC1=O)CC(=O)CC2)C)CC3=O)CC4)C)C 81-23-2 OC3C4(C(C2C(C1(C(CC(CC1)O)CC2)C)C3)CCC4C(CCC(=O)O)C)C 30635-00-8 OC3C4(C(C2C(C1(C(CC(CC1)O)CC2)C)C3)CCC4C(CCC(=O)O)C)C 83-44-3 OC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C 57-88-5 O(C4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C)C(=O)C 604-35-3 ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 128-10-9 ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 60-57-1 ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 72-20-8 OC4(C1(C(C3C(C(C1)O)C2(C(=CC(=O)C=C2)CC3)C)CC4)C)C(=O)CO 50-24-8 OC1C3C(C2C(C1)(C(=CCO)CC2)C)CCC4=CC(=O)CCC43C 3103-13-7 BrC4CC3(C1C(C2C(CC1=O)(C(=CC(=O)OC)CC2)C)CCC3=CC4=O)C 5415-46-3 BrC4CC3(C1C(C2C(CC1O)(C(=CC(=O)OC)CC2)C)CCC3=CC4=O)C 5415-47-4 O=C4CCC3(C2C(C1C(C(CC1)C(C)C=O)(CC2)C)CCC3=C4)C 
66289-21-2 O=C4CCC3(C2C(C1C(C(CC1)C(C)C=O)(CC2)C)CCC3=C4)C 3986-89-8 S1C5(NC(C1)C(=O)O)CC4C(C3C(C2C(C(CC2)C(=O)C)(CC3=O)C)CC4)(CC5)C 6293-78-3 OC4(C1(C(C3C(C(C1)O)C2(C(=CC(=O)C=C2)CC3)C)CC4)C)C(=O)COC(=O)CCC(=O)O 1715-33-9 OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C(=O)COC(=O)CCC(=O)O 125-04-2 O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CCC5CCCC5 58-20-8 N1(CCCCC1)C=C(C5C4(C(C3C(C2(CCC(=O)C=C2CC3)C)CC4)CC5)C)C 24377-48-8 O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CC 58769-88-3 O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CC 57-85-2 S=P(OC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C)(OCC)OCC 24352-66-7 N%10C9(OC8C(C7(C(C6C(C1(C(CC(CC1)OC2OC(C(C(C2O)O)OC3OC(C(C(C3OC4OC(C(C(C4O)O)O)CO)OC5OCC(C(C5O)O)O)O)CO)CO)CC6)C)CC7)C8)C)C9C)CCC(C%10)C 17406-45-0 O=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 18485-76-2 O=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 63-05-8 OC4(C3(C(C2C(C1CCC(=O)C=C1CC2)CC3)CC4)C)C#C 68-22-4 OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C#C 434-03-7 OC5C1(C(C4C(CC1)c2c(cc(cc2)OC(=O)c3ccccc3)CC4)CC5)C 50-50-0 O(CC(=O)C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)C 56-47-3 Oc1cc4c(cc1)C2C(C3C(CC2)(C(=O)CC3)C)CC4 53-16-7 OC4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 571-41-5 OC4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 86335-11-7
EduSoft includes a demo MOL2 database file that contains 10 structures.
The First Two Structures in the demo4.mol2 File Supplied With Molconn-Z Software:
# Name: B_ESTRADIOL
# Creating user name: gkellogg
# Creation time: Tue Nov 9 13:17:55 1993
# Modifying user name: gkellogg
# Modification time: Tue Nov 9 13:27:21 1993
@<TRIPOS>MOLECULE
B_ESTRADIOL
44 47 1 0 0
SMALL
USER_CHARGES
INVALID_CHARGES
@<TRIPOS>ATOM
1 C0 3.5269 1.2282 -0.5910 C.3 1 MOL1 -0.3000
2 C1 2.0219 0.8438 -0.1562 C.3 1 MOL1 0.0000
3 C2 1.0537 1.8981 -0.7458 C.3 1 MOL1 -0.2000
4 C3 0.0299 1.4727 -1.7954 C.3 1 MOL1 -0.2000
5 C4 -0.6674 0.2085 -1.3476 C.3 1 MOL1 -0.1000
6 C5 0.3651 -0.9761 -1.1904 C.3 1 MOL1 -0.1000
7 C6 -0.3887 -2.2712 -0.7075 C.3 1 MOL1 -0.2000
8 C7 -1.9470 -2.2001 -0.8458 C.3 1 MOL1 -0.2000
9 C8 -2.5481 -0.9510 -0.2037 C.ar 1 MOL1 0.0000
10 C9 -3.6511 -1.0182 0.6047 C.ar 1 MOL1 -0.1000
11 C10 -4.1448 0.1316 1.2039 C.ar 1 MOL1 0.0300
12 O11 -5.2368 0.0644 2.0110 O.3 1 MOL1 -0.3800
13 C12 -3.5084 1.3467 0.9895 C.ar 1 MOL1 -0.1000
14 C13 -2.3852 1.4139 0.1732 C.ar 1 MOL1 -0.1000
15 C14 -1.8960 0.2944 -0.4257 C.ar 1 MOL1 0.0000
16 C15 1.7503 -0.6620 -0.6221 C.3 1 MOL1 -0.1000
17 C16 2.3448 -1.4886 0.6115 C.3 1 MOL1 -0.2000
18 C17 1.8447 -0.7429 1.8521 C.3 1 MOL1 -0.2000
19 C18 1.9854 0.7486 1.4492 C.3 1 MOL1 -0.0700
20 O19 3.0303 1.3907 2.2055 O.3 1 MOL1 -0.3800
21 H21 3.7606 2.2203 -0.2175 H 1 MOL1 0.1000
22 H22 3.6066 1.2177 -1.6741 H 1 MOL1 0.1000
23 H23 4.2684 0.5450 -0.1910 H 1 MOL1 0.1000
24 H24 0.4635 2.4607 0.0315 H 1 MOL1 0.1000
25 H25 1.5259 2.7971 -1.2309 H 1 MOL1 0.1000
26 H26 0.4426 1.4496 -2.8310 H 1 MOL1 0.1000
27 H27 -0.6977 2.3024 -1.9616 H 1 MOL1 0.1000
28 H28 -1.2830 0.0097 -2.2984 H 1 MOL1 0.1000
29 H29 0.5285 -1.2724 -2.2903 H 1 MOL1 0.1000
30 H30 -0.0612 -3.1901 -1.1806 H 1 MOL1 0.1000
31 H31 -0.2539 -2.4288 0.3800 H 1 MOL1 0.1000
32 H32 -2.3687 -3.0871 -0.3978 H 1 MOL1 0.1000
33 H33 -2.1839 -2.1834 -1.9121 H 1 MOL1 0.1000
34 H34 -4.1287 -1.9704 0.7788 H 1 MOL1 0.1000
35 H35 -5.5691 0.8589 2.4352 H 1 MOL1 0.3500
36 H36 -3.8869 2.2397 1.4650 H 1 MOL1 0.1000
37 H37 -1.9092 2.3662 0.0343 H 1 MOL1 0.1000
38 H38 2.5108 -1.0804 -1.3610 H 1 MOL1 0.1000
39 H39 2.0677 -2.5309 0.5916 H 1 MOL1 0.1000
40 H40 3.4400 -1.5052 0.6368 H 1 MOL1 0.1000
41 H41 0.8164 -1.0696 2.0500 H 1 MOL1 0.1000
42 H42 2.4339 -1.0074 2.7284 H 1 MOL1 0.1000
43 H43 1.0919 1.2266 1.8949 H 1 MOL1 0.1000
44 H44 3.8454 0.8990 2.0823 H 1 MOL1 0.3500
@<TRIPOS>BOND
1 1 2 1
2 1 21 1
3 1 22 1
4 1 23 1
5 2 3 1
6 2 16 1
7 2 19 1
8 3 4 1
9 3 24 1
10 3 25 1
11 4 5 1
12 4 26 1
13 4 27 1
14 5 6 1
15 5 15 1
16 5 28 1
17 6 7 1
18 6 16 1
19 6 29 1
20 7 8 1
21 7 30 1
22 7 31 1
23 8 9 1
24 8 32 1
25 8 33 1
26 9 10 ar
27 9 15 ar
28 10 11 ar
29 10 34 1
30 11 12 1
31 11 13 ar
32 12 35 1
33 13 14 ar
34 13 36 1
35 14 15 ar
36 14 37 1
37 16 17 1
38 16 38 1
39 17 18 1
40 17 39 1
41 17 40 1
42 18 19 1
43 18 41 1
44 18 42 1
45 19 20 1
46 19 43 1
47 20 44 1
@<TRIPOS>SUBSTRUCTURE
1 MOL1 1 TEMP 0 **** **** 0 ROOT

# Name: ACETYLCHOLINE
# Creating user name: gkellogg
# Creation time: Tue Nov 9 13:28:28 1993
# Modifying user name: gkellogg
# Modification time: Tue Nov 9 13:30:31 1993
@<TRIPOS>MOLECULE
ACETYLCHOLINE
26 25 1 0 0
SMALL
USER_CHARGES
INVALID_CHARGES
@<TRIPOS>ATOM
1 C0 -4.3973 -0.2463 0.5026 C.3 1 MOL1 -0.3000
2 C1 -2.9663 -0.5262 0.1201 C.2 1 MOL1 0.4100
3 O2 -2.4730 -1.6120 0.3866 O.2 1 MOL1 -0.3800
4 O3 -2.2053 0.4016 -0.5122 O.3 1 MOL1 -0.1800
5 C4 -0.8689 0.1813 -0.8888 C.3 1 MOL1 -0.0500
6 C5 0.1357 0.3156 0.2722 C.3 1 MOL1 0.2200
7 N6 1.5948 0.1395 0.0157 N.4 1 MOL1 -0.6800
8 C7 2.3386 0.3304 1.3574 C.3 1 MOL1 0.1200
9 C8 2.1496 1.1730 -0.9461 C.3 1 MOL1 0.1200
10 C9 1.9383 -1.2514 -0.4837 C.3 1 MOL1 0.1200
11 H11 -4.6967 0.7455 0.1659 H 1 MOL1 0.1000
12 H12 -5.0472 -0.9927 0.0417 H 1 MOL1 0.1000
13 H13 -4.5015 -0.3041 1.5875 H 1 MOL1 0.1000
14 H14 -0.5718 0.9172 -1.6399 H 1 MOL1 0.1000
15 H15 -0.7202 -0.8022 -1.3248 H 1 MOL1 0.1000
16 H16 -0.2368 -0.3946 1.0400 H 1 MOL1 0.1000
17 H17 -0.0872 1.3068 0.7218 H 1 MOL1 0.1000
18 H18 2.1823 1.3190 1.8019 H 1 MOL1 0.1000
19 H19 3.4141 0.2178 1.2571 H 1 MOL1 0.1000
20 H20 2.0340 -0.3870 2.1268 H 1 MOL1 0.1000
21 H21 1.7384 1.0660 -1.9552 H 1 MOL1 0.1000
22 H22 1.9316 2.2021 -0.6474 H 1 MOL1 0.1000
23 H23 3.2344 1.1254 -1.0896 H 1 MOL1 0.1000
24 H24 1.5616 -2.0481 0.1641 H 1 MOL1 0.1000
25 H25 3.0109 -1.4412 -0.5993 H 1 MOL1 0.1000
26 H26 1.5188 -1.4541 -1.4741 H 1 MOL1 0.1000
@<TRIPOS>BOND
1 1 2 1
2 1 11 1
3 1 12 1
4 1 13 1
5 2 3 2
6 2 4 1
7 4 5 1
8 5 6 1
9 5 14 1
10 5 15 1
11 6 7 1
12 6 16 1
13 6 17 1
14 7 8 1
15 7 9 1
16 7 10 1
17 8 18 1
18 8 19 1
19 8 20 1
20 9 21 1
21 9 22 1
22 9 23 1
23 10 24 1
24 10 25 1
25 10 26 1
@<TRIPOS>SUBSTRUCTURE
1 MOL1 1 TEMP 0 **** **** 0 ROOT
@<TRIPOS>MOLECULE
Sybyl is developed and distributed by Tripos, Inc., St. Louis, MO.
These file formats (SDF and MOL) are carefully described in A. Dalby, J. G. Nourse, et al., J. Chem. Inf. Comput. Sci., 32, 244-255 (1992) or online at www.mdl.com/solutions/white_papers/ctfile_formats.jsp.
Using the SDFile or Molfile format produced by MDL software is straightforward. The user first produces the desired molecule files with MDL software in its usual manner (or converts them from another format with a suitable conversion tool such as BABEL). A MOL file holds a single molecule with no extraneous information. An SDF file is essentially a series of MOL files, each terminated by a line "M END" and optionally followed by additional information such as BOILING POINT or MELTING POINT. An SDF file can hold any number of molecules, which are separated by blank lines and a line with "$$$$". See the example below and the reference given above. The whole SDFile is terminated with a blank record.
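The SDFile layout described above (a MOL block ending in "M END", optional data lines, then a "$$$$" separator) can be split into records with a short Python sketch. This is an illustration only, not the reader used by Molconn-Z; the names split_sdf and _split_record are hypothetical. The terminator test ignores the amount of whitespace between "M" and "END", and anything after the last "$$$$" line, such as the final blank record, is ignored.

```python
def _split_record(lines):
    """Split one record into (molblock_lines, data_lines) at the M END line."""
    for i, line in enumerate(lines):
        if line.split() == ["M", "END"]:
            return lines[: i + 1], lines[i + 1 :]
    # No terminator found: treat the whole record as the MOL block.
    return lines, []


def split_sdf(text):
    """Split SDFile text into a list of (molblock_lines, data_lines) tuples."""
    records = []
    current = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of one molecule: store it and start collecting the next.
            records.append(_split_record(current))
            current = []
        else:
            current.append(line)
    return records
```

Applied to the contents of demo5.sdf, split_sdf would be expected to return one tuple per structure, with the additional information for each structure in the second element of the tuple.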
For example, Molconn-Z includes two SDF files: demo3.sdf, which contains 50 anonymous structures, and demo5.sdf, which contains 12 common structures. Also included are two MOL files: demo6a.mol, which is phenol, and demo6b.mol, which is a substituted phenol.
The First Two Structures in the demo5.sdf file Supplied with Molconn-Z Software:
Benzoic Acid
ChemDraw02260010222D

9 9 0 0 0 0 0 0 0 0999 V2000
-2.4700 1.3000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.7200 -0.0025 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.7800 -1.0625 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.2200 -0.0025 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5300 -1.3000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0300 -1.3000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.7825 -0.0025 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0300 1.2975 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5300 1.2975 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 2 0 0 0 0
2 4 1 0 0 0 0
4 5 1 0 0 0 0
5 6 2 0 0 0 0
6 7 1 0 0 0 0
7 8 2 0 0 0 0
8 9 1 0 0 0 0
4 9 2 0 0 0 0
M END
> 25
249.2

> 25
122.4

> 25
Benzenecarboxylic acid

> 25
3-04-00

$$$$
m-methylbenzoic acid
ChemDraw02250014002D

10 10 0 0 0 0 0 0 0 0999 V2000
-2.4700 1.9500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.7200 0.6475 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.7800 -0.4125 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.2200 0.6475 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5300 -0.6500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0300 -0.6500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.7825 0.6475 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0300 1.9475 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5300 1.9475 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.7825 -1.9525 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 2 0 0 0 0
2 4 1 0 0 0 0
4 5 1 0 0 0 0
5 6 2 0 0 0 0
6 7 1 0 0 0 0
7 8 2 0 0 0 0
8 9 1 0 0 0 0
4 9 2 0 0 0 0
6 10 1 0 0 0 0
M END
> 25
263

> 25
111-113

> 25
m-Toluic acid

> 25
3-04-00

$$$$

(Note the "blank" record that terminates the file.)
The Phenol Structure in the demo6a.mol file Supplied with Molconn-Z Software:
PHENOL
JFMACCS 8302248414282D 1 0.00213 0.00000 0
JF FOR PROGRAM MOLCONNZ
7 7 0 0 0
0.7943 -0.2132 0.0000 C 0 0 0 0 0
0.0023 -1.5022 0.0000 C 0 0 0 0 0
-1.5284 -1.4655 0.0000 C 0 0 0 0 0
-2.2648 -0.1072 0.0000 C 0 0 0 0 0
-1.4690 1.1987 0.0000 C 0 0 0 0 0
0.0565 1.1609 0.0000 C 0 0 0 0 0
2.3413 -0.2625 0.0000 O 0 0 0 0 0
1 2 2 0 0 0
2 3 1 0 0 0
3 4 2 0 0 0
4 5 1 0 0 0
5 6 2 0 0 0
6 1 1 0 0 0
1 7 1 0 0 0
MDL Information Systems, Inc., San Leandro, CA