CHAPTER 5

Input File Formats

The Molconn-Z software package accepts four file formats (SMILES, SDF, MOL, MOL2) for the input of molecular structure. The flow of information from input structure files to output is described in Chapter 4. The input structure can be interpreted either by Daylight's SMILES toolkit (a separate license is required from Daylight) or by OpenEye Software's OELIB (open source). The type of input is defined in the control file with the INPUTFORMAT keyword (options: daylightsmiles, oelibsmiles, oelibsdf, oelibmol, oelibmol2). The input molecular structure information is contained in the file that is provided as the second argument to the molconnz command, <name of input structure file>. With each of these file formats, the file can contain one or more molecules.

There are five options for structure input format, as follows:

  1. SMILES using Daylight Toolkit (unix versions - optional)
  2. SMILES code using OELIB
  3. Tripos MOL2 (Sybyl) format using OELIB
  4. MDL Information Systems, Inc. MOL format using OELIB
  5. MDL Information Systems, Inc. SDFile format using OELIB
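
As a rough illustration of how these five options relate to file types, the following Python sketch picks an INPUTFORMAT value from a file name. The keyword values are the ones listed above; the function name guess_inputformat and the extension mapping (taken from the demo file names .smi, .sdf, .mol and .mol2) are assumptions made for this example and are not part of Molconn-Z.

```python
from pathlib import Path

# INPUTFORMAT values as listed in this chapter. The extension mapping is an
# assumption based on the demo file names (.smi, .sdf, .mol, .mol2).
OELIB_FORMATS = {
    ".smi": "oelibsmiles",
    ".sdf": "oelibsdf",
    ".mol": "oelibmol",
    ".mol2": "oelibmol2",
}


def guess_inputformat(filename, use_daylight=False):
    """Return an INPUTFORMAT value for a structure file (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in OELIB_FORMATS:
        raise ValueError(f"unrecognized structure file extension: {suffix!r}")
    # Only SMILES input has a Daylight alternative to OELIB.
    if suffix == ".smi" and use_daylight:
        return "daylightsmiles"
    return OELIB_FORMATS[suffix]


if __name__ == "__main__":
    for name in ("demo2.smi", "demo4.mol2", "demo5.sdf", "demo6a.mol"):
        print(name, guess_inputformat(name))
```

The value returned would still have to be entered in the control file by hand; the control file syntax itself is not shown here.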

The input file can be created or obtained in several ways:

    by using a text editor and directly entering the necessary information as described below for SMILES, MOL2, MOL or SDF structure information;
    by using a graphical input program or database that also produces a connection table corresponding to one of the formats described below;
    by obtaining the connection table from a preexisting database and converting it to one of the formats described below.

These formats are briefly described in subsequent sections. However, no attempt is made here to give a complete description of any particular format. Rather, our purpose is to illustrate how such formats may be utilized in the Molconn-Z software. For specific information about each of the formats, the user is directed to the appropriate company representative or literature source.

    Use of SMILES Input Format

    Molconn-Z has two options for using SMILES input: one from Daylight Chemical Information Systems, Inc. (INPUTFORMAT keyword daylightsmiles) and one from OpenEye Software (INPUTFORMAT keyword oelibsmiles). These are ASCII files that contain one or more structures (one per line), each consisting of a linear SMILES code followed by the molecule name. SMILES strings can be taken from an existing database, typed in with a text editor, or produced from another file with a conversion tool such as BABEL (or OELIB).

    SMILES code was developed by David Weininger (D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31-36, 1988) to provide a string code for the input of molecular structure. The user is referred to this reference and subsequent papers for the description of the SMILES code and techniques for creation of SMILES code for molecular structures. The following two structures illustrate the application of SMILES code. Essentially, the chemical graph is reduced to a tree (noncyclic) graph by removing one bond for each ring; the atoms between which the bond was broken are labeled with a number. Branches are enclosed in parentheses.  (For more information on SMILES see Chapter 3 of the Daylight Theory Manual.)
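
    The ring-closure numbering described above can be illustrated with a small Python sketch. It is not a SMILES parser and is not how Molconn-Z, Daylight or OELIB read SMILES; it only counts matched ring-closure labels, which equals the number of bonds that were "broken" to write the string. The function names are made up for this example.

```python
def ring_closure_labels(smiles):
    """Return the ring-closure labels of a SMILES string in order of appearance."""
    labels = []
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms such as [N+] or [13C]; digits inside
            # brackets are isotopes, charges or H counts, not ring labels.
            i = smiles.index("]", i) + 1
        elif ch == "%":
            # Two-digit ring labels are written as %nn, for example %10.
            labels.append(int(smiles[i + 1:i + 3]))
            i += 3
        elif ch.isdigit():
            labels.append(int(ch))
            i += 1
        else:
            i += 1
    return labels


def count_ring_bonds(smiles):
    """Count ring-closure bonds: each label opens a ring bond, then closes it."""
    open_labels = set()
    closed = 0
    for label in ring_closure_labels(smiles):
        if label in open_labels:
            open_labels.remove(label)
            closed += 1
        else:
            open_labels.add(label)
    if open_labels:
        raise ValueError(f"unclosed ring labels: {sorted(open_labels)}")
    return closed


if __name__ == "__main__":
    print(count_ring_bonds("c1ccccc1"))           # benzene: 1
    print(count_ring_bonds("C1(CCCC1)C2CCCCC2"))  # two separate rings: 2
```

    Note that a label may be reused once its ring bond has been closed, which is why the sketch tracks open labels rather than counting distinct numbers.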


    The SGI, SUN, and LINUX versions of standalone Molconn-Z can also read and decode SMILES files with the Daylight Toolkit SMILES interpreter instead of the OELIB SMILES interpreter. This is an optional feature that requires a run-time SMILES Toolkit license from Daylight Chemical Information Systems, Inc. The Daylight SMILES and OELIB SMILES files have the same format. Each record of a SMILES file, which is generally named with the .smi extension, is simply a SMILES string followed by the molecule name (space delimited). There is no file termination code. This file format matches what is supported by the Daylight database software and will be a useful option for sites that have large databases already encoded in this way. The other potential advantage is that the Daylight Toolkit is the de facto standard for interpretation of SMILES codes, which could be a consideration for those who plan to work with large, complex SMILES libraries on UNIX computers. In our experience, there have been several problems with OELIB misinterpreting molecular structure (see the Molconn-Z 4.10 Release Notes).

    EduSoft includes two demo SMILES files: demo1.smi is a single molecule, benzene, and demo2.smi is a database file that contains 100 structures. Note the molecule name/identifier is at the end of each line, in this case the CAS Registry Number.

    The File demo2.smi as Supplied With All Versions of Molconn-Z Software:

    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C(=O)COC(=O)C 50-03-3
    BrC43C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3Br)(CCC(C4)Cl)C 5337-45-1
    O(C5C(C4C(C3C(C1(C(C2C(CC1)(CCC2C(=C)C)C)CC3)C)(CC4)C)(CC5)C)(C)C)C(=O)c6ccccc6 1617-69-2
    OC1C(C4C(CC1)(C3=C(C2(C(C(CC2)C(CCCC(C)C)C)(CC3=O)C)C)C(=O)C4)C)(C)C 5346-40-7
    OC4CC3C(C2C(C1C(C(CC1)C(O)C)(CC2)C)CC3)(CC4)C 80-92-2
    OC4C(C3C(C2C(C1(C(C(CC1)C(CCC(=O)O)C)(CC2=O)C)C)CC3)(CC4)C)(C)C 5346-42-9
    OC4C(C3C(C2C(C1(C(C(CC1)C(CCC(=O)O)C)(CC2=O)C)C)C(=O)C3)(CC4)C)(C)C 5399-41-7
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 470-03-1
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 470-01-9
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 126-19-2
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 126-18-1
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 77-60-1
    OC4CCC3(C2C(C1C(C(CC1)C(=O)C)(CC2)C)CC=C3C4)C 145-13-1
    OC4CCC3(C2C(C1C(C(CC1)C(=O)C)(CC2)C)CC=C3C4)C 566-63-2
    ClC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C 910-31-6
    O4C(=O)C3C51C(CCC(C1)OC(=O)C)(C2=CCC6(C(C2(C3C4=O)C=C5)CCC6C(=O)C)C)C 25495-42-5
    OC2C3C(C1C(C(CC1)C(=O)C)(C2)C)CCC4=CC(=O)CCC43C 600-57-7
    OC2C3C(C1C(C(CC1)C(=O)C)(C2)C)CCC4=CC(=O)CCC43C 80-75-1
    OC4C3(C(C2C(C1CCC(=O)C=C1CC2)CC3)CC4)C 434-22-0
    O(C)C(=O)C=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(=O)C3)CC4)C 1474-15-3
    OC1C3C(C2C(C1)(C(=CCOC(=O)C)CC2)C)CCC4=CC(=O)CCC43C 5327-59-3
    O(C4C1(C(C3C(CC1)c2c(cc(cc2)O)CC3)CC4)C)C(=O)CCC5CCCC5 313-06-4
    OC(=O)C(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C 5327-60-6
    OC4(C3(C(C2CCC1=CC(=O)CCC1(C2=CC3)C)CC4)C)C 1039-17-4
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C 1807-02-9
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C 1043-10-3
    O1C32C1CC5(C(C2CCC4=CC(=O)CCC43C)CCC5(O)C)C 1042-33-7
    O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC)OC7CC6(C(C5C(C3(C(C(CC3)C4=CC(=O)OC4)(CC5)C)O)CC6)(CC7)C=O)O)C 560-53-2
    OC5C(C4C(C3C(C1(C(C2C(CC1)(CCC2C(=C)C)CO)CC3)C)(CC4)C)(CC5)C)(C)C 473-98-3
    BrC(=C(C)C)CCC(C1C4(C(CC1)(C3=C(C2(C(C(C(CC2)O)(C)C)CC3)C)CC4)C)C)C 50719-45-4
    OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 17608-41-2
    OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 516-92-7
    OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 80-97-7
    OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 360-68-9
    C1(CCCC1)C2CCCCC2 1606-08-2
    C43(C(C2C(C1(CCC=CC1=CC2)C)CC3)CCC4C(CCCC(C)C)C)C 747-90-0
    C1(C(CCC1)C)C2CCCCC2 5405-90-3
    S(=O)(=O)(NC(=O)CCC(C4C3(C(C2C(C1(C(CC(CC1)O)CC2)C)CC3O)CC4)C)C)c5ccc(cc5)N 5407-24-9
    OC3C4(C(C1C(C2(C(CC1O)CC(CC2)O)C)C3)CCC4C(CCC(=O)O)C)C 81-25-4
    N(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)C 1865-62-9
    OC4(C3(C(C2C(C1(C(=CC(=O)C=C1)CC2)C)C(=O)C3)CC4C)C)C(=O)CO 1247-42-3
    [N+](=O)([O-])c1c(ccc(c1)[N+](=O)[O-])NN=C5CCC4(C3C(C2C(C(CC2)O)(CC3)C)CCC4=C5)C 2347-93-5
    N6C1C(OC5(C1C)CCC4C3C(C2(CCC(CC2=CC3)O)C)C(=O)C4=C5C)CC(C6)C 469-59-0
    O1C(C(C(C(C1C)O)O)O)OC6CCC5(C4C(C2(C(C(CC2)C3=COC(=O)C=C3)(CC4)C)O)CCC5=C6)C 466-06-8
    O1C(CC(C(C1C)O)OC)OC6CC5(C(C4C(C2(C(C(CC2)C3=CC(=O)OC3)(CC4)C)O)CC5)(CC6)C=O)O 508-77-0
    O1C(C(C(C(C1CO)O)O)O)OC6CCC5(C4C(C2(C(C(CC2)C3=COC(=O)C=C3)(CC4)C)O)(CC(C5=C6)OC(=O)C)O)C 507-60-8
    N81C(C(C7(C(C1)C6(C(C5C2(OC3(C(C2(CCC3OC(=O)c4cc(c(cc4)OC)OC)C)CC5)O)C6)(CC7O)O)O)O)(O)C)CCC(C8)C 71-62-5
    O1C(C(C(C(C1CO)O)O)O)OC2C(C(OC(C2O)C)OC7CCC6(C5C(C3(C(C(CC3)C4=COC(=O)C=C4)(CC5)C)O)CCC6=C7)C)O 124-99-2
    N71C(C(C6C(C1)C5C(C4C2(OC3(C(C2(CCC3OC(=O)C(O)(CC)C)C)C(C4OC(=O)C)OC(=O)C)O)C5)(C(C6O)OC(=O)C(CC)C)O)(O)C)CCC(C7)C 143-57-7
    N71C(C(C6C(C1)C5C(C4C2(OC3(C(C2(CCC3OC(=O)C(O)(C(O)C)C)C)C(C4OC(=O)C)OC(=O)C)O)C5)(C(C6O)OC(=O)C(CC)C)O)(O)C)CCC(C7)C 124-97-0
    O1C(CC(C(C1C)OC2OC(C(C(C2)O)O)C)O)OC3C(OC(CC3O)OC8CC7C(C6C(C4(C(C(CC4)C5=CC(=O)OC5)(CC6)C)O)CC7)(CC8)C)C 71-63-6
    O1C(C(C(C(C1COC2OC(C(C(C2O)O)O)CO)O)O)O)OC3C(OC(CC3OC)OC8CC7(C(C6C(C4(C(C(CC4)C5=CC(=O)OC5)(CC6)C)O)CC7)(CC8)C=O)O)C 33279-57-1
    O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C7C(C5(C(C(CC5)C6=CC(=O)OC6)(CC7)C)O)CC8)(CC9)C)C)C)C 17575-20-1
    O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C5C(C6(C(C(C5)O)(C(CC6)C7=CC(=O)OC7)C)O)CC8)(CC9)C)C)C)C 17575-22-3
    O1C=C(C=CC1=O)C5C4(C(C3(C(C2(CCC(C=C2C(C3)OC(=O)C)O)C)CC4)O)(CC5)O)C 507-59-5
    O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C7C(C5(C(C(C(C5)O)C6=CC(=O)OC6)(CC7)C)O)CC8)(CC9)C)C)C)C 17575-21-2
    OC4CCC3(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC=C3C4)C 83-48-7
    OC4CCC3(C2C(C1C(C(CC1)C(CCC(C(C)C)CC)C)(CC2)C)CC=C3C4)C 83-46-5
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C=C 1235-98-9
    O(C4CC3C(C2C(C1C(C(CC1)(O)C(=O)C)(CC2)C)CC3)(CC4)C)C(=O)C 5456-44-0
    O(C4CCC3(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC=C3C4)C)C(=O)C 4651-48-3
    O(C4CC3C(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC3)(CC4)C)C(=O)C 13010-52-1
    O(C6CC5C(C4C(C1C(C(CC1)C(CC=C(c2ccccc2)c3ccccc3)C)(CC4)C)CC5)(CC6)C)C(=O)C 4144-29-0
    OC4CC3C(C2C(C1C(C(CC1)C(CCC(=O)OC)C)(CC2)C)CC3)(CC4)C 15074-01-8
    OC4CC3C(C2C(C1C(C(CC1)C(CCC(=O)OC)C)(CC2)C)CC3)(CC4)C 1249-75-8
    OC4(C3(C(C1C(C2(C(CC1)CC(=O)CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)C 3751-02-8
    OC4(C3(C(C1C(C2(C(CC1)CC(=O)CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)C 1499-59-8
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)CCCCCCCCCCCCCCCCC 5432-63-3
    OC(=O)CCC(C4C3(C(C1C(C2(C(CC1=O)CC(=O)CC2)C)CC3=O)CC4)C)C 81-23-2
    OC3C4(C(C2C(C1(C(CC(CC1)O)CC2)C)C3)CCC4C(CCC(=O)O)C)C 30635-00-8
    OC3C4(C(C2C(C1(C(CC(CC1)O)CC2)C)C3)CCC4C(CCC(=O)O)C)C 83-44-3
    OC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C 57-88-5
    O(C4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C)C(=O)C 604-35-3
    ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 128-10-9
    ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 60-57-1
    ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 72-20-8
    OC4(C1(C(C3C(C(C1)O)C2(C(=CC(=O)C=C2)CC3)C)CC4)C)C(=O)CO 50-24-8
    OC1C3C(C2C(C1)(C(=CCO)CC2)C)CCC4=CC(=O)CCC43C 3103-13-7
    BrC4CC3(C1C(C2C(CC1=O)(C(=CC(=O)OC)CC2)C)CCC3=CC4=O)C 5415-46-3
    BrC4CC3(C1C(C2C(CC1O)(C(=CC(=O)OC)CC2)C)CCC3=CC4=O)C 5415-47-4
    O=C4CCC3(C2C(C1C(C(CC1)C(C)C=O)(CC2)C)CCC3=C4)C 66289-21-2
    O=C4CCC3(C2C(C1C(C(CC1)C(C)C=O)(CC2)C)CCC3=C4)C 3986-89-8
    S1C5(NC(C1)C(=O)O)CC4C(C3C(C2C(C(CC2)C(=O)C)(CC3=O)C)CC4)(CC5)C 6293-78-3
    OC4(C1(C(C3C(C(C1)O)C2(C(=CC(=O)C=C2)CC3)C)CC4)C)C(=O)COC(=O)CCC(=O)O 1715-33-9
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C(=O)COC(=O)CCC(=O)O 125-04-2
    O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CCC5CCCC5 58-20-8
    N1(CCCCC1)C=C(C5C4(C(C3C(C2(CCC(=O)C=C2CC3)C)CC4)CC5)C)C 24377-48-8
    O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CC 58769-88-3
    O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CC 57-85-2
    S=P(OC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C)(OCC)OCC 24352-66-7
    N%10C9(OC8C(C7(C(C6C(C1(C(CC(CC1)OC2OC(C(C(C2O)O)OC3OC(C(C(C3OC4OC(C(C(C4O)O)O)CO)OC5OCC(C(C5O)O)O)O)CO)CO)CC6)C)CC7)C8)C)C9C)CCC(C%10)C 17406-45-0
    O=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 18485-76-2
    O=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 63-05-8
    OC4(C3(C(C2C(C1CCC(=O)C=C1CC2)CC3)CC4)C)C#C 68-22-4
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C#C 434-03-7
    OC5C1(C(C4C(CC1)c2c(cc(cc2)OC(=O)c3ccccc3)CC4)CC5)C 50-50-0
    O(CC(=O)C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)C 56-47-3
    Oc1cc4c(cc1)C2C(C3C(CC2)(C(=O)CC3)C)CC4 53-16-7
    OC4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 571-41-5
    OC4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 86335-11-7
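
    A record layout like the one listed above (SMILES string, then the molecule name, space delimited) can be read with a few lines of Python. This is only a sketch for inspecting or checking a .smi file before it is given to Molconn-Z; the function names are hypothetical.

```python
def parse_smi_line(line):
    """Split one .smi record into (smiles, name); return None for a blank line."""
    # Split on the first run of whitespace only, so names may contain spaces.
    fields = line.split(None, 1)
    if not fields:
        return None
    smiles = fields[0]
    name = fields[1].strip() if len(fields) > 1 else ""
    return smiles, name


def parse_smi(text):
    """Return a list of (smiles, name) tuples, one per non-blank record."""
    records = []
    for line in text.splitlines():
        record = parse_smi_line(line)
        if record is not None:
            records.append(record)
    return records


if __name__ == "__main__":
    sample = (
        "C1(CCCC1)C2CCCCC2 1606-08-2\n"
        "Oc1cc4c(cc1)C2C(C3C(CC2)(C(=O)CC3)C)CC4 53-16-7\n"
    )
    for smiles, name in parse_smi(sample):
        print(name, smiles)
```

    Because there is no file termination code, the end of the file simply ends the list of records.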

    Use of MOL2 Input Format

    Reading the Tripos MOL2 format is provided by OELIB-linked software. The MOL2 file structure is described at www.tripos.com/services/mol2/index.html. The keyword INPUTFORMAT "oelibmol2" is required for this file format. MOL2 files are much more complex than SMILES files and can contain 3-D coordinates, although Molconn-Z does not require these coordinates; SMILES files, on the other hand, take more time to interpret. A typical source of a MOL2 file would be Tripos' Sybyl, although other programs can produce MOL2 files. MOL2 files can also be created with a text editor, or converted from other file types using a program like BABEL.

    EduSoft includes a demo MOL2 database file that contains 10 structures.

    The First Two Structures in the demo4.mol2 File Supplied With Molconn-Z Software:

    # Name: B_ESTRADIOL
    # Creating user name: gkellogg
    # Creation time: Tue Nov 9 13:17:55 1993
    # Modifying user name: gkellogg
    # Modification time: Tue Nov 9 13:27:21 1993

    @<TRIPOS>MOLECULE
    B_ESTRADIOL
    44 47 1 0 0
    SMALL
    USER_CHARGES
    INVALID_CHARGES

    @<TRIPOS>ATOM
      1 C0     3.5269   1.2282  -0.5910 C.3   1 MOL1  -0.3000
      2 C1     2.0219   0.8438  -0.1562 C.3   1 MOL1   0.0000
      3 C2     1.0537   1.8981  -0.7458 C.3   1 MOL1  -0.2000
      4 C3     0.0299   1.4727  -1.7954 C.3   1 MOL1  -0.2000
      5 C4    -0.6674   0.2085  -1.3476 C.3   1 MOL1  -0.1000
      6 C5     0.3651  -0.9761  -1.1904 C.3   1 MOL1  -0.1000
      7 C6    -0.3887  -2.2712  -0.7075 C.3   1 MOL1  -0.2000
      8 C7    -1.9470  -2.2001  -0.8458 C.3   1 MOL1  -0.2000
      9 C8    -2.5481  -0.9510  -0.2037 C.ar  1 MOL1   0.0000
     10 C9    -3.6511  -1.0182   0.6047 C.ar  1 MOL1  -0.1000
     11 C10   -4.1448   0.1316   1.2039 C.ar  1 MOL1   0.0300
     12 O11   -5.2368   0.0644   2.0110 O.3   1 MOL1  -0.3800
     13 C12   -3.5084   1.3467   0.9895 C.ar  1 MOL1  -0.1000
     14 C13   -2.3852   1.4139   0.1732 C.ar  1 MOL1  -0.1000
     15 C14   -1.8960   0.2944  -0.4257 C.ar  1 MOL1   0.0000
     16 C15    1.7503  -0.6620  -0.6221 C.3   1 MOL1  -0.1000
     17 C16    2.3448  -1.4886   0.6115 C.3   1 MOL1  -0.2000
     18 C17    1.8447  -0.7429   1.8521 C.3   1 MOL1  -0.2000
     19 C18    1.9854   0.7486   1.4492 C.3   1 MOL1  -0.0700
     20 O19    3.0303   1.3907   2.2055 O.3   1 MOL1  -0.3800
     21 H21    3.7606   2.2203  -0.2175 H     1 MOL1   0.1000
     22 H22    3.6066   1.2177  -1.6741 H     1 MOL1   0.1000
     23 H23    4.2684   0.5450  -0.1910 H     1 MOL1   0.1000
     24 H24    0.4635   2.4607   0.0315 H     1 MOL1   0.1000
     25 H25    1.5259   2.7971  -1.2309 H     1 MOL1   0.1000
     26 H26    0.4426   1.4496  -2.8310 H     1 MOL1   0.1000
     27 H27   -0.6977   2.3024  -1.9616 H     1 MOL1   0.1000
     28 H28   -1.2830   0.0097  -2.2984 H     1 MOL1   0.1000
     29 H29    0.5285  -1.2724  -2.2903 H     1 MOL1   0.1000
     30 H30   -0.0612  -3.1901  -1.1806 H     1 MOL1   0.1000
     31 H31   -0.2539  -2.4288   0.3800 H     1 MOL1   0.1000
     32 H32   -2.3687  -3.0871  -0.3978 H     1 MOL1   0.1000
     33 H33   -2.1839  -2.1834  -1.9121 H     1 MOL1   0.1000
     34 H34   -4.1287  -1.9704   0.7788 H     1 MOL1   0.1000
     35 H35   -5.5691   0.8589   2.4352 H     1 MOL1   0.3500
     36 H36   -3.8869   2.2397   1.4650 H     1 MOL1   0.1000
     37 H37   -1.9092   2.3662   0.0343 H     1 MOL1   0.1000
     38 H38    2.5108  -1.0804  -1.3610 H     1 MOL1   0.1000
     39 H39    2.0677  -2.5309   0.5916 H     1 MOL1   0.1000
     40 H40    3.4400  -1.5052   0.6368 H     1 MOL1   0.1000
     41 H41    0.8164  -1.0696   2.0500 H     1 MOL1   0.1000
     42 H42    2.4339  -1.0074   2.7284 H     1 MOL1   0.1000
     43 H43    1.0919   1.2266   1.8949 H     1 MOL1   0.1000
     44 H44    3.8454   0.8990   2.0823 H     1 MOL1   0.3500
    @<TRIPOS>BOND
      1   1   2 1
      2   1  21 1
      3   1  22 1
      4   1  23 1
      5   2   3 1
      6   2  16 1
      7   2  19 1
      8   3   4 1
      9   3  24 1
     10   3  25 1
     11   4   5 1
     12   4  26 1
     13   4  27 1
     14   5   6 1
     15   5  15 1
     16   5  28 1
     17   6   7 1
     18   6  16 1
     19   6  29 1
     20   7   8 1
     21   7  30 1
     22   7  31 1
     23   8   9 1
     24   8  32 1
     25   8  33 1
     26   9  10 ar
     27   9  15 ar
     28  10  11 ar
     29  10  34 1
     30  11  12 1
     31  11  13 ar
     32  12  35 1
     33  13  14 ar
     34  13  36 1
     35  14  15 ar
     36  14  37 1
     37  16  17 1
     38  16  38 1
     39  17  18 1
     40  17  39 1
     41  17  40 1
     42  18  19 1
     43  18  41 1
     44  18  42 1
     45  19  20 1
     46  19  43 1
     47  20  44 1
    @<TRIPOS>SUBSTRUCTURE
      1 MOL1 1 TEMP 0 **** **** 0 ROOT

    # Name: ACETYLCHOLINE
    # Creating user name: gkellogg
    # Creation time: Tue Nov 9 13:28:28 1993
    # Modifying user name: gkellogg
    # Modification time: Tue Nov 9 13:30:31 1993

    @<TRIPOS>MOLECULE
    ACETYLCHOLINE
    26 25 1 0 0
    SMALL
    USER_CHARGES
    INVALID_CHARGES

    @<TRIPOS>ATOM
      1 C0    -4.3973  -0.2463   0.5026 C.3   1 MOL1  -0.3000
      2 C1    -2.9663  -0.5262   0.1201 C.2   1 MOL1   0.4100
      3 O2    -2.4730  -1.6120   0.3866 O.2   1 MOL1  -0.3800
      4 O3    -2.2053   0.4016  -0.5122 O.3   1 MOL1  -0.1800
      5 C4    -0.8689   0.1813  -0.8888 C.3   1 MOL1  -0.0500
      6 C5     0.1357   0.3156   0.2722 C.3   1 MOL1   0.2200
      7 N6     1.5948   0.1395   0.0157 N.4   1 MOL1  -0.6800
      8 C7     2.3386   0.3304   1.3574 C.3   1 MOL1   0.1200
      9 C8     2.1496   1.1730  -0.9461 C.3   1 MOL1   0.1200
     10 C9     1.9383  -1.2514  -0.4837 C.3   1 MOL1   0.1200
     11 H11   -4.6967   0.7455   0.1659 H     1 MOL1   0.1000
     12 H12   -5.0472  -0.9927   0.0417 H     1 MOL1   0.1000
     13 H13   -4.5015  -0.3041   1.5875 H     1 MOL1   0.1000
     14 H14   -0.5718   0.9172  -1.6399 H     1 MOL1   0.1000
     15 H15   -0.7202  -0.8022  -1.3248 H     1 MOL1   0.1000
     16 H16   -0.2368  -0.3946   1.0400 H     1 MOL1   0.1000
     17 H17   -0.0872   1.3068   0.7218 H     1 MOL1   0.1000
     18 H18    2.1823   1.3190   1.8019 H     1 MOL1   0.1000
     19 H19    3.4141   0.2178   1.2571 H     1 MOL1   0.1000
     20 H20    2.0340  -0.3870   2.1268 H     1 MOL1   0.1000
     21 H21    1.7384   1.0660  -1.9552 H     1 MOL1   0.1000
     22 H22    1.9316   2.2021  -0.6474 H     1 MOL1   0.1000
     23 H23    3.2344   1.1254  -1.0896 H     1 MOL1   0.1000
     24 H24    1.5616  -2.0481   0.1641 H     1 MOL1   0.1000
     25 H25    3.0109  -1.4412  -0.5993 H     1 MOL1   0.1000
     26 H26    1.5188  -1.4541  -1.4741 H     1 MOL1   0.1000
    @<TRIPOS>BOND
      1   1   2 1
      2   1  11 1
      3   1  12 1
      4   1  13 1
      5   2   3 2
      6   2   4 1
      7   4   5 1
      8   5   6 1
      9   5  14 1
     10   5  15 1
     11   6   7 1
     12   6  16 1
     13   6  17 1
     14   7   8 1
     15   7   9 1
     16   7  10 1
     17   8  18 1
     18   8  19 1
     19   8  20 1
     20   9  21 1
     21   9  22 1
     22   9  23 1
     23  10  24 1
     24  10  25 1
     25  10  26 1
    @<TRIPOS>SUBSTRUCTURE
      1 MOL1 1 TEMP 0 **** **** 0 ROOT


    Note that a database of molecules can be contained within a single file. Each new molecule in the file starts with the line:

    @<TRIPOS>MOLECULE
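
    Because every molecule begins with this record, a multi-molecule MOL2 file can be split with a short Python sketch such as the one below. It assumes the usual MOL2 layout in which the molecule name and the counts line (atoms, bonds, ...) follow directly after the record; the function names are hypothetical, and Molconn-Z itself reads MOL2 files through OELIB, not through code like this.

```python
MOLECULE_RECORD = "@<TRIPOS>MOLECULE"


def split_mol2(text):
    """Split MOL2 text into one list of lines per molecule."""
    molecules = []
    current = None
    for line in text.splitlines():
        if line.strip() == MOLECULE_RECORD:
            # A new molecule starts here; lines before the first record
            # (for example leading '#' comments) are dropped.
            current = [line]
            molecules.append(current)
        elif current is not None:
            current.append(line)
    return molecules


def mol2_summary(block):
    """Return (name, atom count, bond count) from the lines of one molecule."""
    name = block[1].strip()
    counts = block[2].split()
    n_atoms = int(counts[0])
    # The bond count is optional on the counts line.
    n_bonds = int(counts[1]) if len(counts) > 1 else 0
    return name, n_atoms, n_bonds


if __name__ == "__main__":
    sample = "\n".join([
        "# Name: B_ESTRADIOL",
        "@<TRIPOS>MOLECULE",
        "B_ESTRADIOL",
        "44 47 1 0 0",
        "SMALL",
        "@<TRIPOS>MOLECULE",
        "ACETYLCHOLINE",
        "26 25 1 0 0",
        "SMALL",
    ])
    for block in split_mol2(sample):
        print(mol2_summary(block))
```

    Comment lines that precede a later molecule end up at the tail of the previous block in this sketch, which is harmless for counting but worth knowing if the blocks are written back out.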

    Sybyl is developed and distributed by
    Tripos, Inc
    St. Louis, MO


    Use of SDFile (SDF) and MOL Input Formats

    MDL supports two similar file formats: the older standard MOL file (Molfile), which holds strictly a single molecular structure, and the SDFile (SDF), which holds structure data in Molfile form. The SDFile also provides for an unspecified number of records that contain data of various types for each molecule. The data may be numerical or alphabetic. Reading the MDL MOL and SDF formats is provided by OELIB-linked software. The keyword INPUTFORMAT "oelibsdf" is required for the SDF file format and "oelibmol" is required for the MOL file format. Please note that a Sybyl "MOL" file may not be usable in Molconn-Z, as it may not conform to the MDL standards.

    These file formats (SDF and MOL) are carefully described in A. Dalby, J. G. Nourse, et al., J. Chem. Inf. Comput. Sci., 32, 244-255 (1992) or online at www.mdl.com/solutions/white_papers/ctfile_formats.jsp.

    Using the SDFile or Molfile format produced by MDL software is straightforward. The user first produces the desired molecule files with MDL software in its usual manner (or by converting with a suitable conversion tool like BABEL). A MOL file is a single molecule with no extraneous information. An SDF file is essentially a series of MOL files, each of which terminates with the line "M  END" and may then be followed by additional information such as BOILING POINT, MELTING POINT, etc. An SDF file can hold any number of molecules, which are separated by blank lines and a line containing "$$$$". See the example below and the reference given above. The whole SDFile is terminated with a blank record.
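
    The record structure just described can be sketched in Python as follows: records are separated at the "$$$$" lines, and within each record the Molfile part ends at the "M  END" line, with any data items after it. This is only an illustration with hypothetical function names; it does not validate the connection table and is not the reader used by Molconn-Z.

```python
def split_sdf(text):
    """Split SDF text into one list of lines per molecule record."""
    records = []
    current = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    # Keep a final record that lacks a closing $$$$ only if it has content.
    if any(item.strip() for item in current):
        records.append(current)
    return records


def molfile_part(record):
    """Return the lines of a record up to and including the 'M  END' line."""
    for index, line in enumerate(record):
        # Compare tokens so the amount of spacing in 'M  END' does not matter.
        if line.split() == ["M", "END"]:
            return record[:index + 1]
    return record


if __name__ == "__main__":
    sample = "\n".join([
        "Benzoic Acid", "  ChemDraw", "",
        "  1  0  0  0  0  0  0  0  0  0999 V2000",
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END", "> 25", "249.2", "", "$$$$",
    ])
    for record in split_sdf(sample):
        mol = molfile_part(record)
        print(record[0], "molfile lines:", len(mol),
              "data lines:", len(record) - len(mol))
```

    A MOL file without an "M  END" line is returned whole by molfile_part, which is a deliberate choice so that older files without that line are not truncated.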

    For example, two SDF files are included with Molconn-Z: demo3.sdf, which contains 50 anonymous structures, and demo5.sdf, which contains 12 common structures. Also included are two MOL files: demo6a.mol, which is phenol, and demo6b.mol, which is a substituted phenol.

    The First Two Structures in the demo5.sdf file Supplied with Molconn-Z Software:

    Benzoic Acid
      ChemDraw02260010222D

      9  9  0  0  0  0  0  0  0  0999 V2000
       -2.4700    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       -1.7200   -0.0025    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -2.7800   -1.0625    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       -0.2200   -0.0025    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5300   -1.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300   -1.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.7825   -0.0025    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300    1.2975    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5300    1.2975    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
      1  2  1  0  0  0  0
      2  3  2  0  0  0  0
      2  4  1  0  0  0  0
      4  5  1  0  0  0  0
      5  6  2  0  0  0  0
      6  7  1  0  0  0  0
      7  8  2  0  0  0  0
      8  9  1  0  0  0  0
      4  9  2  0  0  0  0
    M  END
    > 25
    249.2

    > 25
    122.4

    > 25
    Benzenecarboxylic acid

    > 25
    3-04-00

    $$$$
    m-methylbenzoic acid
      ChemDraw02250014002D

     10 10  0  0  0  0  0  0  0  0999 V2000
       -2.4700    1.9500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       -1.7200    0.6475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -2.7800   -0.4125    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       -0.2200    0.6475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5300   -0.6500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300   -0.6500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.7825    0.6475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300    1.9475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5300    1.9475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.7825   -1.9525    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
      1  2  1  0  0  0  0
      2  3  2  0  0  0  0
      2  4  1  0  0  0  0
      4  5  1  0  0  0  0
      5  6  2  0  0  0  0
      6  7  1  0  0  0  0
      7  8  2  0  0  0  0
      8  9  1  0  0  0  0
      4  9  2  0  0  0  0
      6 10  1  0  0  0  0
    M  END
    > 25
    263

    > 25
    111-113

    > 25
    m-Toluic acid

    > 25
    3-04-00

    $$$$

    (Note the "blank" record that terminates the file.)

    The Phenol Structure in the demo6a.mol file Supplied with Molconn-Z Software:

    PHENOL
    JFMACCS 8302248414282D 1   0.00213     0.00000     0
    JF FOR PROGRAM MOLCONNZ
      7  7  0  0  0
        0.7943   -0.2132    0.0000 C   0  0  0  0  0
        0.0023   -1.5022    0.0000 C   0  0  0  0  0
       -1.5284   -1.4655    0.0000 C   0  0  0  0  0
       -2.2648   -0.1072    0.0000 C   0  0  0  0  0
       -1.4690    1.1987    0.0000 C   0  0  0  0  0
        0.0565    1.1609    0.0000 C   0  0  0  0  0
        2.3413   -0.2625    0.0000 O   0  0  0  0  0
      1  2  2  0  0  0
      2  3  1  0  0  0
      3  4  2  0  0  0
      4  5  1  0  0  0
      5  6  2  0  0  0
      6  1  1  0  0  0
      1  7  1  0  0  0

    MDL Information Systems, Inc.
    San Leandro, CA