CHAPTER 5

Input File Formats

HintLogP accepts molecular structure input in three file formats: SMILES, SDF, and MOL2. The flow of information from input structure files to output is described in Using HintLogP. The input structure can be interpreted either by Daylight's SMILES Toolkit (a separate license is required from Daylight) or by OpenEye Software's OELIB (open source). The type of input is defined in the control file with the keyword INPUTFORMAT (options: daylightsmiles, oelibsmiles, oelibsdf, oelibmol2). The input molecular structure information is contained in the file that is provided as the second argument to the molconnz command, <name of input structure file>. In each of these file formats, a single file can contain one or more molecules.
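As an illustration only, the following Python sketch picks an INPUTFORMAT value from a structure file's extension. The four keyword values come from the list above; the extension-to-keyword mapping, the function name guess_inputformat, and the use_daylight flag are assumptions made for this example and are not part of HintLogP.

```python
from pathlib import Path

# Keyword values are the INPUTFORMAT options named in the text.
# The mapping from file extension to keyword is an assumption for illustration.
OELIB_KEYWORDS = {
    ".smi": "oelibsmiles",
    ".sdf": "oelibsdf",
    ".mol2": "oelibmol2",
}


def guess_inputformat(filename, use_daylight=False):
    """Return a plausible INPUTFORMAT value for a structure file name."""
    ext = Path(filename).suffix.lower()
    if ext not in OELIB_KEYWORDS:
        raise ValueError(f"unrecognized structure file extension: {ext!r}")
    # Only SMILES input has a Daylight alternative.
    if ext == ".smi" and use_daylight:
        return "daylightsmiles"
    return OELIB_KEYWORDS[ext]


if __name__ == "__main__":
    for name in ("demo2.smi", "demo4.mol2", "demo5.sdf"):
        print(name, "->", guess_inputformat(name))
```

The value returned would still have to be written into the control file by hand; the control file syntax itself is not covered here.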

There are four options for structure input, as follows:

    SMILES using Daylight Toolkit (unix versions - optional)
    SMILES code using OELIB
    Tripos MOL2 (Sybyl) format
    MDL Information Systems, Inc. SDFile format

The input file can be created or obtained in several ways:
    using a text editor to enter the necessary information directly, as described below for SMILES, MOL2 or SDF structure information;
    using a graphical input program or database that produces a connection table in one of the formats described below;
    obtaining the connection table from a preexisting database and converting it to one of the formats described below.

The three formats are briefly described in subsequent sections. However, no attempt is made here to give a complete description of any particular format. Rather, our purpose is to illustrate how such formats may be utilized in the HintLogP software. For specific information about each of the formats, the user is directed to the appropriate company representative or literature source.

    Use of SMILES Input Format

    HintLogP has two options for SMILES input: one from Daylight Chemical Information Systems, Inc. (INPUTFORMAT keyword daylightsmiles) and one from OpenEye Software (INPUTFORMAT keyword oelibsmiles). Both read ASCII files that contain one or more structures, one per line; each line consists of a linear code for the structure followed by the molecule name. SMILES strings can be taken from an existing database, typed in with a text editor, or produced from another file with a conversion tool such as BABEL (or OELIB).

    SMILES code was developed by David Weininger (D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31-36, 1988) to provide a string code for the input of molecular structure. The user is referred to this reference and subsequent papers for the description of the SMILES code and techniques for creating SMILES code for molecular structures. Essentially, the chemical graph is reduced to a tree (acyclic graph) by removing one bond from each ring; the two atoms between which the bond was broken are labeled with the same number. Branches are enclosed in parentheses. A short description of the SMILES rules is given below. For more information on SMILES see Chapter 3 of the Daylight Theory Manual.

    SMILES (Simplified Molecular Input Line Entry System)

    SMILES is a system for the simple input of molecular structures as text. The system was developed by David Weininger of Daylight Chemical Information Systems, Inc. A complete set of rules for SMILES has been published (D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31-36, 1988). Here we present a brief description.

    Atoms

    Atoms are denoted by their chemical symbols. All hydrogens needed to fill the normal valence are implicitly assumed by SMILES. Atoms in the organic subset {B, C, N, O, P, S, F, Cl, Br, I} are written directly unless it is necessary to denote attached hydrogens explicitly or to specify a formal charge. For example, a quaternary nitrogen is written as [N+]. Other elements must be enclosed in square brackets, e.g., [Co] or [Si]. Aromatic atoms are represented by lowercase letters {c, n, o, s, etc.}.

    Bonds

    The symbols {-, =, #, and :} represent single, double, triple, and aromatic bonds, respectively. Single and aromatic bonds are assumed by SMILES between the appropriate atoms and need not be written. Thus, the string "CCCC" represents butane, as all bonds are assumed to be single and all valences are assumed to be filled with hydrogens. The string "C=C" represents ethylene, and the string "CCO" represents ethanol.

    Branches

    Branches are represented by parentheses. For example, the string "CC(C)(C)O" is t-butanol. Branches (and parentheses) can be nested.

    Rings

    Rings (cyclic structures) are constructed in SMILES by first mentally breaking one of the bonds in each ring and assigning the "broken bond" a single digit reference number. Then, when the structure is encoded, each of the two atoms involved in a broken bond is written as its atom symbol followed by the bond reference number. Thus one SMILES code for cyclopentane is "C1CCCC1", indicating that the two atoms with the bond reference number "1" should be connected. One SMILES code for benzene is "c1ccccc1". Naphthalene may be represented as "c1cc2ccccc2cc1".
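The branch and ring rules above lend themselves to a quick syntactic check before a file is submitted. The Python sketch below is a minimal illustration, not a SMILES parser and not the check performed by either toolkit: it only verifies that parentheses balance and that every ring reference number appears an even number of times. The function name check_smiles_structure is hypothetical. Two-digit ring labels written with a percent sign (such as %10) are also handled.

```python
def check_smiles_structure(smiles):
    """Check balanced branches and paired ring-closure labels.

    This is only a syntactic sanity check; it does not validate
    element symbols, valences, or aromaticity.
    """
    depth = 0           # current nesting depth of branches
    open_rings = set()  # ring labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms such as [N+] or [Si]; anything inside
            # the brackets is not a branch or ring label.
            end = smiles.find("]", i)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring label, e.g. %10
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            open_rings ^= {label}  # toggle: open on first use, close on second
            i += 3
            continue
        elif ch.isdigit():
            open_rings ^= {ch}
        i += 1
    return depth == 0 and not open_rings


if __name__ == "__main__":
    for s in ("C1CCCC1", "c1cc2ccccc2cc1", "CC(C)(C)O", "C1CCCC", "CC(C"):
        print(s, check_smiles_structure(s))
```

A string that passes this check can still be chemically invalid; the SMILES interpreter selected with INPUTFORMAT remains the authority.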


    The SGI, SUN, and LINUX versions of HintLogP can also read and decode SMILES files using the Daylight Toolkit SMILES interpreter instead of the OELIB SMILES interpreter. This is an optional feature that requires a run-time SMILES Toolkit license from Daylight Chemical Information Systems, Inc. The Daylight SMILES and OELIB SMILES files have the same format. Each record of a SMILES file (generally named with the .smi extension) is simply a SMILES string followed by the molecule name, separated by a space. There is no file termination code. This file format matches what is supported by the Daylight database software and will be a useful option for sites that have large databases already encoded in this way. Another potential advantage is that the Daylight Toolkit is the de facto standard for interpretation of SMILES codes, which could be a consideration for those who plan to work with large, complex SMILES libraries on UNIX computers.

    EduSoft includes two demo SMILES files: demo1.smi is a single molecule, benzene, and demo2.smi is a database file that contains 100 structures. Note the molecule name/identifier is at the end of each line, in this case the CAS Registry Number.

    The File demo2.smi as Supplied With All Versions of HintLogP Software:

    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C(=O)COC(=O)C 50-03-3
    BrC43C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3Br)(CCC(C4)Cl)C 5337-45-1
    O(C5C(C4C(C3C(C1(C(C2C(CC1)(CCC2C(=C)C)C)CC3)C)(CC4)C)(CC5)C)(C)C)C(=O)c6ccccc6 1617-69-2
    OC1C(C4C(CC1)(C3=C(C2(C(C(CC2)C(CCCC(C)C)C)(CC3=O)C)C)C(=O)C4)C)(C)C 5346-40-7
    OC4CC3C(C2C(C1C(C(CC1)C(O)C)(CC2)C)CC3)(CC4)C 80-92-2
    OC4C(C3C(C2C(C1(C(C(CC1)C(CCC(=O)O)C)(CC2=O)C)C)CC3)(CC4)C)(C)C 5346-42-9
    OC4C(C3C(C2C(C1(C(C(CC1)C(CCC(=O)O)C)(CC2=O)C)C)C(=O)C3)(CC4)C)(C)C 5399-41-7
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 470-03-1
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 470-01-9
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 126-19-2
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 126-18-1
    O2C1(OCC(CC1)C)C(C3C2CC5C3(CCC6C4(C(CC(CC4)O)CCC65)C)C)C 77-60-1
    OC4CCC3(C2C(C1C(C(CC1)C(=O)C)(CC2)C)CC=C3C4)C 145-13-1
    OC4CCC3(C2C(C1C(C(CC1)C(=O)C)(CC2)C)CC=C3C4)C 566-63-2
    ClC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C 910-31-6
    O4C(=O)C3C51C(CCC(C1)OC(=O)C)(C2=CCC6(C(C2(C3C4=O)C=C5)CCC6C(=O)C)C)C 25495-42-5
    OC2C3C(C1C(C(CC1)C(=O)C)(C2)C)CCC4=CC(=O)CCC43C 600-57-7
    OC2C3C(C1C(C(CC1)C(=O)C)(C2)C)CCC4=CC(=O)CCC43C 80-75-1
    OC4C3(C(C2C(C1CCC(=O)C=C1CC2)CC3)CC4)C 434-22-0
    O(C)C(=O)C=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(=O)C3)CC4)C 1474-15-3
    OC1C3C(C2C(C1)(C(=CCOC(=O)C)CC2)C)CCC4=CC(=O)CCC43C 5327-59-3
    O(C4C1(C(C3C(CC1)c2c(cc(cc2)O)CC3)CC4)C)C(=O)CCC5CCCC5 313-06-4
    OC(=O)C(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C 5327-60-6
    OC4(C3(C(C2CCC1=CC(=O)CCC1(C2=CC3)C)CC4)C)C 1039-17-4
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C 1807-02-9
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C 1043-10-3
    O1C32C1CC5(C(C2CCC4=CC(=O)CCC43C)CCC5(O)C)C 1042-33-7
    O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC)OC7CC6(C(C5C(C3(C(C(CC3)C4=CC(=O)OC4)(CC5)C)O)CC6)(CC7)C=O)O)C 560-53-2
    OC5C(C4C(C3C(C1(C(C2C(CC1)(CCC2C(=C)C)CO)CC3)C)(CC4)C)(CC5)C)(C)C 473-98-3
    BrC(=C(C)C)CCC(C1C4(C(CC1)(C3=C(C2(C(C(C(CC2)O)(C)C)CC3)C)CC4)C)C)C 50719-45-4
    OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 17608-41-2
    OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 516-92-7
    OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 80-97-7
    OC4CC3C(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC3)(CC4)C 360-68-9
    C1(CCCC1)C2CCCCC2 1606-08-2
    C43(C(C2C(C1(CCC=CC1=CC2)C)CC3)CCC4C(CCCC(C)C)C)C 747-90-0
    C1(C(CCC1)C)C2CCCCC2 5405-90-3
    S(=O)(=O)(NC(=O)CCC(C4C3(C(C2C(C1(C(CC(CC1)O)CC2)C)CC3O)CC4)C)C)c5ccc(cc5)N 5407-24-9
    OC3C4(C(C1C(C2(C(CC1O)CC(CC2)O)C)C3)CCC4C(CCC(=O)O)C)C 81-25-4
    N(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)C 1865-62-9
    OC4(C3(C(C2C(C1(C(=CC(=O)C=C1)CC2)C)C(=O)C3)CC4C)C)C(=O)CO 1247-42-3
    [N+](=O)([O-])c1c(ccc(c1)[N+](=O)[O-])NN=C5CCC4(C3C(C2C(C(CC2)O)(CC3)C)CCC4=C5)C 2347-93-5
    N6C1C(OC5(C1C)CCC4C3C(C2(CCC(CC2=CC3)O)C)C(=O)C4=C5C)CC(C6)C 469-59-0
    O1C(C(C(C(C1C)O)O)O)OC6CCC5(C4C(C2(C(C(CC2)C3=COC(=O)C=C3)(CC4)C)O)CCC5=C6)C 466-06-8
    O1C(CC(C(C1C)O)OC)OC6CC5(C(C4C(C2(C(C(CC2)C3=CC(=O)OC3)(CC4)C)O)CC5)(CC6)C=O)O 508-77-0
    O1C(C(C(C(C1CO)O)O)O)OC6CCC5(C4C(C2(C(C(CC2)C3=COC(=O)C=C3)(CC4)C)O)(CC(C5=C6)OC(=O)C)O)C 507-60-8
    N81C(C(C7(C(C1)C6(C(C5C2(OC3(C(C2(CCC3OC(=O)c4cc(c(cc4)OC)OC)C)CC5)O)C6)(CC7O)O)O)O)(O)C)CCC(C8)C 71-62-5
    O1C(C(C(C(C1CO)O)O)O)OC2C(C(OC(C2O)C)OC7CCC6(C5C(C3(C(C(CC3)C4=COC(=O)C=C4)(CC5)C)O)CCC6=C7)C)O 124-99-2
    N71C(C(C6C(C1)C5C(C4C2(OC3(C(C2(CCC3OC(=O)C(O)(CC)C)C)C(C4OC(=O)C)OC(=O)C)O)C5)(C(C6O)OC(=O)C(CC)C)O)(O)C)CCC(C7)C 143-57-7
    N71C(C(C6C(C1)C5C(C4C2(OC3(C(C2(CCC3OC(=O)C(O)(C(O)C)C)C)C(C4OC(=O)C)OC(=O)C)O)C5)(C(C6O)OC(=O)C(CC)C)O)(O)C)CCC(C7)C 124-97-0
    O1C(CC(C(C1C)OC2OC(C(C(C2)O)O)C)O)OC3C(OC(CC3O)OC8CC7C(C6C(C4(C(C(CC4)C5=CC(=O)OC5)(CC6)C)O)CC7)(CC8)C)C 71-63-6
    O1C(C(C(C(C1COC2OC(C(C(C2O)O)O)CO)O)O)O)OC3C(OC(CC3OC)OC8CC7(C(C6C(C4(C(C(CC4)C5=CC(=O)OC5)(CC6)C)O)CC7)(CC8)C=O)O)C 33279-57-1
    O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C7C(C5(C(C(CC5)C6=CC(=O)OC6)(CC7)C)O)CC8)(CC9)C)C)C)C 17575-20-1
    O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C5C(C6(C(C(C5)O)(C(CC6)C7=CC(=O)OC7)C)O)CC8)(CC9)C)C)C)C 17575-22-3
    O1C=C(C=CC1=O)C5C4(C(C3(C(C2(CCC(C=C2C(C3)OC(=O)C)O)C)CC4)O)(CC5)O)C 507-59-5
    O1C(C(C(C(C1CO)O)O)O)OC2C(OC(CC2OC(=O)C)OC3C(OC(CC3O)OC4C(OC(CC4O)OC9CC8C(C7C(C5(C(C(C(C5)O)C6=CC(=O)OC6)(CC7)C)O)CC8)(CC9)C)C)C)C 17575-21-2
    OC4CCC3(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC=C3C4)C 83-48-7
    OC4CCC3(C2C(C1C(C(CC1)C(CCC(C(C)C)CC)C)(CC2)C)CC=C3C4)C 83-46-5
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C=C 1235-98-9
    O(C4CC3C(C2C(C1C(C(CC1)(O)C(=O)C)(CC2)C)CC3)(CC4)C)C(=O)C 5456-44-0
    O(C4CCC3(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC=C3C4)C)C(=O)C 4651-48-3
    O(C4CC3C(C2C(C1C(C(CC1)C(C)C=CC(C(C)C)CC)(CC2)C)CC3)(CC4)C)C(=O)C 13010-52-1
    O(C6CC5C(C4C(C1C(C(CC1)C(CC=C(c2ccccc2)c3ccccc3)C)(CC4)C)CC5)(CC6)C)C(=O)C 4144-29-0
    OC4CC3C(C2C(C1C(C(CC1)C(CCC(=O)OC)C)(CC2)C)CC3)(CC4)C 15074-01-8
    OC4CC3C(C2C(C1C(C(CC1)C(CCC(=O)OC)C)(CC2)C)CC3)(CC4)C 1249-75-8
    OC4(C3(C(C1C(C2(C(CC1)CC(=O)CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)C 3751-02-8
    OC4(C3(C(C1C(C2(C(CC1)CC(=O)CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)C 1499-59-8
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(=O)C3)CC4)C)C(=O)COC(=O)CCCCCCCCCCCCCCCCC 5432-63-3
    OC(=O)CCC(C4C3(C(C1C(C2(C(CC1=O)CC(=O)CC2)C)CC3=O)CC4)C)C 81-23-2
    OC3C4(C(C2C(C1(C(CC(CC1)O)CC2)C)C3)CCC4C(CCC(=O)O)C)C 30635-00-8
    OC3C4(C(C2C(C1(C(CC(CC1)O)CC2)C)C3)CCC4C(CCC(=O)O)C)C 83-44-3
    OC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C 57-88-5
    O(C4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C)C(=O)C 604-35-3
    ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 128-10-9
    ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 60-57-1
    ClC43C2C5C1OC1C(C2C(C3(Cl)Cl)(C(=C4Cl)Cl)Cl)C5 72-20-8
    OC4(C1(C(C3C(C(C1)O)C2(C(=CC(=O)C=C2)CC3)C)CC4)C)C(=O)CO 50-24-8
    OC1C3C(C2C(C1)(C(=CCO)CC2)C)CCC4=CC(=O)CCC43C 3103-13-7
    BrC4CC3(C1C(C2C(CC1=O)(C(=CC(=O)OC)CC2)C)CCC3=CC4=O)C 5415-46-3
    BrC4CC3(C1C(C2C(CC1O)(C(=CC(=O)OC)CC2)C)CCC3=CC4=O)C 5415-47-4
    O=C4CCC3(C2C(C1C(C(CC1)C(C)C=O)(CC2)C)CCC3=C4)C 66289-21-2
    O=C4CCC3(C2C(C1C(C(CC1)C(C)C=O)(CC2)C)CCC3=C4)C 3986-89-8
    S1C5(NC(C1)C(=O)O)CC4C(C3C(C2C(C(CC2)C(=O)C)(CC3=O)C)CC4)(CC5)C 6293-78-3
    OC4(C1(C(C3C(C(C1)O)C2(C(=CC(=O)C=C2)CC3)C)CC4)C)C(=O)COC(=O)CCC(=O)O 1715-33-9
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)C(C3)O)CC4)C)C(=O)COC(=O)CCC(=O)O 125-04-2
    O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CCC5CCCC5 58-20-8
    N1(CCCCC1)C=C(C5C4(C(C3C(C2(CCC(=O)C=C2CC3)C)CC4)CC5)C)C 24377-48-8
    O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CC 58769-88-3
    O(C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)CC 57-85-2
    S=P(OC4CCC3(C2C(C1C(C(CC1)C(CCCC(C)C)C)(CC2)C)CC=C3C4)C)(OCC)OCC 24352-66-7
    N%10C9(OC8C(C7(C(C6C(C1(C(CC(CC1)OC2OC(C(C(C2O)O)OC3OC(C(C(C3OC4OC(C(C(C4O)O)O)CO)OC5OCC(C(C5O)O)O)O)CO)CO)CC6)C)CC7)C8)C)C9C)CCC(C%10)C 17406-45-0
    O=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 18485-76-2
    O=C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 63-05-8
    OC4(C3(C(C2C(C1CCC(=O)C=C1CC2)CC3)CC4)C)C#C 68-22-4
    OC4(C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C#C 434-03-7
    OC5C1(C(C4C(CC1)c2c(cc(cc2)OC(=O)c3ccccc3)CC4)CC5)C 50-50-0
    O(CC(=O)C4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C)C(=O)C 56-47-3
    Oc1cc4c(cc1)C2C(C3C(CC2)(C(=O)CC3)C)CC4 53-16-7
    OC4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 571-41-5
    OC4C3(C(C2C(C1(CCC(=O)C=C1CC2)C)CC3)CC4)C 86335-11-7
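The record layout in the listing above (a SMILES string, a space, then the molecule name) can be read with a few lines of Python. This is a minimal sketch under the assumption that the SMILES string itself contains no whitespace; the function name parse_smi_line is hypothetical, and the two sample records are copied from the listing.

```python
def parse_smi_line(line):
    """Split one .smi record into (smiles, name).

    Returns None for a blank line. The name may contain spaces;
    only the first run of whitespace separates it from the SMILES.
    """
    parts = line.strip().split(None, 1)
    if not parts:
        return None
    smiles = parts[0]
    name = parts[1] if len(parts) > 1 else ""
    return smiles, name


if __name__ == "__main__":
    records = [
        "C1(CCCC1)C2CCCCC2 1606-08-2",
        "Oc1cc4c(cc1)C2C(C3C(CC2)(C(=O)CC3)C)CC4 53-16-7",
    ]
    for rec in records:
        smiles, name = parse_smi_line(rec)
        print(name, smiles)
```

Because there is no file termination code, a reader of this kind simply stops at end of file.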

    Use of MOL2 Input Format

    Reading the Tripos MOL2 format is provided by OELIB-linked software. The MOL2 file structure is described at www.tripos.com/services/mol2/index.html. The keyword INPUTFORMAT "oelibmol2" is required for this file format. SMILES files take more time to interpret, but MOL2 files are much more complex than SMILES files and can contain 3-D coordinates, which HintLogP does not require. A typical source of a MOL2 file would be Tripos' Sybyl, although other programs can produce MOL2 files. MOL2 files can also be created with a text editor or converted from other file types using a program like BABEL.

    EduSoft includes a demo MOL2 database file that contains 10 structures.

    The First Two Structures in the demo4.mol2 File Supplied With HintLogP Software:

    # Name: B_ESTRADIOL
    # Creating user name: gkellogg
    # Creation time: Tue Nov 9 13:17:55 1993
    # Modifying user name: gkellogg
    # Modification time: Tue Nov 9 13:27:21 1993
    @<TRIPOS>MOLECULE
    B_ESTRADIOL
    44 47 1 0 0
    SMALL
    USER_CHARGES
    INVALID_CHARGES
    @<TRIPOS>ATOM
    1 C0 3.5269 1.2282 -0.5910 C.3 1 MOL1 -0.3000
    2 C1 2.0219 0.8438 -0.1562 C.3 1 MOL1 0.0000
    3 C2 1.0537 1.8981 -0.7458 C.3 1 MOL1 -0.2000
    4 C3 0.0299 1.4727 -1.7954 C.3 1 MOL1 -0.2000
    5 C4 -0.6674 0.2085 -1.3476 C.3 1 MOL1 -0.1000
    6 C5 0.3651 -0.9761 -1.1904 C.3 1 MOL1 -0.1000
    7 C6 -0.3887 -2.2712 -0.7075 C.3 1 MOL1 -0.2000
    8 C7 -1.9470 -2.2001 -0.8458 C.3 1 MOL1 -0.2000
    9 C8 -2.5481 -0.9510 -0.2037 C.ar 1 MOL1 0.0000
    10 C9 -3.6511 -1.0182 0.6047 C.ar 1 MOL1 -0.1000
    11 C10 -4.1448 0.1316 1.2039 C.ar 1 MOL1 0.0300
    12 O11 -5.2368 0.0644 2.0110 O.3 1 MOL1 -0.3800
    13 C12 -3.5084 1.3467 0.9895 C.ar 1 MOL1 -0.1000
    14 C13 -2.3852 1.4139 0.1732 C.ar 1 MOL1 -0.1000
    15 C14 -1.8960 0.2944 -0.4257 C.ar 1 MOL1 0.0000
    16 C15 1.7503 -0.6620 -0.6221 C.3 1 MOL1 -0.1000
    17 C16 2.3448 -1.4886 0.6115 C.3 1 MOL1 -0.2000
    18 C17 1.8447 -0.7429 1.8521 C.3 1 MOL1 -0.2000
    19 C18 1.9854 0.7486 1.4492 C.3 1 MOL1 -0.0700
    20 O19 3.0303 1.3907 2.2055 O.3 1 MOL1 -0.3800
    21 H21 3.7606 2.2203 -0.2175 H 1 MOL1 0.1000
    22 H22 3.6066 1.2177 -1.6741 H 1 MOL1 0.1000
    23 H23 4.2684 0.5450 -0.1910 H 1 MOL1 0.1000
    24 H24 0.4635 2.4607 0.0315 H 1 MOL1 0.1000
    25 H25 1.5259 2.7971 -1.2309 H 1 MOL1 0.1000
    26 H26 0.4426 1.4496 -2.8310 H 1 MOL1 0.1000
    27 H27 -0.6977 2.3024 -1.9616 H 1 MOL1 0.1000
    28 H28 -1.2830 0.0097 -2.2984 H 1 MOL1 0.1000
    29 H29 0.5285 -1.2724 -2.2903 H 1 MOL1 0.1000
    30 H30 -0.0612 -3.1901 -1.1806 H 1 MOL1 0.1000
    31 H31 -0.2539 -2.4288 0.3800 H 1 MOL1 0.1000
    32 H32 -2.3687 -3.0871 -0.3978 H 1 MOL1 0.1000
    33 H33 -2.1839 -2.1834 -1.9121 H 1 MOL1 0.1000
    34 H34 -4.1287 -1.9704 0.7788 H 1 MOL1 0.1000
    35 H35 -5.5691 0.8589 2.4352 H 1 MOL1 0.3500
    36 H36 -3.8869 2.2397 1.4650 H 1 MOL1 0.1000
    37 H37 -1.9092 2.3662 0.0343 H 1 MOL1 0.1000
    38 H38 2.5108 -1.0804 -1.3610 H 1 MOL1 0.1000
    39 H39 2.0677 -2.5309 0.5916 H 1 MOL1 0.1000
    40 H40 3.4400 -1.5052 0.6368 H 1 MOL1 0.1000
    41 H41 0.8164 -1.0696 2.0500 H 1 MOL1 0.1000
    42 H42 2.4339 -1.0074 2.7284 H 1 MOL1 0.1000
    43 H43 1.0919 1.2266 1.8949 H 1 MOL1 0.1000
    44 H44 3.8454 0.8990 2.0823 H 1 MOL1 0.3500
    @<TRIPOS>BOND
    1 1 2 1
    2 1 21 1
    3 1 22 1
    4 1 23 1
    5 2 3 1
    6 2 16 1
    7 2 19 1
    8 3 4 1
    9 3 24 1
    10 3 25 1
    11 4 5 1
    12 4 26 1
    13 4 27 1
    14 5 6 1
    15 5 15 1
    16 5 28 1
    17 6 7 1
    18 6 16 1
    19 6 29 1
    20 7 8 1
    21 7 30 1
    22 7 31 1
    23 8 9 1
    24 8 32 1
    25 8 33 1
    26 9 10 ar
    27 9 15 ar
    28 10 11 ar
    29 10 34 1
    30 11 12 1
    31 11 13 ar
    32 12 35 1
    33 13 14 ar
    34 13 36 1
    35 14 15 ar
    36 14 37 1
    37 16 17 1
    38 16 38 1
    39 17 18 1
    40 17 39 1
    41 17 40 1
    42 18 19 1
    43 18 41 1
    44 18 42 1
    45 19 20 1
    46 19 43 1
    47 20 44 1
    @<TRIPOS>SUBSTRUCTURE
    1 MOL1 1 TEMP 0 **** **** 0 ROOT
    # Name: ACETYLCHOLINE
    # Creating user name: gkellogg
    # Creation time: Tue Nov 9 13:28:28 1993
    # Modifying user name: gkellogg
    # Modification time: Tue Nov 9 13:30:31 1993
    @<TRIPOS>MOLECULE
    ACETYLCHOLINE
    26 25 1 0 0
    SMALL
    USER_CHARGES
    INVALID_CHARGES
    @<TRIPOS>ATOM
    1 C0 -4.3973 -0.2463 0.5026 C.3 1 MOL1 -0.3000
    2 C1 -2.9663 -0.5262 0.1201 C.2 1 MOL1 0.4100
    3 O2 -2.4730 -1.6120 0.3866 O.2 1 MOL1 -0.3800
    4 O3 -2.2053 0.4016 -0.5122 O.3 1 MOL1 -0.1800
    5 C4 -0.8689 0.1813 -0.8888 C.3 1 MOL1 -0.0500
    6 C5 0.1357 0.3156 0.2722 C.3 1 MOL1 0.2200
    7 N6 1.5948 0.1395 0.0157 N.4 1 MOL1 -0.6800
    8 C7 2.3386 0.3304 1.3574 C.3 1 MOL1 0.1200
    9 C8 2.1496 1.1730 -0.9461 C.3 1 MOL1 0.1200
    10 C9 1.9383 -1.2514 -0.4837 C.3 1 MOL1 0.1200
    11 H11 -4.6967 0.7455 0.1659 H 1 MOL1 0.1000
    12 H12 -5.0472 -0.9927 0.0417 H 1 MOL1 0.1000
    13 H13 -4.5015 -0.3041 1.5875 H 1 MOL1 0.1000
    14 H14 -0.5718 0.9172 -1.6399 H 1 MOL1 0.1000
    15 H15 -0.7202 -0.8022 -1.3248 H 1 MOL1 0.1000
    16 H16 -0.2368 -0.3946 1.0400 H 1 MOL1 0.1000
    17 H17 -0.0872 1.3068 0.7218 H 1 MOL1 0.1000
    18 H18 2.1823 1.3190 1.8019 H 1 MOL1 0.1000
    19 H19 3.4141 0.2178 1.2571 H 1 MOL1 0.1000
    20 H20 2.0340 -0.3870 2.1268 H 1 MOL1 0.1000
    21 H21 1.7384 1.0660 -1.9552 H 1 MOL1 0.1000
    22 H22 1.9316 2.2021 -0.6474 H 1 MOL1 0.1000
    23 H23 3.2344 1.1254 -1.0896 H 1 MOL1 0.1000
    24 H24 1.5616 -2.0481 0.1641 H 1 MOL1 0.1000
    25 H25 3.0109 -1.4412 -0.5993 H 1 MOL1 0.1000
    26 H26 1.5188 -1.4541 -1.4741 H 1 MOL1 0.1000
    @<TRIPOS>BOND
    1 1 2 1
    2 1 11 1
    3 1 12 1
    4 1 13 1
    5 2 3 2
    6 2 4 1
    7 4 5 1
    8 5 6 1
    9 5 14 1
    10 5 15 1
    11 6 7 1
    12 6 16 1
    13 6 17 1
    14 7 8 1
    15 7 9 1
    16 7 10 1
    17 8 18 1
    18 8 19 1
    19 8 20 1
    20 9 21 1
    21 9 22 1
    22 9 23 1
    23 10 24 1
    24 10 25 1
    25 10 26 1
    @<TRIPOS>SUBSTRUCTURE
    1 MOL1 1 TEMP 0 **** **** 0 ROOT


    Note that a database of molecules can be contained within a single file. Each new molecule in the file starts with the line:

    @<TRIPOS>MOLECULE
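Since each molecule begins with this tag line, a multi-molecule MOL2 file can be split by scanning for it. The Python sketch below is a minimal illustration, not the OELIB reader; the function names split_mol2 and molecule_name are hypothetical, and the sample text is an abbreviated stand-in with the atom and bond sections left out. Note one simplification: comment lines that precede a tag (such as "# Name: ...") end up attached to the previous molecule's block.

```python
MOLECULE_TAG = "@<TRIPOS>MOLECULE"


def split_mol2(text):
    """Split multi-molecule MOL2 text into one string per molecule."""
    blocks = []
    current = None  # lines of the molecule being collected, if any
    for line in text.splitlines():
        if line.strip() == MOLECULE_TAG:
            if current is not None:
                blocks.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first tag (file-level comments) are ignored.
    if current is not None:
        blocks.append("\n".join(current))
    return blocks


def molecule_name(block):
    """The molecule name is the first line after the MOLECULE tag."""
    lines = block.splitlines()
    return lines[1].strip() if len(lines) > 1 else ""


if __name__ == "__main__":
    # Abbreviated sample: headers only, atom and bond records omitted.
    sample = "\n".join([
        "# Name: B_ESTRADIOL",
        "@<TRIPOS>MOLECULE",
        "B_ESTRADIOL",
        "44 47 1 0 0",
        "SMALL",
        "USER_CHARGES",
        "# Name: ACETYLCHOLINE",
        "@<TRIPOS>MOLECULE",
        "ACETYLCHOLINE",
        "26 25 1 0 0",
        "SMALL",
        "USER_CHARGES",
    ])
    print([molecule_name(b) for b in split_mol2(sample)])
```

Counting the blocks returned is a quick way to confirm that a file contains the number of structures expected.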

    Sybyl is developed and distributed by
    Tripos, Inc
    St. Louis, MO


    Use of SDFile (SDF) Input Format

    MDL supports a file format that includes structure data in the form of the Molfile. This SDFile also provides for an unspecified number of records that contain data of various types for each molecule. The data may be numerical or alphabetic. Reading the MDL SDF format is provided by OELIB-linked software. The keyword INPUTFORMAT "oelibsdf" is required for this file format.

    This Structure Data file (SDFile) is carefully described in A. Dalby, J. G. Nourse, et al., J. Chem. Inf. Comput. Sci., 32, 244-255 (1992) or online at www.mdl.com/downloads/ctfile/ctfile_subs.html.

    Using the SDFile format produced by MDL software is straightforward. The user first produces the desired molecule files with MDL software in the usual manner. These Molfiles are then incorporated into the input file, each followed by the data lines desired by the user. The record separating the Molfile from the data records contains 'M  END'. See the example below and the reference given above. The information for each molecule is terminated by a blank record followed by a record containing $$$$. The whole SDFile is terminated with a blank record.

    For example, included with HintLogP are two SDF files, demo3.sdf which contains 50 anonymous structures, and demo5.sdf which contains 12 common structures.

    The First Two Structures in the demo5.sdf file Supplied with HintLogP Software:

    Benzoic Acid
      ChemDraw02260010222D

      9  9  0  0  0  0  0  0  0  0999 V2000
       -2.4700    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       -1.7200   -0.0025    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -2.7800   -1.0625    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       -0.2200   -0.0025    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5300   -1.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300   -1.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.7825   -0.0025    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300    1.2975    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5300    1.2975    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
      1  2  1  0  0  0  0
      2  3  2  0  0  0  0
      2  4  1  0  0  0  0
      4  5  1  0  0  0  0
      5  6  2  0  0  0  0
      6  7  1  0  0  0  0
      7  8  2  0  0  0  0
      8  9  1  0  0  0  0
      4  9  2  0  0  0  0
    M  END
    > 25
    249.2

    > 25
    122.4

    > 25
    Benzenecarboxylic acid

    > 25
    3-04-00

    $$$$
    m-methylbenzoic acid
      ChemDraw02250014002D

     10 10  0  0  0  0  0  0  0  0999 V2000
       -2.4700    1.9500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       -1.7200    0.6475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -2.7800   -0.4125    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       -0.2200    0.6475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5300   -0.6500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300   -0.6500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.7825    0.6475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300    1.9475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5300    1.9475    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.7825   -1.9525    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
      1  2  1  0  0  0  0
      2  3  2  0  0  0  0
      2  4  1  0  0  0  0
      4  5  1  0  0  0  0
      5  6  2  0  0  0  0
      6  7  1  0  0  0  0
      7  8  2  0  0  0  0
      8  9  1  0  0  0  0
      4  9  2  0  0  0  0
      6 10  1  0  0  0  0
    M  END
    > 25
    263

    > 25
    111-113

    > 25
    m-Toluic acid

    > 25
    3-04-00

    $$$$

    (Note "blank" record to terminate file!!!)
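The record structure described in this section (a Molfile ending at the M END record, optional data records, then a $$$$ record) can be walked with a short Python sketch. This is an illustration under stated assumptions, not the OELIB reader: the function names split_sdf, record_name and molfile_lines are hypothetical, the molecule name is taken to be the first line of each record, and the sample is an abbreviated placeholder whose counts line declares zero atoms and zero bonds.

```python
def split_sdf(text):
    """Split SDFile text into records; each record ends at a $$$$ line."""
    records = []
    current = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    # Anything after the last $$$$ (for example the final blank record)
    # is not a molecule and is dropped.
    return records


def record_name(record):
    """The first line of a Molfile header holds the molecule name."""
    return record[0].strip() if record else ""


def molfile_lines(record):
    """Return the lines up to and including the M END record."""
    for i, line in enumerate(record):
        if line.split() == ["M", "END"]:
            return record[: i + 1]
    return record


if __name__ == "__main__":
    # Abbreviated placeholder records: no atom or bond lines.
    sample = "\n".join([
        "Benzoic Acid",
        "  ChemDraw02260010222D",
        "",
        "  0  0  0  0  0  0  0  0  0  0999 V2000",
        "M  END",
        "",
        "$$$$",
        "m-methylbenzoic acid",
        "  ChemDraw02250014002D",
        "",
        "  0  0  0  0  0  0  0  0  0  0999 V2000",
        "M  END",
        "",
        "$$$$",
        "",
    ])
    for rec in split_sdf(sample):
        print(record_name(rec), len(molfile_lines(rec)))
```

Comparing on whitespace-split tokens lets the sketch accept the M END record however many spaces separate the two words.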

    MDL Information Systems, Inc.
    San Leandro, CA