CHAPTER 4

Using HintLogP

General Program Description

HintLogP is a file-oriented program with a terminal command-line interface. The program requires two input files (a molecular structure file and a control file) and produces an output file whose structure is defined in the control file. The program is run from the command line of a character terminal (a DOS window in the Windows version) and has the following syntax:
hintlogp <name of control file> <name of molecular structure file> <name of output file>
for example:
hintlogp demo_control.dat demo.sdf demo.sdf.out
CONTROL FILE

The control file contains keywords which define the molecular structure file format, the algorithms used, the information output to the output file, etc. The keywords are described in the table below. An example of this file is shown here.

DETAIL moleculelevel 
INPUTFORMAT oelibsmiles 
WARNINGS on 
ERRORS skip 
INDEX on 
OUTPUT record 
OUTPUT name 
OUTPUT formula 
OUTPUT weight 
OUTPUT logp 
DELIMITER space 
GO
Each control file must end with "GO", placed after all other options. In this control file we only want to view the logP data for the whole molecule (rather than at atom level), along with the record number, molecule name, molecular formula, and molecular weight. These fields are separated by spaces. INDEX writes the record number of each processed molecule to stderr. Our molecular structure file will be in SMILES format, read by the OElib code. The four options for INPUTFORMAT are "daylightsmiles", "oelibsmiles", "oelibsdf" and "oelibmol2". WARNINGS are turned on (sending information to stderr), and ERRORS is set to skip, so a serious error causes the program to skip the calculation for that structure and proceed to the next.
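When many control files are needed, it can be convenient to generate them from a script so that GO is always the last line. The following Python sketch is an illustration only and is not part of HintLogP; the function name build_control_file and its parameters are hypothetical. It writes only keywords and options described in this chapter, and its defaults reproduce the example control file above.

```python
def build_control_file(detail="moleculelevel", inputformat="oelibsmiles",
                       outputs=("record", "name", "formula", "weight", "logp"),
                       delimiter="space", warnings="on", errors="skip",
                       index="on"):
    """Return control-file text. GO is always written as the last line."""
    lines = [
        f"DETAIL {detail}",
        f"INPUTFORMAT {inputformat}",
        f"WARNINGS {warnings}",
        f"ERRORS {errors}",
        f"INDEX {index}",
    ]
    # One OUTPUT line per requested field, in the order given.
    lines += [f"OUTPUT {item}" for item in outputs]
    lines.append(f"DELIMITER {delimiter}")
    lines.append("GO")
    return "\n".join(lines) + "\n"


# Print the default control file to the terminal.
print(build_control_file(), end="")
```

To save the text instead, write the returned string to a file such as demo_control.dat and pass that file name as the first argument to hintlogp.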
 

Table 3: Control File Keywords

keyword      options               function
-----------  --------------------  ------------------------------------------------------------------
DETAIL       moleculelevel         print the logP only for the whole molecule
             atomlevel             print hydropathy information for atom components of the molecule
OUTPUT       record or norecord    print the record number in the outfile (or not)
             name or noname        print the molecule name in the outfile (or not)
             formula or noformula  print the molecular formula in the outfile (or not)
             weight or noweight    print the molecular weight in the outfile (or not)
             logp or nologp        print the value of the logP in the outfile (or not)
INPUTFORMAT  daylightsmiles        use (licensed) Daylight toolkit to interpret molecule structure
             oelibsmiles           use OElib SMILES routine to interpret molecule structure
             oelibsdf              use OElib SDF (MDL) routine to interpret molecule structure
             oelibmol2             use OElib MOL2 (Sybyl) routine to interpret molecule structure
DELIMITER    space                 information in each record is delimited by a space
             comma                 information in each record is delimited by a comma
WARNINGS     on*                   write non-serious warning messages to stderr
             off                   do not write non-serious warning messages to stderr
ERRORS       exit                  exit on serious error
             skip*                 skip current calculation, move to next molecule on serious error
             continue              continue current calculation, even with compromised input data
INDEX        on*                   write index (or record) number for each processed molecule to stderr
             off                   do not write index number for each processed molecule to stderr
GO           n/a                   end of options input, begin calculations

* - default settings for program parameters.
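
The keyword table can be turned into a simple pre-flight check that catches misspelled keywords or options before a long run. The Python sketch below is an illustration only and is not part of HintLogP; the names ALLOWED and check_control_text are hypothetical. It compares keywords and options without regard to case, which is an assumption made for the checker; this chapter does not state how HintLogP itself treats case.

```python
# Keywords and options as listed in Table 3.
ALLOWED = {
    "DETAIL": {"moleculelevel", "atomlevel"},
    "OUTPUT": {"record", "norecord", "name", "noname", "formula",
               "noformula", "weight", "noweight", "logp", "nologp"},
    "INPUTFORMAT": {"daylightsmiles", "oelibsmiles", "oelibsdf", "oelibmol2"},
    "DELIMITER": {"space", "comma"},
    "WARNINGS": {"on", "off"},
    "ERRORS": {"exit", "skip", "continue"},
    "INDEX": {"on", "off"},
}


def check_control_text(text):
    """Return a list of problem descriptions (empty if none were found)."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[-1].upper() != "GO":
        problems.append("control file must end with GO")
    for ln in lines:
        if ln.upper() == "GO":
            continue
        parts = ln.split()
        keyword = parts[0].upper()
        option = parts[1].lower() if len(parts) > 1 else ""
        if keyword not in ALLOWED:
            problems.append(f"unknown keyword: {keyword}")
        elif option not in ALLOWED[keyword]:
            problems.append(f"unknown option for {keyword}: {option or '(none)'}")
    return problems
```

An empty list means that every line uses a keyword and option from the table and that the file ends with GO; it does not guarantee that the run itself will succeed.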

MOLECULAR STRUCTURE FILE

The program expects an input molecular structure file in one of the four formats listed below. The formats are described in more detail in Input File Formats.

    Daylight SMILES format read by OElib code from OpenEye (this does not require any additional code or license).

    Daylight SMILES format read by Daylight SMILES Toolkit (this requires a run-time "smiles" license from Daylight).

    MDL SDFile format read by OElib code from OpenEye (this does not require any additional code or license).

    Tripos Sybyl/MOL2 format read by OElib code from OpenEye (this does not require any additional code or license).
     

In a typical application the user would include in the molecular structure file all the molecules that are part of an investigation. Thus, the input molecular structure file can contain one or many structures. Other molecules may, of course, be added later or processed separately. It is critical that the INPUTFORMAT keyword match the actual format of the molecular structure file named on the command line. That is, if INPUTFORMAT is set to "oelibsdf", then the file must be in SDF format no matter what it is named.
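
Because the file name plays no role in how the file is read, a mismatch between INPUTFORMAT and the file contents is easy to overlook. The Python sketch below is a rough heuristic, not part of HintLogP; the function names guess_structure_format and format_matches are hypothetical. It relies on two well-known markers: MOL2 files contain "@<TRIPOS>MOLECULE" record headers, and SDF files contain "M  END" lines and "$$$$" record terminators. Anything else is assumed to be SMILES, so the check can only flag obvious mismatches.

```python
def guess_structure_format(text):
    """Guess 'mol2', 'sdf', or 'smiles' from file content (heuristic only)."""
    if "@<TRIPOS>MOLECULE" in text:
        return "mol2"
    if "$$$$" in text or "M  END" in text:
        return "sdf"
    # No MOL2 or SDF marker found: assume one SMILES string per line.
    return "smiles"


def format_matches(inputformat, text):
    """Return True if the INPUTFORMAT option agrees with the guessed format."""
    expected = {
        "daylightsmiles": "smiles",
        "oelibsmiles": "smiles",
        "oelibsdf": "sdf",
        "oelibmol2": "mol2",
    }
    return expected[inputformat] == guess_structure_format(text)
```

For example, reading the contents of demo3.sdf and calling format_matches("oelibsmiles", contents) would be expected to return False, a reminder to change INPUTFORMAT to oelibsdf before the run.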

OUTPUT FILE

The structure of the output file depends on the keywords used in the control file. For example, the keyword DETAIL MOLECULELEVEL provides the most concise logP output, while DETAIL ATOMLEVEL provides extensive details on hydropathy components in both atoms and fragments of the molecule. Be careful when using the ATOMLEVEL keyword with large databases, as it can produce a very large output file.

When a large database is processed and INDEX is set to "on", it may be desirable to save the indexing information to a file so that you can determine which molecules in the database have problems that need correcting. This information is printed to stderr and can therefore be collected in a separate file using the following command:

UNIX/LINUX: $HINT_RUN/hintlogp control.dat database1.smi database1.s >& hint.log

Windows 2000/XP: HINTLOGP control.dat database1.smi database1.s 2> hint.log

Note that LICENSE errors (which are the most common) are also printed to stderr, so whenever your output file is missing information, check the terminal or the stderr log file.
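
The same redirection can be done from a script, which is useful when HintLogP is run as one step of a larger workflow. The Python sketch below is an illustration only; the function names hintlogp_command and run_with_stderr_log are hypothetical, and the sketch assumes that an executable named hintlogp can be found on your PATH. It uses the standard subprocess module and follows the argument order given at the start of this chapter.

```python
import subprocess


def hintlogp_command(control, structures, output, executable="hintlogp"):
    """Build the argument list: control file, structure file, output file."""
    return [executable, control, structures, output]


def run_with_stderr_log(command, log_path):
    """Run command, writing its stderr to log_path; return the exit code."""
    with open(log_path, "w") as log:
        return subprocess.run(command, stderr=log).returncode


# Typical use (commented out so that nothing runs on import):
# cmd = hintlogp_command("control.dat", "database1.smi", "database1.s")
# exit_code = run_with_stderr_log(cmd, "hint.log")
```

Like the Windows form above, this captures stderr only. In csh-style shells the ">&" form sends both stdout and stderr to the log file.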
 
 


Typical HintLogP Session

The following steps are generally followed in using HintLogP:

Demo HintLogP Sessions

Using the demo files described in "Getting Started With HintLogP", we can test the output and function of HintLogP.

First, copy the files in the hintlogp3.06_/demo directory into a working directory on your computer. (Note: for the Windows version, the file names for the test will need to be different since Windows does not distinguish between demo1.out and demo1.OUT.)

DEMO1

The file demo1.smi contains a single molecule, benzene.  To run (the $HINT_RUN/ prefix is not necessary if that directory is in your $PATH):

$HINT_RUN/hintlogp demo_control.dat demo1.smi demo1.out

Compare demo1.out (new) and demo1.OUT (archival) for differences.
 

DEMO2

The file demo2.smi contains 100 molecules of varying complexity.  To run:

$HINT_RUN/hintlogp demo_control.dat demo2.smi demo2.out

Compare demo2.out with demo2.OUT.
 

DEMO3

Edit demo_control.dat and change the INPUTFORMAT keyword from oelibsmiles to oelibsdf.

The file demo3.sdf contains 12 simple molecules.  To run:

$HINT_RUN/hintlogp demo_control.dat demo3.sdf demo3.out

Compare demo3.out with demo3.OUT.
 

DEMO4

The file demo4.sdf contains 50 molecules of moderate complexity.  To run:

$HINT_RUN/hintlogp demo_control.dat demo4.sdf demo4.out

Compare demo4.out with demo4.OUT.
 

DEMO5

Edit demo_control.dat and change the INPUTFORMAT keyword to oelibmol2.

The file demo5.mol2 contains 10 fairly simple molecules.  To run:

$HINT_RUN/hintlogp demo_control.dat demo5.mol2 demo5.out

Compare demo5.out with demo5.OUT.
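
The comparisons in the demo sessions can be made with any file comparison tool. As one possibility, the Python sketch below uses the standard difflib module to list the lines that differ between a new output file and its archival copy; the function name diff_outputs is hypothetical and the sketch is not part of HintLogP.

```python
import difflib


def diff_outputs(new_path, archival_path):
    """Return unified-diff lines; an empty list means the files match."""
    with open(new_path) as handle:
        new_lines = handle.readlines()
    with open(archival_path) as handle:
        archival_lines = handle.readlines()
    return list(difflib.unified_diff(archival_lines, new_lines,
                                     fromfile=archival_path,
                                     tofile=new_path))


# Typical use (commented out because it needs the demo files):
# for line in diff_outputs("demo1.out", "demo1.OUT"):
#     print(line, end="")
```

On Windows, where demo1.out and demo1.OUT name the same file, pass whatever distinct names you chose for the new and archival files.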