HINT 2.30 Manual: Chapter Two

Theory

Introduction

HINT (for Hydropathic Interactions) is a program designed for quantifying and visualizing hydrophobic and polar interactions, which are collectively referred to as hydropathy, between molecules in biologically-important systems. Thus, hydropathic attractions between species include hydrogen-bonding, acid-base interactions, Coulombic attractions as well as hydrophobic interactions which are sometimes called the "hydrophobic force". All of these are related to solvent partitioning phenomena because the dissolution of a "ligand" in a mixed solvent system involves the same fundamental processes and atom-atom interactions as biomolecular interactions within or between proteins and ligands. Solvent partition constants such as the log P for water/octanol solubility actually encode significant thermodynamic and interaction information. Because polar species are clearly differentiated from hydrophobic species, the nature of interactions between an agent and its receptor can be postulated, as can the general environment of the receptor site (i.e., whether hydrophobic or polar). Likewise, and more exciting, derivation of hydrophobic atom constants for each atom in a drug or protein leads to localized parameterized thermodynamic information. In fact, Coulombic, hydrogen-bonding, dispersion, as well as hydrophobic effects may be extracted from the hydrophobic atom constant by examination of the sign and magnitude of the constant. Also significant is that since hydrophobicity is defined in terms of solubilities the effects of solvent are also encoded within the constants.

Figure 1: The "Shake Flask" experiment to measure hydrophobicity (LogP).

The concept behind HINT was to reduce the information from bulk molecular solvent partitioning to discrete interactions between atoms (ligand to protein, protein to protein, etc.). This is an empirical approach, and represents an important difference between hydrophobic constants and interaction parameters derived solely from molecular mechanics, since hydrophobicity (which is a free energy-like property) includes a contribution from entropy, which is usually ignored in most modeling approaches. Hydrophobic atom constants thus can be information-rich thermodynamic parameters, and an empirical, experimentally-based model of molecular interactions may be constructed from hydrophobicity and the solvent partitioning phenomena. These interactions have significant influence on substrate binding, protein folding, and protein subunit quaternization. The genesis of HINT was contained in a paper by Abraham and Leo (Abraham, D.J.; Leo, A.J. Proteins: Structure, Function and Genetics 1987, 2, 130) in which the suggestion was made that hydrophobic fragment constants, reduced to atomic values with inherent bond, ring, chain, branching, and proximity factors, could be used to evaluate interactions between small molecules and large molecules. These atomic values, called hydrophobic atom constants, are the key parameters for HINT, and are calculated within the program for small molecules, or alternatively (especially for proteins) are obtained from a residue-based dictionary.

Partition Calculations

The solvent partition calculations performed by HINT are based on the hydrophobic fragment constant approach of Hansch and Leo ("Substituent Constants for Correlation Analysis in Chemistry and Biology", J. Wiley and Sons: New York, 1979), in which fragment constants (and some atom constants) for a variety of organic species of biological importance are tabulated. In addition, there are a number of "factors" and application rules which modify the total partition constant depending on specific bond, chain, branching, etc. molecular attributes. It is important to note that the goal of the Hansch and Leo approach was the calculation of the total solvent partition constant for a molecule from which predictions of solubility and QSAR could be made. The solvent partition constant data used by HINT requires an opposite mind-set -- rather than summation of fragment constants and factors to a single value, HINT distributes and assigns hydrophobic constants and factors to each atom in the molecule. Partitioning between atoms within a fragment invokes the assumption that frontier or mantle atoms of a fragment have a more significant role in hydropathic interactions than shielded interior atoms. Thus, they are generally maintained at near-atomic values within fragments while the interior atom hydrophobic constants are adjusted to reflect the cumulative bonding effects on hydrophobicity within the fragment; i.e.,

Σ a_i = Σ f_i,

where the a_i are hydrophobic atom constants and the f_i are fragment constants; the atom constants can therefore be summed to obtain a total molecular partition constant.

In order to achieve this goal and provide compatibility with the host modeling system (i.e., SYBYL, insightII, etc.) the Hansch and Leo fragment data were re-parameterized with: new force field atom types; slightly modified bond, branching, ring, and chain factors; and the inclusion of a new type of polar proximity factor for directly adjacent polar groups not otherwise represented by fragment data. In most cases the total partition coefficients for small molecules calculated by HINT and CLOG-P (Pomona Med-Chem Modeling Software) agree within 0.1 log units or a few percent, and both are generally in good agreement with the measured octanol-water partition coefficients. The apparent key factor in experimental-empirical agreement of these two approaches is the utilization of the aforementioned "factors" which tailor atomic and/or fragment constants to specific molecular environments.

The HINT Partition Data Bases

There are two primary data bases that enable calculation of hydrophobic atom constants for small molecules and proteins. These are provided in the $HINT_BASE directory as the binary (non-text readable) files "ff_sm_hydro_molecule.bin" and "ff_aa_hydro_protein.bin", where ff refers to the force field atom set -- currently HINT supports the atom-typing definitions of the Tripos Force Field (taff) for SYBYL and the Consistent Valence Force Field (cvff) for insightII. The ASCII readable versions of these bases are "ff_sm_hydro_hold.dat" and "ff_aa_hydro_hold.dat". The $HINT_PARTITION_FILE environment variable (set in the HINT cshrc file) is set to either BINARY or ASCII to select the database files. The ASCII versions are provided for user-customization (instructions provided upon request), but eduSoft makes no warranty on modified calculational databases.

Described below is an outline of the concept for the data bases and how they are used in calculation of hydrophobic atom constants. The "small molecule" data bases are used for a structure/connection-based calculation of LogP and are especially appropriate for molecules that are neither proteins nor nucleic acids. The "protein" data base is used for a dictionary-based calculation of LogP and is only appropriate for macromolecules constructed of well-established and previously registered substructures or monomers such as amino acid residues.

LogP: Small Molecule/Calculate

This data base contains, and the partition_calculate subroutine of HINT is designed to utilize, a data tree of partition information. For each force field atom type there is a "bare" hydrophobic atom constant present in the base. Initially, each atom in the molecule is assigned a "bare" value which may or may not be modified by environment. Second level data in the small molecule base focuses on two-atom "fragments". If the atom being partitioned is the central atom of a previously-defined two atom fragment then an additive correction is applied to this atom to bring the total of the two atoms into agreement with the fragment value. This procedure is repeated sequentially for three, four, five, six, and seven-atom fragments. At this stage of the partition process terminal atoms of fragments are maintained at their atomic values. For example the initial partitioning of COOH is described below:

Atomic Constants:	C(=)(-)(-)	0.155
	O(=)	-1.915
	OH(-)	-1.640
Fragment Constants:	CO(-)(-)	-1.900
	COOH(-)	-1.110

The total of C and O is -1.760, which is in error by -0.140 from the accepted fragment value for CO. -0.140 is added to the atom constant for C, while the O value is not modified because it is terminal.

Atomic Constants:	C(=)(-)(-)	0.015
	O(=)	-1.915
	OH(-)	-1.640

The total of (new) C, O, and OH is -3.540, which differs by 2.430 from the accepted fragment value for COOH. 2.430 is added to the atom constant for C, while O and OH are both not modified.

Atomic Constants:	C(=)(-)(-)	2.445
	O(=)	-1.915
	OH(-)	-1.640

Further refinement for all of these values will be accomplished by the application of "factors" (below).
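The fragment-correction steps above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the partition_calculate routine itself: the function name apply_fragment_correction and the dictionary layout are assumptions made for the example, and the correction is always placed on the central atom, as in the COOH case.

```python
def apply_fragment_correction(atoms, members, central, fragment_value):
    """Shift the central atom so that the members sum to the fragment value.

    atoms: dict mapping atom label -> current hydrophobic atom constant
    members: labels of the atoms that make up the fragment
    central: label of the atom that absorbs the correction
    """
    current_total = sum(atoms[m] for m in members)
    correction = fragment_value - current_total
    updated = dict(atoms)
    updated[central] = atoms[central] + correction
    return updated, correction


# "Bare" atomic constants for COOH, taken from the example above.
cooh = {"C": 0.155, "O": -1.915, "OH": -1.640}

# Two-atom fragment CO first, then the three-atom fragment COOH.
cooh, d_co = apply_fragment_correction(cooh, ["C", "O"], "C", -1.900)
cooh, d_cooh = apply_fragment_correction(cooh, ["C", "O", "OH"], "C", -1.110)

print(round(d_co, 3))                # correction from the CO fragment
print(round(sum(cooh.values()), 3))  # total now matches the COOH value
```

Because terminal atoms are never touched, the O and OH entries keep their atomic values and only C changes.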

Small Molecule "Factors"

There are a variety of "factors", specific to the molecular environment that modify the hydrophobic atom constants for small molecules. Situations appropriate for these factors are identified by a series of simplistic rules for the atom's connectivity. The bulk of these factors and their values are as tabulated by Hansch and Leo, while a few of them have been created or modified to be appropriate for the force field atom types used by HINT. A brief summary of the factors follows:

HYD_CJG = 0.045: Applied if atom is ConJuGated
HYD_RFC = 0.100: Applied if atom is a Ring Fusion Carbon
HYD_RFH = 0.315: Applied if atom is a Ring Fusion Heteroatom
HYD_CHN = -0.120: Applied if atom is in a CHaiN of n>2
HYD_RNG = -0.090: Applied if atom is in a RiNG
HYD_BCH = -0.130: Applied if atom is at an aliphatic BranCH
HYD_BGH = -0.220: Applied if atom is at a polar BrancH
HYD_VIC = 0.140: Applied if atom is attached to VICinal halogens
HYD_ENH = -0.080: Applied if atom is in a branched long chain
HYD_CDN1 = -0.330: Applied if atom is next to a ChargeD Nitrogen
HYD_CDN2 = -0.140: Applied if atom is 2 atoms away from ChargeD N
HYD_CDN3 = -0.070: Applied if atom is 3 atoms away from ChargeD N
HYD_CDN4 = -0.035: Applied if atom is 4 atoms away from ChargeD N

Polar proximity factors are also applied to central atoms in polar groups that are proximate to other polar groups. The factors listed below are multiplicative scaling dependent on the number of intervening insulating (i.e., carbon) atoms between the polar groups or fragments. The listed factors for n = 0 are estimated by extrapolation of the other values, and will only be applied in cases where two polar groups are immediately adjacent and no fragment data exists for the composite "fragment". Calculationally, polar proximity is applied to the central atom of the polar group as the additive factor: (F) multiplied by the hydrophobic fragment constant for the polar group. If the group is in proximity to more than one polar fragment, proximity is applied serially from closest to furthest groups with recalculation of the group fragment constant after each iteration.

Polar Proximity Factors

Type	Number of "insulating" carbons (n)
	0	1	2	3	4	5
HYD_PPP (normal)	-0.380	-0.320	-0.260	-0.100	0.000	0.000
HYD_PPO (hydroxyl)	-0.580	-0.420	-0.260	-0.100	0.000	0.000
HYD_PPN (charged N)	-0.580	-0.420	-0.270	-0.240	-0.220	-0.200
HYD_PPR (aliph. ring)	-0.440	-0.320	-0.200	0.000	0.000	0.000
HYD_PPA (arom. ring)	-0.240	-0.160	-0.080	0.000	0.000	0.000
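As a rough illustration of the serial application described above, the following Python sketch adjusts a polar group's fragment constant for one or more nearby polar groups. The function name, the (factor type, n) input format, and the choice to ignore groups more than five carbons away are assumptions for this example; which factor type applies to a given pair of groups is left to the caller.

```python
# Multiplicative polar proximity factors, indexed by the number of
# intervening "insulating" carbons n = 0..5 (values from the table above).
PROXIMITY = {
    "HYD_PPP": (-0.380, -0.320, -0.260, -0.100, 0.000, 0.000),
    "HYD_PPO": (-0.580, -0.420, -0.260, -0.100, 0.000, 0.000),
    "HYD_PPN": (-0.580, -0.420, -0.270, -0.240, -0.220, -0.200),
    "HYD_PPR": (-0.440, -0.320, -0.200, 0.000, 0.000, 0.000),
    "HYD_PPA": (-0.240, -0.160, -0.080, 0.000, 0.000, 0.000),
}


def apply_polar_proximity(fragment_constant, neighbours):
    """Apply proximity serially, closest polar neighbour first.

    neighbours: list of (factor_type, n) pairs, one per nearby polar group.
    Returns the adjusted fragment constant for the polar group.
    """
    value = fragment_constant
    for factor_type, n in sorted(neighbours, key=lambda item: item[1]):
        if n > 5:
            continue  # assumption: no proximity effect beyond the table
        # Additive term F * (current fragment constant), recalculated each pass.
        value += PROXIMITY[factor_type][n] * value
    return value


# A group with fragment constant -1.640 and one "normal" polar neighbour
# separated by a single carbon: -1.640 + (-0.320)(-1.640).
print(apply_polar_proximity(-1.640, [("HYD_PPP", 1)]))
```

Since the factors and the polar fragment constants are both negative, each pass makes the group less polar, and the recalculation after each pass means later (more distant) neighbours act on an already reduced value.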

LogP: Proteins/Dictionary

The hydrophobic atom constants for the dictionary method of LogP calculation are tabulated in "ff_aa_hydro_protein.bin" by residue and atom name. The residue names are identified via a search through the "ff_aa_residue_definitions.dat" data file, which is the master list for HINT registered monomers. Each atom in each residue is correlated with the atom name list in "ff_aa_hydro_protein.bin" where three distinct values of the hydrophobic atom constant are available, depending upon the solvent conditions (Acidic, Neutral, or Basic). These values are not further modified by "factors" because, as described below, the factors are inherently included.

The atomic hydrophobic parameters in "ff_aa_hydro_protein.bin" were previously calculated by small molecule-type partition calculations for the acetyl amide analogs (AAA) for each amino acid residue (or modified AAA for C-terminal or N-terminal residues), common protein cofactors such as heme, and appropriately capped nucleic acid bases. Use of the AAA simulates the effects of the proximate polar groups of the adjacent backbone amide linkages on the hydrophobicity of the residue. The advantages of the dictionary are consistency in atom partitioning for macromolecules, the ability to quickly explore the effects of changing solvent conditions, and no reliance on potential type assignment for proteins. The dictionary data base relies on atom names only (host modeling system convention) to make hydrophobic parameter assignments.

Polar and Hydrophobic Protons

Early versions of HINT were developed around united atoms where hydrogens were implicitly treated as part of the heavy atoms. This model was developed because: a) in protein crystallography protons are never directly observed, and b) the Leo fragment method of calculating hydrophobicity always treats hydrogens as members of fragments. Consequently, the most accurate calculation of LogP is obtained through use of the united atom method. However, explicit hydrogens are necessary in order to correctly model interactions, especially hydrogen bonding. The Leo method assigns an a_i (LogP) value of 0.23 (1/2 logP H₂) to all hydrogens, but compensates for the difference between polar and non-polar hydrogens by intramolecular H-bond factors. Molecules containing intramolecular H-bonds involving nitrogen are considered to be 0.6 log units, and molecules containing intramolecular H-bonds involving oxygen are considered to be 1.0 log units more hydrophobic than the same compounds without hydrogen bonds. Note that, although hydrogens attached to polar heavy atoms are formally hydrophobic (a_i > 0), they are attracted to polar heavy atoms. HINT uses a_i = 0.83 for hydrogens attached to nitrogen, and a_i = 1.23 for hydrogens attached to oxygen or sulfur. Hydrogens attached to carbon retain a_i = 0.23. These parameters appear to scale intermolecular hydrogen bonding in line with other interaction terms. Clearly, however, the polar hydrogen is only "formally" hydrophobic. For interaction tables and maps HINT properly processes the hydrogens, but ambiguity arises in calculating and displaying hydropathic maps. What HINT does is (for grid calculational purposes only) change the sign of the polar hydrogen a_i values to negative, and compensate at the parent heavy atom.

Solvent Accessible Surface Area

Solvent accessible surface area (SASA) is also calculated (or assigned) for each atom. For atoms in small molecules the SASA is calculated from intersecting spheres having radii equal to the van der Waals radius for the atom type plus 1.4 Angstroms (the van der Waals radius of water). The SASA values for protein atoms are taken from the literature (Shrake, A.; Rupley, J.A. Journal of Molecular Biology 1973, 79, 351-371).

The HINT Calculation

The HINT (Hydropathic INTeractions) calculation is the central purpose of the HINT program. It is, simply, a summation of hydropathic interactions between all atom pairs:

B = Σ_i Σ_j b_ij (summed over all atom pairs i and j);

b_ij = S_i a_i S_j a_j R_ij T_ij;

where b_ij is a MicroInteraction constant representing the attraction/interaction between atoms i and j, S_i is the solvent accessible surface area for i, a_i is the hydrophobic atom constant for i, and R_ij is the functional distance behavior for the interaction of i and j. For InterMolecular calculations i and j are atoms on the two molecules; for IntraMolecular calculations i and j are two distinct indices on the same molecule where i is not equal to, or covalently bonded to, or involved in a 1-3 interaction with j.

T_ij is a discriminant function designed to keep the signs of interactions consistent with the HINT convention that favorable interactions are positive and unfavorable interactions are negative. (Much of this is related to the issue of Polar and Hydrophobic Protons discussed above.) The other point, however, is that while there is magnitude information for polar atoms in LogP (and a_i) there is no "sign" information; that is, the sign and effect of a charge on a polar (atom) species must be added in by HINT.

The following is an interaction matrix which details some of the conventions used by HINT to calculate T_ij.

*Atom Type*	H (apolar)	H (polar)	C (apolar)	Polar (N,O,etc.)
H (apolar)	+1¹	-1²	+1¹	-1²
H (polar)	-1²	-1³	-1²	+1⁴
C (apolar)	+1¹	-1²	+1¹	-1²
Polar (N,O,etc.)	-1²	+1⁴	-1²	-1⁵

Notes:

1: hydrophobic-hydrophobic
2: hydrophobic-polar
3: acid-acid (two polar hydrogens)
4: acid-base or hydrogen bond
5: may depend on charge, but probably base-base and unfavorable (T_ij = -1)
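The summation can be written out as a short Python sketch. It assumes the simple exponential distance function R_ij = e^-r and leaves the discriminant to a caller-supplied function t_func, because the full rules HINT uses to evaluate T_ij are not reproduced here; the Atom class and both function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    a: float      # hydrophobic atom constant a_i
    s: float      # solvent accessible surface area S_i
    xyz: tuple    # Cartesian coordinates in Angstroms


def micro_interaction(atom_i, atom_j, t_ij):
    """b_ij = S_i a_i S_j a_j R_ij T_ij, with R_ij = exp(-r)."""
    r = math.dist(atom_i.xyz, atom_j.xyz)
    return atom_i.s * atom_i.a * atom_j.s * atom_j.a * math.exp(-r) * t_ij


def hint_score(mol_a, mol_b, t_func):
    """Intermolecular B: sum of b_ij over all pairs (i in mol_a, j in mol_b)."""
    return sum(
        micro_interaction(i, j, t_func(i, j)) for i in mol_a for j in mol_b
    )


# Two hydrophobic atoms 2 Angstroms apart, with T_ij = +1.
ligand = [Atom(a=0.5, s=10.0, xyz=(0.0, 0.0, 0.0))]
site = [Atom(a=0.2, s=20.0, xyz=(0.0, 0.0, 2.0))]
print(hint_score(ligand, site, lambda i, j: 1.0))
```

An intramolecular version would additionally skip pairs that are identical, covalently bonded, or in a 1-3 relationship, which requires connectivity information not modeled in this sketch.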

HINT Distance Functions

HINT allows several different custom distance functions for the interaction calculation: exponential, 1/rⁿ, and combinations of these with a Lennard-Jones 6-12 function. The electrostatic-like R_ij = r^-n function (for n < 3) does not decay faster than the number of interactions increases, so artificial cutoffs (or boundary conditions) are necessary for this function. As the goal of the HINT model is empirical rather than rigorously theoretical modeling of interactions, the HINT model does not employ boundary conditions, etc. Since the Leo polar proximity factors define a through-bond (rather than through-space) polar (and hence hydropathic) distance function, these empirical parameters were fit to a variety of through-space mathematical functions using common bond lengths and angles. The best fit was obtained for the simple exponential R_ij = e^-r (see Figure 2A). This functionality was also reported by Fauchere et al. (Fauchere, J.-L.; Quarendon, P.; Kaetterer, L. Journal of Molecular Graphics 1988, 6, 203-206). Also, Israelachvili and Pashley (Nature 1982, 300, 341) report, on the basis of direct experimental measurements in aqueous solutions, that the hydrophobic interaction is long range, decaying exponentially with distance.

Figure 2: The HINT Distance Functions: A) Hydropathy; B) Complementary.

In Complement calculations, i.e., using the hydropathic field of a known molecule to predict the field of a complementary entity, we assume that (a) the hydropathic nature of the complementary species is the same as the defining atoms; (b) atoms in the complementary species have the same van der Waals radii as those in the defining species; and (c) the optimum atom-atom distance between species is the sum of the van der Waals radii. The functional form of this distance behavior is R_ij = e^-|2r_vdw - r|, as shown in Figure 2B for two values of r_vdw. Thus, the regions with the greatest hydropathic character in Complement maps are those 2r_vdw from the defining atom set. Also, to direct complementary hydropathic density to unoccupied space, the hydropathic density of grid points within one van der Waals radius of an atom is set to zero.

However, neither exponential nor power functions exact a penalty for too-close atom-atom interactions. This is the province of the Lennard-Jones potential/van der Waals potential attractions which have no electrostatic (and by inference hydropathic) contribution. E_ij values for the adaptation of this function used in HINT are from the literature (Levitt, M. Journal of Molecular Biology 1983, 168, 595-620; Levitt, M.; Perutz, M.F. Journal of Molecular Biology 1988, 201, 751-754). The table below sets out the e parameters for atoms of interest, where e_i * e_j = E_ij. HINT distance functions including both exponential hydropathic and Lennard-Jones steric contributions appear to give the best results for Interaction calculations. It must be emphasized that HINT is an empirical, phenomenological model that relies on intuitive principle rather than a rigorous theoretical treatment to produce understanding of molecular interactions. This approach suggests the equations, distance functions, and parameterization used in the HINT model.

Lennard-Jones Parameters

Atom	e_i (kcal/mol)^0.5
H	0.1949
C,S,most others	0.2717
C(sp²)	0.1940
N	0.6428
O	0.4299

Directionality Vectors

This optional component to the HINT Distance Function optimizes the approach direction of two interacting atoms. Three levels of directionality are available in the model. The lowest level, the default, is to treat all atoms as spherical with no preferred orientation for interaction; i.e., the "quality" or score of an interaction is based solely upon the distance between the interacting atoms. The second level of directionality in the HINT model is to define the bond axes of the interacting atoms as vectors defining the optimum direction for interaction. In this model, the optimum interaction would occur when the two atoms are pointed directly at one another (i.e., 0 degrees in Figure 3), and would degrade as the angle of approach varied from zero. The third level of directionality is where the lone pairs and pi orbitals of polar atoms and unsaturated atoms have explicit and scaled (to electron count) direction vectors. Optimum interactions occur when these lone pairs or pi orbitals are oriented towards suitable electron acceptor atoms such as polar hydrogens, etc. Nonpolar atoms are treated as spherical in all models.

Figure 3: Measurement of Angle Between Vectors.

When direction vectors are in use, the HINT interaction score for a specific atom-atom interaction is modified with an exponential function,

b_ij = b_ij * electron_count * exp ( -ang * focus ),

that gives the largest score for antiparallel (i.e., ang = 0 degrees) vectors. Electron_count is simply the number of electrons in the orbital(s) associated with the direction vector. This is nominally 2.0 electrons for lone pairs and 0.5 electrons for pi orbitals. Electrons present but not assigned to specific vectors are assumed to be spherically distributed at each atom. The vector "focus" parameter simply forces a smaller and tighter cone of interactions to be scored favorably (see Figure 4). Note that the trivial case of vector focus zero is identical to the spherical model for all atoms.
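A minimal Python sketch of this modification follows. It assumes that the angle is measured in radians and that it is taken between one direction vector and the reverse of the other, so that two vectors pointing directly at one another give ang = 0; both choices, and the function names, are assumptions made for illustration only.

```python
import math


def approach_angle(v_i, v_j):
    """Angle (radians) between v_i and the reverse of v_j.

    Zero means the two direction vectors point directly at one another.
    """
    dot = -sum(a * b for a, b in zip(v_i, v_j))
    norm = math.hypot(*v_i) * math.hypot(*v_j)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def directional_score(b_ij, electron_count, ang, focus):
    """b_ij * electron_count * exp(-ang * focus)."""
    return b_ij * electron_count * math.exp(-ang * focus)


# A lone pair (2.0 electrons) meeting its partner 30 degrees off axis.
ang = math.radians(30.0)
print(directional_score(1.0, 2.0, ang, focus=1.0))
```

With focus set to zero the exponential term is always one, so the angular dependence disappears.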

Figure 4: The Effect of the Vector_Focus Parameter on Interactions.

Direction vectors for lone pairs and pi orbitals are derived from the geometry of the atom and its attachments with three geometry classes: tetrahedral, trigonal bipyramidal, or octahedral. The specific case and algorithm are then chosen based on how many attachments are known and how many vectors are required. The relevant cases are coded in the following Table (see also Figure 5):

Case	Geometry	known attach.	required vectors	vector weighting	example atom
413	tetrahedral	1	3 lone pairs	2.0	chloride
422	tetrahedral	2	2 lone pairs	2.0	ether O
431	tetrahedral	3	1 lone pair	2.0	sp³ N
514	trigonal bipy.	1	2 lone pairs 2 pi orbitals	2.0 0.5	carbonyl O
523	trigonal bipy.	2	1 lone pair 2 pi orbitals	2.0 0.5	sp² N
532	trigonal bipy.	3	2 pi orbitals	0.5	sp² C
615	octahedral	1	1 lone pair 4 pi orbitals	2.0 0.5	sp N
624	octahedral	2	4 pi orbitals	0.5	sp C

Figure 5: The Geometries for HINT Direction Vectors. The long (black) vectors are associated with lone pairs and the shorter (gray) vectors are associated with pi orbitals.

HintMap Calculations

HINT calculates molecular grid maps in much the same way as electrostatic grid mapping programs such as GRID or other mapping algorithms. A three-dimensional "grid" of test points is superimposed over the molecule or region of interest. A test atom with a hydrophobic atom constant (a_t) and solvent accessible surface area (S_t) both equal to one is assumed to sit at each grid point. Actually, in HINT the precise values of a_t and S_t are irrelevant (as long as they are consistent) because meaningful and interpretable results are derived from trends and differences between molecules. The field value of each grid point is given by,

A_t = Σ_i a_i S_i R_it,

where R_it is a function of the distance between each atom in the system (i), and the grid point (t).
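The grid calculation can be sketched as follows, assuming a regular rectangular grid, the exponential distance function R_it = e^-r, and a contribution from every atom at every grid point. The function name and argument layout are hypothetical and no cutoff is applied.

```python
import math


def hydropathic_field(atoms, origin, spacing, shape):
    """Return {(ix, iy, iz): A_t} with A_t = sum_i a_i * S_i * exp(-r_it).

    atoms: iterable of (a_i, S_i, (x, y, z)) tuples
    origin: coordinates of grid point (0, 0, 0)
    spacing: grid step in Angstroms
    shape: number of grid points along x, y, and z
    """
    field = {}
    for ix in range(shape[0]):
        for iy in range(shape[1]):
            for iz in range(shape[2]):
                point = (origin[0] + ix * spacing,
                         origin[1] + iy * spacing,
                         origin[2] + iz * spacing)
                field[(ix, iy, iz)] = sum(
                    a * s * math.exp(-math.dist(point, xyz))
                    for a, s, xyz in atoms
                )
    return field


# One hydrophobic atom at the origin, sampled on a 3 x 1 x 1 grid.
grid = hydropathic_field([(0.5, 10.0, (0.0, 0.0, 0.0))],
                         origin=(0.0, 0.0, 0.0), spacing=1.0, shape=(3, 1, 1))
for index, value in sorted(grid.items()):
    print(index, round(value, 4))
```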

Interaction HintMaps can be envisioned as a calculation where the test (grid) points act as observers of the interactions at their locations. The test points measure the effects from atoms i and j and then reconcile the two effects into a localized MicroInteraction constant β_it β_jt, which can be summed for all atom-atom pairs interacting at the grid point,

β_it = a_i S_i R_it;

β_jt = a_j S_j R_jt;

C_t = Σ β_it β_jt,

where β_it and β_jt are the atom-test point interactions for atoms i and j, respectively, and C_t is the interaction grid point value.

Figure 6: Calculation of Hint Interaction Map grid values.

Small Ligand Optimization

In the course of our model building we have developed a rich appreciation for the importance of water in stabilizing (and destabilizing where appropriate) biomolecular structures. Our experience in using molecular mechanics methods to optimize water positions and orientations has been less than satisfactory, and we have often been forced to manually reorient water molecules (even after extensive molecular mechanics structure optimization) in order to realistically model the water in place at a protein interface or mediating a protein-ligand interaction.

In an attempt to increase the utility of HINT we have added a modest function to optimize the HINT score between a small ligand (like water) and the surrounding molecular structure. The algorithm is described, briefly, in this section. First, the site is created from partitioned molecules or fragments within the cutoff range from the ligand center. Next, the ligand is systematically moved and rotated within a sphere of radius equal to the translation limit. Each new orientation is "scored" intermolecularly between the ligand and site. The sphere is reduced in size each iteration in response to the highest scoring orientation/position. Convergence is reached when the size of the sphere is smaller than the convergence limit.

For each iteration, the ligand is moved to the center of, and a number of points on, the sphere and then completely rotated through a number of orientations at each of those points; the actual number of these orientations and positions is manipulated throughout the process to optimize speed and accuracy. There is a parameter called level which represents the densities of translational positions and rotational orientations.

level	angle	number of points	number of orientations
1	π/2	5	64
2	π/4	27	512
3	π/8	115	4096
4	π/16	483	32768
5	π/32	1987	262144
6	π/64	8067	2097152
7	π/128	32515	16777216
8	π/256	130563	134217728
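The orientation counts in the table above are consistent with three rotation angles, each stepped through a full turn at an angular increment of π/2^level. The short Python sketch below reproduces that column under this assumption; it is an inference from the tabulated values rather than a description of the HINT source code, and the function names are hypothetical. The number of translational points follows a different pattern and is not modeled.

```python
import math


def rotation_step(level):
    """Angular step for a given level, assuming the step is pi / 2**level."""
    return math.pi / 2 ** level


def orientation_count(level):
    """Orientations when three rotation angles each span 2*pi at that step."""
    steps_per_axis = 2 ** (level + 1)   # 2*pi / (pi / 2**level)
    return steps_per_axis ** 3


for level in range(1, 9):
    print(level, round(rotation_step(level), 5), orientation_count(level))
```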

A single variable, termed ispeed, controls a variety of functions that impact speed and accuracy as described in the table below.

ispeed	shrink ratio	drop frequency	cast level	look ahead	level limit
1	0.90	1	3	3	7
2	0.85	2	2	2	6
3	0.80	3	2	2	6
4	0.75	4	1	1	6
5	0.70	5	1	1	5
6	0.65	6	1	1	5
7	0.60	7	0	0	4
8	0.55	8	0	0	4
9	0.50	9	0	0	4
10	0.45	10	0	0	4

ispeed can be selected by the user in the range of 1 to 6. shrink ratio is the multiplier applied each iteration to the sphere radius; drop frequency is the number of iterations between increases in the level for rotational density; cast level is a planned increase in point density for translations early in the iteration process (where the sphere is largest); look ahead increases the level of rotational density for the center point of the sphere; and level limit is the highest level of rotational density that will be examined.
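The shrinking-sphere idea can be illustrated with a deliberately simplified Python sketch. It covers translation only, samples the sphere at just six axis points plus the center, and takes the scoring function as a caller-supplied callable; rotations, drop frequency, cast level, look ahead, and level limit are all omitted. Every name here is hypothetical and the sampling scheme is an assumption, not the HINT algorithm.

```python
import math

# Shrink ratio for each ispeed setting, from the table above.
SHRINK_RATIO = {1: 0.90, 2: 0.85, 3: 0.80, 4: 0.75, 5: 0.70,
                6: 0.65, 7: 0.60, 8: 0.55, 9: 0.50, 10: 0.45}


def optimize_position(score, start, translation_limit, convergence_limit,
                      ispeed=3):
    """Move a point toward higher score inside a sphere that shrinks each pass.

    score: callable taking an (x, y, z) tuple and returning a number
    start: initial ligand center
    """
    center = tuple(start)
    radius = translation_limit
    ratio = SHRINK_RATIO[ispeed]
    while radius >= convergence_limit:
        # Candidate positions: the center plus six points on the sphere.
        candidates = [center]
        for axis in range(3):
            for sign in (-1.0, 1.0):
                point = list(center)
                point[axis] += sign * radius
                candidates.append(tuple(point))
        center = max(candidates, key=score)   # keep the best-scoring position
        radius *= ratio                       # shrink the sphere
    return center


# Toy score: highest at a known target position.
target = (0.3, -0.2, 0.1)
toy_score = lambda p: -math.dist(p, target) ** 2
print(optimize_position(toy_score, (0.0, 0.0, 0.0), 1.0, 0.01))
```

Because the current center is always among the candidates, the score never decreases from one iteration to the next; a smaller shrink ratio (higher ispeed) converges in fewer iterations at the cost of a coarser search.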

LockSmith

LockSmith is a hydropathic-based 3D QSAR method under development. It is included in the present release of some versions of HINT so that interested users may try it out and perhaps offer suggestions. As the goal of a QSAR is to relate structural information to some form of biological activity, the Hint hydropathic maps with their encoding of structural and thermodynamic information are used in LockSmith as the structural component of the QSAR. LockSmith creates an activity-weighted hydropathic consensus map of molecules in an overlapped set. Each Molecule HintGrid (all grid points) is multiplied by a constant that is a function of the molecule's biological activity. The consensus (LockSmith) map is constructed by summation of these activity-weighted maps.
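The construction of the consensus map amounts to a weighted sum over grids, as in the Python sketch below. The form of the activity-dependent constant is not specified here, so the weight argument defaults to the activity itself purely as a placeholder; the function name and the flat-list grid representation are likewise assumptions.

```python
def consensus_map(grids, activities, weight=lambda activity: activity):
    """Sum activity-weighted grids point by point.

    grids: list of equal-length lists of grid values, one per molecule
    activities: biological activity for each molecule, same order as grids
    weight: maps an activity to its multiplier (identity is a placeholder)
    """
    if len(grids) != len(activities):
        raise ValueError("one activity value is needed per grid")
    n_points = len(grids[0])
    if any(len(g) != n_points for g in grids):
        raise ValueError("all grids must cover the same points")
    total = [0.0] * n_points
    for grid, activity in zip(grids, activities):
        w = weight(activity)
        for k, value in enumerate(grid):
            total[k] += w * value
    return total


# Two molecules, two grid points each.
print(consensus_map([[1.0, 2.0], [0.5, -1.0]], [2.0, 4.0]))
```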

The QSAR is calculated as a least squares fit of a matching function relating each individual map to the LockSmith map to the biological activity of each molecule in the set. The LockSmith map itself graphically displays the 3D hydropathic structure of a target molecule for design.

The LockSmith method requires a carefully laid out and chemically meaningful superimposition of the molecules in the input (learning) set. This implies a predetermined pharmacophore model and that all molecules that are included in the set are presumed to have a similar action and binding mode at a common receptor.

SMILES (Simplified Molecular Input Line Entry System)

SMILES is a system for the simple input of molecular structures as text. The system was developed by David Weininger of Daylight Chemical Information Systems, Inc. A complete set of rules for SMILES has been published (Weininger, D. J. Chem. Info. and Comput. Sci. 1988, 28, 31). Here we present a brief description.

Atoms

Atoms are denoted by their chemical symbols. All hydrogens needed to fill the normal valence are implicitly assumed by SMILES. Atoms in the organic subset {B, C, N, O, P, S, F, Cl, Br, I} are written directly unless it is necessary to explicitly denote attached hydrogens or to specify a formal charge. For example, a quaternary nitrogen is written as [N+]. Other elements must be enclosed in square brackets, e.g., [Co] or [Si]. Aromatic atoms are represented by lowercase letters such as {c, n, o, s, etc.}.

Bonds

The symbols {-, =, #, and :} represent single, double, triple, and aromatic bonds, respectively. Single and aromatic bonds are assumed by SMILES between the appropriate atoms. Thus, the string "CCCC" represents butane as all bonds are assumed to be single and all valences are assumed to be filled with hydrogens. The string "C=C" represents ethylene, and the string "CCO" represents ethanol.

Branches

Branches are represented by parentheses. For example the string "CC(C)(C)O" is t-butanol. Branches (and parentheses) can be nested.

Rings

Rings (cyclic structures) are constructed in SMILES by first mentally breaking one of the bonds in each ring and assigning the "broken bond" a single digit reference number. Then, when encoding the structure in SMILES, the atoms involved in the broken bonds are denoted with their atom designation plus the bond reference number. Thus one SMILES code for cyclopentane is "C1CCCC1", indicating that the two atoms with the bond reference number "1" should be connected. One SMILES code for benzene is "c1ccccc1". Naphthalene may be represented as "c1cc2ccccc2cc1".
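The branch and ring conventions above can be checked mechanically. The Python sketch below is a small consistency check rather than a SMILES parser: it only verifies that parentheses balance and that every single-digit ring reference number appears as an opening and closing pair, and it returns the number of ring bonds found. The function name is hypothetical, and the contents of bracket atoms are skipped rather than interpreted.

```python
def ring_closures(smiles):
    """Count ring bonds in a SMILES string using single-digit closures.

    Raises ValueError for unbalanced parentheses or unpaired ring digits.
    Bracket atoms such as [N+] are skipped so their contents are not parsed.
    """
    open_rings = set()
    depth = 0
    closed = 0
    in_bracket = False
    for ch in smiles:
        if in_bracket:
            in_bracket = ch != "]"
        elif ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
        elif ch.isdigit():
            if ch in open_rings:
                open_rings.remove(ch)   # second occurrence closes the ring
                closed += 1
            else:
                open_rings.add(ch)      # first occurrence opens the ring
    if depth != 0 or open_rings or in_bracket:
        raise ValueError("unbalanced SMILES string")
    return closed


for code in ("CCCC", "CC(C)(C)O", "C1CCCC1", "c1ccccc1", "c1cc2ccccc2cc1"):
    print(code, ring_closures(code))
```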
