HINT 2.30 Manual: Chapter 6S

LESSON 6: Using HINT with SYBYL 3D QSAR


This lesson demonstrates the use of HINT with the SYBYL QSAR module. We will compare the QSAR analysis of HINTLOGP as a CoMFA field with HINTCOMFA (which contains much more detailed hydropathic information on the pharmacophore) as a CoMFA field. For this lesson it is best that no molecules or backgrounds from previous lessons remain active in SYBYL. If you are entering the HINT Tutorial at this point, follow the instructions in Step 1 of Lesson 1.

  1. Open a SYBYL Molecular Spreadsheet and Database

    From the File pulldown on the menubar select Molecular Spreadsheet and New.... The rows will represent Molecules. In the DATABASE_FILE dialog box, enter $TA_DEMO in the "Database containing molecules" text field and press Search Directory. This will allow us to select one of the already prepared molecular databases in the SYBYL Demo directory. Choose jacs.mdb for this lesson.

  2. Fill a Spreadsheet column with CBG values

    Import biological data for the jacs set of steroids. Select File, Import from the SYBYL Molecular Spreadsheet window. Set the Format to Tripos and enter the filename $TA_DEMO/cbg.tripos.

  3. Fill a Spreadsheet column with HINT LogP values

    After the spreadsheet is initialized and appears, select the AutoFill button on the spreadsheet menubar. We are creating a new Column. From the list of New column types pick HINTLOGP. If HINTLOGP is not one of the column types listed, follow #9 of the HINT FAQ: Cancel the current AutoFill and type "mss!reset_eslc" in the SYBYL text window. The Add Column (HintLogP) dialog box allows you to tailor the method for calculating LogP. For this set of small molecules, the Partition Method should be Calculate, the Hydrogen Treatment should be All, and the Polar Proximity should be Via Bond. Press OK and accept logP2 as the Column name. This operation will take a few minutes as Column 2 of the spreadsheet is AutoFilled with parameters.

  4. Fill a Spreadsheet column with the HINT hydropathic field

    Again from the spreadsheet menubar select the AutoFill button (and choose a new Column). This time select HINTCOMFA as the New column type (see the HINT FAQ as above if it is not there). The Add Column (HintCoMFA) dialog box contains options to tailor the HINT field that will be entered into the QSAR table. For this first run, we will choose mostly the default settings (Map Type = Molecule, Smoothing = None, Information = Hydrophobic/Polar, Partition Method = Calculate, Hydrogen Treatment = All, Polar Proximity = Via Bond, Distance Function Hydropathic Term = exp(-nr), Distance Function Steric Term = off, Inside Mol Cut Off = off, Van der Waals Limit = 1.0). The Region will be from Calculate Automatically... using the Calculate CoMFA Region Automatically dialog box, where all Spacings should be 2 Angstroms and all Margins should be 4 Angstroms. Use jacs.rgn as the CoMFA Region File name. Press OK to calculate the region, then press OK in the Add Column (HintCoMFA) dialog box and accept hint3 as the Column name. This AutoFill operation will take about 5-10 minutes. Note: this column will appear to contain zeros; see #8 of the HINT FAQ.

    Molecular Spreadsheet filled with CBG, HINTLOGP and HINTCOMFA values in columns 1,2,3
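
    To make the region settings in Step 4 concrete, the following Python sketch builds a box lattice from a set of atom coordinates using a 2 Angstrom spacing and a 4 Angstrom margin. It is only an illustration of the idea: the function names and the coordinates are hypothetical, and the algorithm SYBYL actually uses to calculate the region automatically may differ in detail.

```python
import math

def comfa_region(coords, spacing=2.0, margin=4.0):
    """Return (origin, counts) of a box lattice enclosing all atoms.

    coords  : list of (x, y, z) atom positions in Angstroms
    spacing : lattice step along each axis (Angstroms)
    margin  : padding added beyond the extreme atoms (Angstroms)
    """
    origin, counts = [], []
    for axis in range(3):
        lo = min(c[axis] for c in coords) - margin
        hi = max(c[axis] for c in coords) + margin
        # Enough steps to cover the padded extent, plus the starting point.
        n = int(math.ceil((hi - lo) / spacing)) + 1
        origin.append(lo)
        counts.append(n)
    return tuple(origin), tuple(counts)

def lattice_points(origin, counts, spacing=2.0):
    """Yield every lattice point of the region as an (x, y, z) tuple."""
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                yield (origin[0] + i * spacing,
                       origin[1] + j * spacing,
                       origin[2] + k * spacing)

# Hypothetical coordinates for illustration; not taken from jacs.mdb.
atoms = [(0.0, 0.0, 0.0), (6.0, 3.0, 1.0), (10.0, -2.0, 4.0)]
origin, counts = comfa_region(atoms)
n_points = counts[0] * counts[1] * counts[2]
```

    Each lattice point contributes one field value per molecule, which is why the single HINTCOMFA spreadsheet column holds many columns of data.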

  5. Run PLS analyses on CBG as a function of LogP and of the HINT field.

    In this section we are going to run 2 analyses: one relating LOGP (column 2) to CBG (column 1), and one relating HINTCOMFA (column 3, although note that this one column contains many columns of data) to CBG (column 1).

    From the QSAR pulldown, select Partial Least Squares... to call the Partial Least Squares Analysis dialog box. Select the Columns to Use: to be CBG,LOGP2. The Dependent Column is 1 or CBG. Select Leave-1-Out Validation, 5 Components, CoMFA Std Scaling, turn Use SAMPLS off and Columns Filtering off. This run will take about 1 minute, so you may run it interactively. If you run it interactively, be sure to choose OK for This PLS Analysis will be saved as:. In this run the optimum number of components is 1 (which makes sense since there is only one variable, LogP), and the cross-validated r^2 is 0.233.

    Now change the variables by selecting different Columns to Use:, this time CBG,HINT3. The Dependent Column is again 1 or CBG. The remainder of the settings should be as before, except the Analysis Name:. In this run the optimum number of components is 2, and the cross-validated r^2 is 0.630. Because the LogP column is a composite of components in the HINTCOMFA, you are not likely to obtain any better results by running an analysis that combines the effects of LogP and HINTCOMFA: the cross-validated r^2 is 0.602 for 3 components.

    In this case we can see that the correlation of hydropathic data with CBG is significantly better using the HINTCOMFA data than it is with just the HINTLOGP data (the HINTCOMFA analysis uses 2 components and gives a cross-validated r^2 closer to 1.0).
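
    For readers unfamiliar with the statistic quoted in Step 5, the cross-validated r^2 from Leave-1-Out Validation is conventionally defined as 1 - PRESS/SS, where PRESS is the sum of squared errors made when each compound is predicted by a model fitted without it, and SS is the sum of squared deviations of the observed values from their mean. The Python sketch below computes this for a single descriptor with a straight-line fit, which is what a one-component PLS model reduces to when there is only one descriptor. The function names and data are hypothetical, and SYBYL's scaling options are not reproduced.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def loo_q2(xs, ys):
    """Leave-one-out cross-validated r^2: 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model with compound i left out, then predict it.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss

# Hypothetical descriptor and activity values; not the jacs/CBG data.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity_exact = [3.0, 5.0, 7.0, 9.0, 11.0]   # perfectly linear
activity_noisy = [1.0, 3.0, 2.0, 5.0, 4.0]    # scattered

q2_exact = loo_q2(descriptor, activity_exact)
q2_noisy = loo_q2(descriptor, activity_noisy)
```

    Unlike the conventional r^2, this statistic can be negative when the left-out predictions are worse than simply predicting the mean, so values such as 0.233 and 0.630 are best compared with each other rather than read as fitted correlations.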

  6. Review some of the HINT field optimization options

    The Add Column (HintCoMFA) dialog box provides a large number of options for optimizing the HINT field, much as the analogous CoMFA field dialog box does. Many of these options are only appropriate for certain data sets, e.g., it may be advisable to partition with the Dictionary method if the data set consists of peptides. If the HINT field is being combined with other fields, such as the CoMFA steric and/or electrostatic fields, Information = Hydrophobic only may yield better cross-validation statistics. Setting Smoothing to Box often improves a CoMFA model. Changing the grid spacing and other region definition parameters may improve a model, but usually at a significant cost in terms of speed.

    The other major form of field tuning in the Add Column (HintCoMFA) dialog box is associated with changing some of the field Cutoffs. The standard CoMFA practice is to set steric and electrostatic field values for grid points that are "inside" the molecular van der Waals surface to constant values. HINT simulates this technique with the Inside Mol Cut Off option and its associated parameters Hydrophobic and Polar. If the Inside Mol Cut Off is turned on and Polar is set at -2 and Hydrophobic is set at 1 for this data set, a model with a cross-validated r^2 of 0.850 with 4 components can be derived. In order to repeat this result, however, it may be necessary to Save the Spreadsheet and restart SYBYL. There appears to be a SYBYL bug that prevents multiple External Field columns from being properly stored in a QSAR table in the same SYBYL session.
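
    The idea behind an inside-molecule cutoff can be sketched in a few lines of Python. This is not HINT's implementation: the function name, the atom radius, and the field values are hypothetical, and the rule used here to choose between the Hydrophobic and Polar constants (the sign of the raw field value, with positive taken as hydrophobic) is an assumption made for illustration only.

```python
import math

def apply_inside_cutoff(points, values, atoms,
                        hydrophobic=1.0, polar=-2.0, vdw_limit=1.0):
    """Replace field values at lattice points inside any atom's scaled radius.

    points : list of (x, y, z) lattice points
    values : raw field value at each point (positive = hydrophobic here)
    atoms  : list of ((x, y, z), vdw_radius) pairs
    """
    out = []
    for p, v in zip(points, values):
        inside = any(math.dist(p, centre) < vdw_limit * radius
                     for centre, radius in atoms)
        if inside:
            # ASSUMPTION: the sign of the raw value selects the constant.
            v = hydrophobic if v >= 0.0 else polar
        out.append(v)
    return out

# Hypothetical single atom and three lattice points for illustration.
atoms = [((0.0, 0.0, 0.0), 1.7)]
points = [(0.5, 0.0, 0.0), (1.0, 1.0, 0.0), (3.0, 0.0, 0.0)]
raw = [5.2, -7.1, 0.4]
clipped = apply_inside_cutoff(points, raw, atoms)
```

    In this sketch the first two points lie inside the atom's radius and are replaced by the constants, while the third lies outside and keeps its raw value. Capping the large values near atom centres in this way keeps a few extreme lattice points from dominating the PLS model.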

  7. Using the HINT field with the standard CoMFA fields

    The HINT field can be used in combination with the SYBYL steric and/or electrostatic fields for 2 or 3 field CoMFA studies. Note that the region must be the same for all fields, and that the SYBYL methodology for generating the region is much faster than the HINT algorithms because SYBYL does the calculation internally, while HINT must use an SPL script to collect the region information. Thus, add the SYBYL CoMFA column(s) to the table first, before the HINT column, and use as the HINT region definition the Preexisting region file generated by SYBYL for the CoMFA column(s).

    There is a graphical command in the HINT software to aid in graphing multifield CoMFA results. From the eslc pulldown on the main SYBYL menubar select the Hint, HintQSAR, Graph HintQSAR... command. This brings up the Retrieve HintQSAR dialog box that guides you through retrieving and graphing the CoMFA field contours. Choose which field types you wish to graph and their Columns. Important: This dialog does not work when there is only one CoMFA type column in the analysis.