Jcsearch

Version 5.0.6

Introduction

The jcsearch program is a command-line interface to JChem chemical structure searching. It can perform substructure, superstructure, exact, exact fragment, similarity and "perfect" searches, as well as count matches, for the specified query and target molecules. These molecules can be given as file names, SMARTS/SMILES strings, or database tables (targets only). Many molecular file formats are supported. Refer to the JChem Query Guide for a detailed description of search options and query features.

Note that the R-group decomposition functionality has been moved to a separate script; see the R-group Decomposition documentation for details and examples.

Usage

For correct behaviour, set up the jcsearch script or batch file as described in Preparing the Usage of JChem Batch Files and Shell Scripts.

The program should be invoked in one of the following forms:

       jcsearch [options] [files...]
   or  jcsearch [options] DB:[table name]
If no file is given, or the file name is -, jcsearch reads from standard input. When DB:[table name] is given, the search is performed in the database, using the connection information saved by other JChem programs (e.g. jcman).
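The two invocation forms can be modelled in a few lines. The following Python sketch is illustrative only: classify_targets is a hypothetical helper, not part of JChem, and it ignores targets given directly as SMILES strings (as in example 6 below).

```python
from typing import List, Tuple


def classify_targets(args: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: map jcsearch-style positional arguments to target sources.

    Returns (kind, name) pairs where kind is "stdin", "database" or "file".
    """
    if not args:
        # No file argument: targets are read from standard input.
        return [("stdin", "-")]
    sources: List[Tuple[str, str]] = []
    for arg in args:
        if arg == "-":
            # An explicit "-" also means standard input.
            sources.append(("stdin", "-"))
        elif arg.startswith("DB:"):
            # DB:<table> selects a database table; the connection settings are
            # the ones saved by other JChem programs such as jcman.
            sources.append(("database", arg[len("DB:"):]))
        else:
            sources.append(("file", arg))
    return sources


print(classify_targets(["DB:molecules"]))  # [('database', 'molecules')]
```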

Options:

-h         help message
-H         help on output file formats
-q query   SMARTS string or name of file that contains the query structure(s)
             (More than one can be specified in non-database mode. Please see
              options --and and --or.) For a detailed description about how
              to formulate queries, see the JChem Query Guide.
              With -t:p or --tautomer, a SMILES string is expected instead of SMARTS.
-t:type    search type.
      -t:s                                substructure search (default)
      -t:e                                exact search
      -t:ef                               exact fragment search
      -t:p                                "perfect" search
      -t:i:[dissimilarity_threshold]      similarity search
      -t:u                                superstructure search
                                          (default for query tables) 
      -t:c                                count all hits
--queryAbsoluteStereo:y/n     All chiral atoms are absolute (y, default), or the
                              chiral flag is considered (n), in case of MDL mol
                              files without enhanced stereo labels. Has no
                              effect in database mode.
--targetAbsoluteStereo:y/n    All chiral atoms are absolute (y, default), or the
                              chiral flag is considered (n), in case of MDL mol
                              files without enhanced stereo labels. Has no
                              effect in database mode.
--DBAbsoluteStereo:T/C/A      In database mode, sets both of the above
                              AbsoluteStereo flags.
                                T: (default) as set for the table in the database
                                C: always check the chiral flag (false)
                                A: always absolute stereo (true)
--exactAtomMatching:y/n       Exact atom matching(y) or not(n). Default is n.
                              (Deprecated.)
--exactQueryAtomMatching:y/n  Exact query atom matching(y) or not(n).
                              Default is n.
--exactRadicalMatching:y/n    Exact radical matching(y) or not(n).
                              Default is n. (--radical is preferred instead.)
--exactIsotopeMatching:y/n    Exact isotope matching(y) or not(n).
                              Default is n. (--isotope is preferred instead.)
--exactChargeMatching:y/n     Exact charge matching(y) or not(n).
                              Default is n. (--charge is preferred instead.)
--charge:d/e/i                Charge matching mode: d-default,
                              e-exact, i-ignore
--isotope:d/e/i               Isotope matching mode: d-default,
                              e-exact, i-ignore
--radical:d/e/i               Radical matching mode: d-default,
                              e-exact, i-ignore
--valence:d/i                 Valence matching mode: d-default, i-ignore
--vagueBond:n/1/2/3/4         Vague handling of bond types:
                                n-off
                                1-handling of certain 5-membered ambiguous
                                  aromatic rings, like [C,N]1C=CC=C1 (default)
                                2-all ring single and double bonds match aromatic
                                3-all single and double bonds match aromatic
                                4-ignore all bond types
--mix:d/i                     Handling of com, mix and for (component, mixture
                              and formulation) brackets: d-default, i-ignore
--exactStereoSearch:y/n       Turns exact stereo search on/off.
--doubleBondStereo:N/M/A      Double bond stereo matching mode: None/Marked/All.
                              Default is M.
--stereoSearch:y/n            Turns stereo feature checking on/off.
--stereoModel:l/g/c           Sets the used stereo model (for tetrahedral and
                              double bond stereo). Possible values:
                              l - local(default), g - global, c - comprehensive
--reactionUnpairedMap:All/unpairedOnly Matching of unpaired maps in reaction search:
                              All (default): match to any atom map,
                              unpairedOnly: match to unpaired maps only.
--HCountMatching:G/E/A        Interpretation of the hydrogen count query property.
      Values:
        G    (greater or equal, MDL behaviour) in addition to the explicitly
             drawn hydrogens, the target atom must have at least as many
             hydrogens as the query H count. H0 means no hydrogens other
             than the explicitly drawn ones.
        E    (equal, Daylight behaviour) the target atom must have exactly as
             many hydrogens as the H count.
        A    automatically choose G or E based on the query source
             (SMILES and SMARTS sources: E, all others: G).
--implicitHMatching:d/y/n     Describes the matching of implicit and explicit hydrogens.
      Values:
        d    default: the behaviour will depend on the circumstances of the search.
        y    Implicit and explicit hydrogens can match.
        n    Implicit and explicit hydrogens cannot match.
--keepQueryOrder              Keeps the original atom order of the query. By
                              default, query atoms are rearranged for best
                              search performance.
--markush:n/y                 Disable/enable special handling of Markush targets
                              Default is n. Enabling requires special license.
--markushHitSupergraph        For Markush targets returns hits for the
                              supergraph, instead of original Markush diagram
                              hits (See --allHits).
--optimizeQueries:y/n         Tries to speed up the search when the query molecule
                              contains special query features (atom lists, bond
                              lists, ...). Default is y.
--maxResults:<n>              Limits the number of molecules returned.
-f format  output format (default: smiles). Run jcsearch -H for details
           possible formats: mrv, mol, sdf, rdf, rxn, csmol, cssdf, csrdf,
           csrxn, cxsmiles, cxsmarts, cml, smiles, smarts, sybyl, pdb, pov,
           cube or xyz
-o file    write output to file
-s SMILES  read target from SMILES string
-v         verbose
-vv        very verbose, stack trace on error
-0         skip coordinate calculation for SMILES input
-d         use Daylight-type aromatization (Hückel rule) instead of
           the standard ChemAxon aromatization.
-2[:[On][e]]  2D coordinate calculation (useful if the input is SMILES)
      -2      coordinate calculation with default options (O1)
      -2:O0   no optimization    -2:O1  optimize if needed
      -2:O2   optimize           -2:e   make double bonds "either" (cis/trans)
-n         List non-hits. For use with multiple queries, see options --and
           and --or.
--and      If two or more queries are present, all are required to match
           (default). For DB targets, only the first query is considered.
           If used together with option -n, a hit is returned if none of the
           query molecules match.
--or       If more than one query is present, at least one is required to
           match. For DB targets, only the first query is considered.
           If used together with option -n, a hit is returned if at least
           one query molecule does not match.
--allHits  Instead of checking the existence of matching, all matchings of
           the query molecule(s) are reported.
--hitColoring  If the DB option has been set and the output format is MRV,
           colors the hits depending on the search type.
--align    If the DB option has been set and the output format is MRV, aligns
           the hits or cleans them using the query as a template.
           --align:r  rotate. If the query molecule is 0D (has no
           coordinates), it is cleaned in 2D for alignment.
           --align:p  partial clean (template-based clean). If the query
           molecule is 0D, it behaves the same as rotate.
--orderSensitive  Switches on order sensitive search
--tautomer        Switches on tautomer search
-e "expression"|<file>                   A Chemical Terms filtering expression
  or --expression "expression"|<file>    for filtering hits. For syntax, see
                                         Filtering expression syntax below.
-c config file          Configuration xml file for Chemical Terms (optional)
  or --config config file
-S, --standardize <file/string>      standardize query and target
                                     according to configuration file/string. See the
                                     Standardizer manual.
-g, --ignore-error                   continue with next molecule on error
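The --and, --or and -n descriptions above combine into a small decision rule for non-database mode (for DB targets only the first query is used). The Python sketch below restates those documented rules; is_reported and its parameters are hypothetical names, and the per-query match results are assumed to be computed already.

```python
from typing import Sequence


def is_reported(matches: Sequence[bool], mode: str = "and",
                list_non_hits: bool = False) -> bool:
    """Decide whether a target is reported, given one match result per query.

    mode:          "and" (default, like --and) or "or" (like --or)
    list_non_hits: models the -n option
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    if not list_non_hits:
        # --and: every query must match; --or: at least one query must match.
        return all(matches) if mode == "and" else any(matches)
    # -n with --and: reported if none of the queries match.
    if mode == "and":
        return not any(matches)
    # -n with --or: reported if at least one query does not match.
    return not all(matches)
```

Note that, as documented, -n is not a plain negation of the normal result: with --and it reports targets that no query matches, not targets that fail at least one query.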

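The --HCountMatching values described in the options list can be read as a simple comparison per atom pair. The Python sketch below is a simplified interpretation, not JChem's matching code: the function names are hypothetical, subtracting explicitly drawn query hydrogens from the target's total and treating H0 as "exactly zero extra" are readings of the option text, and real matching also depends on other atom properties and on --implicitHMatching.

```python
def resolve_hcount_mode(mode: str, query_source_format: str) -> str:
    """Resolve an --HCountMatching value to "G" or "E"."""
    if mode in ("G", "E"):
        return mode
    if mode == "A":
        # As documented: SMILES and SMARTS query sources use E, all others G.
        return "E" if query_source_format.lower() in ("smiles", "smarts") else "G"
    raise ValueError(f"unknown HCountMatching mode: {mode!r}")


def hcount_matches(query_h: int, target_total_h: int,
                   query_explicit_h: int, mode: str) -> bool:
    """Simplified H count check for one query atom against one target atom.

    query_h:          H count query property on the query atom (e.g. 1 for H1)
    target_total_h:   all hydrogens (implicit + explicit) on the target atom
    query_explicit_h: hydrogens drawn explicitly on the query atom
                      (assumption: each one accounts for one target hydrogen)
    """
    if mode == "G":
        extra = target_total_h - query_explicit_h
        if query_h == 0:
            # H0: no hydrogens beyond the explicitly drawn ones.
            return extra == 0
        return extra >= query_h
    if mode == "E":
        return target_total_h == query_h
    raise ValueError(f"mode must be 'G' or 'E', got {mode!r}")
```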
Filtering expression syntax

Option -e or --expression requires an additional parameter: a filtering expression written in ChemAxon's Chemical Terms language (or the name of a file containing the expression). Only targets (and, with the --allHits option, only hits) satisfying the filtering expression are reported. If more than one query molecule is specified, the filtering expression applies to all of them (this matters only if the expression refers to the query molecule).
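To make the reporting rule concrete, here is a Python sketch of filtering search results with a predicate standing in for a Chemical Terms expression. Everything in it is hypothetical: filter_results, the Hit representation and the callables are for illustration only, and evaluating the expression on the first hit when --allHits is off is an assumption, not documented jcsearch behaviour. It handles a single query; with several queries the same predicate would be applied for each of them.

```python
from typing import Callable, Iterable, List, Tuple

Hit = Tuple[int, ...]  # assumed: target atom indices matched by the query atoms


def filter_results(
    targets: Iterable[str],
    query: str,
    find_hits: Callable[[str, str], List[Hit]],
    expression: Callable[[str, str, Hit], bool],
    all_hits: bool = False,
) -> List[Tuple[str, List[Hit]]]:
    """Report (target, hits) pairs whose hits satisfy a filtering predicate."""
    reported: List[Tuple[str, List[Hit]]] = []
    for target in targets:
        hits = find_hits(query, target)
        if not hits:
            continue  # the query does not match this target at all
        if all_hits:
            # --allHits: every hit is checked; only passing hits are reported.
            passing = [hit for hit in hits if expression(query, target, hit)]
            if passing:
                reported.append((target, passing))
        elif expression(query, target, hits[0]):
            # Assumption: without --allHits the expression sees the first hit only.
            reported.append((target, hits[:1]))
    return reported
```

In a test, find_hits can be a toy matcher over strings; in jcsearch itself the predicate role is played by an expression such as mass() >= 250.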

The expression syntax is described in the Chemical Terms Language Reference. Search-specific functions in the search context provide access to the query and target molecules, the search hit array and its elements.

The default input molecule is the target molecule (e.g. mass() is the same as mass(target()); both refer to the mass of the target molecule).

In most cases the function and plugin definitions provided by the built-in evaluator.xml are sufficient, but a user-defined configuration XML file can be specified with the --config parameter. Its definitions are added to those contained in the built-in evaluator.xml. The syntax is described in the Chemical Terms Language Reference, which includes a set of search filter examples. The short reference tables summarize the functions and plugins provided by the built-in configuration, and a set of working examples is also available.

Examples

  1. Searching for chlorobenzene in a SMILES file and sending the results to the standard output in SMILES format:
     jcsearch -q "c1ccccc1Cl" -f smiles input.smi
  2. Searching for molecules containing both chlorobenzene and bromine. Output: SMILES and molecule name (the name is stored in input.smi, separated from the SMILES by spaces).
     jcsearch -q "c1ccccc1Cl" --and -q "Br" -f smiles:Tfield_0 input.smi
  3. Searching for chlorobenzene in an SDfile and writing the results (structures and all other data) into another SDfile:
     jcsearch -q "c1ccccc1Cl" -f sdf -o hits.sdf input.sdf
  4. Like the above, but reading the query from a molfile and displaying the results using mview:
     jcsearch -q clbenz.mol -f sdf input.sdf | mview -f ID -
  5. Like the above, but reading targets from a database table called molecules.
     jcsearch -q clbenz.mol -f sdf DB:molecules | mview -f ID -
  6. Listing the atoms with a partial charge less than -0.3 in a specific molecule:
     jcsearch --allHits -e "charge(h(0)) < -0.3" -q '[*]' '[O-]C(=O)CCCCCC(=O)CCCC([O-])=O'
  7. Listing carboxylic groups whose carboxylic OH has an acidic pKa value greater than 4:
     jcsearch --allHits -e "pka('acidic',hm(1)) > 4" -q "[H][O:1]C=[O:2]" target.mol
  8. Filtering target molecules by both molecule mass and substructure search:
     jcsearch -e "mass() >= 250" -q query.mol targets.sdf
  9. Similarity search; the dissimilarity threshold should be between 0 (very similar) and 1 (not similar):
     jcsearch -q "CC(C)(O)C#N" input.smi -t:i:0.4
  10. R-group decomposition, SMILES table output; the query is hydrogenized to force ligand attachment points to match query R-group atoms:
    jcsearch -q query.mol target.mol -t:d:H
  11. R-group decomposition, SDF output; R-groups are added to the query automatically to define attachment points, and the atom map attachment type is set:
    jcsearch -q query.mol target.mol -t:d:Rm -f sdf -o decomposition.sdf
    Then view the result with colors defined in Colors.ini in MView:
    mview -t DMAP -p Colors.ini decomposition.sdf
  12. R-group decomposition, MRV output; the atom label attachment type is set (these labels can only be saved in MRV format), and the result is piped directly to MView:
    jcsearch -q query.mol target.mol -t:d:l -f mrv | mview -t DMAP -p Colors.ini -
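Example 9 uses a dissimilarity threshold. A common definition, assumed here rather than taken from this page, is dissimilarity = 1 - Tanimoto similarity of the query and target fingerprints, with a target reported when its dissimilarity does not exceed the threshold. The Python sketch below represents fingerprints as sets of set-bit positions; the function names are hypothetical, and the fingerprint type and metric that jcsearch actually uses may differ.

```python
from typing import AbstractSet


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of set-bit positions."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def is_similarity_hit(query_fp: AbstractSet[int], target_fp: AbstractSet[int],
                      dissimilarity_threshold: float) -> bool:
    """Assumed rule: hit if 1 - Tanimoto <= threshold (0 = very similar, 1 = not similar)."""
    if not 0.0 <= dissimilarity_threshold <= 1.0:
        raise ValueError("dissimilarity threshold must be between 0 and 1")
    return 1.0 - tanimoto(query_fp, target_fp) <= dissimilarity_threshold


query = {1, 2, 3}
target = {2, 3, 4}
print(tanimoto(query, target))                # 0.5
print(is_similarity_hit(query, target, 0.4))  # False: dissimilarity 0.5 > 0.4
```

Under this assumption, the threshold 0.4 in example 9 would keep targets whose Tanimoto similarity to the query is at least 0.6.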
 
Copyright © 1999-2008 ChemAxon Ltd.    All rights reserved.