The molecular structures in the examples of this document were drawn with MarvinSketch, the integrated structure drawing tool of JChem.
If the query contains special molecular features (e.g. stereochemistry or charge), only targets that also contain those features match. If a feature is absent from the query, it is not checked by default.
An exact structure search finds molecules that are identical in size to the query structure: no additional fragments or heavy atoms are allowed. By default, molecular features are evaluated the same way as described above for substructure search.
Table 1 Exact structure search, substructure search
[Image table: query, target, and hit structures for exact structure search and substructure search]
Similarity search is available only in database searches. It is based on hashed binary chemical fingerprints compared with the Tanimoto metric (for a more detailed description, see the Developers Guide). For a more sophisticated approach to similarity, use the Screen package.
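The following sketch illustrates the Tanimoto metric mentioned above. It compares two binary fingerprints stored as `java.util.BitSet` objects. The similarity equals the number of bits set in both fingerprints divided by the number of bits set in either. This is not JChem's fingerprint or Screen code. The class name `Tanimoto` is hypothetical, the way fingerprints are hashed is out of scope, and the bit positions in the usage note are made up.

```java
import java.util.BitSet;

/** Hypothetical helper: Tanimoto similarity of two binary fingerprints. */
final class Tanimoto {
    private Tanimoto() {}

    /** Returns |a AND b| / |a OR b|; two empty fingerprints are treated as identical. */
    static double similarity(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                 // bits set in both fingerprints
        BitSet union = (BitSet) a.clone();
        union.or(b);                   // bits set in at least one fingerprint
        int unionCount = union.cardinality();
        if (unionCount == 0) {
            return 1.0;                // convention for two all-zero fingerprints (assumption)
        }
        return (double) common.cardinality() / unionCount;
    }

    /** Builds a fingerprint with the given bit positions set. */
    static BitSet of(int... bits) {
        BitSet fingerprint = new BitSet();
        for (int bit : bits) {
            fingerprint.set(bit);
        }
        return fingerprint;
    }
}
```

For example, the fingerprints {1, 3, 5, 7} and {3, 5, 8} share 2 set bits out of 5 set in total, so their similarity is 0.4.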
Perfect search is mainly used before database inserts, to check whether a given molecule is already in the database. All molecular features must be equal; for example, a non-stereo query matches only non-stereo targets.
Superstructure search is the opposite of substructure search: it finds the target molecules that are contained in the given query. Because the roles of the query and target molecules are simply exchanged, query properties should be specified on the target.
Exact fragment search lies between substructure and exact search: the query must exactly match a full fragment of the target. Other fragments may be present in the target; they are ignored. This search type is useful for an "Exact search" that ignores salts or solvents accompanying the main structure in the target.
The following table details the main differences amongst these search types.
| Search type | Similarity | Tests if target contains query | Tests if query contains target | Full fragment coverage | Exact topology matching | Exact stereo matching | Exact atom features matching | Exact bond matching |
|---|---|---|---|---|---|---|---|---|
| SUBSTRUCTURE | n/a | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| SUPERSTRUCTURE | n/a | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| EXACT_FRAGMENT | n/a | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| EXACT | n/a | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| PERFECT | n/a | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| SIMILARITY | [image] | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
The definitions of the search features are:
[Image table: query, target, and hit structures for EXACT and PERFECT search, with remarks. Surviving remarks: "with option DoubleBondStereoMatching set to DBS_MARKED (default)"; "(A) denotes aliphatic query property".]
The diagrams below show further examples of substructure, exact fragment, exact and perfect searches. An arrow between a query and a target molecule denotes a match.




For the different stereo features, see section Stereochemistry.
Table 2 Atom lists
[Image table: atom list queries (rows) matched against targets (columns)]
Table 3 Atom not lists
[Image table: atom NOT list queries (rows) matched against targets (columns)]
A Any (any atom except hydrogen; matches neither explicit nor implicit hydrogens. Note that in JChem the SMARTS primitive "*" is imported as an any atom and does not match plain hydrogens, either explicit or implicit. For differences between matching any atoms in different file formats, see here.)
AH Any atom, including hydrogen.
Q Hetero (any atom except hydrogen and carbon)
QH Hetero atom or hydrogen (any atom except carbon)
M Metal (alkali metals, alkaline earth metals, transition metals, actinides, lanthanides, poor (basic) metals, Ge, Sb and Po)
MH Metal or hydrogen
X Halogen (F, Cl, Br or I)
XH Halogen or hydrogen
Gn Member of group (column) n of the periodic table (n = 1..18)
Attention: G17 is NOT the same as X, because G17 also contains At!
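The sketch below expresses several of these generic atom types as predicates on an element symbol. It mainly highlights the difference between X and G17. The class `GenericAtoms` is hypothetical and is not part of JChem. It covers only A, AH, Q, QH, X, XH and G17. G17 is modelled here as F, Cl, Br, I and At, and superheavy elements are left out. The sketch also ignores isotopes and charges.

```java
import java.util.Set;

/** Hypothetical model of some generic query atom types (not JChem code). */
final class GenericAtoms {
    private GenericAtoms() {}

    // X: the halogens F, Cl, Br and I; astatine is deliberately excluded.
    private static final Set<String> HALOGENS_X = Set.of("F", "Cl", "Br", "I");
    // G17: group 17 members considered here, which additionally include At.
    private static final Set<String> GROUP_17 = Set.of("F", "Cl", "Br", "I", "At");

    /** Returns true if a target atom with the given element symbol matches the query atom type. */
    static boolean matches(String queryType, String element) {
        boolean isHydrogen = element.equals("H");
        switch (queryType) {
            case "A":   return !isHydrogen;                                // any atom except hydrogen
            case "AH":  return true;                                       // any atom, including hydrogen
            case "Q":   return !isHydrogen && !element.equals("C");        // hetero atom
            case "QH":  return !element.equals("C");                       // hetero atom or hydrogen
            case "X":   return HALOGENS_X.contains(element);               // halogen
            case "XH":  return isHydrogen || HALOGENS_X.contains(element); // halogen or hydrogen
            case "G17": return GROUP_17.contains(element);                 // periodic group 17
            default:
                throw new IllegalArgumentException("unsupported query type: " + queryType);
        }
    }
}
```

With this model, `matches("X", "At")` is false while `matches("G17", "At")` is true, which mirrors the warning above.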
Table 4 Generic query atoms
[Image table: generic query atoms (rows) matched against targets (columns)]
a aromatic (has an aromatic bond)
A aliphatic (has no aromatic bond)
D<n> degree (number of explicit connections; the default for "n" is one)
H<n> total hydrogens (total number of hydrogen substituents)
h<n> implicit hydrogens (number of implicit hydrogen substituents*)
R<n> rings (number of rings the atom is a member of)
r<n> smallest ring size (size of the smallest ring the atom is a member of)
R ring membership (whether the atom is part of a ring or not)
v<n> valence (total bond order)
X<n> connections (number of substituents, including hydrogens)
s<n> substitution count (number of non-H substituents)
s0-s5: exact substitution count; s6: 6 or more substituents
s* substitution as drawn (no extra non-H substituents)
rb<n> ring bond count (number of ring bonds next to the atom)
rb0-rb3: exact ring bond count; rb4: 4 or more ring bonds
rb* ring bond count as drawn (no extra ring bonds)
u unsaturated atom (the atom has a double, triple or aromatic bond)
* Corresponds to both ISIS and Daylight behaviours, depending on the source of the Molecule object. For details, see the differences section below.
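The open-ended values s<n> and rb<n> work as follows. Below the threshold, the count must match exactly. The highest value (s6 or rb4) means "that many or more". The as-drawn forms (s*, rb*) compare the target count with the count drawn in the query. The class and method names in this sketch are illustrative only and are not JChem API.

```java
/** Hypothetical model of the s<n>, rb<n>, s* and rb* atom property semantics. */
final class CountProperties {
    private CountProperties() {}

    /**
     * Exact match below the open-ended threshold; at the threshold the query
     * means "threshold or more" (s6 = 6 or more, rb4 = 4 or more).
     */
    private static boolean matchCount(int queryValue, int targetCount, int openEndedFrom) {
        if (queryValue < 0 || queryValue > openEndedFrom) {
            throw new IllegalArgumentException("query value out of range: " + queryValue);
        }
        return queryValue == openEndedFrom ? targetCount >= openEndedFrom : targetCount == queryValue;
    }

    /** s<n>: number of non-H substituents of the target atom (s0-s5 exact, s6 = 6 or more). */
    static boolean substitution(int n, int targetNonHSubstituents) {
        return matchCount(n, targetNonHSubstituents, 6);
    }

    /** rb<n>: number of ring bonds next to the target atom (rb0-rb3 exact, rb4 = 4 or more). */
    static boolean ringBonds(int n, int targetRingBonds) {
        return matchCount(n, targetRingBonds, 4);
    }

    /** s* or rb*: no extra substituents (or ring bonds) beyond those drawn in the query. */
    static boolean asDrawn(int drawnInQuery, int targetCount) {
        return targetCount == drawnInQuery;
    }
}
```

For example, s2 rejects an atom with three non-H substituents, while s6 accepts an atom with nine.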
Table 5 Atom properties
[Image table: atom property queries (rows) matched against targets (columns)]
setOption(OPTION_CHARGE_MATCHING, CHARGE_MATCHING_DEFAULT / CHARGE_MATCHING_EXACT / CHARGE_MATCHING_IGNORE)
setOption(OPTION_ISOTOPE_MATCHING, ISOTOPE_MATCHING_DEFAULT / ISOTOPE_MATCHING_EXACT / ISOTOPE_MATCHING_IGNORE)
setOption(OPTION_RADICAL_MATCHING, RADICAL_MATCHING_DEFAULT / RADICAL_MATCHING_EXACT / RADICAL_MATCHING_IGNORE)
(See chemaxon.sss.SearchConstants.)
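The three charge matching settings can be sketched as a small decision function. This is a hypothetical model and not JChem code. The enum `ChargeMatching` only mirrors the constant names. The DEFAULT branch follows the general rule stated at the start of this document: a feature absent from the query is not checked. The EXACT behaviour is an assumption: an undrawn query charge is read as neutral and must equal the target charge. Isotope and radical matching could be modelled the same way.

```java
/** Hypothetical model of the three charge matching modes (not the JChem implementation). */
enum ChargeMatching {
    DEFAULT, EXACT, IGNORE;

    /**
     * @param queryCharge  charge drawn on the query atom, or null if none was drawn
     * @param targetCharge formal charge of the target atom
     */
    boolean matches(Integer queryCharge, int targetCharge) {
        switch (this) {
            case IGNORE:
                return true;  // charge is never checked
            case EXACT: {
                // Assumption: an undrawn query charge means neutral and must equal the target charge.
                int query = (queryCharge == null) ? 0 : queryCharge;
                return query == targetCharge;
            }
            default:
                // DEFAULT: a drawn query charge must be present on the target;
                // if no charge is drawn, the target charge is not checked.
                return queryCharge == null || queryCharge == targetCharge;
        }
    }
}
```

In this model, an uncharged query atom matches a charged target atom under DEFAULT but not under EXACT.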
The following tables show some examples.
[Image table: charge matching examples, queries (rows) matched against targets (columns), under each setting:
setOption(OPTION_CHARGE_MATCHING, CHARGE_MATCHING_DEFAULT) (Default)
setOption(OPTION_CHARGE_MATCHING, CHARGE_MATCHING_EXACT)
setOption(OPTION_CHARGE_MATCHING, CHARGE_MATCHING_IGNORE)]
[Image table: isotope matching examples, queries (rows) matched against targets (columns), under each setting:
setOption(OPTION_ISOTOPE_MATCHING, ISOTOPE_MATCHING_DEFAULT) (Default)
setOption(OPTION_ISOTOPE_MATCHING, ISOTOPE_MATCHING_EXACT)
setOption(OPTION_ISOTOPE_MATCHING, ISOTOPE_MATCHING_IGNORE)]
[Image table: radical matching examples, queries (rows) matched against targets (columns), under each setting:
setOption(OPTION_RADICAL_MATCHING, RADICAL_MATCHING_DEFAULT) (Default)
setOption(OPTION_RADICAL_MATCHING, RADICAL_MATCHING_EXACT)
setOption(OPTION_RADICAL_MATCHING, RADICAL_MATCHING_IGNORE)]
[Image table: query atoms and their possible meanings]
Marvin depicts SMARTS atoms as follows:
The following logical operators can also be used in SMARTS atom expressions:
| Operator | Name |
| ! | not (unary operator) |
| & | high precedence and (default operator, i.e. can be omitted between two query expressions) |
| , | or |
| ; | low precedence and |
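The following toy recursive-descent evaluator makes the precedence rules concrete. It handles SMARTS-style logical expressions and is a hypothetical sketch, not JChem's SMARTS parser. Each primitive (such as O, X2 or -) must be separated from the next by an explicit operator. Splitting implicit conjunctions such as OX2 into primitives would require real SMARTS tokenization. A primitive is true or false according to a caller-supplied set.

```java
import java.util.Set;

/**
 * Toy evaluator for SMARTS-style logical expressions over named primitives.
 * Precedence, highest to lowest: "!" (not), "&" (high-precedence and), "," (or), ";" (low-precedence and).
 */
final class SmartsLogic {
    private final String expr;
    private final Set<String> truePrimitives;
    private int pos = 0;

    private SmartsLogic(String expr, Set<String> truePrimitives) {
        this.expr = expr;
        this.truePrimitives = truePrimitives;
    }

    /** Evaluates expr; a primitive is true if it is contained in truePrimitives. */
    static boolean evaluate(String expr, Set<String> truePrimitives) {
        SmartsLogic parser = new SmartsLogic(expr, truePrimitives);
        boolean result = parser.lowAnd();
        if (parser.pos != expr.length()) {
            throw new IllegalArgumentException("unexpected character at " + parser.pos);
        }
        return result;
    }

    // lowAnd := or (';' or)*   (non-short-circuit & so every operand is still parsed)
    private boolean lowAnd() {
        boolean value = or();
        while (peek(';')) { pos++; value = or() & value; }
        return value;
    }

    // or := highAnd (',' highAnd)*
    private boolean or() {
        boolean value = highAnd();
        while (peek(',')) { pos++; value = highAnd() | value; }
        return value;
    }

    // highAnd := unary ('&' unary)*
    private boolean highAnd() {
        boolean value = unary();
        while (peek('&')) { pos++; value = unary() & value; }
        return value;
    }

    // unary := '!' unary | primitive
    private boolean unary() {
        if (peek('!')) { pos++; return !unary(); }
        int start = pos;
        while (pos < expr.length() && "!&,;".indexOf(expr.charAt(pos)) < 0) pos++;
        if (start == pos) throw new IllegalArgumentException("missing primitive at " + pos);
        return truePrimitives.contains(expr.substring(start, pos));
    }

    private boolean peek(char c) {
        return pos < expr.length() && expr.charAt(pos) == c;
    }
}
```

Under these rules, "a,b;c" reads as (a or b) and c, while "a,b&c" reads as a or (b and c). The bracket contents of the example query [O&X2&H,O&X1&-] evaluate to true for a hydroxyl oxygen {O, X2, H} and for a charged oxygen {O, X1, -}. They evaluate to false for an ether oxygen {O, X2, H0}.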
Examples:
| Query / Target | NCC(O)=O | [N+]CC([O-])=O | [H]OC(=O)N([H])C | COC |
|---|---|---|---|---|
| [OX2H,OX1-] | [image] | [image] | [image] | [image] |
| [O&X2&H,O&X1&-] | [image] | [image] | [image] | [image] |
| [NX3;H2,H1] | [image] | [image] | [image] | [image] |
| [OX2!-] | [image] | [image] | [image] | [image] |
For example:
| SMARTS | Meaning |
| [OX2$(OaaN)] | Aliphatic oxygen with two connections, next to an aromatic ring having an aliphatic N in ortho position. |
| [OX2$(*aaN)] | Same as above. |
| [$([OX2]aaN)] | Same as above. |
| [NX3;H2,H1;!$(NC=O)] | Primary or secondary amine, not amide. |
| [$(N~*~*~[O!$(O([C,c])[C,c])])] | Aliphatic N three bonds away from a non-ether aliphatic O. |
Examples:
| Query | Target 1 | Target 2 | Target 3 |
|---|---|---|---|
| [OX2$(OaaN)] | [image] | [image] | [image] |
| [$(OCC),$(OCN)] | [image] | [image] | [image] |
| [$(O([C,c])[C,c])] | [image] | [image] | [image] |
| [$(N~*~*~[O!$(O([C,c])[C,c])])] | [image] | [image] | [image] |
Note that uppercase atom symbols match only aliphatic atoms, and lowercase symbols match only aromatic atoms.
In JChem the SMARTS primitive "*" (any atom) does not match plain hydrogens, either explicit or implicit. However, it does match deuterium and charged H. See below.
Further SMARTS examples can be found on Daylight's page.
Pseudo atoms have user-defined atom types, and they only match another pseudo atom of the same name (case insensitive). Commonly used pseudo atoms include "Resin" and "Pol".
Examples:
[Image table: pseudo atom queries (rows) matched against targets (columns)]
No chemical meaning is associated with pseudo atoms. If a common abbreviation is used as a pseudo atom, it will not match the corresponding molecular group; to achieve that, use proper abbreviations (Superatom S-groups).
JChem search can handle query and target atoms that have lone pairs. Lone pairs on the query side match both explicit and implied lone pairs. Note, however, that lone pairs are considered only when attached to an atom, i.e. isolated lone pairs do not match anything.
Examples:
[Image table: lone pair queries (rows) matched against targets (columns)]
Generic query bond types:
any
single or double
single or aromatic
double or aromatic
Table 6 Generic bond types
[Image table: generic bond type queries (rows) matched against a target]
[Image table: further bond queries (rows) matched against targets (columns)]
Marvin depicts SMARTS bonds as follows:
As with SMARTS atoms, the SMARTS logical operators "!" (not), "&" and ";" (high- and low-precedence and), and "," (or) can be used. "&" is the default operator, so "and" is assumed when there is no operator between two SMARTS primitives. In addition, the following characters have valid meanings:
| Bond expression | Meaning |
| - | Single bond |
| = | double bond |
| # | triple bond |
| : | aromatic bond |
| @ | any ring bond |
| / | directional bond: single "up" (used at cis/trans) |
| \ | directional bond: single "down" (used at cis/trans) |
Examples:
| SMARTS | Meaning |
| C-,=,#C | Two aliphatic carbons connected by single, double or triple bond. |
| *-!@* | Two atoms connected by a nonring single bond. |
| *@-,!@&/*=*@-,!@&/* | Double bond between two single bonds that are either ring bonds, or non-ring bonds in trans configuration. |
Matching examples:
| Query | Target 1 | Target 2 | Target 3 |
|---|---|---|---|
| C-,=,#C | [image] | [image] | [image] |
| *-!@* | [image] | [image] | [image] |
| *@-,!@&/*=*@-,!@&/* | [image] | [image] | [image] |
Further SMARTS examples can be found on Daylight's page.
[Image table: query and target structures]
Individual and multicenter representations can therefore both be used during searching, in any combination. See the examples below. (The thin dotted bonds represent ANY query bonds.)
[Image table: individual and multicenter representations, queries (rows) matched against targets (columns)]
| SMARTS representation | Meaning |
| C.C | No restrictions. |
| (C.C) | The two carbons must appear in the same component. |
| (C).(C) | The two carbons must appear in different components. |
Examples:
| Query | Target 1 | Target 2 | Target 3 |
|---|---|---|---|
| C.C | [image] | [image] | [image] |
| (C.C) | [image] | [image] | [image] |
| (C).(C) | [image] | [image] | [image] |
| (C).(C).C | [image] | [image] | [image] |
Ordered mixtures (FOR type S-groups), on the other hand, contain ordered components that define the order of addition. Example:
Component brackets that are not enclosed in mix or for brackets are treated as if they were in mix (unordered mixture) brackets, and molecules that are not drawn in any component bracket are treated as belonging to the same component.
Examples:
[Image tables: component and mixture queries (rows) matched against targets (columns)]
Exact fragment matching ensures that each query component (a set of atoms connected by bonds) matches only a full target component. See its description in the Search types section.
Table 8 Explicit hydrogens
[Image table: queries with explicit hydrogens (rows) matched against targets (columns)]
mol(), target(): both refer to the search target molecule
query(): refers to the search query molecule
m(int i): refers to the query atom index with atom map i
hit(), h(): both refer to the search hit array
hit(int i), h(int i): both refer to the i-th element of the search hit array, i.e. the target atom index matching the query atom with atom index i
hm(int i): refers to the target atom index matching the query atom with
atom map i (shorthand for h(m(i)))
The default input molecule is the target molecule (e.g. mass() is the same as
mass(target()), both refer to the molecule mass of the target molecule).
The filtering expression can be set by
setFilter(filteringExpression)
setFilter(filteringExpression, config)
evaluator.xml.
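The relationship between the hit array, atom maps and hm() described above can be sketched with plain arrays. `HitIndexing` is a hypothetical class that stands in for JChem's own objects. As assumptions of this sketch, indices are 0-based and atom map 0 means "unmapped".

```java
/** Hypothetical illustration of h(i), m(i) and hm(i) using plain arrays (not JChem code). */
final class HitIndexing {
    private final int[] hit;            // hit[i] = target atom index matched by query atom i
    private final int[] queryAtomMaps;  // queryAtomMaps[i] = atom map of query atom i (0 = unmapped)

    HitIndexing(int[] hit, int[] queryAtomMaps) {
        if (hit.length != queryAtomMaps.length) {
            throw new IllegalArgumentException("hit array and query atom count differ");
        }
        this.hit = hit.clone();
        this.queryAtomMaps = queryAtomMaps.clone();
    }

    /** h(i): target atom index matching query atom i. */
    int h(int queryAtomIndex) {
        return hit[queryAtomIndex];
    }

    /** m(i): index of the query atom carrying atom map i. */
    int m(int atomMap) {
        for (int i = 0; i < queryAtomMaps.length; i++) {
            if (queryAtomMaps[i] == atomMap) {
                return i;
            }
        }
        throw new IllegalArgumentException("no query atom with atom map " + atomMap);
    }

    /** hm(i): shorthand for h(m(i)). */
    int hm(int atomMap) {
        return h(m(atomMap));
    }
}
```

Consider a three-atom query whose second atom carries map 1, with the hit array {4, 7, 9}. Then hm(1) = h(m(1)) = h(1) = 7, and 7 is the target atom index that an expression such as pka(hm(1)) would refer to.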
The following table shows some examples (pKa values are shown at target atoms).
[Image table: queries (rows) matched against targets (columns) under two filters:
setFilter("pka(hm(1)) > 2")
setFilter("pka('acidic', hm(1)) > 2 && mass() > 100")]
A set of working examples is also available.
When the query does not contain stereo information, the hits will include results both with and without stereo information. Otherwise, the stereo information is taken into account during the search.
Search options may modify the above behaviour. For example, when the stereo search option is switched off, all stereo information is ignored. And when the exact stereo option is switched on, all stereo information is tested for equality, meaning that a non-stereo query only matches non-stereo targets.
up (wedge bond): the ligand at the wide end is above the atom at the narrow end
down (hatch bond): the atom at the wide end is below the atom at the narrow end
up or down (wiggly bond): specifies tetrahedral chirality information, but the actual stereo configuration is irrelevant; or the cis/trans configuration of the double bond is irrelevant (see below)
The table below depicts a few examples of tetrahedral stereo matching, assuming absolute stereochemistry (see next section for further details):
[Image table: tetrahedral stereo queries (rows) matched against targets (columns)]
Stereogenic centers belonging to ABS represent absolute stereochemistry, i.e. chirality. (All unlabeled stereo centers are also considered to belong to the ABS group by default. Unlabeled stereo centers may be interpreted as an independent AND group only if (1) the chiral flag is not set AND (2) the absolute stereo search options (Query/TargetAbsoluteStereo, AbsoluteStereo) are set to false. See the following sections for further explanation.)
Stereogenic centers belonging to an ORn group (e.g. OR1) represent one stereoisomer that is either the structure as drawn (R, S) OR the epimer in which the stereogenic centers have the opposite configuration (S, R).
Stereogenic centers belonging to an ANDn group (e.g. AND1) represent a mixture of two enantiomers: the structure as drawn AND the epimer in which the stereogenic centers have the opposite configuration (e.g. a racemic mixture).
For example,
| molecule | interpretation |
|---|---|
| [image] | A pure sample of one stereoisomer: [image] |
| [image] | A pure sample of one of these enantiomers: [image] or [image] |
| [image] | A pure sample of one of these enantiomers: [image] or [image] |
| [image] | A sample that is a mixture of the two enantiomers: [image] and [image] |
| [image] | A pure sample of one of these diastereomers: [image] or [image] or [image] or [image] |
For example:
[Image table: stereo queries (rows) matched against targets (columns); one query and one target have no stereo information]
For down wedge query bonds:
[Image table: down wedge queries (rows) matched against targets (columns); one target has no stereo information]
For AND and OR groups, the relative configuration within the group must match: either all centers match as drawn, or all match with the opposite configuration. There is no restriction when the chiral centers belong to different groups (bottom row).
[Image tables: AND and OR group queries (rows) matched against targets (columns)]
However, when the absolute stereo options (Query/TargetAbsoluteStereo, AbsoluteStereo) are set to false, the Chiral flags in MDL molfiles and SDfiles are taken into account. In this case, molecules lacking the chiral flag are treated as if their unlabeled stereogenic centers were in an AND group, thus expressing relative stereo configuration:
[Image table: queries (rows) matched against targets (columns), with and without the Chiral flag set]
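The AND/OR group rule described earlier can be expressed as a small check. Within a group, the relative configuration must match. Separate groups are independent. `StereoGroups` is hypothetical and encodes each stereocenter configuration as +1 or -1. It does not model ABS centers.

```java
/** Hypothetical model of the AND/OR group matching rule (not JChem code). */
final class StereoGroups {
    private StereoGroups() {}

    /**
     * Returns true if every center of the group matches as drawn, or every center
     * matches with the opposite configuration. Configurations are encoded as +1 or -1.
     */
    static boolean groupMatches(int[] query, int[] target) {
        if (query.length != target.length) {
            throw new IllegalArgumentException("groups must have the same number of centers");
        }
        boolean allAsDrawn = true;
        boolean allInverted = true;
        for (int i = 0; i < query.length; i++) {
            if (Math.abs(query[i]) != 1 || Math.abs(target[i]) != 1) {
                throw new IllegalArgumentException("configurations must be +1 or -1");
            }
            if (query[i] == target[i]) {
                allInverted = false;  // this center matches as drawn
            } else {
                allAsDrawn = false;   // this center matches inverted
            }
        }
        return allAsDrawn || allInverted;
    }
}
```

Two centers in different groups are checked with separate calls. For example, one group may match as drawn while another matches inverted, which corresponds to the "no restriction" case above.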
A search option, setDoubleBondStereoMatchingMode(), controls the behaviour regarding double bond cis/trans isomerism. It can be set to three different search states:
With DBS_MARKED, a small box should be placed on the query double bond to set its stereo search flag, so that the double bond is considered as stereo during the search. The corresponding double bond in the target structure must then have the same stereo configuration as drawn in the query (Table 7).
cis (the two atoms are on the same side of the double bond)
trans (the two atoms are on opposite sides of the double bond)
cis or trans (stereo bond with either cis or trans configuration)
cis or trans (stereo bond with either cis or trans configuration)
not trans
not cis
Examples (DBS_MARKED):
[Image table: double bond stereo queries (rows) matched against a target]
If you are interested in searching combinatorial Markush library targets (tables) described by R-group notation, see the following section.
">0" for R1 specifies that the target molecule must contain at least one of the R1 substituents listed in the R1 group definition on the corresponding atom. This is the default occurrence value. "0,2-5,>6" means that the specified R-group may occur zero times, two to five times, or more than six times.
"If R1 then R2" means that if the conditions for R1 are satisfied, then the conditions for R2 must also be satisfied. If the conditions for R1 are not satisfied, the conditions for R2 are ignored. This If/Then condition implies that a molecule may be retrieved even though R1 is not satisfied.
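The occurrence syntax and the If/Then rule above can be sketched as follows. `RGroupOccurrence` is a hypothetical helper and not part of JChem. It accepts only the three forms shown in the text: an exact count, an inclusive range a-b, and >n for "more than n". It treats If/Then as a plain logical implication.

```java
/** Hypothetical parser for R-group occurrence strings such as ">0" or "0,2-5,>6". */
final class RGroupOccurrence {
    private RGroupOccurrence() {}

    /** Returns true if an R-group occurring {@code count} times satisfies the occurrence string. */
    static boolean allows(String occurrence, int count) {
        for (String raw : occurrence.split(",")) {
            String part = raw.trim();
            if (part.startsWith(">")) {
                // ">n": more than n occurrences
                if (count > Integer.parseInt(part.substring(1).trim())) return true;
            } else if (part.contains("-")) {
                // "a-b": inclusive range
                String[] bounds = part.split("-", 2);
                int low = Integer.parseInt(bounds[0].trim());
                int high = Integer.parseInt(bounds[1].trim());
                if (count >= low && count <= high) return true;
            } else if (count == Integer.parseInt(part)) {
                // "n": exactly n occurrences
                return true;
            }
        }
        return false;
    }

    /** "If R1 then R2": R2's conditions are required only when R1's conditions are satisfied. */
    static boolean ifThen(boolean r1Satisfied, boolean r2Satisfied) {
        return !r1Satisfied || r2Satisfied;
    }
}
```

With "0,2-5,>6", counts 0, 2 to 5, and 7 or more are accepted, while 1 and 6 are rejected. Modelling If/Then as an implication reproduces the note above: when R1 is not satisfied, the condition holds regardless of R2.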
Table 9 R-group query structures
[Image table: two R-group queries matched against a target. Conditions for the first query: default; R1>1; R2>1; RestH on R1. Conditions for the second query: default; if R1 then R3; if R2 then R3.]
JChem allows searching in combinatorial libraries described as Markush structures, without the need to explicitly enumerate all molecules of the Markush library. The searching can handle the same generic features as the Markush Enumeration Plugin.
| Name | Description | Example | Example Markush library member |
|---|---|---|---|
| R-groups | R-groups (also referred to as "substituent variation") are the most widely known Markush generic features. The variable part of th |