Standardizer brings molecules to a standardized form by applying standardization actions to them. Simple actions include converting explicit H atoms to implicit form or vice versa, aromatization or dearomatization, keeping only one fragment of a salt molecule, clearing stereo data, setting or removing the absolute stereo (chiral) flag, and recalculating atom coordinates (clean). The other action type is defined by a reaction equation given in a reaction molecule file or as a SMARTS string. These reactions specify functional group transformations (e.g. transforming nitro groups) and are applied to all matching functional groups in the input molecules.
Standardizer actions are specified in a configuration XML. Actions can also be defined in a simple action string, provided that the simple actions mostly use their default options and the reaction actions can be given in SMARTS form.
The standardization transformations performed by the Standardizer are
determined by the configuration (specified after the mandatory --config
command line parameter). Transformations can be given in a configuration
XML file or in a simple action string.
The configuration file is an XML file.
The transformations are given in subelements of the
<Actions> subsection, identified with an
ID attribute and processed in the order they are given
in the configuration XML:
<Aromatize> subsections:
performs aromatization, can also be referenced from the
simple action string.
In most cases this task is essential and is best placed first; see the notes on task ordering and aromatization.
This section can optionally have a Type attribute which
enables aromatization in ChemAxon style (basic) if the attribute value is set to "basic",
"chemaxon" or "1".
The corresponding action string is "aromatize:b". The default aromatization is performed in
Daylight (general) style. Refer to the
aromaticity documentation
for a detailed description of the two methods.
Note that aromatization does not change existing aromatic bonds, so to rearomatize an incorrectly aromatized molecule, place a Dearomatize action before Aromatize.
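As a minimal sketch of the rearomatization advice above, the fragment below places a Dearomatize action before a basic-style Aromatize action. The ID values are arbitrary labels chosen for illustration; the element and attribute names are those described in this section.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<StandardizerConfiguration Version="0.1">
    <Actions>
        <!-- Remove existing (possibly incorrect) aromatic bonds first -->
        <Dearomatize ID="dearomatize"/>
        <!-- Then aromatize in ChemAxon (basic) style;
             omit Type to get the default Daylight (general) style -->
        <Aromatize ID="rearomatize" Type="basic"/>
    </Actions>
</StandardizerConfiguration>
```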
<Dearomatize>
subsections:
performs dearomatization, can also be referenced from the
simple action string.
<AddExplicitH> (former <Hydrogenize>)
subsections:
transforms implicit H atoms to explicit hydrogens,
can also be referenced from the simple action string.
<RemoveExplicitH> (former
<ImplH> or <Dehydrogenize>) subsections:
transforms explicit H atoms to implicit hydrogens,
these elements enable the user to fine tune explicit hydrogen removal.
Specific attributes specify which hydrogen atoms should be removed.
By default, only bound, non-isotope, neutral, non-radical, non-mapped
hydrogen atoms are removed. The following attributes can be set to "true"
to remove specific hydrogen types:
Lonely
Isotope
Charged
Radical
Mapped
Wedged
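The attributes listed above can be combined on a single element. The following sketch removes lonely and isotope hydrogens in addition to the hydrogens removed by default; the ID value is an arbitrary illustrative label.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<StandardizerConfiguration Version="0.1">
    <Actions>
        <!-- Default removal plus lonely and isotope hydrogens;
             attributes left out keep their default behaviour -->
        <RemoveExplicitH ID="removeH" Lonely="true" Isotope="true"/>
    </Actions>
</StandardizerConfiguration>
```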
<ClearIsotopes>
subsections:
converts isotopes to non-isotopic form. Can also be referenced from the
simple action string as "clearisotopes".
<Neutralize>
subsections:
neutralizes molecules (converts charged atoms to uncharged ones, provided this does not cause valence errors). Can also be referenced from the
simple action string as "neutralize".
<Transformation> subsections:
these elements specify the reaction based functional group transformations
and removals. The reaction is specified in the
Structure attribute either as a
SMILES string or as a molecule file path. An optional Type
attribute can be added to specify whether the structure is given as a
string (Type="string") or as a file path (Type="path").
If the Type attribute is omitted then the structure type is
automatically decided based on its format which gives the correct result
in most cases.
For a description of reaction mapping, see the Reaction mapping section of the Reactor Manual.
Removal of a functional group is given by a reaction with empty product side.
An additional optional Exact attribute can be set to
"true" to specify exact matching on molecule fragments. This means that only
isolated molecule fragments exactly matching the given functional group are transformed.
A typical application of this feature together with functional group removal is
counter ion removal: the "[Cl-]>>" transformation with the
Exact attribute set to "true" removes the "Cl-" counter ions.
Another application is solvent removal.
In case of functional group removal in exact mode it may occur that the molecule contains
only fragments that would be removed by the transformation and in this way the molecule would
become empty. This is prevented by an additional rule: if all fragments would be removed in a
single exact transformation then one of the fragments is kept. This is useful for example
when we want to remove benzene as a solvent but want to keep benzene itself when it is the
only component.
The default value of the Exact attribute is "false".
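The attributes described above can be sketched as follows. The first action is the counter ion removal mentioned in the text, with the structure type stated explicitly; the second reads its reaction from a molecule file. The ID values are illustrative labels, and the file path is simply the one used in the example configuration of this document, so it is assumed to exist.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<StandardizerConfiguration Version="0.1">
    <Actions>
        <!-- Exact mode: only isolated Cl- fragments are removed
             (empty product side means removal) -->
        <Transformation ID="removeChloride" Structure="[Cl-]>>"
                        Type="string" Exact="true"/>
        <!-- Reaction read from a molecule file; Exact defaults to "false" -->
        <Transformation ID="plusMinusDouble"
                        Structure="molfiles/PlusMinusDouble.mol" Type="path"/>
    </Actions>
</StandardizerConfiguration>
```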
<Reaction> subsections:
same as the <Transformation>
subsections above, used mainly in earlier configurations and kept for backward
compatibility.
<Removal> subsections:
these elements specify fragment removal actions.
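The method to be applied is given in the Method
attribute. There are five such methods:
keep the largest fragment (Method="keepLargest")
remove the smallest fragment (Method="removeSmallest")
keep the smallest fragment (Method="keepSmallest")
remove the largest fragment (Method="removeLargest")
remove R-group definitions (Method="rgroups")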
The default method is "keepLargest".
The measure that determines the fragment size is specified in the
Measure attribute (not relevant
in case of R-group definition removal):
number of atoms (Measure="atomCount")
molecular mass (Measure="molMass")
The default measure is "atomCount".
Note:
Method="keepLargest" Measure="atomCount")
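A short sketch of two fragment removal actions built from the attribute values above. The ID values are illustrative labels only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<StandardizerConfiguration Version="0.1">
    <Actions>
        <!-- Keep the fragment with the largest molecular mass -->
        <Removal ID="keepHeaviest" Method="keepLargest" Measure="molMass"/>
        <!-- Remove R-group definitions; Measure is not relevant here -->
        <Removal ID="dropRgroups" Method="rgroups"/>
    </Actions>
</StandardizerConfiguration>
```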
<Sgroups> subsections:
these elements enable the user to convert S-groups.
The value of the Act attribute specifies the conversion:
<ClearStereo>
subsections:
these elements enable the user to clear stereo data. By default, both
chirality information and double bond stereo data are cleared.
The Type attribute can specify only one of these stereo types:
Note:
<AbsoluteStereo>
subsections:
these elements enable the user to set/clear the absolute stereo flag (used in MDL molfiles).
The Act attribute specifies the action:
Note:
<ConvertDoubleBonds>
subsections:
these elements enable the user to convert the representation of double bonds with unspecified CIS/TRANS stereo
information to wiggly or crossed type.
The Type attribute can specify only one of these representation types:
If input molecules have no 2D coordinates, then also a 2D clean is performed before converting the double bond representation.
Note:
<WedgeClean>
subsections:
rearrange the stereo wedges according to the IUPAC recommendations. Can also be referenced from the
simple action string as "wedgeclean".
<ConvertWedgeInterpretation>
subsections:
convert each wedge between two stereo centers into two wedges. Can also be referenced from the
simple action string as "convertwedgeinterpretation".
<Expand>
subsections: expand stoichiometry data. This means that copies of molecule fragments
are added such that the multiplicities of the fragments would reflect the
stoichiometry data. Since these multiplicities should be integral, a calculation is made
to find the least possible integers with the required ratios.
Finally the stoichiometry data is removed, since it is represented
by the structure itself after this expansion.
The Data attribute specifies the
attached data field name that stores the stoichiometry data (default: "Stoichiometry").
The stoichiometry data of an atom refers to the connected fragment containing the atom.
If several atoms in a fragment have different stoichiometry data, the behaviour is
undefined. The stoichiometry data can be specified in any of the following forms:
decimal numbers (e.g. 0.5, 0.667)
fractions (e.g. 1/2, 2/3)
integers (e.g. 2, 3)
1.
The fragment coefficients are corresponding integer values that preserve the
stoichiometry ratio. The expansion will contain each fragment with these multiplicities.
For example, this structure:
![]() |
![]() |
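To make the arithmetic concrete, here is a small worked case based on our reading of the description above: if one fragment carries the stoichiometry value 1/2 and another carries 1, the ratio is 1:2, so the smallest integer multiplicities are 1 and 2, and the expanded structure contains one copy of the first fragment and two copies of the second. Likewise, values 2/3 and 1 give multiplicities 2 and 3. A minimal configuration sketch that reads the data from the default field name follows; the ID value is an illustrative label.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<StandardizerConfiguration Version="0.1">
    <Actions>
        <!-- Data names the attached data field holding the stoichiometry;
             "Stoichiometry" is the documented default, written out for clarity -->
        <Expand ID="expandStoichiometry" Data="Stoichiometry"/>
    </Actions>
</StandardizerConfiguration>
```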
<Tautomerize>
subsections: takes the canonical tautomer of the molecule. Can also be referenced from the
simple action string as "tautomerize".
Note:
<Mesomerize>
subsections: takes the canonical resonant form of the molecule. Can also be referenced from the
simple action string as "mesomerize".
Note:
<MapReaction>
subsections: maps reactions by identifying and assigning numbers to the corresponding atoms
on the two sides of the reaction arrow. Can also be referenced from the
simple action string as "mapreaction".
<Unmap>
subsections: removes all map numbers from the atoms. Can also be referenced from the
simple action string as "unmap".
<Clean> subsections:
these elements perform automatic atom coordinate calculation. The Dim
attribute specifies the molecule dimension (2 or 3),
the default is the molecule dimension in case of full clean, or 2
if the molecule dimension is 0. For partial and template based clean
the dimension is always 2. The Type attribute
specifies the clean type:
TemplateFile attribute (available in 2D only)
Currently partial and template based clean are available in 2D only. If the template molecules are not in 2D, they are cleaned in 2D upon startup.
Template based clean is performed in the following way: templates are searched in the target molecule in the order they are given in the template file. The first match is processed: template atom coordinates are copied to the corresponding target atoms and the remaining atoms are cleaned with partial clean. See some difficult-to-clean structures processed with template based cleaning below.
| skeleton | compound to clean | template | cleaned result |
| crown ether | ![]() | ![]() | ![]() |
| porphyrin | ![]() | ![]() | ![]() |
| bicycle | ![]() | ![]() | ![]() |
Notes:
Dim attribute is different from the
original molecule dimension (e.g. in case of SMILES input with cleaning in 2D).
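The two Clean variants below are a hedged sketch. The template based form is taken from the example configuration in this document, with templates.mrv assumed to be an existing template file. The first form assumes that omitting the Type attribute selects full clean, which the text above implies but does not state outright. The ID values are illustrative labels.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<StandardizerConfiguration Version="0.1">
    <Actions>
        <!-- Alternatives shown together; a real configuration would
             normally contain only one Clean action -->
        <!-- Coordinate calculation in 2D (Type omitted: assumed full clean) -->
        <Clean ID="clean2d" Dim="2"/>
        <!-- Template based clean, always performed in 2D -->
        <Clean ID="cleanTemplate" Type="TemplateBased" TemplateFile="templates.mrv"/>
    </Actions>
</StandardizerConfiguration>
```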
<Action> subsections:
specific transformation actions that do not have a corresponding transformation reaction
are given in the Act attribute with a predefined keyword referring to the action
to be performed. Note that this configuration section is deprecated and maintained only
for backward compatibility. These actions are replaced by their corresponding specific
subsections, listed below.
The available actions:
Aromatize),
Dearomatize),
AddExplicitH),
RemoveExplicitH).
These actions can also be referenced from the simple action string.
"aromatize" can optionally have a Type attribute which
enables basic (ChemAxon) aromatization if the attribute value is set to "basic",
"chemaxon" or "1".
The corresponding action string is "aromatize:b". The default aromatization is performed in
Daylight (general) style. Refer to the
aromaticity documentation
for a detailed description of the two methods.
If Standardizer is run with the
--active-groups
parameter specified (API: setActiveGroups(String[] groups) or
setActiveGroup(String group)) then only those tasks are processed which:
Groups
attribute in the configuration XML or between curly braces
in the simple action string. More groups can be specified as a
comma-separated list.
Grouping tasks can be useful in case of query and target standardization before
substructure search: some of the tasks may be required for target standardization only
(e.g. removing explicit hydrogens). Add these tasks to group "target" in the configuration
and then run Standardizer with active groups "query" when you standardize the query structure.
In this way tasks belonging to the "target" group will be skipped. There is
an example for using groups
among the
Standardizer working examples.
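The query and target scenario above could be configured along the following lines. This is a sketch only: the group names "query" and "target" follow the text, the ID values are illustrative, and the aromatization task is placed in both groups explicitly so that the sketch does not depend on how tasks without a Groups attribute are treated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<StandardizerConfiguration Version="0.1">
    <Actions>
        <!-- Needed for both queries and targets: member of both groups -->
        <Aromatize ID="aromatize" Groups="query,target"/>
        <!-- Target-only task: skipped when the active group is "query" -->
        <RemoveExplicitH ID="removeH" Groups="target"/>
    </Actions>
</StandardizerConfiguration>
```

With this configuration, running Standardizer with the active group "query" processes only the aromatization, while the active group "target" processes both tasks.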
Example
<?xml version="1.0" encoding="UTF-8"?>
<!-- Standardizer configuration file -->
<StandardizerConfiguration Version ="0.1">
<Actions>
<Action ID="aromatize" Act="aromatize"/>
<Transformation ID="PlusMinus" Structure="[*+:1][*-:2]>>[*:1]=[*:2]"/>
<Transformation ID="PlusMinusDouble" Structure="molfiles/PlusMinusDouble.mol"/>
<Transformation ID="Enamine" Structure="[H]N[C:1]=[C:2]>>[H][C:2][C:1]=N"/>
<Transformation ID="Enol" Structure="[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]"/>
<Transformation ID="ClMinus" Structure="[Cl-]>>" Exact="true" Groups="target,g1"/>
<RemoveExplicitH ID="removeH" Charged="true" Radical="true" Mapped="true"/>
<Removal ID="keepOne" Method="keepLargest" Measure="molMass"/>
<Aromatize ID="chemaxonaromatize" Type="basic"/>
<AddExplicitH ID="addH"/>
<Sgroups ID="ungroup" Act="Ungroup"/>
<ClearStereo ID="clearstereo" Type="Chirality"/>
<AbsoluteStereo ID="setstereo" Act="Set"/>
<Expand ID="stoichiometry" Data="COEFF"/>
<Dearomatize ID="dearomatize"/>
<Neutralize ID="neutralize"/>
<ClearIsotopes ID="clearisotopes"/>
<Clean Type="TemplateBased" TemplateFile="templates.mrv" ID="clean"/>
<Tautomerize ID="tautomer"/>
<Mesomerize ID="mesomer"/>
</Actions>
</StandardizerConfiguration>
The molecule files referenced in the above configuration XML contain the molecular structures displayed below. Map indices must be unique among the atoms of a given molecule. For a detailed description of reaction definitions, see the Reactor manual.
![]() |
![]() |
![]() |
![]() |
![]() |
Simple actions can be listed in an action string, replacing the configuration XML. Actions should be separated by either ".." or newline separators. The actions can also be specified in a file, with each action on a separate line. Action strings are case-insensitive. The available simple actions are listed below:
Each action can be prefixed with a comma-separated list of group names between curly braces,
in which case the task belongs to the listed groups. The task is then skipped
if the active group list is specified (command line: --active-groups, API: setActiveGroups(String[] groups) or
setActiveGroup(String group)) and none of the task's groups is in the active group set. The group names are case-insensitive.
This group setting corresponds to the
Groups attribute in the XML configuration. There is
an example for using groups
among the
Standardizer working examples.
Example:
First extract the largest fragment, then aromatize, then standardize nitro groups, and finally standardize enamine groups.
keepone..aromatize:b..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]
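For comparison, the action string above might be written as a configuration XML roughly as follows. This is a sketch under one stated assumption: that "keepone" corresponds to the default fragment removal (Method="keepLargest" with Measure="atomCount"). The mapping of "aromatize:b" to Type="basic" is taken from the Aromatize description; the ID values are illustrative labels.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<StandardizerConfiguration Version="0.1">
    <Actions>
        <!-- Assumed equivalent of "keepone": keep the largest fragment -->
        <Removal ID="keepone" Method="keepLargest" Measure="atomCount"/>
        <!-- "aromatize:b": basic (ChemAxon) style aromatization -->
        <Aromatize ID="aromatize" Type="basic"/>
        <!-- Nitro group standardization -->
        <Transformation ID="nitro" Structure="[O-:2][N+:1]=O>>[O:2]=[N:1]=O"/>
        <!-- Enamine standardization -->
        <Transformation ID="enamine"
                        Structure="[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]"/>
    </Actions>
</StandardizerConfiguration>
```

Actions are processed in the order given, matching the left-to-right order of the action string.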