Fragmenter cleaves single bonds to generate molecular fragments. The cleavage rules correspond to chemical reactions in order to enhance synthetic accessibility. The cleavage points on the fragments are labeled with the cleavage rules.
Fragmenter fragments molecules based on predefined cleavage rules. The cleavage rules are given in the form of reaction molecules in the configuration XML.
By default, all non-ring bonds matching the cleavage bonds in the rules are cleaved. However, it is possible to provide a revision algorithm that forbids certain cuts depending on predefined criteria (e.g. the resulting fragment size, the structural environment of the bond, the number of cleaved bonds in the resulting fragments, etc.). Currently one such algorithm is implemented: the RECAP method.
The RECAP algorithm applies the following cleavage revision rules:
Notlist.
The cut-bond reactions, the forbidden fragment list (notlist), the maximum number of open bonds per fragment and the minimum number of atoms per fragment are specified in the configuration XML.
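The two numeric limits can be pictured as a simple filter over candidate fragments. The following Python sketch is only an illustration, not the RECAP implementation shipped with Fragmenter: the `Fragment` record and the `passes_limits` helper are hypothetical names, and the sketch assumes that both limits are inclusive bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical summary of a candidate fragment."""
    atom_count: int       # number of atoms in the fragment
    open_bond_count: int  # number of cleaved (open) bonds on the fragment


def passes_limits(fragment: Fragment, max_cut_count: int, min_atom_count: int) -> bool:
    """Return True if the fragment respects both limits (assumed inclusive)."""
    return (fragment.open_bond_count <= max_cut_count
            and fragment.atom_count >= min_atom_count)


# Three made-up candidates: too small, acceptable, too many open bonds.
candidates = [Fragment(1, 1), Fragment(6, 2), Fragment(12, 5)]
kept = [f for f in candidates if passes_limits(f, max_cut_count=4, min_atom_count=4)]
print(kept)
```

In Fragmenter itself a rejected fragment means that the corresponding cut is forbidden; the sketch only shows the acceptance test.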
The following cleavage data is stored in SDF tags (molecule properties) for each fragment, if specified in the configuration:
Example
Apply the RECAP cleavage revision algorithm for the ether and amine cleavage reactions.
Note that applying these rules usually does not produce
single oxygen or nitrogen atom fragments: setting the
MinAtomCount RECAP parameter
to a value greater than 1 prevents Fragmenter from
creating single-atom fragments.
Take the input molecule stored in input.mol.
If bond cleavage between ring carbons and hetero atoms (see revision rule 2 above) is forbidden (FragmenterRecap1.xml), the fragments are generated with:
fragment -c FragmenterRecap1.xml input.mol -y 1 -f sdf:-a
One more cleavage is performed if bond cleavage between ring carbons and hetero atoms (see revision rule 2 above) is allowed (FragmenterRecap2.xml):
fragment -c FragmenterRecap2.xml input.mol -y 1 -f sdf:-a
Note that we set sdf:-a as the output format in the -f parameter
because our fragments are aromatized by the standardization,
whereas the SDF format is supposed to store the dearomatized form.
A set of working examples is also available.
fragment -c <config file> [<options>] [<input files/strings>]
Prepare the usage of the fragment script or batch file
as described in Preparing the Usage of JChem
Batch Files and Shell Scripts.
Alternatively, the Fragmenter class can be directly invoked:
Win32 / Java 2 (assuming that JChem is installed in c:\jchem):
java -cp "c:\jchem\lib\jchem.jar;%CLASSPATH%" ^
chemaxon.reaction.Fragmenter ^
-c <config file> [<options>] ^
[<input files/strings>]
Unix / Java 2 (assuming that JChem is installed in /usr/local/jchem):
java -cp "/usr/local/jchem/lib/jchem.jar:$CLASSPATH" \
chemaxon.reaction.Fragmenter \
-c <config file> [<options>] \
[<input files/strings>]
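When Fragmenter is launched from a script, the two invocations above differ only in the path and class path separators. The following Python sketch assembles the argument list for either platform. The `fragmenter_command` helper and its parameters are hypothetical; only the class name, the jar location under the installation directory, and the `-c` option are taken from the commands above.

```python
from typing import List, Sequence


def fragmenter_command(jchem_home: str, config: str,
                       options: Sequence[str] = (),
                       inputs: Sequence[str] = (),
                       windows: bool = False,
                       classpath: str = "") -> List[str]:
    """Build the argument list for invoking chemaxon.reaction.Fragmenter."""
    # Windows uses ';' and '\' in class paths, Unix uses ':' and '/'.
    sep, slash = (";", "\\") if windows else (":", "/")
    jar = slash.join([jchem_home.rstrip("/\\"), "lib", "jchem.jar"])
    # Append an existing class path only if one was given.
    full_classpath = jar + sep + classpath if classpath else jar
    return ["java", "-cp", full_classpath, "chemaxon.reaction.Fragmenter",
            "-c", config, *options, *inputs]


cmd = fragmenter_command("/usr/local/jchem", "Fragmenter.xml",
                         options=["-y", "1"], inputs=["mols.sdf"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting problems with SMILES input strings.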
General Options:
  -h, --help                      this help message
  -c, --config <filepath>         configuration XML file
  -x, --fragment-count <count>    the maximum number of fragments
                                  per fragment set (default: unlimited)
  -y, --set-count <count>         the maximum number of fragment sets
                                  per molecule (default: unlimited)
  -e, --extensive                 include extendable cut sets
  -i, --id <SDF tag>              SDFile tag that stores the molecule ID
                                  (default: the molecule index)
  -s, --statistics <SDF tag>      SDFile tag that stores data used later
                                  for making statistics (default: none)
  -g, --ignore-error              continue with next molecule on error
Output Options:
  -f, --format <format>           output file format (default: cxsmiles)
  -o, --output <filepath>         output file path (default: stdout)
  -k, --skip-unfragmented         skip unfragmented molecules in output
  -n, --no-data <L|I|LI>          no fragment data in:
                                    L: atom labels
                                    I: fragment ID
                                    LI: neither atom labels nor fragment ID
The command line parameter --config is mandatory. This
specifies the path and filename of a configuration file without which the
program cannot operate. A detailed description of the format of this
configuration file is given below.
The command line parameter --fragment-count
specifies the maximum number of fragments to be generated in one fragment set.
This parameter overrides the MaxFragmentCount
attribute specified in the configuration XML.
The command line parameter --set-count
specifies the maximum number of fragment sets to be generated per molecule.
This parameter overrides the MaxSetCount
attribute specified in the configuration XML.
If the command line parameter --extensive is specified then
fragment sets originating from cut sets that can be extended by adding more cleavage bonds
are also added to the result fragments.
This parameter overrides the Extensive attribute specified
in the configuration XML.
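The precedence between the command line and the configuration XML can be sketched as follows. This Python illustration is not part of Fragmenter: `effective_settings` is a hypothetical helper, and the sketch assumes that an absent count attribute means "unlimited" (represented as `None`) and that the `--extensive` flag can only switch the option on.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def effective_settings(fragmentation_xml: str,
                       fragment_count: Optional[int] = None,
                       set_count: Optional[int] = None,
                       extensive: bool = False) -> dict:
    """Merge <Fragmentation> attributes with command line overrides."""
    elem = ET.fromstring(fragmentation_xml)

    def as_int(name: str) -> Optional[int]:
        raw = elem.get(name)
        return int(raw) if raw is not None else None  # None = unlimited

    settings = {
        "MaxFragmentCount": as_int("MaxFragmentCount"),
        "MaxSetCount": as_int("MaxSetCount"),
        "Extensive": elem.get("Extensive", "false") == "true",
    }
    # Command line values, when given, win over the XML attributes.
    if fragment_count is not None:
        settings["MaxFragmentCount"] = fragment_count
    if set_count is not None:
        settings["MaxSetCount"] = set_count
    if extensive:
        settings["Extensive"] = True
    return settings


xml_fragment = '<Fragmentation MaxFragmentCount="3" MaxSetCount="30" Extensive="false"/>'
settings = effective_settings(xml_fragment, fragment_count=5)
print(settings)
```

Here `fragment_count=5` plays the role of `--fragment-count 5`, so only `MaxFragmentCount` differs from the XML values.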
If the command line parameter --skip-unfragmented is specified then
unfragmentable molecules will not be written to the output.
By default, cleavage data is stored and visualized in the atom labels and associated
fragment IDs (these are used during fragment duplicate check in
FragmentStatistics).
Both can be changed by specifying the command line parameter --no-data:
L: no data written in atom labels
I: no fragment ID is written
LI: neither atom labels nor fragment IDs are written
CutIds attribute
of the SDFTags element in the configuration XML
UID SDF tag or in cxsmiles field
The first method is only available for SDF output. Atom labels are also useful for visualizing cleavage data at each atom; however, they may be distracting if the cleavage data strings are long or if atom labels are used for a different purpose.
The command line parameter --id specifies the SDF tag storing
the molecule ID to be written to the output SDF as reference to the source molecule
that the fragment has been generated from.
The command line parameter --statistics
specifies an SDFile tag name that stores some data of the input molecule: this can be a
real number from a continuous range or a value from a discrete range, depending on
the type of the data. This data is copied into the fragments and will be used by
FragmentStatistics when counting fragments
falling into a certain data class (which is a user-defined interval in a
continuous range and a single value in a discrete range). Without this data field,
FragmentStatistics simply counts identical fragments.
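The idea of counting fragments per data class can be sketched in Python as follows. This is a simplified illustration, not the FragmentStatistics algorithm: the fragment strings and data values are made up, the class boundaries 0.2 and 0.5 are borrowed from the usage examples, and the assumption that a value lying exactly on a boundary belongs to the upper class is ours.

```python
from bisect import bisect_right
from collections import Counter
from typing import Sequence


def data_class(value: float, boundaries: Sequence[float]) -> int:
    """Index of the interval that contains value (boundaries must be sorted)."""
    return bisect_right(boundaries, value)


boundaries = [0.2, 0.5]  # three classes: below 0.2, 0.2 to 0.5, 0.5 and above

# (fragment, data value copied from the source molecule) pairs, made up here.
fragments = [("c1ccccc1", 0.1), ("c1ccccc1", 0.3), ("c1ccccc1", 0.35), ("CCO", 0.9)]

# Count identical fragments separately within each data class.
counts = Counter((smiles, data_class(value, boundaries)) for smiles, value in fragments)
for (smiles, cls), n in sorted(counts.items()):
    print(smiles, cls, n)
```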
Note that FragmentStatistics requires fragments
written in cxsmiles format, therefore do not change the default output format
of Fragmenter when FragmentStatistics is to be run.
For details on making statistical data on fragments generated by Fragmenter,
refer to the FragmentStatistics Manual.
If the command line parameter --ignore-error is specified, then import/export errors
will not stop the processing but the error is written to the console and the molecule is skipped.
By default, the program exits in case of molecule import/export errors.
Most molecular file formats are accepted (MDL molfile, Compressed molfile, SDfile, Compressed SDfile, SMILES, etc.).
If no input file name or input string is specified in the command line then input is taken from the standard input.
By default, Fragmenter writes output molecules in cxsmiles format with the following fields:
--statistics is specified)
Other output formats can be specified in the --format parameter.
Note that FragmentStatistics requires fragments
written in cxsmiles format.
The --output parameter specifies the output file path.
If omitted, results are written to the standard output.
The cleavage reactions are determined by the configuration file
(specified following the --config mandatory command line parameter).
An optional standardization section can be provided to perform pre-standardization on reaction reactants, products and input molecules. See the Standardizer manual for information on standardization.
The configuration XML may also specify reviser algorithm parameters in a separate section. If this section is omitted then no cleavage revision is made, that is, all bonds matching the cleavage reaction cleavage bonds are cleaved. Currently only the RECAP algorithm is implemented, therefore there are only two options:
The cleavage reactions are given in <Action>
subsections under the <Fragmenter> section.
Each reaction has an ID attribute and a
Structure attribute as well as an
optional Type attribute which specifies whether the
Structure attribute is a file path (Type="path")
or a molecule string (Type="string"). Several actions can share the same
ID attribute; in this way alternate reaction definitions
may be specified for one cleavage rule (see the ether definitions
below).
If the Type attribute is omitted then the structure type is
automatically decided based on its format which gives the correct result
in most cases.
For a description of reaction mapping, see the Reaction mapping section of the Reactor Manual.
Unlike in usual reaction definitions, atom maps here do not have to be unique: identical atom maps denote symmetric atoms (see the ether and amine reactions in the introduction example). The ID attribute together with the matching reaction
atom map is written in the fragment SDF tag to identify the cleavage bond endpoint.
The SDFTags section specifies which
cleavage data should be stored in fragment SDF tags and specifies
these SDF tag names (if the attribute is omitted then the corresponding data will not
be stored).
The Fragmentation section
specifies the following fragmentation parameters:
An optional cleavage bond reviser algorithm implementation may be applied with
parameters listed under the <Reviser> section.
The implementation Java class is specified in the Class
attribute. Reviser-specific parameters are specified in subsections of the reviser
section. Currently only the RECAP reviser algorithm is available.
RECAP parameters that are specified in subsections are:
NotList: the list of forbidden fragments.
Molecules are specified by the Structure and the optional
Type attribute, similarly to reaction definitions.
CutRingCHetero: "true" if cleavage between ring carbons
and hetero atoms is allowed, "false" otherwise (default: "false")
Example
<FragmenterConfiguration Version="0.1" schemaLocation="fragment_schema.xsd">
    <Standardizer>
        <Actions>
            <Reaction ID="plusminus" Structure="[*+:1][*-:2]>>[*:1]=[*:2]"/>
            <Action ID="aromatize" Act="aromatize"/>
        </Actions>
    </Standardizer>
    <Fragmenter>
        <Actions>
            <Action ID="amide" Structure="[O:3]=[C!$(C([#7])(=O)[!#1!#6]):2]-[#7!$([#7][!#1!#6]):1]>>[O:3]=[C:2].[#7:1]"/>
            <Action ID="ester" Structure="[#6!$([#6](O)~[!#1!#6])][O:2][C:1]=O>>[C:1]=O.[#6][O:2]"/>
            <Action ID="amine" Structure="[#6:2]-[N!$(N[#6]=[!#6])!$(N~[!#1!#6])!X4:1]>>[N:1].[#6:2]"/>
            <Action ID="urea" Structure="N[C:1]([N:2])=O>>N[C:1]=O.[N:2]"/>
            <Action ID="ether" Structure="[#6]-[O!$(O[#6]~[!#1!#6]):1]-[#6:2]>>[#6:2].[O:1]-[#6]"/>
            <Action ID="olefin" Structure="[C:1]=[C:1]>>[C:1].[C:1]"/>
            <Action ID="quatN" Structure="[#6:1]-[N$(N([#6])([#6])([#6])[#6])!$(NC=[!#6]):2]>>[#6:1].[N:2]"/>
            <Action ID="aromN-carbon" Structure="[n:1]-[#6!$([#6]=[!#6]):2]>>[n:1].[#6:2]"/>
            <Action ID="lactamN-carbon" Structure="[C:3](=[O:4])@-[N:1]!@-[#6!$([#6]=[!#6]):2]>>[C:3](=[O:4])[N:1].[#6:2]"/>
            <Action ID="aromcarbon-aromcarbon" Structure="[c:1]-[c:1]>>[c:1].[c:1]"/>
            <Action ID="sulphonamide" Structure="[#7:1][S:2](=O)=O>>[#7:1].[S:2](=O)=O"/>
        </Actions>
        <Params>
            <SDFTags CutIds="REACTIONS" CutCounts="COUNTS" CutSum="SUM" Count="COUNT" FragmentSets="FRAGMENTSETS"/>
            <Fragmentation MaxFragmentCount="3" MaxSetCount="30" Extensive="false"/>
        </Params>
    </Fragmenter>
    <Reviser>
        <Recap Class="chemaxon.reaction.Recap">
            <Notlist>
                <Mol ID="butyl" Structure="CCCC"/>
                <Mol ID="ibutyl" Structure="CC(C)C"/>
            </Notlist>
            <Params>
                <Limits MaxCutCount="4" MinAtomCount="4"/>
                <Options CutRingCHetero="false"/>
            </Params>
        </Recap>
    </Reviser>
</FragmenterConfiguration>
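To check a configuration before running Fragmenter, its contents can be listed with any XML parser. The following Python sketch uses the standard library only and embeds a shortened copy of the example configuration (two of the actions, no standardizer section); the variable names are ours, and the sketch does not validate the file against the schema.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example configuration, for illustration only.
CONFIG_XML = """
<FragmenterConfiguration Version="0.1">
  <Fragmenter>
    <Actions>
      <Action ID="urea" Structure="N[C:1]([N:2])=O>>N[C:1]=O.[N:2]"/>
      <Action ID="olefin" Structure="[C:1]=[C:1]>>[C:1].[C:1]"/>
    </Actions>
    <Params>
      <Fragmentation MaxFragmentCount="3" MaxSetCount="30" Extensive="false"/>
    </Params>
  </Fragmenter>
  <Reviser>
    <Recap Class="chemaxon.reaction.Recap">
      <Notlist>
        <Mol ID="butyl" Structure="CCCC"/>
        <Mol ID="ibutyl" Structure="CC(C)C"/>
      </Notlist>
      <Params>
        <Limits MaxCutCount="4" MinAtomCount="4"/>
        <Options CutRingCHetero="false"/>
      </Params>
    </Recap>
  </Reviser>
</FragmenterConfiguration>
"""

root = ET.fromstring(CONFIG_XML.strip())

# Cleavage reactions, in document order.
action_ids = [a.get("ID") for a in root.findall("./Fragmenter/Actions/Action")]

# Forbidden fragments and RECAP limits from the reviser section.
notlist = {m.get("ID"): m.get("Structure")
           for m in root.findall("./Reviser/Recap/Notlist/Mol")}
limits = dict(root.find("./Reviser/Recap/Params/Limits").attrib)

print(action_ids)
print(notlist)
print(limits)
```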
Reads molecules from the mols.sdf file
and writes the molecule fragments to the standard output in cxsmiles format:
fragment -c Fragmenter.xml mols.sdf
fragment -c Fragmenter.xml "CC(CCN(C)COCCC1=CC=CC=C1C2=CC=CC=C2)COCN" "CCCCN(C)C(C(=O)C1CCCC(Cl)C1)C(C)C(Cl)Cl"
fragment -c Fragmenter.xml -e mols.sdf
Generates at most 5 fragment sets per molecule
and at most 4 fragments in each fragment set:
fragment -c Fragmenter.xml -x 4 -y 5 mols.sdf
Writes the fragments in SDF format to o.sdf,
then displays the result in MarvinView:
fragment -c Fragmenter.xml mols.sdf -f sdf -o o.sdf
mview o.sdf
Or in a single command:
fragment -c Fragmenter.xml mols.sdf -f sdf | mview -
Note that such piping does not work in Windows.
fragment -c Fragmenter.xml mols.sdf -o fragments.cxsmiles
fragstat fragments.cxsmiles
Or in a single command:
fragment -c Fragmenter.xml mols.sdf | fragstat
Note that such piping does not work in Windows.
DATA SDFile tag of the input molecules
(counts fragments by data classes and sorts the result by occurrences):
fragment -c Fragmenter.xml -s DATA mols.sdf -o fragments.cxsmiles
fragstat -c "0.2 0.5" fragments.cxsmiles
Or in a single command:
fragment -c Fragmenter.xml -s DATA mols.sdf | fragstat -c "0.2 0.5"
Note that such piping does not work in Windows.
Appropriate fragmentation parameter settings can be used to avoid combinatorial explosion:
--fragment-count
--set-count
Fragment repetition is detected by first comparing the unique ID of the two fragments, then if the two IDs are the same, structure search is performed to test exact molecular structure matching. Note that this duplicate check is performed for each input molecule separately, therefore duplicated fragments may occur if they correspond to different input molecules. For a complete duplicate check with fragment sorting based on occurrences, create Fragment Statistics from the fragments created by Fragmenter.
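The two-stage duplicate check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Fragmenter's code: `deduplicate` is a hypothetical helper, the integer IDs are made up, and plain string equality stands in for the exact structure search.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def deduplicate(fragments: List[Tuple[int, str]],
                same_structure: Callable[[str, str], bool]) -> List[Tuple[int, str]]:
    """Keep the first occurrence of each fragment of one input molecule."""
    seen: Dict[int, List[str]] = defaultdict(list)  # unique ID -> structures kept
    unique = []
    for frag_id, structure in fragments:
        # Stage 1: only fragments with the same unique ID can be duplicates.
        # Stage 2: confirm with the (expensive) exact structure comparison.
        if any(same_structure(structure, other) for other in seen[frag_id]):
            continue
        seen[frag_id].append(structure)
        unique.append((frag_id, structure))
    return unique


fragments = [(101, "CCO"), (101, "CCO"), (101, "COC"), (202, "c1ccccc1")]
unique = deduplicate(fragments, same_structure=lambda a, b: a == b)
print(unique)
```

Comparing IDs first keeps the number of structure comparisons small, since the second stage runs only for fragments whose IDs collide.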
RECAP - Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry. J. Chem. Inf. Comput. Sci. 1998, 38, 511-522.