chemaxon.jchem.db
Class Importer

java.lang.Object
  extended by java.lang.Thread
      extended by chemaxon.jchem.db.Importer
All Implemented Interfaces:
java.lang.Runnable, chemaxon.jchem.db.Transfer

public class Importer
extends java.lang.Thread
implements chemaxon.jchem.db.Transfer

Tool for importing molecules to database tables from a File or InputStream object.

Author:
Szilard Dorant

Field Summary
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Fields inherited from interface chemaxon.jchem.db.Transfer
JTF, MOLFILE, MRV, RDFILE, RXNFILE, SDFILE, SMILES
 
Constructor Summary
Importer()
           
 
Method Summary
 void cancel()
          Stops the import process.
static java.io.InputStream decompress(java.io.InputStream is)
          Detects whether the InputStream is gzip-compressed and returns a GZIPInputStream if necessary.
 ConnectionHandler getConnectionHandler()
          Getter for property connectionHandler.
 java.lang.String getConnections()
          Deprecated. Since 2.2, replaced by getFieldConnections().
 java.util.ArrayList getDuplicateIDs()
          Returns the IDs (cd_id column in database table) of duplicate structures.
 int getDuplicates()
          Returns the number of molecules that were not imported, because they are duplicates.
 int getEmptyStructures()
          Returns the number of molecules that were not imported because they are empty structures.
 boolean getEmptyStructuresAllowed()
          Gets whether empty structures are allowed.
 java.lang.String getErrorMessage()
          Returns the error message if an error has occurred.
 java.lang.String getFieldConnections()
          Returns the specified table field - file field pairs.
 java.util.Vector getFieldNames()
          Returns field names in an SDfile.
static java.util.Vector getFieldNames(java.io.InputStream is, int linesToCheck)
          Returns field names in an SDfile.
 IntArray getImportedIDs()
          Returns the IDs (cd_id column in database table) of imported structures.
 int getImportedNumber()
          Returns the number of imported molecules.
 java.lang.Object getInput()
          Gets the source object.
 int getLinesToCheck()
          Gets the number of lines to check for file format.
 java.lang.String getNote()
          Returns the note of the ProgressWriter.
 long getProgress()
          Gets the status of the importing progress.
 ProgressWriter getProgressWriter()
          Gets the ProgressWriter object used for monitoring.
 boolean getSetChiralFlag()
          Gets whether chiral flag is set on import.
 int getSkip()
          Gets the number of molecules to skip from the beginning of the file.
 int getStructCount()
          Returns the current count of structures which were examined by the import process.
 java.lang.String getTableName()
          Gets the name of the table to import into.
 int importMols()
          Imports molecules.
 void init()
          Initialization, checking given number of lines for file format and fields.
 boolean isDuplicateImportAllowed()
          Gets whether duplicate structures are allowed.
 boolean isFinished()
          Returns true if importing has finished, else returns false.
 boolean isHaltOnError()
          Gets whether the import should stop when an error occurs.
 void run()
          Starts execution as a thread.
 void setConnectionHandler(ConnectionHandler conh)
          Setter for property connectionHandler.
 void setConnections(java.lang.String connections)
          Deprecated. Since 2.2, replaced by setFieldConnections(String).
 void setDuplicateImportAllowed(boolean b)
          If set to false does not import molecules that already exist in the table with the same topology.
 void setEmptyStructuresAllowed(boolean b)
          If set to false does not import empty molecules.
 void setFieldConnections(java.lang.String connections)
          Specifies which data fields correspond to which table fields.
 void setHaltOnError(boolean b)
          Sets whether the import should stop when an error occurs.
 void setInfoStream(java.io.PrintStream st)
          Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
 void setInput(java.io.File inputFile)
          Sets the source object as a file.
 void setInput(java.io.InputStream is)
          Sets the source object as a stream.
 void setInput(java.lang.String fileName)
          Sets the source object as a file, specifying the name of the file.
 void setLinesToCheck(int linesToCheck)
          Sets the number of lines to check for file format.
 void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, java.io.OutputStream os, boolean doNotImport)
          With this option one can print duplicate or non-duplicate molecules to a stream.
 void setProgressWriter(ProgressWriter pwriter)
          Sets the ProgressWriter object to track the progress of the actual importing.
 void setSetChiralFlag(boolean setChiralFlag)
          Sets if chiral flag should be set to true during import.
 void setSkip(int skip)
          Sets the number of molecules to skip from the beginning of the file.
 void setStoreDuplicates(boolean value)
          Specifies whether the IDs of duplicate structures should be stored.
 void setStoreImportedIDs(boolean value)
          Specifies whether the IDs of imported structures should be stored.
 void setTableName(java.lang.String tname)
          Sets the name of the table to import into.
 void skip(int offset)
          Skips the given number of molecules.
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getContextClassLoader, getName, getPriority, getThreadGroup, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setName, setPriority, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Importer

public Importer()
Method Detail

setConnectionHandler

public void setConnectionHandler(ConnectionHandler conh)
Setter for property connectionHandler. The ConnectionHandler must represent an open connection to the database.


getConnectionHandler

public ConnectionHandler getConnectionHandler()
Getter for property connectionHandler.


setInput

public void setInput(java.io.File inputFile)
Sets the source object as a file.


setInput

public void setInput(java.io.InputStream is)
Sets the source object as a stream.


setInput

public void setInput(java.lang.String fileName)
Sets the source object as a file, specifying the name of the file.


getInput

public java.lang.Object getInput()
Gets the source object. The object may be a File or an InputStream.


setTableName

public void setTableName(java.lang.String tname)
Sets the name of the table to import into.


getTableName

public java.lang.String getTableName()
Gets the name of the table to import into.


setConnections

public void setConnections(java.lang.String connections)
Deprecated. Since 2.2, replaced by setFieldConnections(String).

Specifies which data fields correspond to which table fields.

The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"


setFieldConnections

public void setFieldConnections(java.lang.String connections)
Specifies which data fields correspond to which table fields.

The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"
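
The pair format above can be illustrated with a small standalone sketch. The class FieldConnectionsSketch and its parse and format methods are hypothetical helpers written for this illustration; they are not part of JChem, and the column and field names in main are made up. The sketch only assumes the "tableField=fileField" pairs separated by ';' that the description states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of JChem: handles strings of the form
// "databaseField1=dataField1;databaseField2=dataField2".
class FieldConnectionsSketch {

    // Returns table field -> file field, keeping the order of the pairs.
    static Map<String, String> parse(String connections) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (connections == null || connections.trim().isEmpty()) {
            return pairs;
        }
        for (String raw : connections.split(";")) {
            String pair = raw.trim();
            if (pair.isEmpty()) {
                continue; // tolerate empty segments such as "a=b;;c=d"
            }
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            pairs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return pairs;
    }

    // Builds the string form from a map (the inverse of parse).
    static String format(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up names: table columns on the left, file fields on the right.
        Map<String, String> pairs = parse("mol_name=NAME;supplier=VENDOR");
        System.out.println(pairs);         // {mol_name=NAME, supplier=VENDOR}
        System.out.println(format(pairs)); // mol_name=NAME;supplier=VENDOR
    }
}
```

Building the string from a map, as format does, keeps each table field on the left-hand side of its pair, which is the order the format requires.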


getConnections

public java.lang.String getConnections()
Deprecated. Since 2.2, replaced by getFieldConnections().

Returns the specified table field - file field pairs.


getFieldConnections

public java.lang.String getFieldConnections()
Returns the specified table field - file field pairs.


setLinesToCheck

public void setLinesToCheck(int linesToCheck)
Sets the number of lines to check for file format. The same number of lines will be checked for field names in the case of SDfiles.
Default value is 500 lines.
Note: When an InputStream is used as the source, these lines are buffered in memory. Make sure the JVM has enough memory (-Xmx parameter) when setting this value very high.
Using a File as input is recommended where feasible, since it does not need buffering.


getLinesToCheck

public int getLinesToCheck()
Gets the number of lines to check for file format.


setProgressWriter

public void setProgressWriter(ProgressWriter pwriter)
Sets the ProgressWriter object to track the progress of the actual importing. (Format checking and skipping are not monitored by this object.)
It can be null if no monitoring is necessary.


getProgressWriter

public ProgressWriter getProgressWriter()
Gets the ProgressWriter object used for monitoring.


setHaltOnError

public void setHaltOnError(boolean b)
Sets whether the import should stop when an error occurs.


isHaltOnError

public boolean isHaltOnError()
Gets whether the import should stop when an error occurs.


setDuplicateImportAllowed

public void setDuplicateImportAllowed(boolean b)
If set to false, does not import molecules that already exist in the table with the same topology. This check may slow down the import.


isDuplicateImportAllowed

public boolean isDuplicateImportAllowed()
Gets whether duplicate structures are allowed.


setEmptyStructuresAllowed

public void setEmptyStructuresAllowed(boolean b)
If set to false does not import empty molecules.


getEmptyStructuresAllowed

public boolean getEmptyStructuresAllowed()
Gets whether empty structures are allowed.


setSetChiralFlag

public void setSetChiralFlag(boolean setChiralFlag)
Sets if chiral flag should be set to true during import.

Parameters:
setChiralFlag - if set to true, the chiral flag is set to true for imported molecules. The default setting is false.
Since:
2.3

getSetChiralFlag

public boolean getSetChiralFlag()
Gets whether chiral flag is set on import.


isFinished

public boolean isFinished()
Returns true if importing has finished, else returns false.


getErrorMessage

public java.lang.String getErrorMessage()
Returns the error message if an error has occurred.


getStructCount

public int getStructCount()
Returns the current count of structures which were examined by the import process.


getImportedNumber

public int getImportedNumber()
Returns the number of imported molecules.


getDuplicates

public int getDuplicates()
Returns the number of molecules that were not imported, because they are duplicates.


getEmptyStructures

public int getEmptyStructures()
Returns the number of molecules that were not imported because they are empty structures.


getNote

public java.lang.String getNote()
Returns the note of the ProgressWriter.


setSkip

public void setSkip(int skip)
Sets the number of molecules to skip from the beginning of the file.


getSkip

public int getSkip()
Gets the number of molecules to skip from the beginning of the file.


getProgress

public long getProgress()
Gets the status of the importing progress.

Returns:
the position of the ProgressWriter, -1 if the object is not set (null)

run

public void run()
Starts execution as a thread. Calls init(), skip(int), and importMols(). Exceptions are caught and printed to stderr.

Specified by:
run in interface java.lang.Runnable
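
Because the class extends java.lang.Thread, a caller would typically start it and then poll for completion. The sketch below shows that pattern with a stand-in worker. PollingSketch and StandInJob are hypothetical names, and the worker only sleeps instead of importing anything; it mirrors the documented behaviour (run() catches exceptions and prints them to stderr, isFinished() reports completion, cancel() stops the loop) under the assumption that the real class behaves similarly.

```java
// Hypothetical stand-in, NOT the JChem Importer: a Thread subclass that
// "processes" records so the start / poll / cancel pattern can be shown.
class PollingSketch {

    static class StandInJob extends Thread {
        private final int total;
        private volatile int processed;
        private volatile boolean finished;
        private volatile boolean cancelled;

        StandInJob(int total) {
            this.total = total;
        }

        @Override
        public void run() {
            try {
                for (int i = 0; i < total && !cancelled; i++) {
                    Thread.sleep(1);   // pretend to handle one record
                    processed = i + 1; // written by this thread only
                }
            } catch (InterruptedException e) {
                // Like the documented run(): report to stderr, do not rethrow.
                System.err.println("Job interrupted: " + e);
                Thread.currentThread().interrupt();
            } finally {
                finished = true;
            }
        }

        void cancel() { cancelled = true; }
        boolean isFinished() { return finished; }
        int getProcessed() { return processed; }
    }

    // Starts a job, polls until it reports completion, returns the count.
    static int runAndWait(int total) throws InterruptedException {
        StandInJob job = new StandInJob(total);
        job.start(); // start(), not run(): run() would block the caller
        while (!job.isFinished()) {
            Thread.sleep(5); // a GUI would refresh a progress bar here
        }
        job.join();
        return job.getProcessed();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed: " + runAndWait(20)); // processed: 20
    }
}
```

Since run() swallows exceptions, a caller using this pattern with the real class would presumably check getErrorMessage() after isFinished() returns true rather than rely on an exception reaching the calling thread.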

setInfoStream

public void setInfoStream(java.io.PrintStream st)
Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).

Parameters:
st - the stream. The default is null (no info is written).

setOutputOptions

public void setOutputOptions(boolean printDuplicates,
                             boolean printNonDuplicates,
                             java.io.OutputStream os,
                             boolean doNotImport)
With this option one can print duplicate or non-duplicate molecules to a stream.


setStoreDuplicates

public void setStoreDuplicates(boolean value)
Specifies whether the IDs of duplicate structures should be stored.


setStoreImportedIDs

public void setStoreImportedIDs(boolean value)
Specifies whether the IDs of imported structures should be stored.

Since:
JChem 3.1.7
See Also:
getImportedIDs()

getDuplicateIDs

public java.util.ArrayList getDuplicateIDs()
Returns the IDs (cd_id column in database table) of duplicate structures.

Returns:
the IDs as an ArrayList containing Integer objects.

getImportedIDs

public IntArray getImportedIDs()
Returns the IDs (cd_id column in database table) of imported structures.

Returns:
the IDs stored in an IntArray object
Since:
JChem 3.1.7
See Also:
setStoreImportedIDs(boolean)

importMols

public int importMols()
               throws TransferException
Imports molecules.

Returns:
the number of molecules imported
Throws:
TransferException

cancel

public void cancel()
Stops the import process.


skip

public void skip(int offset)
          throws TransferException
Skips the given number of molecules.

Parameters:
offset - the number of molecules to be skipped
Throws:
TransferException

init

public void init()
          throws TransferException
Initialization: checks the given number of lines for file format and fields. If not called explicitly, it is called automatically by skip or importMols when necessary.

Throws:
TransferException

decompress

public static java.io.InputStream decompress(java.io.InputStream is)
                                      throws java.io.IOException
Detects whether the InputStream is gzip-compressed and returns a GZIPInputStream if necessary.

Throws:
java.io.IOException
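
One common way to detect gzip input is to look at the first two bytes of the stream, which are 0x1f 0x8b for gzip data, and push them back before deciding how to wrap the stream. The sketch below shows that technique with the standard library only. GzipSniffSketch and maybeDecompress are hypothetical names, and this is an assumption about how such detection can work, not a description of how decompress is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, NOT the JChem implementation.
class GzipSniffSketch {

    // Wraps the stream in a GZIPInputStream only if it starts with the
    // gzip magic bytes 0x1f 0x8b; otherwise returns it unchanged in effect.
    static InputStream maybeDecompress(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 2);
        byte[] head = new byte[2];
        int n = 0;
        while (n < 2) {
            int r = pb.read(head, n, 2 - n);
            if (r < 0) {
                break; // stream shorter than two bytes
            }
            n += r;
        }
        if (n > 0) {
            pb.unread(head, 0, n); // put the peeked bytes back
        }
        boolean gzip = n == 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b;
        return gzip ? new GZIPInputStream(pb) : pb;
    }

    // Test helper: gzip-compresses a string.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Test helper: reads a stream to the end as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int r;
        while ((r = in.read(buf)) != -1) {
            out.write(buf, 0, r);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String smiles = "C1CCCCC1";
        byte[] plain = smiles.getBytes(StandardCharsets.UTF_8);
        // Both calls print the same text, whether or not the input was compressed.
        System.out.println(readAll(maybeDecompress(new ByteArrayInputStream(gzip(smiles)))));
        System.out.println(readAll(maybeDecompress(new ByteArrayInputStream(plain))));
    }
}
```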

getFieldNames

public java.util.Vector getFieldNames()
                               throws TransferException,
                                      java.io.IOException
Returns field names in an SDfile. The input may come from an InputStream; import may follow without reopening the stream. Calls init() if initialization is necessary.

Returns:
a vector of String objects, the names of the SDfile fields.
Throws:
TransferException
java.io.IOException
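
To make "field names in an SDfile" concrete: in the SDfile format each data item is introduced by a header line that starts with '>' and carries the field name in angle brackets, for example ">  <NAME>", and each record ends with a "$$$$" line. The sketch below collects those names from the first linesToCheck lines. SdFieldNamesSketch is a hypothetical class and the sample records are made up; it reads from a Reader for brevity and is not the JChem parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, NOT the JChem parser.
class SdFieldNamesSketch {

    // Collects distinct SDfile field names, in order of first appearance,
    // looking at no more than linesToCheck lines of the input.
    static List<String> fieldNames(Reader source, int linesToCheck) throws IOException {
        Set<String> names = new LinkedHashSet<>();
        BufferedReader br = new BufferedReader(source);
        String line;
        int checked = 0;
        while (checked < linesToCheck && (line = br.readLine()) != null) {
            checked++;
            if (!line.startsWith(">")) {
                continue; // not a data header line
            }
            int open = line.indexOf('<', 1);
            int close = line.indexOf('>', open + 1);
            if (open > 0 && close > open + 1) {
                names.add(line.substring(open + 1, close));
            }
        }
        return new ArrayList<>(names);
    }

    // Two made-up records; the empty connection tables are placeholders.
    static String sample() {
        return String.join("\n",
            "ethanol", "  sketch", "",
            "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
            ">  <NAME>", "ethanol", "",
            ">  <SUPPLIER> (1)", "Acme", "",
            "$$$$",
            "water", "  sketch", "",
            "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
            ">  <NAME>", "water", "",
            ">  <MW>", "18.02", "",
            "$$$$");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fieldNames(new StringReader(sample()), 500)); // [NAME, SUPPLIER, MW]
        System.out.println(fieldNames(new StringReader(sample()), 12));  // [NAME, SUPPLIER]
    }
}
```

The second call shows why the number of lines checked matters: a field that first appears beyond the checked lines (MW here) is not reported.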

getFieldNames

public static java.util.Vector getFieldNames(java.io.InputStream is,
                                             int linesToCheck)
                                      throws java.io.IOException,
                                             MRecordParseException
Returns field names in an SDfile. NOTE: in order to return to the initial position, the InputStream has to be reopened or repositioned (e.g. using a BufferedInputStream).

Returns:
a vector of String objects, the names of the SDfile fields.
Throws:
java.io.IOException
MRecordParseException
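
Repositioning with a BufferedInputStream can be done through the standard mark and reset methods. The sketch below peeks at the start of a stream and then rewinds it, so that a later reader starts from the initial position. PeekAndRewindSketch and peek are hypothetical names; the sketch uses only java.io and makes no claim about what the static getFieldNames does internally.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch using only the standard library.
class PeekAndRewindSketch {

    // Reads up to maxBytes from the stream, then rewinds it to where it was.
    // mark(maxBytes) stays valid because at most maxBytes are read before reset().
    static String peek(BufferedInputStream in, int maxBytes) throws IOException {
        in.mark(maxBytes);
        byte[] buf = new byte[maxBytes];
        int n = 0;
        while (n < maxBytes) {
            int r = in.read(buf, n, maxBytes - n);
            if (r < 0) {
                break; // end of stream before maxBytes
            }
            n += r;
        }
        in.reset();
        return new String(buf, 0, n, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefghij".getBytes(StandardCharsets.ISO_8859_1);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(peek(in, 4));      // abcd
        System.out.println((char) in.read()); // a
    }
}
```

The read limit passed to mark bounds how far ahead the stream may be read before reset stops being guaranteed, so it has to cover everything the peeking code consumes. This is also why checking many lines of an InputStream costs memory.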