Command line interfaces (CLIs)
Introduction
The command line tools give end users access to the overlap/fast similarity search APIs. Instead of a single universal CLI, multiple workflow-specific CLIs are provided. Launcher scripts of the command line tools are found in the directory bin/.
Note that the distribution contains further self-contained example scripts in the directory examples/; they are detailed in the document Self contained examples.
General considerations
- Intermediate outputs (serialized structures) cannot be used across different versions. After a version change, usually all import/preparation processes must be run again.
- Various licenses (in addition to the license named Overlap) are required for executing the command line examples. For details see the ChemAxon Installing Licenses documentation.
- Command line tools print help and terminate when option -h is passed.
- Scripts usually print verbose information to the standard error during execution.
- Internal IO recognizes gzipped input structure files and transparently unzips them. The documentation, however, contains examples where unzipping is done using gzip and the content is read from the standard input. (The general pattern is gzip -dc <INFILE>.gz | <SCRIPT> ... -in - ...)
- Usually parameter -tp sets the thread pool size for concurrent operations. As a rule of thumb, the number of visible CPU cores is a reasonable starting value (this is the default). Use the jvisualvm tool of the JDK to monitor execution (heap size vs GC activity, CPU utilization), or the supplied execution profiling when available. For details see document Profiling and execution statistics.
- Command line scripts might currently fail when they are accessed through links. This will be fixed in the future.
- Some command line scripts support the collection of profiling (option -prof) and execution statistics (option -stat) data. For details see document Profiling and execution statistics.
- An OverlapAnalysisContext collects rules and methods required for descriptor generation and comparison. The context used for descriptor generation is stored in the generated binary file; however, it can be customized using scripting hooks when used in searches. See Basic overview of the concepts of overlap analysis context for details.
- Launcher scripts check whether variable JAVA_HOME is set. When it is set, they use <JAVA_HOME>/bin/java to launch Java. Otherwise they use the command java.
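The JAVA_HOME rule in the last item can be sketched in shell as follows. This is only an illustration of the described behavior, not the code of the actual launcher scripts; the function name resolve_java is hypothetical, and "set" is assumed to mean "non-empty".

```shell
#!/bin/sh
# Hypothetical helper (not part of the distribution): print the Java command
# a launcher would use according to the rule described above.
# Assumption: JAVA_HOME counts as "set" only when it is non-empty.
resolve_java() {
    if [ -n "${JAVA_HOME:-}" ]; then
        printf '%s\n' "$JAVA_HOME/bin/java"
    else
        printf '%s\n' "java"
    fi
}

JAVA_CMD=$(resolve_java)
# Diagnostic output goes to the standard error, like the verbose output of the tools.
echo "Java command: $JAVA_CMD" >&2
```

With JAVA_HOME=/opt/jdk the helper prints /opt/jdk/bin/java; with JAVA_HOME unset it prints java, which is then resolved through PATH.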
Special arguments
Normal command line arguments (options) of the launcher scripts are passed to the launched applications. Some special arguments (when preceding the normal ones) are captured by the launcher script and treated differently. These special arguments:
Some of the JVM options
JVM options -server, -client, -X...., -D...., -verbose:.... and -javaagent:.... (preceding normal options) are recognized. These options are passed to the JVM. For example:
- dumpStorage.sh -Xmx4g -in bigfile.bin will start the JVM with 4G heap (option -Xmx4g is passed as JVM option). The malformed command dumpStorage.sh -in bigfile.bin -Xmx4g will try to interpret -Xmx4g as a normal command line argument of the tool and will fail (since the -X... option does not precede normal option -in).
- jseval.sh "-Dabc.def-ghi=jkl mno" -js "print(java.lang.System.getProperties());" will set system property abc.def-ghi to value jkl_mno. This mechanism allows data injection for Profiling and execution statistics.
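The "must precede normal options" rule can be illustrated with a small shell sketch. It is not the launcher's actual implementation: the function name classify_args and its output format are hypothetical, and only the JVM option patterns listed above are handled (the other special arguments are left out).

```shell
#!/bin/sh
# Hypothetical sketch: classify arguments the way the text describes.
# Prints "jvm <arg>" for each leading JVM option and "app <arg>" for
# everything from the first normal argument onwards.
classify_args() {
    leading=1
    for arg in "$@"; do
        if [ "$leading" -eq 1 ]; then
            case "$arg" in
                -server|-client|-X*|-D*|-verbose:*|-javaagent:*)
                    printf 'jvm %s\n' "$arg"
                    continue
                    ;;
            esac
            # First normal argument seen: stop capturing JVM options.
            leading=0
        fi
        printf 'app %s\n' "$arg"
    done
}

classify_args -Xmx4g -in bigfile.bin
classify_args -in bigfile.bin -Xmx4g
```

In the first call -Xmx4g is classified as a JVM option. In the second call it follows -in, so all three arguments are classified as application arguments, which corresponds to the malformed command above.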
Classpath addition
Option -classpath <SPEC> will accept additional parts for the classpath to be used. Argument <SPEC> will be added to the beginning of the classpath of the application.
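As a rough illustration, adding <SPEC> to the beginning of a classpath could look like the sketch below. The helper name prepend_classpath and the sample paths are hypothetical, and the ':' separator assumes a Unix-like system; the real launcher scripts may build the classpath differently.

```shell
#!/bin/sh
# Hypothetical helper: put <SPEC> in front of an existing classpath.
# Assumption: classpath entries are joined with ':' (Unix path separator).
prepend_classpath() {
    spec=$1
    base=$2
    if [ -z "$base" ]; then
        # Nothing to prepend to: the result is just <SPEC>.
        printf '%s\n' "$spec"
    else
        printf '%s:%s\n' "$spec" "$base"
    fi
}

prepend_classpath "extra/classes" "lib/app.jar:lib/dep.jar"
```

Because Java searches classpath entries in order, classes found under the prepended entry take precedence over classes of the same name later in the classpath.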
Launcher script diagnostic
Option -launcherverbose will print details of the launcher script's internal state (Java command to be used, classpath, JVM options, arguments, main class to be launched) and the Java version before invoking the application.
Complete example workflows
Scripts in directory examples provide complete workflows exercising functionality available in this package. For details see the documentation.
- Some of the JVM options (-server, -client, -X...., -D...., -verbose:...., -javaagent:....). These options are passed to the JVM. For details on these JVM options see http://jvmmemory.com/. Examples:
  - dumpStorage.sh -Xmx4g -in bigfile.bin will start the JVM with 4G heap (option -Xmx4g is passed as JVM option). The malformed command dumpStorage.sh -in bigfile.bin -Xmx4g will try to interpret -Xmx4g as a normal command line argument of the tool and will fail.
  - jseval.sh "-Dabc.def-ghi=jkl mno" -js "print(java.lang.System.getProperties());" will set system property abc.def-ghi to value jkl_mno. This mechanism allows data injection for Profiling and execution statistics.
- Launcher scripts will recognize option -classpath <SPEC> preceding normal options. The argument <SPEC> will be appended to the classpath used for application startup.
- Launcher scripts will recognize option -launcherverbose preceding normal options. When used, the script will print its settings (effective classpath, JVM options, normal options) to the console before launching the application.
Creation of serialized structures
- calculateOverlap.sh: Calculate similarity-based overlap analysis. For details see Introduction to overlap analysis.
- buildStorage.sh: Read molecules, generate descriptors and ID mapping, and write them to a serialized file. See Basic search workflow for a usage example.
- importStorage.sh: Parse descriptors from String representation, create ID mapping and write to a serialized file. See Custom binary descriptors and Custom float descriptors for usage examples.
- createMms.sh: Read molecules and store them in a serialized file. Molecule names or SD properties can also be retrieved and stored in separate files. See Basic search workflow for a usage example.
- createAllAbsentMms.sh: Create a virtual master molecule store containing only absent (unknown) structures.
Searching of serialized structures
searchStorage.sh
Read a serialized descriptor storage file and search descriptors (either parsed from String form or generated using the specified context). See Basic search workflow for a usage example.
Interactive front-end applications
gui.sh
Provide real-time similarity search and visualize the results of calculateOverlap.sh and calculateSelfOverlap.sh. This interface also provides a REST API for remote clients. See REST API example for a usage example.
Debug, diagnostic, tools
- prepareMolecules.sh: Tool for common molecule conversions, helping to prepare publicly available molecule sets. Usage examples can be found in document Prepare molecules. Note that in case of an export conversion error the offending structure will be ignored.
- dumpStorage.sh: Diagnostic tool to write the contents of serialized files to the console for further inspection. Verbose messages and dumped contents are printed to the stderr. See Basic search workflow for a usage example.
- stdg.sh: Single-threaded descriptor generator; creates string representations that can be compared with reference string descriptor representations for testing. It can also be used to create sample input for custom descriptor import.
- jseval.sh: Evaluate custom JavaScript code; can be used to provide JS-based scripting. This tool allows the injection of arbitrary parameters into the script context.