Command line interfaces (CLIs)
Introduction
The command line tools allow end-user access to the overlap/fast similarity search APIs. Instead of a single universal CLI, multiple workflow-specific CLIs are provided. Launcher scripts for the command line tools are found in directory bin/.
Note that the distribution contains further self-contained example scripts in directory examples/; they are detailed in document Self contained examples.
General considerations
- 
  Intermediate outputs (serialized structures) cannot be used across different versions. After a version change, usually all import/preparation processes must be run again.
- 
  Various licenses (in addition to the license named Overlap) are required for executing the command line examples. For details see the ChemAxon Installing Licenses documentation.
- 
  Command line tools print help and terminate when option -h is passed.
- 
  Scripts usually print verbose information during execution to the standard error. 
- 
  Internal IO recognizes gzipped input structure files and transparently unzips them. The documentation, however, contains examples where unzipping is done using gzip and the content is read from the standard input. (The general pattern is gzip -dc <INFILE>.gz | <SCRIPT> ... -in - ....)
- 
  Usually parameter -tp sets the thread pool size for concurrent operations. As a rule of thumb, the number of visible CPU cores is a good starting value (this is the default). Use the jvisualvm tool of the JDK to monitor execution (heap size vs GC activity, CPU utilization), or the supplied execution profiling when available. For details see document Profiling and execution statistics.
- 
  Command line scripts might currently fail when they are invoked through links. This will be fixed in the future.
- 
  Some command line scripts support the collection of profiling (option -prof) and execution statistics (option -stat) data. For details see document Profiling and execution statistics.
- 
  An OverlapAnalysisContext collects the rules and methods required for descriptor generation and comparison. The context used for descriptor generation is stored in the generated binary file; however, it can be customized using scripting hooks when used in searches. See Basic overview of the concepts of overlap analysis context for details.
- 
  Launcher scripts check if variable JAVA_HOME is set. When set, they will use <JAVA_HOME>/bin/java to launch Java. Otherwise they will use command java.
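The Java selection rule from the last item can be sketched in a few lines of POSIX shell. This is only an illustration of the described behavior, not the code of the shipped launchers: the function name resolve_java is hypothetical, and treating an empty JAVA_HOME the same as an unset one is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the shipped launcher scripts):
# pick the Java command the way the text above describes.
resolve_java() {
    if [ -n "${JAVA_HOME:-}" ]; then
        # JAVA_HOME is set (assumption: and non-empty): use the JVM inside it
        printf '%s\n' "${JAVA_HOME}/bin/java"
    else
        # JAVA_HOME is not set: rely on "java" being found through PATH
        printf '%s\n' "java"
    fi
}

# Report the command that would be used; diagnostics go to standard error
JAVA_CMD=$(resolve_java)
echo "Java command: ${JAVA_CMD}" >&2
```

To see which command a real launcher picked, option -launcherverbose (described under Special arguments) prints the Java command to be used.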
Special arguments
Normal command line arguments (options) of the launcher scripts are passed to the launched applications. Some special arguments (when preceding the normal ones) are captured by the launcher script and treated differently. The special arguments are the following:
Some of the JVM options
JVM options -server, -client, -X...., -D...., -verbose:.... and -javaagent:.... (preceding normal options) are recognized and passed to the JVM. For details on these JVM options see http://jvmmemory.com/. For example:
- 
  dumpStorage.sh -Xmx4g -in bigfile.bin will start the JVM with 4G heap (option -Xmx4g is passed as a JVM option). The malformed command dumpStorage.sh -in bigfile.bin -Xmx4g will try to interpret -Xmx4g as a normal command line argument of the tool and will fail (since the -X... option does not precede normal option -in).
- 
  jseval.sh "-Dabc.def-ghi=jkl mno" -js "print(java.lang.System.getProperties());" will set system property abc.def-ghi to value jkl_mno. This mechanism allows data injection for Profiling and execution statistics.
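The "preceding" rule can be illustrated with a small POSIX shell sketch. It is not the shipped launcher code: the function name classify_args and the jvm:/app: output labels are hypothetical, and the patterns are simply the option forms listed above. Leading arguments matching a JVM option pattern are treated as JVM options; from the first argument that does not match, everything is treated as a normal argument of the tool.

```shell
#!/bin/sh
# Hypothetical sketch of the "leading JVM options" rule described above.
# Prints one line per argument: "jvm: <arg>" or "app: <arg>".
classify_args() {
    leading=1   # 1 while we are still in the leading run of JVM options
    for arg in "$@"; do
        if [ "$leading" -eq 1 ]; then
            case "$arg" in
                -server|-client|-X*|-D*|-verbose:*|-javaagent:*)
                    printf 'jvm: %s\n' "$arg"
                    continue
                    ;;
            esac
            # First non-JVM argument: everything from here on goes to the tool
            leading=0
        fi
        printf 'app: %s\n' "$arg"
    done
}

# Well-formed: -Xmx4g precedes the normal options
classify_args -Xmx4g -in bigfile.bin
# Malformed: -Xmx4g follows -in, so it is classified as a tool argument
classify_args -in bigfile.bin -Xmx4g
```

In the second call all three arguments are labelled app:, which mirrors why the malformed dumpStorage.sh invocation above fails: the tool itself receives -Xmx4g and does not know it.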
Classpath addition
Option -classpath <SPEC> will accept additional parts for the classpath to be used. Argument <SPEC> will be added to the beginning of the classpath of the application.
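As a rough illustration of what adding to the beginning means, the following POSIX shell sketch joins a <SPEC> value in front of an existing classpath. The function name prepend_classpath, the example file names and the use of ":" as separator (the Unix convention) are assumptions, not details taken from the launcher scripts.

```shell
#!/bin/sh
# Hypothetical sketch: place the extra <SPEC> in front of the existing classpath.
# $1: additional classpath part (<SPEC>), $2: classpath of the application
prepend_classpath() {
    if [ -z "$2" ]; then
        # No existing classpath: the result is just <SPEC>
        printf '%s\n' "$1"
    else
        # Assumption: ":" is the path separator (Unix-like systems)
        printf '%s\n' "$1:$2"
    fi
}

# Example with made-up file names
prepend_classpath "extra/hooks.jar" "lib/app.jar"
```

Since Java searches classpath entries in order, entries placed at the beginning are consulted before the application's own entries.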
Launcher script diagnostic
Option -launcherverbose will print details of the launcher script internal state (Java command to be used, classpath, JVM options, arguments, main class to be launched) and Java version before invoking the application.
Complete example workflows
Scripts in directory examples/ provide complete workflows exercising the functionality available in this package. For details see document Self contained examples.
Creation of serialized structures
- 
  calculateOverlap.sh: Calculate similarity based overlap analysis. For details see Introduction to overlap analysis.
- 
  buildStorage.sh: Read molecules, generate descriptors and ID mapping and write them to a serialized file. See Basic search workflow for usage example.
- 
  importStorage.sh: Parse descriptors from String representation, create ID mapping and write to a serialized file. See Custom binary descriptors and Custom float descriptors for usage example.
- 
  createMms.sh: Read molecules and store them in a serialized file. Molecule names or SD properties can also be retrieved and stored in separate files. See Basic search workflow for usage example.
- 
  createAllAbsentMms.sh: Create a virtual master molecule store containing only absent (unknown) structures.
Searching of serialized structures
- searchStorage.sh: Read serialized descriptor storage file and search descriptors (either parsed from String form or generated using the specified context). See Basic search workflow for usage example.
Interactive front-end applications
- gui.sh: Provide real-time similarity search and visualize the results of calculateOverlap.sh and calculateSelfOverlap.sh. This interface also provides a REST API for remote clients. See REST API example for usage example.
Debug, diagnostic, tools
- 
  prepareMolecules.sh: Tool for common molecule conversions, helping to prepare publicly available molecule sets. Usage examples can be found in document Prepare molecules. Note that in case of an export conversion error the offending structure will be ignored.
- 
  dumpStorage.sh: Diagnostic tool to write the contents of serialized files to the console for further inspection. Verbose messages and dumped contents are printed to stderr. See Basic search workflow for usage example.
- 
  stdg.sh: Single-threaded descriptor generator; creates string representations of descriptors that can be compared against reference string descriptor representations for testing. It can also be used to create sample input for custom descriptor import.
- 
  jseval.sh: Evaluate custom JavaScript code; can be used to provide JS based scripting. This tool allows the injection of arbitrary parameters into the script context.

