Basic overview of the concepts of the overlap analysis context
An instance of the OverlapAnalysisContext class (see apidoc) represents the main settings and parameters required for generating and comparing molecular descriptors (fingerprints). Command line tools usually need the context to be specified explicitly by the user, typically with options -context and -contextjs. The help printed by the involved command line tools (when option -h is passed) documents these options briefly.
Underlying APIs
Internally an instance of the OverlapAnalysisContext class (see apidoc) is used for calculations. Command line tools use OverlapAnalysisContextFactory (see apidoc) to create the predefined instances specified by option -context. JavaScript context customization/creation hooks (which interact directly with the Java APIs) can be specified by option -contextjs; they are processed by class ContextJsTools (see apidoc).
Note that class OverlapAnalysisContext is an immutable (see Wikipedia) cumulative factory (see explanation) class: each method that changes a setting returns a new instance instead of modifying the original one. On the other hand, descriptor parameter builders (like CfpParameters.Builder) are typically classic builders (see Wikipedia): method invocations modify the state of the builder itself, and the build() method creates the immutable parameter object (like CfpParameters).
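The difference between the two patterns can be sketched in plain JavaScript. This is an illustration only: ImmutableContext, withPageSize, ParamsBuilder and the default values below are hypothetical stand-ins, not part of the ChemAxon API.

```javascript
// Hypothetical stand-in for an immutable cumulative factory (like OverlapAnalysisContext):
// every "setter" returns a NEW instance and leaves the receiver untouched.
function ImmutableContext(settings) {
  this.settings = Object.freeze(Object.assign({}, settings));
  Object.freeze(this);
}
ImmutableContext.prototype.withPageSize = function (pageSize) {
  return new ImmutableContext(Object.assign({}, this.settings, { pageSize: pageSize }));
};

// Hypothetical stand-in for a classic builder (like CfpParameters.Builder):
// setters mutate the builder and return it for chaining; build() freezes a snapshot.
function ParamsBuilder() {
  this.state = { length: 1024, bondCount: 7 }; // hypothetical defaults
}
ParamsBuilder.prototype.length = function (n) {
  this.state.length = n;
  return this; // the same builder instance
};
ParamsBuilder.prototype.bondCount = function (n) {
  this.state.bondCount = n;
  return this;
};
ParamsBuilder.prototype.build = function () {
  return Object.freeze(Object.assign({}, this.state)); // immutable parameter object
};

var ctx1 = new ImmutableContext({ pageSize: 50 });
var ctx2 = ctx1.withPageSize(100); // ctx1 still has pageSize 50

var bld = new ParamsBuilder();
var same = bld.length(2048).bondCount(2); // same === bld; its state has changed
var params = bld.build(); // frozen snapshot: length 2048, bondCount 2
```

A practical consequence: a context can be shared and customized freely because customizations never affect the original, whereas a builder should not be reused after build() if its later modifications are not wanted elsewhere.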
Using pre-defined contexts
Predefined contexts are referenced by option -context. The available ones are listed when option -h is passed to the involved command line tool:
bin/buildStorage.sh -h
....
Applicable context names:
"createSimpleCfp5Context" "createSimpleCfp6Context" "createSimpleCfp7Context" "createSimpleCfp8Context" "createSimpleCfp9Context" "createSimpleCfp10Context" "createSimpleEcfp4Context" "createSimpleEcfp6Context" "createSimpleEcfp8Context" "createSimpleEcfp10Context" "createSimplePharmaCalcContext" "screen3d" "screen3dr" "createSimpleCfp4Context"
....
Using custom JS hooks
A custom JavaScript fragment can be passed to option -contextjs; its last statement is expected to evaluate to the context to be used. The script fragment can access the Java API and some preinitialized helper variables, which are also documented by the printed help.
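Why the last statement matters can be illustrated with JavaScript's own eval, which yields the value of the last evaluated statement; the "Script returned null" error shown further below suggests the tool behaves the same way. In this sketch ctx, println and evalHook are hypothetical stand-ins; the actual ContextJsTools implementation may differ in detail.

```javascript
// Hypothetical stand-ins for the preinitialized hook variables.
var ctx = { describe: function () { return "Overlap analysis context."; } };
var printed = [];
function println(msg) {
  printed.push(String(msg)); // collects output; returns undefined like a void print
}

// Evaluates a hook script and insists that it yields a context object.
function evalHook(script) {
  var result = eval(script); // direct eval: value of the last evaluated statement
  if (result === undefined || result === null) {
    throw new Error("Script returned null: " + script);
  }
  return result;
}

// Diagnostic output first, then the context as the last statement: accepted.
var ok = evalHook("println('Initialized context:'); println(ctx.describe()); ctx");

// The last statement is println(...), which evaluates to undefined: rejected.
var failure = null;
try {
  evalHook("println('no context here')");
} catch (e) {
  failure = e.message;
}
```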
Example JS hook: customize metric
This example shows how to change the metric represented by a specified context. A pre-defined context (specified with option -context) is used initially; it is then further customized in the script hook.
...
-context "createSimpleCfp7Context" -contextjs "ctx.descriptorComparator(ctx.getDescriptorGenerator().getBinaryMetricsComparator(bm_MANHATTAN))"
...
Breakdown of the contents of the passed JavaScript fragment customizing the OverlapAnalysisContext used:
Script part | Description |
---|---|
ctx | This reference holds the OverlapAnalysisContext instance specified by option -context. See apidoc. |
.descriptorComparator(...) | Updates the metric to be used (returns a new context instance). See apidoc. |
ctx.getDescriptorGenerator() | The represented descriptor generator; its factory methods are used to create the new metric. See apidoc. |
.getBinaryMetricsComparator(...) | Factory method for non-parametrized metrics. See apidoc. |
bm_MANHATTAN | Constant which can be passed to method .getBinaryMetricsComparator(..). See apidoc. |
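For intuition on what a metric swap changes, the following sketch evaluates the textbook definitions of the binary Manhattan distance and the Tanimoto dissimilarity on fingerprints stored as arrays of 0/1 values. It illustrates the mathematics only and is not ChemAxon's comparator code, which may differ, for example in normalization or in how empty fingerprints are handled.

```javascript
// Textbook metrics on binary fingerprints given as equal-length arrays of 0/1.
// Illustration only; not the ChemAxon comparator implementations.

// Manhattan (city block) distance: sum of |x_i - y_i|, i.e. the number of differing bits.
function binaryManhattan(x, y) {
  if (x.length !== y.length) throw new Error("length mismatch");
  var d = 0;
  for (var i = 0; i < x.length; i++) d += Math.abs(x[i] - y[i]);
  return d;
}

// Tanimoto dissimilarity: 1 - c / (a + b - c), where a and b count the set bits
// of x and y, and c counts the bits set in both.
function binaryTanimotoDissimilarity(x, y) {
  if (x.length !== y.length) throw new Error("length mismatch");
  var a = 0, b = 0, c = 0;
  for (var i = 0; i < x.length; i++) {
    a += x[i];
    b += y[i];
    c += x[i] & y[i];
  }
  var union = a + b - c;
  // Assumption: two empty fingerprints are treated as identical (dissimilarity 0).
  return union === 0 ? 0 : 1 - c / union;
}

var fp1 = [1, 1, 0, 0, 1, 0, 0, 0];
var fp2 = [1, 0, 1, 0, 1, 0, 0, 0];
var manhattan = binaryManhattan(fp1, fp2); // bits 1 and 2 differ: 2
var tanimoto = binaryTanimotoDissimilarity(fp1, fp2); // 1 - 2/4 = 0.5
```

Note the different scales: the Manhattan value grows with the number of differing bits, while the Tanimoto dissimilarity is normalized to the range 0 to 1.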
Example JS hook: customize a fingerprint
This example shows how to access the Java API to set fingerprint parameters. Note that the fast similarity search tools and this overlap-examples distribution prefer the "new" descriptors API (found in package com.chemaxon.descriptors).
....
-contextjs "ctx_from_descpb(bld_cfp.length(2048).bitsPerPattern(4).bondCount(2).rings(false)).standardizer(std_defaultaroma)"
....
Breakdown of the contents of the passed JavaScript fragment creating the OverlapAnalysisContext used:
Script part | Description |
---|---|
ctx_from_descpb(..) | Helper function which creates a default OverlapAnalysisContext from the associated DescriptorParameters builder. |
bld_cfp | A builder instance for CfpParameters in its default state. |
.length(..) | Updates the builder with the length parameter (see apidoc). |
.bitsPerPattern(..) | Updates the builder with the bitsPerPattern parameter (see apidoc). |
.bondCount(..) | Updates the builder with the bondCount parameter (see apidoc). |
.rings(..) | Updates the builder with the rings parameter (see apidoc). |
.standardizer(..) | Updates the context (created by ctx_from_descpb(..)) by specifying a standardizer (see apidoc). |
std_defaultaroma | Helper constant: a StandardizerWrapper instance wrapping the default aromatization. |
Example JS hook: define a custom binary vector descriptor
Custom binary vector descriptors holding externally defined fingerprints currently must be defined using the Java API. This example defines a descriptor expecting binary strings of 1024 bits:
....
-contextjs "ctx_from_descpb(bld_bv.length(1024).endianness(en_BIG_ENDIAN).stringFormat(sf_STRICT_BINARY_STRING))"
....
Breakdown of the contents of the passed JavaScript fragment creating the OverlapAnalysisContext used:
Script part | Description |
---|---|
ctx_from_descpb(..) | Helper function which creates a default OverlapAnalysisContext from the associated DescriptorParameters builder. |
bld_bv | A builder instance for BvParameters in its default state. |
.length(..) | Updates the builder with the length parameter (see apidoc). |
.endianness(..) | Updates the builder with the endianness parameter (see apidoc). |
en_BIG_ENDIAN | Constant which can be passed to .endianness(..) (see apidoc). |
.stringFormat(..) | Updates the builder with the string format parameter (see apidoc). |
sf_STRICT_BINARY_STRING | Constant which can be passed to .stringFormat(..) (see apidoc). |
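The expected input format can be illustrated with a small parser for strict binary strings. This is a hypothetical sketch: the function parseStrictBinaryString, the rule that exactly length characters of '0' or '1' are accepted, and the mapping of string positions to bit indices for the two byte orders are assumptions made for illustration. Consult the BvParameters apidoc for the actual semantics of en_BIG_ENDIAN and sf_STRICT_BINARY_STRING.

```javascript
// Hypothetical parser for a strict binary string fingerprint such as "0110...".
// Assumptions (illustration only): exactly `length` characters, each '0' or '1';
// BIG_ENDIAN means the first character is the highest bit index (length - 1),
// LITTLE_ENDIAN means the first character is bit index 0.
function parseStrictBinaryString(s, length, endianness) {
  if (length % 32 !== 0) throw new Error("length must be a multiple of 32");
  if (s.length !== length) {
    throw new Error("expected " + length + " characters, got " + s.length);
  }
  var words = new Uint32Array(length / 32); // bits packed into 32-bit words
  for (var pos = 0; pos < length; pos++) {
    var ch = s.charAt(pos);
    if (ch !== "0" && ch !== "1") {
      throw new Error("invalid character '" + ch + "' at position " + pos);
    }
    if (ch === "1") {
      var bit = endianness === "BIG_ENDIAN" ? length - 1 - pos : pos;
      words[bit >>> 5] |= 1 << (bit & 31); // set bit (bit % 32) of word (bit / 32)
    }
  }
  return words;
}

// A 1024 character string with only its first character set.
var fpStr = "1" + "0".repeat(1023);
var big = parseStrictBinaryString(fpStr, 1024, "BIG_ENDIAN"); // sets bit 1023
var little = parseStrictBinaryString(fpStr, 1024, "LITTLE_ENDIAN"); // sets bit 0
```

The same string thus yields different bit vectors depending on the byte order, which is why the endianness has to match the convention of the external fingerprint source.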
Helper function ctx_from_descpb
The helper function is defined (as documented by the help of the command line tools) by the following JavaScript fragment:
ctx_from_descpb = function ctx_from_desc(d) {
return Packages.com.chemaxon.overlap.OverlapAnalysisContext.initial(d.build().getDescriptorGenerator());
}
Breakdown of the used parts of the Java API:
Script part | Apidoc link / description |
---|---|
Packages.com.chemaxon.overlap.OverlapAnalysisContext.initial | apidoc |
d | Expected to be a DescriptorParameters builder; its build() result provides the descriptor generator through getDescriptorGenerator(). |
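The evaluation order of a fragment such as ctx_from_descpb(bld_cfp.length(2048).bitsPerPattern(4).bondCount(2).rings(false)).standardizer(std_defaultaroma) can be traced with stub objects. In the sketch below, Packages, makeCfpBuilder, StubContext, the builder defaults and the string used for std_defaultaroma are hypothetical stand-ins for the Java objects; only the body of ctx_from_descpb is copied verbatim from the definition above.

```javascript
// Hypothetical stand-in for OverlapAnalysisContext: "setters" return a new instance.
function StubContext(generator, standardizer) {
  this.generator = generator;
  this.standardizerName = standardizer || "default";
}
StubContext.prototype.standardizer = function (std) {
  return new StubContext(this.generator, std);
};

// Hypothetical stand-in for the Java package tree exposed to the script.
var Packages = { com: { chemaxon: { overlap: { OverlapAnalysisContext: {
  initial: function (generator) { return new StubContext(generator); }
} } } } };

// Hypothetical mutable builder standing in for bld_cfp (defaults are made up).
function makeCfpBuilder() {
  var state = { length: 1024, bitsPerPattern: 1, bondCount: 7, rings: true };
  var builder = {
    length: function (n) { state.length = n; return builder; },
    bitsPerPattern: function (n) { state.bitsPerPattern = n; return builder; },
    bondCount: function (n) { state.bondCount = n; return builder; },
    rings: function (b) { state.rings = b; return builder; },
    build: function () {
      var params = Object.freeze(Object.assign({}, state)); // immutable parameters
      return { getDescriptorGenerator: function () { return { params: params }; } };
    }
  };
  return builder;
}

// Verbatim body of the helper documented above.
var ctx_from_descpb = function ctx_from_desc(d) {
  return Packages.com.chemaxon.overlap.OverlapAnalysisContext.initial(d.build().getDescriptorGenerator());
};

var bld_cfp = makeCfpBuilder();
var std_defaultaroma = "default aromatization"; // stub for the StandardizerWrapper
// 1. builder calls mutate bld_cfp; 2. ctx_from_descpb builds and wraps the generator;
// 3. .standardizer(..) returns a new context, leaving `base` unchanged.
var base = ctx_from_descpb(bld_cfp.length(2048).bitsPerPattern(4).bondCount(2).rings(false));
var result = base.standardizer(std_defaultaroma);
```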
Examples for diagnostics
During the evaluation of a scripting hook, the JavaScript command println(...) can be used for diagnostics. Most classes in the new descriptors API produce meaningful messages from their toString() methods. Some objects implement method toString(boolean multiline) or toMultilineString(), which return more readable String representations.
Note that the execution of most command line tools is verbose; they print the textual representation of the main settings actually used.
Print already initialized context
cat data/vitamins.smi | bin/buildStorage.sh -in - -out tmp.bin \
-context createSimpleCfp7Context \
-contextjs "println('Initialized context:'); println(ctx.toString(true)); ctx"
Note that the last statement of the script passed to option -contextjs must evaluate to the OverlapAnalysisContext to be used. In this example we just want to print the context to the console, but println() returns undefined, so the already initialized context reference (ctx) is used as the last statement.
Output:
com.chemaxon.overlap.cli.BuildStorage
args: [-in, -, -out, tmp.bin, -context, createSimpleCfp7Context, -contextjs, println('Initialized context:'); println(ctx.toString(true)); ctx]
Initialized context:
Overlap analysis context.
Pagesize: 50
Standardizer: ThreadLocalized wrapper over chemaxon.standardizer.Standardizer@3cf05ce2 (actions count: 1)
Generator: CFP generator, parameters: bond count: 7 (bits per pattern: 1, length: 1024)
Comparator: Comparator BINARY_TANIMOTO, vector size: 1024 bits
Extractor: Extract packed long [] fingerprint representation (16 longs, 1024 bits)
Unguarded calc: Tanimoto dissimilarity of binary fingerprints represented as packed long[]
Context
Overlap analysis context.
Pagesize: 50
Standardizer: ThreadLocalized wrapper over chemaxon.standardizer.Standardizer@3cf05ce2 (actions count: 1)
Generator: CFP generator, parameters: bond count: 7 (bits per pattern: 1, length: 1024)
Comparator: Comparator BINARY_TANIMOTO, vector size: 1024 bits
Extractor: Extract packed long [] fingerprint representation (16 longs, 1024 bits)
Unguarded calc: Tanimoto dissimilarity of binary fingerprints represented as packed long[]
Reading - time: 491 ms (30 x 16 ms each)
(Finished) Reading - time: 491 ms (30 x 16 ms each)
Error counts collected: Total: 30 OK: 30 Parse error: 0 Process error: 0
Index projector: Skiplist index projector initialMasterSkips: 0 maxClientIndex: 29 maxMasterIndex: 29 master index skiplist: []
Writing tmp.bin time: 12 ms (1 x 12 ms each) (1 of 30; 3 %)
(Finished) Writing tmp.bin time: 12 ms (30 x 400 us each) (30 of 30; 100 %)
All done.
Look up available metrics of a predefined context
A DescriptorComparator instance represents a metric. Such instances are usually created by factory methods of the associated DescriptorGenerator instance. The Java API documentation describes the available factory methods for the various descriptors. Script hooks can be used to look up the type of the DescriptorGenerator instances represented by the pre-defined contexts.
cat data/vitamins.smi | bin/buildStorage.sh -in - -out tmp.bin \
-context createSimpleCfp7Context \
-contextjs "println('Generator summary: '+ctx.getDescriptorGenerator()); println('Generator immediate type: '+ctx.getDescriptorGenerator().getClass())"
Output:
com.chemaxon.overlap.cli.BuildStorage
args: [-in, -, -out, tmp.bin, -context, createSimpleCfp7Context, -contextjs, println('Generator summary: '+ctx.getDescriptorGenerator()); println('Generator immediate type: '+ctx.getDescriptorGenerator().getClass())]
Generator summary: CFP generator, parameters: bond count: 7 (bits per pattern: 1, length: 1024)
Generator immediate type: class com.chemaxon.descriptors.fingerprints.cfp.CfpGeneratorImpl
Exception in thread "main" java.lang.IllegalArgumentException: Script returned null: println('Generator summary: '+ctx.getDescriptorGenerator()); println('Generator immediate type: '+ctx.getDescriptorGenerator().getClass())
at com.chemaxon.overlap.ContextJsTools.evalJs(ContextJsTools.java:186)
at com.chemaxon.overlap.ContextJsTools.initializeContext(ContextJsTools.java:220)
at com.chemaxon.overlap.cli.BuildStorage.main(BuildStorage.java:95)
This execution fails since the scripting hook did not return a valid OverlapAnalysisContext instance (ending the script with ctx, as in the previous example, avoids the exception). However, the output reveals the immediate type of the associated DescriptorGenerator: com.chemaxon.descriptors.fingerprints.cfp.CfpGeneratorImpl. We can look up its apidoc and identify the applicable DescriptorComparator factory methods: