D - Stored descriptor type
public final class PagedDescriptorStorage<D extends Descriptor> extends Object implements Updater<D>
Storage is organized into fixed-size pages. All pages are full except the last one, which can be partially filled. Descriptors within the pages are indexed sequentially.
Licensing: this class can be used with a valid LicenseGlobals.MADFAST license.
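The paging scheme described above can be illustrated with a minimal sketch. This is not the library's implementation: the class `PagedListSketch` and all of its members are hypothetical names, and the sketch only assumes what the description states (fixed-size pages, only the last page partially filled, sequential indexes).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of fixed-size paging; not the PagedDescriptorStorage implementation. */
final class PagedListSketch<E> {
    private final int pageSize;
    private final List<List<E>> pages = new ArrayList<>();
    private int size = 0;

    PagedListSketch(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
    }

    /** Appends an element and returns the sequential index assigned to it. */
    int add(E element) {
        if (size % pageSize == 0) {
            // All existing pages are full (or there are none yet): open a new page.
            pages.add(new ArrayList<>(pageSize));
        }
        pages.get(pages.size() - 1).add(element);
        return size++;
    }

    /** Resolves a sequential index into a page number and an offset within that page. */
    E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return pages.get(index / pageSize).get(index % pageSize);
    }

    int size() {
        return size;
    }

    int pageCount() {
        return pages.size();
    }
}
```

With a page size of 2, adding three elements yields indexes 0, 1 and 2 and two pages, the second of which is only half filled.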
Modifier and Type | Field and Description |
---|---|
static int | MAX_RESULT_QUEUE_SIZE: Max non reported queue elements. |
Constructor and Description |
---|
PagedDescriptorStorage(int pagesize, DescriptorGenerator<D> generator): Construct new empty descriptor storage. |
PagedDescriptorStorage(int pagesize, DescriptorGenerator<D> generator, InputStream is, SubProgressObserver po): Construct from a String serialized form. |
PagedDescriptorStorage(int pagesize, DescriptorGenerator<D> generator, ObjectInputStream ois, SubProgressObserver po): Construct from a byte [] serialized form. |
Modifier and Type | Method and Description |
---|---|
void | addAll(InputStream is, String opts, int skipCount, int maxProcessCount, StandardizerWrapper standardizer, SubProgressObserver po, ExecutorService e, MoleculeCallback moleculeCallback): Read all molecules from a structure file into the similarity subsystem. |
int | addDescriptor(D d): Add a single descriptor to the similarity subsystem. |
int | addMolecule(Molecule m): Add a single molecule to the similarity subsystem. |
<T extends Serializable> UnguardedPagedOverlap<T> | createBruteForceOverlap(UnguardedExtractor<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator): Create a brute force overlap calculator from the current state of the storage. |
static <D extends Descriptor,T extends Serializable> UnguardedPagedOverlap<T> | deserializeUnguarded(int pagesize, DescriptorGenerator<D> generator, UnguardedExtractor<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator, ObjectInputStream ois, Sink<Descriptor> onDescriptorRead, SubProgressObserver po): Deserialize an UnguardedPagedSimilarity from a binary serialized form. |
static <D extends Descriptor,T extends Serializable> UnguardedPagedOverlap<T> | deserializeUnguarded(int pagesize, DescriptorGenerator<D> generator, UnguardedExtractor<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator, ObjectInputStream ois, SubProgressObserver po): Deserialize an UnguardedPagedSimilarity from a binary serialized form. |
int | size(): Stored descriptor count. |
void | toBytes(ObjectOutputStream os, SubProgressObserver po): Deprecated. Use toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver, long) with a sound reset interval. |
void | toBytes(ObjectOutputStream os, SubProgressObserver po, long resetInterval): Dump descriptors to a binary file. |
void | toStrings(PrintStream ps, SubProgressObserver po): Write String representations to a PrintStream. |
void | toStrings(PrintStream ps, SubProgressObserver po, ExecutorService e): Write String representations to a PrintStream using concurrent conversions. |
public static final int MAX_RESULT_QUEUE_SIZE
This is the maximum number of enqueued Future references waiting for final storage or error reporting.
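How a cap on pending Future references can work is sketched below. This is only an assumption about the general technique, not the class's actual logic: `BoundedFutureQueueSketch`, `runBounded` and `maxQueueSize` are hypothetical names, and the sketch simply drains the oldest pending result before submitting more work once the cap is reached.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical illustration of bounding the number of unreported Future references. */
final class BoundedFutureQueueSketch {
    private BoundedFutureQueueSketch() {
    }

    /**
     * Runs the tasks on the executor while keeping at most maxQueueSize futures pending.
     * Results are collected in submission order; a failed task surfaces as ExecutionException.
     */
    static <R> List<R> runBounded(List<Callable<R>> tasks, ExecutorService executor, int maxQueueSize)
            throws InterruptedException, ExecutionException {
        if (maxQueueSize <= 0) {
            throw new IllegalArgumentException("maxQueueSize must be positive");
        }
        Queue<Future<R>> pending = new ArrayDeque<>();
        List<R> results = new ArrayList<>(tasks.size());
        for (Callable<R> task : tasks) {
            if (pending.size() >= maxQueueSize) {
                // Queue is full: wait for the oldest task and store/report its result first.
                results.add(pending.remove().get());
            }
            pending.add(executor.submit(task));
        }
        // Drain whatever is still pending after the last submission.
        while (!pending.isEmpty()) {
            results.add(pending.remove().get());
        }
        return results;
    }
}
```

Bounding the queue this way keeps memory use proportional to the cap rather than to the number of input records.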
public PagedDescriptorStorage(int pagesize, DescriptorGenerator<D> generator)
Construct new empty descriptor storage.
Note that to acquire the guard object reference, an empty molecule is generated in the constructor.
pagesize - Size of each page (molecules/descriptors)
generator - Represented descriptor generator
LicenseException - when appropriate license is not available
public PagedDescriptorStorage(int pagesize, DescriptorGenerator<D> generator, ObjectInputStream ois, SubProgressObserver po) throws IOException, ClassNotFoundException
Construct from a byte [] serialized form.
Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for serialization. Compatibility of generators is not checked; however, in some but not all cases incompatibility results in a RuntimeException thrown by the underlying DescriptorSerializer.fromByteArray(byte[]).
Note that to acquire the guard object reference, an empty molecule is generated in the constructor.
A compatible serialized form is generated by toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver). Note that the serialized form is not necessarily compatible between different versions (including the underlying Marvin/JChem)!
pagesize - Size of each page (molecules/descriptors)
generator - Represented descriptor generator
ois - ObjectInputStream to read the byte form of the descriptors from. Note that this stream is not closed upon finish or abort
po - ProgressObserver to track progress. Note that ProgressObserver.done() is invoked upon completion
IOException - re-thrown from passed ObjectInputStream
ClassNotFoundException - re-thrown from passed ObjectInputStream
IllegalArgumentException - upon error reading
CancellationException - upon cancellation from progress observer
LicenseException - when appropriate license is not available
public PagedDescriptorStorage(int pagesize, DescriptorGenerator<D> generator, InputStream is, SubProgressObserver po)
Construct from a String serialized form.
Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for String serialization. Compatibility of generators is not checked; however, in some but not all cases incompatibility results in a RuntimeException thrown by the underlying DescriptorSerializer.fromString(java.lang.String).
Note that to acquire the guard object reference, an empty molecule is generated in the constructor.
pagesize - Size of each page (molecules/descriptors)
generator - Represented descriptor generator
is - InputStream to read descriptors line by line
po - ProgressObserver to track progress. ProgressObserver.done() is invoked upon completion
IllegalArgumentException - upon error reading
CancellationException - upon cancellation from progress observer
LicenseException - when appropriate license is not available
public static <D extends Descriptor,T extends Serializable> UnguardedPagedOverlap<T> deserializeUnguarded(int pagesize, DescriptorGenerator<D> generator, UnguardedExtractor<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator, ObjectInputStream ois, SubProgressObserver po) throws IOException, ClassNotFoundException
Deserialize an UnguardedPagedSimilarity from a binary serialized form.
Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for serialization. Compatibility of generators is not checked; however, in some but not all cases incompatibility results in a RuntimeException thrown by the underlying DescriptorSerializer.fromByteArray(byte[]).
A compatible serialized form is generated by toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver). Note that the serialized form is not necessarily compatible between different versions (including the underlying Marvin/JChem)!
D - Generated descriptor type
T - Unguarded form of the descriptors
pagesize - Size of each page
generator - Generator to be used for deserialization
extractor - Function to extract unguarded descriptor content for storage
comparator - Unguarded comparator to be represented by the constructed instance
ois - ObjectInputStream to read the byte form of the descriptors from. Note that this stream is not closed upon finish or abort
po - ProgressObserver to track progress. Note that ProgressObserver.done() is invoked upon completion
IOException - re-thrown from passed ObjectInputStream
ClassNotFoundException - re-thrown from passed ObjectInputStream
IllegalArgumentException - upon error reading
CancellationException - upon cancellation from progress observer
public static <D extends Descriptor,T extends Serializable> UnguardedPagedOverlap<T> deserializeUnguarded(int pagesize, DescriptorGenerator<D> generator, UnguardedExtractor<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator, ObjectInputStream ois, Sink<Descriptor> onDescriptorRead, SubProgressObserver po) throws IOException, ClassNotFoundException
Deserialize an UnguardedPagedSimilarity from a binary serialized form.
Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for serialization. Compatibility of generators is not checked; however, in some but not all cases incompatibility results in a RuntimeException thrown by the underlying DescriptorSerializer.fromByteArray(byte[]).
A compatible serialized form is generated by toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver). Note that the serialized form is not necessarily compatible between different versions (including the underlying Marvin/JChem)!
D - Generated descriptor type
T - Unguarded form of the descriptors
pagesize - Size of each page
generator - Generator to be used for deserialization
extractor - Function to extract unguarded descriptor content for storage
comparator - Unguarded comparator to be represented by the constructed instance
ois - ObjectInputStream to read the byte form of the descriptors from. Note that this stream is not closed upon finish or abort
onDescriptorRead - Callback to invoke on each deserialized descriptor or null
po - ProgressObserver to track progress. Note that ProgressObserver.done() is invoked upon completion
IOException - re-thrown from passed ObjectInputStream
ClassNotFoundException - re-thrown from passed ObjectInputStream
IllegalArgumentException - upon error reading
CancellationException - upon cancellation from progress observer
public void toStrings(PrintStream ps, SubProgressObserver po) throws CancellationException
Write String representations to a PrintStream.
Any error from the underlying DescriptorSerializer.toString(com.chemaxon.descriptors.common.Descriptor)
will propagate from this method and the execution will be aborted.
ps - PrintStream to write to. Note that ps will not be closed upon finish.
po - Observer to follow progress. Observer is switched to determinate state with each descriptor representing a work unit. Done will be reported upon completion/cancellation.
CancellationException - upon cancellation
@Deprecated public void toBytes(ObjectOutputStream os, SubProgressObserver po) throws IOException
Deprecated. Use toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver, long) with a sound reset interval.
os - Object output stream to write. Stream is not closed upon completion.
po - ProgressObserver to track progress. Observer is closed by invoking ProgressObserver.done() upon completion, failure or cancellation
CancellationException - when cancelled through the given observer
IOException - thrown from passed ObjectOutputStream
public void toBytes(ObjectOutputStream os, SubProgressObserver po, long resetInterval) throws IOException
Dump descriptors to a binary file.
Warning! This method usually resets the given ObjectOutputStream by periodically calling its ObjectOutputStream.reset() method.
This method differs from serialization: only the descriptors are written, the associated descriptor generator is not. Also, page size is not retained, so it is possible to read descriptors back to different page sizes.
It is important that the underlying DescriptorGenerator instance is reconstructed upon deserialization. This method currently does not write descriptor generator related information, but this behavior might change in the future.
Export format in the current version:
ObjectOutputStream.writeInt(int) is invoked with the total descriptor count as the parameter.
ObjectOutputStream.writeUnshared(java.lang.Object) is invoked for each descriptor; the byte [] representation of each descriptor is passed as the parameter (created by DescriptorSerializer.toByteArray(com.chemaxon.descriptors.common.Descriptor)).
ObjectOutputStream.reset() is invoked to avoid memory leak in serialization.
os - Object output stream to write. Stream is not closed upon completion.
po - ProgressObserver to track progress. Observer is closed by invoking ProgressObserver.done() upon completion, failure or cancellation
resetInterval - Reset stream by invoking ObjectOutputStream.reset() periodically after the given number of descriptors has been written. Value must be greater than zero. todo: consider optimal value for resetInterval.
CancellationException - when cancelled through the given observer
IOException - thrown from passed ObjectOutputStream
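The export format listed for toBytes can be mirrored with plain java.io streams. The following is a minimal sketch, not the library's code: `ExportFormatSketch`, `write` and `read` are hypothetical names, the byte arrays stand in for the output of DescriptorSerializer.toByteArray, and the reader side is an assumption derived from the documented write sequence.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the documented export format (count, then unshared byte arrays). */
final class ExportFormatSketch {
    private ExportFormatSketch() {
    }

    /** Writes the record count, then each record unshared, resetting the stream periodically. */
    static void write(ObjectOutputStream os, List<byte[]> records, long resetInterval) throws IOException {
        if (resetInterval <= 0) {
            throw new IllegalArgumentException("resetInterval must be greater than zero");
        }
        os.writeInt(records.size());
        long written = 0;
        for (byte[] record : records) {
            os.writeUnshared(record);
            if (++written % resetInterval == 0) {
                // Drop the stream's internal state so it does not grow with the record count.
                os.reset();
            }
        }
        os.flush(); // the stream is intentionally left open for the caller to close
    }

    /** Reads the records back; assumes the layout produced by write(...) above. */
    static List<byte[]> read(ObjectInputStream ois) throws IOException, ClassNotFoundException {
        int count = ois.readInt();
        List<byte[]> records = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            records.add((byte[]) ois.readUnshared());
        }
        return records;
    }
}
```

Because only the count and the raw byte arrays are stored, the reader is free to choose a different page size, which matches the note that page size is not retained.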
public void toStrings(PrintStream ps, SubProgressObserver po, ExecutorService e) throws CancellationException
Write String representations to a PrintStream using concurrent conversions.
Callback (po) and stream access is made on the calling thread. This method blocks until completion or abortion due to an underlying exception.
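The threading contract stated above (conversions run concurrently, while the stream and the progress callback are only touched by the calling thread) can be sketched with standard executors. This is an illustration under assumptions, not the method's implementation: `ConcurrentToStringsSketch`, `writeAll`, `converter` and `onWorked` are hypothetical names, and the sketch converts everything before writing, which the real method may not do.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.IntConsumer;

/** Hypothetical illustration: concurrent conversion, single-threaded output and progress reporting. */
final class ConcurrentToStringsSketch {
    private ConcurrentToStringsSketch() {
    }

    static <T> void writeAll(List<T> items, Function<T, String> converter, PrintStream ps,
            ExecutorService executor, IntConsumer onWorked)
            throws InterruptedException, ExecutionException {
        List<Callable<String>> jobs = new ArrayList<>(items.size());
        for (T item : items) {
            // Only the conversion itself is handed to the executor.
            jobs.add(() -> converter.apply(item));
        }
        int done = 0;
        // invokeAll blocks until every job has finished and returns futures in input order.
        for (Future<String> converted : executor.invokeAll(jobs)) {
            // Writing and progress reporting happen on the calling thread only.
            ps.println(converted.get()); // a conversion failure surfaces here as ExecutionException
            onWorked.accept(++done);
        }
    }
}
```

Keeping the stream and the observer on the calling thread means neither has to be thread safe; only the conversion function is invoked from worker threads.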
ps - PrintStream to write to. Note that ps will not be closed upon finish.
po - Observer to follow progress. Observer is switched to determinate state with each descriptor representing a work unit. Done will be reported upon completion/cancellation.
e - Executor service to use for string serialization
CancellationException - upon cancellation
public int size()
Stored descriptor count.
public void addAll(InputStream is, String opts, int skipCount, int maxProcessCount, StandardizerWrapper standardizer, SubProgressObserver po, ExecutorService e, MoleculeCallback moleculeCallback)
Description copied from interface: Updater
Read all molecules from a structure file into the similarity subsystem.
Consecutive members of a structure file are assigned consecutive indexes. Usually the first molecule in the file is assigned index 0. To allow segmented reading, this method can be called multiple times to append additional structures.
Consistency considerations: the storage is left in a consistent state in case of the following abnormal or unexpected terminations:
SubProgressObserver
Notes on multithreading:
Specified by: addAll in interface Updater<D extends Descriptor>
is - Input stream to read from. Note that the stream is not closed when returning.
opts - Input options or null to pass to the underlying MFileFormatUtil.createRecordReader(java.io.InputStream, java.lang.String)
skipCount - Skip the given number of structures. Skipped structures are also reported to the given progress observer like ordinary processed structures; however, they won't generate calls into the supplied MoleculeCallback.
maxProcessCount - Read at most the given number of structures. Counting starts after the skipped structures.
standardizer - Standardizer to apply on molecules. See StandardizerWrappers for utility methods. Note that the supplied wrapper must be thread safe.
po - ProgressObserver to track file read. The total reported work units correspond to the count of read and processed/skipped molecules. The given observer is closed upon returning
e - ExecutorService to run descriptor generation for pages
moleculeCallback - Callback to report back assigned indexes/processing errors.
public int addMolecule(Molecule m)
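Description copied from interface: Updater
Add a single molecule to the similarity subsystem.
Note that the given molecule must be standardized before calling this method.
Specified by: addMolecule in interface Updater<D extends Descriptor>
m - Molecule to be added
public int addDescriptor(D d)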
Description copied from interface: Updater
Add a single descriptor to the similarity subsystem.
Note that descriptors have a compatibility related API contract (currently, references returned by Descriptor#getDescriptorGenerator() must be equal for compatible descriptors), which must be satisfied by the passed descriptor.
Specified by: addDescriptor in interface Updater<D extends Descriptor>
d - Descriptor to be added
public <T extends Serializable> UnguardedPagedOverlap<T> createBruteForceOverlap(UnguardedExtractor<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator)
Create a brute force overlap calculator from the current state of the storage.
The supplied function is applied to all represented descriptors and the resulting bare forms are stored in the returned instance.
T - Type of unguarded form
extractor - Unguarded form extractor function to use
comparator - Unguarded dissimilarity calculator to use on extracted unguarded form