List of changes - MadFast Similarity search command line distribution
Please note that serialized forms (binary files created and handled by the tools) are not compatible across different versions.
Version 0.3.5 (2019-10-01)
Summary of changes
This release contains patches for the following bugs:
-
Asynchronous similarity searches over very small target sets failed: when the target count was smaller than the requested hit count, asynchronous similarity search resulted in an exception. This bug made the real time similarity search functionality of the Web UI unusable over very small datasets.
-
Web UI molecules display page showed additional properties regardless of its display settings.
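The first bug above is the familiar "requested more hits than there are targets" case. The following is a minimal sketch of the general guard, not the MadFast implementation; the function name `most_similar` and the list-of-floats input are assumptions made for illustration.

```python
import heapq

def most_similar(dissimilarities, hit_count):
    """Return up to hit_count (target_index, dissimilarity) pairs, most similar first.

    Hypothetical helper: the input is a plain list of dissimilarity values,
    one per target, which is an assumption of this sketch.
    """
    if hit_count < 0:
        raise ValueError("hit_count must be non-negative")
    # Clamp the requested hit count to the number of available targets,
    # so a very small target set yields a shorter hit list instead of an error.
    k = min(hit_count, len(dissimilarities))
    return heapq.nsmallest(k, enumerate(dissimilarities), key=lambda pair: pair[1])

# Three targets, ten hits requested: all three targets are returned.
print(most_similar([0.42, 0.10, 0.77], hit_count=10))
```

Clamping at the start keeps every later step free of special cases for small sets.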
Version 0.3.4 (2019-09-23)
Summary of changes
- Components from ChemAxon backend APIs are updated to version 19.20.0-10485.
- MarvinJS component is updated to version 19.19.0
- Embedded web/REST server upgraded to Jetty version 9.4.15.v20190215
- Experimental Web UI client library and extension point added
- Web UI usability improvements
- REST API raw file handling, asynchronous server loading and asynchronous search calls.
- AdoptOpenJDK support, see Getting started guide for details.
Web UI related changes
-
Experimental development: Web UI JS client library. See Using MadFast Web UI JS library for details.
-
Experimental development: server loading progress following through Web UI. This change is listed in Web UI, REST API and Command line tools categories. For details see document Asynchronous server loading.
-
Web UI dynamic layout improvements, changes
- When a component palette item is clicked the new component will be appended to the first available container.
- New columns and rows can be added to the layout.
- Empty columns can be removed.
- Empty columns display a visual placeholder.
- Column height is equalized during layout changes (when moving UI components or placing component palette items)
- Library jQuery UI Touch Punch is included, allowing the reordering of UI components on some touch-enabled displays. This touch support is an experimental workaround which will need later rework.
-
Crossfilter updated from version 1.4.0 to 1.4.6 (see https://github.com/crossfilter/crossfilter/releases), which supports array valued dimensions (see https://github.com/crossfilter/crossfilter/wiki/API-Reference#wiki-dimension_with_arrays)
-
Crossfilter based word cloud display component improvements
- Array valued dimension support
- Optionally sort items by filtered count
- Optionally hide items having 0 filtered count
- Optionally fixed component height
- Optional additional content rendering
- Optional custom label formatting
- Clear filter by clicking on word cloud area (similar to crossfilter based histogram components)
- Filtered state indicated by component background
-
Crossfilter based histogram component improvements
- Array valued dimension support
- Filtered state indicated by component background
-
Crossfilter based scatter plot component improvements
- Array valued dimension support.
- Dimension selection dialog improved.
- Info dialog added.
-
Statistics display page improvements
- Add description for handled dimensions.
- Improve info dialog.
-
Molecules display page improvements
- Additional properties displayed on molecule table (see Store additional data)
- Molecule display defaults to dearomatized and dehydrogenized view
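The array valued dimension support listed above for the Crossfilter based components means that one record can contribute to several groups at once. The idea can be sketched language-independently as follows; this is an illustration of the counting rule only, not Crossfilter code, and the record layout (`tags` key) is hypothetical.

```python
from collections import Counter

def group_counts(records, key):
    """Count records per group; a list-valued field counts once per element."""
    counts = Counter()
    for record in records:
        value = record[key]
        # Scalar values are treated as one-element arrays.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            counts[v] += 1
    return counts

# Hypothetical records: the second one belongs to two groups.
records = [
    {"id": "m1", "tags": ["kinase"]},
    {"id": "m2", "tags": ["kinase", "antibiotic"]},
    {"id": "m3", "tags": []},
]
print(group_counts(records, "tags"))
```

A record with an empty array contributes to no group, which is why a word cloud built on such a dimension can show fewer items than there are records.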
Improvements - REST API
-
Server feature flags introduced: command line tool gui.sh supports option -disable <FEATURE> as a basic access control functionality. Features implemented: RAWFILES_LIST, RAWFILES_MODIFY, ASYNC_CALL_SEQUENTIAL_ID_GENERATION, ASYNC_CALL_ID_LIST. See command line help with gui.sh -h and document REST API security considerations for details.
-
Experimental raw file handling: Arbitrary small files can be kept in the server memory and served with chosen content type. Files can be specified with command line options and can be manipulated through the REST API. See documentation with examples at Raw file handling.
-
Experimental development: server loading progress following through REST API. This change is listed in Web UI, REST API and Command line tools categories. For details see document Asynchronous server loading.
-
Experimental development: Asynchronous search tasks.
-
Non-compatible change: loadtime from StatisticsDto returned by REST API endpoint statistics is removed. The load time data is available in loadingSuperTask.runningDurationMs.
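Clients that read the removed `loadtime` field need to switch to `loadingSuperTask.runningDurationMs`. Below is a minimal sketch of a reader that tolerates both shapes. It assumes the statistics response is a JSON object in which `loadingSuperTask` is a nested object; the exact response layout is not specified here, and the unit of the legacy `loadtime` value is not stated in this document.

```python
def load_time_ms(statistics):
    """Return the server load time from an already parsed statistics response.

    Assumption: `statistics` is a dict decoded from JSON, with the new value
    nested as statistics["loadingSuperTask"]["runningDurationMs"].
    """
    task = statistics.get("loadingSuperTask") or {}
    if "runningDurationMs" in task:
        return task["runningDurationMs"]
    # Fallback for servers older than 0.3.4; the unit of this legacy field
    # is not documented in this changelog, so it is returned unchanged.
    return statistics.get("loadtime")

print(load_time_ms({"loadingSuperTask": {"runningDurationMs": 1234}}))
```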
Improvements - Command line tools
-
Rings parameter added to CFP fingerprint parameters JAVA API. Calculating bits associated to rings (up to a default ring size limit) can be disabled. See the JAVA API documentation of the CfParameters and CfpParameters.Builder classes and Basic overview of the concepts of overlap analysis context for details.
-
Request logging to stdout / stderr is possible from embedded REST server gui.sh. Use command line option -log - / -log -2. See command line help gui.sh -h for details.
-
Option -earlyStart added to embedded server gui.sh. When used, the web server and REST API start listening before resource loading is finished. Progress of resource loading can be tracked on the Web UI and through the REST API. For details see document Asynchronous server loading.
-
Launcher scripts accept special command line arguments -launcherverbose (to print verbose/debug info from the script) and -classpath <SPEC> (to specify additional components of the classpath of the launched application). For details see document Command line interfaces (CLIs).
-
Launcher scripts check if variable JAVA_HOME is set and use java from <JAVA_HOME>/bin/java.
-
Tools createMms.sh and buildStorage.sh: gzipped input file (specified by -in <LOCATION>) recognition fixed.
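Recognizing gzipped input, as in the last item above, is commonly done by inspecting the first two bytes of the file rather than trusting the file extension. The sketch below shows that general technique; it is not taken from the createMms.sh or buildStorage.sh sources, and the helper name `open_maybe_gzipped` is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

def open_maybe_gzipped(path):
    """Open a text file for reading, transparently decompressing gzip input."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")
```

Usage is the same for both cases, for example `with open_maybe_gzipped("molecules.smi.gz") as f: ...`, where the file name is only an example.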
Improvements - documentation
-
Download location of the latest version of dataset emolecules-plus is updated in example script download-molecules.sh and in document Prepare molecules.
-
Glossary updated with Crossfilter related definitions and asynchronous REST API related terms.
-
Documents added:
Improvements - examples
-
Example scripts collect CPU count during execution which is stored in execution statistics files written and exposed on the visualization pages. See Profiling and execution statistics and Self contained example for details.
-
Fix PubChem random subset creation bug in example script download-molecules.sh (script stopped after creating the 1k subset). See document Prepare molecules for details.
-
Examples for property space analysis of nci-250k and chembl sets are added to script examples/overlap-example.sh. For details see Store additional data.
-
Example scripts rest-api-example.sh and rest-api-medium.sh accept option -e and pass it to the embedded server as -earlyStart. See Asynchronous server loading for details.
-
Example script rest-api-example.sh checks if outputs of preprocessing steps are already calculated in a previous execution. For details see document Asynchronous server loading and Self contained examples.
Version 0.3.3 (2018-07-09)
Summary of changes
-
Utilities and documentation for performance optimization added.
-
Components from ChemAxon backend APIs are updated to version 18.16.0
New functionality, improvements - REST API
-
REST API endpoint statistics is extended with
- Server uptime and version info returned by the default statistics response
- statistics/profiling-snapshot to create a detailed VM memory/garbage collector state description
- Experimental endpoint get-total-sizeinfo to estimate memory consumption of all exposed resources
-
Experimental get-sizeinfo methods are added to the following REST API endpoints to estimate individual resource memory consumption. Note that execution on large sets can be very long currently.
New functionality, improvements, changes - Web front-end
-
Experimental server memory and garbage collection statistics are available from the landing page.
-
Server version is displayed on the landing page.
-
Server uptime is displayed on the landing page when hovering with the mouse on the exposed molecule/descriptor count text.
Improvements - documentation
-
Document Server memory optimization added.
-
Document REST API security considerations added.
-
Download location of dataset emolecules-plus is updated in example script download-molecules.sh and in document Prepare molecules.
-
Links to non-document files (molecule files, script sources, directories of the distribution) are removed in the following documents:
Version 0.3.2 (2018-06-29)
Summary of changes
-
Components from ChemAxon backend APIs are updated to version 18.15.0. Please note that due to a known issue the embedded server gui.sh prints WARNING level log messages (WARNING: Ignoring unknown format: png) when a molecule image conversion is requested.
-
k-NN analysis visualization now can include stored properties. See Store additional data for details and use cases.
-
Self contained example script rest-api-example.sh contains example of additional stored properties calculated from chemical terms.
New functionality, improvements, changes - Web front-end
-
The k-NN analysis visualization page downloads only the data required for displaying dimensions. Prior to 0.3.2 visualization page downloaded all the stored neighbor (dissimilarity and index) information for analysis.
-
Direct dimension selection icon added to crossfilter based histograms (k-NN analysis visualization page, statistics page) title bar. Dimension selection dialog (of the k-NN page) is more structured / descriptive than the dimension listing previously available from the histogram context menu. When no custom dimension selection dialog is specified (as currently at the statistics page) this dialog is still available and replicates the listing in the context menu.
-
The k-NN analysis visualization page uses textual ("Most similar"/"2nd most similar"/...) neighbor names on histogram dimensions.
-
Dimension selection list of k-NN analysis histograms removed from their dropdown menu: with the possibility to use query and target molecule property space the list would be excessively long. For these histograms the dimension selection dialog available from the title icon can be used.
-
The k-NN analysis visualization page provides a molecule display setting icon (cogwheel icon on the left taskbar) allowing hydrogen and aromaticity display settings for the page.
-
Checkbox and radio buttons behavior in the dropdown menus fixed: mouse click is registered on their label (text part, not only on the checkbox/radio button icon). Dropdown menu is not closed when a click on the checkbox or radio button part is registered.
-
Download on the k-NN analysis visualization page changed: more attributes (query and target IDs) are written to the output; tab and newline characters in fields are changed to spaces. Note that further changes are expected in this download functionality.
-
Info message shown when a dimension change on a crossfilter component (word cloud or histogram on statistics or k-NN results visualization pages) removes a previously set filter.
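The download change above (tab and newline characters in fields changed to spaces) is what keeps a tab separated export at one record per line. A minimal sketch of that sanitization, assuming a tab separated output format; the function names are hypothetical and carriage returns are handled as well, which goes slightly beyond what the note states.

```python
def sanitize_field(value):
    """Replace characters that would break a tab separated line with spaces."""
    text = str(value)
    for ch in ("\t", "\n", "\r"):
        text = text.replace(ch, " ")
    return text

def to_tsv_line(fields):
    """Join already sanitized fields into one tab separated record."""
    return "\t".join(sanitize_field(f) for f in fields)

print(to_tsv_line(["query-1", "target\t7", "two\nlines", 0.25]))
```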
New functionality, improvements - REST API
-
REST API endpoint knn-results is extended with
- knn-results/{res}/queryindices to retrieve query master indices.
- knn-results/{res}/neighborcounts to retrieve stored neighbor counts for every query.
- knn-results/{res}/neighbors/{k}/indices to retrieve stored neighbor indices for every query.
- knn-results/{res}/neighbors/{k}/dissimilarities to retrieve stored neighbor dissimilarities for every query.
- knn-results/{res}/neighbors/{k}/props/{propname} to retrieve additional properties for neighbors for every query.
- knn-results/{res}/neighbor-png-or-placeholder to retrieve an image of a single neighbor of a single query.
- knn-results/{res}/query-png-or-placeholder to retrieve an image of a single query.
- knn-results/{res}/table-labels to retrieve labels (IDs) typically displayed in a k-NN table visualization.
-
DTO KnnInfo extended with MoleculeSetInfo of query/target sets.
-
DTO MoleculePropRange extended with missingvalue, presentcount and missingcount.
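The knn-results paths listed above follow a regular template, so client code can assemble them from segments. The sketch below only builds URLs; the base URL (`http://localhost:8081/rest`), the resource name `my-knn` and the helper name are assumptions for illustration, and whether `{k}` counts from 0 or 1 is not stated in this document.

```python
from urllib.parse import quote

def knn_url(base, res, *segments):
    """Build a knn-results URL such as knn-results/{res}/neighbors/{k}/indices."""
    parts = ["knn-results", res, *segments]
    # Percent-encode each path segment separately so names with special
    # characters cannot change the path structure.
    return base.rstrip("/") + "/" + "/".join(quote(str(p), safe="") for p in parts)

base = "http://localhost:8081/rest"  # assumed deployment location
print(knn_url(base, "my-knn", "queryindices"))
print(knn_url(base, "my-knn", "neighbors", 1, "dissimilarities"))
print(knn_url(base, "my-knn", "neighbors", 1, "props", "logP"))
```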
Improvements - documentation
-
Minor clarifications in the Getting started guide and Installing dependencies on Windows version 10.
-
Fixed separator bar (between UI component icons) highlight on hover / mouse pointer.
-
Documents Store additional data and Introduction to overlap analysis are updated.
Improvements - Command line tools
- Bug fixed in command line argument parsing of calculateOverlap.sh tool: commas in additional property declarations (typically occurring in Chemical Terms expressions) caused an exception.
Version 0.3.1 (2018-06-13)
Summary of changes
-
Home link leading to the index page (upper left corner of Web UI pages) changed to relative (index.html) omitting leading /: for proxied deployments behind arbitrary URL patterns this fixes navigation to the index page.
-
Some dialogs (page/component descriptions) of the Web UI were failing. The underlying problem with markdown formatted dialog contents fixed.
-
Web UI index page displays stored molecule / descriptor counts. On hovering the server load time is shown.
-
Fixed flickering of placeholder for molecule images in the real time search and knn visualization pages under Internet Explorer.
-
Fixed X axis tick format of histograms on execution statistics visualization page: in case of time dimensions the tick labels were displayed in milliseconds.
-
Expose resource initialization time in the following DTOs: DescriptorInfo, KnnInfo, MoleculeSetInfo, ResourceClassInfoDto.
-
Fix Java package of DTO classes KnnData and KnnInfo.
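The home link change at the top of this release's list matters because a link with a leading / is resolved against the host root, while a relative link is resolved against the current page. Standard URL resolution shows the difference; the proxy prefix `/tools/madfast/` below is a made-up example deployment.

```python
from urllib.parse import urljoin

# Hypothetical page served behind a reverse proxy prefix.
page = "https://example.org/tools/madfast/simsearch.html"

relative_home = urljoin(page, "index.html")   # stays under the proxy prefix
absolute_home = urljoin(page, "/index.html")  # escapes to the host root

print(relative_home)
print(absolute_home)
```

With the relative form the link keeps working regardless of the URL pattern the server is proxied under.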
Version 0.3.0 (2018-06-07)
Summary of changes
- Complete revamp of the WebUI
- Overlap analysis calculation and interactive visualization is added. See Introduction to overlap analysis for details.
- MarvinJS component is updated to version 18.5.0
- Components from ChemAxon backend APIs are updated to version 18.10.0
- Additional properties can be attached to molecules and exposed on the REST API. Please note that this feature is under construction. For more details see Store additional data
- Command line tools and self contained examples are prepared to run on Mac OS X. See Getting started guide.
Improvements - documentation
- Document Introduction to overlap analysis added.
- Document Store additional data added.
- Document Installing dependencies on Windows version 10 added.
- Mac OS X installation details added to the Getting started guide.
- Release dates added to this document.
New functionality, improvements - Examples
- Download locations of datasets GDB-13 and emolecules-plus are updated in example script download-molecules.sh and in document Prepare molecules.
- Download script for SureChEMBL uses FTP directory listing instead of the README file to download segments in example script download-molecules.sh and in document Prepare molecules.
- Download script and description for ChEBI dataset added to document Prepare molecules.
- Overlap analysis was added to script rest-api-example.sh.
- Example script overlap-example.sh added.
- SDF version of the PubChem random 1k dataset is added in file data/molecules/pubchem-compound/pubchem-compound-rnd-1k.sdf.gz. Contents of this file are processed by the rest-api-example.sh example script by importing various properties.
- Public dataset ChEBI (Chemical Entities of Biological Interest) is added to Prepare example molecule sets and script download-molecules.sh.
- Scripts (command lines and self contained) are fixed on Mac OS X using command greadlink instead of readlink. On Mac OS X, to install greadlink invoke brew install coreutils. For details see https://brew.sh/.
- Example scripts determine CPU model name and total memory. Profiling and execution statistics uses these values along with the value of thread pool size (parameter -tp <THREADPOOL>).
- Fix in retention of overlap-benchmark.* system properties in data/sanitize-prof.js.
- Update allocated memory for self contained example scripts rest-api-XXX.sh due to the size increase of the used public molecule sets.
New functionality, improvements - Command line tools
- Tool calculateOverlap.sh added for similarity based overlap analysis calculations. See Introduction to overlap analysis for details.
- File inputs recognize gzipped files.
- Error when key or value contained space in -D<propkey>=<provalue> style system property declaration fixed.
- Embedded server (gui.sh) improvements - invoke gui.sh -h for detailed help
- Parameter -page <URL> added to use when opening a browser
- Parameter -in is multi arity.
- Network interface addresses are printed to the console on startup.
- Increase request log details - see Rest API example and option gui.sh -log <FILE_PATTERN>
- Option -additionalresourcedir <DIR> added to embedded server gui. When specified contents of the referenced directory is exposed under path /additional/ by the server. This option can be used to specify a valid Marvin JS license to the Web UI. For details see the Getting started guide, REST API / Web UI for similarity searches section Advanced server configuration: Additional static content and Self contained examples documents.
- Error when installation directory contained space fixed.
New functionality, improvements - REST API
- k-NN analysis results are exposed on API endpoint knn-results
- Initial limited support for storing and retrieving additional properties on molecules - See Store additional data and REST API endpoint molecules
- molecule set info (molecules/{set}) extended with property names and property descriptions
- molecule sets info (molecules) endpoint - molecule set info objects exposed
- molecule info (molecules/{set}/{index}) endpoint also extended with properties
- molecules/{set}/{index}/props/{props} endpoint added
- molecules/{set}/get-multiple-props endpoint added
- molecules/{set}/get-multiple-ids endpoint added
- molecules/{set}/props/{propname}/get-properties-on-index-range endpoint (GET/POST) added
- REST API endpoint meta added with metadata on available resources
- REST API endpoint statistics added to serve basic server statistics.
- REST API endpoint molconverter/convert (application/json request body encoding version) extended with molprops and pseudos parameters (see ConversionRequest data type).
New functionality, improvements, changes - Web front-end
- Web UI revamp:
- Unify UI look across screens
- Make UI components removable
- Richer UI interaction feedback
- Marvin JS component also moveable
- Web UI codes are built and packaged using WebPack. Note that the Web UI uses packaged and minified JavaScript codes which are not suitable for direct modifications.
- Web UI is expected to be compatible with Internet Explorer 11, Safari.
- Real time search page revamp:
- Multiple pick lists supported
- Pick list interactions (remove/reorder) changed: cells can be dragged over other droppables (sketcher, other picklist). Dragging the cells by the reorder handle allows reordering. Clicking on cell remove button removes cell. Cell layout and information content depends on cell size.
- Pick list and hits display cells are resizable
- Hits display cell count can be set by resizing cells or component by the resize handles.
- Dissimilarity distribution chart is resizable.
- Dissimilarity distribution chart caches distribution; zebra mode changed; bin size can be changed.
- Molecules display page revamp:
- Hydrogenize, aromatize display options added.
- Molecule details dialog shows additional properties.
- Statistics results page revamp:
- Unify look and feel; usable on smaller screens.
- Fewer initial components.
- Histograms instead of zebra chart
- Components can be added and changed
- Scatter plot redesigned; non-numeric axes supported
- Table columns can be added, removed and reordered
- k-NN analysis visualization page added.
- Upper left icon (product logo) on the UI pages takes to the index page; it is changed to a proper HTML link which
- Vertical scrollbar on real time search page is always shown to avoid component layout changes when scrollbar is needed.
- Shepherd tour based page introductions are replaced by page and UI component specific help/info dialogs.
Version 0.2.3 (2017-01-11)
New functionality, improvements - Command line tools
- Improvements of tool searchStorage in output and visualization. For examples and details see Basic search workflow and Details on SearchStorage.
-
Option -out-matrix-as-list added to create a list style textual output for FULLMATRIX search mode instead of the default matrix style, similar to the output of MOSTSIMILARS mode. When this option is used for FULLMATRIX search mode the optional dissimilarity threshold (specified by option -maxdissim <VALUE>) is also considered: query-target pairs having dissimilarity exceeding the threshold won't be printed.
-
Option -out-numeric-format <FORMAT> added to specify numeric formatting (precision, etc) of dissimilarity results in textual output.
-
Option -heatmap-image <FILE> with further options -heatmap-image-.... added to render simple heatmap visualizations of search results.
-
Textual output of dissimilarity results can be disabled by passing an empty String ("") to option -out <FILE>.
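The relationship between the matrix style and the list style output with a dissimilarity threshold, as described for -out-matrix-as-list and -maxdissim above, can be sketched as follows. This illustrates the filtering rule only; the tuple layout and function name are assumptions and do not reproduce the actual output format of searchStorage.

```python
def matrix_as_list(query_ids, target_ids, matrix, max_dissim=None):
    """Flatten a full query x target dissimilarity matrix into (query, target, value) rows.

    Pairs whose dissimilarity exceeds max_dissim are skipped; with
    max_dissim=None every pair is kept.
    """
    rows = []
    for query_id, row in zip(query_ids, matrix):
        for target_id, dissim in zip(target_ids, row):
            if max_dissim is not None and dissim > max_dissim:
                continue
            rows.append((query_id, target_id, dissim))
    return rows

matrix = [[0.1, 0.9],
          [0.5, 0.3]]
for row in matrix_as_list(["q1", "q2"], ["t1", "t2"], matrix, max_dissim=0.5):
    print(*row, sep="\t")
```

For sparse hit sets the list form is much shorter than the full matrix, which is the practical reason to combine the two options.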
Improvements - documentation
-
Details on SearchStorage is improved with examples on the available search modes and the heatmap image generation.
-
Links to original images added.
New functionality, improvements - Examples
-
Example molecule set antibiotics added to file data/molecules/antibiotics/antibiotics.smi. For details see Prepare example molecule sets and file data/README_data.html.
-
Example molecule set who-essential-medicines added to file data/molecules/who-essential-medicines/who-essential-medicines.smi. For details see Prepare example molecule sets and file data/README_data.html.
Version 0.2.2 (2017-01-02)
Summary of changes
- MarvinJS component is updated to version 16.12.12
- Components from ChemAxon backend APIs are updated to version 16.12.26.0
Bugfixes
- License checking bug fixed: license Overlap was required for certain functionalities instead of the expected MADFAST license.
Version 0.2.1 (2016-12-12)
New functionality, improvements - Command line tools
- Tool jseval options -df <NAME>=<FILE> and -out <LOCATION> added. No printing of the script to be executed.
- Script data/sanitize-prof.js to compact/sanitize execution statistics and profiling files added.
Improvements - documentation
- Fix code examples, use the introduced sanitize-prof.js script and improve document Profiling and execution statistics
- Further minor documentation, styling updates.
New functionality, improvements - REST API/Web UI examples
- Use the introduced sanitize-prof.js script in rest-api-XXX self contained examples.
- Example script rest-api-vitamints.sh is removed.
Version 0.2.0 (2016-12-07)
Summary of changes
- Licensing is modified according to longer term plans. Core functionality needs license MADFAST. License MACCS is needed for MACCS-166 fingerprint generation. For further license dependencies see the Getting started guide. Note that already existing Overlap licenses are equivalent with the new MADFAST license, so they won't cover MACCS or ECFP fingerprint generation functionality.
- This is the first publicly available release of this distribution. The distribution is renamed to madfast-cli-<VERSION> from its previous overlap-examples-cli-<VERSION> name.
- Java 1.8 is required.
- Embedded Jetty server for REST API/Web UI is updated to version jetty-9.3.13.v20161014. This mitigates a known vulnerability involving the version used earlier (8.1.8.v20121106). When used in production, however, it is recommended to check Jetty Security Reports for possible further uncovered vulnerabilities.
- Jersey framework providing JAX-RS implementation for the REST API is updated to version 2.23.2.
- Java 1.8 style used for the Java API documentation. The presentation of the generated documentation changed.
- New version (2.7.0) of tool Enunciate used for generating REST API documentation. The layout and presentation of the generated documentation changed.
- MarvinJS component is updated to version 16.11.14
- Components from ChemAxon backend APIs are updated to version 16.11.14
- Improved documentation, command line tools, self contained examples - see details below.
Improvements - documentation
- File index.html is added with reorganized links to different documentations.
- Documentation links from README are removed.
- Getting started guide is extracted to a separate document and improved.
- Performance overview is extracted to a separate document and extended with further data points.
- File TODO.txt is removed.
- Basic search workflow document is simplified.
- Use two supplied datasets (to demonstrate sdf and smiles handling).
- Remove detailed performance data.
- Details on searchStorage is added.
- Document Examples provides more details on self contained example scripts.
- Diagrams, screenshots are added to various documents.
- Styling of HTML documentation changed.
- Syntax highlighting for the code examples is added using highlight.js.
- Document Metric customization tversky example corrected; examples to customize metric for REST API queries added.
- Example for sending POST request using curl added to document REST API / Web UI for similarity searches.
New functionality, improvements, changes - Web front-end
- Page option dist added to real time similarity search (simsearch.html). When hide is used no dissimilarity distribution is displayed on startup. (Example usage: http://localhost:8081/simsearch.html?ref=rest/descriptors/vita-cfp7/&dist=hide)
- Real time similarity search (simsearch.html) page shows 16 most similar hits on startup (instead of 10).
- Index page improvements:
- Page layout and appearance improved.
- Showing available resource classes (data exposed by the server) in precedence based ordering.
- Use lexical ordering for resource listing.
- Show resource sizes.
-
Statistics and profiling results display page always show vertical scrollbar preventing possible jitter while interacting with the page.
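The dist page option of simsearch.html described above is an ordinary query parameter, so links to the search page can be generated programmatically. A small sketch follows; the server address and descriptor reference are taken from the example usage above, while the helper name is hypothetical.

```python
from urllib.parse import urlencode

def simsearch_url(server, ref, hide_distribution=False):
    """Build a link to the real time similarity search page."""
    params = {"ref": ref}
    if hide_distribution:
        # dist=hide suppresses the dissimilarity distribution on startup.
        params["dist"] = "hide"
    # Keep "/" unescaped so the ref value stays readable, as in the documented example.
    return server.rstrip("/") + "/simsearch.html?" + urlencode(params, safe="/")

print(simsearch_url("http://localhost:8081", "rest/descriptors/vita-cfp7/", hide_distribution=True))
```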
New functionality, improvements - Command line tools
- Tool stdg option -errout specifies output file for structures causing error.
- Tool prepareMolecules for common molecule conversions added. Usage examples can be found in document Prepare molecules.
- Tools searchStorage, buildStorage, createMMs and stdg: garbage collection at the end of the execution is skipped when no profiling or statistics collection is requested.
- Tool searchStorage further improvements:
- Targets can be specified as molecules (see options -tm, -tmf, -tidname, -tidprop).
- Targets can be specified as custom descriptors (see options -td, -tdf, -tdescsplitter, -tidsplitter)
- Query IDs can be specified (see options -qm, -qmf, -qd, -qdf, -qidname, -qidprop, -qdescsplitter and -qidsplitter)
- See also options -context and -contextjs.
- Detailed help on metric customization and context setting/customization is printed with option -hd.
- See Details on searchStorage.
- Note that verbose messages printed during execution changed.
New functionality, improvements - REST API
- Basic server statistics added. See REST API documentation of StatisticsResource.
- Dissimilarity distribution calculation for descriptor queries added. See REST API documentation of endpoint distribution-by-descriptor of DescriptorResource.
New functionality, improvements - Examples
- Example workflow scripts will use the subdirectories of the distribution's examples-tmp/ directory as their default working directory. Subdirectory name is derived from the script name. For example script examples/rest-api-small.sh will use examples-tmp/rest-api-small/ as its default working directory. The location of the working directory can be set using option -w <WORKDIR>.
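The naming rule above (working directory derived from the script name, overridable with -w <WORKDIR>) can be written down in a few lines. This is a sketch of the rule as described, not the logic of the shipped shell scripts; the function name is hypothetical.

```python
from pathlib import PurePosixPath

def default_workdir(script_path, override=None):
    """Return the working directory an example script would use.

    `override` stands for the value passed with -w <WORKDIR>, if any.
    """
    if override is not None:
        return override
    # Script name without directory and without the .sh extension.
    name = PurePosixPath(script_path).stem
    return f"examples-tmp/{name}/"

print(default_workdir("examples/rest-api-small.sh"))
```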
New functionality, improvements - REST API/Web UI examples
For details on the REST API / Web UI example scripts, their exposed datasets, memory requirements, estimated runtime see document Examples.
- Example examples/rest-api-small.sh is simplified
- Use the shipped nci-250k dataset.
- Use only CFP fingerprint.
- Use examples-tmp/rest-api-small/ as default working directory.
- Example examples/rest-api-example.sh added to demonstrate major configuration steps in document REST API example.
- Example examples/rest-api-medium.sh is added.
- Example examples/rest-api-medium-maccs.sh is added.
- Example examples/rest-api-large.sh is added.
- Example examples/rest-api-large-ecfp.sh is added.
- Example examples/rest-api-large-ecfp-maccs.sh is added.
- Example examples/rest-api-xlarge.sh is added.
- Example examples/rest-api-xlarge-ecfp.sh is added.
- Example examples/rest-api-xlarge-ecfp-maccs.sh is added.
- Example examples/rest-api-xxlarge.sh is added.
- Example examples/rest-api-xxlarge-ecfp.sh is added.
- Example examples/rest-api-xxlarge-ecfp-maccs.sh is added.
New functionality, improvements - Workflow examples
- Example examples/search-workflow.sh is simplified:
- Use shipped drugbank-all (as target) and vitamins (as query) datasets.
- Use only CFP fingerprint.
- Use examples-tmp/search-workflow as default working directory.
- Option -n (no wget) and -m <MOLDIR> (specify MOLDIR) removed.
- Option -t (test mode) removed.
- Example examples/custom-binaryfp-workflow-vitamins.sh is improved:
- Fixed on windows + cygwin.
- Use examples-tmp/custom-binaryfp-workflow-vitamins/ as default working directory.
- Example examples/custom-floatv-workflow.sh is improved:
- Use examples-tmp/custom-floatv-workflow/ as default working directory.
New functionality, improvements - Examples - Public datasets
- Example examples/download-molecules.sh is simplified.
- Parameter -m <MOLDIR> removed.
- Use examples-tmp/download-molecules as the default working directory.
- Put downloaded and processed files into subdirectory download of the working directory.
- Sets which are included in the distribution (drugbank-all, nci-250k, chembl) are removed from the download script.
- Print timestamps; log wget output; check if at least one set to download is specified.
- Dataset GDB-13 and its subset GDB-12 added to example script download-molecules.sh (invoked with option -G) and to document Prepare molecules. Note that this dataset is only used as a reference in Performance overview.
New functionality, improvements - Included datasets
- The Vitamins dataset is moved into directory data/molecules/vitamins/.
- The DrugBank Open Data dataset is available in directory data/molecules/drugbank/. For details see file data/molecules/drugbank/README.html and document Prepare molecules. Download option for the DrugBank dataset from script examples/download-molecules.sh is removed.
- The NCI Release 1 dataset is available in directory data/molecules/nci/. For details see file data/molecules/nci/README.html. Download option for the NCI dataset from script examples/download-molecules.sh is removed.
- ChEMBL dataset (version chembl_21) is available in directory data/molecules/chembl. For details see file data/molecules/chembl/README.html. Download option for the ChEMBL dataset from script examples/download-molecules.sh is removed.
- PubChem Compound random subsets 1k, 10k and 100k are available in directory data/molecules/pubchem-compound. Creating random ordering of the emolecules set is removed from script examples/download-molecules.sh and from document Prepare molecules.
Version 0.1.7 (2016-04-21)
New functionality, improvements - REST API
- Embedded server gui supports Cross-Origin Resource Sharing. When parameter -allowedOrigins <ORIGINS> is specified CrossOriginFilter is configured, and the value of the parameter <ORIGINS> is used as the allowedOrigins parameter of the filter. For usage example see document REST API example.
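To give a feel for what an allowed origins setting does, the sketch below matches a request's Origin header against a comma separated allow-list in which * acts as a wildcard. It is a simplified illustration and not Jetty's CrossOriginFilter code; the exact matching rules of the filter should be taken from the Jetty documentation, and both function names are hypothetical.

```python
import re

def compile_allowed_origins(spec):
    """Turn a comma separated origin list into compiled patterns ('*' matches anything)."""
    patterns = []
    for origin in spec.split(","):
        origin = origin.strip()
        if not origin:
            continue
        # Escape everything literally, then let each '*' match any run of characters.
        regex = ".*".join(re.escape(part) for part in origin.split("*"))
        patterns.append(re.compile(regex))
    return patterns

def is_origin_allowed(origin, patterns):
    """Return True when the Origin header value matches any allowed pattern."""
    return any(p.fullmatch(origin) for p in patterns)

allowed = compile_allowed_origins("http://localhost:8080, https://*.example.org")
print(is_origin_allowed("https://app.example.org", allowed))
print(is_origin_allowed("https://evil.test", allowed))
```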
New functionality, improvements - Examples
- In download-molecules.sh SureChEMBL download (invoked with option -S) fixed.
- Dataset pubchem-compound-rnd-1k added to download-molecules.sh and to document Prepare molecules.
Version 0.1.6 (2016-04-20)
Summary of changes
- Metric specification in Descriptors API, command line interfaces, REST API and real time similarity search front-end implemented.
- Real time similarity search frontend improvements: metric/descriptor specification, dynamic layout with multiple similarity search hits/dissimilarity distribution chart components, usability improvements.
- Self contained examples `rest-api-multiple.sh` and `rest-api-large.sh` improved.
- Maccs-166 implementation is exercised by `rest-api-large.sh`. Please see remark regarding licensing below.
New functionality, improvements, changes - Web front-end
- Real time similarity search front end main changes:
  - Taskbar added with component palette, info and help.
  - Metric customization and descriptor selection for most similar structures and dissimilarity distribution display components.
  - Additional most similar structures and dissimilarity distribution display components can be added from the taskbar by clicking/dragging on component palette icons.
  - Display components (with the exception of the sketcher) can be rearranged.
  - Help button provides a small page tour.
- Changes in the most similar structures display component in real time similarity search front end:
  - Component dropdown menu is added with descriptor and metric selection options.
  - Feedback message (displaying search time) changed. Multiple messages can be displayed. Descriptor and metric changes are displayed by this message component.
  - Component title (showing `Most similar structures (<DESCRIPTION>)`) changed to show `Most similar structures (<NAME>: <DESCRIPTION>)[ with "<METRIC>"]`.
  - Component title bar (containing title, component context menu and icons) is not hidden when no hits are displayed.
  - Dissimilarity bar (blue bar at the bottom of structure cards) scaling changed: previously the bar represented the 0.0 .. 1.0 dissimilarity interval. Dissimilarity values from certain metrics (such as non normalized versions of `euclidean`, `manhattan` and `commonpart`) can be outside of this interval. The current scaling depends on the range of dissimilarity values: the actual interval is extended to include 0.0 .. 1.0 and the resulting interval is rounded to nice values using the underlying D3 library's `d3.scale.linear.nice()` method. The modified behavior is equivalent to the previous one for metrics yielding dissimilarity values in the 0.0 .. 1.0 interval.
- Changes in the dissimilarity distribution display component in the real time similarity search front-end:
  - Component dropdown menu is added with descriptor and metric selection options.
  - Feedback message added to display search speed, target count and descriptor/metric changes.
  - Component title bar with a notification message is shown when no distribution is displayed.
  - Error panel is shown when distribution calculation failed (for example because of invalid metric parameterization).
  - Spinner overlay is displayed while waiting for distribution calculations.
- Chart is
- Dissimilarity distribution component uses `POST` requests. URL size limit of `GET` requests used in the previous version caused failure for large structures.
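The dissimilarity bar scaling described above can be approximated as follows. This is a minimal Python sketch, not the Web UI code: the function names are hypothetical, and the step selection imitates the common 1/2/5 rounding used for axis ticks rather than reproducing D3's implementation exactly.

```python
import math


def nice_step(span, count=10):
    """Pick a 1/2/5 x 10^k step so that roughly `count` steps cover `span`."""
    raw = span / count
    power = 10 ** math.floor(math.log10(raw))
    ratio = raw / power
    if ratio >= math.sqrt(50):
        factor = 10
    elif ratio >= math.sqrt(10):
        factor = 5
    elif ratio >= math.sqrt(2):
        factor = 2
    else:
        factor = 1
    return factor * power


def bar_domain(dissimilarities):
    """Extend the observed range to include 0.0 .. 1.0, then round it outward."""
    lo = min(0.0, min(dissimilarities))
    hi = max(1.0, max(dissimilarities))
    step = nice_step(hi - lo)
    return (math.floor(lo / step) * step, math.ceil(hi / step) * step)


# Values inside 0.0 .. 1.0 keep the previous scaling.
print(bar_domain([0.1, 0.9]))
# Values from a non normalized metric widen the domain to a round upper bound.
print(bar_domain([0.2, 3.7]))
```

With this approach a bar always covers at least the 0.0 .. 1.0 interval, so normalized metrics are displayed as before.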
New functionality, improvements - REST API
- Metric customization related functionalities are added to REST API endpoint `descriptors/{desc}`:
  - Endpoint `descriptors/{desc}/get-available-metrics` provides metadata on the accepted metrics.
  - Endpoints `/descriptors/{desc}/distribution`, `/descriptors/{desc}/find-most-similars`, `/descriptors/{desc}/find-most-similars-by-descriptor` and `/descriptors/{desc}/find-most-similars-by-id` accept optional parameter `metric`. When not specified the default behavior is preserved. For details see REST API documentation of `DescriptorsResource`.
- Endpoint
- REST API endpoint `/descriptors/{desc}/distribution` for dissimilarity distribution calculation accepts `application/x-www-form-urlencoded` `POST` request. (In the previous version only `GET` requests were supported with query parameters.) For details see documentation.
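The difference between the `GET` and `POST` forms of the distribution call, including the optional `metric` parameter, can be sketched on the client side as below. The requests are only constructed, not sent. The server address, the descriptor name, the `query` parameter name and the metric value are assumptions made for illustration; consult the REST API documentation for the actual parameter names.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions for illustration only: server address and REST prefix.
BASE_URL = "http://localhost:8081/rest"
FORM_TYPE = "application/x-www-form-urlencoded"


def _params(query, metric):
    """Collect parameters; `metric` is left out to keep the default behavior."""
    params = {"query": query}  # "query" is an assumed parameter name
    if metric is not None:
        params["metric"] = metric
    return params


def distribution_get(desc, query, metric=None):
    """GET form: every parameter ends up in the URL, so large structures make it long."""
    url = f"{BASE_URL}/descriptors/{desc}/distribution"
    return Request(f"{url}?{urlencode(_params(query, metric))}")


def distribution_post(desc, query, metric=None):
    """POST form: parameters travel in a form-encoded body, the URL stays short."""
    url = f"{BASE_URL}/descriptors/{desc}/distribution"
    body = urlencode(_params(query, metric)).encode("ascii")
    return Request(url, data=body, headers={"Content-Type": FORM_TYPE}, method="POST")


large_structure = "C" * 5000  # stand-in for a large structure source
print(len(distribution_get("example-desc", large_structure).full_url))
print(len(distribution_post("example-desc", large_structure, metric="tanimoto").full_url))
```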
New functionality, improvements - Command line tools
- Tool `searchStorage` option `-metric` can specify metric to be used for comparison. Command line help printed by option `-h` provides an overview of applicable metrics for various descriptors. See also new documentation Metric customization.
- Help of command line tool `searchStorage` improved: context specification and customization description was irrelevant and removed.
- Tool `searchStorage` prints and records progress info during result printing; this progress info is exposed on execution profiling/benchmark visualizations.
- Bug fixed in tool `searchStorage` which aborted execution for large full matrix calculation due to integer overflow in progress reporting.
- Tool `dumpStorage` is able to export the contents of descriptor storage in various formats using options `-descout` and `-descf`. For details see help printed by `dumpstorage -h`. Functionality existed in previous version, command line help is clarified in this version.
- Descriptor generator classes for tool `stdg` print underlying standardizer configuration.
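The `searchStorage` overflow fix above concerns a general class of bug. The exact cause is not documented here; one common way it arises is counting query/target pairs of a full matrix in a 32-bit signed integer. The sketch below emulates such a counter in Python (whose own integers do not overflow); the function names are hypothetical.

```python
INT32_MAX = 2**31 - 1


def to_int32(value):
    """Wrap an integer the way a 32-bit signed counter would."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def total_comparisons(query_count, target_count):
    """Number of dissimilarity calculations in a full matrix."""
    return query_count * target_count


total = total_comparisons(50_000, 50_000)
print(total)            # 2500000000, exceeds INT32_MAX
print(to_int32(total))  # -1794967296, a negative "progress total"
```

A negative total of this kind can break percentage or range checks in progress reporting, which is why such counters are usually kept in 64-bit types.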
Major/incompatible changes in the underlying Overlap/Descriptors API
- Deprecated float vector metric `EUCLIDEAN_NORMALIZED` was removed from `com.chemaxon.descriptors.metrics.FloatVectorMetrics`.
- `@Description` annotations on `com.chemaxon.descriptors.metrics.FloatVectorMetrics` and `com.chemaxon.descriptors.metrics.BinaryMetrics` cleaned up.
- Binary vector metrics `PETKE` and `SIMPSON` are added.
- Serialization fixed in Maccs-166 implementation.
Other changes
- Download link for Emolecules Plus dataset is updated in document Prepare molecules and in example script `download-molecules.sh`. Post processing of the downloaded `Emolecules Plus` dataset in script `download-molecules.sh` is fixed.
- README documentation links reorganized.
New functionality, improvements - Examples
- Self contained example script `rest-api-multiple.sh` improved. Profiling and statistics are collected and exposed by the launched server. Allocated memory for launching embedded server increased to 10G from previous 8G value (-Xmx10g used instead of -Xmx8g). System property `overlap-benchmark.fingerprint` for descriptor generation runs is added and displayed on statistics page exposed by the launched server.
- Self contained example script `rest-api-large.sh` added with more datasets. Please note that this example calculates MACCS-166 fingerprints. Currently this fingerprint is covered by the `Overlap` license; this might change in a future release.
- SureChEMBL dataset added to Prepare example molecule sets document and to script `download-molecules.sh`.
- Document Verification and benchmarking of concurrent implementations and script `verify-concurrent-generation.sh` added.
Version 0.1.5 (2016-02-10)
New functionality, improvements - Command line tools
- Tool `searchStorage` option `-maxdissim` can specify maximum dissimilarity threshold (inclusive) for search modes `MOSTSIMILAR` and `MOSTSIMILARS`. Targets with dissimilarity exceeding this threshold are not printed to the output.
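The inclusive threshold semantics of `-maxdissim` can be illustrated with a small Python sketch. It only mirrors the rule stated above (hits at exactly the threshold are kept, hits above it are dropped); the function name and the `(target_id, dissimilarity)` hit representation are hypothetical and unrelated to the tool's internals.

```python
def filter_hits(hits, max_dissim=None):
    """Keep hits whose dissimilarity does not exceed the (inclusive) threshold."""
    if max_dissim is None:
        return list(hits)
    return [(target, dissim) for target, dissim in hits if dissim <= max_dissim]


hits = [("t1", 0.10), ("t2", 0.25), ("t3", 0.40)]
# "t2" sits exactly on the threshold and is kept; "t3" exceeds it and is dropped.
print(filter_hits(hits, max_dissim=0.25))
```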
Other improvements
- Executable flag set for various non .sh files fixed.
Version 0.1.4 (2016-01-14)
Bugfixes
- Fixed: most similar search with multiple queries might use excessive amount of memory during execution.
- Fixed: web UI/embedded server initialization might fail when importing data from files having the same file names
New functionality, improvements - Web front-end
- Real time similarity search web ui component is revamped:
  - Showing dissimilarity distribution
  - Entering molecule source
  - Structures (from sketcher or from hit list) can be cherry picked and downloaded.
  - Structure source can be specified with URL parameters `src` and `frm`.
New functionality, improvements - Command line tools
- Tools `createMms`, `buildStorage`, `searchStorage` and `stdg` can write VM profiling log specified by parameters `-prof` and `-profres`. The written profiling log contains periodic snapshots of the status of garbage collectors and VM memory pools. State of running `ProgressObserver`s is also recorded.
- Tools `createMms`, `buildStorage`, `searchStorage` and `stdg` can write performance statistics using option `-stat`.
- Tool `dumpStorage` can export descriptors in various formats using options `-descout` and `-descf`.
- Interactive visualization of execution statistics and profiling data is available in web ui.
- Initial revision of documentation Profiling and execution statistics added.
- Script `examples/concat-jsons.sh` merges the content of specified files containing JSONs into a JSON array.
- Tool `stdg` option `-stdjs` added to specify standardization in a JS hook.
- Command line tool `gui` option `-profres` is multi arity.
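The merge performed by `examples/concat-jsons.sh` (listed above) can be pictured with the following Python sketch. It is not the script itself: it assumes each input file holds exactly one JSON document, and the function name is hypothetical.

```python
import json
import sys


def concat_jsons(paths):
    """Read one JSON document per file and return them as a single JSON array."""
    documents = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            documents.append(json.load(handle))
    return json.dumps(documents)


if __name__ == "__main__":
    # Usage: python concat_jsons.py stat-1.json stat-2.json > merged.json
    print(concat_jsons(sys.argv[1:]))
```

Merging several statistics files into one array in this way produces a single input that can be handed to a visualization in one step.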
New functionality, improvements - Examples
- Self contained example script `benchmark.sh` improved. See usage example in document Profiling and execution statistics for details.
- Download script `download-molecules.sh` added to download and prepare public molecule sets. For usage help launch with option `-h`. For details of the downloaded sets see document Prepare molecules.
New functionality, improvements - REST API
- REST API endpoint `molconverter/convert` for structure conversion with optional 2D clean is added. See documentation.
- REST API endpoint `molconverter/convert` POST request with JSON request body added. For details see documentation of the endpoint and the request JSON. Note that this endpoint accepts `GET` requests (with URL encoded query parameters) and `POST` requests with either `application/x-www-form-urlencoded` parameters or `application/json` request objects.
- REST API endpoint `/descriptors/{desc}/distribution` for dissimilarity distribution calculation is added. See documentation.
New functionality, improvements - Descriptors API
- MACCS-166 fingerprint implementation is added.
- Interface `com.chemaxon.descriptors.common.unguarded.UnguardedContext` exposes associated `DescriptorComparator`. Note that methods `extractor` and `comparator` are renamed to `unguardedExtractor` and `unguardedComparator`.
Major/incompatible changes in the REST API
- For POST requests REST API endpoints `molconverter/cxformat` and `molconverter/cxbinformat` expect all parameters as `application/x-www-form-urlencoded` form parameters. In the previous versions the structure source was expected as the request body (as `text/plain`), the further parameters were expected as query parameters, similar to the GET requests. From this version the structure is also expected as a form parameter with name `mol`.
- For POST requests REST API endpoints `descriptors/{desc}/find-most-similars` and `descriptors/{desc}/find-most-similars-by-descriptor` expect all parameters as `application/x-www-form-urlencoded` form parameters. In the previous versions the query structure/descriptor source was expected as the request body (as `text/plain`), the further parameters were expected as query parameters, similar to the GET requests. From this version these are also expected as form parameters with name `query`/`query-descriptor`.
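For clients, the POST change described above means the structure moves from a raw `text/plain` body into a form field. The Python sketch below contrasts the two body shapes for `molconverter/cxformat` without sending anything. The endpoint URL is an assumed placeholder and the function names are hypothetical; only the `mol` parameter name and the content types come from the notes above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed placeholder address; adjust to the actual server.
CXFORMAT_URL = "http://localhost:8081/rest/molconverter/cxformat"


def cxformat_post_previous(structure):
    """Previous versions: the structure source itself is the text/plain body."""
    return Request(
        CXFORMAT_URL,
        data=structure.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def cxformat_post_current(structure):
    """From this version: the structure is the form parameter named `mol`."""
    body = urlencode({"mol": structure}).encode("ascii")
    return Request(
        CXFORMAT_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


print(cxformat_post_previous("CC(=O)O").data)
print(cxformat_post_current("CC(=O)O").data)
```

Note that form encoding percent-escapes characters such as `(`, `)` and `=`, so a client that previously sent raw structure text needs to encode it.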
Major/incompatible changes in the underlying Overlap/Descriptors API
- In interface `DescriptorGenerator` method `contextFactory()` is removed.
- In interface `DescriptorGenerator` method `comparisonContextFactory()` is added.
- In interface `DescriptorComparator` method `unguardedContext()` is added.
- `MDTableReader` API exposes a compatible descriptor generator for the deserialized descriptors with method `getDescriptorGenerator()`. The returned generator is intended to be used as the factory of descriptor comparators (either through its `comparisonContextFactory` or by its direct factory methods). Method `getDefaultComparator` is removed. In implementations (`CfpTableReader`, `EcfpTableReader` and `PfTableReader`) additional comparator factory methods are also removed.
- Deprecated method `getDescriptorGenerator` in interface `com.chemaxon.descriptors.common.Descriptor` is removed.
Other improvements
- Output of option `-prof` contains execution statistics. See document "Profiling and execution statistics" for details.
- Version information is available in the JAVA API `com.chemaxon.overlap.version.OverlapVersion`. Version info is exposed in execution statistics. For details see apidoc.
Version 0.1.3 (2015-07-21)
Bugfixes
- Missing descriptions for some result elements in the Enunciate REST API documentation are fixed. (See example.)
New functionality, improvements - Command line tools
- Tool `gui` parameters `-sslkeystore` and `-sslkeystorepass` specify SSL keystore. When a keystore is specified the embedded server accepts https connections. For security concerns of this version see issues documentation. For usage example see REST API example documentation.
- Tool `gui` parameter `-port` can accept value `0` to use any available port. Allocated port number is printed to the console.
- Tool `gui` can import ID-s with no attached molecules using option `-idonly`. The created molecule storage will store the read ID-s but all the molecules are marked as absent. This makes it possible to import custom descriptors without attached molecules. See REST API example documentation for an example. Self contained example script rest-api-vitamins.sh also contains this modification.
- Tools `createMms` and `buildStorage` can write performance statistics using option `-stat`. See Basic search workflow as an example.
- Tool `stdg` accepts parameters `-cfgstring`, `-slowout` and `-slowlimit`.
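The `-port 0` behavior noted above relies on a general operating system convention: binding a socket to port `0` lets the system pick a free port, which the program can then query and report. The Python sketch below shows that mechanism only; it is not the embedded server's code, and the function name is hypothetical.

```python
import socket


def bind_any_port(host="127.0.0.1"):
    """Bind a listening TCP socket to a port chosen by the operating system."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS for any available port
    server.listen()
    return server


server = bind_any_port()
# The allocated port is only known after binding, hence it has to be printed.
print("listening on port", server.getsockname()[1])
server.close()
```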
New functionality, improvements - REST API
- Error handling in the REST API is improved. Further information is available in the JAVA API documentation. See example and description of the error description object returned.
- Diagnostic REST API endpoint `generate-error-response` added. See documentation.
- REST API endpoint `molecules/{set}/{index}/png-or-placeholder` added. See documentation.
- Tool `importStorage` parameter `-infilter` is added. See custom float descriptors for an example on usage.
- REST API endpoint `descriptors/{desc}/find-most-similars-by-id` for launching similarity search against a structure contained by the attached molecule storage is added. See documentation.
- REST API endpoint `descriptors/{desc}/find-most-similars-by-descriptor` for launching similarity search against a String representation of a descriptor is added (both GET and POST supported). See documentation.
New functionality, improvements - Examples
- Self contained example script benchmark.sh added.
Major/incompatible changes in the underlying Overlap/Descriptors API
- DTOs used in the REST API are moved into package `com.chemaxon.overlap.wui.dto`. See package javadoc.
- Error handling changes in the REST API. Parse errors (molecules, descriptors) result in status 400 (Bad Request). Invalid references to molecule sets, descriptor sets; queries with no results typically result in status 404 (Not found).
Version 0.1.2 (2015-06-15)
Bugfixes
- Standardizer added to the descriptor parametrization example in the context concepts documentation.
New functionality, improvements
- Some of the self contained example scripts found in `examples` directory accept arguments which customize their behavior. Some of the self contained example scripts provide test mode. For details see their documentation.
- Tool `gui` has parameters `-stopport` and `-stopsecret`. For details see rest api example documentation.
- Glossary added to the documentations.
Changes of command line tools
- Self contained example scripts delete already existing log file instead of appending.
Major/incompatible changes in the underlying Overlap/Descriptors API
- Method `descriptorComparator` added to `OverlapAnalysisContext`. This allows specifying metric later.
Version 0.1.1 (2015-06-03)
Bugfixes
- Tool `importStorage` is available again.
- Custom binary and float descriptor workflow descriptions and example scripts are available again:
New functionality
- New helper functions `ctx_from_desc`, `ctx_from_descpb` and `ctx_from_desc_comp` in `OverlapAnalysisContext` customization scripting hooks.
- REST API documentation generated by Enunciate added.
- Java API documentation of classes defined in this distribution is added. Note that some of the classes appearing in this documentation might be unused, incomplete or removed in any subsequent release.
- Tool `createAllAbsentMms` added.
- Tool `importStorage` parameter `-aamms` added.
- Concepts documentation of using `OverlapAnalysisContext` added
- Improved documentation
REST API changes
- `DescriptorsResource.MostSimilarsResult.query` might contain query ID or query descriptor
- `DescriptorsResource.MostSimilarsResult.querysmi` is optional; not filled when querying by descriptors
- `DescriptorsResource.MostSimilarsResult.findMostSimilars` `count` parameter is interpreted as max count; not recommended to use
- `DescriptorsResource.MostSimilarsResult.findMostSimilars` `maxCount` parameter is added.
- `DescriptorsResource.MostSimilarsResult.findMostSimilars` `maxDissimilarity` parameter is added.
- `DescriptorsResource.MostSimilarsResult.findMostSimilarsPost` `count` parameter is interpreted as max count; not recommended to use
- `DescriptorsResource.MostSimilarsResult.findMostSimilarsPost` `maxCount` parameter is added.
- `DescriptorsResource.MostSimilarsResult.findMostSimilarsPost` `maxDissimilarity` parameter is added.
- `rest/molecules/{set}/{id}` has query parameter `smiles`
- `rest/molecules/{set}/find-id` query method added
Changes of command line tools
- Tool `importStorage` parameter `-tobytes` is renamed to `-out`. `OverlapAnalysisContext` used for descriptor import is stored in the output binary file; it is not needed to specify for search.
- Diagnostic tool `stdg` has optional descriptor post processing scripting hook `-processdesc`.
Major/incompatible changes in the underlying Overlap/Descriptors API
- Method `unguardedContext()` added to interface `com.chemaxon.descriptors.common.DescriptorComparator`
- Method `getUnguardedDissimilarityCalculator()` added to interface `com.chemaxon.descriptors.metrics.BinaryVectorComparator`
- Class `com.chemaxon.descriptors.common.binary.SimpleBinaryVectorComparator` constructor needs associated guard object reference as parameter
- Class `com.chemaxon.descriptors.common.realvector.SimpleFloatVectorComparator` constructor needs float array size as parameter
- Method `getGuardObject()` added to interface `com.chemaxon.descriptors.common.DescriptorComparator`.
- Method `getGuardObject()` added to interface `com.chemaxon.descriptors.common.unguarded.UnguardedExtractor`.
- Method `getGuardObject()` added to interface `com.chemaxon.descriptors.common.unguarded.UnguardedContext`.
- Descriptor comparators check guard objects of descriptors compared against the associated `DescriptorGenerator`
- `OverlapAnalysisContext` unguarded handling defaults to the underlying `DescriptorComparator.unguardedContext()`
- `com.chemaxon.overlap.io.MasterMoleculeStorage` class is separated into interface `com.chemaxon.overlap.io.MasterMoleculeStorage` and implementation `com.chemaxon.overlap.io.MasterMoleculeStorageImpl`
- Contents of package `com.chemaxon.overlap.persistence` are moved to module `overlap-core`.
- Method `isPresent(int)` added to interface `com.chemaxon.overlap.io.MasterStorage`
- Method `getSource(int)` added to interface `com.chemaxon.overlap.io.MasterMoleculeStorage`
- Interface `com.chemaxon.overlap.persistence.serialization.IndirectSerializable` added and used by various storages
Version 0.1.0 (2015-05-11)
This version is incomplete, some of the functionalities, documentation and examples are in a work in progress state. For details see the issues document.
Bugfixes
- Memory leak in Java versions prior to 7u4 caused by `String.substring()` (see http://bugs.java.com/view_bug.do?bug_id=4513622) is worked around in splitters used in custom descriptor imports
New Functionality
- (Not available in this version) Visualization for self overlap / inter set overlap analysis is unified
- (Not available in this version) Visualization for most similar search / knn search is unified
- (Not available in this version) New overlap analysis visualization: knn map, deep zoom knn map
- (Not available in this version) Export of knn analysis visualization
- Improved documentation
Changes of command line tools
- Serialized formats changed; can not use binary files generated by prior versions
- Tool `buildStorage` option `-tobytes` is renamed to `-out`. `OverlapAnalysisContext` used for descriptor generation is stored in the output binary file; it is not needed to specify for search.
- Tool `dumpStorage` uses option `-in` to specify inputs whose types are recognized. Options `-mms`, `-mid` and `desc` are removed. Options `-context` and `-contextjs` are also removed since context is stored with the descriptors.
- Thread pool size (option `-tp`) is set to the number of available processors
- Tool `searchStorage` does not need context to be specified since it is stored with the descriptors. Parameters `-context` and `-contextjs` are not available currently. Note that specifying dissimilarity metric in this release is not possible for this tool. Ordering of processing steps is changed; descriptors are read first.
- Visualization tools `overlapGui`, `selfOverlapGui` and `realtimeSearch` are unified into `gui`.
- For tool `gui` webapp can be specified with system property `com.chemaxon.overlap.wui.webapp` which can be overridden with option `-webapp`. This system property is initialized by default to the correct location.
- Self contained examples moved to directory `examples` from directory `doc/examples`.
- Self contained examples work in directory `tmp` created from the working directory
- Self contained examples changed for clarity
Major/incompatible changes in the underlying Overlap API
- Simplification of `KnnResults`, introduction of `NnResults`
- Using `NnResults` as the return type in most similar search against multiple queries
- `IdProjector` is renamed to `IndexProjector` to avoid confusion with IDs.
- Serialized formats changed. Using `com.chemaxon.overlap.persistence.serialization.Deserializer` based storage.
- Using `com.chemaxon.overlap.persistence.storage.DescriptorContainer` in command line tools
Version 0.0.8 - 0.0.12
These versions are used internally, no further detailed change log is available.
Version 0.0.7
New functionality
- Self overlap analysis introduced. See scripts `self-overlap-example.sh`, `calculateSelfOverlap.sh` and `selfOverlapGui.sh`
Major/incompatible changes
Major changes in the underlying Descriptors API resulting in the handling of Overlap analysis contexts:
- Unguarded for handling is part of the descriptors API
- Unguarded context and unguarded context factory introduced on the Descriptors API level
- It is recommended to use `DescriptorGenerator.contextFactory()` to acquire and parametrize `OverlapAnalysisContext` through API or JS hook.
- Possibly breaking changes in Descriptors / Overlap APIs
Incompatible overlap analysis context JS hook changes:
- Constants `uge_fw`, `uge_lw` (unguarded extractors), `uge_bl_tanimoto`, `ugc_bl_manhattan`, `ugc_fv_euclidsqr`, `ugc_fv_manhattan`, `ugc_fv_maxdiff` (unguarded comparators) are removed.
Deprecation
- CLI `pdi.sh` and class `PagedDescriptorImport` are not recommended for usage/as example.