List of changes - MadFast Similarity search command line distribution
Please note that serialized forms (binary files created and handled by the tools) are not compatible across different versions.
Version 0.3.5 (2019-10-01)
Summary of changes
This release contains patches for the following bugs:
-
Asynchronous similarity searches over very small target sets failed: when the target count was smaller than the requested hit count, asynchronous similarity search resulted in an exception. This bug made the real time similarity search functionality of the Web UI unusable over very small datasets.
-
Web UI molecules display page showed additional properties regardless of its display settings.
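The first fix above concerns a request for more hits than there are targets. The following is a minimal sketch of such a guard, not the actual MadFast implementation: the function name `most_similar` and the plain list of dissimilarities are assumptions made for illustration.

```python
import heapq


def most_similar(dissimilarities, hit_count):
    """Return up to hit_count (target index, dissimilarity) pairs, most similar first.

    Illustrative only: the requested hit count is clamped to the number of
    targets, so a tiny target set yields fewer hits instead of an error.
    """
    if hit_count < 0:
        raise ValueError("hit_count must be non-negative")
    k = min(hit_count, len(dissimilarities))
    return heapq.nsmallest(k, enumerate(dissimilarities), key=lambda pair: pair[1])


# Three targets, ten hits requested: all three come back, ordered by dissimilarity.
hits = most_similar([0.42, 0.10, 0.77], hit_count=10)
```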
Version 0.3.4 (2019-09-23)
Summary of changes
- Components from ChemAxon backend APIs are updated to version 19.20.0-10485.
- MarvinJS component is updated to version 19.19.0
- Embedded web/REST server upgraded to Jetty version 9.4.15.v20190215
- Experimental Web UI client library and extension point added
- Web UI usability improvements
- REST API raw file handling, asynchronous server loading and asynchronous search calls.
- AdoptOpenJDK support, see Getting started guide for details.
Web UI related changes
-
Experimental development: Web UI JS client library. See Using MadFast Web UI JS library for details.
-
Experimental development: server loading progress following through Web UI. This change is listed in Web UI, REST API and Command line tools categories. For details see document Asynchronous server loading.
-
Web UI dynamic layout improvements, changes
- When a component palette item is clicked the new component will be appended to the first available container.
- New columns and rows can be added to the layout.
- Empty columns can be removed.
- Empty columns display a visual placeholder.
- Column height is equalized during layout changes (when moving UI components or placing component palette items)
- Library jQuery UI Touch Punch is included allowing the reordering of UI components on some touch-enabled displays. This touch support is an experimental workaround which will need later rework.
-
Crossfilter updated from version 1.4.0 to 1.4.6 (see https://github.com/crossfilter/crossfilter/releases) which supports array valued dimensions (see https://github.com/crossfilter/crossfilter/wiki/API-Reference#wiki-dimension_with_arrays)
-
Crossfilter based word cloud display component improvements
- Array valued dimension support
- Optionally sort items by filtered count
- Optionally hide items having 0 filtered count
- Optionally fixed component height
- Optional additional content rendering
- Optional custom label formatting
- Clear filter by clicking on word cloud area (similar to crossfilter based histogram components)
- Filtered state indicated by component background
-
Crossfilter based histogram component improvements
- Array valued dimension support
- Filtered state indicated by component background
-
Crossfilter based scatter plot component improvements
- Array valued dimension support.
- Dimension selection dialog improved.
- Info dialog added.
-
Statistics display page improvements
- Add description for handled dimensions.
- Improve info dialog.
-
Molecules display page improvements
- Additional properties displayed on molecule table (see Store additional data)
- Molecule display defaults to dearomatized and dehydrogenized view
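Several of the component changes above rely on array valued dimensions. As a rough illustration of the idea (not Crossfilter's API or implementation), the sketch below counts records per group when a dimension may return a list of values. The record layout and the name `group_counts` are hypothetical.

```python
from collections import Counter


def group_counts(records, dimension):
    """Count records per group key for a scalar or array valued dimension."""
    counts = Counter()
    for record in records:
        value = dimension(record)
        # An array valued dimension contributes one group key per element,
        # so a single record can be counted in several groups.
        keys = value if isinstance(value, (list, tuple)) else [value]
        for key in keys:
            counts[key] += 1
    return counts


# Hypothetical records: each molecule carries a list of tags.
records = [
    {"id": "m1", "tags": ["kinase", "approved"]},
    {"id": "m2", "tags": ["kinase"]},
    {"id": "m3", "tags": []},
]
tag_counts = group_counts(records, lambda r: r["tags"])
```

In this sketch a record with an empty list falls into no group at all. A word cloud or histogram built on such counts would therefore show it only in the total.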
Improvements - REST API
-
Server feature flags introduced: command line tool gui.sh supports option -disable <FEATURE> as a basic access control functionality. Features implemented: RAWFILES_LIST, RAWFILES_MODIFY, ASYNC_CALL_SEQUENTIAL_ID_GENERATION, ASYNC_CALL_ID_LIST. See command line help with gui.sh -h and document REST API security considerations for details.
-
Experimental raw file handling: Arbitrary small files can be kept in the server memory and served with chosen content type. Files can be specified with command line options and can be manipulated through the REST API. See documentation with examples at Raw file handling.
-
Experimental development: server loading progress following through REST API. This change is listed in Web UI, REST API and Command line tools categories. For details see document Asynchronous server loading.
-
Experimental development: Asynchronous search tasks.
-
Non-compatible change: loadtime from StatisticsDto returned by REST API endpoint statistics is removed. The load time data is available in loadingSuperTask.runningDurationMs.
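Clients that read the removed field need a small adjustment. The sketch below shows one way to read the load time from an already parsed statistics response. It assumes the JSON nests exactly as the field path in the note suggests; the helper name `load_time` and the sample values are hypothetical. The unit of the legacy loadtime field is not stated here, so the fallback value is returned unchanged.

```python
def load_time(statistics):
    """Read the server load time from a parsed statistics response (a dict).

    Prefers loadingSuperTask.runningDurationMs and falls back to the legacy
    loadtime field for servers older than 0.3.4. Returns None if neither exists.
    """
    super_task = statistics.get("loadingSuperTask") or {}
    if "runningDurationMs" in super_task:
        return super_task["runningDurationMs"]
    return statistics.get("loadtime")


# Made-up responses, for illustration only.
new_style = {"loadingSuperTask": {"runningDurationMs": 1234}}
old_style = {"loadtime": 99}
```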
Improvements - Command line tools
-
Rings parameter added to CFP fingerprint parameters JAVA API. Calculating bits associated to rings (up to a default ring size limit) can be disabled. See JAVA API documentation of CfParameters and CfpParameters.Builder classes and Basic overview of the concepts of overlap analysis context for details.
-
Request logging to stdout / stderr is possible from embedded REST server gui.sh. Use command line option -log - / -log -2. See command line help gui.sh -h for details.
-
Option -earlyStart added to embedded server gui.sh. When used, the web server and REST API start listening before resource loading is finished. Progress of resource loading can be tracked on the Web UI and through the REST API. For details see document Asynchronous server loading.
-
Launcher scripts accept special command line arguments -launcherverbose (to print verbose/debug info from the script) and -classpath <SPEC> (to specify additional components of the classpath of the launched application). For details see document Command line interfaces (CLIs).
-
Launcher scripts check if variable JAVA_HOME is set and use java from <JAVA_HOME>/bin/java.
-
Tools createMms.sh and buildStorage.sh gzipped input file (specified by -in <LOCATION>) recognition fixed.
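The JAVA_HOME rule above can be sketched as follows. This is not the launcher script itself (the launchers are shell scripts); the function name `resolve_java` is hypothetical, and falling back to a plain `java` from the PATH when JAVA_HOME is unset is an assumption.

```python
import os


def resolve_java(env=None):
    """Pick the java executable following the JAVA_HOME rule described above."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        return os.path.join(java_home, "bin", "java")
    # Assumption: with JAVA_HOME unset, whatever "java" is on the PATH is used.
    return "java"
```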
Improvements - documentation
-
Download location of the latest version of dataset emolecules-plus is updated in example script download-molecules.sh and in document Prepare molecules.
-
Glossary updated with Crossfilter related definitions and asynchronous REST API related terms.
-
Documents added:
Improvements - examples
-
Example scripts collect CPU count during execution which is stored in execution statistics files written and exposed on the visualization pages. See Profiling and execution statistics and Self contained example for details.
-
Fix PubChem random subset creation bug in example script download-molecules.sh (script stopped after creating the 1k subset). See document Prepare molecules for details.
-
Examples for property space analysis of nci-250k and chembl sets are added to script examples/overlap-example.sh. For details see Store additional data.
-
Example scripts rest-api-example.sh and rest-api-medium.sh accept option -e and pass it to the embedded server as -earlyStart. See Asynchronous server loading for details.
-
Example script rest-api-example.sh checks if outputs of preprocessing steps are already calculated in a previous execution. For details see document Asynchronous server loading and Self contained examples.
Version 0.3.3 (2018-07-09)
Summary of changes
-
Utilities and documentation for performance optimization added.
-
Components from ChemAxon backend APIs are updated to version 18.16.0
New functionality, improvements - REST API
-
REST API endpoint statistics is extended with
- Server uptime and version info returned by the default statistics response
- statistics/profiling-snapshot to create a detailed VM memory/garbage collector state description
- Experimental endpoint get-total-sizeinfo to estimate memory consumption of all exposed resources
-
Experimental get-sizeinfo methods are added to the following REST API endpoints to estimate individual resource memory consumption. Note that execution on large sets can currently take very long.
New functionality, improvements, changes - Web front-end
-
Experimental server memory and garbage collection statistics are available from the landing page.
-
Server version is displayed on the landing page.
-
Server uptime is displayed on the landing page when hovering with the mouse on the exposed molecule/descriptor count text.
Improvements - documentation
-
Document Server memory optimization added.
-
Document REST API security considerations added.
-
Download location of dataset emolecules-plus is updated in example script download-molecules.sh and in document Prepare molecules.
-
Links to non-document files (molecule files, script sources, directories of the distribution) are removed in the following documents:
Version 0.3.2 (2018-06-29)
Summary of changes
-
Components from ChemAxon backend APIs are updated to version 18.15.0. Please note that due to a known issue the embedded server gui.sh prints WARNING level log messages (WARNING: Ignoring unknown format: png) when a molecule image conversion is requested.
-
k-NN analysis visualization now can include stored properties. See Store additional data for details and use cases.
-
Self contained example script rest-api-example.sh contains an example of additional stored properties calculated from chemical terms.
New functionality, improvements, changes - Web front-end
-
The k-NN analysis visualization page downloads only the data required for displaying dimensions. Prior to 0.3.2 visualization page downloaded all the stored neighbor (dissimilarity and index) information for analysis.
-
Direct dimension selection icon added to crossfilter based histograms (k-NN analysis visualization page, statistics page) title bar. Dimension selection dialog (of the k-NN page) is more structured / descriptive than the dimension listing previously available from the histogram context menu. When no custom dimension selection dialog is specified (as currently at the statistics page) this dialog is still available and replicates the listing in the context menu.
-
The k-NN analysis visualization page uses textual ("Most similar"/"2nd most similar"/...) neighbor names on histogram dimensions.
-
Dimension selection list of k-NN analysis histograms removed from their dropdown menu: with the possibility to use query and target molecule property space the list would be excessively long. For these histograms the dimension selection dialog available from the title icon can be used.
-
The k-NN analysis visualization page provides a molecule display setting icon (cogwheel icon on the left taskbar) allowing hydrogen and aromaticity display settings for the page.
-
Checkbox and radio buttons behavior in the dropdown menus fixed: mouse click is registered on their label (text part, not only on the checkbox/radio button icon). Dropdown menu is not closed when a click on the checkbox or radio button part is registered.
-
Download on the k-NN analysis visualization page changed: more attributes (query and target IDs) are written to the output; tab and newline characters in fields are changed to spaces. Note that further changes are expected in this download functionality.
-
Info message shown when a dimension change on a crossfilter component (word cloud or histogram on statistics or k-NN results visualization pages) removes a previously set filter.
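The download change above replaces tab and newline characters inside fields with spaces, which keeps every record on one line of a tab separated file. The sketch below shows that kind of sanitizing. The helper names are hypothetical, and treating carriage returns the same way is an extra assumption.

```python
def sanitize_field(value):
    """Replace tab and line break characters with spaces so a field stays in one cell."""
    text = str(value)
    for ch in ("\t", "\r", "\n"):
        text = text.replace(ch, " ")
    return text


def to_tsv_row(fields):
    """Join sanitized fields into a single tab separated line."""
    return "\t".join(sanitize_field(f) for f in fields)


# Made-up field values containing a tab and a newline.
row = to_tsv_row(["query-1", "target\t7", "multi\nline name", 0.25])
```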
New functionality, improvements - REST API
-
REST API endpoint knn-results is extended with
- knn-results/{res}/queryindices to retrieve query master indices.
- knn-results/{res}/neighborcounts to retrieve stored neighbor counts for every query.
- knn-results/{res}/neighbors/{k}/indices to retrieve stored neighbor indices for every query.
- knn-results/{res}/neighbors/{k}/dissimilarities to retrieve stored neighbor dissimilarities for every query.
- knn-results/{res}/neighbors/{k}/props/{propname} to retrieve additional properties for neighbors for every query.
- knn-results/{res}/neighbor-png-or-placeholder to retrieve an image of a single neighbor of a single query.
- knn-results/{res}/query-png-or-placeholder to retrieve an image of a single query.
- knn-results/{res}/table-labels to retrieve labels (IDs) typically displayed in a k-NN table visualization.
-
DTO KnnInfo extended with MoleculeSetInfo of query/target sets.
-
DTO MoleculePropRange extended with missingvalue, presentcount and missingcount.
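A client might assemble the knn-results paths listed above as in the sketch below. The base URL, the resource name `my-knn` and the property name `logp` are assumptions for illustration. Whether {k} counts from 0 or 1 is not stated here, so the value used is only a placeholder.

```python
from urllib.parse import quote

# Assumed server location; adjust to the actual deployment.
BASE = "http://localhost:8081/rest/"


def knn_url(res, *segments):
    """Build a URL under knn-results/{res}, percent-encoding every path segment."""
    parts = ["knn-results", res, *map(str, segments)]
    return BASE + "/".join(quote(part, safe="") for part in parts)


indices_url = knn_url("my-knn", "neighbors", 1, "indices")
prop_url = knn_url("my-knn", "neighbors", 1, "props", "logp")
```

Encoding each segment separately keeps resource or property names that contain spaces or slashes from changing the path structure.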
Improvements - documentation
-
Minor clarifications in the Getting started guide and Installing dependencies on Windows version 10.
-
Fixed separator bar (between UI component icons) highlight on hover / mouse pointer.
-
Documents Store additional data and Introduction to overlap analysis are updated.
Improvements - Command line tools
- Bug fixed in command line argument parsing of calculateOverlap.sh tool: commas in additional property declarations (typically occurring in Chemical Terms expressions) caused an exception.
Version 0.3.1 (2018-06-13)
Summary of changes
-
Home link leading to the index page (upper left corner of Web UI pages) changed to relative (index.html) omitting leading /: for proxied deployments behind arbitrary URL patterns this fixes navigation to the index page.
-
Some dialogs (page/component descriptions) of the Web UI were failing. The underlying problem with markdown formatted dialog contents is fixed.
-
Web UI index page displays stored molecule / descriptor counts. On hovering the server load time is shown.
-
Fixed flickering of placeholder for molecule images in the real time search and knn visualization pages under Internet Explorer.
-
Fixed X axis tick format of histograms on execution statistics visualization page: in case of time dimensions the tick labels were displayed in milliseconds.
-
Expose resource initialization time in the following DTOs: DescriptorInfo, KnnInfo, MoleculeSetInfo, ResourceClassInfoDto.
-
Fix Java package of DTO classes KnnData and KnnInfo.
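The home link change at the top of this list follows from standard URL resolution rules. The sketch below uses Python's `urllib.parse.urljoin` to show how a relative and a root-relative link resolve from a page served under a proxy prefix. The host and the prefix are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical page URL of a Web UI served behind a reverse proxy prefix.
page = "https://example.org/tools/madfast/simsearch.html"

# Relative link: stays under the proxy prefix.
relative_home = urljoin(page, "index.html")

# Link with a leading slash: resolved against the host root, losing the prefix.
absolute_home = urljoin(page, "/index.html")
```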
Version 0.3.0 (2018-06-07)
Summary of changes
- Complete revamp of the WebUI
- Overlap analysis calculation and interactive visualization is added. See Introduction to overlap analysis for details.
- MarvinJS component is updated to version 18.5.0
- Components from ChemAxon backend APIs are updated to version 18.10.0
- Additional properties can be attached to molecules and exposed on the REST API. Please note that this feature is under construction. For more details see Store additional data
- Command line tools and self contained examples are prepared to run on Mac OS X. See Getting started guide.
Improvements - documentation
- Document Introduction to overlap analysis added.
- Document Store additional data added.
- Document Installing dependencies on Windows version 10 added.
- Mac OS X installation details added to the Getting started guide.
- Release dates added to this document.
New functionality, improvements - Examples
- Download locations of datasets GDB-13 and emolecules-plus are updated in example script download-molecules.sh and in document Prepare molecules.
- Download script for SureChEMBL uses FTP directory listing instead of the README file to download segments in example script download-molecules.sh and in document Prepare molecules.
- Download script and description for ChEBI dataset added to document Prepare molecules.
- Overlap analysis was added to script rest-api-example.sh.
- Example script overlap-example.sh added.
- SDF version of the PubChem random 1k dataset is added in file data/molecules/pubchem-compound/pubchem-compound-rnd-1k.sdf.gz. Contents of this file are processed by the rest-api-example.sh example script by importing various properties.
- Public dataset ChEBI (Chemical Entities of Biological Interest) is added to Prepare example molecule sets and script download-molecules.sh.
- Scripts (command lines and self contained) are fixed on Mac OS X using command greadlink instead of readlink. On Mac OS X, invoke brew install coreutils to install greadlink. For details see https://brew.sh/.
- Example scripts determine CPU model name and total memory. Profiling and execution statistics uses these values along with the value of thread pool size (parameter -tp <THREADPOOL>).
- Fix in retention of overlap-benchmark.* system properties in data/sanitize-prof.js.
- Update allocated memory for self contained example scripts rest-api-XXX.sh due to the size increase of the used public molecule sets.
New functionality, improvements - Command line tools
- Tool calculateOverlap.sh added for similarity based overlap analysis calculations. See Introduction to overlap analysis for details.
- File inputs recognize gzipped files.
- Error when key or value contained space in -D<propkey>=<provalue> style system property declaration fixed.
- Embedded server (gui.sh) improvements (invoke gui.sh -h for detailed help):
- Parameter -page <URL> added to use when opening a browser
- Parameter -in is multi arity.
- Network interface addresses are printed to the console on startup.
- Increase request log details - see Rest API example and option gui.sh -log <FILE_PATTERN>
- Option -additionalresourcedir <DIR> added to embedded server gui. When specified, contents of the referenced directory are exposed under path /additional/ by the server. This option can be used to specify a valid Marvin JS license to the Web UI. For details see the Getting started guide, REST API / Web UI for similarity searches section Advanced server configuration: Additional static content and Self contained examples documents.
- Error when installation directory contained space fixed.
New functionality, improvements - REST API
- k-NN analysis results are exposed on API endpoint knn-results
- Initial limited support for storing and retrieving additional properties on molecules - See Store additional data and REST API endpoint molecules:
- molecule set info (molecules/{set}) extended with property names and property descriptions
- molecule sets info (molecules) endpoint - molecule set info objects exposed
- molecule info (molecules/{set}/{index}) endpoint also extended with properties
- molecules/{set}/{index}/props/{props} endpoint added
- molecules/{set}/get-multiple-props endpoint added
- molecules/{set}/get-multiple-ids endpoint added
- molecules/{set}/props/{propname}/get-properties-on-index-range endpoint (GET/POST) added
- REST API endpoint meta added with metadata on available resources
- REST API endpoint statistics added to serve basic server statistics.
- REST API endpoint molconverter/convert (application/json request body encoding version) extended with molprops and pseudos parameters (see ConversionRequest data type).
New functionality, improvements, changes - Web front-end
- Web UI revamp:
- Unify UI look across screens
- Make UI components removable
- Richer UI interaction feedback
- Marvin JS component also moveable
- Web UI codes are built and packaged using WebPack. Note that the Web UI uses packaged and minified JavaScript codes which are not suitable for direct modifications.
- Web UI is expected to be compatible with Internet Explorer 11, Safari.
- Real time search page revamp:
- Multiple pick lists supported
- Pick list interactions (remove/reorder) changed: cells can be dragged over other droppables (sketcher, other picklist). Dragging the cells by the reorder handle allows reordering. Clicking on cell remove button removes cell. Cell layout and information content depends on cell size.
- Pick list and hits display cells are resizable
- Hits display cell count can be set by resizing cells or component by the resize handles.
- Dissimilarity distribution chart is resizable.
- Dissimilarity distribution chart caches distribution; zebra mode changed; bin size can be changed.
- Molecules display page revamp:
- Hydrogenize, aromatize display options added.
- Molecule details dialog shows additional properties.
- Statistics results page revamp:
- Unify look and feel; usable on smaller screens.
- Fewer initial components.
- Histograms instead of zebra chart
- Components can be added and changed
- Scatter plot redesigned; non-numeric axes supported
- Table columns can be added, removed and reordered
- k-NN analysis visualization page added.
- Upper left icon (product logo) on the UI pages takes to the index page; it is changed to a proper HTML link which
- Vertical scrollbar on real time search page is always shown to avoid component layout changes when scrollbar is needed.
- Shepherd tour based page introductions are replaced by page and UI component specific help/info dialogs.
Version 0.2.3 (2017-01-11)
New functionality, improvements - Command line tools
- Improvements of tool searchStorage in output and visualization. For examples and details see Basic search workflow and Details on SearchStorage.
- Option -out-matrix-as-list added to create a list style textual output for FULLMATRIX search mode instead of the default matrix style, similar to the output of MOSTSIMILARS mode. When this option is used for FULLMATRIX search mode the optional dissimilarity threshold (specified by option -maxdissim <VALUE>) is also considered: query-target pairs having dissimilarity exceeding the threshold won't be printed.
- Option -out-numeric-format <FORMAT> added to specify numeric formatting (precision, etc) of dissimilarity results in textual output.
- Option -heatmap-image <FILE> with further options -heatmap-image-.... added to render simple heatmap visualizations of search results.
- Textual output of dissimilarity results can be disabled by passing an empty String ("") to option -out <FILE>.
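The list style output combined with a dissimilarity threshold, as described for -out-matrix-as-list above, can be pictured with the sketch below. It is not the tool's code: the function name, the in-memory matrix and the IDs are hypothetical. "Exceeding" is read here as strictly greater than the threshold.

```python
def matrix_as_list(query_ids, target_ids, matrix, max_dissim=None):
    """Yield (query, target, dissimilarity) rows from a full dissimilarity matrix.

    Pairs whose dissimilarity exceeds max_dissim are skipped. With
    max_dissim=None every pair is emitted.
    """
    for query_id, row in zip(query_ids, matrix):
        for target_id, dissim in zip(target_ids, row):
            if max_dissim is not None and dissim > max_dissim:
                continue
            yield (query_id, target_id, dissim)


# Made-up 2 x 2 matrix: rows are queries, columns are targets.
rows = list(
    matrix_as_list(
        ["q1", "q2"],
        ["t1", "t2"],
        [[0.1, 0.9], [0.5, 0.3]],
        max_dissim=0.5,
    )
)
```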
Improvements - documentation
-
Details on SearchStorage is improved with examples on the available search modes and the heatmap image generation.
-
Links to original images added.
New functionality, improvements - Examples
-
Example molecule set antibiotics added to file data/molecules/antibiotics/antibiotics.smi. For details see Prepare example molecule sets and file data/README_data.html.
-
Example molecule set who-essential-medicines added to file data/molecules/who-essential-medicines/who-essential-medicines.smi. For details see Prepare example molecule sets and file data/README_data.html.
Version 0.2.2 (2017-01-02)
Summary of changes
- MarvinJS component is updated to version 16.12.12
- Components from ChemAxon backend APIs are updated to version 16.12.26.0
Bugfixes
- License checking bug fixed: license Overlap was required for certain functionalities instead of the expected MADFAST license.
Version 0.2.1 (2016-12-12)
New functionality, improvements - Command line tools
- Tool jseval options -df <NAME>=<FILE> and -out <LOCATION> added. No printing of the script to be executed.
- Script data/sanitize-prof.js to compact/sanitize execution statistics and profiling files added.
Improvements - documentation
- Fix code examples, use the introduced sanitize-prof.js script and improve document Profiling and execution statistics
- Further minor documentation, styling updates.
New functionality, improvements - REST API/Web UI examples
- Use the introduced sanitize-prof.js script in rest-api-XXX self contained examples.
- Example script rest-api-vitamints.sh is removed.
Version 0.2.0 (2016-12-07)
Summary of changes
- Licensing is modified according to longer term plans. Core functionality needs license MADFAST. License MACCS is needed for MACCS-166 fingerprint generation. For further license dependencies see the Getting started guide. Note that already existing Overlap licenses are equivalent with the new MADFAST license, so they won't cover MACCS or ECFP fingerprint generation functionality.
- This is the first publicly available release of this distribution. The distribution is renamed to madfast-cli-<VERSION> from its previous overlap-examples-cli-<VERSION> name.
- Java 1.8 is required.
- Embedded Jetty server for REST API/Web UI is updated to version jetty-9.3.13.v20161014. This mitigates a known vulnerability involving the version used earlier (8.1.8.v20121106). When used in production, however, it is recommended to check Jetty Security Reports for possible further uncovered vulnerabilities.
- Jersey framework providing JAX-RS implementation for the REST API is updated to version 2.23.2.
- Java 1.8 style used for the Java API documentation. The presentation of the generated documentation changed.
- New version (2.7.0) of tool Enunciate used for generating REST API documentation. The layout and presentation of the generated documentation changed.
- MarvinJS component is updated to version 16.11.14
- Components from ChemAxon backend APIs are updated to version 16.11.14
- Improved documentation, command line tools, self contained examples - see details below.
Improvements - documentation
- File index.html is added with reorganized links to different documentations.
- Documentation links from README are removed.
- Getting started guide is extracted to a separate document and improved.
- Performance overview is extracted to a separate document and extended with further data points.
- File TODO.txt is removed.
- Basic search workflow document is simplified:
- Use two supplied datasets (to demonstrate sdf and smiles handling).
- Remove detailed performance data.
- Details on searchStorage is added.
- Document Examples provides more details on self contained example scripts.
- Diagrams, screenshots are added to various documents.
- Styling of HTML documentation changed.
- Syntax highlighting for the code examples is added using highlight.js.
- Document Metric customization tversky example corrected; examples to customize metric for REST API queries added.
- Example for sending POST request using curl added to document REST API / Web UI for similarity searches.
New functionality, improvements, changes - Web front-end
- Page option dist added to real time similarity search (simsearch.html). When hide is used no dissimilarity distribution is displayed on startup. (Example usage: http://localhost:8081/simsearch.html?ref=rest/descriptors/vita-cfp7/&dist=hide)
- Real time similarity search (simsearch.html) page shows 16 most similar hits on startup (instead of 10).
- Index page improvements:
- Page layout and appearance improved.
- Showing available resource classes (data exposed by the server) in precedence based ordering.
- Use lexical ordering for resource listing.
- Show resource sizes.
-
Statistics and profiling results display page always shows a vertical scrollbar, preventing possible jitter while interacting with the page.
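The dist page option introduced at the top of this list is an ordinary query parameter. The sketch below rebuilds the example usage URL from its parts. The helper name `simsearch_url` is hypothetical, and only the two parameters named in the note (ref and dist) are used.

```python
from urllib.parse import urlencode


def simsearch_url(base, descriptor_ref, show_distribution=True):
    """Build a simsearch.html URL, optionally hiding the distribution chart on startup."""
    params = {"ref": descriptor_ref}
    if not show_distribution:
        params["dist"] = "hide"
    # Keep "/" unescaped so the ref value reads like the example in the note.
    return base + "simsearch.html?" + urlencode(params, safe="/")


url = simsearch_url(
    "http://localhost:8081/",
    "rest/descriptors/vita-cfp7/",
    show_distribution=False,
)
```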
New functionality, improvements - Command line tools
- Tool stdg option -errout specifies output file for structures causing error.
- Tool prepareMolecules for common molecule conversions added. Usage examples can be found in document Prepare molecules.
- Tools searchStorage, buildStorage, createMMs and stdg garbage collection at the end of the execution skipped when no profiling or statistics collection is requested.
- Tool searchStorage further improvements:
- Targets can be specified as molecules (see options -tm, -tmf, -tidname, -tidprop).
- Targets can be specified as custom descriptors (see options -td, -tdf, -tdescsplitter, -tidsplitter)
- Query IDs can be specified (see options -qm, -qmf, -qd, -qdf, -qidname, -qidprop, -qdescsplitter and -qidsplitter)
- See also options -context and -contextjs.
- Detailed help on metric customization and context setting/customization is printed with option -hd.
- See Details on searchStorage.
- Note that verbose messages printed during execution changed.
New functionality, improvements - REST API
- Basic server statistics added. See REST API documentation of StatisticsResource.
- Dissimilarity distribution calculation for descriptor queries added. See REST API documentation of endpoint distribution-by-descriptor of DescriptorResource.
New functionality, improvements - Examples
- Example workflow scripts will use the subdirectories of the distribution's examples-tmp/ directory as their default working directory. Subdirectory name is derived from the script name. For example script examples/rest-api-small.sh will use examples-tmp/rest-api-small/ as its default working directory. The location of the working directory can be set using option -w <WORKDIR>.
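The naming rule above can be stated compactly. The sketch below derives the default working directory from a script path. The function name is hypothetical and the example scripts themselves are shell scripts, so this only mirrors the described rule.

```python
from pathlib import PurePosixPath


def default_workdir(script_path, tmp_root="examples-tmp"):
    """Derive the default working directory from the script name (extension dropped)."""
    name = PurePosixPath(script_path).stem
    return f"{tmp_root}/{name}/"


workdir = default_workdir("examples/rest-api-small.sh")
```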
New functionality, improvements - REST API/Web UI examples
For details on the REST API / Web UI example scripts, their exposed datasets, memory requirements, estimated runtime see document Examples.
- Example examples/rest-api-small.sh is simplified:
- Use the shipped nci-250k dataset.
- Use only CFP fingerprint.
- Use examples-tmp/rest-api-small/ as default working directory.
- Example examples/rest-api-example.sh added to demonstrate major configuration steps in document REST API example.
- Example examples/rest-api-medium.sh is added.
- Example examples/rest-api-medium-maccs.sh is added.
- Example examples/rest-api-large.sh is added.
- Example examples/rest-api-large-ecfp.sh is added.
- Example examples/rest-api-large-ecfp-maccs.sh is added.
- Example examples/rest-api-xlarge.sh is added.
- Example examples/rest-api-xlarge-ecfp.sh is added.
- Example examples/rest-api-xlarge-ecfp-maccs.sh is added.
- Example examples/rest-api-xxlarge.sh is added.
- Example examples/rest-api-xxlarge-ecfp.sh is added.
- Example examples/rest-api-xxlarge-ecfp-maccs.sh is added.
New functionality, improvements - Workflow examples
- Example examples/search-workflow.sh is simplified:
- Use shipped drugbank-all (as target) and vitamins (as query) datasets.
- Use only CFP fingerprint.
- Use examples-tmp/search-workflow as default working directory.
- Option -n (no wget) and -m <MOLDIR> (specify MOLDIR) removed.
- Option -t (test mode) removed.
- Example examples/custom-binaryfp-workflow-vitamins.sh is improved:
- Fixed on windows + cygwin.
- Use examples-tmp/custom-binaryfp-workflow-vitamins/ as default working directory.
- Example examples/custom-floatv-workflow.sh is improved:
- Use examples-tmp/custom-floatv-workflow/ as default working directory.
New functionality, improvements - Examples - Public datasets
- Example examples/download-molecules.sh is simplified:
- Parameter -m <MOLDIR> removed.
- Use examples-tmp/download-molecules as the default working directory.
- Put downloaded and processed files into subdirectory download of the working directory.
- Sets which are included in the distribution (drugbank-all, nci-250k, chembl) are removed from the download script.
- Print timestamps; log wget output; check if at least one set to download is specified.
- Dataset GDB-13 and its subset GDB-12 added to example script download-molecules.sh (invoked with option -G) and to document Prepare molecules. Note that this dataset is only used as a reference in Performance overview.
New functionality, improvements - Included datasets
- The Vitamins dataset is moved into directory data/molecules/vitamins/.
- The DrugBank Open Data dataset is available in directory data/molecules/drugbank/. For details see file data/molecules/drugbank/README.html and document Prepare molecules. Download option for the DrugBank dataset from script examples/download-molecules.sh is removed.
- The NCI Release 1 dataset is available in directory data/molecules/nci/. For details see file data/molecules/nci/README.html. Download option for the NCI dataset from script examples/download-molecules.sh is removed.
- ChEMBL dataset (version chembl_21) is available in directory data/molecules/chembl. For details see file data/molecules/chembl/README.html. Download option for the ChEMBL dataset from script examples/download-molecules.sh is removed.
- PubChem Compound random subsets 1k, 10k and 100k are available in directory data/molecules/pubchem-compound. Creating random ordering of the emolecules set is removed from script examples/download-molecules.sh and from document Prepare molecules.
Version 0.1.7 (2016-04-21)
New functionality, improvements - REST API
- Embedded server gui supports Cross-Origin Resource Sharing. When parameter -allowedOrigins <ORIGINS> is specified, CrossOriginFilter is configured and the value of the parameter <ORIGINS> is used as the allowedOrigins parameter of the filter. For usage example see document REST API example.
New functionality, improvements - Examples
- In download-molecules.sh SureChEMBL download (invoked with option -S) fixed.
- Dataset pubchem-compound-rnd-1k added to download-molecules.sh and to document Prepare molecules.
Version 0.1.6 (2016-04-20)
Summary of changes
- Metric specification in Descriptors API, command line interfaces, REST API and real time similarity search front-end implemented.
- Real time similarity search frontend improvements: metric/descriptor specification, dynamic layout with multiple similarity search hits/dissimilarity distribution chart components, usability improvements.
- Self contained examples rest-api-multiple.sh and rest-api-large.sh improved.
- Maccs-166 implementation is exercised by rest-api-large.sh. Please see remark regarding licensing below.
New functionality, improvements, changes - Web front-end
- Real time similarity search front end main changes:
- Taskbar added with component palette, info and help.
- Metric customization and descriptor selection for most similar structures and dissimilarity distribution display components.
- Additional most similar structures and dissimilarity distribution display components can be added from the taskbar by clicking/dragging on component palette icons.
- Display components (with the exception of the sketcher) can be rearranged.
- Help button provides a small page tour.
- Changes in the most similar structures display component in real time similarity search front end:
- Component dropdown menu is added with descriptor and metric selection options.
- Feedback message (displaying search time) changed. Multiple messages can be displayed. Descriptor and metric changes are displayed by this message component.
- Component title (showing Most similar structures (<DESCRIPTION>)) changed to show Most similar structures (<NAME>: <DESCRIPTION>)[ with "<METRIC>"]
- Component title bar (containing title, component context menu and icons) is not hidden when no hits are displayed.
- Dissimilarity bar (blue bar at the bottom of structure cards) scaling changed: previously the bar represented the 0.0 .. 1.0 dissimilarity interval. Dissimilarity values from certain metrics (such as non normalized versions of euclidean, manhattan and commonpart) can be outside of this interval. Current scaling depends on the range of dissimilarity values: the actual interval is extended to cover 0.0 .. 1.0 and the resulting interval is rounded to nice values using the underlying D3 library's d3.scale.linear.nice() method. The modified behavior is equivalent to the previous one for metrics resulting in dissimilarity values in the 0.0 .. 1.0 interval.
- Changes in the dissimilarity distribution display component in the real time similarity search front-end:
- Component dropdown menu is added with descriptor and metric selection options.
- Feedback message added to display search speed, target count and descriptor/metric changes.
- Component title bar with a notification message is shown when no distribution is displayed.
- Error panel is shown when distribution calculation failed (for example because of invalid metric parameterization).
- Spinner overlay is displayed while waiting for distribution calculations.
- Chart is
- Dissimilarity distribution component uses POST requests. URL size limit of GET requests used in the previous version caused failure for large structures.
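The dissimilarity bar scaling described above can be sketched in Python. This is only an illustration of the idea (extend the observed range to cover 0.0 .. 1.0, then round outward to nice bounds); the function name nice_interval and the step selection rule are assumptions that approximate, but are not copied from, D3's d3.scale.linear.nice().

```python
import math


def nice_interval(lo, hi, ticks=10):
    """Extend [lo, hi] to cover [0, 1], then round outward to 'nice' bounds."""
    # Step 1: make sure the 0.0 .. 1.0 interval is always covered.
    lo, hi = min(lo, 0.0), max(hi, 1.0)
    span = hi - lo
    # Step 2: pick a step of 1, 2 or 5 times a power of ten near span / ticks.
    step = 10.0 ** math.floor(math.log10(span / ticks))
    err = ticks / span * step
    if err <= 0.15:
        step *= 10
    elif err <= 0.35:
        step *= 5
    elif err <= 0.75:
        step *= 2
    # Step 3: round the lower bound down and the upper bound up to the step.
    return math.floor(lo / step) * step, math.ceil(hi / step) * step


# Values already inside 0.0 .. 1.0 keep the previous scaling.
print(nice_interval(0.2, 0.9))
# A non normalized metric with values up to 3.7 widens the bar range.
print(nice_interval(0.2, 3.7))
```

With this sketch an observed range of 0.2 .. 3.7 is displayed on a 0.0 .. 4.0 scale, while any range inside 0.0 .. 1.0 keeps the original 0.0 .. 1.0 scale.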
New functionality, improvements - REST API
- Metric customization related functionalities are added to REST API endpoint descriptors/{desc}:
- Endpoint descriptors/{desc}/get-available-metrics provides metadata on the accepted metrics.
- Endpoints /descriptors/{desc}/distribution, /descriptors/{desc}/find-most-similars, /descriptors/{desc}/find-most-similars-by-descriptor and /descriptors/{desc}/find-most-similars-by-id accept optional parameter metric. When not specified the default behavior is preserved. For details see REST API documentation of DescriptorsResource.
- REST API endpoint /descriptors/{desc}/distribution for dissimilarity distribution calculation accepts application/x-www-form-urlencoded POST request. (In the previous version only GET requests were supported with query parameters.) For details see documentation.
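As a rough sketch of how a client might pass the optional metric parameter in a form encoded POST request, the Python snippet below only builds the request and does not send it. The form parameter name query comes from the 0.1.4 notes in this change log; the host, port, rest prefix, descriptor name and the metric value are hypothetical placeholders, so check get-available-metrics and the REST API documentation for real values.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_most_similars_request(base_url, desc, query, metric=None):
    """Build (but do not send) a form encoded POST for find-most-similars."""
    params = {"query": query}
    if metric is not None:
        # Optional: when omitted the server keeps its default behavior.
        params["metric"] = metric
    url = f"{base_url}/descriptors/{desc}/find-most-similars"
    return Request(
        url,
        data=urlencode(params).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical server location, descriptor name and metric value.
req = build_most_similars_request(
    "http://localhost:8080/rest", "my-descriptor", "c1ccccc1", metric="tanimoto"
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Sending the structure in the request body avoids the URL size limit that affects GET requests with large query structures.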
New functionality, improvements - Command line tools
- Tool searchStorage option -metric can specify the metric to be used for comparison. Command line help printed by option -h provides an overview of applicable metrics for various descriptors. See also new documentation Metric customization.
- Help of command line tool searchStorage improved: context specification and customization description was irrelevant and removed.
- Tool searchStorage prints and records progress info during result printing; this progress info is exposed on execution profiling/benchmark visualizations.
- Bug fixed in tool searchStorage which aborted execution for large full matrix calculation due to integer overflow in progress reporting.
- Tool dumpStorage is able to export the contents of descriptor storage in various formats using options -descout and -descf. For details see help printed by dumpstorage -h. Functionality existed in previous version, command line help is clarified in this version.
- Descriptor generator classes for tool stdg print underlying standardizer configuration.
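The integer overflow fixed in searchStorage progress reporting can be illustrated with a small Python sketch. The change log does not state which counter overflowed; the example assumes a 32-bit signed counter of pairwise comparisons in a full matrix calculation, and the helper wrap_int32 is a hypothetical name that emulates how such a counter wraps around.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Return value as a 32-bit signed integer would store it (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def full_matrix_pair_count(query_count, target_count):
    """Number of dissimilarity calculations in a full matrix run."""
    return query_count * target_count


# Assumed sizes: 50 000 queries against 50 000 targets.
total = full_matrix_pair_count(50_000, 50_000)
print(total > INT32_MAX)   # the true count no longer fits into 32 bits
print(wrap_int32(total))   # a wrapped 32-bit counter turns negative
```

A counter with 64-bit width (long in Java) holds such totals without wrapping.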
Major/incompatible changes in the underlying Overlap/Descriptors API
- Deprecated float vector metric EUCLIDEAN_NORMALIZED was removed from com.chemaxon.descriptors.metrics.FloatVectorMetrics.
- @Description annotations on com.chemaxon.descriptors.metrics.FloatVectorMetrics and com.chemaxon.descriptors.metrics.BinaryMetrics cleaned up.
- Binary vector metrics PETKE and SIMPSON are added.
- Serialization fixed in Maccs-166 implementation.
Other changes
- Download link for Emolecules Plus dataset is updated in document Prepare molecules and in example script download-molecules.sh. Post processing of the downloaded Emolecules Plus dataset in script download-molecules.sh is fixed.
- README documentation links reorganized.
New functionality, improvements - Examples
- Self contained example script rest-api-multiple.sh improved. Profiling and statistics are collected and exposed by the launched server. Allocated memory for launching the embedded server increased to 10G from the previous 8G value (-Xmx10g used instead of -Xmx8g). System property overlap-benchmark.fingerprint for descriptor generation runs is added and displayed on the statistics page exposed by the launched server.
- Self contained example script rest-api-large.sh added with more datasets. Please note that this example calculates MACCS-166 fingerprints. Currently this fingerprint is covered by the Overlap license; this might change in a future release.
- SureChEMBL dataset added to Prepare example molecule sets document and to script download-molecules.sh.
- Document Verification and benchmarking of concurrent implementations and script verify-concurrent-generation.sh added.
Version 0.1.5 (2016-02-10)
New functionality, improvements - Command line tools
- Tool searchStorage option -maxdissim can specify maximum dissimilarity threshold (inclusive) for search modes MOSTSIMILAR and MOSTSIMILARS. Targets with dissimilarity exceeding this threshold are not printed to the output.
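The effect of an inclusive maximum dissimilarity threshold can be sketched as follows. This is not the searchStorage implementation; the function name most_similars and its arguments are hypothetical, and the input is assumed to be a plain list of precomputed dissimilarities indexed by target.

```python
def most_similars(dissimilarities, max_count, max_dissim=None):
    """Return up to max_count (target index, dissimilarity) pairs, most similar first."""
    hits = sorted(enumerate(dissimilarities), key=lambda hit: hit[1])
    if max_dissim is not None:
        # Inclusive threshold: a target exactly at the limit is still reported.
        hits = [hit for hit in hits if hit[1] <= max_dissim]
    return hits[:max_count]


# Targets 0..3 with their dissimilarity to the query.
print(most_similars([0.4, 0.1, 0.7, 0.3], max_count=3, max_dissim=0.3))
```

Here three hits are requested, but only the two targets with dissimilarity 0.1 and 0.3 pass the threshold, so fewer hits than the requested count are returned.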
Other improvements
- Executable flag set for various non .sh files fixed.
Version 0.1.4 (2016-01-14)
Bugfixes
- Fixed: most similar search with multiple queries might use excessive amount of memory during execution.
- Fixed: web UI/embedded server initialization might fail when importing data from files having the same file names
New functionality, improvements - Web front-end
- Real time similarity search web ui component is revamped:
- Showing dissimilarity distribution
- Entering molecule source
- Structures (from sketcher or from hit list) can be cherry picked and downloaded.
- Structure source can be specified with URL parameters src and frm.
New functionality, improvements - Command line tools
- Tools createMms, buildStorage, searchStorage and stdg can write VM profiling log specified by parameters -prof and -profres. The written profiling log contains periodic snapshots of the status of garbage collectors and VM memory pools. The state of running ProgressObservers is also recorded.
- Tools createMms, buildStorage, searchStorage and stdg can write performance statistics using option -stat.
- Tool dumpStorage can export descriptors in various formats using options -descout and -descf.
- Interactive visualization of execution statistics and profiling data is available in web ui.
- Initial revision of documentation Profiling and execution statistics added.
- Script examples/concat-jsons.sh merges the content of specified files containing JSONs into a JSON array.
- Tool stdg option -stdjs added to specify standardization in a JS hook.
- Command line tool gui option -profres is multi arity.
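For readers who want to reproduce what examples/concat-jsons.sh does outside a shell, a rough Python equivalent is sketched below. It is not the script itself; it assumes that every input file holds exactly one JSON value, and the function name concat_jsons is hypothetical.

```python
import json


def concat_jsons(paths):
    """Read one JSON value from each file and return them as a JSON array string."""
    values = []
    for path in paths:
        with open(path, encoding="utf-8") as source:
            values.append(json.load(source))
    return json.dumps(values)
```

Calling concat_jsons with the paths of several statistics files returns a single JSON array whose elements follow the order of the given paths.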
New functionality, improvements - Examples
- Self contained example script benchmark.sh improved. See usage example in document Profiling and execution statistics for details.
- Download script download-molecules.sh added to download and prepare public molecule sets. For usage help launch with option -h. For details of the downloaded sets see document Prepare molecules.
New functionality, improvements - REST API
- REST API endpoint molconverter/convert for structure conversion with optional 2D clean is added. See documentation.
- REST API endpoint molconverter/convert POST request with JSON request body added. For details see documentation of the endpoint and the request JSON. Note that this endpoint accepts GET requests (with URL encoded query parameters) and POST requests either with application/x-www-form-urlencoded parameters or application/json request objects.
- REST API endpoint /descriptors/{desc}/distribution for dissimilarity distribution calculation is added. See documentation.
New functionality, improvements - Descriptors API
- MACCS-166 fingerprint implementation is added.
- Interface com.chemaxon.descriptors.common.unguarded.UnguardedContext exposes the associated DescriptorComparator. Note that methods extractor and comparator are renamed to unguardedExtractor and unguardedComparator.
Major/incompatible changes in the REST API
- For POST requests REST API endpoints molconverter/cxformat and molconverter/cxbinformat expect all parameters as application/x-www-form-urlencoded form parameters. In the previous versions the structure source was expected as the request body (as text/plain), the further parameters were expected as query parameters, similar to the GET requests. From this version the structure is also expected as a form parameter with name mol.
- For POST requests REST API endpoints descriptors/{desc}/find-most-similars and descriptors/{desc}/find-most-similars-by-descriptor expect all parameters as application/x-www-form-urlencoded form parameters. In the previous versions the query structure/descriptor source was expected as the request body (as text/plain), the further parameters were expected as query parameters, similar to the GET requests. From this version these are also expected as form parameters with name query / query-descriptor.
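The difference between the previous and the current POST conventions can be sketched in Python for molconverter/cxformat. The snippet only builds the two request shapes and sends nothing; the host, port and rest prefix in the URL are hypothetical, and further endpoint parameters are omitted because their names are not listed here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def old_style_post(url, structure):
    """Previous versions: structure sent as the raw text/plain request body."""
    return Request(
        url,
        data=structure.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def new_style_post(url, structure):
    """This version: structure sent as form parameter 'mol'."""
    return Request(
        url,
        data=urlencode({"mol": structure}).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical server location.
url = "http://localhost:8080/rest/molconverter/cxformat"
old = old_style_post(url, "CC(=O)O")
new = new_style_post(url, "CC(=O)O")
print(old.data)  # raw structure source
print(new.data)  # URL encoded form body carrying the 'mol' parameter
```

Clients written against the old convention need to move the structure (and, for the similarity search endpoints, the query or query-descriptor value) from the raw body into the form body.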
Major/incompatible changes in the underlying Overlap/Descriptors API
- In interface DescriptorGenerator method contextFactory() is removed.
- In interface DescriptorGenerator method comparisonContextFactory() is added.
- In interface DescriptorComparator method unguardedContext() is added.
- MDTableReader API exposes a compatible descriptor generator for the deserialized descriptors with method getDescriptorGenerator(). The returned generator is intended to be used as the factory of descriptor comparators (either through its comparisonContextFactory or by its direct factory methods). Method getDefaultComparator is removed. From implementations (CfpTableReader, EcfpTableReader and PfTableReader) additional comparator factory methods are also removed.
- Deprecated method getDescriptorGenerator in interface com.chemaxon.descriptors.common.Descriptor is removed.
Other improvements
- Output of option -prof contains execution statistics. See document "Profiling and execution statistics" for details.
- Version information is available in the JAVA API com.chemaxon.overlap.version.OverlapVersion. Version info is exposed in execution statistics. For details see apidoc.
Version 0.1.3 (2015-07-21)
Bugfixes
- Missing descriptions for some result elements in the Enunciate REST API documentation is fixed. (See example.)
New functionality, improvements - Command line tools
- Tool gui parameters -sslkeystore and -sslkeystorepass specify SSL keystore. When a keystore is specified the embedded server accepts https connections. For security concerns of this version see issues documentation. For usage example see REST API example documentation.
- Tool gui parameter -port can accept value 0 to use any available port. Allocated port number is printed to the console.
- Tool gui can import ID-s with no attached molecules using option -idonly. The created molecule storage will store the read ID-s but all the molecules are marked as absent. This makes it possible to import custom descriptors without attached molecules. See REST API example documentation for an example. Self contained example script rest-api-vitamins.sh also contains this modification.
- Tools createMms and buildStorage can write performance statistics using option -stat. See Basic search workflow as an example.
- Tool stdg accepts parameters -cfgstring, -slowout and -slowlimit.
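The -port 0 behavior follows a general operating system convention: binding a socket to port 0 lets the system pick a free port, which the program can then read back and report. The Python sketch below illustrates only this convention and is unrelated to how the embedded server of tool gui is implemented; the function name bind_any_port is hypothetical.

```python
import socket


def bind_any_port(host="127.0.0.1"):
    """Bind a listening TCP socket to a port chosen by the operating system."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "any available port"
    sock.listen()
    return sock


server = bind_any_port()
# Read back the allocated port so that it can be printed for the user.
port = server.getsockname()[1]
print(f"listening on port {port}")
server.close()
```

Scripts that launch the server this way need to parse the printed port number before issuing requests.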
New functionality, improvements - REST API
- Error handling in the REST API is improved. Further information is available in the JAVA API documentation. See example and description of the error description object returned.
- Diagnostic REST API endpoint generate-error-response added. See documentation.
- REST API endpoint molecules/{set}/{index}/png-or-placeholder added. See documentation.
- Tool importStorage parameter -infilter is added. See custom float descriptors for an example on usage.
- REST API endpoint descriptors/{desc}/find-most-similars-by-id for launching similarity search against a structure contained by the attached molecule storage is added. See documentation.
- REST API endpoint descriptors/{desc}/find-most-similars-by-descriptor for launching similarity search against a String representation of a descriptor is added (both GET and POST supported). See documentation.
New functionality, improvements - Examples
- Self contained example script benchmark.sh added.
Major/incompatible changes in the underlying Overlap/Descriptors API
- DTOs used in the REST API are moved into package com.chemaxon.overlap.wui.dto. See package javadoc.
- Error handling changes in the REST API. Parse errors (molecules, descriptors) result in status 400 (Bad Request). Invalid references to molecule sets, descriptor sets; queries with no results typically result in status 404 (Not found).
Version 0.1.2 (2015-06-15)
Bugfixes
- Standardizer added to the descriptor parametrization example in the context concepts documentation.
New functionality, improvements
- Some of the self contained example scripts found in examples directory accept arguments which customize their behavior. Some of the self contained example scripts provide test mode. For details see their documentation.
- Tool gui has parameters -stopport and -stopsecret. For details see rest api example documentation.
- Glossary added to the documentations.
Changes of command line tools
- Self contained example scripts delete already existing log file instead of appending.
Major/incompatible changes in the underlying Overlap/Descriptors API
- Method descriptorComparator added to OverlapAnalysisContext. This allows specifying metric later.
Version 0.1.1 (2015-06-03)
Bugfixes
- Tool importStorage is available again.
- Custom binary and float descriptor workflow descriptions and example scripts are available again:
New functionality
- New helper functions ctx_from_desc, ctx_from_descpb and ctx_from_desc_comp in OverlapAnalysisContext customization scripting hooks.
- REST API documentation generated by Enunciate added.
- Java API documentation of classes defined in this distribution is added. Note that some of the classes appearing in this documentation might be unused, non complete or removed in any subsequent release.
- Tool createAllAbsentMms added.
- Tool importStorage parameter -aamms added.
- Concepts documentation of using OverlapAnalysisContext added
- Improved documentation
REST API changes
- DescriptorsResource.MostSimilarsResult.query might contain query ID or query descriptor
- DescriptorsResource.MostSimilarsResult.querysmi is optional; not filled when querying by descriptors
- DescriptorsResource.MostSimilarsResult.findMostSimilars count parameter is interpreted as max count; not recommended to use
- DescriptorsResource.MostSimilarsResult.findMostSimilars maxCount parameter is added.
- DescriptorsResource.MostSimilarsResult.findMostSimilars maxDissimilarity parameter is added.
- DescriptorsResource.MostSimilarsResult.findMostSimilarsPost count parameter is interpreted as max count; not recommended to use
- DescriptorsResource.MostSimilarsResult.findMostSimilarsPost maxCount parameter is added.
- DescriptorsResource.MostSimilarsResult.findMostSimilarsPost maxDissimilarity parameter is added.
- rest/molecules/{set}/{id} has query parameter smiles
- rest/molecules/{set}/find-id query method added
Changes of command line tools
- Tool importStorage parameter -tobytes is renamed to -out. OverlapAnalysisContext used for descriptor import is stored in the output binary file; it is not needed to specify for search.
- Diagnostic tool stdg has optional descriptor post processing scripting hook -processdesc.
Major/incompatible changes in the underlying Overlap/Descriptors API
- Method unguardedContext() added to interface com.chemaxon.descriptors.common.DescriptorComparator
- Method getUnguardedDissimilarityCalculator() added to interface com.chemaxon.descriptors.metrics.BinaryVectorComparator
- Class com.chemaxon.descriptors.common.binary.SimpleBinaryVectorComparator constructor needs associated guard object reference as parameter
- Class com.chemaxon.descriptors.common.realvector.SimpleFloatVectorComparator constructor needs float array size as parameter
- Method getGuardObject() added to interface com.chemaxon.descriptors.common.DescriptorComparator.
- Method getGuardObject() added to interface com.chemaxon.descriptors.common.unguarded.UnguardedExtractor.
- Method getGuardObject() added to interface com.chemaxon.descriptors.common.unguarded.UnguardedContext.
- Descriptor comparators check guard objects of descriptors compared against the associated DescriptorGenerator
- OverlapAnalysisContext unguarded handling defaults to the underlying DescriptorComparator.unguardedContext()
- com.chemaxon.overlap.io.MasterMoleculeStorage class is separated into interface com.chemaxon.overlap.io.MasterMoleculeStorage and implementation com.chemaxon.overlap.io.MasterMoleculeStorageImpl
- Contents of package com.chemaxon.overlap.persistence are moved to module overlap-core.
- Method isPresent(int) added to interface com.chemaxon.overlap.io.MasterStorage
- Method getSource(int) added to interface com.chemaxon.overlap.io.MasterMoleculeStorage
- Interface com.chemaxon.overlap.persistence.serialization.IndirectSerializable added and used by various storages
Version 0.1.0 (2015-05-11)
This version is incomplete, some of the functionalities, documentation and examples are in a work in progress state. For details see the issues document.
Bugfixes
- Memory leak in Java versions prior to 7u4 caused by String.substring() (see http://bugs.java.com/view_bug.do?bug_id=4513622) is worked around in splitters used in custom descriptor imports
New Functionality
- (Not available in this version) Visualization for self overlap / inter set overlap analysis is unified
- (Not available in this version) Visualization for most similar search / knn search is unified
- (Not available in this version) New overlap analysis visualization: knn map, deep zoom knn map
- (Not available in this version) Export of knn analysis visualization
- Improved documentation
Changes of command line tools
- Serialized formats changed; can not use binary files generated by prior versions
- Tool buildStorage option -tobytes is renamed to -out. OverlapAnalysisContext used for descriptor generation is stored in the output binary file; it is not needed to specify for search.
- Tool dumpStorage uses option -in to specify inputs which types are recognized. Options -mms, -mid and desc are removed. Options -context and -contextjs are also removed since context is stored with the descriptors.
- Thread pool size (option -tp) is set to the number of available processors
- Tool searchStorage does not need context to be specified since it is stored with the descriptors. Parameters -context and -contextjs are not available currently. Note that specifying dissimilarity metric in this release is not possible for this tool. Ordering of processing steps is changed; descriptors are read first.
- Visualization tools overlapGui, selfOverlapGui and realtimeSearch are unified into gui.
- For tool gui webapp can be specified with system property com.chemaxon.overlap.wui.webapp which can be overridden with option -webapp. This system property is initialized by default to the correct location.
- Self contained examples moved to directory examples from directory doc/examples.
- Self contained examples work in directory tmp created from the working directory
- Self contained examples changed for clarity
Major/incompatible changes in the underlying Overlap API
- Simplification of KnnResults, introduction of NnResults
- Using NnResults as the return type in most similar search against multiple queries
- IdProjector is renamed to IndexProjector to avoid confusion with IDs.
- Serialized formats changed. Using com.chemaxon.overlap.persistence.serialization.Deserializer based storage.
- Using com.chemaxon.overlap.persistence.storage.DescriptorContainer in command line tools
Version 0.0.8 - 0.0.12
These versions are used internally, no further detailed change log is available.
Version 0.0.7
New functionality
- Self overlap analysis introduced. See scripts self-overlap-example.sh, calculateSelfOverlap.sh and selfOverlapGui.sh
Major/incompatible changes
Major changes in the underlying Descriptors API resulting in the handling of Overlap analysis contexts:
- Unguarded for handling is part of the descriptors API
- Unguarded context and unguarded context factory introduced on the Descriptors API level
- It is recommended to use DescriptorGenerator.contextFactory() to acquire and parametrize OverlapAnalysisContext through API or JS hook.
- Possibly breaking changes in Descriptors / Overlap APIs
Incompatible overlap analysis context JS hook changes:
- Constants uge_fw, uge_lw (unguarded extractors), uge_bl_tanimoto, ugc_bl_manhattan, ugc_fv_euclidsqr, ugc_fv_manhattan, ugc_fv_maxdiff (unguarded comparators) are removed.
Deprecation
- CLI pdi.sh and class PagedDescriptorImport are not recommended for usage/as example.
