Asynchronous search tasks

Up to MadFast version 0.3.3, similarity search requests to the REST API were processed synchronously only: when the server received the request it launched the search, and the response was composed from the search results. Such calls offer no way to cancel a long-running search or to observe its progress. The synchronous endpoints are still available; however, new, experimental asynchronous capabilities were introduced in 0.3.4.

Please note that asynchronous task handling is currently highly experimental. Incompatible changes are expected in future releases. Please contact us if you plan to use these asynchronous endpoints, so that we can discuss your use case and compatibility requirements.

For reference, a typical synchronous call to the REST API looks like this:

Sync call

Key concepts for asynchronous tasks

Overview of an asynchronous call:

Async call

Additional details

Web UI usages

The similarity search results and dissimilarity distribution display components use the asynchronous endpoints rest/descriptors/{desc}/find-most-similars-async and rest/descriptors/{desc}/distribution-async.

The current Web UI displays a progress bar during longer searches and sends a cancel request when a new structure is entered.

REST API examples

The following REST API examples use a diagnostic tool, com.chemaxon.overlap.wui.SlowUnguardedComparator (part of the distribution tool codebase). When injected as a comparator, this class introduces a specified delay into every similarity comparison. The examples use the drugbank dataset (around 7k molecules).

Prepare a running server:

# Import MMS
bin/createMms.sh -in data/molecules/drugbank/drugbank-common_name.smi.gz -out mms.bin

# Calculate fingerprint; inject slow comparator wrapper as the default comparison
# Note that the similarity search internally will further group pages together
# Specify pagesize 1 in order to avoid grouping all targets together into a single page
bin/buildStorage.sh \
    -in data/molecules/drugbank/drugbank-common_name.smi.gz  \
    -out fp.bin \
    -context createSimpleCfp7Context \
    -contextjs 'ctx.pagesize(1).unguarded(
        ctx.getUnguardedExtractor(),
        new Packages.com.chemaxon.overlap.wui.SlowUnguardedComparator(
            ctx.getUnguardedDissimilarityCalculator(), 1
        )
    );'

# Launch embedded server
# Specify to use only a single working thread for search execution
bin/gui.sh \
    -tp 1 \
    -allowedOrigins "*,*" -nobrowse -port 8085 \
    -mols -mms:mms.bin:-name:m \
    -desc -desc:fp.bin:-name:slow-fp:-mols:m

Launch similarity search

echo "******************************************************"
echo "Launch request"
echo "******************************************************"
echo

# Note that we must send a JSON request object
# Option -sS makes curl hide its progress but show errors
curl \
    -sS \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{ "query":"C1CCCCC1", "maxCount":1 }' \
    "http://localhost:8085/rest/descriptors/slow-fp/find-most-similars-async" | python -m json.tool

# Get status immediately, then after 2 and after a further 10 seconds
echo "******************************************************"
echo "Task status after launched"
echo "******************************************************"
echo

# We "know" that the first assigned task ID (after server startup) will be "AR0000"
curl -sS "http://localhost:8085/rest/experimental-async-calls/AR0000" | python -m json.tool

sleep 2

echo "******************************************************"
echo "Task status while running"
echo "******************************************************"
echo

curl -sS "http://localhost:8085/rest/experimental-async-calls/AR0000" | python -m json.tool

sleep 10

echo "******************************************************"
echo "Task status after finished"
echo "******************************************************"
echo

curl -sS "http://localhost:8085/rest/experimental-async-calls/AR0000" | python -m json.tool

Output of the script above:

******************************************************
Launch request
******************************************************

{
    "error": null,
    "id": "AR0000",
    "partialResult": null,
    "result": null,
    "task": {
        "cancelled": false,
        "done": false,
        "id": "T0004",
        "name": "async-AR0000",
        "runningDurationMs": 1,
        "startTimeMs": 1568302605472,
        "totalWork": null,
        "workUnit": null,
        "worked": 0
    }
}

******************************************************
Task status after launched
******************************************************

{
    "error": null,
    "id": "AR0000",
    "partialResult": null,
    "result": null,
    "task": {
        "cancelled": false,
        "done": false,
        "id": "T0004",
        "name": "async-AR0000",
        "runningDurationMs": 54,
        "startTimeMs": 1568302605472,
        "totalWork": 7123,
        "workUnit": null,
        "worked": 0
    }
}

******************************************************
Task status while running
******************************************************

{
    "error": null,
    "id": "AR0000",
    "partialResult": null,
    "result": null,
    "task": {
        "cancelled": false,
        "done": false,
        "id": "T0004",
        "name": "async-AR0000",
        "runningDurationMs": 2072,
        "startTimeMs": 1568302605472,
        "totalWork": 7123,
        "workUnit": null,
        "worked": 1000
    }
}

******************************************************
Task status after finished
******************************************************

{
    "error": null,
    "id": "AR0000",
    "partialResult": null,
    "result": {
        "query": "C1CCCCC1",
        "querysmi": "C1CCCCC1",
        "searchtime": 7637,
        "targetcount": 7123,
        "targets": [
            {
                "base64img": null,
                "dissimilarity": 0.3333333333333333,
                "targetid": "MOLECULE-3252",
                "targetimageurl": "rest/molecules/m/3252/png-or-placeholder?w=0&h=0",
                "targetindex": 3252,
                "targetmolurl": "rest/molecules/m/3252"
            }
        ]
    },
    "task": {
        "cancelled": false,
        "done": true,
        "id": "T0004",
        "name": "async-AR0000",
        "runningDurationMs": 7665,
        "startTimeMs": 1568302605472,
        "totalWork": 7123,
        "workUnit": null,
        "worked": 7123
    }
}
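
The fixed sleeps and the hard-coded call ID in the script above can be replaced by a polling loop. The following is a minimal sketch, not part of MadFast: the function names (json_field, task_progress, fetch_status, wait_for_task) are hypothetical, and it assumes curl and python3 are available on the PATH. It relies only on the fields visible in the output above (id, task.done, task.cancelled, task.worked, task.totalWork). Stopping when task.cancelled becomes true is an assumption about how cancelled tasks are reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions (not part of MadFast); assumes curl and python3.

# Print one field of a JSON document read from stdin.
# $1 is a dotted path such as "task.done"; strings are printed unquoted.
json_field() {
    python3 -c '
import json, sys
value = json.load(sys.stdin)
for key in sys.argv[1].split("."):
    value = value[key]
print(value if isinstance(value, str) else json.dumps(value))
' "$1"
}

# Print the progress of a status document ($1) as a percentage,
# or "unknown" while totalWork is still null (as in the launch response).
task_progress() {
    local worked total
    worked=$(printf '%s' "$1" | json_field task.worked)
    total=$(printf '%s' "$1" | json_field task.totalWork)
    if [ "$total" = "null" ] || [ "$total" -eq 0 ]; then
        echo "unknown"
    else
        echo "$(( worked * 100 / total ))%"
    fi
}

# Fetch the current status document of an asynchronous call.
# $1 is the server base URL, $2 is the call ID (for example AR0000).
fetch_status() {
    curl -sS "$1/rest/experimental-async-calls/$2"
}

# Poll until task.done is true (exit 0) or task.cancelled is true (exit 2).
# Progress goes to stderr; the final status document goes to stdout.
# $3 is the polling interval in seconds (default 1).
wait_for_task() {
    local status done_flag cancelled
    while true; do
        status=$(fetch_status "$1" "$2") || return 1
        done_flag=$(printf '%s' "$status" | json_field task.done) || return 1
        cancelled=$(printf '%s' "$status" | json_field task.cancelled) || return 1
        if [ "$done_flag" = "true" ]; then
            printf '%s\n' "$status"
            return 0
        fi
        if [ "$cancelled" = "true" ]; then
            printf '%s\n' "$status"
            return 2
        fi
        echo "progress: $(task_progress "$status")" >&2
        sleep "${3:-1}"
    done
}

# Example usage against the server prepared above:
#   launch=$(curl -sS -X POST -H 'Content-Type: application/json' \
#       -d '{ "query":"C1CCCCC1", "maxCount":1 }' \
#       "http://localhost:8085/rest/descriptors/slow-fp/find-most-similars-async")
#   call_id=$(printf '%s' "$launch" | json_field id)
#   wait_for_task "http://localhost:8085" "$call_id" | json_field result.targets
```

Reading the call ID from the launch response avoids relying on the first ID after server startup being AR0000, which no longer holds once other tasks have been launched or if sequential ID assignment is turned off.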

Security considerations

Asynchronous call statuses are visible to every REST API client without authentication or authorization. By default, IDs are assigned sequentially and tasks can be listed. When the default settings are used, the details of launched tasks (including query structures) are visible to all REST API clients.

See the document REST API security considerations for additional details on the options to mitigate these risks (randomized task ID generation and disabling listing) using server feature flag options.