Asynchronous server loading

Up to MadFast version 0.3.3, the embedded server gui.sh started serving the REST API and static content only after all exposed data had been loaded into memory. For large deployments (exposing hundreds of gigabytes of data) this could take minutes. The only way to follow the progress of server initialization was to watch the periodic updates printed to the console.

MadFast version 0.3.4 introduced the command line option -earlyStart. When this option is specified, gui.sh launches the embedded Jetty HTTP server upon startup (within a few seconds) and continues to load resources (molecules, fingerprints, etc.) asynchronously in the background. The exposed REST API provides endpoints to follow the loading process and allows the WebUI to visualize it.

Example - self-contained examples

The self-contained example scripts rest-api-example.sh and rest-api-medium.sh accept the option -e. When it is specified, they invoke the embedded server (after preparation is done) with the option -earlyStart.

On a desktop, launch:

examples/rest-api-medium.sh -b -e

After several minutes of preprocessing (importing molecules, calculating fingerprints) a browser is opened and the loading process can be followed on the index page. Note that a manual page refresh is needed to update the resource lists. When the script is launched again, the preprocessed data is reused, so preprocessing is skipped.

Example - artificial slow loading resource

A test data generation tool (class com.chemaxon.overlap.wui.SlowlyDeserializingMms) is included in the distribution code base. When launched, this class writes a binary storage to its standard output, which the MadFast tools can read as a master molecule storage. Reading is intentionally slowed down, and the resulting storage contains a random number of random (and possibly chemically nonsensical) molecules. Writing such a storage once and having the embedded server read it multiple times is a simple way to produce long startup times (tens of seconds) without lengthy preprocessing or excessive memory usage:

# Create the binary storage
java -cp lib/classpath.jar com.chemaxon.overlap.wui.SlowlyDeserializingMms > a.bin

# Load the slowly loading storage multiple times
bin/gui.sh \
    -allowedOrigins "*,*" -nobrowse -earlyStart -port 8085 \
    -mols -mms:a.bin:-name:slow0 -mols -mms:a.bin:-name:slow1 -mols -mms:a.bin:-name:slow2 -mols -mms:a.bin:-name:slow3 \
    -mols -mms:a.bin:-name:slow4 -mols -mms:a.bin:-name:slow5 -mols -mms:a.bin:-name:slow6 -mols -mms:a.bin:-name:slow7 \
    -mols -mms:a.bin:-name:slow8 -mols -mms:a.bin:-name:slow9 -mols -mms:a.bin:-name:slowA -mols -mms:a.bin:-name:slowB \
    -mols -mms:a.bin:-name:slowC -mols -mms:a.bin:-name:slowD -mols -mms:a.bin:-name:slowE -mols -mms:a.bin:-name:slowF \
    -mols -mms:a.bin:-name:slowG -mols -mms:a.bin:-name:slowH -mols -mms:a.bin:-name:slowI -mols -mms:a.bin:-name:slowJ \
    -mols -mms:a.bin:-name:slowK -mols -mms:a.bin:-name:slowL -mols -mms:a.bin:-name:slowM -mols -mms:a.bin:-name:slowN \
    -mols -mms:a.bin:-name:slowO -mols -mms:a.bin:-name:slowP -mols -mms:a.bin:-name:slowQ -mols -mms:a.bin:-name:slowR \
    -mols -mms:a.bin:-name:slowS -mols -mms:a.bin:-name:slowT -mols -mms:a.bin:-name:slowU -mols -mms:a.bin:-name:slowV \
    -mols -mms:a.bin:-name:slowW -mols -mms:a.bin:-name:slowX -mols -mms:a.bin:-name:slowY -mols -mms:a.bin:-name:slowZ
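
The repetitive argument list above can also be generated programmatically. The following is a minimal sketch in Python, not part of the MadFast distribution: the function name build_gui_args is hypothetical, and only the options already shown in the command above are used.

```python
import string


def build_gui_args(storage="a.bin", port=8085):
    """Build the gui.sh argument list that loads one storage under 36 names."""
    # slow0..slow9 followed by slowA..slowZ, as in the command above
    names = ["slow" + c for c in string.digits + string.ascii_uppercase]
    args = ["bin/gui.sh", "-allowedOrigins", "*,*",
            "-nobrowse", "-earlyStart", "-port", str(port)]
    for name in names:
        args += ["-mols", f"-mms:{storage}:-name:{name}"]
    return args


args = build_gui_args()
print(len(args))  # 7 fixed arguments + 2 per storage name = 79
```

The resulting list can be passed to subprocess.run(args) from the MadFast distribution directory, which avoids shell quoting of the "*,*" argument.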

Connect to http://localhost:8085 to check the landing page during loading.

Index page during loading

Loading tasks details

REST API endpoints

Note that the related REST API endpoints are in an experimental state. They are expected to change in incompatible ways in the next few releases.

We can use curl to examine the server loading data returned by the statistics endpoint (this endpoint is used by the index page to show the cumulative loading progress):

curl -g "http://localhost:8085/rest/statistics" | python -m json.tool
{
    "loadingSuperTask": {
        "cancelled": false,
        "done": false,
        "id": "T0072",
        "name": "Loading resources",
        "runningDurationMs": 4704,
        "startTimeMs": 1565901020299,
        "totalWork": 72,
        "workUnit": "task",
        "worked": 4
    },
    "serverStartTimeMs": 1565901019331,
    "totaldescriptorcount": 0,
    "totalmoleculecount": 17229,
    "uptime": 5672,
    "version": "0.3.4-SNAPSHOT"
}
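
A script can use the same response to wait until loading has finished. The following Python sketch assumes the response layout shown above, which is experimental and may change. The function names, the polling interval and the timeout are illustrative choices and not part of MadFast.

```python
import json
import time
import urllib.request

# URL taken from the gui.sh example above (port 8085)
STATS_URL = "http://localhost:8085/rest/statistics"


def fetch_stats(url=STATS_URL):
    """Fetch and parse the statistics endpoint (requires a running server)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def loading_progress(stats):
    """Return (fraction, done) from a parsed statistics response.

    fraction is None when totalWork is missing, null or zero.
    """
    task = stats["loadingSuperTask"]
    total = task.get("totalWork")
    fraction = task["worked"] / total if total else None
    return fraction, task["done"]


def wait_until_loaded(fetch=fetch_stats, interval_s=1.0, timeout_s=600.0):
    """Poll until the super task reports done; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        _, done = loading_progress(fetch())
        if done:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Offline demonstration with the values from the response above
sample = json.loads(
    '{"loadingSuperTask": {"done": false, "totalWork": 72, "worked": 4}}')
fraction, done = loading_progress(sample)
print(f"{fraction:.1%} loaded, done={done}")  # 5.6% loaded, done=False
```

Calling wait_until_loaded() with no arguments polls the server started in the example above. The fetch parameter exists so the polling logic can be exercised without a running server.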

The detailed task list is returned by the loading-status endpoint (this endpoint is used by the details dialog):

curl -g "http://localhost:8085/rest/statistics/loading-status" | python -m json.tool
{
    "serverStartTimeMs": 1565901019331,
    "superTask": {
        "cancelled": false,
        "done": false,
        "id": "T0072",
        "name": "Loading resources",
        "runningDurationMs": 4889,
        "startTimeMs": 1565901020299,
        "totalWork": 72,
        "workUnit": "task",
        "worked": 4
    },
    "tasks": [
        {
            "cancelled": false,
            "done": true,
            "id": "T0000",
            "name": "Reading master molecule storage from a.bin",
            "runningDurationMs": 2455,
            "startTimeMs": 1565901020299,
            "totalWork": 2268,
            "workUnit": null,
            "worked": 2268
        },
        {
            "cancelled": false,
            "done": true,
            "id": "T0001",
            "name": "Creating masterStorage index",
            "runningDurationMs": 5,
            "startTimeMs": 1565901022804,
            "totalWork": null,
            "workUnit": null,
            "worked": 0
        },
        {
            "cancelled": false,
            "done": true,
            "id": "T0002",
            "name": "Reading master molecule storage from a.bin",
            "runningDurationMs": 1617,
            "startTimeMs": 1565901022809,
            "totalWork": 1489,
            "workUnit": null,
            "worked": 1489
        },
        {
            "cancelled": false,
            "done": true,
            "id": "T0003",
            "name": "Creating masterStorage index",
            "runningDurationMs": 1,
            "startTimeMs": 1565901024476,
            "totalWork": null,
            "workUnit": null,
            "worked": 0
        },
        {
            "cancelled": false,
            "done": false,
            "id": "T0004",
            "name": "Reading master molecule storage from a.bin",
            "runningDurationMs": 710,
            "startTimeMs": 1565901024477,
            "totalWork": 1571,
            "workUnit": null,
            "worked": 624
        },
        {
            "cancelled": false,
            "done": false,
            "id": "T0005",
            "name": "Creating masterStorage index",
            "runningDurationMs": 0,
            "startTimeMs": null,
            "totalWork": null,
            "workUnit": null,
            "worked": 0
        },
        .....
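
The task entries in the output above can be grouped by state to get a compact overview. The following Python sketch relies on one assumption that the source does not state explicitly: a task with a null startTimeMs is treated as not yet started. The function names are hypothetical and the response format is experimental.

```python
from collections import Counter

# Three entries copied from the response above, reduced to the fields used here
SAMPLE_STATUS = {
    "tasks": [
        {"id": "T0003", "cancelled": False, "done": True,
         "startTimeMs": 1565901024476},
        {"id": "T0004", "cancelled": False, "done": False,
         "startTimeMs": 1565901024477},
        {"id": "T0005", "cancelled": False, "done": False,
         "startTimeMs": None},
    ]
}


def task_state(task):
    """Classify one task entry as cancelled, done, pending or running."""
    if task["cancelled"]:
        return "cancelled"
    if task["done"]:
        return "done"
    # Assumption: a null startTimeMs means the task has not started yet
    if task["startTimeMs"] is None:
        return "pending"
    return "running"


def summarize_tasks(status):
    """Count the tasks of a parsed loading-status response by state."""
    return Counter(task_state(t) for t in status["tasks"])


print(dict(summarize_tasks(SAMPLE_STATUS)))
# {'done': 1, 'running': 1, 'pending': 1}
```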