Files in Data directory
Individual files
- Descriptor configuration files cfp-7-1.xml, ecfp-4.xml are based on samples provided in JChem distribution (in jchem/examples/config/).
- Text file floatdesc.txt contains an example 2D float vector descriptor in line format <ID> <X> <Y>.
- Text file floats-1d.txt contains an example 1D float vector descriptor in line format <ID> <X>. This file contains empty lines and comment lines starting with character #. The textual <ID> field reflects the actual value <X> for each item.
- Details on script sanitize-prof.js can be found in document Profiling and execution statistics.
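The line formats of the float descriptor files described above can be read with a few lines of code. The following is a minimal sketch, not part of the distribution: the function name parse_float_descriptors and the sample lines are hypothetical, fields are assumed to be separated by whitespace, and empty lines and # comment lines are skipped for both the 1D and the 2D format (the text above confirms them only for floats-1d.txt).

```python
def parse_float_descriptors(lines, dim):
    """Parse lines of the form '<ID> <X> ...' into (id, [floats]) pairs.

    dim is the number of float components expected after the ID
    (1 for '<ID> <X>', 2 for '<ID> <X> <Y>').
    """
    records = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip empty lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # assumption: whitespace-separated fields
        if len(fields) != dim + 1:
            raise ValueError(
                f"line {line_no}: expected {dim + 1} fields, got {len(fields)}"
            )
        records.append((fields[0], [float(v) for v in fields[1:]]))
    return records


# Hypothetical sample data, not the contents of the shipped files.
sample_1d = [
    "# comment line",
    "",
    "0.5 0.5",
    "1.25 1.25",
]
parsed_1d = parse_float_descriptors(sample_1d, dim=1)
parsed_2d = parse_float_descriptors(["a 1.0 2.0"], dim=2)
```

To read an actual file, pass an open file object as the first argument, for example `parse_float_descriptors(open("floats-1d.txt"), dim=1)`.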
Frontend customization examples (jsclient-additional/)
For details see document Using MadFast Web UI JS library (experimental).
Molecule sets (molecules/)
Details on these and further publicly available molecule sets can be found in document Prepare example molecule sets.
- File molecules/vitamins/vitamins.smi contains example molecules from http://en.wikipedia.org/wiki/Vitamins.
- File molecules/antibiotics/antibiotics.smi contains a list of antibiotics based on the "By class" table at https://en.wikipedia.org/wiki/List_of_antibiotics.
- File molecules/who-essential-medicines/who-essential-medicines.smi contains structures from the WHO Model List of Essential Medicines (*adult list* of the 19th edition, April 2015), based on https://en.wikipedia.org/wiki/WHO_Model_List_of_Essential_Medicines.
- File molecules/vitamins/vitamins-mod.smi contains modified versions of the molecules in file vitamins.smi. A single carbon atom ("C") is added to the beginning of the SMILES strings in the source file (this results in valid but slightly different SMILES sources). String fragment "- MODIFIED" is also appended to the names. This file can be used to benchmark search against larger public databases.
- Directory molecules/drugbank/ contains the DrugBank Open Data dataset, available from http://www.drugbank.ca/releases/latest#open-data under the Creative Commons CC0 International License.
- Directory molecules/nci/ contains the NCI Release 1 dataset, available from http://cactus.nci.nih.gov/download/nci/. According to the publisher of this dataset, the structures are in the public domain.
- Directory molecules/chembl/ contains ChEMBL compounds from http://www.ebi.ac.uk/chembl. The version of ChEMBL is chembl_21. For details see file molecules/chembl/README.html. Please note that structure CHEMBL2146209 (https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL2146209) is omitted from the SMILES version.
- Directory molecules/pubchem-compound/ contains 1k, 10k and 100k random subsets from the PubChem Compound database.
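The modification described for vitamins-mod.smi (a "C" prepended to each SMILES string and a fragment appended to each name) can be illustrated as follows. This is only a sketch of the described transformation, not the script actually used to produce the file: the function name modify_smiles_line is hypothetical, each line is assumed to hold a SMILES string followed by whitespace and a name, and the leading space in the appended fragment and the single-space output separator are assumptions.

```python
def modify_smiles_line(line, suffix=" - MODIFIED"):
    """Prepend a carbon atom to the SMILES and append a suffix to the name.

    Assumes the line format '<SMILES> <name>'; the separator written to
    the output (a single space) is an assumption.
    """
    parts = line.strip().split(None, 1)
    if not parts:
        return ""  # empty line: nothing to modify
    smiles = "C" + parts[0]
    if len(parts) == 1:
        return smiles  # no name present, so there is nothing to append to
    return f"{smiles} {parts[1]}{suffix}"


# Hypothetical input line, not taken from vitamins.smi.
modified = modify_smiles_line("CCO ethanol")
```

With the hypothetical input above, the SMILES CCO becomes CCCO and the name ethanol becomes "ethanol - MODIFIED".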