As promised in a comment on Google+, I have now run a quick USRCAT search with the most potent OSDD Malaria compound (OSM-S-39) against the ZINC database.

I previously generated (up to 20) conformers for all compounds in the ZINC drug-like set and calculated their USRCAT moments, leading to a PostgreSQL schema with partitions for each conformer serial (0-19). Using the pgopeneye cartridge, I only had to plug in the SMILES of compound OSM-S-39 to generate a conformer, to calculate the USRCAT moments and to run the virtual screen. In this particular case I only searched the low-energy conformers (conformer serial=0) as they return the best results in my experience. The total number of screened ZINC compounds was 12,454,678. The SQL query I used can be seen below:

The whole query takes less than 2 seconds, with almost 1.5s spent on the more demanding conformer generation (the database I/O takes a large toll as well). I did four runs with different USRCAT parameters: the first with all pharmacophore weights set to 0.0, which is equivalent to a USR search, the second with 0.25 for all weights, the third with 0.5 and the fourth with 1.0. The screening results for each run can be seen and downloaded here from figshare. Maybe someone else will find something useful in the results!
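To make the effect of the pharmacophore weights concrete, here is a minimal Python sketch of a weighted USRCAT-style similarity. It is an illustration under stated assumptions, not the pgopeneye implementation: the function name `usrcat_similarity` is made up, the 60 moments are assumed to be ordered as five blocks of 12 (all atoms first, then the four pharmacophore subsets), and the division by 12 follows the original USR score and may differ from what the cartridge does.

```python
def usrcat_similarity(query, target, all_atom_weight=1.0,
                      pharmacophore_weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted Manhattan-distance similarity between two 60-moment vectors.

    Assumption: moments come as five blocks of 12, the all-atom block first,
    followed by one block per pharmacophore subset.
    """
    if len(query) != 60 or len(target) != 60:
        raise ValueError("expected 60 moments per molecule")
    if len(pharmacophore_weights) != 4:
        raise ValueError("expected four pharmacophore weights")

    weights = (all_atom_weight,) + tuple(pharmacophore_weights)
    total = 0.0
    for block, weight in enumerate(weights):
        start = block * 12
        # Manhattan distance between the 12 moments of this block
        distance = sum(abs(q - t) for q, t in
                       zip(query[start:start + 12], target[start:start + 12]))
        total += weight * distance

    # Assumed normalisation: divide by 12 as in the plain USR score
    return 1.0 / (1.0 + total / 12.0)
```

With all four pharmacophore weights set to 0.0 only the all-atom block contributes, which is why that run behaves like a plain USR search; larger weights penalise differences in the pharmacophore subsets more strongly.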

Feb 26, 2013

Epigenetic drugs in CREDO

The latest issue of Nature Reviews Drug Discovery contains a research highlight about a fairly new class of drugs that can act as modulators of epigenetic proteins. More specifically, it summarises this article in Nature Chemical Biology that describes the Discovery of a chemical probe for the L3MBTL3 methyllysine reader domain. A quick UniProt search in CREDO reveals that PDB entry 4FL6 belongs to this publication and contains the complex between L3MBTL3 and the novel inhibitor UNC1215. The assembly can be visualised here with the interactions of one of the UNC1215 ligands (4FL6/1/A/UWN1002). The visualisation of the assembly also reveals three SNPs that are in close contact with the ligand; one of them, rs80129948 (D274N), which forms a side-chain hydrogen bond, is predicted to be damaging.

As a bonus, here is another epigenetic drug candidate in CREDO, a small-molecule inhibitor of BRDT that causes a reversible contraceptive effect. The journal article Small-Molecule Inhibition of BRDT for Male Contraception can be accessed here. The inhibitor JQ1 in PDB entry 4FLP can be seen here (and the assembly visualised as well). The visualisation will also show the rather (very) unusual 3D conformation of the ligand.

After seeing screening results from the OSDD Malaria project showing up on Twitter, I decided to use USRCAT to see if there are any similar compounds already in CREDO (PDB). I therefore simply downloaded the compounds from the post on Google+ that contains the latest screening hits with the help of and picked the one with the highest activity (OSM-S-35, I think). OpenEye's OMEGA will complain about undefined isomers, so the two cis/trans isomers of OSM-S-35 have to be enumerated first:

Cc1cc(c(n1c2ccccc2)C)/C=C\3/C(=N/C(=N\c4ccccc4)/S3)O
Cc1cc(c(n1c2ccccc2)C)/C=C\3/C(=N/C(=N/c4ccccc4)/S3)O

The Lowest Energy Conformer (LEC) of each isomer can then be used to do a USRCAT search with the help of the CREDO web service:

There are definitely some interesting hits given the relatively small number of chemical components in the PDB (< 16k). The top-ranked hit from the first isomer, BEY, has potent in vitro and in vivo antimalarial activity, and the complex with M1 Alanylaminopeptidase can be seen in PDB entry 3EBI. Further notable hits within the top 25 for this isomer include LL3, which inhibits EthR from Mycobacterium tuberculosis (PDB entry: 3Q0U). For the other isomer, the third-ranked hit R2C and 951 are both inhibitors of dihydroorotate dehydrogenase, which has been suggested as a target for the treatment of bacterial and fungal infections. FMX (Famoxadone) is a fungicide, as is MQ0, which inhibits Scytalone dehydratase. The other retrieved compounds were mostly inhibitors of various kinases, Hepatitis C polymerases, thrombins and Factor X, which is not surprising given the target bias in the PDB. The 25 best screening hits for each isomer can be downloaded here from figshare.

This was only a very quick test of USRCAT and an example of how to use the web service programmatically. More interesting certainly would be to run all the OSDD Malaria screening hits against ChEMBL and compound databases such as ZINC.

There seems to have been interest in the previous post that described some of the credoscript features with the help of an example. I therefore thought readers might be interested in another elaborate credoscript example that shows more advanced features of CREDO, PostgreSQL and SQLAlchemy.


This post is going to explain some of the features of the PostgreSQL database and the SQLAlchemy Object Relational Mapper (ORM) that are used in credoscript, the CREDO Application Programming Interface (API). The credoscript API is written in the Python programming language and is used to simplify access to the database and to facilitate the analysis of the data in CREDO. There have been many complaints about ORMs in the past and about why they should not be used. Some of these were certainly justified, owing to limitations imposed by the ORM, for example insisting that the primary key must be an integer and called id, and other nonsense like this. A well-designed ORM (i.e. one written by someone who actually knows SQL) can however be very powerful, particularly if it is combined with an equally powerful database management system such as PostgreSQL. An example that demonstrates a couple of PostgreSQL and SQLAlchemy features used in the credoscript API is explained in detail below.


Just before the Christmas break I decided to change the visualisation style of interatomic contacts in the WebGL representation of PDB structures in the CREDO web interface. So far, contacts have been rendered as simple lines that are hard to see and do not offer any styling options. I therefore decided to implement contacts (similar to PyMOL) as dashed cylinders with optional parameters for gaps, radii, length and colour to highlight contact types and distances. The actual implementation was relatively straightforward after studying the examples in the Three.js documentation. An example of what the contact visualisation looks like now can be seen in the screenshot below for PDB:3N7S.

WebGL representation showing the interatomic contacts of the protein ligand complex in PDB entry 3N7S as PyMOL-like dashes instead of simple lines.
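The geometry behind a dashed contact is simple enough to sketch. The Python snippet below is not the code used in the CREDO web interface (that is written against Three.js); it only illustrates, with a made-up function name `dash_segments` and made-up parameter names and defaults, how the line between two atoms can be cut into dash segments, each of which would then be drawn as a short cylinder.

```python
import math


def dash_segments(start, end, dash_length=0.25, gap_length=0.15):
    """Split the line from start to end into (p0, p1) dash segments.

    start and end are 3D coordinates; the last dash is clipped at end.
    """
    if dash_length <= 0 or gap_length < 0:
        raise ValueError("dash_length must be > 0 and gap_length >= 0")

    delta = [e - s for s, e in zip(start, end)]
    total = math.sqrt(sum(d * d for d in delta))
    if total == 0:
        return []  # both atoms at the same position: nothing to draw

    unit = [d / total for d in delta]  # direction of the contact
    segments = []
    offset = 0.0
    while offset < total:
        stop = min(offset + dash_length, total)
        p0 = tuple(s + u * offset for s, u in zip(start, unit))
        p1 = tuple(s + u * stop for s, u in zip(start, unit))
        segments.append((p0, p1))
        offset = stop + gap_length  # skip the gap before the next dash
    return segments
```

Radius and colour would then be properties of the cylinders drawn for each segment, which is what makes it possible to style contacts by type or distance.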

I recently re-implemented, as a PostgreSQL (Python) database function, an old function that I wrote three years ago for a paper to determine the Maximum Common Substructure (MCS) in a large set of molecules. It is based on the OEMCSS function from the OEChem toolkit and is therefore very concise: the two functions that are used come to only 46 lines of combined code without whitespace and comments. More importantly, multimcs is also very fast, as shown by the example below, which uses the benzotriazole data set from Andrew Dalke's fmcs repository. The speed obviously depends on the MCS search parameters: the looser they are (atom & bond expression), the longer it will take for the function to find the MCS (if any).

The usefulness of the database implementation becomes more apparent when the multimcs function is used on a large, real-life data set. The example below determines the MCS for each assay in ChEMBL against target P53779, Mitogen-activated protein kinase 10 (JNK3). Only results having an MCS of at least five atoms are considered, and only assays with at least five compounds.

The multimcs function supports a few parameters for MCS searching. That is also why the function is not implemented as a true aggregate function, because aggregates do not support parameters. The important parameters are those of the internally used OEMCSS function: atomexpr, bondexpr, mcstype (approximate or exhaustive) and the MCS scoring function, scorefunc. The atom and bond expression bit masks make multimcs very flexible because any combination found in the respective namespaces can be used. The function also supports partial MCS searches through a threshold, e.g. only return the MCS that is shared by 80% of the compounds. The query below shows the result for the same query as above but now with a threshold introduced:

Below is a short screencast that shows the 3D visualisation options for small molecules and protein-ligand complexes on the upcoming CREDO website. These visualisations use the WebGL-based GLmol, which itself uses Three.js. It is very lightweight (around 400 KB) and, as can be seen with the protein-ligand complex (2527 atoms), lightning quick. I extended GLmol to be able to display interatomic interactions from CREDO and to add text labels to specific atoms. The default representation for a protein-ligand complex not only shows the interactions, but also highlights binding site-lining residues that are linked to mutations (shown in pale red with dbSNP label).
