Atom typing on natural products with the CDK

While working on natural products, I analysed a proprietary database of more than 180000 molecules with the help of CDK-Taverna. The aim of the work is a diversity analysis of natural products, for which I need to calculate a number of molecular descriptors with the CDK. Before calculating the descriptors, I had to check whether the CDK can handle these molecules at all. I therefore developed the following workflow to test how the CDK performs on natural products, in particular how many wrong or missing atom types it reports for this database.

The workflow is available on myExperiment.org.

For the proprietary database of natural products, which contains over 180000 molecules, the CDK had problems with only 1350 molecules. That is less than 0.8 % of the molecules. These molecules contain 1854 wrong or missing atom types.
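As a quick sanity check on the figures quoted above, here is a minimal Python sketch of the arithmetic. The constant and function names are my own, and because the database holds "over 180000" molecules, the total used here is a lower bound, so the computed rate is an upper bound.

```python
# Figures quoted in the post. The molecule total is a lower bound
# ("over 180000"), so the resulting percentage is an upper bound.
TOTAL_MOLECULES = 180_000
PROBLEM_MOLECULES = 1_350
FAILED_ATOM_TYPES = 1_854


def failure_rate_percent(problem: int, total: int) -> float:
    """Share of molecules with at least one atom typing problem, in percent."""
    return 100.0 * problem / total


def failures_per_problem_molecule(failed: int, problem: int) -> float:
    """Average number of failed atom types per affected molecule."""
    return failed / problem


rate = failure_rate_percent(PROBLEM_MOLECULES, TOTAL_MOLECULES)
per_molecule = failures_per_problem_molecule(FAILED_ATOM_TYPES, PROBLEM_MOLECULES)

print(f"{rate:.2f} % of molecules affected")
print(f"{per_molecule:.2f} failed atom types per affected molecule")
```

With these inputs the rate stays below the 0.8 % stated in the text, and each affected molecule has between one and two failing atoms on average.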

The large number of wrong nitrogens is mainly caused by five-bonded nitrogens, for which the CDK currently has no atom type defined.
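To illustrate why a missing atom type definition shows up as a failure, and how such failures can be tallied per element, here is a small Python sketch. It is a toy model, not the CDK's atom typing code: the `KNOWN_TYPES` table, the representation of an atom as an (element, bond order sum) pair, and the example molecules are all assumptions made for illustration. I also assume that "five-bonded" means the bond orders on the nitrogen sum to five.

```python
from collections import Counter

# Toy stand-in for an atom type table: (element, bond order sum) pairs
# that have a type defined. Illustrative only, NOT the CDK's real list.
KNOWN_TYPES = {
    ("C", 4),
    ("N", 3), ("N", 4),
    ("O", 2),
}


def untyped_atoms(atoms):
    """Return the atoms for which the table has no matching entry."""
    return [atom for atom in atoms if atom not in KNOWN_TYPES]


def failures_by_element(molecules):
    """Count untyped atoms per element over a collection of molecules."""
    counts = Counter()
    for atoms in molecules:
        for element, _bond_order_sum in untyped_atoms(atoms):
            counts[element] += 1
    return counts


# Hypothetical input: each molecule is a list of (element, bond order sum).
example_molecules = [
    [("C", 4), ("N", 5), ("O", 2)],  # contains a five-bonded nitrogen
    [("N", 5), ("N", 3)],            # one five-bonded, one ordinary nitrogen
    [("C", 4), ("O", 2)],            # every atom has a type
]

print(failures_by_element(example_molecules))
```

Because the table has no entry for a nitrogen with a bond order sum of five, every such atom is counted as a failure, which is the pattern described for the nitrogens here.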
