Getting Started with DeepChem

Like many fields, chemistry is in the midst of a machine learning transformation. Chemistry also has some peculiarities that make getting started with machine learning a challenge. What would be helpful is a workbench that makes it possible to conduct simple but illustrative studies with minimal ceremony. DeepChem is a batteries-included suite that seeks to fill this need. This article describes DeepChem's installation, and works through an example showing one way to train a random forest model to predict experimental solubility data.
About DeepChem
From the README, DeepChem provides:
… a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology.
DeepChem goes beyond merely aggregating packages in that it also provides a suite of uniform machine learning primitives customized for chemistry and biology. For this reason, it can be useful to both beginners and experts.
Installation
The README recommends running DeepChem through Google Colab. In my experience, that approach seems less than ideal because it requires the re-installation of at least one dependency (RDKit) before each use. There are other reasons for running DeepChem on your own machine. For example, your use case may be more sophisticated than what Jupyter Notebooks allow.
A previous article described the installation of a cheminformatics stack consisting of RDKit, Jupyter, and Anaconda. The procedure for installing DeepChem re-uses many elements from that approach. If you haven't done so already, begin by installing Anaconda on your system.
My approach installs DeepChem, RDKit, Jupyter and matplotlib into an Anaconda instance. I've found it to be reproducible on my macOS Mojave installation:
conda create --name deepchem-test
conda activate deepchem-test
conda install -y -c conda-forge rdkit nb_conda_kernels matplotlib
pip3 install tensorflow==2.2.0
pip3 install --pre deepchem
Three points are worth noting:
- pip3 is my system's Python 3 Pip installation. You may be able to use just pip.
- It's important to install Pip dependencies after the Anaconda dependencies, and while the deepchem-test environment is activated.
- The DeepChem pre-release is required. The last stable release of DeepChem was one year ago (2.3.0). Unfortunately, all of DeepChem's examples are written to use the most recent API, which is not backward-compatible. To avoid this headwind, go with the pre-release.
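Before moving on, it can help to confirm that the activated environment actually sees the packages installed above. The following is a minimal sketch, not part of the DeepChem documentation. It assumes the top-level module names are deepchem, rdkit, tensorflow, and matplotlib, and the helper find_missing is a hypothetical name introduced here for illustration. It only checks that each module can be located, without importing it.

```python
from importlib.util import find_spec

# Assumed top-level module names for the packages installed above.
REQUIRED_MODULES = ["deepchem", "rdkit", "tensorflow", "matplotlib"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing from this environment:", ", ".join(missing))
else:
    print("All expected modules were found.")
```

If anything is reported missing, the most likely cause is that the deepchem-test environment was not active when the install commands were run.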
Starting a Notebook
Jupyter notebooks ("notebooks") offer a convenient alternative to the Python REPL thanks to inline graphics and publication capabilities. For this reason, the DeepChem example that follows will be presented as a notebook.
By way of preparation, ensure that your terminal prompt is prefixed with (deepchem-test), or the name you chose for the Anaconda environment. If this isn't the case, activate the environment before starting Jupyter:
$ conda activate deepchem-test
(deepchem-test) $ jupyter notebook
Example
What follows is a transcript of a notebook I created while working through the DeepChem project's aqueous solubility tutorial. The goal is not to develop a practical model, but rather to illustrate some of DeepChem's capabilities. For a more technical treatment of solubility modeling, see the article by Pat Walters. More examples are available from the DeepChem repository. An entire subdirectory of examples is dedicated to the Delaney solubility set.
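As a rough orientation, a random forest workflow on the Delaney set might look something like the sketch below. This is not the notebook itself. It assumes a recent DeepChem API (dc.molnet.load_delaney, dc.models.SklearnModel, dc.metrics.Metric), circular fingerprint (ECFP) features, and scikit-learn's RandomForestRegressor. The function name train_random_forest and its parameters are hypothetical, and details may differ between DeepChem versions.

```python
def train_random_forest(n_estimators=100, seed=0):
    """Train a random forest on the Delaney solubility set; return R^2 scores."""
    # Imported here so that defining the function does not itself require
    # DeepChem or scikit-learn to be installed.
    import deepchem as dc
    from sklearn.ensemble import RandomForestRegressor

    # Load the Delaney dataset featurized as ECFP fingerprints. The loader
    # returns the task names, a (train, valid, test) tuple, and the
    # transformers that were applied to the data.
    tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="ECFP")
    train, valid, test = datasets

    # Wrap the scikit-learn estimator so it works with DeepChem datasets.
    forest = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model = dc.models.SklearnModel(forest)
    model.fit(train)

    # Score with the squared Pearson correlation on both splits.
    metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
    return {
        "train": model.evaluate(train, [metric], transformers),
        "test": model.evaluate(test, [metric], transformers),
    }
```

Calling train_random_forest() in a notebook cell downloads the dataset on first use and returns a dictionary of scores for the training and test splits. A large gap between the two is a sign of overfitting.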
Video Tutorial
A video tutorial from Jan Jansen's helpful YouTube channel explains how to use graph convolution with DeepChem to predict aqueous solubility (to better effect than random forest).
Conclusion
DeepChem is a suite of machine learning primitives geared toward chemistry and biology. In addition to a wide range of functionality, DeepChem offers many examples illustrating how to build and use predictive models with datasets containing chemical graphs. This article shows how to install DeepChem from scratch and works through the training of a model for aqueous solubility step-by-step.
Additional details
Identifiers
- UUID: c93c4e94-c957-46f7-8e6d-8c31141835b6
- GUID: https://depth-first.com/articles/2020/09/14/getting-started-with-deepchem/
- URL: https://depth-first.com/articles/2020/09/14/getting-started-with-deepchem
Dates
- Issued: 2020-09-14T14:00:00
- Updated: 2020-09-14T14:00:00