Harvard researchers use AI to find active areas in cell DNA

Researchers from Nvidia and Harvard are publishing research on a new way they've applied deep learning to epigenomics, the study of modifications on the genetic material of a cell. Using a neural network originally developed for computer vision, the researchers have built a deep learning toolkit that can help scientists study rare cell types and possibly identify mutations that make people more vulnerable to diseases.


Nvidia researcher Avantika Lal, lead author on the paper, said the new deep learning toolkit, called AtacWorks, allows researchers to study how diseases and genomic variation influence very specific types of cells in the human body. She said this will enable previously impossible biological discovery, and that the team hopes it will also contribute to the discovery of new drug targets.


AtacWorks, featured in Nature Communications, works with ATAC-seq, a popular method for finding the parts of the human genome that are accessible in cells. Just about every cell in your body carries a copy of your genome, a DNA sequence about 3 billion bases long. However, only certain parts of that sequence are accessible in any given cell. Every cell type, whether liver, blood or skin, can access only the regions of DNA it needs for its function.


Lal said that knowing which regions are accessible allows researchers to understand what makes each type of cell different from the others, and how each type of cell is affected by disease or other biological changes. ATAC-seq finds those accessible parts by producing a signal for every base in the genome.

Peaks in the signal denote accessible regions of DNA. The method typically requires tens of thousands of cells of a given kind to get a clean signal, which makes it challenging to study rare cell types such as the stem cells that produce blood cells and platelets. However, by applying AtacWorks to ATAC-seq data, the researchers found they could rely on just tens of cells rather than tens of thousands.

In the research described in their new paper, the Nvidia and Harvard scientists applied AtacWorks to a dataset of stem cells that produce red and white blood cells. They used a sample set of just 50 cells to identify distinct regions of DNA associated with cells that develop into white blood cells, as well as separate sequences that correlate with red blood cells.
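To make the idea of "peaks in a per-base signal" concrete, here is a minimal sketch in Python. It is not how AtacWorks or any production peak caller works: it simply marks contiguous runs of positions whose signal is at or above a fixed cutoff. The function name `call_peaks`, the `threshold` and `min_width` parameters, and the toy coverage values are all made up for illustration.

```python
def call_peaks(signal, threshold, min_width=1):
    """Return half-open (start, end) intervals where signal >= threshold.

    Illustrative thresholding only; real tools use statistical models
    or, in the case described in this article, a trained neural network.
    """
    peaks = []
    start = None  # start index of the run currently being tracked, if any
    for pos, value in enumerate(signal):
        if value >= threshold:
            if start is None:
                start = pos  # a new candidate peak begins here
        elif start is not None:
            # The run just ended; keep it only if it is wide enough.
            if pos - start >= min_width:
                peaks.append((start, pos))
            start = None
    # Handle a run that extends to the end of the signal.
    if start is not None and len(signal) - start >= min_width:
        peaks.append((start, len(signal)))
    return peaks


# Toy per-base signal (hypothetical values, one number per base).
coverage = [0, 1, 0, 5, 7, 6, 1, 0, 0, 4, 8, 9, 3]
peaks = call_peaks(coverage, threshold=4, min_width=2)
print(peaks)  # [(3, 6), (9, 12)]
```

With few cells the signal is sparse and noisy, so a simple cutoff like this one tends to miss real peaks or report spurious ones. That is the gap a denoising model is meant to close.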


AtacWorks is a PyTorch-based convolutional neural network that was trained on labeled pairs of matching ATAC-seq datasets, one high quality and one noisy. The model learned to predict an accurate, high-quality version of a noisy dataset and to identify peaks in the signal. Running on Nvidia Tensor Core GPUs, the model took under 30 minutes for inference on a whole genome, a process that normally takes 15 hours on a system with 32 CPU cores.
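Training on noisy/clean pairs with two outputs, a denoised signal and peak locations, can be pictured as scoring each prediction against the high-quality dataset. The plain-Python sketch below illustrates one way to do that. It is an assumption for illustration only: the article does not say which loss functions AtacWorks uses, so mean squared error for the signal and binary cross-entropy for the peak labels are stand-ins, and `paired_loss`, `peak_weight`, and all the numbers are hypothetical.

```python
import math


def mse(pred, target):
    """Mean squared error between predicted and target per-base signal."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between peak probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def paired_loss(pred_signal, pred_peak_prob, clean_signal, clean_peaks,
                peak_weight=1.0):
    """Score a prediction made from noisy data against the clean dataset.

    Assumed objective: signal regression loss plus weighted peak
    classification loss. Not taken from the AtacWorks paper.
    """
    return (mse(pred_signal, clean_signal)
            + peak_weight * binary_cross_entropy(pred_peak_prob, clean_peaks))


# Toy "high-quality" targets: per-base signal and 0/1 peak labels.
clean_signal = [0.0, 1.0, 6.0, 7.0, 1.0]
clean_peaks = [0, 0, 1, 1, 0]

# A prediction close to the clean data, and one that is far from it.
good = paired_loss([0.2, 0.9, 5.8, 6.9, 1.1], [0.1, 0.1, 0.9, 0.9, 0.1],
                   clean_signal, clean_peaks)
bad = paired_loss([3.0] * 5, [0.5] * 5, clean_signal, clean_peaks)
print(good < bad)
```

During training, a network's weights would be adjusted to push this kind of score down across many paired examples. The sketch only shows the scoring step.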


Lal said the researchers were able to train the model on any type of cell and then apply it to any different type. She described this as a really wonderful thing, because it means models can be trained on whatever data is available and then applied to entirely new biological samples. The team hopes that once the paper comes out, other scientists working on different diseases will pick up the technique, and is excited to see what new research and developments it can enable.


The model could help deliver insights into a range of diseases including cardiovascular disease, Alzheimer's disease, diabetes or neurological disorders. It's available on the NGC Software Hub, Nvidia's hub of GPU-optimized software, where any researcher can access it.
