Jump to the main content block
TITLE

[2021-Apr-21] A Transformer Architecture Based on BERT and 2D Convolutional Neural Network to Identify DNA Enhancers from Sequence Information

ImgDesc
ImgDesc
ImgDesc

Institute of Information Systems and Applications

Speaker:

Prof. Khanh Lee 黎阮國慶 助理教授

Professional Master Program in Artificial Intelligence in Medicine台北醫學大學人工智慧醫療碩士在職專班

Topic:

A Transformer Architecture Based on BERT and 2D Convolutional Neural Network to Identify DNA Enhancers from Sequence Information

Date:

13:30-15:00 Wednesday 21-Apr-2021

Locate:

Delta R105

QR code:

Link:

https://meet.google.com/oxa-smvh-dmk

Hosted by:

Prof. Po-Chih Kuo

Abstract

Recently, language representation models have drawn a lot of attention in the natural language processing (NLP) field due to their remarkable results. Among them, Bidirectional Encoder Representations from Transformers (BERT) has proven to be a simple, yet powerful language model that achieved novel state-of-the-art performance. BERT adopted the concept of contextualized word embedding to capture the semantics and context of the words in which they appeared. In this study, we present a novel technique by incorporating BERT-base-multilingual model in bioinformatics to represent the information of DNA sequences. We treated DNA sequences as sentences and transformed them into fixed-length meaningful vectors where 768- vector represents each nucleotide. As a case study, we applied our method to DNA enhancer prediction, which is a well-known and challenging problem in this field. We then observed that our BERT-base features improved more than 5-10% in terms of sensitivity, specificity, accuracy, and MCC compared to the current state-of-the-art features in bioinformatics. Moreover, advanced experiments show that deep learning (as represented by 2D convolutional neural networks) hold potential in learning BERT features better than other traditional machine learning techniques. In conclusion, we suggest that BERT and 2D convolutional neural networks could open a new avenue in biological modeling using sequence information.

All faculty and students are welcome to join.

Click Num: