Predicting the Disease Risk of Protein Mutation Sequences With Pre-training Model

Li, Kuan and Zhong, Yue and Lin, Xuan and Quan, Zhe (2020) Predicting the Disease Risk of Protein Mutation Sequences With Pre-training Model. Frontiers in Genetics, 11. ISSN 1664-8021

Abstract

Accurately identifying missense mutations is of great help in alleviating the loss of protein function and structural changes, which might greatly reduce the risk of disease for tumor suppressor genes (e.g., BRCA1 and PTEN). In this paper, we propose a hybrid framework, called BertVS, that predicts the disease risk of missense mutations in proteins. Our framework learns sequence representations from the protein domain through pre-trained BERT models, and also integrates the hydrophilic properties of amino acids to obtain sequence representations of biochemical characteristics. The concatenation of the two learned representations is then sent to a classifier to predict the missense mutations of protein sequences. Specifically, we use the protein family database (Pfam) as a corpus to train the BERT model to learn the contextual information of protein sequences, and our pre-trained BERT model achieves an accuracy of 0.984 in the masked language model prediction task. We conduct extensive experiments on BRCA1 and PTEN datasets. Compared with the baselines, BertVS achieves higher performance, with an AUROC of 0.920 and an AUPR of 0.915 in the functionally critical domain of the BRCA1 gene. Additionally, an extended experiment on the ClinVar dataset illustrates that gene variants with known clinical significance can also be efficiently classified by our method. Therefore, BertVS can learn the functional information of protein sequences and effectively predict the disease risk of variants of uncertain clinical significance.
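
The feature construction described in the abstract (a learned sequence representation concatenated with a per-residue biochemical profile) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the Kyte-Doolittle hydropathy scale as a stand-in for the "hydrophilic properties" (the abstract does not name a scale), uses a toy peptide rather than a real BRCA1 or PTEN sequence, and uses a placeholder vector where the output of the pre-trained BERT model would go. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid.
# ASSUMPTION: the abstract does not say which hydrophilicity scale BertVS uses;
# this scale is chosen only for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_missense(sequence, position, new_residue):
    """Return `sequence` with the residue at 1-based `position` replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    if new_residue not in KYTE_DOOLITTLE:
        raise ValueError("unknown amino acid: " + new_residue)
    return sequence[:position - 1] + new_residue + sequence[position:]


def hydropathy_features(sequence):
    """Map each residue to its hydropathy value (one float per residue)."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def concatenate_representations(learned_embedding, biochemical_features):
    """Join the two representations into one flat feature vector."""
    return list(learned_embedding) + list(biochemical_features)


# Toy example: a made-up 9-residue peptide with a D3V substitution.
wild_type = "ACDEFGHIK"
mutant = apply_missense(wild_type, 3, "V")

# Placeholder for the pre-trained model's sequence embedding (hypothetical values).
placeholder_embedding = [0.0, 0.0, 0.0, 0.0]

features = concatenate_representations(
    placeholder_embedding, hydropathy_features(mutant)
)
print(mutant, len(features))
```

In a real pipeline the placeholder vector would be replaced by the embedding produced by the pre-trained model, and the concatenated vector would be passed to a classifier trained on labeled variants.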

Item Type: Article
Subjects: Archive Science > Medical Science
Depositing User: Managing Editor
Date Deposited: 28 Jan 2023 10:02
Last Modified: 11 Jun 2024 13:34
URI: http://editor.pacificarchive.com/id/eprint/85
