A new COVID-19 detection method from human genome sequences using CpG island features and KNN classifier

Arslan H., ARSLAN H.

Engineering Science and Technology, an International Journal, vol.24, no.4, pp.839-847, 2021 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 24 Issue: 4
  • Publication Date: 2021
  • Doi Number: 10.1016/j.jestch.2020.12.026
  • Journal Name: Engineering Science and Technology, an International Journal
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.839-847
  • Keywords: COVID-19, CpG islands, Human coronaviruses, K-Nearest Neighbors, SARS-CoV-2
  • Ankara Yıldırım Beyazıt University Affiliated: Yes


© 2020 Karabuk University

Several viral epidemics, such as those caused by the severe acute respiratory syndrome coronavirus and the Middle East respiratory syndrome coronavirus, have been detected in the last two decades. The coronavirus disease 2019 (COVID-19) is a pandemic caused by a novel betacoronavirus called severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). After the rapid spread of COVID-19, many researchers quickly began investigating diagnosis and treatment for this disease. Distinguishing COVID-19 from the other types of coronaviruses is a difficult problem because of their genetic similarity. In this study, we propose a new, efficient COVID-19 detection method based on the K-nearest neighbors (KNN) classifier, using the complete genome sequences of human coronaviruses in the dataset recorded in the 2019 Novel Coronavirus Resource. We also describe two features based on CpG islands that efficiently detect COVID-19 cases. Thus, each genome sequence of approximately 30,000 nucleotides can be represented by only two real numbers. KNN is a simple and effective non-parametric technique for solving classification problems; however, its performance depends on the distance measure used. We evaluate 19 distance metrics, grouped into five categories, to improve the performance of the KNN algorithm. Several performance measures are computed to evaluate the proposed method. The proposed method achieves 98.4% precision, 99.2% recall, 98.8% F-measure, and 98.4% accuracy in a few seconds when any L1-type metric is used as the distance measure in KNN.
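
The pipeline described in the abstract (reduce each genome to two real numbers, then classify with KNN under an L1 distance) can be illustrated with a minimal Python sketch. The abstract does not say which two CpG island features the authors use, so this sketch assumes two standard CpG island statistics as stand-ins: GC content and the observed/expected CpG ratio. The function names `cpg_features`, `manhattan`, and `knn_predict`, and the default `k=3`, are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import Counter


def cpg_features(seq):
    """Map a nucleotide string to two real numbers.

    ASSUMPTION: GC content and the observed/expected CpG ratio are
    stand-ins; the paper's actual two features are not given in the abstract.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n
    # Observed/expected CpG ratio: (#CG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def manhattan(a, b):
    """L1 (city-block) distance between two equal-length feature tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: manhattan(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, each labeled genome would be passed through `cpg_features` to build the `train` list, and a new genome would be classified with `knn_predict(train, cpg_features(new_seq))`. Swapping `manhattan` for another distance function is the only change needed to compare metrics, which mirrors the comparison of distance measures the abstract reports.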