TECH REVIEW

Wx: Technology for Novel Biomarker Discovery

Deargen has announced research results on Wx, an AI technology that can discover novel biomarkers. The results are being published in the Nature Research journal Scientific Reports, and two related inventions have been patented.

Introduction

A biomarker is a marker that represents a biological characteristic or state. If a marker that indicates the mechanism of a particular disease is found, new drug targets can be derived from it. If a marker that reflects the condition of the disease is found, it can be used as a diagnostic marker.

In particular, the rapid generation of NGS (Next Generation Sequencing) data and the digitization of patient data have led to a growing number of attempts to discover new biomarkers as drug and diagnostic targets. To that end, we are building a biomarker discovery platform based on artificial intelligence technology developed in-house.

Deargen’s Gene Selection Algorithm

The key to discovering biomarkers is selecting the most important candidates from among many. The core of this task is selecting marker genes from among numerous genes. With the advent of RNA-seq technology, it is possible to analyze the expression of the entire set of genes across diseases or experimental conditions. By analyzing these gene expression data, we can learn which genes serve as biomarkers.

Until now, significant gene candidates have been selected using a statistical method called differentially expressed gene (DEG) analysis. However, because this method is purely statistical, it does not reflect the actual phenotype pattern. As the number of genes and samples increases, it returns so many candidates that narrowing them down becomes difficult. Another problem is that nothing is known about the significance of a selected gene beyond its statistical p-value. We also confirmed that other feature selection algorithms (mRMR, Fisher score, LLL21, SVM, etc.) did not produce significant results in a large-scale feature space.
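
As a rough illustration of what a purely statistical ranking looks like, the sketch below scores each gene with a Welch t-statistic between two sample groups. This is not the DEG pipeline referred to above (dedicated tools such as edgeR fit count-based models); the function names and the toy expression values are hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch t-statistic for two groups of expression values (each needs >= 2 values)."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Standard error of the difference, using per-group sample variances.
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    if se == 0.0:
        # Both groups are constant: no difference, or an infinitely sharp one.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se

def rank_genes_by_t(expr_a, expr_b):
    """Return gene names ordered by decreasing |t| between the two groups."""
    scores = {gene: welch_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)

# Hypothetical toy data: gene name -> expression values per sample.
tumor = {"G1": [10.0, 12.0, 11.0], "G2": [5.0, 5.5, 4.5]}
normal = {"G1": [2.0, 3.0, 2.5], "G2": [5.0, 4.5, 5.5]}
ranking = rank_genes_by_t(tumor, normal)
```

A ranking of this kind says only how strongly a gene separates the groups statistically, which is the limitation described above.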

Therefore, we developed Wx, a deep learning based feature selection algorithm that overcomes these shortcomings. Previously, even when a deep learning model was trained successfully, it was difficult to analyze the relationship between the input data and the outcome because the internal network was treated as a black box. We derived the relationship between each input feature and the outcome by analyzing the weight values learned during training, and used it to calculate an importance score by which genes and biomarkers are selected.

Method

In general, a neural network is built by stacking layers, each composed of many nodes. Each node computes the weighted sum of its inputs and passes the result through an activation function.
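
A minimal sketch of this node computation is shown here. The sigmoid activation, the bias term, and the function names are assumptions chosen for illustration; the text does not specify them.

```python
import math

def sigmoid(z):
    """Logistic activation, used here only as an example activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias=0.0, activation=sigmoid):
    """One node: weighted sum of the inputs, then the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def layer_output(inputs, weight_rows, biases, activation=sigmoid):
    """One layer: every node sees the same inputs but has its own weights and bias."""
    return [node_output(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]

# Two inputs feeding a layer of two nodes (hypothetical values).
outputs = layer_output([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]], [0.0, -3.0])
```

Stacking several such layers, with the outputs of one layer used as the inputs of the next, gives the network structure described above.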

When a classification model with a softmax layer is trained, the probability that a feature vector Xi belongs to its label Yi can be expressed as a probability distribution over the product of the weights and the input values.
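
The original formula is not reproduced in this text. In the standard softmax form it would read as follows, where the per-class weight vector W_k, the bias b_k, and the number of classes K are notation assumed here:

```latex
P(Y_i = k \mid X_i) = \frac{\exp\left(W_k^{\top} X_i + b_k\right)}{\sum_{j=1}^{K} \exp\left(W_j^{\top} X_i + b_j\right)}
```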

In this setting, we analyzed the feature weights as follows. To measure how much each input feature affects the classification label, we calculated a DI (Discriminative Index) for each feature and then assigned Wx scores according to the size of the differences.

Earlier work that calculated feature importance by analyzing a neural network examined only the weights trained for each feature, so it was difficult to capture how much the actual input values affected the outcome. The Wx formula, in contrast, also takes the input value of each feature into account. By analyzing the differences in the product of input value and feature weight, we could calculate a more meaningful feature importance score.
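
One possible reading of this scoring step is sketched below. It assumes a single-layer softmax classifier with one weight per class and feature, takes the class-wise mean of each input feature, multiplies it by that class's weight, and uses the absolute difference between two classes as the discriminative index. The exact formula in the published algorithm may differ, and all names and values here are hypothetical.

```python
def class_means(samples, labels):
    """Average each feature over the samples belonging to each class label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def discriminative_index(weights, samples, labels, class_a, class_b):
    """Assumed DI: |w_a * mean_x_a - w_b * mean_x_b| for every feature."""
    means = class_means(samples, labels)
    wx_a = [w * m for w, m in zip(weights[class_a], means[class_a])]
    wx_b = [w * m for w, m in zip(weights[class_b], means[class_b])]
    return [abs(a - b) for a, b in zip(wx_a, wx_b)]

def top_features(scores, k):
    """Indices of the k features with the largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical toy data: 4 samples x 3 genes, plus already-trained weights.
samples = [[5.0, 1.0, 2.0], [7.0, 1.0, 4.0], [1.0, 1.0, 1.0], [3.0, 1.0, 1.0]]
labels = ["tumor", "tumor", "normal", "normal"]
weights = {"tumor": [0.5, 0.0, 0.25], "normal": [-0.5, 0.0, -0.25]}

di = discriminative_index(weights, samples, labels, "tumor", "normal")
panel = top_features(di, 2)
```

Because the score uses the product of weight and input rather than the weight alone, a gene with a large weight but negligible expression contributes little, which is the point made in the paragraph above.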

Conclusion

The following table compares classification accuracy for cancer versus normal samples. For the major carcinomas, we selected 14-gene and 7-gene panels with Wx and with other algorithms. The 14-gene panel selected by Wx shows higher or comparable accuracy relative to the 14-gene panels selected by Peng’s previously reported method and by edgeR, the most widely used tool. In addition, when cancer versus normal was predicted using only the top seven genes, Wx showed higher accuracy than Martinez’s method in most cancers.

Below is a table that compares the accuracy of the 14 genes selected by the Wx algorithm and by Peng’s algorithm in various diseases other than cancer. The top 14 genes selected by Wx showed superior accuracy compared with Peng’s 14-gene panel.

Deargen’s Opportunities

We are using meta-analysis and the Wx algorithm to select novel drug targets. For example, when a large number of published gene expression profiling experiments (RNA-seq) exist for a specific disease, the most disease-specific genes can be selected with the Wx algorithm after the batch effect is corrected through meta-analysis. Currently, novel targets have been selected with this approach in many pipelines (dementia, rheumatism, ALS, sarcopenia, etc.).

We are building biomarker discovery algorithms and a platform by further developing the basic Wx algorithm, which has been published as a paper and as code. To build this platform, we trained on a vast amount of data and implemented our own deep learning model. Model performance continues to improve through ongoing data training in conjunction with the DearTRANS service.

Finally, we are currently evaluating effectiveness in real patients by constructing a lung cancer prognosis panel with the Wx algorithm, and we have developed a Cascaded Wx algorithm that is more specialized for prognostic prediction.