PregAN-NET: Addressing Class Imbalance with GANs in Interpretable Computational Framework for Predicting Safety Profile of Drugs Considering Adverse Reactions During Pregnancy

Adverse Drug Reactions (ADRs) pose a significant risk to patient safety and are a leading cause of morbidity and mortality worldwide. Although ADRs can affect individuals across the lifespan, drug-induced maternal and perinatal adverse reactions are particularly concerning because they affect both the mother and the developing fetus [1]. Over the last forty years, the use of medications during pregnancy has increased notably. Currently, up to 90% of pregnant women take at least one medication, and use is especially high in the first trimester, with some women taking an average of 2.6 medications and a third taking four or more [2]. These medications are required to manage pre-existing conditions such as asthma, depression, hypertension, and neurological disorders [3], [4]. However, the use of certain medications, including antidepressants, angiotensin-converting enzyme (ACE) inhibitors, and angiotensin receptor blockers (ARBs), is associated with a range of complications during pregnancy, including congenital anomalies, preterm birth, stillbirth, miscarriage, maternal hemorrhage, and fetal growth restriction [5], [6].

Hence, there is a need to predict the safety profile of drugs based on the ADRs they induce during pregnancy. A drug is labeled safe during pregnancy if it does not lead to any pregnancy-related adverse reaction; conversely, a drug that induces pregnancy-related reactions is labeled unsafe. Traditionally, healthcare professionals have relied on post-marketing surveillance, case reports, and observational studies to evaluate the safety of drugs during pregnancy [7], [8]. Clinical trials on pregnant women, in contrast, are considered unethical and impractical because of the risks posed to both the mother and the fetus. Computational techniques offer a promising alternative by reducing the need for direct involvement of pregnant women in clinical trials, and different techniques have been applied for ADR prediction [9]. However, most of these works have focused on the general population [10], [11], [12], and very few have addressed drug-induced adverse reactions during pregnancy [13], [14]. In addition, many existing works do not incorporate interpretability mechanisms to explain the model's predictions. The limited availability of data on pregnancy-related ADRs also leads to class imbalance, a significant challenge for computational techniques that predict safe and unsafe drugs during pregnancy. The objectives of the present work are to:

Design a computational framework that predicts whether a drug is safe or unsafe during pregnancy, based on the adverse reactions it induces in the mother or fetus.

Address the issue of class imbalance by deploying a Conditional Tabular Generative Adversarial Network (CTGAN) for synthetic data generation.

Integrate a neural network and gradient boosting into a Boosted Neural Ensemble (BNE) architecture to improve the prediction accuracy of the proposed framework.

Enhance the interpretability of the proposed BNE model by applying SHAP (SHapley Additive exPlanations), a post-hoc interpretability method that quantifies the contribution of each drug feature to the prediction and thereby provides a transparent explanation of its impact on the final classification.
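The labeling rule stated above (a drug is unsafe if it induces any pregnancy-related adverse reaction, and safe otherwise) and the class imbalance it can produce may be illustrated with a minimal sketch. The term set `PREGNANCY_ADRS`, the drug names, and the reports are hypothetical and chosen only for illustration; the sketch counts how many synthetic minority-class samples would be needed to balance the classes and does not perform the CTGAN generation itself.

```python
from collections import Counter

# Hypothetical vocabulary of pregnancy-related ADR terms (illustrative only).
PREGNANCY_ADRS = {
    "congenital anomaly", "preterm birth", "stillbirth",
    "miscarriage", "maternal hemorrhage", "fetal growth restriction",
}

def label_drug(reported_adrs):
    """Return 'unsafe' if any reported ADR is pregnancy-related, else 'safe'."""
    return "unsafe" if PREGNANCY_ADRS & set(reported_adrs) else "safe"

# Made-up ADR reports for five hypothetical drugs.
reports = {
    "drug_a": ["nausea"],
    "drug_b": ["preterm birth", "headache"],
    "drug_c": [],
    "drug_d": ["dizziness"],
    "drug_e": ["rash"],
}

labels = {drug: label_drug(adrs) for drug, adrs in reports.items()}
counts = Counter(labels.values())

# Number of synthetic samples the smaller class would need to match the larger.
synthetic_needed = max(counts.values()) - min(counts.values())
print(dict(counts), synthetic_needed)
```

Which class is the minority in the actual dataset is not assumed here; the toy data only shows how the gap between the two classes is measured.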

The contributions of the present work are as follows:

An interpretable computational framework, PregAN-NET (Predicting Drug Safety in Pregnancy using CTGAN and BNE NETwork), has been proposed to predict safe and unsafe drugs during pregnancy.

The proposed framework has been applied to one chemical and three biological properties of drugs, extracted from PubChem [15] and DrugBank [16], respectively. The chemical property, SMILES strings, is represented by three types of fingerprints, namely SS_Morgan, SS_MACCS, and SS_TT. The biological properties are Target, Transporter, and Enzymes. Additionally, these biological properties have been combined in pairs and all together for analysis.
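The pairwise and all-together combinations of the three biological properties can be enumerated as in the sketch below. It assumes that combining properties means concatenating their feature vectors, which the text does not specify; the feature values and the helper names `property_combinations` and `concatenate` are hypothetical.

```python
from itertools import combinations

# Made-up binary feature vectors for a single drug, one per biological property.
drug_features = {
    "Target": [1, 0, 1],
    "Transporter": [0, 1],
    "Enzymes": [1, 1, 0, 0],
}

def property_combinations(names):
    """Return every pair of properties, followed by all properties together."""
    names = list(names)
    combos = [tuple(c) for c in combinations(names, 2)]
    combos.append(tuple(names))
    return combos

def concatenate(features, combo):
    """Concatenate the feature vectors of the chosen properties, in order.

    Concatenation is an assumption made for this sketch.
    """
    vector = []
    for name in combo:
        vector.extend(features[name])
    return vector

combos = property_combinations(drug_features)
combined = {"+".join(c): concatenate(drug_features, c) for c in combos}

for name, vector in combined.items():
    print(name, vector)
```

Three properties yield three pairs and one triple, so together with the three individual properties there are seven biological feature sets to evaluate.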

The performance of CTGAN in handling class imbalance has been analyzed and compared with the SMOTE technique. Further, the performance of the BNE architecture has been analyzed in terms of Precision, True Positive Rate (TPR), F1-Score, and ROC-AUC and compared with six state-of-the-art techniques, viz. XGBoost, Gradient Boosting (GB), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), and Multilayer Perceptron (MLP).
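For reference, the four evaluation measures named above can be computed from labels and scores as in the following sketch. Treating label 1 as the positive class, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; ROC-AUC is computed here as the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as one half.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, TPR, F1) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return precision, tpr, f1

def roc_auc(y_true, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted scores (made up).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, tpr, f1 = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, tpr, f1, auc)
```

Precision, TPR, and F1 depend on the chosen threshold, whereas ROC-AUC depends only on how the scores rank the samples.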

The top 20 features for each drug property and each of its representations have been identified using SHAP values, which quantify each feature's contribution to predicting safe and unsafe drugs.
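The idea behind ranking features by SHAP values can be illustrated with a brute-force Shapley computation on a toy model. This is a sketch of the underlying Shapley value, not of the SHAP library or of the BNE model: the linear `toy_model`, its weights, the feature names, the all-zero baseline, and the data rows are hypothetical, and the exhaustive enumeration of coalitions is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def top_k_features(names, rows, model, baseline, k):
    """Rank features by mean absolute Shapley value over all rows."""
    totals = [0.0] * len(names)
    for x in rows:
        for i, phi in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(phi)
    mean_abs = [t / len(rows) for t in totals]
    ranked = sorted(zip(names, mean_abs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical linear scoring model over four binary fingerprint bits.
WEIGHTS = [0.5, 2.0, -1.0, 0.0]

def toy_model(z):
    return sum(w * v for w, v in zip(WEIGHTS, z))

names = ["bit_a", "bit_b", "bit_c", "bit_d"]
rows = [[1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]]
baseline = [0, 0, 0, 0]

phi = shapley_values(toy_model, [1, 1, 1, 1], baseline)
top2 = top_k_features(names, rows, toy_model, baseline, k=2)
print(phi, top2)
```

For a linear model each Shapley value reduces to the weight times the feature's deviation from the baseline, and the values sum to the difference between the prediction and the baseline prediction, which makes the toy result easy to check by hand.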

The rest of the paper is organized as follows: Section 2 provides an in-depth literature review, Section 3 presents the problem statement, Section 4 details the proposed framework, Section 5 focuses on the experimental setup and analysis of results, and Section 6 concludes the paper.

Statement of Significance
