To illustrate the generalizability of the evaluative item-contrastive explanation approach, we investigate its efficacy across two distinct domains, using open-source datasets to underscore the reproducibility of the obtained results.
The first investigation focuses on the recruitment process of recent MBA graduates from an Indian college, presenting a scenario wherein a recruiter is constrained to select only a predetermined number of candidates. This analysis is conducted on the Campus Recruitment dataset [22].
The second experiment explores customer churn within the credit card industry, whereby churn denotes the cessation of card usage and subsequent closure of the associated credit card account. Here, we simulate the role of an employee tasked with retention efforts under budget constraints. The study employs the Credit Card Customer dataset [23].
In each experiment, we introduce the input dataset, the designated target variable, and a brief description of the data preparation phase. As elucidated in the "General Approach for Implementation and Application with Linear Model" section, in both scenarios we exploited an LR model to forecast the placement of an item on the basis of the values of several features. More precisely, to attain the dual goals of dropping non-significant features and reaching satisfactory performance, a pre-processing phase of backward step-wise feature selection was used. In particular, we employed the p-value as the metric to choose the candidate feature to be removed at each step, and the significance level (p-value less than 5%) as the criterion for deciding whether to retain or drop the selected feature. Furthermore, to enhance the reliability and generalizability of our experimental findings, we employed 5-fold stratified cross-validation. This approach ensures that the findings are not dependent on a particular partition of the dataset.
The LR model is then applied out-of-sample to the remaining set of candidates to extract the corresponding ranking scores. As a concrete illustration of the item-contrastive approach, we conduct a comparative analysis of two elements extracted from this sample, elucidating the guiding rationale behind their respective positioning through both graphical and textual means. This textual description is generated through an automated function.
Experiment 1: Recruitment

The Campus Recruitment dataset on academic and employability factors influencing placement consists of records on the job placement of 215 students from an Indian university campus. In particular, it contains information about the students' education, from secondary school to post-graduate specialization. Other information about the education system and working experience is also present. The schema of the dataset is presented in Table 2. We refer to [22] for additional information on the data.
Table 2 Schema of the Campus Recruitment dataset. For each variable, we report the name along with the description, the data type, and the domain.

Fig. 2 Coefficients of the LR model for the recruitment case. The analysis underscores the importance of participation in commercial/scientific programs during high school, as well as the related grades achieved. Students with work experience appear to have an advantage.
In our experiment, we use STATUS as the binary target variable (1 placed, 0 not placed). The dataset counts 148 placed and 67 not-placed students.
During the data preparation phase, categorical features were one-hot encoded, while numeric features were pre-processed via standard scaling in order to make their coefficients comparable for the evaluative phase. The learned LR coefficients for the selected features are shown in Fig. 2. Notably, the analysis reveals the significance of attending commercial or scientific programs during higher secondary school, along with the grades attained in such studies. Moreover, students with working experience appear strongly advantaged in terms of job placement prospects. Table 3 displays the rank and the significant features of the 10 candidates with the highest scores. In the following, we shall employ this set of ten candidates as working examples to showcase our approach.
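The encoding and scaling steps described above can be sketched as follows; the column names used in the test are illustrative placeholders rather than the exact dataset schema:

```python
# Illustrative sketch of the data preparation step: one-hot encoding
# for categorical features and standard scaling for numeric ones, so
# that LR coefficients become comparable across features.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_preprocessor(categorical_cols, numeric_cols):
    """Build a transformer that one-hot encodes categoricals and
    standard-scales numerics, matching the description in the text."""
    return ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("num", StandardScaler(), numeric_cols),
    ])
```

Standard scaling is what makes the coefficient magnitudes directly comparable in the evaluative phase; without it, features measured on larger scales would receive artificially small weights.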
Table 3 Top 10 candidates sorted by the model's output scores. Each entity is provided with the identification code, the ranking position, and the LR model's forecast (SCORE). Additionally, the most influential features as determined by the pre-processing algorithm have been included. Candidates with shaded background are those chosen to elucidate the functionality of the proposed solution as discussed in the "Constructing Evaluative Item-Contrastive Explanations for Recruitment" section.

Constructing Evaluative Item-Contrastive Explanations for Recruitment

We assume that the resulting rank obtained from the LR model represents the ordered list of candidates presented to hiring managers for selecting a limited number (e.g., \(k=5\)) of candidates for interviews. Following our methodology, we consider an exemplar scenario in which managers start by engaging in pairwise comparisons. For instance, a recruiter may be interested in evaluating the rationale behind the positioning of the candidates ranked at positions 5 (candidate 00079) and 6 (candidate 00188). This particular pair was chosen because it exhibits a larger gap in both the numeric features and the final scores of the items. However, it is paramount to emphasize that our approach is not reliant on these individual cases.
Fig. 3 Feature contributions supporting the disparity in ranking between candidates 00079 and 00188. While candidate 00079 is favored by higher marks during secondary school, candidate 00188 benefits from previous work experience and higher marks during the bachelor's degree. Since they both attended the same higher-secondary studies, no contribution is provided by this feature.
In line with what is delineated in the "General Setting" section, in this showcase the explanation returned by the system comprises both graphical comparisons and textual support. As mentioned in the "General Approach for Implementation and Application with Linear Model" section, the amount of information displayed depends on the context. In our example, given that the features considered by the final model have already been filtered by a feature-selection procedure, we outline a scenario in which all available information is provided to the user.
Figure 3 provides a comprehensive explanation by incorporating both the computed model weights and feature importance. The graphical depiction showcases feature contributions, with the length of each bar indicating the magnitude of the contribution and its direction indicating the item it favors. Null bars represent no relevant contribution for either candidate. In particular, the contribution of each feature towards the final score is computed as a percentage of the overall discrepancy. Candidate 00079 predominantly benefits from having recorded higher marks during secondary education, with higher-secondary education (HSC_P) contributing the most and (lower-)secondary grades (SSC_P) approximately half of that. Conversely, candidate 00188 derives primary support from prior work experience, with a smaller contribution from higher marks in the bachelor's degree. Finally, since they both attended the same higher-secondary studies (namely, scientific studies), this feature is not a discriminator between the two.
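For a linear model, this percentage decomposition follows directly from the fact that the score gap between two items splits exactly into per-feature terms \(w_j (x^A_j - x^B_j)\). A minimal sketch (the function name is our own illustration):

```python
# Illustrative sketch: decompose the score gap between items A and B
# into per-feature contributions w_j * (x_A_j - x_B_j), normalized as
# percentages of the overall absolute discrepancy. Positive entries
# favor item A, negative entries favor item B, zeros favor neither.
import numpy as np

def contrastive_contributions(weights, x_a, x_b):
    terms = np.asarray(weights) * (np.asarray(x_a) - np.asarray(x_b))
    total = np.abs(terms).sum()
    return terms / total * 100 if total > 0 else terms
```

A feature with identical values for both items (such as the shared higher-secondary specialization above) yields a term of exactly zero, which is rendered as a null bar.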
Table 4 Schema of the Credit Card Churn dataset. For each variable, we report the name along with the description and the data type. A sample of the possible values is also reported to provide an intuitive comprehension of the domain.

Fig. 4 Coefficients of the LR model for the churn scenario. The variables driving customer churn encompass the number of interactions with the institution, the period of inactivity, and the count of dependents. Conversely, features indicative of credit card retention include the activity in Q4 compared to Q1, marital status, and the number of overall relationships with the institution.
Alongside the visual representation come the textual explanations which, in our approach, are structured as in the following example:
The available information regarding Candidate 00079 and Candidate 00188 suggests that both individuals are qualified for the job. Candidate 00079 is ranked higher than Candidate 00188 according to the current algorithm reasoning. However, the ultimate decision remains within your control, offering the option to alter this ranking if desired. Characteristics in favor of Candidate 00079 include a higher score in HSC_P and a higher score in SSC_P. Characteristics in favor of Candidate 00188 include a higher score in DEGREE_P and having previous working experience.
This comparative analysis serves the dual purpose of either confirming the validity of the existing rank or potentially prompting adjustments to the final candidate selection for interviews.
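A template-based generator producing text like the example above can be sketched as follows; the function name and exact phrasing are our own illustration of such an automated function, not the authors' implementation:

```python
# Illustrative sketch of an automated template that turns the signed
# feature contributions into a contrastive textual explanation.
def explain_pair(name_a, name_b, favors_a, favors_b):
    """Render a contrastive explanation for a higher-ranked item
    name_a versus a lower-ranked item name_b, given the lists of
    characteristics favoring each."""
    return (
        f"{name_a} is ranked higher than {name_b} according to the "
        "current algorithm reasoning. However, the ultimate decision "
        "remains within your control, offering the option to alter "
        "this ranking if desired. "
        f"Characteristics in favor of {name_a} include "
        f"{', '.join(favors_a)}. "
        f"Characteristics in favor of {name_b} include "
        f"{', '.join(favors_b)}."
    )
```

The favoring characteristics would be derived from the sign of each feature's contribution to the score gap, so the textual and graphical explanations stay consistent by construction.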
Experiment 2: Churn

The Credit Card Customer dataset serves as a comprehensive repository of churn activity data pertaining to credit card holders within a specific financial institution. Comprising approximately 10,000 records and 21 columns, the dataset encompasses a wide array of demographic and customer relationship information pertinent to the institution's clientele. The schema and the domain of the source are showcased in Table 4. We refer to [23] for additional information on the data.
ATTRITION_FLAG has been selected as the target variable for this scenario. In particular, since only 16% of the customers considered ceased to use a credit card, oversampling was exploited to re-balance the target variable to around 30%. During the data preparation phase, ordinal categorical variables such as INCOME_CATEGORY were cast to numeric, while non-ordinal ones were one-hot encoded. During this phase, we also removed highly correlated variables that could undermine the performance of the LR model. Moreover, numeric features were pre-processed via standard scaling in order to make their coefficients comparable for the evaluative phase.
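The re-balancing step can be sketched as a simple random oversampling of the minority (churn) class; the 30% target mirrors the text, while the helper itself is an assumption about the implementation rather than the authors' exact code:

```python
# Illustrative sketch: random oversampling of the minority class
# until it makes up roughly `ratio` of the dataset, as described
# in the text for the churn target.
import pandas as pd

def oversample_minority(df, target, ratio=0.30, seed=0):
    minority = df[df[target] == 1]
    majority = df[df[target] == 0]
    # minority size needed so that minority / (minority + majority) == ratio
    n_needed = round(ratio * len(majority) / (1 - ratio))
    if n_needed <= len(minority):
        return df  # already at or above the desired ratio
    extra = minority.sample(n_needed - len(minority), replace=True,
                            random_state=seed)
    return pd.concat([df, extra], ignore_index=True)
```

Note that oversampling is applied only to the training folds; scoring remains out-of-sample on untouched data.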
The LR model is utilized as outlined in the "General Setting" section. The acquired coefficients of the model, pertaining to the features selected through the aforementioned procedure, are depicted in Fig. 4. The propensity to churn is more evident for customers who have been inactive for several months in the last year, for those who have had a high number of contacts, and for those who have dependents. Conversely, customers who have increased their number of transactions in the fourth quarter compared to the first, who are married or single, or who have a high number of other relationships with the bank show a lower propensity to churn. Table 5 collects the customers most likely to start churn actions, along with the features found significant by the algorithm. This sample will be utilized in what follows to offer a tangible illustration of item-contrastive explainability.
Constructing Evaluative Item-Contrastive Explanations for Churn

The outlined scenario pertains to an employee within a financial institution tasked with executing customer retention strategies. However, constrained by the budgetary and time limitations inherent to their role, the employee endeavors to engage exclusively with clients deemed susceptible to churn. The proposed ranking system constitutes the initial phase of the employee's decision-making process, subsequently enabling an evaluation of client positioning to discern the factors underlying their ranking relative to others.
Table 5 Top 25 customers for the credit card churn use case. For each customer, the features that the LR model found significant are represented. The probability of churn and the ranking assigned by the algorithm are also reported. Customers with shaded background are those chosen to elucidate the functionality of the proposed solution as discussed in the "Constructing Evaluative Item-Contrastive Explanations for Churn" section.

For instance, the employee may wish to compare the clients ranked 6 (identifier 794560833) and 19 (identifier 719808558). This selection is entirely non-binding within the approach proposed in this study and merely serves narrative purposes. Hence, Fig. 5 is provided to the user to aid in understanding the relative positioning of the two clients, indicating for each the features that highlight a greater inclination towards churn activity with respect to the other. In light of the coefficients' signs illustrated in Fig. 4, as expounded upon in the "General Approach for Implementation and Application with Linear Model" section, it is important to remark that a negative coefficient (NC) implies that the likelihood of churn is heightened for the client exhibiting a lower value for the associated feature. Conversely, a positive coefficient indicates a greater predisposition towards churn for the client demonstrating a higher value. The approach suggests that client 794560833 is primarily favored by having a greater number of contacts in the last 12 months, a higher number of dependents, and a greater decrease in activity in the last quarter (feature with NC). On the other hand, client 719808558 receives a greater contribution from having a lower number of relationships with the institution (feature with NC), a lower total number of transactions (feature with NC), and from being inactive for a greater number of months in the last year.
Fig. 5 Feature contributions supporting the disparity in ranking between customers 794560833 and 719808558. Client 794560833 is favored by a greater number of contacts in the last year, a higher number of dependents, and a greater decrease in activity in the last quarter. Client 719808558 receives a greater contribution from having a lower number of relationships with the institution, a lower total number of transactions, and from being inactive for longer.
The structure of the textual explanation associated with this use case complies with the guidelines depicted in the following example:
The available information regarding Customer 794560833 and Customer 719808558 suggests that both clients may engage in churn activities. Customer 794560833 is ranked higher than Customer 719808558 according to the current algorithm reasoning. However, the ultimate decision remains within your control, offering the option to alter this ranking if desired. Characteristics in favor of Customer 794560833 include a higher level of CONTACTS_COUNT_12_MON and DEPENDENT_COUNT, along with a smaller value for TOTAL_CT_CHNG_Q4_Q1. Characteristics in favor of Customer 719808558 include a lower number of TOTAL_RELATIONSHIP_COUNT and TOTAL_TRANS_CT, along with a higher value of MONTHS_INACTIVE_12_MON.
Due to this explanatory capability, the employee gains enhanced insight into the algorithm’s preference for one client over another, while also obtaining valuable contrasting information regarding the client with a lower final rank.