Evaluative Item-Contrastive Explanations in Rankings

General Setting

To illustrate the generalizability of the evaluative item-contrastive explanations approach, we investigate its efficacy across two distinct domains, using open-source datasets to ensure the reproducibility of the results obtained.

The first investigation focuses on the recruitment process of recent MBA graduates from an Indian college, presenting a scenario wherein a recruiter is constrained to select only a predetermined number of candidates. This analysis is conducted on the Campus Recruitment dataset [22].

The second experiment explores customer churn within the credit card industry, whereby churn denotes the cessation of card usage and subsequent closure of the associated credit card account. Here, we simulate the role of an employee tasked with retention efforts under budget constraints. The study employs the Credit Card Customer dataset [23].

In each experiment, we introduce the input dataset, the designated target variable, and a brief description of the data preparation phase. As elucidated in the “General Approach for Implementation and Application with Linear Model” section, in both scenarios we exploited an LR model to forecast the placement of an item on the basis of the values of several features. More precisely, to attain the dual goals of dropping non-significant features and reaching satisfactory performance, a pre-processing phase of backward step-wise feature selection has been used. In particular, we employed the p-value as the metric to choose the candidate feature to be removed at each step, and the significance level (p-value below 5%) as the criterion for whether to retain or drop the selected feature. Furthermore, to enhance the reliability and generalizability of our experimental findings, we employed 5-fold stratified cross-validation. This approach ensures that the findings do not depend on a particular partition of the dataset.
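A minimal sketch of this selection and validation scheme is given below, assuming a pandas DataFrame `X` of candidate features and a binary target Series `y`; the helper name `backward_stepwise` and the use of statsmodels and scikit-learn are our illustrative choices, not the original implementation.

```python
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def backward_stepwise(X, y, alpha=0.05):
    """Refit a logistic model, dropping the least significant feature at each step."""
    features = list(X.columns)
    while features:
        fit = sm.Logit(y, sm.add_constant(X[features])).fit(disp=0)
        pvalues = fit.pvalues.drop("const")   # one p-value per remaining feature
        worst = pvalues.idxmax()              # least significant feature
        if pvalues[worst] < alpha:            # all features below the 5% level: stop
            break
        features.remove(worst)                # otherwise drop it and refit
    return features


selected = backward_stepwise(X, y)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X[selected], y, cv=cv)
```

The stratified splits keep the class ratio stable across folds, so the reported performance does not hinge on a single partition.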

The LR model is then applied out-of-sample to the remaining set of candidates to extract the corresponding ranking scores. As a concrete illustration of the item-contrastive approach, we conduct a comparative analysis of two elements extracted from this sample, elucidating the guiding rationale behind their respective positioning through both graphical and textual means. This textual description is generated through an automated function.
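As an illustration, the out-of-sample scoring and ranking can be obtained along the following lines (the names `model`, `X_holdout`, and `selected` are placeholders for the fitted LR model, the held-out items, and the selected features):

```python
import pandas as pd

scores = model.predict_proba(X_holdout[selected])[:, 1]   # probability of the positive class
ranking = (
    pd.DataFrame({"id": X_holdout.index, "score": scores})
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)
ranking["rank"] = ranking.index + 1                        # position 1 is the top-ranked item
```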

Experiment 1: Recruitment

The Campus Recruitment dataset, which covers academic and employability factors influencing placement, consists of job placement records for 215 students from an Indian university campus. In particular, it contains information about students’ education, from secondary school to post-graduate specialization. Other information about the education system and work experience is also present. The schema of the dataset is presented in Table 2. We refer to [22] for additional information on the data.

Table 2 Schema of the Campus Recruitment dataset. For each variable, we report the name along with the description, the data type, and the domain

Fig. 2 Coefficients of the LR model for the recruitment case. The analysis underscores the importance of participation in commercial/scientific programs during high school, as well as the related grades achieved. Students with work experience appear to have an advantage

In our experiment, we use STATUS as the binary target variable (1 = placed, 0 = not placed). The dataset contains 148 hired and 67 unemployed students.

During the data preparation phase, categorical features have been one-hot encoded, while numeric features were pre-processed via standard scaling in order to make their coefficients comparable for the evaluative phase. The coefficients learned by the LR model for the selected features are shown in Fig. 2. Notably, the analysis reveals the significance of attending commercial or scientific programs during higher secondary school, along with the grades attained in such studies. Moreover, students with work experience appear to be strongly advantaged in terms of job placement prospects. Table 3 displays the rank and the significant features of the 10 candidates with the highest scores. In the following, we shall employ this set of ten candidates as a working example to showcase our approach.
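This preparation step could be sketched as follows; the column lists are illustrative examples drawn from Table 2, and `X_train`/`y_train` denote one training fold.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical = ["HSC_S", "WORKEX"]            # illustrative categorical columns
numeric = ["SSC_P", "HSC_P", "DEGREE_P"]     # illustrative numeric columns

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(drop="first"), categorical),
    ("num", StandardScaler(), numeric),       # scaling makes coefficients comparable
])
pipeline = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
pipeline.fit(X_train[categorical + numeric], y_train)
coefficients = pipeline[-1].coef_.ravel()     # weights of the kind visualized in Fig. 2
```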

Table 3 Top 10 candidates sorted by the model’s output scores. Each entity is provided with the identification code, the ranking position, and the LR model’s forecast (SCORE). Additionally, the most influential features as determined by the pre-processing algorithm have been included. Candidates with a shaded background are those chosen to elucidate the functionality of the proposed solution, as discussed in the “Constructing Evaluative Item-Contrastive Explanations for Recruitment” section

Constructing Evaluative Item-Contrastive Explanations for Recruitment

We assume that the resulting rank obtained from the LR model represents the ordered list of candidates presented to hiring managers for selecting a limited number (e.g., \(k=5\)) of candidates for interviews. Following our methodology, we consider an exemplar scenario in which managers start by engaging in pairwise comparisons. For instance, a recruiter may be interested in evaluating the rationale behind the positioning of the candidates ranked at positions 5 (candidate 00079) and 6 (candidate 00188). This particular choice reflects a comparatively large gap in both the numeric features and the final scores of the two items. However, it is paramount to emphasize that our approach does not rely on these individual cases.

Fig. 3 Feature contributions supporting the disparity in ranking between candidates 00079 and 00188. While candidate 00079 is favored by having higher marks during secondary school, candidate 00188 benefits from having previous work experience and higher marks during the bachelor’s degree. Since they both attended the same high-secondary studies, no contribution is provided by this feature

In line with what is delineated in the “General Setting” section, in this showcase the explanation returned by the system comprises both graphical comparisons and textual support. As mentioned in the “General Approach for Implementation and Application with Linear Model” section, the amount of information displayed depends on the context. In our example, given that the set of features considered by the final model has already been filtered by a feature-selection procedure, we outline a scenario in which all available information is provided to the user.

Figure 3 provides a comprehensive explanation by incorporating both the computed model weights and the feature importance. The graphical depiction showcases feature contributions, with the length of each bar indicating the magnitude of the contribution and its direction indicating the item it favors. Null bars represent no relevant contribution for either candidate. In particular, the contribution of each feature towards the final score is computed as a percentage of the overall discrepancy. Candidate 00079 predominantly benefits from having recorded higher marks during secondary education, with high-secondary education (HSC_P) contributing the most and (low-)secondary grades (SSC_P) approximately half of that. Conversely, candidate 00188 derives primary support from prior work experience, with a smaller contribution from higher marks in the bachelor’s degree. Finally, since they both attended the same high-secondary studies (namely, scientific studies), this feature does not discriminate between the two of them.
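A compact sketch of this contribution computation is shown below, assuming `x_00079` and `x_00188` are the (standardized, encoded) feature vectors of the two candidates as pandas Series aligned with the coefficient Series `coefficients`; normalizing by the total absolute discrepancy is one plausible reading of the percentage described above.

```python
import numpy as np


def contrastive_contributions(x_a, x_b, coefficients):
    """Signed per-feature contribution to the score gap between items A and B."""
    delta = coefficients * (x_a - x_b)        # > 0 favors item A, < 0 favors item B, 0 is a null bar
    return 100 * delta / np.abs(delta).sum()  # percentage of the overall discrepancy


contributions = contrastive_contributions(x_00079, x_00188, coefficients)
```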

Table 4 Schema of the Credit Card Churn dataset. For each variable, we report the name along with the description and the data type. A sample of the possible values is also reported to provide an intuitive comprehension of the domain

Fig. 4 Coefficients of the LR model for the churn scenario. The variables driving customer churn encompass the number of interactions with the institution, the period of inactivity, and the count of dependents. Conversely, features indicative of credit card retention include the activity in Q4 compared to Q1, marital status, and the number of overall relationships with the institution

Alongside the visual representation comes the textual explanation that, in our approach, is structured as in the following example:

The available information regarding Candidate 00079 and Candidate 00188 suggests that both individuals are qualified for the job. Candidate 00079 is ranked higher than Candidate 00188 according to the current algorithm reasoning. However, the ultimate decision remains within your control, offering the option to alter this ranking if desired. Characteristics in favor of Candidate 00079 include a higher score in HSC_P and a higher score in SSC_P. Characteristics in favor of Candidate 00188 include a higher score in DEGREE_P and having previous working experience.
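A hypothetical sketch of such an automated function is shown below; the template mirrors the example wording, and the lists of favoring characteristics are assumed to be pre-computed from the contrastive contributions.

```python
def contrastive_text(name_a, name_b, favoring_a, favoring_b, noun="Candidate"):
    """Template-based textual comparison of two ranked items."""
    return (
        f"The available information regarding {noun} {name_a} and {noun} {name_b} "
        f"suggests that both individuals are qualified for the job. "
        f"{noun} {name_a} is ranked higher than {noun} {name_b} according to the "
        f"current algorithm reasoning. However, the ultimate decision remains within "
        f"your control, offering the option to alter this ranking if desired. "
        f"Characteristics in favor of {noun} {name_a} include {', '.join(favoring_a)}. "
        f"Characteristics in favor of {noun} {name_b} include {', '.join(favoring_b)}."
    )


print(contrastive_text(
    "00079", "00188",
    ["a higher score in HSC_P", "a higher score in SSC_P"],
    ["a higher score in DEGREE_P", "having previous working experience"],
))
```

For the churn use case, the opening sentence of the template would be adapted accordingly (e.g., “both clients may engage in churn activities”).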

This comparative analysis serves the dual purpose of either confirming the validity of the existing rank or potentially prompting adjustments to the final candidate selection for interviews.

Experiment 2: Churn

The Credit Card Customer dataset serves as a comprehensive repository of churn activity data pertaining to credit card holders within a specific financial institution. Comprising approximately 10,000 records and 21 columns, the dataset encompasses a wide array of demographic and customer relationship information pertinent to the institution’s clientele. The schema and the domain of the source are showcased in Table 4. We refer to [23] for additional information on the data.

ATTRITION_FLAG has been selected as the target variable for this scenario. In particular, since only 16% of the customers considered ceased to use their credit card, oversampling has been exploited to re-balance the target variable to around 30%. During the data preparation phase, ordinal categorical variables such as INCOME_CATEGORY have been cast to numeric, while non-ordinal ones have been one-hot encoded. During this phase, we also removed highly correlated variables that could undermine the performance of the LR model. Moreover, numeric features were pre-processed via standard scaling in order to make their coefficients comparable for the evaluative phase.
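A sketch of this preparation under stated assumptions: the column names follow Table 4, while the income ordering, the 0.9 correlation threshold, and random oversampling of churners up to roughly 30% are illustrative choices on our part.

```python
import numpy as np
import pandas as pd

# Cast the ordinal INCOME_CATEGORY to numeric (the category labels are illustrative).
income_order = {"Less than $40K": 0, "$40K - $60K": 1, "$60K - $80K": 2,
                "$80K - $120K": 3, "$120K +": 4}
df["INCOME_CATEGORY"] = df["INCOME_CATEGORY"].map(income_order)

# One-hot encode non-ordinal categoricals (illustrative column list).
df = pd.get_dummies(df, columns=["GENDER", "MARITAL_STATUS"])

# Drop one variable of each highly correlated pair of features (target excluded).
corr = df.drop(columns=["ATTRITION_FLAG"]).select_dtypes("number").corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
df = df.drop(columns=[c for c in upper.columns if (upper[c] > 0.9).any()])

# Oversample churners (ATTRITION_FLAG assumed already binary, 1 = churned) to ~30%.
pos, neg = df[df["ATTRITION_FLAG"] == 1], df[df["ATTRITION_FLAG"] == 0]
extra = pos.sample(int(0.3 / 0.7 * len(neg)) - len(pos), replace=True, random_state=0)
df = pd.concat([df, extra], ignore_index=True)
```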

The LR model is utilized as outlined in the “General Setting” section. The learned coefficients of the model, pertaining to the features selected through the aforementioned procedure, are depicted in Fig. 4. The propensity to churn is more evident for customers who have been inactive for several months in the last year, for those who have had a high number of contacts, and for those who have dependents. Conversely, customers who have increased the number of transactions in the fourth quarter compared to the first, who are married or single, or who have a high number of other relationships with the bank show a lower propensity to churn. Table 5 collects the customers most likely to start churn actions, along with the features found significant by the algorithm. This sample will be utilized in what follows to offer a tangible illustration of item-contrastive explainability.

Constructing Evaluative Item-Contrastive Explanations for Churn

The outlined scenario pertains to an employee within a financial institution who is tasked with executing customer retention strategies. However, constrained by the budgetary and time limitations inherent to their role, the employee endeavors to engage exclusively with clients deemed susceptible to churn. The proposed ranking system constitutes the initial phase of the employee’s decision-making process, subsequently enabling an evaluation of client positioning to discern the factors underlying their ranking relative to others.

Table 5 Top 25 customers for the credit card churn use case. For each customer, the features that the LR model found significant are represented. The probability of churn and the ranking assigned by the algorithm are also reported. Customers with a shaded background are those chosen to elucidate the functionality of the proposed solution, as discussed in the “Constructing Evaluative Item-Contrastive Explanations for Churn” section

For instance, the employee may wish to compare the clients ranked 6 (identifier 794560833) and 19 (identifier 719808558). This selection is entirely non-binding within the approach proposed in this study and merely serves narrative purposes. Hence, Fig. 5 is provided to the user to aid in understanding the relative positioning of the two clients, indicating for each the features that highlight a greater inclination towards churn activity with respect to the other. In light of the coefficients’ signs illustrated in Fig. 4, as expounded upon in the “General Approach for Implementation and Application with Linear Model” section, it is important to remark that a negative coefficient (NC) implies that the likelihood of churn is heightened for the client exhibiting the lower value of the associated feature. Conversely, a positive coefficient indicates a greater predisposition towards churn for the client demonstrating the higher value. The approach suggests that client 794560833 is primarily favored by having a greater number of contacts in the last 12 months, a higher number of dependents, and a greater decrease in activity in the last quarter (a feature with NC). On the other hand, client 719808558 receives a greater contribution from having a lower number of relationships with the institution (a feature with NC), a lower total number of transactions (a feature with NC), and from being inactive for a greater number of months in the last year.
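This reading corresponds to the sign of the per-feature contribution introduced for the recruitment sketch; a toy illustration, with `x_794560833` and `x_719808558` denoting the two clients’ standardized feature rows, could look as follows.

```python
delta = coefficients * (x_794560833 - x_719808558)
favors_794560833 = delta[delta > 0].index.tolist()   # features pushing client 794560833 towards churn
favors_719808558 = delta[delta < 0].index.tolist()   # features pushing client 719808558 towards churn
```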

Fig. 5 Feature contributions supporting the disparity in ranking between customers 794560833 and 719808558. Client 794560833 is favored by a greater number of contacts in the last year, a higher number of dependents, and a greater decrease in activity in the last quarter. Client 719808558 receives a greater contribution from having a lower number of relationships with the institution, a lower total number of transactions, and from being inactive for more time

The structure of the textual explanation associated with this use case complies with the guidelines depicted in the following example:

The available information regarding Customer 794560833 and Customer 719808558 suggests that both clients may engage in churn activities. Customer 794560833 is ranked higher than Customer 719808558 according to the current algorithm reasoning. However, the ultimate decision remains within your control, offering the option to alter this ranking if desired. Characteristics in favor of Customer 794560833 include a higher level of CONTACTS_COUNT_12_MON and DEPENDENT_COUNT, along with a smaller value for TOTAL_CT_CHNG_Q4_Q1. Characteristics in favor of Customer 719808558 include a lower number of TOTAL_RELATIONSHIP_COUNT and TOTAL_TRANS_CT, along with a higher value of MONTHS_INACTIVE_12_MON.

Due to this explanatory capability, the employee gains enhanced insight into the algorithm’s preference for one client over another, while also obtaining valuable contrasting information regarding the client with a lower final rank.
