Graph neural networks (GNNs) have transformed recommendation by encoding the user intents hidden within user-item interactions into the learning process of the network. This approach generates embedding vectors for users and items that encapsulate critical information. Furthermore, GNN algorithms, particularly those leveraging knowledge graphs, deepen the understanding of higher-order relationships between items, thereby enhancing the accuracy of recommender systems. In this paper, we propose an item category information recommendation algorithm based on a knowledge graph neural network. Inspired by causal graphs, the issue of popularity bias can be explored, and addressed, from a causal perspective in recommendation scenarios. To do this, it is necessary to identify several key factors that affect the probability of user-item interactions: the interaction between the user, item, and knowledge graph; the distribution of item popularity; the consistency of users; and the density of the knowledge graph. Based on cognitive analysis, a causal effect graph of the recommendation process can be constructed, as shown in Fig. 1(c). Here, U represents user consistency, E represents knowledge graph density, I denotes item popularity in the recommendation scenario, C represents the interaction between the user, item, and knowledge graph, and Y represents the final recommendation score.
Knowledge-Aware Graph User Intent Network
The knowledge-aware graph user intent network (KGUIN) is divided into three modules, namely the user cognitive intent network module, the score prediction module, and the learning module, as shown in Fig. 2.
Fig. 2 The knowledge-aware graph user cognitive intent network (KGUIN) framework shows how relations 2 and 3 in the knowledge graph affect user 1’s intent to purchase item 3
User Cognitive Intent Module
In the user cognitive intent module, it is assumed that each user selects a certain item for a specific purpose, which is referred to as the user’s intent. The user intent set, denoted as P, is used to decompose each interaction (u, i) into triplets \(\{(u, p, i) \mid p \in P\}\), which reorganizes the original heterogeneous graph. Each user intent is matched with a relation in the knowledge graph, and an attention mechanism is used to construct a vector for each user intent.
$$e_{p} = \sum_{r \in R} \alpha \left( r, p \right) e_{r},$$
(5)
where \(e_r\) denotes the ID embedding vector of relation r, and \(\alpha (r, p)\) is its importance score for intent p. The calculation formula for this score is:
$$\alpha \left( r, p \right) = \frac{\exp \left( w_{rp} \right)}{\sum_{r' \in R} \exp \left( w_{r'p} \right)},$$
(6)
where \(w_{rp}\) denotes a trainable weight corresponding to a specific relation r and a specific user cognitive intent p. Note that this attention score is not tied to a single user: any user whose history combines relation r with intent p shares the same score.
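As a concrete illustration of Eqs. (5) and (6), the following NumPy sketch builds one intent embedding from relation embeddings. All sizes, values, and variable names here are hypothetical placeholders, not the paper's actual configuration:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax, as in Eq. (6)
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

# Hypothetical setup: 4 relations with 8-dimensional ID embeddings, one intent p.
rng = np.random.default_rng(0)
e_r = rng.normal(size=(4, 8))   # relation ID embeddings e_r
w_p = rng.normal(size=4)        # trainable scores w_{rp} for intent p

alpha = softmax(w_p)            # Eq. (6): attention over relations
e_p = alpha @ e_r               # Eq. (5): attentive combination of relation embeddings
```

Because the attention weights sum to one, each intent embedding is a convex combination of the relation embeddings.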
Different user intents should be independent of each other so that each provides distinct information describing a user’s behavior, and the embeddings of the user intents should therefore be as independent as possible. The distance correlation coefficient is used as a regularization term to measure the independence between two intents: the smaller the distance correlation coefficient between two intents, the more independent they are. The calculation formula for the distance correlation coefficient is as follows:
$$L_{\mathrm{IND}} = \sum_{p, p' \in P,\, p \ne p'} d_{\mathrm{Cor}}\left( e_{p}, e_{p'} \right),$$
(7)
$$d_{\mathrm{Cor}}\left( e_{p}, e_{p'} \right) = \frac{d_{\mathrm{Cov}}\left( e_{p}, e_{p'} \right)}{\sqrt{d_{\mathrm{Var}}\left( e_{p} \right) \cdot d_{\mathrm{Var}}\left( e_{p'} \right)}},$$
(8)
where \(d_{\mathrm{Cor}}\) denotes the distance correlation coefficient between user intents p and \(p'\), and \(d_{\mathrm{Cov}}\) and \(d_{\mathrm{Var}}\) denote the distance covariance and distance variance of each embedding vector, respectively.
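The distance correlation of Eqs. (7) and (8) can be sketched in NumPy as follows. This is a minimal illustration that treats the components of each embedding as one-dimensional samples; it is not the authors' implementation:

```python
import numpy as np

def dcov(x, y):
    # pairwise distance matrices, double-centered (U-centering omitted for brevity)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return np.sqrt(np.maximum((A * B).mean(), 0.0))

def dcor(x, y):
    # Eq. (8): distance covariance normalised by the distance variances
    denom = np.sqrt(dcov(x, x) * dcov(y, y))
    return dcov(x, y) / denom if denom > 0 else 0.0
```

A useful sanity check is that any vector has distance correlation 1 with itself and with any linear rescaling of itself, while the coefficient shrinks toward 0 for unrelated vectors.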
In the aggregation of relational paths, the idea of collaborative filtering is used. Collaborative filtering assumes that users with similar behaviors have similar intents toward items; therefore, it is assumed that users with similar intents have similar preferences for items. Let \(N_u = \{(p, i) \mid (u, p, i)\in C\}\) denote the intent history and first-order connectivity of user u. The user intent information is then integrated with all historical interaction items, yielding the representation of the user intent module.
$$e_{u}^{(1)} = \frac{1}{\left| N_u \right|} \sum_{(p,i) \in N_u} \beta \left( u, p \right) e_{p} \odot e_{i}^{(0)},$$
(9)
The ID embedding vector of item i in the user intent module is denoted as \(e_i^{(0)}\). The Hadamard product, denoted as \(\odot \), is used because each user intent should exert a different driving force. Therefore, an attention score \(\beta \) is designed to distinguish the importance of each latent factor p for user u. \(\beta (u, p)\) is calculated as follows:
$$\beta \left( u, p \right) = \frac{\exp \left( e_{p}^{\top} e_{u}^{(0)} \right)}{\sum_{p' \in P} \exp \left( e_{p'}^{\top} e_{u}^{(0)} \right)},$$
(10)
The ID embedding vector of user u is denoted as \(e_u^{(0)}\). In the knowledge graph aggregation layer, \(N_i=\{(r, v) \mid (i, r, v)\in G\}\) represents the attributes of item i and its first-order linked entities, taking the relational context into account in the aggregation function. Since each entity carries different semantics under different relations, the representation of item i is produced as follows:
$$e_{i}^{(1)} = \frac{1}{\left| N_i \right|} \sum_{(r,v) \in N_i} e_{r} \odot e_{v}^{(0)},$$
(11)
where \(e_i^{(1)}\) denotes the item embedding vector after the aggregation of the first layer of adjacency information, and \(e_v^{(0)}\) denotes the ID embedding vector of the entity v. This expression distinguishes between different relation embeddings \(e_r\) and their corresponding semantics, even when the final attribute entity v is the same. Personalized weighting is an operational function that follows a specific process. To elaborate, consider a scenario where there are four items, \(i_1\), \(i_2\), \(i_3\), and \(i_4\), in the item library and a user \(u_1\) buys items \(i_1\), \(i_2\), and \(i_3\). In the adjacency matrix, the corresponding vector is (1, 1, 1, 0). Clustering the items with K-means or spectral methods places \(i_1\) and \(i_2\) in the first category, \(i_3\) in the second category, and \(i_4\) in the third category. Consequently, user \(u_1\) has browsed two categories of items, with two items in the first category and one item in the second. Therefore, the weight of each first-category item is 2/3 and the weight of the second-category item is 1/3, giving the vector (2/3, 2/3, 1/3, 0) in the adjacency matrix. After normalization, the final vector is (2/5, 2/5, 1/5, 0).
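The worked example of personalized category weighting can be reproduced in a few lines of NumPy. The clustering assignment is assumed given, as in the text:

```python
import numpy as np

# u1 interacted with i1, i2, i3 out of four items; a clustering (assumed given)
# puts {i1, i2} in category 0, {i3} in category 1, and {i4} in category 2.
interactions = np.array([1.0, 1.0, 1.0, 0.0])
category = np.array([0, 0, 1, 2])

weights = np.zeros_like(interactions)
for c in np.unique(category):
    mask = (category == c) & (interactions > 0)
    if mask.any():
        # weight of each interacted item = (# bought in its category) / (# bought)
        weights[mask] = mask.sum() / interactions.sum()

# (2/3, 2/3, 1/3, 0) before normalization; (2/5, 2/5, 1/5, 0) after
weights /= weights.sum()
```

The loop assigns each purchased item the fraction of the user's purchases falling in its category, then the final division renormalizes the row of the adjacency matrix to sum to one.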
Score Prediction Module
Upon completing the aforementioned modules, we obtain the user vector \(e_u^{\mathrm{int}}\) and item vector \(e_i^{\mathrm{int}}\) of the user cognitive intent module, together with the category-weighted user vector \(e_u^{\mathrm{cat}}\) and the category-weighted item vector \(e_i^{\mathrm{cat}}\). By adding these vectors, we derive the user vector \(e_u\) and item vector \(e_i\) of the layer.
$$e_{u}^{(l)} = e_{u}^{\mathrm{int},(l)} + e_{u}^{\mathrm{cat},(l)}.$$
(12)
$$e_{i}^{(l)} = e_{i}^{\mathrm{int},(l)} + e_{i}^{\mathrm{cat},(l)}.$$
(13)
As graph neural networks can propagate across multiple layers during training, we obtain multiple layers of user vector \(e_u\) and item vector \(e_i\). To ensure that the embedded vectors contain rich information, we perform addition operations on the user vector and item vector generated by each layer.
$$e_{u} = e_{u}^{(0)} + \ldots + e_{u}^{(L)}.$$
(14)
$$e_{i} = e_{i}^{(0)} + \ldots + e_{i}^{(L)}.$$
(15)
The final user vector is denoted by \(e_u\), and the final item vector is denoted by \(e_i\). By multiplying these two vectors, we obtain the final score.
$$y_{ui} = e_{u}^{\top} e_{i}.$$
(16)
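The score prediction module (Eqs. 14-16) amounts to summing the per-layer vectors and taking an inner product; a minimal NumPy sketch, with hypothetical layer count and dimensions:

```python
import numpy as np

# Hypothetical example: 3 propagation layers, 8-dimensional embeddings.
rng = np.random.default_rng(2)
L, d = 3, 8
e_u_layers = rng.normal(size=(L, d))   # e_u^(0), ..., e_u^(L-1)
e_i_layers = rng.normal(size=(L, d))   # e_i^(0), ..., e_i^(L-1)

e_u = e_u_layers.sum(axis=0)           # Eq. (14): sum of layer-wise user vectors
e_i = e_i_layers.sum(axis=0)           # Eq. (15): sum of layer-wise item vectors
score = float(e_u @ e_i)               # Eq. (16): y_ui = e_u^T e_i
```

Summing across layers lets the final embedding mix information from every propagation depth rather than only the last one.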
Learning
Finally, we optimize the model using the Bayesian personalized ranking (BPR) loss function.
$$L_{\mathrm{BPR}} = \sum_{(u,i,j) \in O} -\ln \sigma \left( y_{ui} - y_{uj} \right),$$
(17)
$$L_{\mathrm{KGUIN}} = L_{\mathrm{BPR}} + \lambda _{1} L_{\mathrm{IND}} + \lambda _{2} \left\| \Theta \right\| _{2}^{2},$$
(18)
The model’s parameters are represented by \(\Theta \), where \(e_u^{(0)},\ e_v^{(0)},\ e_r,\ e_p,\) and w correspond to users \(u\in U\), items \(i\in I\) and entities v, relations r, intents \(p\in P\), and their associated weights, respectively. The hyperparameter \(\lambda _1\) controls the independence loss, while \(\lambda _2\) controls the \(L_2\) regularization term.
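The pairwise BPR objective of Eq. (17) can be sketched as follows; this is a minimal NumPy illustration over pre-computed scores, not the authors' training code:

```python
import numpy as np

def bpr_loss(scores_pos, scores_neg):
    # Eq. (17): -ln sigmoid(y_ui - y_uj), averaged over observed triplets O,
    # where i is an interacted (positive) item and j a sampled negative item
    diff = scores_pos - scores_neg
    return float(np.mean(-np.log(1.0 / (1.0 + np.exp(-diff)))))
```

Ranking a positive item above a negative one yields a small loss; the reverse ordering is penalized heavily, which is exactly the pairwise preference BPR encodes.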
Technically, the intent embedding is created through an attentive combination of relation embeddings, where more significant relations are assigned larger attribution scores. This leads to what we call Relational Path-aware Aggregation. Unlike node-based aggregation mechanisms, we treat a relational path as an information channel and embed each channel into a representative vector. Given that user-intent-item triplets and KG (Knowledge Graph) triplets are heterogeneous, we employ distinct aggregation strategies for these two parts. This approach allows us to effectively distill the behavioral patterns of users and the relatedness of items separately.
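The two heterogeneous aggregators described above (Eqs. 9-11) can be sketched as follows. The indices, sizes, and neighbor sets are hypothetical placeholders chosen only to make the sketch runnable:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
e_u0 = rng.normal(size=d)           # user ID embedding e_u^(0)
e_i0 = rng.normal(size=(3, d))      # item ID embeddings e_i^(0)
e_p = rng.normal(size=(2, d))       # intent embeddings
e_r = rng.normal(size=(2, d))       # relation embeddings
e_v0 = rng.normal(size=(2, d))      # entity ID embeddings e_v^(0)

# Eq. (10): attention beta(u, p) over the user's intents
logits = e_p @ e_u0
beta = np.exp(logits - logits.max())
beta /= beta.sum()

# Eq. (9): aggregate the intent history N_u = {(p, i)} into e_u^(1)
N_u = [(0, 0), (1, 1), (0, 2)]      # (intent index, item index) pairs
e_u1 = sum(beta[p] * e_p[p] * e_i0[i] for p, i in N_u) / len(N_u)

# Eq. (11): aggregate the KG neighbours N_i = {(r, v)} into e_i^(1)
N_i = [(0, 0), (1, 1)]              # (relation index, entity index) pairs
e_i1 = sum(e_r[r] * e_v0[v] for r, v in N_i) / len(N_i)
```

Note the asymmetry: the user side channels each item through an attended intent embedding, while the item side channels each entity through its relation embedding, matching the distinct aggregation strategies for the two heterogeneous triplet types.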
Cognitive Recommendations Based on Causal Reasoning
This subsection focuses on the model-agnostic causal reasoning method (KGUIN-Causal), which serves as a multi-task learning framework to tackle the problem of item popularity bias through counterfactual reasoning. Based on the causal graph and causal effects, the KeCAIN model can be broken down into four modules: the KGUIN recommendation module, the user module, the item module, and the knowledge graph module. Recommender systems are influenced by various perceptual factors, such as the word-of-mouth effect, promotional activities, and item quality, resulting in a long-tail distribution of user interactions with items. Models that fit this distribution inherit its biases during training, tending to recommend popular products rather than capturing user preferences and the cognitive alignment between users and items. Excessive bias may cause the recommender system to lose personalization. The proposed model applies counterfactual theory from causal inference, incorporating factors related to users, items, and entities in the knowledge graph, thus effectively mitigating these biases.
The recommendation module, which is constructed as a multi-layer perceptron, receives representations from the user module, item module, and knowledge graph module, respectively. The user and item modules can be implemented as multi-layer perceptrons, and the knowledge graph module can be implemented as a graph neural network.
Fig. 3 The model-agnostic causal reasoning method (KGUIN-Causal)
The framework of KeCAIN is illustrated in Fig. 3. In the recommendation module, the ranking score between the user and item is denoted by \(\hat{y}_c = Y_c(C(U=u, I=i, E=e))\), reflecting the extent to which item i matches the preference of user u.
In the item module, the effect of item popularity is captured by \(\hat{y}_i = Y_i(I=i)\), where the score increases with the popularity of the item. In the user module, \(\hat{y}_u = Y_u(U=u)\) captures the degree to which a user interacts with items, irrespective of their preferences.
When two users are randomly recommended the same number of items, a user may click on items with higher exposure due to broader interests or herd mentality. Users with this tendency are expected to receive higher \(\hat{y}_u\) values due to the influence of popularity.
In the knowledge graph module, \(\hat{y}_e = Y_e(E=e)\) depicts how the entities and relationships in the knowledge graph affect recommendations. Popular items tend to have denser relationships in the graph, resulting in higher \(\hat{y}_e\) scores.
Since the goal of training is to obtain the user-item score \(y_{ui}\) after removing the popularity bias, the four branches are combined into a final score as follows:
$$\hat{y} = \hat{y}_{c}*\sigma \left( \hat{y}_{i} \right) *\sigma \left( \hat{y}_{u} \right) *\sigma \left( \hat{y}_{e} \right),$$
(19)
Here, the sigmoid function \(\sigma (\cdot )\) transforms the values of \(\hat{y}_i\), \(\hat{y}_u\), and \(\hat{y}_e\) to the range (0, 1) and adjusts the final score of the original recommendation model. To construct the loss function of the KeCAIN model, we follow a multi-task training pattern:
$$L = L_{R} + \alpha *L_{I} + \beta *L_{U} + \gamma *L_{E},$$
(20)
$$L_{I} = \sum_{(u,i) \in D} -y_{ui} \log \left( \sigma \left( \hat{y}_{i} \right) \right) - \left( 1 - y_{ui} \right) \log \left( 1 - \sigma \left( \hat{y}_{i} \right) \right),$$
(21)
$$L_{U} = \sum_{(u,i) \in D} -y_{ui} \log \left( \sigma \left( \hat{y}_{u} \right) \right) - \left( 1 - y_{ui} \right) \log \left( 1 - \sigma \left( \hat{y}_{u} \right) \right),$$
(22)
$$L_{E} = \sum_{(u,i) \in D} -y_{ui} \log \left( \sigma \left( \hat{y}_{e} \right) \right) - \left( 1 - y_{ui} \right) \log \left( 1 - \sigma \left( \hat{y}_{e} \right) \right),$$
(23)
The loss of the KGUIN module is represented by \(L_R\), while \(\alpha \), \(\beta \), and \(\gamma \) are the three adjustable hyperparameters.
In the testing phase, the key to eliminating popularity bias through counterfactual reasoning is to remove the influence of the item factor on the final score \(\hat{y}\) through the path \(I\rightarrow Y\). Taking the natural direct effect (NDE) as an example, this direct effect signifies the immediate impact on outcome Y of changes in the user-item (U, I) state, provided that the mediating variable K remains unchanged and only direct pathways affect Y. Given that K actually depends on the user-item pair, the assumption that “K remains constant when U and I change” constructs a hypothetical scenario in which “user behavior is influenced only by the items and the user themselves, not by the match between the items and the user.” The total indirect effect (TIE) exclusively retains the influence of the user-item match on user behavior, effectively removing the bias. This can be achieved by subtracting \(c\times \sigma (\hat{y}_i)\times \sigma (\hat{y}_u)\times \sigma (\hat{y}_e)\) from the original formula:
$$\hat{y}_{c}*\sigma \left( \hat{y}_{i} \right) *\sigma \left( \hat{y}_{u} \right) *\sigma \left( \hat{y}_{e} \right) - c*\sigma \left( \hat{y}_{i} \right) *\sigma \left( \hat{y}_{u} \right) *\sigma \left( \hat{y}_{e} \right) .$$
(24)
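The fused score of Eq. (19) and the counterfactual adjustment of Eq. (24) can be sketched as follows; the constant c is a tunable hyperparameter, and the scalar inputs stand in for the four branch outputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fused_score(y_c, y_i, y_u, y_e):
    # Eq. (19): modulate the matching score by the three bias branches
    return y_c * sigmoid(y_i) * sigmoid(y_u) * sigmoid(y_e)

def debiased_score(y_c, y_i, y_u, y_e, c=0.0):
    # Eq. (24): subtract the counterfactual term c * sigma * sigma * sigma
    # so that only the user-item match (the TIE) drives the ranking;
    # factoring out the sigmoids gives (y_c - c) times the same product
    return (y_c - c) * sigmoid(y_i) * sigmoid(y_u) * sigmoid(y_e)
```

With c = 0 the two scores coincide; a positive c uniformly shifts the matching branch, which changes the ranking of items whose popularity branches differ.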
KeCAIN, the model introduced in this paper, stands for Knowledge Graph-guided Recommendation System, which integrates causal analysis to mitigate bias. KGUIN, short for Knowledge Graph User Intent Network, models the relationships between user intents and items using information propagation based on the knowledge graph. KGUIN-Causal applies a causal analysis module to the KGUIN framework.