TP-GCL: graph contrastive learning from the tensor perspective

The entries of the adjacency tensor are defined by Eq. (3) and Eq. (4),

\beta_{i_1, i_2, \ldots, i_q} = \frac{c}{\alpha},    (3)

\alpha = \sum_{r_1, r_2, \ldots, r_c \geq 1,\; \sum_{i=1}^{c} r_i = m} \binom{m}{r_1, r_2, \ldots, r_c},    (4)

where c denotes the cardinality of the hyperedge and α represents the number of corresponding permutations, computed in Eq. (4) as a sum of multinomial coefficients (m; r_1, r_2, …, r_c) subject to the constraint that every r_i ≥ 1 (i.e., r_1, r_2, …, r_c ≠ 0). This tensor construction maximally preserves the original hyperedge structure and further reflects the associative patterns between different nodes in the hypergraph. To better understand this process, we provide an illustrative example:

Example 4.1. For a given hyperedge e1′ = {v1, v2, v3}, to construct a 2-order adjacency tensor we need to consider all permutations of length 2 over the nodes v1, v2, v3. This means that the 3 nodes of hyperedge e1′ must be arranged into 2 positions, so one of the 3 nodes is discarded each time, yielding {v1, v2}, {v1, v3}, and {v2, v3}. The adjacency tensor coefficient is then computed as 2/3, where the numerator is the cardinality of hyperedge e1′ and the denominator is the number of permutations in this case.
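To make the coefficient computation concrete, the following minimal Python sketch (the function names are ours, not from the paper) evaluates Eqs. (3) and (4) for the regular case where the hyperedge cardinality c does not exceed the tensor order m; the node-discarding workaround of Example 4.1 for c > m is not modeled here.

```python
from itertools import product
from math import factorial

def multinomial(m, parts):
    """Multinomial coefficient m! / (r_1! * r_2! * ... * r_c!)."""
    out = factorial(m)
    for r in parts:
        out //= factorial(r)
    return out

def alpha(c, m):
    """Eq. (4): sum of multinomial coefficients over all tuples
    (r_1, ..., r_c) with every r_i >= 1 and r_1 + ... + r_c = m."""
    if c > m:
        # Eq. (4) has no valid terms here; Example 4.1 instead restricts
        # the hyperedge to subsets of size m before applying the formula.
        raise ValueError("Eq. (4) requires hyperedge cardinality c <= order m")
    total = 0
    for parts in product(range(1, m - c + 2), repeat=c):
        if sum(parts) == m:
            total += multinomial(m, parts)
    return total

def tensor_coefficient(c, m):
    """Eq. (3): entry value beta = c / alpha for a hyperedge of
    cardinality c in an m-order adjacency tensor."""
    return c / alpha(c, m)

# A 2-node hyperedge in a 3-order tensor: the tuples (1, 2) and (2, 1)
# each contribute 3, so alpha = 6 and beta = 2 / 6.
print(tensor_coefficient(2, 3))  # 0.333...
```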

4.2 Graph contrastive learning

Graph contrastive learning aims to extract effective node features by comparing the feature differences between the original graph and the tensorized hypergraph. To understand the data from different perspectives, we employ a GCN encoder on the original graph G. This process maps node features from a high-dimensional space to a low-dimensional feature space ℝ^(N×k), yielding the node feature matrix Z1 of the original graph, formulated as Eq. (5),

Z_1 = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} X W\right),    (5)

where Â = A + I denotes the adjacency matrix with self-loops, D̂ is the diagonal degree matrix with D̂_ii = ∑_j Â_ij, σ(·) is the non-linear activation function, and W is the learnable weight matrix.
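As a rough illustration of Eq. (5), the dense-matrix sketch below applies the symmetric normalization and the learnable transform W; it assumes ReLU for σ and is not the actual PyTorch Geometric encoder used in the experiments.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Dense-matrix sketch of the propagation rule in Eq. (5)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, adj, x):
        # A_hat = A + I: adjacency matrix with self-loops
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)
        # D_hat: diagonal degree matrix of A_hat
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        # Z1 = sigma(D^-1/2 A_hat D^-1/2 X W), with ReLU assumed for sigma
        return torch.relu(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ self.weight)

# Example: 4 nodes with 8-dimensional features mapped to 16 dimensions.
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
z1 = GCNLayer(8, 16)(adj, torch.randn(4, 8))
```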

Additionally, alignment between the node features and the adjacency tensor is achieved through a learnable weight matrix. The outer-product pooling technique is then employed on the adjacency tensor to perform tensor convolution on G′, facilitating information aggregation, formulated as Eq. (6),

Z_2 = \sigma\left(\hat{T} \cdot X \cdot \Theta\right),    (6)

where σ(·) represents the non-linear activation function and T̂ denotes the adjacency tensor with self-loops inserted, T̂ = T + ∑_{j, j≠i} φ_i ξ_ij, where ξ_ij equals 1 when i = j and 0 otherwise; inserting self-loops strengthens the model's focus on node-specific information and helps it learn more comprehensive node representations. Z2 denotes the node representation obtained from the tensor perspective.
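Eq. (6) can be read as contracting the self-loop-augmented adjacency tensor against the node features and then applying the learnable transform Θ. The sketch below shows one plausible contraction for a 3-order tensor; the exact outer-product pooling used by TP-GCL may differ, so this is illustrative only.

```python
import torch
import torch.nn as nn

class TensorConvLayer(nn.Module):
    """Illustrative contraction for Eq. (6) with a 3-order adjacency tensor."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Parameter(torch.empty(in_dim, out_dim))  # learnable Theta
        nn.init.xavier_uniform_(self.theta)

    def forward(self, t_hat, x):
        # t_hat: (N, N, N) adjacency tensor with self-loops inserted
        # x:     (N, F) node feature matrix
        # Contract the last two tensor modes against node features, combining
        # each index pair (j, k) through an element-wise (outer-product style)
        # interaction of their feature vectors, then apply Theta and sigma.
        agg = torch.einsum('ijk,jf,kf->if', t_hat, x, x)
        return torch.relu(agg @ self.theta)

# Example: a 4-node tensorized hypergraph with 8-dimensional features.
z2 = TensorConvLayer(8, 16)(torch.rand(4, 4, 4), torch.randn(4, 8))
```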

To maximize the similarity between positive pairs and minimize the similarity between negative pairs, a contrastive loss function is employed to enhance the discriminative power of the node embeddings. We treat the same node from the two different views as a positive pair and consider all other nodes as negatives, and we optimize the positive pairs (z_{1,i}, z_{2,i}) in a pairwise manner, formulated as Eq. (7),

L(z_{1,i}, z_{2,i}) = -\log \frac{e^{\theta(z_{1,i}, z_{2,i})/\tau}}{e^{\theta(z_{1,i}, z_{2,i})/\tau} + \sum_{k=1}^{N} \mathbf{1}_{[k \neq i]}\, e^{\theta(z_{1,i}, z_{2,k})/\tau}},    (7)

where θ(·,·) measures the similarity between two node embeddings, τ is a temperature parameter that adjusts the distribution of similarities between samples in L(z_{1,i}, z_{2,i}), and 1_{[k≠i]} ∈ {0, 1} is an indicator function taking the value 1 only when k ≠ i. Considering the symmetry between the two views, we employ a symmetric loss to reflect the symmetric relationship of node embeddings across the two views; a sketch of this objective is given below.
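For concreteness, here is a minimal sketch of this pairwise loss and of its symmetric average (stated formally in Eq. (8) below), assuming θ is cosine similarity and that negatives are drawn only from the opposite view.

```python
import torch
import torch.nn.functional as F

def pairwise_loss(z1, z2, tau):
    """Per-anchor loss in the spirit of Eq. (7): the same node in the other
    view is the positive, every other node in that view is a negative."""
    z1 = F.normalize(z1, dim=1)            # assume theta = cosine similarity
    z2 = F.normalize(z2, dim=1)
    sim = torch.exp(z1 @ z2.t() / tau)     # (N, N) matrix of exp(theta / tau)
    pos = sim.diag()                       # exp(theta(z1_i, z2_i) / tau)
    return -torch.log(pos / sim.sum(dim=1))

def contrastive_loss(z1, z2, tau=0.5):
    """Symmetric average over both view orderings, as in Eq. (8)."""
    return 0.5 * (pairwise_loss(z1, z2, tau) + pairwise_loss(z2, z1, tau)).mean()

# Example with random embeddings for N = 32 nodes in two views.
loss = contrastive_loss(torch.randn(32, 16), torch.randn(32, 16))
```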

Ultimately, our overall loss function L is formulated as Eq. (8),

L = \frac{1}{2N} \sum_{i=1}^{N} \left[ L(z_{1,i}, z_{2,i}) + L(z_{2,i}, z_{1,i}) \right],    (8)

5 Experiments

In this section, we first introduce the datasets utilized in our experiments. Subsequently, we compare our method with baseline approaches and conduct relevant ablation experiments. Finally, we perform additional experiments to further validate the superiority of the proposed method presented in this paper.

5.1 Datasets

To validate the effectiveness of TP-GCL, we designed two sets of experiments, namely node classification tasks and graph classification tasks. The details of the datasets are provided in Table 2.


Table 2. Statistics of datasets used in experiments.

Our aim is to comprehensively evaluate the performance of the TP-GCL model in node classification. Node classification tasks focus on categorizing nodes with different features and labels. The datasets Cora, Citeseer, and PubMed belong to the academic network domain, where nodes represent papers, and edges represent citation relationships between papers. By utilizing these datasets, we validate the effectiveness and generalization capability of the TP-GCL method on graph data of various sizes and scales.

5.2 Baselines

The baseline models for node classification tasks can be categorized into two groups. The first group includes semi-supervised learning methods such as ChebNet (Tang et al., 2019), GCN (Kipf and Welling, 2016), GAT (Veličković et al., 2017), and GraphSAGE (Hamilton et al., 2017), which utilize node labels during the learning process. The second group comprises self-supervised methods, including DGI (Veličković et al., 2018), GMI (Peng et al., 2020), MVGRL (Hassani and Khasahmadi, 2020), GraphCL (You et al., 2020), GraphMAE (Hou et al., 2022), H-GCL (Zhu et al., 2023), and US-GCL (Zhao et al., 2023), which do not rely on node labels. The proposed TP-GCL also falls into the category of self-supervised graph contrastive learning methods.

5.3 Experiment implementation details

Experiments were run on an NVIDIA A40 GPU with 48 GB of VRAM and 80 GB of CPU memory. TP-GCL was implemented with PyTorch 1.12.1, PyTorch Geometric, and the PyGCL library. The code for our experiments will be made publicly available in upcoming work. The optimal parameter settings are listed in Table 3: Training epochs is the total number of epochs required for training; Learning rate controls the step size of model parameter updates; Weight decay is a regularization coefficient used to prevent overfitting; τ is the temperature coefficient that sets the focus on hard negative samples during contrastive learning; and Hidden dimension determines the size of the hidden layer, affecting the complexity and expressive power of feature learning.


Table 3. Detailed parameter setting.
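For illustration, the snippet below shows how these hyperparameters would typically be wired into a PyTorch training setup; the values are placeholders rather than the tuned settings of Table 3, and the linear layer is only a stand-in for the TP-GCL encoders.

```python
import torch
import torch.nn as nn

# Placeholder values for illustration only; the tuned per-dataset settings
# are the ones reported in Table 3.
config = {
    "epochs": 200,          # total number of training epochs
    "lr": 1e-3,             # step size of model parameter updates
    "weight_decay": 1e-5,   # regularization coefficient against overfitting
    "tau": 0.5,             # temperature of the contrastive loss
    "hidden_dim": 512,      # hidden-layer size of the encoders
}

encoder = nn.Linear(128, config["hidden_dim"])  # stand-in for the TP-GCL encoders
optimizer = torch.optim.Adam(
    encoder.parameters(), lr=config["lr"], weight_decay=config["weight_decay"]
)
```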

5.4 Experimental results

We validated the effectiveness of TP-GCL on node classification tasks, and Table 4 presents the performance comparison on the Cora, Citeseer, and PubMed datasets.


Table 4. The performance of accuracy on node classification tasks.

The results in Table 4 clearly demonstrate the superior performance of TP-GCL in node classification tasks. TP-GCL exhibits high accuracy on three different datasets, Cora, Citeseer, and PubMed, surpassing other baseline models. This can be attributed to several advantages:

1. TP-GCL comprehensively captures the structural features of graphs in complex spaces using high-order adjacency tensors. Compared to traditional methods, high-order tensor representations provide richer information, facilitating a better understanding of both local and global structures in the graph. This allows TP-GCL to more accurately learn abstract representations of nodes.

2. Through the contrastive learning mechanism of anchor graph-tensorized hypergraphs, TP-GCL sensitively learns subtle differences and similarities between nodes. This learning approach makes TP-GCL more discriminative, enabling accurate differentiation of nodes from different categories.

5.5 Hyperparametric sensitivity

Our research focuses on an in-depth analysis of key hyperparameters such as hidden layer dimension, Tau value, and learning rate. Firstly, the hidden layer dimension plays a crucial role in the performance of TP-GCL. By adjusting the dimension of the hidden layer, we explored the impact of different dimensions on the model’s performance on the Cora and Citeseer datasets. As shown in Figure 2, the results indicate that increasing the dimension of the hidden layer within the range of [32 ~ 512] enhances the fitting capability of TP-GCL, with the optimal performance reached when the dimension equals 512. This is because a higher-dimensional hidden layer helps capture more complex data patterns. However, excessively high dimensions, such as 1,024, can lead to overfitting.


Figure 2. Performance of hidden layer dimension on Cora and Citeseer.

Next, we focused on the hyperparameters Tau and learning rate. Tau is typically used to control the smoothness of the distribution of similarities in contrastive learning, while the learning rate is used to regulate the speed of model parameter updates during training. We plotted the parameter space with the x-axis representing the learning rate in the range [0.001 ~ 0.009], the y-axis representing Tau in the range [0.1 ~ 0.9], and the z-axis representing the accuracy of node classification, as shown in Figure 3.


Figure 3. Performance of different learning rates and tau values on the Cora and Citeseer datasets.

From Figure 3, it can be observed that the variation in accuracy is influenced by changes in the learning rate under different Tau values. When Tau values are low (0.1–0.3), combinations within the learning rate range of 0.001–0.004 generally result in lower accuracy. This might be attributed to the slower parameter update speed caused by the lower learning rates in this range, preventing the model from fully utilizing information in the dataset and thereby hindering accurate node differentiation. Additionally, lower Tau values imply more sensitivity in similarity calculations, potentially causing similarity to concentrate too much between nodes, making effective node discrimination challenging and consequently reducing accuracy. On the other hand, when Tau values are high (0.6–0.9), combinations within the learning rate range of 0.008–0.009 exhibit relatively higher accuracy. This is possibly due to the higher learning rates in this range accelerating the model’s parameter update speed, aiding the model in better learning the dataset’s features. Furthermore, higher Tau values smooth out the similarity distribution, reducing the model’s sensitivity to noise and subtle differences in the data, allowing the model to better discriminate between nodes and thereby improving accuracy.

6 Conclusion

In response to the challenges existing graph neural network methods face in capturing global dependencies and diverse representations, and their difficulty in fully revealing the inherent complexity of graph data, this paper proposes TP-GCL, a novel tensor-perspective graph contrastive learning method that aims to comprehensively and deeply understand graph structure and the relationships between nodes. First, TP-GCL transforms graphs into tensorized hypergraphs, introducing higher-order information while preserving the original topology of the graph; this addresses the limitations of existing methods in capturing complex graph structure and inter-node relationships. TP-GCL then contrasts anchor graphs with tensorized hypergraphs, exploiting their differences and similarities to enhance the model's sensitivity to global information in the graph. Experiments on public datasets provide a comprehensive evaluation of TP-GCL and validate its strong performance in analyzing complex graph structures.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

Author contributions

ML: Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing. LM: Formal analysis, Methodology, Writing – original draft, Writing – review & editing. ZY: Conceptualization, Supervision, Validation, Writing – review & editing. YY: Formal analysis, Methodology, Software, Writing – review & editing. SC: Formal analysis, Methodology, Writing – review & editing. YX: Data curation, Validation, Writing – review & editing. HZ: Conceptualization, Funding acquisition, Supervision, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work is partially supported by the Construction of Innovation Platform Program of Qinghai Province of China under Grant no. 2022-ZJ-T02. ML and LM have contributed equally to this work and should be regarded as co-first authors.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Cai, L., Li, J., Wang, J., and Ji, S. (2020). Line graph neural networks for link prediction. IEEE Trans. Pattern Anal. Mach. Intell., 5103–5113.

Feng, G., Wang, H., and Wang, C. (2023). Search for deep graph neural networks. Inf. Sci. 649:119617. doi: 10.1016/j.ins.2023.119617

Gao, C., Zheng, Y., Li, N., Li, Y., Qin, Y., Piao, J., et al. (2023). A survey of graph neural networks for recommender systems: challenges, methods, and directions. ACM Trans. Recomm. Syst. 1, 1–51. doi: 10.1145/3568022

Hamilton, W., Ying, Z., and Leskovec, J. (2017). Inductive representation learning on large graphs. In: Proceedings of the advances in neural information processing systems, Long Beach, NJ: MIT, p. 30.

Hassani, K., and Khasahmadi, A. H. (2020). Contrastive multi-view representation learning on graphs. In: Proceedings of the international conference on machine learning, Virtual, NJ: ACM, pp. 4116–4126.

Hou, Z., Liu, X., Cen, Y., Dong, Y., Yang, H., Wang, C., et al. (2022). GraphMAE: self-supervised masked graph autoencoders. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, New York, NJ: ACM, pp. 594–604.

Jin, D., Wang, R., Ge, M., He, D., Li, X., Lin, W., et al. (2022). RAW-GNN: random walk aggregation based graph neural network. arXiv 2022:13953. doi: 10.48550/arXiv.2206.13953

Kim, D., Baek, J., and Hwang, S. J. (2022). Graph self-supervised learning with accurate discrepancy learning. In: Proceedings of the advances in neural information processing systems, New Orleans, No. 35, pp. 14085–14098.

Kipf, T. N., and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv 2016:02907. doi: 10.48550/arXiv.1609.02907

Kumar, S., Mallik, A., Khetarpal, A., and Panda, B. S. (2022). Influence maximization in social networks using graph embedding and graph neural network. Inf. Sci. 607, 1617–1636. doi: 10.1016/j.ins.2022.06.075

Lin, X., Zhou, C., Wu, J., Yang, H., Wang, H., Cao, Y., et al. (2023). Exploratory adversarial attacks on graph neural networks for semi-supervised node classification. Pattern Recogn. 133:109042. doi: 10.1016/j.patcog.2022.109042

Liu, Y., Jin, M., Pan, S., Zhou, C., Zheng, Y., Xia, F., et al. (2022). Graph self-supervised learning: a survey. IEEE Trans. Knowl. Data Eng. 35, 1–5900. doi: 10.1109/TKDE.2022.3172903

Liu, X., Li, X., Fiumara, G., and de Meo, P. (2023). Link prediction approach combined graph neural network with capsule network. Expert Syst. Appl. 212:118737. doi: 10.1016/j.eswa.2022.118737

Liu, S., Meng, Z., Macdonald, C., and Ounis, I. (2023). Graph neural pre-training for recommendation with side information. ACM Trans. Inf. Syst. 41, 1–28. doi: 10.1145/3568953

Min, S., Gao, Z., Peng, J., Wang, L., Qin, K., and Fang, B. (2021). STGSN–a spatial–temporal graph neural network framework for time-evolving social networks. Knowl. Based Syst. 214:106746. doi: 10.1016/j.knosys.2021.106746

Peng, Z., Huang, W., Luo, M., Zheng, Q., Rong, Y., Xu, T., et al. (2020). Graph representation learning via graphical mutual information maximization. In: Proceedings of the web conference. Ljubljana, NJ: ACM, 259–270.

Sheng, Z., Zhang, T., Zhang, Y., and Gao, S. (2023). Enhanced graph neural network for session-based recommendation. Expert Syst. Appl. 213:118887. doi: 10.1016/j.eswa.2022.118887

Shi, S., Qiao, K., Yang, S., Wang, L., Chen, J., and Yan, B. (2021). Boosting-GNN: boosting algorithm for graph networks on imbalanced node classification. Front. Neurorobot. 15:775688. doi: 10.3389/fnbot.2021.775688

Shuai, J., Zhang, K., Wu, L., Sun, P., Hong, R., Wang, M., et al. (2022). A review-aware graph contrastive learning framework for recommendation. In: Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval. New York, NJ: ACM, pp. 1283–1293.

Tang, S., Li, B., and Yu, H. (2019). ChebNet: efficient and stable constructions of deep neural networks with rectified power units using Chebyshev approximations. arXiv 2019:5467. doi: 10.48550/arXiv.1911.05467

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Bengio, Y., and Liò, P. (2017). Graph attention networks. arXiv 2017:10903. doi: 10.48550/arXiv.1710.10903

Veličković, P., Fedus, W., Hamilton, W. L., Liò, P., Bengio, Y., Hjelm, R. D., et al. (2018). Deep graph infomax. arXiv 2018:10341. doi: 10.48550/arXiv.1809.10341

Wang, K., Zhou, R., Tang, J., and Li, M. (2023). GraphscoreDTA: optimized graph neural network for protein–ligand binding affinity prediction. Bioinformatics 39:btad340. doi: 10.1093/bioinformatics/btad340

Wei, X., Liu, Y., Sun, J., Jiang, Y., Tang, Q., and Yuan, K. (2023). Dual subgraph-based graph neural network for friendship prediction in location-based social networks. ACM Trans. Knowl. Discov. Data 17, 1–28. doi: 10.1145/3554981

Wu, L., Lin, H., Tan, C., Gao, Z., and Li, S. Z. (2021). Self-supervised learning on graphs: contrastive, generative, or predictive. IEEE Trans. Knowl. Data Eng. 35, 4216–4235. doi: 10.1109/TKDE.2021.3131584

Xu, S., Liu, X., Ma, K., Dong, F., Riskhan, B., Xiang, S., et al. (2023). Rumor detection on social media using hierarchically aggregated feature via graph neural networks. Appl. Intell. 53, 3136–3149. doi: 10.1007/s10489-022-03592-3

Yang, L., Zhou, W., Peng, W., Niu, B., Gu, J., Wang, C., et al. (2022). Graph neural networks beyond compromise between attribute and topology. In: Proceedings of the ACM web conference, New York, NJ: ACM, pp. 1127–1135.

Ye, W., Askarisichani, O., Jones, A., and Singh, A. (2020). Learning deep graph representations via convolutional neural networks. IEEE Trans. Knowl. Data Eng. 34, 2268–2279.

You, Y., Chen, T., Sui, Y., Chen, T., Wang, Z., and Shen, Y. (2020). Graph contrastive learning with augmentations. In: Proceedings of the advances in neural information processing systems, Ljubljana, NJ: ACM, pp. 5812–5823.

Zhao, X., Dai, Q., Wu, J., Peng, H., Liu, M., Bai, X., et al. (2022). Multi-view tensor graph neural networks through reinforced aggregation. IEEE Trans. Knowl. Data Eng. 35, 4077–4091. doi: 10.1109/TKDE.2022.3142179

Zhao, L., Qi, X., Chen, Y., Qiao, Y., Bu, D., Wu, Y., et al. (2023). Biological knowledge graph-guided investigation of immune therapy response in cancer with graph neural network. Brief. Bioinform. 24:bbad023. doi: 10.1093/bib/bbad023

Zhao, H., Yang, X., Deng, C., and Tao, D. (2023). Unsupervised structure-adaptive graph contrastive learning. IEEE Trans. Neural Netw. Learn. Syst. 1, 1–14. doi: 10.1109/TNNLS.2023.3341841

Zhou, P., Wu, Z., Wen, G., Tang, K., and Ma, J. (2023). Multi-scale graph classification with shared graph neural network. World Wide Web 26, 949–966. doi: 10.1007/s11280-022-01070-x

Zhu, Y., Xu, Y., Yu, F., Liu, Q., Wu, S., and Wang, L. (2021). Graph contrastive learning with adaptive augmentation. In: Proceedings of the ACM web conference. Ljubljana, NJ: ACM, pp. 2069–2080.

Zhu, J., Zeng, W., Zhang, J., Tang, J., and Zhao, X. (2023). Cross-view graph contrastive learning with hypergraph. Inf. Fusion 99:101867. doi: 10.1016/j.inffus.2023.101867

Zou, M., Gan, Z., Cao, R., Guan, C., and Leng, S. (2023). Similarity-navigated graph neural networks for node classification. Inf. Sci. 633, 41–69. doi: 10.1016/j.ins.2023.03.057
