As seen from Table 4, both BERTGCN and PEGCN perform much better than the single TextGCN or BERT models, with improvements of 2 to 4% on each of the five datasets. Compared with BERTGCN, PEGCN improves accuracy by 0.36% on 20NG, 0.12% on R8, 0.05% on R52, 0.96% on Ohsumed, and 3.59% on MR. Both models combine a GCN with a pre-trained model, yet PEGCN classifies more accurately. Since BERTGCN also uses BERT input representations, that component cannot be the main source of the difference. Rather, BERTGCN still uses the raw adjacency matrix for edge features, whereas this study uses a processed edge matrix, which extracts richer edge features and improves the expressiveness of the node representations; at the same time, it reduces the number of isolated points and improves storage efficiency (a rough sketch of such edge processing is given below). PEGCN's final classification results are therefore superior to BERTGCN's. The improvements made to TextGCN here thus raise classification accuracy and make the proposed model competitive among similar models. Table 4 also shows that the models combining TextGCN and BERT consistently achieve higher accuracy than a single BERT or TextGCN model, indicating that pairing TextGCN with a large-scale pre-trained model offers a significant advantage in classification accuracy.
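The exact edge processing of PEGCN is not reproduced here; the following is a minimal sketch assuming TextGCN-style PMI weighting of word-word edges, with a pruning threshold standing in for the processing step that discards weak edges. The function name pmi_edge_matrix and all parameter values are illustrative assumptions, not details from the paper.

```python
import numpy as np
from collections import Counter
from itertools import combinations

def pmi_edge_matrix(docs, window=10, threshold=0.0):
    """Sketch of a processed edge matrix: word-word edges weighted by
    PMI over sliding windows (as in TextGCN), pruned at `threshold`.
    `docs` is a list of token lists; returns (vocab, weight matrix)."""
    vocab = sorted({w for d in docs for w in d})
    idx = {w: i for i, w in enumerate(vocab)}
    n_windows = 0
    single, pair = Counter(), Counter()
    for doc in docs:
        for s in range(max(1, len(doc) - window + 1)):
            win = sorted(set(doc[s:s + window]))
            n_windows += 1
            single.update(win)
            pair.update(combinations(win, 2))
    A = np.zeros((len(vocab), len(vocab)))
    for (w1, w2), c in pair.items():
        pmi = np.log(c * n_windows / (single[w1] * single[w2]))
        if pmi > threshold:  # prune weak edges: sparser graph, fewer noisy links
            A[idx[w1], idx[w2]] = A[idx[w2], idx[w1]] = pmi
    np.fill_diagonal(A, 1.0)  # self-loops, as in standard GCN preprocessing
    return vocab, A

docs = [["graph", "convolution", "text"], ["text", "classification", "graph"]]
vocab, A = pmi_edge_matrix(docs, window=3)
```

Relative to a binary adjacency matrix, the continuous PMI weights carry richer edge information, while the threshold keeps the matrix sparse, consistent with the storage-efficiency argument above.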
To further demonstrate the reliability of PEGCN, the proposed method is also compared with other classical methods (Table 5). Among them, CNN is the convolutional network proposed by Kim et al.11 in 2014, LSTM is the long short-term memory network10, and Bi-LSTM is the bidirectional long short-term memory network12.
PTE is a word-embedding-based network model proposed by Tang et al.28 that learns word embeddings on a heterogeneous text network with words, documents, and labels as nodes, and then averages the word embeddings into document embeddings for text classification. FastText is a simple and efficient classification method proposed by Joulin et al.29, which takes the mean of the word or N-gram embeddings as the document representation and passes it to a linear classifier for classification.
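Both PTE and FastText ultimately represent a document as the mean of its word (or N-gram) embeddings; FastText then feeds that mean to a linear classifier. The sketch below illustrates that inference pipeline with randomly initialized weights; in practice the embedding table and classifier are trained jointly, and the vocabulary and dimensions here are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"graph": 0, "convolution": 1, "text": 2, "classification": 3}
dim, n_classes = 8, 2

E = rng.normal(size=(len(vocab), dim))   # word embedding table (trained in practice)
W = rng.normal(size=(dim, n_classes))    # linear classifier weights
b = np.zeros(n_classes)

def predict(tokens):
    """FastText-style prediction: average the token embeddings,
    then apply a linear classifier with softmax."""
    ids = [vocab[t] for t in tokens if t in vocab]
    doc = E[ids].mean(axis=0)            # document = mean of word embeddings
    logits = doc @ W + b
    p = np.exp(logits - logits.max())
    return p / p.sum()

print(predict(["text", "classification"]))
```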
LEAM is a label-embedding attention model proposed by Wang et al.30, which embeds words and labels in the same space for text classification.
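A simplified sketch of the LEAM idea follows: with words and labels embedded in one space, word-label cosine similarities yield attention weights over the words, and the attention-weighted average gives the document vector. The original model additionally smooths the word-label scores with a convolution over neighboring words, which this sketch omits.

```python
import numpy as np

def leam_document_embedding(V, C):
    """Simplified LEAM-style pooling.
    V: (n_words, d) word embeddings; C: (n_labels, d) label embeddings.
    Words most compatible with some label receive the highest attention."""
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    G = Vn @ Cn.T                                 # (n_words, n_labels) cosine compatibility
    scores = G.max(axis=1)                        # best label match per word
    beta = np.exp(scores) / np.exp(scores).sum()  # attention over words
    return beta @ V                               # attention-weighted document vector

rng = np.random.default_rng(1)
z = leam_document_embedding(rng.normal(size=(5, 8)), rng.normal(size=(2, 8)))
```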
SGC is a simplified graph convolutional network proposed by Wu et al.31; SSGC is a spectral graph convolution network proposed by Zhu et al.32, which uses a Markov diffusion kernel to derive the GCN, combining the advantages of spatial and spectral methods.
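Concretely, with the normalized adjacency matrix $S = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$, SGC collapses a $K$-layer GCN into a single linear map by removing the intermediate nonlinearities, and SSGC replaces the single $K$-th power with the Markov-diffusion average over all powers (the published SSGC additionally mixes in a residual term on the input features $X$, omitted here):

$$\hat{Y}_{\mathrm{SGC}} = \operatorname{softmax}\!\left(S^{K} X \Theta\right), \qquad \hat{Y}_{\mathrm{SSGC}} = \operatorname{softmax}\!\left(\frac{1}{K}\sum_{k=1}^{K} S^{k} X \Theta\right)$$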
Table 5. Comparison of classification accuracy (%) of different models.