Position encoding
Previous network models that used one-hot vectors as GCN inputs could not capture the relative positional information between words. In contrast, the PEGCN model uses the sum of Token Embedding and Position Embedding as its input word embeddings, drawing on the distributed representations of BERT [6]. The input representation of BERT is the sum of Token Embedding, Segment Embedding, and Position Embedding. Segment Embedding was introduced in BERT primarily for the next-sentence-prediction task. Because the classification tasks in this study all involve single sentences, the Segment Embedding used to distinguish between paired sentences is redundant here. Therefore, only the sum of Token Embedding and Position Embedding is used to represent the network input in this study. Specific details are illustrated in Figure 2. The Token Embedding layer converts each word into a fixed-size vector that carries the semantic meaning of the text. In this study, the sequence length and word-vector dimensionality follow the BERT paper: each word is converted into a 768-dimensional vector. Assuming a sentence length of 128, a sentence is represented as a (128, 768) matrix after the Token Embedding layer.
Figure 2. Input representation of PEGCN.
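As a concrete illustration of the Token Embedding step described above, the following is a minimal PyTorch sketch rather than the authors' implementation; the vocabulary size (30522, a standard BERT vocabulary) is an assumption, while the sequence length of 128 and the dimensionality of 768 come from the text.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 30522  # assumed BERT-style vocabulary size (not specified in the text)
SEQ_LEN = 128       # sentence length used in the text
EMB_DIM = 768       # word-vector dimensionality following the BERT paper

# Token Embedding layer: maps each token id to a 768-dimensional vector.
token_embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)

# One sentence padded/truncated to 128 tokens (random ids here, for illustration only).
token_ids = torch.randint(0, VOCAB_SIZE, (SEQ_LEN,))
token_vectors = token_embedding(token_ids)

print(token_vectors.shape)  # torch.Size([128, 768])
```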
The network learns a vector representation for each position in the Position Embedding; these vectors encode sequence-order information, and the network infers the relative positions of words in a sentence from the offsets between them. The Position Embedding layer is essentially a (128, 768) lookup table: the first row (viewed as a vector) represents the first position in the sequence, the second row the second position, and so on. Each row of this table is randomly initialized and updated as the network is trained. During training, the batch size batch_size is also taken into account, so the Token Embedding and Position Embedding are each represented as tensors of shape (batch_size, 128, 768). Adding the two element-wise yields the final input representation. The word vectors obtained in this way serve as the input representation of the PEGCN document nodes in this study. The node embedding X is represented as a matrix of dimensions (n_doc + n_word) × d, where n_doc is the number of document nodes, n_word is the number of word nodes, and d is the dimensionality of the node embedding.
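The full input construction described above can be sketched as follows; this is a minimal illustration, assuming a PyTorch nn.Embedding for both tables and an illustrative batch size of 32, in which a learnable (128, 768) position table is added element-wise to the token embeddings to give the (batch_size, 128, 768) input tensor.

```python
import torch
import torch.nn as nn

BATCH_SIZE = 32     # illustrative batch size (an assumption, not from the text)
SEQ_LEN = 128
EMB_DIM = 768
VOCAB_SIZE = 30522  # assumed BERT-style vocabulary size

token_embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)

# Position Embedding: a learnable (128, 768) table, randomly initialized
# and updated by backpropagation during training.
position_embedding = nn.Embedding(SEQ_LEN, EMB_DIM)

token_ids = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, SEQ_LEN))
position_ids = torch.arange(SEQ_LEN).unsqueeze(0).expand(BATCH_SIZE, SEQ_LEN)

tok = token_embedding(token_ids)        # (batch_size, 128, 768)
pos = position_embedding(position_ids)  # (batch_size, 128, 768)

# Element-wise sum gives the final input representation for the document nodes.
input_repr = tok + pos                  # (batch_size, 128, 768)
print(input_repr.shape)                 # torch.Size([32, 128, 768])
```

The sketch only demonstrates the tensor shapes and the element-wise sum; in a BERT-based setup the token embeddings would be taken from the pretrained model rather than randomly initialized.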