Graph Neural Networks
Graph neural networks (GNNs) are representative methods in graph representation learning. Labeled data is needed to guarantee model performance, yet real-world systems usually contain large amounts of unlabeled data, which limits performance. An intuitive remedy is to design pre-training strategies for GNNs that learn transferable knowledge from the general structural properties of graphs. However, most existing pre-training strategies are designed for homogeneous graphs, in which all nodes and edges belong to a single type, whereas real systems are usually heterogeneous: multiple types of nodes are connected by different types of edges, carrying rich semantic information. Existing models struggle to capture this semantic information effectively.
In this article, we propose CPT-HG, a Contrastive Pre-Training strategy for graph neural networks on Heterogeneous Graphs, which captures semantic and structural properties in a self-supervised manner. Specifically, we design pre-training tasks at the relation level and the subgraph level, and further enhance the learned representations through contrastive learning. At the relation level, semantic information is captured by discriminating relations, the simplest heterogeneous structures; at the subgraph level, meta-graph instances are constructed to capture higher-order semantic information.
Contrastive Pre-Training of GNNs on Heterogeneous Graphs
In recent years, graphs have become a common abstraction for representing a wide variety of real-world data. As an emerging tool for machine learning on graph-structured data, graph neural networks (GNNs) learn powerful representations by recursively aggregating the content (i.e., features or embeddings) of neighboring nodes, thereby preserving both content and structural information. They have been shown to improve performance on various graph applications, such as node classification, graph classification, and recommendation. In general, GNN models are trained with (semi-)supervised information, and different downstream tasks require large amounts of labeled data. However, in most realistic scenarios, obtaining large amounts of labeled data is usually costly. To make full use of unlabeled graph-structured data, recent work, inspired by pre-training in natural language processing and computer vision, proposes pre-training GNN models on graphs. Although these GNN pre-training methods achieve good performance, they are designed for homogeneous graphs, in which every node and edge belongs to a single type. In contrast, existing strategies ignore heterogeneous graphs, where multiple types of nodes interact through different types of edges.
Real-world networks naturally form heterogeneous graphs, which are composed of multiple types of nodes and the distinctive structures they generate, and thus carry rich semantics. As shown in Fig. 1(a), a simple heterogeneous graph can be constructed from bibliographic data: it consists of author, paper, conference, and term nodes, connected by author–paper, paper–conference, and paper–term edges. Different types of nodes or edges usually exhibit different network properties, such as degree and clustering coefficient. For example, the degree of a conference node is usually higher than that of an author node. Moreover, such heterogeneity produces more complex semantic contexts involving multiple relations among multiple nodes, for example the semantic context of "two authors working on similar topics". Beyond this simple example, heterogeneous graphs are common in many domains, such as e-commerce, where users interact with items in various ways, and biology, where diseases, proteins, and drugs are associated. Given their ubiquity, it is important to design GNN pre-training strategies for heterogeneous graphs.
In this article, we propose a contrastive pre-training scheme that not only considers the differences between individual nodes but also preserves the higher-order semantics among multiple nodes. More specifically, a pre-training task is designed to distinguish different types of relations between two nodes (for example, author–paper and paper–conference relations), laying the foundation for downstream task encoders. Inspired by contrastive learning, to enhance the learned representations, this paper constructs negative relation-level samples from two aspects:
1. Negative samples from inconsistent relations, in which the two nodes are connected by a relation different from that of the positive sample;
2. Negative samples from disconnected nodes, in which the two nodes have no link at all in the graph.
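The two kinds of relation-level negatives can be sketched on a toy heterogeneous graph as follows. This is not the authors' code; the edge list, relation names, and the use of *direct* links for the "disconnected" case are illustrative assumptions.

```python
# Minimal sketch of relation-level negative-sample construction on a
# toy heterogeneous graph. Each edge is a triple (head, relation, tail).
edges = [
    ("a1", "author-paper", "p1"),
    ("a2", "author-paper", "p1"),
    ("p1", "paper-conf", "c1"),
    ("p2", "paper-conf", "c1"),
    ("p1", "paper-term", "t1"),
]

def inconsistent_relation_negatives(pos, edges):
    """Negatives connected by a relation *different* from the positive's."""
    _, r, _ = pos
    return [(h, rel, t) for (h, rel, t) in edges if rel != r]

def disconnected_negatives(pos, edges, nodes):
    """Negatives pairing the head with nodes it has no edge to at all."""
    u, r, _ = pos
    linked = ({t for (h, _, t) in edges if h == u}
              | {h for (h, _, t) in edges if t == u})
    return [(u, r, w) for w in nodes if w != u and w not in linked]

nodes = {n for (h, _, t) in edges for n in (h, t)}
pos = ("a1", "author-paper", "p1")
neg_rel = inconsistent_relation_negatives(pos, edges)
neg_dis = disconnected_negatives(pos, edges, nodes)
```

In practice both negative sets are sampled rather than enumerated, since the full sets grow with the graph.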
In addition, this paper proposes a subgraph-level pre-training task on heterogeneous graphs, in which meta-graph instances are generated as positive samples for contrast, so that higher-order semantic information relevant to different downstream tasks can be encoded.
In this section, the pre-training model of this article is introduced through its two pre-training tasks: the relation level and the subgraph level.
2.1 Relation-level pre-training task
For a given positive example, two nodes on the heterogeneous graph together with the relation connecting them constitute a positive instance, represented as a triple for pre-training. Negative samples are constructed in two ways: from inconsistent relations and from disconnected nodes.
Inconsistent relations. For a given positive triple, a negative sample connects the source node to another node through a relation inconsistent with (i.e., different from) that of the positive triple. The corresponding negative sample set therefore consists of triples whose relation differs from the positive one.
Since this candidate set is relatively large, the method randomly samples from it to construct negatives for contrastive pre-training of the graph neural network model. A contrastive loss is then defined over the positive triple and the sampled negatives, in which a learnable, relation-specific weight matrix scores each triple.
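The exact loss is omitted from this excerpt; a common InfoNCE-style form, with a bilinear score using a relation-specific weight matrix, can be sketched as follows. The score function, embedding dimension, and number of negatives are all assumptions, not the paper's exact equation.

```python
# InfoNCE-style relation-level loss sketch (assumed form):
# score(u, r, v) = u^T W_r v, with W_r a learnable matrix per relation.
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_r = rng.normal(size=(d, d))      # relation-specific weight matrix
u = rng.normal(size=d)             # source-node embedding
v_pos = rng.normal(size=d)         # positive target embedding
v_negs = rng.normal(size=(5, d))   # sampled negative embeddings

def score(u, W, v):
    return u @ W @ v

def relation_level_loss(u, W, v_pos, v_negs):
    s_pos = score(u, W, v_pos)
    s_negs = np.array([score(u, W, v) for v in v_negs])
    logits = np.concatenate([[s_pos], s_negs])
    logits -= logits.max()         # numerical stability
    # negative log softmax probability of the positive triple
    return -(logits[0] - np.log(np.exp(logits).sum()))

loss = relation_level_loss(u, W_r, v_pos, v_negs)
```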
Disconnected nodes. Following previous work, this paper adopts a simple negative sampling scheme: nodes that are not within K hops of the source node are sampled directly as negative samples. Restricting candidates to nodes outside the K-hop neighborhood ensures the quality of the negatives. A corresponding contrastive loss is defined over these negatives as well.
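The K-hop exclusion can be sketched with a breadth-first search; the toy adjacency list and the choice of K are assumptions for illustration.

```python
# Sketch of sampling "disconnected" negatives: candidates are nodes
# more than K hops away from the source node (found via BFS).
from collections import deque
import random

adj = {  # toy undirected adjacency list
    "a1": ["p1"], "p1": ["a1", "c1", "t1"],
    "c1": ["p1", "p2"], "p2": ["c1"], "t1": ["p1"], "a9": [],
}

def within_k_hops(adj, source, k):
    """Nodes reachable from `source` in at most k hops."""
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue
        for nb in adj.get(node, []):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen

def sample_khop_negatives(adj, source, k, num):
    candidates = sorted(set(adj) - within_k_hops(adj, source, k))
    return random.sample(candidates, min(num, len(candidates)))

negs = sample_khop_negatives(adj, "a1", k=2, num=2)
```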
Therefore, for the relation-level pre-training task, the overall loss function combines the two losses above.
2.2 Subgraph-level pre-training task
To enable the model to capture higher-order information, a natural idea is to use meta-paths to explore higher-order relations. However, pre-training GNNs on heterogeneous graphs with meta-paths has two weaknesses:
1. Compared with meta-graphs, meta-paths are weaker at characterizing rich semantics and extracting higher-order structures;
2. Starting from a source node, the number of nodes reachable along a meta-path can be very large, whereas a meta-graph, whose structure is more complex and more restrictive, covers far fewer nodes from the same source node, which makes meta-graph-based sampling more efficient.
Therefore, this article adopts meta-graphs to capture higher-order information.
Positive samples. For a given meta-graph and source node, a positive instance is constructed as a subgraph that matches the meta-graph and contains the source node; the positive sample set is the collection of all such instances of the meta-graph. Positive samples are then drawn from this set.
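Enumerating meta-graph instances can be sketched for one simple pattern. The meta-graph chosen here, the co-authorship pattern A–P–A (two authors sharing a paper), and the toy data are illustrative assumptions, not the meta-graphs used in the paper.

```python
# Sketch: enumerate instances of an assumed A-P-A meta-graph
# (two authors co-writing the same paper) containing a source author.
paper_authors = {
    "p1": {"a1", "a2"},
    "p2": {"a1", "a3"},
    "p3": {"a2"},
}

def metagraph_instances(paper_authors, source_author):
    """All (author, paper, author) instances containing the source author."""
    out = []
    for p, authors in sorted(paper_authors.items()):
        if source_author in authors:
            for other in sorted(authors - {source_author}):
                out.append((source_author, p, other))
    return out

instances = metagraph_instances(paper_authors, "a1")
```

Richer meta-graphs (e.g., with conference or term nodes) would be matched analogously, with one loop per edge of the pattern.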
Negative sample queue. To construct negative samples for meta-graph instances, a dynamic queue is used to maintain the negative sample set, since sampling negatives in real time is costly. Specifically, the queue is built from earlier positive samples during training: the most recent positive samples are enqueued and the oldest ones at the end of the queue are removed, and the current queue contents serve as negatives.
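The queue mechanics can be sketched with a bounded deque (a MoCo-style design, which is an assumed analogy; capacity and the string "embeddings" are placeholders):

```python
# Sketch of the dynamic negative-sample queue: newest positives are
# enqueued, the oldest are evicted, and the current queue contents
# serve as the negative set for the next training step.
from collections import deque

class NegativeQueue:
    def __init__(self, capacity):
        self.queue = deque(maxlen=capacity)  # maxlen evicts the oldest

    def negatives(self):
        """Snapshot of the current negative set."""
        return list(self.queue)

    def enqueue(self, positive_embedding):
        self.queue.append(positive_embedding)

q = NegativeQueue(capacity=3)
for emb in ["e0", "e1", "e2", "e3"]:
    negs = q.negatives()   # negatives come from *previous* positives
    q.enqueue(emb)
```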
Therefore, to capture higher-order semantic information, this part of the model contrasts the source node against the corresponding positive and negative samples, with a corresponding contrastive loss defined over them.
To account for both pre-training tasks jointly, this paper combines the two losses above into an overall objective.
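Such a combination is typically a (possibly weighted) sum; a minimal sketch, where the weighting coefficient `lam` is an assumption not specified in this excerpt:

```python
# Joint pre-training objective sketch: combine the relation-level and
# subgraph-level losses. `lam` is an assumed trade-off coefficient.
def total_loss(loss_relation, loss_subgraph, lam=1.0):
    return loss_relation + lam * loss_subgraph

t = total_loss(0.8, 0.5, lam=0.5)
```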
3.1 Link Prediction
The table below shows the performance of all methods on the link prediction task; the last row gives the improvement of the proposed method relative to the best existing method. The model achieves a relative improvement of about 2% over the strongest baseline on all datasets, which verifies the effectiveness of the proposed model.
3.2 Node Classification
The table below shows the performance of all methods on the node classification task; the last row gives the improvement of the proposed method relative to the best existing method. The model achieves a relative improvement of about 1% over the strongest baseline on the DBLP and AMiner datasets, which verifies the effectiveness of the proposed model.
3.3 Ablation Study
By replacing different base GNN encoders, it can be seen that the pre-trained models achieve good results relative to their non-pre-trained counterparts. The pre-trained GAT, however, does not perform satisfactorily: it is difficult to learn consistent attention weights between the pre-training and fine-tuning graphs, so its performance is worse than that of the model without pre-training.
Author: Jiang Xiangqiang
Illustration by Dmitry Nikunikov from Icons8