Lecture 11: Graph Neural Networks

Natural Language Processing
- Question Answering
- Information Extraction
- Machine Translation
- ...

Graphs are everywhere in NLP.

Deep Learning in NLP

Overview
Data Domain
- Images, volumes, and videos lie on 2D, 3D, and 2D+1D Euclidean domains; sentences, words, and sounds lie on a 1D Euclidean domain.
- These domains have strong, regular spatial structure, so all ConvNet operations (convolution, pooling) are mathematically well defined and fast.

Graph Structured Data

How can we apply CNNs to graphs? The operations CNNs rely on, translation and downsampling (pooling), are not directly defined on graphs.

Motivating Example: Co-authorship Network
- Nodes: authors; Edges: co-authorship
Three tasks on this network:
- Node Classification (semi-supervised): predict the research area of unlabeled authors.
- Identifying Communities (unsupervised): group authors with similar research interests.
- Graph Classification (supervised): identify the class of each community.

Overview

Embedding Nodes
The goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.
Two Key Components
- Encoder: maps each node to a low-dimensional vector.
- Similarity function: specifies how relationships in vector space map to relationships in the original network.

Shallow encoders: the simplest encoder is just an embedding lookup; see the sketch after this list.

Limitations of shallow encoding:
- O(|V|) parameters are needed: there is no parameter sharing, and every node has its own unique embedding vector.
- Inherently "transductive": it is impossible to generate embeddings for nodes that were not seen during training.
- Node features are not incorporated: many graphs have features that we can and should leverage.
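As a concrete picture of a shallow encoder, a minimal numpy sketch (the node count, dimension, and random initialization are illustrative, not from the slides): encoding is an embedding-matrix lookup, and similarity is a dot product.

```python
import numpy as np

num_nodes, dim = 100, 16          # illustrative sizes

# Shallow encoder: one free embedding vector per node.  This is exactly
# O(|V|) parameters, with no sharing and no use of node features.
Z = np.random.default_rng(0).normal(size=(num_nodes, dim))

def encode(v):
    """ENC(v) is just a lookup into the embedding matrix."""
    return Z[v]

def similarity(u, v):
    """Similarity in embedding space: the dot product z_u . z_v."""
    return encode(u) @ encode(v)

print(similarity(0, 1))
```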
Graph Neural Networks
Graph neural networks enable deeper encoding!

Neighborhood Aggregation
- Key idea: generate node embeddings based on local neighborhoods.
- Intuition: nodes aggregate information from their neighbors using neural networks; the network neighborhood defines a computation graph.
- Nodes have embeddings at each layer, and the model can be of arbitrary depth. The "layer-0" embedding of node u is its input feature, i.e. x_u.
- Neighborhood aggregation can be viewed as a center-surround filter; it is mathematically related to spectral graph convolutions (Bronstein et al., 2017).
- Key distinctions between approaches lie in how they aggregate information across the layers.
- Basic approach: average neighbor information and apply a neural network, as in the sketch below.
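A common concrete form of this basic rule is h_v^(k) = σ(W_k · mean_{u∈N(v)} h_u^(k−1) + B_k · h_v^(k−1)). A minimal numpy sketch, where the ReLU nonlinearity, matrix names W and B, and the toy graph are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mean_aggregate_layer(A, H, W, B):
    """One layer of basic neighborhood aggregation: average the neighbors'
    embeddings, then apply a neural network.

    A: (n, n) adjacency matrix; H: (n, d_in) previous-layer embeddings;
    W, B: (d_in, d_out) weights for the neighbor and self terms.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1)   # avoid divide-by-zero
    neigh_mean = (A @ H) / deg                        # mean over neighbors
    return relu(neigh_mean @ W + H @ B)               # neighbors + self

# Toy 4-node path graph; the layer-0 embeddings are the input features x_u.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H0 = np.eye(4)                                        # one-hot features
rng = np.random.default_rng(0)
W, B = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(mean_aggregate_layer(A, H0, W, B).shape)        # (4, 8)
```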
Training the Model
- How do we train the model to generate "high-quality" embeddings?
- Option 1: train in an unsupervised way, using only the graph structure as the training signal.
- Alternative: directly train the model for a supervised task (e.g., node classification), as sketched below.
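For the supervised alternative, a standard choice is to feed the final-layer embeddings into a softmax classifier and minimize cross-entropy on the labeled nodes only; a sketch in which the classifier form, toy labels, and mask are illustrative assumptions:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def node_classification_loss(Z, theta, labels, labeled_mask):
    """Cross-entropy over the labeled nodes only (semi-supervised setting).

    Z: (n, d) final-layer node embeddings; theta: (d, c) classifier
    weights; labels: (n,) integer classes; labeled_mask: (n,) bool.
    """
    probs = softmax(Z @ theta)
    log_lik = np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return -log_lik[labeled_mask].mean()

# Toy usage: 4 nodes, 8-dim embeddings, 3 classes, node 2 unlabeled.
rng = np.random.default_rng(1)
Z = rng.normal(size=(4, 8))
theta = rng.normal(size=(8, 3))
labels = np.array([0, 0, 2, 1])
mask = np.array([True, True, False, True])
print(node_classification_loss(Z, theta, labels, mask))
```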
Overview of Model Design

Inductive Capability
- The same aggregation parameters are shared for all nodes.
- The number of model parameters is sublinear in |V|, and we can generalize to unseen nodes!

Graph Convolutional Networks (GCN)
A concrete propagation rule is sketched below.
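As a reference point, the widely used GCN layer of Kipf & Welling (ICLR 17) replaces the plain neighbor mean with a symmetrically normalized adjacency that includes self-loops: H^(k+1) = σ(D̃^(−1/2) Ã D̃^(−1/2) H^(k) W^(k)), with Ã = A + I. A minimal numpy sketch:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)     # normalize, propagate, transform

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.eye(4)
W = np.random.default_rng(2).normal(size=(4, 8))
print(gcn_layer(A, H, W).shape)                # (4, 8)
```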
Graph Convolutional Networks (GCN) (… et al., EMNLP 17)

Message Passing Neural Networks (Gilmer et al., ICML 17)
MPNNs generalize the models above: each round computes messages along edges with a message function and updates node states with an update function; see the sketch below.
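A generic sketch of one message-passing round in this framework; the particular message function (the neighbor state weighted by the edge entry) and update function (a one-layer tanh network) are illustrative choices, not the paper's instantiations:

```python
import numpy as np

def mpnn_step(A, H, W):
    """One message-passing round.

    Message phase:  m_v = sum over u in N(v) of M(h_u, h_v, e_uv)
    Update phase:   h_v' = U(h_v, m_v)
    Here M passes the neighbor state weighted by the edge entry, and
    U is a one-layer tanh network over [h_v ; m_v].
    """
    M = A @ H                                        # summed messages
    return np.tanh(np.concatenate([H, M], axis=1) @ W)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)               # toy graph
H = np.eye(3)                                        # initial node states
W = np.random.default_rng(3).normal(size=(6, 4))     # update weights
print(mpnn_step(A, H, W).shape)                      # (3, 4)
```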
Hypergraph Convolutional Network (Yadati et al., NeurIPS 19)

Example: GNNs for Semantic Role Labeling

Overview
Neighborhood Aggregations in GCNs
- Standard GCN neighborhood aggregation imposes no restriction on the influence neighborhood.
- Methods that refine it:
  - Graph Attention Networks (GAT)
  - Confidence-based GCN (ConfGCN)

Graph Attention Networks (Velickovic et al., ICLR 18)
GAT learns attention weights over each node's neighbors instead of averaging them uniformly, as in the sketch below.
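A single-head sketch of the GAT mechanism: attention logits e_vu = LeakyReLU(a^T [W h_v ; W h_u]) are computed per edge, normalized with a softmax over each node's neighbors, and used to weight the aggregation (multi-head attention and the paper's hyperparameters are omitted):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(A, H, W, a):
    """Single-head graph attention layer.

    A: (n, n) adjacency with self-loops; H: (n, d_in) features;
    W: (d_in, d_out) shared linear map; a: (2 * d_out,) attention vector.
    """
    Hp = H @ W
    d = Hp.shape[1]
    # e[v, u] = LeakyReLU(a^T [Hp_v ; Hp_u]), computed for all pairs at once
    logits = leaky_relu((Hp @ a[:d])[:, None] + (Hp @ a[d:])[None, :])
    logits = np.where(A > 0, logits, -np.inf)   # attend only to neighbors
    alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return np.maximum(0.0, alpha @ Hp)          # attention-weighted aggregation

A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)           # adjacency with self-loops
H = np.eye(3)
rng = np.random.default_rng(4)
W, a = rng.normal(size=(3, 4)), rng.normal(size=8)
print(gat_layer(A, H, W, a).shape)               # (3, 4)
```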
Overview

Motivating Example
- Identifying Communities (unsupervised): group authors with similar research interests.

Unsupervised Representation Learning
- Labeled data is expensive.
- Unsupervised learning allows us to discover interesting structure from large-scale graphs.
- Methods:
  - GraphSAGE
  - Graph Auto-Encoder (GAE)
  - Deep Graph Infomax (DGI)
GraphSAGE (Hamilton et al., NeurIPS 17)
GraphSAGE samples a fixed-size set of neighbors and aggregates their features, which makes it inductive and scalable to large graphs; see the sketch below.
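A sketch of GraphSAGE's mean aggregator: sample up to a fixed number of neighbors, average their embeddings, concatenate with the node's own embedding, transform, and L2-normalize (the sample size, ReLU, and normalization constant here are illustrative assumptions):

```python
import numpy as np

def graphsage_layer(neighbors, H, W, num_samples=5, rng=None):
    """Mean-aggregator GraphSAGE layer.

    For each node v: sample up to num_samples neighbors, average their
    embeddings, concatenate with h_v, transform, and L2-normalize.
    neighbors: dict mapping node id -> list of neighbor ids.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    out = []
    for v in range(len(H)):
        nbrs = neighbors[v]
        sample = rng.choice(nbrs, size=min(num_samples, len(nbrs)),
                            replace=False)
        agg = H[sample].mean(axis=0)                 # mean aggregator
        h = np.maximum(0.0, np.concatenate([H[v], agg]) @ W)
        out.append(h / (np.linalg.norm(h) + 1e-12))  # normalize
    return np.stack(out)

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # toy path graph
H = np.eye(4)
W = np.random.default_rng(5).normal(size=(8, 4))
print(graphsage_layer(neighbors, H, W).shape)        # (4, 4)
```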
Gated Graph Neural Networks (Li et al., ICLR 16)
- GCNs and GraphSAGE are generally only 2-3 layers deep. But what if we want to go deeper?
- How can we build models with many layers of neighborhood aggregation?
- Challenges: overfitting from too many parameters, and vanishing/exploding gradients during backpropagation.
- Idea: use techniques from modern recurrent neural networks!
  - Idea 1: parameter sharing across layers.
  - Idea 2: recurrent state update.
- Intuition: neighborhood aggregation with an RNN state update, as in the sketch after this list.
- Can handle models with 20 layers. Most real-world networks have small diameters (e.g., less than 7), so this allows complex information about the global graph structure to be propagated to all nodes.
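Putting Ideas 1 and 2 together: every propagation step aggregates neighbor states into a message and feeds it to a GRU whose hidden state is the node embedding, with the same parameters reused at every step. A compact numpy sketch with a hand-written GRU cell (the shapes and the message transform W_msg are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru(h, m, Wz, Wr, Wh):
    """GRU cell: new node states from old states h and messages m."""
    x = np.concatenate([m, h], axis=1)
    z = sigmoid(x @ Wz)                              # update gate
    r = sigmoid(x @ Wr)                              # reset gate
    h_tilde = np.tanh(np.concatenate([m, r * h], axis=1) @ Wh)
    return (1 - z) * h + z * h_tilde

def ggnn(A, H, W_msg, Wz, Wr, Wh, steps=20):
    """Gated GNN: the SAME parameters are reused at every propagation
    step, so running 20 steps adds no new parameters (Idea 1), and the
    GRU performs the recurrent state update (Idea 2)."""
    for _ in range(steps):
        M = A @ H @ W_msg                            # neighbor messages
        H = gru(H, M, Wz, Wr, Wh)
    return H

n, d = 4, 8
rng = np.random.default_rng(6)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(n, d))
W_msg = rng.normal(size=(d, d))
Wz, Wr, Wh = (rng.normal(size=(2 * d, d)) for _ in range(3))
print(ggnn(A, H, W_msg, Wz, Wr, Wh).shape)           # (4, 8)
```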
Applications
- Zero-shot Learning: a GCN predicts the visual classifier for unseen classes.
- Visual Question Answering: GCN for F-VQA (Narasimhan et al., NeurIPS 18).

Summary
- Graphs are everywhere, and GNNs are an effective tool for exploiting graph structure in end-to-end learning.
- GNNs are versatile and can be applied across:
  - Learning settings: e.g., semi-supervised
  - Graph granularity: node level, link, subgraph, whole graph
  - Graph types: undirected, directed, multi-relational
- GNNs have achieved considerable success on several tasks.
- Many more possibilities ahead!