Neural networks on graphs
Michal Punčochář (michal.puncochar@gmail.com)
CNS course @ UniPi, 2018
Outline
1. Motivation
2. Dynamical system approach
3. Early models
• NN4G
• Learnable molecular fingerprints
4. Analogy with CNN on images and sequences
5. Spectral convolution approach
6. Unified view
Motivation
• Molecules – many small graphs
• Classification, regression
• Biological networks: protein-protein interactions, gene regulatory, gene
co-expression, metabolic, signaling, ...
• Citation networks, social networks, ...
• We can generate graphs from non-graph data
• $k$-nearest neighbors, thresholded Gaussian similarity $e^{-\lVert x_1 - x_2 \rVert^2 / \sigma} \geq \delta$, supervised, ...
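As an illustration of the thresholded-similarity construction, here is a minimal NumPy sketch (the function name, `sigma`, and `delta` are illustrative choices, not from any particular library):

```python
import numpy as np

def similarity_graph(X, sigma=1.0, delta=0.5):
    """Connect points i, j whose Gaussian similarity
    exp(-||x_i - x_j||^2 / sigma) reaches the threshold delta."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    A = (np.exp(-d2 / sigma) >= delta).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

# points 0 and 1 are close, point 2 is far from both
A = similarity_graph(np.array([[0.0], [0.1], [5.0]]))
```

A $k$-nearest-neighbor construction would instead keep, for each row, the $k$ largest similarities.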
Node embeddings, semi-supervised node labeling,
inductive learning
GNN[1,2]
[1] Gori, M.; Monfardini, G.; Scarselli, F. In A new model for learning in graph domains, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005; pp 729-734 vol. 2.
[2] Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; Monfardini, G., The Graph Neural Network Model. IEEE Transactions on Neural Networks 2009, 20 (1), 61-80.
[3] Gallicchio, C.; Micheli, A. In Graph Echo State Networks, The 2010 International Joint Conference on Neural Networks (IJCNN), 18-23 July 2010; pp 1-8.
[4] Li, Y.; Tarlow, D.; Brockschmidt, M.; Zemel, R. Gated Graph Sequence Neural Networks ArXiv e-prints, 2015.
NN4G[5]
[5] Micheli, A., Neural Network for Graphs: A Contextual Constructive Approach. IEEE Transactions on Neural Networks 2009, 20 (3), 498-511.
Learnable molecular fingerprints[6]
$\boldsymbol{x}_l(v) = \sigma\!\left( W_{l,\,|\mathcal{N}(v)|} \cdot \sum_{w \in \mathcal{N}_0(v)} \boldsymbol{x}_{l-1}(w) \right)$
• 𝒙0 = 𝒖 ... atom features
• Weight matrix for each layer and degree (# of bonds ≤ 5)
• Connections only to the previous layer – fewer parameters
• $\sigma$ = ReLU, better results than tanh (deep architecture)
• Fingerprint $\boldsymbol{f} = \sum_l \mathrm{softmax}(W_o\, \boldsymbol{x}_l)$
• Differentiable version of fixed ECFP → Task specific fingerprint
• Similar concept to Neural Turing Machine
• Trained by backprop gradient descent
[6] Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional Networks on Graphs for Learning
Molecular Fingerprints ArXiv e-prints, 2015.
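The layer update and fingerprint pooling above can be sketched as follows. This is an assumption-laden sketch, not the reference implementation: the shapes, the shared output matrix `W_o`, and the use of the closed neighborhood $\{v\} \cup \mathcal{N}(v)$ in the sum are my choices.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def neural_fingerprint(A, X, W, W_o, n_layers):
    """Sketch of the degree-indexed layer update and softmax pooling.
    A: n x n adjacency, X: n x d atom features,
    W[l][deg]: d x d weight matrix per layer l and vertex degree deg,
    W_o: fp_len x d output matrix (assumed shared across layers)."""
    deg = A.sum(axis=1).astype(int)
    x = X
    f = np.zeros(W_o.shape[0])
    for l in range(n_layers):
        agg = x + A @ x  # sum features over the closed neighborhood of each vertex
        x = np.maximum(0.0, np.array([W[l][deg[v]] @ agg[v] for v in range(len(x))]))
        f += sum(softmax(W_o @ x[v]) for v in range(len(x)))  # soft "bit" counts
    return f
```

Because every softmax term sums to 1, the pooled fingerprint behaves like a differentiable histogram of substructure activations, one contribution per atom per layer.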
Convolution on graphs?
• Sequences, images → 𝑑-dimensional grids in Euclidean space
• Regularity, notion of direction, distance, translation ⇒ convolution
[7] Shuman, D.; Narang, S.; Frossard, P.; Ortega, A.; Vandergheynst, P., The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to
Networks and Other Irregular Domains. IEEE Signal Processing Magazine 2013, 30, 83-98.
Graph spectral convolution
• Classical convolution: $(f \ast g)(t) = \int_{\mathbb{R}} f(\tau)\, g(t - \tau)\, d\tau$
• Works on grids, but no translation in graphs
• Convolution theorem: $\mathcal{F}(f \ast g) = \mathcal{F}(f) \circ \mathcal{F}(g)$
⇒ $f \ast g = \mathcal{F}^{-1}\!\left( \mathcal{F}(f) \circ \mathcal{F}(g) \right)$
• Graph convolution: $\boldsymbol{x} \ast \boldsymbol{g} = U^T\!\left( (U\boldsymbol{x}) \circ (U\boldsymbol{g}) \right) = U^T \operatorname{diag}(U\boldsymbol{g})\, U\boldsymbol{x}$, where the rows of $U$ are the eigenvectors of the graph Laplacian, so $U\boldsymbol{x}$ is the graph Fourier transform
• First idea[8,9]: learn the vector $\boldsymbol{w} = U\boldsymbol{g}$
• Convolutional layer: $\operatorname{conv}(\boldsymbol{x}) = \sigma\!\left( U^T \operatorname{diag}(\boldsymbol{w})\, U\boldsymbol{x} \right)$
• Issues:
• Filters are not localized
• Number of parameters depends on input size: $\boldsymbol{w} \in \mathbb{R}^n$
• Multiplications with $U$ and $U^T$ are costly: $\mathcal{O}(n^2)$
[8] Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral Networks and Locally Connected Networks on Graphs ArXiv e-prints, 2013.
[9] Henaff, M.; Bruna, J.; LeCun, Y. Deep Convolutional Networks on Graph-Structured Data ArXiv e-prints, 2015.
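A minimal sketch of this first spectral parametrization, following the slide's convention that the rows of $U$ are the Laplacian eigenvectors (function name is mine):

```python
import numpy as np

def spectral_conv(L, x, w):
    """x * g = U^T diag(w) U x, with w = Ug learned directly in the
    spectral domain."""
    lam, V = np.linalg.eigh(L)  # L = V diag(lam) V^T, columns = eigenvectors
    U = V.T                     # graph Fourier transform: x_hat = U x
    return U.T @ (w * (U @ x))  # filter pointwise, transform back: O(n^2)
```

Sanity check of the convention: with the all-ones filter, $U^T \operatorname{diag}(\boldsymbol{1})\, U = U^T U = I$, so the signal comes back unchanged.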
Fast localized spectral filters[10]
• $\boldsymbol{x} \ast \boldsymbol{g} = U^T \operatorname{diag}(\boldsymbol{w})\, U\boldsymbol{x}$
• $U^T \operatorname{diag}(\boldsymbol{w})\, U$ is interpreted as a Laplacian with modified frequencies
• $\boldsymbol{x} \ast \boldsymbol{g}_\theta = g_\theta(L)\, \boldsymbol{x} = U^T g_\theta(\Lambda)\, U\boldsymbol{x}$
• $g_\theta(\Lambda) = \operatorname{diag}\!\left( g_\theta(\lambda_0), \ldots, g_\theta(\lambda_{n-1}) \right)$
• Theorem: when $g_\theta(\lambda)$ is a $K$-degree polynomial, the convolution at a vertex is localized to its $K$-neighborhood
• Naive parametrization: $g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k$ ... $\mathcal{O}(n^2)$
[10] Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering ArXiv e-prints, 2016.
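The polynomial filter can be evaluated with repeated matrix-vector products instead of an eigendecomposition, since $g_\theta(L)\,\boldsymbol{x} = \sum_k \theta_k L^k \boldsymbol{x}$. A sketch (plain monomial basis for clarity; the paper itself uses Chebyshev polynomials for numerical stability):

```python
import numpy as np

def poly_filter(L, x, theta):
    """g_theta(L) x = sum_{k=0}^{K-1} theta_k L^k x, built from K-1
    matrix-vector products; with a sparse L each product is cheap."""
    out = np.zeros_like(x)
    Lk_x = x.copy()          # L^0 x
    for t in theta:
        out += t * Lk_x
        Lk_x = L @ Lk_x      # next power of L applied to x
    return out
```

The locality theorem is easy to see here: a degree-$(K-1)$ polynomial applies $L$ at most $K-1$ times, and each application only mixes a vertex with its direct neighbors.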
The circle closes
• Setting $K = 1$ [11]: $g_\theta(L) = \theta_0 I + \theta_1 L = \theta_0 I + \theta_1 D - \theta_1 A$
• $(\boldsymbol{x} \ast \boldsymbol{g}_\theta)(v) = \left( (\theta_0 I + \theta_1 D - \theta_1 A)\, \boldsymbol{x} \right)(v)$
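At $K = 1$ the spectral machinery disappears entirely: each vertex just combines its own value with a weighted sum over its neighbors, which is exactly the neighborhood aggregation of the early constructive models. A sketch (function name is mine):

```python
import numpy as np

def k1_conv(A, x, theta0, theta1):
    """K = 1: g_theta(L) x = theta0*x + theta1*(D - A) x -- purely local,
    no eigendecomposition needed."""
    d = A.sum(axis=1)  # vertex degrees, i.e. the diagonal of D
    return theta0 * x + theta1 * (d * x) - theta1 * (A @ x)
```

This agrees term by term with the explicit matrix form $(\theta_0 I + \theta_1 D - \theta_1 A)\,\boldsymbol{x}$.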