Neural networks on graphs
Michal Punčochář (michal.puncochar@gmail.com)
CNS course @ UniPi, 2018
Outline
1. Motivation
2. Dynamical system approach
3. Early models
• NN4G
• Learnable molecular fingerprints
4. Analogy with CNN on images and sequences
5. Spectral convolution approach
6. Unified view
Motivation
• Molecules – many small graphs
• Classification, regression
• Biological networks: protein-protein interactions, gene regulatory, gene
co-expression, metabolic, signaling, ...
• Citation networks, social networks, ...
• We can generate graphs from non-graph data
• $k$-nearest neighbors, thresholded Gaussian similarity $e^{-\lVert x_1 - x_2 \rVert^2 / \sigma} \geq \delta$, supervised, ...
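As an illustration of the thresholded-similarity construction, here is a minimal NumPy sketch (the function name, `sigma`, and `delta` are illustrative choices, not from any particular library):

```python
import numpy as np

def similarity_graph(X, sigma=1.0, delta=0.5):
    """Connect points i, j whose Gaussian similarity
    exp(-||x_i - x_j||^2 / sigma) reaches the threshold delta."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    A = (np.exp(-d2 / sigma) >= delta).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

# points 0 and 1 are close, point 2 is far from both
A = similarity_graph(np.array([[0.0], [0.1], [5.0]]))
```

A $k$-nearest-neighbor construction would instead keep, for each row, the $k$ largest similarities.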
Node embeddings, semi-supervised node labeling,
inductive learning
GNN[1,2]
[1] Gori, M.; Monfardini, G.; Scarselli, F. In A new model for learning in graph domains, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005; pp 729-734 vol. 2.
[2] Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; Monfardini, G., The Graph Neural Network Model. IEEE Transactions on Neural Networks 2009, 20 (1), 61-80.
[3] Gallicchio, C.; Micheli, A. In Graph Echo State Networks, The 2010 International Joint Conference on Neural Networks (IJCNN), 18-23 July 2010; pp 1-8.
[4] Li, Y.; Tarlow, D.; Brockschmidt, M.; Zemel, R. Gated Graph Sequence Neural Networks ArXiv e-prints, 2015.
NN4G[5]
[5] Micheli, A., Neural Network for Graphs: A Contextual Constructive Approach. IEEE Transactions on Neural Networks 2009, 20 (3), 498-511.
Learnable molecular fingerprints[6]
$\boldsymbol{x}_l(v) = \sigma\!\left( W_{l,\,|\mathcal{N}(v)|} \cdot \sum_{w \in \mathcal{N}_0(v)} \boldsymbol{x}_{l-1}(w) \right)$
• 𝒙0 = 𝒖 ... atom features
• Weight matrix for each layer and degree (# of bonds ≤ 5)
• Connections only to the previous layer – fewer parameters
• $\sigma$ = ReLU, better results than tanh (deep architecture)
• Fingerprint $\boldsymbol{f} = \sum_l \mathrm{softmax}(W_o\, \boldsymbol{x}_l)$
• Differentiable version of fixed ECFP → Task specific fingerprint
• Similar concept to Neural Turing Machine
• Trained by backprop gradient descent
[6] Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional Networks on Graphs for Learning
Molecular Fingerprints ArXiv e-prints, 2015.
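The layer update and fingerprint pooling above can be sketched as follows. This is an assumption-laden sketch, not the reference implementation: the shapes, the shared output matrix `W_o`, and the use of the closed neighborhood $\{v\} \cup \mathcal{N}(v)$ in the sum are my choices.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def neural_fingerprint(A, X, W, W_o, n_layers):
    """Sketch of the degree-indexed layer update and softmax pooling.
    A: n x n adjacency, X: n x d atom features,
    W[l][deg]: d x d weight matrix per layer l and vertex degree deg,
    W_o: fp_len x d output matrix (assumed shared across layers)."""
    deg = A.sum(axis=1).astype(int)
    x = X
    f = np.zeros(W_o.shape[0])
    for l in range(n_layers):
        agg = x + A @ x  # sum features over the closed neighborhood of each vertex
        x = np.maximum(0.0, np.array([W[l][deg[v]] @ agg[v] for v in range(len(x))]))
        f += sum(softmax(W_o @ x[v]) for v in range(len(x)))  # soft "bit" counts
    return f
```

Because every softmax term sums to 1, the pooled fingerprint behaves like a differentiable histogram of substructure activations, one contribution per atom per layer.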
Convolution on graphs?
• Sequences, images → 𝑑-dimensional grids in Euclidean space
• Regularity, notion of direction, distance, translation ⇒ convolution
[7] Shuman, D.; Narang, S.; Frossard, P.; Ortega, A.; Vandergheynst, P., The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to
Networks and Other Irregular Domains. IEEE Signal Processing Magazine 2013, 30, 83-98.
Graph spectral convolution
• Classical convolution: $(f \ast g)(t) = \int_{\mathbb{R}} f(\tau)\, g(t - \tau)\, d\tau$
• Works on grids, but no translation in graphs
• Convolution theorem: $\mathcal{F}(f \ast g) = \mathcal{F}(f) \circ \mathcal{F}(g)$
⇒ $f \ast g = \mathcal{F}^{-1}\!\left( \mathcal{F}(f) \circ \mathcal{F}(g) \right)$
• Graph convolution: $\boldsymbol{x} \ast \boldsymbol{g} = U^T\!\left( (U\boldsymbol{x}) \circ (U\boldsymbol{g}) \right) = U^T \operatorname{diag}(U\boldsymbol{g})\, U\boldsymbol{x}$, where the rows of $U$ are the eigenvectors of the graph Laplacian, so $U\boldsymbol{x}$ is the graph Fourier transform
• First idea[8,9]: learn the vector $\boldsymbol{w} = U\boldsymbol{g}$
• Convolutional layer: $\operatorname{conv}(\boldsymbol{x}) = \sigma\!\left( U^T \operatorname{diag}(\boldsymbol{w})\, U\boldsymbol{x} \right)$
• Issues:
• Filters are not localized
• Number of parameters depends on input size: $\boldsymbol{w} \in \mathbb{R}^n$
• Multiplications with $U$ and $U^T$ are costly: $\mathcal{O}(n^2)$
[8] Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral Networks and Locally Connected Networks on Graphs ArXiv e-prints, 2013.
[9] Henaff, M.; Bruna, J.; LeCun, Y. Deep Convolutional Networks on Graph-Structured Data ArXiv e-prints, 2015.
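A minimal sketch of this first spectral parametrization, following the slide's convention that the rows of $U$ are the Laplacian eigenvectors (function name is mine):

```python
import numpy as np

def spectral_conv(L, x, w):
    """x * g = U^T diag(w) U x, with w = Ug learned directly in the
    spectral domain."""
    lam, V = np.linalg.eigh(L)  # L = V diag(lam) V^T, columns = eigenvectors
    U = V.T                     # graph Fourier transform: x_hat = U x
    return U.T @ (w * (U @ x))  # filter pointwise, transform back: O(n^2)
```

Sanity check of the convention: with the all-ones filter, $U^T \operatorname{diag}(\boldsymbol{1})\, U = U^T U = I$, so the signal comes back unchanged.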
Fast localized spectral filters[10]
• $\boldsymbol{x} \ast \boldsymbol{g} = U^T \operatorname{diag}(\boldsymbol{w})\, U\boldsymbol{x}$
• $U^T \operatorname{diag}(\boldsymbol{w})\, U$ is interpreted as a Laplacian with modified frequencies
• $\boldsymbol{x} \ast \boldsymbol{g}_\theta = g_\theta(L)\, \boldsymbol{x} = U^T g_\theta(\Lambda)\, U\boldsymbol{x}$
• $g_\theta(\Lambda) = \operatorname{diag}\!\left( g_\theta(\lambda_0), \ldots, g_\theta(\lambda_{n-1}) \right)$
• Theorem: when $g_\theta(\lambda)$ is a $K$-degree polynomial, the convolution at a vertex is localized to its $K$-neighborhood
• Naive parametrization: $g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k$ ... $\mathcal{O}(n^2)$
[10] Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering ArXiv e-prints, 2016.
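The polynomial filter can be evaluated with repeated matrix-vector products instead of an eigendecomposition, since $g_\theta(L)\,\boldsymbol{x} = \sum_k \theta_k L^k \boldsymbol{x}$. A sketch (plain monomial basis for clarity; the paper itself uses Chebyshev polynomials for numerical stability):

```python
import numpy as np

def poly_filter(L, x, theta):
    """g_theta(L) x = sum_{k=0}^{K-1} theta_k L^k x, built from K-1
    matrix-vector products; with a sparse L each product is cheap."""
    out = np.zeros_like(x)
    Lk_x = x.copy()          # L^0 x
    for t in theta:
        out += t * Lk_x
        Lk_x = L @ Lk_x      # next power of L applied to x
    return out
```

The locality theorem is easy to see here: a degree-$(K-1)$ polynomial applies $L$ at most $K-1$ times, and each application only mixes a vertex with its direct neighbors.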
The circle closes
• Setting $K = 1$ [11]: $g_\theta(L) = \theta_0 I + \theta_1 L = \theta_0 I + \theta_1 D - \theta_1 A$
• $(\boldsymbol{x} \ast \boldsymbol{g}_\theta)(v) = \left( (\theta_0 I + \theta_1 D - \theta_1 A)\, \boldsymbol{x} \right)(v)$
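At $K = 1$ the spectral machinery disappears entirely: each vertex just combines its own value with a weighted sum over its neighbors, which is exactly the neighborhood aggregation of the early constructive models. A sketch (function name is mine):

```python
import numpy as np

def k1_conv(A, x, theta0, theta1):
    """K = 1: g_theta(L) x = theta0*x + theta1*(D - A) x -- purely local,
    no eigendecomposition needed."""
    d = A.sum(axis=1)  # vertex degrees, i.e. the diagonal of D
    return theta0 * x + theta1 * (d * x) - theta1 * (A @ x)
```

This agrees term by term with the explicit matrix form $(\theta_0 I + \theta_1 D - \theta_1 A)\,\boldsymbol{x}$.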