An Introduction to Graph Attention Networks

Graph learning is becoming increasingly relevant, as a significant amount of real-world data can be modelled as graphs.

A Visual Representation of the Cora Dataset as a Graph. Reference [1].

The Graph Attention Network (GAT) is a non-spectral learning method that uses the spatial information of a node's neighbourhood directly for learning. This is in contrast to the spectral approach of the Graph Convolutional Network, which mirrors the basics of the Convolutional Neural Network.

In this article, I will explain how the GAT is constructed.

The basic building block of the GAT is the Graph Attention Layer. To explain it, the following graph is used as an example.

An Example Graph

Here hᵢ is the feature vector of node i, of length F.

The first step performed by the Graph Attention Layer is to apply a linear transformation, a learnable weight matrix W, to the feature vectors of the nodes.
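In the notation of [1], this shared weight matrix maps each input feature vector of length F to a transformed vector of length F':

```latex
% Shared linear transformation applied to every node's feature vector
\mathbf{W}\vec{h}_i, \qquad \mathbf{W} \in \mathbb{R}^{F' \times F}, \quad \vec{h}_i \in \mathbb{R}^{F}
```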

Attention coefficients determine the relative importance of a neighbouring node's features to a given node. For a pair of neighbouring nodes i and j, the coefficient is computed by a shared attention function a applied to their transformed feature vectors; the only restriction placed on a is that it maps a pair of transformed feature vectors to a single scalar score.
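Written out (following the formulation in [1], where a is implemented as a single-layer feed-forward network with a LeakyReLU non-linearity applied to the concatenation of the two transformed vectors):

```latex
% Attention coefficient between neighbouring nodes i and j
e_{ij} = a\!\left(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j\right),
\qquad a : \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}

% Concrete choice of a used in [1]
e_{ij} = \mathrm{LeakyReLU}\!\left(\vec{a}^{\top}\left[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j\right]\right)
```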

The following figures further explain this.

As we observe in the series of images above, we first calculate the self-attention coefficient and then compute attention coefficients with all of the neighbours.

Due to the varied structure of graphs, nodes can have different numbers of neighbours. To have a common scale across all neighbourhoods, the attention coefficients are normalized.

Normalization of Calculated Attention Coefficient

Here Nᵢ is the neighbourhood of node i.
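Concretely, the normalization is a softmax over the neighbourhood of node i (as in [1]):

```latex
% Softmax normalization of attention coefficients over node i's neighbourhood
\alpha_{ij} = \mathrm{softmax}_j\!\left(e_{ij}\right)
            = \frac{\exp\!\left(e_{ij}\right)}{\sum_{k \in \mathcal{N}_i} \exp\!\left(e_{ik}\right)}
```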

Now we compute the learned features of the nodes. Here σ is a non-linear activation function.

Example Network Architecture
Computation of Learned Output Features
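In symbols, the learned output feature of node i is an attention-weighted sum of its neighbours' transformed features, passed through σ (as in [1]):

```latex
% Learned output feature of node i
\vec{h}'_i = \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\mathbf{W}\vec{h}_j\right)
```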

To improve the stability of the learning process, multi-head attention is employed: we compute multiple independent attention maps and then aggregate the learned representations.

Final Computation of Learned Features

K denotes the number of independent attention maps used.
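With K heads, [1] concatenates the head outputs in hidden layers and averages them in the final prediction layer (so that the output keeps the desired dimensionality):

```latex
% Hidden layers: concatenate the K independent heads
\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k}\,\mathbf{W}^{k}\vec{h}_j\right)

% Final layer: average the K heads before applying the output non-linearity
\vec{h}'_i = \sigma\!\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k}\,\mathbf{W}^{k}\vec{h}_j\right)
```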

An overall look at all operations involved in the update of learned features of nodes. Reference [1].
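To make the above concrete before the implementation article, here is a minimal single-head sketch of a Graph Attention Layer in PyTorch. It is illustrative only: the class name, the dense N × N adjacency input, and the initialization choices are my own assumptions, not the reference implementation from [1].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Minimal single-head Graph Attention Layer (dense-adjacency sketch)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.W = nn.Linear(in_features, out_features, bias=False)   # shared linear transform W
        self.a = nn.Parameter(torch.randn(2 * out_features) * 0.1)  # attention vector a
        self.out_features = out_features

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (N, F)  node feature vectors
        # adj: (N, N)  adjacency matrix (1 where an edge exists, including self-loops)
        z = self.W(h)                                                # (N, F') transformed features W h_i
        a_src, a_dst = self.a[: self.out_features], self.a[self.out_features:]
        # e_ij = LeakyReLU(a^T [z_i || z_j]), built from two broadcastable terms
        e = F.leaky_relu((z @ a_src).unsqueeze(1) + (z @ a_dst).unsqueeze(0),
                         negative_slope=0.2)                         # (N, N)
        # Mask non-neighbours so the softmax runs only over each node's neighbourhood
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                              # (N, N) normalized coefficients
        return F.elu(alpha @ z)                                      # (N, F') learned output features


# Toy usage: 4 nodes, 3 input features, one undirected edge plus self-loops
h = torch.randn(4, 3)
adj = torch.eye(4)
adj[0, 1] = adj[1, 0] = 1.0
out = GraphAttentionLayer(3, 8)(h, adj)                              # shape (4, 8)
```

A multi-head version would run K copies of this layer and concatenate (or average) their outputs, as in the equations above; a practical implementation would also work with a sparse edge list rather than a dense adjacency matrix.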

This is my first Medium article, and I hope you find it informative. I will cover the actual implementation of the network in Python in the next article.

[1] P. Velickovic, G. Cucurull, A. Casanova, and A. Romero. Graph Attention Networks. In International Conference on Learning Representations, 2018.
