Learning representation of networks

Learning the distributed representation of data has been proved very successful in many domains such as speech, images, natural languages. In this project, our goal is to learn the distributed representation of network data, which are ubiquitous in real-world and cover various applications. Representing networks into low-dimensional spaces is potentially useful in many applications such as visualization, node classification, link prediction and recommendation. In this project, we proposed a large-scale information network embedding model called the “LINE”, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves the local and global network structures. We also proposed an efficient optimization algorithm, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a single machine.


Jian Tang, Microsoft Research, jiatang@microsoft.com, tangjianpku@gmail.com