Twitter: NetworkX Graph Analysis

Varun Garg
Web Mining [IS688, Spring 2021]
Feb 18, 2021


Social media includes the many ways people connect with other people through computation. Mobile devices, social networks, email, texting, vlogs and blogs are a few of the ways people interact in computer-mediated settings. As they link, follow, like, reply, retweet, comment, tag, rate, review and text one another, they form connections.

These connections form network structures (or hubs) that can be fetched, analyzed and visualized. Twitter is one such hub, and it grows every second. Each of us follows, and is followed by, a growing number of users; this set of connections around a single user is known as an ego network. We sit at the focal point (the ego) of our own network and extend our reach to other egos and their ego networks. Such a network can be represented as a graph: a collection of nodes (vertices) together with pairs of connected nodes (edges).
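To make the ego-network idea concrete, here is a small sketch with hypothetical account names. NetworkX's `ego_graph` returns the subgraph within a given number of hops of a centre node, so radius 1 gives the direct connections and radius 2 adds the friends of friends used later in this article.

```python
import networkx as nx

# Hypothetical follow relationships, stored as undirected edges (pairs of nodes)
G = nx.Graph([
    ("me", "alice"), ("me", "bob"),        # my direct connections
    ("alice", "carol"), ("bob", "dave"),   # friends of friends
    ("dave", "erin"),                      # three hops away from "me"
])

ego = nx.ego_graph(G, "me", radius=1)        # "me" plus direct connections
extended = nx.ego_graph(G, "me", radius=2)   # ... plus friends of friends

print(sorted(ego.nodes))       # ['alice', 'bob', 'me']
print(sorted(extended.nodes))  # ['alice', 'bob', 'carol', 'dave', 'me']
```

The centre node is included by default, and "erin" is left out because she is three hops away.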

As part of this article, I will be using NetworkX, an open-source Python package for creating, manipulating and studying the structure of complex networks. Apart from NetworkX, I will also demonstrate the use of the following libraries available in Python:

tweepy, networkx, pandas, numpy, matplotlib, json and csv

The first step towards such an analysis is to ‘mine’ Twitter data via its API. For the analysis I am using ‘masalaspice2’ (an account managed by my wife for her YouTube channel). Here is a summary of the tasks I will perform:

  1. Verify account credentials: Get the root user’s information
  2. Pull Account information: Use the authenticated session to pull the latest tweets from my timeline
  3. Fetch friends: Get info for each user (friend) the root user is following on Twitter
  4. Friends of friends: Get info for each friend of a friend in CSV format
  5. Visualization: Plot the graph
  6. Network Analysis: Most influential Node in the network
  7. Network Analysis: Most important Node in the network
  8. Network Analysis: Shortest distance between 2 Nodes
  9. Network Analysis: Best connector in the network

1. Verify account credentials: I imported tweepy to verify my account credentials for pulling Twitter data.
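A minimal sketch of this step, assuming tweepy 3.x (current in early 2021) and Twitter API v1.1 user-context (OAuth 1.0a) keys; current API access tiers may not include these v1.1 endpoints. The environment-variable names are hypothetical, since the article does not say how its keys were stored.

```python
import os

# Hypothetical environment-variable names holding the four OAuth 1.0a secrets
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials():
    """Read the OAuth secrets from the environment instead of hard-coding them."""
    missing = [name for name in CREDENTIAL_VARS if not os.environ.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(os.environ[name] for name in CREDENTIAL_VARS)


def connect(consumer_key, consumer_secret, access_token, access_secret):
    """Authenticate with tweepy and return (api, root_user)."""
    import tweepy  # imported lazily so load_credentials() works without tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    # tweepy 3.x returns False for bad credentials; newer versions raise instead
    root_user = api.verify_credentials()
    if not root_user:
        raise RuntimeError("Twitter rejected the supplied credentials")
    return api, root_user
```

Usage: `api, me = connect(*load_credentials())`, after which attributes such as `me.screen_name` and `me.followers_count` describe the root user.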

2. Pull Account information: Once authentication succeeded, I ran a script to fetch the latest tweets from my timeline.
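A sketch of this step, assuming an authenticated `tweepy.API` object. `home_timeline` is the v1.1 endpoint for the authenticated user's home timeline (their own tweets plus those of accounts they follow). If only the account's own tweets are wanted, `user_timeline` is the alternative. `timeline_rows` is a hypothetical helper that flattens the returned tweets into tuples that are easy to print or load into pandas.

```python
def fetch_timeline(api, count=20):
    """Return the most recent tweets on the authenticated user's home timeline."""
    return api.home_timeline(count=count)


def timeline_rows(statuses):
    """Flatten tweepy Status objects into (screen_name, created_at, text) tuples."""
    return [(s.user.screen_name, s.created_at, s.text) for s in statuses]
```

For example, `for who, when, text in timeline_rows(fetch_timeline(api, 10)): print(when, who, text[:80])` prints a compact summary of the ten latest tweets.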

3. Fetch friends: I am using the Twitter handle ‘MasalaSpice2’ for the rest of my work. Here I fetch the list of Twitter handles that ‘MasalaSpice2’ follows (its friends).

Please note that I have hidden the actual user handles to protect users’ privacy.
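A sketch of fetching friends with tweepy's `Cursor`, which handles the API's page-by-page results. tweepy 3.x exposes this endpoint as `api.friends` and tweepy 4.x renamed it `api.get_friends`, so the sketch picks whichever exists. `mask_handle` is a hypothetical helper for hiding handles before publishing; the masking pattern is an assumption, not necessarily the one used in the screenshots.

```python
def friend_screen_names(api, screen_name, limit=200):
    """Return screen names of the accounts `screen_name` follows (its "friends")."""
    import tweepy  # imported lazily so mask_handle() works without tweepy

    # tweepy 3.x calls this endpoint api.friends; tweepy 4.x renamed it api.get_friends
    method = getattr(api, "get_friends", None) or api.friends
    users = tweepy.Cursor(method, screen_name=screen_name, count=200).items(limit)
    return [user.screen_name for user in users]


def mask_handle(handle, keep=2):
    """Hide the middle of a handle, keeping `keep` characters at each end."""
    if len(handle) <= 2 * keep:
        return "*" * len(handle)
    return handle[:keep] + "*" * (len(handle) - 2 * keep) + handle[-keep:]
```

For example, `mask_handle("masalaspice2")` returns `"ma********e2"`.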

4. Friends of friends: To start the network analysis, I pulled the friends-of-friends data (in CSV format) and converted the CSV file into a .txt edge list so that I could work with it as a graph. It looked like this:

Here, the first entry has ‘ma****ice2’ as Node-1 and ‘Aka***shyap’ as Node-2.

.csv file converted into .txt file for edge graph analysis
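One way to sketch this step, assuming a `fetch_friends(name)` callable, for instance a thin wrapper around a tweepy friends lookup. The function names and the `max_per_user` cap (added to limit API calls under Twitter's rate limits) are hypothetical. The .txt output is a plain whitespace-separated edge list, one `node1 node2` pair per line, which is the format `nx.read_edgelist` reads by default. Twitter handles cannot contain spaces, so this separator is safe.

```python
import csv


def friends_of_friends(root, fetch_friends, max_per_user=100):
    """Collect (account, followed_account) edges up to two hops out from `root`."""
    friends = list(fetch_friends(root))[:max_per_user]
    edges = [(root, friend) for friend in friends]
    for friend in friends:
        for fof in list(fetch_friends(friend))[:max_per_user]:
            edges.append((friend, fof))
    return edges


def save_edges_csv(edges, csv_path):
    """Write one (node1, node2) row per edge."""
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(edges)


def csv_to_edgelist(csv_path, txt_path):
    """Rewrite a two-column CSV as a space-separated edge list for NetworkX."""
    with open(csv_path, newline="") as src, open(txt_path, "w") as dst:
        for row in csv.reader(src):
            if len(row) >= 2:
                dst.write(f"{row[0]} {row[1]}\n")
```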

5. Visualization: With the edge list above, I plotted a graph from the .txt file.

Network Graph for ‘MasalaSpice2’
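A minimal sketch of how such a plot can be produced with NetworkX and matplotlib. The file names, the spring layout and the red colouring of the root node are assumptions for illustration, not necessarily the settings behind the original figure. Note that `nx.read_edgelist` builds an undirected graph unless `create_using=nx.DiGraph` is passed.

```python
import matplotlib

matplotlib.use("Agg")  # draw to a file; omit this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import networkx as nx


def plot_edgelist(txt_path, png_path, root=None):
    """Read a whitespace-separated edge list and save a spring-layout drawing."""
    G = nx.read_edgelist(txt_path)  # undirected nx.Graph by default
    pos = nx.spring_layout(G, seed=42)  # fixed seed: same picture on every run
    colors = ["tab:red" if node == root else "tab:blue" for node in G.nodes]
    fig, ax = plt.subplots(figsize=(10, 10))
    nx.draw_networkx(G, pos, ax=ax, node_color=colors, node_size=30,
                     with_labels=False, edge_color="lightgray")
    ax.set_title(f"Network graph for {root}" if root else "Network graph")
    ax.axis("off")
    fig.savefig(png_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return G
```

Labels are switched off because a friends-of-friends graph quickly has too many nodes for readable text.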

6. Network Analysis: Most influential Node in the network

In network analysis, measures of how important nodes are, are called centrality measures. Some handles in a graph may have a high centrality score (for example, high betweenness centrality) despite a low degree, i.e. few direct connections. Such handles are important because they connect otherwise separate parts of the graph.
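To make the degree-versus-centrality point concrete, here is a toy graph with hypothetical account names in which one account has the fewest connections yet the highest betweenness centrality, the measure of how often a node lies on shortest paths between other nodes.

```python
import networkx as nx

# Two tightly knit groups that are linked only through the account "bridge"
left = nx.complete_graph(["a", "b", "c", "d"])
right = nx.complete_graph(["w", "x", "y", "z"])
G = nx.union(left, right)
G.add_edges_from([("d", "bridge"), ("bridge", "w")])

degree = dict(G.degree)
betweenness = nx.betweenness_centrality(G)  # normalized to [0, 1] by default

# Print the three nodes with the highest betweenness, with their degree
for node in sorted(G, key=betweenness.get, reverse=True)[:3]:
    print(node, degree[node], round(betweenness[node], 3))
```

Here "bridge" has degree 2, the lowest in the graph, but every shortest path between the two groups passes through it, so its betweenness (16 of the 28 node pairs, about 0.571) is the highest.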

Twitter user ‘iAmXXXXX’ seems to be the most influential node in the current network.
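The article does not name the measure behind "most influential". One common choice, assumed in this sketch, is degree centrality: the share of other nodes a node is directly connected to. In a directed follow graph, in-degree centrality counts how many accounts in the sample follow the node.

```python
import networkx as nx


def most_influential(G, k=5):
    """Return the k nodes with the highest degree centrality, best first.

    For a directed follow graph, in-degree centrality is used instead, i.e. how
    many accounts in the sample point at (follow) the node.
    """
    if G.is_directed():
        scores = nx.in_degree_centrality(G)
    else:
        scores = nx.degree_centrality(G)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

With the graph loaded from the .txt file, `most_influential(G, k=1)` would return the single top-ranked node and its score under this assumption.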

7. Network Analysis: Most important Node in the network

Twitter user ‘iAmXXXXX’ also seems to be the most important node in the network.
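The measure behind "most important" is likewise not stated. A sketch assuming eigenvector centrality, under which a node scores highly when its neighbours are themselves well connected, is shown below. Converting a directed graph to undirected first is an assumption of this sketch.

```python
import networkx as nx


def most_important(G, k=5):
    """Return the k nodes with the highest eigenvector centrality, best first."""
    # Assumption: treat follow links as undirected connections for this measure
    H = G.to_undirected() if G.is_directed() else G
    # Power iteration; a generous max_iter helps on larger, sparse graphs
    scores = nx.eigenvector_centrality(H, max_iter=1000)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

Degree and eigenvector centrality often agree on the top node, as they appear to here, but they can diverge when a node has many connections to poorly connected accounts.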

8. Network Analysis: Shortest distance between 2 Nodes

Here I find the shortest path between ‘MasalaSpice2’ (the node under analysis) and an anonymous user (‘165486199’) who is a friend of a friend.

Based on this result, if ‘MasalaSpice2’ wants to be followed by Twitter user ‘165486199’, a friend of a friend, she can reach out via Twitter user ‘Aka****ap’, the intermediate node on the shortest path.
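A minimal sketch using `nx.shortest_path`, which on an unweighted graph returns one shortest path (found by breadth-first search). Apart from the root and target handles, the names below are hypothetical stand-ins; only the shape of the answer (root, one intermediary, target) mirrors the result described above.

```python
import networkx as nx


def introduction_path(G, source, target):
    """Return one shortest chain of accounts from source to target, or None."""
    try:
        return nx.shortest_path(G, source=source, target=target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None


# Toy network with hypothetical handles: the target is two hops from the root
G = nx.Graph([
    ("masalaspice2", "friend_a"), ("masalaspice2", "friend_b"),
    ("friend_a", "165486199"), ("friend_b", "someone_else"),
])
path = introduction_path(G, "masalaspice2", "165486199")
print(path)  # ['masalaspice2', 'friend_a', '165486199']
print("distance:", len(path) - 1, "| ask for an introduction from:", path[1:-1])
```

The nodes strictly between the two ends (`path[1:-1]`) are the accounts to approach for an introduction. When several shortest paths exist, `nx.all_shortest_paths` lists them all.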

9. Network Analysis: Best connector in the network

The best connector in the network seems to be user ‘p****draa’.
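The article does not state how the best connector was scored. A natural reading, assumed here, is the node with the highest betweenness centrality. NetworkX computes exact betweenness with Brandes' algorithm, which gets slow on large friends-of-friends graphs, so this sketch optionally passes NetworkX's `k` parameter to estimate betweenness from a random sample of source nodes. The `sample` and `seed` arguments are names chosen for this sketch.

```python
import networkx as nx


def best_connector(G, sample=None, seed=0):
    """Return (node, score) for the node with the highest betweenness centrality.

    If `sample` is smaller than the number of nodes, betweenness is estimated
    from that many randomly chosen source nodes instead of all of them.
    """
    k = sample if sample and sample < G.number_of_nodes() else None
    scores = nx.betweenness_centrality(G, k=k, seed=seed)
    node = max(scores, key=scores.get)
    return node, scores[node]
```

On a five-node chain, `best_connector(nx.path_graph(5))` picks the middle node, 2, which sits on 4 of the 6 paths between other pairs. Sampling trades accuracy for speed, so exact scores are preferable when the graph is small enough.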

Conclusion: Social network analysis has recently emerged as a powerful and popular method for modeling meaningful, often hidden relationships in online communities. NetworkX offers many functions for a wide range of network analysis problems, and a language like Python gives us the flexibility to explore networks computationally in many ways: finding the most important and influential nodes, the best connectors, the degree to which the network is connected, and much more.
