Tutorial 5: Data Export

This tutorial demonstrates how to export data from Communalytic. Once you have collected your data (and, optionally, completed the Toxicity Analysis), you can export both the posts and the network data (as a .graphml file) to your computer.

Additional resource: A case study on how to use Communalytic and Gephi to examine groups on Reddit.

Step 1

To begin, find your dataset on the “My Datasets” homepage, and look for the column labelled “Export Posts”. Then click on the green arrow icon.

Step 2

Before downloading your CSV file, please download and review the Reddit attributes guide. Then, click on “Download CSV File” to download your data. You can open this .csv file with Microsoft Excel or Google Sheets. 

The .csv file includes metadata about the collected Reddit posts, such as the author, date published, post content, and number of upvotes and downvotes. If you have completed the Toxicity Analysis step, the .csv file will also contain a series of values measuring toxicity levels, as calculated by Google's Perspective API.
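If you prefer to inspect the export in code rather than in a spreadsheet, the minimal Python sketch below reads the CSV with the standard `csv` module, counts posts per author and flags posts whose toxicity score reaches a cut-off. The column names `author` and `toxicity` and the 0.5 threshold are assumptions for illustration only; the real headers are listed in the Reddit attributes guide, and your export may contain several separate toxicity columns.

```python
import csv
import io
from collections import Counter

# HYPOTHETICAL column names: check the Reddit attributes guide for the
# actual headers used in your Communalytic export.
AUTHOR_COL = "author"
TOXICITY_COL = "toxicity"

def summarize_posts(lines, threshold=0.5):
    """Count posts per author and collect rows whose toxicity score is
    at or above `threshold` (an arbitrary cut-off on a 0-1 scale)."""
    posts_per_author = Counter()
    flagged = []
    for row in csv.DictReader(lines):
        posts_per_author[row[AUTHOR_COL]] += 1
        # Missing column or empty cell (e.g. no Toxicity Analysis) -> skip.
        score = row.get(TOXICITY_COL) or ""
        if score and float(score) >= threshold:
            flagged.append(row)
    return posts_per_author, flagged

# Stand-in for the downloaded file; with a real export use
# open("your_export.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "author,text,toxicity\n"
    "alice,Great point!,0.02\n"
    "bob,That is nonsense,0.61\n"
    "alice,Thanks for sharing,0.01\n"
)

counts, toxic = summarize_posts(sample)
print(counts.most_common())              # [('alice', 2), ('bob', 1)]
print([row["author"] for row in toxic])  # ['bob']
```

If the toxicity column is absent, the function still counts posts and simply returns an empty list of flagged rows.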

Step 3

Now, let’s export the network data. Click the “Export Network” button in the sidebar at the bottom left.

Step 4

This takes you to a page where our system automatically generates a snapshot of the network data. If no snapshot is available when you open the page, click “Generate/Download Network Graph”, then refresh the page to view the snapshot.

Then click the “Download Graph” button to download the GraphML file. To open and work with this file, you will need to use software designed for network analysis such as Gephi.
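GraphML is an XML format, so you can also peek inside the file without Gephi. The sketch below uses Python's built-in `xml.etree.ElementTree` to list the nodes and edges in a GraphML document. The tiny sample graph and its `weight` attribute are hypothetical; the attributes in a real Communalytic export may be named differently.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

def read_graphml(xml_text):
    """Return (node_ids, edges), where each edge is (source, target, attrs)."""
    root = ET.fromstring(xml_text)
    # <key> elements map short ids (e.g. "d0") to readable attribute names.
    key_names = {k.get("id"): k.get("attr.name", k.get("id"))
                 for k in root.findall("g:key", NS)}
    graph = root.find("g:graph", NS)
    node_ids = [n.get("id") for n in graph.findall("g:node", NS)]
    edges = []
    for e in graph.findall("g:edge", NS):
        attrs = {key_names.get(d.get("key"), d.get("key")): d.text
                 for d in e.findall("g:data", NS)}
        edges.append((e.get("source"), e.get("target"), attrs))
    return node_ids, edges

# HYPOTHETICAL sample; for a real export pass open(path, "rb").read().
sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="alice"/>
    <node id="bob"/>
    <node id="carol"/>
    <edge source="alice" target="bob"><data key="d0">2</data></edge>
    <edge source="carol" target="bob"><data key="d0">1</data></edge>
  </graph>
</graphml>"""

nodes, edges = read_graphml(sample)
print(nodes)  # ['alice', 'bob', 'carol']
print(edges)  # [('alice', 'bob', {'weight': '2'}), ('carol', 'bob', {'weight': '1'})]
```

Attribute values come back as strings; convert them (for example with `float`) before doing arithmetic.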

Step 5

To download Gephi, visit this link and select “Download Free” to begin your download. Gephi is available for Windows, Mac OS X and Linux.

Step 6

Once Gephi is installed, you can open the network data you exported earlier. Open Gephi on your computer, and a window will appear giving you the option to start a “New Project” or “Open Graph File”. Select “Open Graph File” and choose the GraphML file you downloaded from Communalytic.

Step 7

Once you’ve selected your GraphML file, an Import Report window will appear. Just click “OK”.

Step 8

After this step, you should see a visualization of your network data. You can switch between the “Overview” and “Data Laboratory” views in the top left corner of the Gephi window. The “Data Laboratory” displays all of the original data in a table, similar to a spreadsheet in Excel.

Step 9

In Gephi, the network data is separated into Nodes and Edges. Nodes represent the unique users engaged in the Reddit discussion, and Edges represent the interactions (e.g., replies, comments) between those users.

Referring back to the graph, each black dot is a Node (a unique user), and the grey lines connecting the dots are Edges (interactions between users). A Node with many lines connecting it to other Nodes represents a user who has interacted with many other users. The thickness of a line represents the weight of the edge.

Step 10

If a user replied to another user more than once, the weight of the edge would be higher and the line would be thicker.
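The weighting rule described above can be sketched in a few lines of Python: collapse a list of (replier, replied-to user) pairs into directed edges whose weight is the number of times that pair occurs. This is an illustration of the idea, not Communalytic's actual code, and the reply log is hypothetical.

```python
from collections import Counter

def weighted_edges(replies):
    """Collapse (replier, replied_to) pairs into {(source, target): weight},
    where weight counts how many times that reply pair occurs."""
    return Counter(replies)

# HYPOTHETICAL reply log: alice replied to bob twice.
replies = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
]

weights = weighted_edges(replies)
for (source, target), weight in weights.items():
    print(f"{source} -> {target}: weight {weight}")
# alice -> bob: weight 2
# carol -> bob: weight 1
# bob -> alice: weight 1
```

The sketch treats edges as directed, so bob replying to alice is a separate edge from alice replying to bob; this matches the in-degree and out-degree options Gephi offers for the exported graph, but is an assumption here.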

Step 11

The graph is generated in black and grey by default, which is monotonous and hard to read. To make it more readable, you can assign different colours to the nodes and edges by clicking the colour palette icon in the top left section of the Gephi window. Choose the colour you want for your nodes, then click “Apply”.

Step 12

You can also size the Nodes based on their degree. The degree of a node is the number of connections it has to other nodes; in-degree counts the connections a node receives, and out-degree counts the connections it initiates. To do this, click the size icon to the right of the colour palette icon in the top left corner. A drop-down menu will appear with options to size nodes by degree, in-degree or out-degree. After you make a selection, you can set the minimum and maximum node size.

Bigger nodes suggest users who are actively involved in the discussion; these users might be key opinion leaders in the Reddit forum. If you want to examine the toxicity level of interactions between users, switch to the Edges view in the top left corner of the Gephi window.
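To make the degree measures and the size range concrete, the sketch below computes degree, in-degree and out-degree from a list of directed edges, then maps degree linearly onto a minimum/maximum node size. Each edge counts once regardless of its weight. The linear mapping and the sizes 10 and 50 are assumptions for illustration; Gephi's own ranking settings may interpolate differently.

```python
from collections import Counter

def degrees(edges):
    """Return (degree, in_degree, out_degree) Counters for directed
    (source, target) edges; each edge counts once, ignoring weight."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    nodes = set(in_deg) | set(out_deg)
    deg = Counter({n: in_deg[n] + out_deg[n] for n in nodes})
    return deg, in_deg, out_deg

def node_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map each node's value onto [min_size, max_size].
    The default sizes are arbitrary example values."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {
        n: min_size if span == 0
        else min_size + (v - lo) / span * (max_size - min_size)
        for n, v in values.items()
    }

# HYPOTHETICAL edge list (one entry per unique reply pair).
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("dave", "bob")]
deg, in_deg, out_deg = degrees(edges)
sizes = node_sizes(deg)
print(deg["bob"], in_deg["bob"], out_deg["bob"])  # 4 3 1
print(sizes["bob"], sizes["carol"])               # 50.0 10.0
```

Here bob, who receives the most replies, has the highest degree and therefore the largest node, while carol and dave sit at the minimum size.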

Step 13

You can change the graph’s layout in the Layout section on the left. Select a layout algorithm to rearrange the graph; we recommend trying different layouts to find the one that works best for your data.

Step 14

Next, click the “Preview” button at the top of the Gephi window. This shows a preview of your modified graph, which you can export as a static image.

For more detailed tutorials on using Gephi, please visit the Gephi website.