Toxicity Comes From Both Sides of the Aisle: An exploratory network and toxicity analysis of reactions to Trump’s and Biden’s pandemic tweets using @Communalytic
Twitter has emerged as an important communication and information dissemination platform in recent years. Politicians and citizens often use this platform to interact with one another and share opinions. However, sometimes these interactions turn toxic. While Twitter has taken steps to minimize this type of interaction on its platform, we still see it, particularly in conversations involving political leaders and their policies. This has been of particular interest to researchers who study social media interactions. In this post, we will show how publicly available conversational data from Twitter can be collected using Communalytic, and then examined in different ways using toxicity and network analysis techniques.
For this exploratory tutorial, we analyzed public replies to COVID-19 related tweets posted by political leaders from opposite sides of the political spectrum: former U.S. president Donald Trump and the current president, Joe Biden. (Note: At the time of data collection in the fall of 2020, Donald Trump was the current president, and Joe Biden was the president-elect.) This topic and these two leaders were of particular interest to us because the division between the supporters of each leader has deepened since 2016, and the COVID-19 pandemic has only increased this divide.
Our analysis shows that while the tweets of Trump and Biden, like their views and opinions, were quite different, the communication patterns surrounding their COVID-19 related tweets are quite similar. The Twitter communication networks that formed around their posts attracted similar levels of toxicity, and supporters of both leaders were attacked in both networks. Notably, the most active and ‘toxic’ users in both networks were opponents of the political leader around whom the network is centered. This speaks to the unique accessibility that citizens have to political leaders via Twitter.
The following sections describe how we arrived at these conclusions. Please note that as this is a tutorial, any findings presented in this post are for illustrative purposes only.
To examine and compare how the public responded to Trump’s and Biden’s tweets related to the ongoing COVID-19 pandemic, we used a number of readily available tools including: Communalytic for data collection and analysis, Google’s Perspective API for toxicity analysis, Gephi for social network analysis, and Microsoft Excel for data exploration. Below we outline each of the steps in our methodology.
Step 1: Data Collection
As noted earlier, we collected and analyzed public replies to COVID-19 related tweets by opposing political leaders, the former US president Donald Trump and the current president, Joe Biden.
Donald Trump’s tweet chosen for this case study was a tweet with a video message from October 7, 2020 (see Figure 1), in which he reassured his followers that he was doing fine after he and first lady Melania Trump had been diagnosed with COVID-19 on October 2, 2020. The president also took this opportunity to speak on the experimental drug treatment he received while in hospital, and insisted that catching the virus was a “blessing from God.”
Joe Biden has been vocal about his criticism of the Trump administration’s mishandling of the pandemic. Biden’s tweet chosen for this case study was from November 13, 2020, in which he stated how “alarmed” he was by the surge of COVID-19 cases across the U.S., and called on Donald Trump’s administration to take immediate action (see Figure 2).
To limit the scope of our analysis, we collected a sample of five thousand replies to each of the two tweets: replies to Donald Trump’s tweet were collected between October 8-11, while replies to Joe Biden’s tweet were collected between November 14-17. Out of 5,000 tweets, 2,083 replies (42 per cent) in the Trump dataset were directed at the original tweet by Trump, and 2,793 replies (56 per cent) in the Biden dataset were directed at the original tweet by Biden. The remaining tweets were replies to other replies.
Detailed steps on how to collect Twitter data using Communalytic (via Twitter API version 2) can be found here.
Step 2: Toxicity Analysis
Once the data was collected, we used Communalytic’s toxicity analysis feature to analyze each tweet with Google’s Perspective API. This online service uses machine learning techniques to estimate how likely a post is to be perceived as toxic, reporting separate scores for attributes such as toxicity, identity attack, insult, and profanity. See our previous post on how to apply and interpret these scores when analyzing Twitter data. For this sample case, we only used the more “general purpose” toxicity score, designed to identify “rude, disrespectful, or unreasonable comment[s] that [are] likely to make people leave a discussion.”
Next, we exported each dataset from Communalytic (as a CSV file) and used Excel to examine the toxicity scores assigned to each reply in the datasets (note: you can perform similar steps in Google Sheets). Using filters in Excel, we determined that only about 9 per cent of all replies in each dataset could be broadly classified as ‘toxic’; that is, they have a toxicity score of 0.8 or higher. The threshold of 0.8 was set based on our previous work with Perspective API. About half of the toxic replies were directed at the original tweet: 241 (or 51 per cent) were directed at Trump’s tweet and 231 (or 49 per cent) were directed at Biden’s tweet. Table 1 below summarizes the results of the toxicity analysis.
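For readers curious about what a toxicity request looks like outside of Communalytic, the sketch below builds a request body for a single comment and reads the summary toxicity score from a response. It is an illustration only, not Communalytic’s implementation: the helper names `build_request` and `extract_toxicity` and the sample response values are hypothetical, no network call is made, and the endpoint and field names should be checked against the current Perspective API reference.

```python
# Endpoint as we understand it from Perspective API's public documentation.
# A real call also needs an API key passed as a query parameter (not shown).
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, attributes=("TOXICITY",)):
    """Return the JSON-serializable body for one comment (hypothetical helper)."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_toxicity(response, attribute="TOXICITY"):
    """Read the summary score (between 0 and 1) for one attribute."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]


# Made-up response for illustration; real responses carry additional fields.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91, "type": "PROBABILITY"}}
    }
}

body = build_request("an example reply")
score = extract_toxicity(sample_response)
print(score >= 0.8)  # True: this made-up reply would count as toxic at a 0.8 threshold
```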
Table 1: Frequencies of Toxic Replies in Both Datasets
|Measure||Trump dataset: # of tweets||Trump dataset: %||Biden dataset: # of tweets||Biden dataset: %|
|Toxic replies in the entire dataset||477||9.54% (n=5000)||469||9.38% (n=5000)|
|Toxic replies to the original poster (% relative to all toxic tweets)||241||50.52% (n=477)||231||49.25% (n=469)|
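The same filtering can be done in a few lines of code instead of a spreadsheet. The sketch below is a minimal example under stated assumptions: the column names (`author`, `reply_to`, `toxicity`) and the sample rows are hypothetical and will not match a real Communalytic export exactly. It applies the 0.8 threshold, counts toxic replies aimed at the original poster, and re-computes the percentages reported in Table 1.

```python
import csv
import io

TOXICITY_THRESHOLD = 0.8  # threshold used in this tutorial


def summarize_toxicity(rows, original_author, threshold=TOXICITY_THRESHOLD):
    """Count all replies, toxic replies, and toxic replies to the original poster."""
    total = toxic = toxic_to_original = 0
    for row in rows:
        total += 1
        if float(row["toxicity"]) >= threshold:
            toxic += 1
            if row["reply_to"] == original_author:
                toxic_to_original += 1
    return total, toxic, toxic_to_original


def pct(part, whole):
    """Percentage rounded to two decimals; 0.0 when the denominator is zero."""
    return round(100 * part / whole, 2) if whole else 0.0


# Made-up sample standing in for an exported CSV file (column names are assumptions).
sample_csv = """author,reply_to,toxicity
user_a,leader,0.92
user_b,leader,0.15
user_c,user_a,0.85
user_d,leader,0.80
user_e,user_b,0.40
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
total, toxic, to_original = summarize_toxicity(rows, "leader")
print(total, toxic, to_original)  # 5 3 2

# Re-computing the Table 1 percentages from the reported counts.
print(pct(477, 5000), pct(241, 477))  # 9.54 50.52
print(pct(469, 5000), pct(231, 469))  # 9.38 49.25
```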
Step 3: Visualizing Who-Replies-To-Whom Networks
Next, we turned to social network analysis (SNA) to examine the Twitter users behind some of the most toxic messages. Using Communalytic, we created and exported the “who replies to whom” networks. Detailed steps on how to export the network generated from the collected tweets as a GraphML file can be found here (see steps 3-14). One of the unique features of networks created by Communalytic is that they embed toxicity scores as edge-level attributes, which is especially useful when detecting and examining the most active users in each dataset who also happened to send or receive the highest number of toxic replies.
To facilitate our subsequent network visualization and analysis, we used Gephi, a popular SNA program. Once we imported the GraphML file into Gephi, and before doing any additional analyses, we applied a layout algorithm (see Figure 3). The general idea behind a layout algorithm is to display nodes on the screen in a way that minimizes the number of overlapping connections and reduces visual clutter. ‘ForceAtlas 2’ or Fruchterman-Reingold layouts might be good options to try first. For larger networks, we recommend OpenOrd.
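To make the idea of an edge-level toxicity attribute concrete, here is a minimal sketch that reads a tiny GraphML document with Python’s standard library and returns each reply edge with its score. The attribute name `toxicity`, the key id `d0`, and the sample graph are assumptions for illustration; a real Communalytic export may name and organize its attributes differently.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_edges(graphml_text, attr_name="toxicity"):
    """Return (source, target, score) tuples; score is None if the attribute is missing."""
    root = ET.fromstring(graphml_text.strip())
    # Find the key id that GraphML assigned to the edge attribute we want.
    key_id = None
    for key in root.findall("g:key", NS):
        if key.get("for") == "edge" and key.get("attr.name") == attr_name:
            key_id = key.get("id")
    edges = []
    for edge in root.iterfind(".//g:edge", NS):
        score = None
        for data in edge.findall("g:data", NS):
            if key_id is not None and data.get("key") == key_id:
                score = float(data.text)
        edges.append((edge.get("source"), edge.get("target"), score))
    return edges


# Made-up two-edge network: each edge is one reply with an embedded score.
sample_graphml = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="toxicity" attr.type="double"/>
  <graph edgedefault="directed">
    <node id="leader"/>
    <node id="user_a"/>
    <node id="user_b"/>
    <edge source="user_a" target="leader"><data key="d0">0.91</data></edge>
    <edge source="user_b" target="user_a"><data key="d0">0.12</data></edge>
  </graph>
</graphml>
"""

for source, target, score in read_edges(sample_graphml):
    print(source, "->", target, score)
```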
Next, to visually investigate what accounts are more actively involved in each of the two networks, and where toxic interactions are occurring, we changed
- node size to represent out-degree centrality (larger nodes represented more active users, those who replied to many other users in the network); to learn more about how to interpret in- and out-degree centrality measures in Twitter networks, see the Method part (Section 3) of this paper;
- edge colour to represent a toxicity score (between 0 and 1) which was automatically assigned to each edge/reply. Here, we used red to represent interactions with toxicity scores closer to 1, and green to represent potentially non-toxic interactions.
Figures 4a and 4b show how to set these visualization options in Gephi.
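Gephi handles this colouring through its own interface, so no code is needed to follow the tutorial. For readers who want to reproduce a similar green-to-red scale elsewhere, the sketch below shows one simple way to map a toxicity score to a colour by linear interpolation. The function names are hypothetical, and Gephi’s own gradient may interpolate differently.

```python
def toxicity_to_rgb(score):
    """Map a score in [0, 1] to an (R, G, B) tuple: 0 is green, 1 is red."""
    score = min(1.0, max(0.0, score))  # clamp out-of-range values
    red = round(255 * score)
    green = round(255 * (1 - score))
    return (red, green, 0)


def toxicity_to_hex(score):
    """Same mapping, returned as a hex colour string."""
    return "#{:02x}{:02x}{:02x}".format(*toxicity_to_rgb(score))


print(toxicity_to_hex(0.0))  # #00ff00 (non-toxic, green)
print(toxicity_to_hex(1.0))  # #ff0000 (highly toxic, red)
```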
The resulting network visualizations are shown in Figure 5. Each node is a Twitter account. Larger nodes represent nodes with a higher out-degree centrality; in other words, they represent users who replied to many other users in the network. Each edge represents a reply. Red edges represent highly toxic interactions. As per Table 1, about 9 per cent of all edges (replies) are considered to be toxic based on Perspective API.
Based on the visual inspection, both networks are very similar structurally, consisting of one large star-shaped component of users that replied to the original tweet and smaller peripheral structures representing replies to replies.
Step 4: Examining the Most ‘Toxic’ Accounts in the Networks
To further investigate toxicity, we can take a closer look at the accounts that are involved in toxic interactions. Out-degree centrality scores (described above) can be used to identify accounts that are sending toxic tweets to many other users (i.e., they have a high out-degree centrality), while in-degree centrality scores can be used to identify accounts that receive toxic tweets from many other users (i.e., they have a high in-degree centrality).
To do this, we must first exclude edges that are not considered toxic according to our toxicity threshold (i.e., edges with a toxicity score < 0.8). This can be done in Gephi using a filter option (see Figure 6a, left). Then, calculate the in-degree and out-degree centrality scores in Gephi based on the filtered network (see Figure 6a, right). When the filter in Gephi is enabled, the centrality measures will be calculated for the visible edges only; in this case, edges with a toxicity score greater than or equal to 0.8 (i.e., likely toxic edges).
Because we applied the filter to hide non-toxic edges, the in-degree centrality score will represent the number of unique users who sent a given user toxic replies, and the out-degree centrality score will represent the number of unique users to whom a given user sent toxic replies. The key here is unique users; a user with a toxic in-degree centrality score of 50 received toxic messages from 50 different accounts. Similarly, a user with an out-degree score of 10 sent toxic messages to 10 different accounts. With these scores calculated, you can now sort users by in-degree and out-degree scores in the Data Laboratory tab in Gephi (see Figure 6b).
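The filter-then-count logic described above can be sketched outside of Gephi as follows. This is a minimal illustration, not Gephi’s implementation: the edge list, account names, and function names are made up, and each edge is assumed to be a (sender, recipient, toxicity) tuple. Edges below the threshold are dropped, and sets are used so that repeated toxic replies between the same two accounts are counted once, matching the “unique users” interpretation.

```python
from collections import defaultdict


def toxic_degrees(edges, threshold=0.8):
    """Return (in_degree, out_degree) dicts computed over toxic edges only."""
    senders = defaultdict(set)     # recipient -> unique accounts that sent it toxic replies
    recipients = defaultdict(set)  # sender -> unique accounts it sent toxic replies to
    for source, target, toxicity in edges:
        if toxicity >= threshold:
            senders[target].add(source)
            recipients[source].add(target)
    in_degree = {user: len(accounts) for user, accounts in senders.items()}
    out_degree = {user: len(accounts) for user, accounts in recipients.items()}
    return in_degree, out_degree


def top_k(scores, k=10):
    """Sort by score (descending), then by name so that ties are stable."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Made-up reply edges: (sender, recipient, toxicity score).
edges = [
    ("user_a", "leader", 0.95),
    ("user_a", "leader", 0.90),  # second toxic reply to the same account, counted once
    ("user_b", "leader", 0.85),
    ("user_a", "user_c", 0.81),
    ("user_c", "user_a", 0.30),  # below the threshold, ignored
]

in_deg, out_deg = toxic_degrees(edges)
print(top_k(in_deg))   # [('leader', 2), ('user_c', 1)]
print(top_k(out_deg))  # [('user_a', 2), ('user_b', 1)]
```

Sorting with `top_k` corresponds to sorting the in-degree or out-degree column in Gephi’s Data Laboratory and reading off the first ten rows.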
For the purposes of our exploratory analysis, we examined the 10 users with the highest in-degree scores and the 10 users with the highest out-degree scores for each network (see Tables 2a-b and 3a-b). As somewhat expected, both Donald Trump and Joe Biden have the highest toxic in-degree centrality scores in their respective networks since they were the users who received the most toxic messages.
To supplement our analysis of the role of partisanship in driving toxic interactions in both networks, we examined the public profiles and tweets of each user in the top 10 lists in both networks (40 accounts in total). Specifically, we examined each account to categorize it as either (a) pro-Trump, (b) anti-Trump, (c) pro-Biden, or (d) anti-Biden. If the user’s account did not indicate any political affiliation, or if we could not determine their political affiliation (e.g., the account was suspended), they were categorized as “unknown.” If the user expressed views that did not acknowledge the seriousness of the COVID-19 pandemic, they were additionally categorized as a “COVID-denier”. Finally, we noted if the user’s account was deleted by the user, protected, or suspended by Twitter.
Table 2a: Top ten users in the Trump network who sent the highest number of toxic tweets.
|Label||Partisanship||Account Creation Date||Account Tweet Count||Out-Degree Centrality|
|Trump_OutDegree_User3||pro-Biden, account deleted by user||2019-09-11||500-1000||3|
|Trump_OutDegree_User6||unknown, suspended account||2018-01-24||< 100||2|
Table 2b: Top ten users in the Biden network who sent the highest number of toxic tweets.
|Label||Partisanship||Account Creation Date||Account Tweet Count||Out-Degree Centrality|
|Biden_OutDegree_User2||unknown, COVID denier||2020-10-25||100-500||4|
|Biden_OutDegree_User5||pro-Trump, COVID denier||2009-05-16||100-500||3|
|Biden_OutDegree_User6||pro-Trump, COVID denier||2020-07-07||100-500||2|
Table 3a: Top ten users in the Trump network who received the highest number of toxic tweets.
|Label||Partisanship||Account Creation Date||Account Tweet Count||In-Degree Centrality|
|Trump_InDegree_User1||pro-Trump, deleted account||2018-06-24||10K+||9|
|Trump_InDegree_User3||pro-Trump, account suspended||2020-09-20||1-5K||6|
Table 3b: Top ten users in the Biden network who received the highest number of toxic tweets.
|Label||Partisanship||Account Creation Date||Account Tweet Count||In-Degree Centrality|
|Biden_InDegree_User4||pro-Trump, account deleted by user||2020-11-03||100-500||8|
|Biden_InDegree_User7||pro-Trump, account deleted by user||2012-02-09||5-10K||7|
Tip: Since Communalytic can save tweets as part of a network file, we can view the content of tweets associated with each user or interaction. To do so, right click on any of the nodes in the network visualization and click “Select in Data Laboratory” (Figure 7, Step 1). Next, under the Data Laboratory tab select the Nodes table in the Data Table, and right click on the row corresponding to the selected node (Figure 7, Step 2). Then, click on the “Select related edges” option (Figure 7, Step 3). This will bring you to the Edges Tables, where you can subsequently view tweets from or directed at the selected account. Tweets will be stored under the column called “Text”.
Results from the Exploratory Analysis
By using a combination of text-based toxicity analysis techniques, supplemented with SNA, some interesting trends emerge from the data:
- Biden and Trump supporters are the recipients of toxic tweets in both networks: In our in-degree centrality score tables (Tables 3a and 3b), we can see that each network includes a mix of both pro-Biden and pro-Trump users that are the targets of toxic attacks. Specifically, we see that seven out of 10 users in the Trump network who received the most toxic messages are either pro-Trump (N = 4) or pro-Biden (N = 3). Similarly, eight out of 10 users receiving the most toxic messages in the Biden network are either pro-Biden (N = 4) or pro-Trump (N = 4). This is also seen in the network visualization: in Figure 8, we can see examples, from both networks, of smaller nodes surrounded by many red edges, which indicates an “attack” on that node. The users featured here are also included in Tables 3a and 3b, showing how network visualizations can provide further confirmation of trends you may find using text-based analysis.
Figure 8: Attacks (indicated as red lines) on Biden supporters in 1) the Trump network (first image), and 2) the Biden network (second image), and attacks on Trump supporters in 3) the Trump network (third image) and 4) the Biden network (fourth image).
- In each leader’s network, users that oppose that leader are among the most active toxic users: Users who are pro-Biden/anti-Trump are among the most active toxic users in the Trump network (N = 5), while pro-Trump/anti-Biden users are among the most active toxic users in the Biden network (N = 6). This makes sense since each leader receives the most toxicity in their own network (see Table 1), and this is unlikely to be coming from their supporters. This finding is also consistent with what is generally seen on Twitter; as mentioned, Twitter gives citizens a unique opportunity to interact with politicians. Opponents of political leaders might take to Twitter to raise their concerns, and as we see here, this can turn toxic.
- COVID-deniers were more likely to spread toxicity than to receive it, and are most often also Trump supporters: As mentioned above, we noted accounts that, in addition to their partisanship, also tweeted about the insignificance of, or their denial of, the COVID-19 virus. This code was added for four accounts in Tables 2 and 3. An example of these users can be seen in Figure 9, from the account of Biden_OutDegree_User2. As per the tweets included in Figure 9, this user does not grasp the seriousness of COVID-19 and is not very concerned about contracting the virus.
- Unavailable accounts: As per Twitter’s policies, users can be suspended permanently or temporarily for multiple reasons, such as suspected spam, or because of “abusive tweets or behaviour”. Two users in our dataset had suspended accounts: Trump_OutDegree_User6 and Trump_InDegree_User3. We can’t be sure exactly why these accounts were suspended, but we can hypothesize. Trump_InDegree_User3 had a low follower count and was recently created, which suggests that this account might have been a spam bot. On the other hand, it is possible that Trump_OutDegree_User6 was posting high volumes of toxic content, seeing as they were a very active user in the network, and this could be what led to their suspension. We also see that three users who received high levels of toxic tweets (i.e., who are included in our top-10 in-degree score tables) have since deleted their accounts. Again, we can’t know exactly why these users did this, but since we do know that they received high levels of toxic messages, we can speculate that they might have deleted their accounts because of this. We further investigate the relationship between toxicity on Twitter and account status in a recent blog post.
By Alyssa Saiphoo and Anatoliy Gruzd