Frequently Asked Questions

Communalytic is a computational social science research tool for studying online communities and discourse. It can collect, analyze, and visualize publicly available data from various social media platforms including Reddit, Telegram, Twitter, and Facebook/Instagram (via CrowdTangle), or from your own CSV or JSON files.  

A suite of data analytics modules designed for research

Communalytic contains a suite of data analytics modules including: 1) a Toxicity Analyzer via Google Perspective API, 2) a Sentiment Analyzer via libraries such as VADER (EN), TextBlob (EN, FR, DE) and Dostoevsky (RU), 3) a Bot Analyzer via Botometer API, and 4) a Network Analyzer. These modules can be used to: identify and examine anti-social interactions, assess sentiments in online discourse, detect Twitter bots, identify influencers, map shared interests among online actors by examining what topics or links they shared, study the spread of mis- and dis-information as well as look for signs of possible coordination among seemingly disparate actors. (For more details see: the Tutorials page)

The Network Analyzer module in Communalytic can automatically generate and visualize various types of networks including communication networks, two-mode semantic networks, link-sharing networks and word co-occurrence networks. (For more details see: FAQ – What types of networks can Communalytic generate and visualize?) 

Unique feature – Generate and visualize “signed” networks 

One of the unique features of Communalytic is the ability to generate and visualize so-called “signed” networks via the built-in Network Analyzer module. A signed network is a network whose edges contain additional information such as positive or negative signs or scores (weights). Communalytic builds a signed network by assigning toxicity scores and/or sentiment polarity scores as weights to edges in the network. This feature can be used to identify and visually highlight interactions of interest (e.g., anti-social interactions) within the network so that they may be examined in more detail.

In addition, users working with Twitter data have the option of running the Bot Analyzer and adding a bot probability score as an attribute to the nodes in the network generated by Communalytic. This feature can be used to identify and visually highlight accounts of interest (e.g., Twitter accounts that might be bots) within the network so that they may be examined in more detail.

There are two versions of Communalytic: EDU and PRO.

  • Communalytic EDU is designed to help students learn about social media data analytics.
  • Communalytic PRO is designed for the academic research community and is ideal for large-scale academic research projects. It provides researchers with the resources and infrastructure necessary for conducting independent research in the public interest. 

Each version is hosted on its own dedicated server with its own account creation and sign-in processes. Users of Communalytic can share datasets with other users who are using the same version of Communalytic (i.e., EDU users with EDU users and PRO with PRO).

The Network Analyzer module in Communalytic can automatically generate and visualize various types of networks including communication networks, two-mode semantic networks, link-sharing networks and word co-occurrence networks. For more details see the Tutorials page (Section 8: Network Analysis and Visualization).

Creating signed networks in Communalytic

The Network Analyzer module in Communalytic is unique among network research tools in that it can generate and visualize so-called “signed” networks. A signed network*** is a network whose edges contain additional information such as positive or negative signs or scores (weights). To turn a network into a signed network in Communalytic, users can run additional analyses (toxicity and/or sentiment) before creating a network representation of their dataset. The resulting toxicity scores and/or sentiment polarity scores are then added as weights to edges in the network and visualized for easier exploration and analysis. This feature can be used to identify and visually highlight interactions of interest (e.g., anti-social interactions) within the network so that they may be examined in more detail.

In addition, if a user is working with Twitter data and has completed a Bot detection analysis before creating a network representation of their dataset, the resulting bot probability scores are added as attributes to nodes in the network and visualized for easier exploration and analysis. This feature can be used to identify and visually highlight accounts of interest (e.g., Twitter accounts that might be bots) within the network so that they may be examined in more detail.
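The idea of a signed network can be illustrated with a short Python sketch. This is not Communalytic's implementation: the field names (`author`, `reply_to`, `toxicity`), the sample values and the 0.7 threshold are all assumptions made for illustration.

```python
# Illustrative sketch only; not Communalytic's code.
# Hypothetical reply records, each with a toxicity score between 0 and 1.
replies = [
    {"author": "alice", "reply_to": "bob", "toxicity": 0.91},
    {"author": "bob", "reply_to": "alice", "toxicity": 0.12},
    {"author": "carol", "reply_to": "bob", "toxicity": 0.75},
]
# Hypothetical bot probability scores per account.
bot_scores = {"alice": 0.05, "bob": 0.10, "carol": 0.88}


def build_signed_network(replies, bot_scores):
    """Return (nodes, edges): nodes carry a bot score, edges carry a toxicity weight."""
    nodes = {}
    edges = []
    for reply in replies:
        for account in (reply["author"], reply["reply_to"]):
            nodes.setdefault(account, {"bot_score": bot_scores.get(account)})
        edges.append(
            (reply["author"], reply["reply_to"], {"toxicity": reply["toxicity"]})
        )
    return nodes, edges


def edges_of_interest(edges, threshold=0.7):
    """Keep only edges whose toxicity weight meets the assumed threshold."""
    return [(src, dst) for src, dst, attrs in edges if attrs["toxicity"] >= threshold]


nodes, edges = build_signed_network(replies, bot_scores)
flagged = edges_of_interest(edges)
```

Here `flagged` holds the reply pairs that a visualization might highlight, while the `bot_score` node attribute could drive node colouring in the same way.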

Types of networks that can be automatically generated by Communalytic

  • Reply-To Network: Account-to-Account (Reddit, Twitter, Telegram groups) 
    • This communication network shows who replied to whom. 
  • Retweet Network: Account-to-Account (Twitter only)
    • This communication network shows who retweeted whom.
  • Two-Mode Semantic Network*: Account-to-Named Entity (Reddit, Twitter, CrowdTangle, Telegram channels & groups)
    • This semantic network shows which account mentioned what ‘named entity’**. 
    • The named entity detection is based on an advanced Natural Language Processing library called spaCy.
  • Two-Mode Semantic Network: Account-to-Named Entity (Twitter only) 
    • This semantic network shows which account mentioned what ‘named entity’ and is based on Twitter’s automated annotation. This approach is faster than using spaCy, but may miss some named entities. 
  • Two-Mode Link Sharing Network: Account-to-Website (Reddit, Twitter, CrowdTangle, Telegram channels & groups) 
    • This ‘link sharing’ network shows which accounts in your dataset shared a link to the same website(s). 
  • Word co-occurrence network: Named Entity-to-Named Entity (Reddit, Twitter, CrowdTangle, Telegram channels & groups)
    • This semantic network connects two or more ‘named entities’ mentioned in the same post(s).
    • The named entity detection is based on an advanced Natural Language Processing library called spaCy.
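As a rough illustration of how the two-mode semantic network and the word co-occurrence network differ, the sketch below builds both edge lists from the same posts. It assumes the named entities have already been extracted (for example by spaCy); the `account` and `entities` field names and the sample data are hypothetical, and this is not Communalytic's own code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical posts with named entities already extracted.
posts = [
    {"account": "alice", "entities": ["NASA", "Mars"]},
    {"account": "bob", "entities": ["NASA"]},
    {"account": "alice", "entities": ["Mars", "SpaceX", "NASA"]},
]


def two_mode_edges(posts):
    """Count Account-to-Named Entity edges (one count per post mentioning the entity)."""
    counts = Counter()
    for post in posts:
        for entity in set(post["entities"]):
            counts[(post["account"], entity)] += 1
    return counts


def co_occurrence_edges(posts):
    """Count Named Entity-to-Named Entity edges for entities sharing a post."""
    counts = Counter()
    for post in posts:
        # Sort so each undirected pair is always stored in the same order.
        for first, second in combinations(sorted(set(post["entities"])), 2):
            counts[(first, second)] += 1
    return counts


account_entity = two_mode_edges(posts)
entity_entity = co_occurrence_edges(posts)
```

The first edge list connects two different node types (accounts and entities), while the second connects entities to each other, which is why the same posts yield two differently shaped networks.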

Definitions

* A two-mode semantic network is a graph that connects two types of nodes, where one node type represents social actors (accounts) and the other node type represents semantic concepts (operationalized in Communalytic as named entities). A connection from a social actor to a semantic concept in such a network usually implies some form of endorsement, association or affiliation between the two nodes. The exact interpretation of social actors, semantic concepts and network connections will depend on the available data (including any metadata) and the research questions that the researcher would like to answer.

** Named entities can be people, organizations, locations, products, etc. as detected by an advanced Natural Language Processing library called spaCy.

*** A signed network is a network whose edges contain additional information such as positive or negative signs or scores (weights).

Communalytic automatically generates the following types of summary charts for each of your datasets. Each chart can be downloaded as a PNG image or as a CSV data file. Communalytic also offers an easy import option to explore and customize most of the summary charts in Plotly Chart Studio, a popular visualization tool for structured data.

  • Posts Per Day Chart 
    • This chart shows the number of posts per day over time.
  • Word Cloud Chart
    • This chart shows the 100 most frequently used words based on the full dataset. It excludes numbers, URLs, and stop words in 15 different languages.
  • Emoji Cloud Chart
    • This chart shows the 100 most frequently used emojis based on the full dataset.
  • Top 10 Posters
    • This chart shows the Top 10 posters in your dataset.
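If you download a dataset as a CSV file, the numbers behind two of these charts can be approximated with a few lines of Python. The sketch below is only an illustration: the `author` and `created` field names and the sample records are assumptions, not Communalytic's export schema.

```python
from collections import Counter

# Hypothetical records; field names are assumptions, not Communalytic's export schema.
posts = [
    {"author": "alice", "created": "2022-03-01T09:15:00"},
    {"author": "bob", "created": "2022-03-01T17:40:00"},
    {"author": "alice", "created": "2022-03-02T08:05:00"},
    {"author": "alice", "created": "2022-03-02T11:30:00"},
    {"author": "carol", "created": "2022-03-02T20:10:00"},
]


def posts_per_day(posts):
    """Count posts per calendar day, using the date part of an ISO 8601 timestamp."""
    return Counter(post["created"][:10] for post in posts)


def top_posters(posts, n=10):
    """Return the n most active accounts as (account, post_count) pairs."""
    return Counter(post["author"] for post in posts).most_common(n)


daily = posts_per_day(posts)
leaders = top_posters(posts)
```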

EDU Version

Free (Req. an academic email address)

All Communalytic EDU accounts can collect and store up to 30K records shared across 3 datasets and have the following platform-specific data usage caps.

Reddit – Communalytic EDU can collect posts, including submissions, comments and replies to comments, from any given public subreddit for up to 31 consecutive days. The 31-day period can be any 31 days in the past. You can repeat this process until you have the data you need for the entire period you wish to study. To use this collector, you do not need to apply for a separate Reddit API key, as Communalytic uses the public pushshift.io Reddit API created by the /r/datasets mod team.

  • Note 1: Comments on Reddit submissions and replies to comments are only collected at the end of the specified data collection period. If a comment or a reply has been deleted by the moderator(s) or the poster prior to the end date of your data collection, it will not be included in the final dataset.
  • Note 2: The Communalytic EDU version does not collect posts from subreddits with 10 million or more subscribers, such as r/AskReddit. If you want to collect data from subreddits with 10 million or more subscribers, please check out Communalytic PRO.
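Because each Reddit collection covers at most 31 consecutive days, a longer study period has to be split into several collection windows. The following Python sketch shows one way to plan those windows. It assumes the 31 days are counted inclusively (start and end dates both included); the function name and dates are hypothetical.

```python
from datetime import date, timedelta


def collection_windows(start, end, max_days=31):
    """Split [start, end] (inclusive) into consecutive windows of at most max_days days."""
    if end < start:
        raise ValueError("end must not be before start")
    windows = []
    current = start
    while current <= end:
        # A window of max_days days ends max_days - 1 days after it starts.
        window_end = min(current + timedelta(days=max_days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows


windows = collection_windows(date(2022, 1, 1), date(2022, 3, 15))
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

Each resulting pair of dates would then be entered as a separate data collection.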

Telegram – Communalytic EDU can collect messages (≤ 30k) from any public Telegram channel, group or super group. To use this collector, you will need to apply for a Telegram Developer Account.

Twitter Thread – Communalytic EDU can collect the most recent public replies (30K) to any public tweet posted within the previous 7 days. This data collection feature is ideal for studying recent tweets that have attracted a high level of engagement. (For example, a tweet from politicians, celebrities, news outlets, etc.) To use this collector, you will need to apply for a Twitter Developer Account.

Twitter Academic Research Track – N/A. This data collection feature is only available in Communalytic PRO.

Facebook/Instagram (via CrowdTangle) – Communalytic EDU can collect posts (30K) from public Facebook/Instagram accounts, groups or pages that shared the same URL (ex. a URL to a single NYT story or the URL to any other domain name). To use this collector, you will need to apply for academic access to Meta’s CrowdTangle platform.

  • Note 1: CrowdTangle data is not exhaustive; it only tracks public posts made by “influential” accounts. Here’s more info about the types of Facebook/Instagram accounts, pages and groups indexed by CrowdTangle.

Data/API access is granted solely at the discretion of the respective social media platform. You will need to apply directly to the platform(s) of your choice for API access. 

No, you cannot use Communalytic EDU to collect data that is private such as DMs or posts from accounts that are set to private.

The developers of Communalytic EDU are proponents of ethical computational social science research in the public interest. All data access in Communalytic EDU is granted solely at the discretion of the respective social media platform/public API. If you are working with social media data, we encourage you to review and follow ethical guidelines and best practices established by your institution. 

As a primer, please review “Ethical Decision-Making and Internet Research” published by the Association of Internet Researchers (AOIR).

Yes, you can run multiple data collectors simultaneously within Communalytic EDU (Concurrently collect 1 Reddit, 1 Telegram, 1 Twitter and 1 CrowdTangle).

You can collect and store ≤ 30K records shared across ≤3 datasets at any time in your Communalytic EDU account (i.e., per account, you can have 1 dataset with ≤ 30K records or up to 3 datasets with a variable number of records not exceeding 30K records in total).

If you’re at your account limit, you can download your previously collected datasets to free up space.

Alternatively, if your need is more robust, consider upgrading to Communalytic PRO where you can collect and store ≤ 10M records shared across ≤ 50 datasets.
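The account limits amount to two simple conditions: the number of datasets and the total number of records. A minimal Python sketch of that check follows. The function and constant names are hypothetical, and the sketch assumes every stored record counts equally toward the cap.

```python
EDU_MAX_RECORDS = 30_000
EDU_MAX_DATASETS = 3
PRO_MAX_RECORDS = 10_000_000
PRO_MAX_DATASETS = 50


def fits_in_account(existing_sizes, new_size,
                    max_records=EDU_MAX_RECORDS, max_datasets=EDU_MAX_DATASETS):
    """Return True if a new dataset of new_size records fits under both caps."""
    within_dataset_cap = len(existing_sizes) + 1 <= max_datasets
    within_record_cap = sum(existing_sizes) + new_size <= max_records
    return within_dataset_cap and within_record_cap


# Two datasets already stored in an EDU account (hypothetical sizes).
stored = [10_000, 5_000]
can_add_15k = fits_in_account(stored, 15_000)
can_add_20k = fits_in_account(stored, 20_000)
can_add_20k_pro = fits_in_account(stored, 20_000, PRO_MAX_RECORDS, PRO_MAX_DATASETS)
```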

Yes, you can download your datasets as a CSV file. In addition, you can also download the resulting communication or semantic network files as a GraphML file. 
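GraphML is an XML-based format, so a downloaded network file can be inspected outside Communalytic with standard tools. The sketch below reads edges and their attributes using only Python's standard library. The embedded sample and the `toxicity` attribute name are assumptions for illustration; the attribute names in an actual Communalytic export may differ.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical GraphML content; a real export would be read from a file instead.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="toxicity" attr.type="double"/>
  <graph edgedefault="directed">
    <node id="alice"/>
    <node id="bob"/>
    <edge source="alice" target="bob"><data key="d0">0.91</data></edge>
    <edge source="bob" target="alice"><data key="d0">0.12</data></edge>
  </graph>
</graphml>"""


def read_edges(graphml_text):
    """Return a list of (source, target, attributes) tuples; attribute values stay as text."""
    root = ET.fromstring(graphml_text)
    # Map key ids (e.g. "d0") to their human-readable attribute names.
    key_names = {key.get("id"): key.get("attr.name") for key in root.findall("g:key", NS)}
    edges = []
    for edge in root.findall("g:graph/g:edge", NS):
        attrs = {key_names[data.get("key")]: data.text for data in edge.findall("g:data", NS)}
        edges.append((edge.get("source"), edge.get("target"), attrs))
    return edges


edges = read_edges(SAMPLE)
```

Network analysis packages that support GraphML can open the same file directly; this sketch only shows what the file contains.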

Yes, you can upload/import an existing dataset (in CSV format) into Communalytic EDU for analysis, subject only to the EDU data cap of 30K records shared across 3 datasets.

Users of Communalytic can share datasets with other users who are using the same version of Communalytic, i.e., EDU users with EDU users and PRO with PRO. 

  • You can share datasets that you have collected with collaborators from within Communalytic under the ‘My Datasets’ tab.
  • You can accept shared datasets from a collaborator from within Communalytic under the ‘Shared with Me’ tab. (Look for a jingling red bell.)

Yes, you can move datasets from the EDU version to the PRO version. Start by downloading your dataset as a CSV file from Communalytic EDU and then upload the file to your Communalytic PRO account.

We’ll keep your datasets on our server for 100 days from the end of your collection date. 

You will receive a notification 3 weeks before the expiration date and 3 days before your dataset is automatically deleted from our system.
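To see what this retention schedule means for a particular dataset, the dates can be worked out with a short Python sketch. It assumes that the expiration date and the automatic deletion date are the same day, exactly 100 days after the collection end date; the function name is hypothetical.

```python
from datetime import date, timedelta


def edu_retention_dates(collection_end):
    """Compute the assumed notification and deletion dates for an EDU dataset."""
    expiry = collection_end + timedelta(days=100)
    return {
        "first_notice": expiry - timedelta(weeks=3),
        "final_notice": expiry - timedelta(days=3),
        "deleted_on": expiry,
    }


schedule = edu_retention_dates(date(2022, 1, 1))
for label, day in schedule.items():
    print(label, day.isoformat())
```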

If you are using Communalytic in an academic publication, please cite us as: 

  • Gruzd, A., & Mai, P. (2022). Communalytic: A Research Tool For Studying Online Communities and Online Discourse. Available at https://Communalytic.com

Note: For information on how to properly describe Communalytic EDU data collection processes, see the FAQ section on “What are the parameters for data collection?”

PRO Version

Paid ($349.00) for a 6-month subscription to support site infrastructure such as server-side data collection, processing and analysis using a dedicated server, and allocation of extra data storage capacity  

All Communalytic PRO accounts can collect and store up to 10M records shared across ≤ 50 datasets and have the following platform-specific data usage caps.

Reddit – Communalytic PRO can collect posts, including submissions, comments and replies to comments, from a given public subreddit for up to 31 consecutive days. The 31-day period can be any 31 days in the past. You can repeat this process until you have the data you need for the entire period you wish to study. You can then download and combine the resulting CSV files (subject only to the PRO ≤ 10M records cap) and upload the new file for analysis. To use this collector, you do not need to apply for a separate Reddit API key, as Communalytic uses the public pushshift.io Reddit API created by the /r/datasets mod team.

  • Note 1: Comments on Reddit submissions and replies to comments are only collected at the end of the specified data collection period. If a comment or a reply has been deleted by the moderator(s) or the poster prior to the end date of your data collection, it will not be included in the final dataset.
  • Note 2: Communalytic PRO will try to collect any new submissions within the specified data collection period; however, some posts in “high volume” groups (such as r/all) may be dropped due to the Reddit API limitations.
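Combining the CSV files from several 31-day Reddit collections can be done with any spreadsheet or scripting tool. The Python sketch below shows one possible approach using only the standard library. The column names and sample rows are hypothetical, and the sketch assumes all files share exactly the same header row.

```python
import csv
import io

PRO_MAX_RECORDS = 10_000_000


def combine_csv(sources, max_records=PRO_MAX_RECORDS):
    """Concatenate CSV streams that share the same header; return (header, rows)."""
    header = None
    rows = []
    for source in sources:
        reader = csv.reader(source)
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip empty files
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError("CSV files have different columns")
        rows.extend(reader)
        if len(rows) > max_records:
            raise ValueError("combined dataset exceeds the record cap")
    return header, rows


# In-memory stand-ins for two downloaded files (hypothetical columns and rows).
january = io.StringIO("id,author,text\n1,alice,hello\n2,bob,hi\n")
february = io.StringIO("id,author,text\n3,carol,hey\n")
header, rows = combine_csv([january, february])

# Write the combined data back out as a single CSV document.
combined = io.StringIO()
writer = csv.writer(combined)
writer.writerow(header)
writer.writerows(rows)
```

With real files, each source would be opened with `open(path, newline="", encoding="utf-8")` instead of `io.StringIO`.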

Telegram – Communalytic PRO can collect messages (≤ 10M) from any public Telegram channel, group or super group. To use this collector, you will need to apply for a Telegram Developer Account.

Twitter Thread – Communalytic PRO can collect the most recent public replies (500K) to any public tweet posted within the previous 7 days. This data collection feature is ideal for studying recent tweets that have attracted a high level of engagement (for example, viral tweets from politicians, celebrities, news outlets, etc.). To use this collector, you will need to apply for a Twitter Developer Account.

  • Note 1: All standard Twitter Developer Accounts come with a monthly tweet cap usage of 500K posts as indicated by your Twitter Developer Dashboard.
  • Note 2: If you are a qualified academic researcher with access to a Twitter Academic Research Track account, you will be able to collect historical tweets and replies to tweets that are still publicly available, and you will not be subject to the 500K monthly tweet cap or restricted to tweets posted within the previous 7 days.

Twitter Academic Research Track – Communalytic PRO can collect tweets (≤ 10M/mo.) including replies and retweets via Twitter’s full-archive (historical) search. To use this collector, you will need to apply for a Twitter Academic Research Track account, available only to qualified academic researchers.

Facebook/Instagram (via CrowdTangle) – Communalytic PRO can collect public Facebook/Instagram posts (≤ 10M) that shared the same URL (ex. a URL to a single NYT story or the URL to any other domain name). To use this collector, you will need to apply for academic access to Meta’s CrowdTangle platform. 

  • Note 1: CrowdTangle data is not exhaustive; it only tracks public posts made by “influential” accounts. Here’s more info about the types of Facebook/Instagram accounts, pages and groups indexed by CrowdTangle.

Data/API access is granted solely at the discretion of the respective social media platform. You will need to apply directly to the platform(s) of your choice for API access.

No, you cannot use Communalytic PRO to collect data that is private such as DMs or posts from accounts that are set to private.

The developers of Communalytic PRO are proponents of ethical computational social science research in the public interest. All data access in Communalytic PRO is granted solely at the discretion of the respective social media platform/public API. If you are working with social media data, we encourage you to review and follow ethical guidelines and best practices established by your institution.

As a primer, please review “Ethical Decision-Making and Internet Research” published by the Association of Internet Researchers (AOIR).

Yes, you can run multiple data collectors simultaneously within Communalytic PRO (Concurrently collect 1 Reddit, 1 Telegram, 1 Twitter and 1 CrowdTangle).

You can collect and store ≤ 10M records shared across ≤ 50 datasets at any time in your Communalytic PRO account (i.e., per account, you can have 1 dataset with ≤ 10M records or up to 50 datasets with a variable number of records not exceeding 10M records in total).

If you’re at your account limit, you can download your previously collected datasets to free up space.

Alternatively, if you know that you are likely to exceed either the 50-dataset cap or the 10M-record cap per account, you have the option to create a second PRO account using a different email address.

Yes, you can download your datasets as a CSV file. 

You can also download the resulting communication or semantic network files as a GraphML file. 

Yes, you can upload/import an existing dataset (in CSV format) into Communalytic PRO for analysis, subject only to the PRO data cap of 10M records shared across ≤ 50 datasets.

(NEW!) You can now also upload/import an existing Twitter or Telegram dataset from multiple JSON files.

Users of Communalytic can share datasets with other users who are using the same version of Communalytic, i.e., EDU users with EDU users and PRO with PRO. 

  • You can share datasets that you have collected with collaborators from within Communalytic under the ‘My Datasets’ tab.
  • You can accept shared datasets from a collaborator from within Communalytic under the ‘Shared with Me’ tab. (Look for an animated red bell.)

Yes, you can move datasets from the PRO version to the EDU version. However, please note that due to the lower EDU data cap, this is only possible for datasets with ≤ 30K records.

We’ll keep your datasets on our server as long as your PRO account has not expired. You can extend your PRO account at any time for another 6 months via the My Profile menu within Communalytic PRO. 

You will receive a notification 7 days before your account’s expiration date. After your account has expired, you will have 14 days to extend it before your account and datasets are automatically removed from our system.

If you are using Communalytic in an academic publication, please cite us as:

  • Gruzd, A., & Mai, P. (2022). Communalytic: A Research Tool For Studying Online Communities and Online Discourse. Available at https://Communalytic.com

Note: For information on how to properly describe Communalytic PRO data collection processes, see the FAQ section on “What are the parameters for data collection?”