Parameters For Data Collection with @Communalytic: A Summary of Platform-specific Data Usage Caps for Communalytic Edu and Pro

By Communalytic

Communalytic is a research tool for studying online communities and online discourse. Communalytic can collect and analyze public data from various social media platforms including Reddit, Twitter, and Facebook/Instagram (via CrowdTangle). It uses advanced text and social network analysis techniques to automatically pinpoint toxic and anti-social interactions, identify influencers, map shared interests and the spread of misinformation, and detect signs of possible coordination among seemingly disparate actors. 

There are two versions of Communalytic:

  • Communalytic Edu is designed for educators and students to teach and learn about social media data analytics and social network analysis.
  • Communalytic Pro is designed for the academic research community and is ideal for large-scale academic research projects. It provides researchers with the resources and infrastructure necessary for conducting independent research in the public interest.

All Communalytic Edu accounts can store up to 30K records across 3 datasets and have the following platform-specific data usage caps.

Subreddit (Live Data Collector) – Communalytic Edu can collect up to the 100 most recent submissions (i.e., thread-starting posts) and any new submissions (including the corresponding comments and replies to comments) from a given public subreddit for up to 7 consecutive days going forward (aka Live), starting from the date when you initiate the data collection. To use this collector, you do not need to apply for a separate Reddit API key, as Communalytic Edu is using a site-wide API key at this time.

  • Note 1: Comments on Reddit submissions and replies to comments are only collected at the end of the specified data collection period. If a comment or a reply has been deleted by the moderator(s) or the poster prior to the end date of your data collection, it will not be included in the final dataset.
  • Note 2: Communalytic Edu will try to collect any new submissions within the specified data collection period; however, some posts in “high volume” groups (such as r/all) may be dropped due to Reddit API limitations.
  • Note 3: Communalytic Edu does not collect posts from subreddits with 10 million or more subscribers, such as r/AskReddit. If you need to collect data from subreddits of that size, please check out Communalytic Pro.

Twitter Thread – Communalytic Edu can collect the most recent public replies (up to 30K) to any public tweet posted within the previous 7 days. This data collection feature is ideal for studying recent tweets that have attracted a high level of engagement (for example, tweets from politicians, celebrities, or news outlets). To use this collector, you will need to apply for a Twitter Developer account.

Twitter Academic Research Track – N/A. This data collection feature is only available in Communalytic Pro.

Facebook/Instagram (via CrowdTangle) – Communalytic Edu can collect public Facebook/Instagram posts (up to 30K) that shared the same URL (e.g., a URL to a single NYT story or the URL to any domain name). To use this collector, you will need to apply for academic access to Facebook’s CrowdTangle platform.

All Communalytic Pro accounts can store up to 10M records across 50 datasets and have the following platform-specific data usage caps.

Subreddit (Historical and Live Data Collectors) – Communalytic Pro can collect available posts (including submissions, comments and replies to comments) from a given public subreddit for up to 31 consecutive days. The 31-day period can be any 31 days in the past (aka Historical) or 31 days going forward, starting from the date when you initiate the data collection (aka Live). You can repeat this process until you have the data you need for the entire period you wish to study. You can also download and combine the resulting CSV files and upload the new file for analysis. To use this collector, you do not need to apply for a separate Reddit API key, as Communalytic Pro is using a site-wide API key at this time.

  • Note 1: Comments on Reddit submissions and replies to comments are only collected at the end of the specified data collection period. If a comment or a reply has been deleted by the moderator(s) or the poster prior to the end date of your data collection, it will not be included in the final dataset.
  • Note 2: Communalytic Pro will try to collect any new submissions within the specified data collection period; however, some posts in “high volume” groups (such as r/all) may be dropped due to Reddit API limitations.
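The repeat-and-combine workflow described for the Pro subreddit collectors can be sketched in Python. This is a minimal illustration and not part of Communalytic itself: the function names `split_into_windows` and `combine_csv_files` are hypothetical, the 31-day cap is assumed to mean 31 inclusive calendar days, and the exported CSV files are assumed to share an identical header row (the actual column layout of Communalytic exports is not described here).

```python
import csv
from datetime import date, timedelta

MAX_WINDOW_DAYS = 31  # per-collection cap stated for Communalytic Pro


def split_into_windows(start, end, max_days=MAX_WINDOW_DAYS):
    """Split the inclusive range [start, end] into consecutive
    windows of at most max_days calendar days each."""
    windows = []
    window_start = start
    while window_start <= end:
        window_end = min(window_start + timedelta(days=max_days - 1), end)
        windows.append((window_start, window_end))
        window_start = window_end + timedelta(days=1)
    return windows


def combine_csv_files(paths, output_path):
    """Concatenate CSV files that share the same header row.

    Returns the number of data rows written. Raises ValueError if a
    file's header differs from the first file's header (assumption:
    all exports from the same collector use the same columns).
    """
    header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


if __name__ == "__main__":
    # Plan the collections needed to cover a three-month study period.
    for first_day, last_day in split_into_windows(date(2023, 1, 1), date(2023, 3, 31)):
        print(first_day, "to", last_day)
```

Each window produced by `split_into_windows` corresponds to one separate data collection; once the exports are downloaded, `combine_csv_files` merges them into a single file that can be uploaded for analysis. The sketch does not remove duplicate rows, so overlapping windows should be avoided.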

Twitter Thread – Communalytic Pro can collect the most recent public replies (up to 500K) to any public tweet posted within the previous 7 days. This data collection feature is ideal for studying recent tweets that have attracted a high level of engagement (for example, tweets from politicians, celebrities, or news outlets). To use this collector, you will need to apply for a Twitter Developer account.

Twitter Academic Research Track – Communalytic Pro can collect up to 10M tweets per month via Twitter’s full-archive (historical) search. To use this collector, you will need to apply for a Twitter Academic Track account (available only to qualified academic researchers).

Facebook/Instagram (via CrowdTangle) – Communalytic Pro can collect public Facebook/Instagram posts that shared the same URL (e.g., a URL to a single NYT story or the URL to any domain name). To use this collector, you will need to apply for academic access to Facebook’s CrowdTangle platform.
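For planning purposes, the caps listed in this summary can be written down as a small lookup table. The sketch below is only an illustration: the `AccountCaps` class, the `CAPS` table, and the `fits_storage` helper are hypothetical names, not part of Communalytic. It assumes the record cap applies to the account total across all datasets, and it leaves the Pro CrowdTangle figure unset because no number is stated for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccountCaps:
    max_records: int                  # total stored records (assumed account-wide)
    max_datasets: int                 # number of stored datasets
    twitter_thread_replies: int       # replies per Twitter Thread collection
    crowdtangle_posts: Optional[int]  # None = no figure stated in this summary
    subreddit_window_days: int        # consecutive days per subreddit collection


# Values copied from the caps listed in this summary.
CAPS = {
    "edu": AccountCaps(30_000, 3, 30_000, 30_000, 7),
    "pro": AccountCaps(10_000_000, 50, 500_000, None, 31),
}


def fits_storage(version, planned_records, planned_datasets):
    """Return True if a planned project stays within the storage caps
    of the given version ("edu" or "pro")."""
    caps = CAPS[version]
    return (planned_records <= caps.max_records
            and planned_datasets <= caps.max_datasets)


if __name__ == "__main__":
    # A 50K-record project exceeds the Edu storage cap but fits within Pro.
    print(fits_storage("edu", 50_000, 2))
    print(fits_storage("pro", 50_000, 2))
```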