Communalytic can collect public posts from a given subreddit (including submissions, comments and replies) for up to 7 consecutive days in the Edu version and up to 31 consecutive days in the Pro version. Note: The collection of live posts is available in the Pro version only. The Edu version can only collect historical data.
What’s a subreddit, you may ask? Subreddits are online groups/forums on Reddit, each dedicated to a specific topic or set of topics. If this is your first time working with Reddit data, we suggest you watch The Beginner’s Guide to Reddit from Mashable or a slightly longer introductory video about Reddit from Teknikforce.
Technical Details: Communalytic uses Reddit’s public API to collect data via the PRAW library. As outlined in the table below, Communalytic starts by retrieving the 100 most recent submissions in a given subreddit (Stage 1). The collection then continues by retrieving new submissions until the specified end date at 00:01 UTC (Stage 2). Both Stage 1 and Stage 2 rely on PRAW’s SubredditStream to collect information about submissions. In the last stage (Stage 3), Communalytic goes over all of the submissions collected during Stages 1 and 2 and retrieves comments and replies via PRAW’s Submission.comments call.
Three Stages of Live Data Collection from Reddit
Stage 1: Collect the 100 most recent submissions
Communalytic starts by retrieving the 100 most recent submissions (=thread starting posts), even if they were posted before the current date.
Stage 2: Collect new submissions until the end date/time (UTC)
The collection continues until the specified end date and time (UTC).
Please note that some submissions in “high volume” subreddits such as r/all may be missed due to API limitations.
Stage 3: Collect comments and replies
During the final stage, Communalytic attempts to retrieve all comments and replies to comments for the submissions that were collected during Stages 1 and 2.
Please note that any comments or replies that have been deleted will not be collected during this stage.
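The three stages described above can be sketched with PRAW roughly as follows. This is a minimal illustration, not Communalytic's actual code: the function names `collect_submissions`, `collect_comments`, and `post_type`, the `clock` parameter, and the choice of `pause_after=0` are assumptions made for this example. The functions accept a PRAW `Subreddit` or `Submission` object, for instance one obtained from `praw.Reddit(...).subreddit(...)` with your own API credentials.

```python
from datetime import datetime, timezone


def utc_now():
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def collect_submissions(subreddit, end_time, clock=utc_now):
    """Stages 1 and 2: read the subreddit stream until end_time (UTC).

    PRAW's stream first yields up to 100 of the most recent submissions,
    then keeps yielding new ones as they are posted.
    """
    collected = []
    # pause_after=0 makes the stream yield None whenever a request returns
    # no new items, so the end time is checked even in quiet subreddits.
    for submission in subreddit.stream.submissions(pause_after=0):
        if clock() >= end_time:
            break
        if submission is not None:
            collected.append(submission)
    return collected


def collect_comments(submission):
    """Stage 3: fetch the comments and replies still available on a submission."""
    # Expand all "load more comments" placeholders before flattening the tree.
    submission.comments.replace_more(limit=None)
    return submission.comments.list()


def post_type(comment):
    """Label a comment by its parent: 't3_' is a submission, 't1_' a comment."""
    return "Comment" if comment.parent_id.startswith("t3_") else "Reply"
```

In this sketch, `post_type` relies on Reddit's fullname prefixes to tell a direct comment on a submission apart from a reply to another comment, which corresponds to the Comment and Reply post types used in the dataset.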
The following steps show how to collect data from Reddit using Communalytic. The procedures for the Edu and Pro versions are similar. The main difference is that the Pro version can collect both live and historical posts from a given subreddit, while the Edu version can only collect historical data.
Go to the “My Datasets” page and click on the “Reddit (Live)” button.
If you know what subreddit you would like to examine, proceed to Step 4 of this tutorial. Otherwise, click on the “Locate a subreddit” button.
Using the Subreddit Search page, you can locate subreddits that discuss a given topic by using the “Keyword” search bar.
When searching for a subreddit, a space between words will be counted as AND. If you would like to search for two keywords separately, use “|” to separate keywords.
After typing in your search keyword(s), click the “Search” button.
The Subreddit List page shows a list of public subreddits (with at least 100 comments made in the last 7 days) and sample posts corresponding to the search criteria.
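The keyword syntax described above (a space acts as AND, and “|” separates alternative keywords) can be illustrated with a small Python sketch. It only models how such a query might be interpreted. The function names `parse_query` and `matches` and the case-insensitive substring matching are assumptions for this example, not Communalytic's actual search logic.

```python
def parse_query(query):
    """Split a query into alternatives ('|'), each a list of AND-ed words."""
    groups = [part.split() for part in query.split("|")]
    return [group for group in groups if group]


def matches(query, text):
    """True if every word of at least one alternative appears in text."""
    lowered = text.lower()
    return any(
        all(word.lower() in lowered for word in group)
        for group in parse_query(query)
    )


print(parse_query("covid vaccine|mask"))
# [['covid', 'vaccine'], ['mask']]
print(matches("covid vaccine|mask", "Mask mandates in schools"))
# True
print(matches("covid vaccine", "Vaccine news"))
# False
```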
Click “Start Collection on…” (corresponding subreddit) to select the desired subreddit.
Before starting your data collection, name your dataset, then enter the name of the selected subreddit and the end date of data collection. You can collect data for up to 7 consecutive days in the Edu version and up to 31 consecutive days in the Pro version from the current date.
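As a rough illustration of the collection window limits mentioned above, the sketch below checks whether a chosen end date falls within the allowed number of days from the current date. The `MAX_DAYS` mapping and the `is_valid_end_date` helper are hypothetical names used only for this example, and the exact way Communalytic counts days (for example, whether the current day is included) is an assumption.

```python
from datetime import date, timedelta

# Limits stated in this tutorial: 7 days (Edu) and 31 days (Pro).
MAX_DAYS = {"edu": 7, "pro": 31}


def is_valid_end_date(end_date, version, today=None):
    """Check that end_date is no more than MAX_DAYS[version] days after today.

    Assumption: an end date equal to today is accepted, and the limit is
    counted in whole days from today.
    """
    today = today or date.today()
    latest = today + timedelta(days=MAX_DAYS[version])
    return today <= end_date <= latest


start = date(2021, 10, 12)
print(is_valid_end_date(date(2021, 10, 19), "edu", today=start))  # True
print(is_valid_end_date(date(2021, 10, 20), "edu", today=start))  # False
print(is_valid_end_date(date(2021, 10, 20), "pro", today=start))  # True
```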
You can check the box “Email me once job completes” to receive an email notification. (Note: the Pro version doesn’t have this checkbox, since it will send an email notification automatically.)
As a final step on this page, click the “Start Collection” button.
Data collection time will vary by subreddit. For subreddits with many comments and replies, collection may take up to several hours after the end date to finish.
To confirm that data collection is underway, you should be able to see your new dataset listed on the “My Datasets” page.
When your data collection is complete, it will say “Complete” under Status.
The table below shows data points available in the dataset, as provided by Reddit API:
|Field||Description||Sample Submission||Sample Comment||Sample Reply|
|id||Unique identifier for the post||q6x0lw||hgf14zp||hgkadlb|
|date||The date when the post was created/updated||10/12/2021||10/12/2021||10/13/2021|
|author||Poster’s unique username||916farmer||_AskMyMom_||None|
|title||Submission title||Saw this genius on the road today. Wouldn’t it be a shame if their email got overwhelmed with vax fax.|
|text||The main body of the post||Wait wait wait. So they don’t want a vaccine card or requirements because they don’t want the government “tracking them”: but will plaster personal information on their car windows?|
I mean, I watched Donald Duck and Bugs Bunny use reverse psychology on each other, is there any way we can do that with these guys?
|comment_on||Unique identifier of the parent post; Note: only available for Comment- and Reply-type posts||q6x0lw||hgexsg3|
|type||Post type; Possible values are: Submission = a thread starting post, Comment = a reply to a submission, Reply = a reply to a comment or to another reply||Submission||Comment||Reply|
|score||The overall engagement score assigned to the post based on the total number of up & down votes||119||34||-2|
|upvote_ratio||The ratio of upvotes out of all votes received by the post; Note: only provided for Submission-type posts||0.94|
|url||URL shared in the submission if applicable; Note: only provided for Submission-type posts||https://i.redd.it/sw3evc3sf3t71.jpg|
|permalink||A persistent URL to the post||https://www.reddit.com/r/…||https://www.reddit.com/r/…||https://www.reddit.com/r/…|
|user_link_karma||User’s link-based karma score||816||24484||1|
|user_comment_karma||User’s comment-based karma score||238||65315||0|
|user_flair||User’s subreddit-specific “flair” (tag or category); Note: Many subreddits/users don’t use this feature. Also in some subreddits, only their moderators can assign a flair to a user/post.||None||None||None|
|submission_flair||“Flair” (tag or category) assigned to the submission; Note: In some subreddits, only their moderators can assign a flair to a post.||None|
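To show how these fields fit together, the sketch below parses a few rows and groups posts under their parent via `comment_on`. It assumes the dataset is available as CSV text with the column names listed in the table, which the table itself does not confirm. The inline rows reuse the sample ids, types, and scores from the table and omit all other columns, and the helper names `load_posts` and `children_by_parent` are made up for this example.

```python
import csv
import io
from collections import defaultdict

# Sample rows built from the ids, types, and scores shown in the table.
SAMPLE_CSV = """id,type,comment_on,score
q6x0lw,Submission,,119
hgf14zp,Comment,q6x0lw,34
hgkadlb,Reply,hgexsg3,-2
"""


def load_posts(csv_text):
    """Read rows into dictionaries, converting score to an integer."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["score"] = int(row["score"])
    return rows


def children_by_parent(rows):
    """Map each parent id to the ids of the posts that respond to it."""
    children = defaultdict(list)
    for row in rows:
        if row["comment_on"]:  # empty for Submission-type posts
            children[row["comment_on"]].append(row["id"])
    return dict(children)


posts = load_posts(SAMPLE_CSV)
print(children_by_parent(posts))
# {'q6x0lw': ['hgf14zp'], 'hgexsg3': ['hgkadlb']}
```

In the sample, the reply's parent (hgexsg3) is a comment that is not shown in the table, so it appears here only as a key.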