This tutorial describes how to use Social Network Analysis (SNA) to explore your CrowdTangle (Facebook or Instagram) dataset. Unlike other data sources available in Communalytic, such as Twitter or Reddit, the CrowdTangle API does not provide information about who replies to whom. As a result, you will not be able to create a communication network with CrowdTangle data; however, Communalytic can create another type of network, commonly referred to as a Two-mode Semantic Network.
1. What is a Two-mode Semantic Network?
As the name suggests, a Two-mode Semantic Network is a graph that describes connections between two types of nodes, where one node type represents Social Actors and the other node type represents Semantic Concepts. A connection from a Social Actor to a Semantic Concept in this network usually implies some form of endorsement, association or affiliation between the two nodes. The exact interpretation of social actors, semantic concepts and network connections will depend on the available data (including any metadata) and the research questions that you would like to answer.
While many papers have examined one-mode semantic networks (semantic networks that represent connections between semantic concepts only), Two-mode Semantic Networks are not common in the literature, largely due to the lack of analytical tools to create and analyze them (see a related discussion on this topic in Yang & González-Bailón, 2018). Communalytic is one of the few analytical tools available that can automatically create Two-mode Semantic Networks.
One of the advantages of a Two-mode Semantic Network over a one-mode semantic network is that the underlying social, communication and/or information structures are partially preserved by incorporating social actors as part of the network and linking them to related semantic concepts based on criteria defined by your research questions and methodology.
2. How Communalytic builds a Two-Mode Semantic Network
Communalytic creates a Two-mode Semantic Network by automatically identifying and connecting two different types of nodes:
- Social Actors, further referred to as Actor Nodes, represent either one of two things:
– Facebook pages and groups if your dataset contains data from Facebook
– Instagram accounts if your dataset contains data from Instagram
- Semantic Concepts, further referred to as Semantic Nodes, represent one of eleven possible ‘named entities’ mentioned in posts from your dataset.
– ‘Named entities’ can be people, organizations, locations, products, etc. (See the Figure below for a list of all eleven named entities.)
In a Two-mode Semantic Network, as created by Communalytic, a connection from an Actor Node (a Facebook page/group or Instagram account) to a Semantic Node (a named entity) means that the page, group or account mentioned that named entity in one of its posts. (See the Figure above.)
To better understand how Communalytic builds Two-mode Semantic Networks, consider the following example based on a single post by CNN Politics:
- During the first step in the process, called Node Discovery, Communalytic will recognize “CNN Politics”, a Facebook page, as an Actor Node (based on the metadata collected from CrowdTangle) and will designate it as such in the network. Next, it will use natural language processing (NLP) to examine the content of the post shared by CNN Politics, locate any named entities, and designate them as Semantic Nodes. In this example, there will be two Semantic Nodes: “Donald Trump” and “Covid-19 pandemic”.
- During the second step in the process, called Edge Discovery, Communalytic will connect the Actor Node to the Semantic Nodes discovered during the Node Discovery step.
The figure below summarizes this network discovery process based on a single post. In a real dataset, where there is more than one post, Steps 1 and 2 are automatically repeated for each post in the dataset.
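To make the two steps concrete, here is a minimal Python sketch of Node Discovery and Edge Discovery applied to a list of posts. It is an illustration only, not Communalytic's actual implementation: the input structure, the function name `build_two_mode_network`, the second post by “Example Page”, and the choice to count mentions as edge weights are all assumptions made for this example. Named entity extraction is assumed to have happened already.

```python
from collections import defaultdict

# Hypothetical input: one record per post, with the posting actor and the
# named entities already extracted from the post text.
posts = [
    {"actor": "CNN Politics", "entities": ["Donald Trump", "Covid-19 pandemic"]},
    {"actor": "Example Page", "entities": ["Donald Trump"]},  # made-up post
]


def build_two_mode_network(posts):
    """Return (actor_nodes, semantic_nodes, edges) for a list of posts."""
    actor_nodes = set()
    semantic_nodes = set()
    # (actor, entity) -> number of posts by that actor mentioning the entity.
    # Using a count as the edge weight is an assumption of this sketch.
    edges = defaultdict(int)

    for post in posts:
        # Step 1: Node Discovery
        actor = post["actor"]
        actor_nodes.add(actor)
        entities = set(post["entities"])  # count each entity once per post
        semantic_nodes.update(entities)

        # Step 2: Edge Discovery
        for entity in entities:
            edges[(actor, entity)] += 1

    return actor_nodes, semantic_nodes, dict(edges)


actor_nodes, semantic_nodes, edges = build_two_mode_network(posts)
```

Because actors and named entities are kept in separate sets, the two node types stay distinguishable, which is what makes the network two-mode.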
This type of representation allows researchers to determine if the same set of accounts, pages, or groups tend to discuss the same topics, people, or organizations, which in turn can be used for detecting shared interests and possibly coordination (a.k.a. coordinated inauthentic behavior) among seemingly disparate accounts on platforms like Facebook and Instagram.
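One simple way to quantify whether two accounts discuss the same entities is the Jaccard similarity of the sets of named entities they mention. The sketch below is an assumption about how such a comparison could be done, not a feature of Communalytic; the page names, the edge list and the function `entity_overlap` are hypothetical.

```python
from itertools import combinations


def entity_overlap(edges):
    """Jaccard similarity of mentioned entities for every pair of actors."""
    mentioned = {}
    for actor, entity in edges:
        mentioned.setdefault(actor, set()).add(entity)

    scores = {}
    for a, b in combinations(sorted(mentioned), 2):
        shared = mentioned[a] & mentioned[b]
        union = mentioned[a] | mentioned[b]  # never empty: each actor has an edge
        scores[(a, b)] = len(shared) / len(union)
    return scores


# Hypothetical actor-to-entity edges from a Two-mode Semantic Network.
edges = [
    ("Page A", "Donald Trump"),
    ("Page A", "Covid-19 pandemic"),
    ("Page B", "Donald Trump"),
    ("Page C", "World Health Organization"),
]
scores = entity_overlap(edges)
```

A high score only indicates that two actors mention similar entities; on its own it is not evidence of coordination and should be followed up with a closer look at the posts.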
3. How Named Entities Are Identified
Communalytic uses a natural language processing library called spaCy to detect different types of named entities. To decide which language-specific spaCy model to use, Communalytic relies on another library called LangDetect to determine the language in which a post was written. If a post is in English, Japanese or Chinese, Communalytic will use the corresponding language-specific model available in spaCy; otherwise, it will default to the “multiple-language” model.
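The model selection logic described above can be sketched as a simple lookup with a fallback. The model names below are real spaCy model packages, but which specific models and sizes Communalytic loads is not stated in this tutorial, so treat them as assumptions; the function name `pick_model` is likewise hypothetical.

```python
# Assumed spaCy model names; the tutorial does not say which model
# variants Communalytic actually uses.
LANGUAGE_MODELS = {
    "en": "en_core_web_sm",
    "ja": "ja_core_news_sm",
    "zh": "zh_core_web_sm",
}
MULTI_LANGUAGE_MODEL = "xx_ent_wiki_sm"


def pick_model(lang_code):
    """Map a detected language code to a spaCy model name."""
    # langdetect reports Chinese as "zh-cn" or "zh-tw", so keep only the
    # primary language subtag before the lookup.
    primary = lang_code.lower().split("-")[0]
    return LANGUAGE_MODELS.get(primary, MULTI_LANGUAGE_MODEL)
```

With both libraries and the models installed, the full chain would look roughly like `nlp = spacy.load(pick_model(langdetect.detect(text)))`, after which the named entities of `doc = nlp(text)` are available in `doc.ents` (each with `.text` and `.label_`).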
The table below shows how accurate the different named entity detection models are, depending on the language selected. Accuracy is reported using two commonly used metrics: precision and recall. Precision is the ratio of correctly identified named entities (true positives) to the total number of named entities extracted (higher values are better). However, since spaCy models may miss some named entities, we also need to look at recall, which is the ratio of correctly identified named entities (true positives) to the total number of named entities present in the message, including those not found by spaCy (higher values are better). Overall, based on the results shown in the table below, spaCy is excellent at named entity detection with the English and multi-language models, while the Japanese and Chinese models trail somewhat.
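As a worked illustration of the two metrics, the following sketch computes precision and recall for a single message. The entity sets are made up for the example and are not taken from spaCy's published evaluation; `precision_recall` is a hypothetical helper.

```python
def precision_recall(extracted, actual):
    """Compute (precision, recall) for extracted vs. actually present entities."""
    extracted, actual = set(extracted), set(actual)
    true_positives = len(extracted & actual)
    # Guard against division by zero when a set is empty.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: the model extracted three entities, one of them
# wrongly ("Breaking"), and missed two that were present in the message.
extracted = {"Donald Trump", "Covid-19 pandemic", "Breaking"}
actual = {"Donald Trump", "Covid-19 pandemic", "White House", "CNN"}

precision, recall = precision_recall(extracted, actual)
# precision = 2 / 3 (two of three extracted entities are correct)
# recall    = 2 / 4 (two of four present entities were found)
```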
Note: If you plan to use a Two-mode Semantic Network created by Communalytic in a future publication, these are important numbers to include in your submission, as peer reviewers might want to see them to judge how accurately your resulting network represents the semantic space of your dataset. And while the named entity detection process is not 100% accurate, the next tutorial, on how to use Gephi to visualize and analyze this type of network, will show how you can manually review the named entities discovered by spaCy and remove false positives, further improving the overall accuracy of the results.
| Language model | Precision [0.00-1.00] | Recall [0.00-1.00] |
4. Network Export Screen in Communalytic
There are two pathways in Communalytic to launch the process of building this type of network from your dataset:
- Pathway 1: Go to the [My Datasets] main menu in Communalytic and click on the button under the [Export Network] column in the row corresponding to your dataset.
- Pathway 2: Click on the name of any dataset listed under the [Dataset Name] tab to get an overview/stats about your new dataset and then select [Export Network] in the left panel (it’s the last option).
Either pathway will take you to the [Export Network] page. Once there, click on the [Generate Semantic Network] button.
Depending on the size of your dataset, the network discovery process may take from a few minutes to a few hours. You do not need to keep your browser open. Communalytic will send an email once the process has been completed. Alternatively, you can visit the [Export Network] page and click the [Refresh Progress] button to see if the network is ready.
Once the Two-mode Semantic Network is created in Communalytic, go to the [Export Network] page. You will see a network visualization of your dataset and the [Download Network] button right below it.
The download option will provide you with a ZIP file containing your network in the GraphML format. GraphML is a common format for storing network data, and it’s supported by most programs designed for Social Network Analysis (SNA). For example, you can open your GraphML file using a popular SNA program called Gephi.
The next tutorial will show you how to use Gephi to visualize and explore the Two-mode Semantic Network you have just exported from Communalytic.
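If you would like to inspect the exported file programmatically before opening it in Gephi, the sketch below reads the nodes and edges of a GraphML file straight from a ZIP archive using only the Python standard library. It assumes the archive contains at least one file ending in `.graphml` and uses the standard GraphML namespace; the function name `read_graphml_from_zip` is hypothetical, and the sketch does not rely on any Communalytic-specific node or edge attributes, since those are not documented here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_graphml_from_zip(zip_path):
    """Return (node_ids, edges) from the first .graphml file in a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(".graphml")]
        if not names:
            raise ValueError("no .graphml file found in archive")
        with archive.open(names[0]) as handle:
            root = ET.parse(handle).getroot()

    # Collect node identifiers and (source, target) pairs for each edge.
    node_ids = [node.get("id") for node in root.iterfind(".//g:node", GRAPHML_NS)]
    edges = [
        (edge.get("source"), edge.get("target"))
        for edge in root.iterfind(".//g:edge", GRAPHML_NS)
    ]
    return node_ids, edges
```

For example, `node_ids, edges = read_graphml_from_zip("my_network.zip")` (a placeholder file name) would let you check the number of nodes and edges before loading the network into an SNA program.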