Introduction
This is a very basic introduction to Gephi. It assumes no prior knowledge and explains how to import the network dataset you receive from ARCH and perform basic transformations on it yourself.
This introduction was written in April 2022 with Gephi 0.9.5.
Table of contents:
- Importing your data
- Setting up the dates in your data
- Adding some labels
- Basic graph layouts
- Applying more algorithms
Importing your data
This tutorial explores what you can learn from the ARCH dataset file marked as "Domain Graph." To create this dataset, visit the "Generate Datasets" page under the Network category.
- First, download and install the Gephi platform, available through the Gephi home page: https://gephi.org/.
- Open Gephi and start a “new project”. Then, under the “File” menu, select “Import Spreadsheet.” Once you select “domain-graph.csv,” the data will be ready for import as illustrated below. Remember, if you are on macOS, you may need to use The Unarchiver to open the compressed file downloaded from ARCH.
- Make sure you are importing it as an edges table and that the separator is correctly set to comma.
- On the next page, select “intervals” and leave the remaining settings at their defaults. Click “finish” and you will see an overview of your graph. Gephi will report “Parallel edges” if your crawl contains different dates. Select “Don’t merge,” as we will resolve those later; you may need to click “more options…” to reveal additional checkboxes and options. Click “OK”.
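Before importing, it can help to check for yourself why Gephi reports parallel edges. The following is a minimal sketch in Python, not part of the Gephi workflow. It assumes the file has "source" and "target" columns alongside "crawl_date" (only "crawl_date" is named in this tutorial, so the other column names and the sample rows are hypothetical) and counts the domain pairs that appear on more than one row.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for domain-graph.csv.
# Only "crawl_date" is named in the tutorial; "source", "target",
# and "count" are assumed column names.
SAMPLE = """crawl_date,source,target,count
20200101000000,example.org,archive.org,12
20200201000000,example.org,archive.org,7
20200101000000,example.org,gephi.org,3
"""


def parallel_edges(csv_text):
    """Return the (source, target) pairs that occur on more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pair_counts = Counter((row["source"], row["target"]) for row in reader)
    return {pair: n for pair, n in pair_counts.items() if n > 1}


duplicates = parallel_edges(SAMPLE)
print(duplicates)
```

To try this on your own file, you could replace `SAMPLE` with the contents of "domain-graph.csv" after checking its actual column names.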
Setting up the dates in your data
Next, we need to ensure Gephi can recognize the dates found in this file, so you can dynamically explore the web graph.
- First, click the “Data Laboratory” tab, then click “Edges.” Look at the arrows in the screenshot below for further direction:
- Now click “Merge Columns” located at the bottom of your screen. Select “Interval” and “crawl_date” as the columns to merge, then choose “Create time interval” as the merge strategy.
- Parse the dates as yyyyMMddHHmmss, and use “crawl_date” as both the start and the end time column.
- You will now see a timeline at the bottom of the screen that you can click to enable. If you do not, however, open the timeline by going to the Window menu and selecting “Timeline.”
- Click on “time options” (which appears as a cog icon) in the lower left, and then select “select time format” and select “datetime.”
Congratulations, you now have a dynamic graph!
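If the date pattern above looks cryptic, here is a minimal Python sketch of what it means. It assumes the pattern yyyyMMddHHmmss reads as four-digit year, month, day, hour, minute, and second, and it mirrors the merge step by using the same timestamp as both start and end. The function names and the sample value are hypothetical.

```python
from datetime import datetime

# Assumed Python equivalent of the yyyyMMddHHmmss pattern.
CRAWL_DATE_FORMAT = "%Y%m%d%H%M%S"


def parse_crawl_date(value):
    """Parse a 14-digit crawl_date string into a datetime."""
    return datetime.strptime(value, CRAWL_DATE_FORMAT)


def to_interval(value):
    """Use one timestamp as both start and end of the interval."""
    moment = parse_crawl_date(value)
    return (moment, moment)


start, end = to_interval("20220415093000")
print(start.isoformat())  # 2022-04-15T09:30:00
```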
Adding some labels
Still in the Data Laboratory tab, let's do a similar transformation to migrate the domain names over to each node:
- To begin, click on “Nodes” at the top of the spreadsheet.
- We will copy the “Id” data over to “Label” so Gephi knows we might want to use the domain names in our visualization.
- Click “Copy data to other column” at the bottom of your screen and select “Id.” Copy it to “Label.” The spreadsheet should then look like this:
You are now ready to begin the process of laying your network out!
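To make the labeling step concrete, here is a minimal Python sketch of the same idea outside Gephi: every domain that appears in an edge becomes a node whose "Label" is a copy of its "Id." The edge rows and the function name are hypothetical.

```python
# Hypothetical edge rows as (source, target) pairs.
EDGES = [
    ("example.org", "archive.org"),
    ("example.org", "gephi.org"),
]


def build_node_table(edges):
    """Build one row per distinct domain, copying Id into Label."""
    ids = sorted({name for edge in edges for name in edge})
    return [{"Id": node_id, "Label": node_id} for node_id in ids]


nodes = build_node_table(EDGES)
for row in nodes:
    print(row)
```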
Basic graph layouts
The following basic layout is now available in the Overview tab; however, the graph isn't too useful. Let's begin by creating a new layout, with steps highlighted below:
Select the layout tab located in the left pane, and select "Yifan Hu Proportional." While we will leave the default values, you can begin to play with the figures and experiment. To apply the layout to the graph, click the "run" button.
The following image shows what this looks like after clicking "run" on the default visualization.
Let's add some labels to see the graph develop in a more meaningful way. Click on the "T" button below the graph, identified in the image below and you will see lots of labels populated on the network graph. Our next goal will be to make this visualization more readable.
The next step is to resize the nodes (domains) based on a characteristic. In this case, let's make each node bigger the more often other nodes link to it. This count of incoming links is called "in-degree" in Gephi and is a common measure in the network analysis literature.
This type of alteration can sometimes be challenging to find within the Gephi interface! In the "Appearance" window located in the left pane, click on the "size" icon, select "ranking," and then select "In-Degree" with a min size of 3 and a max size of 40. Then click "Apply."
Use the screenshot below to reproduce the results.
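For intuition about what the ranking does, here is a minimal Python sketch. It counts incoming links per node and then maps those counts onto the 3 to 40 size range. It assumes a simple linear scaling between the minimum and maximum, which may not match Gephi's internals exactly, and the edges and function names are hypothetical.

```python
from collections import Counter

# Hypothetical directed edges as (source, target) pairs.
EDGES = [
    ("a.org", "hub.org"),
    ("b.org", "hub.org"),
    ("c.org", "hub.org"),
    ("a.org", "b.org"),
]


def in_degrees(edges):
    """Count incoming links for every node, including nodes with none."""
    nodes = {name for edge in edges for name in edge}
    counts = Counter(target for _, target in edges)
    return {node: counts.get(node, 0) for node in nodes}


def rank_sizes(values, min_size=3.0, max_size=40.0):
    """Linearly scale values into the range [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        # All nodes have the same value, so give them all the minimum size.
        return {node: min_size for node in values}
    return {
        node: min_size + (value - lo) / span * (max_size - min_size)
        for node, value in values.items()
    }


degrees = in_degrees(EDGES)
sizes = rank_sizes(degrees)
print(sorted(sizes.items()))
```

In this toy graph, "hub.org" has the highest in-degree and gets size 40, while nodes that nobody links to get size 3.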
Let's replicate these steps to adjust label size: in this case, the bigger the label, the more frequently it is linked to; the smaller the label, the less it is linked to. To do this, click on the "text size" icon, select "Ranking," and then select "In-Degree." This time, select a minimum size of 0.1 and a max size of 3, then click "Apply." Use the image below to practice these steps.
Some of the labels now overlap, so let's run another simple "layout." This time, we select "Label Adjust" and press run.
We now have a decently laid out network!
Applying more algorithms
Now let's run an algorithm to learn more about our graph. There are a lot of options, so we will just show one here. We'll run a rudimentary community detection algorithm located in the "statistics" section on the right-hand side. Click the "run" button next to modularity, and click through the next report. The two following screenshots show you where to look.
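To give a feel for what the modularity score measures, here is a minimal Python sketch. It does not reproduce Gephi's community detection; it only computes the modularity value of a grouping you supply, on a small undirected, unweighted toy graph. The graph, the groupings, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical undirected graph: two triangles joined by one edge.
EDGES = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]


def modularity(edges, community_of):
    """Modularity of a partition: sum over communities of
    (internal edges / m) - (community degree / 2m) squared."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(EDGES, two_groups)
q_one = modularity(EDGES, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the graph into its two triangles scores higher than lumping every node together, which is the kind of grouping a community detection algorithm looks for.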
Our final step is to apply the modularity categories to the graph. We will use color to denote the communities nodes appear in.
To do so, go back to appearance. This time click the painter's palette, select "Partition," and then apply "Modularity Class." As before, use the screenshots to recreate this scenario.
At the end of this lesson, your graph should be looking similar to this:
Congratulations! You now have a nicely laid-out graph. Now, try experimenting with other features in Gephi.
There are several community-based and video tutorials available from https://gephi.org/users/.