During this tutorial we will analyze several real-world networks of different kinds. The first kind of network are social networks, represented by
Les Miserables
:http://www-personal.umich.edu/~mejn/netdata/lesmis.zip
http://www-personal.umich.edu/~mejn/netdata/netscience.zip
The second kind of networks will be technical networks where we will have as examples
https://iuuk.mff.cuni.cz/~anet/teaching/22/grafy-a-site/data/power.gml
(original source: http://www-personal.umich.edu/~mejn/netdata/power.zip)
http://vlado.fmf.uni-lj.si/pub/networks/data/mix/USAir97.net
The third kind of networks are biological networks with
https://iuuk.mff.cuni.cz/~anet/teaching/22/grafy-a-site/data/celegansneural.gml
(original source: http://www-personal.umich.edu/~mejn/netdata/celegansneural.zip)
The fourth kind of networks are linguistic networks where we have
http://vlado.fmf.uni-lj.si/pub/networks/data/DIC/TG/glossTG.htm
Task 1: Download these networks, import them and list the number of nodes and edges.
Task 2: Consider the networks mentioned above. Discuss the expected values of
How will probably look like the curve giving degree distribution?
To visualise degree distribution, we can use plotting by matplotlib
. There are several useful ways of plotting, such as histograms by function bar
and line plots by function plot
.
import matplotlib.pyplot as plt
x = range(-10,11)
y = [i**2 for i in x]
plt.bar(x, y)
plt.title("Parabola")
plt.ylabel("y")
plt.xlabel("x")
Text(0.5, 0, 'x')
There are multiple possibilities to set up the view, see https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.bar.html
Task 3: Write a function, which for given graph plots the histogram of its degree distribution.
A network is considered scale-free if the fraction of vertices $P(k)$ having degree $k$ decays for large values of $k$ similarly as the power law: $$P(k) \approx k^{-\lambda}$$ where $2 < \lambda < 3$. Sometimes we also call this as heavy or fat tailed degree distribution.
To find if the network tends to be scale-free or not, it will be useful to be able to visualize both power law and degree distribution in one image.
This program is suitable for more advanced manipulation with plots or for work with very large data sets.
Gnuplot
is a command line program. It is possible to either write commands directly or write scripts. The latter is recommended especially for plots with more parameters differing from default and for doing similar plots on different data.
A typical script for gnuplot
starts with set {parameter} {value}
on separate lines and ends by line plot {data source} using {order of column for x-data}:{order of column for y-data} title {title for given data} with {type of plotting}
.
By adding a comma and other part starting by {data source}
, you can plot more graphs into one picture.
Parameters which can be set in the first part are for example
[from:to]
The script can be then simply run using gnuplot {name of the script}
.
Task 4: For the networks above, plot the fraction of vertices having degree $k$ for all reasonable $k$'s and compare this plot with the function $k^{-\lambda}$ for $\lambda$ being $2, 2.5$ and $3$.
We can count the clustering coefficient of vertices in the network by NetworkX function clustering(G)
, which returns a dictionary of vertices with their clustering coefficient saved as values. More information is in the documentation:
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.clustering.html#networkx.algorithms.cluster.clustering.
You can even count the average clustering of the network G
using function average_clustering(G)
:
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.average_clustering.html#networkx.algorithms.cluster.average_clustering
Task 5: For the above mentioned networks, count the
Task 6: Did the results match your expectations?