Tutorial 5: Analysis of real complex networks

During this tutorial we will analyze several real-world networks of different kinds. The first kind of network are social networks, represented by

https://iuuk.mff.cuni.cz/~anet/teaching/data/lesmis.gml

(original source http://www-personal.umich.edu/~mejn/netdata/lesmis.zip)

https://iuuk.mff.cuni.cz/~anet/teaching/data/netscience.gml

(original source http://www-personal.umich.edu/~mejn/netdata/netscience.zip)

The second kind of networks will be technical networks where we will have as examples

https://iuuk.mff.cuni.cz/~anet/teaching/data/power.gml

(original source: http://www-personal.umich.edu/~mejn/netdata/power.zip)

https://iuuk.mff.cuni.cz/~anet/teaching/data/air.paj

(original source: http://vlado.fmf.uni-lj.si/pub/networks/data/mix/USAir97.net)

The third kind of networks are biological networks with

https://iuuk.mff.cuni.cz/~anet/teaching/data/celegansneural.gml

(original source: http://www-personal.umich.edu/~mejn/netdata/celegansneural.zip)

The fourth kind of networks are linguistic networks where we have

https://iuuk.mff.cuni.cz/~anet/teaching/data/glossary.paj

(original source: http://vlado.fmf.uni-lj.si/pub/networks/data/DIC/TG/glossTG.htm)

Task 1: Download these networks, import them and list the number of nodes and edges.

Task 2: Consider the networks mentioned above. Discuss the expected values of

How will probably look like the curve giving degree distribution?

Degree distribtion

To visualise degree distribution, we can use plotting by matplotlib. There are several useful ways of plotting, such as histograms by function bar and line plots by function plot.

There are multiple possibilities to set up the view, see https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.bar.html

Task 3: Write a function, which for given graph plots the histogram of its degree distribution.

Les Miserables degree histogram:

lesmis-deg-hist.png

Collaboration degree histogram:

colab-deg-dist.png

Power grid degree histogram:

pow-deg-hist.png

Airline connections degree histogram:

air-deg-hist.png

C. Elegans neural network degree histogram:

neural-deg-hist.png

Glossary network degree histogram:

gloss-deg-hist.png

Scale-free property

A network is considered scale-free if the fraction of vertices $P(k)$ having degree $k$ decays for large values of $k$ similarly as the power law: $$P(k) \approx k^{-\lambda}$$ where $2 < \lambda < 3$. Sometimes we also call this as heavy or fat tailed degree distribution.

To find if the network tends to be scale-free or not, it will be useful to be able to visualize both power law and degree distribution in one image.

Gnuplot

This program is suitable for more advanced manipulation with plots or for work with very large data sets.

Gnuplot is a command line program. It is possible to either write commands directly or write scripts. The latter is recommended especially for plots with more parameters differing from default and for doing similar plots on different data.

A typical script for gnuplot starts with set {parameter} {value} on separate lines and ends by line plot {data source} using {order of column for x-data}:{order of column for y-data} title {title for given data} with {type of plotting}. By adding a comma and other part starting by {data source}, you can plot more graphs into one picture.

Parameters which can be set in the first part are for example

The script can be then simply run using gnuplot {name of the script}.

Task 4: For the networks above, plot the fraction of vertices having degree $k$ for all reasonable $k$'s and compare this plot with the function $k^{-\lambda}$ for $\lambda$ being $2, 2.5$ and $3$.

Output of this function can be given to following gnuplot script.

set terminal pngcairo
set output 'lesmis-collab-deg-dist.png'
set key center under
set title 'Degree distribution (using fraction of vertices)'
set xlabel 'Degree'
set ylabel 'Fraction of vertices'
set yrange [0:0.5]
set xrange [1:25]
plot    'lesmis-deg-frac.dat' using 1:2 title 'Les Miserables network' with lines,  \
        'collab-deg-frac.dat' using 1:2 title 'Collaboration network of network scientists' with lines,  \
        x**(-2) title 'Mean value for P(k) = k^{-2}' with lines, \
        x**(-2.5) title 'Mean value for P(k) = k^{-2.5}' with lines, \
        x**(-3) title 'Mean value for P(k) = k^{-3}' with lines

This and similar two scripts give the following results:

lesmis-collab-deg-dist.png

powgrid-airline-deg-dist.png

neural-gloss-deg-dist.png

Clustering coefficient

We can count the clustering coefficient of vertices in the network by NetworkX function clustering(G), which returns a dictionary of vertices with their clustering coefficient saved as values. More information is in the documentation: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.clustering.html#networkx.algorithms.cluster.clustering.

You can even count the average clustering of the network G using function average_clustering(G): https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.average_clustering.html#networkx.algorithms.cluster.average_clustering

Task 5: For the above mentioned networks, count the

Task 6: Did the results match your expectations?