During this tutorial we will analyze several real-world networks of different kinds. The first kind of network are social networks, represented by
Les Miserables
:https://iuuk.mff.cuni.cz/~anet/teaching/data/lesmis.gml
(original source http://www-personal.umich.edu/~mejn/netdata/lesmis.zip)
https://iuuk.mff.cuni.cz/~anet/teaching/data/netscience.gml
(original source http://www-personal.umich.edu/~mejn/netdata/netscience.zip)
The second kind of networks will be technical networks where we will have as examples
https://iuuk.mff.cuni.cz/~anet/teaching/data/power.gml
(original source: http://www-personal.umich.edu/~mejn/netdata/power.zip)
https://iuuk.mff.cuni.cz/~anet/teaching/data/air.paj
(original source: http://vlado.fmf.uni-lj.si/pub/networks/data/mix/USAir97.net)
The third kind of networks are biological networks with
https://iuuk.mff.cuni.cz/~anet/teaching/data/celegansneural.gml
(original source: http://www-personal.umich.edu/~mejn/netdata/celegansneural.zip)
The fourth kind of networks are linguistic networks where we have
https://iuuk.mff.cuni.cz/~anet/teaching/data/glossary.paj
(original source: http://vlado.fmf.uni-lj.si/pub/networks/data/DIC/TG/glossTG.htm)
Task 1: Download these networks, import them and list the number of nodes and edges.
import networkx as nx
def print_info(G, desc):
print("{}, nodes: {}, edges: {}".format(desc, len(G), len(G.edges())))
return
lesmis = nx.read_gml('data/lesmis.gml')
print_info(lesmis, 'Les Miserables')
collab = nx.read_gml('data/netscience.gml')
print_info(collab, "Collaboration of network scientists")
powgrid = nx.read_gml('data/power.gml')
print_info(powgrid, "Power grid")
airline = nx.read_pajek('data/air.paj')
print_info(airline, "Air lines in US")
neural = nx.read_gml('data/celegansneural.gml')
print_info(neural, "C. Elegans neural network")
gloss = nx.read_pajek('data/glossary.paj')
print_info(gloss, "Glossary of Graphs and Digraphs")
Les Miserables, nodes: 77, edges: 254 Collaboration of network scientists, nodes: 1589, edges: 2742 Power grid, nodes: 4941, edges: 6594 Air lines in US, nodes: 332, edges: 2126 C. Elegans neural network, nodes: 297, edges: 1875 Glossary of Graphs and Digraphs, nodes: 72, edges: 118
Task 2: Consider the networks mentioned above. Discuss the expected values of
How will probably look like the curve giving degree distribution?
To visualise degree distribution, we can use plotting by matplotlib
. There are several useful ways of plotting, such as histograms by function bar
and line plots by function plot
.
import matplotlib.pyplot as plt
x = range(-10,11)
y = [i**2 for i in x]
plt.bar(x, y)
plt.title("Parabola")
plt.ylabel("y")
plt.xlabel("x")
Text(0.5, 0, 'x')
There are multiple possibilities to set up the view, see https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.bar.html
Task 3: Write a function, which for given graph plots the histogram of its degree distribution.
import collections
import matplotlib.pyplot as plt
import networkx as nx
def draw_deg_distr(G):
degree_sequence = sorted([G.degree[i] for i in G.nodes()], reverse=True)
degree_count = collections.Counter(degree_sequence)
degree, count = zip(*degree_count.items())
plt.bar(degree, count)
plt.title("Degree distribution")
plt.ylabel("Frequency")
plt.xlabel("Degree")
plt.draw()
return
lollipop = nx.lollipop_graph(8, 10)
#nx.draw(lollipop)
#draw_deg_distr(lollipop)
draw_deg_distr(lesmis)
#draw_deg_distr(collab)
#draw_deg_distr(powgrid)
#draw_deg_distr(airline)
#draw_deg_distr(neural)
#draw_deg_distr(gloss)
Les Miserables degree histogram:
Collaboration degree histogram:
Power grid degree histogram:
Airline connections degree histogram:
C. Elegans neural network degree histogram:
Glossary network degree histogram:
A network is considered scale-free if the fraction of vertices $P(k)$ having degree $k$ decays for large values of $k$ similarly as the power law: $$P(k) \approx k^{-\lambda}$$ where $2 < \lambda < 3$. Sometimes we also call this as heavy or fat tailed degree distribution.
To find if the network tends to be scale-free or not, it will be useful to be able to visualize both power law and degree distribution in one image.
This program is suitable for more advanced manipulation with plots or for work with very large data sets.
Gnuplot
is a command line program. It is possible to either write commands directly or write scripts. The latter is recommended especially for plots with more parameters differing from default and for doing similar plots on different data.
A typical script for gnuplot
starts with set {parameter} {value}
on separate lines and ends by line plot {data source} using {order of column for x-data}:{order of column for y-data} title {title for given data} with {type of plotting}
.
By adding a comma and other part starting by {data source}
, you can plot more graphs into one picture.
Parameters which can be set in the first part are for example
[from:to]
The script can be then simply run using gnuplot {name of the script}
.
Task 4: For the networks above, plot the fraction of vertices having degree $k$ for all reasonable $k$'s and compare this plot with the function $k^{-\lambda}$ for $\lambda$ being $2, 2.5$ and $3$.
import collections
import networkx as nx
def save_fraction_deg(G, filename):
n = len(G)
degree_sequence = sorted([G.degree[i] for i in G.nodes()], reverse=True)
degree_count = collections.Counter(degree_sequence)
with open(filename, 'w') as outputfile:
outputfile.write("# Degree Fraction\n")
for deg, count in degree_count.items():
#print("Degree: {}, Fraction: {}".format(deg, count/n))
outputfile.write("{} {}\n".format(deg, count/n))
return
save_fraction_deg(nx.path_graph(4), 'path4-deg-frac.dat')
save_fraction_deg(lesmis, 'lesmis-deg-frac.dat')
save_fraction_deg(collab, 'collab-deg-frac.dat')
save_fraction_deg(powgrid, 'powgrid-deg-frac.dat')
save_fraction_deg(airline, 'airline-deg-frac.dat')
save_fraction_deg(neural, 'neural-deg-frac.dat')
save_fraction_deg(gloss, 'gloss-deg-frac.dat')
Output of this function can be given to following gnuplot script.
set terminal pngcairo
set output 'lesmis-collab-deg-dist.png'
set key center under
set title 'Degree distribution (using fraction of vertices)'
set xlabel 'Degree'
set ylabel 'Fraction of vertices'
set yrange [0:0.5]
set xrange [1:25]
plot 'lesmis-deg-frac.dat' using 1:2 title 'Les Miserables network' with lines, \
'collab-deg-frac.dat' using 1:2 title 'Collaboration network of network scientists' with lines, \
x**(-2) title 'Mean value for P(k) = k^{-2}' with lines, \
x**(-2.5) title 'Mean value for P(k) = k^{-2.5}' with lines, \
x**(-3) title 'Mean value for P(k) = k^{-3}' with lines
This and similar two scripts give the following results:
We can count the clustering coefficient of vertices in the network by NetworkX function clustering(G)
, which returns a dictionary of vertices with their clustering coefficient saved as values. More information is in the documentation:
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.clustering.html#networkx.algorithms.cluster.clustering.
You can even count the average clustering of the network G
using function average_clustering(G)
:
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.average_clustering.html#networkx.algorithms.cluster.average_clustering
Task 5: For the above mentioned networks, count the
def largest_conn_comp(G):
if isinstance(G, nx.DiGraph):
largest_cc = max(nx.weakly_connected_components(G), key=len)
elif isinstance(G, nx.Graph):
largest_cc = max(nx.connected_components(G), key=len)
else:
raise TypeError()
return G.subgraph(largest_cc).copy()
def print_stats(G, desc):
avg_len = nx.average_shortest_path_length(G)
clust = nx.average_clustering(G)
print("{}, average shortest path length: {: 0.2f}, average clustering: {: 0.2f}".format(desc, avg_len, clust))
return
print_stats(lesmis, "Les Miserables network")
print_stats(largest_conn_comp(collab), "Netscience collaboration")
print_stats(powgrid, "Power grid network")
print_stats(nx.Graph(airline), "Airline connections US")
print_stats(largest_conn_comp(neural), "C. Elegans neural network")
print_stats(largest_conn_comp(nx.Graph(gloss)), "Graph theory glossary")
Les Miserables network, average shortest path length: 2.64, average clustering: 0.57 Netscience collaboration, average shortest path length: 6.04, average clustering: 0.74 Power grid network, average shortest path length: 18.99, average clustering: 0.08 Airline connections US, average shortest path length: 2.74, average clustering: 0.63 C. Elegans neural network, average shortest path length: 2.12, average clustering: 0.19 Graph theory glossary, average shortest path length: 3.10, average clustering: 0.28
Task 6: Did the results match your expectations?