Tutorial 3: Network centralities

From the lecture

Network characteristics can be local, describing single vertices, e.g. degree and clustering, or they can be global, describing the whole network, e.g. average path length, average clustering coefficient.

Centralities are functions evaluating the importance of a single vertex $v$:

$\sigma_{x,y}$ number of shortest paths between $x$ and $y$ and for

$\sigma_{x,y}(v)$ number of shortest paths between $x$ and $y$ passing through $v$ is

$$C_B(v) = \sum_{\substack{x,y \in V(G) \setminus v\\x \neq y}} \frac{\sigma_{x,y}(v)}{\sigma_{x,y}}$$

Global betweenness centrality $$\bar{C}_B(G) = \frac{1}{n} \sum_{v \in V(G)} C_B(v)$$

Average shortest path length $$L(G) = \frac{1}{n(n-1)} \sum_{\substack{u,v \in V(G)\\ u \neq v}} d(u,v)$$

Theorem Let $G$ be unoriented graph of size $|V(G)| = n$, $L(G)$ its average path length and $\overline{C}_B(G)$ its global betweeness centrality. Then it holds that $\overline{C}_B(G) = (n - 1) (L(G) - 1)$.

Lemma Let $G$ be a graph of size $n$ and let $G'$ be a graph obtained from $G$ by adding an edge between random vertices $u,v$ in distance $d = d(u,v) > 1$. Then it holds that $$\overline{C}_B(G') \leq \overline{C}_B(G) - \frac{2(d - 1)}{n}.$$

Corollary Let $G=(V,E)$ be a graph, $G'=(V,E')$ its subgraph on all vertices and $r = | E \setminus E'|$. Then $$\overline{C}_B(G) \leq \overline{C}_B(G') - \frac{2r}{n}.$$

Corollary Let $G=(V,E)$ be a graph with $|V|=n$ and $|E|=m$, and let $T$ be its spanning tree. Then $$\overline{C}_B(G) \leq \overline{C}_B(T) - \frac{2(m - n + 1)}{n}.$$

A vertex is called a leaf if $\deg(v) = 1$ and a k-neighbourhood of a vertex is defined as $$\Gamma_k(v) = \{v\in V : d(u,v) = k\}.$$

Theorem Let $G$ be a graph of size $n$, $w \in V$ a vertex with eccentricity $\epsilon(w)$ and let $G'$ be a graph obtained from $G$ by adding a vertex $u$ which is connected to $G$ by the edge $\{u,w\}$. Then $$\overline{C}_B(G') = \frac{n}{n - 1}\overline{C}_B(G) + \frac{2}{n - 1}\sum_{k=1}^{\epsilon(w)} k|\Gamma_k(w)|.$$

Theorem Let $G$ be a graph of size $n$ and let $G'$ be a graph obtained from $G$ by adding a new vertex $w$ and edges connecting it with vertices $u$ and $v$ in distance $1 \leq d(u,v) \leq 2$. Then $$\frac{1}{n+1} \left[n\overline{C}_B(G) + n - 2\right] \leq \overline{C}_B(G').$$

Counting centralities

During this tutorial we will compute centralities. We will start with some simple graphs.

Degree centrality

Degree centrality can be computed using function degree_centrality with network as a parameter. This function returns a dictionary of vertices with values corresponding to degree centrality. In the first example we also showcase working with labels.

https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.degree_centrality.html

Eccentricity

Eccentricity of vertex $v$ is the maximum of distances from $v$ to other vertices in $G$.

Closeness centrality

https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.closeness_centrality.html

Betweenness centrality

Betweenness centrality can be computed using NetworkX as follows.

Parameter normalized=False tells us, that we want betweenness in real sizes, not after division by ${n-1 \choose 2}$, which normalizes the values into range $\langle 0, 1 \rangle$. Note that this function computes betweenness for unoriented graphs. To obtain values compatible with values from the lecture, we have to multiply these values by two (or divide the values from the lecture).

More information about implementation of this function is in the documentation: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html#networkx.algorithms.centrality.betweenness_centrality

Average path length

Average shortest path length can be also easily computed using NetworkX. The algorithm for counting the lengths of paths in weighted graphs is Dijkstra's algorithm by default, other options are algorithms of Bellman-Ford and Floyd-Warshall. See the documentation for details,

https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.generic.average_shortest_path_length.html

Task 1: Write a function which counts the global betweenness centrality of a given graph $G$. Which of our three graphs do you expect to have largest and lowest average betweenness? First guess and then run the computation.

Task 2: Use the function from previous task to write a function verifying the relation between global betweenness and average path length. Use such function on several graphs.

Spanning tree

Spanning tree of a graph $G$ on $n$ vertices is a subgraph of $G$, which is a tree covering all vertices of $G$. In NetworkX there are many methods for spanning trees, see the documentation for trees, section about spanning trees:

https://networkx.org/documentation/stable/reference/algorithms/tree.html

https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.tree.mst.minimum_spanning_tree.html

Task 3: Verify the bound relating global betweenness of $G$ and its spanning tree by implementing a function enumerating the betweenness and the "error term". Try to find graphs for which it is more tight and graphs where the difference is large.

Task 4: Verify the bound on the change of global betweenness after adding a new vertex connected to two leafs of a star.

Small world

In NetworkX we can generate a Watts-Strogatz model by function watts_strogatz_graph(n, k, p), where $n$ stands for number of vertices, $k$ denotes how many closest vertices is any vertex neighbouring and $p$ is the probability of rewiring an edge. See the docs for more information: https://networkx.org/documentation/stable/reference/generated/networkx.generators.random_graphs.watts_strogatz_graph.html

There are several coefficients and indexes used to evaluate how much small-world is a graph.

Let $C$ be the average clustering coefficient of $G$, $L$ be average shortest path length of $G$, $C_R$ be average clustering coefficient of a reference random model for $G$ and $L_R$ be the average shortest path length of a reference random model. Then the small-world coefficient of a graph $G$ is defined as $$\sigma(G) = \frac{\frac{C}{C_R}}{\frac{L}{L_R}}.$$ Usually, graphs with $\sigma(G) > 1$ are considered to be small-worlds.

https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.smallworld.sigma.html

For the following task it will be useful to visualize numbers in a graph. For simple bar graph we can use for example function bar from matplotlib.pyplot: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html

Task 5: Fix $n=15$, $k=4$. Find how the small-world coefficient changes with growing $p$.

Note that for $n=15$ and $k=4$ the distances are already short, so rewiring decreases small-world coefficient, by not making paths much shorter while destroying clustering.