Tutorial 2: Characteristics of complex networks

Small world

A network is considered as small-world (SW) if

(Local) clusering coefficient is

C(v) can be viewed as probability that two neighbours of the given node are connected.

Average clustering coefficient of $G$ is $$C(G) = \frac{1}{n}\sum_{v\in V(G)} C_v(G)$$

Maximizing number of edges for C(G) = 0

$\text{ex}(n; K_{r+1})$ is the maximal number of edges in a graph of size $n$ not containig $K_{r+1}$.

Turán graph $T_r(n)$ is complete $r$-partite graph of size $n$ with sizes of independent classes as similar as possible. We denote $t_r(n) = |E(T_r(n))|$.

Theorem [Turán, 1940]

Every graph $G$ of size $n$ with more than $t_r(n)$ edges contains $K_{r+1}$.

In other words, $\text{ex}(n; K_{r+1}) = t_r(n)$.

Average path length

We denote $d(u,v)$ the distance between $u$ and $v$, i.e. the length of the shortest path between them.

Then the average (shortest) path length is defined as $$L(G) = \frac{1}{n(n-1)}\sum_{\{u,v\} \subset V, u\neq v} d(u,v)$$

Examples

Stars $S_n$ have short average path length $$L(S_n) = \frac{1}{n(n-1)}\Big(1 \cdot (n-1) + 2 \cdot {n-1 \choose 2}\Big) = \frac{n+1}{n} \rightarrow 1$$ and no clustering $$C(S_n) = 0.$$ The edge density of stars is low, $$\frac{n-1}{{n \choose 2}} = \frac{2(n-1)}{n(n-1)} = \frac{2}{n}$$.

Cycles $C_n$ have long average path length, because for one vertex $v$ we have sum of distances equal to $$2 \cdot (1 + 2 + \dots + \frac{n-1}{2}) = 2 \cdot \frac{1}{2} \cdot \frac{n-1}{2} \cdot \Big(\frac{n-1}{2} + 1\Big) = \frac{(n-1)(n+1)}{2}$$ (for odd $n$) and so $$L(C_n) = \frac{1}{n(n-1)}\cdot n \cdot \frac{(n-1)(n+1)}{2} = \frac{n+1}{2}$$ but no clustering $$C(C_n) = 0.$$ The edge density of cycles is $$\frac{n}{{n \choose 2}} = \frac{2n}{n(n-1)} = \frac{2}{n-1}.$$

Complete graphs $K_n$ have short average path length $$L(K_n) = 1$$ and complete clustering.

However, they are dense with edge density $\frac{{n \choose 2}}{{n \choose 2}} = 1$.

Are there any sparse graphs with these properties?

Watts-Strogatz model

Define a ring lattice $C_{n,k}$ as a graph on vertices $\{1,2,\ldots, n\}$ arranged in a cycle such that each vertex is connected to its $k$-nearest neighbors (on cycle)

In other words, it is a cycle where each $v$ is further connected to $u$ s.t. $d(v,u) \leq \frac{k-2}{2}~(\text{mod}~n)$.

WattsStrogatz-1.png

Task 1: Compute average clustering coefficient of ring lattice $C_{n,k}$ (for $k$ even).

Solution: Lattice is symmetric, so it suffices to count $C(v)$ for fixed $v \in C_{n,k}$.

We basically count the density of edges in the neighbourhood of $v$.

Cnk_left-1.png

Cnk_across-1.png

In total, we have clustering coefficient $$\frac{{2\cdot {\frac{k}{2} \choose 2}} + {{\frac{k}{2} \choose 2}}}{{k \choose 2}} = \frac{2\cdot \frac{k}{2}(\frac{k}{2}-1) + \frac{k}{2}(\frac{k}{2}-1)}{k(k-1)} \\ = \frac{3\cdot \frac{k}{2}(\frac{k}{2}-1)}{k(k-1)} = \frac{3}{2}\cdot \frac{k(\frac{k}{2}-1)}{k(k-1)} = \\ \frac{3}{2}\cdot \frac{\frac{k}{2}-1}{k-1} = \frac{3}{2}\cdot \frac{2}{2} \cdot \frac{\frac{k}{2}-1}{k-1} = \frac{3}{4}\cdot \frac{k-2}{k-1} \approx \frac{3}{4}$$

Task 2: Compute average path length of ring lattice $C_{n,k}$ for $n$ odd.

Solution: We use the fact that $C_{n,k}$ is symmetric. The sum of distances from one fixed vertex to the rest of the graph is $$k \cdot \Big(1 + 2 + \dots + \frac{n}{k \cdot 2} \Big).$$ Average shortest path is then $$\frac{nk \cdot \Big(1 + 2 + \dots + \frac{n}{2k} \Big)}{n(n-1)} = \frac{k \cdot \frac{n}{2k} \cdot \Big(\frac{n}{2k} + 1\Big)}{2(n-1)} \approx \frac{n}{2k}.$$

Task 3: Compute edge density of ring lattice $C_{n,k}$.

Solution: $$\rho(C_{n,k}) = \frac{\frac{nk}{2}}{{n \choose 2}} = \frac{nk}{n(n-1)} = \frac{k}{(n-1)} \approx \frac{k}{n}$$

Watts-Strogatz model

a) Take a ring lattice $C_{n,k}$.

b) For each vertex $i$ and all its rightward edges $\{i, j\}$:

This keeps good clustering but shortens distances. If $p$ is too big ($\pi \rightarrow 1$), we end up with a random graph.

WattsStrogatzRandom-1.png

Erdös-Rényi random graph $G(n,p)$ is a graph on $n$ vertices where each pair of vertices is connected by an edge with probability $p$.

Properties for $k$ being average degree: $$L(G_{n,p}) \sim \frac{\log(n)}{k}$$ $$C(G_{n,p}) \sim \frac{k}{n}$$ We have small distances but also small clustering.

Determining small-worldness of a network

We want high clustering and short distances, but these terms have to be relative to something. Reference model for graph $G$ is a random graph $R_G$ with the same number of vertice and edges. We say that graph $G$ is small-world if

when compared to the reference model.

Alternative definition of small-world

Another way to define small-world network is that the average shortest path length grows proportionally to (or less than) the logarithm of the number of vertices $n$ in the network, while having a clustering coefficient which is not small.

Observation Average shortest path length cannot exeed the diameter.

Task 4: 1) What is the diameter of a square portion of square lattice, with $L$ edges (or equivalently $L + 1$ vertices) along each side?

2) What is the diameter of the corresponding hypercubic lattice in $d$ dimensions with $L$ edges along each side? Hence what is the diameter of such a lattice as a function of the number $n$ of vertices?

3) A Cayley tree is a symmetric regular tree in which each vertex is connected to the same number of $k$ others until we get out to the leaves. Show that the number of vertices reachable in $d$ steps from the central vertex is $k(k-1)^{d-1}$ for $d \geq 1$. Hence find an expression for the diameter of the network in terms of $k$ and the number of vertices $n$.

4) Which of the networks in 1), 2) and 3) has a diameter that increases as $\log n$ or slower?

Solution: 1) For convenience let us use indexing of the vertices using two coordinates. We notice that if we move along one edge, we can change only one coordinate. Take vertices $(0,0)$ and $(L, L)$. The difference between the sum of their coordinates is $2L - 0$, which gives distance $2L$. It is easy to see that this distance is the largest in the grid and thus it is the diameter.

2) In $d$-dimensional hypercubic lattice, among the most distant pairs of vertices is again $(0, \dots, 0)$ and $(L, \dots, L)$. By the same argument as above, the diameter is $dL$.

3) In the first step we can reach $k$ vertices, in each other step we cannot go back so there are $(k-1)$ newly reachable neighbours. So for $d$ steps from central vertex we obtain $n = k(k-1)^{d-1}$ vertices. For such network, the diameter is $2d$ (e.g. from leftmost leaf to the rightmost one). Hence we want to express $2d$ using $n = k(k-1)^{d-1}$, which is rouhly $2\log_k n$.

4) In hypercubic lattice of dimension $d$ there are $(L+1)^d$ vertices and diameter is $dL$, which leads to diameter being about $\sqrt[d]{n}$, which is more than $\log n$. The Cayley tree has diameter growing approximately as the logarithm of $n$.

Task 5: Compute the average shortest path length of a path of length $n$.

Task 6: Let us have a binary tree with $d$ levels, where the vertices on the last level are always connected with their siblings. Express the clustering coefficient of such graph using $d$.