Degree Correlation - socrateslab/zh GitHub Wiki
Lazaros K. Gallos, Chaoming Song, Hernán A. Makse, 2008. Scaling of Degree Correlations and Its Influence on Diffusion in Scale-Free Networks. Physical Review Letters 100(24): 248701. DOI: 10.1103/PhysRevLett.100.248701 [1]
Hernán A Makse http://www-levich.engr.ccny.cuny.edu/webpage/hmakse/software-and-data/
Internet data: The DIMES project Internet As data: http://data.caida.org/datasets/as-relationships/serial-1/
Song, Chaoming. 2011. Self-similarity and Scaling Theory of Complex Networks. ProQuest. Google Books link
PPT: File:Degree correlations. Gallos_CT.pdf
<math>P(k_1, k_2) </math> denotes the probability that two nodes of degrees <math>k_1</math> and <math>k_2</math> are connected to each other. If there is no degree correlation: <math>P(k_1, k_2) \sim k_1 P(k_1) k_2 P(k_2) </math>
For scale-free networks: <math>P(k_1, k_2) \sim k_1 P(k_1) k_2 P(k_2) \sim k_1^{1-\gamma} k_2^{1-\gamma} </math>
<math>\int P(k_1, k_2) d k_2 = k_1 P(k_1)</math>
Both sides of this equation express the same quantity: the probability weight of all edges attached to nodes of degree <math>k_1</math>.
The edge-density conservation law says that the edge density comes out the same whether it is computed from the degree distribution or from the joint degree distribution.
For a scale-free network, <math>P(k) \sim k^{-\gamma}</math>, so edge-density conservation can be written as:
<math>\int P(k_1, k_2) d k_2 = k_1 P(k_1) \sim k_1^{-(\gamma -1) }</math>
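This conservation holds exactly in any finite graph: summing the joint edge-degree counts over <math>k_2</math> must give <math>k_1 N(k_1)</math>, the number of edge endpoints sitting on degree-<math>k_1</math> nodes. A minimal pure-Python sketch on a hypothetical toy graph:

```python
from collections import Counter

# Hypothetical toy graph as an edge list
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]

deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

# N(k1, k2): ordered endpoint-degree pairs, one per edge direction
joint = Counter()
for u, v in edges:
    joint[(deg[u], deg[v])] += 1
    joint[(deg[v], deg[u])] += 1

n_k = Counter(deg.values())   # N(k): number of nodes of degree k

# "Integrating" the joint counts over k2 must give k1 * N(k1)
for k1 in n_k:
    total = sum(c for (a, _), c in joint.items() if a == k1)
    assert total == k1 * n_k[k1]
print("edge-density conservation verified")
```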
The statistical similarity of the corresponding plots suggests the invariance of <math>P(k_1, k_2)</math>. Accordingly, this suggests that the <math>k_1</math> and <math>k_2</math> dependence can be separated, and the behavior of the tail of the joint degree distribution is
<math>P(k_1, k_2) \sim k_1^{-(\gamma -1 )} k_2^{-\epsilon}</math> (1)
For completely random networks:
<math>P(k_1, k_2) \sim k_1 P(k_1) k_2 P(k_2) \sim k_1^{1-\gamma} k_2^{1-\gamma}</math> (2)
Using Eq. (1), we can see that:
- For low-degree nodes (<math>k_1 < k_2</math>), integrating over <math>k_2</math>, we retrieve the <math>k^{1-\gamma}</math> dependence.
- For the case of hubs, where integration is over <math>k_1</math>, the dependence on the degree is <math>k^{-\epsilon}</math>.
Maslov, S. & Sneppen, K. 2002. Specificity and stability in topology of protein networks. Science[2]
The protein interaction network is a representative of the broad class of scale-free networks (4–6) in which the number of nodes with a given number of neighbors (connectivity) <math>K</math> scales as a power law <math>K^{-\gamma}</math>. In our case the histogram of connectivities can be fitted by a power law with <math>\gamma = 2.5 \pm 0.3</math> for <math>K</math> ranging from 2 to about 100 (7, 8).
- <math>\gamma = 2.5 \pm 0.3</math>
- <math>\gamma -1 = 1.5 \pm 0.3</math>
There are many ways to measure degree correlations, such as Newman's degree-correlation coefficient and <math>k_{nn}</math>, but unfortunately they are not invariant under renormalization. <math>E_b(k)</math> does remain invariant under renormalization (though only for scale-free networks).
<math>E_b(k) \equiv \frac{\int_{bk}^{\infty} P(k | k') dk'}{ \int_{bk}^{\infty} P( k') dk' } =\frac{ \int_{bk}^{\infty} \frac{P(k, k') }{k' P(k') } dk'} {\int_{bk}^{\infty} P( k') dk'}</math>
- The numerator is the fraction of links connecting a degree-<math>k</math> node to a node of degree greater than <math>bk</math>;
- the denominator is the fraction of nodes with degree greater than <math>bk</math>.
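As a minimal sketch of this definition, with discrete sums standing in for the integrals, on a hypothetical toy graph (a star whose hub has degree 4, plus one extra edge):

```python
from collections import Counter

# Hypothetical toy graph: star 0-(1,2,3,4) plus the extra edge (1, 2)
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]

deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

n_nodes, n_edges = len(deg), len(edges)
degree_count = Counter(deg.values())            # N(k)

def P(k):
    return degree_count[k] / n_nodes            # P(k)

def eb(k, b=3):
    """E_b(k) straight from the definition: the numerator sums
    P(k, k') / (k' P(k')) over k' >= b*k, the denominator sums P(k')."""
    joint = Counter()
    for u, v in edges:
        k1, k2 = sorted((deg[u], deg[v]))       # store each link as (k, k'), k <= k'
        joint[(k1, k2)] += 1
    p1 = sum((joint[(k, kp)] / n_edges) / (kp * P(kp))
             for kp in degree_count if kp >= b * k)
    p2 = sum(P(kp) for kp in degree_count if kp >= b * k)
    return p1 / p2 if p2 > 0 else None

print(eb(1))    # degree-1 nodes always attach to the degree-4 hub
```

The full implementation on a real network appears further below; this toy version only makes the numerator and denominator of the definition concrete.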
Which of the following two derivations is correct?
Derivation 1: <math>P(k | k') = \frac{P(k, k') }{\int P(k, k') dk} = \frac{k^{-(\gamma-1)} k'^{-\epsilon}}{k'^{1-\gamma}} = k^{-(\gamma -1)} k'^{-(1+\epsilon - \gamma)}</math>
A degree-<math>k</math> node exists with probability <math>P(k)</math>, so the probability that one end of a link attaches to a degree-<math>k</math> node is <math>kP(k)</math>. In other words, <math>P(k, k')</math> is the probability that a link exists, while <math>P(k)</math> is the probability that a degree-<math>k</math> node exists; the latter cannot simply be placed in the denominator. What belongs in the denominator is the probability that a link endpoint has the given degree, which means multiplying by the degree.
Derivation 2: <math>P(k | k') = \frac{P(k, k') }{P(k')} = \frac{k^{-(\gamma-1)} k'^{-\epsilon}}{k'^{-\gamma}} = k^{-(\gamma -1)} k'^{-\epsilon + \gamma}</math>
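A quick numeric spot-check confirms that each reduction is at least algebraically consistent (it does not decide which normalization is the right conditional probability). The values of <math>\gamma</math>, <math>\epsilon</math>, <math>k</math>, <math>k'</math> below are arbitrary hypothetical choices, not taken from data:

```python
# Arbitrary hypothetical values, not fitted to any network
gamma, eps = 2.5, 0.9
k, kp = 4.0, 30.0

joint = k ** -(gamma - 1) * kp ** -eps        # P(k, k') up to a constant

# Derivation 1 divides by k' P(k') ~ k'^(1 - gamma)
d1 = joint / kp ** (1 - gamma)
assert abs(d1 - k ** -(gamma - 1) * kp ** -(1 + eps - gamma)) < 1e-12

# Derivation 2 divides by P(k') ~ k'^(-gamma)
d2 = joint / kp ** -gamma
assert abs(d2 - k ** -(gamma - 1) * kp ** (gamma - eps)) < 1e-12

print("both reductions are algebraically consistent")
```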
- Write each link as a pair <math>(k, k')</math>
- ordered so that <math>k' > k</math>
- Choose <math>b = 3</math>
- Compute:
    - For each <math>(k, k')</math>, count the number of links <math>N_{kk'}</math> with this <math>k</math> and <math>k' > 3k</math>
    - Compute <math>P(k, k') = \frac{N_{kk'}}{Number\;of\;edges}</math>
    - For each degree <math>k'</math> with <math>k' > 3k</math>, compute <math>P(k')</math> and <math>k' P(k')</math>
    - Compute <math>P_1 = \sum \frac{P(k, k')}{k' P(k')}</math> and <math>P_2 = \sum P(k')</math>
    - Compute <math>E_b (k) = P_1/P_2 = \sum \frac{P(k, k')}{k' P(k')} / \sum P(k')</math>
```python
%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt
from collections import defaultdict

def ebkk(g, b):
    edge_dict = defaultdict(lambda: defaultdict(int))
    degree_dict = defaultdict(int)
    # degree pair (smaller, larger) for every edge
    edge_degree = [sorted((g.degree(u), g.degree(v))) for u, v in g.edges()]
    for k1, k2 in edge_degree:
        edge_dict[k1][k2] += 1
    # N(k): number of nodes of each degree
    for _, d in g.degree():
        degree_dict[d] += 1
    edge_number = g.number_of_edges()
    node_number = g.number_of_nodes()
    ebks, ks = [], []
    for k1 in edge_dict:
        p1, p2 = 0, 0
        for k2 in edge_dict[k1]:
            if k2 >= b * k1:
                pkk = edge_dict[k1][k2] / edge_number   # P(k1, k2)
                pk2 = degree_dict[k2] / node_number     # P(k2)
                p1 += pkk / (k2 * pk2)
                p2 += pk2
        if p2 > 0:
            ebks.append(p1 / p2)
            ks.append(k1)
    return ebks, ks

ca = nx.Graph()
with open('/Users/chengjun/bigdata/ca-CondMat.txt') as f:
    for line in f:
        if line[0] != '#':
            x, y = line.strip().split('\t')
            ca.add_edge(x, y)
print(ca)

ebk, k = ebkk(ca, b=3)
plt.plot(k, ebk, 'r^')
plt.xlabel(r'$k$', fontsize=16)
plt.ylabel(r'$E_b(k)$', fontsize=16)
plt.xscale('log')
plt.yscale('log')
plt.show()
```
- http://snap.stanford.edu/data/ca-CondMat.html
Rather than following the authors' procedure, compute directly, for each degree <math>k</math>, the fraction of links incident to degree-<math>k</math> nodes whose other endpoint has degree at least <math>bk</math>; that fraction is the raw <math>E_b(k)</math>. To damp fluctuations in the data, normalize by <math>\sum_{bk}^{\infty} N(k')</math>.
```python
import numpy as np
from collections import defaultdict

def ebw(g, b):
    edge_degree = [sorted((g.degree(i), g.degree(j))) for i, j in g.edges()]
    degree_dict = defaultdict(int)
    for _, d in g.degree():
        degree_dict[d] += 1
    ebks, ks = [], []
    for k in degree_dict:
        ek, ekk, k_node_num = 0, 0, 0
        for k1, k2 in edge_degree:
            if k1 == k or k2 == k:
                ek += 1                 # links touching a degree-k node
            if k1 == k and k2 >= b * k:
                ekk += 1                # ... whose other end has degree >= b*k
        ebk = ekk / ek
        # normalization: number of nodes with degree >= b*k
        for ki in degree_dict:
            if ki >= b * k:
                k_node_num += degree_dict[ki]
        # k_node_num = degree_dict[k]
        if ebk != 0:
            ebks.append(ebk / k_node_num)
            ks.append(k)
    return ebks, ks

def log_binning(x, y, bin_count=35):
    max_x = np.log10(max(x))
    max_y = np.log10(max(y))
    max_base = max([max_x, max_y])
    xx = [i for i in x if i > 0]
    min_x = np.log10(np.min(xx))
    bins = np.logspace(min_x, max_base, num=bin_count)
    # mean of y (and of x) within each logarithmic bin
    bin_means_y = np.histogram(x, bins, weights=y)[0] / np.histogram(x, bins)[0]
    bin_means_x = np.histogram(x, bins, weights=x)[0] / np.histogram(x, bins)[0]
    return bin_means_x, bin_means_y
```
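The log-binning step can be exercised on its own. The `x` and `y` values below are synthetic hypothetical data, chosen only to show how the log-spaced bin averages come out:

```python
import numpy as np

# Synthetic, hypothetical data: x spans three decades, y ~ 1/x
x = np.array([1, 2, 3, 10, 20, 30, 100, 200, 300], dtype=float)
y = 1.0 / x

# Three logarithmically spaced bins, roughly one per decade
bins = np.logspace(np.log10(x.min()), np.log10(x.max()), num=4)
counts, _ = np.histogram(x, bins)
# Mean of y (and of x) within each bin -- the same trick log_binning uses
by = np.histogram(x, bins, weights=y)[0] / counts
bx = np.histogram(x, bins, weights=x)[0] / counts

print(counts)   # each bin catches three points
```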