[关闭]
@jfruan 2017-04-28T08:20:54.000000Z 字数 6849 阅读 1272

Graph Measures

Network_Science


1. Vertex Measures

1.1 Degree Centrality

Degree is a simple centrality measure that counts how many neighbors a node has. If the network is directed, we have two versions of the measure: in-degree is the number of in-coming links, or the number of predecessor nodes; out-degree is the number of out-going links, or the number of successor nodes. Typically, we are interested in in-degree, since in-links are given by other nodes in the network, while out-links are determined by the node itself. Degree centrality thesis reads as follows:
A node is important if it has many neighbors, or, in the directed case, if there are many other nodes that link to it, or if it links to many other nodes.

Let be the adjacency matrix of a directed graph. The in-degree centrality of node is given by:

or in matrix form ( is a vector with all components equal to unity):
The out-degree centrality of node is given by:
or in matrix form:

The built-in function degree(R, C) computes degree centrality.

A user-defined function degree.centrality follows:

  1. # Degree centrality
  2. # INPUT
  3. # g = graph
  4. # mode = degree mode ("in" for in-degree, "out" for out-degree, "all" for total degree)
  5. degree.centrality = function(g, mode) {
  6. A = get.adjacency(g);
  7. o = rep(1, n);
  8. if (mode == "in") c = o %*% A else
  9. if (mode == "out") c = A %*% o else
  10. c = o %*% (A + t(A));
  11. return(as.vector(c));
  12. }

A graph with nodes labelled with their in-degree centrality follows:

Next is the same graph with nodes labelled with their out-degree centrality:

1.2 Betweenness Centrality

Betweenness centrality measures the extent to which a vertex lies on paths between other vertices. Vertices with high betweenness may have considerable influence within a network by virtue of their control over information passing between others. They are also the ones whose removal from the network will most disrupt communications between other vertices because they lie on the largest number of paths taken by messages.

Mathematically, let be the number of geodesic paths from to that pass through and let be the total number of geodesic paths from to . Recall that a geodesic path is not necessarily unique and the geodesic paths between a pair of vertices need not be node-independent, meaning they may pass through some of the same vertices. Then the betweenness centrality of vertex is:

where by convention the ratio if . Notice that each pair of vertex contribute to the sum for with a weight between 0 and 1 expressing the betweenness of with respect to the pair . Observe that:

It makes little difference in practice to consider the alternative definitions, since one is usually concerned only with the relative magnitudes of the centralities and not with their absolute values. The sum can be normalized by dividing by the total number of ordered pairs of nodes, which is , so that betweenness lies strictly between 0 and 1.

Consider the following graph:

From node 0 to node 4 there are two geodesics, both going through node 2. Hence

Consider once again the following simple network:

Function betweenness (R, C) computes betweenness centrality (the function counts undirected paths in only one direction and computes the sum in the betweenness formula for , and ). This is the same network with nodes labelled with their betweenness centrality:

Notice that our sample graph is in fact a tree, that is a connected acyclic undirected graph, hence the number of geodesics from to is always 1, and the number of geodesics from to that pass through is either 0 or 1. Hence the computed betweenness score for a vertex is the actual number of distinct paths that strictly contain the vertex in between.

Betweenness centrality differs from the other centrality measures. A vertex can have quite low degree, be connected to others that have low degree, even be a long way from others on average, and still have high betweenness. Consider a vertex A that lies on a bridge between two groups of vertices within a network. Since any path between nodes in different groups must go through this bridge, node A acquires high betweenness even though it is not well connected (it lies at the periphery of both groups) and hence it might not have particularly high values for degree, eigenvector, and closeness centrality. Vertices in roles like this are sometimes referred to as brokers.

The maximum possible value for betweenness occurs for the central node of a star graph, a network composed of a vertex attached to other vertices, whose only connection is with the central node. All paths, except the paths from the peripheral vertices to themselves, go through the central vertex, hence its betweenness is . At the other end of the scale, the smallest possible value for betweenness occurs for a leaf node in a graph with a single component, that is a node that is connected to the rest of the network with only one edge. The leaf vertex lies on every path that starts or ends with itself. These are in total: paths from a vertex to others, from others to the vertex, and 1 from the vertex to itself.

Betweenness centrality values are typically distributed over a wide range. Taking again the example of the film actor network, the individual with highest betweenness in the largest component of the network is Fernando Rey (The French Connection). It is no coincidence that he appeared in both European and American films, played roles in several languages, and worked in both film and television, hence he is the archetypal broker. Rey has a betweenness score of , while the lowest score of any actor in the large component is . Thus there is a ratio of almost a thousand between the two limits, much larger than the ratio of 3.6 we saw in the case of closeness. One consequence is that there are clear winners and losers in the betweenness centrality competition. The second highest betweenness score is attributed to Christopher Lee again, with , a difference from the winner.

Betweenness centrality, as defined above, is a measure of information control assuming two important hypothesis: (i) every pair of vertices exchange information with equal probability, and (ii) information flows along the geodesic (shortest) path between two vertices, or one of such path, chosen at random, if there are several. However, information not always takes the shortest route. In social network, for instance, a news about a friend of us might not come directly from the friend but from another mutual friend.

Calculating closeness and betweenness centrality for all nodes in a graph involves computing the (unweighted) shortest paths between all pairs of nodes in the graph. A breath-first visit from a source node can be used to compute all shortest paths from the source in time , where and are the number of nodes and edges of the graph. If this visit is repeated for all the source nodes of the graph the cost amounts to . Typically, real networks are sparse graphs, meaning the number of edges is of the order of the number of nodes. For instance, it is very rare in a social network that a significant number of actors have contacts with all the other actors of the network. Hence the overall cost in practical cases is quadratic.

添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注