@hainingwyx 2017-06-15

# Cluster Validity Indices

Clustering

## External indices

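### Purity

Purity is easy to compute and takes values between 0 and 1: a poor clustering scores close to 0 and a perfect clustering scores 1. Its weakness is just as clear: it cannot penalize degenerate clusterings. If an algorithm puts every document into its own cluster, each document counts as correctly assigned and purity equals 1, which is obviously not the result we want.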
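The degenerate case described above can be checked numerically. The sketch below assumes the common definition of purity, $\frac{1}{N}\sum_k \max_j |\omega_k \cap c_j|$ (each cluster votes for its majority class); the label vectors `truth` and `pred` are made-up toy data, not from the original post.

```matlab
% Toy labels (hypothetical data for illustration only)
truth = [1 1 1 1 2 2 2 3 3 3];   % ground-truth classes
pred  = [1 1 1 2 2 2 2 3 3 1];   % cluster assignments

% Contingency table: rows = clusters, columns = classes
m = accumarray([pred(:) truth(:)], 1);

% Purity: sum of each cluster's majority-class count, divided by N
purity = sum(max(m, [], 2)) / numel(truth);

% Degenerate clustering: every sample is its own cluster
degenerate = 1:numel(truth);
m_deg = accumarray([degenerate(:) truth(:)], 1);
purity_deg = sum(max(m_deg, [], 2)) / numel(truth);

fprintf('purity = %.2f, degenerate purity = %.2f\n', purity, purity_deg);
```

On this toy data the first clustering gets a purity of 0.80, while the one-sample-per-cluster labeling gets 1.00 even though it carries no grouping information.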

### Mutual information
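As a rough illustration of this index, the sketch below computes the mutual information between a clustering and the true labels from their contingency table. It also computes a normalized variant that divides by $\sqrt{H(U)H(V)}$; that normalization is only one of several in use and is an assumption here, as are the toy label vectors.

```matlab
% Toy labels (hypothetical data for illustration only)
truth = [1 1 1 1 2 2 2 3 3 3];
pred  = [1 1 1 2 2 2 2 3 3 1];
n = numel(truth);

pxy = accumarray([pred(:) truth(:)], 1) / n;  % joint distribution
px  = sum(pxy, 2);                            % cluster marginals (column)
py  = sum(pxy, 1);                            % class marginals (row)

pp = px * py;       % product of the marginals
nz = pxy > 0;       % skip empty cells, treating 0*log(0) as 0
mi = sum(pxy(nz) .* log(pxy(nz) ./ pp(nz)));

% Entropies of the two labelings
hx = -sum(px(px > 0) .* log(px(px > 0)));
hy = -sum(py(py > 0) .* log(py(py > 0)));

% Normalized mutual information (geometric-mean normalization, assumed)
nmi = mi / sqrt(hx * hy);

fprintf('MI = %.4f nats, NMI = %.4f\n', mi, nmi);
```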

MATLAB code for the adjusted Rand index:

    function adjrand = adjrandindex(u,v)
    % function adjrand = adjrandindex(u,v)
    %
    % Computes the adjusted Rand index to assess the quality of a clustering.
    % A random clustering has an expected score of 0 (the index can go
    % negative), perfect clustering returns the maximum score of 1.
    %
    % INPUTS
    % u = the labeling as predicted by a clustering algorithm
    % v = the true labeling
    %
    % OUTPUTS
    % adjrand = the adjusted Rand index
    %
    % Author: Tijl De Bie, february 2003.

    n = length(u);
    ku = max(u);
    kv = max(v);

    % Contingency table between the two labelings
    m = zeros(ku,kv);
    for i = 1:n
        m(u(i),v(i)) = m(u(i),v(i)) + 1;
    end
    mu = sum(m,2);
    mv = sum(m,1);

    % Pairs placed together in both labelings
    a = 0;
    for i = 1:ku
        for j = 1:kv
            if m(i,j) > 1
                a = a + nchoosek(m(i,j),2);
            end
        end
    end

    % Pairs placed together in u (b1) and in v (b2)
    b1 = 0;
    b2 = 0;
    for i = 1:ku
        if mu(i) > 1
            b1 = b1 + nchoosek(mu(i),2);
        end
    end
    for i = 1:kv
        if mv(i) > 1
            b2 = b2 + nchoosek(mv(i),2);
        end
    end

    c = nchoosek(n,2);
    adjrand = (a - b1*b2/c) / (0.5*(b1+b2) - b1*b2/c);
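To see the quantities `a`, `b1`, `b2` and `c` on concrete numbers, here is a short vectorized sketch of the same formula written as a plain script. The label vectors are toy data and the helper `comb2` is a name introduced here for illustration, not part of the function above.

```matlab
% Toy labels (hypothetical data for illustration only)
u = [1 1 1 2 2 2 2 3 3 1];   % predicted clusters
v = [1 1 1 1 2 2 2 3 3 3];   % true classes

comb2 = @(x) x .* (x - 1) / 2;       % "x choose 2", elementwise

m  = accumarray([u(:) v(:)], 1);     % contingency table
a  = sum(sum(comb2(m)));             % pairs together in both labelings
b1 = sum(comb2(sum(m, 2)));          % pairs together in u
b2 = sum(comb2(sum(m, 1)));          % pairs together in v
c  = comb2(numel(u));                % all pairs

ari = (a - b1*b2/c) / (0.5*(b1 + b2) - b1*b2/c);
fprintf('a = %d, b1 = %d, b2 = %d, c = %d, ARI = %.4f\n', a, b1, b2, c, ari);
```

For these labels the counts are a = 7, b1 = 13, b2 = 12 and c = 45, giving an adjusted Rand index of about 0.3911.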

### F-measure

$F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
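One way to obtain precision and recall for a clustering is to count pairs of samples. A pair is a true positive if both labelings put it together, a false positive if only the clustering does, and a false negative if only the ground truth does. This pairwise convention is an assumption (the formula above does not say how precision and recall are obtained), and the labels are toy data.

```matlab
% Toy labels (hypothetical data for illustration only)
truth = [1 1 1 1 2 2 2 3 3 3];
pred  = [1 1 1 2 2 2 2 3 3 1];
n = numel(truth);

% Consider each unordered pair once (i < j)
mask = triu(ones(n), 1) == 1;
same_pred  = bsxfun(@eq, pred(:),  pred(:)')  & mask;
same_truth = bsxfun(@eq, truth(:), truth(:)') & mask;

tp = nnz(same_pred & same_truth);    % together in both
fp = nnz(same_pred & ~same_truth);   % together only in the clustering
fn = nnz(~same_pred & same_truth);   % together only in the ground truth

precision = tp / (tp + fp);
recall    = tp / (tp + fn);
F = 2 * precision * recall / (precision + recall);

fprintf('P = %.4f, R = %.4f, F = %.4f\n', precision, recall, F);
```

Here tp = 7, fp = 6 and fn = 5, so the precision is 7/13, the recall is 7/12 and F = 0.56.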

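## Internal indices

$\mu$ denotes the centroid of cluster $C$, $\mathrm{avg}(C)$ is the average distance between samples within cluster $C$, $\mathrm{diam}(C)$ is the largest distance between samples within $C$, $d_{\min}(C_i, C_j)$ is the distance between the closest samples of clusters $C_i$ and $C_j$, and $d_{\mathrm{cen}}(C_i, C_j)$ is the distance between the centroids of $C_i$ and $C_j$.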
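Written out, these four quantities usually take the following form. The notation $\mathrm{dist}(\cdot,\cdot)$ for the distance between two samples is introduced here as an assumption, and $\mu_i$ is the centroid of $C_i$, that is $\mu = \frac{1}{|C|}\sum_{x \in C} x$.

$$\mathrm{avg}(C) = \frac{2}{|C|(|C|-1)} \sum_{1 \le i < j \le |C|} \mathrm{dist}(x_i, x_j)$$

$$\mathrm{diam}(C) = \max_{1 \le i < j \le |C|} \mathrm{dist}(x_i, x_j)$$

$$d_{\min}(C_i, C_j) = \min_{x \in C_i,\, z \in C_j} \mathrm{dist}(x, z)$$

$$d_{\mathrm{cen}}(C_i, C_j) = \mathrm{dist}(\mu_i, \mu_j)$$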

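### Dunn index (DI)

    function [DB, Dunn] = valid_DbDunn(cintra, cinter, k)
    % Davies-Bouldin and Dunn indices.
    % Assumed inputs: cintra(i) is the within-cluster scatter of cluster i,
    % cinter(i,j) is the distance between clusters i and j, k is the
    % number of clusters.

    % Davies-Bouldin index
    R = zeros(k);
    dbs = zeros(1,k);
    for i = 1:k
        for j = i+1:k
            if cinter(i,j) == 0
                R(i,j) = 0;
            else
                R(i,j) = (cintra(i) + cintra(j)) / cinter(i,j);
            end
            R(j,i) = R(i,j);  % mirror, so each cluster is compared with every other one
        end
        dbs(i) = max(R(i,:));
    end
    DB = mean(dbs);  % average over all k clusters

    % Dunn index: smallest between-cluster distance divided by the
    % largest within-cluster scatter
    R = cinter / max(cintra);
    dmin = zeros(1,k-1);
    for i = 1:k-1
        dmin(i) = min(R(i,i+1:k));
    end
    Dunn = min(dmin);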
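Both indices can also be computed directly from the quantities defined at the start of this section. The sketch below assumes the usual definitions, $DBI = \frac{1}{k}\sum_i \max_{j \ne i} \frac{avg(C_i) + avg(C_j)}{d_{cen}(C_i, C_j)}$ and $DI = \min_{i \ne j} \frac{d_{min}(C_i, C_j)}{\max_l diam(C_l)}$, with Euclidean distance. The data points are made up, and every cluster needs at least two samples for the averages to be defined.

```matlab
% Toy 2-D data with two well separated clusters (hypothetical)
X = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
labels = [1 1 1 2 2 2];
k = max(labels);

% Pairwise Euclidean distances without any toolbox
sq = sum(X.^2, 2);
D = sqrt(max(bsxfun(@plus, sq, sq') - 2*(X*X'), 0));

avgd = zeros(1, k);            % avg(C): mean within-cluster distance
diam = zeros(1, k);            % diam(C): largest within-cluster distance
mu   = zeros(k, size(X, 2));   % cluster centroids
for c = 1:k
    idx = find(labels == c);
    Dc = D(idx, idx);
    pairs = triu(ones(numel(idx)), 1) == 1;   % each pair once
    avgd(c) = mean(Dc(pairs));
    diam(c) = max(Dc(pairs));
    mu(c, :) = mean(X(idx, :), 1);
end

dbi_terms = zeros(1, k);
di = inf;
for i = 1:k
    for j = [1:i-1, i+1:k]
        dcen = norm(mu(i, :) - mu(j, :));                  % d_cen(C_i, C_j)
        dbi_terms(i) = max(dbi_terms(i), (avgd(i) + avgd(j)) / dcen);
        dmin = min(min(D(labels == i, labels == j)));      % d_min(C_i, C_j)
        di = min(di, dmin / max(diam));
    end
end
dbi = mean(dbi_terms);

fprintf('DBI = %.4f (smaller is better), DI = %.4f (larger is better)\n', dbi, di);
```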

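### Silhouette

MATLAB provides a built-in function for this: `S = silhouette(X, CLUST)`, where `X` is the $N \times M$ data matrix, `CLUST` holds the cluster assignments, and `S` is the $N \times 1$ vector of silhouette values.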
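For readers without the toolbox, the silhouette value can be sketched by hand. The code below assumes the usual definition $s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$, where $a(i)$ is the mean distance from sample $i$ to the rest of its own cluster and $b(i)$ is the smallest mean distance to any other cluster. It uses plain Euclidean distance on toy data; the built-in function may use a different default distance, so its numbers need not match.

```matlab
% Toy data (hypothetical); every cluster has at least two samples
X = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
clust = [1 1 1 2 2 2]';
n = size(X, 1);
k = max(clust);

% Pairwise Euclidean distances
sq = sum(X.^2, 2);
D = sqrt(max(bsxfun(@plus, sq, sq') - 2*(X*X'), 0));

S = zeros(n, 1);
for i = 1:n
    own = (clust == clust(i));
    own(i) = false;                  % exclude the sample itself
    a = mean(D(i, own));             % cohesion within its own cluster
    b = inf;
    for c = setdiff(1:k, clust(i))   % nearest other cluster
        b = min(b, mean(D(i, clust == c)));
    end
    S(i) = (b - a) / max(a, b);
end

disp(S');
```

Values close to 1 indicate samples that sit well inside their cluster; values near 0 or below indicate samples on or across a cluster boundary.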

### Modularity

MATLAB code:

    function Q = modularity(W,clu)
    % Calculate the modularity function Q of a clustering result.
    %
    % modularity(W,clu) calculates the modularity of a clustering result
    % represented by vector clu on the graph with adjacency matrix W.
    %
    % Author: Kevin Xu
    % Comments added by wyx:
    % Q = 1/(2m) * sum_ij (S_ij - d_i*d_j/(2m)) * delta_ij
    % sum_all_edges = 2m

    k = max(clu);
    Q = 0;
    sum_all_edges = full(sum(sum(W)));
    for clust = 1:k
        clu_nodes = (clu == clust);
        % Edge weight inside the cluster (each edge counted twice)
        assoc = full(sum(sum(W(clu_nodes,clu_nodes))));
        % Total degree of the cluster's nodes
        deg = full(sum(sum(W(clu_nodes,:))));
        Q = Q + assoc/sum_all_edges - (deg/sum_all_edges)^2;
    end
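A small worked example helps to check the formula by hand. The script below applies the same per-cluster computation to a made-up graph of two triangles joined by a single edge; the graph and the cluster vector are assumptions chosen for illustration.

```matlab
% Two triangles (1-2-3 and 4-5-6) joined by the edge 3-4 (hypothetical graph)
edges = [1 2; 1 3; 2 3; 4 5; 4 6; 5 6; 3 4];
W = zeros(6);
for e = 1:size(edges, 1)
    W(edges(e, 1), edges(e, 2)) = 1;
    W(edges(e, 2), edges(e, 1)) = 1;   % undirected: keep W symmetric
end
clu = [1 1 1 2 2 2];                   % one cluster per triangle

two_m = sum(W(:));                     % 2m = 14 for the 7 edges
Q = 0;
for c = 1:max(clu)
    in_c = (clu == c);
    assoc = sum(sum(W(in_c, in_c)));   % 6: three internal edges, counted twice
    deg = sum(sum(W(in_c, :)));        % 7: total degree of the cluster
    Q = Q + assoc/two_m - (deg/two_m)^2;
end

fprintf('Q = %.4f\n', Q);
```

Each cluster contributes 6/14 - (7/14)^2, so Q = 5/14, roughly 0.3571. Putting all six nodes into a single cluster would give Q = 0.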

### Conductance

cond = cutcond(A,s) returns the sum of degrees of vertices in A
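As a hedged illustration of the index itself, the sketch below computes conductance by hand under the common definition $\phi(S) = \frac{cut(S, \bar{S})}{\min(vol(S), vol(\bar{S}))}$, where $vol$ is the sum of degrees. Whether `cutcond` uses exactly this normalization is not confirmed here; the graph and the vertex set `s` are made up.

```matlab
% Two triangles (1-2-3 and 4-5-6) joined by the edge 3-4 (hypothetical graph)
edges = [1 2; 1 3; 2 3; 4 5; 4 6; 5 6; 3 4];
A = zeros(6);
for e = 1:size(edges, 1)
    A(edges(e, 1), edges(e, 2)) = 1;
    A(edges(e, 2), edges(e, 1)) = 1;
end

s = [1 2 3];                 % vertex set whose cut is evaluated
in_s = false(1, 6);
in_s(s) = true;

cut      = sum(sum(A(in_s, ~in_s)));   % edges leaving the set
vol_s    = sum(sum(A(in_s, :)));       % sum of degrees inside the set
vol_rest = sum(sum(A(~in_s, :)));      % sum of degrees outside the set

phi = cut / min(vol_s, vol_rest);
fprintf('cut = %d, vol(S) = %d, conductance = %.4f\n', cut, vol_s, phi);
```

Only one edge leaves the first triangle and both sides have volume 7, so the conductance is 1/7; a low value indicates a well separated cluster.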
