@lunar 2016-03-22T22:51:57.000000Z Words: 4099 Reads: 1101

Revision of Machine Learning Course @Paris

MachineLearning


Final exam on 24/03, so this is some last-minute cramming.

General Concepts

Key words to know

Algorithms to know in detail

ACP (PCA, principal component analysis)

Steps

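  1. Calculate the center (the column means) of the data set.
  2. Centering: subtract the center from each observation to get the centered matrix $X_c$.
  3. Calculate the variance/covariance matrix:

    $V = \frac{1}{n} X_c^{T} X_c$, where $X_c^{T}$ is the transposition of $X_c$ and $n$ is the number of observations.
  4. Calculate the eigenvalues $\lambda$ and their eigenvectors $u$ by solving the equations:

    $\det(V - \lambda I) = 0$ and $V u = \lambda u$.
  5. Calculate the new coordinates of the points by $X_c U$, where the columns of $U$ are the eigenvectors.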

R

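    # data: numeric matrix or data frame, one observation per row
    Xc <- scale(data, center = TRUE, scale = FALSE)  # center the columns
    n <- nrow(Xc)                                    # number of observations
    Mcov <- (1 / n) * t(Xc) %*% Xc                   # variance/covariance matrix
    pc <- eigen(Mcov)                                # eigenvalues and eigenvectors
    Xc %*% pc$vectors                                # new coordinates
    # Use FactoMineR (note: PCA() also scales to unit variance by default)
    library(FactoMineR)
    res.pca <- PCA(data, graph = FALSE, axes = c(1, 2))
    summary(res.pca)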
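
As a sanity check, the manual steps can be compared with base R's `prcomp`. The sketch below assumes the built-in `USArrests` data set as a stand-in for `data` (it is not part of the course material); `scores`, `explained`, `same_axes` and `same_var` are names chosen here for illustration.

```r
# Assumption: USArrests (shipped with R) stands in for the course data.
data <- as.matrix(USArrests)
n <- nrow(data)

# Steps 1-2: center each column (no rescaling).
Xc <- scale(data, center = TRUE, scale = FALSE)

# Step 3: variance/covariance matrix with the 1/n convention.
Mcov <- (1 / n) * t(Xc) %*% Xc

# Step 4: eigenvalues (variance along each axis) and eigenvectors (the axes).
pc <- eigen(Mcov)

# Step 5: coordinates of the observations on the principal axes.
scores <- Xc %*% pc$vectors

# Share of the total variance carried by each axis.
explained <- pc$values / sum(pc$values)

# prcomp divides by n - 1 instead of n, so its variances differ by a factor
# n / (n - 1); the axes themselves agree up to sign.
ref <- prcomp(data, center = TRUE, scale. = FALSE)
same_axes <- all.equal(abs(unname(pc$vectors)), abs(unname(ref$rotation)))
same_var <- all.equal(pc$values * n / (n - 1), unname(ref$sdev)^2)
```

If both `same_axes` and `same_var` come out `TRUE`, the hand-rolled version and `prcomp` describe the same axes; only the variance convention differs.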

K-means

Steps

  1. Randomly split the data set into K subsets.
  2. Calculate the center of gravity of each subset.
  3. Calculate the distance between each point and each center, and assign the point to the class whose center is closest.
  4. Repeat 2 to 3 until no assignment changes.
  5. Optionally repeat 1 to 4 several times and keep the result with the lowest total within-cluster variance.
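
The steps above can be sketched from scratch in a few lines of R. This is only an illustration under stated assumptions: `simple_kmeans` is a hypothetical helper (not a base R function), the data are simulated, and the sketch simply stops if a cluster ever becomes empty.

```r
# Hypothetical helper illustrating steps 1-4; not part of base R.
simple_kmeans <- function(X, K, max_iter = 100) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Step 1: random split into K subsets of (almost) equal size.
  cluster <- sample(rep_len(seq_len(K), n))
  for (iter in seq_len(max_iter)) {
    if (any(tabulate(cluster, K) == 0)) stop("empty cluster; try another start")
    # Step 2: center of gravity of each subset (one row per cluster).
    centers <- do.call(rbind, lapply(seq_len(K), function(k)
      colMeans(X[cluster == k, , drop = FALSE])))
    # Step 3: squared distance from every point to every center,
    # then assign each point to its closest center.
    d2 <- sapply(seq_len(K), function(k) rowSums(sweep(X, 2, centers[k, ])^2))
    new_cluster <- apply(d2, 1, which.min)
    # Step 4: stop when no assignment changes.
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
  }
  tot <- sum((X - centers[cluster, , drop = FALSE])^2)
  list(cluster = cluster, centers = centers, tot.withinss = tot)
}

# Simulated data (an assumption): two well-separated groups of 25 points.
set.seed(1)
X <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),
           matrix(rnorm(50, mean = 10), ncol = 2))

# Step 5: several random starts, keep the lowest total within-cluster sum of squares.
runs <- lapply(1:10, function(i) simple_kmeans(X, K = 2))
best <- runs[[which.min(sapply(runs, function(r) r$tot.withinss))]]
```

The built-in `kmeans` uses a more refined algorithm by default (Hartigan-Wong), so this sketch is meant for understanding the iteration, not as a replacement.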

R

    km.out <- kmeans(data, k, nstart = 20)
    km.out$cluster
    # Within-cluster sum of squares of each cluster
    km.out$withinss
    # Total within-cluster sum of squares
    km.out$tot.withinss
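
To make the meaning of these fields concrete: `withinss` holds, for each cluster, the sum of squared distances from its points to the cluster center (a sum of squares rather than a variance in the strict sense), and `tot.withinss` is their total. The sketch below checks this by hand on simulated data, which is an assumption made for illustration.

```r
# Simulated data (an assumption): two groups of 50 points in 2 dimensions.
set.seed(2)
x <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

km.out <- kmeans(x, centers = 2, nstart = 20)

# Recompute the within-cluster sum of squares of cluster 1 by hand.
pts <- x[km.out$cluster == 1, , drop = FALSE]
manual <- sum(sweep(pts, 2, colMeans(pts))^2)

check_within <- all.equal(manual, km.out$withinss[1])
check_total <- all.equal(sum(km.out$withinss), km.out$tot.withinss)
# The total sum of squares splits into a within part and a between part.
check_split <- all.equal(km.out$tot.withinss + km.out$betweenss, km.out$totss)
```

A larger `nstart` runs more random starts and keeps the one with the smallest `tot.withinss`, which is step 5 of the algorithm.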

CAH (agglomerative hierarchical clustering)

Steps

  1. Regard each observation as its own subset.
  2. Calculate the distance between subsets and merge the two subsets with the smallest distance. The distance between the merged subset and any other subset is the smallest (single), greatest (complete), or average (average) of the pairwise distances between their observations.
  3. Repeat 2 until there is only one set.
  4. Draw a tree (dendrogram) to present the process. Label each node with the distance at which its two subsets were merged.
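
A tiny worked example may help here. The five one-dimensional points below are made up for illustration; the merge heights returned by `hclust` are the node labels of the tree, and they can be checked by hand for each linkage.

```r
# Five made-up points on a line.
x <- c(1, 2, 6, 7, 15)
d <- dist(x)

# Merge heights for the three linkages.
h_single <- hclust(d, method = "single")$height      # 1 1 4 8
h_complete <- hclust(d, method = "complete")$height  # 1 1 6 14
h_average <- hclust(d, method = "average")$height    # 1 1 5 11

# Cutting the single-linkage tree into 2 and 3 classes.
cl2 <- cutree(hclust(d, method = "single"), k = 2)   # 1 1 1 1 2
cl3 <- cutree(hclust(d, method = "single"), k = 3)   # 1 1 2 2 3
```

By hand: {1, 2} and {6, 7} each merge at distance 1. The distance between {1, 2} and {6, 7} is then 4 (single, from 2 to 6), 6 (complete, from 1 to 7), or 5 (average of 5, 6, 4, 5), and the point 15 joins last in all three cases.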

R

    # x: numeric matrix, one observation per row
    hc.complete <- hclust(dist(x), method = "complete")
    hc.average <- hclust(dist(x), method = "average")
    hc.single <- hclust(dist(x), method = "single")
    # Draw the three dendrograms side by side
    par(mfrow = c(1, 3))
    plot(hc.complete, main = "Complete Linkage", xlab = "", sub = "", cex = .9)
    plot(hc.average, main = "Average Linkage", xlab = "", sub = "", cex = .9)
    plot(hc.single, main = "Single Linkage", xlab = "", sub = "", cex = .9)
    # Cut the trees into a given number of clusters
    cutree(hc.complete, 2)
    cutree(hc.average, 2)
    cutree(hc.single, 2)
    cutree(hc.single, 4)
    # Scale the features before clustering
    xsc <- scale(x)
    plot(hclust(dist(xsc), method = "complete"), main = "Hierarchical Clustering with Scaled Features")