@Macux
2015-12-01T06:48:34.000000Z
R language: study notes
1. Preparation
> library(labeling)
> library(ggplot2)
2. A first try
> df <- data.frame(x = c(3, 1, 5), y = c(2, 4, 6), label = c("a","b","c"))
> p <- ggplot(df, aes(x, y)) + xlab(NULL) + ylab(NULL)
- Area plot
> p + geom_area(fill="mediumpurple1") + labs(title = "geom_area")
- Path plot
> p + geom_path(color="orangered1") + labs(title = "geom_path")
- Scatter plot with text labels
> p + geom_text(aes(label = label)) + labs(title = "geom_text")
- Tile plot (heat-map style)
> p + geom_tile(aes(fill=label)) + labs(title = "geom_tile")
- Polygon plot
> p + geom_polygon(fill="lightpink") + labs(title = "geom_polygon")
3. Never expect the default parameters alone to give you an expressive plot of a particular distribution
> ggplot(diamonds,aes(x=depth))+geom_histogram(fill="tan2")
> ggplot(diamonds,aes(x=depth))+geom_histogram(binwidth=.1,fill="tan2")+xlim(55,70)
> depth_dist <- ggplot(diamonds, aes(depth)) + xlim(58, 68)
> depth_dist + geom_histogram(aes(fill = cut, y = ..density..), binwidth = 0.1) + facet_grid(cut ~ .) + theme(legend.position = "none")
> depth_dist + geom_histogram(aes(fill = cut), binwidth = 0.1, position = "fill")
> depth_dist + geom_freqpoly(aes(y = ..density.., colour = cut), binwidth = 0.1)
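The last three plots show the same pattern: as diamond quality (cut) improves, the depth distribution shifts gradually to the left and becomes more symmetric.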
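One rough way to check this observation numerically is to compute the median and skewness of `depth` for each cut. This is only a sketch: `skewness` is a hypothetical helper defined here (a simple moment-based estimate, not a ggplot2 function), and the data are restricted to the same 58 to 68 window used by `xlim()` in the plots.

```r
library(ggplot2)  # provides the diamonds data set

# Hypothetical helper: moment-based skewness (0 for a symmetric sample,
# positive for a longer right tail, negative for a longer left tail)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Keep only the depth window that the plots show
d <- subset(diamonds, depth >= 58 & depth <= 68)

# One row per cut level, ordered from Fair to Ideal
by_cut <- split(d$depth, d$cut)
depth_summary <- data.frame(
  median   = sapply(by_cut, median),
  skewness = sapply(by_cut, skewness)
)
print(depth_summary)
```

If the reading of the plots is right, the medians should tend to decrease from Fair to Ideal and the skewness should move toward 0.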
4. Box plots: basic and advanced use
> ggplot(diamonds,aes(cut,depth)) + geom_boxplot(aes(fill=cut))
> ggplot(diamonds, aes(carat, depth, group = plyr::round_any(carat, 0.1, floor))) + geom_boxplot(fill = "orange") + xlim(0, 3) # round_any() comes from the plyr package
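The `group` aesthetic in the second box plot depends on `round_any()`, which comes from the plyr package: it rounds each carat value down to a multiple of 0.1, so every 0.1-wide slice of carat gets its own box. A minimal base-R sketch of that binning, where `floor_to` is a hypothetical stand-in name used only for illustration:

```r
# Hypothetical helper (not part of ggplot2 or plyr): round x down to a
# multiple of `accuracy`, the same arithmetic as round_any(x, accuracy, floor)
floor_to <- function(x, accuracy) floor(x / accuracy) * accuracy

carat <- c(0.23, 0.27, 0.31, 1.05)
bins <- floor_to(carat, 0.1)

print(bins)         # 0.23 and 0.27 land in the same bin
print(table(bins))  # number of observations per bin
```

A smaller `accuracy` gives more, narrower boxes; a larger one gives fewer, wider boxes.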
5. Dealing with overplotting in scatter plots
> with(diamonds,smoothScatter(carat,price,xlim=c(1,3)))
> library(grid)
> library(lattice)
> library(hexbin)
> dp <- ggplot(diamonds, aes(carat, price)) + xlim(1,3)
> dp + stat_binhex(binwidth=c(0.02, 200))
> d <- ggplot(diamonds, aes(carat, price)) + xlim(1,3)
> d + stat_density2d(geom = "point", aes(size = ..density..), contour = FALSE, color = "cornflowerblue") + scale_size_area()
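6. Statistical summaries
I noticed that, with their default settings, bar charts and histograms describe only one variable. We can adjust the position parameter to show them by group, but they still describe a single variable of the data set: the y axis is either count or density, not another variable from the data. One way around this is stat_summary().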
> ggplot(diamonds, aes(cut,price)) + stat_summary(fun.y = median, geom = "bar",aes(fill=cut))
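(1) This plot compares the median price of diamonds for each cut.
(2) You can compare the max, min, or mean in the same way.
(3) A bar chart like this involves two variables at once; without stat_summary() you would have to aggregate the data yourself before plotting.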
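For comparison, here is a sketch of the manual route that stat_summary() saves us from: compute the per-cut median with base R's `aggregate()` first, then draw the pre-computed values with `geom_bar(stat = "identity")`. The names `med_price` and `p_med` are made up for this example.

```r
library(ggplot2)

# Compute the median price per cut ourselves: one row per cut level
med_price <- aggregate(price ~ cut, data = diamonds, FUN = median)
print(med_price)

# stat = "identity" tells geom_bar to use the y values as given
# instead of counting rows
p_med <- ggplot(med_price, aes(cut, price)) +
  geom_bar(stat = "identity", aes(fill = cut))
print(p_med)
```

The result should match the stat_summary() version; the difference is only where the aggregation happens.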
7. Adding annotations to a plot
An annotation is still just a layer: it adds extra data to the plot to help us understand and analyse the data.
> ggplot(economics,aes(date,unemploy))+ xlab(NULL) + ylab("No.unemployed(1000s)") + geom_line()
> data(presidential)
> presidential <- presidential[-(1:3),] # Note: every time this line is run, it drops the first three rows of the data set again.
> yrng <- range(economics$unemploy)
# range() returns a vector containing the minimum and maximum of all the given arguments
> xrng <- range(economics$date)
> last_plot() + geom_rect(aes(xmin = start, xmax = end, y = NULL, x = NULL,fill = name), ymin = yrng[1], ymax = yrng[2], data=presidential,alpha=0.2)+scale_fill_manual(values=c("blue","orange","red","green","pink","black","gold"))
> last_plot() + geom_text(aes(x = start, y = yrng[1], label = name),data = presidential, size = 5, hjust =0, vjust =0.5 ,angle=30)
> caption <- paste(strwrap("Unemployment rates in the US have varied a lot over the years", 40), collapse="\n")
> last_plot() + geom_text(aes(x = xrng[2], y = yrng[2], label= caption),data=data.frame(), hjust = 1, vjust = 1, size = 5)
> highest <- subset(economics, unemploy == max(unemploy))
> last_plot()+ geom_point(colour = alpha("red", 0.8), data = highest, size = 7)
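The `presidential[-(1:3), ]` step above changes the data a little more on every run, because it overwrites the variable it reads from. One way to avoid that, sketched here with a hypothetical variable name `pres`, is to always subset the package's own copy and store the result under a different name:

```r
library(ggplot2)

# Always start from the copy shipped with ggplot2, so re-running this line
# gives the same result instead of dropping three more rows each time
pres <- ggplot2::presidential[-(1:3), ]

# Running it a second time changes nothing
pres_again <- ggplot2::presidential[-(1:3), ]
print(identical(pres, pres_again))
```

With this approach, the later layers would use `data = pres` instead of `data = presidential`.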