@agpwhy 2022-07-05T11:37:55.000000Z · 1197 characters · 291 views

Wang Pang's Bioinformatics Notes, No. 47: A Word Cloud from the Mottos of the 211 Universities

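Students in Shanghai are about to sit the gaokao. I wish you all perform at your true level and get into a major that matches your expectations. If you are curious about 法八 (the eight-year French-track medical program), that rather unusual program, feel free to get in touch.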

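But I digress. Given the timing, here is a seasonal little note: a word cloud built from the collected mottos of the 211 universities (admittedly, the "211" label is a bit outdated now).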

Setup

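library(jiebaR)      # Chinese word segmentation
library(dplyr)
library(readr)
library(magrittr)
library(wordcloud2)  # word cloud rendering
# Read the collected mottos (one per line) as UTF-8 text
text_comments <- readLines(con = "./211大学的校训合辑.txt", encoding = "UTF-8")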

jiebaR is a handy R package for Chinese word segmentation.
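To get a feel for what the segmenter returns, here is a minimal sketch on a single motto string. It assumes jiebaR is installed and uses the default `worker()` settings. The motto is only a sample input, and the exact split depends on jiebaR's dictionary, so no output is shown.

```r
library(jiebaR)

# Default worker: mixed segmentation model; symbols such as "，" are not kept
wk <- worker()

# A single sample motto (Fudan's) as input
motto <- "博学而笃志，切问而近思"

# segment() returns a character vector of words
words <- segment(motto, wk)
print(words)
```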

Processing the words

For a detailed walkthrough of each step, see the official jiebaR tutorial.

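wk <- worker()  # create the jiebaR segmentation worker (required before segmenting)
split_1 <- segment(text_comments, wk)  # split the mottos into words
split_1 <- split_1[!grepl("大学", split_1)]  # drop university names (any token containing "大学")
split_combined <- sapply(split_1, function(x) {paste(x, collapse = " ")})  # flatten each token to a single string
split_combined <- split_combined[nchar(split_combined) > 1]
split_combined <- split_combined[nchar(split_combined) < 6]  # keep words of 2 to 5 characters
comments_freq <- freq(split_combined)  # data frame with columns char and freq
comments_freq <- comments_freq[order(comments_freq[, 2], decreasing = TRUE), ]  # sort by frequency, descending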
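For readers without jiebaR at hand, the filter-and-count logic above can be sketched in base R. This is an illustration, not the author's code: `tokens` is a hypothetical vector standing in for the output of `segment()`, and `table()` stands in for jiebaR's `freq()`.

```r
# Hypothetical tokens standing in for segment(text_comments, wk)
tokens <- c("复旦大学", "博学", "笃志", "切问", "近思", "自强不息", "厚德载物",
            "博学", "求是", "创新", "的", "博学", "求是", "实事求是的精神")

# Drop any token containing "大学" (the university names)
tokens <- tokens[!grepl("大学", tokens, fixed = TRUE)]

# Keep tokens of 2 to 5 characters (nchar > 1 and nchar < 6)
tokens <- tokens[nchar(tokens) > 1 & nchar(tokens) < 6]

# Count occurrences and sort by frequency, descending
counts <- sort(table(tokens), decreasing = TRUE)

# Two-column data frame: word, count
word_freq <- data.frame(char = names(counts), freq = as.integer(counts),
                        stringsAsFactors = FALSE)
print(word_freq)
```

Here "的" is dropped as too short and "实事求是的精神" as too long, leaving 博学 (3) and 求是 (2) at the top. The result has the same word-then-frequency layout that `wordcloud2()` expects as its first argument.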

Then you can draw the figure:

wordcloud2(comments_freq, color = "random-light", size = 2, shape = "circle")

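However, saving the result is a bit awkward: wordcloud2 produces an HTML widget rather than an ordinary plot. You can use the webshot approach I mentioned in an earlier note.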

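library("htmlwidgets")
library("webshot")
webshot::install_phantomjs()  # one-time install of the headless browser webshot uses
mygraph <- wordcloud2(comments_freq,
                      color = "random-light", size = 2, shape = "circle")
saveWidget(mygraph, "tmp.html", selfcontained = FALSE)  # save the widget as HTML first
# Capture the rendered page; delay = 5 (seconds) gives the cloud time to finish drawing
webshot("tmp.html", "fig_1.pdf", delay = 5, vwidth = 1600, vheight = 1600)
webshot("tmp.html", "fig_1.png", delay = 5, vwidth = 1600, vheight = 1600)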

Here is the final result:

[fig_1: word cloud of the 211 university mottos]

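To this day I actually remember Fudan's motto and anthem better than SJTU's (交大), because for several years in high school we had to sing the anthem every week, haha.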
