@agpwhy
2022-07-05T11:37:55.000000Z
Students in Shanghai are about to sit the gaokao. I wish everyone performs at their true level and gets into a program that matches their expectations. If you are curious about the wonderful major known as 法八, feel free to get in touch.
But I digress. To mark the occasion, here is a small, timely note: building a word cloud from the collected mottos of the 211 universities (a label that is admittedly a bit dated by now).
library(jiebaR)
library(dplyr)
library(readr)
library(magrittr)
library(wordcloud2)

text_comments <- readLines(con = "./211大学的校训合辑.txt", encoding = "UTF-8")
jiebaR is a very handy R package for Chinese word segmentation.
For the details of the processing steps, see the official jiebaR tutorial.
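To get a feel for what the segmenter returns, here is a minimal sketch; the sample string is Tsinghua's motto, used purely as an illustration rather than read from the data file, and the exact split may vary with the dictionary:

library(jiebaR)
wk_demo <- worker()                 # default segmentation worker
segment("自强不息厚德载物", wk_demo)  # likely splits into "自强不息" "厚德载物"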
wk <- worker()  # initialize the segmentation worker; this drives the whole pipeline
split_1 <- segment(text_comments, wk)
split_1 <- split_1[!grepl("大学", split_1)]  # drop the university names themselves

split_combined <- sapply(split_1, function(x) {
  paste(x, collapse = " ")
})  # collapse each token back into a plain string
split_combined <- split_combined[nchar(split_combined) > 1]
split_combined <- split_combined[nchar(split_combined) < 6]  # drop tokens that are too short or too long

comments_freq <- freq(split_combined)  # word-frequency table
comments_freq <- comments_freq[order(comments_freq[, 2], decreasing = TRUE), ]  # sort by frequency
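Since dplyr and magrittr are loaded above but otherwise unused, the counting and sorting step can also be written as one pipeline. This is an equivalent sketch, assuming split_combined from the step above; the result has the same word/frequency layout that wordcloud2 expects:

comments_freq_alt <- tibble(char = split_combined) %>%
  count(char, name = "freq", sort = TRUE)  # count each token, sorted by frequency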
Then we can draw the plot:
wordcloud2(comments_freq, color = "random-light", size = 2, shape = "circle")
But you will find that saving the output is a bit of a problem. The webshot approach mentioned in an earlier post works here.
library("htmlwidgets")library("webshot")webshot::install_phantomjs()mygraph <- wordcloud2(comments_freq,color = "random-light", size = 2, shape = "circle")saveWidget(mygraph,"tmp.html",selfcontained = F)webshot("tmp.html","fig_1.pdf", delay =5, vwidth = 1600, vheight=1600)webshot("tmp.html","fig_1.png", delay =5, vwidth = 1600, vheight=1600)
And here is the final result:

[figure: word cloud of the 211 university mottos]
Even now my memory of Fudan's motto and school anthem is actually stronger than Jiao Tong's, because we had to sing it every week for years in high school, haha.