@agpwhy
2022-07-05T11:37:55.000000Z
Shanghai students are about to sit the gaokao. I wish everyone performs at their usual level and gets into a major that matches their expectations. Anyone curious about the wonderfully odd 法八 program is welcome to get in touch.
I digress. Since it is that time of year, here is a timely little note: building a word cloud from the collected mottos of the 211 universities (a label that is admittedly a bit dated by now).
library(jiebaR)
library(dplyr)
library(readr)
library(magrittr)
library(wordcloud2)
text_comments <- readLines(con = "./211大学的校训合辑.txt", encoding = "UTF-8") # read the motto collection, one entry per line
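If you don't have that text file at hand, a small stand-in vector is enough to follow along. The three mottos below are well known; the one-motto-per-line layout of the real file is my assumption:
text_comments <- c(
  "清华大学 自强不息 厚德载物",
  "复旦大学 博学而笃志 切问而近思",
  "上海交通大学 饮水思源 爱国荣校"
)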
jiebaR is a really handy R package for Chinese word segmentation.
For details on how to use it, see jiebaR's official tutorial.
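As a quick illustration of what the segmenter does (the output shown in the comment is approximate and depends on jiebaR's dictionary):
wk_demo <- worker()                     # default segmentation engine
segment("自强不息 厚德载物", wk_demo)
# roughly: "自强不息" "厚德载物"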
wk <- worker() # create the default segmentation worker
split_1 <- segment(text_comments, wk)
split_1 <- split_1[!grepl("大学", split_1)] # drop tokens containing 大学, i.e. the university names
split_combined <- sapply(split_1, function(x) {paste(x, collapse = " ")}) # flatten the tokens into a plain character vector
split_combined <- split_combined[nchar(split_combined) > 1]
split_combined <- split_combined[nchar(split_combined) < 6] # drop tokens that are too short or too long
comments_freq <- freq(split_combined)
comments_freq <- comments_freq[order(comments_freq[, 2], decreasing = TRUE), ] # sort by frequency, descending
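It's worth glancing at the table before plotting; freq() returns a two-column data frame (token and count), which the line above sorts by its second column. The trimmed variable name below is just illustrative:
head(comments_freq, 10) # the ten most frequent words across the mottos
comments_freq_trimmed <- comments_freq[comments_freq[, 2] > 1, ] # optionally keep only tokens that appear more than once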
Then we can draw the word cloud:
wordcloud2(comments_freq, color = "random-light", size = 2, shape = "circle")
But you'll notice that saving the output is a bit awkward, since wordcloud2 produces an HTML widget. The webshot trick mentioned in an earlier post works here.
library("htmlwidgets")
library("webshot")
webshot::install_phantomjs()
mygraph <- wordcloud2(comments_freq,
                      color = "random-light", size = 2, shape = "circle")
saveWidget(mygraph, "tmp.html", selfcontained = FALSE)
webshot("tmp.html", "fig_1.pdf", delay = 5, vwidth = 1600, vheight = 1600)
webshot("tmp.html", "fig_1.png", delay = 5, vwidth = 1600, vheight = 1600)
And here is the final result:
To this day my memory of Fudan's motto and school anthem is actually stronger than Jiaotong's, since we had to sing them every week all through high school, haha.