@evilking 2018-03-03T08:21:38.000000Z

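R Basics

Strings

In any programming language, strings are among the most important data structures, even though R is a statistical language built around numeric vectors and matrices.

This article introduces some commonly used string functions and how to use them.

What is a string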

> s <- "this is a string"

> ss <- 'this is a string too'

> s
[1] "this is a string"

> ss
[1] "this is a string too"

> class(s)
[1] "character"
> class(ss)
[1] "character"
> str(s)
 chr "this is a string"
>

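We have already used strings many times in earlier parts. In R, a sequence of characters enclosed in double quotes "" is a string, and so is one enclosed in single quotes ''. class() reports the type of a string as character. Unlike some other languages, R has no separate type for a single character; there are only strings.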
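As a small sketch of that last point (the variable names ch and chars are only illustrative), a "single character" behaves like any other string, and the individual characters of a string can be obtained by splitting on the empty string:

```r
# A "single character" is a character vector of length 1
# whose only element happens to contain one character.
ch <- "a"
class(ch)    # "character", the same class as a longer string
length(ch)   # 1: one element in the vector
nchar(ch)    # 1: one character in that element

# Splitting on "" yields one string per character.
chars <- strsplit("string", "")[[1]]
chars        # "s" "t" "r" "i" "n" "g"
```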

Common string functions

> grep("Pole",c("Equator","north Pole","South Pole"))
[1] 2 3

> grep("pole",c("Equator","north Pole","South Pole"))
integer(0)
>

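grep(pattern,x) searches the character vector x for pattern. If x has n elements, that is, n strings, grep(pattern,x) returns a vector of length at most n. Each element of the result is an index i into x, meaning that x[i] contains a substring matching pattern. By default pattern is interpreted as a regular expression, and matching is case-sensitive.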
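A few standard arguments change what grep() matches and what it returns. The following is a minimal sketch reusing the vector from the examples above; the variable names are only illustrative:

```r
x <- c("Equator", "north Pole", "South Pole")

# ignore.case = TRUE makes the match case-insensitive
idx <- grep("pole", x, ignore.case = TRUE)
idx                                   # 2 3

# value = TRUE returns the matching elements instead of their indices
hits <- grep("Pole", x, value = TRUE)
hits                                  # "north Pole" "South Pole"

# grepl() returns one logical value per element, handy for subsetting
flags <- grepl("Pole", x)
flags                                 # FALSE TRUE TRUE
```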

> nchar("South Pole")
[1] 10
>

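nchar(str) returns the length of the string str. Note that strings in R are not terminated by a NUL character. Also note that if str is not of character type, nchar() converts it to character first, so the result can be surprising (and a factor raises an error). For more consistent behaviour, see the stringr package by Hadley Wickham on CRAN.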
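The sketch below illustrates both points with base R only: nchar() works element by element on a vector, and a numeric argument is counted in its character form, which may not be the form you expect. The variable names are only illustrative.

```r
# nchar() is vectorized: one length per element
lens <- nchar(c("Equator", "North Pole", ""))
lens                      # 7 10 0

# A number is converted to character before counting
n_small <- nchar(123)     # 3, counts "123"
n_big <- nchar(1e5)       # 5, counts "1e+05", not "100000"

# Converting explicitly makes the intent clear
n_fmt <- nchar(format(1e5, scientific = FALSE))   # 6, counts "100000"
```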

> paste("North", "Pole")
[1] "North Pole"

> paste("North", "Pole", sep = "")
[1] "NorthPole"

> paste("North", "Pole", sep = ".")
[1] "North.Pole"

> paste("North", "and", "South", "Poles",4)
[1] "North and South Poles 4"
>

paste() concatenates several strings into one long string. The default separator is a space; set the sep = argument to change it.
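paste() also works on vectors. As a minimal sketch (the variable names are only illustrative), the collapse argument joins the elements of a vector into one string, and paste0() is the base R shorthand for sep = "":

```r
# paste() is vectorized: shorter arguments are recycled
labels <- paste("Pole", c("North", "South"), sep = "-")
labels                                   # "Pole-North" "Pole-South"

# collapse joins the elements of the result into a single string
joined <- paste(c("North", "South"), collapse = " and ")
joined                                   # "North and South"

# paste0() is shorthand for paste(..., sep = "")
tight <- paste0("North", "Pole")
tight                                    # "NorthPole"
```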

> i <- 8

> s <- sprintf("the square of %d is %d",i,i^2)

> s
[1] "the square of 8 is 64"
>

> i <- 6.4
> s <- sprintf("the square of %10.3f is %f",i,i^2)

> s
[1] "the square of      6.400 is 40.960000"
>

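sprintf() builds a formatted string. It works like sprintf() in C (printf() that returns the string instead of printing it), and format specifiers such as %d and %m.nf control the output.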
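A few more format specifiers, as a short sketch with illustrative variable names: %s inserts a string, %05d pads an integer with leading zeros, and sprintf() is vectorized over its arguments.

```r
# %s inserts a string, %d an integer
msg <- sprintf("%s has %d letters", "Equator", nchar("Equator"))
msg                                   # "Equator has 7 letters"

# %05d pads with leading zeros to width 5
id <- sprintf("%05d", 42L)
id                                    # "00042"

# sprintf() is vectorized: one result per element
sq <- sprintf("%d^2 = %d", 1:3, (1:3)^2)
sq                                    # "1^2 = 1" "2^2 = 4" "3^2 = 9"
```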

> substring("Equator",3,5)
[1] "uat"

> substr("Equator",3,5)
[1] "uat"
>

substring(x,start,stop) returns the substring of x at positions start:stop, with both ends included. substr(x,start,stop) gives the same result here.
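The two functions differ in a few details. The sketch below (illustrative variable names) shows that substring() has a default stop and is vectorized over positions, and that substr() can be used on the left-hand side of an assignment:

```r
# substring() has a default stop, so it can take "everything from start"
tail_part <- substring("Equator", 3)
tail_part                              # "uator"

# substring() is vectorized over start and stop
pieces <- substring("Equator", 1:3, 3:5)
pieces                                 # "Equ" "qua" "uat"

# substr() on the left-hand side replaces characters in place
s <- "Equator"
substr(s, 1, 1) <- "e"
s                                      # "equator"
```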

> strsplit("3-24-2017",split = "-")
[[1]]
[1] "3"    "24"   "2017"

>

strsplit(x,split) is used very often. It splits the string x into substrings at every occurrence of split and returns those substrings in a list.
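Because the result is a list, a common follow-up step is to pull out the character vector with [[1]]. This sketch (illustrative variable names) also shows that split is treated as a regular expression unless fixed = TRUE is given:

```r
# strsplit() returns a list with one element per input string
parts <- strsplit(c("3-24-2017", "12-31-1999"), split = "-")
parts[[2]]                             # "12" "31" "1999"

# For a single input string, [[1]] extracts the character vector
mdy <- strsplit("3-24-2017", split = "-")[[1]]
year <- as.integer(mdy[3])
year                                   # 2017

# split is a regular expression by default; fixed = TRUE treats it literally,
# which matters for "." because it would otherwise match any character
ip <- strsplit("192.168.0.1", split = ".", fixed = TRUE)[[1]]
ip                                     # "192" "168" "0" "1"
```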

> regexpr("uat","Equator")
[1] 3
attr(,"match.length")
[1] 3
attr(,"useBytes")
[1] TRUE

> gregexpr("iss","Mississippi")
[[1]]
[1] 2 5
attr(,"match.length")
[1] 3 3
attr(,"useBytes")
[1] TRUE

>

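regexpr(pattern,text) searches text for pattern and returns the starting position of the first matching substring together with the length of the match.

gregexpr(pattern,text) works the same way, except that it finds all matching substrings and returns the starting positions of all of them.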
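The positions and lengths are usually a means to an end: extracting the matched text. The sketch below (illustrative variable names) does this by hand with substr() and then with the base R helper regmatches(); it also shows that a failed match is reported as -1.

```r
text <- "Mississippi"

# regexpr(): position and length of the first match
m <- regexpr("iss", text)
start <- as.integer(m)                    # 2
len <- attr(m, "match.length")            # 3
first <- substr(text, start, start + len - 1)
first                                     # "iss"

# regmatches() performs the same extraction directly from the match data
all_hits <- regmatches(text, gregexpr("iss", text))[[1]]
all_hits                                  # "iss" "iss"

# No match is reported as -1
none <- as.integer(regexpr("xyz", text))
none                                      # -1
```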

Regular expressions

> grep("[au]",c("Equator","North Pole","South Pole")) #find the strings that contain a or u
[1] 1 3

> grep("o.e",c("Equator","North Pole","South Pole")) #"." is a wildcard that matches any single character
[1] 2 3

#escape: "\\." means the literal "." character, not the wildcard
> grep("\\.",c("abc","de","f.g")) 
[1] 3
>

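For how to use regular expressions in R, see blog posts written by other users, for example http://bbs.pinggu.org/thread-2250432-1-1.html

Regular expression rules are broadly similar across languages. For the detailed syntax, see http://www.runoob.com/regexp/regexp-syntax.html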
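A few more constructs that are common to most regular expression dialects, as a short sketch in base R. The variable names are only illustrative, and gsub() is used here simply as one more base function that accepts a pattern:

```r
x <- c("Equator", "North Pole", "South Pole")

# ^ anchors the match at the start of the string, $ at the end
starts_n <- grep("^North", x)       # 2
ends_ole <- grep("ole$", x)         # 2 3

# + means "one or more" of the preceding item; [0-9] is a digit class
has_num <- grepl("[0-9]+", c("abc", "a1", "2017"))
has_num                             # FALSE TRUE TRUE

# gsub() replaces every match; sub() would replace only the first one
no_vowels <- gsub("[aeiou]", "", "South Pole")
no_vowels                           # "Sth Pl"
```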

Reading strings from keyboard input

> v <- scan("")
1: 1
2: 2
3: 3 4 5
6: 
Read 5 items

> v
[1] 1 2 3 4 5

> vs <- scan("")  #by default scan() expects real numbers; pass what = "" to read strings
1: "a" "b" "c"
Error in scan("") : scan() expected 'a real', got '"a"'
>

> w <- readline()  #readline() reads a single line as one string
abc de f

> w
[1] "abc de f"

> inits <- readline("type your initials: ")    #a prompt can be supplied
type your initials: nm abc de

> inits
[1] "nm abc de"
>
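scan() can read strings as well when its what argument is set to a character value. The following sketch uses the text = argument to supply the input so that it runs without a keyboard; with scan("", what = "") the same items would be typed at the console instead. The variable names are only illustrative.

```r
# what = "" tells scan() to read character strings;
# quoted items may contain spaces
vs <- scan(text = 'a "b c" d', what = "", quiet = TRUE)
vs                                  # "a" "b c" "d"

# sep changes the delimiter from whitespace to another character
fields <- scan(text = "x,y,z", what = "", sep = ",", quiet = TRUE)
fields                              # "x" "y" "z"
```

Unlike readline(), which returns the whole line as one string, scan() splits the input into separate items.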
