?, +, *, and {}: Regular Expression Syntax and Greedy/Non-Greedy Matching

@zhangyu756897669 2017-09-06T15:25:15.000000Z

python programming

Python official documentation

Using ?, +, *, and {} in Regular Expressions

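Sometimes a pattern should be matched only optionally. That is, the regular expression should find a match whether or not that bit of text is present. The ? character marks the group that precedes it as an optional part of the pattern.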

import re

batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
mo1.group()

'Batman'

mo2 = batRegex.search('The Adventures of Batwoman')
mo2.group()

'Batwoman'

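The (wo)? part of the regular expression means that the pattern wo is an optional group. The regex matches text that contains zero instances or one instance of wo. This is why the regex matches both 'Batman' and 'Batwoman'.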
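To make the "zero instances or one instance" behavior concrete, here is a minimal sketch that inspects the optional group itself. The variable names are illustrative and not from the original text.

```python
import re

bat_regex = re.compile(r'Bat(wo)?man')

# When the optional group takes part in the match, group(1) holds its text.
with_group = bat_regex.search('The Adventures of Batwoman')
print(with_group.group(), with_group.group(1))        # Batwoman wo

# When the optional group is skipped, the overall match still succeeds,
# but group(1) is None because the group matched nothing.
without_group = bat_regex.search('The Adventures of Batman')
print(without_group.group(), without_group.group(1))  # Batman None
```

Checking group(1) against None is one way to tell which of the two variants was actually found.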

Taking phone numbers as an example, you can make the regex look for phone numbers with or without an area code.

phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
mo1.group()

'415-555-4242'

mo2 = phoneRegex.search('My number is 555-4242')
mo2.group()

'555-4242'

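You can think of ? as saying: match zero or one of the group preceding this question mark.
If you need to match an actual question mark character, escape it with a backslash: \?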
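As a small illustration of the escaping rule, the sketch below uses both forms of the question mark in one pattern. The pattern and sample strings are made up for this example.

```python
import re

# \? matches a literal question mark; the unescaped ? after "s"
# still means "the preceding s is optional".
question_regex = re.compile(r'Batmans?\?')

found = question_regex.search('Who is Batman?')
print(found.group())  # Batman?

# Without a literal question mark in the text there is no match.
missing = question_regex.search('Who is Batman')
print(missing is None)  # True
```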

Matching Zero or More with *

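The * (star) means "match zero or more": the group that precedes the star can occur any number of times in the text. It can be completely absent or repeated over and over. Let's look at the Batman example again.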

batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
mo1.group()

'Batman'

mo2 = batRegex.search('The Adventures of Batwoman')
mo2.group()

'Batwoman'

mo3 = batRegex.search('The Adventures of Batwowowowoman')
mo3.group()

'Batwowowowoman'

Matching One or More with +

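Whereas * means "match zero or more", + means "match one or more". Unlike the star, which does not require its group to appear in the matched string, the group preceding a plus must appear at least once. It is not optional.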

batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batwoman')
mo1.group()

'Batwoman'

mo2 = batRegex.search('The Adventures of Batwowowowoman')
mo2.group()

'Batwowowowoman'

mo3 = batRegex.search('The Adventures of Batman')
mo3 is None

True
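The three quantifiers covered so far can be compared side by side. The following sketch runs the same three sample strings through each Batman pattern; the dictionary layout and variable names are just one convenient way to organize the comparison.

```python
import re

texts = ['Batman', 'Batwoman', 'Batwowoman']
patterns = {
    '?': re.compile(r'Bat(wo)?man'),  # zero or one "wo"
    '*': re.compile(r'Bat(wo)*man'),  # zero or more "wo"
    '+': re.compile(r'Bat(wo)+man'),  # one or more "wo"
}

# For each quantifier, record whether each sample text contains a match.
results = {
    symbol: [regex.search(text) is not None for text in texts]
    for symbol, regex in patterns.items()
}

for symbol, matched in results.items():
    print(symbol, matched)
# ? [True, True, False]
# * [True, True, True]
# + [False, True, True]
```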

If you need to match an actual plus sign character, prefix the plus sign with a backslash to escape it: \+
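A short sketch of a literal plus sign next to the quantifier form; the pattern and the sample sentence are invented for illustration.

```python
import re

# \+ matches a literal plus sign; the unescaped + after \d
# still means "one or more digits".
sum_regex = re.compile(r'\d+\+\d+')

found = sum_regex.search('What is 12+30?')
print(found.group())  # 12+30
```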

Matching Specific Repetitions with {}

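If you have a group that you want to repeat a specific number of times, follow the group in the regex with that number in braces. For example, (Ha){3} matches the string 'HaHaHa'.
Instead of a single number, you can specify a range by writing a minimum, a comma, and a maximum between the braces. For example, (Ha){3,5} matches 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'. Write the range without a space after the comma.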

haRegex = re.compile(r'(Ha){3}')
mo1 = haRegex.search('HaHaHa')
mo1.group()

'HaHaHa'

mo2 = haRegex.search('Ha')
mo2 is None

True
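The range form can be sketched in the same way. The example below uses fullmatch() so that the whole string has to fit the pattern, which makes the limits easy to see. The open-ended forms {3,} and {,5} are standard in Python's re module but are not discussed in the text above, so treat them as an aside.

```python
import re

ha_range = re.compile(r'(Ha){3,5}')  # between three and five repetitions
ha_min = re.compile(r'(Ha){3,}')     # three or more repetitions
ha_max = re.compile(r'(Ha){,5}')     # zero to five repetitions

# Two repetitions are below the minimum of three.
too_few = ha_range.fullmatch('HaHa')
print(too_few is None)  # True

# Four repetitions fall inside the range.
in_range = ha_range.fullmatch('HaHaHaHa')
print(in_range.group())  # HaHaHaHa

# Seven repetitions satisfy "three or more" but exceed "at most five".
many = 'Ha' * 7
print(ha_min.fullmatch(many) is not None)  # True
print(ha_max.fullmatch(many) is None)      # True
```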

Default Mode: Greedy Matching

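Since (Ha){3,5} can match three, four, or five instances of Ha in the string 'HaHaHaHaHa', you may wonder why the Match object's group() call in the example below returns 'HaHaHaHaHa' rather than one of the shorter possibilities. After all, 'HaHaHa' and 'HaHaHaHa' are also valid matches for the regular expression (Ha){3,5}.
Python's regular expressions are greedy by default, which means that in ambiguous situations they match the longest string possible. The non-greedy version of the braces, which matches the shortest string possible, has the closing brace followed by a question mark.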
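The same rule can be sketched with a character class instead of a group. The patterns and sample strings here are made up for illustration. The last two lines rely on the fact that a trailing ? also makes + non-greedy, which goes slightly beyond the braces case described above.

```python
import re

# Greedy: take as many digits as the range allows (up to four).
greedy_digits = re.compile(r'\d{2,4}')
# Non-greedy: stop as soon as the minimum of two digits is reached.
lazy_digits = re.compile(r'\d{2,4}?')

greedy_result = greedy_digits.search('12345').group()
lazy_result = lazy_digits.search('12345').group()
print(greedy_result)  # 1234
print(lazy_result)    # 12

# The same suffix works after +: <.+> runs to the last ">",
# while <.+?> stops at the first one.
greedy_tag = re.search(r'<.+>', '<a><b>').group()
lazy_tag = re.search(r'<.+?>', '<a><b>').group()
print(greedy_tag)  # <a><b>
print(lazy_tag)    # <a>
```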

greedyHaRegex = re.compile(r'(Ha){3,5}')
mo1 = greedyHaRegex.search('HaHaHaHaHa')
mo1.group()

'HaHaHaHaHa'

nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
mo2.group()