A First Look at Python Regular Expressions

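Syntax	Function	Notes
.	Matches any single character except a newline (\n).	\. matches a literal dot
[]	Matches any one of the characters listed inside the brackets; single characters and ranges are both allowed.	Ranges are written like [2-6] or [a-z]
\d	Matches a digit, i.e. 0-9
\D	Matches a non-digit
\s	Matches whitespace: space, tab, newline and similar characters
\S	Matches non-whitespace
\w	Matches a word character, i.e. a-z, A-Z, 0-9, _	For str patterns without the ASCII flag, Unicode letters and digits also match
\W	Matches a non-word character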
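The character classes in the table above can be tried out with a short sketch like the following. The sample strings are made up for illustration.

```python
import re

text = "Room 42, floor_3\tok"  # made-up sample string

digits = re.findall(r"\d", text)      # each single digit
words = re.findall(r"\w+", text)      # runs of word characters
blanks = re.findall(r"\s", text)      # the two spaces and the tab
a_to_f = re.findall(r"[a-f]", text)   # one character in the range a-f
dots = re.findall(r"\.", "v1.2.3")    # \. matches literal dots only
any_char = re.findall(r"a.c", "abc a\nc a-c")  # . does not match \n

print(digits, words, blanks, a_to_f, dots, any_char)
```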

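Quantifiers

Syntax	Function	Notes
*	Matches the preceding item 0 or more times
+	Matches the preceding item 1 or more times
?	Matches the preceding item 0 or 1 times
{n}	Matches the preceding item exactly n times
{n,}	Matches the preceding item at least n times
{m,n}	Matches the preceding item from m to n times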
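To see how the quantifiers differ, the sketch below tests each one against the same small set of strings. The helper `which_match` and the sample list are hypothetical names introduced only for this illustration.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc"]  # zero to three b's

def which_match(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = which_match(r"ab*c")         # b repeated 0 or more times
plus = which_match(r"ab+c")         # b repeated 1 or more times
optional = which_match(r"ab?c")     # b repeated 0 or 1 times
exactly = which_match(r"ab{2}c")    # b repeated exactly 2 times
at_least = which_match(r"ab{2,}c")  # b repeated at least 2 times
between = which_match(r"ab{1,2}c")  # b repeated 1 to 2 times

print(star, plus, optional, exactly, at_least, between)
```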

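Boundary matching

Syntax	Function	Notes
^	Matches the start of the string	Placed before the subexpression
$	Matches the end of the string	Placed after the subexpression
\b	Matches a word boundary (the start or end of a word)
\B	Matches a position that is not a word boundary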
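A minimal sketch of the anchors above, using a made-up string in which "cat" appears both as a whole word and inside "concat":

```python
import re

text = "cat concat cat"

at_start = re.findall(r"^cat", text)   # only the occurrence at position 0
at_end = re.findall(r"cat$", text)     # only the occurrence at the very end

# \b on both sides keeps whole words only; record where each match starts.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B requires that "cat" does NOT begin at a word boundary (inside "concat").
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

print(at_start, at_end, whole_words, inside_word)
```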

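Grouping

Syntax	Function	Notes
|	Matches either the expression on the left or the one on the right
()	Captures part of the matched text as a group	If there is more than one group, the captured parts are returned in a tuple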
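The following sketch shows alternation and capturing groups on a made-up string of addresses. Note how `re.findall` returns plain strings for one group and tuples for several groups.

```python
import re

text = "alice@example.com, bob@test.org"  # made-up sample data

# | matches either alternative.
suffixes = re.findall(r"com|org", text)

# One group: findall returns just the captured part of each match.
users = re.findall(r"(\w+)@", text)

# Two groups: findall returns one tuple per match.
pairs = re.findall(r"(\w+)@(\w+)", text)

# With a Match object, group(0) is the whole match and groups() the captures.
m = re.search(r"(\w+)@(\w+)\.(com|org)", text)
whole, parts = m.group(0), m.groups()

print(suffixes, users, pairs, whole, parts)
```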

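Greedy vs. lazy

Syntax	Function	Notes
*?	Repeats any number of times	As few repetitions as possible
+?	Repeats 1 or more times	As few repetitions as possible
??	Repeats 0 or 1 times	As few repetitions as possible
{n,m}?	Repeats n to m times	As few repetitions as possible
{n,}?	Repeats n or more times	As few repetitions as possible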
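The difference between greedy and lazy quantifiers is easiest to see side by side. This is a small sketch on made-up input, not a recommendation to parse HTML with regular expressions.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy .* runs to the LAST </b>, swallowing both elements in one match.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy .*? stops at the FIRST </b> that lets the pattern succeed.
lazy = re.findall(r"<b>.*?</b>", html)

# The same applies to counted repetitions.
greedy_count = re.match(r"\d{2,4}", "123456").group()   # takes as many as allowed
lazy_count = re.match(r"\d{2,4}?", "123456").group()    # takes as few as allowed

print(greedy, lazy, greedy_count, lazy_count)
```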

Original documentation

Special characters

    The special characters are:
        "."      Matches any character except a newline.
        "^"      Matches the start of the string.
        "$"      Matches the end of the string or just before the newline at
                 the end of the string.
        "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
                 Greedy means that it will match as many repetitions as possible.
        "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
        "?"      Matches 0 or 1 (greedy) of the preceding RE.
        *?,+?,?? Non-greedy versions of the previous three special characters.
        {m,n}    Matches from m to n repetitions of the preceding RE.
        {m,n}?   Non-greedy version of the above.
        "\\"     Either escapes special characters or signals a special sequence.
        []       Indicates a set of characters.
                 A "^" as the first character indicates a complementing set.
        "|"      A|B, creates an RE that will match either A or B.
        (...)    Matches the RE inside the parentheses.
                 The contents can be retrieved or matched later in the string.
        (?aiLmsux) The letters set the corresponding flags defined below.
        (?:...)  Non-grouping version of regular parentheses.
        (?P<name>...) The substring matched by the group is accessible by name.
        (?P=name)     Matches the text matched earlier by the group named name.
        (?#...)  A comment; ignored.
        (?=...)  Matches if ... matches next, but doesn't consume the string.
        (?!...)  Matches if ... doesn't match next.
        (?<=...) Matches if preceded by ... (must be fixed length).
        (?<!...) Matches if not preceded by ... (must be fixed length).
        (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                           the (optional) no pattern otherwise.
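Several of the extension forms listed above are easier to understand from an example. The sketch below covers named groups, non-capturing groups and lookarounds. All sample strings and group names are made up for illustration.

```python
import re

# (?P<name>...) makes a group accessible by name.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-01")
year, fields = m.group("year"), m.groupdict()

# (?:...) groups without capturing, so findall returns the whole match.
runs = re.findall(r"(?:ab)+", "ababab ab")

# (?=...) lookahead: digits that are followed by "px"; "px" is not consumed.
pixels = re.findall(r"\d+(?=px)", "10px 20em 30px")

# (?<=...) lookbehind: digits preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $15, 20 items")

# (?<!...) negative lookbehind: whole numbers NOT preceded by a dollar sign.
counts = re.findall(r"(?<!\$)\b\d+", "cost $15, 20 items")

print(year, fields, runs, pixels, prices, counts)
```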

Special sequences

    The special sequences consist of "\\" and a character from the list
    below.  If the ordinary character is not on the list, then the
    resulting RE will match the second character.
        \number  Matches the contents of the group of the same number.
        \A       Matches only at the start of the string.
        \Z       Matches only at the end of the string.
        \b       Matches the empty string, but only at the start or end of a word.
        \B       Matches the empty string, but not at the start or end of a word.
        \d       Matches any decimal digit; equivalent to the set [0-9] in
                 bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the whole
                 range of Unicode digits.
        \D       Matches any non-digit character; equivalent to [^\d].
        \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
                 bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the whole
                 range of Unicode whitespace characters.
        \S       Matches any non-whitespace character; equivalent to [^\s].
        \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
                 in bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the
                 range of Unicode alphanumeric characters (letters plus digits
                 plus underscore).
                 With LOCALE, it will match the set [0-9_] plus characters defined
                 as letters for the current locale.
        \W       Matches the complement of \w.
        \\       Matches a literal backslash.
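A few of these sequences behave in ways that are easy to overlook: backreferences, the Unicode behaviour of \d, and the difference between \A/\Z and ^/$. The sketch below illustrates them on made-up strings.

```python
import re

# \1 matches the same text that group 1 captured: find doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

# For str patterns, \d also matches non-ASCII decimal digits
# (here ARABIC-INDIC DIGIT FOUR and TWO) unless re.ASCII is given.
mixed = "42 \u0664\u0662"
unicode_digits = re.findall(r"\d+", mixed)
ascii_digits = re.findall(r"\d+", mixed, re.ASCII)

# With MULTILINE, ^ matches at every line start, but \A only at the string start.
line_starts = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)
string_start = re.findall(r"\A\w+", "one\ntwo", re.MULTILINE)

# $ also matches just before a trailing newline; \Z only at the very end.
dollar_hit = re.search(r"end$", "the end\n") is not None
z_hit = re.search(r"end\Z", "the end\n") is not None

print(doubled, unicode_digits, ascii_digits, line_starts, string_start, dollar_hit, z_hit)
```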

Module functions

    This module exports the following functions:
        match     Match a regular expression pattern to the beginning of a string.
        fullmatch Match a regular expression pattern to all of a string.
        search    Search a string for the presence of a pattern.
        sub       Substitute occurrences of a pattern found in a string.
        subn      Same as sub, but also return the number of substitutions made.
        split     Split a string by the occurrences of a pattern.
        findall   Find all occurrences of a pattern in a string.
        finditer  Return an iterator yielding a Match object for each match.
        compile   Compile a pattern into a Pattern object.
        purge     Clear the regular expression cache.
        escape    Backslash all non-alphanumerics in a string.

Function descriptions

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a Match object, or None if no match was found.

fullmatch(pattern, string, flags=0)
    Try to apply the pattern to all of the string, returning
    a Match object, or None if no match was found.

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a Match object, or None if no match was found.
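The three functions above differ only in where the pattern is allowed to match. A minimal sketch on a made-up string:

```python
import re

text = "abc123"

# match: the pattern must match starting at position 0.
match_digits = re.match(r"\d+", text)            # None: the string starts with letters
match_letters = re.match(r"[a-z]+", text).group()

# search: the pattern may match anywhere in the string.
search_digits = re.search(r"\d+", text).group()

# fullmatch: the pattern must cover the entire string.
full_ok = re.fullmatch(r"[a-z]+\d+", text) is not None
full_partial = re.fullmatch(r"[a-z]+", text)     # None: "123" is left over

print(match_digits, match_letters, search_digits, full_ok, full_partial)
```

Because all three return None on failure, it is usually safest to test the result before calling `.group()`.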

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used.

subn(pattern, repl, string, count=0, flags=0)
    Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the Match object and must
    return a replacement string to be used.
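The two forms of repl (string and callable) and the count argument can be sketched as follows. The sample strings are made up for illustration.

```python
import re

# String repl: \1, \2, \3 refer to the captured groups.
dates = "2024-01-15 and 2023-12-31"
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", dates)

# Callable repl: receives the Match object and returns the replacement string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

# count limits how many leftmost occurrences are replaced.
first_only = re.sub(r"a", "A", "banana", count=1)

# subn also reports how many substitutions were made.
new_text, n_subs = re.subn(r"a", "A", "banana")

print(swapped, doubled, first_only, new_text, n_subs)
```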

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list.
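A short sketch of the three behaviours described above: plain splitting, keeping the separators through a capturing group, and limiting the number of splits. The input strings are made up.

```python
import re

# Split on a comma or semicolon followed by optional whitespace.
plain = re.split(r"[,;]\s*", "a, b;c,  d")

# A capturing group around the separator keeps it in the result list.
with_seps = re.split(r"([,;])\s*", "a, b;c")

# maxsplit=1: split once, the rest of the string is the final element.
limited = re.split(r"[,;]\s*", "a, b;c,  d", maxsplit=1)

print(plain, with_seps, limited)
```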

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

finditer(pattern, string, flags=0)
    Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a Match object.

    Empty matches are included in the result.
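The sketch below contrasts the return shapes of findall with and without groups, and shows what finditer adds: each item is a Match object, so positions are available as well. The sample string is made up.

```python
import re

text = "x=1, y=22, z=333"

# No groups: a list of the matched strings.
numbers = re.findall(r"\d+", text)

# Two groups: a list of tuples, one tuple per match.
assignments = re.findall(r"(\w)=(\d+)", text)

# finditer yields Match objects, giving access to start/end positions.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(numbers, assignments, spans)
```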

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a Pattern object.

purge()
    Clear the regular expression caches.

escape(pattern)
    Escape special characters in a string.
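As a rough sketch of when these helpers are useful: compile lets a pattern be reused through the methods of the Pattern object, and escape makes arbitrary text safe to use as a literal pattern. The variable names and sample strings are illustrative only.

```python
import re

# A compiled Pattern exposes the same operations as methods.
version = re.compile(r"(\d+)\.(\d+)")
found = version.findall("v1.2 and v3.14")

# Without escaping, characters such as + ( ) ? are treated as regex syntax.
literal = "1+1=2 (really?)"
haystack = "math: 1+1=2 (really?)"
unescaped_hit = re.search(literal, haystack)             # does not find the text
escaped_hit = re.search(re.escape(literal), haystack)    # finds it literally

# purge clears the module's internal cache of compiled patterns.
re.purge()

print(found, unescaped_hit, escaped_hit.group())
```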

Matching flags

    Each function other than purge and escape can take an optional 'flags' argument
    consisting of one or more of the following module constants, joined by "|".
    A, L, and U are mutually exclusive.
        A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                       match the corresponding ASCII character categories
                       (rather than the whole Unicode categories, which is the
                       default).
                       For bytes patterns, this flag is the only available
                       behaviour and needn't be specified.
        I  IGNORECASE  Perform case-insensitive matching.
        L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
        M  MULTILINE   "^" matches the beginning of lines (after a newline)
                       as well as the string.
                       "$" matches the end of lines (before a newline) as well
                       as the end of the string.
        S  DOTALL      "." matches any character at all, including the newline.
        X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
        U  UNICODE     For compatibility only. Ignored for string patterns (it
                       is the default), and forbidden for bytes patterns.
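The effect of the most commonly used flags can be checked with a sketch like this one. The strings and the `phone` pattern are made up for illustration.

```python
import re

# IGNORECASE: letters match regardless of case.
any_case = re.findall(r"python", "Python PYTHON python", re.I)

text = "first line\nsecond line"

# MULTILINE: ^ matches at the start of every line, not only the string.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, re.M)

# DOTALL: . also matches the newline between the two lines.
without_dotall = re.search(r"line.second", text)
with_dotall = re.search(r"line.second", text, re.S)

# VERBOSE: whitespace and # comments inside the pattern are ignored.
phone = re.compile(r"""
    (\d{3})   # first three digits
    -         # literal hyphen
    (\d{4})   # last four digits
""", re.X)
phone_parts = phone.search("call 555-1234").groups()

# Flags are combined with |.
combined = re.findall(r"^line", "Line 1\nline 2", re.I | re.M)

print(any_case, default_starts, multiline_starts, phone_parts, combined)
```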

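Common matching patterns

The key to mastering regular expressions is writing a pattern that fits the rule you want to match. Some commonly used patterns are listed below:

1. Match all integers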

>>> import re
>>> pat = r'\d+'
>>> txt = 'No.123;Tel:1396260000'
>>> re.findall(pat, txt)
['123', '1396260000']

2. Match 11-digit integers that start with 13

Note the pattern r'13\d{9}': after the leading 13, the remaining 9 digits are expressed as \d{9}.

>>> import re
>>> txt = '''
... 001:13962600001
... 002:1330626001
... 003:18962600002
... 004:13106260003
... 005:16605200006
... '''
>>> pat = r'13\d{9}'
>>> re.findall(pat, txt)
['13962600001', '13106260003']
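One caveat with the pattern above: r'13\d{9}' also matches the first 11 digits of a longer run of digits. If that is not wanted, one possible refinement (an assumption about the intended behaviour, not part of the original example) is to add lookarounds so that no digit may come directly before or after the match. The test string below is made up.

```python
import re

# Made-up input: a valid number, a 12-digit run, and a number glued to letters.
txt = "13962600001 139626000012 x13106260003y"

# The plain pattern also picks up the first 11 digits of the 12-digit run.
loose = re.findall(r"13\d{9}", txt)

# (?<!\d) and (?!\d) require that no digit touches the match on either side.
strict = re.findall(r"(?<!\d)13\d{9}(?!\d)", txt)

print(loose)
print(strict)
```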

......

More material is still being collected...

Article source: https://blog.csdn.net/boysoft2002/article/details/135710524