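Python's re module provides full regular expression support. A regular expression is a powerful tool for matching patterns in text, and it handles searching, replacing, splitting and other complex string operations efficiently.
In Python, a single import re makes it available.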
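Syntax | Function | Notes |
. | Matches any single character except a newline (\n) | \. matches a literal dot |
[] | Matches any one of the characters listed inside the brackets; the list may contain individual characters or ranges | Range examples: [2-6], [a-z] |
\d | Matches a digit, i.e. 0-9 | For str patterns, other Unicode decimal digits also match unless re.ASCII is set |
\D | Matches a non-digit | |
\s | Matches whitespace: space, tab, newline and similar characters | |
\S | Matches non-whitespace | |
\w | Matches a word character, i.e. a-z, A-Z, 0-9, _ | For str patterns, Unicode letters and digits also match unless re.ASCII is set |
\W | Matches a non-word character | |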
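A small sketch of the character classes in the table above. The sample string is made up for illustration, and the results in the comments assume a Python 3 str pattern.

```python
import re

# Hypothetical sample text, chosen only to exercise each class.
text = "Room 42, floor_3\tA.b"

digits = re.findall(r"\d", text)        # single digits
non_word = re.findall(r"\W", text)      # anything that is not a word character
in_range = re.findall(r"[2-4]", text)   # one character in the range 2..4
literal_dot = re.findall(r"\.", text)   # escaped dot: only a real "."
any_char = re.findall(r"A.", text)      # "A" plus any one non-newline character

print(digits)       # ['4', '2', '3']
print(non_word)     # [' ', ',', ' ', '\t', '.']
print(in_range)     # ['4', '2', '3']
print(literal_dot)  # ['.']
print(any_char)     # ['A.']
```

Note that the underscore counts as a word character, so it does not show up in the \W result.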
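Syntax | Function | Notes |
* | Matches the preceding element 0 or more times | |
+ | Matches the preceding element 1 or more times | |
? | Matches the preceding element 0 times or 1 time | |
{n} | Matches the preceding element exactly n times | |
{n,} | Matches the preceding element at least n times | |
{m,n} | Matches the preceding element between m and n times | |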
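The quantifiers can be compared side by side on one input. This is a minimal sketch; the test string is an assumption chosen so that each quantifier gives a different result.

```python
import re

# Hypothetical input: "a" followed by zero to four "b" characters.
text = "a ab abb abbb abbbb"

star = re.findall(r"ab*", text)         # zero or more "b"
plus = re.findall(r"ab+", text)         # one or more "b"
optional = re.findall(r"ab?", text)     # zero or one "b"
exactly2 = re.findall(r"ab{2}", text)   # exactly two "b"
at_least3 = re.findall(r"ab{3,}", text) # three or more "b"
one_to_two = re.findall(r"ab{1,2}", text)  # one or two "b"

print(star)        # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(plus)        # ['ab', 'abb', 'abbb', 'abbbb']
print(optional)    # ['a', 'ab', 'ab', 'ab', 'ab']
print(exactly2)    # ['abb', 'abb', 'abb']
print(at_least3)   # ['abbb', 'abbbb']
print(one_to_two)  # ['ab', 'abb', 'abb', 'abb']
```

The quantifier applies only to the element directly before it, here the single character b.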
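Syntax | Function | Notes |
^ | Matches the start of the string | Placed before the subexpression |
$ | Matches the end of the string | Placed after the subexpression; also matches just before a newline at the end of the string |
\b | Matches at the start or end of a word | |
\B | Matches at a position that is not the start or end of a word | |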
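The anchors match positions rather than characters, which is easiest to see from where each match starts. A rough sketch with a made-up string:

```python
import re

# Hypothetical text containing "cat" as a word, as a suffix and as a prefix.
text = "cat concat cats"

at_start = re.findall(r"^cat", text)    # only at the very start of the string
at_end = re.findall(r"cats$", text)     # only at the very end of the string

# Start offsets of "cat" as a whole word, and of "cat" ending a longer word.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]
word_suffix = [m.start() for m in re.finditer(r"\Bcat\b", text)]

print(at_start)     # ['cat']
print(at_end)       # ['cats']
print(whole_word)   # [0]
print(word_suffix)  # [7]
```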
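Syntax | Function | Notes |
| | Matches either the expression on its left or the one on its right | |
() | Captures the part of the match enclosed in the parentheses | With more than one group, the captured parts are returned in a tuple |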
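The effect of groups on the return value is worth seeing once. In this sketch the dates are invented sample data; it shows alternation, one group, several groups, and Match.groups().

```python
import re

# Hypothetical input with two date formats.
text = "2023-01-15, 2024/12/31"

# Alternation: either a hyphen or a slash.
separators = re.findall(r"-|/", text)

# One group: findall returns just the captured part of each match.
years = re.findall(r"(\d{4})[-/]\d{2}[-/]\d{2}", text)

# Several groups: findall returns one tuple per match.
parts = re.findall(r"(\d{4})[-/](\d{2})[-/](\d{2})", text)

# On a single Match object, groups() returns the captures as a tuple.
first = re.search(r"(\d{4})[-/](\d{2})", text).groups()

print(separators)  # ['-', '-', '/', '/']
print(years)       # ['2023', '2024']
print(parts)       # [('2023', '01', '15'), ('2024', '12', '31')]
print(first)       # ('2023', '01')
```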
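Syntax | Function | Notes |
*? | Repeats 0 or more times | As few repetitions as possible |
+? | Repeats 1 or more times | As few repetitions as possible |
?? | Repeats 0 times or 1 time | As few repetitions as possible |
{n,m}? | Repeats n to m times | As few repetitions as possible |
{n,}? | Repeats at least n times | As few repetitions as possible |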
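Greedy and non-greedy quantifiers differ only in how much they take, which a tag-like string shows clearly. A minimal sketch; the markup is illustrative only, and regular expressions are not a general HTML parser.

```python
import re

# Hypothetical markup-like text.
html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)   # takes as much as possible
lazy = re.findall(r"<.+?>", html)    # takes as little as possible

# The same contrast with a bounded repetition.
greedy_digits = re.match(r"\d{2,4}", "123456").group()
lazy_digits = re.match(r"\d{2,4}?", "123456").group()

print(greedy)         # ['<b>bold</b> and <i>italic</i>']
print(lazy)           # ['<b>', '</b>', '<i>', '</i>']
print(greedy_digits)  # 1234
print(lazy_digits)    # 12
```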
    The special characters are:
        "."      Matches any character except a newline.
        "^"      Matches the start of the string.
        "$"      Matches the end of the string or just before the newline at
                 the end of the string.
        "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
                 Greedy means that it will match as many repetitions as possible.
        "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
        "?"      Matches 0 or 1 (greedy) of the preceding RE.
        *?,+?,?? Non-greedy versions of the previous three special characters.
        {m,n}    Matches from m to n repetitions of the preceding RE.
        {m,n}?   Non-greedy version of the above.
        "\\"     Either escapes special characters or signals a special sequence.
        []       Indicates a set of characters.
                 A "^" as the first character indicates a complementing set.
        "|"      A|B, creates an RE that will match either A or B.
        (...)    Matches the RE inside the parentheses.
                 The contents can be retrieved or matched later in the string.
        (?aiLmsux) The letters set the corresponding flags defined below.
        (?:...)  Non-grouping version of regular parentheses.
        (?P<name>...) The substring matched by the group is accessible by name.
        (?P=name)     Matches the text matched earlier by the group named name.
        (?#...)  A comment; ignored.
        (?=...)  Matches if ... matches next, but doesn't consume the string.
        (?!...)  Matches if ... doesn't match next.
        (?<=...) Matches if preceded by ... (must be fixed length).
        (?<!...) Matches if not preceded by ... (must be fixed length).
        (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                           the (optional) no pattern otherwise.
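The extension forms starting with (? are easier to follow with concrete inputs. The sketch below tries a few of them; all sample strings are invented for illustration.

```python
import re

# Named group plus named backreference: a word repeated twice in a row.
m = re.search(r"(?P<word>\w+) (?P=word)", "it was was late")
repeated = m.group("word")

# Non-capturing group: groups for repetition without creating a capture,
# so findall returns the whole matches.
runs = re.findall(r"(?:ab)+", "ababab ab")

# Lookahead: a name only if ".py" follows; the suffix is not consumed.
py_names = re.findall(r"\w+(?=\.py)", "a.py b.txt c.py")

# Lookbehind and negative lookbehind on a leading "$".
priced = re.findall(r"(?<=\$)\d+", "cost $30 or 40 EUR")
unpriced = re.findall(r"(?<!\$)\b\d+", "cost $30 or 40 EUR")

# Inline flag: (?i) at the start turns on case-insensitive matching.
any_case = re.findall(r"(?i)python", "Python PYTHON python")

print(repeated)  # was
print(runs)      # ['ababab', 'ab']
print(py_names)  # ['a', 'c']
print(priced)    # ['30']
print(unpriced)  # ['40']
print(any_case)  # ['Python', 'PYTHON', 'python']
```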
    The special sequences consist of "\\" and a character from the list
    below.  If the ordinary character is not on the list, then the
    resulting RE will match the second character.
        \number  Matches the contents of the group of the same number.
        \A       Matches only at the start of the string.
        \Z       Matches only at the end of the string.
        \b       Matches the empty string, but only at the start or end of a word.
        \B       Matches the empty string, but not at the start or end of a word.
        \d       Matches any decimal digit; equivalent to the set [0-9] in
                 bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the whole
                 range of Unicode digits.
        \D       Matches any non-digit character; equivalent to [^\d].
        \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
                 bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the whole
                 range of Unicode whitespace characters.
        \S       Matches any non-whitespace character; equivalent to [^\s].
        \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
                 in bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the
                 range of Unicode alphanumeric characters (letters plus digits
                 plus underscore).
                 With LOCALE, it will match the set [0-9_] plus characters defined
                 as letters for the current locale.
        \W       Matches the complement of \w.
        \\       Matches a literal backslash.
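A few of the special sequences behave differently from their look-alikes, so a short comparison may help. This is a sketch under assumed inputs; \u0968 is the Devanagari digit two, used here only to show the Unicode behaviour of \d.

```python
import re

# \1 refers back to whatever group 1 matched: find doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "the the quick brown fox fox")

# With MULTILINE, ^ matches at every line start, but \A still matches
# only at the start of the whole string.
lines = "first\nsecond"
caret = re.findall(r"^\w+", lines, re.MULTILINE)
slash_a = re.findall(r"\A\w+", lines, re.MULTILINE)

# $ tolerates one trailing newline; \Z does not.
dollar_hit = re.search(r"\d+$", "42\n") is not None
slash_z_hit = re.search(r"\d+\Z", "42\n") is not None

# \d on a str pattern covers Unicode decimal digits unless re.ASCII is set.
mixed = "1\u0968"
unicode_digits = re.findall(r"\d", mixed)
ascii_digits = re.findall(r"\d", mixed, re.ASCII)

print(doubled)                 # ['the', 'fox']
print(caret, slash_a)          # ['first', 'second'] ['first']
print(dollar_hit, slash_z_hit) # True False
print(len(unicode_digits), len(ascii_digits))  # 2 1
```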
    This module exports the following functions:
        match     Match a regular expression pattern to the beginning of a string.
        fullmatch Match a regular expression pattern to all of a string.
        search    Search a string for the presence of a pattern.
        sub       Substitute occurrences of a pattern found in a string.
        subn      Same as sub, but also return the number of substitutions made.
        split     Split a string by the occurrences of a pattern.
        findall   Find all occurrences of a pattern in a string.
        finditer  Return an iterator yielding a Match object for each match.
        compile   Compile a pattern into a Pattern object.
        purge     Clear the regular expression cache.
        escape    Backslash all non-alphanumerics in a string.
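The matching functions in this list differ mainly in where they look and what they return. A minimal sketch with an invented sentence:

```python
import re

# Hypothetical input text.
text = "Order 66 shipped, order 67 pending"

# match only tries at the start of the string.
match_digits = re.match(r"\d+", text)        # None: the text starts with "Order"
match_word = re.match(r"Order", text).group()

# search scans forward and returns the first hit.
first_number = re.search(r"\d+", text).group()

# fullmatch succeeds only if the whole string fits the pattern.
full_ok = re.fullmatch(r"\d+", "66") is not None
full_bad = re.fullmatch(r"\d+", "66 ") is not None

# findall returns strings; finditer yields Match objects with positions.
numbers = re.findall(r"\d+", text)
spans = [m.span() for m in re.finditer(r"\d+", text)]

print(match_digits)       # None
print(match_word)         # Order
print(first_number)       # 66
print(full_ok, full_bad)  # True False
print(numbers)            # ['66', '67']
print(spans)              # [(6, 8), (24, 26)]
```

Because match, search and fullmatch return None on failure, check the result before calling group() on real input.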
match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a Match object, or None if no match was found.

fullmatch(pattern, string, flags=0)
    Try to apply the pattern to all of the string, returning
    a Match object, or None if no match was found.

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a Match object, or None if no match was found.

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used.

subn(pattern, repl, string, count=0, flags=0)
    Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the Match object and must
    return a replacement string to be used.

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list.

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

finditer(pattern, string, flags=0)
    Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a Match object.

    Empty matches are included in the result.

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a Pattern object.

purge()
    Clear the regular expression caches

escape(pattern)
    Escape special characters in a string.
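The rewriting and utility functions described above can be sketched as follows. The dates and the helper next_year are hypothetical, introduced only for this example.

```python
import re

# Hypothetical input text.
text = "2023-01-15 and 2024-12-31"

# String replacement: \1, \2, \3 refer to the captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# Callable replacement: receives the Match object, returns the new text.
def next_year(match):
    return str(int(match.group()) + 1)

bumped = re.sub(r"\d{4}", next_year, text)

# subn also reports how many replacements were made; count limits them.
limited, n = re.subn(r"-", "/", text, count=3)

# split: plain, with a capturing group (separators kept), and with maxsplit.
plain = re.split(r"\s*,\s*", "a , b,c")
kept = re.split(r"(,)", "a,b,c")
once = re.split(r",", "a,b,c", maxsplit=1)

# compile once, reuse the Pattern object's methods.
number = re.compile(r"\d+")
found = number.findall("a1b22")

# escape makes arbitrary text safe to use as a literal pattern.
literal = re.search(re.escape("1+1"), "so 1+1=2").group()

print(swapped)     # 15/01/2023 and 31/12/2024
print(bumped)      # 2024-01-15 and 2025-12-31
print(limited, n)  # 2023/01/15 and 2024/12-31 3
print(plain, kept, once)
print(found, literal)
```

Passing count and maxsplit by keyword keeps the call unambiguous, since the flags argument follows them positionally.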
    Each function other than purge and escape can take an optional 'flags' argument
    consisting of one or more of the following module constants, joined by "|".
    A, L, and U are mutually exclusive.
        A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                       match the corresponding ASCII character categories
                       (rather than the whole Unicode categories, which is the
                       default).
                       For bytes patterns, this flag is the only available
                       behaviour and needn't be specified.
        I  IGNORECASE  Perform case-insensitive matching.
        L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
        M  MULTILINE   "^" matches the beginning of lines (after a newline)
                       as well as the string.
                       "$" matches the end of lines (before a newline) as well
                       as the end of the string.
        S  DOTALL      "." matches any character at all, including the newline.
        X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
        U  UNICODE     For compatibility only. Ignored for string patterns (it
                       is the default), and forbidden for bytes patterns.
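The flags are easiest to compare by running the same kind of pattern with and without them. A short sketch, assuming a two-line sample string:

```python
import re

# Hypothetical two-line input.
text = "Name: Ada\nname: Bob"

# IGNORECASE: upper and lower case are treated alike.
names = re.findall(r"name", text, re.IGNORECASE)

# MULTILINE: ^ and $ also match at line boundaries. Flags combine with "|".
values = re.findall(r"^name: (\w+)$", text, re.I | re.M)

# DOTALL: "." also matches the newline.
without_s = re.search(r"Ada.name", text)
with_s = re.search(r"Ada.name", text, re.S).group()

# VERBOSE: whitespace and comments inside the pattern are ignored.
phone = re.compile(r"""
    13       # fixed prefix
    \d{9}    # nine more digits
""", re.VERBOSE)
phones = phone.findall("Tel:13962600001")

print(names)      # ['Name', 'name']
print(values)     # ['Ada', 'Bob']
print(without_s)  # None
print(repr(with_s))  # 'Ada\nname'
print(phones)     # ['13962600001']
```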
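The key to mastering regular expressions is writing a pattern that fits the rules of the text you want to match. Some commonly used regular patterns are listed below: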
>>> import re
>>> pat = r'\d+'  # raw string, so \d reaches the regex engine unchanged
>>> txt = 'No.123;Tel:1396260000'
>>> re.findall(pat, txt)
['123', '1396260000']
Note the pattern r'13\d{9}': the leading 13 is matched literally, and the remaining 9 digits are expressed as \d{9}.
>>> import re
>>> txt = '''
... 001:13962600001
... 002:1330626001
... 003:18962600002
... 004:13106260003
... 005:16605200006
... '''
>>> pat = r'13\d{9}'
>>> re.findall(pat, txt)
['13962600001', '13106260003']
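One caveat with r'13\d{9}' is that it also matches inside a longer run of digits. A possible refinement, offered as a sketch and not as part of the original example, is to add lookarounds so that no digit may sit directly before or after the 11 digits. The sample text, including the entry labelled 006, is made up.

```python
import re

# Hypothetical input: the last entry hides an 11-digit run inside 15 digits.
txt = "001:13962600001 002:1330626001 006:913106260003777"

# Unanchored pattern: also picks up the digits embedded in the long run.
loose = re.findall(r"13\d{9}", txt)

# Lookarounds reject a match that has a digit on either side.
strict = re.findall(r"(?<!\d)13\d{9}(?!\d)", txt)

print(loose)   # ['13962600001', '13106260003']
print(strict)  # ['13962600001']
```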
More examples are still being collected.