A First Look at Python Regular Expressions

Published: 2024-01-21

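Contents

Regular

Matching rules

Single-character matching

Quantifiers

Boundary matching

Grouping

Greedy vs. lazy

Original documentation

Special characters

Special sequences

Module functions

Function reference

Matching flags

Common patterns

1. Match all integers

2. Match 11-digit integers starting with 13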


Regular

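Python's re module provides full regular expression support. A regular expression is a powerful tool for matching text patterns, and it handles searching, replacing, splitting, and other complex string operations efficiently.

In Python, a single import re statement makes the module available.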

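Matching rules

Single-character matching

Syntax  Function                                            Notes
.       Matches any single character except a newline (\n)  \. matches a literal dot
[]      Matches any one of the characters listed inside     Single characters or ranges, e.g. [2-6], [a-b]
\d      Matches a digit, i.e. 0-9
\D      Matches a non-digit
\s      Matches whitespace, e.g. space, tab, newline
\S      Matches non-whitespace
\w      Matches a word character, i.e. a-z, A-Z, 0-9, _
\W      Matches a non-word character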
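
As a small illustration of the character classes in the table, the sketch below runs a few of them against a made-up sample string. The sample text and variable names are only for demonstration.

```python
import re

text = "Room 47, floor_3\tEast"

# \d matches one digit at a time.
digits = re.findall(r"\d", text)          # ['4', '7', '3']

# [2-6] matches a single character in the range 2..6, so '7' is skipped.
in_range = re.findall(r"[2-6]", text)     # ['4', '3']

# \s matches whitespace: two spaces and one tab in this sample.
spaces = re.findall(r"\s", text)          # [' ', ' ', '\t']

# \. matches a literal dot.
dots = re.findall(r"\.", "v1.2.3")        # ['.', '.']

# . does not match a newline, so three dots cannot span 'a\nb'.
any_three = re.match(r"...", "a\nbc")     # None
```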

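Quantifiers

Syntax  Function
*       The preceding item occurs 0 or more times
+       The preceding item occurs 1 or more times
?       The preceding item occurs 0 or 1 time
{n}     The preceding item occurs exactly n times
{n,}    The preceding item occurs at least n times
{m,n}   The preceding item occurs between m and n times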
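
To see how the quantifiers differ, the following sketch checks each one against the same four sample strings. The helper name matching and the samples are hypothetical, chosen only for this illustration.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = matching(r"ab*c")             # ['ac', 'abc', 'abbc', 'abbbc']
plus = matching(r"ab+c")             # ['abc', 'abbc', 'abbbc']
optional = matching(r"ab?c")         # ['ac', 'abc']
exactly_two = matching(r"ab{2}c")    # ['abbc']
two_or_more = matching(r"ab{2,}c")   # ['abbc', 'abbbc']
one_to_two = matching(r"ab{1,2}c")   # ['abc', 'abbc']
```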

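Boundary matching

Syntax  Function                                                   Notes
^       Matches the start of the string                            Placed before the subexpression
$       Matches the end of the string                              Placed after the subexpression
\b      Matches the start or end of a word
\B      Matches a position that is not the start or end of a word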
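
The boundary assertions match positions rather than characters. A minimal sketch, using an invented sample string:

```python
import re

text = "cat concat cats"

# \bcat\b finds 'cat' only as a whole word.
whole = re.findall(r"\bcat\b", text)                        # ['cat']

# \Bcat finds 'cat' that does not begin at a word boundary (inside 'concat').
inside = [m.start() for m in re.finditer(r"\Bcat", text)]   # [7]

# ^ and $ anchor the pattern to the start and end of the string.
starts = bool(re.search(r"^cat", text))    # True
ends = bool(re.search(r"cat$", text))      # False: the string ends with 'cats'
```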

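Grouping

Syntax  Function                                            Notes
|       Matches either the expression on its left or right
()      Captures part of the matched text                   With more than one group, the captured parts are returned in a tuple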
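
The sketch below shows alternation inside a group and how captured parts come back as tuples. The file name and date string are made-up inputs.

```python
import re

# | picks either alternative; the parentheses limit how far the alternation reaches.
m = re.search(r"(jpg|png)$", "photo.png")
ext = m.group(1)                                  # 'png'

# Two groups: groups() returns the captured parts as a tuple.
m2 = re.match(r"(\d{4})-(\d{2})", "2024-01-21")
parts = m2.groups()                               # ('2024', '01')

# With more than one group, findall returns a list of tuples.
pairs = re.findall(r"(\w+)=(\d+)", "a=1, b=22")   # [('a', '1'), ('b', '22')]
```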

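Greedy vs. lazy

Syntax  Function                    Notes
*?      Repeat any number of times  As few times as possible
+?      Repeat 1 or more times      As few times as possible
??      Repeat 0 or 1 time          As few times as possible
{n,m}?  Repeat n to m times         As few times as possible
{n,}?   Repeat n or more times      As few times as possible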
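
The difference between greedy and lazy repetition is easiest to see side by side. A short sketch with an invented HTML-like string:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last </b> it can find.
greedy = re.findall(r"<b>.*</b>", html)     # ['<b>one</b><b>two</b>']

# Lazy: .*? stops at the first </b>.
lazy = re.findall(r"<b>.*?</b>", html)      # ['<b>one</b>', '<b>two</b>']

# The same applies to counted repetition.
greedy_n = re.match(r"\d{2,4}", "123456").group()    # '1234'
lazy_n = re.match(r"\d{2,4}?", "123456").group()     # '12'
```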

Original documentation

Special characters

    The special characters are:
        "."      Matches any character except a newline.
        "^"      Matches the start of the string.
        "$"      Matches the end of the string or just before the newline at
                 the end of the string.
        "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
                 Greedy means that it will match as many repetitions as possible.
        "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
        "?"      Matches 0 or 1 (greedy) of the preceding RE.
        *?,+?,?? Non-greedy versions of the previous three special characters.
        {m,n}    Matches from m to n repetitions of the preceding RE.
        {m,n}?   Non-greedy version of the above.
        "\\"     Either escapes special characters or signals a special sequence.
        []       Indicates a set of characters.
                 A "^" as the first character indicates a complementing set.
        "|"      A|B, creates an RE that will match either A or B.
        (...)    Matches the RE inside the parentheses.
                 The contents can be retrieved or matched later in the string.
        (?aiLmsux) The letters set the corresponding flags defined below.
        (?:...)  Non-grouping version of regular parentheses.
        (?P<name>...) The substring matched by the group is accessible by name.
        (?P=name)     Matches the text matched earlier by the group named name.
        (?#...)  A comment; ignored.
        (?=...)  Matches if ... matches next, but doesn't consume the string.
        (?!...)  Matches if ... doesn't match next.
        (?<=...) Matches if preceded by ... (must be fixed length).
        (?<!...) Matches if not preceded by ... (must be fixed length).
        (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                           the (optional) no pattern otherwise.
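
Several of the extension forms listed above are easier to grasp with a concrete case. The sketch below tries named groups, a non-capturing group, a lookahead, and a lookbehind on invented inputs.

```python
import re

# (?P<name>...) gives a group a name that can be used to retrieve it.
m = re.search(r"(?P<user>\w+)@(?P<host>[\w.]+)", "mail: alice@example.com")
user = m.group("user")      # 'alice'
host = m.group("host")      # 'example.com'

# (?:...) groups without capturing, so findall returns the whole matches.
words = re.findall(r"(?:ab)+", "ababab ab")             # ['ababab', 'ab']

# (?=...) lookahead: digits followed by 'px', without consuming 'px'.
px = re.findall(r"\d+(?=px)", "10px 20em 30px")         # ['10', '30']

# (?<=...) lookbehind: digits preceded by a dollar sign.
price = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")    # ['15']
```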

Special sequences

    The special sequences consist of "\\" and a character from the list
    below.  If the ordinary character is not on the list, then the
    resulting RE will match the second character.
        \number  Matches the contents of the group of the same number.
        \A       Matches only at the start of the string.
        \Z       Matches only at the end of the string.
        \b       Matches the empty string, but only at the start or end of a word.
        \B       Matches the empty string, but not at the start or end of a word.
        \d       Matches any decimal digit; equivalent to the set [0-9] in
                 bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the whole
                 range of Unicode digits.
        \D       Matches any non-digit character; equivalent to [^\d].
        \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
                 bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the whole
                 range of Unicode whitespace characters.
        \S       Matches any non-whitespace character; equivalent to [^\s].
        \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
                 in bytes patterns or string patterns with the ASCII flag.
                 In string patterns without the ASCII flag, it will match the
                 range of Unicode alphanumeric characters (letters plus digits
                 plus underscore).
                 With LOCALE, it will match the set [0-9_] plus characters defined
                 as letters for the current locale.
        \W       Matches the complement of \w.
        \\       Matches a literal backslash.
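
A few of these sequences behave differently from their simpler cousins. The following sketch, on made-up inputs, contrasts \A with ^, \Z with $, and Unicode-aware \d with its ASCII-only form, and shows a numbered backreference.

```python
import re

# \1 refers back to whatever group 1 matched; here it finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")   # ['is', 'test']

# \A anchors to the very start of the string, even with MULTILINE,
# whereas ^ then also matches at the start of each line.
text = "first\nsecond"
caret = re.findall(r"^\w+", text, re.M)        # ['first', 'second']
anchor_a = re.findall(r"\A\w+", text, re.M)    # ['first']

# $ tolerates one trailing newline; \Z does not.
dollar = re.search(r"second$", "second\n") is not None      # True
anchor_z = re.search(r"second\Z", "second\n") is not None   # False

# \d covers Unicode digits by default; re.ASCII narrows it to 0-9.
sample = "7\u0663"   # '7' followed by ARABIC-INDIC DIGIT THREE
unicode_digits = len(re.findall(r"\d", sample))             # 2
ascii_digits = len(re.findall(r"\d", sample, re.ASCII))     # 1
```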

Module functions

    This module exports the following functions:
        match     Match a regular expression pattern to the beginning of a string.
        fullmatch Match a regular expression pattern to all of a string.
        search    Search a string for the presence of a pattern.
        sub       Substitute occurrences of a pattern found in a string.
        subn      Same as sub, but also return the number of substitutions made.
        split     Split a string by the occurrences of a pattern.
        findall   Find all occurrences of a pattern in a string.
        finditer  Return an iterator yielding a Match object for each match.
        compile   Compile a pattern into a Pattern object.
        purge     Clear the regular expression cache.
        escape    Backslash all non-alphanumerics in a string.

Function reference

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a Match object, or None if no match was found.

fullmatch(pattern, string, flags=0)
    Try to apply the pattern to all of the string, returning
    a Match object, or None if no match was found.

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a Match object, or None if no match was found.
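
The three functions above differ only in where the pattern is allowed to match. A brief sketch with an invented input string:

```python
import re

text = "id=2024"

at_start = re.match(r"\d+", text)        # None: the string does not start with a digit
anywhere = re.search(r"\d+", text)       # Match object for '2024'
whole = re.fullmatch(r"id=\d+", text)    # Match object: the whole string fits
partial = re.fullmatch(r"id=", text)     # None: the pattern must cover the entire string

year = anywhere.group()                  # '2024'
span = anywhere.span()                   # (3, 7)
```

Because all three return None on failure, it is usual to check the result before calling group() on it.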

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used.

subn(pattern, repl, string, count=0, flags=0)
    Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the Match object and must
    return a replacement string to be used.
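
As a sketch of both forms of repl, the example below uses a string with group references and then a callable. The helper add_one and the sample strings are hypothetical.

```python
import re

text = "2024-01-21 and 2023-12-31"

# String replacement: \1, \2, \3 refer to the captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
# '21/01/2024 and 31/12/2023'

# Callable replacement: receives the Match object and returns a string.
def add_one(match):
    return str(int(match.group()) + 1)

bumped = re.sub(r"\d+", add_one, "a1 b9 c99")             # 'a2 b10 c100'

# count limits the number of replacements; subn also reports how many were made.
first_only = re.sub(r"\d+", "#", "a1 b9 c99", count=1)    # 'a# b9 c99'
result, n = re.subn(r"\d+", "#", "a1 b9 c99")             # 'a# b# c#', 3
```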

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list.
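
The effect of capturing parentheses and of maxsplit can be seen in a short sketch like this one, using invented input:

```python
import re

text = "a, b;c  d"

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)                 # ['a', 'b', 'c', 'd']

# A capturing group keeps the separators in the result.
with_seps = re.split(r"([,;])", "a,b;c")           # ['a', ',', 'b', ';', 'c']

# maxsplit=2 stops after two splits; the rest stays in the last element.
limited = re.split(r"[,;\s]+", text, maxsplit=2)   # ['a', 'b', 'c  d']
```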

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

finditer(pattern, string, flags=0)
    Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a Match object.

    Empty matches are included in the result.
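
What findall returns depends on how many capturing groups the pattern has. A minimal sketch on a made-up string, with finditer shown for comparison:

```python
import re

text = "x=10, y=20"

no_group = re.findall(r"\w=\d+", text)          # ['x=10', 'y=20']
one_group = re.findall(r"\w=(\d+)", text)       # ['10', '20']
two_groups = re.findall(r"(\w)=(\d+)", text)    # [('x', '10'), ('y', '20')]

# finditer yields Match objects, which also carry positions.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
# [('10', 2), ('20', 8)]
```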

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a Pattern object.

purge()
    Clear the regular expression caches

escape(pattern)
    Escape special characters in a string.
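
A sketch of when compile and escape tend to be useful: reusing one pattern across many strings, and embedding arbitrary text as a literal. The variable names and sample strings are assumptions for illustration.

```python
import re

# Compile once, reuse many times; the Pattern object offers the same methods.
phone = re.compile(r"13\d{9}")
hits = [bool(phone.search(s)) for s in ["13962600001", "18962600002"]]   # [True, False]

# escape() makes arbitrary text safe to use as a literal pattern.
literal = "1+1=2 (maybe)"
haystack = "is 1+1=2 (maybe) true?"
found = re.search(re.escape(literal), haystack) is not None   # True

# Without escaping, + and () are treated as regex syntax and the search fails.
unescaped = re.search(literal, haystack) is not None          # False
```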


Matching flags

    Each function other than purge and escape can take an optional 'flags' argument
    consisting of one or more of the following module constants, joined by "|".
    A, L, and U are mutually exclusive.
        A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                       match the corresponding ASCII character categories
                       (rather than the whole Unicode categories, which is the
                       default).
                       For bytes patterns, this flag is the only available
                       behaviour and needn't be specified.
        I  IGNORECASE  Perform case-insensitive matching.
        L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
        M  MULTILINE   "^" matches the beginning of lines (after a newline)
                       as well as the string.
                       "$" matches the end of lines (before a newline) as well
                       as the end of the string.
        S  DOTALL      "." matches any character at all, including the newline.
        X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
        U  UNICODE     For compatibility only. Ignored for string patterns (it
                       is the default), and forbidden for bytes patterns.
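
The flags are easiest to understand by toggling them on the same input. The sketch below uses a made-up three-line string and joins two flags with "|".

```python
import re

text = "Alpha\nbeta\nALPHA"

ci = re.findall(r"alpha", text, re.I)                   # ['Alpha', 'ALPHA']
line_starts = re.findall(r"^\w+", text, re.M)           # ['Alpha', 'beta', 'ALPHA']
combined = re.findall(r"^alpha$", text, re.I | re.M)    # ['Alpha', 'ALPHA']

no_dotall = re.search(r"Alpha.beta", text)              # None: . stops at '\n'
dotall = re.search(r"Alpha.beta", text, re.S)           # matches 'Alpha\nbeta'

# VERBOSE lets a pattern carry whitespace and comments.
date = re.compile(r"""
    (\d{4})    # year
    -(\d{2})   # month
    -(\d{2})   # day
""", re.X)
ymd = date.match("2024-01-21").groups()                 # ('2024', '01', '21')
```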


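Common patterns

The key to mastering regular expressions is writing patterns that follow the matching rules. Some commonly used patterns are listed below:

1. Match all integers

>>> import re
>>> pat = r'\d+'
>>> txt = 'No.123;Tel:1396260000'
>>> re.findall(pat, txt)
['123', '1396260000']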

2. Match 11-digit integers starting with 13

Note the pattern r'13\d{9}': after the leading 13, the remaining 9 digits are written as \d{9}.

>>> import re
>>> txt = '''
... 001:13962600001
... 002:1330626001
... 003:18962600002
... 004:13106260003
... 005:16605200006
... '''
>>> pat = r'13\d{9}'
>>> re.findall(pat, txt)
['13962600001', '13106260003']
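
One caveat worth noting: r'13\d{9}' also matches the first 11 digits of a longer digit run. If exactly 11 digits are required, one possible approach (an assumption, not part of the original example) is to add lookarounds that forbid a digit on either side. The sample string below is invented.

```python
import re

txt = "13962600001 139626000012 x13106260003"

# The plain pattern also picks up the first 11 digits of the 12-digit number.
loose = re.findall(r"13\d{9}", txt)
# ['13962600001', '13962600001', '13106260003']

# (?<!\d) and (?!\d) require that no digit sits right before or after the match.
strict = re.findall(r"(?<!\d)13\d{9}(?!\d)", txt)
# ['13962600001', '13106260003']
```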

......

More examples are being collected ...

Source: https://blog.csdn.net/boysoft2002/article/details/135710524