Regular expressions in Python and Perl
Python supports essentially the same regular expression syntax as Perl, as far as the regular expressions themselves are concerned. However, the syntax for using regular expressions is substantially different.
Regular expression support is not available out of the box; you must import the re module.
Regular expression patterns are contained in strings, in contrast to Perl's built-in // syntax. This means that some backslash sequences, such as \b, are interpreted by Python before the regular expression engine sees them, and so must be escaped in order to be passed on to the engine. To be safe, always use raw strings (r'' or r"") to contain patterns.
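As a minimal sketch of why raw strings matter, the following compares the same pattern written as an ordinary string and as a raw string. The sample text and variable names are illustrative only.

```python
import re

text = "a cat sat"

# In an ordinary string, "\b" is a backspace character (0x08), so the
# regex engine never sees the word-boundary escape.
plain = re.search("\bcat\b", text)

# In a raw string the backslash is passed through unchanged, so the
# engine sees \b and treats it as a word boundary.
raw = re.search(r"\bcat\b", text)

print(plain)        # None
print(raw.group())  # cat
```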
You might think that re.match() is the analog to Perl's m// match operator. It's not! The re.match() function matches regular expressions only at the beginning of a string. It behaves as if every pattern has ^ prepended (even with the re.MULTILINE flag, it matches only at the start of the string, not at the start of each line). The function re.search() behaves like Perl's m// and is probably what you want to use exclusively.
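The difference is easy to see side by side. This is a small illustration with a made-up input string, not a complete account of either function.

```python
import re

line = "error: disk full"

# re.match() succeeds only if the pattern matches at position 0.
match_middle = re.match(r"disk", line)
match_start = re.match(r"error", line)

# re.search() scans the whole string, like Perl's m//.
search_middle = re.search(r"disk", line)

print(match_middle)           # None
print(match_start.group())    # error
print(search_middle.group())  # disk
```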
The functions match and search return a match object with a group method. The group method without any argument returns the entire match. The group method with a positive integer argument returns captured expressions: group(1) returns the first capture, group(2) returns the second, and so on, analogous to $1, $2, etc. in Perl. (If there is no match, match and search return None, so you must check that the result is not None before calling methods on it.) The groups() method returns all captures as a tuple.
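A short sketch of the usual pattern: test the result for None, then pull out the whole match and the captures. The date-like string and the variable names are invented for illustration.

```python
import re

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released 2024-03-15")

# search() returns None when nothing matches, so test before using it.
if m:
    whole = m.group()    # entire match
    year = m.group(1)    # first capture, like Perl's $1
    month = m.group(2)   # second capture, like Perl's $2
    parts = m.groups()   # all captures as a tuple
    print(whole)         # 2024-03-15
    print(year, month)   # 2024 03
    print(parts)         # ('2024', '03', '15')

# A failed search gives None rather than an empty match object.
missing = re.search(r"\d", "no digits here")
print(missing)           # None
```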
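Python doesn't have a global modifier like Perl's /g option. To find all matches to a pattern, use re.findall() rather than re.search(). The findall function returns a list of matched strings rather than a match object. If the pattern contains capture groups, the list holds the captured text instead of the whole match: strings for one group, tuples for several.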
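For example, here is a rough sketch of re.findall() with and without capture groups, using an invented sample string.

```python
import re

text = "3 apples, 12 pears, 7 plums"

# No capture groups: a list of the matched strings.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['3', '12', '7']

# Two capture groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\d+) (\w+)", text)
print(pairs)    # [('3', 'apples'), ('12', 'pears'), ('7', 'plums')]
```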
To substitute for a pattern, analogous to Perl's s// operator, use re.sub(). Actually, re.sub() is analogous to s//g since it replaces all instances of a pattern by default. To change this behavior, you can specify the maximum number of instances to replace using the count parameter to re.sub(). Setting this parameter to 1 causes only the first instance to be substituted, as in Perl's s//.
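A minimal sketch of both behaviors follows. Note that the keyword argument limiting the number of replacements is named count; the sample text is made up.

```python
import re

text = "one fish two fish red fish"

# Default: every occurrence is replaced, like Perl's s/fish/cat/g.
all_replaced = re.sub(r"fish", "cat", text)
print(all_replaced)  # one cat two cat red cat

# count=1: only the first occurrence is replaced, like Perl's s/fish/cat/.
first_only = re.sub(r"fish", "cat", text, count=1)
print(first_only)    # one cat two fish red fish
```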
To make a regular expression case-insensitive, pass the flag re.I (or re.IGNORECASE) as the final argument to re.search().
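A brief illustration of the flag with re.search(), using an arbitrary sample string:

```python
import re

text = "Hello World"

# Without a flag the match is case-sensitive.
sensitive = re.search(r"world", text)
print(sensitive)            # None

# re.I is shorthand for re.IGNORECASE.
insensitive = re.search(r"world", text, re.I)
print(insensitive.group())  # World
```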
The function re.sub() also takes flags such as re.I, but only since Python 2.7 and 3.1, and the flag should be passed by keyword (flags=re.I) because the fourth positional argument is count. Alternatively, one can make the regular expression match case-insensitive by modifying the regular expression itself, adding (?i) to the beginning of the expression. (Older versions of Python allowed the modifier (?i) to go anywhere, but since Python 3.11 a global modifier that is not at the start of the expression is an error.)
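Here is a small sketch of two ways to get a case-insensitive substitution: the inline (?i) modifier at the start of the pattern and, on Python 2.7 / 3.1 and later, the flags keyword of re.sub(). The sample text is invented.

```python
import re

text = "Perl, PERL, perl"

# Inline modifier at the very start of the pattern.
inline = re.sub(r"(?i)perl", "Python", text)

# flags keyword (Python 2.7 / 3.1 and later). Pass it by keyword: the
# fourth positional argument of re.sub() is count, not flags.
keyword = re.sub(r"perl", "Python", text, flags=re.I)

print(inline)   # Python, Python, Python
print(keyword)  # Python, Python, Python
```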