Regular Expressions Round-up
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
Jamie Zawinski on August 12, 1997 - more information on the origin of this quote can be found on Jeffrey Friedl’s Blog.
Introduction
Regular expressions, or 'regex' as they are commonly called, are a very powerful tool. I've given some presentations on this subject, and this page is a short round-up of advice and useful links for the attendees to refer back to.
Some things to keep in mind
- Regular expressions can be slow if your regex is complex and/or your subject string is long. Consider alternatives if possible, such as
explode(),
strstr(),
strpos(),
str_replace(),
the Filter extension (>= PHP 5.2.0)
and the ctype extension (bear in mind that the last one is locale dependent)
- If you decide to use regular expressions anyway, prefer the PCRE implementation (the preg_* functions) over the POSIX implementation (the ereg* functions). The POSIX functions were deprecated in PHP 5.3.0 and removed in PHP 7.
- If the engine has to try many different ways of matching your string before it can decide whether there is a match (backtracking), your regex will be ...slow... Be as specific as you can about what you want to match.
- Quantifiers are greedy by default. You can change this behaviour either with the /U modifier, which inverts greediness for the whole pattern, or by using ungreedy (lazy) quantifiers such as +? (once or more, but as few times as possible).
- Whenever relevant, use negated character classes. They stop at the first excluded character, so the engine does not have to backtrack to find the end of the match. Example: [^\n\r]+ will match everything up to the next line-break.
- Only capture what you need. As the (...) syntax is also used for grouping, it is very common to capture a lot more than you need. Use the (?:...) syntax to group without capturing.
- Modifiers (switches) can radically alter the result of your regex:
/..../i = case-insensitive
/..../u = pattern and subject are treated as UTF-8
/..../U = ungreedy (inverts the greediness of all quantifiers)
etc
- Using a different regex delimiter from the default / can often prevent unreadable regexes as you'll need to escape less.
Compare: /^http:\/\/[^\/]+\// with `^http://[^/]+/`.
Do remember, though, to pass the chosen delimiter character as the second argument to the preg_quote() function if you use it.
- Naming matched groups makes them accessible via an associative array, which is a lot more descriptive than the number-indexed array you'd otherwise get. You can name a match group using the (?P<name>regex) syntax. You can also reference these named groups from within the regex using the (?P=name) syntax.
- Use comments to document how your regular expression has been built up, so you can more easily decipher it the next time you need to work on it.
You can either build up the regex string using concatenation and use the commenting style of the programming language you're using:
$regex = '`^((?:19|20)\d{2})'; // = year, group 1
$regex .= '[ /-]?'; // potential separator
$regex .= '(1[0-2]|0?[1-9])$`'; // = month (1-12, leading zero optional), group 2
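Alternatively, you can use regex-native comments, either with the (?#comment) syntax or with the /..../x modifier, which makes the regular expression engine ignore whitespace in the pattern (outside character classes) and treat everything from an unescaped # to the end of the line as a comment: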
$regex = '`^((?:19|20)\d{2}) # = year, group 1
[ /-]? # potential separator
(1[0-2]|0?[1-9]) # = month (1-12, leading zero optional), group 2
$`x';
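To illustrate the first tip in the list, here is a minimal sketch of plain string functions doing jobs that are often handed to a regex. The sample values ($url, $csv, the e-mail address) are made up for this example, and the regexes in the comments are only rough equivalents.

```php
<?php
// Hypothetical sample data, not taken from the article.
$url = 'http://example.com/path/to/page';
$csv = 'red,green,blue';

// strpos() instead of preg_match('`^http://`', $url): check for a prefix.
$isHttp = (strpos($url, 'http://') === 0);

// explode() instead of preg_split('`,`', $csv): split on a fixed separator.
$colours = explode(',', $csv);

// str_replace() instead of preg_replace('`green`', 'yellow', $csv).
$replaced = str_replace('green', 'yellow', $csv);

// Filter extension instead of a hand-written e-mail regex.
$validEmail = (filter_var('someone@example.com', FILTER_VALIDATE_EMAIL) !== false);

// ctype extension instead of preg_match('`^[0-9]+$`', $input).
$allDigits = ctype_digit('12345');
```

Note the strict comparison with 0 for strpos(): it returns false when the needle is absent, and false == 0 in a loose comparison.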
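The greedy versus ungreedy tips from the list can be sketched as follows. The subject string is a made-up example; the four patterns differ only in how the part between the tags is matched.

```php
<?php
// Hypothetical subject string, chosen for illustration only.
$subject = '<b>one</b> and <b>two</b>';

// Greedy (default): .+ runs to the end, then backtracks to the LAST </b>.
preg_match('`<b>(.+)</b>`', $subject, $greedy);

// Lazy quantifier: .+? expands one character at a time, up to the FIRST </b>.
preg_match('`<b>(.+?)</b>`', $subject, $lazy);

// U modifier: inverts greediness for the whole pattern, so .+ acts like .+?
preg_match('`<b>(.+)</b>`U', $subject, $ungreedy);

// Negated character class: [^<]+ can never run past the next "<".
preg_match('`<b>([^<]+)</b>`', $subject, $negated);

echo $greedy[1], "\n";   // one</b> and <b>two
echo $lazy[1], "\n";     // one
echo $ungreedy[1], "\n"; // one
echo $negated[1], "\n";  // one
```

The last three give the same result here, but the negated class states the intent most precisely: the engine never has to consider text beyond the next "<".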
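As a rough illustration of the i and u modifiers from the list, the sketch below runs the same pattern with and without each modifier. The subject strings are made up, and the u example assumes PCRE was built with UTF-8 support, which is the usual case.

```php
<?php
// i: case-insensitive matching.
$withoutI = preg_match('`^regex$`', 'RegEx');   // 0, case differs
$withI    = preg_match('`^regex$`i', 'RegEx');  // 1

// u: pattern and subject are treated as UTF-8.
// "\xC3\xA9" is the two-byte UTF-8 encoding of the single character e-acute.
$withoutU = preg_match('`^.$`', "\xC3\xA9");    // 0, "." sees two bytes
$withU    = preg_match('`^.$`u', "\xC3\xA9");   // 1, "." sees one character
```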
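The delimiter tip can be sketched like this: a backtick-delimited pattern is built around arbitrary text, and the delimiter is passed to preg_quote() so it gets escaped too. The variable $needle stands in for any user-supplied text and is a made-up value.

```php
<?php
// Hypothetical input that contains both the delimiter and a metacharacter.
$needle = 'a`b.c';

// The second argument tells preg_quote() to escape the backtick as well.
// Without it, the backtick in $needle would end the pattern early.
$pattern = '`^' . preg_quote($needle, '`') . '$`';

$matchesLiteral = preg_match($pattern, 'a`b.c'); // 1: exact text matches
$matchesOther   = preg_match($pattern, 'a`bxc'); // 0: "." was escaped, so it is literal
```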
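Finally, a minimal sketch that combines non-capturing groups, named groups and the x modifier, based on the year/month pattern above. The group names year, month and word are chosen for this example, and the month is assumed to be 1-12 with an optional leading zero.

```php
<?php
$regex = '`^(?P<year>(?:19|20)\d{2})  # four-digit year, named group
           [ /-]?                     # optional separator
           (?P<month>1[0-2]|0?[1-9])  # month, named group
           $`x';

preg_match($regex, '1997-08', $date);
echo $date['year'], ' / ', $date['month'], "\n"; // 1997 / 08

// (?P=word) refers back to whatever the named group "word" matched,
// which makes it easy to find an accidentally doubled word.
preg_match('`\b(?P<word>\w+) (?P=word)\b`', 'this is is a test', $dup);
echo $dup['word'], "\n"; // is
```

The match array still contains the numeric indexes as well, so $date[1] and $date['year'] hold the same value. The (?:19|20) group is not stored at all.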
More information
Below you'll find some lists of useful resources. The listings are in completely biased and arbitrary order, i.e. based on my personal preference and where I don't have one, in random order.
About regular expressions: Manuals and Tutorials
Example regular expressions
Beware: a lot of these are not very *good* examples
Regex Cheatsheets
Testing regular expressions
Important: When you choose your preferred testing tool, make sure it uses the regex engine implemented in the programming language(s) of your choice!
Visualizing regular expressions
Generating regular expressions
- Regulazy - can build simple regular expressions based on a given pattern
- RegexMagic - Regex generator (€ 29,95)
Security considerations
Other useful resources
Contact me:
For feedback, additions or ravingly enthusiastic thank-you mails, you can contact me by e-mail: