I've been using Perl Compatible Regular Expressions (PCRE) in my web development for about four years. PCRE gurus may scoff at me for not knowing this already, but here is a neat PCRE feature I discovered today.
PCRE lets you mark a sub-pattern as non-capturing. To do this, put ?: immediately after the sub-pattern's opening parenthesis, as in (?:...). The parentheses still group their contents (for example, to hold an alternation), but the text they match is not stored as a numbered capture.
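Here is a minimal sketch comparing the two forms with preg_match(). The subject string and variable names are made up for illustration; the point is only how many entries end up in the matches array.

```php
<?php
// Capturing group: (ou|o) records which branch matched, as $capturing[1].
preg_match('/col(ou|o)r/', 'colour', $capturing);

// Non-capturing group: (?:ou|o) still groups the alternation,
// but nothing beyond the full match is stored in $nonCapturing.
preg_match('/col(?:ou|o)r/', 'colour', $nonCapturing);

var_dump($capturing === ['colour', 'ou']); // bool(true)
var_dump($nonCapturing === ['colour']);    // bool(true)
```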
Here's the situation that led me to this discovery:
I needed to split a string at every known XHTML tag, so I used PHP's preg_split() function. Because I also needed the matched tags (the split delimiters, in this case) in the returned array, I passed the PREG_SPLIT_DELIM_CAPTURE flag.
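```php
// Alternation of element names; "etc..." stands for the rest of the list.
$all_elements = 'div|span|p|ul|ol|li|strong|em|etc...';
// Outer group: the whole tag. Inner group: the element name.
$pattern = '/(<\/?(' . $all_elements . ')[^<>]*?>)/i';
// $string is the XHTML input; captured delimiters are kept in the result.
$strings = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);
```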
The outer sub-pattern matches the entire delimiter (the tag), which is what I want in the array returned by preg_split(). The inner sub-pattern is needed only to match the alternation of XHTML element names. Because both sub-patterns are capturing, the array returned by preg_split() contains both the entire tag (desired) and the element name (undesired).
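To make the problem concrete, here is a small sketch using a hypothetical, shortened element list and a made-up input string. Each tag shows up in the result twice: once as the whole tag from the outer group, and once as the bare element name from the inner group.

```php
<?php
// Hypothetical, shortened element list for illustration only.
$all_elements = 'div|p|em';
$string = 'Hello <em>world</em>!';

// The inner group (div|p|em) is capturing, so its match is returned too.
$pattern = '/(<\/?(' . $all_elements . ')[^<>]*?>)/i';
$strings = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);

// $strings is ['Hello ', '<em>', 'em', 'world', '</em>', 'em', '!']
print_r($strings);
```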
With the non-capturing sub-pattern syntax, my pattern looks like this:
```php
$pattern = '/(<\/?(?:' . $all_elements . ')[^<>]*?>)/i';
```
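Running the revised pattern against the same hypothetical element list and input as a quick sketch: only the outer group is captured, so each tag appears exactly once between the text pieces.

```php
<?php
$all_elements = 'div|p|em'; // hypothetical shortened list
$string = 'Hello <em>world</em>!';

// (?:...) groups the element-name alternation without capturing it,
// so only the outer group (the whole tag) is returned as a delimiter.
$pattern = '/(<\/?(?:' . $all_elements . ')[^<>]*?>)/i';
$strings = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);

// $strings is ['Hello ', '<em>', 'world', '</em>', '!']
print_r($strings);
```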
With the revised pattern, my code works as intended: only the full tag is returned in the split array. This feature turned out to be quite useful for me. I expect it to be useful anywhere a sub-pattern containing alternation (branches separated by |) sits inside a larger pattern, and the larger pattern itself needs to be captured.
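As a hedged illustration of that more general case, outside of preg_split(), here is a hypothetical example with preg_match_all(): the unit alternation must be grouped, but only the whole length needs to be captured.

```php
<?php
// Hypothetical input: CSS-style lengths with different units.
$css = 'width: 10px; margin: 2em; height: 50%';

// The unit alternation (?:px|em|%) is grouped but not captured,
// so group 1 holds the whole length, number and unit together.
preg_match_all('/(\d+(?:px|em|%))/', $css, $matches);

// $matches[1] is ['10px', '2em', '50%']; there is no separate unit group.
print_r($matches[1]);
```

With a plain (px|em|%) group instead, $matches would gain a third entry holding just the units.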
Comments
Mike Gauthier - April 5, 2007 2:00 AM
As a side note to this post, the code that needed this was the SwatString class from Swat.