Thursday, March 12, 2009

Regular Expressions in Java and C#

This is a summary from reading "Introduction to Regular Expressions", written by Larry Mak.

The basic idea of using regular expressions is to match and to replace. Matching determines whether a pattern occurs in a string and, if so, finds where. Replacing rewrites the matched text according to a replacement pattern.

In C#, the Regex class in System.Text.RegularExpressions is used to do the match. In Java, Pattern and Matcher in the package java.util.regex are used to do the work.

C#:
Regex.IsMatch( data, "Hello" )

Java:
Pattern pat = Pattern.compile("Hello");
Matcher m = pat.matcher( data );
if ( m.find() ) { /* data contains "Hello" */ }

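To define a word boundary in a regular expression, you use "\b". Inside a Java string literal the backslash has to be doubled ("\\b"); in C# you can write it as the verbatim string @"\b".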
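
To make the boundary idea concrete, here is a small Java sketch. The class name WordBoundaryDemo and the helper containsWord are my own names for illustration, not something from the article. It checks whether a word appears as a whole word rather than as part of a longer one.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: true if `word` occurs in `text` as a whole word.
    static boolean containsWord(String text, String word) {
        // Pattern.quote treats `word` literally; "\\b" is the Java-string form of \b.
        Pattern pat = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return pat.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Hello world", "Hello")); // true
        System.out.println(containsWord("Helloworld", "Hello"));  // false
    }
}
```

Without the two \b markers, both calls would report a match, because "Hello" is a substring of "Helloworld".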

To match more than once in the same string in C#, you write something like the following to loop over all the matches:
   for ( Match m = Regex.Match( str, patternToMatch ); m.Success; m = m.NextMatch() )
      Console.WriteLine( m.Value );
In Java, you use the matcher's find method:
   Pattern pat = Pattern.compile( pattern );
   Matcher m = pat.matcher( s );
   while ( m.find() ) {
      System.out.println( m.group() );
   }

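Capturing groups are numbered by counting their opening parentheses from left to right.
E.g. \$(\d+)\.(\d\d) has two groups: group 1 is the dollars and group 2 is the cents. (The leading $ has to be escaped, otherwise it is read as the end-of-input anchor.) Each match can be broken down further into its capturing groups, and group 0 is the whole match.
In C#, GroupCollection gc = m.Groups; then gc[i].Value
In Java, m.group(i)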
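
As a rough illustration of group numbering, the Java sketch below pulls the dollars and cents out of a price. The class GroupDemo and the method firstPrice are hypothetical names I made up for this note, and the sample input is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns {dollars, cents} for the first price found, or null.
    static String[] firstPrice(String text) {
        // Two capturing groups: (\d+) is group 1, (\d\d) is group 2.
        Pattern pat = Pattern.compile("\\$(\\d+)\\.(\\d\\d)");
        Matcher m = pat.matcher(text);
        if (!m.find()) {
            return null;
        }
        // group(0) would be the whole match, e.g. "$12.50".
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = firstPrice("Total: $12.50");
        System.out.println(parts[0]); // 12
        System.out.println(parts[1]); // 50
    }
}
```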

Replacement
In C#, use Regex.Replace( str, search, replace );
In Java, use the matcher's replaceAll( replace ) method, or str.replaceAll( search, replace )

One powerful operation you can do with regular expressions is to rewrite the
string using what you captured. The groups are numbered in order starting from 1,
and the replacement string can refer to them as $1, $2, and so on. You can
rearrange the captured text by changing the order of the groups,
for example, $1$3$2.
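
Here is a minimal Java sketch of reordering groups in a replacement. The date format and the names ReorderDemo and swapMonthAndDay are my own assumptions for the example. It captures three groups and writes them back in the order 1, 3, 2.

```java
import java.util.regex.Pattern;

class ReorderDemo {
    // Three groups: year, month, day (an assumed yyyy-mm-dd input format).
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Hypothetical helper: rewrites every yyyy-mm-dd in `text` as yyyy-dd-mm.
    static String swapMonthAndDay(String text) {
        // $1, $3, $2 refer to the captured groups, emitted in a new order.
        return DATE.matcher(text).replaceAll("$1-$3-$2");
    }

    public static void main(String[] args) {
        System.out.println(swapMonthAndDay("Posted 2009-03-12")); // Posted 2009-12-03
    }
}
```

The same replacement string should work with Regex.Replace in C#, since it also uses $1, $2, ... for numbered groups.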





