Monday, January 16, 2012

Java regexes

Just a quick note on java.util.regex.Pattern, e.g. extracting numbers from a known pattern.

We know that our data format will be some letters, then the number we want to extract, then one or more other characters. How do we extract the number?

Easy, using a regex Pattern, and the Matcher class.

e.g. (in groovy)
import java.util.regex.*
def testString ="abc4567!TheEnd"
def pattern = Pattern.compile("\\D*(\\d*).*") // N.B. Double backslashes to escape \ in the string; pattern == \D*(\d*).* i.e. non-digits / digits / anything
Matcher m = pattern.matcher(testString)
if (m.matches()) { // matches() must succeed before any group can be read
    def i = 0
    while (i <= m.groupCount()) println m.group(i++)
} else println "Doesn't match"

Prints the whole match, followed by our number... 4567
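
For comparison, here is a minimal sketch of the same extraction in plain Java. The class name NumberExtractor and the helper extractNumber are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    // Non-digits, then the digits we capture, then anything else.
    private static final Pattern NUMBER = Pattern.compile("\\D*(\\d*).*");

    // Hypothetical helper: returns the first run of digits,
    // or null if the whole input does not match the pattern.
    static String extractNumber(String input) {
        Matcher m = NUMBER.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractNumber("abc4567!TheEnd")); // prints 4567
    }
}
```

Because \d* also matches zero digits, an input with no digits at all (e.g. "abc") still matches and gives back an empty string. Swapping \d* for \d+ would make such input fail to match instead.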

Note that group(0) is always the whole matching string, followed by one group for each pair of parentheses (1 in this case), numbered from 1 in the order their opening parentheses appear.

e.g. if the above regex is changed to a nested form such as \D*(\d(\d(\d*))).*

Then the results would be as follows. Note how as the group index increases, we traverse deeper into the parentheses of the regex
group(0) == abc4567!TheEnd
group(1) == 4567
group(2) == 567
group(3) == 67
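
A quick sketch in plain Java to check the nesting. It assumes a regex with three nested groups, \D*(\d(\d(\d*))).*, which is one pattern that yields the groups listed above (there may be others). The class and method names are just for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroups {
    // Assumed regex: each nested group starts one digit later than its parent.
    private static final Pattern NESTED = Pattern.compile("\\D*(\\d(\\d(\\d*))).*");

    // Hypothetical helper: returns group(0)..group(groupCount()),
    // or an empty array if the input does not match.
    static String[] groups(String input) {
        Matcher m = NESTED.matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("abc4567!TheEnd");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + ") == " + g[i]);
        }
    }
}
```

Groups are numbered by the position of their opening parenthesis, counting left to right, which is why a higher index means a more deeply nested group here.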

