Monday, January 16, 2012

Java regexes

Just a quick note on java.util.regex.Pattern, e.g. extracting numbers from a known pattern.

We know that our data format will be various letters, then the number we want to extract, then one or more further characters. How do we extract the number?

Easy, using a regex Pattern, and the Matcher class.

e.g. (in groovy)
import java.util.regex.*
def testString = "abc4567!TheEnd"
// N.B. double backslashes escape \ inside the string, so the pattern is \D*(\d*).*
// i.e. non-digits / digits / anything
def pattern = Pattern.compile("\\D*(\\d*).*")
Matcher m = pattern.matcher(testString)
if (m.matches()) {
    def i = 0
    // group(0) is the whole match, group(1) is the captured number
    while (i <= m.groupCount()) println m.group(i++)
} else println "Doesn't match"


Prints the whole match (abc4567!TheEnd) followed by our number... 4567

Note that group(0) is always the whole matching string, followed by one group for each pair of parentheses (one in this case), numbered from 1 in the order of their opening parenthesis.
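
If you want the number as an actual Integer rather than a printed string, the same regex can be wrapped in a small helper. This is only a sketch: extractNumber is a made-up name (not part of java.util.regex), and it assumes the digits fit in an int. Since \d* can match zero digits, an empty group(1) is treated as "no number found".

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical helper, same regex as above: non-digits / digits / anything
Integer extractNumber(String input) {
    Pattern pattern = Pattern.compile("\\D*(\\d*).*")
    Matcher m = pattern.matcher(input)
    // \d* may match nothing, so check group(1) is non-empty before parsing
    if (m.matches() && m.group(1)) {
        return Integer.parseInt(m.group(1)) // assumes the value fits in an int
    }
    return null
}

println extractNumber("abc4567!TheEnd") // 4567
println extractNumber("abc!TheEnd")     // null
```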

e.g. if the above regex is changed to
"\\D*(\\d(\\d(\\d*))).*"

Then the results would be as follows. Note how, as the group index increases, we traverse deeper into the nested parentheses of the regex:
group(0) == abc4567!TheEnd
group(1) == 4567
group(2) == 567
group(3) == 67
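
To check the nested case for yourself, something like the following should reproduce the list above. Again just a sketch: allGroups is a hypothetical helper name, and it simply collects group(0) through group(groupCount()) into a list.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical helper: returns group(0)..group(n) as a list, or an empty list if there is no match
List<String> allGroups(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input)
    if (!m.matches()) return []
    return (0..m.groupCount()).collect { m.group(it) }
}

// Groups are numbered by the position of their opening parenthesis
allGroups("\\D*(\\d(\\d(\\d*))).*", "abc4567!TheEnd").eachWithIndex { g, i ->
    println "group($i) == $g"
}
```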