From: Eric Sosman
Newsgroups: comp.lang.java.programmer
Subject: Re: regex capability
Date: Mon, 04 Apr 2011 08:03:38 -0400

On 4/4/2011 3:50 AM, Roedy Green wrote:
> On Mon, 04 Apr 2011 02:34:30 -0500, Leif Roar Moldskred
> wrote, quoted or indirectly quoted someone who
> said :
>
>>
>> Easiest is to just use split. You can always do a regex of the type
>> "(\\d+)/((\\d+)/)?((\\d+)/)?((\\d+)/)?" but that's just pointlessly
>> complicated. There's no reason why you should use a regex when "normal"
>> string parsing is simpler and easier to read.
>
> (xxx|yyy)+ seems to generate only one group item, no matter how many
> repetitions there are. That strikes me as a bug, but likely someone
> can explain why it is a feature or inevitability.

A (section of a) regex matches a (section of a) string, and the
Matcher machinery can tell you what substring was matched. Each
capturing group reports exactly one substring, and when the group
sits under a quantifier, as in (xxx|yyy)+, that substring is whatever
the group matched on its last repetition. The machinery has no
provision for doing further processing on that matched substring,
like saying "Oh, your regex didn't match a string this time, but an
array of strings."

You could, perhaps, cook up substitutes for Pattern and Matcher to
do such a thing.
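To see the single-group behaviour concretely, here is a minimal Java
sketch using java.util.regex. The input "12/345/6/" and the helper
names lastRepetition and allRepetitions are made up for illustration.
With one capturing group under +, the whole string matches but
group(1) holds only the last repetition; matching the repeated piece
by itself and looping with find(), or using split as suggested
upthread, recovers every piece.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {

    // A single capturing group under a quantifier: the whole input can match,
    // but the group reports only the substring from its last repetition.
    // Returns null if the whole input does not match.
    static String lastRepetition(String input) {
        Matcher m = Pattern.compile("(\\d+/)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // To collect every repetition, match the repeated piece on its own
    // and call find() in a loop; each call advances to the next occurrence.
    static List<String> allRepetitions(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile("(\\d+)/").matcher(input);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "12/345/6/";
        System.out.println(lastRepetition(input));   // 6/
        System.out.println(allRepetitions(input));   // [12, 345, 6]
        // String.split drops the trailing empty string after the last '/'.
        System.out.println(Arrays.toString(input.split("/")));   // [12, 345, 6]
    }
}
```

The find() loop keeps the pattern as small as the piece being
repeated, so it copes with any number of repetitions without the
chain of optional groups quoted above.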
But I'm not sure you'd want to build them, because doing so could
make the API rather complicated. For example, consider a fanex (for
"fancy expression," like "regular expression" only more so) along the
lines of "(pat1)(pat2)" where "pat1" and "pat2" can match and return
arrays of substrings. The FancyMatcher says "I matched five
substrings." So you call group(3) to get the third of them -- was it
matched by "pat1" or by "pat2"?

Yes, you could invent an API to deal with this -- maybe FancyMatcher
returns a tree of nodes that point to other nodes and/or to
substrings -- but I'm not confident this would be an unqualified
improvement.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid