| Date | 2011-04-04 05:25 -0700 |
|---|---|
| From | Patricia Shanahan <pats@acm.org> |
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: regex capability |
| References | <u3sip61i207oesd83ckbrt3vjm66p948kd@4ax.com> <jYqdnTTVm5ob6QTQnZ2dnUVZ87-dnZ2d@telenor.com> <futip6thcc2sshjanfkv9s18hqeo69qrsn@4ax.com> <incc59$dn$1@dont-email.me> |
| Message-ID | <5eGdnckj3q8mJQTQnZ2dnUVZ_jWdnZ2d@earthlink.com> |
On 4/4/2011 5:03 AM, Eric Sosman wrote:
> On 4/4/2011 3:50 AM, Roedy Green wrote:
>> On Mon, 04 Apr 2011 02:34:30 -0500, Leif Roar Moldskred
>> <leifm@dimnakorr.com> wrote, quoted or indirectly quoted someone who
>> said :
>>
>>>
>>> Easiest is to just use split. You can always do a regex of the type
>>> "(\\d+)/((\\d+)/)?((\\d+)/)?((\\d+)/)?" but that's just pointlessly
>>> complicated. There's no reason why you should use a regex when "normal"
>>> string parsing is simpler and easier to read.
>>
>> (xxx|yyy)+ seems to generate only one group item, no matter how many
>> repetitions there are. That strikes me as a bug, but likely someone
>> can explain why it is a feature or inevitability.
>
> A (section of a) regex matches a (section of a) string, and the
> Matcher machinery can tell you what substring was matched. The
> machinery has no provision for doing further processing on that
> matched substring, like saying "Oh, your regex didn't match a
> string this time, but an array of strings."
>
> You could, perhaps, cook up substitutes for Pattern and Matcher
> to do such a thing. But I'm not sure you'd want to, because it
> could make the API rather complicated. For example, consider a
> fanex (for "fancy expression," like "regular expression" only
> more so) along the lines of "(pat1)(pat2)" where "pat1" and "pat2"
> can match and return arrays of substrings. The FancyMatcher says
> "I matched five substrings." So you call group(3) to get the
> third of them -- was it matched by "pat1" or by "pat2"? Yes, you
> could invent an API to deal with this -- maybe FancyMatcher returns
> a tree of nodes that point to other nodes and/or to substrings --
> but I'm not confident this would be an unqualified improvement.
>

Not only would it make the API complicated, but it would also encourage
a problem I've already seen in code posted in newsgroups - use of
regexes that are very complicated and messy, just for the sake of
fitting a complete job into one regex.

Sometimes a single regex match really is the simplest, cleanest, most
readable way of expressing some data extraction. Quite often, it is not.

Patricia
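
The behaviour discussed in this post can be sketched in a few lines of Java. This is only an illustration, not code from anyone in the thread: the class name RepeatedGroupDemo, its method names, and the sample inputs "xxxyyy" and "12/34/56" are assumptions made up for the example. It shows that a repeated capturing group reports only its final iteration, and two simpler ways to get at every piece: a find() loop over the inner pattern, or a plain split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class RepeatedGroupDemo {

    // (xxx|yyy)+ has exactly one capturing group. After a successful
    // match, group(1) holds only the text of the last repetition.
    static String lastIterationOnly(String input) {
        Matcher m = Pattern.compile("(xxx|yyy)+").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.group(1);
    }

    // To see every repetition, match the inner pattern repeatedly
    // with find(). Note this does not check that the whole input
    // consists only of repetitions.
    static List<String> everyRepetition(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile("xxx|yyy").matcher(input);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    // For slash-separated fields, split is the simpler tool.
    // String.split drops trailing empty strings by default.
    static List<String> splitFields(String input) {
        return Arrays.asList(input.split("/"));
    }

    public static void main(String[] args) {
        System.out.println(lastIterationOnly("xxxyyy"));  // yyy
        System.out.println(everyRepetition("xxxyyy"));    // [xxx, yyy]
        System.out.println(splitFields("12/34/56"));      // [12, 34, 56]
    }
}
```

If the input also has to be validated, one option is to check it once with matches() against the full pattern and then extract the pieces with the find() loop or split, which keeps each step readable on its own.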
regex capability Roedy Green <see_website@mindprod.com.invalid> - 2011-04-04 00:19 -0700
Re: regex capability Leif Roar Moldskred <leifm@dimnakorr.com> - 2011-04-04 02:34 -0500
Re: regex capability Roedy Green <see_website@mindprod.com.invalid> - 2011-04-04 00:50 -0700
Re: regex capability Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-04-04 08:03 -0400
Re: regex capability Patricia Shanahan <pats@acm.org> - 2011-04-04 05:25 -0700
Re: regex capability David Lamb <dalamb@cs.queensu.ca> - 2011-04-04 18:51 -0400
Re: regex capability Jim Gibson <jimsgibson@gmail.com> - 2011-04-04 17:07 -0700
Re: regex capability bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-04-04 09:26 +0100
Re: regex capability Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2011-04-04 19:20 +0200
Re: regex capability Robert Klemme <shortcutter@googlemail.com> - 2011-04-04 22:13 +0200
Re: regex capability markspace <-@.> - 2011-04-04 18:35 -0700
Re: regex capability bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-04-05 09:09 +0100
Re: regex capability Paul Cager <paul.cager@googlemail.com> - 2011-04-05 02:10 -0700
Re: regex capability Patricia Shanahan <pats@acm.org> - 2011-04-05 05:28 -0700
Re: regex capability Robert Klemme <shortcutter@googlemail.com> - 2011-04-05 06:33 -0700
Re: regex capability markspace <-@.> - 2011-04-05 10:07 -0700