


Re: regex capability

Date 2011-04-04 05:25 -0700
From Patricia Shanahan <pats@acm.org>
Newsgroups comp.lang.java.programmer
Subject Re: regex capability
References <u3sip61i207oesd83ckbrt3vjm66p948kd@4ax.com> <jYqdnTTVm5ob6QTQnZ2dnUVZ87-dnZ2d@telenor.com> <futip6thcc2sshjanfkv9s18hqeo69qrsn@4ax.com> <incc59$dn$1@dont-email.me>
Message-ID <5eGdnckj3q8mJQTQnZ2dnUVZ_jWdnZ2d@earthlink.com> (permalink)



On 4/4/2011 5:03 AM, Eric Sosman wrote:
> On 4/4/2011 3:50 AM, Roedy Green wrote:
>> On Mon, 04 Apr 2011 02:34:30 -0500, Leif Roar Moldskred
>> <leifm@dimnakorr.com> wrote, quoted or indirectly quoted someone who
>> said :
>>
>>>
>>> Easiest is to just use split. You can always do a regex of the type
>>> "(\\d+)/((\\d+)/)?((\\d+)/)?((\\d+)/)?" but that's just pointlessly
>>> complicated. There's no reason why you should use a regex when "normal"
>>> string parsing is simpler and easier to read.
>>
>> (xxx|yyy)+ seems to generate only one group item, no matter how many
>> repetitions there are. That strikes me as a bug, but likely someone
>> can explain why it is a feature or inevitability.
>
> A (section of a) regex matches a (section of a) string, and the
> Matcher machinery can tell you what substring was matched. The
> machinery has no provision for doing further processing on that
> matched substring, like saying "Oh, your regex didn't match a
> string this time, but an array of strings."
>
> You could, perhaps, cook up substitutes for Pattern and Matcher
> to do such a thing. But I'm not sure you'd want to, because it
> could make the API rather complicated. For example, consider a
> fanex (for "fancy expression," like "regular expression" only
> more so) along the lines of "(pat1)(pat2)" where "pat1" and "pat2"
> can match and return arrays of substrings. The FancyMatcher says
> "I matched five substrings." So you call group(3) to get the
> third of them -- was it matched by "pat1" or by "pat2"? Yes, you
> could invent an API to deal with this -- maybe FancyMatcher returns
> a tree of nodes that point to other nodes and/or to substrings --
> but I'm not confident this would be an unqualified improvement.
>

Not only would it make the API complicated, but it would also encourage
a problem I've already seen in code posted in newsgroups - use of
regexes that are very complicated and messy, just for the sake of
fitting a complete job into one regex.
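
To illustrate the behaviour Roedy describes: in java.util.regex, a capturing group inside a repetition keeps only the text of its last iteration. The following is a minimal sketch, not anyone's posted code. The class name RepeatedGroupDemo, the method names, and the sample input are all assumptions made for illustration. It does the job in two simple steps (validate the whole string, then walk it with find()) rather than trying to force one regex to return a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and input are illustrative assumptions.
public class RepeatedGroupDemo {

    // Matches the whole input against (xxx|yyy)+ and returns group(1).
    // The group is overwritten on every repetition, so only the last
    // iteration survives. Returns null if the input does not match.
    static String lastIteration(String input) {
        Matcher m = Pattern.compile("(xxx|yyy)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Two-step alternative: first check that the whole input is a run of
    // xxx/yyy items, then collect each item with a find() loop.
    static List<String> allIterations(String input) {
        List<String> items = new ArrayList<>();
        if (!Pattern.matches("(?:xxx|yyy)+", input)) {
            return items; // empty list when the input is not valid
        }
        Matcher m = Pattern.compile("xxx|yyy").matcher(input);
        while (m.find()) {
            items.add(m.group());
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(lastIteration("xxxxxxyyy"));  // prints yyy
        System.out.println(allIterations("xxxxxxyyy"));  // prints [xxx, xxx, yyy]
    }
}
```

The second method uses two short patterns instead of one, but each is easy to read on its own.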

Sometimes a single regex match really is the simplest, cleanest, most
readable way of expressing some data extraction. Quite often, it is not.
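
As one possible example of the "quite often, it is not" case, here is a sketch of the split approach Leif suggests in the quoted text. The exact input format is not shown in this thread excerpt; the sketch assumes slash-separated decimal numbers such as "10/20/30/", inferred from the quoted regex. The class name SplitDemo and the method parseFields are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class; the input format is an assumption.
public class SplitDemo {

    // Splits on "/" and parses each piece as an int.
    // String.split drops trailing empty strings, so a trailing "/" is harmless.
    // Non-numeric or empty pieces make Integer.parseInt throw
    // NumberFormatException, which serves as the validation step here.
    static int[] parseFields(String input) {
        String[] parts = input.split("/");
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseFields("10/20/30/"))); // prints [10, 20, 30]
    }
}
```

Unlike the fixed chain of optional groups in the quoted regex, this handles any number of fields without changing the code.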

Patricia



Thread

regex capability Roedy Green <see_website@mindprod.com.invalid> - 2011-04-04 00:19 -0700
  Re: regex capability Leif Roar Moldskred <leifm@dimnakorr.com> - 2011-04-04 02:34 -0500
    Re: regex capability Roedy Green <see_website@mindprod.com.invalid> - 2011-04-04 00:50 -0700
      Re: regex capability Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-04-04 08:03 -0400
        Re: regex capability Patricia Shanahan <pats@acm.org> - 2011-04-04 05:25 -0700
        Re: regex capability David Lamb <dalamb@cs.queensu.ca> - 2011-04-04 18:51 -0400
      Re: regex capability Jim Gibson <jimsgibson@gmail.com> - 2011-04-04 17:07 -0700
  Re: regex capability bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-04-04 09:26 +0100
    Re: regex capability Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2011-04-04 19:20 +0200
    Re: regex capability Robert Klemme <shortcutter@googlemail.com> - 2011-04-04 22:13 +0200
      Re: regex capability markspace <-@.> - 2011-04-04 18:35 -0700
        Re: regex capability bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-04-05 09:09 +0100
        Re: regex capability Paul Cager <paul.cager@googlemail.com> - 2011-04-05 02:10 -0700
          Re: regex capability Patricia Shanahan <pats@acm.org> - 2011-04-05 05:28 -0700
            Re: regex capability Robert Klemme <shortcutter@googlemail.com> - 2011-04-05 06:33 -0700
              Re: regex capability markspace <-@.> - 2011-04-05 10:07 -0700
