
Re: regex capability

From Eric Sosman <esosman@ieee-dot-org.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: regex capability
Date 2011-04-04 08:03 -0400
Organization A noiseless patient Spider
Message-ID <incc59$dn$1@dont-email.me> (permalink)
References <u3sip61i207oesd83ckbrt3vjm66p948kd@4ax.com> <jYqdnTTVm5ob6QTQnZ2dnUVZ87-dnZ2d@telenor.com> <futip6thcc2sshjanfkv9s18hqeo69qrsn@4ax.com>



On 4/4/2011 3:50 AM, Roedy Green wrote:
> On Mon, 04 Apr 2011 02:34:30 -0500, Leif Roar Moldskred
> <leifm@dimnakorr.com>  wrote, quoted or indirectly quoted someone who
> said :
>
>>
>> Easiest is to just use split. You can always do a regex of the type
>> "(\\d+)/((\\d+)/)?((\\d+)/)?((\\d+)/)?" but that's just pointlessly
>> complicated. There's no reason why you should use a regex when "normal"
>> string parsing is simpler and easier to read.
>
> (xxx|yyy)+  seems to generate only one group item, no matter how many
> repetitions there are. That strikes me as a bug, but likely  someone
> can explain why it is a feature or inevitability.

     A (section of a) regex matches a (section of a) string, and the
Matcher machinery can tell you what substring was matched.  Each
capturing group has exactly one slot, so when a group sits inside
a repetition, every pass overwrites that slot and group(n) reports
only the last pass.  The machinery has no provision for saying "Oh,
your regex didn't match a string this time, but an array of strings."
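
     To make that concrete, here is a minimal sketch using the
standard java.util.regex classes.  The class name LastGroupDemo and
the helper lastCapture are made up for illustration; only Pattern
and Matcher are real API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastGroupDemo {
    // Hypothetical helper: returns whatever group(1) holds after
    // (xxx|yyy)+ has matched the whole input, or null on no match.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("(xxx|yyy)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(xxx|yyy)+").matcher("xxxyyy");
        if (m.matches()) {
            System.out.println(m.groupCount()); // prints 1: one pair of parentheses
            System.out.println(m.group(0));     // prints xxxyyy: the whole match
            System.out.println(m.group(1));     // prints yyy: the last repetition only
        }
    }
}
```

The group count is fixed by the number of parentheses in the
pattern, not by how many times the repetition ran.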

     You could, perhaps, cook up substitutes for Pattern and Matcher
to do such a thing.  But I'm not sure you'd want to, because it
could make the API rather complicated.  For example, consider a
fanex (for "fancy expression," like "regular expression" only
more so) along the lines of "(pat1)(pat2)" where "pat1" and "pat2"
can match and return arrays of substrings.  The FancyMatcher says
"I matched five substrings."  So you call group(3) to get the
third of them -- was it matched by "pat1" or by "pat2"?  Yes, you
could invent an API to deal with this -- maybe FancyMatcher returns
a tree of nodes that point to other nodes and/or to substrings --
but I'm not confident this would be an unqualified improvement.
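
     Short of a FancyMatcher, one way to get at the individual
repetitions with the existing classes is to make two passes: check
the whole string with the repeating pattern, then walk it with the
inner pattern and find().  This is only a sketch; RepeatedGroups and
repetitions are invented names, and the approach assumes the inner
alternatives cannot be split in more than one way, as is the case
for xxx|yyy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroups {
    // The repeating pattern, used only to validate the whole input.
    private static final Pattern WHOLE = Pattern.compile("(?:xxx|yyy)+");
    // The inner pattern, used to pick out one repetition at a time.
    private static final Pattern ITEM = Pattern.compile("xxx|yyy");

    // Hypothetical helper: returns every repetition in order, or an
    // empty list if the input is not made up entirely of repetitions.
    static List<String> repetitions(String input) {
        List<String> out = new ArrayList<String>();
        if (!WHOLE.matcher(input).matches()) {
            return out;
        }
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(repetitions("xxxyyyxxx")); // prints [xxx, yyy, xxx]
    }
}
```

The caller gets a flat List and no question of which group an
element belongs to, but only because there is a single repeated
group here; with two of them the ambiguity described above returns.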

-- 
Eric Sosman
esosman@ieee-dot-org.invalid



