Re: simple regex pattern sought

From markspace <-@.>
Newsgroups comp.lang.java.programmer
Subject Re: simple regex pattern sought
Date Sat, 26 May 2012 18:34:00 -0700
Organization A noiseless patient Spider
Lines 41
Message-ID <jps0a9$58k$1@dont-email.me> (permalink)
References <e9vvr7p7l8l5kem31v5a37apdlubrqjq5e@4ax.com> <dc4ca9b0-9aa9-4fe1-bbc9-2d3a28250a9d@googlegroups.com> <jpovld$9la$1@dont-email.me> <jprgls$vnb$1@news.albasani.net>



On 5/26/2012 2:07 PM, Lew wrote:
> markspace wrote:
>> Lew wrote:
>>> Use a regex like "[\"'][^\"']+[\"']" is one way. The cleanest? I
>>> don't know.
>>>
>> This would match "John's restaurant" as "John'.
>>
>> The first quote matches ", John does not contain either ' or " as
>> specified,
>> and the last character class matches the '. Not I think what is wanted.
>
> As I corrected in my very next post.
>


Unfortunately that one doesn't work either.  The central part, [^"'], 
doesn't allow a match of a ' if the starting delimiter was a ", and that 
doesn't match Roedy's spec.  "John's restaurant" wouldn't be matched at 
all, because the matcher couldn't match past the ' to get to the ".
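
To make the two failure modes concrete, here is a small sketch. The first pattern is the one Lew posted. The second is only my guess at the shape of the correction (a backreference to the opening quote around the same [^"'] middle); the exact corrected pattern isn't quoted in this post, so treat it as an assumption. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteMatchDemo {
    // Lew's first suggestion, as quoted above: any quote, non-quotes, any quote.
    static final Pattern EITHER_QUOTE = Pattern.compile("[\"'][^\"']+[\"']");

    // ASSUMPTION: a plausible form of the corrected pattern. The closing quote
    // must equal the opening one (\1), but the middle still excludes both quotes.
    static final Pattern BACKREF = Pattern.compile("([\"'])[^\"']+\\1");

    /** Returns the first match of p in input, or null if there is none. */
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "\"John's restaurant\"";
        // Stops at the apostrophe, so this prints: "John'
        System.out.println(firstMatch(EITHER_QUOTE, input));
        // Cannot get past the apostrophe at all, so this prints: null
        System.out.println(firstMatch(BACKREF, input));
    }
}
```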

I think the easiest is to write out a grammar for the expression, then 
translate to regex.

QUOTED_STRING := SQUOTED_STRING | DQUOTED_STRING

SQUOTED_STRING := ' NON_S_QUOTE + '

DQUOTED_STRING := " NON_D_QUOTE + "

NON_S_QUOTE := [^']

NON_D_QUOTE := [^"]

At this point the grammar is very clear.  (Note that I haven't included 
Robert's \x escape sequences.)  I think it's worth learning to use ANTLR 
rather than regex, which tends to obfuscate more than it helps. 
However, a literal translation into regex isn't hard, and a literal 
translation avoids mis-optimizations.
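
For what it's worth, a literal translation of the grammar above might look like the following sketch. The constant names mirror the grammar rules; the class name and the findAll helper are my own inventions for the example. Like the grammar, it does not handle \x escape sequences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStrings {
    // One constant per grammar rule, composed the same way the grammar is.
    static final String NON_S_QUOTE = "[^']";
    static final String NON_D_QUOTE = "[^\"]";
    static final String SQUOTED_STRING = "'" + NON_S_QUOTE + "+'";
    static final String DQUOTED_STRING = "\"" + NON_D_QUOTE + "+\"";
    static final Pattern QUOTED_STRING =
            Pattern.compile(SQUOTED_STRING + "|" + DQUOTED_STRING);

    /** Returns every quoted string found in input, delimiters included. */
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED_STRING.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The apostrophe is allowed inside a double-quoted string.
        System.out.println(findAll("\"John's restaurant\""));
        // Double quotes are allowed inside a single-quoted string.
        System.out.println(findAll("'say \"hi\"'"));
    }
}
```

Because each alternative only excludes its own delimiter, "John's restaurant" is matched whole, which is what the two character-class versions could not do.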
