

Newsgroup: comp.lang.java.programmer, thread #13190

Keeping the split token in a Java regular expression

Started by: laredotornado <laredotornado@zipmail.com>
First post: 2012-03-26 11:54 -0700
Last post: 2012-03-28 07:51 +0200
Articles: 10 on this page, of 50 total; 13 participants



Contents

  Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-26 11:54 -0700
    Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 12:22 -0700
      Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-26 22:01 +0200
        Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:46 -0400
          Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:01 +0200
            Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:18 -0400
            Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:21 -0700
              Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:38 +0200
                Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:24 -0700
    Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 13:49 -0700
    Re: Keeping the split token in a Java regular expression laredotornado@gmail.com - 2012-03-26 14:21 -0700
      Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 15:02 -0700
      Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 15:56 -0700
        Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 16:02 -0700
          Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:33 -0700
            Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 01:17 +0000
              Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 21:57 +0000
      Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-26 18:26 -0700
        Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 19:07 -0700
          Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 20:40 -0700
            Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 09:10 -0700
              Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-27 11:09 -0700
                Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 13:32 -0700
                  Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:29 -0700
                    Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 16:22 -0700
                      Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 18:20 -0700
                        Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
                          Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 21:31 -0700
                            Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:41 +0200
                              Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:28 -0700
    Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 16:26 -0700
      Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:36 -0700
      Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:27 +0200
        Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:28 +0200
    Re: Keeping the split token in a Java regular expression "John B. Matthews" <nospam@nospam.invalid> - 2012-03-26 20:49 -0400
    Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:58 -0400
      Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-26 21:14 -0700
        Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:21 -0400
          Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 15:20 -0700
            Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 18:48 -0400
              Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 17:07 -0700
            Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 21:49 -0300
              Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 20:56 -0400
                Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 22:01 -0300
                  Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
    Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 08:15 -0600
      Re: Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-27 07:58 -0700
        Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 09:21 -0600
          Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 09:43 -0700
            Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:51 +0200

Page 3 of 3


#13243

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-03-27 17:07 -0700
Message-ID: <6Dscr.2628$LE4.1698@newsfe17.iad>
In reply to: #13241
On 3/27/12 3:48 PM, Arne Vajhøj wrote:
> On 3/27/2012 6:20 PM, Daniel Pitts wrote:
>> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>>> expression, but I would like to keep part of the string used to split
>>>>>> in the results. What I have are Strings like
>>>>>>
>>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>>
>>>>>> What I would like to do is split the expression wherever I have an
>>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>>> the above example, I would like the results to be
>>>>>>
>>>>>> Fri 7:30 PM
>>>>>> Sat 2 PM
>>>>>> Sun 2:30 PM
>>>>>>
>>>>>> But with String.split, the split token is not kept within the
>>>>>> results. How would I write a Java parsing expression to do what I
>>>>>> want?
>>>>>
>>>>> A hackish solution:
>>>>>
>>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>>> >
>>>> Nice. As far as hackish, using "split" for this purpose at all is
>>>> hackish.
>>>
>>> That type of split is the typical way in most modern languages
>>> (though usually in a non regex flavor).
>> For functional languages, yes, but those modern languages don't
>> necessarily return an array. Ideally they would return "iterable" of
>> some sort.
>
> .NET String Split return string[] (non regex)
> .NET Regex Split return string[] (regex)
> PHP split return array (regex)
> PHP explode return array (non regex)
> PHP preg_split return array (regex)
>
>> And in any case, this particular problem is not a "split" kind of
>> problem, but a "parse" kind of problem. So, split for this is hackish,
>
> I think it would be rather common in practice.
>
> Arne
>
I thought you meant modern languages like python or ruby :-)
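Arne's one-liner, quoted above, can be checked with a small program. Run as written, it leaves the ", " that followed each time at the start of the next piece, so the sketch below trims it. The class and method names are hypothetical, and the sketch keeps the one-liner's own assumptions: AM/PM is upper case (the pattern has no (?i) flag), and the marker letter X never appears directly before AM or PM in the input.

```java
import java.util.Arrays;

public class ReplaceThenSplit {

    // Duplicate each "AM"/"PM" with a marker in between ("PM" -> "PMXPM"),
    // then split on "X" plus AM/PM: the first copy stays in the preceding
    // piece and the second copy is consumed as the delimiter.
    // Assumption: the input never contains a literal "X" before "AM"/"PM".
    public static String[] hackSplit(String s) {
        String[] parts = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
        // The comma and space that followed each AM/PM end up at the
        // start of the next piece, so strip them here.
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replaceFirst("^,?\\s*", "");
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM";
        // Prints [Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM]
        System.out.println(Arrays.toString(hackSplit(s)));
    }
}
```

The trailing empty string that split would otherwise produce after the last "XPM" is dropped by String.split itself, since a zero limit discards trailing empty strings.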



#13244

From: Arved Sandstrom <asandstrom3minus1@eastlink.ca>
Date: 2012-03-27 21:49 -0300
Message-ID: <8etcr.10573$Ce4.8856@newsfe21.iad>
In reply to: #13240
On 12-03-27 07:20 PM, Daniel Pitts wrote:
> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>> expression, but I would like to keep part of the string used to split
>>>>> in the results. What I have are Strings like
>>>>>
>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>
>>>>> What I would like to do is split the expression wherever I have an
>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>> the above example, I would like the results to be
>>>>>
>>>>> Fri 7:30 PM
>>>>> Sat 2 PM
>>>>> Sun 2:30 PM
>>>>>
>>>>> But with String.split, the split token is not kept within the
>>>>> results. How would I write a Java parsing expression to do what I
>>>>> want?
>>>>
>>>> A hackish solution:
>>>>
>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>>  >
>>> Nice. As far as hackish, using "split" for this purpose at all is
>>> hackish.
>>
>> That type of split is the typical way in most modern languages
>> (though usually in a non regex flavor).
> For functional languages, yes, but those modern languages don't
> necessarily return an array.  Ideally they would return "iterable" of
> some sort.
[ SNIP ]

These days what's the difference? Both arrays and lists, in computing,
are commonly considered to support indexing, and both can be "iterated"
over one way or the other. As far as arrays go, consider what you can do
with Haskell arrays, or with array operations in APL or J, or with
slices in D...no "for" loops happening there.

AHS
-- 
Last week I helped my friend stay put. It's a lot easier'n helpin' 'em
move. I just went over to his house and made sure that he did not start
to load shit into a truck.
-- Mitch Hedberg



#13245

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-03-27 20:56 -0400
Message-ID: <4f7261ad$0$293$14726298@news.sunsite.dk>
In reply to: #13244
On 3/27/2012 8:49 PM, Arved Sandstrom wrote:
> On 12-03-27 07:20 PM, Daniel Pitts wrote:
>> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>>> expression, but I would like to keep part of the string used to split
>>>>>> in the results. What I have are Strings like
>>>>>>
>>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>>
>>>>>> What I would like to do is split the expression wherever I have an
>>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>>> the above example, I would like the results to be
>>>>>>
>>>>>> Fri 7:30 PM
>>>>>> Sat 2 PM
>>>>>> Sun 2:30 PM
>>>>>>
>>>>>> But with String.split, the split token is not kept within the
>>>>>> results. How would I write a Java parsing expression to do what I
>>>>>> want?
>>>>>
>>>>> A hackish solution:
>>>>>
>>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>>>   >
>>>> Nice. As far as hackish, using "split" for this purpose at all is
>>>> hackish.
>>>
>>> That type of split is the typical way in most modern languages
>>> (though usually in a non regex flavor).
>> For functional languages, yes, but those modern languages don't
>> necessarily return an array.  Ideally they would return "iterable" of
>> some sort.
> [ SNIP ]
>
> These days what's the difference? Both arrays and lists, in computing,
> are commonly considered to support indexing, and both can be "iterated"
> over one way or the other. As far as arrays go, consider what you can do
> with Haskell arrays, or with array operations in APL or J, or with
> slices in D...no "for" loops happening there.

I think what Daniel wanted was a lazy not an eager split.

Instead of doing a full parse and return a data structure
(array or list) then just return an iterator with a pointer
to the start and then do the parsing when asked for next.

Arne
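Arne's description of a lazy split, an iterator that keeps a position in the input and only parses when asked for the next element, can be sketched in Java 6 with Matcher.find(). This is an illustration rather than code from the thread: the class name LazyTimeSplitter and the token pattern are assumptions tailored to the example input, and instead of splitting on a delimiter it matches each time directly, so AM/PM stays in the token.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyTimeSplitter implements Iterable<String> {

    // Hypothetical token pattern: skip leading whitespace, then take the
    // shortest run of text ending in a stand-alone "am"/"pm" (any case),
    // then consume an optional trailing comma.
    private static final Pattern TOKEN =
            Pattern.compile("(?i)\\s*(.+?\\b[ap]m\\b),?");

    private final CharSequence input;

    public LazyTimeSplitter(CharSequence input) {
        this.input = input;
    }

    @Override
    public Iterator<String> iterator() {
        final Matcher m = TOKEN.matcher(input);
        return new Iterator<String>() {
            private String pending;   // token found but not yet returned
            private boolean done;     // true once find() has failed

            @Override
            public boolean hasNext() {
                if (pending == null && !done) {
                    // The parsing happens here: one find() per requested token.
                    if (m.find()) {
                        pending = m.group(1);
                    } else {
                        done = true;
                    }
                }
                return pending != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = pending;
                pending = null;
                return result;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        for (String t : new LazyTimeSplitter("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println("'" + t + "'");
        }
    }
}
```

Nothing is matched until hasNext() is first called, and no array of all the pieces is built; the trade-off is that the whole input must still be available as a CharSequence.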



#13246

From: Arved Sandstrom <asandstrom3minus1@eastlink.ca>
Date: 2012-03-27 22:01 -0300
Message-ID: <mptcr.29557$_C5.2942@newsfe09.iad>
In reply to: #13245
On 12-03-27 09:56 PM, Arne Vajhøj wrote:
> On 3/27/2012 8:49 PM, Arved Sandstrom wrote:
>> On 12-03-27 07:20 PM, Daniel Pitts wrote:
>>> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>>>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>>>> expression, but I would like to keep part of the string used to
>>>>>>> split
>>>>>>> in the results. What I have are Strings like
>>>>>>>
>>>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>>>
>>>>>>> What I would like to do is split the expression wherever I have an
>>>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>>>> the above example, I would like the results to be
>>>>>>>
>>>>>>> Fri 7:30 PM
>>>>>>> Sat 2 PM
>>>>>>> Sun 2:30 PM
>>>>>>>
>>>>>>> But with String.split, the split token is not kept within the
>>>>>>> results. How would I write a Java parsing expression to do what I
>>>>>>> want?
>>>>>>
>>>>>> A hackish solution:
>>>>>>
>>>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>>>>   >
>>>>> Nice. As far as hackish, using "split" for this purpose at all is
>>>>> hackish.
>>>>
>>>> That type of split is the typical way in most modern languages
>>>> (though usually in a non regex flavor).
>>> For functional languages, yes, but those modern languages don't
>>> necessarily return an array.  Ideally they would return "iterable" of
>>> some sort.
>> [ SNIP ]
>>
>> These days what's the difference? Both arrays and lists, in computing,
>> are commonly considered to support indexing, and both can be "iterated"
>> over one way or the other. As far as arrays go, consider what you can do
>> with Haskell arrays, or with array operations in APL or J, or with
>> slices in D...no "for" loops happening there.
> 
> I think what Daniel wanted was a lazy not an eager split.
> 
> Instead of doing a full parse and return a data structure
> (array or list) then just return an iterator with a pointer
> to the start and then do the parsing when asked for next.
> 
> Arne
> 
A generator, IOW.

AHS
-- 
Last week I helped my friend stay put. It's a lot easier'n helpin' 'em
move. I just went over to his house and made sure that he did not start
to load shit into a truck.
-- Mitch Hedberg



#13248

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-03-27 18:27 -0700
Message-ID: <JNtcr.6781$V94.2683@newsfe19.iad>
In reply to: #13246
On 3/27/12 6:01 PM, Arved Sandstrom wrote:
> On 12-03-27 09:56 PM, Arne Vajhøj wrote:
>> On 3/27/2012 8:49 PM, Arved Sandstrom wrote:
>>> On 12-03-27 07:20 PM, Daniel Pitts wrote:
>>>> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>>>>> That type of split is the typical way in most modern languages
>>>>> (though usually in a non regex flavor).
>>>> For functional languages, yes, but those modern languages don't
>>>> necessarily return an array.  Ideally they would return "iterable" of
>>>> some sort.
>>> [ SNIP ]
>>>
>>> These days what's the difference? Both arrays and lists, in computing,
>>> are commonly considered to support indexing, and both can be "iterated"
>>> over one way or the other. As far as arrays go, consider what you can do
>>> with Haskell arrays, or with array operations in APL or J, or with
>>> slices in D...no "for" loops happening there.
>>
>> I think what Daniel wanted was a lazy not an eager split.
>>
>> Instead of doing a full parse and return a data structure
>> (array or list) then just return an iterator with a pointer
>> to the start and then do the parsing when asked for next.
>>
>> Arne
>>
> A generator, IOW.

Basically, yes. That was what I was trying to get at. Calling split on 
an unknown String (without using the limit param) is just asking for a 
D.O.S. attack.
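The limit parameter Daniel mentions belongs to String.split(String regex, int limit): with a positive limit the pattern is applied at most limit - 1 times, the returned array has at most limit elements, and whatever remains of the input goes into the last one. A minimal sketch with a hypothetical cap of 3 pieces shows the effect. Note that it bounds the number of pieces one call produces, not the size of the input string itself.

```java
public class SplitLimit {

    // Hypothetical cap on how many pieces one call may produce.
    static final int MAX_PIECES = 3;

    public static String[] boundedSplit(String input) {
        // With a positive limit the pattern is applied at most
        // MAX_PIECES - 1 times; the rest of the input, unsplit,
        // ends up in the last element.
        return input.split(",\\s*", MAX_PIECES);
    }

    public static void main(String[] args) {
        String input = "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM, Mon 8 PM";
        for (String s : boundedSplit(input)) {
            System.out.println("'" + s + "'");
        }
        // Prints:
        // 'Fri 7:30 PM'
        // 'Sat 2 PM'
        // 'Sun 2:30 PM, Mon 8 PM'
    }
}
```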



#13224

From: Jim Janney <jjanney@shell.xmission.com>
Date: 2012-03-27 08:15 -0600
Message-ID: <2pvclq9lll.fsf@shell.xmission.com>
In reply to: #13190
laredotornado <laredotornado@zipmail.com> writes:

> Hi,
>
> I'm using Java 6.  I want to split a Java string on a regular
> expression, but I would like to keep part of the string used to split
> in the results.  What I have are Strings like
>
>     Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> What I would like to do is split the expression wherever I have an
> expression matching /(am|pm),?/i .  Hopefully I got that right.  In
> the above example, I would like the results to be
>
>     Fri 7:30 PM
>     Sat 2 PM
>     Sun 2:30 PM
>
> But with String.split, the split token is not kept within the
> results.  How would I write a Java parsing expression to do what I
> want?
>
> Thanks, - Dave

You want to match ,? only when it is preceded by (am|pm).  That's what
lookbehind is for:

public class LookBehind {
  public static void main(String[] args) {
    
    String data = "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM";
    String pattern = "(?i)(?<=am|pm),?";
    
    String[] split = data.split(pattern);
    for (String s : split) {
      System.out.println("'" + s + "'");
    }
  }
}

See http://www.regular-expressions.info/lookaround.html for a tutorial.

-- 
Jim Janney
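Jim's program prints the second and third pieces with a leading space (' Sat 2 PM'), because its delimiter only consumes the comma. A small variation, assuming a comma after AM/PM may be followed by whitespace, lets the delimiter swallow that whitespace too. The class and method names are hypothetical.

```java
public class LookBehindTrimmed {

    // Same lookbehind as Jim's pattern, but the delimiter also consumes
    // any whitespace after the comma, so no piece starts with a space.
    private static final String DELIMITER = "(?i)(?<=am|pm),?\\s*";

    public static String[] splitTimes(String data) {
        return data.split(DELIMITER);
    }

    public static void main(String[] args) {
        for (String s : splitTimes("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println("'" + s + "'");
        }
        // Prints:
        // 'Fri 7:30 PM'
        // 'Sat 2 PM'
        // 'Sun 2:30 PM'
    }
}
```

Like the original, this still splits after any word ending in "am" or "pm" (for example "Sam"), since the lookbehind does not check for a word boundary.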



#13225

From: laredotornado <laredotornado@zipmail.com>
Date: 2012-03-27 07:58 -0700
Message-ID: <cc5b2412-cf3f-4f74-8eea-c3484117e8f5@w6g2000pbp.googlegroups.com>
In reply to: #13224
On Mar 27, 9:15 am, Jim Janney <jjan...@shell.xmission.com> wrote:
> laredotornado <laredotorn...@zipmail.com> writes:
> > Hi,
>
> > I'm using Java 6.  I want to split a Java string on a regular
> > expression, but I would like to keep part of the string used to split
> > in the results.  What I have are Strings like
>
> >     Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> > What I would like to do is split the expression wherever I have an
> > expression matching /(am|pm),?/i .  Hopefully I got that right.  In
> > the above example, I would like the results to be
>
> >     Fri 7:30 PM
> >     Sat 2 PM
> >     Sun 2:30 PM
>
> > But with String.split, the split token is not kept within the
> > results.  How would I write a Java parsing expression to do what I
> > want?
>
> > Thanks, - Dave
>
> You want to match ,? only when it is preceded by (am|pm).  That's what
> lookbehind is for:
>
> public class LookBehind {
>   public static void main(String[] args) {
>
>     String data = "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM";
>     String pattern = "(?i)(?<=am|pm),?";
>
>     String[] split = data.split(pattern);
>     for (String s : split) {
>       System.out.println("'" + s + "'");
>     }
>   }
>
> }
>
> See http://www.regular-expressions.info/lookaround.html for a tutorial.
>
> --
> Jim Janney

Jim, That's absolutely brilliant and does exactly what I want in a
short amount of code.

Stefan, thanks for your solution as well.  I tried that out first and
it works too. - Dave



#13226

From: Jim Janney <jjanney@shell.xmission.com>
Date: 2012-03-27 09:21 -0600
Message-ID: <2pr4we9ii3.fsf@shell.xmission.com>
In reply to: #13225
laredotornado <laredotornado@zipmail.com> writes:

> On Mar 27, 9:15 am, Jim Janney <jjan...@shell.xmission.com> wrote:
>> laredotornado <laredotorn...@zipmail.com> writes:
>
> Jim, That's absolutely brilliant and does exactly what I want in a
> short amount of code.
>
> Stefan, thanks for your solution as well.  I tried that out first and
> it works too. - Dave

It turns out that lookbehind only works with some patterns; the engine
has to be able to determine the length of the match in advance.  Not
surprising when you think about it.  It's an interesting question and
gave me a reason to learn something new.

-- 
Jim Janney
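In java.util.regex the restriction Jim describes means the lookbehind needs an obvious, bounded maximum length; its alternatives do not all have to be the same length. A sketch, assuming a hypothetical input that mixes dotted "a.m."/"p.m." with "AM"/"PM", uses lookbehind alternatives of four and two characters. The class and method names are hypothetical.

```java
public class BoundedLookBehind {

    // The two alternatives inside the lookbehind have different lengths
    // ("p.m." is 4 characters, "pm" is 2), but each is fixed, so the
    // engine knows the longest stretch it may have to look back over.
    private static final String DELIMITER = "(?i)(?<=[ap]\\.m\\.|[ap]m),\\s*";

    public static String[] splitTimes(String data) {
        return data.split(DELIMITER);
    }

    public static void main(String[] args) {
        for (String s : splitTimes("Fri 7:30 p.m., Sat 2 PM, Sun 10 a.m.")) {
            System.out.println("'" + s + "'");
        }
        // Prints:
        // 'Fri 7:30 p.m.'
        // 'Sat 2 PM'
        // 'Sun 10 a.m.'
    }
}
```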



#13229

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-03-27 09:43 -0700
Message-ID: <Y6mcr.27182$QC3.4987@newsfe16.iad>
In reply to: #13226
On 3/27/12 8:21 AM, Jim Janney wrote:
> laredotornado<laredotornado@zipmail.com>  writes:
>
>> On Mar 27, 9:15 am, Jim Janney<jjan...@shell.xmission.com>  wrote:
>>> laredotornado<laredotorn...@zipmail.com>  writes:
>>
>> Jim, That's absolutely brilliant and does exactly what I want in a
>> short amount of code.
>>
>> Stefan, thanks for your solution as well.  I tried that out first and
>> it works too. - Dave
>
> It turns out that lookbehind only works with some patterns; the engine
> has to be able to determine the length of the match in advance.  Not
> surprising when you think about it.  It's an interesting question and
> gave me a reason to learn something new.
>
That's interesting. I've written my own Deterministic FSA to implement a 
subset of regex functionality, and arbitrary lookbehind actually would 
be an easy feature to add. Easier than zero-width matches (for example 
word-boundaries).

Anyway, one thing to point out is that Stefan's is likely to perform 
better, and definitely has lower memory overhead for long inputs than 
"split".



#13255

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2012-03-28 07:51 +0200
Message-ID: <9tfn68Faj3U1@mid.individual.net>
In reply to: #13229
On 03/27/2012 06:43 PM, Daniel Pitts wrote:
> On 3/27/12 8:21 AM, Jim Janney wrote:
>> laredotornado<laredotornado@zipmail.com> writes:
>>
>>> On Mar 27, 9:15 am, Jim Janney<jjan...@shell.xmission.com> wrote:
>>>> laredotornado<laredotorn...@zipmail.com> writes:
>>>
>>> Jim, That's absolutely brilliant and does exactly what I want in a
>>> short amount of code.
>>>
>>> Stefan, thanks for your solution as well. I tried that out first and
>>> it works too. - Dave
>>
>> It turns out that lookbehind only works with some patterns; the engine
>> has to be able to determine the length of the match in advance. Not
>> surprising when you think about it. It's an interesting question and
>> gave me a reason to learn something new.
>>
> That's interesting. I've written my own Deterministic FSA to implement a
> subset of regex functionality, and arbitrary lookbehind actually would
> be an easy feature to add. Easier than zero-width matches (for example
> word-boundaries).

The limitation for lookbehind seems to be quite common (Ruby's Oniguruma 
has it as well).  With arbitrary lookbehind you need a buffer which can 
grow because you must basically operate on the whole string the whole 
time.  And, most modern regular expression engines are implemented as 
NFAs - or better NFA with a lot of special logic stacked onto it.  The 
runtime overhead of two directions of backtracking might be considered 
too big.

Kind regards

	robert


