Groups > comp.lang.java.programmer > #13190 > unrolled thread
| Started by | laredotornado <laredotornado@zipmail.com> |
|---|---|
| First post | 2012-03-26 11:54 -0700 |
| Last post | 2012-03-28 07:51 +0200 |
| Articles | 10 on this page of 50 — 13 participants |
Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-26 11:54 -0700
Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 12:22 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-26 22:01 +0200
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:46 -0400
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:01 +0200
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:18 -0400
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:21 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:38 +0200
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:24 -0700
Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 13:49 -0700
Re: Keeping the split token in a Java regular expression laredotornado@gmail.com - 2012-03-26 14:21 -0700
Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 15:02 -0700
Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 15:56 -0700
Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 16:02 -0700
Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:33 -0700
Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 01:17 +0000
Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 21:57 +0000
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-26 18:26 -0700
Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 19:07 -0700
Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 20:40 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 09:10 -0700
Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-27 11:09 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 13:32 -0700
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:29 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 16:22 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 18:20 -0700
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 21:31 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:41 +0200
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:28 -0700
Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 16:26 -0700
Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:36 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:27 +0200
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:28 +0200
Re: Keeping the split token in a Java regular expression "John B. Matthews" <nospam@nospam.invalid> - 2012-03-26 20:49 -0400
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:58 -0400
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-26 21:14 -0700
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:21 -0400
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 15:20 -0700
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 18:48 -0400
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 17:07 -0700
Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 21:49 -0300
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 20:56 -0400
Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 22:01 -0300
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 08:15 -0600
Re: Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-27 07:58 -0700
Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 09:21 -0600
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 09:43 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:51 +0200
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-03-27 17:07 -0700 |
| Message-ID | <6Dscr.2628$LE4.1698@newsfe17.iad> |
| In reply to | #13241 |
On 3/27/12 3:48 PM, Arne Vajhøj wrote:
> On 3/27/2012 6:20 PM, Daniel Pitts wrote:
>> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>>> expression, but I would like to keep part of the string used to split
>>>>>> in the results. What I have are Strings like
>>>>>>
>>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>>
>>>>>> What I would like to do is split the expression wherever I have an
>>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>>> the above example, I would like the results to be
>>>>>>
>>>>>> Fri 7:30 PM
>>>>>> Sat 2 PM
>>>>>> Sun 2:30 PM
>>>>>>
>>>>>> But with String.split, the split token is not kept within the
>>>>>> results. How would I write a Java parsing expression to do what I
>>>>>> want?
>>>>>
>>>>> A hackish solution:
>>>>>
>>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>>> >
>>>> Nice. As far as hackish, using "split" for this purpose at all is
>>>> hackish.
>>>
>>> That type of split is the typical way in most modern languages
>>> (though usually in a non regex flavor).
>> For functional languages, yes, but those modern languages don't
>> necessarily return an array. Ideally they would return "iterable" of
>> some sort.
>
> .NET String Split return string[] (non regex)
> .NET Regex Split return string[] (regex)
> PHP split return array (regex)
> PHP explode return array (non regex)
> PHP preg_split return array (regex)
>
>> And in any case, this particular problem is not a "split" kind of
>> problem, but a "parse" kind of problem. So, split for this is hackish,
>
> I think it would be rather common in practice.
>
> Arne
>
I thought you meant modern languages like Python or Ruby :-)
| From | Arved Sandstrom <asandstrom3minus1@eastlink.ca> |
|---|---|
| Date | 2012-03-27 21:49 -0300 |
| Message-ID | <8etcr.10573$Ce4.8856@newsfe21.iad> |
| In reply to | #13240 |
On 12-03-27 07:20 PM, Daniel Pitts wrote:
> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>> expression, but I would like to keep part of the string used to split
>>>>> in the results. What I have are Strings like
>>>>>
>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>
>>>>> What I would like to do is split the expression wherever I have an
>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>> the above example, I would like the results to be
>>>>>
>>>>> Fri 7:30 PM
>>>>> Sat 2 PM
>>>>> Sun 2:30 PM
>>>>>
>>>>> But with String.split, the split token is not kept within the
>>>>> results. How would I write a Java parsing expression to do what I
>>>>> want?
>>>>
>>>> A hackish solution:
>>>>
>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>> >
>>> Nice. As far as hackish, using "split" for this purpose at all is
>>> hackish.
>>
>> That type of split is the typical way in most modern languages
>> (though usually in a non regex flavor).
> For functional languages, yes, but those modern languages don't
> necessarily return an array. Ideally they would return "iterable" of
> some sort.
[ SNIP ]
These days what's the difference? Both arrays and lists, in computing,
are commonly considered to support indexing, and both can be "iterated"
over one way or the other. As far as arrays go, consider what you can do
with Haskell arrays, or with array operations in APL or J, or with
slices in D...no "for" loops happening there.
AHS
--
Last week I helped my friend stay put. It's a lot easier'n helpin' 'em
move. I just went over to his house and made sure that he did not start
to load shit into a truck.
-- Mitch Hedberg
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-03-27 20:56 -0400 |
| Message-ID | <4f7261ad$0$293$14726298@news.sunsite.dk> |
| In reply to | #13244 |
On 3/27/2012 8:49 PM, Arved Sandstrom wrote:
> On 12-03-27 07:20 PM, Daniel Pitts wrote:
>> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>>> expression, but I would like to keep part of the string used to split
>>>>>> in the results. What I have are Strings like
>>>>>>
>>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>>
>>>>>> What I would like to do is split the expression wherever I have an
>>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>>> the above example, I would like the results to be
>>>>>>
>>>>>> Fri 7:30 PM
>>>>>> Sat 2 PM
>>>>>> Sun 2:30 PM
>>>>>>
>>>>>> But with String.split, the split token is not kept within the
>>>>>> results. How would I write a Java parsing expression to do what I
>>>>>> want?
>>>>>
>>>>> A hackish solution:
>>>>>
>>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>>> >
>>>> Nice. As far as hackish, using "split" for this purpose at all is
>>>> hackish.
>>>
>>> That type of split is the typical way in most modern languages
>>> (though usually in a non regex flavor).
>> For functional languages, yes, but those modern languages don't
>> necessarily return an array. Ideally they would return "iterable" of
>> some sort.
> [ SNIP ]
>
> These days what's the difference? Both arrays and lists, in computing,
> are commonly considered to support indexing, and both can be "iterated"
> over one way or the other. As far as arrays go, consider what you can do
> with Haskell arrays, or with array operations in APL or J, or with
> slices in D...no "for" loops happening there.
I think what Daniel wanted was a lazy not an eager split.
Instead of doing a full parse and return a data structure
(array or list) then just return an iterator with a pointer
to the start and then do the parsing when asked for next.
Arne
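A minimal sketch of the lazy variant described above, assuming the tokens look like the original poster's sample (text ending in AM or PM, optionally followed by a comma). LazyTokens and its TOKEN pattern are hypothetical names made up for illustration, not a JDK API; the only library calls used are Pattern.compile, Matcher.find and Matcher.group. Nothing is parsed until the caller asks for the next element.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the JDK): produces one token per next() call.
final class LazyTokens implements Iterator<String> {
    // Assumed token shape: shortest run of text that ends in AM/PM, then an optional comma.
    private static final Pattern TOKEN =
            Pattern.compile("(?i)\\s*(.+?(?:am|pm))\\s*,?");

    private final Matcher matcher;
    private String pending;     // token found by hasNext() but not yet handed out
    private boolean exhausted;  // set once find() has failed, so we never search again

    LazyTokens(CharSequence input) {
        this.matcher = TOKEN.matcher(input);
    }

    @Override
    public boolean hasNext() {
        if (pending == null && !exhausted) {
            if (matcher.find()) {
                pending = matcher.group(1); // keep the AM/PM, drop the comma
            } else {
                exhausted = true;
            }
        }
        return pending != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String token = pending;
        pending = null;
        return token;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        Iterator<String> it = new LazyTokens("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM");
        while (it.hasNext()) {
            System.out.println("'" + it.next() + "'");
        }
    }
}
```

A caller that only needs the first entry stops after one next() call, and the rest of the input is never scanned.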
| From | Arved Sandstrom <asandstrom3minus1@eastlink.ca> |
|---|---|
| Date | 2012-03-27 22:01 -0300 |
| Message-ID | <mptcr.29557$_C5.2942@newsfe09.iad> |
| In reply to | #13245 |
On 12-03-27 09:56 PM, Arne Vajhøj wrote:
> On 3/27/2012 8:49 PM, Arved Sandstrom wrote:
>> On 12-03-27 07:20 PM, Daniel Pitts wrote:
>>> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>>>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>>>> expression, but I would like to keep part of the string used to
>>>>>>> split
>>>>>>> in the results. What I have are Strings like
>>>>>>>
>>>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>>>
>>>>>>> What I would like to do is split the expression wherever I have an
>>>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>>>> the above example, I would like the results to be
>>>>>>>
>>>>>>> Fri 7:30 PM
>>>>>>> Sat 2 PM
>>>>>>> Sun 2:30 PM
>>>>>>>
>>>>>>> But with String.split, the split token is not kept within the
>>>>>>> results. How would I write a Java parsing expression to do what I
>>>>>>> want?
>>>>>>
>>>>>> A hackish solution:
>>>>>>
>>>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>>>> >
>>>>> Nice. As far as hackish, using "split" for this purpose at all is
>>>>> hackish.
>>>>
>>>> That type of split is the typical way in most modern languages
>>>> (though usually in a non regex flavor).
>>> For functional languages, yes, but those modern languages don't
>>> necessarily return an array. Ideally they would return "iterable" of
>>> some sort.
>> [ SNIP ]
>>
>> These days what's the difference? Both arrays and lists, in computing,
>> are commonly considered to support indexing, and both can be "iterated"
>> over one way or the other. As far as arrays go, consider what you can do
>> with Haskell arrays, or with array operations in APL or J, or with
>> slices in D...no "for" loops happening there.
>
> I think what Daniel wanted was a lazy not an eager split.
>
> Instead of doing a full parse and return a data structure
> (array or list) then just return an iterator with a pointer
> to the start and then do the parsing when asked for next.
>
> Arne
>
A generator, IOW.
AHS
--
Last week I helped my friend stay put. It's a lot easier'n helpin' 'em
move. I just went over to his house and made sure that he did not start
to load shit into a truck.
-- Mitch Hedberg
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-03-27 18:27 -0700 |
| Message-ID | <JNtcr.6781$V94.2683@newsfe19.iad> |
| In reply to | #13246 |
On 3/27/12 6:01 PM, Arved Sandstrom wrote:
> On 12-03-27 09:56 PM, Arne Vajhøj wrote:
>> On 3/27/2012 8:49 PM, Arved Sandstrom wrote:
>>> On 12-03-27 07:20 PM, Daniel Pitts wrote:
>>>> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>>>>> That type of split is the typical way in most modern languages
>>>>> (though usually in a non regex flavor).
>>>> For functional languages, yes, but those modern languages don't
>>>> necessarily return an array. Ideally they would return "iterable" of
>>>> some sort.
>>> [ SNIP ]
>>>
>>> These days what's the difference? Both arrays and lists, in computing,
>>> are commonly considered to support indexing, and both can be "iterated"
>>> over one way or the other. As far as arrays go, consider what you can do
>>> with Haskell arrays, or with array operations in APL or J, or with
>>> slices in D...no "for" loops happening there.
>>
>> I think what Daniel wanted was a lazy not an eager split.
>>
>> Instead of doing a full parse and return a data structure
>> (array or list) then just return an iterator with a pointer
>> to the start and then do the parsing when asked for next.
>>
>> Arne
>>
> A generator, IOW.
Basically, yes. That was what I was trying to get at. Calling split on
an unknown String (without using the limit param) is just asking for a
D.O.S. attack.
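The limit param mentioned above is the second argument of String.split(String regex, int limit). A minimal sketch, assuming a cap of 4 parts (MAX_PARTS is an arbitrary value picked for illustration) and reusing the lookbehind pattern posted elsewhere in this thread with \s* appended to swallow the blank after each comma; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical wrapper showing the two-argument form of String.split.
final class BoundedSplit {
    // Assumed cap on how many substrings we are willing to allocate.
    static final int MAX_PARTS = 4;

    static String[] split(String data) {
        // With a positive limit the pattern is applied at most MAX_PARTS - 1 times.
        return data.split("(?i)(?<=am|pm),?\\s*", MAX_PARTS);
    }

    public static void main(String[] args) {
        String data = "Mon 1 PM, Tue 2 PM, Wed 3 PM, Thu 4 PM, Fri 5 PM";
        // Five entries in, four strings out: the last one is the unsplit remainder.
        System.out.println(Arrays.toString(split(data)));
    }
}
```

With a positive limit the returned array never has more than limit entries, and the final entry holds whatever was left unsplit. This only bounds the number of substrings created; it does nothing about the size of the input string itself.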
| From | Jim Janney <jjanney@shell.xmission.com> |
|---|---|
| Date | 2012-03-27 08:15 -0600 |
| Message-ID | <2pvclq9lll.fsf@shell.xmission.com> |
| In reply to | #13190 |
laredotornado <laredotornado@zipmail.com> writes:
> Hi,
>
> I'm using Java 6. I want to split a Java string on a regular
> expression, but I would like to keep part of the string used to split
> in the results. What I have are Strings like
>
> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> What I would like to do is split the expression wherever I have an
> expression matching /(am|pm),?/i . Hopefully I got that right. In
> the above example, I would like the results to be
>
> Fri 7:30 PM
> Sat 2 PM
> Sun 2:30 PM
>
> But with String.split, the split token is not kept within the
> results. How would I write a Java parsing expression to do what I
> want?
>
> Thanks, - Dave
You want to match ,? only when it is preceded by (am|pm). That's what
lookbehind is for:
public class LookBehind {
    public static void main(String[] args) {

        String data = "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM";
        String pattern = "(?i)(?<=am|pm),?";

        String[] split = data.split(pattern);
        for (String s : split) {
            System.out.println("'" + s + "'");
        }
    }

}
See http://www.regular-expressions.info/lookaround.html for a tutorial.
--
Jim Janney
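One detail of the pattern above: it consumes the comma but not the blank that follows it, so the second and third results keep a leading space. A minimal sketch of one way to avoid that, assuming it is acceptable to also discard whitespace after the comma; appending \s* is an addition made here for illustration, and the class and method names are hypothetical.

```java
// Hypothetical variant of the LookBehind example that also eats trailing whitespace.
final class LookBehindTrim {
    static String[] splitTimes(String data) {
        // Split after am/pm (any case); drop the optional comma and any blanks after it.
        return data.split("(?i)(?<=am|pm),?\\s*");
    }

    public static void main(String[] args) {
        for (String s : splitTimes("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println("'" + s + "'");
        }
    }
}
```

Calling trim() on each element of the original result would work just as well; folding the whitespace into the split pattern simply keeps it to one call.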
| From | laredotornado <laredotornado@zipmail.com> |
|---|---|
| Date | 2012-03-27 07:58 -0700 |
| Message-ID | <cc5b2412-cf3f-4f74-8eea-c3484117e8f5@w6g2000pbp.googlegroups.com> |
| In reply to | #13224 |
On Mar 27, 9:15 am, Jim Janney <jjan...@shell.xmission.com> wrote:
> laredotornado <laredotorn...@zipmail.com> writes:
> > Hi,
>
> > I'm using Java 6. I want to split a Java string on a regular
> > expression, but I would like to keep part of the string used to split
> > in the results. What I have are Strings like
>
> > Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> > What I would like to do is split the expression wherever I have an
> > expression matching /(am|pm),?/i . Hopefully I got that right. In
> > the above example, I would like the results to be
>
> > Fri 7:30 PM
> > Sat 2 PM
> > Sun 2:30 PM
>
> > But with String.split, the split token is not kept within the
> > results. How would I write a Java parsing expression to do what I
> > want?
>
> > Thanks, - Dave
>
> You want to match ,? only when it is preceded by (am|pm). That's what
> lookbehind is for:
>
> public class LookBehind {
> public static void main(String[] args) {
>
> String data = "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM";
> String pattern = "(?i)(?<=am|pm),?";
>
> String[] split = data.split(pattern);
> for (String s : split) {
> System.out.println("'" + s + "'");
> }
> }
>
> }
>
> See http://www.regular-expressions.info/lookaround.html for a tutorial.
>
> --
> Jim Janney
Jim, That's absolutely brilliant and does exactly what I want in a
short amount of code.
Stefan, thanks for your solution as well. I tried that out first and
it works too. - Dave
| From | Jim Janney <jjanney@shell.xmission.com> |
|---|---|
| Date | 2012-03-27 09:21 -0600 |
| Message-ID | <2pr4we9ii3.fsf@shell.xmission.com> |
| In reply to | #13225 |
laredotornado <laredotornado@zipmail.com> writes:
> On Mar 27, 9:15 am, Jim Janney <jjan...@shell.xmission.com> wrote:
>> laredotornado <laredotorn...@zipmail.com> writes:
>
> Jim, That's absolutely brilliant and does exactly what I want in a
> short amount of code.
>
> Stefan, thanks for your solution as well. I tried that out first and
> it works too. - Dave
It turns out that lookbehind only works with some patterns; the engine
has to be able to determine the length of the match in advance. Not
surprising when you think about it.
It's an interesting question and gave me a reason to learn something new.
--
Jim Janney
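A small probe for the lookbehind restriction described above, offered as a sketch rather than a statement of what every runtime does. It assumes only Pattern.compile and PatternSyntaxException from java.util.regex; the class name, the compiles helper and the three sample patterns are made up for illustration. Fixed-length alternatives and bounded repetition such as \s{0,3} are accepted inside a lookbehind. Whether an unbounded \s* is rejected depends on the Java version, so the program just reports what it sees.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical probe: reports which lookbehind patterns the current runtime accepts.
final class LookBehindLength {
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false; // the engine refused the pattern
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "(?i)(?<=am|pm),?",          // fixed-length alternatives
            "(?i)(?<=[ap]m\\s{0,3}),?",  // bounded repetition, maximum length is known
            "(?i)(?<=[ap]m\\s*),?"       // unbounded repetition, result varies by runtime
        };
        for (String regex : candidates) {
            System.out.println(regex + " -> " + (compiles(regex) ? "accepted" : "rejected"));
        }
    }
}
```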
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-03-27 09:43 -0700 |
| Message-ID | <Y6mcr.27182$QC3.4987@newsfe16.iad> |
| In reply to | #13226 |
On 3/27/12 8:21 AM, Jim Janney wrote:
> laredotornado<laredotornado@zipmail.com> writes:
>
>> On Mar 27, 9:15 am, Jim Janney<jjan...@shell.xmission.com> wrote:
>>> laredotornado<laredotorn...@zipmail.com> writes:
>>
>> Jim, That's absolutely brilliant and does exactly what I want in a
>> short amount of code.
>>
>> Stefan, thanks for your solution as well. I tried that out first and
>> it works too. - Dave
>
> It turns out that lookbehind only works with some patterns; the engine
> has to be able to determine the length of the match in advance. Not
> surprising when you think about it. It's an interesting question and
> gave me a reason to learn something new.
>
That's interesting. I've written my own Deterministic FSA to implement a
subset of regex functionality, and arbitrary lookbehind actually would
be an easy feature to add. Easier than zero-width matches (for example
word-boundaries).
Anyway, one thing to point out is that Stefan's is likely to perform
better, and definitely has lower memory overhead for long inputs than
"split".
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-03-28 07:51 +0200 |
| Message-ID | <9tfn68Faj3U1@mid.individual.net> |
| In reply to | #13229 |
On 03/27/2012 06:43 PM, Daniel Pitts wrote:
> On 3/27/12 8:21 AM, Jim Janney wrote:
>> laredotornado<laredotornado@zipmail.com> writes:
>>
>>> On Mar 27, 9:15 am, Jim Janney<jjan...@shell.xmission.com> wrote:
>>>> laredotornado<laredotorn...@zipmail.com> writes:
>>>
>>> Jim, That's absolutely brilliant and does exactly what I want in a
>>> short amount of code.
>>>
>>> Stefan, thanks for your solution as well. I tried that out first and
>>> it works too. - Dave
>>
>> It turns out that lookbehind only works with some patterns; the engine
>> has to be able to determine the length of the match in advance. Not
>> surprising when you think about it. It's an interesting question and
>> gave me a reason to learn something new.
>>
> That's interesting. I've written my own Deterministic FSA to implement a
> subset of regex functionality, and arbitrary lookbehind actually would
> be an easy feature to add. Easier than zero-width matches (for example
> word-boundaries).
The limitation for lookbehind seems to be quite common (Ruby's Oniguruma
has it as well). With arbitrary lookbehind you need a buffer which can
grow because you must basically operate on the whole string the whole
time.
And, most modern regular expression engines are implemented as NFAs - or
better NFA with a lot of special logic stacked onto it. The runtime
overhead of two directions of backtracking might be considered too big.
Kind regards
robert