From: Robert Klemme
Newsgroups: comp.lang.java.programmer
Subject: Re: Keeping the split token in a Java regular expression
Date: Wed, 28 Mar 2012 07:51:03 +0200

On 03/27/2012 06:43 PM, Daniel Pitts wrote:
> On 3/27/12 8:21 AM, Jim Janney wrote:
>> laredotornado writes:
>>
>>> On Mar 27, 9:15 am, Jim Janney wrote:
>>>> laredotornado writes:
>>>
>>> Jim, That's absolutely brilliant and does exactly what I want in a
>>> short amount of code.
>>>
>>> Stefan, thanks for your solution as well. I tried that out first and
>>> it works too. - Dave
>>
>> It turns out that lookbehind only works with some patterns; the engine
>> has to be able to determine the length of the match in advance. Not
>> surprising when you think about it. It's an interesting question and
>> gave me a reason to learn something new.
>>
> That's interesting. I've written my own Deterministic FSA to implement a
> subset of regex functionality, and arbitrary lookbehind actually would
> be an easy feature to add. Easier than zero-width matches (for example
> word-boundaries).

The limitation for lookbehind seems to be quite common (Ruby's Oniguruma has it as well).
With arbitrary lookbehind you need a buffer that can grow, because you basically have to operate on the whole string the whole time. Also, most modern regular expression engines are implemented as NFAs, or rather as NFAs with a lot of special logic stacked on top. The runtime overhead of backtracking in two directions might be considered too big.

Kind regards

robert
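
For illustration, here is a minimal Java sketch of the kind of split this thread is about: a zero-width lookbehind (or lookahead) as the split pattern, so the delimiter stays in the result. It is not the code posted earlier in the thread; the class and method names are hypothetical, and the delimiter is assumed to be a literal string so that the lookbehind has a fixed length. The last helper only reports whether a lookbehind without an obvious maximum length compiles. JDKs from the time of this thread reject such a pattern with a PatternSyntaxException, while later releases may accept it, so the sketch makes no claim about the result.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class for illustration; the names are assumptions,
// not taken from the thread.
final class KeepDelimiterSplit {

    // Splits after every occurrence of a literal delimiter and keeps the
    // delimiter at the end of each piece. The lookbehind is zero-width, so
    // split() consumes no characters. Pattern.quote turns the delimiter into
    // a literal of known length, which a bounded lookbehind can handle.
    static String[] splitAfter(String text, String delimiter) {
        return text.split("(?<=" + Pattern.quote(delimiter) + ")");
    }

    // Same idea with a lookahead: the delimiter starts each following piece.
    static String[] splitBefore(String text, String delimiter) {
        return text.split("(?=" + Pattern.quote(delimiter) + ")");
    }

    // Reports whether the running JDK accepts the regex at compile time.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAfter("a,b,c", ",")));  // [a,, b,, c]
        System.out.println(Arrays.toString(splitBefore("a,b,c", ","))); // [a, ,b, ,c]
        // Lookbehind of unbounded length: the outcome depends on the JDK version.
        System.out.println("(?<=x+)y compiles: " + compiles("(?<=x+)y"));
    }
}
```

Quoting the delimiter sidesteps the limitation discussed above, because the engine always knows how far to look back. Passing an arbitrary regex into the lookbehind instead would run into it as soon as the pattern has no maximum length.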