From: Robert Klemme
Newsgroups: comp.lang.java.programmer
Subject: Re: Keeping the split token in a Java regular expression
Date: Wed, 28 Mar 2012 07:28:13 +0200

On 03/27/2012 11:27 PM, Robert Klemme wrote:
> On 03/27/2012 01:26 AM, Lew wrote:
>> Stefan Ram wrote:
>>> laredotornado writes:
>>>> What I would like to do is split the expression wherever I have an
>>>
>>> public class Main ...
>>
>> This excellent (except for layout) example deserves to be archived.
>
> What do you find excellent about this?
> I find it has some deficiencies:
>
> - the separator is included in the match (which goes against the
>   requirements despite the thread subject)
> - spaces after a separator comma are included in the next token as
>   leading text
> - the method really does more than splitting (namely printing), so the
>   name does not reflect what's going on
> - the Pattern is compiled on _every_ invocation of the method
> - the method is unnecessarily restricted, argument type CharSequence is
>   sufficient
>
> Test output for
> "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM"
> "Fri 8 PM, Sat 1, 3, and 5 PM"
>
> Fri 7:30 PM,
> Sat 2 PM,
> Sun 2:30 PM
> ---
> Fri 8 PM,
> Sat 1, 3, and 5 PM
> ---
>
> I would change that to

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    // Compiled once. Group 1 is the token without the separator:
    // a non-space character, then as little as possible up to "am"/"pm".
    private static final Pattern SPLIT_PATTERN = Pattern.compile(
        "(\\S.*?[ap]m)(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

    // Prints each token on its own line.
    public static void splitPrint(final CharSequence text) {
        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            System.out.println(m.group(1));
        }
    }

    // Returns the tokens instead of printing them.
    public static List<String> split(final CharSequence text) {
        final List<String> result = new ArrayList<String>();

        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            result.add(m.group(1));
        }

        return result;
    }

    public static void main(final java.lang.String[] args) {
        splitPrint("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM");
        System.out.println("---");
        splitPrint("Fri 8 PM, Sat 1, 3, and 5 PM");
        System.out.println("---");
    }
}

I had overlooked a fairly obvious improvement with regard to am/pm
parsing.

> I might even sneak a "\\s*" in between "pm)" and "(?:," to even catch
> cases where there are spaces before the separator.

Kind regards

	robert
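
PS: A minimal sketch of the "\\s*" variant mentioned above, in case it helps. The class name SplitVariant and the sample input with a space before the comma are made up for illustration; everything else follows the pattern already shown. Without the extra "\\s*" the optional separator group would match nothing at " ,", and the next token would then start at the comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, chosen only for this sketch.
class SplitVariant {

    // Same idea as SPLIT_PATTERN, plus "\\s*" between the am/pm group
    // and the optional comma, so "PM , Sat" is handled like "PM, Sat".
    private static final Pattern SPLIT_PATTERN = Pattern.compile(
        "(\\S.*?[ap]m)\\s*(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

    // Collects group 1 of every match, i.e. the tokens without separators.
    static List<String> split(final CharSequence text) {
        final List<String> result = new ArrayList<String>();

        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            result.add(m.group(1));
        }

        return result;
    }

    public static void main(final String[] args) {
        // Assumed sample input: note the space before the first comma.
        for (final String token : split("Fri 8 PM , Sat 1, 3, and 5 PM")) {
            System.out.println(token);
        }
    }
}
```

The pattern still relies on every token ending in am/pm, so text containing "am" or "pm" elsewhere in a token would be cut short.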