From: Robert Klemme
Newsgroups: comp.lang.java.programmer
Subject: Re: Keeping the split token in a Java regular expression
Date: Tue, 27 Mar 2012 23:27:58 +0200

On 03/27/2012 01:26 AM, Lew wrote:
> Stefan Ram wrote:
>> laredotornado writes:
>>> What I would like to do is split the expression wherever I have an
>>
>> public class Main
>> {
>>   public static void split
>>   ( final java.lang.String text )
>>   { java.util.regex.Pattern pattern =
>>     java.util.regex.Pattern.compile
>>     ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
>>     java.util.regex.Matcher matcher = pattern.matcher( text );
>>     while( matcher.find() )
>>     java.lang.System.out.println( matcher.group( 0 )); }
>>
>>   public static void main( final java.lang.String[] args )
>>   { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
>
> This excellent (except for layout) example deserves to be archived.

What do you find excellent about this?
I find it has some deficiencies:

- the separator is included in the match (which goes against the
  requirements despite the thread subject)
- spaces after a separator comma are included in the next token as
  leading text
- the method really does more than splitting (namely printing), so the
  name does not reflect what's going on
- the Pattern is compiled on _every_ invocation of the method
- the method is unnecessarily restricted; argument type CharSequence
  is sufficient

Test output for

"Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM"
"Fri 8 PM, Sat 1, 3, and 5 PM"

Fri 7:30 PM,
 Sat 2 PM,
 Sun 2:30 PM
---
Fri 8 PM,
 Sat 1, 3, and 5 PM
---

I would change that to

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    // Compiled once; group 1 is the token without the separator.
    private static final Pattern SPLIT_PATTERN = Pattern.compile(
            "(\\S.*?(?:am|pm))(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

    // Prints each token on its own line.
    public static void splitPrint(final CharSequence text) {
        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            System.out.println(m.group(1));
        }
    }

    // Returns the tokens instead of printing them.
    public static List<String> split(final CharSequence text) {
        final List<String> result = new ArrayList<String>();

        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            result.add(m.group(1));
        }

        return result;
    }

    public static void main(final java.lang.String[] args) {
        splitPrint("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM");
        System.out.println("---");
        splitPrint("Fri 8 PM, Sat 1, 3, and 5 PM");
        System.out.println("---");
    }
}

I might even sneak a "\\s*" in between "pm)" and "(?:," to also catch
cases where there are spaces before the separator.

Kind regards

	robert
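
PS: A minimal sketch of that last variation, in case it helps. The class name SplitVariant and the test strings with stray spaces are made up for illustration; only the extra "\\s*" differs from the pattern above. Without it, a space before the comma stops the optional separator group from matching, so the comma ends up as the first character of the next token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
final class SplitVariant {

    // Same pattern as before, plus "\\s*" between "pm)" and "(?:,"
    // so whitespace in front of the separator comma is swallowed.
    private static final Pattern SPLIT_PATTERN = Pattern.compile(
            "(\\S.*?(?:am|pm))\\s*(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

    // Collects group 1 of every match, i.e. each token without separator.
    public static List<String> split(final CharSequence text) {
        final List<String> result = new ArrayList<String>();

        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            result.add(m.group(1));
        }

        return result;
    }

    public static void main(final String[] args) {
        // Made-up input with spaces before the commas.
        for (final String token : split("Fri 7:30 PM , Sat 2 PM ,Sun 2:30 PM")) {
            System.out.println("'" + token + "'");
        }
    }
}
```

The quotes in the output are only there to make stray leading or trailing whitespace visible.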