Re: simple regex pattern sought

From Roedy Green <see_website@mindprod.com.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: simple regex pattern sought
Date 2012-05-26 06:19 -0700
Organization Canadian Mind Products
Message-ID <6sl1s7dpqhg4l0gfa5duva3j8m9rf9opr5@4ax.com>
References <e9vvr7p7l8l5kem31v5a37apdlubrqjq5e@4ax.com> <dc4ca9b0-9aa9-4fe1-bbc9-2d3a28250a9d@googlegroups.com> <a2aeesF2s0U1@mid.individual.net>

On Sat, 26 May 2012 00:12:34 +0200, Robert Klemme
<shortcutter@googlemail.com> wrote, quoted or indirectly quoted
someone who said :

>On 25.05.2012 23:55, Lew wrote:
>> Roedy Green wrote:
>>> I often have to search for things of the form
>>>
>>> "xxxxx"
>>> or
>>> 'xxxxx'
>>>
>>> where xxx is anything not " or '.  It might be Russian or English or
>>> any other language.
/*
 * [TestRegexFindQuotedString.java]
 *
 * Summary: Finding a quoted String with a regex.
 *
 * Copyright: (c) 2012 Roedy Green, Canadian Mind Products, http://mindprod.com
 *
 * Licence: This software may be copied and used freely for any purpose but military.
 *          http://mindprod.com/contact/nonmil.html
 *
 * Requires: JDK 1.7+
 *
 * Created with: JetBrains IntelliJ IDEA IDE http://www.jetbrains.com/idea/
 *
 * Version History:
 *  1.0 2012-05-25 initial release
 */
package com.mindprod.example;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import static java.lang.System.out;

/**
 * Finding a quoted String with a regex.
 *
 * @author Roedy Green, Canadian Mind Products
 * @version 1.0 2012-05-25 initial release
 * @since 2012-05-25
 */
public class TestRegexFindQuotedString
    {
    // ------------------------------ CONSTANTS ------------------------------

    private static final String lookIn = "George said \"that's the ticket\"." +
                                         " Jeb replied '\"ticket?\" what ticket'." +
                                         " \"How na\u00efve!\"." +
                                         " empty: \"\"" +
                                         " 'unbalanced\"";

    // -------------------------- STATIC METHODS --------------------------

    /**
     * Exercise the pattern to see what it can find.
     */
    static void exercisePattern( Pattern pattern )
        {
        out.println();
        out.println( "Pattern: " + pattern.toString() );
        // Matchers are used both for matching and finding.
        final Matcher m = pattern.matcher( lookIn );
        while ( m.find() )
            {
            out.println( m.group( 0 ) );
            }
        }

    // --------------------------- main() method ---------------------------

    /**
     * test harness
     *
     * @param args not used
     */
    public static void main( String[] args )
        {
        // We want to find Strings of the form "xx'xx" or 'xx"xx'.
        // The pattern must meet the following requirements:
        // 1. Works even if the String contains foreign languages, even Russian or accented letters.
        // 2. If it starts with " it must end with "; if it starts with ' it must end with '.
        // 3. ' is ok inside "...", and " is ok inside '...'.
        // 4. We don't worry about how to use ' inside '...'.

        // here are some suggested techniques:

        exercisePattern( Pattern.compile( "[\"']\\p{Print}+?[\"']" ) );  // fails 1 2 3

        exercisePattern( Pattern.compile( "[\"'][^\"']+[\"']" ) );  // fails 2 3

        exercisePattern( Pattern.compile( "([\"'])[^\"']+\\1" ) );  // fails 3, uses a capturing group.

        exercisePattern( Pattern.compile( "\"[^\"]+\"|'[^']+'" ) );  // works, rejects empty strings. By Mark Space.

        exercisePattern( Pattern.compile( "\"[^\"]*\"|'[^']*'" ) );  // works, accepts empty strings. By Robert Klemme.

        exercisePattern( Pattern.compile( "\"(?:\\\\.|[^\\\"])*\"|'(?:\\\\.|[^\\'])*'" ) );  // works, accepts empty strings
        // (?: ) is a non-capturing group. This is Robert Klemme's contribution. I don't understand how it works.
        }
    }
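
For anyone else puzzling over that last pattern, here is a minimal sketch of how it
appears to work. Inside the quotes, (?:\\.|[^"])* first tries to consume a backslash
plus whatever character follows it (an escape pair), and only otherwise a single
character that is not the closing quote. That lets \" sit inside "..." without ending
the match. The class name EscapedQuoteDemo, the helper findAll and the sample strings
are made up for illustration and are not from the thread. The STRICT variant, which
also excludes the backslash from the character class, is my assumption about what was
intended: once the Java string escapes are peeled off, the posted [^\\\"] becomes the
regex class [^\"], which excludes only the quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuoteDemo
    {
    // Pattern as posted. The regex seen by the engine is:
    //   "(?:\\.|[^\"])*"|'(?:\\.|[^\'])*'
    public static final Pattern AS_POSTED =
            Pattern.compile( "\"(?:\\\\.|[^\\\"])*\"|'(?:\\\\.|[^\\'])*'" );

    // Hypothetical stricter variant (an assumption, not from the thread).
    // The regex seen by the engine is:
    //   "(?:\\.|[^\\"])*"|'(?:\\.|[^\\'])*'
    public static final Pattern STRICT =
            Pattern.compile( "\"(?:\\\\.|[^\\\\\"])*\"|'(?:\\\\.|[^\\\\'])*'" );

    /**
     * Collect every match of the pattern in the text.
     */
    public static List<String> findAll( Pattern pattern, String text )
        {
        final List<String> found = new ArrayList<>();
        final Matcher m = pattern.matcher( text );
        while ( m.find() )
            {
            found.add( m.group() );
            }
        return found;
        }

    public static void main( String[] args )
        {
        // The text is:  say "a \"b\" c" and 'it\'s'
        final String escaped = "say \"a \\\"b\\\" c\" and 'it\\'s'";
        System.out.println( findAll( AS_POSTED, escaped ) );  // ["a \"b\" c", 'it\'s']
        System.out.println( findAll( STRICT, escaped ) );     // ["a \"b\" c", 'it\'s']

        // The text is:  "abc\"   (the only closing quote is escaped)
        final String dangling = "\"abc\\\"";
        System.out.println( findAll( AS_POSTED, dangling ) ); // ["abc\"]
        System.out.println( findAll( STRICT, dangling ) );    // []
        }
    }
```

On well-formed input the two behave the same, because the escape alternative is tried
first. They differ only when the engine backtracks: in the posted form a lone backslash
can fall through to the second alternative, so an escaped quote may end up being taken
as the closing quote.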
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
I would be quite surprised if the NSA (National Security Agency)
did not have a computer program to scan bits of shredded
documents and electronically put them back together like a giant
jigsaw puzzle. This suggests you cannot just shred, you must also burn.
