

Newsgroup: comp.lang.java.programmer, thread #11081

Regex doesn't recognize single quote

Started by: Jerric <jerricgao@gmail.com>
First post: 2012-01-06 14:08 -0800
Last post: 2012-01-07 20:58 -0400
Articles: 15, participants: 9



Contents

  Regex doesn't recognize single quote Jerric <jerricgao@gmail.com> - 2012-01-06 14:08 -0800
    Re: Regex doesn't recognize single quote Martin Gregorie <martin@address-in-sig.invalid> - 2012-01-06 22:23 +0000
    Re: Regex doesn't recognize single quote Jake Jarvis <pig_in_shoes@yahoo.com> - 2012-01-06 23:48 +0100
    Re: Regex doesn't recognize single quote Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-01-06 15:49 -0800
      Re: Regex doesn't recognize single quote Jim Janney <jjanney@shell.xmission.com> - 2012-01-07 19:02 -0700
        Re: Regex doesn't recognize single quote Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-01-09 09:58 -0800
    Re: Regex doesn't recognize single quote Roedy Green <see_website@mindprod.com.invalid> - 2012-01-06 21:47 -0800
      Re: Regex doesn't recognize single quote Roedy Green <see_website@mindprod.com.invalid> - 2012-01-07 14:41 -0800
        Re: Regex doesn't recognize single quote Rafael Villar <morgano5@hotmail.com> - 2012-01-08 09:05 +1000
          Re: Regex doesn't recognize single quote Rafael Villar <morgano5@hotmail.com> - 2012-01-08 09:10 +1000
          Re: Regex doesn't recognize single quote Lew <noone@lewscanon.com> - 2012-01-07 17:20 -0800
        Re: Regex doesn't recognize single quote Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-01-09 10:08 -0800
          Re: Regex doesn't recognize single quote Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-01-09 10:33 -0800
      Re: Regex doesn't recognize single quote Roedy Green <see_website@mindprod.com.invalid> - 2012-01-07 14:48 -0800
        Re: Regex doesn't recognize single quote Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-01-07 20:58 -0400

#11081 — Regex doesn't recognize single quote

From: Jerric <jerricgao@gmail.com>
Date: 2012-01-06 14:08 -0800
Subject: Regex doesn't recognize single quote
Message-ID: <74f4b448-24bf-448f-9f4a-06fd1b79c86d@o12g2000vbd.googlegroups.com>
Hi, I need to remove special characters, except \w and single quotes,
from a string. Can someone please help me with the regex?

For example, I have "ab'de+fg" and I want to get "ab'defg". I tried
the following code, but it removed the single quote. It seems to me Java
cannot handle a pattern like [^'].

String val = "ab'de+fg";
val = val.replaceAll("[^\\w']+", "");

Thanks a lot,



#11082

From: Martin Gregorie <martin@address-in-sig.invalid>
Date: 2012-01-06 22:23 +0000
Message-ID: <je7s9d$70i$1@localhost.localdomain>
In reply to: #11081
On Fri, 06 Jan 2012 14:08:49 -0800, Jerric wrote:

> Hi, I need to remove special characters, except \w and single quotes,
> from a string, can someone please help me on the regex?
> 
> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried the
> following code, but it removed single quote. seems to me java cannot
> handle the pattern like [^'].
> 
> String val = "ab'de+fg";
> val = val.replaceAll("[^\\w']+", "");
> 
Did you try escaping the single quote?
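Escaping the apostrophe should make no difference either way: java.util.regex allows a backslash before any non-alphabetic character and treats the pair as that literal character, and ' has no special meaning inside a character class in the first place. A small check, with made-up class and method names:

```java
public class EscapedQuote {
    // The pattern from the thread: runs of chars that are neither word chars nor apostrophes.
    static String plain(String s) {
        return s.replaceAll("[^\\w']+", "");
    }

    // Same class with the apostrophe escaped; \' is legal in java.util.regex but redundant.
    static String escaped(String s) {
        return s.replaceAll("[^\\w\\']+", "");
    }

    public static void main(String[] args) {
        System.out.println(plain("ab'de+fg"));   // ab'defg
        System.out.println(escaped("ab'de+fg")); // ab'defg
    }
}
```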


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |



#11083

From: Jake Jarvis <pig_in_shoes@yahoo.com>
Date: 2012-01-06 23:48 +0100
Message-ID: <9mpc1pFpl2U1@mid.uni-berlin.de>
In reply to: #11081
On 06.01.2012 23:08, Jerric wrote:
> Hi, I need to remove special characters, except \w and single quotes,
> from a string, can someone please help me on the regex?
>
> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
> the following code, but it removed single quote. seems to me java
> cannot handle the pattern like [^'].
>
> String val = "ab'de+fg";
> val = val.replaceAll("[^\\w']+", "");
>

That exact same code gives that result?

-- 
Jake Jarvis



#11085

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-01-06 15:49 -0800
Message-ID: <ZLLNq.55350$mJ.21710@newsfe10.iad>
In reply to: #11081
On 1/6/12 2:08 PM, Jerric wrote:
> Hi, I need to remove special characters, except \w and single quotes,
> from a string, can someone please help me on the regex?
>
> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
> the following code, but it removed single quote. seems to me java
> cannot handle the pattern like [^'].
>
> String val = "ab'de+fg";
> val = val.replaceAll("[^\\w']+", "");
>
> Thanks a lot,
It works for me, which indicates the problem is somewhere in the code 
you didn't post.  Here is an SSCCE:

public class Works {
     public static void main(String[] args) {
         String val = "ab'de+fg";
         System.out.println(val.replaceAll("[^\\w']+", ""));

     }
}

Try posting exactly the code which causes the problem.



#11101

From: Jim Janney <jjanney@shell.xmission.com>
Date: 2012-01-07 19:02 -0700
Message-ID: <2pr4zbq7dv.fsf@shell.xmission.com>
In reply to: #11085
Daniel Pitts <newsgroup.nospam@virtualinfinity.net> writes:

> On 1/6/12 2:08 PM, Jerric wrote:
>> Hi, I need to remove special characters, except \w and single quotes,
>> from a string, can someone please help me on the regex?
>>
>> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
>> the following code, but it removed single quote. seems to me java
>> cannot handle the pattern like [^'].
>>
>> String val = "ab'de+fg";
>> val = val.replaceAll("[^\\w']+", "");
>>
>> Thanks a lot,
> It works for me, which indicates the problem is somewhere in the code
> you didn't post.  Here is an SSCCE:
>
> public class Works {
>     public static void main(String[] args) {
>         String val = "ab'de+fg";
>         System.out.println(val.replaceAll("[^\\w']+", ""));
>
>     }
> }
>
> Try posting exactly the code which causes the problem.

Since replaceAll is being used, the closure (the trailing +) is unnecessary,
so this can be shortened by one character :-)

-- 
Jim Janney



#11132

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-01-09 09:58 -0800
Message-ID: <9VFOq.68940$_H.27688@newsfe16.iad>
In reply to: #11101
On 1/7/12 6:02 PM, Jim Janney wrote:
> Daniel Pitts<newsgroup.nospam@virtualinfinity.net>  writes:
>
>> On 1/6/12 2:08 PM, Jerric wrote:
>>> Hi, I need to remove special characters, except \w and single quotes,
>>> from a string, can someone please help me on the regex?
>>>
>>> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
>>> the following code, but it removed single quote. seems to me java
>>> cannot handle the pattern like [^'].
>>>
>>> String val = "ab'de+fg";
>>> val = val.replaceAll("[^\\w']+", "");
>>>
>>> Thanks a lot,
>> It works for me, which indicates the problem is somewhere in the code
>> you didn't post.  Here is an SSCCE:
>>
>> public class Works {
>>      public static void main(String[] args) {
>>          String val = "ab'de+fg";
>>          System.out.println(val.replaceAll("[^\\w']+", ""));
>>
>>      }
>> }
>>
>> Try posting exactly the code which causes the problem.
>
> Since replaceAll is being used, the closure is unnecessary, so this can
> be shortened by one character :-)
>
Perhaps, but I wouldn't be surprised if there was a performance 
difference in the two.  I'm not saying there definitely is, but there 
very well could be.

Also, they are only equivalent because the replacement string is zero 
length.
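A sketch of the equivalence point, comparing the pattern with and without the trailing + (class and method names are made up for illustration). With an empty replacement both forms give the same string; with a non-empty replacement, a run of several special characters becomes a single replacement with + and one replacement per character without it:

```java
public class PlusDemo {
    // One or more chars that are neither word chars nor apostrophes, replaced as a single run.
    static String withPlus(String s, String replacement) {
        return s.replaceAll("[^\\w']+", replacement);
    }

    // Same class without the +: each offending char is replaced individually.
    static String withoutPlus(String s, String replacement) {
        return s.replaceAll("[^\\w']", replacement);
    }

    public static void main(String[] args) {
        String val = "ab'de+-fg";
        System.out.println(withPlus(val, ""));     // ab'defg
        System.out.println(withoutPlus(val, ""));  // ab'defg
        System.out.println(withPlus(val, "_"));    // ab'de_fg
        System.out.println(withoutPlus(val, "_")); // ab'de__fg
    }
}
```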



#11087

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-01-06 21:47 -0800
Message-ID: <prmfg79jlt8o86otpnabqfs924cc399var@4ax.com>
In reply to: #11081
On Fri, 6 Jan 2012 14:08:49 -0800 (PST), Jerric <jerricgao@gmail.com>
wrote, quoted or indirectly quoted someone who said :

>Hi, I need to remove special characters, except \w and single quotes,
>from a string, can someone please help me on the regex?

That is not what a regex is for. Just use a StringBuilder sized to the
length of your String. Then loop through the chars with charAt. If the
character is a ' or \w, ignore it; otherwise append it. If it gets
complex, use a switch, or if it gets really complicated, use a BitSet.
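As written, that condition drops ' and word characters, which is the inverse of what the OP asked for. A minimal sketch of the loop approach that keeps them instead, assuming \w should mean java.util.regex's default ASCII class [a-zA-Z_0-9]; `clean` and `isWordChar` are hypothetical names:

```java
public class KeepWordAndQuote {
    // Mirrors java.util.regex's default \w: [a-zA-Z_0-9] (ASCII only).
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    // Keeps word chars and apostrophes; drops every other char.
    static String clean(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'' || isWordChar(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("ab'de+fg")); // ab'defg
    }
}
```

Whether this is clearer than the one-line replaceAll is exactly what the rest of the thread argues about.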


-- 
Roedy Green Canadian Mind Products
http://mindprod.com
If you can't remember the name of some method, 
consider changing it to something you can remember.
 



#11092

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-01-07 14:41 -0800
Message-ID: <tphhg71jbmq4q2vj5dtno5r01igvgdavh2@4ax.com>
In reply to: #11087
On 7 Jan 2012 11:42:26 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote,
quoted or indirectly quoted someone who said :

>>That is not what a regex is for.
>
>  How do you know what it is for?

Regexes are for searching for patterns. Transforming or deleting
characters is much simpler to do with a for loop.

How do I know what a regex is for? I am familiar with the API. I have
attempted to use them for various purposes and discovered they were
suitable for some and not for others. 
>
>>Just use a StringBuilder the length of your String. Then
>>loop through the chars with charAt.  If the character is a
>>' or \w, ignore it, else append.  If it gets complex, use a
>>switch or if it gets really complicated use a BitSet.
>
>  This might be needless (as far as we know right now)
>  optimization bloating the code reducing its readability and
>  low-level thinking, which might be required sometimes, but
>  does not serve as a general rule. Still it is nice to know
>  how it could be done if required.

What is your simpler implementation?  

/** remove ' and \w from string
  * @param s string to process
  * @return string without ' or \w
  */
private static String scrunch( final String s ) 
{
final Stringbuilder sb = new StringBuilder( s.length() );
for (int i=0; i<s.length(); i++ )
  { 
  char c = s.charAt(i);
  if ( !( c = '\'' || c = '\w' ) )
     {
     sb.append ( c );
     }
  }
return sb.toString();
}



#11095

From: Rafael Villar <morgano5@hotmail.com>
Date: 2012-01-08 09:05 +1000
Message-ID: <jeaj4f$2dm$1@speranza.aioe.org>
In reply to: #11092
On 08/01/12 08:41, Roedy Green wrote:
> On 7 Jan 2012 11:42:26 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote,
> quoted or indirectly quoted someone who said :
> 
>>> That is not what a regex is for.
>>
>>  How do you know what it is for?
> 
> Regexes are for searching for patterns.  Transforming or deleting
> characters is much simpler done with a for loop. 
> 
> How do I know what a regex is for? I am familiar with the API. I have
> attempted to use them for various purposes and discovered they were
> suitable for some and not for others. 
>>
>>> Just use a StringBuilder the length of your String. Then
>>> loop through the chars with charAt.  If the character is a
>>> ' or \w, ignore it, else append.  If it gets complex, use a
>>> switch or if it gets really complicated use a BitSet.
>>
>>  This might be needless (as far as we know right now)
>>  optimization bloating the code reducing its readability and
>>  low-level thinking, which might be required sometimes, but
>>  does not serve as a general rule. Still it is nice to know
>>  how it could be done if required.
> 
> What is your simpler implementation?  
> 
> /** remove ' and \w from string
>   * @param s string to process
>   * @return string without ' or \w
>   */
> private static String scrunch( final String s ) 
> {
> final Stringbuilder sb = new StringBuilder( s.length() );
> for (int i=0; i<s.length(); i++ )
>   { 
>   char c = s.charAt(i);
>   if ( !( c = '\'' || c = '\w' ) )
>      {
>      sb.append ( c );
>      }
>   }
> return sb.toString();
> }

In most cases it is better to use a StringBuilder to perform replacements,
but in this particular case String.replaceAll() is better. By the way,
\w is not a Java string escape sequence; it belongs to the regex
pattern syntax (although you should already know that, since you
say you are familiar with the API).

Anyway, a simpler implementation (and one that works, because yours
doesn't):

/** remove ' and \w from string
 * @param s string to process
 * @return string without ' or \w
 */
private static String scrunch( final String s ) {
   return s.replaceAll("[^'\\w]+", "");
}



#11096

From: Rafael Villar <morgano5@hotmail.com>
Date: 2012-01-08 09:10 +1000
Message-ID: <jeajet$322$1@speranza.aioe.org>
In reply to: #11095
Mea culpa, sorry: it seems Roedy didn't understand the original problem,
and I didn't understand what Roedy had understood either (sorry, Roedy).

Anyway, a simpler method that does what Roedy intends to do:

/** remove ' and \w from string
 * @param s string to process
 * @return string without ' or \w
 */
private static String scrunch( final String s ) {
   return s.replaceAll("['\\w]+", "");
}

However, the original problem remains unknown, since the original code
actually works.
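A side-by-side sketch of Rafael's two patterns applied to the OP's sample input (method names are illustrative only): the negated class keeps apostrophes and word characters, which is what the OP wanted, while the plain class removes them, which is what Roedy's doc comment describes.

```java
public class ScrunchVariants {
    // First version: removes runs of chars that are NOT apostrophes or word chars.
    static String keepQuoteAndWord(String s) {
        return s.replaceAll("[^'\\w]+", "");
    }

    // Second version: removes runs of apostrophes and word chars.
    static String removeQuoteAndWord(String s) {
        return s.replaceAll("['\\w]+", "");
    }

    public static void main(String[] args) {
        String val = "ab'de+fg";
        System.out.println(keepQuoteAndWord(val));   // ab'defg
        System.out.println(removeQuoteAndWord(val)); // +
    }
}
```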



#11100

From: Lew <noone@lewscanon.com>
Date: 2012-01-07 17:20 -0800
Message-ID: <jear17$320$1@news.albasani.net>
In reply to: #11095
> Roedy Green wrote:
>> What is your simpler implementation?
>>
>> /** remove ' and \w from string
>>    * @param s string to process
>>    * @return string without ' or \w
>>    */
>> private static String scrunch( final String s )
>> {
>> final Stringbuilder sb = new StringBuilder( s.length() );
>> for (int i=0; i<s.length(); i++ )
>>    {
>>    char c = s.charAt(i);
>>    if ( !( c = '\'' || c = '\w' ) )
>>       {
>>       sb.append ( c );
>>       }
>>    }
>> return sb.toString();
>> }

That will not perform the specified action, which is to remove non-word 
characters and to _keep_ apostrophes.  '\w' is not legitimate Java syntax, 
thus will cause a compilation error.
"It is a compile-time error if the character following a backslash in an 
escape is not an ASCII b, t, n, f, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7."
<http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.6>
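For reference, a sketch of that loop repaired so it compiles and does what its own doc comment says (remove ' and word characters), which is still the opposite of the requirement. Since Java has no '\w' char literal, word characters are approximated here with Character.isLetterOrDigit plus '_'; that is an assumption, and it is broader than regex \w for non-ASCII letters:

```java
public class ScrunchFixed {
    // Assumption: "word character" approximated as a letter, digit, or '_'.
    // Character.isLetterOrDigit is Unicode-aware, so it is broader than regex \w.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /**
     * Removes apostrophes and word characters, as the original doc comment says.
     * @param s string to process
     * @return string without apostrophes or word characters
     */
    static String scrunch(final String s) {
        final StringBuilder sb = new StringBuilder(s.length()); // StringBuilder, not Stringbuilder
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(c == '\'' || isWordChar(c))) { // == compares; = would attempt an assignment
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(scrunch("ab'de+fg")); // +
    }
}
```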

The simpler approach was already posted by Daniel Pitts, and has the added 
virtues of both meeting the requirement and compiling:

   public class Works {
     public static void main(String[] args) {
       String val = "ab'de+fg";
       System.out.println(val.replaceAll("[^\\w']+", ""));
     }
   }

-- 
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg



#11133

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-01-09 10:08 -0800
Message-ID: <l2GOq.543$pM1.358@newsfe15.iad>
In reply to: #11092
Wow, that is some of the worst String manipulation code I've seen.

On 1/7/12 3:36 PM, Stefan Ram wrote:
>    static String scrunch( final String s )
>    { final java.lang.String string = s.toString();
s.toString() == s for all non-null instances of String. Unneeded.

>      final java.lang.String result = s.replaceAll( "('|\\\\w)", "" );
You don't need an intermediate here.

>      return new String( result ); }
Strings are (mostly) immutable. There are extremely few good reasons to 
invoke the String(String) constructor manually.  Not to mention 
s.replaceAll() will already potentially return a new String.
>
>    (Assuming the class »String« has an appropriate constructor.)
It does, but why use it, unless you want to guarantee that they are
.equals() but !=?
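A quick illustration of the copy-constructor point (class and method names are hypothetical): new String(s) always produces a distinct object, so the copy is .equals() to the original but not ==, while String.toString() simply returns the same instance.

```java
public class CopyDemo {
    // The String(String) constructor always allocates a new object with the same chars.
    static String copy(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        String a = "ab'defg";
        String b = copy(a);
        System.out.println(a.equals(b));       // true
        System.out.println(a == b);            // false
        System.out.println(a.toString() == a); // true: String.toString() returns this
    }
}
```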



I'm not even going to comment on your insane style, as I think you've
rebuffed all comments in the past. What I will comment on is the lack
of consistency in this snippet: some places use "String" and others
"java.lang.String".


>    (This implements your documentation, not what the OP wanted.)
So does this, but with less waste and confusion.
static String scrunch( final String source) {
    return source.replaceAll( "('|\\\\w)", "" );
}
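Note that "('|\\\\w)" in Java source is the regex ('|\\w), which matches an apostrophe or a literal backslash followed by w, not the \w word-character class; that is how it implements the doc comment literally. A sketch of that behaviour (the class name is made up):

```java
public class LiteralBackslashW {
    // Java literal "('|\\\\w)" is the regex ('|\\w): an apostrophe, or a backslash followed by 'w'.
    static String scrunch(final String source) {
        return source.replaceAll("('|\\\\w)", "");
    }

    public static void main(String[] args) {
        System.out.println(scrunch("ab'de+fg")); // abde+fg
        System.out.println(scrunch("x\\wy'z"));  // xyz (the input holds a real backslash)
    }
}
```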



#11135

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-01-09 10:33 -0800
Message-ID: <5qGOq.27262$d52.16183@newsfe22.iad>
In reply to: #11133
On 1/9/12 10:23 AM, Stefan Ram wrote:
> Daniel Pitts<newsgroup.nospam@virtualinfinity.net>  writes:
>> I'm not even going to comment on your insane style, as I think you've
>> rebuffed all comments in the past.  What I will comment on is the lack
>> of consistency in this snippet. Some places use use "String" and others
>> "java.lang.String".
>
>    »String« is a class name used by Roedy.
>
>    The actual class bound to the name of »String« depends on
>    the context the snippet given by Roedy will be placed in.
>
>    Since I have no information on that class »String«,
>    I started by converting the String instance into a
>    java.lang.String instance. Then, I was able to apply the
>    operations of java.lang.String, which /are/ known to me.
>    In the final end, I had to convert the java.lang.String
>    instance back to an instance of the class »String«,
>    because this was required by the interface of that method
>    as given by Roedy.
>
Since String is in the java.lang package, it is safe to assume that 
"String" refers to the java.lang.String class, unless you are given 
context otherwise.
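The situation Stefan describes can only arise if some other type named String is in scope. A contrived sketch (the nested class is purely hypothetical) showing how a member type shadows java.lang.String, which is also why main has to spell out java.lang.String[] there:

```java
public class ShadowDemo {
    // Hypothetical member type: inside ShadowDemo, the simple name String now means this class.
    static class String {
    }

    static boolean simpleNameIsJavaLangString() {
        Object o = new String(); // ShadowDemo.String, not java.lang.String
        return o.getClass() == java.lang.String.class;
    }

    // main must use java.lang.String[] or it is not a valid entry point here.
    public static void main(java.lang.String[] args) {
        System.out.println(simpleNameIsJavaLangString()); // false
        java.lang.String real = "ab'de+fg";
        System.out.println(real.replaceAll("[^\\w']+", "")); // ab'defg
    }
}
```

Without such a declaration, plain String resolves to java.lang.String, which is Daniel's point.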



#11093

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-01-07 14:48 -0800
Message-ID: <1kihg7hn1rr434mlg5krkdpi1ef563cjd4@4ax.com>
In reply to: #11087
On 7 Jan 2012 11:42:26 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote,
quoted or indirectly quoted someone who said :

>  How do you know what it is for?

I see what you mean. I saw the problem as the pattern translation of
various characters to various other characters. The problem is
actually simpler than that. It translates various different
characters all to the same empty "character".

I find the replace methods dangerous. They are improperly named, and
thus it is easy to accidentally use a regex or non-regex. They also
have to compile the pattern every time. I tend to avoid them.
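On the compile-every-time point: String.replaceAll(regex, replacement) is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(replacement), so repeated calls do recompile. A sketch of the usual workaround, compiling once into a static final Pattern (class and constant names are made up):

```java
import java.util.regex.Pattern;

public class PrecompiledCleaner {
    // Compiled once; Pattern instances are immutable and safe to share between threads.
    private static final Pattern NOT_WORD_OR_QUOTE = Pattern.compile("[^\\w']+");

    // Each call only creates a lightweight Matcher, not a new Pattern.
    static String clean(String s) {
        return NOT_WORD_OR_QUOTE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("ab'de+fg")); // ab'defg
    }
}
```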



#11099

From: Arved Sandstrom <asandstrom3minus1@eastlink.ca>
Date: 2012-01-07 20:58 -0400
Message-ID: <xS5Oq.67743$_H.51387@newsfe16.iad>
In reply to: #11093
On 12-01-07 06:48 PM, Roedy Green wrote:
> On 7 Jan 2012 11:42:26 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote,
> quoted or indirectly quoted someone who said :
> 
>>  How do you know what it is for?
> 
> I see what you mean.  I saw the problem as the pattern  translation of
> various characters to various other characters.  The problem is
> actually simpler than that.  It translates various different
> characters all to the same empty "character".
> 
> I find the replace methods dangerous. They are improperly named and
> thus it  is easy to accidentally use a regex or non-regex.  They also
> have to compile the pattern every time. I tend to avoid them.

The methods that accept 'char' or 'CharSequence' are named 'replace'.
The two methods that use regexes are called 'replaceAll' and
'replaceFirst'. I don't see a possibility of accidents here.

The methods are not remotely improperly named: they replace text. That
some of them use literals, and others use regular expressions, to
specify what text is to be replaced, does not alter that central fact.
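A small sketch of the distinction (names are illustrative): replace treats its target as plain text, while replaceAll treats it as a regex, so a metacharacter such as . behaves very differently. Pattern.quote is one way to pass a literal through the regex methods.

```java
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    // replace(CharSequence, CharSequence): the target is matched as plain text.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll(String, String): the first argument is a regex; "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // Pattern.quote turns the target into a literal pattern for the regex methods.
    static String quoted(String s) {
        return s.replaceAll(Pattern.quote("."), "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b")); // a-b
        System.out.println(regex("a.b"));   // ---
        System.out.println(quoted("a.b"));  // a-b
    }
}
```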

AHS

-- 
...wherever the people are well informed they can be trusted with their
own government...
-- Thomas Jefferson, 1789





csiph-web