| Started by | Jerric <jerricgao@gmail.com> |
|---|---|
| First post | 2012-01-06 14:08 -0800 |
| Last post | 2012-01-07 20:58 -0400 |
| Articles | 15 — 9 participants |
Regex doesn't recognize single quote Jerric <jerricgao@gmail.com> - 2012-01-06 14:08 -0800
Re: Regex doesn't recognize single quote Martin Gregorie <martin@address-in-sig.invalid> - 2012-01-06 22:23 +0000
Re: Regex doesn't recognize single quote Jake Jarvis <pig_in_shoes@yahoo.com> - 2012-01-06 23:48 +0100
Re: Regex doesn't recognize single quote Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-01-06 15:49 -0800
Re: Regex doesn't recognize single quote Jim Janney <jjanney@shell.xmission.com> - 2012-01-07 19:02 -0700
Re: Regex doesn't recognize single quote Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-01-09 09:58 -0800
Re: Regex doesn't recognize single quote Roedy Green <see_website@mindprod.com.invalid> - 2012-01-06 21:47 -0800
Re: Regex doesn't recognize single quote Roedy Green <see_website@mindprod.com.invalid> - 2012-01-07 14:41 -0800
Re: Regex doesn't recognize single quote Rafael Villar <morgano5@hotmail.com> - 2012-01-08 09:05 +1000
Re: Regex doesn't recognize single quote Rafael Villar <morgano5@hotmail.com> - 2012-01-08 09:10 +1000
Re: Regex doesn't recognize single quote Lew <noone@lewscanon.com> - 2012-01-07 17:20 -0800
Re: Regex doesn't recognize single quote Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-01-09 10:08 -0800
Re: Regex doesn't recognize single quote Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-01-09 10:33 -0800
Re: Regex doesn't recognize single quote Roedy Green <see_website@mindprod.com.invalid> - 2012-01-07 14:48 -0800
Re: Regex doesn't recognize single quote Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-01-07 20:58 -0400
| From | Jerric <jerricgao@gmail.com> |
|---|---|
| Date | 2012-01-06 14:08 -0800 |
| Subject | Regex doesn't recognize single quote |
| Message-ID | <74f4b448-24bf-448f-9f4a-06fd1b79c86d@o12g2000vbd.googlegroups.com> |
Hi, I need to remove special characters, except \w and single quotes,
from a string, can someone please help me on the regex?
for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
the following code, but it removed single quote. seems to me java
cannot handle the pattern like [^'].
String val = "ab'de+fg";
val = val.replaceAll("[^\\w']+", "");
Thanks a lot,
| From | Martin Gregorie <martin@address-in-sig.invalid> |
|---|---|
| Date | 2012-01-06 22:23 +0000 |
| Message-ID | <je7s9d$70i$1@localhost.localdomain> |
| In reply to | #11081 |
On Fri, 06 Jan 2012 14:08:49 -0800, Jerric wrote:
> Hi, I need to remove special characters, except \w and single quotes,
> from a string, can someone please help me on the regex?
>
> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried the
> following code, but it removed single quote. seems to me java cannot
> handle the pattern like [^'].
>
> String val = "ab'de+fg";
> val = val.replaceAll("[^\\w']+", "");
>
Did you try escaping the single quote?
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
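Martin's suggestion is easy to check directly. In a Java regex, a backslash placed before a non-alphabetic character such as ' simply makes that character a literal. So escaping the apostrophe is harmless, but inside this character class it is also unnecessary, and both patterns below should give the same result. This is a minimal sketch; the class and method names are made up for illustration.

```java
// Compares the OP's character class with a variant that escapes the apostrophe.
final class QuoteEscapeCheck {
    /** The OP's pattern: delete runs of characters that are neither \w nor '. */
    static String unescaped(String s) {
        return s.replaceAll("[^\\w']+", "");
    }

    /** The same class with the apostrophe escaped; the regex engine sees \' as a literal '. */
    static String escaped(String s) {
        return s.replaceAll("[^\\w\\']+", "");
    }

    public static void main(String[] args) {
        String val = "ab'de+fg";
        System.out.println(unescaped(val)); // ab'defg
        System.out.println(escaped(val));   // ab'defg
    }
}
```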
| From | Jake Jarvis <pig_in_shoes@yahoo.com> |
|---|---|
| Date | 2012-01-06 23:48 +0100 |
| Message-ID | <9mpc1pFpl2U1@mid.uni-berlin.de> |
| In reply to | #11081 |
On 06.01.2012 23:08, Jerric wrote:
> Hi, I need to remove special characters, except \w and single quotes,
> from a string, can someone please help me on the regex?
>
> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
> the following code, but it removed single quote. seems to me java
> cannot handle the pattern like [^'].
>
> String val = "ab'de+fg";
> val = val.replaceAll("[^\\w']+", "");
>
That exact same code gives that result?
--
Jake Jarvis
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-01-06 15:49 -0800 |
| Message-ID | <ZLLNq.55350$mJ.21710@newsfe10.iad> |
| In reply to | #11081 |
On 1/6/12 2:08 PM, Jerric wrote:
> Hi, I need to remove special characters, except \w and single quotes,
> from a string, can someone please help me on the regex?
>
> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
> the following code, but it removed single quote. seems to me java
> cannot handle the pattern like [^'].
>
> String val = "ab'de+fg";
> val = val.replaceAll("[^\\w']+", "");
>
> Thanks a lot,
It works for me, which indicates the problem is somewhere in the code
you didn't post. Here is an SSCCE:
public class Works {
    public static void main(String[] args) {
        String val = "ab'de+fg";
        System.out.println(val.replaceAll("[^\\w']+", ""));
    }
}
Try posting exactly the code which causes the problem.
| From | Jim Janney <jjanney@shell.xmission.com> |
|---|---|
| Date | 2012-01-07 19:02 -0700 |
| Message-ID | <2pr4zbq7dv.fsf@shell.xmission.com> |
| In reply to | #11085 |
Daniel Pitts <newsgroup.nospam@virtualinfinity.net> writes:
> On 1/6/12 2:08 PM, Jerric wrote:
>> Hi, I need to remove special characters, except \w and single quotes,
>> from a string, can someone please help me on the regex?
>>
>> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
>> the following code, but it removed single quote. seems to me java
>> cannot handle the pattern like [^'].
>>
>> String val = "ab'de+fg";
>> val = val.replaceAll("[^\\w']+", "");
>>
>> Thanks a lot,
> It works for me, which indicates the problem is somewhere in the code
> you didn't post. Here is an SSCCE:
>
> public class Works {
> public static void main(String[] args) {
> String val = "ab'de+fg";
> System.out.println(val.replaceAll("[^\\w']+", ""));
>
> }
> }
>
> Try posting exactly the code which causes the problem.
Since replaceAll is being used, the closure (the trailing +) is unnecessary,
so this can be shortened by one character :-)
--
Jim Janney
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-01-09 09:58 -0800 |
| Message-ID | <9VFOq.68940$_H.27688@newsfe16.iad> |
| In reply to | #11101 |
On 1/7/12 6:02 PM, Jim Janney wrote:
> Daniel Pitts<newsgroup.nospam@virtualinfinity.net> writes:
>
>> On 1/6/12 2:08 PM, Jerric wrote:
>>> Hi, I need to remove special characters, except \w and single quotes,
>>> from a string, can someone please help me on the regex?
>>>
>>> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
>>> the following code, but it removed single quote. seems to me java
>>> cannot handle the pattern like [^'].
>>>
>>> String val = "ab'de+fg";
>>> val = val.replaceAll("[^\\w']+", "");
>>>
>>> Thanks a lot,
>> It works for me, which indicates the problem is somewhere in the code
>> you didn't post. Here is an SSCCE:
>>
>> public class Works {
>> public static void main(String[] args) {
>> String val = "ab'de+fg";
>> System.out.println(val.replaceAll("[^\\w']+", ""));
>>
>> }
>> }
>>
>> Try posting exactly the code which causes the problem.
>
> Since replaceAll is being used, the closure is unnecessary, so this can
> be shortened by one character :-)
>
Perhaps, but I wouldn't be surprised if there was a performance
difference between the two. I'm not saying there definitely is, but
there very well could be.
Also, they are only equivalent because the replacement string is zero
length.
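Daniel's second point can be shown concretely. With an empty replacement, deleting a run of unwanted characters in one match (with +) or one character at a time (without +) leaves the same string. With a non-empty replacement, a run of two unwanted characters becomes one replacement with + but two without it. A minimal sketch; the class and method names are illustrative.

```java
// Compares "[^\\w']+" (one match per run) with "[^\\w']" (one match per character).
final class ClosureCheck {
    static String withPlus(String s, String replacement) {
        return s.replaceAll("[^\\w']+", replacement);
    }

    static String withoutPlus(String s, String replacement) {
        return s.replaceAll("[^\\w']", replacement);
    }

    public static void main(String[] args) {
        // Empty replacement: the two forms agree.
        System.out.println(withPlus("ab'de+fg", ""));    // ab'defg
        System.out.println(withoutPlus("ab'de+fg", "")); // ab'defg
        // Non-empty replacement: the run "+-" is replaced once vs. twice.
        System.out.println(withPlus("a+-b", "_"));       // a_b
        System.out.println(withoutPlus("a+-b", "_"));    // a__b
    }
}
```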
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-01-06 21:47 -0800 |
| Message-ID | <prmfg79jlt8o86otpnabqfs924cc399var@4ax.com> |
| In reply to | #11081 |
On Fri, 6 Jan 2012 14:08:49 -0800 (PST), Jerric <jerricgao@gmail.com>
wrote, quoted or indirectly quoted someone who said :
>Hi, I need to remove special characters, except \w and single quotes,
>from a string, can someone please help me on the regex?
That is not what a regex is for.
Just use a StringBuilder the length of your String. Then
loop through the chars with charAt. If the character is a
' or \w, ignore it, else append. If it gets complex, use a
switch or if it gets really complicated use a BitSet.
--
Roedy Green Canadian Mind Products
http://mindprod.com
If you can't remember the name of some method,
consider changing it to something you can remember.
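The loop approach Roedy describes might look roughly like the sketch below. It assumes that "\w" means what java.util.regex uses by default (without the UNICODE_CHARACTER_CLASS flag), namely [a-zA-Z_0-9]. To match the OP's requirement, the loop appends word characters and apostrophes and skips everything else. The class and helper names are hypothetical.

```java
// Loop-based filter that keeps the characters the regex class [\w'] keeps.
final class LoopFilter {
    /** Assumes the default ASCII meaning of \w: [a-zA-Z_0-9]; also keeps the apostrophe. */
    static boolean isKept(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_'
            || c == '\'';
    }

    static String keepWordCharsAndQuotes(String s) {
        StringBuilder sb = new StringBuilder(s.length()); // sized to the input, as Roedy suggests
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isKept(c)) {
                sb.append(c); // append what the OP wants to keep
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepWordCharsAndQuotes("ab'de+fg")); // ab'defg
    }
}
```

Compared with the one-line replaceAll, this version spells out the character test explicitly. The cost is that the definition of \w has to be restated by hand.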
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-01-07 14:41 -0800 |
| Message-ID | <tphhg71jbmq4q2vj5dtno5r01igvgdavh2@4ax.com> |
| In reply to | #11087 |
On 7 Jan 2012 11:42:26 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote,
quoted or indirectly quoted someone who said :
>>That is not what a regex is for.
>
> How do you know what it is for?
Regexes are for searching for patterns. Transforming or deleting
characters is much simpler to do with a for loop.
How do I know what a regex is for? I am familiar with the API. I have
attempted to use them for various purposes and discovered they were
suitable for some and not for others.
>
>>Just use a StringBuilder the length of your String. Then
>>loop through the chars with charAt. If the character is a
>>' or \w, ignore it, else append. If it gets complex, use a
>>switch or if it gets really complicated use a BitSet.
>
> This might be needless (as far as we know right now)
> optimization bloating the code reducing its readability and
> low-level thinking, which might be required sometimes, but
> does not serve as a general rule. Still it is nice to know
> how it could be done if required.
What is your simpler implementation?
/** remove ' and \w from string
 * @param s string to process
 * @return string without ' or \w
 */
private static String scrunch( final String s )
{
    final Stringbuilder sb = new StringBuilder( s.length() );
    for (int i=0; i<s.length(); i++ )
    {
        char c = s.charAt(i);
        if ( !( c = '\'' || c = '\w' ) )
        {
            sb.append ( c );
        }
    }
    return sb.toString();
}
--
Roedy Green Canadian Mind Products
http://mindprod.com
If you can't remember the name of some method,
consider changing it to something you can remember.
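For reference, here is a sketch of the scrunch loop from this post with its compile errors fixed. Stringbuilder becomes StringBuilder, = becomes ==, and the illegal '\w' literal is replaced by an explicit test. The sketch implements the Javadoc as written, so it removes apostrophes and word characters, which is the opposite of what the OP asked for. The isWordChar helper is hypothetical and assumes the default ASCII meaning of \w.

```java
// Corrected form of the loop: removes ' and word characters, as its Javadoc says.
final class Scrunch {
    /** Hypothetical helper: the characters \w matches by default, [a-zA-Z_0-9]. */
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    /** remove ' and \w from string */
    static String scrunch(final String s) {
        final StringBuilder sb = new StringBuilder(s.length()); // capital B
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // == compares; '\w' is not a legal char literal, so test the class explicitly
            if (!(c == '\'' || isWordChar(c))) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(scrunch("ab'de+fg")); // +
    }
}
```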
| From | Rafael Villar <morgano5@hotmail.com> |
|---|---|
| Date | 2012-01-08 09:05 +1000 |
| Message-ID | <jeaj4f$2dm$1@speranza.aioe.org> |
| In reply to | #11092 |
On 08/01/12 08:41, Roedy Green wrote:
> On 7 Jan 2012 11:42:26 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote,
> quoted or indirectly quoted someone who said :
>
>>> That is not what a regex is for.
>>
>> How do you know what it is for?
>
> Regexes are for searching for patterns. Transforming or deleting
> characters is much simpler done with a for loop.
>
> How do I know what a regex is for? I am familiar with the API. I have
> attempted to use them for various purposes and discovered they were
> suitable for some and not for others.
>>
>>> Just use a StringBuilder the length of your String. Then
>>> loop through the chars with charAt. If the character is a
>>> ' or \w, ignore it, else append. If it gets complex, use a
>>> switch or if it gets really complicated use a BitSet.
>>
>> This might be needless (as far as we know right now)
>> optimization bloating the code reducing its readability and
>> low-level thinking, which might be required sometimes, but
>> does not serve as a general rule. Still it is nice to know
>> how it could be done if required.
>
> What is your simpler implementation?
>
> /** remove ' and \w from string
> * @param s string to process
> * @return string without ' or \w
> */
> private static String scrunch( final String s )
> {
> final Stringbuilder sb = new StringBuilder( s.length() );
> for (int i=0; i<s.length(); i++ )
> {
> char c = s.charAt(i);
> if ( !( c = '\'' || c = '\w' ) )
> {
> sb.append ( c );
> }
> }
> return sb.toString();
> }
In most cases it is better to use a StringBuilder to perform replacements,
but in this particular case String.replaceAll() is better. By the way,
the escape sequence \w is not a Java string escape sequence but belongs
to the regex pattern syntax (although you should already know about it, as
you say you are familiar with the API).
Anyway, a simpler implementation (and one which works, because yours
doesn't):
/** remove ' and \w from string
* @param s string to process
* @return string without ' or \w
*/
private static String scrunch( final String s ) {
    return s.replaceAll("[^'\\w]+", "");
}
| From | Rafael Villar <morgano5@hotmail.com> |
|---|---|
| Date | 2012-01-08 09:10 +1000 |
| Message-ID | <jeajet$322$1@speranza.aioe.org> |
| In reply to | #11095 |
Mea culpa, sorry: it seems Roedy didn't understand the original problem,
and I also didn't understand what Roedy meant (sorry, Roedy).
Anyway, here is a simpler method that does what Roedy intended:
/** remove ' and \w from string
* @param s string to process
* @return string without ' or \w
*/
private static String scrunch( final String s ) {
    return s.replaceAll("['\\w]+", "");
}
However the original problem is unknown as the original code is actually
working.
| From | Lew <noone@lewscanon.com> |
|---|---|
| Date | 2012-01-07 17:20 -0800 |
| Message-ID | <jear17$320$1@news.albasani.net> |
| In reply to | #11095 |
> Roedy Green wrote:
>> What is your simpler implementation?
>>
>> /** remove ' and \w from string
>> * @param s string to process
>> * @return string without ' or \w
>> */
>> private static String scrunch( final String s )
>> {
>> final Stringbuilder sb = new StringBuilder( s.length() );
>> for (int i=0; i<s.length(); i++ )
>> {
>> char c = s.charAt(i);
>> if ( !( c = '\'' || c = '\w' ) )
>> {
>> sb.append ( c );
>> }
>> }
>> return sb.toString();
>> }
That will not perform the specified action, which is to remove non-word
characters and to _keep_ apostrophes. '\w' is not legitimate Java syntax,
thus will cause a compilation error.
"It is a compile-time error if the character following a backslash in an
escape is not an ASCII b, t, n, f, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7."
<http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.6>
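The distinction Lew and Rafael draw involves two layers of escaping. The Java compiler only accepts its own escapes in literals, such as \' and \\. The regex engine then interprets the resulting string, and it is the regex engine that gives \w its meaning. A minimal sketch; the class and member names are illustrative.

```java
import java.util.regex.Pattern;

// Shows which layer, compiler or regex engine, handles each escape.
final class EscapeLayers {
    // In Java source, "\\w" is a two-character string: a backslash and a 'w'.
    static final String WORD_CHAR = "\\w";

    // '\'' is a legal char escape; '\w' is not and would not compile.
    static final char APOSTROPHE = '\'';

    /** The regex engine, not the compiler, interprets \w as a word character. */
    static boolean isWordChar(String oneChar) {
        return Pattern.matches(WORD_CHAR, oneChar);
    }

    public static void main(String[] args) {
        System.out.println(WORD_CHAR.length()); // 2
        System.out.println(isWordChar("a"));    // true
        System.out.println(isWordChar("+"));    // false
        System.out.println(APOSTROPHE);         // '
    }
}
```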
The simpler approach was already posted by Daniel Pitts, and has the added
virtues of both meeting the requirement and compiling:
public class Works {
    public static void main(String[] args) {
        String val = "ab'de+fg";
        System.out.println(val.replaceAll("[^\\w']+", ""));
    }
}
--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-01-09 10:08 -0800 |
| Message-ID | <l2GOq.543$pM1.358@newsfe15.iad> |
| In reply to | #11092 |
Wow, that is some of the worst String manipulation code I've seen.
On 1/7/12 3:36 PM, Stefan Ram wrote:
> static String scrunch( final String s )
> { final java.lang.String string = s.toString();
s.toString() == s for all non-null instances of String. Unneeded.
> final java.lang.String result = s.replaceAll( "('|\\\\w)", "" );
You don't need an intermediate here.
> return new String( result ); }
Strings are (mostly) immutable. There are extremely few good reasons to
invoke the String(String) constructor manually. Not to mention
s.replaceAll() will already potentially return a new String.
>
> (Assuming the class »String« has an appropriate constructor.)
It does, but why use it, unless you want to guarantee that the two are
.equals() but !=?
I'm not even going to comment on your insane style, as I think you've
rebuffed all comments in the past. What I will comment on is the lack
of consistency in this snippet. Some places use "String" and others
"java.lang.String".
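Daniel's two claims about String identity can be checked in a few lines. String.toString() is documented to return the string itself, and new String(s) always allocates a distinct object with the same characters, so the copy is .equals() to the original but not ==. A minimal sketch; the class and method names are illustrative.

```java
// Demonstrates toString() identity and the equals-but-not-== copy made by new String(s).
final class StringIdentity {
    /** String.toString() returns this, so the reference is unchanged. */
    static boolean toStringReturnsSelf(String s) {
        return s.toString() == s;
    }

    /** new String(s) always creates a new object holding the same characters. */
    static String copyOf(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        String s = "ab'de+fg";
        String copy = copyOf(s);
        System.out.println(toStringReturnsSelf(s)); // true
        System.out.println(copy.equals(s));         // true
        System.out.println(copy == s);              // false
    }
}
```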
> (This implements your documentation, not what the OP wanted.)
So does this, but with less waste and confusion.
static String scrunch( final String source ) {
    return source.replaceAll( "('|\\\\w)", "" );
}
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-01-09 10:33 -0800 |
| Message-ID | <5qGOq.27262$d52.16183@newsfe22.iad> |
| In reply to | #11133 |
On 1/9/12 10:23 AM, Stefan Ram wrote:
> Daniel Pitts<newsgroup.nospam@virtualinfinity.net> writes:
>> I'm not even going to comment on your insane style, as I think you've
>> rebuffed all comments in the past. What I will comment on is the lack
>> of consistency in this snippet. Some places use "String" and others
>> "java.lang.String".
>
> »String« is a class name used by Roedy.
>
> The actual class bound to the name of »String« depends on
> the context the snippet given by Roedy will be placed in.
>
> Since I have no information on that class »String«,
> I started by converting the String instance into a
> java.lang.String instance. Then, I was able to apply the
> operations of java.lang.String, which /are/ known to me.
> In the final end, I had to convert the java.lang.String
> instance back to an instance of the class »String«,
> because this was required by the interface of that method
> as given by Roedy.
>
Since String is in the java.lang package, it is safe to assume that
"String" refers to the java.lang.String class, unless you are given
context otherwise.
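Stefan's concern is about name resolution. The simple name String denotes java.lang.String only when nothing closer in scope is also called String. The contrived sketch below shows the kind of "context otherwise" Daniel mentions: inside ShadowDemo, a nested class named String shadows java.lang.String, so the latter must be written out in full. ShadowDemo and its nested String are hypothetical, made up for illustration.

```java
// A contrived case where the simple name String does NOT mean java.lang.String.
final class ShadowDemo {
    // Hypothetical nested class; inside ShadowDemo it shadows java.lang.String.
    static final class String {
    }

    /** Returns the binary name of whatever class the simple name String denotes here. */
    static java.lang.String whichString() {
        return String.class.getName();
    }

    // main must spell out java.lang.String because String is shadowed.
    public static void main(java.lang.String[] args) {
        System.out.println(whichString()); // ShadowDemo$String (in the default package)
    }
}
```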
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-01-07 14:48 -0800 |
| Message-ID | <1kihg7hn1rr434mlg5krkdpi1ef563cjd4@4ax.com> |
| In reply to | #11087 |
On 7 Jan 2012 11:42:26 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote,
quoted or indirectly quoted someone who said :
> How do you know what it is for?
I see what you mean. I saw the problem as the pattern translation of
various characters to various other characters. The problem is
actually simpler than that. It translates various different
characters all to the same empty "character".
I find the replace methods dangerous. They are improperly named and
thus it is easy to accidentally use a regex or non-regex. They also
have to compile the pattern every time. I tend to avoid them.
--
Roedy Green Canadian Mind Products
http://mindprod.com
If you can't remember the name of some method,
consider changing it to something you can remember.
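The recompilation Roedy mentions can be avoided while keeping the regex. String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so holding the compiled Pattern in a constant should give the same result without recompiling on every call. This is a minimal sketch. The class and constant names are illustrative, and whether the saving matters depends on how often the method is called.

```java
import java.util.regex.Pattern;

// Compiles the OP's pattern once and reuses it for every call.
final class PrecompiledFilter {
    // Compiled once when the class is initialized.
    private static final Pattern UNWANTED = Pattern.compile("[^\\w']+");

    static String clean(String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("ab'de+fg")); // ab'defg
    }
}
```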
| From | Arved Sandstrom <asandstrom3minus1@eastlink.ca> |
|---|---|
| Date | 2012-01-07 20:58 -0400 |
| Message-ID | <xS5Oq.67743$_H.51387@newsfe16.iad> |
| In reply to | #11093 |
On 12-01-07 06:48 PM, Roedy Green wrote:
> On 7 Jan 2012 11:42:26 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote,
> quoted or indirectly quoted someone who said :
>
>> How do you know what it is for?
>
> I see what you mean. I saw the problem as the pattern translation of
> various characters to various other characters. The problem is
> actually simpler than that. It translates various different
> characters all to the same empty "character".
>
> I find the replace methods dangerous. They are improperly named and
> thus it is easy to accidentally use a regex or non-regex. They also
> have to compile the pattern every time. I tend to avoid them.
The methods that accept 'char' or 'CharSequence' are named 'replace'.
The two methods that use regexes are called 'replaceAll' and
'replaceFirst'. I don't see a possibility of accidents here.
The methods are not remotely improperly named: they replace text. That
some of them use literals, and others use regular expressions, to
specify what text is to be replaced, does not alter that central fact.
AHS
--
...wherever the people are well informed they can be trusted with
their own government...
-- Thomas Jefferson, 1789
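Arved's distinction is easiest to see with a character that is special in regexes. replace(CharSequence, CharSequence) treats its target as a literal, while replaceAll treats its first argument as a regex, so "." matches every character. Pattern.quote turns a literal into a regex that matches it exactly. A minimal sketch; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Contrasts literal replace() with regex replaceAll() on the same input.
final class ReplaceVsReplaceAll {
    static String literal(String s) {
        return s.replace(".", "");                    // removes only the dots
    }

    static String regex(String s) {
        return s.replaceAll(".", "");                 // "." matches any character
    }

    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("."), "");  // regex that matches a literal dot
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));     // abc
        System.out.println(regex("a.b.c"));       // (prints an empty line)
        System.out.println(quotedRegex("a.b.c")); // abc
    }
}
```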