
Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]?

Started by	rxjwg98@gmail.com
First post	2014-07-10 03:05 -0700
Last post	2014-07-10 15:08 +0100
Articles	5 — 4 participants


  Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? rxjwg98@gmail.com - 2014-07-10 03:05 -0700
    Re: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? MRAB <python@mrabarnett.plus.com> - 2014-07-10 12:18 +0100
      Re: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? fl <rxjwg98@gmail.com> - 2014-07-10 06:32 -0700
        Re: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? Ned Batchelder <ned@nedbatchelder.com> - 2014-07-10 10:04 -0400
        Re: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? MRAB <python@mrabarnett.plus.com> - 2014-07-10 15:08 +0100

#74296 — Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]?

From	rxjwg98@gmail.com
Date	2014-07-10 03:05 -0700
Subject	Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]?
Message-ID	<1e8dbd65-bd19-4b9d-a7ec-961e8304ace0@googlegroups.com>

Hi,

A tutorial says that '\s' matches whitespace and is equivalent to [\t\n\r\f].

I tested it with:

>>> import re
>>> re.match(r'\s*\d\d*$', '   111')
<_sre.SRE_Match object at 0x03642BB8>
>>> re.match(r'\t\n\r\f*\d\d*$', '   111')    # no match (returns None)
>>> re.match(r'[\t\n\r\f]*\d\d*$', '   111') # no match (returns None)
>>> re.match(r'[\t\n\r\f]\d\d*$', '   111') # no match (returns None)
>>> re.match(r'[\t\n\r\f]*$', '   111') # no match (returns None)

What is wrong with the script above? Thanks


#74297

From	MRAB <python@mrabarnett.plus.com>
Date	2014-07-10 12:18 +0100
Message-ID	<mailman.11722.1404991086.18130.python-list@python.org>
In reply to	#74296

On 2014-07-10 11:05, rxjwg98@gmail.com wrote:
> Hi,
>
> On a tutorial it says that '\s': Matches whitespace. Equivalent to [\t\n\r\f].
>
It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
either the tutorial is wrong, or you didn't look closely enough. :-)

> I test it with:
>
> >>> re.match(r'\s*\d\d*$', '   111')
> <_sre.SRE_Match object at 0x03642BB8>
> >>> re.match(r'\t\n\r\f*\d\d*$', '   111')    # fails

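Without square brackets, this pattern is a literal sequence: a tab, a
newline, a carriage return, then zero or more form feeds. The string
starts with ' ', not '\t'.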

> >>> re.match(r'[\t\n\r\f]*\d\d*$', '   111') # fails
> >>> re.match(r'[\t\n\r\f]\d\d*$', '   111') # fails
> >>> re.match(r'[\t\n\r\f]*$', '   111') # fails

The string starts with ' ', which isn't in the character set.

>
> What is wrong in above script? Thanks
>
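
The two explanations above can be checked with a short sketch. It assumes Python 3, and the variable names are illustrative rather than taken from the thread. It also shows that the last pattern in the question would not match even with the space added, because '$' requires the string to end right after the whitespace, and here digits follow.

```python
import re

text = '   111'  # three spaces followed by three digits

# No brackets: a literal tab, newline, carriage return, then zero or more form feeds.
seq = re.match(r'\t\n\r\f*\d\d*$', text)

# Character class without a space: matches zero characters, then \d meets ' '.
no_space = re.match(r'[\t\n\r\f]*\d\d*$', text)

# Same class with a space added: the leading spaces are consumed.
with_space = re.match(r'[ \t\n\r\f]*\d\d*$', text)

# Whitespace followed directly by end of string: the digits prevent a match.
only_ws = re.match(r'[ \t\n\r\f]*$', text)

# The bracket-free pattern does match when the string really is that sequence.
literal_seq = re.match(r'\t\n\r\f*\d\d*$', '\t\n\r111')

print(seq, no_space, only_ws)         # all three are None
print(with_space.group() == text)     # the whole string matched
print(literal_seq is not None)
```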


#74300

From	fl <rxjwg98@gmail.com>
Date	2014-07-10 06:32 -0700
Message-ID	<7593d956-f202-4d1c-9e35-1269ab3dda57@googlegroups.com>
In reply to	#74297

On Thursday, July 10, 2014 7:18:01 AM UTC-4, MRAB wrote:
> On 2014-07-10 11:05, rx@gmail.com wrote:
>
> It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
> either the tutorial is wrong, or you didn't look closely enough. :-)
>
> The string starts with ' ', not '\t'.
>
> The string starts with ' ', which isn't in the character set.
>
The '\s' description is at this link:

http://www.tutorialspoint.com/python/python_reg_expressions.htm

Could you give me an example that uses the equivalent pattern?

Thanks


#74303

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2014-07-10 10:04 -0400
Message-ID	<mailman.11725.1405001097.18130.python-list@python.org>
In reply to	#74300

On 7/10/14 9:32 AM, fl wrote:
> On Thursday, July 10, 2014 7:18:01 AM UTC-4, MRAB wrote:
>> On 2014-07-10 11:05, rx@gmail.com wrote:
>>
>> It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
>> either the tutorial is wrong, or you didn't look closely enough. :-)
>>
>> The string starts with ' ', not '\t'.
>>
>> The string starts with ' ', which isn't in the character set.
>>
> The '\s' description is on link:
>
> http://www.tutorialspoint.com/python/python_reg_expressions.htm
>

For some reason, that page shows much of its information twice.  The 
first occurrence of \s there is:

     \s    Matches whitespace. Equivalent to [\t\n\r\f].

The second is:

     \s    Match a whitespace character: [ \t\r\n\f]

The second one is correct.  The first is wrong.  You might want to send 
the author a bug report.

Actually, neither is strictly correct, since as the official docs 
(https://docs.python.org/2/library/re.html) say,

     \s    When the UNICODE flag is not specified, it matches any
     whitespace character, this is equivalent to the set [ \t\n\r\f\v].
     The LOCALE flag has no extra effect on matching of the space. If
     UNICODE is set, this will match the characters [ \t\n\r\f\v] plus
     whatever is classified as space in the Unicode character properties
     database.
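
The documentation quoted above describes Python 2. A rough way to see the same distinction in Python 3 is sketched below, on the assumption that str patterns are Unicode-aware by default and that the re.ASCII flag restricts \s to the six ASCII characters. The names ASCII_WS and EM_SPACE are illustrative, not from the docs.

```python
import re

ASCII_WS = ' \t\n\r\f\v'   # the set given in the official docs, including \v
EM_SPACE = '\u2003'        # a non-ASCII character classified as whitespace

# Collect every ASCII character that \s accepts when restricted by re.ASCII.
ascii_matches = [chr(c) for c in range(128)
                 if re.match(r'\s', chr(c), re.ASCII)]

# Default (Unicode) matching for str patterns versus ASCII-only matching.
unicode_hit = re.match(r'\s', EM_SPACE)
ascii_hit = re.match(r'\s', EM_SPACE, re.ASCII)

print(sorted(ascii_matches) == sorted(ASCII_WS))
print(unicode_hit is not None, ascii_hit is None)
```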


>
> Could you give me an example to use the equivalent pattern?
>
> Thanks
>


-- 
Ned Batchelder, http://nedbatchelder.com


#74304

From	MRAB <python@mrabarnett.plus.com>
Date	2014-07-10 15:08 +0100
Message-ID	<mailman.11726.1405001309.18130.python-list@python.org>
In reply to	#74300

On 2014-07-10 14:32, fl wrote:
> On Thursday, July 10, 2014 7:18:01 AM UTC-4, MRAB wrote:
>> It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
>> either the tutorial is wrong, or you didn't look closely enough. :-)
>>
>> The string starts with ' ', not '\t'.
>>
>> The string starts with ' ', which isn't in the character set.
>>
> The '\s' description is on link:
>
> http://www.tutorialspoint.com/python/python_reg_expressions.htm
>
I can see that the space is missing. It should say:

     \s    Matches whitespace. Equivalent to [ \t\n\r\f].

> Could you give me an example to use the equivalent pattern?
>
(I'm using Python 3.4, which is why the match object looks different.)

>>> import re
>>> re.match(r'\s*\d\d*$', '   111')
<_sre.SRE_Match object; span=(0, 6), match='   111'>
>>> re.match(r'[ \t\n\r\f]*\d\d*$', '   111')
<_sre.SRE_Match object; span=(0, 6), match='   111'>
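
As a further sketch along the same lines (assuming Python 3), the shorthand and the explicit class can be compared on a handful of inputs. The sample strings and the names SHORTHAND and EXPLICIT are made up for illustration. The class here includes \v, as in the official docs, and \d+ is used as a shorter spelling of \d\d*.

```python
import re

SHORTHAND = re.compile(r'\s*\d+$')
EXPLICIT = re.compile(r'[ \t\n\r\f\v]*\d+$')

# ASCII-only samples: some should match, some should not.
samples = ['   111', '\t42', '\n\r7', '111', 'x111', '   ']

results = []
for s in samples:
    a = SHORTHAND.match(s) is not None
    b = EXPLICIT.match(s) is not None
    results.append((s, a, b))
    print(repr(s), a, b)

# For these ASCII inputs the two patterns agree on every sample.
print(all(a == b for _, a, b in results))
```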

