Groups > comp.lang.python > #74296 > unrolled thread

Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]?

Started by: rxjwg98@gmail.com
First post: 2014-07-10 03:05 -0700
Last post: 2014-07-10 15:08 +0100
Articles: 5 (4 participants)

Contents

  Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? rxjwg98@gmail.com - 2014-07-10 03:05 -0700
    Re: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? MRAB <python@mrabarnett.plus.com> - 2014-07-10 12:18 +0100
      Re: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? fl <rxjwg98@gmail.com> - 2014-07-10 06:32 -0700
        Re: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? Ned Batchelder <ned@nedbatchelder.com> - 2014-07-10 10:04 -0400
        Re: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]? MRAB <python@mrabarnett.plus.com> - 2014-07-10 15:08 +0100

#74296 — Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]?

From: rxjwg98@gmail.com
Date: 2014-07-10 03:05 -0700
Subject: Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]?
Message-ID: <1e8dbd65-bd19-4b9d-a7ec-961e8304ace0@googlegroups.com>
Hi,

A tutorial says that '\s': Matches whitespace. Equivalent to [\t\n\r\f].

I tested it with:

>>> re.match(r'\s*\d\d*$', '   111')
<_sre.SRE_Match object at 0x03642BB8>
>>> re.match(r'\t\n\r\f*\d\d*$', '   111')    # fails
>>> re.match(r'[\t\n\r\f]*\d\d*$', '   111') # fails
>>> re.match(r'[\t\n\r\f]\d\d*$', '   111') # fails
>>> re.match(r'[\t\n\r\f]*$', '   111') # fails

What is wrong with the script above? Thanks

#74297

From: MRAB <python@mrabarnett.plus.com>
Date: 2014-07-10 12:18 +0100
Message-ID: <mailman.11722.1404991086.18130.python-list@python.org>
In reply to: #74296
On 2014-07-10 11:05, rxjwg98@gmail.com wrote:
> Hi,
>
> On a tutorial it says that '\s': Matches whitespace. Equivalent to [\t\n\r\f].
>
It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
either the tutorial is wrong, or you didn't look closely enough. :-)

> I test it with:
>
> >>> re.match(r'\s*\d\d*$', '   111')
> <_sre.SRE_Match object at 0x03642BB8>
> >>> re.match(r'\t\n\r\f*\d\d*$', '   111')    # fails

The string starts with ' ', not '\t'.

> >>> re.match(r'[\t\n\r\f]*\d\d*$', '   111') # fails
> >>> re.match(r'[\t\n\r\f]\d\d*$', '   111') # fails
> >>> re.match(r'[\t\n\r\f]*$', '   111') # fails

The string starts with ' ', which isn't in the character set.

>
> What is wrong in above script? Thanks
>
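The two explanations above can be checked directly. What follows is a minimal sketch, not part of the original reply; the names `target`, `patterns`, `results` and `fixed` are illustrative only. It runs the four failing patterns from the question, then the same character class with a space added.

```python
import re

target = '   111'  # three spaces followed by digits, as in the question

# Patterns from the question; none of them can consume a leading space.
patterns = [
    r'\t\n\r\f*\d\d*$',    # literal tab, newline, CR, then zero or more form feeds
    r'[\t\n\r\f]*\d\d*$',  # class matches zero characters, then \d fails on ' '
    r'[\t\n\r\f]\d\d*$',   # requires exactly one character from the class first
    r'[\t\n\r\f]*$',       # class matches nothing, then $ fails because text remains
]

results = [re.match(p, target) for p in patterns]
print(results)  # every entry is None

# Adding a space to the class lets the leading spaces be consumed.
fixed = re.match(r'[ \t\n\r\f]*\d\d*$', target)
print(fixed.group(0) == target)
```

Note that the first pattern is not a character class at all: without the square brackets it asks for a tab, a newline and a carriage return in that exact order.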

#74300

From: fl <rxjwg98@gmail.com>
Date: 2014-07-10 06:32 -0700
Message-ID: <7593d956-f202-4d1c-9e35-1269ab3dda57@googlegroups.com>
In reply to: #74297
On Thursday, July 10, 2014 7:18:01 AM UTC-4, MRAB wrote:
> On 2014-07-10 11:05, rx@gmail.com wrote:
>
> It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
> either the tutorial is wrong, or you didn't look closely enough. :-)
>
> The string starts with ' ', not '\t'.
>
> The string starts with ' ', which isn't in the character set.
>
The '\s' description is at this link:

http://www.tutorialspoint.com/python/python_reg_expressions.htm


Could you give me an example that uses the equivalent pattern?

Thanks

#74303

From: Ned Batchelder <ned@nedbatchelder.com>
Date: 2014-07-10 10:04 -0400
Message-ID: <mailman.11725.1405001097.18130.python-list@python.org>
In reply to: #74300
On 7/10/14 9:32 AM, fl wrote:
> On Thursday, July 10, 2014 7:18:01 AM UTC-4, MRAB wrote:
>> On 2014-07-10 11:05, rx@gmail.com wrote:
>>
>> It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
>> either the tutorial is wrong, or you didn't look closely enough. :-)
>>
>> The string starts with ' ', not '\t'.
>>
>> The string starts with ' ', which isn't in the character set.
>>
> The '\s' description is on link:
>
> http://www.tutorialspoint.com/python/python_reg_expressions.htm
>

For some reason, that page shows much of its information twice.  The 
first occurrence of \s there is:

     \s    Matches whitespace. Equivalent to [\t\n\r\f].

The second is:

     \s    Match a whitespace character: [ \t\r\n\f]

The second one is correct.  The first is wrong.  You might want to send 
the author a bug report.

Actually, neither is strictly correct, since as the official docs 
(https://docs.python.org/2/library/re.html) say,

     \s    When the UNICODE flag is not specified, it matches any
     whitespace character, this is equivalent to the set [ \t\n\r\f\v].
     The LOCALE flag has no extra effect on matching of the space. If
     UNICODE is set, this will match the characters [ \t\n\r\f\v] plus
     whatever is classified as space in the Unicode character properties
     database.
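
A short sketch of what that documentation passage implies. It was not in the original message, and it assumes Python 3, where `str` patterns use Unicode matching by default and the `re.ASCII` flag restricts `\s` to `[ \t\n\r\f\v]`; under Python 2 the `re.UNICODE` flag has to be set explicitly, as quoted above. The variable names are illustrative only.

```python
import re

# Vertical tab is whitespace for \s but is missing from the tutorial's class.
vt = '\v'
print(re.match(r'\s', vt) is not None)           # \s accepts \v
print(re.match(r'[ \t\n\r\f]', vt) is not None)  # the five-character class does not

# No-break space (U+00A0) is classified as whitespace by Unicode.
nbsp = '\u00a0'
unicode_hit = re.match(r'\s', nbsp)            # Unicode matching (Python 3 default for str)
ascii_hit = re.match(r'\s', nbsp, re.ASCII)    # restricted to [ \t\n\r\f\v]
print(unicode_hit is not None, ascii_hit is not None)
```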


>
> Could you give me an example to use the equivalent pattern?
>
> Thanks
>


-- 
Ned Batchelder, http://nedbatchelder.com

#74304

From: MRAB <python@mrabarnett.plus.com>
Date: 2014-07-10 15:08 +0100
Message-ID: <mailman.11726.1405001309.18130.python-list@python.org>
In reply to: #74300
On 2014-07-10 14:32, fl wrote:
> On Thursday, July 10, 2014 7:18:01 AM UTC-4, MRAB wrote:
>> It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
>> either the tutorial is wrong, or you didn't look closely enough. :-)
>>
>> The string starts with ' ', not '\t'.
>>
>> The string starts with ' ', which isn't in the character set.
>>
> The '\s' description is on link:
>
> http://www.tutorialspoint.com/python/python_reg_expressions.htm
>
I can see that the space is missing. It should say:

     \s    Matches whitespace. Equivalent to [ \t\n\r\f].

> Could you give me an example to use the equivalent pattern?
>
(I'm using Python 3.4, which is why the match object looks different.)

 >>> import re
 >>> re.match(r'\s*\d\d*$', '   111')
<_sre.SRE_Match object; span=(0, 6), match='   111'>
 >>> re.match(r'[ \t\n\r\f]*\d\d*$', '   111')
<_sre.SRE_Match object; span=(0, 6), match='   111'>
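
As a further check, not from the original reply, the sketch below compares `\s` with the explicit class over several inputs. The helper name `same_result` and the sample strings are hypothetical. The two patterns agree on these samples, which contain only whitespace characters listed in the class.

```python
import re

WITH_ESCAPE = re.compile(r'\s*\d\d*$')
WITH_CLASS = re.compile(r'[ \t\n\r\f]*\d\d*$')

def same_result(text):
    """Return True if both patterns match the same span of text (or both fail)."""
    a = WITH_ESCAPE.match(text)
    b = WITH_CLASS.match(text)
    return (a.span() if a else None) == (b.span() if b else None)

samples = ['   111', '\t42', '\n\r\f7', '111', 'abc', ' 1 2']
print(all(same_result(s) for s in samples))
```

The agreement breaks for a string such as `'\v1'`, because `\s` also accepts a vertical tab, which this class omits.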
