Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed4a.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
To: python-list@python.org
From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Subject: Re: re Questions
Date: Sun, 26 Jan 2014 17:30:47 +0000
References: <3f568767-e13a-4c7d-a4fb-85caca2adf6e@googlegroups.com> <mailman.5996.1390756093.18130.python-list@python.org> <c5c75189-1280-40fb-8bd2-de00eba97257@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
In-Reply-To: <c5c75189-1280-40fb-8bd2-de00eba97257@googlegroups.com>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.5998.1390757464.18130.python-list@python.org>
Lines: 75
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:64785

On 26/01/2014 17:15, Blake Adams wrote:
> On Sunday, January 26, 2014 12:08:01 PM UTC-5, Chris Angelico wrote:
>> On Mon, Jan 27, 2014 at 3:59 AM, Blake Adams <blakesadams@gmail.com> wrote:
>>
>>> If I want to set up a match replicating the '\w' pattern I would assume that would be done with '[A-z0-9_]'.  However, when I run the following:
>>
>>>
>>
>>> re.findall('[A-z0-9_]','^;z %C\@0~_') it matches ['^', 'z', 'C', '\\', '0', '_'].  I would expect the match to be ['z', 'C', '0', '_'].
>>
>>>
>>
>>> Why does this happen?
>>
>>
>>
>> Because \w is not the same as [A-z0-9_]. Quoting from the docs:
>>
>>
>>
>> """
>>
>> \w For Unicode (str) patterns: Matches Unicode word characters; this
>>
>> includes most characters that can be part of a word in any language,
>>
>> as well as numbers and the underscore. If the ASCII flag is used, only
>>
>> [a-zA-Z0-9_] is matched (but the flag affects the entire regular
>>
>> expression, so in such cases using an explicit [a-zA-Z0-9_] may be a
>>
>> better choice). For 8-bit (bytes) patterns: Matches characters
>>
>> considered alphanumeric in the ASCII character set; this is equivalent
>>
>> to [a-zA-Z0-9_].
>>
>> """
>>
>>
>>
>> If you're working with a byte string, then you're close, but A-z is
>>
>> quite different from A-Za-z. The set [A-z] is equivalent to
>>
>> [ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz] (that's
>>
>> a literal backslash in there, btw), so it'll also catch several
>>
>> non-alphabetic characters. With a Unicode string, it's quite
>>
>> distinctly different. Either way, \w means "word characters", though,
>>
>> so just go ahead and use it whenever you want word characters :)
>>
>>
>>
>> ChrisA
>
> Thanks Chris
>

I'm pleased to see that your question has been answered.
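
For anyone finding this in the archives, here is a minimal sketch of the 
difference Chris describes. It assumes Python 3 and str patterns; the 
variable names are mine and purely illustrative.

```python
import re

# Same characters as the string in the original question
# (the backslash is escaped here so the literal is unambiguous).
text = '^;z %C\\@0~_'

# [A-z] covers every code point from 'A' (65) to 'z' (122), so it also
# takes in the six characters that sit between 'Z' (90) and 'a' (97).
too_wide = re.findall(r'[A-z0-9_]', text)

# Spelling out both letter ranges leaves those characters out.
ascii_word = re.findall(r'[A-Za-z0-9_]', text)

# The characters that [A-z] picks up beyond the letters.
extras = [chr(c) for c in range(ord('Z') + 1, ord('a'))]

# \w on a str pattern matches Unicode word characters unless
# the re.ASCII flag is given.
unicode_word = re.findall(r'\w', 'na\u00efve')
ascii_only = re.findall(r'\w', 'na\u00efve', flags=re.ASCII)

print(too_wide)      # includes '^' and the backslash
print(ascii_word)    # letters, digits and underscore only
print(extras)
print(unicode_word, ascii_only)
```

Unless you specifically need ASCII-only matching, \w is the simpler choice, 
as Chris says.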

Now would you please read and action this 
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the 
double line spacing above, thanks.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence