| From | Christopher Arndt <chris@chrisarndt.de> |
|---|---|
| Newsgroups | de.comp.lang.python |
| Subject | [Python-de] re.split und Unicode in Python 3 |
| Date | 2016-07-29 16:45 +0200 |
| Message-ID | <mailman.27.1469803528.6033.python-de@python.org> |
| References | <7ae0837f-8596-a55b-7195-e6d85492dd51@chrisarndt.de> |
I just noticed this odd behavior in Python 3.5:
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = 'One\u2003Two'
>>> re.search('\s+', s)
<_sre.SRE_Match object; span=(3, 4), match='\u2003'>
>>> re.search('\s+', s, re.ASCII)
>>>
^^^ # --> No match
>>> re.split('\s+', s)
['One', 'Two']
>>> re.split('\s+', s, re.ASCII)
['One', 'Two']
Bug?
For context: '\u2003' == em space, i.e. a whitespace character in Unicode.
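
A minimal sketch, not part of the original session, that checks the claim about U+2003 and compares the ways re.ASCII can reach re.split. It assumes the documented signature re.split(pattern, string, maxsplit=0, flags=0), under which a third positional argument would be read as maxsplit rather than flags. The variable names are illustrative only.

```python
import re
import unicodedata

s = 'One\u2003Two'

# U+2003 is EM SPACE, general category Zs (space separator).
em_space_name = unicodedata.name('\u2003')
em_space_category = unicodedata.category('\u2003')
em_space_is_space = '\u2003'.isspace()

# Default (Unicode) matching: \s matches the em space, so the string splits.
split_unicode = re.split(r'\s+', s)

# Assumption being illustrated: a third positional argument lands in
# maxsplit. Passing the flag's integer value there by keyword only caps
# the number of splits; matching stays in Unicode mode.
split_flag_as_maxsplit = re.split(r'\s+', s, maxsplit=int(re.ASCII))

# Passing the flag by keyword restricts \s to ASCII whitespace,
# so the em space no longer separates the two words.
split_ascii = re.split(r'\s+', s, flags=re.ASCII)

print(em_space_name, em_space_category, em_space_is_space)
print(split_unicode)
print(split_flag_as_maxsplit)
print(split_ascii)
```

If that assumption holds, the positional call in the session above never applied re.ASCII at all, which would explain the identical results without a bug in re.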
Chris