[Python-de] re.split and Unicode in Python 3

Path	csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From	Christopher Arndt <chris@chrisarndt.de>
Newsgroups	de.comp.lang.python
Subject	[Python-de] re.split and Unicode in Python 3
Date	Fri, 29 Jul 2016 16:45:16 +0200
Lines	28
Message-ID	<mailman.27.1469803528.6033.python-de@python.org>
References	<7ae0837f-8596-a55b-7195-e6d85492dd51@chrisarndt.de>
Mime-Version	1.0
Content-Type	text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding	8bit
Xref	csiph.com de.comp.lang.python:4498


I just noticed this odd behavior in Python 3.5:


Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = 'One\u2003Two'

>>> re.search('\s+', s)
<_sre.SRE_Match object; span=(3, 4), match='\u2003'>
>>> re.search('\s+', s, re.ASCII)
>>>
    ^^^ # --> No match

>>> re.split('\s+', s)
['One', 'Two']
>>> re.split('\s+', s, re.ASCII)
['One', 'Two']

Bug?
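
One thing worth checking before calling it a bug: the documented signatures are re.search(pattern, string, flags=0) but re.split(pattern, string, maxsplit=0, flags=0). If that is what is happening here, the third positional argument to re.split is read as maxsplit, and re.ASCII is simply taken as the number 256. The sketch below is only an illustration of that reading. The keyword arguments, the raw-string patterns, and the variable names as_maxsplit and as_flags are my additions, not part of the session above.

```python
import re

s = 'One\u2003Two'  # U+2003 EM SPACE between the two words

# Assumption being illustrated: a third positional argument to re.split
# lands in maxsplit. Passing it by keyword makes that explicit.
as_maxsplit = re.split(r'\s+', s, maxsplit=int(re.ASCII))

# Passing the flag by keyword restricts \s to ASCII whitespace,
# so the em space no longer counts as a separator.
as_flags = re.split(r'\s+', s, flags=re.ASCII)

print(int(re.ASCII))  # numeric value of the flag
print(as_maxsplit)    # split still happens on the em space
print(as_flags)       # string comes back unsplit, as a one-element list
```

With flags=re.ASCII given by keyword, re.split agrees with the re.search result: the em space is not treated as whitespace and the string is returned in one piece.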


For context: '\u2003' is an em space, i.e. a whitespace character in Unicode.
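
The claim about the em space can be checked with the standard library. This is a small sketch using unicodedata and re.fullmatch. The variable names ch, unicode_match and ascii_match are only illustrative and do not appear in the session above.

```python
import re
import unicodedata

ch = '\u2003'

print(unicodedata.name(ch))      # official Unicode character name
print(unicodedata.category(ch))  # general category ('Zs' = space separator)
print(ch.isspace())              # str-level whitespace test

# For str patterns, \s follows Unicode whitespace by default;
# with re.ASCII it only matches [ \t\n\r\f\v].
unicode_match = re.fullmatch(r'\s', ch)
ascii_match = re.fullmatch(r'\s', ch, flags=re.ASCII)

print(unicode_match is not None)  # matched in Unicode mode?
print(ascii_match is not None)    # matched in ASCII mode?
```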


Chris
