From: Stefan Behnel
Newsgroups: de.comp.lang.python
Subject: Re: [Python-de] re.split und Unicode in Python 3
Date: Fri, 29 Jul 2016 16:57:18 +0200

Christopher Arndt wrote on 29.07.2016 at 16:45:
> I just noticed this strange behavior in Python 3.5:
>
> Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
> [GCC 5.3.1 20160330] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> s = 'One\u2003Two'
>
> >>> re.search('\s+', s)
> <_sre.SRE_Match object; span=(3, 4), match='\u2003'>
> >>> re.search('\s+', s, re.ASCII)
> >>>
> ^^^ # --> No match
>
> >>> re.split('\s+', s)
> ['One', 'Two']
> >>> re.split('\s+', s, re.ASCII)
> ['One', 'Two']
>
> Bug?

No.

    >>> re.split('\s+', s, flags=re.ASCII)
    ['One\u2003Two']

The signature of re.split() is

    re.split(pattern, string, maxsplit=0, flags=0)

https://docs.python.org/3/library/re.html#re.split

Stefan
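
For anyone who wants to reproduce the difference outside the interactive prompt, here is a minimal sketch. It is not part of the original exchange: the names `positional`, `by_keyword` and `compiled` are made up for illustration, and the `warnings` guard is only there on the assumption that newer Python versions (3.13 and later, as far as I know) emit a DeprecationWarning when `maxsplit` is passed positionally. The point is that the third positional argument of re.split() is `maxsplit`, so `re.ASCII` (integer value 256) merely caps the number of splits and never reaches `flags`.

```python
import re
import warnings

s = 'One\u2003Two'  # U+2003 EM SPACE: whitespace in Unicode, not in ASCII

# The third positional argument is maxsplit, so re.ASCII (int value 256)
# only limits the number of splits; \s still matches Unicode whitespace.
with warnings.catch_warnings():
    # Assumption: newer Pythons may warn about positional maxsplit.
    warnings.simplefilter("ignore", DeprecationWarning)
    positional = re.split(r'\s+', s, re.ASCII)

# Passed by keyword, the flag restricts \s to ASCII whitespace: no split.
by_keyword = re.split(r'\s+', s, flags=re.ASCII)

# Compiling the pattern with the flag sidesteps the argument order entirely.
compiled = re.compile(r'\s+', re.ASCII).split(s)

print(int(re.ASCII))  # 256
print(positional)     # ['One', 'Two']
print(by_keyword)     # ['One\u2003Two']
print(compiled)       # ['One\u2003Two']
```

Passing `flags` by keyword, or compiling the pattern first, avoids the mix-up. The same caution applies to re.sub(), whose fourth positional argument is `count`, not `flags`.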