| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Christopher Arndt <chris@chrisarndt.de> |
| Newsgroups | de.comp.lang.python |
| Subject | [Python-de] re.split and Unicode in Python 3 |
| Date | Fri, 29 Jul 2016 16:45:16 +0200 |
| Lines | 28 |
| Message-ID | <mailman.27.1469803528.6033.python-de@python.org> (permalink) |
| References | <7ae0837f-8596-a55b-7195-e6d85492dd51@chrisarndt.de> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=utf-8; format=flowed |
| Content-Transfer-Encoding | 8bit |
I just noticed this strange behavior in Python 3.5:
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = 'One\u2003Two'
>>> re.search('\s+', s)
<_sre.SRE_Match object; span=(3, 4), match='\u2003'>
>>> re.search('\s+', s, re.ASCII)
>>>
^^^ # --> No match
>>> re.split('\s+', s)
['One', 'Two']
>>> re.split('\s+', s, re.ASCII)
['One', 'Two']
Bug?
For context: '\u2003' is an em space, i.e. a whitespace character in Unicode.
Chris
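
A likely explanation, sketched here rather than taken from the original post: the documented signature is re.split(pattern, string, maxsplit=0, flags=0), so the third positional argument in the session above would be read as maxsplit, not as flags. Since re.ASCII has the integer value 256, that call would split at most 256 times with default (Unicode) matching. The variable names positional and keyword are only illustrative.

```python
import re
import warnings

s = 'One\u2003Two'  # U+2003 EM SPACE between the two words

# re.ASCII is an integer flag; its numeric value is 256.
assert int(re.ASCII) == 256

# Third positional argument is maxsplit, so this means maxsplit=256, flags=0.
# Newer Python versions may warn about passing maxsplit positionally,
# so warnings are silenced for this one call.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    positional = re.split(r'\s+', s, re.ASCII)

# Passing the flag by keyword applies ASCII-only matching for \s.
keyword = re.split(r'\s+', s, flags=re.ASCII)

print(positional)  # split happened: Unicode whitespace still matched
print(keyword)     # no split: the em space is not ASCII whitespace
```

With the keyword form, re.split behaves consistently with re.search, whose third positional parameter is flags.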