Re: Correct handling of case in unicode and regexps

From Devin Jeanpierre <jeanpierreda@gmail.com>
Date Sat, 23 Feb 2013 12:51:46 -0500
Subject Re: Correct handling of case in unicode and regexps
To python-list@python.org
Newsgroups comp.lang.python

On Sat, Feb 23, 2013 at 12:41 PM, MRAB <python@mrabarnett.plus.com> wrote:
> Getting full case folding to work can be tricky. There's always going to
> be a limit to what's worth doing.
>
> There are also areas where it's not clear what the result should be.
> You've already mentioned matching 's' against 'ß' (fails) and matching
> 'ss' against 'ß' (succeeds), but how about matching '(s)(s)' against 'ß'
> (fails)?
>
> For the record, Perl also says that 'ss' matches 'ß', but 's+' does not.

I would find it helpful to know the exact rules. The regex module docs
say that it works, but don't say what it means to "work".
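
To make the question concrete, here is a rough sketch using only the standard library (not the third-party regex module under discussion). It assumes Python 3.3 or later for str.casefold(); the helper name casefold_equal is made up for illustration. It shows one possible reading of "works", folding both sides before comparing, and why that reading does not settle the cases quoted above.

```python
import re

sharp_s = "\u00df"  # 'ß'


def casefold_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after full case folding."""
    return a.casefold() == b.casefold()


# Full case folding maps 'ß' to 'ss', so whole-string comparison succeeds
# for 'ss' and fails for 's'.
ss_equal = casefold_equal("ss", sharp_s)
s_equal = casefold_equal("s", sharp_s)

# The stdlib re module does not apply full case folding: a two-character
# pattern cannot match the single character 'ß'.
stdlib_match = re.fullmatch("ss", sharp_s, re.IGNORECASE)

# If the subject is folded first, 's+' and '(s)(s)' both match, which is
# a different answer from the one quoted above for Perl and regex.
prefolded_plus = re.fullmatch("s+", sharp_s.casefold())
prefolded_groups = re.fullmatch("(s)(s)", sharp_s.casefold())

print(ss_equal, s_equal, stdlib_match, bool(prefolded_plus), bool(prefolded_groups))
```

Folding the subject up front is simple, but it changes match offsets and lets each group capture half of a character that was a single 'ß' in the input. That is presumably part of why the rules need spelling out.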

-- Devin
