From: Devin Jeanpierre
Date: Sat, 23 Feb 2013 12:51:46 -0500
Subject: Re: Correct handling of case in unicode and regexps
To: python-list@python.org
Newsgroups: comp.lang.python
In-Reply-To: <5128FF37.7060500@mrabarnett.plus.com>

On Sat, Feb 23, 2013 at 12:41 PM, MRAB wrote:
> Getting full case folding to work can be tricky. There's always going to
> be a limit to what's worth doing.
>
> There are also areas where it's not clear what the result should be.
> You've already mentioned matching 's' against 'ß' (fails) and matching
> 'ss' against 'ß' (succeeds), but how about matching '(s)(s)' against 'ß'
> (fails)?
>
> For the record, Perl also says that 'ss' matches 'ß', but 's+' does not.

I would find it helpful to know the exact rules. The regex module docs
say that it works, but don't say what it means to "work".

-- Devin
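
For anyone who wants to poke at the cases above, here is a minimal sketch using only the Python standard library, not the regex module. The helper name caseless_equal is made up for illustration, and folding the subject before matching is only one naive approach; it is not a claim about how regex or Perl implement this.

```python
import re

SHARP_S = "\u00df"  # 'ß'


def caseless_equal(a: str, b: str) -> bool:
    """Whole-string caseless comparison via full case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


# Full case folding expands 'ß' to the two characters 'ss'.
folded = SHARP_S.casefold()

# The stdlib re module with IGNORECASE does not apply full case folding,
# so neither pattern matches the single character 'ß'.
stdlib_s = re.fullmatch("s", SHARP_S, re.IGNORECASE)
stdlib_ss = re.fullmatch("ss", SHARP_S, re.IGNORECASE)

# Naive alternative (an assumption for illustration): fold the subject
# first, then match against the folded text.
naive_groups = re.fullmatch("(s)(s)", folded)
naive_plus = re.fullmatch("s+", folded)

print(repr(folded), stdlib_s, stdlib_ss)
print(caseless_equal("ss", SHARP_S), caseless_equal("s", SHARP_S))
print(naive_groups.span(1), naive_groups.span(2), naive_plus.span())
```

Folding first makes both '(s)(s)' and 's+' succeed, but the group spans then index into the folded text, so neither group corresponds to a real span of the original 'ß'. That is one way to see why the exact rules need spelling out.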