| Started by | MRAB <python@mrabarnett.plus.com> |
|---|---|
| First post | 2013-02-23 18:12 +0000 |
| Last post | 2013-02-23 18:12 +0000 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Correct handling of case in unicode and regexps MRAB <python@mrabarnett.plus.com> - 2013-02-23 18:12 +0000
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-02-23 18:12 +0000 |
| Subject | Re: Correct handling of case in unicode and regexps |
| Message-ID | <mailman.2362.1361643158.2939.python-list@python.org> |
On 2013-02-23 17:51, Devin Jeanpierre wrote:
> On Sat, Feb 23, 2013 at 12:41 PM, MRAB <python@mrabarnett.plus.com>
> wrote:
>> Getting full case folding to work can be tricky. There's always
>> going to be a limit to what's worth doing.
>>
>> There are also areas where it's not clear what the result should
>> be. You've already mentioned matching 's' against 'ß' (fails) and
>> matching 'ss' against 'ß' (succeeds), but how about matching
>> '(s)(s)' against 'ß' (fails)?
>>
>> For the record, Perl also says that 'ss' matches 'ß', but 's+' does
>> not.
>
> I would find it helpful to know the exact rules. The regex module
> docs say that it works, but don't say what it means to "work".
>
The basic rule is that a series of characters in the regex must match a
series of characters in the text, with no partial matches in either.

For example, 'ss' can match 'ß', but 's' can't match 'ß' because that
would be matching part of 'ß'.

In a regex like 's+', you're asking it to match one or more repetitions
of 's', but that would mean that 's' would have to match part of 'ß' in
the first iteration and the remainder of 'ß' in the second iteration.

Although it's theoretically possible to do that, the code is already
difficult enough. The cost outweighs the potential benefit.

If you'd like to have a go at implementing it, the code _is_ open
source. :-)
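The "no partial matches in either" rule can be modelled in a few lines of plain Python. This is only a toy sketch, not how the regex module is implemented: it assumes Python 3.3+ for `str.casefold()`, and the names `match_atoms` and `match_one_or_more` are hypothetical helpers invented for illustration. Each "atom" stands for one indivisible piece of the pattern (a literal run such as 'ss', or a single group such as '(s)'), and it must line up with a whole number of text characters.

```python
def match_atoms(atoms, text):
    """Return True if text splits into consecutive chunks, one per atom,
    such that each chunk case-folds to the same string as its atom.

    Chunks are made of whole text characters, so an atom can never
    consume just part of the folding of a character like 'ß'.
    """
    if not atoms:
        return text == ""
    target = atoms[0].casefold()
    for end in range(len(text) + 1):
        # Try every prefix of whole characters as the chunk for this atom.
        if text[:end].casefold() == target and match_atoms(atoms[1:], text[end:]):
            return True
    return False


def match_one_or_more(atom, text):
    """Toy model of 'atom+': one or more repetitions, each a separate atom."""
    return any(match_atoms([atom] * n, text) for n in range(1, len(text) + 1))


print(match_atoms(["ss"], "ß"))       # literal 'ss' against 'ß'
print(match_atoms(["s"], "ß"))        # 's' against 'ß'
print(match_atoms(["s", "s"], "ß"))   # '(s)(s)' against 'ß'
print(match_one_or_more("s", "ß"))    # 's+' against 'ß'
```

Under this model the four calls give True, False, False, False, which agrees with the results described in the thread: 'ss' covers the whole of 'ß', while 's', '(s)(s)' and 's+' would each need one piece of the pattern to match only half of it.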