| References | <slrnmkccs4.apd.jon+usenet@frosty.unequivocal.co.uk> <mailman.67.1430665534.12865.python-list@python.org> <slrnmkcftt.230.jon+usenet@frosty.unequivocal.co.uk> |
|---|---|
| Date | 2015-05-04 01:48 +1000 |
| Subject | Re: Unicode surrogate pairs (Python 3.4) |
| From | Chris Angelico <rosuav@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.68.1430668130.12865.python-list@python.org> |
On Mon, May 4, 2015 at 1:32 AM, Jon Ribbens <jon+usenet@unequivocal.co.uk> wrote:
>> You shouldn't even actually _have_ those in your string in the first
>> place. How did you construct/receive that data? Ideally, catch it at
>> that point, and deal with it there.
>
> That would, unfortunately, be "tell the Unicode Consortium to format
> their documents differently", which seems unlikely to happen. I'm
> trying to read in: http://www.unicode.org/Public/idna/6.3.0/IdnaTest.txt

Ah, so what you _actually_ have is "\\udb40\\udd9d" - the backslashes
are in your input. I'm not sure what the best way to deal with that
is... it's a bit of a mess. You may find yourself needing to do
something manually, unless there's a way to ask Python to encode to
pseudo-UCS-2 that allows surrogates. Some languages may have sloppy
conversions available, but Python's seems to be quite strict (which is
correct). Is there an errors handler that can do this?

ChrisA
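
One possible answer to the closing question, offered as a sketch rather than as what anyone in the thread settled on: Python 3.4 accepts the "surrogatepass" error handler on the UTF-16 codecs, so a string holding surrogate code points can be encoded to UTF-16 and decoded again, which joins each high/low pair into the single code point it stands for. The helper name unescape_and_join_surrogates and the regular expression are made up for illustration, and the sketch assumes the input only uses four-hex-digit \uXXXX escapes.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits,
# i.e. the escape text as it appears in the input file (assumed format).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_and_join_surrogates(text):
    """Hypothetical helper: expand \\uXXXX escapes, then join surrogate pairs."""
    # Step 1: replace each escape with the code point it names.
    # This can leave surrogate code points in the str.
    with_surrogates = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: "surrogatepass" lets the encoder write surrogates as plain
    # UTF-16 code units; the strict decoder then reads each valid pair back
    # as one code point. An unpaired surrogate raises UnicodeDecodeError here.
    return with_surrogates.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


raw = "\\udb40\\udd9d"  # twelve characters, backslashes included
result = unescape_and_join_surrogates(raw)
print(len(result), hex(ord(result)))  # 1 0xe019d
```

The decode step is deliberately left strict, so a stray unpaired surrogate in the input fails loudly instead of slipping through.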
Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 14:40 +0000
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 01:05 +1000
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 15:32 +0000
Re: Unicode surrogate pairs (Python 3.4) Marko Rauhamaa <marko@pacujo.net> - 2015-05-03 18:35 +0300
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 01:48 +1000
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 16:30 +0000
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 02:47 +1000
Re: Unicode surrogate pairs (Python 3.4) MRAB <python@mrabarnett.plus.com> - 2015-05-03 16:53 +0100
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 16:26 +0000
Re: Unicode surrogate pairs (Python 3.4) MRAB <python@mrabarnett.plus.com> - 2015-05-03 18:09 +0100
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 19:20 +0000