| References | <slrnmkccs4.apd.jon+usenet@frosty.unequivocal.co.uk> |
|---|---|
| Date | 2015-05-04 01:05 +1000 |
| Subject | Re: Unicode surrogate pairs (Python 3.4) |
| From | Chris Angelico <rosuav@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.67.1430665534.12865.python-list@python.org> |
On Mon, May 4, 2015 at 12:40 AM, Jon Ribbens <jon+usenet@unequivocal.co.uk> wrote:
> If I have a string containing surrogate pairs like this in Python 3.4:
>
>   "\udb40\udd9d"
>
> How do I convert it into the proper form:
>
>   "\U000E019D"
>
> ? The answer appears not to be "unicodedata.normalize".

No, it's not, because Unicode normalization is a very specific thing. You're looking for a fix for some kind of encoding issue; Unicode normalization translates between combining characters and combined characters.

You shouldn't even actually _have_ those in your string in the first place. How did you construct/receive that data? Ideally, catch it at that point, and deal with it there. But if you absolutely have to convert the surrogates, it ought to be possible to do a sloppy UCS-2 conversion to bytes, then a proper UTF-16 decode on the result.

ChrisA
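One way the "sloppy conversion to bytes, then a proper UTF-16 decode" idea might look in Python 3.4 or later is sketched below. This is an illustration, not code from the post: the helper name `merge_surrogates` is hypothetical, and using the `surrogatepass` error handler for the sloppy step is an assumption about how to carry it out. The last line shows, for contrast, the kind of job `unicodedata.normalize` is actually for.

```python
import unicodedata


def merge_surrogates(text):
    """Combine UTF-16 surrogate pairs that are stored as separate code points.

    Hypothetical helper for illustration; the name is not from the thread.
    """
    # "Sloppy" step: surrogatepass lets the encoder write each surrogate
    # code point as its raw 16-bit code unit instead of raising
    # UnicodeEncodeError.
    raw = text.encode("utf-16", "surrogatepass")
    # "Proper" step: a strict UTF-16 decode reads each high/low pair
    # as a single code point.
    return raw.decode("utf-16")


broken = "\udb40\udd9d"
fixed = merge_surrogates(broken)
print(len(broken), len(fixed))      # 2 1
print(fixed == "\U000E019D")        # True

# Normalization solves a different problem: composing or decomposing
# combining sequences, e.g. "e" + COMBINING ACUTE ACCENT into U+00E9.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")  # True
```

Because the decode step is strict, a string containing a lone, unpaired surrogate raises `UnicodeDecodeError` here rather than passing through silently, which fits the advice to find out where the bad data came from.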
Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 14:40 +0000
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 01:05 +1000
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 15:32 +0000
Re: Unicode surrogate pairs (Python 3.4) Marko Rauhamaa <marko@pacujo.net> - 2015-05-03 18:35 +0300
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 01:48 +1000
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 16:30 +0000
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 02:47 +1000
Re: Unicode surrogate pairs (Python 3.4) MRAB <python@mrabarnett.plus.com> - 2015-05-03 16:53 +0100
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 16:26 +0000
Re: Unicode surrogate pairs (Python 3.4) MRAB <python@mrabarnett.plus.com> - 2015-05-03 18:09 +0100
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 19:20 +0000