| Header | Value |
|---|---|
| From | Chris Angelico <rosuav@gmail.com> |
| Cc | "python-list@python.org" <python-list@python.org> |
| Date | Mon, 4 May 2015 01:48:47 +1000 |
| Subject | Re: Unicode surrogate pairs (Python 3.4) |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.68.1430668130.12865.python-list@python.org> |
| In-Reply-To | <slrnmkcftt.230.jon+usenet@frosty.unequivocal.co.uk> |
On Mon, May 4, 2015 at 1:32 AM, Jon Ribbens <jon+usenet@unequivocal.co.uk> wrote:
>> You shouldn't even actually _have_ those in your string in the first
>> place. How did you construct/receive that data? Ideally, catch it at
>> that point, and deal with it there.
>
> That would, unfortunately, be "tell the Unicode Consortium to format
> their documents differently", which seems unlikely to happen. I'm
> trying to read in: http://www.unicode.org/Public/idna/6.3.0/IdnaTest.txt

Ah, so what you _actually_ have is "\\udb40\\udd9d" - the backslashes
are in your input. I'm not sure what the best way to deal with that
is... it's a bit of a mess. You may find yourself needing to do
something manually, unless there's a way to ask Python to encode to
pseudo-UCS-2 that allows surrogates. Some languages may have sloppy
conversions available, but Python's seems to be quite strict (which is
correct). Is there an errors handler that can do this?

ChrisA
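
For what it's worth, one possible answer to the errors-handler question is `surrogatepass`, which works with the UTF-16 codecs from Python 3.4 onward. The sketch below is only an illustration, not something taken from the thread: the helper name `unescape_u` and the regex are made up here, and it assumes the input uses only four-digit `\uXXXX` escapes. It turns each escape into a single code unit, then round-trips through UTF-16 so that a valid surrogate pair is joined into one code point.

```python
import re

# Hypothetical helper, not from the thread: matches a literal backslash,
# 'u', and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_u(text):
    """Turn literal \\uXXXX escapes into characters, joining surrogate pairs."""
    # Step 1: replace each escape with the code unit it names.
    # This can leave lone surrogates in the str.
    with_units = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: 'surrogatepass' lets the surrogates through the encoder;
    # the strict decode then combines each valid pair into one code point
    # and raises UnicodeDecodeError on an unpaired surrogate.
    return with_units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


raw = "\\udb40\\udd9d"   # twelve characters: the backslashes are real
fixed = unescape_u(raw)
print(len(raw), len(fixed), hex(ord(fixed)))   # 12 1 0xe019d
```

The decode step stays strict on purpose, so an unpaired surrogate in the input still fails loudly instead of slipping through.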
Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 14:40 +0000
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 01:05 +1000
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 15:32 +0000
Re: Unicode surrogate pairs (Python 3.4) Marko Rauhamaa <marko@pacujo.net> - 2015-05-03 18:35 +0300
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 01:48 +1000
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 16:30 +0000
Re: Unicode surrogate pairs (Python 3.4) Chris Angelico <rosuav@gmail.com> - 2015-05-04 02:47 +1000
Re: Unicode surrogate pairs (Python 3.4) MRAB <python@mrabarnett.plus.com> - 2015-05-03 16:53 +0100
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 16:26 +0000
Re: Unicode surrogate pairs (Python 3.4) MRAB <python@mrabarnett.plus.com> - 2015-05-03 18:09 +0100
Re: Unicode surrogate pairs (Python 3.4) Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-03 19:20 +0000