From: Chris Angelico
Date: Mon, 9 Mar 2015 13:18:38 +1100
Subject: Re: Newbie question about text encoding
Newsgroups: comp.lang.python

On Mon, Mar 9, 2015 at 1:09 PM, Ben Finney wrote:
> Steven D'Aprano writes:
>
>> '\udd00' should be a SyntaxError.
>
> I find your argument convincing, that attempting to construct a Unicode
> string of a lone surrogate should be an error.
>
> Shouldn't the error type be a ValueError, though? The statement is not,
> to my mind, erroneous syntax.

For the string literal, I would say SyntaxError is more appropriate
than ValueError, as a string object has to be constructed at
compilation time.
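For anyone who wants to poke at the distinction, here's a rough sketch, assuming CPython 3.x. Today '\udd00' compiles happily and the lone surrogate only bites later, e.g. when you try to encode it as UTF-8. Problems that CPython does detect while building a literal at compile time (such as an unknown \N{...} name) are already reported as SyntaxError. The strict_str helper is purely hypothetical, not anything in Python; it just models a runtime constructor that rejects surrogates, where ValueError would be the natural choice.

```python
# Sketch, assuming CPython 3.x. strict_str is a hypothetical helper,
# not part of Python; it models a runtime "no surrogates" rule.


def is_surrogate(ch: str) -> bool:
    """Return True if ch is a single code point in U+D800..U+DFFF."""
    return len(ch) == 1 and 0xD800 <= ord(ch) <= 0xDFFF


def compile_error_type(source: str):
    """Compile an expression; return the exception type name, or None."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError as exc:
        return type(exc).__name__
    return None


def utf8_error_reason(text: str):
    """Try to encode text as UTF-8; return the failure reason, or None."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc.reason
    return None


def strict_str(code_points):
    """Hypothetical runtime constructor that rejects surrogates."""
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"lone surrogate U+{cp:04X} not allowed")
    return "".join(chr(cp) for cp in code_points)


if __name__ == "__main__":
    # Current behaviour: the literal compiles, and the result is a lone surrogate.
    lone = eval(r"'\udd00'")
    print(is_surrogate(lone), compile_error_type(r"'\udd00'"))
    # The problem only shows up later, e.g. when encoding to UTF-8.
    print(utf8_error_reason(lone))
    # Errors found while building a literal are already SyntaxErrors.
    print(compile_error_type(r"'\N{NO SUCH CHARACTER NAME}'"))
```

So a compile-time check on literals would fit the existing SyntaxError convention, while something like strict_str (or a stricter chr()) would be the place for ValueError.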
I'd still like to see a report from someone who has used a language
that specifically disallows all surrogates in strings. Does it help?
Is it more hassle than it's worth? Are there weird edge cases that it
breaks?

ChrisA