From: "Aryeh M. Friedman"
Newsgroups: comp.lang.java.programmer
Subject: Re: question on java lang spec chapter 3.3 (unicode char lexing)
Date: Sat, 5 Jan 2013 05:40:47 -0800 (PST)

On Saturday, January 5, 2013 8:34:38 AM UTC-5, Aryeh M. Friedman wrote:
> On Saturday, January 5, 2013 7:58:57 AM UTC-5, Chris Uppal wrote:
> > Patricia Shanahan wrote:
> >
> > > You would at least need to detect the escapes to get a usable error
> > > message. Once you have done that, it is so easy to replace each escape
> > > with the equivalent Unicode character that it is not worth doing
> > > anything else.
> >
> > I'm not so sure about that.
> > IIRC the rules about interpreting Unicode escapes
> > have some seriously weird convolutions. Something to do with protecting against
> > multiply-encoded files, I think. It badly fails the Principle of Least WTF.
> >
> > It's in the spec, but I'm too lazy to go find the exact reference :-(
> >
> > -- chris
>
> Agreed. For example, the following is just ugly but perfectly valid Java code:
>
> Foo.java:
> \u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A
>
> % javac Foo.java
> % java Foo
> hello, world

Just a quick note: I did end up implementing Unicode escapes the way JLSv3 says to, and the above is one of our test inputs...
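
For anyone following along, here is a rough sketch of what the JLS 3.3 translation step amounts to. This is not the code from our project; the class name UnicodeEscapes and the method translate are made up for illustration, and it reflects my reading of the spec: a backslash only starts an escape when an even number of raw backslashes contiguously precede it, any number of u markers is allowed, exactly four ASCII hex digits must follow, and the character produced is never rescanned for further escapes.

```java
final class UnicodeEscapes {

    /** Returns the value of an ASCII hex digit, or -1 if c is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Translates Unicode escapes in raw source text (hypothetical helper).
     * Throws IllegalArgumentException where a compiler would report an error.
     */
    static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int backslashRun = 0; // raw backslashes contiguously preceding position i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c);
                backslashRun = 0;
                i++;
                continue;
            }
            boolean eligible = backslashRun % 2 == 0;
            if (eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                // Skip every u marker; the grammar allows one or more.
                while (j < raw.length() && raw.charAt(j) == 'u') j++;
                if (j + 4 > raw.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int d = hexValue(raw.charAt(k));
                    if (d < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + k);
                    }
                    value = value * 16 + d;
                }
                // Append the result directly; it is not rescanned for escapes.
                out.append((char) value);
                backslashRun = 0;
                i = j + 4;
            } else {
                // An ordinary backslash: copy it and extend the run.
                out.append(c);
                backslashRun++;
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The first six escapes of the Foo.java test input quoted above.
        System.out.println(translate("\\u0070\\u0075\\u0062\\u006C\\u0069\\u0063")); // prints "public"
    }
}
```

The "even number of backslashes" check is the convolution Chris was alluding to: it keeps a doubled backslash followed by u0041 as literal text, while the no-rescanning rule means an escape that decodes to a backslash cannot itself begin another escape.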