From: Lew
Newsgroups: comp.lang.java.programmer
Subject: Re: question on java lang spec chapter 3.3 (unicode char lexing)
Date: Wed, 2 Jan 2013 11:16:05 -0800 (PST)

On Wednesday, January 2, 2013 12:20:12 AM UTC-8, Aryeh M. Friedman wrote:
> If I am lexer for Java in a 100% unicode [sic] environment (it already uses unicode for all internal
> representation of text) and 100% of the code that I will be lexing is from that environment do I need still
> deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]... assume that no code
> will be imported from non-unicode environments

What do you mean "have to deal with"?

If you mean to parse Java source, you have to be able to parse Java source. The JLS is the final authority on what that constitutes.

Being "in a 100% unicode [sic] environment" (whatever that's supposed to mean) does not relieve you of that responsibility. Nor does it obviate the need for the occasional "\uXXXX" in source.

However, I don't think the lexer deals with that.
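Unicode escapes are handled in a translation step that runs before tokenization (JLS 3.2, step 1; the details are in 3.3). Every escape is substituted before the input is split into tokens, let alone parsed.

--
Lew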
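
P.S. To make the "substituted before tokenizing" point concrete, here is a rough sketch of such a pre-pass, based on my reading of JLS 3.3. The names (UnicodeEscapePrepass, translate, hexValue) are made up for illustration, and this is not a claim about how any particular compiler implements the step. It assumes the whole source is already in a String. The rule it follows: a backslash can start an escape only if an even number of backslashes comes right before it, it may be followed by one or more 'u' characters, and then exactly four hex digits must follow.

```java
final class UnicodeEscapePrepass {
    private UnicodeEscapePrepass() {}

    /** Returns src with every eligible Unicode escape replaced by the UTF-16 code unit it names. */
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int backslashRun = 0; // raw backslashes immediately preceding position i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            // A backslash is eligible only after an even number of backslashes.
            boolean eligible = c == '\\' && backslashRun % 2 == 0;
            if (eligible && i + 1 < src.length() && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                // One or more 'u' characters are allowed after the backslash.
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++;
                }
                if (j + 4 > src.length()) {
                    throw new IllegalArgumentException("truncated Unicode escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(src.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + k);
                    }
                    value = value * 16 + digit;
                }
                // The produced character is emitted as-is and never starts another escape.
                out.append((char) value);
                i = j + 4;
                backslashRun = 0;
                continue;
            }
            out.append(c);
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            i++;
        }
        return out.toString();
    }

    /** ASCII-only hex digit value, or -1 if c is not a hex digit. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // The argument holds a literal backslash, 'u', and four hex digits at run time.
        System.out.println(translate("char c = '\\u0041';")); // prints: char c = 'A';
    }
}
```

A tokenizer fed the output of translate never sees an escape, which is why the tokenizing rules themselves do not need to mention them. Whether you run this as a separate pass or fold it into the character reader is a design choice; either way the step cannot be skipped if the goal is to accept everything the JLS calls Java source.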