
Re: unicode

From	markspace <-@.>
Newsgroups	comp.lang.java.programmer
Subject	Re: unicode
Date	2011-09-12 20:16 -0700
Organization	A noiseless patient Spider
Message-ID	<j4mhtv$ppb$1@dont-email.me>
References	<6c991195-ab57-417c-92e0-6d5ee1c451dc@dq7g2000vbb.googlegroups.com> <nfss679ije8c4r70tn9kmnr055vm6nfua0@4ax.com> <4e6e7a2a$0$309$14726298@news.sunsite.dk> <j4m4rs$l5g$1@dont-email.me> <88ff0d8c-af5f-4086-8232-26c80e5d8270@glegroupsg2000goo.googlegroups.com>


On 9/12/2011 5:46 PM, Lew wrote:
>
> That would defeat its purpose, which is somewhat similar to the
> purpose of trigraphs in C, AIUI.

There are only nine trigraphs, so they're a lot harder to "hit" accidentally.

>  That is, if your keyboard lacks
> certain characters, you can express source in "\u" notation and the
> source parser will read it correctly.

The problem is that \u turns up far more often than a trigraph such as
??-.  For example, \u is also a regex escape, which unfortunately seems
to be the source of the OP's confusion.
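
To make the overlap concrete, here is a minimal sketch of the two places a \u escape can be interpreted. The class and constant names are made up for illustration. With a single backslash the compiler translates the escape before the regex engine ever sees it; with a doubled backslash the six characters reach java.util.regex.Pattern, which interprets the escape itself.

```java
import java.util.regex.Pattern;

class UnicodeEscapeInRegex {
    // Single backslash: the compiler translates the escape during lexing,
    // so this literal holds the one character "A".
    static final String TRANSLATED_BY_COMPILER = "\u0041";

    // Doubled backslash: the literal holds six characters (a backslash,
    // the letter u, and four hex digits) and the regex engine decodes them.
    static final String TRANSLATED_BY_REGEX = "\\u0041";

    public static void main(String[] args) {
        System.out.println(TRANSLATED_BY_COMPILER.length());
        System.out.println(TRANSLATED_BY_REGEX.length());
        // Both patterns match "A", but for different reasons.
        System.out.println(Pattern.matches(TRANSLATED_BY_COMPILER, "A"));
        System.out.println(Pattern.matches(TRANSLATED_BY_REGEX, "A"));
    }
}
```

Both forms happen to match here, which is probably why the distinction is easy to miss until the escape stands for a quote, a backslash, or a line terminator.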

>  Its whole raison d'etre is to
> precede compilation, not to be part of it.  So how could it go away?
> What would you do instead?

I'd make the \u sequence a string and character escape.  \u000A would be
interpreted the same as \n: it would put a newline in the string, not
in the compiler input.  A \u sequence anywhere else (comments,
identifiers, other parts of the code) would be taken literally.  Legacy
code that relies on \u outside of string and character literals would
break.  If you need to type a character that your keyboard doesn't have,
get your editor to recognize an escape sequence, not the compiler.
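
For contrast, here is a small sketch of what happens today, when the escape is translated before the literal is tokenized. The class and method names are hypothetical. The first method uses the escape for U+0022 (a double quote), which ends the literal early; the second uses an ordinary string escape, which stays inside the literal.

```java
class EarlyTranslationDemo {
    // The escape for U+0022 is replaced by a double quote before the
    // lexer sees the literal, so the compiler reads "ab".length().
    static int lengthOfLiteral() {
        return "ab\u0022.length();
    }

    // An ordinary string escape is handled inside the literal, so the
    // quote and everything after it are part of the string's contents.
    static int lengthWithStringEscape() {
        return "ab\".length()".length();
    }

    public static void main(String[] args) {
        System.out.println(lengthOfLiteral());
        System.out.println(lengthWithStringEscape());
    }
}
```

Under the change proposed above, the first method would presumably behave like the second, and the line as written would no longer compile.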

There are also digraphs in C, which are recognized only during
tokenization, not as a preprocessing-style substitution.  These are much
better, as they are not recognized in string literals, character
literals, or comments.  I'd consider replacing \u for "missing keys"
with something like C's digraphs.  There are only five digraphs in C
(<: :> <% %> and %:).

The presence of \u in comments is especially pernicious, imo.  The
Javadoc tool already supports HTML escapes, so we don't need a second,
redundant method of specifying unusual characters.
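
A short sketch of why comments are the worst case, again with made-up names. The escape for a line feed inside a line comment ends the comment, so whatever follows it on the same line is compiled as code.

```java
class CommentEscapeDemo {
    static int run() {
        int counter = 0;
        // This looks like an inert comment, but the escape is a line feed: \u000A counter++;
        return counter;
    }

    public static void main(String[] args) {
        // The increment hidden in the comment has executed by the time run() returns.
        System.out.println(run());
    }
}
```

The reverse failure is just as easy to hit: a comment that mentions a Windows path with a directory name starting with "u" after a backslash is rejected by the compiler as a malformed escape.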

