
question on java lang spec chapter 3.3 (unicode char lexing)

Started by	"Aryeh M. Friedman" <Aryeh.Friedman@gmail.com>
First post	2013-01-02 00:20 -0800
Last post	2013-01-02 19:54 -0500
Articles	20 on this page of 41 — 7 participants


  question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 00:20 -0800
    Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 00:24 -0800
      Re: question on java lang spec chapter 3.3 (unicode char lexing) Patricia Shanahan <pats@acm.org> - 2013-01-02 12:24 -0800
    Re: question on java lang spec chapter 3.3 (unicode char lexing) Lew <lewbloch@gmail.com> - 2013-01-02 11:16 -0800
      Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 19:55 -0500
        Re: question on java lang spec chapter 3.3 (unicode char lexing) Lew <lewbloch@gmail.com> - 2013-01-02 17:21 -0800
          Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 20:40 -0500
    Re: question on java lang spec chapter 3.3 (unicode char lexing) Roedy Green <see_website@mindprod.com.invalid> - 2013-01-02 11:17 -0800
      Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 19:56 -0500
        Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 17:27 -0800
          Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 17:32 -0800
          Re: question on java lang spec chapter 3.3 (unicode char lexing) Lew <lewbloch@gmail.com> - 2013-01-02 17:42 -0800
            Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 17:55 -0800
              Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 18:02 -0800
                Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 21:12 -0500
                  Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 18:16 -0800
                    Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 21:20 -0500
                      Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 18:22 -0800
                        Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 21:26 -0500
                          Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 18:27 -0800
                            Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 21:46 -0500
                              Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 20:41 -0800
                                Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-06 21:54 -0500
              Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 21:15 -0500
                Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-02 18:20 -0800
          Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 21:17 -0500
          Re: question on java lang spec chapter 3.3 (unicode char lexing) Patricia Shanahan <pats@acm.org> - 2013-01-02 22:33 -0800
            Re: question on java lang spec chapter 3.3 (unicode char lexing) "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2013-01-05 12:58 +0000
              Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-05 05:34 -0800
                Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-05 05:40 -0800
                Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-06 21:56 -0500
        Re: question on java lang spec chapter 3.3 (unicode char lexing) Martin Gregorie <martin@address-in-sig.invalid> - 2013-01-03 21:14 +0000
          Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-03 17:51 -0500
            Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-03 20:54 -0800
              Re: question on java lang spec chapter 3.3 (unicode char lexing) Martin Gregorie <martin@address-in-sig.invalid> - 2013-01-05 00:15 +0000
              Re: question on java lang spec chapter 3.3 (unicode char lexing) "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2013-01-05 13:03 +0000
                Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-05 05:25 -0800
                  Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-06 21:49 -0500
                    Re: question on java lang spec chapter 3.3 (unicode char lexing) "Aryeh M. Friedman" <Aryeh.Friedman@gmail.com> - 2013-01-06 23:26 -0800
              Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-06 21:44 -0500
    Re: question on java lang spec chapter 3.3 (unicode char lexing) Arne Vajhøj <arne@vajhoej.dk> - 2013-01-02 19:54 -0500


#20909

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2013-01-02 21:46 -0500
Message-ID	<50e4f0f7$0$284$14726298@news.sunsite.dk>
In reply to	#20907

On 1/2/2013 9:27 PM, Aryeh M. Friedman wrote:
> On Wednesday, January 2, 2013 9:26:03 PM UTC-5, Arne Vajhøj wrote:
>> On 1/2/2013 9:22 PM, Aryeh M. Friedman wrote:
>>> another requirement not satisfied by any IDE we have found is the
>>> ability to lay the source tree out in such a way that it can be
>>> compiled without the IDE (a requirement for almost all our projects
>>> because none of our clients have IDEs and in almost all cases there
>>> are minor changes needed to make the code happy on their site that
>>> make testing impossible on the development machine)
>>
>> The Java IDEs I know put code in a structure that fits
>> java tools, ant and maven.
>
> And in almost any non-trivial case this is completely incorrect...

Given that a big part (my estimate: 80-90%!) of all Java applications
are built like this:
- developers use an IDE and check in to a VCS
- the build process checks out from the VCS and uses ant/maven to build
then it has to be correct.

> even though I love Java as a lang I have a serious issue with some of
> the attitudes/assumptions made by tools... namely the universe does
> not revolve around the JVM

I find it natural that tools developed for Java development are the
best for Java development and tools developed for C development are
the best for C development and ... PHP ... Python ... etc..

Arne


#20912

From	"Aryeh M. Friedman" <Aryeh.Friedman@gmail.com>
Date	2013-01-02 20:41 -0800
Message-ID	<d4cd0273-9399-4353-8e44-53e03f59162a@googlegroups.com>
In reply to	#20909

On Wednesday, January 2, 2013 9:46:12 PM UTC-5, Arne Vajhøj wrote:
> On 1/2/2013 9:27 PM, Aryeh M. Friedman wrote:
> > On Wednesday, January 2, 2013 9:26:03 PM UTC-5, Arne Vajhøj wrote:
> >> On 1/2/2013 9:22 PM, Aryeh M. Friedman wrote:
> >>> another requirement not satisfied by any IDE we have found is the
> >>> ability to lay the source tree out in such a way that it can be
> >>> compiled without the IDE (a requirement for almost all our projects
> >>> because none of our clients have IDEs and in almost all cases there
> >>> are minor changes needed to make the code happy on their site that
> >>> make testing impossible on the development machine)
> >>
> >> The Java IDEs I know put code in a structure that fits
> >> java tools, ant and maven.
> >
> > And in almost any non-trivial case this is completely incorrect...
>
> Given that a big part (my estimate: 80-90%!) of all Java applications
> are built like this:
> - developers use an IDE and check in to a VCS
> - the build process checks out from the VCS and uses ant/maven to build
> then it has to be correct.

Correct in what sense? Passing its own tests? If that is the case, aegis is the *ONLY* VCS that actually requires this before a checkin. The idea there is that the baseline (the repo in most other VCSs' jargon) is guaranteed to be working (as defined above). Namely, every modification is *NEW* [see note], atomic with regard to new functionality, and *MUST* be accompanied by automated tests (it is possible to turn this off, but for obvious reasons that is not recommended unless the change is essentially untestable, like documentation updates).

> > even though I love Java as a lang I have a serious issue with some of
> > the attitudes/assumptions made by tools... namely the universe does
> > not revolve around the JVM
>
> I find it natural that tools developed for Java development are the
> best for Java development and tools developed for C development are
> the best for C development and ... PHP ... Python ... etc..

Most real world projects (unless they are part of a larger effort) have several components/languages (for us, for example, it is typical to have an HTML/CSS/JS component and a Java/"JSP" component [I am defining "JSP" a little loosely because we often need to support more than just web front-ends]... it is also common for us to have some native code accessed via a JNLP wrapper)...

Note:

There is a slight mismatch between aegis's requirements in this regard and how xUnit-like frameworks work. We typically solve this by reusing the same test script but requiring that the total number of passes be at least one larger than for the previous change.


#21123

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2013-01-06 21:54 -0500
Message-ID	<50ea38f0$0$282$14726298@news.sunsite.dk>
In reply to	#20912

On 1/2/2013 11:41 PM, Aryeh M. Friedman wrote:
> On Wednesday, January 2, 2013 9:46:12 PM UTC-5, Arne Vajhøj wrote:
>> On 1/2/2013 9:27 PM, Aryeh M. Friedman wrote:
>>> On Wednesday, January 2, 2013 9:26:03 PM UTC-5, Arne Vajhøj wrote:
>>>> On 1/2/2013 9:22 PM, Aryeh M. Friedman wrote:
>>>>> another requirement not satisfied by any IDE we have found is the
>>>>> ability to lay the source tree out in such a way that it can be
>>>>> compiled without the IDE (a requirement for almost all our projects
>>>>> because none of our clients have IDEs and in almost all cases there
>>>>> are minor changes needed to make the code happy on their site that
>>>>> make testing impossible on the development machine)
>>>> The Java IDEs I know put code in a structure that fits
>>>> java tools, ant and maven.
>>> And in almost any non-trivial case this is completely incorrect...
>>
>> Given that a big part (my estimate: 80-90%!) of all Java applications
>> are built like this:
>> - developers use an IDE and check in to a VCS
>> - the build process checks out from the VCS and uses ant/maven to build
>> then it has to be correct.
>
> Correct in what sense?

Same sense as you used incorrect!

>>> even though I love Java as a lang I have a serious issue with some of
>>> the attitudes/assumptions made by tools... namely the universe does
>>> not revolve around the JVM
>>
>> I find it natural that tools developed for Java development are the
>> best for Java development and tools developed for C development are
>> the best for C development and ... PHP ... Python ... etc..
>
> Most real world projects (unless they are part of a larger effort) have
> several components/languages (for us, for example, it is typical to
> have an HTML/CSS/JS component and a Java/"JSP" component [I am
> defining "JSP" a little loosely because we often need to support more
> than just web front-ends]... it is also common for us to have some
> native code accessed via a JNLP wrapper)...

(JNI wrapper??)

Eclipse and NetBeans can support all those languages.

But if you have enough work in each language, then
a different IDE for the HTML/CSS/JS and another for the
C/C++ could make sense.

You may want to use ant for the Java stuff and make for the
C/C++ stuff.

But ant can call make and make can call ant, so they can be integrated.

Arne


#20900

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2013-01-02 21:15 -0500
Message-ID	<50e4e9d0$0$287$14726298@news.sunsite.dk>
In reply to	#20897

On 1/2/2013 8:55 PM, Aryeh M. Friedman wrote:
> On Wednesday, January 2, 2013 8:42:57 PM UTC-5, Lew wrote:
>> On Wednesday, January 2, 2013 5:27:21 PM UTC-8, Aryeh M. Friedman
>>> A long term personal project of mine is to write an OS completely
>>> from the ground up in a superset of Java (the only addition I
>>> see that is needed is some type of "safe" pointer type)... in
>>> this case safe being defined as you can assign a literal address
>>> to it but you're not allowed to do ptr math on it
>>
>> Also known as "a JVM"?
>
> As far I know the JVM can not be directly booted (as in if I turn on
> my PC it can not boot into the JVM)...

The JVM was not created for writing OS'es but for writing
applications so that is correct.

>                                             neither for performance
> reasons does it make sense to run a VM at the bottom layer...

Performance should not be a problem.

>                                                                   an
> other reason is there is a lot of junk in the JRE (like how do you do
> garbage collection if you do not have some way of the OS allocating
> mem to a process in the first place)

Again. Java was designed to write applications not OS'es.

The "junk" you are talking about is what makes it useful
for the big majority.

Arne


#20903

From	"Aryeh M. Friedman" <Aryeh.Friedman@gmail.com>
Date	2013-01-02 18:20 -0800
Message-ID	<1c8ea3fc-155c-471c-b7ba-6e3fb6f6c71d@googlegroups.com>
In reply to	#20900

> The JVM was not created for writing OS'es but for writing
> applications so that is correct.

In my professional life that's how I use Java; the comment only pertained to the motivation for writing a native compiler (which is for fun).


#20902

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2013-01-02 21:17 -0500
Message-ID	<50e4ea1f$0$287$14726298@news.sunsite.dk>
In reply to	#20891

On 1/2/2013 8:27 PM, Aryeh M. Friedman wrote:
> the ground up in a superset of Java (the only addition I see that is
> needed is some type of "safe" pointer type)... in this case safe
> being defined as you can assign a literal address to it but you're not
> allowed to do ptr math on it

See http://en.wikipedia.org/wiki/Singularity_%28operating_system%29
for an example of OS in managed language.

Arne


#20915

From	Patricia Shanahan <pats@acm.org>
Date	2013-01-02 22:33 -0800
Message-ID	<NYqdnSF1y-Gvu3jNnZ2dnUVZ_qmdnZ2d@earthlink.com>
In reply to	#20891

On 1/2/2013 5:27 PM, Aryeh M. Friedman wrote:
>
>>
>> Well - since he is writing a lexer for Java then ...
>
> A little more on the project... while the overall project *IS* for
> fun a few components may find their way into more serious work
> related projects but only to be used on code written by me or others
> on my team... specifically we may use the lexing/parsing component to
> make the following tools (the actual code generation/etc. of the
> compilation is currently purely fun [see note]):

If you intend the lexing/parsing component for production work, you
should deal with the escapes.

You may someday want to import code that was, for example, edited using
a keyboard that did not support all the characters that were needed, or
where the programmer wanted to be absolutely sure that a particular
Unicode character was used.

You would at least need to detect the escapes to get a usable error
message. Once you have done that, it is so easy to replace each escape
with the equivalent Unicode character that it is not worth doing
anything else.
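
For reference, a minimal sketch of that replacement step in Java. It assumes the rules of JLS section 3.3: a backslash starts an escape only when an even number of backslashes immediately precedes it, any number of `u` characters may follow the backslash, and a character produced by an escape never starts another escape. The names `UnicodeEscapes` and `translate` are made up for illustration, and the whole input is assumed to fit in one String:

```java
// Hypothetical helper, not taken from any poster's lexer.
final class UnicodeEscapes {

    // Value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Replaces every eligible Unicode escape in raw with the UTF-16 code unit it names.
    static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int run = 0; // number of raw backslashes immediately before position i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c);
                run = 0;
                i++;
                continue;
            }
            boolean eligible = run % 2 == 0;
            if (eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++; // any number of u characters is allowed
                }
                if (j + 4 > raw.length()) {
                    throw new IllegalArgumentException("truncated Unicode escape at offset " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(raw.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("malformed Unicode escape at offset " + i);
                    }
                    value = value * 16 + digit;
                }
                out.append((char) value);
                run = 0; // a backslash produced by an escape is not a raw backslash
                i = j + 4;
            } else {
                out.append('\\');
                run++;
                i++;
            }
        }
        return out.toString();
    }
}
```

Under these assumptions a backslash followed by `u0041` or by `uuu0041` becomes `A`, while two backslashes followed by `u0041` pass through unchanged, and a malformed escape is reported with its offset so the caller can produce a usable error message.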

Patricia


#20979

From	"Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org>
Date	2013-01-05 12:58 +0000
Message-ID	<WZ6dnS6Bc7o4vnXNnZ2dnUVZ8kOdnZ2d@bt.com>
In reply to	#20915

Patricia Shanahan wrote:

> You would at least need to detect the escapes to get a usable error
> message. Once you have done that, it is so easy to replace each escape
> with the equivalent Unicode character that it is not worth doing
> anything else.

I'm not so sure about that.  IIRC the rules about interpreting Unicode escapes
have some seriously weird convolutions. Something to do with protecting against
multiply-encoded files, I think.  It badly fails the Principle of Least WTF.

It's in the spec, but I'm too lazy to go find the exact reference :-(

    -- chris


#20985

From	"Aryeh M. Friedman" <Aryeh.Friedman@gmail.com>
Date	2013-01-05 05:34 -0800
Message-ID	<70744efd-9848-42ef-944f-dcd667f75045@googlegroups.com>
In reply to	#20979

On Saturday, January 5, 2013 7:58:57 AM UTC-5, Chris Uppal wrote:
> Patricia Shanahan wrote:
>
> > You would at least need to detect the escapes to get a usable error
> > message. Once you have done that, it is so easy to replace each escape
> > with the equivalent Unicode character that it is not worth doing
> > anything else.
>
> I'm not so sure about that.  IIRC the rules about interpreting Unicode escapes
> have some seriously weird convolutions. Something to do with protecting against
> multiply-encoded files, I think.  It badly fails the Principle of Least WTF.
>
> It's in the spec, but I'm too lazy to go find the exact reference :-(
>
>     -- chris

Agreed. For example, the following is just ugly but perfectly valid Java code:

Foo.java:
\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A

% javac Foo.java
% java Foo
hello, world
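
A file like Foo.java above does not have to be typed by hand. As a minimal sketch (the names `EscapeEverything` and `escapeAll` are hypothetical, not from the post), the following rewrites each UTF-16 code unit of a source text as a Unicode escape with four uppercase hex digits, which is the form the listing uses:

```java
// Illustrative only; nothing here comes from the poster's actual tooling.
final class EscapeEverything {
    private static final String HEX = "0123456789ABCDEF";

    // Rewrites every char of source, including spaces, tabs and newlines,
    // as a backslash, a 'u' and four uppercase hex digits.
    static String escapeAll(String source) {
        StringBuilder out = new StringBuilder(source.length() * 6);
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            out.append('\\').append('u')
               .append(HEX.charAt((c >> 12) & 0xF))
               .append(HEX.charAt((c >> 8) & 0xF))
               .append(HEX.charAt((c >> 4) & 0xF))
               .append(HEX.charAt(c & 0xF));
        }
        return out.toString();
    }
}
```

Calling `escapeAll("public")` yields `\u0070\u0075\u0062\u006C\u0069\u0063`, the first six escapes of the listing. Output like this makes a handy test input for a lexer's escape handling, because even the newlines and tabs arrive as escapes.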


#20986

From	"Aryeh M. Friedman" <Aryeh.Friedman@gmail.com>
Date	2013-01-05 05:40 -0800
Message-ID	<aaabd6d4-87ba-4522-8953-25bca99d1ccd@googlegroups.com>
In reply to	#20985

On Saturday, January 5, 2013 8:34:38 AM UTC-5, Aryeh M. Friedman wrote:
> On Saturday, January 5, 2013 7:58:57 AM UTC-5, Chris Uppal wrote:
> > Patricia Shanahan wrote:
> >
> > > You would at least need to detect the escapes to get a usable error
> > > message. Once you have done that, it is so easy to replace each escape
> > > with the equivalent Unicode character that it is not worth doing
> > > anything else.
> >
> > I'm not so sure about that.  IIRC the rules about interpreting Unicode escapes
> > have some seriously weird convolutions. Something to do with protecting against
> > multiply-encoded files, I think.  It badly fails the Principle of Least WTF.
> >
> > It's in the spec, but I'm too lazy to go find the exact reference :-(
> >
> >     -- chris
>
> Agreed. For example, the following is just ugly but perfectly valid Java code:
>
> Foo.java:
> \u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A
>
> % javac Foo.java
> % java Foo
> hello, world

Just a quick note: I did end up implementing Unicode escapes the way JLSv3 says to, and the above is one of our test inputs...


#21124

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2013-01-06 21:56 -0500
Message-ID	<50ea395e$0$282$14726298@news.sunsite.dk>
In reply to	#20985

On 1/5/2013 8:34 AM, Aryeh M. Friedman wrote:
> Agreed. For example, the following is just ugly but perfectly valid Java
> code:
>
> Foo.java:
> \u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A
>
> % javac Foo.java
> % java Foo
> hello, world

:-)

It is one of those features that can certainly be misused.

Arne


#20929

From	Martin Gregorie <martin@address-in-sig.invalid>
Date	2013-01-03 21:14 +0000
Message-ID	<kc4sbi$4at$2@localhost.localdomain>
In reply to	#20888

On Wed, 02 Jan 2013 19:56:13 -0500, Arne Vajhøj wrote:

> On 1/2/2013 2:17 PM, Roedy Green wrote:
>> On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
>> <Aryeh.Friedman@gmail.com> wrote, quoted or indirectly quoted someone
>> who said :
>>
>>> (\uXXXX)
>>
>> The only places you encounter such escapes are in Java source and
>> possibly resource bundles.
> 
> Well - since he is writing a lexer for Java then ...
> 
...which, being lazy, I would not do from scratch. 

Instead, I'd use the Java version of the Coco/R package, which generates 
the lexer and parser as Java source within a framework. Unlike some 
similar tools, you're almost encouraged to rewrite the framework to suit 
your requirements. This is quite short and written in standard Java, so 
modifying it is very easy.

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |


#20933

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2013-01-03 17:51 -0500
Message-ID	<50e60b8f$0$282$14726298@news.sunsite.dk>
In reply to	#20929

On 1/3/2013 4:14 PM, Martin Gregorie wrote:
> On Wed, 02 Jan 2013 19:56:13 -0500, Arne Vajhøj wrote:
>
>> On 1/2/2013 2:17 PM, Roedy Green wrote:
>>> On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
>>> <Aryeh.Friedman@gmail.com> wrote, quoted or indirectly quoted someone
>>> who said :
>>>
>>>> (\uXXXX)
>>>
>>> The only places you encounter such escapes are in Java source and
>>> possibly resource bundles.
>>
>> Well - since he is writing a lexer for Java then ...
>>
> ...which, being lazy, I would not do from scratch.
>
> Instead, I'd use the Java version of the Coco/R package, which generates
> the lexer and parser as Java source within a framework. Unlike some
> similar tools, you're almost encouraged to rewrite the framework to suit
> your requirements. This is quite short and written in standard Java, so
> modifying it is very easy.

Good point.

Arne


#20940

From	"Aryeh M. Friedman" <Aryeh.Friedman@gmail.com>
Date	2013-01-03 20:54 -0800
Message-ID	<f3060a14-b802-48f2-8251-a50c77fd7e67@googlegroups.com>
In reply to	#20933

On Thursday, January 3, 2013 5:51:55 PM UTC-5, Arne Vajhøj wrote:
> On 1/3/2013 4:14 PM, Martin Gregorie wrote:
> > On Wed, 02 Jan 2013 19:56:13 -0500, Arne Vajhøj wrote:
> >
> >> On 1/2/2013 2:17 PM, Roedy Green wrote:
> >>> On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
> >>> <Aryeh.Friedman@gmail.com> wrote, quoted or indirectly quoted someone
> >>> who said :
> >>>
> >>>> (\uXXXX)
> >>>
> >>> The only places you encounter such escapes are in Java source and
> >>> possibly resource bundles.
> >>
> >> Well - since he is writing a lexer for Java then ...
> >>
> > ...which, being lazy, I would not do from scratch.
> >
> > Instead, I'd use the Java version of the Coco/R package, which generates
> > the lexer and parser as Java source within a framework. Unlike some
> > similar tools, you're almost encouraged to rewrite the framework to suit
> > your requirements. This is quite short and written in standard Java, so
> > modifying it is very easy.
>
> Good point.
>
> Arne

The only issue is likely a philosophical one, in that I have *NEVER* trusted code generators of any kind: they either produce impossible to follow/debug code or have all kinds of fluff in them (the classic example in my mind [HTML, which is not really a programming lang ;-)] is Dreamweaver, which produces 75 lines of HTML for "hello, world").


#20959

From	Martin Gregorie <martin@address-in-sig.invalid>
Date	2013-01-05 00:15 +0000
Message-ID	<kc7rb5$uah$1@localhost.localdomain>
In reply to	#20940

On Thu, 03 Jan 2013 20:54:09 -0800, Aryeh M. Friedman wrote:

> The only issue is likely a philosophical one in that I have *NEVER*
> trusted code generators of any kind they either produce impossible to
> follow/debug code or have all kinds of fluff in them (the classic
> example in my mind [html which is not really a programming lang ;-)] is
> Dreamweaver that produces 75 lines of HTML for "hello, world").
>
Just saying.

Try it. Look at the generated code. Use it or not. Your choice. 
If you've used Lex and YACC (or Flex and Bison) the learning curve is 
short.


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |


#20981

From	"Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org>
Date	2013-01-05 13:03 +0000
Message-ID	<lYKdnXBDefeeu3XNnZ2dnUVZ8r6dnZ2d@bt.com>
In reply to	#20940

Aryeh M. Friedman wrote:

> The only issue is likely a philosophical one in that I have *NEVER*
> trusted code generators of any kind

So you don't care for compilers ?

;-)

    -- chris

P.S.  Seriously: the point of classic compiler generators (or
"compiler-compilers" as they were often called) is to produce code that works
and that runs fast in little space.  It is not /AT ALL/ a design principle that
the code should be comprehensible to humans -- in fact for the kinds of
algorithms they use, there is no way the resulting code and tables could be
remotely comprehensible (to an ordinary programmer), that is /why/ we use code
generators.


#20982

From	"Aryeh M. Friedman" <Aryeh.Friedman@gmail.com>
Date	2013-01-05 05:25 -0800
Message-ID	<af9ec6de-5eb4-49e3-a2b0-99805fcc48ab@googlegroups.com>
In reply to	#20981

On Saturday, January 5, 2013 8:03:00 AM UTC-5, Chris Uppal wrote:
> Aryeh M. Friedman wrote:
>
> > The only issue is likely a philosophical one in that I have *NEVER*
> > trusted code generators of any kind
>
> So you don't care for compilers ?
>
> ;-)
>
>     -- chris
>
> P.S.  Seriously: the point of classic compiler generators (or
> "compiler-compilers" as they were often called) is to produce code that works
> and that runs fast in little space.  It is not /AT ALL/ a design principle that
> the code should be comprehensible to humans -- in fact for the kinds of
> algorithms they use, there is no way the resulting code and tables could be
> remotely comprehensible (to an ordinary programmer), that is /why/ we use code
> generators.

Machine code was never meant to be readable, but high level languages can and should be ;-)... On the serious side of the debate, there are reasons for shying away from code generators in my case that are currently proprietary (some of the lesser results will likely be FOSS'ed though)... The main reason is that we need to (in some cases) deal with multiple languages in the same compilation unit and have developed a fairly good approach (at least in theory; my "fun work" is really nothing more than a proof of concept, without the pressure of deadlines and such, with Java as a typical non-trivial language to work with from the compiler POV)... Due to the above, using a parser generator would make it very inefficient to create the needed parsers, since generated parsers are (by their very nature) very non-OO in how they deal with more than one grammar at once... namely, they are designed to deal with single languages at a time and not "families" of them.


#21122

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2013-01-06 21:49 -0500
Message-ID	<50ea37b0$0$282$14726298@news.sunsite.dk>
In reply to	#20982

On 1/5/2013 8:25 AM, Aryeh M. Friedman wrote:
> Machine code was never meant to be readable but high level languages
> can and should be ;-).... on the serious side of the debate there are
> reasons for shying away from code generators in my case that are
> currently proprietary (some of the lesser results will likely be
> FOSS'ed though)... the main reason is we need to (in some cases) deal
> with multiple languages in the same compilation unit and have
> developed a fairly good approach (at least in theory and my "fun work"
> is really nothing more than a proof of concept, without the pressure of
> deadlines and such, with Java as a typical non-trivial language to
> work with from the compiler POV)... due to the above using a parser
> generator would make it very inefficient to create the needed parsers
> since they are (by their very nature) very non-OO in how they deal
> with more than one grammar at once... namely they are designed to
> deal with single languages at a time and not "families" of them

????

You have:

1 handwritten lexer + 1 handwritten parser vs 1 generated lexer + 1 
generated parser

and:

N handwritten lexers + N handwritten parsers vs N generated lexers + N 
generated parsers

If it is cheaper to generate for 1 then I would expect it to be cheaper
to generate for N as well.

That the generated lexers and parsers may be more procedural than
object oriented should not be a show stopper.

Common languages like C++ and Java can call different
functions from different classes just fine.
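
As a rough illustration of that last point, and assuming each generated parser exposes some entry point that can be wrapped, a thin hand-written layer can pick the parser per fragment. Everything named here (`FragmentParser`, `ParserRegistry`, the lambda stand-ins for generated parsers) is hypothetical; real Coco/R, yacc or ANTLR output has its own entry points, which would be called from inside the lambdas:

```java
import java.util.HashMap;
import java.util.Map;

// Common face put on each parser, generated or handwritten (hypothetical interface).
interface FragmentParser {
    String parse(String text); // a String result keeps the sketch small
}

// Routes a fragment to the parser registered for its language.
final class ParserRegistry {
    private final Map<String, FragmentParser> parsers = new HashMap<>();

    void register(String language, FragmentParser parser) {
        parsers.put(language, parser);
    }

    String parse(String language, String text) {
        FragmentParser parser = parsers.get(language);
        if (parser == null) {
            throw new IllegalArgumentException("no parser registered for " + language);
        }
        return parser.parse(text);
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        // Stand-ins for calls into generated parser classes.
        registry.register("java", text -> "java:" + text.length());
        registry.register("html", text -> "html:" + text.length());
        System.out.println(registry.parse("java", "class Foo {}"));
        System.out.println(registry.parse("html", "<p>hi</p>"));
    }
}
```

The generated code stays procedural and untouched; only the small registry is object oriented, so adding an (N+1)th language means one more `register` call rather than a change to the existing parsers.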

Arne


#21135

From	"Aryeh M. Friedman" <Aryeh.Friedman@gmail.com>
Date	2013-01-06 23:26 -0800
Message-ID	<1217f831-3562-4c62-856f-8a32f8ff2e38@googlegroups.com>
In reply to	#21122

On Sunday, January 6, 2013 9:49:17 PM UTC-5, Arne Vajhøj wrote:
> On 1/5/2013 8:25 AM, Aryeh M. Friedman wrote:
> > Machine code was never meant to be readable but high level languages
> > can and should be ;-).... on the serious side of the debate there are
> > reasons for shying away from code generators in my case that are
> > currently proprietary (some of the lesser results will likely be
> > FOSS'ed though)... the main reason is we need to (in some cases) deal
> > with multiple languages in the same compilation unit and have
> > developed a fairly good approach (at least in theory and my "fun work"
> > is really nothing more than a proof of concept, without the pressure of
> > deadlines and such, with Java as a typical non-trivial language to
> > work with from the compiler POV)... due to the above using a parser
> > generator would make it very inefficient to create the needed parsers
> > since they are (by their very nature) very non-OO in how they deal
> > with more than one grammar at once... namely they are designed to
> > deal with single languages at a time and not "families" of them
>
> ????
>
> You have:
>
> 1 handwritten lexer + 1 handwritten parser vs 1 generated lexer + 1
> generated parser
>
> and:
>
> N handwritten lexers + N handwritten parsers vs N generated lexers + N
> generated parsers
>
> If it is cheaper to generate for 1 then I would expect it to be cheaper
> to generate for N as well.
>
> That the generated lexers and parsers may be more procedural than
> object oriented should not be a show stopper.
>
> Common languages like C++ and Java can call different
> functions from different classes just fine.
>
> Arne

Don't forget domain-specific langs, some of which may rewrite the actual content of the other embedded langs... Bottom line: a well designed version of this is cheaper in the long run if one of the goals is to quickly add new langs to each family.

Besides which, I compared my hand-written code to that produced by yacc/lex (and antlr, to make sure I was not seeing things) and mine 1) is a fraction of the line count [about 90% smaller], 2) has a much lower big-O (O(n) vs. O(n^2)), 3) is trivial to hand trace (why I would want to is another point ;-)), and 4) is easier to test with unit testing, because you can actually get under the hood, unlike the above, which is totally opaque.


#21120

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2013-01-06 21:44 -0500
Message-ID	<50ea3696$0$282$14726298@news.sunsite.dk>
In reply to	#20940

On 1/3/2013 11:54 PM, Aryeh M. Friedman wrote:
> The only issue is likely a philosophical one in that I have *NEVER*
> trusted code generators of any kind they either produce impossible to
> follow/debug code or have all kinds of fluff in them (the classic
> example in my mind [html which is not really a programming lang ;-)]
> is Dreamweaver that produces 75 lines of HTML for "hello, world").

Sounds like NIH.

The generated code may be hard to follow, but it will be
better tested.

Arne
