

Groups > comp.lang.python > #59331 > unrolled thread

Oh look, another language (ceylon)

Started by: Neal Becker <ndbecker2@gmail.com>
First post: 2013-11-13 14:33 -0500
Last post: 2013-11-18 14:56 +0000
Articles: 20 on this page of 37, 14 participants



Contents

  Oh look, another language (ceylon) Neal Becker <ndbecker2@gmail.com> - 2013-11-13 14:33 -0500
    Re: Oh look, another language (ceylon) Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-11-17 16:41 +1300
      Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-17 15:10 +1100
      Re: Oh look, another language (ceylon) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-17 05:48 +0000
        Re: Oh look, another language (ceylon) jkn <jkn_gg@nicorp.f9.co.uk> - 2013-11-17 00:34 -0800
      Re: Oh look, another language (ceylon) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-17 12:41 +0000
        Re: Oh look, another language (ceylon) Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-11-18 11:33 +1300
          Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-18 11:42 +1100
          Re: Oh look, another language (ceylon) Tim Daneliuk <tundra@tundraware.com> - 2013-11-17 16:48 -0600
            Re: Oh look, another language (ceylon) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-18 23:51 +0000
              Re: Oh look, another language (ceylon) Tim Daneliuk <tundra@tundraware.com> - 2013-11-18 18:31 -0600
      Re: Oh look, another language (ceylon) Rick Johnson <rantingrickjohnson@gmail.com> - 2013-11-17 16:18 -0800
        Re: Oh look, another language (ceylon) Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-11-18 19:45 +1300
          Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-18 17:56 +1100
            Re: Oh look, another language (ceylon) wxjmfauth@gmail.com - 2013-11-18 01:44 -0800
              Re: Oh look, another language (ceylon) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-18 09:56 +0000
              Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-18 21:04 +1100
                Re: Oh look, another language (ceylon) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-18 13:31 +0000
                  Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 00:39 +1100
                  Re: Oh look, another language (ceylon) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-18 14:30 +0000
                    Re: Oh look, another language (ceylon) Dave Angel <davea@davea.name> - 2013-11-18 15:37 -0500
                      Re: Oh look, another language (ceylon) Steven D'Aprano <steve@pearwood.info> - 2013-11-19 02:29 +0000
                    Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 10:25 +1100
                      Re: Oh look, another language (ceylon) Steven D'Aprano <steve@pearwood.info> - 2013-11-19 02:13 +0000
                        Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 13:54 +1100
                        Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 13:56 +1100
                  Re: Oh look, another language (ceylon) wxjmfauth@gmail.com - 2013-11-19 01:10 -0800
                    Re: Oh look, another language (ceylon) Bob Martin <bob.martin@excite.com> - 2013-11-20 08:19 +0000
              Re: Oh look, another language (ceylon) Ian Kelly <ian.g.kelly@gmail.com> - 2013-11-18 05:29 -0700
              Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-18 23:36 +1100
                Re: Oh look, another language (ceylon) Piet van Oostrum <piet@vanoostrum.org> - 2013-11-18 10:31 -0400
                  Re: Oh look, another language (ceylon) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-18 15:06 +0000
          Re: Oh look, another language (ceylon) Rick Johnson <rantingrickjohnson@gmail.com> - 2013-11-18 19:33 -0800
            Re: Oh look, another language (ceylon) Steven D'Aprano <steve@pearwood.info> - 2013-11-19 07:00 +0000
              Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 18:18 +1100
              Re: Oh look, another language (ceylon) Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-11-20 18:25 +1300
    Re: Oh look, another language (ceylon) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-18 14:56 +0000



#59331 — Oh look, another language (ceylon)

From: Neal Becker <ndbecker2@gmail.com>
Date: 2013-11-13 14:33 -0500
Subject: Oh look, another language (ceylon)
Message-ID: <mailman.2549.1384371222.18130.python-list@python.org>
http://ceylon-lang.org/documentation/1.0/introduction/



#59685

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2013-11-17 16:41 +1300
Message-ID: <beqs6jF6ojmU1@mid.individual.net>
In reply to: #59331
Neal Becker wrote:
> http://ceylon-lang.org/documentation/1.0/introduction/

The type system looks very interesting!

It's just a pity they based the syntax on C rather
than something more enlightened. (Why do people
keep doing that when they design languages?)

-- 
Greg



#59689

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-11-17 15:10 +1100
Message-ID: <mailman.2762.1384661427.18130.python-list@python.org>
In reply to: #59685
On Sun, Nov 17, 2013 at 2:41 PM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> Neal Becker wrote:
>>
>> http://ceylon-lang.org/documentation/1.0/introduction/
>
>
> The type system looks very interesting!
>
> It's just a pity they based the syntax on C rather
> than something more enlightened. (Why do people
> keep doing that when they design languages?)

Because in many ways it's an excellent syntactic structure, and - more
importantly - it's one that's familiar to a huge number of
programmers. That's pretty valuable.

ChrisA



#59697

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-11-17 05:48 +0000
Message-ID: <528858ca$0$29975$c3e8da3$5496439d@news.astraweb.com>
In reply to: #59685
On Sun, 17 Nov 2013 16:41:07 +1300, Gregory Ewing wrote:

> Neal Becker wrote:
>> http://ceylon-lang.org/documentation/1.0/introduction/
> 
> The type system looks very interesting!
> 
> It's just a pity they based the syntax on C rather than something more
> enlightened. (Why do people keep doing that when they design languages?)


When the only tool you've used is a hammer, every tool you design ends up 
looking like a hammer.


-- 
Steven



#59702

From: jkn <jkn_gg@nicorp.f9.co.uk>
Date: 2013-11-17 00:34 -0800
Message-ID: <90f61a64-3afe-4934-9851-9a41c95b30fd@googlegroups.com>
In reply to: #59697
Hi Steven

On Sunday, 17 November 2013 05:48:58 UTC, Steven D'Aprano  wrote:

> [...]

> 
> > It's just a pity they based the syntax on C rather than something more
> > enlightened. (Why do people keep doing that when they design languages?)
> 
> 
> When the only tool you've used is a hammer, every tool you design ends up 
> looking like a hammer.
> 
> 

true, and yet ... if [I] were to design a hammer, would you be justified in assuming that that is the only tool I know about?

    J^n



#59716

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-11-17 12:41 +0000
Message-ID: <mailman.2771.1384692309.18130.python-list@python.org>
In reply to: #59685
On 17/11/2013 03:41, Gregory Ewing wrote:
> Neal Becker wrote:
>> http://ceylon-lang.org/documentation/1.0/introduction/
>
> The type system looks very interesting!
>
> It's just a pity they based the syntax on C rather
> than something more enlightened. (Why do people
> keep doing that when they design languages?)
>

As a rule of thumb people don't like change?  This obviously assumes 
that language designers are people :)

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



#59811

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2013-11-18 11:33 +1300
Message-ID: <besuicFk82tU1@mid.individual.net>
In reply to: #59716
Mark Lawrence wrote:

> As a rule of thumb people don't like change?  This obviously assumes 
> that language designers are people :)

That's probably true (on both counts).

I guess this means we need to encourage more
Pythoneers to become language designers!

-- 
Greg



#59818

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-11-18 11:42 +1100
Message-ID: <mailman.2811.1384735375.18130.python-list@python.org>
In reply to: #59811
On Mon, Nov 18, 2013 at 9:33 AM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> Mark Lawrence wrote:
>
>> As a rule of thumb people don't like change?  This obviously assumes that
>> language designers are people :)
>
>
> That's probably true (on both counts).
>
> I guess this means we need to encourage more
> Pythoneers to become language designers!

Easy! Just make Python really bad in every way except syntax. Then
people will be constantly thinking "If only Python were more X and
less Y... great syntax but the language sucks in so many ways!" and
they'll borrow the syntax into their new languages.

If you're setting out to create a new language, you probably want it
to be "Foo, except X" for some Foo and X. So you'll keep everything
about Foo that doesn't conflict with your changes. I would expect to
see Python-like syntax in a language that's designed to be "Python,
except compilable to C for performance"... and whaddayaknow, Cython
fits that description. Thing is, Python is just so much better than
(C, C#, JavaScript, Java) that there's hardly as much impetus to
create a new language.

ChrisA



#59917

From: Tim Daneliuk <tundra@tundraware.com>
Date: 2013-11-17 16:48 -0600
Message-ID: <528947A8.1010705@tundraware.com>
In reply to: #59811
On 11/17/2013 04:33 PM, Gregory Ewing wrote:
> Mark Lawrence wrote:
>
>> As a rule of thumb people don't like change?  This obviously assumes that language designers are people :)
>
> That's probably true (on both counts).
>
> I guess this means we need to encourage more
> Pythoneers to become language designers!
>

Ahem, I already commented on this in some detail:

    https://mail.python.org/pipermail/python-list/2004-September/241055.html

-- 
----------------------------------------------------------------------------
Tim Daneliuk     tundra@tundraware.com
PGP Key:         http://www.tundraware.com/PGP/



#59921

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-11-18 23:51 +0000
Message-ID: <mailman.2870.1384818713.18130.python-list@python.org>
In reply to: #59917
On 17/11/2013 22:48, Tim Daneliuk wrote:
> On 11/17/2013 04:33 PM, Gregory Ewing wrote:
>> Mark Lawrence wrote:
>>
>>> As a rule of thumb people don't like change?  This obviously assumes
>>> that language designers are people :)
>>
>> That's probably true (on both counts).
>>
>> I guess this means we need to encourage more
>> Pythoneers to become language designers!
>>
>
> Ahem, I already commented on this in some detail:
>
>
> https://mail.python.org/pipermail/python-list/2004-September/241055.html
>

Fantastic, very promising indeed.  I know it needs bringing up to date, 
but to make it fly can I safely assume that we'll be seeing a PEP fairly 
shortly?

As an aside, I noticed that the previous message was "negative stride 
list slices", why do I have a strong sense of deja vu?

I refuse to mention another message that I noticed whilst browsing, on 
the grounds that I don't want to be accused of multiple manslaughter by 
way of causing heart attacks :)

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



#59928

From: Tim Daneliuk <tundra@tundraware.com>
Date: 2013-11-18 18:31 -0600
Message-ID: <u3oqla-3jh2.ln1@ozzie.tundraware.com>
In reply to: #59921
On 11/18/2013 05:51 PM, Mark Lawrence wrote:
> can I safely assume that we'll be seeing a PEP fairly shortly?


For Immediate Press Release:


We at TundraWare are now entering our 10th year of debate in the YAPDL
design as to what ought to be a statement and what ought to be a function.
The Statementists are currently winning 3 bouts to 2 over the
Functionists but there is much more gnashing of teeth and wringing of
hands to come. We remain true to the original vision of the language as
an unwanted appendage to Python which will promote fractionalisation and
thus improve opportunity for future billings.

We are also contemplating an offshoot language that melds the best of Java
into YAPDL.  Known as JAPDL ("Jah.piddle") it is targeted particularly
to Rastafari programmers worldwide.  The primary contribution of JAPDL
to the language arts is the replacement of the GIL (Global Interpreter Lock)
with the much simpler DR (Dread Lock).
----------------------------------------------------------------------------
Tim Daneliuk     tundra@tundraware.com
PGP Key:         http://www.tundraware.com/PGP/



#59816

From: Rick Johnson <rantingrickjohnson@gmail.com>
Date: 2013-11-17 16:18 -0800
Message-ID: <1f0ffad0-f9b1-4154-b048-510d8e38846e@googlegroups.com>
In reply to: #59685
On Saturday, November 16, 2013 9:41:07 PM UTC-6, Gregory Ewing wrote:
> The type system looks very interesting!

Indeed.

I went to the site assuming this would be another language
that i would never like, however, after a few minutes
reading the tour, i could not stop!

I read through the entire tour with excitement, all the while
actually yelling; "yes" and sometimes even "yes, yes, YES"

But not only is the language interesting, the web site
itself is phenomenal! This is a fine example of twenty first
century design at work.

I've always found the Python web site to be a cluttered
mess, but ceylon-lang.org is just the opposite! A clean and
simplistic web site with integrated console fiddling --
heck, they even took the time to place a button near every
example!

Some of the aspects of Ceylon's syntax i find interesting are:

    Instead of using single, double, and triple quotes to
    basically represent the same literals ceylon decided to
    implement each uniquely. Also, back-tick interpolation
    and Unicode embedding is much more elegant!
    
    The use of a post-fix question mark to denote a
    declared Type that can optionally be null.

    The ceylon designers ACTUALLY understand what the
    word "variable" means!

    Immutable attributes, yes, yes, YES!

    The multiplication operator can ONLY be used on
    numerics. Goodbye subtle bug!

    Explicit "return" required in methods/functions!

    No "default initialization to null"

    No omitting braces in control structures
    (Consistency is the key!!!)

    The assert statement is much more useful than
    Python's

    The "tagging" of iterable types using regexp
    inspired syntax "*" and "+" is an interesting idea

    Conditional logic is both concise and explicit using
    "exists" and "nonempty" over the implicit "if value:"

    Range objects are vastly superior to Python's lowly
    range() func.

    Comprehensions are ordered more logically than
    Python IMO, since i want to know where i'm looking
    BEFORE i find out what will be the return value


        Ceylon: [for (p in people) p.name]
        Python: [p.name for p in people]
        Ruby: people.collect{|p| p.name}

        Ceylon: [for (i in 0..100) if (i%3==0) i]
        Python: [i for i in range(101) if i%3==0]
        Ruby: (0..100).select{|x| x%3==0}

        Funny thing is, out of all three languages,
        Ruby's syntax is linear and therefore
        easiest to read. Ruby is the language i
        WANT to love but i can't :( due to too many
        inconsistencies. But this example shines!
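
For comparison, here is a runnable Python sketch of both comprehensions. The Person tuple and the sample names are made up for illustration, and range(101) is used on the assumption that Ceylon's 0..100 includes both endpoints.

```python
from collections import namedtuple

# Hypothetical record type and data, only for illustration.
Person = namedtuple("Person", ["name"])
people = [Person("Ada"), Person("Guido"), Person("Grace")]

# Projection: one output value per input element.
names = [p.name for p in people]

# Filter: keep the multiples of 3 from 0 through 100 inclusive.
multiples_of_three = [i for i in range(101) if i % 3 == 0]

print(names)
print(multiples_of_three[:5], len(multiples_of_three))
```

In both cases the output expression comes first and the source of the data second, which is the ordering being criticised here.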

> It's just a pity they based the syntax on C rather
> than something more enlightened. (Why do people
> keep doing that when they design languages?)

What do you have in mind?

Please elaborate because we could use a good intelligent
conversation, instead of rampant troll posts.



#59836

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2013-11-18 19:45 +1300
Message-ID: <betrckFpdk9U1@mid.individual.net>
In reply to: #59816
Rick Johnson wrote:
>     The multiplication operator can ONLY be used on
>     numerics.

I'm not convinced about that part. I notice that
subtraction, multiplication and division are bundled
into a single interface Numeric, but there is a
separate one called Summable for addition --
apparently so that they could use + for string
concatenation.

This seems to be a case of one rule for the language
designers and a different one for everyone else.
If it's okay for '+' to be used on something that's
not a number, why not '*'?
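
Python, for comparison, defines both operators on strings. The short sketch below shows that, along with the kind of surprise a numeric-only '*' would presumably rule out; the variable names are arbitrary.

```python
# In Python both '+' and '*' are defined for str, so neither is numeric-only.
greeting = "ab" + "cd"      # concatenation
repeated = "ab" * 3         # repetition

# A number that arrives as text is repeated instead of multiplied
# unless it is converted first.
quantity = "3"              # e.g. read from user input, never converted
as_text = quantity * 2
as_number = int(quantity) * 2

print(greeting, repeated, as_text, as_number)
```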

-- 
Greg



#59837

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-11-18 17:56 +1100
Message-ID: <mailman.2823.1384757801.18130.python-list@python.org>
In reply to: #59836
On Mon, Nov 18, 2013 at 5:45 PM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> Rick Johnson wrote:
>>
>>     The multiplication operator can ONLY be used on
>>     numerics.
>
>
> I'm not convinced about that part. I notice that
> subtraction, multiplication and division are bundled
> into a single interface Numeric, but there is a
> separate one called Summable for addition --
> apparently so that they could use + for string
> concatenation.
>
> This seems to be a case of one rule for the language
> designers and a different one for everyone else.
> If it's okay for '+' to be used on something that's
> not a number, why not '*'?

That's something Java did (using + for strings, but not supporting
operator overloading for custom classes, so you can't make your own
string-like or number-like class and use + with it), and IMO it's one
of the language's annoying flaws. Give people the power to use
whatever operator they choose in whatever way they choose, and accept
that occasionally you'll get less-than-stellar usage. It's a cost that
you pay happily when you let people name their own functions; why not
give the same freedom for operators?
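
As a minimal sketch of what that freedom looks like in Python, a user-defined class can give '+' and '*' its own meaning through the `__add__` and `__mul__` special methods. The Vec2 class is hypothetical and exists only for this example.

```python
class Vec2:
    """Hypothetical 2-D vector type, used only to illustrate overloading."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Vec2 + Vec2: component-wise sum.
        if not isinstance(other, Vec2):
            return NotImplemented
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        # Vec2 * number: scale both components.
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vec2(self.x * scalar, self.y * scalar)

    def __eq__(self, other):
        return isinstance(other, Vec2) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"


total = Vec2(1, 2) + Vec2(3, 4)
scaled = Vec2(1, 2) * 3
print(total, scaled)
```

Returning NotImplemented for unsupported operands lets Python raise the usual TypeError, so a questionable use fails loudly rather than silently.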

ChrisA



#59843

From: wxjmfauth@gmail.com
Date: 2013-11-18 01:44 -0800
Message-ID: <41f332dd-1c31-4699-9176-7e8589f9c8ae@googlegroups.com>
In reply to: #59837
character
Satisfied Interfaces: Comparable<Character>, Enumerable<Character>, Ordinal<Other>
A 32-bit Unicode character.
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>


string
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>
A string of characters. Each character in the string is a 32-bit Unicode
character. The internal UTF-16 encoding is hidden from clients.
A string is a Category of its Characters, and of its substrings:


Clean. Far, far away from a Unicode handling which may require
18 bytes (!) more to encode a non-ASCII n-char string than an
ASCII n-char string.
(With performance expectedly following the same logic "globally".)

>>> sys.getsizeof('a')
26
>>> sys.getsizeof('\U0001d11e')
44


jmf



#59845

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-11-18 09:56 +0000
Message-ID: <mailman.2829.1384768579.18130.python-list@python.org>
In reply to: #59843
On 18/11/2013 09:44, wxjmfauth@gmail.com wrote:
> character
> Satisfied Interfaces: Comparable<Character>, Enumerable<Character>, Ordinal<Other>
> A 32-bit Unicode character.
> Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
> Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
> List<Character>, Ranged<Integer,String>, Summable<String>
>
>
> string
> Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
> Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
> List<Character>, Ranged<Integer,String>, Summable<String>
> A string of characters. Each character in the string is a 32-bit Unicode
> character. The internal UTF-16 encoding is hidden from clients.
> A string is a Category of its Characters, and of its substrings:
>
>
> Clean. Far, far away from a unicode handling which may require
> 18 bytes (!) more to encode a non ascii n-chars string than a
> ascii n-chars string.
> (With performances following expectedly "globally" the same logic)
>
>>>> sys.getsizeof('a')
> 26
>>>> sys.getsizeof('\U0001d11e')
> 44
>
>
> jmf
>

In [3]: sys.getsizeof(1)
Out[3]: 14

What a disaster, 13 bytes wasted storing 1.  I'll just rush off to the 
bug tracker and raise an issue to get the entire CPython core rewritten 
before Armageddon strikes.
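
The difference between fixed per-object overhead and per-character cost can be measured rather than argued. This is a sketch that assumes CPython 3.3 or later, where the flexible string representation applies. Absolute sizes vary between builds, so only differences are compared, and the helper name is made up.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate the per-character cost by comparing two string lengths.

    Subtracting the two sizes cancels out the fixed object header.
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) / n

ascii_cost = bytes_per_char("a")            # Latin-1 range
bmp_cost = bytes_per_char("\u20ac")         # rest of the BMP
astral_cost = bytes_per_char("\U0001d11e")  # supplementary planes

print(ascii_cost, bmp_cost, astral_cost)
```

The one-character figures quoted above are dominated by the header, which is why they say little about longer strings.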

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



#59847

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-11-18 21:04 +1100
Message-ID: <mailman.2831.1384769090.18130.python-list@python.org>
In reply to: #59843
On Mon, Nov 18, 2013 at 8:44 PM,  <wxjmfauth@gmail.com> wrote:
> string
> Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
> Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
> List<Character>, Ranged<Integer,String>, Summable<String>
> A string of characters. Each character in the string is a 32-bit Unicode
> character. The internal UTF-16 encoding is hidden from clients.
> A string is a Category of its Characters, and of its substrings:

I'm trying to figure this out. Reading the docs hasn't answered this.
If each character in a string is a 32-bit Unicode character, and (as
can be seen in the examples) string indexing and slicing are
supported, then does string indexing mean counting from the beginning
to see if there were any surrogate pairs?
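
If the storage really is UTF-16, indexing by character would look roughly like the sketch below: walk the 16-bit code units from the start and treat each surrogate pair as one character. This is an illustration in Python, not Ceylon's actual implementation, and both function names are made up.

```python
import struct

def utf16_units(text):
    """Encode a Python str to a tuple of UTF-16 code units."""
    raw = text.encode("utf-16-le")
    return struct.unpack("<%dH" % (len(raw) // 2), raw)

def code_point_at(units, index):
    """Return the code point at character position `index`.

    The walk is O(n) because a surrogate pair occupies two units
    but counts as one character.
    """
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0
    i = 0
    while i < len(units):
        unit = units[i]
        is_pair = (0xD800 <= unit <= 0xDBFF and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if pos == index:
            if is_pair:
                return 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            return unit
        i += 2 if is_pair else 1
        pos += 1
    raise IndexError("string index out of range")

units = utf16_units("a\U0001d11eb")
print(len(units), [hex(code_point_at(units, k)) for k in range(3)])
```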

ChrisA



#59862

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-11-18 13:31 +0000
Message-ID: <528a16b5$0$29992$c3e8da3$5496439d@news.astraweb.com>
In reply to: #59847
On Mon, 18 Nov 2013 21:04:41 +1100, Chris Angelico wrote:

> On Mon, Nov 18, 2013 at 8:44 PM,  <wxjmfauth@gmail.com> wrote:
>> string
>> Satisfied Interfaces: Category, Cloneable<List<Element>>,
>> Collection<Element>, Comparable<String>,
>> Correspondence<Integer,Element>, Iterable<Element,Null>,
>> List<Character>, Ranged<Integer,String>, Summable<String> A string of
>> characters. Each character in the string is a 32-bit Unicode character.
>> The internal UTF-16 encoding is hidden from clients. A string is a
>> Category of its Characters, and of its substrings:
> 
> I'm trying to figure this out. Reading the docs hasn't answered this. If
> each character in a string is a 32-bit Unicode character, and (as can be
> seen in the examples) string indexing and slicing are supported, then
> does string indexing mean counting from the beginning to see if there
> were any surrogate pairs?

I can't figure out what that means, since it contradicts itself. First it 
says *every* character is 32-bits (presumably UTF-32), then it says that 
internally it uses UTF-16. At least one of these statements is wrong. 
(They could both be wrong, but they can't both be right.)

Unless they have done something *really* clever, the language designers 
lose a hundred million points for screwing up text strings. There is 
*absolutely no excuse* for a new, modern language with no backwards 
compatibility concerns to choose one of the three bad choices:

* choose UTF-16 or UTF-8, and have O(n) primitive string operations (like 
Haskell and, apparently, Ceylon);

* choose UTF-16 without support for the supplementary planes (which makes 
it virtually UCS-2), like JavaScript;

* choose UTF-32, and use two or four times as much memory as needed.
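
The memory side of this trade-off is easy to measure in Python by encoding the same text three ways. The sample strings are arbitrary and the helper name is made up.

```python
samples = {
    "ascii": "hello world",
    "bmp": "\u03b1\u03b2\u03b3",          # Greek letters, inside the BMP
    "astral": "\U0001d11e\U0001d122",     # musical symbols, outside the BMP
}

def encoded_sizes(text):
    """Byte counts for the same text in three encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),   # -le avoids counting a BOM
        "utf-32": len(text.encode("utf-32-le")),
    }

for label, text in samples.items():
    print(label, len(text), encoded_sizes(text))
```

For ASCII-only text UTF-32 costs four times as much as UTF-8 and twice as much as UTF-16, which is where the "two or four times" figure comes from.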


-- 
Steven



#59865

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-11-19 00:39 +1100
Message-ID: <mailman.2843.1384781959.18130.python-list@python.org>
In reply to: #59862
On Tue, Nov 19, 2013 at 12:31 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Unless they have done something *really* clever, the language designers
> lose a hundred million points for screwing up text strings. There is
> *absolutely no excuse* for a new, modern language with no backwards
> compatibility concerns to choose one of the three bad choices:

Yeah, but this compiles to JS, so it does have that backward compat
issue - unless it's going to represent a Ceylon string as something
other than a JS string (maybe an array of integers??), which would
probably cost even more.

You're absolutely right, except in the premise that Ceylon is a new
and unshackled language. At least this way, if anyone actually
implements Ceylon directly in the browser, it can use something
smarter as its backend, without impacting code in any way (other than
performance). I'd much rather they go for O(n) string primitives than
maintaining the user-visible UTF-16 bug.
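
The user-visible bug in question is that a character outside the BMP shows up as two "characters". Python 3.3+ strings do not behave this way, so this sketch emulates code-unit semantics by encoding to UTF-16 first. The helper name is made up.

```python
def utf16_length(text):
    """Length as a UTF-16-based string type would report it (code units)."""
    return len(text.encode("utf-16-le")) // 2

clef = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP

code_points = len(clef)           # what the programmer expects
code_units = utf16_length(clef)   # what code-unit indexing exposes

print(code_points, code_units)
```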

ChrisA



#59873

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-11-18 14:30 +0000
Message-ID: <528a249e$0$29992$c3e8da3$5496439d@news.astraweb.com>
In reply to: #59862
On Mon, 18 Nov 2013 13:31:33 +0000, Steven D'Aprano wrote:

> On Mon, 18 Nov 2013 21:04:41 +1100, Chris Angelico wrote:
> 
>> On Mon, Nov 18, 2013 at 8:44 PM,  <wxjmfauth@gmail.com> wrote:
>>> string
>>> Satisfied Interfaces: Category, Cloneable<List<Element>>,
>>> Collection<Element>, Comparable<String>,
>>> Correspondence<Integer,Element>, Iterable<Element,Null>,
>>> List<Character>, Ranged<Integer,String>, Summable<String> A string of
>>> characters. Each character in the string is a 32-bit Unicode
>>> character. The internal UTF-16 encoding is hidden from clients. A
>>> string is a Category of its Characters, and of its substrings:
>> 
>> I'm trying to figure this out. Reading the docs hasn't answered this.
>> If each character in a string is a 32-bit Unicode character, and (as
>> can be seen in the examples) string indexing and slicing are supported,
>> then does string indexing mean counting from the beginning to see if
>> there were any surrogate pairs?
> 
> I can't figure out what that means, since it contradicts itself. First
> it says *every* character is 32-bits (presumably UTF-32), then it says
> that internally it uses UTF-16. At least one of these statements is
> wrong. (They could both be wrong, but they can't both be right.)

Mystery solved: characters are only 32-bits in isolation, when plucked 
out of a string.

http://ceylon-lang.org/documentation/tour/language-module/#characters_and_character_strings

Ceylon strings are arrays of UTF-16 code units. However, the language 
supports characters in the supplementary planes by having primitive 
string operations walk the string a code point at a time. When you 
extract a character out of the string, Ceylon gives you four bytes. 
Presumably, if you do something like this:

# Python syntax, not Ceylon
mystring = "a\U0010FFFF"
c = mystring[0]
d = mystring[1]

c will consist of bytes 0000 0061 and d will consist of the surrogate 
pair DBFF DFFF (the UTF-16BE encoding of code point U+10FFFF, modulo 
big-endian versus little-endian). Or possibly the UTF-32 encoding, 0010 FFFF.
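
These byte values can be checked in Python itself. The big-endian codecs are used so that no byte-order mark is emitted; this only confirms the encodings and says nothing about what Ceylon actually stores.

```python
mystring = "a\U0010FFFF"
c = mystring[0]
d = mystring[1]

c_utf32 = c.encode("utf-32-be").hex()   # four bytes for 'a'
d_utf16 = d.encode("utf-16-be").hex()   # surrogate pair for U+10FFFF
d_utf32 = d.encode("utf-32-be").hex()   # single 32-bit value

print(c_utf32, d_utf16, d_utf32)
```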

I suppose that's not terrible, except for the O(n) string operations, 
which are just dumb. Yes, it's better than buggy, broken strings. But 
still dumb, because those aren't the only choices. For example, for the 
sake of an extra two bytes at the start of each string, they could store 
a flag and a length:

- one bit to flag whether the string contained any surrogate pairs or 
not; if not, string ops could assume two-bytes per char and be O(1), if 
the flag was set it could fall back to the slower technique;

- 15 bits for a length.

15 bits give you a maximum length of 32767. There are ways around that. 
E.g. a length of 0 through 32766 means exactly what it says; a length of 
32767 means that the next four bytes hold the actual length, giving 
you a maximum of 4294967295 characters per string. That's an 8GB string. 
Surely big enough for anyone :-)
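
One possible layout for such a length prefix, as a sketch. It assumes big-endian fields, keeps the surrogate flag in the top bit of the first 16-bit word, and assumes the escape value 32767 is followed by a four-byte field, which is what a maximum of 4294967295 needs. The function names are made up.

```python
import struct

ESCAPE = 0x7FFF  # 15-bit value meaning "real length follows"

def pack_header(length, has_surrogates):
    """Pack the flag bit and length into 2 bytes, or 6 for long strings."""
    if not 0 <= length <= 0xFFFFFFFF:
        raise ValueError("length out of range")
    flag = 0x8000 if has_surrogates else 0
    if length < ESCAPE:
        return struct.pack(">H", flag | length)
    return struct.pack(">HI", flag | ESCAPE, length)

def unpack_header(data):
    """Return (length, has_surrogates, header_size) from packed bytes."""
    (word,) = struct.unpack_from(">H", data, 0)
    has_surrogates = bool(word & 0x8000)
    length = word & ESCAPE
    if length < ESCAPE:
        return length, has_surrogates, 2
    (length,) = struct.unpack_from(">I", data, 2)
    return length, has_surrogates, 6

short = pack_header(5, False)
long_ = pack_header(100000, True)
print(len(short), unpack_header(short))
print(len(long_), unpack_header(long_))
```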

That gives you O(1) length for *any* string, and O(1) indexing operations 
for those that are entirely in the BMP, which will be most strings for 
most people. It's not 1970 anymore, it's time for strings to be treated 
more seriously and not just as dumb arrays of char. Even back in the 
1970s Pascal had a length byte. It astonishes me that hardly any low-
level language follows their lead.
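
A rough Python sketch of that idea: store the UTF-16 code units together with a precomputed character count and a has-surrogates flag, index directly when the flag is clear, and fall back to a linear walk otherwise. The class name and layout are invented for illustration and are not taken from Ceylon or any other implementation.

```python
import struct

class FlaggedUTF16:
    """UTF-16 storage plus a cached length and a surrogate flag."""

    def __init__(self, text):
        raw = text.encode("utf-16-le")
        self.units = struct.unpack("<%dH" % (len(raw) // 2), raw)
        self.length = len(text)  # code points, computed once
        # More units than code points means at least one surrogate pair.
        self.has_surrogates = self.length != len(self.units)

    def __len__(self):
        return self.length  # O(1) for any string

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError("string index out of range")
        if not self.has_surrogates:
            return chr(self.units[index])  # O(1): one unit per character
        # Slow path: walk the units, counting a surrogate pair as one.
        i = 0
        for _ in range(index):
            i += 2 if 0xD800 <= self.units[i] <= 0xDBFF else 1
        unit = self.units[i]
        if 0xD800 <= unit <= 0xDBFF:
            low = self.units[i + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)

bmp_only = FlaggedUTF16("abc")
mixed = FlaggedUTF16("a\U0010FFFFb")
print(bmp_only.has_surrogates, bmp_only[1])
print(mixed.has_surrogates, len(mixed), hex(ord(mixed[1])), mixed[2])
```

Only strings that actually contain a supplementary-plane character pay for the slow path.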



-- 
Steven


