Groups > comp.lang.python > #59331 > unrolled thread
| Started by | Neal Becker <ndbecker2@gmail.com> |
|---|---|
| First post | 2013-11-13 14:33 -0500 |
| Last post | 2013-11-18 14:56 +0000 |
| Articles | 20 on this page of 37 — 14 participants |
Oh look, another language (ceylon) Neal Becker <ndbecker2@gmail.com> - 2013-11-13 14:33 -0500
Re: Oh look, another language (ceylon) Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-11-17 16:41 +1300
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-17 15:10 +1100
Re: Oh look, another language (ceylon) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-17 05:48 +0000
Re: Oh look, another language (ceylon) jkn <jkn_gg@nicorp.f9.co.uk> - 2013-11-17 00:34 -0800
Re: Oh look, another language (ceylon) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-17 12:41 +0000
Re: Oh look, another language (ceylon) Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-11-18 11:33 +1300
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-18 11:42 +1100
Re: Oh look, another language (ceylon) Tim Daneliuk <tundra@tundraware.com> - 2013-11-17 16:48 -0600
Re: Oh look, another language (ceylon) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-18 23:51 +0000
Re: Oh look, another language (ceylon) Tim Daneliuk <tundra@tundraware.com> - 2013-11-18 18:31 -0600
Re: Oh look, another language (ceylon) Rick Johnson <rantingrickjohnson@gmail.com> - 2013-11-17 16:18 -0800
Re: Oh look, another language (ceylon) Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-11-18 19:45 +1300
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-18 17:56 +1100
Re: Oh look, another language (ceylon) wxjmfauth@gmail.com - 2013-11-18 01:44 -0800
Re: Oh look, another language (ceylon) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-18 09:56 +0000
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-18 21:04 +1100
Re: Oh look, another language (ceylon) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-18 13:31 +0000
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 00:39 +1100
Re: Oh look, another language (ceylon) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-18 14:30 +0000
Re: Oh look, another language (ceylon) Dave Angel <davea@davea.name> - 2013-11-18 15:37 -0500
Re: Oh look, another language (ceylon) Steven D'Aprano <steve@pearwood.info> - 2013-11-19 02:29 +0000
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 10:25 +1100
Re: Oh look, another language (ceylon) Steven D'Aprano <steve@pearwood.info> - 2013-11-19 02:13 +0000
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 13:54 +1100
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 13:56 +1100
Re: Oh look, another language (ceylon) wxjmfauth@gmail.com - 2013-11-19 01:10 -0800
Re: Oh look, another language (ceylon) Bob Martin <bob.martin@excite.com> - 2013-11-20 08:19 +0000
Re: Oh look, another language (ceylon) Ian Kelly <ian.g.kelly@gmail.com> - 2013-11-18 05:29 -0700
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-18 23:36 +1100
Re: Oh look, another language (ceylon) Piet van Oostrum <piet@vanoostrum.org> - 2013-11-18 10:31 -0400
Re: Oh look, another language (ceylon) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-18 15:06 +0000
Re: Oh look, another language (ceylon) Rick Johnson <rantingrickjohnson@gmail.com> - 2013-11-18 19:33 -0800
Re: Oh look, another language (ceylon) Steven D'Aprano <steve@pearwood.info> - 2013-11-19 07:00 +0000
Re: Oh look, another language (ceylon) Chris Angelico <rosuav@gmail.com> - 2013-11-19 18:18 +1100
Re: Oh look, another language (ceylon) Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-11-20 18:25 +1300
Re: Oh look, another language (ceylon) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-18 14:56 +0000
| From | Neal Becker <ndbecker2@gmail.com> |
|---|---|
| Date | 2013-11-13 14:33 -0500 |
| Subject | Oh look, another language (ceylon) |
| Message-ID | <mailman.2549.1384371222.18130.python-list@python.org> |
http://ceylon-lang.org/documentation/1.0/introduction/
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2013-11-17 16:41 +1300 |
| Message-ID | <beqs6jF6ojmU1@mid.individual.net> |
| In reply to | #59331 |
Neal Becker wrote:
> http://ceylon-lang.org/documentation/1.0/introduction/

The type system looks very interesting!

It's just a pity they based the syntax on C rather
than something more enlightened. (Why do people
keep doing that when they design languages?)

--
Greg
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-17 15:10 +1100 |
| Message-ID | <mailman.2762.1384661427.18130.python-list@python.org> |
| In reply to | #59685 |
On Sun, Nov 17, 2013 at 2:41 PM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
> Neal Becker wrote:
>>
>> http://ceylon-lang.org/documentation/1.0/introduction/
>
>
> The type system looks very interesting!
>
> It's just a pity they based the syntax on C rather
> than something more enlightened. (Why do people
> keep doing that when they design languages?)

Because in many ways it's an excellent syntactic structure, and - more importantly - it's one that's familiar to a huge number of programmers. That's pretty valuable.

ChrisA
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-11-17 05:48 +0000 |
| Message-ID | <528858ca$0$29975$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #59685 |
On Sun, 17 Nov 2013 16:41:07 +1300, Gregory Ewing wrote:
> Neal Becker wrote:
>> http://ceylon-lang.org/documentation/1.0/introduction/
>
> The type system looks very interesting!
>
> It's just a pity they based the syntax on C rather than something more
> enlightened. (Why do people keep doing that when they design languages?)

When the only tool you've used is a hammer, every tool you design ends up looking like a hammer.

--
Steven
| From | jkn <jkn_gg@nicorp.f9.co.uk> |
|---|---|
| Date | 2013-11-17 00:34 -0800 |
| Message-ID | <90f61a64-3afe-4934-9851-9a41c95b30fd@googlegroups.com> |
| In reply to | #59697 |
Hi Stephen
On Sunday, 17 November 2013 05:48:58 UTC, Steven D'Aprano wrote:
> [...]
>
> > It's just a pity they based the syntax on C rather than something more
> > enlightened. (Why do people keep doing that when they design languages?)
>
>
> When the only tool you've used is a hammer, every tool you design ends up
> looking like a hammer.
>
>
true, and yet ... if [I] were to design a hammer, would you be justified in assuming that that is the only tool I know about?
J^n
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-11-17 12:41 +0000 |
| Message-ID | <mailman.2771.1384692309.18130.python-list@python.org> |
| In reply to | #59685 |
On 17/11/2013 03:41, Gregory Ewing wrote:
> Neal Becker wrote:
>> http://ceylon-lang.org/documentation/1.0/introduction/
>
> The type system looks very interesting!
>
> It's just a pity they based the syntax on C rather
> than something more enlightened. (Why do people
> keep doing that when they design languages?)
>

As a rule of thumb people don't like change? This obviously assumes that language designers are people :)

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer
Mark Lawrence
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2013-11-18 11:33 +1300 |
| Message-ID | <besuicFk82tU1@mid.individual.net> |
| In reply to | #59716 |
Mark Lawrence wrote:
> As a rule of thumb people don't like change? This obviously assumes
> that language designers are people :)

That's probably true (on both counts).

I guess this means we need to encourage more
Pythoneers to become language designers!

--
Greg
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-18 11:42 +1100 |
| Message-ID | <mailman.2811.1384735375.18130.python-list@python.org> |
| In reply to | #59811 |
On Mon, Nov 18, 2013 at 9:33 AM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
> Mark Lawrence wrote:
>
>> As a rule of thumb people don't like change? This obviously assumes that
>> language designers are people :)
>
>
> That's probably true (on both counts).
>
> I guess this means we need to encourage more
> Pythoneers to become language designers!

Easy! Just make Python really bad in every way except syntax. Then people will be constantly thinking "If only Python were more X and less Y... great syntax but the language sucks in so many ways!" and they'll borrow the syntax into their new languages.

If you're setting out to create a new language, you probably want it to be "Foo, except X" for some Foo and X. So you'll keep everything about Foo that doesn't conflict with your changes. I would expect to see Python-like syntax in a language that's designed to be "Python, except compilable to C for performance"... and whaddayaknow, Cython fits that description.

Thing is, Python is just so much better than (C, C#, JavaScript, Java) that there's hardly as much impetus to create a new language.

ChrisA
| From | Tim Daneliuk <tundra@tundraware.com> |
|---|---|
| Date | 2013-11-17 16:48 -0600 |
| Message-ID | <528947A8.1010705@tundraware.com> |
| In reply to | #59811 |
On 11/17/2013 04:33 PM, Gregory Ewing wrote:
> Mark Lawrence wrote:
>
>> As a rule of thumb people don't like change? This obviously assumes that language designers are people :)
>
> That's probably true (on both counts).
>
> I guess this means we need to encourage more
> Pythoneers to become language designers!
>
Ahem, I already commented on this in some detail:
https://mail.python.org/pipermail/python-list/2004-September/241055.html
--
----------------------------------------------------------------------------
Tim Daneliuk tundra@tundraware.com
PGP Key: http://www.tundraware.com/PGP/
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-11-18 23:51 +0000 |
| Message-ID | <mailman.2870.1384818713.18130.python-list@python.org> |
| In reply to | #59917 |
On 17/11/2013 22:48, Tim Daneliuk wrote:
> On 11/17/2013 04:33 PM, Gregory Ewing wrote:
>> Mark Lawrence wrote:
>>
>>> As a rule of thumb people don't like change? This obviously assumes
>>> that language designers are people :)
>>
>> That's probably true (on both counts).
>>
>> I guess this means we need to encourage more
>> Pythoneers to become language designers!
>>
>
> Ahem, I already commented on this in some detail:
>
>
> https://mail.python.org/pipermail/python-list/2004-September/241055.html
>

Fantastic, very promising indeed. I know it needs bringing up to date, but to make it fly can I safely assume that we'll be seeing a PEP fairly shortly?

As an aside, I noticed that the previous message was "negative stride list slices", why do I have a strong sense of deja vu? I refuse to mention another message that I noticed whilst browsing, on the grounds that I don't want to be accused of multiple manslaughter by way of causing heart attacks :)

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer
Mark Lawrence
| From | Tim Daneliuk <tundra@tundraware.com> |
|---|---|
| Date | 2013-11-18 18:31 -0600 |
| Message-ID | <u3oqla-3jh2.ln1@ozzie.tundraware.com> |
| In reply to | #59921 |
On 11/18/2013 05:51 PM, Mark Lawrence wrote:
> can I safely assume that we'll be seeing a PEP fairly shortly?
For Immediate Press Release:
We at TundraWare are now entering our 10th year of debate in the YAPDL
design as to what ought to be a statement and what ought to be a function.
The Statementists are currently winning 3 bouts to 2 over the
Functionists but there is much more gnashing of teeth and wringing of
hands to come. We remain true to the original vision of the language as
an unwanted appendage to Python which will promote fractionalisation and
thus improve opportunity for future billings.
We are also contemplating an offshoot language that melds the best of Java
into YAPDL. Known as JAPDL ("Jah.piddle") it is targeted particularly
to Rastafari programmers worldwide. The primary contribution of JAPDL
to the language arts is the replacement of the GIL (Global Interpreter Lock)
with the much simpler, DR (Dread Lock).
----------------------------------------------------------------------------
Tim Daneliuk tundra@tundraware.com
PGP Key: http://www.tundraware.com/PGP/
| From | Rick Johnson <rantingrickjohnson@gmail.com> |
|---|---|
| Date | 2013-11-17 16:18 -0800 |
| Message-ID | <1f0ffad0-f9b1-4154-b048-510d8e38846e@googlegroups.com> |
| In reply to | #59685 |
On Saturday, November 16, 2013 9:41:07 PM UTC-6, Gregory Ewing wrote:
> The type system looks very interesting!
Indeed.
I went to the site assuming this would be another language
that i would never like, however, after a few minutes
reading the tour, i could not stop!
I read through the entire tour with excitement, all the while
actually yelling; "yes" and sometimes even "yes, yes, YES"
But not only is the language interesting, the web site
itself is phenomenal! This is a fine example of twenty first
century design at work.
I've always found the Python web site to be a cluttered
mess, but ceylon-lang.org is just the opposite! A clean and
simplistic web site with integrated console fiddling --
heck, they even took the time to place a button near every
example!
Some of the aspects of Ceylon's syntax i find interesting are:

Instead of using single, double, and triple quotes to
basically represent the same literals ceylon decided to
implement each uniquely. Also, back-tick interpolation
and Unicode embedding is much more elegant!

The use of a post-fix question mark to denote a
declared Type that can optionally be null.

The ceylon designers ACTUALLY understand what the
word "variable" means!

Immutable attributes, yes, yes, YES!

The multiplication operator can ONLY be used on
numerics. Goodbye subtle bug!

Explicit "return" required in methods/functions!

No "default initialization to null"

No omitting braces in control structures
(Consistency is the key!!!)

The assert statement is much more useful than
Python's

The "tagging" of iterable types using regexp
inspired syntax "*" and "+" is an interesting idea

Conditional logic is both concise and explicit using
"exists" and "nonempty" over the implicit "if value:"

Range objects are vastly superior to Python's lowly
range() func.

Comprehensions are ordered more logically than
Python IMO, since i want to know where i'm looking
BEFORE i find out what will be the return value

    Ceylon: [for (p in people) p.name]
    Python: [p.name for p in people]
    Ruby:   people.collect{|p| p.name}

    Ceylon: for (i in 0..100) if (i%3==0) i
    Python: [i for i in range(100) if i%3==0]
    Ruby:   (1..10).select{|x| x%3==0}

Funny thing is, out of all three languages,
Ruby's syntax is linear and therefore
easiest to read. Ruby is the language i
WANT to love but i can't :( due to too many
inconsistencies. But this example shines!
> It's just a pity they based the syntax on C rather
> than something more enlightened. (Why do people
> keep doing that when they design languages?)
What do you have in mind?
Please elaborate because we could use a good intelligent
conversation, instead of rampant troll posts.
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2013-11-18 19:45 +1300 |
| Message-ID | <betrckFpdk9U1@mid.individual.net> |
| In reply to | #59816 |
Rick Johnson wrote:
> The multiplication operator can ONLY be used on
> numerics.

I'm not convinced about that part. I notice that
subtraction, multiplication and division are bundled
into a single interface Numeric, but there is a
separate one called Summable for addition --
apparently so that they could use + for string
concatenation.

This seems to be a case of one rule for the language
designers and a different one for everyone else.
If it's okay for '+' to be used on something that's
not a number, why not '*'?

--
Greg
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-18 17:56 +1100 |
| Message-ID | <mailman.2823.1384757801.18130.python-list@python.org> |
| In reply to | #59836 |
On Mon, Nov 18, 2013 at 5:45 PM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
> Rick Johnson wrote:
>>
>> The multiplication operator can ONLY be used on
>> numerics.
>
>
> I'm not convinced about that part. I notice that
> subtraction, multiplication and division are bundled
> into a single interface Numeric, but there is a
> separate one called Summable for addition --
> apparently so that they could use + for string
> concatenation.
>
> This seems to be a case of one rule for the language
> designers and a different one for everyone else.
> If it's okay for '+' to be used on something that's
> not a number, why not '*'?

That's something Java did (using + for strings, but not supporting operator overloading for custom classes, so you can't make your own string-like or number-like class and use + with it), and IMO it's one of the language's annoying flaws. Give people the power to use whatever operator they choose in whatever way they choose, and accept that occasionally you'll get less-than-stellar usage. It's a cost that you pay happily when you let people name their own functions; why not give the same freedom for operators?

ChrisA
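For comparison, Python takes the permissive route argued for here: both `+` and `*` are open to any class through `__add__` and `__mul__`, and the built-in `str` already uses both. The sketch below is only an illustration; the `Vec` class and the variable names are hypothetical and not taken from the thread or from Ceylon.

```python
class Vec:
    """A hypothetical 2-D vector, used only to illustrate operator overloading."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # '+' on two vectors: component-wise sum.
        return Vec(self.x + other.x, self.y + other.y)

    def __mul__(self, k):
        # '*' with a plain number: scale both components.
        return Vec(self.x * k, self.y * k)

    def __eq__(self, other):
        return isinstance(other, Vec) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vec({self.x}, {self.y})"


# A built-in non-numeric type already uses both operators.
greeting = "ab" + "cd"   # concatenation
ruler = "-" * 5          # repetition

# A user-defined type gets the same freedom.
total = Vec(1, 2) + Vec(3, 4)
scaled = Vec(1, 2) * 3
```

Whether string repetition via `*` counts as a "subtle bug" or a convenience is exactly the disagreement in the posts above; the sketch only shows that the language does not take a side.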
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2013-11-18 01:44 -0800 |
| Message-ID | <41f332dd-1c31-4699-9176-7e8589f9c8ae@googlegroups.com> |
| In reply to | #59837 |
character
Satisfied Interfaces: Comparable<Character>, Enumerable<Character>, Ordinal<Other>
A 32-bit Unicode character.
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>
string
Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
List<Character>, Ranged<Integer,String>, Summable<String>
A string of characters. Each character in the string is a 32-bit Unicode
character. The internal UTF-16 encoding is hidden from clients.
A string is a Category of its Characters, and of its substrings:
Clean. Far, far away from a unicode handling which may require
18 bytes (!) more to encode a non ascii n-chars string than a
ascii n-chars string.
(With performances following expectedly "globally" the same logic)
>>> sys.getsizeof('a')
26
>>> sys.getsizeof('\U0001d11e')
44
jmf
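The two `sys.getsizeof` figures above appear consistent with CPython 3.3's flexible string representation (PEP 393), where a string has a fixed per-object header plus 1, 2 or 4 bytes per character depending on its widest code point. A rough way to separate the header from the per-character cost is sketched below. The helper name `bytes_per_char` is hypothetical, the result is specific to CPython 3.3 or later, and absolute sizes vary by build, so no expected output is shown.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate the marginal storage cost of one extra copy of `ch`.

    Compares a string of n + 1 copies with a string of one copy, so the
    fixed per-object header cancels out of the difference.
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


for label, ch in [("ASCII", "a"), ("BMP", "\u20ac"), ("astral", "\U0001d11e")]:
    # Prints the one-character size and the per-character growth.
    print(label, sys.getsizeof(ch), bytes_per_char(ch))
```

On this reading, most of the difference between the two one-character strings is a one-off header cost, not something that grows with the length of the string.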
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-11-18 09:56 +0000 |
| Message-ID | <mailman.2829.1384768579.18130.python-list@python.org> |
| In reply to | #59843 |
On 18/11/2013 09:44, wxjmfauth@gmail.com wrote:
> character
> Satisfied Interfaces: Comparable<Character>, Enumerable<Character>, Ordinal<Other>
> A 32-bit Unicode character.
> Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
> Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
> List<Character>, Ranged<Integer,String>, Summable<String>
>
>
> string
> Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
> Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
> List<Character>, Ranged<Integer,String>, Summable<String>
> A string of characters. Each character in the string is a 32-bit Unicode
> character. The internal UTF-16 encoding is hidden from clients.
> A string is a Category of its Characters, and of its substrings:
>
>
> Clean. Far, far away from a unicode handling which may require
> 18 bytes (!) more to encode a non ascii n-chars string than a
> ascii n-chars string.
> (With performances following expectedly "globally" the same logic)
>
> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('\U0001d11e')
> 44
>
>
> jmf
>
In [3]: sys.getsizeof(1)
Out[3]: 14
What a disaster, 13 bytes wasted storing 1. I'll just rush off to the
bug tracker and raise an issue to get the entire CPython core rewritten
before Armageddon strikes.
--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer
Mark Lawrence
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-18 21:04 +1100 |
| Message-ID | <mailman.2831.1384769090.18130.python-list@python.org> |
| In reply to | #59843 |
On Mon, Nov 18, 2013 at 8:44 PM, <wxjmfauth@gmail.com> wrote:
> string
> Satisfied Interfaces: Category, Cloneable<List<Element>>, Collection<Element>,
> Comparable<String>, Correspondence<Integer,Element>, Iterable<Element,Null>,
> List<Character>, Ranged<Integer,String>, Summable<String>
> A string of characters. Each character in the string is a 32-bit Unicode
> character. The internal UTF-16 encoding is hidden from clients.
> A string is a Category of its Characters, and of its substrings:

I'm trying to figure this out. Reading the docs hasn't answered this. If each character in a string is a 32-bit Unicode character, and (as can be seen in the examples) string indexing and slicing are supported, then does string indexing mean counting from the beginning to see if there were any surrogate pairs?

ChrisA
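To make the question concrete: if storage is UTF-16 code units but indexing is by code point, a straightforward implementation has to walk from the start and step over surrogate pairs. The sketch below illustrates that general technique in Python; it is not Ceylon's actual implementation, the function names are hypothetical, and it assumes well-formed UTF-16 input.

```python
def utf16_units(text):
    """Return the UTF-16 code units of `text` as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def code_point_at(units, index):
    """Return the code point at code-point position `index`.

    Walks from the beginning, so the cost grows with `index` (O(n)).
    """
    pos = 0  # position measured in 16-bit code units
    for _ in range(index):
        # A high surrogate (0xD800-0xDBFF) opens a two-unit pair.
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    unit = units[pos]
    if 0xD800 <= unit <= 0xDBFF:
        low = units[pos + 1]
        # Combine the high and low surrogates into one code point.
        return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
    return unit


units = utf16_units("a\U0001d11eb")  # 3 code points stored in 4 code units
```

Here `code_point_at(units, 2)` has to inspect the two earlier characters before it can tell which code unit holds the third one, which is the linear cost being asked about.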
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-11-18 13:31 +0000 |
| Message-ID | <528a16b5$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #59847 |
On Mon, 18 Nov 2013 21:04:41 +1100, Chris Angelico wrote:
> On Mon, Nov 18, 2013 at 8:44 PM, <wxjmfauth@gmail.com> wrote:
>> string
>> Satisfied Interfaces: Category, Cloneable<List<Element>>,
>> Collection<Element>, Comparable<String>,
>> Correspondence<Integer,Element>, Iterable<Element,Null>,
>> List<Character>, Ranged<Integer,String>, Summable<String> A string of
>> characters. Each character in the string is a 32-bit Unicode character.
>> The internal UTF-16 encoding is hidden from clients. A string is a
>> Category of its Characters, and of its substrings:
>
> I'm trying to figure this out. Reading the docs hasn't answered this. If
> each character in a string is a 32-bit Unicode character, and (as can be
> seen in the examples) string indexing and slicing are supported, then
> does string indexing mean counting from the beginning to see if there
> were any surrogate pairs?

I can't figure out what that means, since it contradicts itself. First it says *every* character is 32-bits (presumably UTF-32), then it says that internally it uses UTF-16. At least one of these statements is wrong. (They could both be wrong, but they can't both be right.)

Unless they have done something *really* clever, the language designers lose a hundred million points for screwing up text strings. There is *absolutely no excuse* for a new, modern language with no backwards compatibility concerns to choose one of the three bad choices:

* choose UTF-16 or UTF-8, and have O(n) primitive string operations (like Haskell and, apparently, Ceylon);

* or UTF-16 without support for the supplementary planes (which makes it virtually UCS-2), like Javascript;

* choose UTF-32, and use two or four times as much memory as needed.

--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-19 00:39 +1100 |
| Message-ID | <mailman.2843.1384781959.18130.python-list@python.org> |
| In reply to | #59862 |
On Tue, Nov 19, 2013 at 12:31 AM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> Unless they have done something *really* clever, the language designers
> lose a hundred million points for screwing up text strings. There is
> *absolutely no excuse* for a new, modern language with no backwards
> compatibility concerns to choose one of the three bad choices:

Yeah, but this compiles to JS, so it does have that backward compat issue - unless it's going to represent a Ceylon string as something other than a JS string (maybe an array of integers??), which would probably cost even more. You're absolutely right, except in the premise that Ceylon is a new and unshackled language.

At least this way, if anyone actually implements Ceylon directly in the browser, it can use something smarter as its backend, without impacting code in any way (other than performance). I'd much rather they go for O(n) string primitives than maintaining the user-visible UTF-16 bug.

ChrisA
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-11-18 14:30 +0000 |
| Message-ID | <528a249e$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #59862 |
On Mon, 18 Nov 2013 13:31:33 +0000, Steven D'Aprano wrote:
> On Mon, 18 Nov 2013 21:04:41 +1100, Chris Angelico wrote:
>
>> On Mon, Nov 18, 2013 at 8:44 PM, <wxjmfauth@gmail.com> wrote:
>>> string
>>> Satisfied Interfaces: Category, Cloneable<List<Element>>,
>>> Collection<Element>, Comparable<String>,
>>> Correspondence<Integer,Element>, Iterable<Element,Null>,
>>> List<Character>, Ranged<Integer,String>, Summable<String> A string of
>>> characters. Each character in the string is a 32-bit Unicode
>>> character. The internal UTF-16 encoding is hidden from clients. A
>>> string is a Category of its Characters, and of its substrings:
>>
>> I'm trying to figure this out. Reading the docs hasn't answered this.
>> If each character in a string is a 32-bit Unicode character, and (as
>> can be seen in the examples) string indexing and slicing are supported,
>> then does string indexing mean counting from the beginning to see if
>> there were any surrogate pairs?
>
> I can't figure out what that means, since it contradicts itself. First
> it says *every* character is 32-bits (presumably UTF-32), then it says
> that internally it uses UTF-16. At least one of these statements is
> wrong. (They could both be wrong, but they can't both be right.)

Mystery solved: characters are only 32-bits in isolation, when plucked out of a string.

http://ceylon-lang.org/documentation/tour/language-module/#characters_and_character_strings

Ceylon strings are arrays of UTF-16 characters. However, the language supports characters in the Supplementary Multilingual Plane by having primitive string operations walk the string a code point at a time. When you extract a character out of the string, Ceylon gives you four bytes.

Presumably, if you do something like this:

    # Python syntax, not Ceylon
    mystring = "a\U0010FFFF"
    c = mystring[0]
    d = mystring[1]

c will consist of bytes 0000 0061 and d will consist of the surrogate pair DBFF DFFF (the UTF-16BE encoding of code point U+10FFFF, modulo big-endian versus little-endian). Or possibly the UTF-32 encoding, 0010 FFFF.

I suppose that's not terrible, except for the O(n) string operations which is just dumb. Yes, it's better than buggy, broken strings. But still dumb, because those aren't the only choices. For example, for the sake of an extra two bytes at the start of each string, they could store a flag and a length:

- one bit to flag whether the string contained any surrogate pairs or not; if not, string ops could assume two-bytes per char and be O(1), if the flag was set it could fall back to the slower technique;

- 15 bits for a length.

15 bits give you a maximum length of 32767. There are ways around that. E.g. a length of 0 through 32766 means exactly what it says; a length of 32767 means that the next two bytes are part of the length too, giving you a maximum of 4294967295 characters per string. That's an 8GB string. Surely big enough for anyone :-)

That gives you O(1) length for *any* string, and O(1) indexing operations for those that are entirely in the BMP, which will be most strings for most people. It's not 1970 anymore, it's time for strings to be treated more seriously and not just as dumb arrays of char. Even back in the 1970s Pascal had a length byte. It astonishes me that hardly any low-level language follows their lead.

--
Steven
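The flag-plus-length header proposed above can be sketched in a few lines of Python. This is one possible reading of the idea, not a description of any real implementation: the names `pack_header`, `unpack_header` and `ESCAPE` are hypothetical, and the escape case is assumed to be followed by a 4-byte length field (chosen so that the quoted maximum of 4294967295 fits), which differs slightly from the "next two bytes" wording in the post.

```python
import struct

ESCAPE = 0x7FFF  # 15-bit length value meaning "the real length follows"


def pack_header(length, has_surrogates):
    """Encode the 1-bit surrogate flag and the string length, big-endian."""
    flag = 0x8000 if has_surrogates else 0
    if length < ESCAPE:
        # Short form: flag bit plus a 15-bit length in two bytes.
        return struct.pack(">H", flag | length)
    # Long form (assumption): escape marker, then a 32-bit length.
    return struct.pack(">HI", flag | ESCAPE, length)


def unpack_header(buf):
    """Return (length, has_surrogates, header_size_in_bytes)."""
    (first,) = struct.unpack_from(">H", buf, 0)
    has_surrogates = bool(first & 0x8000)
    length = first & 0x7FFF
    if length != ESCAPE:
        return length, has_surrogates, 2
    (length,) = struct.unpack_from(">I", buf, 2)
    return length, has_surrogates, 6


# The encodings mentioned in the post, for code point U+10FFFF.
utf16_pair = "\U0010FFFF".encode("utf-16-be").hex()
utf32_form = "\U0010FFFF".encode("utf-32-be").hex()
```

With such a header, length is read in constant time for any string, and an indexing routine could check `has_surrogates` first and jump straight to offset `2 * index` when the flag is clear, falling back to a linear walk only when it is set.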