comp.lang.python: thread #86311, unrolled
| Started by | pierrick.brihaye@gmail.com |
|---|---|
| First post | 2015-02-24 02:49 -0800 |
| Last post | 2015-02-27 10:23 +1100 |
| Articles | 20 on this page of 158 — 19 participants |
Newbie question about text encoding pierrick.brihaye@gmail.com - 2015-02-24 02:49 -0800
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-24 22:09 +1100
Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-24 06:25 -0500
Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 15:55 +0100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-25 02:03 +1100
Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 16:06 +0100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-02-24 08:01 -0800
Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 16:07 +0100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-25 02:10 +1100
Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 16:24 +0100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-25 02:33 +1100
Re: Newbie question about text encoding random832@fastmail.us - 2015-02-24 10:38 -0500
Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 17:20 +0100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-25 03:24 +1100
Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-24 12:13 -0500
Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 20:45 +0100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-02-25 00:21 +0200
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-25 12:20 +1100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-02-25 06:34 -0800
Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 20:57 +0100
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-25 12:19 +1100
Re: Newbie question about text encoding Marcos Almeida Azevedo <marcos.al.azevedo@gmail.com> - 2015-02-25 12:54 +0800
Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-24 15:41 -0500
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 04:40 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 05:15 -0800
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-27 00:24 +1100
Re: Newbie question about text encoding Sam Raker <sam.raker@gmail.com> - 2015-02-26 08:45 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 09:08 -0800
Re: Newbie question about text encoding Terry Reedy <tjreedy@udel.edu> - 2015-02-26 12:02 -0500
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 09:59 -0800
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-02-26 12:20 -0800
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-27 09:13 +1100
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-27 12:05 +1100
Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-26 20:57 -0500
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-27 16:58 +1100
Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-27 02:30 -0500
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-27 22:54 +1100
Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-27 09:02 -0500
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-28 01:22 +1100
Re: Newbie question about text encoding alister <alister.nospam.ware@ntlworld.com> - 2015-02-27 16:00 +0000
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-28 03:12 +1100
Re: Newbie question about text encoding alister <alister.nospam.ware@ntlworld.com> - 2015-02-27 16:45 +0000
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-28 04:45 +1100
Re: Newbie question about text encoding alister <alister.nospam.ware@ntlworld.com> - 2015-02-27 22:13 +0000
Re: Newbie question about text encoding MRAB <python@mrabarnett.plus.com> - 2015-02-27 19:14 +0000
Re: Newbie question about text encoding alister <alister.nospam.ware@ntlworld.com> - 2015-02-27 22:09 +0000
Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-27 15:52 -0500
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-28 08:04 +1100
Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-27 10:24 -0500
Re: Newbie question about text encoding Grant Edwards <invalid@invalid.invalid> - 2015-02-27 17:46 +0000
Re: Newbie question about text encoding Grant Edwards <invalid@invalid.invalid> - 2015-02-27 17:47 +0000
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-02-27 01:06 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 11:59 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 10:03 -0800
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-03 10:36 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 20:45 -0800
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-04 15:54 +1100
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 21:05 -0800
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-06 01:06 +1100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-05 06:59 -0800
Re: Newbie question about text encoding random832@fastmail.us - 2015-03-05 14:59 -0500
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-06 09:33 +1100
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-05 20:53 -0800
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-06 16:20 +1100
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-06 01:02 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-06 01:06 -0800
Re: Newbie question about text encoding random832@fastmail.us - 2015-03-06 08:33 -0500
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 00:39 +1100
Re: Newbie question about text encoding random832@fastmail.us - 2015-03-06 09:03 -0500
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 01:11 +1100
Re: Newbie question about text encoding random832@fastmail.us - 2015-03-06 09:27 -0500
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-07 03:26 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-06 20:54 +1100
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-06 02:07 -0800
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-07 01:50 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 02:27 +1100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-06 07:37 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-06 08:20 -0800
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 03:45 +1100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-06 11:41 -0800
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-06 11:58 -0800
Re: Newbie question about text encoding Terry Reedy <tjreedy@udel.edu> - 2015-03-07 01:11 -0500
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-06 23:43 -0800
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-07 00:55 -0800
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-07 01:08 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-07 21:25 -0800
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-07 22:09 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 22:33 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 13:53 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 23:02 +1100
Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 14:07 +0000
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-07 07:28 -0800
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-08 02:40 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 17:48 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 03:17 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 18:25 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 03:41 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 18:54 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 03:58 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 04:00 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 19:14 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 04:26 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 19:50 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 04:59 +1100
Re: Newbie question about text encoding Dan Sommers <dan@tombstonezero.net> - 2015-03-07 18:02 +0000
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 05:13 +1100
Re: Newbie question about text encoding Dan Sommers <dan@tombstonezero.net> - 2015-03-07 18:34 +0000
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 05:44 +1100
Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 19:00 +0000
Re: Newbie question about text encoding Dan Sommers <dan@tombstonezero.net> - 2015-03-07 19:16 +0000
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 21:01 +0200
Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 16:40 +0000
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 18:48 +0200
Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 17:02 +0000
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 19:16 +0200
Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 18:18 +0000
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-07 21:06 -0800
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 03:53 +1100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-07 11:03 -0800
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-08 12:45 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-08 09:20 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 18:37 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-08 10:09 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 19:23 +1100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-08 01:18 -0800
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-09 05:25 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-08 22:09 +0200
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-09 12:43 +1100
Re: Newbie question about text encoding Ben Finney <ben+python@benfinney.id.au> - 2015-03-09 13:09 +1100
Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-09 08:31 +0200
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-09 13:18 +1100
Re: Newbie question about text encoding random832@fastmail.us - 2015-03-09 00:27 -0400
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-09 07:55 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-09 08:13 +1100
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-09 17:34 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-09 17:44 +1100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-09 02:08 -0700
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-09 07:26 -0700
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-09 05:28 -0700
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-08 19:01 +1100
Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 14:13 +0000
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-07 23:23 -0800
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-09 05:30 +1100
Re: Newbie question about text encoding Cameron Simpson <cs@zip.com.au> - 2015-03-09 13:09 +1100
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-08 19:42 -0700
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-04 19:16 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-04 05:43 +1100
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 18:53 -0800
Re: Newbie question about text encoding Terry Reedy <tjreedy@udel.edu> - 2015-03-03 18:30 -0500
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-04 13:54 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-04 14:02 +1100
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 20:05 -0800
Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 20:16 -0800
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-04 19:14 +1100
Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-04 02:16 -0800
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-27 04:29 +1100
Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-27 10:09 +1100
Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-27 10:23 +1100
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-02-25 12:19 +1100 |
| Message-ID | <54ed232f$0$13004$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #86341 |
Laura Creighton wrote:

> Dave Angel
> are you another Native English speaker living in a world where ASCII
> is enough?

ASCII was never enough. Not even for Americans, who couldn't write things
like "I bought a comic book for 10¢ yesterday", let alone interesting
things from maths and science.

I missed the whole 7-bit ASCII period; my first computer (Mac 128K) already
had an extended character set beyond ASCII. But even that never covered the
full range of characters I wanted to write, and then there was the horrible
mess that you got whenever you copied text files from a Mac to a DOS or
Windows PC or vice versa. Yes, even in 1984 we were transferring files and
running into encoding issues.

--
Steven
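Both of Steven's points, that "10¢" already falls outside ASCII and that moving text between a Mac and a DOS PC garbled it, can be reproduced in a few lines of Python 3. This is a minimal sketch: it assumes the Mac side wrote Mac OS Roman (Python's `mac_roman` codec) and the PC side read the bytes as DOS code page 437 (`cp437`), which is one plausible pairing rather than a record of what Steven's machines actually did. The helper names are illustrative.

```python
def try_ascii(text):
    """Return the ASCII bytes for text, or None if it cannot be encoded."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


def mac_to_dos(text):
    """Encode text as Mac OS Roman, then (mis)read those bytes as DOS cp437."""
    raw = text.encode("mac_roman")  # the cent sign becomes the single byte 0xA2
    return raw.decode("cp437")      # cp437 assigns a different character to 0xA2


price = "I bought a comic book for 10\u00a2 yesterday"
print(try_ascii(price))   # None: the cent sign has no ASCII code
print(mac_to_dos(price))  # same bytes, wrong code page, garbled text
```

The bytes never change; only the table used to interpret them does, which is why the same file could look fine on one machine and wrong on the other.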
| From | Marcos Almeida Azevedo <marcos.al.azevedo@gmail.com> |
|---|---|
| Date | 2015-02-25 12:54 +0800 |
| Message-ID | <mailman.19166.1424840069.18130.python-list@python.org> |
| In reply to | #86367 |
On Wed, Feb 25, 2015 at 9:19 AM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> Laura Creighton wrote:
>
> > Dave Angel
> > are you another Native English speaker living in a world where ASCII
> > is enough?
>
> ASCII was never enough. Not even for Americans, who couldn't write things
> like "I bought a comic book for 10¢ yesterday", let alone interesting
> things from maths and science.

ASCII was a necessity back then because RAM and storage were too small.

> I missed the whole 7-bit ASCII period, my first computer (Mac 128K) already
> had an extended character set beyond ASCII. But even that never covered the

I miss the days when I was coding with my XT computer (640 KB RAM) too.
Things were so simple back then.

> full range of characters I wanted to write, and then there was the horrible
> mess that you got whenever you copied text files from a Mac to a DOS or
> Windows PC or vice versa. Yes, even in 1984 we were transferring files and
> running into encoding issues.
>
> --
> Steven

--
Marcos | I love PHP, Linux, and Java <http://javadevnotes.com/java-integer-to-string-examples>
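On the storage point: ASCII's 128 codes fit in 7 bits, and UTF-8 was designed so that pure-ASCII text encodes to exactly the same bytes, one per character. The sketch below (a hypothetical helper, `encoded_sizes`, over a few of Python's built-in codecs) compares the byte cost of the same text under each encoding; the `-le` variants are used so that no byte-order mark is counted.

```python
def encoded_sizes(text, encodings=("ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le")):
    """Map each encoding name to the byte length of text, or None if it can't encode it."""
    sizes = {}
    for enc in encodings:
        try:
            sizes[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None  # this encoding has no representation for some character
    return sizes


print(encoded_sizes("Things were so simple back then."))
print(encoded_sizes("10\u00a2"))
```

For ASCII-only text, UTF-8 costs nothing extra over ASCII; the fixed-width UTF-16 and UTF-32 forms are where the 2x and 4x overhead appears.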
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-02-24 15:41 -0500 |
| Message-ID | <mailman.19148.1424810518.18130.python-list@python.org> |
| In reply to | #86311 |
On 02/24/2015 02:57 PM, Laura Creighton wrote:
> Dave Angel
> are you another Native English speaker living in a world where ASCII
> is enough?
I'm a native English speaker, and 7 bits is not nearly enough. Even if
I didn't currently care, I have some history:
No. CDC display code is enough. Who needs lowercase?
No. Baudot code is enough.
No, EBCDIC is good enough. Who cares about other companies.
No, the "golf-ball" only holds this many characters. If we need more,
we can just get the operator to switch balls in the middle of printing.
No. 2 digit years is enough. This world won't last till the millennium
anyway.
No. 2k is all the EPROM you can have. Your code HAS to fit in it, and
only 1.5k RAM.
No. 640k is more than anyone could need.
No, you cannot use a punch card made on a model 26 keypunch in the same
deck as one made on a model 29. Too bad, many of the codes are
different. (This one cost me travel back and forth between two
different locations with different model keypunches)
No. 8 bits is as much as we could ever use for characters. Who could
possibly need names or locations outside of this region? Or from
multiple places within it?
35 years ago I helped design a serial terminal that "spoke" Chinese,
using a two-byte encoding. But a single worldwide standard didn't come
until much later, and I cheered Unicode when it was finally unveiled.
I've worked with many printers that could only print 70 or 80 unique
characters. The laser printer, and even the dot-matrix printer, are
relatively recent inventions.
Getting back on topic:
According to:
http://support.esri.com/cn/knowledgebase/techarticles/detail/27345
"""ArcGIS Desktop applications, such as ArcMap, are Unicode based, so
they support Unicode to a certain level. The level of Unicode support
depends on the data format."""
That page was written around 2004, so there was concern even then.
And according to another, """In the header of each shapefile (.DBF), a
reference to a code page is included."""
--
DaveA
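To make the ".DBF header" remark concrete: in the dBASE file format, the 32-byte table header carries a one-byte "language driver ID", conventionally at offset 29, that identifies the code page of the text fields. The sketch below reads that byte and maps it to a Python codec name. The offset follows the commonly documented dBASE III+/IV header layout, and the mapping table is a deliberately partial, illustrative subset; `LDID_TO_CODEC`, `codec_from_dbf_header` and `dbf_codec` are hypothetical names, not an ArcGIS or library API. An unrecognized or zero ID leaves the choice of codec to the caller.

```python
from typing import Optional

# Offset of the language driver ID byte in a dBASE (.DBF) header
# (assumption: standard dBASE III+/IV header layout).
LDID_OFFSET = 29

# Partial, illustrative mapping of language driver IDs to Python codecs.
LDID_TO_CODEC = {
    0x01: "cp437",   # DOS USA
    0x02: "cp850",   # DOS Multilingual
    0x03: "cp1252",  # Windows ANSI
}


def codec_from_dbf_header(header: bytes, default: Optional[str] = None) -> Optional[str]:
    """Return the codec named by the header's language driver ID, else default."""
    if len(header) <= LDID_OFFSET:
        raise ValueError("header too short to contain a language driver ID")
    return LDID_TO_CODEC.get(header[LDID_OFFSET], default)


def dbf_codec(path: str, default: Optional[str] = None) -> Optional[str]:
    """Read the first 32 bytes of a .DBF file and look up its codec."""
    with open(path, "rb") as f:
        return codec_from_dbf_header(f.read(32), default)
```

A caller would then decode each text field with something like `field_bytes.decode(dbf_codec(path, default="cp1252"))`; any default is a guess, because it only applies when the header does not say.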
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2015-02-26 04:40 -0800 |
| Message-ID | <ef520397-b1f0-47bf-8d24-585a9ba230e2@googlegroups.com> |
| In reply to | #86343 |
On Wednesday, February 25, 2015 at 2:12:09 AM UTC+5:30, Dave Angel wrote:
> On 02/24/2015 02:57 PM, Laura Creighton wrote:
> > Dave Angel
> > are you another Native English speaker living in a world where ASCII
> > is enough?
>
> I'm a native English speaker, and 7 bits is not nearly enough. Even if
> I didn't currently care, I have some history:
>
> No. CDC display code is enough. Who needs lowercase?
>
> No. Baudot code is enough.
>
> No, EBCDIC is good enough. Who cares about other companies.
>
> No, the "golf-ball" only holds this many characters. If we need more,
> we can just get the operator to switch balls in the middle of printing.
>
> No. 2 digit years is enough. This world won't last till the millennium
> anyway.
>
> No. 2k is all the EPROM you can have. Your code HAS to fit in it, and
> only 1.5k RAM.
>
> No. 640k is more than anyone could need.
>
> No, you cannot use a punch card made on a model 26 keypunch in the same
> deck as one made on a model 29. Too bad, many of the codes are
> different. (This one cost me travel back and forth between two
> different locations with different model keypunches)
>
> No. 8 bits is as much as we could ever use for characters. Who could
> possibly need names or locations outside of this region? Or from
> multiple places within it?
>
> 35 years ago I helped design a serial terminal that "spoke" Chinese,
> using a two-byte encoding. But a single worldwide standard didn't come
> until much later, and I cheered Unicode when it was finally unveiled.
>
> I've worked with many printers that could only print 70 or 80 unique
> characters. The laser printer, and even the matrix printer are
> relatively recent inventions.

Wrote something up on why we should stop using ASCII:
http://blog.languager.org/2015/02/universal-unicode.html

(Yeah the world is a bit larger than a small bunch of islands off a half-continent. But this is not that discussion!)
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2015-02-26 05:15 -0800 |
| Message-ID | <7294b798-17a4-4743-901b-e189d37033e5@googlegroups.com> |
| In reply to | #86495 |
On Thursday, February 26, 2015 at 6:10:25 PM UTC+5:30, Rustom Mody wrote:
> On Wednesday, February 25, 2015 at 2:12:09 AM UTC+5:30, Dave Angel wrote:
> > On 02/24/2015 02:57 PM, Laura Creighton wrote:
> > > Dave Angel
> > > are you another Native English speaker living in a world where ASCII
> > > is enough?
> >
> > I'm a native English speaker, and 7 bits is not nearly enough. Even if
> > I didn't currently care, I have some history:
> >
> > No. CDC display code is enough. Who needs lowercase?
> >
> > No. Baudot code is enough.
> >
> > No, EBCDIC is good enough. Who cares about other companies.
> >
> > No, the "golf-ball" only holds this many characters. If we need more,
> > we can just get the operator to switch balls in the middle of printing.
> >
> > No. 2 digit years is enough. This world won't last till the millennium
> > anyway.
> >
> > No. 2k is all the EPROM you can have. Your code HAS to fit in it, and
> > only 1.5k RAM.
> >
> > No. 640k is more than anyone could need.
> >
> > No, you cannot use a punch card made on a model 26 keypunch in the same
> > deck as one made on a model 29. Too bad, many of the codes are
> > different. (This one cost me travel back and forth between two
> > different locations with different model keypunches)
> >
> > No. 8 bits is as much as we could ever use for characters. Who could
> > possibly need names or locations outside of this region? Or from
> > multiple places within it?
> >
> > 35 years ago I helped design a serial terminal that "spoke" Chinese,
> > using a two-byte encoding. But a single worldwide standard didn't come
> > until much later, and I cheered Unicode when it was finally unveiled.
> >
> > I've worked with many printers that could only print 70 or 80 unique
> > characters. The laser printer, and even the matrix printer are
> > relatively recent inventions.
>
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

Dave's list above of instances of 'poverty is a good idea' turning out stupid and narrow-minded in hindsight is neat. Thought I'd ack that explicitly.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-02-27 00:24 +1100 |
| Message-ID | <mailman.19255.1424957046.18130.python-list@python.org> |
| In reply to | #86495 |
On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rustompmody@gmail.com> wrote:
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

From that post:

"""
5.1 Gibberish

When going from the original 2-byte unicode (around version 3?) to the
one having supplemental planes, the unicode consortium added blocks
such as

* Egyptian hieroglyphs
* Cuneiform
* Shavian
* Deseret
* Mahjong
* Klingon

To me (a layman) it looks unprofessional – as though they are playing
games – that billions of computing devices, each having billions of
storage words should have their storage wasted on blocks such as
these.
"""

The shift from Unicode as a 16-bit code to having multiple planes came
in with Unicode 2.0, but the various blocks were assigned separately:
* Egyptian hieroglyphs: Unicode 5.2
* Cuneiform: Unicode 5.0
* Shavian: Unicode 4.0
* Deseret: Unicode 3.1
* Mahjong Tiles: Unicode 5.1
* Klingon: Not part of any current standard

However, I don't think historians will appreciate you calling all of
these "gibberish". To adequately describe and discuss old texts
without these Unicode blocks, we'd have to either do everything with
images, or craft some kind of reversible transliteration system and
have dedicated software to render the texts on screen. Instead, what
we have is a well-known and standardized system for transliterating
all of these into numbers (code points), and rendering them becomes a
simple matter of installing an appropriate font.

Also, how does assigning meanings to codepoints "waste storage"? As
soon as Unicode 2.0 hit and 16-bit code units stopped being
sufficient, everyone needed to allocate storage - either 32 bits per
character, or some other system - and the fact that some codepoints
were unassigned had absolutely no impact on that. This is decidedly
NOT unprofessional, and it's not wasteful either.

ChrisA
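Chris's storage argument can be checked directly: the number of bytes a code point needs depends only on its numeric value and the chosen encoding, not on whether Unicode has assigned a character to it. A small sketch using the standard `unicodedata` module (the `describe` helper is illustrative); U+10400 is a Deseret letter, assigned since Unicode 3.1 as Chris notes, and U+50000 is an arbitrary code point in plane 5, which has no assigned characters.

```python
import unicodedata

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def describe(cp):
    """Return (name, general category, bytes per encoding) for one code point."""
    ch = chr(cp)
    name = unicodedata.name(ch, "<unassigned>")
    category = unicodedata.category(ch)  # "Cn" means no character is assigned
    sizes = tuple(len(ch.encode(enc)) for enc in ENCODINGS)
    return name, category, sizes


for cp in (0x41, 0xA2, 0x10400, 0x50000):
    name, category, sizes = describe(cp)
    print(f"U+{cp:04X} {category} {sizes} {name}")
```

Both supplementary-plane code points take four bytes in every encoding listed (a surrogate pair in UTF-16), whether or not anything is assigned to them.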
| From | Sam Raker <sam.raker@gmail.com> |
|---|---|
| Date | 2015-02-26 08:45 -0800 |
| Message-ID | <0c1f4147-2d5d-4fa6-afb9-2275d878e2c1@googlegroups.com> |
| In reply to | #86499 |
I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so helps solve the problems affecting speakers of "underserved" languages--access and language preservation. Speakers of Mongolian, Cherokee, Georgian, etc. all deserve to be able to interact with technology in their native languages as much as we speakers of ASCII-friendly languages do. Unicode support also makes writing papers on, dictionaries of, and new texts in such languages much easier, which helps the fight against language extinction, which is a sadly pressing issue.

Also, like, computers are big. Get an external drive for your high-resolution PDF collection of Medieval manuscripts if you feel like you're running out of space. A few extra codepoints aren't going to be the straw that breaks the camel's back.

On Thursday, February 26, 2015 at 8:24:34 AM UTC-5, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rustompmody@gmail.com> wrote:
> > Wrote something up on why we should stop using ASCII:
> > http://blog.languager.org/2015/02/universal-unicode.html
>
> From that post:
>
> """
> 5.1 Gibberish
>
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
>
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
>
> To me (a layman) it looks unprofessional - as though they are playing
> games - that billions of computing devices, each having billions of
> storage words should have their storage wasted on blocks such as
> these.
> """
>
> The shift from Unicode as a 16-bit code to having multiple planes came
> in with Unicode 2.0, but the various blocks were assigned separately:
> * Egyptian hieroglyphs: Unicode 5.2
> * Cuneiform: Unicode 5.0
> * Shavian: Unicode 4.0
> * Deseret: Unicode 3.1
> * Mahjong Tiles: Unicode 5.1
> * Klingon: Not part of any current standard
>
> However, I don't think historians will appreciate you calling all of
> these "gibberish". To adequately describe and discuss old texts
> without these Unicode blocks, we'd have to either do everything with
> images, or craft some kind of reversible transliteration system and
> have dedicated software to render the texts on screen. Instead, what
> we have is a well-known and standardized system for transliterating
> all of these into numbers (code points), and rendering them becomes a
> simple matter of installing an appropriate font.
>
> Also, how does assigning meanings to codepoints "waste storage"? As
> soon as Unicode 2.0 hit and 16-bit code units stopped being
> sufficient, everyone needed to allocate storage - either 32 bits per
> character, or some other system - and the fact that some codepoints
> were unassigned had absolutely no impact on that. This is decidedly
> NOT unprofessional, and it's not wasteful either.
>
> ChrisA
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2015-02-26 09:08 -0800 |
| Message-ID | <be5cfdba-14d6-4387-b959-4474f60d06c5@googlegroups.com> |
| In reply to | #86514 |
On Thursday, February 26, 2015 at 10:16:11 PM UTC+5:30, Sam Raker wrote:
> I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so helps solve the problems affecting speakers of "underserved" languages--access and language preservation. Speakers of Mongolian, Cherokee, Georgian, etc. all deserve to be able to interact with technology in their native languages as much as we speakers of ASCII-friendly languages do. Unicode support also makes writing papers on, dictionaries of, and new texts in such languages much easier, which helps the fight against language extinction, which is a sadly pressing issue.

Agreed -- correcting the inequities caused by ASCII-bias is a good thing.

In fact the whole point of my post was to say just that: by carving out and focussing on a 'universal' subset of unicode that is considerably larger than ASCII but smaller than unicode, we stand to reduce ASCII-bias.

As do other posts like
http://blog.languager.org/2014/04/unicoded-python.html
http://blog.languager.org/2014/05/unicode-in-haskell-source.html

However my example listed

> > * Egyptian hieroglyphs
> > * Cuneiform
> > * Shavian
> > * Deseret
> > * Mahjong
> > * Klingon

Ok, Chris has corrected me re. Klingon-in-unicode. So let's drop that.

Of the others, which do you think is in the 'underserved' category?

More generally, which of
http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Supplementary_Multilingual_Plane
are underserved?
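For readers following the plane discussion: Unicode code points run from U+0000 to U+10FFFF in 17 planes of 65,536 code points each, so a character's plane is simply its code point shifted right by 16 bits. Plane 0 is the Basic Multilingual Plane and plane 1 is the Supplementary Multilingual Plane linked above. A minimal sketch (function names and the sample string are illustrative) that reports which planes a piece of text draws on; the sample mixes ASCII, Latin, Devanagari, an Egyptian hieroglyph (U+13000) and an emoji (U+1F600).

```python
from collections import Counter

PLANE_NAMES = {
    0: "Basic Multilingual Plane",
    1: "Supplementary Multilingual Plane",
    2: "Supplementary Ideographic Plane",
}


def plane(ch):
    """Return the Unicode plane (0-16) of a single character."""
    return ord(ch) >> 16


def planes_used(text):
    """Count how many characters of text fall in each plane."""
    return Counter(plane(ch) for ch in text)


sample = "ASCII, \u00e9, \u0939\u093f\u0928\u094d\u0926\u0940, \U00013000, \U0001F600"
for p, count in sorted(planes_used(sample).items()):
    print(p, PLANE_NAMES.get(p, "other plane"), count)
```

A "universal subset" of the kind Rustom describes could then be checked mechanically, for example by testing whether `planes_used(text)` contains only the planes (or ranges) one has decided to allow.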
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2015-02-26 12:02 -0500 |
| Message-ID | <mailman.19274.1424970167.18130.python-list@python.org> |
| In reply to | #86495 |
On 2/26/2015 8:24 AM, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rustompmody@gmail.com> wrote:
>> Wrote something up on why we should stop using ASCII:
>> http://blog.languager.org/2015/02/universal-unicode.html
I think that the main point of the post, that many Unicode chars are
truly planetary rather than just national/regional, is excellent.
> From that post:
>
> """
> 5.1 Gibberish
>
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
>
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
>
> To me (a layman) it looks unprofessional – as though they are playing
> games – that billions of computing devices, each having billions of
> storage words should have their storage wasted on blocks such as
> these.
> """
>
> The shift from Unicode as a 16-bit code to having multiple planes came
> in with Unicode 2.0, but the various blocks were assigned separately:
> * Egyptian hieroglyphs: Unicode 5.2
> * Cuneiform: Unicode 5.0
> * Shavian: Unicode 4.0
> * Deseret: Unicode 3.1
> * Mahjong Tiles: Unicode 5.1
> * Klingon: Not part of any current standard
You should add emoticons, but not call them or the above 'gibberish'.
I think that this part of your post is more 'unprofessional' than the
character blocks. It is very jarring and seems contrary to your main point.
> However, I don't think historians will appreciate you calling all of
> these "gibberish". To adequately describe and discuss old texts
> without these Unicode blocks, we'd have to either do everything with
> images, or craft some kind of reversible transliteration system and
> have dedicated software to render the texts on screen. Instead, what
> we have is a well-known and standardized system for transliterating
> all of these into numbers (code points), and rendering them becomes a
> simple matter of installing an appropriate font.
>
> Also, how does assigning meanings to codepoints "waste storage"? As
> soon as Unicode 2.0 hit and 16-bit code units stopped being
> sufficient, everyone needed to allocate storage - either 32 bits per
> character, or some other system - and the fact that some codepoints
> were unassigned had absolutely no impact on that. This is decidedly
> NOT unprofessional, and it's not wasteful either.
I agree.
--
Terry Jan Reedy
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2015-02-26 09:59 -0800 |
| Message-ID | <00fbd940-52f6-44e2-bf08-b9f35c12e73f@googlegroups.com> |
| In reply to | #86519 |
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
> > On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> >> Wrote something up on why we should stop using ASCII:
> >> http://blog.languager.org/2015/02/universal-unicode.html
>
> I think that the main point of the post, that many Unicode chars are
> truly planetary rather than just national/regional, is excellent.
>
> > From that post:
> >
> > """
> > 5.1 Gibberish
> >
> > When going from the original 2-byte unicode (around version 3?) to the
> > one having supplemental planes, the unicode consortium added blocks
> > such as
> >
> > * Egyptian hieroglyphs
> > * Cuneiform
> > * Shavian
> > * Deseret
> > * Mahjong
> > * Klingon
> >
> > To me (a layman) it looks unprofessional – as though they are playing
> > games – that billions of computing devices, each having billions of
> > storage words should have their storage wasted on blocks such as
> > these.
> > """
> >
> > The shift from Unicode as a 16-bit code to having multiple planes came
> > in with Unicode 2.0, but the various blocks were assigned separately:
> > * Egyptian hieroglyphs: Unicode 5.2
> > * Cuneiform: Unicode 5.0
> > * Shavian: Unicode 4.0
> > * Deseret: Unicode 3.1
> > * Mahjong Tiles: Unicode 5.1
> > * Klingon: Not part of any current standard
>
> You should add emoticons, but not call them or the above 'gibberish'.
Emoticons (or is it emoji) seems to have some (regional?) takeup?? Dunno…
In any case I'd like to stay clear of political(izable) questions
> I think that this part of your post is more 'unprofessional' than the
> character blocks. It is very jarring and seems contrary to your main point.
Ok I need a word for
1. I have no need for this
2. 99.9% of the (living) on this planet also have no need for this
> > However, I don't think historians will appreciate you calling all of
> > these "gibberish". To adequately describe and discuss old texts
> > without these Unicode blocks, we'd have to either do everything with
> > images, or craft some kind of reversible transliteration system and
> > have dedicated software to render the texts on screen. Instead, what
> > we have is a well-known and standardized system for transliterating
> > all of these into numbers (code points), and rendering them becomes a
> > simple matter of installing an appropriate font.
> >
> > Also, how does assigning meanings to codepoints "waste storage"? As
> > soon as Unicode 2.0 hit and 16-bit code units stopped being
> > sufficient, everyone needed to allocate storage - either 32 bits per
> > character, or some other system - and the fact that some codepoints
> > were unassigned had absolutely no impact on that. This is decidedly
> > NOT unprofessional, and it's not wasteful either.
>
> I agree.
I clearly am more enthusiastic than knowledgeable about unicode.
But I know my basic CS well enough (as I am sure you and Chris also do)
So I don't get how 4 bytes is not more expensive than 2.
Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
You could use a clever representation like UTF-8 or FSR.
But I don't see how you can get out of this that full-unicode costs more than
exclusive BMP.
eg consider the case of 32 vs 64 bit executables.
The 64 bit executable is generally larger than the 32 bit one
Now consider the case of a machine that has say 2GB RAM and a 64-bit processor.
You could -- I think -- make a reasonable case that all those all-zero hi-address-words are 'waste'.
And you've got the general sense best so far:
> I think that the main point of the post, that many Unicode chars are
> truly planetary rather than just national/regional,
And if the general tone/tenor of what I have written is not getting across because of some words (like 'gibberish'?), I'll try and reword.
However let me try and clarify that the whole of section 5 is 'iffy', with 5.1 being only more extreme.
I've not written these in because the point of that post is not to criticise unicode but to highlight the universal(isable) parts.
Still if I were to expand on the criticisms here are some examples:
Math-Greek: Consider the math-alpha block
http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
Now imagine a beginning student not getting the difference between font, glyph,
character. To me this block represents this same error cast into concrete and
dignified by the (supposed) authority of the unicode consortium.
There are probably dozens of other such stupidities like distinguishing kelvin K from latin K as if that is the business of the unicode consortium
My real reservations about unicode come from their work in areas that I happen to know something about
Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ is perhaps ok
However all this stuff http://xahlee.info/comp/unicode_music_symbols.html
makes no sense (to me) given that music (ie standard western music written in staff notation) is inherently 2 dimensional -- multi-voiced, multi-staff, chordal
Sanskrit/Devanagari:
Consists of bogus letters that don't exist in devanagari
The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
So I call it bogus-devanagari
Contrariwise an important letter in vedic pronunciation the double-udatta is missing
http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html
All of which adds up to the impression that the unicode consortium occasionally fails to do due diligence
In any case all of the above is contrary to /irrelevant to my post which is about identifying the more universal parts of unicode
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2015-02-26 12:20 -0800 |
| Message-ID | <5b002d84-3ad8-4a4a-8852-69ed93b45ff3@googlegroups.com> |
| In reply to | #86526 |
On Thursday, February 26, 2015 at 18:59:24 UTC+1, Rustom Mody wrote:
>
> ...To me this block represents this same error cast into concrete and
> dignified by the (supposed) authority of the unicode consortium.
>
Unicode does not prescribe, it registers.
E.g. the inclusion of U+1E9E, 'LATIN CAPITAL LETTER SHARP S', was officially
proposed by the "German Federal Government".
(I have a PDF copy somewhere.)
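Here is a small Python check of the character wxjmfauth mentions. It uses only the standard `unicodedata` module and the built-in `str` case methods, and assumes a Unicode database of 5.1 or later, which is when U+1E9E was added. It also shows why the capital form has its own code point rather than being a font variant: the default uppercase mapping of 'ß' is still the two-letter 'SS'.

```python
import unicodedata

CAPITAL_SHARP_S = "\u1e9e"
small_sharp_s = "\u00df"  # ß

print(unicodedata.name(CAPITAL_SHARP_S))  # LATIN CAPITAL LETTER SHARP S
print(small_sharp_s.upper())              # SS: default full case mapping, not U+1E9E
print(CAPITAL_SHARP_S.lower())            # ß
print(CAPITAL_SHARP_S.casefold())         # ss: form used for caseless comparison
```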
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-02-27 09:13 +1100 |
| Message-ID | <mailman.19293.1424988840.18130.python-list@python.org> |
| In reply to | #86526 |
On Fri, Feb 27, 2015 at 4:59 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
>> I think that this part of your post is more 'unprofessional' than the
>> character blocks. It is very jarring and seems contrary to your main point.
>
> Ok I need a word for
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this
So what, seven million people need it? Sounds pretty useful to me. And
your figure is an exaggeration; a lot more people than that use
emoji/emoticons.
>> > Also, how does assigning meanings to codepoints "waste storage"? As
>> > soon as Unicode 2.0 hit and 16-bit code units stopped being
>> > sufficient, everyone needed to allocate storage - either 32 bits per
>> > character, or some other system - and the fact that some codepoints
>> > were unassigned had absolutely no impact on that. This is decidedly
>> > NOT unprofessional, and it's not wasteful either.
>>
>> I agree.
>
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
>
> So I dont get how 4 bytes is not more expensive than 2.
> Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more than
> exclusive BMP.
Sure, UCS-2 is cheaper than the current Unicode spec. But Unicode 2.0
was when that changed, and the change was because 65536 characters
clearly wouldn't be enough - and that was due to the number of
characters needed for things other than those you're complaining
about. No spec since then has changed anything that affects storage.
There are still, today, quite a lot of unallocated blocks of
characters (we're really using only about two planes' worth so far,
maybe three), but even if Unicode specified just two planes of 64K
characters each, you wouldn't be able to save much on transmission
(UTF-8 is already flexible and uses only what you need; if a future
Unicode spec allows 64K planes, UTF-8 transmission will cost exactly
the same for all existing characters), and on an eight-bit-byte
system, the very best you'll be able to do is three bytes - which you
can do today, too; you already know 21 bits will do. So since the BMP
was proven insufficient (back in 1996), no subsequent changes have had
any costs in storage.
> Still if I were to expand on the criticisms here are some examples:
>
> Math-Greek: Consider the math-alpha block
> http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
>
> Now imagine a beginning student not getting the difference between font, glyph,
> character. To me this block represents this same error cast into concrete and
> dignified by the (supposed) authority of the unicode consortium.
>
> There are probably dozens of other such stupidities like distinguishing kelvin K from latin K as if that is the business of the unicode consortium
A lot of these kinds of characters come from a need to unambiguously
transliterate text stored in other encodings. I don't personally
profess to understand the reasoning behind the various
indistinguishable characters, but I'm aware that there are a lot of
tricky questions to be decided; and once the Consortium decides to
allocate a character, that character must remain forever allocated.
> My real reservations about unicode come from their work in areas that I happen to know something about
>
> Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ is perhaps ok
> However all this stuff http://xahlee.info/comp/unicode_music_symbols.html
> makes no sense (to me) given that music (ie standard western music written in staff notation) is inherently 2 dimensional -- multi-voiced, multi-staff, chordal
The placement on the page is up to the display library. You can
produce a PDF that places the note symbols at their correct positions,
and requires no images to render sheet music.
> Sanskrit/Devanagari:
> Consists of bogus letters that dont exist in devanagari
> The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
> But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
> So I call it bogus-devanagari
>
> Contrariwise an important letter in vedic pronunciation the double-udatta is missing
> http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html
>
> All of which adds up to the impression that the unicode consortium occasionally fails to do due diligence
Which proves that they're not perfect. Don't forget, they can always
add more characters later.
ChrisA
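To make the storage argument concrete, here is a minimal Python sketch (an editorial illustration, not Chris's code) that counts the bytes a single character needs in each standard encoding form. The little-endian codec names are used only so Python doesn't prepend a byte-order mark. The sample characters are arbitrary.

```python
# Byte cost of one character in each Unicode encoding form.
# "-le" codecs are used so that no byte-order mark is added.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def byte_costs(ch):
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

samples = {
    "ASCII 'A'": "A",
    "Latin e-acute": "\u00e9",
    "Euro sign (BMP)": "\u20ac",
    "Egyptian hieroglyph (SMP)": "\U00013000",
}
for label, ch in samples.items():
    print(f"U+{ord(ch):05X} {label:28} {byte_costs(ch)}")
```

In UTF-8 and UTF-16, a BMP character is encoded exactly as it would be if the supplementary planes did not exist. Only the supplementary characters themselves, and fixed-width UTF-32, cost more than a BMP-only scheme would.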
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-02-27 12:05 +1100 |
| Message-ID | <54efc2c8$0$12986$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #86526 |
Rustom Mody wrote:
> Emoticons (or is it emoji) seems to have some (regional?) takeup?? Dunno…
> In any case I'd like to stay clear of political(izable) questions
Emoji is the term used in Japan, gradually spreading to the rest of the
world. Emoticons, I believe, should be restricted to the practice of using
ASCII-only digraphs and trigraphs such as :-) (colon, hyphen, right-parens)
to indicate "smileys".
I believe that emoji will eventually lead to Unicode's victory. People will
want smileys and piles of poo on their mobile phones, and from there it
will gradually spread to everywhere. All they need to do to make victory
inevitable is add cartoon genitals...
>> I think that this part of your post is more 'unprofessional' than the
>> character blocks. It is very jarring and seems contrary to your main
>> point.
>
> Ok I need a word for
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this
0.1% of the living is seven million people. I'll tell you what, you tell me
which seven million people should be relegated to second-class status, and
I'll tell them where you live.
:-)
[...]
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
>
> So I dont get how 4 bytes is not more expensive than 2.
Obviously it is. But it's only twice as expensive, and in computer science
terms that counts as "close enough". It's quite common for data structures
to "waste" space by using "no more than twice as much space as needed",
e.g. Python dicts and lists.
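Steven's point about lists can be observed directly, at least in CPython. `sys.getsizeof` reports the allocated size of a list, and that size changes far less often than the list grows, because spare slots are reserved ahead of time. This is a rough sketch. The exact growth pattern is a CPython implementation detail and varies between versions, and other interpreters may not report meaningful sizes.

```python
import sys

def count_reallocations(n):
    """Append n items and count how often the list's reported size changes."""
    items = []
    last_size = sys.getsizeof(items)
    changes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # capacity grew: a (possible) reallocation
            changes += 1
            last_size = size
    return changes

print(count_reallocations(10_000))  # far fewer than 10,000 on CPython
```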
The whole Unicode range U+0000 to U+10FFFF needs only 21 bits, which fits
into three bytes. Nevertheless, there's no three-byte UTF encoding, because
on modern hardware it is more efficient to "waste" an entire extra byte per
code point and deal with an even multiple of bytes.
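Here is a quick Python check of the arithmetic above. The largest code point needs 21 bits, so three bytes per character would suffice. The `encode24`/`decode24` functions are a hypothetical fixed-width "UTF-24", invented purely for illustration. No such standard encoding exists, which is Steven's point: real code would use UTF-32 (four bytes) or a variable-width form.

```python
MAX_CODE_POINT = 0x10FFFF

print(MAX_CODE_POINT.bit_length())  # 21

def encode24(text):
    """Hypothetical fixed-width 3-byte encoding (NOT a real standard)."""
    return b"".join(ord(ch).to_bytes(3, "big") for ch in text)

def decode24(data):
    """Inverse of encode24: every 3 bytes become one code point."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(chr(int.from_bytes(data[i:i + 3], "big"))
                   for i in range(0, len(data), 3))

text = "A\u20ac\U00013000\U0010FFFF"
packed = encode24(text)
assert decode24(packed) == text                # lossless round trip
print(len(packed), len(text.encode("utf-32-be")))  # 12 16
```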
> Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more
> than exclusive BMP.
Are you missing a word there? Costs "no more" perhaps?
> eg consider the case of 32 vs 64 bit executables.
> The 64 bit executable is generally larger than the 32 bit one
> Now consider the case of a machine that has say 2GB RAM and a 64-bit
> processor. You could -- I think -- make a reasonable case that all those
> all-zero hi-address-words are 'waste'.
Sure. The whole point of 64-bit processors is to enable the use of more than
2GB of RAM. One might as well say that using 32-bit processors is wasteful
if you only have 64K of memory. Yes it is, but the only things which use
16-bit or 8-bit processors these days are embedded devices.
[...]
> Math-Greek: Consider the math-alpha block
> http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
>
> Now imagine a beginning student not getting the difference between font,
> glyph,
> character. To me this block represents this same error cast into concrete
> and dignified by the (supposed) authority of the unicode consortium.
Not being privy to the internal deliberations of the Consortium, it is
sometimes difficult to tell why two symbols are sometimes declared to be
mere different glyphs for the same character, and other times declared to
be worthy of being separate characters.
E.g. I think we should all agree that the English "A" and the French "A"
shouldn't count as separate characters, although the Greek "Α" and
Russian "А" do.
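The distinction Steven draws can be seen with Python's standard `unicodedata` module. The three capital letters below look identical in most fonts, but they are different code points with different names. Escape sequences are used so the difference survives copy-and-paste.

```python
import unicodedata

# Latin, Greek and Cyrillic capital A: visually alike, distinct characters.
for ch in ("A", "\u0391", "\u0410"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print("A" == "\u0391")  # False: different code points
# Compatibility normalization does not merge them either; they are separate letters.
print(unicodedata.normalize("NFKC", "\u0391") == "A")  # False
```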
In the case of the maths symbols, it isn't obvious to me what the deciding
factors were. I know that one of the considerations they use is to consider
whether or not users of the symbols have a tradition of treating the
symbols as mere different glyphs, i.e. stylistic variations. In this case,
I'm pretty sure that mathematicians would *not* consider:
U+2115 DOUBLE-STRUCK CAPITAL N "ℕ"
U+004E LATIN CAPITAL LETTER N "N"
as mere stylistic variations. If you defined a matrix called ℕ, you would
probably be told off for using the wrong symbol, not for using the wrong
formatting.
On the other hand, I'm not so sure about
U+210E PLANCK CONSTANT "ℎ"
versus a mere lowercase h (possibly in italic).
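One way to see how Unicode itself classifies these symbols is through its compatibility decompositions. Python exposes these as `unicodedata.decomposition` and NFKC normalization. A `<font>` tag marks a character that Unicode treats as a styled variant of another letter. NFKC folds such a character back to the plain letter, while canonical NFC leaves it alone. Here is a minimal sketch using only the standard library:

```python
import unicodedata

symbols = {
    "double-struck N": "\u2115",
    "Planck constant": "\u210e",
    "mathematical bold A": "\U0001D400",
}
for label, ch in symbols.items():
    print(f"{label:20} {unicodedata.name(ch):28} "
          f"decomp={unicodedata.decomposition(ch)!r:16} "
          f"NFKC={unicodedata.normalize('NFKC', ch)!r}")
```

So the separate code points let text where the style carries meaning preserve the distinction, while search and comparison can still fold them together with NFKC.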
> There are probably dozens of other such stupidities like distinguishing
> kelvin K from latin K as if that is the business of the unicode consortium
But it *is* the business of the Unicode consortium. They have at least two
important aims:
- to be able to represent every possible human-language character;
- to allow lossless round-trip conversion to all existing legacy encodings
(for the subset of Unicode handled by that encoding).
The second reason is why Unicode includes code points for degree-Celsius and
degree-Fahrenheit, rather than just using °C and °F like sane people.
Because some idiot^W code-page designer back in the 1980s or 90s decided to
add the single characters ℃ and ℉. So now Unicode has to be able to round-trip
(say) "°C℃" without loss.
I imagine that the same applies to U+212A KELVIN SIGN K.
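The round-trip argument can be sketched in Python with a legacy codec that really does contain ℃: Microsoft's Shift JIS variant, exposed as Python's `cp932` codec. Picking cp932 is my own choice for illustration, since the thread doesn't name a code page. Encoding and decoding `"°C℃"` is lossless only because Unicode keeps U+2103 separate from the two-character sequence `°C`. Folding it with NFKC first loses that distinction. The Kelvin sign, by contrast, has a canonical mapping to plain `K`, so even NFC replaces it.

```python
import unicodedata

original = "\u00b0C\u2103"  # "°C" followed by the single character ℃ (U+2103)

# Lossless round trip through a legacy Japanese code page.
legacy_bytes = original.encode("cp932")
assert legacy_bytes.decode("cp932") == original

# If ℃ were merely "°C" in Unicode, the two spellings would collapse:
folded = unicodedata.normalize("NFKC", original)
print(repr(folded))  # '°C°C': the distinction is gone

# U+212A KELVIN SIGN is canonically equivalent to U+004B LATIN CAPITAL LETTER K.
print(unicodedata.normalize("NFC", "\u212a") == "K")  # True
```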
> My real reservations about unicode come from their work in areas that I
> happen to know something about
>
> Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪
> ♫ is perhaps ok However all this stuff
> http://xahlee.info/comp/unicode_music_symbols.html
> makes no sense (to me) given that music (ie standard western music written
> in staff notation) is inherently 2 dimensional -- multi-voiced,
> multi-staff, chordal
(1) Text can also be two dimensional.
(2) Where you put the symbol on the page is a separate question from whether
or not the symbol exists.
> Consists of bogus letters that dont exist in devanagari
> The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
> But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
> So I call it bogus-devanagari
Hmm, well I love Wikipedia as much as the next guy, but I think that even
Jimmy Wales would suggest that Wikipedia is not a primary source for what
counts as Devanagari vowels. What makes you think that Wikipedia is right
and Unicode is wrong?
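To check what Unicode itself records about the disputed letter, Python's `unicodedata` gives its official name and general category. This is only a lookup. It is not evidence either way about Devanagari orthography.

```python
import unicodedata

ch = "\u0904"  # the letter Rustom cites from the U+0900 chart
print(unicodedata.name(ch))      # DEVANAGARI LETTER SHORT A
print(unicodedata.category(ch))  # Lo: letter, other
```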
That's not to say that Unicode hasn't made some mistakes. There are a few
deprecated code points, or code points that have been given the wrong name.
Oops. Mistakes happen.
> Contrariwise an important letter in vedic pronunciation the double-udatta
> is missing
> http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html
I quote:
I do not see any need for a "double udaatta". Perhaps "double
ANudaatta" is meant here?
I don't know Sanskrit, but if somebody suggested that Unicode doesn't
support English because the important letter "double-oh" (as
in "moon", "spoon", "croon" etc.) was missing, I wouldn't be terribly
impressed. We have a "double-u" letter, why not "double-oh"?
Another quote:
I should strongly recommend not to hurry with a standardization
proposal until the text collection of Vedic texts has been finished
In other words, even the experts in Vedic texts don't yet know all the
characters which they may or may not need.
--
Steven
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-02-26 20:57 -0500 |
| Message-ID | <mailman.19299.1425002275.18130.python-list@python.org> |
| In reply to | #86557 |
On 02/26/2015 08:05 PM, Steven D'Aprano wrote:
> Rustom Mody wrote:
>
>
>> eg consider the case of 32 vs 64 bit executables.
>> The 64 bit executable is generally larger than the 32 bit one
>> Now consider the case of a machine that has say 2GB RAM and a 64-bit
>> processor. You could -- I think -- make a reasonable case that all those
>> all-zero hi-address-words are 'waste'.
>
> Sure. The whole point of 64-bit processors is to enable the use of more than
> 2GB of RAM. One might as well say that using 32-bit processors is wasteful
> if you only have 64K of memory. Yes it is, but the only things which use
> 16-bit or 8-bit processors these days are embedded devices.
But the 2gig means electrical address lines out of the CPU are wasted,
not address space. A 64 bit processor and 64bit OS means you can have
more than 4gig in a process space, even if over half of it has to be
in the swap file. Linear versus physical makes a big difference.
(Although I believe Seymour Cray was quoted as saying that virtual
memory is a crock, because "you can't fake what you ain't got.")
--
DaveA
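The figures in this exchange are powers of two: an n-bit address can name 2**n distinct bytes, so Dave's "4gig" is 2**32 bytes. Here is a trivial Python sketch of the arithmetic. The helper name is mine.

```python
def address_space_bytes(bits):
    """Number of distinct byte addresses an address of `bits` bits can name."""
    return 2 ** bits

GiB = 2 ** 30
print(address_space_bytes(16))         # 65536: the "64K" of 16-bit machines
print(address_space_bytes(32) // GiB)  # 4: a 32-bit address space is 4 GiB
print(address_space_bytes(64) // GiB)  # 17179869184 GiB, i.e. 16 EiB
```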
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-02-27 16:58 +1100 |
| Message-ID | <54f00787$0$12979$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #86559 |
Dave Angel wrote:
> (Although I believe Seymour Cray was quoted as saying that virtual
> memory is a crock, because "you can't fake what you ain't got.")
If I recall correctly, disk access is about 10000 times slower than RAM, so
virtual memory is *at least* that much slower than real memory.
--
Steven
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-02-27 02:30 -0500 |
| Message-ID | <mailman.19302.1425022259.18130.python-list@python.org> |
| In reply to | #86562 |
On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
> Dave Angel wrote:
>
>> (Although I believe Seymour Cray was quoted as saying that virtual
>> memory is a crock, because "you can't fake what you ain't got.")
>
> If I recall correctly, disk access is about 10000 times slower than RAM, so
> virtual memory is *at least* that much slower than real memory.
>
It's so much more complicated than that, that I hardly know where to
start. I'll describe a generic processor/OS/memory/disk architecture;
there will be huge differences between processor models even from a
single manufacturer.
First, as soon as you add swapping logic to your processor/memory-system,
you theoretically slow it down. And in the days of that quote, Cray's
memory was maybe 50 times as fast as the memory used by us mortals. So
adding swapping logic would have slowed it down quite substantially,
even when it was not swapping. But that logic is inside the CPU chip
these days, and presumably thoroughly optimized.
Next, statistically, a program uses a small subset of its total program
& data space in its working set, and the working set should reside in
real memory. But when the program greatly increases that working set,
and it approaches the amount of physical memory, then swapping becomes
more frenzied, and we say the program is thrashing. As a simple example,
try sorting an array that's about the size of available physical memory.
Next, even physical memory is divided into a few levels of caching, some
on-chip and some off. And the caching is done in what I call strips,
where accessing just one byte causes the whole strip to be loaded from
non-cached memory. I forget the current size for that, but it's maybe
64 to 256 bytes or so.
If there are multiple processors (not multicore, but actual separate
processors), then each one has such internal caches, and any writes on
one processor may have to trigger flushes of all the other processors
that happen to have the same strip loaded.
The processor not only prefetches the next few instructions, but decodes
and tentatively executes them, subject to being discarded if a
conditional branch doesn't go the way the processor predicted. So some
instructions execute in zero time, some of the time.
Every address of instruction fetch, or of data fetch or store, goes
through a couple of layers of translation. Segment register plus offset
gives linear address. Look those up in tables to get the physical
address, and if the table happens not to be in on-chip cache, swap it
in. If the physical address isn't valid, a processor exception causes
the OS to potentially swap something out, and something else in.
Once we're paging from the swapfile, the size of the read is perhaps 4k.
And that read is regardless of whether we're only going to use one byte
or all of it.
The ratio between an access which was in the L1 cache and one which
required a page to be swapped in from disk? Much bigger than your 10,000
figure. But hopefully it doesn't happen a big percentage of the time.
Many, many other variables, like the fact that RAM chips are not
directly addressable by bytes, but instead count on rows and columns. So
if you access many bytes in the same row, it can be much quicker than
random access. So simple access time specifications don't mean as much
as it would seem; the controller has to balance the RAM spec with the
various cache requirements.
--
DaveA
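Two of the mechanisms Dave describes, cache-line "strips" and page-based address translation, come down to splitting an address into fields. The sketch below assumes 64-byte cache lines and 4 KiB pages, which are common on current x86 hardware but not universal. It uses a toy single-level page table held in a dict. Real processors use multi-level tables, TLBs and OS-managed fault handling, none of which is modelled here. `PageFault` and `translate` are hypothetical names for illustration.

```python
CACHE_LINE = 64    # assumed cache-line size in bytes
PAGE_SIZE = 4096   # assumed page size (4 KiB)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 bits of offset within a page

class PageFault(Exception):
    """Raised when a virtual page has no physical frame (toy model)."""

def cache_line_of(addr):
    # Touching any byte loads the whole line containing it.
    return addr // CACHE_LINE

def translate(vaddr, page_table):
    """Map a virtual address to a physical one via a dict {virtual page: frame}."""
    page, offset = vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)
    if page not in page_table:
        # In a real system the OS would now read the page in from swap and retry.
        raise PageFault(page)
    return (page_table[page] << OFFSET_BITS) | offset

table = {0x12: 0x7A}  # virtual page 0x12 lives in physical frame 0x7A
print(hex(translate(0x12345, table)))                   # 0x7a345
print(cache_line_of(0x12340) == cache_line_of(0x1237F))  # True: one 64-byte line
```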
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-02-27 22:54 +1100 |
| Message-ID | <54f05aff$0$12980$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #86564 |
Dave Angel wrote:
> On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
>> Dave Angel wrote:
>>
>>> (Although I believe Seymour Cray was quoted as saying that virtual
>>> memory is a crock, because "you can't fake what you ain't got.")
>>
>> If I recall correctly, disk access is about 10000 times slower than RAM,
>> so virtual memory is *at least* that much slower than real memory.
>>
>
> It's so much more complicated than that, that I hardly know where to
> start.
[snip technical details]
As interesting as they were, none of those details will make swap faster,
hence my comment that virtual memory is *at least* 10000 times slower than
RAM.
--
Steven
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-02-27 09:02 -0500 |
| Message-ID | <mailman.19306.1425045769.18130.python-list@python.org> |
| In reply to | #86571 |
On 02/27/2015 06:54 AM, Steven D'Aprano wrote:
> Dave Angel wrote:
>
>> On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
>>> Dave Angel wrote:
>>>
>>>> (Although I believe Seymour Cray was quoted as saying that virtual
>>>> memory is a crock, because "you can't fake what you ain't got.")
>>>
>>> If I recall correctly, disk access is about 10000 times slower than RAM,
>>> so virtual memory is *at least* that much slower than real memory.
>>>
>>
>> It's so much more complicated than that, that I hardly know where to
>> start.
>
> [snip technical details]
>
> As interesting as they were, none of those details will make swap faster,
> hence my comment that virtual memory is *at least* 10000 times slower than
> RAM.
>
The term "virtual memory" is used for many aspects of the modern memory
architecture. But I presume you're using it in the sense of "running in a
swapfile" as opposed to running in physical RAM.
Yes, a page fault takes on the order of 10,000 times as long as an access
to a location in L1 cache. I suspect the ratio is a lot smaller, though, if
the swapfile is on an SSD drive. The first byte is that slow.
But once the fault is resolved, the nearby bytes are in physical memory,
and some of them are in L3, L2, and L1. So you're not running in the
swapfile any more. And even when you run off the end of the page,
fetching the sequentially adjacent page from a hard disk is much faster.
And if the disk has well-designed buffering, faster yet. The OS tries
pretty hard to keep the swapfile unfragmented.
The trick is to minimize the number of page faults, especially to random
locations. If you're getting lots of them, it's called thrashing.
There are tools to help with that. To minimize page faults on code,
linking with a good working-set-tuner can help, though I don't hear of
people bothering these days. To minimize page faults on data, choosing
one's algorithm carefully can help. For example, in scanning through a
typical matrix, row order might be adjacent locations, while column
order might be scattered.
Not really much different than reading a text file. If you can arrange
to process it a line at a time, rather than reading the whole file into
memory, you generally minimize your round-trips to disk. And if you need
to randomly access it, it's quite likely more efficient to memory map
it, in which case it temporarily becomes part of the swapfile system.
--
DaveA
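Dave's three access-pattern suggestions can be sketched in Python with the standard library. `row_major_offsets` lists the flat positions visited in a row-major layout, to show why column order jumps around. `count_long_lines` streams a file one line at a time. `read_record` uses `mmap` so that only the touched pages are read in. The function names and the demo file are illustrative assumptions. This shows the access patterns only and is not a benchmark.

```python
import mmap
import os
import tempfile

def row_major_offsets(nrows, ncols, by_column=False):
    """Flat offsets visited when walking an nrows x ncols row-major matrix."""
    if by_column:
        # Column order: consecutive visits are ncols elements apart.
        return [r * ncols + c for c in range(ncols) for r in range(nrows)]
    # Row order: consecutive visits are adjacent in memory.
    return [r * ncols + c for r in range(nrows) for c in range(ncols)]

def count_long_lines(path, min_len):
    """Stream the file line by line instead of reading it all into memory."""
    with open(path, "rb") as f:
        return sum(1 for line in f if len(line.rstrip(b"\n")) >= min_len)

def read_record(path, offset, size):
    """Random access through a memory map; the OS pages in what is touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + size]

if __name__ == "__main__":
    print(row_major_offsets(2, 3))                  # [0, 1, 2, 3, 4, 5]
    print(row_major_offsets(2, 3, by_column=True))  # [0, 3, 1, 4, 2, 5]
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"hello\nworld!!\n")
    try:
        print(count_long_lines(path, 6))  # 1
        print(read_record(path, 6, 5))    # b'world'
    finally:
        os.remove(path)
```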
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-02-28 01:22 +1100 |
| Message-ID | <mailman.19307.1425046945.18130.python-list@python.org> |
| In reply to | #86571 |
On Sat, Feb 28, 2015 at 1:02 AM, Dave Angel <davea@davea.name> wrote:
> The term "virtual memory" is used for many aspects of the modern memory
> architecture. But I presume you're using it in the sense of "running in a
> swapfile" as opposed to running in physical RAM.
Given that this started with a quote about "you can't fake what you
ain't got", I would say that, yes, this refers to using hard disk to
provide more RAM.
If you're trying to use the pagefile/swapfile as if it's more memory
("I have 256MB of memory, but 10GB of swap space, so that's 10GB of
memory!"), then yes, these performance considerations are huge. But
suppose you need to run a program that's larger than your available
RAM. On MS-DOS, sometimes you'd need to work with program overlays (a
concept borrowed from older systems, but ones that I never worked on,
so I'm going back no further than DOS here). You get a *massive*
complexity hit the instant you start using them, whether your program
would have been able to fit into memory on some systems or not. Just
making it possible to have only part of your code in memory places
demands on your code that you, the programmer, have to think about.
With virtual memory, though, you just write your code as if it's all
in memory, and some of it may, at some times, be on disk. Less code to
debug = less time spent debugging. The performance question is largely
immaterial (you'll be using the disk either way), but the savings on
complexity are tremendous. And then when you do find yourself running
on a system with enough RAM? No code changes needed, and full
performance. That's where virtual memory shines.
It's funny how the world changes, though. Back in the 90s, virtual
memory was the key. No home computer ever had enough RAM. Today? A
home-grade PC could easily have 16GB... and chances are you don't need
all of that. So we go for the opposite optimization: disk caching.
Apart from when I rebuild my "Audio-Only Frozen" project [1] and the
caches get completely blasted through, heaps and heaps of my work can
be done inside the disk cache. Hey, Sikorsky, got any files anywhere
on the hard disk matching *Pastel*.iso case insensitively? *chug chug
chug* Nope. Okay. Sikorsky, got any files matching *Pas5*.iso case
insensitively? *zip* Yeah, here it is. I didn't tell the first search
to hold all that file system data in memory; the operating system's disk
cache managed it all for me, and I got the performance benefit. Same as the
above: the main benefit is that this sort of thing requires zero
application code complexity. It's all done in a perfectly generic way
at a lower level.
ChrisA
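Chris's search was presumably a shell command on his own machine. As a hedged Python stand-in, the sketch below walks a directory tree with `os.walk` and matches file names against a shell-style pattern case-insensitively, using `fnmatch.fnmatchcase` on casefolded strings. The function name `find_files` is hypothetical. It illustrates his point: nothing in the code mentions caching, yet a second search over the same tree is usually much faster, because the operating system keeps the directory metadata read during the first walk in memory.

```python
import fnmatch
import os
import sys
import time


def find_files(root, pattern):
    """Yield paths under root whose file name matches a shell-style pattern,
    ignoring case: "*pastel*.iso" also matches "My_Pastel_Mix.ISO".
    """
    folded = pattern.casefold()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Fold both sides and compare case-sensitively, so the result is
            # the same on case-sensitive and case-insensitive file systems.
            if fnmatch.fnmatchcase(name.casefold(), folded):
                yield os.path.join(dirpath, name)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run the same search twice; the second pass usually benefits from the
    # OS having cached the directory entries read during the first.
    for attempt in (1, 2):
        start = time.perf_counter()
        hits = list(find_files(sys.argv[1], "*pas5*.iso"))
        elapsed = time.perf_counter() - start
        print(f"run {attempt}: {len(hits)} match(es) in {elapsed:.3f}s")
```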
| From | alister <alister.nospam.ware@ntlworld.com> |
|---|---|
| Date | 2015-02-27 16:00 +0000 |
| Message-ID | <mcq4a9$29g$1@speranza.aioe.org> |
| In reply to | #86573 |
On Sat, 28 Feb 2015 01:22:15 +1100, Chris Angelico wrote:
>
> If you're trying to use the pagefile/swapfile as if it's more memory ("I
> have 256MB of memory, but 10GB of swap space, so that's 10GB of
> memory!"), then yes, these performance considerations are huge. But
> suppose you need to run a program that's larger than your available RAM.
> On MS-DOS, sometimes you'd need to work with program overlays (a concept
> borrowed from older systems, but ones that I never worked on, so I'm
> going back no further than DOS here). You get a *massive* complexity hit
> the instant you start using them, whether your program would have been
> able to fit into memory on some systems or not. Just making it possible
> to have only part of your code in memory places demands on your code
> that you, the programmer, have to think about. With virtual memory,
> though, you just write your code as if it's all in memory, and some of
> it may, at some times, be on disk. Less code to debug = less time spent
> debugging. The performance question is largely immaterial (you'll be
> using the disk either way), but the savings on complexity are
> tremendous. And then when you do find yourself running on a system with
> enough RAM? No code changes needed, and full performance. That's where
> virtual memory shines.
> ChrisA
I think there is a case for bringing back the overlay file, or at least
loading larger programs in sections.
Only loading the routines as they are required could speed up the start
time of many large applications.
For example, in LibreOffice I rarely need the mail merge function, the
word count, and many other features that could be added to the running
application on demand rather than all at once.
Obviously, with large memory and virtual memory there is no need to
unload them once loaded.
--
Ralph's Observation:
It is a mistake to let any mechanical object realise that you
are in a hurry.
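At the module level, alister's load-on-demand idea is roughly what Python already does if the `import` is deferred until a feature is first used. A minimal sketch, assuming a hypothetical `LazyFeature` wrapper, with the stdlib `csv` module standing in for an optional mail-merge component: the import happens on first attribute access, and the module then stays in `sys.modules`, so, as he says, nothing needs unloading.

```python
import importlib
import sys


class LazyFeature:
    """Import a module only when one of its attributes is first needed.

    Hypothetical helper for illustration. After the first access the module
    is cached here and in sys.modules, so it is never loaded twice.
    """

    def __init__(self, module_name):
        self.module_name = module_name
        self._module = None

    def __getattr__(self, attr):
        # Called only for attributes not found normally, i.e. the feature's
        # own functions, never module_name or _module.
        if self._module is None:
            self._module = importlib.import_module(self.module_name)
        return getattr(self._module, attr)


# "csv" stands in for an optional mail-merge component.
mail_merge = LazyFeature("csv")

if __name__ == "__main__":
    rows = list(mail_merge.reader(["name,town", "Ann,Paris"]))
    print(rows)                  # [['name', 'town'], ['Ann', 'Paris']]
    print("csv" in sys.modules)  # True
```

The standard library also offers `importlib.util.LazyLoader`, which defers executing a module's code until its first attribute access; the wrapper above is just the simplest form of the idea.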