Groups > comp.lang.python > #86311 > unrolled thread

Newbie question about text encoding

Started by	pierrick.brihaye@gmail.com
First post	2015-02-24 02:49 -0800
Last post	2015-02-27 10:23 +1100
Articles	20 on this page of 158 — 19 participants


  Newbie question about text encoding pierrick.brihaye@gmail.com - 2015-02-24 02:49 -0800
    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-24 22:09 +1100
    Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-24 06:25 -0500
    Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 15:55 +0100
    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-25 02:03 +1100
    Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 16:06 +0100
      Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-02-24 08:01 -0800
    Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 16:07 +0100
    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-25 02:10 +1100
    Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 16:24 +0100
    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-25 02:33 +1100
    Re: Newbie question about text encoding random832@fastmail.us - 2015-02-24 10:38 -0500
    Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 17:20 +0100
    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-25 03:24 +1100
    Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-24 12:13 -0500
    Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 20:45 +0100
      Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-02-25 00:21 +0200
      Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-25 12:20 +1100
        Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-02-25 06:34 -0800
    Re: Newbie question about text encoding Laura Creighton <lac@openend.se> - 2015-02-24 20:57 +0100
      Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-25 12:19 +1100
        Re: Newbie question about text encoding Marcos Almeida Azevedo <marcos.al.azevedo@gmail.com> - 2015-02-25 12:54 +0800
    Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-24 15:41 -0500
      Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 04:40 -0800
        Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 05:15 -0800
        Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-27 00:24 +1100
          Re: Newbie question about text encoding Sam Raker <sam.raker@gmail.com> - 2015-02-26 08:45 -0800
            Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 09:08 -0800
        Re: Newbie question about text encoding Terry Reedy <tjreedy@udel.edu> - 2015-02-26 12:02 -0500
          Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 09:59 -0800
            Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-02-26 12:20 -0800
            Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-27 09:13 +1100
            Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-27 12:05 +1100
              Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-26 20:57 -0500
                Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-27 16:58 +1100
                  Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-27 02:30 -0500
                    Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-27 22:54 +1100
                      Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-27 09:02 -0500
                      Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-28 01:22 +1100
                        Re: Newbie question about text encoding alister <alister.nospam.ware@ntlworld.com> - 2015-02-27 16:00 +0000
                          Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-28 03:12 +1100
                            Re: Newbie question about text encoding alister <alister.nospam.ware@ntlworld.com> - 2015-02-27 16:45 +0000
                              Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-28 04:45 +1100
                                Re: Newbie question about text encoding alister <alister.nospam.ware@ntlworld.com> - 2015-02-27 22:13 +0000
                              Re: Newbie question about text encoding MRAB <python@mrabarnett.plus.com> - 2015-02-27 19:14 +0000
                                Re: Newbie question about text encoding alister <alister.nospam.ware@ntlworld.com> - 2015-02-27 22:09 +0000
                          Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-27 15:52 -0500
                          Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-28 08:04 +1100
                      Re: Newbie question about text encoding Dave Angel <davea@davea.name> - 2015-02-27 10:24 -0500
                      Re: Newbie question about text encoding Grant Edwards <invalid@invalid.invalid> - 2015-02-27 17:46 +0000
                        Re: Newbie question about text encoding Grant Edwards <invalid@invalid.invalid> - 2015-02-27 17:47 +0000
              Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-02-27 01:06 -0800
          Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-02-26 11:59 -0800
          Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 10:03 -0800
            Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-03 10:36 -0800
              Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 20:45 -0800
                Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-04 15:54 +1100
                  Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 21:05 -0800
                    Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-06 01:06 +1100
                      Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-05 06:59 -0800
                      Re: Newbie question about text encoding random832@fastmail.us - 2015-03-05 14:59 -0500
                        Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-06 09:33 +1100
                      Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-05 20:53 -0800
                        Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-06 16:20 +1100
                          Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-06 01:02 -0800
                            Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-06 01:06 -0800
                              Re: Newbie question about text encoding random832@fastmail.us - 2015-03-06 08:33 -0500
                              Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 00:39 +1100
                              Re: Newbie question about text encoding random832@fastmail.us - 2015-03-06 09:03 -0500
                              Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 01:11 +1100
                              Re: Newbie question about text encoding random832@fastmail.us - 2015-03-06 09:27 -0500
                                Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-07 03:26 +1100
                            Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-06 20:54 +1100
                              Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-06 02:07 -0800
                            Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-07 01:50 +1100
                              Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 02:27 +1100
                              Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-06 07:37 -0800
                              Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-06 08:20 -0800
                                Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 03:45 +1100
                                Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-06 11:41 -0800
                                  Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-06 11:58 -0800
                                Re: Newbie question about text encoding Terry Reedy <tjreedy@udel.edu> - 2015-03-07 01:11 -0500
                                  Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-06 23:43 -0800
                                  Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-07 00:55 -0800
                                    Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-07 01:08 -0800
                                  Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-07 21:25 -0800
                        Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-07 22:09 +1100
                          Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 22:33 +1100
                          Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 13:53 +0200
                            Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-07 23:02 +1100
                            Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 14:07 +0000
                            Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-07 07:28 -0800
                            Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-08 02:40 +1100
                              Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 17:48 +0200
                                Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 03:17 +1100
                                  Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 18:25 +0200
                                    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 03:41 +1100
                                      Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 18:54 +0200
                                        Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 03:58 +1100
                                        Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 04:00 +1100
                                          Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 19:14 +0200
                                            Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 04:26 +1100
                                              Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 19:50 +0200
                                                Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 04:59 +1100
                                                  Re: Newbie question about text encoding Dan Sommers <dan@tombstonezero.net> - 2015-03-07 18:02 +0000
                                                    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 05:13 +1100
                                                      Re: Newbie question about text encoding Dan Sommers <dan@tombstonezero.net> - 2015-03-07 18:34 +0000
                                                        Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 05:44 +1100
                                                        Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 19:00 +0000
                                                          Re: Newbie question about text encoding Dan Sommers <dan@tombstonezero.net> - 2015-03-07 19:16 +0000
                                                        Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 21:01 +0200
                                    Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 16:40 +0000
                                      Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 18:48 +0200
                                        Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 17:02 +0000
                                          Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-07 19:16 +0200
                                            Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 18:18 +0000
                                              Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-07 21:06 -0800
                                    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 03:53 +1100
                                  Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-07 11:03 -0800
                                Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-08 12:45 +1100
                                  Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-08 09:20 +0200
                                    Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 18:37 +1100
                                      Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-08 10:09 +0200
                                        Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-08 19:23 +1100
                                          Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-08 01:18 -0800
                                        Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-09 05:25 +1100
                                          Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-08 22:09 +0200
                                            Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-09 12:43 +1100
                                              Re: Newbie question about text encoding Ben Finney <ben+python@benfinney.id.au> - 2015-03-09 13:09 +1100
                                                Re: Newbie question about text encoding Marko Rauhamaa <marko@pacujo.net> - 2015-03-09 08:31 +0200
                                              Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-09 13:18 +1100
                                              Re: Newbie question about text encoding random832@fastmail.us - 2015-03-09 00:27 -0400
                                          Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-09 07:55 +1100
                                          Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-09 08:13 +1100
                                            Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-09 17:34 +1100
                                              Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-09 17:44 +1100
                                                Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-09 02:08 -0700
                                                  Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-09 07:26 -0700
                                              Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-09 05:28 -0700
                                  Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-08 19:01 +1100
                          Re: Newbie question about text encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-03-07 14:13 +0000
                          Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-07 23:23 -0800
                            Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-09 05:30 +1100
                          Re: Newbie question about text encoding Cameron Simpson <cs@zip.com.au> - 2015-03-09 13:09 +1100
                            Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-08 19:42 -0700
                  Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-04 19:16 +1100
            Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-04 05:43 +1100
              Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 18:53 -0800
            Re: Newbie question about text encoding Terry Reedy <tjreedy@udel.edu> - 2015-03-03 18:30 -0500
            Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-04 13:54 +1100
              Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-03-04 14:02 +1100
              Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 20:05 -0800
                Re: Newbie question about text encoding Rustom Mody <rustompmody@gmail.com> - 2015-03-03 20:16 -0800
                Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-04 19:14 +1100
                  Re: Newbie question about text encoding wxjmfauth@gmail.com - 2015-03-04 02:16 -0800
        Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-27 04:29 +1100
          Re: Newbie question about text encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-27 10:09 +1100
            Re: Newbie question about text encoding Chris Angelico <rosuav@gmail.com> - 2015-02-27 10:23 +1100


#86576

From	Chris Angelico <rosuav@gmail.com>
Date	2015-02-28 03:12 +1100
Message-ID	<mailman.19309.1425053544.18130.python-list@python.org>
In reply to	#86575

On Sat, Feb 28, 2015 at 3:00 AM, alister
<alister.nospam.ware@ntlworld.com> wrote:
> I think there is a case for bringing back the overlay file, or at least
> loading larger programs in sections.
> Only loading the routines as they are required could speed up the start
> time of many large applications.
> Examples: LibreOffice. I rarely need the mail merge function, the word
> count and many other features that could be added into the running
> application on demand rather than all at once.

Downside of that is twofold: firstly the complexity that I already
mentioned, and secondly you pay the startup cost on first usage. So
you might get into the program a bit faster, but as soon as you go to
any feature you didn't already hit this session, the program pauses
for a bit and loads it. Sometimes startup cost is the best time to do
this sort of thing.
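The trade-off described here shows up in Python's own import system. The sketch below is adapted from the lazy-import recipe in the `importlib` documentation and uses `importlib.util.LazyLoader`: the module object is registered immediately, but its code only runs on first attribute access, so the load cost moves to first use rather than disappearing. The `lazy_import` helper name and the use of `difflib` as a stand-in for a rarely used feature are illustrative assumptions.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return module `name`, deferring execution of its body until first attribute access."""
    if name in sys.modules:
        # Already loaded (or already registered lazily): nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    # Wrap the real loader so that executing the module is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


# Hypothetical "compare documents" feature: difflib's code only runs
# when one of its attributes is first touched, not at start-up.
difflib = lazy_import("difflib")
```

The first call such as `difflib.SequenceMatcher(None, "draft", "final")` is where the pause happens; every later use is as fast as an ordinary import.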

Of course, there is an easy way to implement exactly what you're
asking for: use separate programs for everything, instead of expecting
a megantic office suite[1] to do everything for you. Just get yourself
a nice simple text editor, then invoke other programs - maybe from a
terminal, or maybe from within the editor - to do the rest of the
work. A simple disk cache will mean that previously-used programs
start up quickly.
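A minimal sketch of the "separate programs" approach: the main program stays small and hands a job such as a word count to another process only when asked. To keep the example portable, the external tool is assumed to be a tiny Python one-liner run via `sys.executable`; in practice it could be any command-line program. `WORD_COUNT_TOOL` and `count_words_externally` are hypothetical names.

```python
import subprocess
import sys

# Hypothetical stand-in for an external word-count tool: a tiny program
# that reads stdin and prints the number of whitespace-separated words.
WORD_COUNT_TOOL = "import sys; print(len(sys.stdin.read().split()))"


def count_words_externally(text: str) -> int:
    """Pipe `text` to a separate process and return the word count it reports."""
    result = subprocess.run(
        [sys.executable, "-c", WORD_COUNT_TOOL],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return int(result.stdout.strip())
```

The editor never loads any word-count code itself; the cost is a process start-up on each use, which a warm disk cache keeps small.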

ChrisA

[1] It's slightly less bloated than the gigantic office suite sold by
a top-end software company.


#86577

From	alister <alister.nospam.ware@ntlworld.com>
Date	2015-02-27 16:45 +0000
Message-ID	<mcq6ve$cqm$1@speranza.aioe.org>
In reply to	#86576

On Sat, 28 Feb 2015 03:12:16 +1100, Chris Angelico wrote:

> On Sat, Feb 28, 2015 at 3:00 AM, alister
> <alister.nospam.ware@ntlworld.com> wrote:
>> I think there is a case for bringing back the overlay file, or at least
>> loading larger programs in sections. Only loading the routines as they
>> are required could speed up the start time of many large applications.
>> Examples: LibreOffice. I rarely need the mail merge function, the word
>> count and many other features that could be added into the running
>> application on demand rather than all at once.
> 
> Downside of that is twofold: firstly the complexity that I already
> mentioned, and secondly you pay the startup cost on first usage. So you
> might get into the program a bit faster, but as soon as you go to any
> feature you didn't already hit this session, the program pauses for a
> bit and loads it. Sometimes startup cost is the best time to do this
> sort of thing.
> 
If the modules are small enough this may not be noticeable, but yes, I do
accept there may be delays on first usage.

As to the complexity, it has been my observation that as the memory
footprint available to programmers has increased they have become less &
less skilled at writing code.

Of course my time as a professional programmer was over 20 years ago, on
8-bit microcontrollers with 8k of ROM (eventually; originally I only had
2k to play with) & 128 bytes (yes, bytes!) of RAM, so I am very out of
date.

I now play with Python because it is so much less demanding of me, which
probably makes me just as guilty :-)

> Of course, there is an easy way to implement exactly what you're asking
> for: use separate programs for everything, instead of expecting a
> megantic office suite[1] to do everything for you. Just get yourself a
> nice simple text editor, then invoke other programs - maybe from a
> terminal, or maybe from within the editor - to do the rest of the work.
> A simple disk cache will mean that previously-used programs start up
> quickly.
LibreOffice was cited as just one example.
Video editing suites are another that could be used as an example
(perhaps more so: does the rendering engine need to be loaded until you
start generating the output? A small delay here would be insignificant.)
> 
> ChrisA
> 
> [1] It's slightly less bloated than the gigantic office suite sold by a
> top-end software company.





-- 
You don't sew with a fork, so I see no reason to eat with knitting 
needles.
		-- Miss Piggy, on eating Chinese Food


#86578

From	Chris Angelico <rosuav@gmail.com>
Date	2015-02-28 04:45 +1100
Message-ID	<mailman.19310.1425059114.18130.python-list@python.org>
In reply to	#86577

On Sat, Feb 28, 2015 at 3:45 AM, alister
<alister.nospam.ware@ntlworld.com> wrote:
> On Sat, 28 Feb 2015 03:12:16 +1100, Chris Angelico wrote:
>
>> On Sat, Feb 28, 2015 at 3:00 AM, alister
>> <alister.nospam.ware@ntlworld.com> wrote:
>>> I think there is a case for bringing back the overlay file, or at least
>>> loading larger programs in sections. Only loading the routines as they
>>> are required could speed up the start time of many large applications.
>>> Examples: LibreOffice. I rarely need the mail merge function, the word
>>> count and many other features that could be added into the running
>>> application on demand rather than all at once.
>>
>> Downside of that is twofold: firstly the complexity that I already
>> mentioned, and secondly you pay the startup cost on first usage. So you
>> might get into the program a bit faster, but as soon as you go to any
>> feature you didn't already hit this session, the program pauses for a
>> bit and loads it. Sometimes startup cost is the best time to do this
>> sort of thing.
>>
> If the modules are small enough this may not be noticeable, but yes, I do
> accept there may be delays on first usage.
>
> As to the complexity, it has been my observation that as the memory
> footprint available to programmers has increased they have become less &
> less skilled at writing code.

Perhaps, but on the other hand, the skill of squeezing code into less
memory is being replaced by other skills. We can write code that takes
the simple/dumb approach, let it use an entire megabyte of memory, and
not care about the cost... and we can write that in an hour, instead
of spending a week fiddling with it. Reducing the development cycle
time means we can add all sorts of cool features to a program, all
while the original end user is still excited about it. (Of course, a
comparison between today's World Wide Web and that of the 1990s
suggests that these cool features aren't necessarily beneficial, but
still, we have the option of forgoing austerity.)
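The trade-off between the simple approach and the austere one can be illustrated with a word count. Both hypothetical functions below return the same result: the first holds the whole text in memory and takes a minute to write, while the second accepts any iterable of lines (such as an open file object) and only needs one line in memory at a time.

```python
from collections import Counter
from typing import Iterable


def word_counts_simple(text: str) -> Counter:
    """Simple approach: split the entire text at once; memory grows with input size."""
    return Counter(text.split())


def word_counts_frugal(lines: Iterable[str]) -> Counter:
    """Austere approach: update the counts one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts
```

For a file that comfortably fits in RAM, the first version is usually the better use of programmer time; the second only earns its keep when the input might not fit.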

> Video editing suites are another that could be used as an example
> (perhaps more so: does the rendering engine need to be loaded until you
> start generating the output? A small delay here would be insignificant.)

Hmm, I'm not sure that's actually a big deal, because your *data* will
dwarf the code. I can fire up sox and avconv, both fairly large
programs, and their code will all sit comfortably in memory; but then
they get to work on my data, and suddenly my hard disk is chewing
through 91GB of content. Breaking up avconv into a dozen pieces
wouldn't make a dent in 91GB.
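When the data dwarfs the code, a common way to keep memory use flat regardless of input size is to stream the data through in fixed-size pieces. A minimal sketch, assuming SHA-256 hashing as a stand-in for the real processing and an arbitrary 1 MiB chunk size; it is not a description of how sox or avconv are implemented.

```python
import hashlib
import io
from typing import BinaryIO

# 1 MiB per read: an arbitrary choice. Memory use stays near this size
# no matter how large the input is.
CHUNK_SIZE = 1 << 20


def sha256_of_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a binary stream without ever holding all of it in memory."""
    digest = hashlib.sha256()
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Small in-memory demo; for a real file, use open(path, "rb").
    sample = io.BytesIO(b"x" * 3_000_000)
    print(sha256_of_stream(sample, chunk_size=4096))
```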

ChrisA


#86600

From	alister <alister.nospam.ware@ntlworld.com>
Date	2015-02-27 22:13 +0000
Message-ID	<mcqq5k$hml$2@speranza.aioe.org>
In reply to	#86578

On Sat, 28 Feb 2015 04:45:04 +1100, Chris Angelico wrote:
> Perhaps, but on the other hand, the skill of squeezing code into less
> memory is being replaced by other skills. We can write code that takes
> the simple/dumb approach, let it use an entire megabyte of memory, and
> not care about the cost... and we can write that in an hour, instead of
> spending a week fiddling with it. Reducing the development cycle time
> means we can add all sorts of cool features to a program, all while the
> original end user is still excited about it. (Of course, a comparison
> between today's World Wide Web and that of the 1990s suggests that these
> cool features aren't necessarily beneficial, but still, we have the
> option of forgoing austerity.)
> 
> 
> ChrisA

Again, I am fluid on this. 'Clever' programming is often counterproductive
& unmaintainable, but then again lazy programming can be just as bad for
this. There is no "one size fits all" solution, but the modern environment
does make lazy programming very easy.



-- 
After all, all he did was string together a lot of old, well-known 
quotations.
		-- H.L. Mencken, on Shakespeare


#86584

From	MRAB <python@mrabarnett.plus.com>
Date	2015-02-27 19:14 +0000
Message-ID	<mailman.19311.1425064451.18130.python-list@python.org>
In reply to	#86577

On 2015-02-27 16:45, alister wrote:
> On Sat, 28 Feb 2015 03:12:16 +1100, Chris Angelico wrote:
>
>> On Sat, Feb 28, 2015 at 3:00 AM, alister
>> <alister.nospam.ware@ntlworld.com> wrote:
>>> I think there is a case for bringing back the overlay file, or at least
>>> loading larger programs in sections. Only loading the routines as they
>>> are required could speed up the start time of many large applications.
>>> Examples: LibreOffice. I rarely need the mail merge function, the word
>>> count and many other features that could be added into the running
>>> application on demand rather than all at once.
>>
>> Downside of that is twofold: firstly the complexity that I already
>> mentioned, and secondly you pay the startup cost on first usage. So you
>> might get into the program a bit faster, but as soon as you go to any
>> feature you didn't already hit this session, the program pauses for a
>> bit and loads it. Sometimes startup cost is the best time to do this
>> sort of thing.
>>
> If the modules are small enough this may not be noticeable, but yes, I do
> accept there may be delays on first usage.
>
I suppose you could load the basic parts first so that the user can
start working, and then load the additional features in the background.
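This suggestion can be sketched in Python with a worker thread that imports optional modules after start-up. `BackgroundLoader` is a hypothetical helper, and `concurrent.futures` is just one convenient way to hand results (and import errors) back to the main thread. If the user asks for a feature before its module has finished loading, `get()` simply waits for it.

```python
import importlib
from concurrent.futures import ThreadPoolExecutor
from types import ModuleType
from typing import Iterable


class BackgroundLoader:
    """Import optional feature modules on a worker thread after start-up."""

    def __init__(self, module_names: Iterable[str]) -> None:
        # One worker is enough: the goal is to keep the main thread free,
        # not to import modules in parallel.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._futures = {
            name: self._pool.submit(importlib.import_module, name)
            for name in module_names
        }

    def get(self, name: str) -> ModuleType:
        """Return the module, blocking only if it has not finished loading.

        Any ImportError raised on the worker thread is re-raised here.
        """
        return self._futures[name].result()

    def close(self) -> None:
        """Wait for outstanding imports and release the worker thread."""
        self._pool.shutdown(wait=True)
```

For example, `loader = BackgroundLoader(["csv", "difflib"])` returns at once, and a later `loader.get("csv")` hands back the module, waiting only if that import is still running.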

> As to the complexity, it has been my observation that as the memory
> footprint available to programmers has increased they have become less &
> less skilled at writing code.
>
> Of course my time as a professional programmer was over 20 years ago, on
> 8-bit microcontrollers with 8k of ROM (eventually; originally I only had
> 2k to play with) & 128 bytes (yes, bytes!) of RAM, so I am very out of
> date.
>
> I now play with Python because it is so much less demanding of me, which
> probably makes me just as guilty :-)
>
>> Of course, there is an easy way to implement exactly what you're asking
>> for: use separate programs for everything, instead of expecting a
>> megantic office suite[1] to do everything for you. Just get yourself a
>> nice simple text editor, then invoke other programs - maybe from a
>> terminal, or maybe from within the editor - to do the rest of the work.
>> A simple disk cache will mean that previously-used programs start up
>> quickly.
> LibreOffice was cited as just one example.
> Video editing suites are another that could be used as an example
> (perhaps more so: does the rendering engine need to be loaded until you
> start generating the output? A small delay here would be insignificant.)
>>
>> ChrisA
>>
>> [1] It's slightly less bloated than the gigantic office suite sold by a
>> top-end software company.
>


#86599

From	alister <alister.nospam.ware@ntlworld.com>
Date	2015-02-27 22:09 +0000
Message-ID	<mcqpv3$hml$1@speranza.aioe.org>
In reply to	#86584

On Fri, 27 Feb 2015 19:14:00 +0000, MRAB wrote:

>>
> I suppose you could load the basic parts first so that the user can
> start working, and then load the additional features in the background.
> 
Quite possible.
My opinion on this is very fluid:
it may work for some applications, but it probably wouldn't for others.

With Python it is generally considered good practice to import all 
modules at the start of a program, but there are valid cases for only 
importing a module if it is actually needed.
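
Importing a module only when it is needed just means putting the `import` statement inside the function that uses it. A minimal sketch, in which `export_as_csv` is a hypothetical seldom-used feature and the standard `csv` module stands in for something expensive to load:

```python
import io
import sys


def export_as_csv(rows):
    """Render rows as CSV text (a stand-in for a rarely used feature)."""
    # Deferred import: csv is loaded the first time this function runs,
    # not at program start-up. Later calls find it already in sys.modules,
    # so the import statement is then cheap.
    import csv

    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


print(repr(export_as_csv([["feature", "used"], ["mail merge", 0]])))
print("csv" in sys.modules)
```

The trade-off is that the first call pays the import cost, and a missing or broken module only shows up when the feature is used rather than at start-up.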

-- 
Some people have parts that are so private they themselves have no
knowledge of them.

#86590

From	Dave Angel <davea@davea.name>
Date	2015-02-27 15:52 -0500
Message-ID	<mailman.19316.1425070352.18130.python-list@python.org>
In reply to	#86575

On 02/27/2015 11:00 AM, alister wrote:
> On Sat, 28 Feb 2015 01:22:15 +1100, Chris Angelico wrote:
>
>>
>> If you're trying to use the pagefile/swapfile as if it's more memory ("I
>> have 256MB of memory, but 10GB of swap space, so that's 10GB of
>> memory!"), then yes, these performance considerations are huge. But
>> suppose you need to run a program that's larger than your available RAM.
>> On MS-DOS, sometimes you'd need to work with program overlays (a concept
>> borrowed from older systems, but ones that I never worked on, so I'm
>> going back no further than DOS here). You get a *massive* complexity hit
>> the instant you start using them, whether your program would have been
>> able to fit into memory on some systems or not. Just making it possible
>> to have only part of your code in memory places demands on your code
>> that you, the programmer, have to think about. With virtual memory,
>> though, you just write your code as if it's all in memory, and some of
>> it may, at some times, be on disk. Less code to debug = less time spent
>> debugging. The performance question is largely immaterial (you'll be
>> using the disk either way), but the savings on complexity are
>> tremendous. And then when you do find yourself running on a system with
>> enough RAM? No code changes needed, and full performance. That's where
>> virtual memory shines.
>> ChrisA
>
> I think there is a case for bringing back the overlay file, or at least
> loading larger programs in sections.
> Only loading the routines as they are required could speed up the start
> time of many large applications.
> For example, in LibreOffice I rarely need the mail merge function, the word
> count and many other features that could be added into the running
> application on demand rather than all at once.
>
> obviously with large memory & virtual mem there is no need to un-install
> them once loaded.
>

I can't say how Linux handles it (I'd like to know, but haven't needed 
to yet), but in Windows (NT, XP, etc), a DLL is not "loaded", but rather 
mapped.  And it's not copied into the swapfile, it's mapped directly 
from the DLL.  The mapping mode is "copy-on-write" which means that 
read-only portions are swapped directly from the DLL, on first usage, 
while read-write portions (eg. static/global variables, relocation 
modifications) are copied on first use to the swap file.  I presume 
EXE's are done the same way, but never had a need to know.
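
The copy-on-write behaviour described here is also available to Python programs through the standard `mmap` module's `ACCESS_COPY` mode. The sketch below is only an analogy for how the loader maps a DLL, not a description of it: a temporary file stands in for the DLL image, reads come from the file's pages, and writes go to private copies that never reach the file.

```python
import mmap
import os
import tempfile

# A small temporary file stands in for the DLL image on disk.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"original contents")

with open(path, "r+b") as f:
    # ACCESS_COPY: a private copy-on-write mapping. Unmodified pages are
    # read straight from the file; a page is copied only when written to.
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    view[:8] = b"MODIFIED"          # same length as b"original"
    in_memory = view[:]
    view.close()

with open(path, "rb") as f:
    on_disk = f.read()
os.remove(path)

print(in_memory)  # b'MODIFIED contents'
print(on_disk)    # b'original contents': the file itself is untouched
```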

If that's the case on the architectures you're talking about, then the 
problem of slow loading is not triggered by the memory usage, but by 
lots of initialization code.  THAT's what should be deferred for 
seldom-used portions of code.
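
One common way to defer expensive initialization until first use is to wrap it in a cached zero-argument function. In this sketch `spelling_dictionary` and `is_known_word` are hypothetical names, and the three-word list stands in for slow set-up work such as parsing a large data file.

```python
import functools

BUILD_COUNT = 0  # how many times the expensive set-up has actually run


@functools.lru_cache(maxsize=None)
def spelling_dictionary():
    """Build the (hypothetical) word table on first use, then reuse it."""
    global BUILD_COUNT
    BUILD_COUNT += 1
    # Stand-in for slow initialization, e.g. parsing a large word list.
    return frozenset(word.lower() for word in ["Python", "Unicode", "encoding"])


def is_known_word(word):
    return word.lower() in spelling_dictionary()


print(BUILD_COUNT)              # 0: nothing was built at start-up
print(is_known_word("UNICODE"))  # True: first call triggers the build
print(is_known_word("ascii"))    # False: the cached table is reused
print(BUILD_COUNT)              # 1: built exactly once
```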

The main point of a working-set-tuner is to group sections of code 
together that are likely to be used together.  To take an extreme case, 
all the fatal exception handlers should be positioned adjacent to each 
other in linear memory, as it's unlikely that any of them will be 
needed, and the code takes up no time or space in physical memory.

Also (in Windows), a DLL can be pre-relocated, so that it has a 
preferred address to be loaded into memory.  If that memory is available 
when it gets loaded (actually mapped), then no relocation needs to 
happen, which saves time and swap space.

In the x86 architecture, most code is self-relocating: everything is 
relative.  But references to other DLL's and jump tables were absolute, 
so they needed to be relocated at load time, when final locations were 
nailed down.

Perhaps the authors of bloated applications have forgotten how to do 
these, as the linker's defaults put all DLL's at the same 
location, meaning all but the first will need relocating.  But system 
DLL's are (were) each given unique addresses.

On one large project, I added the build step of assigning these base 
addresses.  Each DLL had to start on a 64k boundary, and I reserved some 
fractional extra space between them in case one would grow.  Then every 
few months, we double-checked that they didn't overlap, and if necessary 
adjusted the start addresses.  We didn't just automatically assign 
closest addresses, because frequently some of the DLL's would be updated 
independently of the others.
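
The build step described above can be sketched as a small script. The image names, sizes, starting address and 25% slack are all made up for illustration; only the 64k alignment and the periodic overlap check come from the description.

```python
ALIGNMENT = 0x10000  # every image starts on a 64k boundary


def align_up(address, alignment=ALIGNMENT):
    """Round address up to the next multiple of alignment."""
    return (address + alignment - 1) // alignment * alignment


def assign_base_addresses(images, start, slack_percent=25):
    """Map each (name, size) pair to an aligned base address.

    Each image gets its size plus slack_percent extra room, so it can grow
    a little in a later build without colliding with its neighbour.
    """
    bases = {}
    address = align_up(start)
    for name, size in images:
        bases[name] = address
        address = align_up(address + size + size * slack_percent // 100)
    return bases


def find_overlaps(bases, sizes):
    """Return neighbouring pairs (in address order) whose ranges overlap.

    An empty list means no image overlaps any other.
    """
    spans = sorted((bases[n], bases[n] + sizes[n], n) for n in bases)
    return [(a[2], b[2]) for a, b in zip(spans, spans[1:]) if a[1] > b[0]]


images = [("core.dll", 0x2A000), ("ui.dll", 0x51000), ("mailmerge.dll", 0x8000)]
bases = assign_base_addresses(images, start=0x10000000)
for name, base in bases.items():
    print(f"{name:14} {base:#010x}")

# The periodic re-check: a later build of core.dll has outgrown its slack.
sizes = dict(images)
print(find_overlaps(bases, sizes))   # []
sizes["core.dll"] = 0x50000
print(find_overlaps(bases, sizes))   # [('core.dll', 'ui.dll')]
```

Leaving gaps rather than packing images tightly is what allows one DLL to be rebuilt and shipped on its own without moving its neighbours.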
-- 
DaveA

#86591

From	Chris Angelico <rosuav@gmail.com>
Date	2015-02-28 08:04 +1100
Message-ID	<mailman.19317.1425071093.18130.python-list@python.org>
In reply to	#86575

On Sat, Feb 28, 2015 at 7:52 AM, Dave Angel <davea@davea.name> wrote:
> If that's the case on the architectures you're talking about, then the
> problem of slow loading is not triggered by the memory usage, but by lots of
> initialization code.  THAT's what should be deferred for seldom-used
> portions of code.

s/should/can/

It's still not a clear case of "should", as it's all a big pile of
trade-offs. A few weeks ago I made a very deliberate change to a
process to force some code to get loaded and initialized earlier, to
prevent an unexpected (and thus surprising) slowdown on first use. (It
was, in fact, a Python 'import' statement, so all I had to do was add
a dummy import in the main module - with, of course, a comment making
it clear that this was necessary, even though the name wasn't used.)
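
The pattern Chris describes, an import kept only for its side effect of loading the module early, looks like the first import below. The `warm_up` helper is a hypothetical extension that does the same for a list of module names and times each one, which may help decide which imports are worth moving to start-up; `decimal` and `json` are placeholders.

```python
import importlib
import time

# Not used in this module: imported here so the cost of loading it is paid
# at start-up instead of causing a surprising pause on first use.
import decimal  # noqa: F401


def warm_up(module_names):
    """Import each named module now and return {name: seconds taken}."""
    timings = {}
    for name in module_names:
        started = time.perf_counter()
        importlib.import_module(name)
        timings[name] = time.perf_counter() - started
    return timings


for name, seconds in warm_up(["json", "decimal"]).items():
    # decimal was already loaded by the import above, so it costs ~0 ms.
    print(f"{name}: {seconds * 1000:.2f} ms")
```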

But yes, seldom-used code can definitely have its initialization
deferred if you need to speed up startup.

ChrisA

#86574

From	Dave Angel <davea@davea.name>
Date	2015-02-27 10:24 -0500
Message-ID	<mailman.19308.1425050702.18130.python-list@python.org>
In reply to	#86571

On 02/27/2015 09:22 AM, Chris Angelico wrote:
> On Sat, Feb 28, 2015 at 1:02 AM, Dave Angel <davea@davea.name> wrote:
>> The term "virtual memory" is used for many aspects of the modern memory
>> architecture.  But I presume you're using it in the sense of "running in a
>> swapfile" as opposed to running in physical RAM.
>
> Given that this started with a quote about "you can't fake what you
> ain't got", I would say that, yes, this refers to using hard disk to
> provide more RAM.
>
> If you're trying to use the pagefile/swapfile as if it's more memory
> ("I have 256MB of memory, but 10GB of swap space, so that's 10GB of
> memory!"), then yes, these performance considerations are huge. But
> suppose you need to run a program that's larger than your available
> RAM. On MS-DOS, sometimes you'd need to work with program overlays (a
> concept borrowed from older systems, but ones that I never worked on,
> so I'm going back no further than DOS here). You get a *massive*
> complexity hit the instant you start using them, whether your program
> would have been able to fit into memory on some systems or not. Just
> making it possible to have only part of your code in memory places
> demands on your code that you, the programmer, have to think about.
> With virtual memory, though, you just write your code as if it's all
> in memory, and some of it may, at some times, be on disk. Less code to
> debug = less time spent debugging. The performance question is largely
> immaterial (you'll be using the disk either way), but the savings on
> complexity are tremendous. And then when you do find yourself running
> on a system with enough RAM? No code changes needed, and full
> performance. That's where virtual memory shines.
>
> It's funny how the world changes, though. Back in the 90s, virtual
> memory was the key. No home computer ever had enough RAM. Today? A
> home-grade PC could easily have 16GB... and chances are you don't need
> all of that. So we go for the opposite optimization: disk caching.
> Apart from when I rebuild my "Audio-Only Frozen" project [1] and the
> caches get completely blasted through, heaps and heaps of my work can
> be done inside the disk cache. Hey, Sikorsky, got any files anywhere
> on the hard disk matching *Pastel*.iso case insensitively? *chug chug
> chug* Nope. Okay. Sikorsky, got any files matching *Pas5*.iso case
> insensitively? *zip* Yeah, here it is. I didn't tell the first search
> to hold all that file system data in memory; the hard drive controller
> managed it all for me, and I got the performance benefit. Same as the
> above: the main benefit is that this sort of thing requires zero
> application code complexity. It's all done in a perfectly generic way
> at a lower level.

In 1973, I did manual swapping to an external 8k ramdisk.  It was a box 
that sat on the floor and contained 8k of core memory (not 
semiconductor).  The memory was non-volatile, so it contained the 
working copy of my code.  Then I built a small swapper that would bring 
in the set of routines currently needed.  My onboard RAM (semiconductor) 
was 1.5k, which had to hold the swapper, the code, and the data.  I was 
writing a GPS system for shipboard use, and the final version of the 
code had to fit entirely in EPROM, 2k of it.  But debugging EPROM code 
was a pain, since every small change took half an hour to make new chips.

Later, I built my first PC with 512k of RAM, and usually used much of it 
as a ramdisk, since programs didn't use nearly that amount.


-- 
DaveA

#86579

From	Grant Edwards <invalid@invalid.invalid>
Date	2015-02-27 17:46 +0000
Message-ID	<mcqagp$sjl$2@reader1.panix.com>
In reply to	#86571

On 2015-02-27, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> Dave Angel wrote:
>
>> On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
>>> Dave Angel wrote:
>>>
>>>> (Although I believe Seymour Cray was quoted as saying that virtual
>>>> memory is a crock, because "you can't fake what you ain't got.")
>>>
>>> If I recall correctly, disk access is about 10000 times slower than RAM,
>>> so virtual memory is *at least* that much slower than real memory.
>>>
>> 
>> It's so much more complicated than that, that I hardly know where to
>> start.
>
> [snip technical details]
>
> As interesting as they were, none of those details will make swap faster,
> hence my comment that virtual memory is *at least* 10000 times slower than
> RAM.

Nonsense.  On all of my machines, virtual memory _is_ RAM almost all
of the time.  I don't do the type of things that force the usage of
swap.
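
The disagreement here comes down to how often virtual memory actually touches the disk. The usual effective-access-time estimate weights the two latencies by the page-fault rate. The sketch below takes Steven's figure of disk being about 10000 times slower than RAM at face value (100 ns versus 1 ms); those latencies are illustrative assumptions, not measurements.

```python
def effective_access_time(fault_rate, ram_ns=100.0, disk_ns=1_000_000.0):
    """Average access time when a fraction of accesses must go to disk.

    Effective time = (1 - p) * ram_ns + p * disk_ns, where p is the
    page-fault rate. Default latencies are assumed, not measured.
    """
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * ram_ns + fault_rate * disk_ns


for rate in (0.0, 1e-6, 1e-3, 1.0):
    print(f"fault rate {rate:g}: {effective_access_time(rate):,.1f} ns")
```

With these numbers a one-in-a-million fault rate costs about 1% over pure RAM, while one fault per thousand accesses makes memory roughly eleven times slower. A machine that never swaps sits at the zero-fault end, where virtual memory runs at RAM speed.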

-- 
Grant Edwards               grant.b.edwards        Yow! ... I want FORTY-TWO
                                  at               TRYNEL FLOATATION SYSTEMS
                              gmail.com            installed within SIX AND A
                                                   HALF HOURS!!!

#86580

From	Grant Edwards <invalid@invalid.invalid>
Date	2015-02-27 17:47 +0000
Message-ID	<mcqaiu$sjl$3@reader1.panix.com>
In reply to	#86579

On 2015-02-27, Grant Edwards <invalid@invalid.invalid> wrote:
> On 2015-02-27, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>> Dave Angel wrote:
>>> On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
>>>> Dave Angel wrote:
>>>>
>>>>> (Although I believe Seymour Cray was quoted as saying that virtual
>>>>> memory is a crock, because "you can't fake what you ain't got.")
>>>>
>>>> If I recall correctly, disk access is about 10000 times slower than RAM,
>>>> so virtual memory is *at least* that much slower than real memory.
>>>>
>>> It's so much more complicated than that, that I hardly know where to
>>> start.
>>
>> [snip technical details]
>>
>> As interesting as they were, none of those details will make swap faster,
>> hence my comment that virtual memory is *at least* 10000 times slower than
>> RAM.
>
> Nonsense.  On all of my machines, virtual memory _is_ RAM almost all
> of the time.  I don't do the type of things that force the usage of
> swap.

And on some of the embedded systems I work on, _all_ virtual memory is
RAM 100.000% of the time.

-- 
Grant Edwards               grant.b.edwards        Yow! Don't SANFORIZE me!!
                                  at               
                              gmail.com

#86567

From	wxjmfauth@gmail.com
Date	2015-02-27 01:06 -0800
Message-ID	<bd5c0806-03b9-4b00-a7f6-d7bf5f61c9f8@googlegroups.com>
In reply to	#86557

On Friday, 27 February 2015 02:05:38 UTC+1, Steven D'Aprano wrote:
> 
> E.g. I think we should all agree that the English "A" and the French "A"
> shouldn't count as separate characters, although the Greek "Α" and
> Russian "А" do.
> 


Yes. A simple and logical explanation.
Unicode does not handle languages per se; it encodes the scripts
that languages are written in.
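
Python's standard `unicodedata` module shows the distinction Steven describes: there is one code point for the capital A used in English and French, but separate code points for the look-alike Greek and Cyrillic letters, because Unicode encodes scripts rather than languages.

```python
import unicodedata

# Latin A (shared by English, French and many other languages), Greek Alpha
# and Cyrillic A. They look identical in most fonts but are distinct characters.
for ch in ("A", "\u0391", "\u0410"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print("A" == "\u0391")  # False: same shape, different character
```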

#86537

From	Rustom Mody <rustompmody@gmail.com>
Date	2015-02-26 11:59 -0800
Message-ID	<26b84544-f24f-4298-b631-bb0c2f35e8d1@googlegroups.com>
In reply to	#86519

On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> You should add emoticons, but not call them or the above 'gibberish'.

Done -- and of course not under gibberish.
I don't really know much about how emoji are used, but I understand that they are.
JFTR I consider it necessary to be respectful to all (living) people.
For that matter, even dead people(s): there is no need to be disrespectful to the Egyptians who created the hieroglyphs or the Sumerians who wrote cuneiform.

I only find it crosses a line when creations that have been dead for two millennia are made to take up
the space of the living.

Chris wrote:
> * Klingon: Not part of any current standard 

Thanks. Removed.

#86856

From	Rustom Mody <rustompmody@gmail.com>
Date	2015-03-03 10:03 -0800
Message-ID	<9169f3b1-2ac7-42a3-8033-584f84b88a1f@googlegroups.com>
In reply to	#86519

On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
> > On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> >> Wrote something up on why we should stop using ASCII:
> >> http://blog.languager.org/2015/02/universal-unicode.html
> 
> I think that the main point of the post, that many Unicode chars are 
> truly planetary rather than just national/regional, is excellent.

<snipped>

> You should add emoticons, but not call them or the above 'gibberish'.
> I think that this part of your post is more 'unprofessional' than the 
> character blocks.  It is very jarring and seems contrary to your main point.

Ok Done

References to gibberish removed from
http://blog.languager.org/2015/02/universal-unicode.html 

What I was trying to say expanded here
http://blog.languager.org/2015/03/whimsical-unicode.html
[Hope  the word 'whimsical' is less jarring and more accurate than 'gibberish']

#86857

From	wxjmfauth@gmail.com
Date	2015-03-03 10:36 -0800
Message-ID	<7a75a23c-4678-4d7a-a2ec-9e8fff4c07f8@googlegroups.com>
In reply to	#86856

On Tuesday, 3 March 2015 19:04:06 UTC+1, Rustom Mody wrote:
> On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> > On 2/26/2015 8:24 AM, Chris Angelico wrote:
> > > On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> > >> Wrote something up on why we should stop using ASCII:
> > >> http://blog.languager.org/2015/02/universal-unicode.html
> > 
> > I think that the main point of the post, that many Unicode chars are 
> > truly planetary rather than just national/regional, is excellent.
> 
> <snipped>
> 
> > You should add emoticons, but not call them or the above 'gibberish'.
> > I think that this part of your post is more 'unprofessional' than the 
> > character blocks.  It is very jarring and seems contrary to your main point.
> 
> Ok Done
> 
> References to gibberish removed from
> http://blog.languager.org/2015/02/universal-unicode.html 
> 
> What I was trying to say expanded here
> http://blog.languager.org/2015/03/whimsical-unicode.html
> [Hope  the word 'whimsical' is less jarring and more accurate than 'gibberish']

========

Emoji and Dingbats are now part of Unicode.
They should be treated just like a "1", an "a"
or a "mathematical alpha".
So there is nothing special to say about them.

jmf

#86885

From	Rustom Mody <rustompmody@gmail.com>
Date	2015-03-03 20:45 -0800
Message-ID	<132d5ce6-f672-4eec-99f9-1cc9e88b94f3@googlegroups.com>
In reply to	#86857

On Wednesday, March 4, 2015 at 12:07:06 AM UTC+5:30, jmf wrote:
> On Tuesday, 3 March 2015 19:04:06 UTC+1, Rustom Mody wrote:
> > On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> > > On 2/26/2015 8:24 AM, Chris Angelico wrote:
> > > > On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> > > >> Wrote something up on why we should stop using ASCII:
> > > >> http://blog.languager.org/2015/02/universal-unicode.html
> > > 
> > > I think that the main point of the post, that many Unicode chars are 
> > > truly planetary rather than just national/regional, is excellent.
> > 
> > <snipped>
> > 
> > > You should add emoticons, but not call them or the above 'gibberish'.
> > > I think that this part of your post is more 'unprofessional' than the 
> > > character blocks.  It is very jarring and seems contrary to your main point.
> > 
> > Ok Done
> > 
> > References to gibberish removed from
> > http://blog.languager.org/2015/02/universal-unicode.html 
> > 
> > What I was trying to say expanded here
> > http://blog.languager.org/2015/03/whimsical-unicode.html
> > [Hope  the word 'whimsical' is less jarring and more accurate than 'gibberish']
> 
> ========
> 
> Emoji and Dingbats are now part of Unicode.
> They should be treated just like a "1", an "a"
> or a "mathematical alpha".
> So there is nothing special to say about them.
> 
> jmf

Maybe you missed this section:
http://blog.languager.org/2015/03/whimsical-unicode.html#half-assed

It lists some examples of software that somehow break/goof going from BMP-only 
unicode to 7.0 unicode.

IOW the suggestion is that the two-way classification
- ASCII
- Unicode

is less useful and accurate than the 3-way

- ASCII
- BMP
- Unicode

Personally I would be pleased if 𝛌 were used for the math-lambda and
λ left alone for Greek-speaking users' identifiers.
However, one should draw a line between personal preferences and a universal(izable) standard.
As of now, λ works in Blogger whereas 𝛌 breaks Blogger: it gets replaced by �.
Similar breakages currently exist in Java, JavaScript, Emacs, MySQL, IDLE, Windows, various fonts, etc. [Only one of these is remotely connected with Python.]

So BMP is practical, 7.0 is idealistic. You are free to pick 😏😉
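
The proposed three-way classification is easy to compute from a string's highest code point. A minimal sketch; `classify` and its labels are illustrative names, and U+1D6CC MATHEMATICAL BOLD SMALL LAMDA is used as an example of a mathematical lambda outside the BMP.

```python
def classify(text):
    """Return the narrowest of the three buckets that holds every character."""
    highest = max(map(ord, text), default=0)
    if highest <= 0x7F:
        return "ASCII"
    if highest <= 0xFFFF:
        return "BMP"
    return "beyond the BMP"


for sample in ("lambda", "\u03bb", "\U0001D6CC"):
    print(ascii(sample), classify(sample))
```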

#86886

From	Chris Angelico <rosuav@gmail.com>
Date	2015-03-04 15:54 +1100
Message-ID	<mailman.33.1425444900.21433.python-list@python.org>
In reply to	#86885

On Wed, Mar 4, 2015 at 3:45 PM, Rustom Mody <rustompmody@gmail.com> wrote:
>
> It lists some examples of software that somehow break/goof going from BMP-only
> unicode to 7.0 unicode.
>
> IOW the suggestion is that the two-way classification
> - ASCII
> - Unicode
>
> is less useful and accurate than the 3-way
>
> - ASCII
> - BMP
> - Unicode

How is that more useful? Aside from storage optimizations (in which
the significant breaks would be Latin-1, UCS-2, and UCS-4), the BMP is
not significantly different from the rest of Unicode.
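
The storage breaks mentioned here are visible in CPython itself: since Python 3.3 (PEP 393) a `str` is stored using 1, 2 or 4 bytes per character, depending on its widest character. The exact byte counts from `sys.getsizeof` vary between CPython versions and other implementations may behave differently, so only the ordering should be relied on.

```python
import sys

# 1000 copies of one character from each storage class.
samples = {
    "ASCII": "a" * 1000,
    "Latin-1": "\u00e9" * 1000,               # still 1 byte per character
    "UCS-2 (rest of BMP)": "\u03bb" * 1000,   # 2 bytes per character
    "UCS-4 (beyond BMP)": "\U0001D6CC" * 1000,  # 4 bytes per character
}
for label, text in samples.items():
    print(f"{label:22} {sys.getsizeof(text):6} bytes")
```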

Also, the expansion from 16-bit was back in Unicode 2.0, not 7.0. Why
do you keep talking about 7.0 as if it's a recent change?

ChrisA

#86887

From	Rustom Mody <rustompmody@gmail.com>
Date	2015-03-03 21:05 -0800
Message-ID	<619e4cb5-1c4c-449b-a5d7-951101b32b45@googlegroups.com>
In reply to	#86886

On Wednesday, March 4, 2015 at 10:25:24 AM UTC+5:30, Chris Angelico wrote:
> On Wed, Mar 4, 2015 at 3:45 PM, Rustom Mody  wrote:
> >
> > It lists some examples of software that somehow break/goof going from BMP-only
> > unicode to 7.0 unicode.
> >
> > IOW the suggestion is that the two-way classification
> > - ASCII
> > - Unicode
> >
> > is less useful and accurate than the 3-way
> >
> > - ASCII
> > - BMP
> > - Unicode
> 
> How is that more useful? Aside from storage optimizations (in which
> the significant breaks would be Latin-1, UCS-2, and UCS-4), the BMP is
> not significantly different from the rest of Unicode.

Sorry... Don't understand.
> 
> Also, the expansion from 16-bit was back in Unicode 2.0, not 7.0. Why
> do you keep talking about 7.0 as if it's a recent change?

It is 2015 as of now. 7.0 is the current standard.

The need for the adjective 'current' should be pondered upon.

In practice, standards change.
However, if a standard changes so frequently that users have to play catch-up
and keep asking: "Which version?" they are justified in asking "Are the standard-makers
doing due diligence?"

#86942

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-03-06 01:06 +1100
Message-ID	<54f862ca$0$13014$c3e8da3$5496439d@news.astraweb.com>
In reply to	#86887

Rustom Mody wrote:

> On Wednesday, March 4, 2015 at 10:25:24 AM UTC+5:30, Chris Angelico wrote:
>> On Wed, Mar 4, 2015 at 3:45 PM, Rustom Mody  wrote:
>> >
>> > It lists some examples of software that somehow break/goof going from
>> > BMP-only unicode to 7.0 unicode.
>> >
>> > IOW the suggestion is that the two-way classification
>> > - ASCII
>> > - Unicode
>> >
>> > is less useful and accurate than the 3-way
>> >
>> > - ASCII
>> > - BMP
>> > - Unicode
>> 
>> How is that more useful? Aside from storage optimizations (in which
>> the significant breaks would be Latin-1, UCS-2, and UCS-4), the BMP is
>> not significantly different from the rest of Unicode.
> 
> Sorry... Don't understand.

Chris is suggesting that going from BMP to all of Unicode is not the hard
part. Going from ASCII to the BMP part of Unicode is the hard part. If you
can do that, you can go the rest of the way easily.

I mostly agree with Chris. Supporting *just* the BMP is non-trivial in UTF-8
and UTF-32, since that goes against the grain of the system. You would have
to program in artificial restrictions that otherwise don't exist.

UTF-16 is different, and that's probably why you think supporting all of
Unicode is hard. With UTF-16, there really is an obvious distinction
between the BMP and the SMP: that's where you jump from a single 2-byte
unit to a pair of 2-byte units. But that distinction doesn't exist in UTF-8
or UTF-32: 

- In UTF-8, about 99.8% of the BMP requires multiple bytes. Whether you
  support the SMP or not doesn't change the fact that you have to deal
  with multi-byte characters.

- In UTF-32, everything is fixed-width whether it is in the BMP or not.

In both cases, supporting the SMPs is no harder than supporting the BMP.
It's only UTF-16 that makes the SMP seem hard.
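
The point about UTF-8, UTF-16 and UTF-32 can be checked by encoding one character from the BMP and one from outside it. The little-endian codec names are used only so that no byte-order mark is added to the counts.

```python
# GREEK SMALL LETTER LAMDA (BMP) and MATHEMATICAL BOLD SMALL LAMDA (SMP).
for ch in ("\u03bb", "\U0001D6CC"):
    sizes = {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(f"U+{ord(ch):04X}", sizes)
```

Only in UTF-16 does stepping out of the BMP change the number of code units (one 2-byte unit becomes two). UTF-8 is already multi-byte for λ, and UTF-32 uses 4 bytes for both.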

Conclusion: faulty implementations of UTF-16 which incorrectly handle
surrogate pairs should be replaced by non-faulty implementations, or
changed to UTF-8 or UTF-32; incomplete Unicode implementations which assume
that Unicode is 16-bit only (e.g. UCS-2) are obsolete and should be
upgraded.
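
For the surrogate pairs mentioned in the conclusion, here is how UTF-16 splits a supplementary code point into two 16-bit units, and why code that counts UTF-16 units as characters goes wrong. `to_surrogate_pair` is an illustrative helper, not a library function.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a pair")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


s = "x\U0001D6CC"
utf16_units = len(s.encode("utf-16-le")) // 2
print(len(s), utf16_units)   # 2 code points, but 3 UTF-16 code units
print([hex(u) for u in to_surrogate_pair(0x1D6CC)])  # ['0xd835', '0xdecc']
```

A UCS-2-style implementation that equates code units with characters would report this two-character string as three characters long, and could split it between the two halves of the pair.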

Wrong conclusion: SMPs are unnecessary and unneeded, and we need a new
standard that is just like obsolete Unicode version 1.

Unicode version 1 is obsolete for a reason. 16 bits is not enough for even
existing languages, let alone all the code points and characters that are
used in human communication.

>> Also, the expansion from 16-bit was back in Unicode 2.0, not 7.0. Why
>> do you keep talking about 7.0 as if it's a recent change?
> 
> It is 2015 as of now. 7.0 is the current standard.
> 
> The need for the adjective 'current' should be pondered upon.

What's your point?

The UTF encodings have not changed since they were first introduced. They
have been stable for at least twenty years: UTF-8 has existed since 1993,
and UTF-16 since 1996.

Since version 2.0 of Unicode in 1996, the standard has made "stability
guarantees" that no code points will be renamed or removed. Consequently,
characters have only ever been removed in two versions: 1.1, and 2.0 itself,
which moved the Hangul syllables to a new block. Since then, new versions of
the standard have only added characters, never moved, renamed or deleted them.

http://unicode.org/policies/stability_policy.html

Some highlights in Unicode history:

Unicode 1.0 (1991): initial version, defined 7161 code points.

In January 1993, Rob Pike and Ken Thompson announced the design and working
implementation of the UTF-8 encoding.

1.1 (1993): defined 34233 characters, finalised Han Unification. Removed
some characters from the 1.0 set. This was the first time any code
points were removed.

2.0 (1996): Moved the Hangul syllables to a new block, the last time any
code points were removed. Defined 38950 code points. Introduced the UTF-16
encoding, whose surrogate pairs extend the code space into the Supplementary
Planes, although no characters were assigned outside the BMP until 3.1.

3.1 (2001): Defined 94205 code points, including 42711 additional Han
ideographs, bringing the total number of CJK code points alone to 71793,
too many to fit in 16 bits.

2006: The People's Republic of China mandates support for the GB-18030
character set for all software products sold in the PRC. GB-18030 supports
the entire Unicode range, including the SMPs. Since this date, all software
sold in China must support the SMPs.

6.0 (2010): The first emoji or emoticons were added to Unicode.

7.0 (2014): 113021 code points defined in total.

> In practice, standards change.
> However, if a standard changes so frequently that users have to play
> catch-up and keep asking: "Which version?" they are justified in
> asking "Are the standard-makers doing due diligence?"

Since Unicode has stability guarantees, and the encodings have not changed
in twenty years and will not change in the future, this argument is bogus.
Updating to a new version of the standard means, to a first approximation,
merely allocating some new code points which had previously been undefined
but are now defined.

(Code points can be flagged deprecated, but they will never be removed.)

-- 
Steven

#86944

From	wxjmfauth@gmail.com
Date	2015-03-05 06:59 -0800
Message-ID	<807007d0-5cf3-43ac-a5f9-d4a268e2b98f@googlegroups.com>
In reply to	#86942

On Thursday, 5 March 2015 15:06:32 UTC+1, Steven D'Aprano wrote:
> Rustom Mody wrote:
> [snip]
> Some highlights in Unicode history:
> [snip]
> 7.0 (2014): 113021 code points defined in total.
> [snip]

===============
===============

(2012): Some idiots tried to reinvent "Unicode" and
they failed.

jmf
