Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #51209

Re: RE Module Performance

Newsgroups comp.lang.python
Date 2013-07-25 02:27 -0700
References (2 earlier) <51DFDE65.5040001@Gmail.com> <CAN1F8qUFP3uX57HhiiUPaYqO3h_HiT8Q_YD=vCYky3EAWsdE7Q@mail.gmail.com> <mailman.4666.1373670835.3114.python-list@python.org> <4f1067f6-bc99-42ad-9166-37fb228b90e8@googlegroups.com> <mailman.5039.1374677274.3114.python-list@python.org>
Message-ID <0420de60-b9b5-4ac4-ba7b-ca5ac2ca65fe@googlegroups.com> (permalink)
Subject Re: RE Module Performance
From wxjmfauth@gmail.com

Show all headers | View raw


Le mercredi 24 juillet 2013 16:47:36 UTC+2, Michael Torrie a écrit :
> On 07/24/2013 07:40 AM, wxjmfauth@gmail.com wrote:
> 
> > Sorry, you are not understanding Unicode. What is a Unicode
> 
> > Transformation Format (UTF), what is the goal of a UTF and
> 
> > why it is important for an implementation to work with a UTF.
> 
> 
> 
> Really?  Enlighten me.
> 
> 
> 
> Personally, I would never use UTF as a representation *in memory* for a
> 
> unicode string if it were up to me.  Why?  Because UTF characters are
> 
> not uniform in byte width so accessing positions within the string is
> 
> terribly slow and has to always be done by starting at the beginning of
> 
> the string.  That's at minimum O(n) compared to FSR's O(1).  Surely you
> 
> understand this.  Do you dispute this fact?
> 
> 
> 
> UTF is a great choice for interchange, though, and indeed that's what it
> 
> was designed for.
> 
> 
> 
> Are you calling for UTF to be adopted as the internal, in-memory
> 
> representation of unicode?  Or would you simply settle for UCS-4?
> 
> Please be clear here.  What are you saying?
> 
> 
> 
> > Short example. Writing an editor with something like the
> 
> > FSR is simply impossible (properly).
> 
> 
> 
> How? FSR is just an implementation detail.  It could be UCS-4 and it
> 
> would also work.

---------

A coding scheme works with a unique set of characters (the repertoire),
and the implementation (the programming) works with a unique set
of encoded code points. The critical step is the path
{unique set of characters} <--> {unique set of encoded code points}


Fact: there is no other way to do it properly (This is explaining
why we have to live today with all these coding schemes or also
explaining why so many coding schemes hadto be created).

How to understand it? With a sheet of paper and a pencil.

In the byte string world, this step is a no-op.

In Unicode, it is exactly the purpose of a "utf" to achieve this
step. "utf": a confusing name covering at the same time the
process and the result of the process.
A "utf chunk", a series of bits (not bytes), hold intrisically
the information about the character it is representing.

Other "exotic" coding schemes like iso6937 of "CID-fonts" are woking
in the same way.

"Unicode" with the help of "utf(s)" does not differ from the basic
rule.

-----

ucs-2: ucs-2 is a perfecly and correctly working coding scheme.
ucs-2 is not different from the other coding schemes and does
not behave differently (cp... or iso-... or ...). It only
covers a smaller repertoire.

-----

utf32: as a pointed many times. You are already using it (maybe
without knowing it). Where? in fonts (OpenType technology),
rendering engines, pdf files. Why? Because there is not other
way to do it better.

------

The Unicode table (its constuction) is a problem per se.
It is not a technical problem, a very important "linguistic
aspect" of Unicode.
See https://groups.google.com/forum/#!topic/comp.lang.python/XkTKE7U8CS0

------

If you are not understanding my "editor" analogy. One other
proposed exercise. Build/create a flexible iso-8859-X coding
scheme. You will quickly understand where the bottleneck
is.
Two working ways:
- stupidly with an editor and your fingers.
- lazily with a sheet of paper and you head.


----

About my benchmarks: No offense. You are not understanding them,
because you do not understand what this FSR does and the coding
of characters. It's a little bit a devil's circle.

Conceptually, this FSR is spending its time in solving the
problem it creates itsself, with plenty of side effects.

-----

There is a clear difference between FSR and ucs-4/utf32.

-----

See also:
http://www.unicode.org/reports/tr17/

(In my mind, quite "dry" and not easy to understand at
a first reading).


jmf

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-11 19:44 -0400
  Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-12 02:23 -0700
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:27 +1000
    Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 10:39 +0100
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:40 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 06:45 -0400
    Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 16:59 +0100
    Re: RE Module Performance Peter Otten <__peter__@web.de> - 2013-07-12 18:15 +0200
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-13 02:21 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 13:58 -0400
      Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 05:37 +0000
        Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-14 11:17 -0700
          Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 06:06 -0400
            Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-15 12:36 +0000
              Dihedral Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 08:52 -0400
              Re: Dihedral Joel Goldstick <joel.goldstick@gmail.com> - 2013-07-15 09:03 -0400
              Re: Dihedral Wayne Werner <wayne@waynewerner.com> - 2013-07-15 17:43 -0500
              Re: Dihedral Fábio Santos <fabiosantosart@gmail.com> - 2013-07-15 23:54 +0100
              Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-16 08:59 +1000
              Re: Dihedral Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-16 16:06 +1000
              Re: Dihedral Stefan Behnel <stefan_ml@behnel.de> - 2013-07-24 20:08 +0200
              Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:23 +1000
              Re: Dihedral Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-24 20:15 -0400
    Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-13 08:16 +1000
    Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-12 17:13 -0600
      Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-24 06:40 -0700
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-24 23:48 +1000
        Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:17 -0400
        Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:19 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:34 +1000
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:02 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:39 +1000
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 08:47 -0600
          Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 02:27 -0700
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:14 +1000
              Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 12:07 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 05:18 +1000
                RE: RE Module Performance "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2013-07-25 19:30 +0000
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:06 -0600
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 09:00 -0600
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 05:56 +0000
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:56 +1000
        Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 13:52 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:15 +1000
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:15 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:58 +1000
              Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 09:22 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:07 +1000
        Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 18:09 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 08:19 +1000
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 16:59 -0600
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 09:24 +1000
        Re: RE Module Performance Serhiy Storchaka <storchaka@gmail.com> - 2013-07-25 08:49 +0300
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 15:58 +1000
        Re: RE Module Performance Jeremy Sanders <jeremy@jeremysanders.net> - 2013-07-25 14:36 +0100
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 15:26 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 01:36 +1000
              Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 17:18 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 03:27 +1000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:45 -0500
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-26 02:48 +0000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 21:20 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:36 -0700
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 08:46 -0700
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 06:28 +0000
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 03:37 +0000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-26 22:12 -0600
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 05:04 +0000
                Re: RE Module Performance Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-27 12:13 -0400
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:19 -0700
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:09 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:21 -0700
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-26 20:05 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-27 11:21 -0700
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-27 21:53 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 11:13 -0700
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:04 +0100
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:30 -0700
                Re: RE Module Performance Lele Gaifax <lele@metapensiero.it> - 2013-07-28 22:45 +0200
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 22:01 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 07:01 -0700
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 16:38 +0200
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 15:45 +0100
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 17:13 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 18:39 +0200
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 18:14 +0100
                Re: RE Module Performance Neil Hodgson <nhodgson@iinet.net.au> - 2013-07-31 13:09 +1000
                Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-31 03:27 +1000
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-30 18:40 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 20:19 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 12:09 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 21:04 +0100
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:54 -0600
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-31 05:45 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 08:17 +0100
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 13:15 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 21:41 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:11 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 01:32 -0700
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:59 +0200
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:44 -0600
                Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-30 17:05 -0400
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:30 -0600
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 09:23 +0200
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:27 -0600
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 10:45 +0200
                FSR and unicode compliance - was Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-28 09:52 -0600
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:23 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:44 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 21:55 +0200
                Re: FSR and unicode compliance - was Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-28 20:52 +0000
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 04:43 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 12:57 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 05:56 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 07:20 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 15:49 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 09:31 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Heiko Wundram <modelnine@modelnine.org> - 2013-07-29 14:06 +0200
                Re: FSR and unicode compliance - was Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-29 08:43 -0400
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 18:03 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 13:36 -0400
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 06:36 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:03 +0100
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 19:19 +0100
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:29 +0100
                Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 15:06 -0400
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 23:14 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 20:51 +0200
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 00:07 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-26 22:38 +0200
        Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-25 09:44 -0400
        Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:53 -0500
    Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-13 00:16 +0100
    Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-14 05:34 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-16 06:30 -0400
      Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-18 13:17 -0700

csiph-web