
Re: A few questiosn about encoding

Started by	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
First post	2013-06-13 00:13 +0000
Last post	2013-06-20 19:08 +0200
Articles	20 on this page of 90 — 31 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 00:13 +0000
    Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 09:09 +0300
      Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 07:11 +0000
        Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 10:42 +0300
          Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-13 17:58 +1000
            Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 11:08 +0300
              Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-13 18:20 +1000
                Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 12:41 +0300
                  Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-13 11:49 +0000
                    Re: A few questiosn about encoding Νικόλαος Κούρας <support@superhost.gr> - 2013-06-13 17:19 +0300
                      Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-14 11:00 +1000
                        Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 09:59 +0300
                          Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-14 20:14 +1000
                            Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 16:58 +0300
                              Re: A few questiosn about encoding Joel Goldstick <joel.goldstick@gmail.com> - 2013-06-14 11:21 -0400
                                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 18:26 +0300
                                  Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-15 03:03 +1000
                                    Re: A few questiosn about encoding Walter Hurry <walterhurry@lavabit.com> - 2013-06-14 23:32 +0000
                              Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-15 10:26 +1000
                              Re: A few questiosn about encoding Denis McMahon <denismfmcmahon@gmail.com> - 2013-06-15 06:34 +0000
                                Re: A few questiosn about encoding Grant Edwards <invalid@invalid.invalid> - 2013-06-15 14:44 +0000
                                  Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-15 17:49 +0300
                                    Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-15 15:30 +0000
                                  Re: A few questiosn about encoding Roy Smith <roy@panix.com> - 2013-06-15 10:59 -0400
                                    Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-15 18:14 +0300
                                      Re: A few questiosn about encoding Joel Goldstick <joel.goldstick@gmail.com> - 2013-06-15 11:35 -0400
                              Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-15 22:26 +0300
                                Re: A few questiosn about encoding Benjamin Schollnick <benjamin@schollnick.net> - 2013-06-15 16:35 -0400
                                Re: A few questiosn about encoding Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-06-16 15:45 +0200
              Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 09:36 +0200
                Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 10:49 +0300
                  Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 10:22 +0200
                    Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 11:37 +0300
                      Don't feed the troll... (was: Re: A few questiosn about encoding) Heiko Wundram <modelnine@modelnine.org> - 2013-06-14 11:06 +0200
                        Re: Don't feed the troll... Nick the Gr33k <support@superhost.gr> - 2013-06-14 12:32 +0300
                          Re: Don't feed the troll... Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 13:09 +0200
                            Re: Don't feed the troll... Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:36 +0300
                              Re: Don't feed the troll... Joel Goldstick <joel.goldstick@gmail.com> - 2013-06-14 08:44 -0400
                              Re: Don't feed the troll... Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 15:25 +0200
                                Re: Don't feed the troll... Neil Cerutti <neilc@norwich.edu> - 2013-06-14 15:54 +0000
                          Re: Don't feed the troll... Heiko Wundram <modelnine@modelnine.org> - 2013-06-14 12:15 +0200
                          Re: Don't feed the troll... Guy Scree <nobody@nowhere.com> - 2013-06-14 18:50 -0400
                          Re: Don't feed the troll... Denis McMahon <denismfmcmahon@gmail.com> - 2013-06-15 06:31 +0000
                            Re: Don't feed the troll... Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-06-15 13:04 -0400
                          Re: Don't feed the troll... Guy Scree <nobody@nowhere.com> - 2013-06-17 16:15 -0400
                            Re: Don't feed the troll... Chris Angelico <rosuav@gmail.com> - 2013-06-18 07:46 +1000
                      Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-14 20:19 +1000
                        Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:41 +0300
                      Re: Don't feed the troll... (was: Re: A few questiosn about encoding) Fábio Santos <fabiosantosart@gmail.com> - 2013-06-14 11:20 +0100
                        Re: Don't feed the troll... (was: Re: A few questiosn about encoding) rusi <rustompmody@gmail.com> - 2013-06-14 04:51 -0700
                          Re: Don't feed the help-vampire rusi <rustompmody@gmail.com> - 2013-06-14 05:09 -0700
                            Re: Don't feed the help-vampire Heiko Wundram <modelnine@modelnine.org> - 2013-06-14 14:31 +0200
                            Re: Don't feed the help-vampire Ian Kelly <ian.g.kelly@gmail.com> - 2013-06-14 10:51 -0600
                          Re: Don't feed the troll... Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:50 +0300
                            Re: Don't feed the troll... Zero Piraeus <schesis@gmail.com> - 2013-06-14 09:33 -0400
                        Re: Don't feed the troll... Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:45 +0300
                          Re: Don't feed the troll... Heiko Wundram <modelnine@modelnine.org> - 2013-06-14 14:58 +0200
                          Re: Don't feed the troll... Fábio Santos <fabiosantosart@gmail.com> - 2013-06-14 14:25 +0100
                          Re: Don't feed the troll... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-14 17:12 +0100
                      Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 12:50 +0200
                        Re: A few questiosn about encoding Nick the Gr33k <support@superhost.gr> - 2013-06-14 15:59 +0300
                          Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-14 15:52 +0200
                          Re: A few questiosn about encoding Cameron Simpson <cs@zip.com.au> - 2013-06-15 10:28 +1000
                          Re: A few questiosn about encoding Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-06-17 08:49 +0200
                      Re: Don't feed the troll... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-14 12:57 +0100
                      Re: Don't feed the troll... (was: Re: A few questiosn about encoding) "D'Arcy J.M. Cain" <darcy@druid.net> - 2013-06-14 13:13 -0400
                      Re: Don't feed the troll... (was: Re: A few questiosn about encoding) Chris Angelico <rosuav@gmail.com> - 2013-06-15 03:31 +1000
                        Re: Don't feed the troll... (was: Re: A few questiosn about encoding) Grant Edwards <invalid@invalid.invalid> - 2013-06-14 19:40 +0000
                      Re: Don't feed the troll "D'Arcy J.M. Cain" <darcy@druid.net> - 2013-06-14 13:56 -0400
                      Re: Don't feed the troll Tim Chase <python.list@tim.thechases.com> - 2013-06-14 14:00 -0500
                      Re: Don't feed the troll "D'Arcy J.M. Cain" <darcy@druid.net> - 2013-06-14 15:17 -0400
                      Re: Don't feed the troll... Ben Finney <ben+python@benfinney.id.au> - 2013-06-15 10:42 +1000
        Re: A few questiosn about encoding Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-19 18:46 -0700
          Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-20 06:26 +0000
            Re: A few questiosn about encoding MRAB <python@mrabarnett.plus.com> - 2013-06-20 12:43 +0100
              Re: A few questiosn about encoding wxjmfauth@gmail.com - 2013-06-20 09:27 -0700
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-21 02:37 +1000
                Re: A few questiosn about encoding MRAB <python@mrabarnett.plus.com> - 2013-06-20 18:17 +0100
                  Re: A few questiosn about encoding wxjmfauth@gmail.com - 2013-06-23 08:51 -0700
                    Re: A few questiosn about encoding Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-23 16:30 +0000
                      Re: A few questiosn about encoding wxjmfauth@gmail.com - 2013-06-25 13:16 -0700
                Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-21 03:21 +1000
                Re: A few questiosn about encoding Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-20 20:43 +0100
            Re: A few questiosn about encoding Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-20 06:40 -0700
              Re: A few questiosn about encoding Andrew Berg <robotsondrugs@gmail.com> - 2013-06-20 09:04 -0500
                Re: A few questiosn about encoding Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-20 08:12 -0700
                  Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-21 01:26 +1000
                  Re: A few questiosn about encoding Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-06-20 20:25 +0300
              Re: A few questiosn about encoding Chris Angelico <rosuav@gmail.com> - 2013-06-21 01:28 +1000
              Re: A few questiosn about encoding Andreas Perstinger <andipersti@gmail.com> - 2013-06-20 19:08 +0200


#48141

From	Nick the Gr33k <support@superhost.gr>
Date	2013-06-14 15:59 +0300
Message-ID	<kpf42k$spl$14@news.ntua.gr>
In reply to	#48114

On 14/6/2013 1:50 μμ, Antoon Pardon wrote:

> Python works with numbers, but at the moment
> it has to display such a number it has to produce something
> that is printable. So it will build a string that can be
> used as a notation for that number, a numeral. And that
> is what will be displayed.

so a number is just a number but when this number needs to be displayed 
into a monitor, then the printed form of that number we choose to call 
it a numeral?

So, a numeral = a string representation of a number. Is this correct?

-- 
What is now proved was at first only imagined!


#48148

From	Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date	2013-06-14 15:52 +0200
Message-ID	<mailman.3309.1371217960.3114.python-list@python.org>
In reply to	#48141

Op 14-06-13 14:59, Nick the Gr33k schreef:

> On 14/6/2013 1:50 μμ, Antoon Pardon wrote:
>> Python works with numbers, but at the moment
>> it has to display such a number it has to produce something
>> that is printable. So it will build a string that can be
>> used as a notation for that number, a numeral. And that
>> is what will be displayed.
> so a number is just a number but when this number needs to be displayed 
> into a monitor, then the printed form of that number we choose to call 
> it a numeral?
> So, a numeral = a string representation of a number. Is this correct?
Yes, when you print an integer, what actually happens is something along
the following algorithm (python 2 code):


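def write_int(out, nr):
    # Write the decimal numeral for the integer nr to the file-like object out.
    ord0 = ord('0')
    lst = []
    negative = False
    if nr < 0:
        negative = True
        nr = -nr
    # Peel off the digits from least to most significant.
    while nr:
        digit = nr % 10
        lst.append(chr(digit + ord0))
        nr //= 10  # floor division, so this also works on Python 3
    if negative:
        lst.append('-')
    # The digits were collected in reverse order (sign last), so flip them.
    lst.reverse()
    if not lst:
        lst.append('0')
    numeral = ''.join(lst)
    out.write(numeral)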

-- 
Antoon Pardon


#48236

From	Cameron Simpson <cs@zip.com.au>
Date	2013-06-15 10:28 +1000
Message-ID	<mailman.3349.1371256121.3114.python-list@python.org>
In reply to	#48141

On 14Jun2013 15:59, Nikos as SuperHost Support <support@superhost.gr> wrote:
| So, a numeral = a string representation of a number. Is this correct?

No, a numeral is an individual digit from the string representation of a number.
So: 65 requires two numerals: '6' and '5'.
-- 
Cameron Simpson <cs@zip.com.au>

In life, you should always try to know your strong points, but this is
far less important than knowing your weak points.
Martin Fitzpatrick <mfitzpatrick@scot.bbc.co.uk>


#48499

From	Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date	2013-06-17 08:49 +0200
Message-ID	<mailman.3468.1371451793.3114.python-list@python.org>
In reply to	#48141

Op 15-06-13 02:28, Cameron Simpson schreef:
> On 14Jun2013 15:59, Nikos as SuperHost Support <support@superhost.gr> wrote:
> | So, a numeral = a string representation of a number. Is this correct?
>
> No, a numeral is an individual digit from the string representation of a number.
> So: 65 requires two numerals: '6' and '5'.
Wrong context. A numeral as an individual digit is when you are talking about
individual characters in a font. In such a context the set of glyphs that
represent a digit are the numerals.

However in a context of programming, numerals in general refer to the set of
strings that represent a number.
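
The distinction can be sketched in a few lines of Python 3. This is only an illustration, not part of the original post: the variable names are made up, and the choice of 65 simply reuses the number mentioned earlier in the thread. One number has several numerals, and parsing any of them gives back the same number.

```python
# One number, several numerals: each string below is a different
# notation for the same integer value.
number = 65

numerals = {
    "decimal": str(number),
    "hexadecimal": hex(number),
    "binary": bin(number),
    "octal": oct(number),
}

# Parsing any of the numerals gives back the same number.
# int(text, 0) infers the base from the prefix (0x, 0b, 0o, or none).
parsed = {name: int(text, 0) for name, text in numerals.items()}

for name, text in numerals.items():
    print(name, repr(text), "->", parsed[name])
```

The strings differ, but every entry in `parsed` is the same integer object value, which is the sense of "numeral" used above.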

-- 
Antoon.


#48120 — Re: Don't feed the troll...

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-06-14 12:57 +0100
Subject	Re: Don't feed the troll...
Message-ID	<mailman.3296.1371211005.3114.python-list@python.org>
In reply to	#48088

On 14/06/2013 11:20, Fábio Santos wrote:

>
> Since this is a lot of spam, I feel like leaving the list, but I also
> honestly want to help people use python and the replies to questions of
> others often give me much insight on several matters.
>

Plenty of genuine people needing genuine help on the tutor mailing list, 
or have you been there already?

-- 
"Steve is going for the pink ball - and for those of you who are 
watching in black and white, the pink is next to the green." Snooker 
commentator 'Whispering' Ted Lowe.

Mark Lawrence


#48188 — Re: Don't feed the troll... (was: Re: A few questiosn about encoding)

From	"D'Arcy J.M. Cain" <darcy@druid.net>
Date	2013-06-14 13:13 -0400
Subject	Re: Don't feed the troll... (was: Re: A few questiosn about encoding)
Message-ID	<mailman.3322.1371230016.3114.python-list@python.org>
In reply to	#48088

On Fri, 14 Jun 2013 11:06:55 +0200
Heiko Wundram <modelnine@modelnine.org> wrote:
> Come on now, this is _so_ obviously trolling, it's not even remotely 
> funny anymore. Why doesn't killfiling work with the mailing list
> version of the python list? :-(

A big problem, other than Mr. Support's shenanigans with his email
address, is that even those of us who seem to have successfully
*plonked* him get the responses to him.  The biggest issue with a troll
isn't so much the annoying emails from him but the amplified slew of
responses.  That's the point of a troll after all.

The answer is to always make sure that you include the previous poster
in the reply as a Cc or To.  I filter out any email that has the string
"support@superhost.gr" in a header so I would also filter out the
replies if people would follow that simple rule.
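
A header filter of this kind might look roughly like the following in Python. It is a sketch under assumptions, not the poster's actual setup: the function name and the sample messages are hypothetical, and a real mail system would apply the rule in its own filter language.

```python
from email import message_from_string

BLOCKED = "support@superhost.gr"  # address taken from the post above


def mentions_blocked_address(raw_message, blocked=BLOCKED):
    """Return True if any header value contains the blocked address."""
    msg = message_from_string(raw_message)
    return any(blocked in str(value).lower() for _name, value in msg.items())


# Hypothetical sample messages (headers, blank line, body).
direct_post = (
    "From: Nick <support@superhost.gr>\n"
    "To: python-list@python.org\n"
    "Subject: Re: encoding\n\nbody\n"
)
reply_with_cc = (
    "From: someone@example.com\n"
    "To: python-list@python.org\n"
    "Cc: support@superhost.gr\n"
    "Subject: Re: encoding\n\nbody\n"
)
reply_list_only = (
    "From: someone@example.com\n"
    "To: python-list@python.org\n"
    "Subject: Re: encoding\n\nbody\n"
)

for raw in (direct_post, reply_with_cc, reply_list_only):
    print(mentions_blocked_address(raw))
```

Only the reply sent to the list alone gets through, which is why the rule depends on repliers adding the previous poster as a Cc or To.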

I have suggested this before but the push back I get is that then
people would get two copies of the email, one to them and one to the
list.  My answer is simple.  Get a proper email system that filters out
duplicates.  Is there an email client out there that does not have this
facility?

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 788 2246     (DoD#0082)    (eNTP)   |  what's for dinner.
IM: darcy@Vex.Net, VOIP: sip:darcy@Vex.Net


#48194 — Re: Don't feed the troll... (was: Re: A few questiosn about encoding)

From	Chris Angelico <rosuav@gmail.com>
Date	2013-06-15 03:31 +1000
Subject	Re: Don't feed the troll... (was: Re: A few questiosn about encoding)
Message-ID	<mailman.3327.1371231081.3114.python-list@python.org>
In reply to	#48088

On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain <darcy@druid.net> wrote:
> The answer is to always make sure that you include the previous poster
> in the reply as a Cc or To.  I filter out any email that has the string
> "support@superhost.gr" in a header so I would also filter out the
> replies if people would follow that simple rule.
>
> I have suggested this before but the push back I get is that then
> people would get two copies of the email, one to them and one to the
> list.  My answer is simple.  Get a proper email system that filters out
> duplicates.  Is there an email client out there that does not have this
> facility?

The main downside to that is not the first response, to
somebody@somewhere and python-list, but the subsequent ones. Do you
include everyone's addresses? And if so, how do they then get off the
list? (This is a serious consideration. I had some very angry people
asking me to unsubscribe them from a (private) mailman list I run, but
they weren't subscribed at all - they were being cc'd.)

I prefer to simply mail the list. You should be able to mute entire
threads, and he doesn't start more than a couple a day usually.

ChrisA


#48216 — Re: Don't feed the troll... (was: Re: A few questiosn about encoding)

From	Grant Edwards <invalid@invalid.invalid>
Date	2013-06-14 19:40 +0000
Subject	Re: Don't feed the troll... (was: Re: A few questiosn about encoding)
Message-ID	<kpfriq$sd2$4@reader1.panix.com>
In reply to	#48194

On 2013-06-14, Chris Angelico <rosuav@gmail.com> wrote:
> On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain <darcy@druid.net> wrote:
>> The answer is to always make sure that you include the previous poster
>> in the reply as a Cc or To.  I filter out any email that has the string
>> "support@superhost.gr" in a header so I would also filter out the
>> replies if people would follow that simple rule.
>>
>> I have suggested this before but the push back I get is that then
>> people would get two copies of the email, one to them and one to the
>> list.  My answer is simple.  Get a proper email system that filters out
>> duplicates.  Is there an email client out there that does not have this
>> facility?
>
> The main downside to that is not the first response, to
> somebody@somewhere and python-list, but the subsequent ones. Do you
> include everyone's addresses? And if so, how do they then get off the
> list? (This is a serious consideration. I had some very angry people
> asking me to unsubscribe them from a (private) mailman list I run, but
> they weren't subscribed at all - they were being cc'd.)

I think the answer is to automatically kill all threads started by
"him".

Unfortunately, I don't know if that's possible in most newsreaders.

-- 
Grant Edwards               grant.b.edwards        Yow! A dwarf is passing out
                                  at               somewhere in Detroit!
                              gmail.com


#48198 — Re: Don't feed the troll

From	"D'Arcy J.M. Cain" <darcy@druid.net>
Date	2013-06-14 13:56 -0400
Subject	Re: Don't feed the troll
Message-ID	<mailman.3329.1371232585.3114.python-list@python.org>
In reply to	#48088

On Sat, 15 Jun 2013 03:31:12 +1000
Chris Angelico <rosuav@gmail.com> wrote:
> On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain <darcy@druid.net>
> wrote:
> > I have suggested this before but the push back I get is that then
> > people would get two copies of the email, one to them and one to the
> > list.  My answer is simple.  Get a proper email system that filters
> > out duplicates.  Is there an email client out there that does not
> > have this facility?
> 
> The main downside to that is not the first response, to
> somebody@somewhere and python-list, but the subsequent ones. Do you
> include everyone's addresses? And if so, how do they then get off the

No, I think Ccing the From is enough.  Other than the OP who is already
*plonked* replies to the replies tend to have at least a modicum of
information. 
 
> I prefer to simply mail the list. You should be able to mute entire
> threads, and he doesn't start more than a couple a day usually.

But then I have to deal with each thread.  I don't want to deal with
them at all.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 788 2246     (DoD#0082)    (eNTP)   |  what's for dinner.
IM: darcy@Vex.Net, VOIP: sip:darcy@Vex.Net


#48206 — Re: Don't feed the troll

From	Tim Chase <python.list@tim.thechases.com>
Date	2013-06-14 14:00 -0500
Subject	Re: Don't feed the troll
Message-ID	<mailman.3335.1371236310.3114.python-list@python.org>
In reply to	#48088

On 2013-06-14 13:56, D'Arcy J.M. Cain wrote:
> > I prefer to simply mail the list. You should be able to mute
> > entire threads, and he doesn't start more than a couple a day
> > usually.
> 
> But then I have to deal with each thread.  I don't want to deal with
> them at all.

At least Thunderbird had the ability to set up a filter of the form
"If the sender matches 'xyz@example.com' then kill this thread" so
the thread-killing (or sub-thread killing) was automatic.
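
The logic of such a rule can be sketched in Python. This is not how Thunderbird implements it. The dictionary keys (`id`, `sender`, `references`) are hypothetical stand-ins for the Message-ID, From and References headers, and the sketch assumes messages are processed in arrival order with parents before replies.

```python
def kill_threads(messages, blocked_sender):
    """Return the messages that survive a 'kill this thread' filter.

    A message is dropped if it comes from the blocked sender or if any
    of its ancestors (listed in 'references') was already dropped.
    """
    killed_ids = set()
    kept = []
    for msg in messages:
        from_blocked = blocked_sender in msg["sender"]
        in_killed_thread = any(ref in killed_ids for ref in msg["references"])
        if from_blocked or in_killed_thread:
            killed_ids.add(msg["id"])
        else:
            kept.append(msg)
    return kept


# Hypothetical thread: <b> is from the blocked sender, <c> replies to <b>.
thread = [
    {"id": "<a>", "sender": "alice@example.org", "references": []},
    {"id": "<b>", "sender": "xyz@example.com", "references": ["<a>"]},
    {"id": "<c>", "sender": "bob@example.org", "references": ["<a>", "<b>"]},
    {"id": "<d>", "sender": "carol@example.org", "references": ["<a>"]},
]

surviving = [msg["id"] for msg in kill_threads(thread, "xyz@example.com")]
print(surviving)
```

The sub-thread under the blocked sender disappears while the sibling reply stays, which matches the "sub-thread killing" behaviour described above.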

I set that up for Xah posts and my life was far better.

I've since switched to Claws for my mail and miss that kill-thread
functionality. :-/

-tkc


#48211 — Re: Don't feed the troll

From	"D'Arcy J.M. Cain" <darcy@druid.net>
Date	2013-06-14 15:17 -0400
Subject	Re: Don't feed the troll
Message-ID	<mailman.3337.1371237438.3114.python-list@python.org>
In reply to	#48088

On Fri, 14 Jun 2013 14:00:17 -0500
Tim Chase <python.list@tim.thechases.com> wrote:
> I set that up for Xah posts and my life was far better.

Has he disappeared or is my filtering just really successful?

> I've since switched to Claws for my mail and miss that kill-thread
> functionality. :-/

Heh.  Exactly what I am using.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 788 2246     (DoD#0082)    (eNTP)   |  what's for dinner.
IM: darcy@Vex.Net, VOIP: sip:darcy@Vex.Net


#48237 — Re: Don't feed the troll...

From	Ben Finney <ben+python@benfinney.id.au>
Date	2013-06-15 10:42 +1000
Subject	Re: Don't feed the troll...
Message-ID	<mailman.3350.1371256953.3114.python-list@python.org>
In reply to	#48088

"D'Arcy J.M. Cain" <darcy@druid.net> writes:

> The answer is to always make sure that you include the previous poster
> in the reply as a Cc or To.

Dragging the discussion from one forum (comp.lang.python) to another
(every person's individual email) is obnoxious. Please don't.

> I have suggested this before but the push back I get is that then
> people would get two copies of the email, one to them and one to the
> list.

In my case, I don't want to receive the messages by email *at all*. I
participate in this forum using a non-email system, and it works fine so
long as people continue to participate in this forum.

Even for those who do participate by email, though, your approach is
broken:

> My answer is simple.  Get a proper email system that filters out
> duplicates.

The message sent to the individual typically arrives earlier (since it
is sent straight from you to the individual), and the message on the
forum arrives later (since it typically requires more processing).

But since we're participating in the discussion on the forum and not in
individual email, it is the later one we want, and the earlier one
should be deleted.

So at the point the first message arrives, it isn't a duplicate. The
mail program will show it anyway, because “remove duplicates” can't
catch it when there's no duplicate yet.
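
The ordering problem can be shown with a small Python sketch. It assumes a simple "first copy seen wins" duplicate filter keyed on Message-ID, which is only one plausible way a mail program might do it. The function name and sample data are hypothetical.

```python
def shown_messages(arrivals):
    """Yield (message_id, route) pairs that a first-seen-wins filter displays."""
    seen = set()
    for message_id, route in arrivals:
        if message_id in seen:
            continue  # a later copy of an already-seen message is dropped
        seen.add(message_id)
        yield message_id, route


# The direct copy arrives first; the list copy arrives after list processing.
arrivals = [
    ("<reply-1@example.com>", "direct"),
    ("<reply-1@example.com>", "list"),
]

shown = list(shown_messages(arrivals))
print(shown)
```

Under that assumption the copy that survives is the direct one, and the list copy is the one discarded, which is the opposite of what a list participant wants.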

The proper solution is for you not to send that one at all, and send
only the message to the forum.

You do this by using your mail client's “reply to list” function, which
uses the RFC 2369 information in every mailing list message.

Is there any mail client which doesn't have this function? If so, use
your vendor's bug reporting system to request this feature as standard,
and/or switch to a better mail client until they fix that.
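
As a rough illustration of what a "reply to list" function relies on, the following Python sketch reads the `List-Post` header that mailing-list software adds to each message. The function name and the sample header values are assumptions for the example, and real clients handle more cases than this.

```python
import re
from email import message_from_string


def list_post_address(raw_message):
    """Return the posting address from the List-Post header, or None."""
    msg = message_from_string(raw_message)
    header = msg.get("List-Post", "")
    # The header carries a URL in angle brackets, e.g. <mailto:list@example.org>.
    match = re.search(r"<mailto:([^>?]+)", header)
    return match.group(1) if match else None


# Hypothetical sample messages.
list_message = (
    "From: someone@example.com\n"
    "To: python-list@python.org\n"
    "List-Post: <mailto:python-list@python.org>\n"
    "Subject: Re: encoding\n\nbody\n"
)
closed_list_message = (
    "From: someone@example.com\n"
    "List-Post: NO (posting not allowed on this list)\n"
    "Subject: announcement\n\nbody\n"
)

print(list_post_address(list_message))
print(list_post_address(closed_list_message))
```

Replying to the address found this way sends one message to the forum and none to individuals.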

-- 
 \       “Timid men prefer the calm of despotism to the boisterous sea |
  `\                                    of liberty.” —Thomas Jefferson |
_o__)                                                                  |
Ben Finney


#48767

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2013-06-19 18:46 -0700
Message-ID	<77ba6b16-4b1d-47a6-9b9b-5af45335c4fe@googlegroups.com>
In reply to	#47912

On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:

> Gah! That's twice I've screwed that up. 
> Sorry about that!

Yeah, and your difficulty explaining the Unicode implementation reminds me of a passage from the Python zen:

 "If the implementation is hard to explain, it's a bad idea."


#48777

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-06-20 06:26 +0000
Message-ID	<51c2a089$0$29973$c3e8da3$5496439d@news.astraweb.com>
In reply to	#48767

On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:

> On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
>  
>> Gah! That's twice I've screwed that up. Sorry about that!
> 
> Yeah, and your difficulty explaining the Unicode implementation reminds
> me of a passage from the Python zen:
> 
>  "If the implementation is hard to explain, it's a bad idea."

The *implementation* is easy to explain. It's the names of the encodings 
which I get tangled up in.

ASCII: Supports exactly 127 code points, each of which takes up exactly 7 
bits. Each code point represents a character.

Latin-1, Latin-2, MacRoman, MacGreek, ISO-8859-7, Big5, Windows-1251, and 
about a gazillion other legacy charsets, all of which are mutually 
incompatible: supports anything from 127 to 65535 different code points, 
usually under 256.

UCS-2: Supports exactly 65535 code points, each of which takes up exactly 
two bytes. That's fewer than required, so it is obsoleted by:

UTF-16: Supports all 1114111 code points in the Unicode charset, using a 
variable-width system where the most popular characters use exactly two-
bytes and the remaining ones use a pair of characters.

UCS-4: Supports exactly 4294967295 code points, each of which takes up 
exactly four bytes. That is more than needed for the Unicode charset, so 
this is obsoleted by:

UTF-32: Supports all 1114111 code points, using exactly four bytes each. 
Code points outside of the range 0 through 1114111 inclusive are an error.

UTF-8: Supports all 1114111 code points, using a variable-width system 
where popular ASCII characters require 1 byte, and others use 2, 3 or 4 
bytes as needed.

Ignoring the legacy charsets, only UTF-16 is a terribly complicated 
implementation, due to the surrogate pairs. But even that is not too bad. 
The real complication comes from the interactions between systems which 
use different encodings, and that's nothing to do with Unicode.
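
The byte widths of the three Unicode encodings, and the surrogate-pair arithmetic that UTF-16 needs, can be checked with a short Python 3 sketch. The sample characters and helper names are chosen for illustration only. The `-le` codec variants are used so that no byte order mark is counted.

```python
samples = {
    "a": "a",                 # U+0061, ASCII
    "lambda": "\u03bb",       # U+03BB, Greek small letter lambda
    "euro": "\u20ac",         # U+20AC, euro sign
    "emoji": "\U0001F600",    # U+1F600, outside the 16-bit range
}


def encoded_widths(ch):
    """Return the number of bytes ch occupies in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


for name, ch in samples.items():
    print(name, encoded_widths(ch))

high, low = surrogate_pair(0x1F600)
print(hex(high), hex(low))
```

UTF-32 is always four bytes, UTF-8 ranges from one to four, and UTF-16 uses two bytes until a code point needs a surrogate pair.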

-- 
Steven


#48785

From	MRAB <python@mrabarnett.plus.com>
Date	2013-06-20 12:43 +0100
Message-ID	<mailman.3620.1371728614.3114.python-list@python.org>
In reply to	#48777

On 20/06/2013 07:26, Steven D'Aprano wrote:
> On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:
>
>> On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
>>
>>> Gah! That's twice I've screwed that up. Sorry about that!
>>
>> Yeah, and your difficulty explaining the Unicode implementation reminds
>> me of a passage from the Python zen:
>>
>>  "If the implementation is hard to explain, it's a bad idea."
>
> The *implementation* is easy to explain. It's the names of the encodings
> which I get tangled up in.
>
You're off by one below!
>
> ASCII: Supports exactly 127 code points, each of which takes up exactly 7
> bits. Each code point represents a character.
>
128 codepoints.

> Latin-1, Latin-2, MacRoman, MacGreek, ISO-8859-7, Big5, Windows-1251, and
> about a gazillion other legacy charsets, all of which are mutually
> incompatible: supports anything from 127 to 65535 different code points,
> usually under 256.
>
128 to 65536 codepoints.

> UCS-2: Supports exactly 65535 code points, each of which takes up exactly
> two bytes. That's fewer than required, so it is obsoleted by:
>
65536 codepoints.

etc.
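
The corrected counts follow from the bit widths, because the ranges start at zero. A quick Python 3 check, with variable names invented for the example:

```python
import sys

ascii_code_points = 2 ** 7       # 7 bits: values 0 through 127
ucs2_code_points = 2 ** 16       # 16 bits: values 0 through 65535

# sys.maxunicode is the largest code point Python 3 supports (0x10FFFF),
# so the number of code points is one more than that.
largest_code_point = sys.maxunicode
unicode_code_points = largest_code_point + 1

print(ascii_code_points, ucs2_code_points, unicode_code_points)
```

So 1114111 is the highest code point, while the number of code points is 1114112.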

> UTF-16: Supports all 1114111 code points in the Unicode charset, using a
> variable-width system where the most popular characters use exactly two-
> bytes and the remaining ones use a pair of characters.
>
> UCS-4: Supports exactly 4294967295 code points, each of which takes up
> exactly four bytes. That is more than needed for the Unicode charset, so
> this is obsoleted by:
>
> UTF-32: Supports all 1114111 code points, using exactly four bytes each.
> Code points outside of the range 0 through 1114111 inclusive are an error.
>
> UTF-8: Supports all 1114111 code points, using a variable-width system
> where popular ASCII characters require 1 byte, and others use 2, 3 or 4
> bytes as needed.
>
>
> Ignoring the legacy charsets, only UTF-16 is a terribly complicated
> implementation, due to the surrogate pairs. But even that is not too bad.
> The real complication comes from the interactions between systems which
> use different encodings, and that's nothing to do with Unicode.
>
>


#48806

From	wxjmfauth@gmail.com
Date	2013-06-20 09:27 -0700
Message-ID	<114200cf-2d46-46cb-bb5f-7c5f8ab98a66@googlegroups.com>
In reply to	#48785

Le jeudi 20 juin 2013 13:43:28 UTC+2, MRAB a écrit :
> On 20/06/2013 07:26, Steven D'Aprano wrote:
> > On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:
> >
> >> On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
> >>
> >>> Gah! That's twice I've screwed that up. Sorry about that!
> >>
> >> Yeah, and your difficulty explaining the Unicode implementation reminds
> >> me of a passage from the Python zen:
> >>
> >>  "If the implementation is hard to explain, it's a bad idea."
> >
> > The *implementation* is easy to explain. It's the names of the encodings
> > which I get tangled up in.
> >
> You're off by one below!
> >
> > ASCII: Supports exactly 127 code points, each of which takes up exactly 7
> > bits. Each code point represents a character.
> >
> 128 codepoints.
>
> > Latin-1, Latin-2, MacRoman, MacGreek, ISO-8859-7, Big5, Windows-1251, and
> > about a gazillion other legacy charsets, all of which are mutually
> > incompatible: supports anything from 127 to 65535 different code points,
> > usually under 256.
> >
> 128 to 65536 codepoints.
>
> > UCS-2: Supports exactly 65535 code points, each of which takes up exactly
> > two bytes. That's fewer than required, so it is obsoleted by:
> >
> 65536 codepoints.
>
> etc.
>
> > UTF-16: Supports all 1114111 code points in the Unicode charset, using a
> > variable-width system where the most popular characters use exactly two-
> > bytes and the remaining ones use a pair of characters.
> >
> > UCS-4: Supports exactly 4294967295 code points, each of which takes up
> > exactly four bytes. That is more than needed for the Unicode charset, so
> > this is obsoleted by:
> >
> > UTF-32: Supports all 1114111 code points, using exactly four bytes each.
> > Code points outside of the range 0 through 1114111 inclusive are an error.
> >
> > UTF-8: Supports all 1114111 code points, using a variable-width system
> > where popular ASCII characters require 1 byte, and others use 2, 3 or 4
> > bytes as needed.
> >
> >
> > Ignoring the legacy charsets, only UTF-16 is a terribly complicated
> > implementation, due to the surrogate pairs. But even that is not too bad.
> > The real complication comes from the interactions between systems which
> > use different encodings, and that's nothing to do with Unicode.
> >
> >

And all these coding schemes have something in common:
they all work with a unique set of code points, more
precisely a unique set of encoded code points (not
the set of implemented code points (bytes)).

That is just what the flexible string representation does
not do: it artificially divides Unicode into subsets and
tries to handle each subset differently.

On the other hand, it is precisely because it is impossible
to work properly with multiple sets of encoded code points
that all these coding schemes exist today. There is simply
no other way.

Even "exotic" schemes like "CID-fonts" used in pdf
are based on that scheme.

jmf


#48807

From	Chris Angelico <rosuav@gmail.com>
Date	2013-06-21 02:37 +1000
Message-ID	<mailman.3630.1371746277.3114.python-list@python.org>
In reply to	#48806

On Fri, Jun 21, 2013 at 2:27 AM,  <wxjmfauth@gmail.com> wrote:
> And all these coding schemes have something in common,
> they work all with a unique set of code points, more
> precisely a unique set of encoded code points (not
> the set of implemented code points (byte)).
>
> Just what the flexible string representation is not
> doing, it artificially devides unicode in subsets and try
> to handle eache subset differently.
>

UTF-16 divides Unicode into two subsets: BMP characters (encoded using
one 16-bit unit) and astral characters (encoded using two 16-bit units
in the D800::/5 netblock, or equivalent thereof). Your beloved narrow
builds are guilty of exactly the same crime as the hated 3.3.
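
The BMP/astral split can be made concrete. A rough sketch of the surrogate-pair arithmetic, cross-checked against Python's own `utf-16-le` codec; the helper names `utf16_units` and `codec_units` are made up for illustration, and lone surrogates are not handled:

```python
import struct

def utf16_units(ch):
    """Return the 16-bit code units UTF-16 uses for a single character."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP character: one code unit, equal to the code point.
        return [cp]
    # Astral character: split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]

def codec_units(ch):
    """Same result, obtained from the built-in codec for comparison."""
    data = ch.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

bmp_units = utf16_units("\u00e9")            # U+00E9, in the BMP
astral_units = utf16_units("\U0001F600")     # U+1F600, outside the BMP

print([hex(u) for u in bmp_units], [hex(u) for u in astral_units])
```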

ChrisA


#48810

From	MRAB <python@mrabarnett.plus.com>
Date	2013-06-20 18:17 +0100
Message-ID	<mailman.3632.1371748640.3114.python-list@python.org>
In reply to	#48806

On 20/06/2013 17:37, Chris Angelico wrote:
> On Fri, Jun 21, 2013 at 2:27 AM,  <wxjmfauth@gmail.com> wrote:
>> And all these coding schemes have something in common,
>> they work all with a unique set of code points, more
>> precisely a unique set of encoded code points (not
>> the set of implemented code points (byte)).
>>
>> Just what the flexible string representation is not
>> doing, it artificially devides unicode in subsets and try
>> to handle eache subset differently.
>>
>
>
> UTF-16 divides Unicode into two subsets: BMP characters (encoded using
> one 16-bit unit) and astral characters (encoded using two 16-bit units
> in the D800::/5 netblock, or equivalent thereof). Your beloved narrow
> builds are guilty of exactly the same crime as the hated 3.3.
>
UTF-8 divides Unicode into subsets which are encoded in 1, 2, 3, or 4
bytes, and those who previously used ASCII still need only 1 byte per
codepoint!
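
A short sketch of that 1/2/3/4-byte split, using one sample character from each range (the sample characters are an arbitrary choice):

```python
# One character from each UTF-8 length class.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]   # a, é, €, and U+1F600

# Number of bytes UTF-8 needs for each sample.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# Pure-ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_text = "hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

print(utf8_lengths, same_bytes)
```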


#48986

From	wxjmfauth@gmail.com
Date	2013-06-23 08:51 -0700
Message-ID	<28586b5f-cb51-4e41-a47d-38a18723b51c@googlegroups.com>
In reply to	#48810

On Thursday, 20 June 2013 19:17:12 UTC+2, MRAB wrote:
> On 20/06/2013 17:37, Chris Angelico wrote:
> > On Fri, Jun 21, 2013 at 2:27 AM,  <wxjmfauth@gmail.com> wrote:
> >> And all these coding schemes have something in common,
> >> they work all with a unique set of code points, more
> >> precisely a unique set of encoded code points (not
> >> the set of implemented code points (byte)).
> >>
> >> Just what the flexible string representation is not
> >> doing, it artificially devides unicode in subsets and try
> >> to handle eache subset differently.
> >>
> >
> >
> > UTF-16 divides Unicode into two subsets: BMP characters (encoded using
> > one 16-bit unit) and astral characters (encoded using two 16-bit units
> > in the D800::/5 netblock, or equivalent thereof). Your beloved narrow
> > builds are guilty of exactly the same crime as the hated 3.3.
> >
> UTF-8 divides Unicode into subsets which are encoded in 1, 2, 3, or 4
> bytes, and those who previously used ASCII still need only 1 byte per
> codepoint!

Sorry, but no, it does not work that way: you are
confusing the set of encoded code points with their
implementation, the so-called code units.

utf-8: how many bytes to hold an "a" in memory?
one byte.

flexible string representation: how many bytes to
hold an "a" in memory? One byte? No, two.
(Funny, it consumes more memory to hold an ascii char
than ascii itself)

utf-8: In a series of bytes implementing the encoded code
points supposed to hold a string, picking a byte and
finding to which encoded code point it belongs is no problem.

flexible string representation: In a series of bytes
implementing the encoded code points supposed to hold a
string, picking a byte and finding to which encoded code
point it belongs is ... impossible !

One of the causes of the poor behaviour of this flexible
string representation.

The basics of any coding scheme, unicode included.

jmf


#48989

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-06-23 16:30 +0000
Message-ID	<51c722af$0$29999$c3e8da3$5496439d@news.astraweb.com>
In reply to	#48986

On Sun, 23 Jun 2013 08:51:41 -0700, wxjmfauth wrote:

> utf-8: how many bytes to hold an "a" in memory? one byte.
> 
> flexible string representation: how many bytes to hold an "a" in memory?
> One byte? No, two. (Funny, it consumes more memory to hold an ascii char
> than ascii itself)

Incorrect. Python strings have overhead because they are objects, so 
let's see the difference adding a single character makes:

# Python 3.3, with the hated flexible string representation:
py> import sys
py> sys.getsizeof('a'*100) - sys.getsizeof('a'*99)
1

# Python 3.2 (a wide build, hence 4 bytes per character):
py> sys.getsizeof('a'*100) - sys.getsizeof('a'*99)
4

How about a French é character? Of course, ASCII cannot store it *at 
all*, but let's see what Python can do:

# The hated Python 3.3 again:
py> sys.getsizeof('é'*100) - sys.getsizeof('é'*99)
1

# And Python 3.2:
py> sys.getsizeof('é'*100) - sys.getsizeof('é'*99)
4

> utf-8: In a series of bytes implementing the encoded code points
> supposed to hold a string, picking a byte and finding to which encoded
> code point it belongs is a no prolem.

Incorrect. UTF-8 is unsuitable for random access, since it has variable-
width characters, anything from 1 to 4 bytes. So you cannot just jump 
directly to character 1000 in a block of text, you have to inspect each 
byte one-by-one to decide whether it is a 1, 2, 3 or 4 byte character.
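
Two different operations are being discussed here, and a small sketch may help keep them apart. It assumes well-formed UTF-8 input; the helper names `char_start` and `nth_char_offset` are invented for this example. Finding which character a given byte belongs to needs only a short step backwards, whereas finding the n-th character needs a scan from the start:

```python
def char_start(data, i):
    """Back up from byte index i to the first byte of its character.

    Continuation bytes in UTF-8 always look like 0b10xxxxxx, so at most
    three steps back are needed in well-formed data.
    """
    while (data[i] & 0xC0) == 0x80:
        i -= 1
    return i

def nth_char_offset(data, n):
    """Byte offset of the n-th character (0-based), found by linear scan."""
    count = -1
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:     # a lead byte starts a new character
            count += 1
            if count == n:
                return offset
    raise IndexError(n)

# 'a' (1 byte), U+00E9 (2), U+20AC (3), U+1F600 (4), 'z' (1)
data = "a\u00e9\u20ac\U0001F600z".encode("utf-8")

print(len(data), char_start(data, 8), nth_char_offset(data, 4))
```

The scan in `nth_char_offset` is what makes indexing by character position linear in the length of the text.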

> flexible string representation: In a series of bytes implementing the
> encoded code points supposed to hold a string, picking a byte and
> finding to which encoded code point it belongs is ... impossible !

Incorrect. It is absolutely trivial. Each string is marked as either 1-
byte, 2-byte or 4-byte. If it is a 1-byte string, then each byte is one 
character. If it is a 2-byte string, then it is just like Python 3.2 
narrow build, and each two bytes is a character. If it is a 4-byte 
string, then it is just like Python 3.2 wide build, and each four bytes 
is a character. Within a single string, the number of bytes per character 
is fixed, and random access is easy and fast.
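
The three widths can be observed with the same `sys.getsizeof` trick used earlier. A minimal sketch, assuming CPython 3.3 or later (other implementations may report different sizes); `bytes_per_char` is a made-up helper name:

```python
import sys

def bytes_per_char(ch, n=100):
    """Extra memory taken by one more copy of ch in a string of n copies."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

# One sample per storage width: ASCII, Latin-1, rest of the BMP, astral.
widths = {
    "a": bytes_per_char("a"),
    "\u00e9": bytes_per_char("\u00e9"),          # é
    "\u20ac": bytes_per_char("\u20ac"),          # euro sign
    "\U0001F600": bytes_per_char("\U0001F600"),  # outside the BMP
}

print(widths)
```

Because the width is fixed within any one string, indexing needs only a multiplication by that width rather than a scan.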

-- 
Steven
