Pyhon 2.x or 3.x, which is faster?

Started by	Tony van der Hoff <tony@vanderhoff.org>
First post	2016-03-06 11:34 +0000
Last post	2016-03-07 19:02 +0000
Articles	20 on this page of 142 — 20 participants

  Pyhon 2.x or 3.x, which is faster? Tony van der Hoff <tony@vanderhoff.org> - 2016-03-06 11:34 +0000
    Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-07 01:41 +1100
      Re: Pyhon 2.x or 3.x, which is faster? Tony van der Hoff <tony@vanderhoff.org> - 2016-03-07 10:45 +0000
      Re: Pyhon 2.x or 3.x, which is faster? Andrew Jaffe <a.h.jaffe@gmail.com> - 2016-03-07 11:54 +0000
      Re: Pyhon 2.x or 3.x, which is faster? Terry Reedy <tjreedy@udel.edu> - 2016-03-07 17:33 -0500
    Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-07 11:02 +0000
      Re: Pyhon 2.x or 3.x, which is faster? Marko Rauhamaa <marko@pacujo.net> - 2016-03-07 13:11 +0200
        Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-07 11:38 +0000
          Re: Pyhon 2.x or 3.x, which is faster? Fabien <fabien.maussion@gmail.com> - 2016-03-07 13:19 +0100
            Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-07 13:25 +0000
              Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-08 02:31 +1100
                Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-07 18:34 +0000
                  Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-08 06:10 +1100
                    Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-07 20:19 +0000
                      Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-08 07:47 +1100
                        Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-07 22:39 +0000
                          Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-08 10:40 +1100
                            Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 00:22 +0000
                              Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-08 00:43 +0000
                              Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-08 11:45 +1100
                              Re: Pyhon 2.x or 3.x, which is faster? MRAB <python@mrabarnett.plus.com> - 2016-03-08 00:47 +0000
                              Re: Pyhon 2.x or 3.x, which is faster? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-03-07 20:29 -0500
                              Re: Pyhon 2.x or 3.x, which is faster? Terry Reedy <tjreedy@udel.edu> - 2016-03-07 22:51 -0500
                              Re: Pyhon 2.x or 3.x, which is faster? Michael Torrie <torriem@gmail.com> - 2016-03-08 17:34 -0700
                                Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-09 13:01 +1100
                              Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-09 11:38 +1100
                          Re: Pyhon 2.x or 3.x, which is faster? Ben Finney <ben+python@benfinney.id.au> - 2016-03-08 11:05 +1100
                            Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 01:00 +0000
                              Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-08 01:12 +0000
                                Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 01:47 +0000
                                  Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-08 02:45 +0000
                                    Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 11:09 +0000
                                      Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-08 16:09 +0000
                                        Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 19:15 +0000
                                          Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-08 20:44 +0000
                                            Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 22:38 +0000
                                          Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-09 10:59 +1100
                                            Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-09 08:40 +0000
                                              Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-09 12:02 +0000
                                                Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-09 21:13 +0000
                                                  Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-09 23:14 +0000
                                                    Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-09 23:35 +0000
                                                      Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-10 00:58 +0000
                                                        Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 12:28 +1100
                                                        Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-10 07:30 +0000
                                                          Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-10 11:50 +0000
                                                            Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-10 12:15 +0000
                                                              Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-10 12:47 +0000
                                                                Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-11 00:08 +1100
                                                                  Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-10 14:22 +0000
                                                                    Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-10 19:26 +0000
                                                                      Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-11 16:29 +1100
                                                                        Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-11 18:57 +0000
                                                                          Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-11 21:59 +0000
                                                                            Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-11 22:24 +0000
                                                                              Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-12 16:59 +1100
                                                                              Re: Pyhon 2.x or 3.x, which is faster? alister <alister.ware@ntlworld.com> - 2016-03-12 10:06 +0000
                                                                                Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-12 10:31 +0000
                                                                                  Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-12 10:51 +0000
                                                                                  Re: Pyhon 2.x or 3.x, which is faster? alister <alister.ware@ntlworld.com> - 2016-03-12 15:36 +0000
                                                                                    Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-13 14:22 +1100
                                                                              Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-12 10:34 +0000
                                                                                Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-12 21:40 +1100
                                                                    Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-11 07:07 +1100
                                                                      Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-11 16:06 +1100
                                                                        Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-11 16:36 +1100
                                                                Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-10 13:18 +0000
                                                                Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-11 00:30 +1100
                                                                Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-10 13:46 +0000
                                                        Re: Pyhon 2.x or 3.x, which is faster? Ben Finney <ben+python@benfinney.id.au> - 2016-03-10 18:43 +1100
                                                        Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 18:55 +1100
                                                      Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 12:59 +1100
                                                        Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-10 12:19 +0000
                                                    Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 10:38 +1100
                                                      Re: Pyhon 2.x or 3.x, which is faster? Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2016-03-09 23:48 +0000
                                                        Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 11:03 +1100
                                                          Re: Pyhon 2.x or 3.x, which is faster? Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2016-03-10 02:38 +0000
                                                            Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 14:43 +1100
                                                      Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-10 01:30 +0000
                                                        Re: Pyhon 2.x or 3.x, which is faster? Ben Finney <ben+python@benfinney.id.au> - 2016-03-10 13:29 +1100
                                                          Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-10 14:32 +0000
                                                        Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 13:45 +1100
                                                    Re: Pyhon 2.x or 3.x, which is faster? Ben Finney <ben+python@benfinney.id.au> - 2016-03-10 11:21 +1100
                              Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-08 12:23 +1100
                                Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 01:33 +0000
                                  Re: Pyhon 2.x or 3.x, which is faster? Ben Finney <ben+python@benfinney.id.au> - 2016-03-08 12:38 +1100
                                  Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-08 12:40 +1100
                                    Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 02:02 +0000
                                      Re: Pyhon 2.x or 3.x, which is faster? Ben Finney <ben+python@benfinney.id.au> - 2016-03-08 13:28 +1100
                                  Re: Pyhon 2.x or 3.x, which is faster? MRAB <python@mrabarnett.plus.com> - 2016-03-08 02:47 +0000
                                    Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 11:15 +0000
                                      Re: Pyhon 2.x or 3.x, which is faster? Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-03-08 13:45 +0200
                                        Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 12:09 +0000
                                  Re: Pyhon 2.x or 3.x, which is faster? Terry Reedy <tjreedy@udel.edu> - 2016-03-07 22:39 -0500
                                  Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-08 03:48 +0000
                          What will I get when reading from a file? (was: Pyhon 2.x or 3.x, which is faster?) Ben Finney <ben+python@benfinney.id.au> - 2016-03-08 11:09 +1100
                          Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-08 13:12 +1100
                            Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 11:53 +0000
                              Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-09 10:28 +1100
                                Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-09 00:09 +0000
                                  Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-09 11:36 +1100
                                Re: Pyhon 2.x or 3.x, which is faster? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-03-08 21:03 -0500
                                  Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 03:07 +1100
                            Re: Pyhon 2.x or 3.x, which is faster? Serhiy Storchaka <storchaka@gmail.com> - 2016-03-08 14:48 +0200
                        Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-08 12:34 +1100
                          Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-08 12:49 +1100
                          Re: Pyhon 2.x or 3.x, which is faster? Serhiy Storchaka <storchaka@gmail.com> - 2016-03-08 15:05 +0200
                      Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-08 12:19 +1100
                        Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 01:41 +0000
                          Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-03-08 15:40 +1100
                            Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 13:49 +0000
                              Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-08 16:15 +0000
                                Re: Pyhon 2.x or 3.x, which is faster? wxjmfauth@gmail.com - 2016-03-08 09:23 -0800
                                Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-08 19:02 +0000
                              Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-09 11:04 +1100
                                Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-09 01:28 +0000
                                  Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-09 13:18 +1100
                                    Re: Pyhon 2.x or 3.x, which is faster? wxjmfauth@gmail.com - 2016-03-09 02:11 -0800
                                    Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-09 14:03 +0000
                                      Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 01:11 +1100
                                        Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-09 14:39 +0000
                                          Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 01:54 +1100
                                            Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 02:33 +1100
                                              Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 02:58 +1100
                                          Re: Pyhon 2.x or 3.x, which is faster? Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2016-03-09 14:56 +0000
                                          Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 02:28 +1100
                                        Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 01:57 +1100
                                          Re: Pyhon 2.x or 3.x, which is faster? Chris Angelico <rosuav@gmail.com> - 2016-03-10 02:04 +1100
                                          Re: Pyhon 2.x or 3.x, which is faster? BartC <bc@freeuk.com> - 2016-03-09 16:53 +0000
                                      Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 01:54 +1100
                                        Re: Pyhon 2.x or 3.x, which is faster? Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2016-03-09 15:06 +0000
                                          Re: Pyhon 2.x or 3.x, which is faster? Tim Golden <mail@timgolden.me.uk> - 2016-03-09 15:15 +0000
                                          Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 02:38 +1100
                                      Re: Pyhon 2.x or 3.x, which is faster? Terry Reedy <tjreedy@udel.edu> - 2016-03-09 10:42 -0500
                                        Re: Pyhon 2.x or 3.x, which is faster? wxjmfauth@gmail.com - 2016-03-09 09:04 -0800
                                Re: Pyhon 2.x or 3.x, which is faster? Marko Rauhamaa <marko@pacujo.net> - 2016-03-09 08:08 +0200
                                  Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-09 22:52 +1100
                                    Re: Pyhon 2.x or 3.x, which is faster? Marko Rauhamaa <marko@pacujo.net> - 2016-03-09 14:53 +0200
                                      Re: Pyhon 2.x or 3.x, which is faster? Steven D'Aprano <steve@pearwood.info> - 2016-03-10 03:53 +1100
                  Re: Pyhon 2.x or 3.x, which is faster? Michael Torrie <torriem@gmail.com> - 2016-03-08 17:42 -0700
          Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-08 02:53 +0000
      Re: Pyhon 2.x or 3.x, which is faster? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-07 19:02 +0000

#104413

From	BartC <bc@freeuk.com>
Date	2016-03-09 14:39 +0000
Message-ID	<nbpcdd$71l$1@dont-email.me>
In reply to	#104411

On 09/03/2016 14:11, Chris Angelico wrote:
> On Thu, Mar 10, 2016 at 1:03 AM, BartC <bc@freeuk.com> wrote:
>> I've just tried a UTF-8 file and getting some odd results. With a file
>> containing [three euro symbols]:
>>
>> €€€
>>
>> (including a 3-byte utf-8 marker at the start), and opened in text mode,
>> Python 3 gives me this series of bytes (ie. the ord() of each character):
>>
>> 239
>> 187
>> 191
>> 226
>> 8218
>> 172
>> 226
>> 8218
>> 172
>> 226
>> 8218
>> 172
>>
>> And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬.
>
> The first three bytes are the "UTF-8 BOM", which suggests you may have
> created this in a broken editor like Notepad.

Yes, that's what I used, but what's broken about it? If Python doesn't 
understand the BOM, it should still resynchronise after a few bytes.

> For the rest, I'm not sure how you told Python to open this as text,
> but you certainly did NOT specify an encoding of UTF-8. The 8218
> entries in there are completely bogus. Can you show your code, please,
> and also what you get if you open the file as binary?

This is the code:

f=open("input","r")
t=f.read(1000)
f.close()

print ("T",type(t),len(t))

print (t)

for i in t:
	print (ord(i))

This doesn't specify any specific code encoding; I don't know how, and 
Steven didn't mention anything other than a text file. The input data is 
represented by this dump, and this is also what binary mode gives:

0000: ef bb bf e2 82 ac e2 82 ac e2 82 ac    ............

> Unicode handling is easy as long as you (a) understand the fundamental
> difference between text and bytes, and (b) declare your encodings.
> Python isn't magical. It can't know the encoding without being told.

Hence the BOM bytes.

(Isn't it better that it's automatic? Someone sends you a text file that 
you want to open within a Python program. Are you supposed to analyze it 
first, or expect the sender to tell you what it is (they probably won't 
know) then need to hack the program to read it properly?)

-- 
Bartc

#104414

From	Chris Angelico <rosuav@gmail.com>
Date	2016-03-10 01:54 +1100
Message-ID	<mailman.79.1457535287.15725.python-list@python.org>
In reply to	#104413

On Thu, Mar 10, 2016 at 1:39 AM, BartC <bc@freeuk.com> wrote:
> On 09/03/2016 14:11, Chris Angelico wrote:
>>
>> On Thu, Mar 10, 2016 at 1:03 AM, BartC <bc@freeuk.com> wrote:
>>>
>>> I've just tried a UTF-8 file and getting some odd results. With a file
>>> containing [three euro symbols]:
>>>
>>> €€€
>>>
>>> (including a 3-byte utf-8 marker at the start), and opened in text mode,
>>> Python 3 gives me this series of bytes (ie. the ord() of each character):
>>>
>>> 239
>>> 187
>>> 191
>>> 226
>>> 8218
>>> 172
>>> 226
>>> 8218
>>> 172
>>> 226
>>> 8218
>>> 172
>>>
>>> And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬.
>>
>>
>> The first three bytes are the "UTF-8 BOM", which suggests you may have
>> created this in a broken editor like Notepad.
>
>
> Yes, that's what I used, but what's broken about it? If Python doesn't
> understand the BOM, it should still resynchronise after a few bytes.

It's an extra character. You thought the file contained three
characters; it actually contained four.

>> For the rest, I'm not sure how you told Python to open this as text,
>> but you certainly did NOT specify an encoding of UTF-8. The 8218
>> entries in there are completely bogus. Can you show your code, please,
>> and also what you get if you open the file as binary?
>
> This is the code:
>
> f=open("input","r")
> t=f.read(1000)
> f.close()
>
> print ("T",type(t),len(t))
>
> print (t)
>
> for i in t:
>         print (ord(i))
>
> This doesn't specify any specific code encoding; I don't know how, and
> Steven didn't mention anything other than a text file. The input data is
> represented by this dump, and this is also what binary mode gives:
>
> 0000: ef bb bf e2 82 ac e2 82 ac e2 82 ac    ............

Okay. Try changing your first line to this:

f = open("input", encoding="utf-8")

By default, you get a system-specific encoding, which in your case
appears to be one of the Windows codepages. That's why you're getting
nonsense out of it - you write in one encoding and read in another.
It's commonly called mojibake.
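
As an illustration (a minimal sketch, assuming the system default was
Windows-1252, which nothing in this thread confirms), decoding the dumped
bytes as cp1252 reproduces the numbers quoted above, including the 8218:

```python
# The twelve bytes from the hex dump: a UTF-8 signature plus three euro signs.
raw = bytes.fromhex("efbbbf" + "e282ac" * 3)

# Assumption: the platform default encoding was cp1252 (Windows-1252).
wrong = raw.decode("cp1252")
wrong_ords = [ord(ch) for ch in wrong]

# Decoding with the encoding the file was actually saved in.
right = raw.decode("utf-8")
right_ords = [ord(ch) for ch in right]

print(wrong_ords)
print(right_ords)
```

In cp1252 the byte 0x82 maps to U+201A (decimal 8218), which is where the
value that looks too big for a byte comes from. The UTF-8 decode yields
four characters, the first being U+FEFF.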

>> Unicode handling is easy as long as you (a) understand the fundamental
>> difference between text and bytes, and (b) declare your encodings.
>> Python isn't magical. It can't know the encoding without being told.
>
>
> Hence the BOM bytes.
>
> (Isn't it better that it's automatic? Someone sends you a text file that you
> want to open within a Python program. Are you supposed to analyze it first,
> or expect the sender to tell you what it is (they probably won't know) then
> need to hack the program to read it properly?)

No, it's not better to be automatic. They are supposed to tell you
what it is. Someone somewhere saved the file using a particular
encoding. In this example, you chose when you told Notepad to save it
as UTF-8; so you carry that information with the file, and open it
using the encoding="UTF-8" parameter.

Analyzing files to try to guess their encodings is fundamentally hard.
I have a source of occasional text files that basically just dumps
stuff on me without any metadata, and I have to figure out (a) what
the encoding is, and (b) what language the text is in. I can generally
assume that the files are ASCII-compatible (on the rare occasions when
they're not, they're usually going to be UTF-16, which is fairly easy
to spot), and then I have two levels of heuristics to try to guess a
most-likely encoding - but ultimately, the script just decodes the
text as best it can, and then hands the result up to the human. If the
result looks mostly like Spanish but has acute accents instead of
tildes over the n's, it's probably the wrong codepage. Or if the text
is all completely meaningless junk, it's probably Cyrillic or Greek
letters, and needs to be decoded using an appropriate eight-bit
encoding. It often ends up being trial-and-error to figure out what
encoding was actually used.

Trying to guess the encoding of text in a file full of bytes is like
trying to guess the modem settings (8N1? 7E1?). If the other end
doesn't tell you, you'll probably end up with something that carries
some decodable content, but not the original content. It's almost
completely useless.

ChrisA

#104424

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-03-10 02:33 +1100
Message-ID	<56e0424b$0$1603$c3e8da3$5496439d@news.astraweb.com>
In reply to	#104414

On Thu, 10 Mar 2016 01:54 am, Chris Angelico wrote:

> I have a source of occasional text files that basically just dumps
> stuff on me without any metadata, and I have to figure out (a) what
> the encoding is, and (b) what language the text is in.

https://pypi.python.org/pypi/chardet

> then I have two levels of heuristics to try to guess a
> most-likely encoding

I'm curious, what do you do?

(I stress that trying to guess the character set or encoding from the text
itself is a second-last ditch tactic, for when you really don't know and
can't find out what the encoding is. The final, last-ditch tactic is to
just say "bugger it, I'll pretend it's Latin-1" and get a mess of
mojibake, but at least any ASCII characters will decode alright, and as an
English speaker, that's all that's important to me :-)
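
A small sketch of why the Latin-1 fallback cannot fail (the variable names
are illustrative only): every one of the 256 byte values maps to a code
point in Latin-1, so decoding never raises, and bytes below 128 come out
as the same ASCII characters.

```python
# Every possible byte value decodes under Latin-1; byte n becomes code point n.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
roundtrip = decoded.encode("latin-1")

# UTF-8 text misread as Latin-1: the ASCII part survives, the rest is mojibake.
sample = "price: \u20ac5".encode("utf-8")
as_latin1 = sample.decode("latin-1")
ascii_part = as_latin1[:7]

print(len(decoded), roundtrip == all_bytes, repr(ascii_part))
```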

-- 
Steven

#104427

From	Chris Angelico <rosuav@gmail.com>
Date	2016-03-10 02:58 +1100
Message-ID	<mailman.86.1457539118.15725.python-list@python.org>
In reply to	#104424

On Thu, Mar 10, 2016 at 2:33 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, 10 Mar 2016 01:54 am, Chris Angelico wrote:
>
>> I have a source of occasional text files that basically just dumps
>> stuff on me without any metadata, and I have to figure out (a) what
>> the encoding is, and (b) what language the text is in.
>
> https://pypi.python.org/pypi/chardet
>
>> then I have two levels of heuristics to try to guess a
>> most-likely encoding
>
> I'm curious, what do you do?

Collect subtitles files from random internet contributors and
determine whether they add to the existing corpus of material. The
first heuristic level is chardet, as mentioned; but with the specific
files that I'm processing, it has some semi-consistent errors, so I
scripted around that - eg "if chardet says ISO-8859-2, and these byte
patterns exist, it's probably actually codepage 1250". IIRC the second
level is entirely translating from an ISO-8859 encoding to the
nearest-equivalent Windows codepage.
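
A rough sketch of that kind of override, not the actual script: the
function name and the byte-range test are assumptions made for
illustration, and the guess would come from a detector such as chardet.

```python
def refine_guess(guess: str, data: bytes) -> str:
    """Second-guess a detector's answer using a byte-range check.

    Assumption: bytes 0x80-0x9F are C1 control codes in ISO-8859-2 and
    rarely occur in real text, whereas Windows-1250 puts printable
    characters there, so their presence hints at the Windows codepage.
    """
    if guess.upper() == "ISO-8859-2" and any(0x80 <= b <= 0x9F for b in data):
        return "cp1250"
    return guess


windows_bytes = "\u0160koda".encode("cp1250")    # S-caron is byte 0x8A here
iso_bytes = "\u0160koda".encode("iso8859_2")     # S-caron is byte 0xA9 here

print(refine_guess("ISO-8859-2", windows_bytes))
print(refine_guess("ISO-8859-2", iso_bytes))
```

The real byte patterns would have to be tuned against the files in
question; this one only shows the shape of the rule.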

> (I stress that trying to guess the character set or encoding from the text
> itself is a second-last ditch tactic, for when you really don't know and
> can't find out what the encoding is. The final, last-ditch tactic is to
> just say "bugger it, I'll pretend it's Latin-1" and get a mess of
> mojibake, but at least any ASCII characters will decode alright, and as an
> English speaker, that's all that's important to me :-)

What I do is attempt to guess, *and then hand it to the user*. I have
a little "cdless" script that does a chardet on a file, decodes
accordingly, and pipes the result into 'less' [1]. The most powerful
character encoding detection tool in my arsenal is 'less'.

Pretending that text is Latin-1 is actually a pretty good start. If I
didn't have chardet, I'd be mainly using this:

https://github.com/Rosuav/shed/blob/master/charconv.py

With no args, this will take the beginning of the file (it tries to
get one paragraph of up to 1KB) and decode it using all the ISO-8859-*
encodings, displaying the results for human analysis. That's
surprisingly effective for a manual job. A large number of European
languages use a lot of ASCII letters and then each have their own
distinct non-ASCII characters in between; the only truly confusable
encodings are the ones that are entirely non-ASCII (Cyrillic, Arabic,
Greek, Hebrew - ISO-8859-5 through 8), and mis-decoding one as another
usually results in complete nonsense (words with impossible
vowel/consonant combinations, for instance). It does take *linguistic*
analysis (as opposed to purely mathematical/charcode), but it isn't
too hard.
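
The idea, boiled down to a minimal sketch (this is not the charconv.py
code itself; the names below are made up for illustration):

```python
# ISO-8859 parts known to Python's codec registry (there is no part 12).
ISO_8859_PARTS = [n for n in range(1, 17) if n != 12]


def decode_candidates(sample: bytes) -> dict:
    """Decode the same bytes under every ISO-8859-* encoding.

    errors="replace" keeps going where a part leaves a byte undefined.
    """
    return {
        f"iso8859_{n}": sample.decode(f"iso8859_{n}", errors="replace")
        for n in ISO_8859_PARTS
    }


sample = "se\u00f1or".encode("iso8859_1")[:1024]   # at most 1KB of input
candidates = decode_candidates(sample)
for name, text in candidates.items():
    print(f"{name:>10}: {text}")
```

A human then picks whichever line reads like real words.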

ChrisA

[1] ... and since Unix pipes carry bytes, not text, this involves
encoding it as UTF-8. But that's an implementation detail between
cdless and less.

#104417

From	Jon Ribbens <jon+usenet@unequivocal.co.uk>
Date	2016-03-09 14:56 +0000
Message-ID	<slrnne0eik.19u.jon+usenet@wintry.unequivocal.co.uk>
In reply to	#104413

On 2016-03-09, BartC <bc@freeuk.com> wrote:
> (Isn't it better that it's automatic? Someone sends you a text file that 
> you want to open within a Python program. Are you supposed to analyze it 
> first, or expect the sender to tell you what it is (they probably won't 
> know) then need to hack the program to read it properly?)

Yes, that is exactly what you're supposed to do. But if you really
want to do it the wrong way then look here:
https://github.com/chardet/chardet

#104422

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-03-10 02:28 +1100
Message-ID	<56e04102$0$22141$c3e8da3$5496439d@news.astraweb.com>
In reply to	#104413

On Thu, 10 Mar 2016 01:39 am, BartC wrote:

> This is the code:
> 
> f=open("input","r")
> t=f.read(1000)
> f.close()

If you don't give read an argument, it will try to read the entire file:

t = f.read()

> print ("T",type(t),len(t))
> print (t)
> for i in t:
>     print (ord(i))
> 
> This doesn't specify any specific code encoding; I don't know how, and
> Steven didn't mention anything other than a text file. 

I did warn you that, and I quote, "There's more, but that's the basics". You
could always read the Fine Manual, or even the interactive help (always a
boon for the busy programmer):

help(open) starts with:

open(...)
    open(file, mode='r', buffering=-1, encoding=None,
         errors=None, newline=None, closefd=True, opener=None) 
    -> file object

    Open file and return a stream.  Raise IOError upon failure.

To specify an encoding, pass the name of the encoding as argument:

    open(filename, "r", encoding="utf-8-sig")

for UTF-8 files as created by Notepad, and 

    open(filename, "r", encoding="utf-8")

for UTF-8 files without the leading 3-byte signature.
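To make the difference between the two concrete, here is a minimal sketch. The temporary file and the variable names are made up for illustration; only open() and its encoding argument come from the help text above.

```python
import os
import tempfile

# Write three euro signs preceded by the 3-byte UTF-8 signature,
# roughly what Notepad produces when saving as UTF-8.
raw = b'\xef\xbb\xbf' + '\u20ac\u20ac\u20ac'.encode('utf-8')
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(raw)

# "utf-8-sig" strips the signature if it is present.
with open(path, 'r', encoding='utf-8-sig') as f:
    with_sig = f.read()

# Plain "utf-8" keeps it as a leading U+FEFF character.
with open(path, 'r', encoding='utf-8') as f:
    without_sig = f.read()

os.remove(path)

print(len(with_sig), [hex(ord(c)) for c in with_sig])
print(len(without_sig), [hex(ord(c)) for c in without_sig])
```

The first read should give three characters and the second four, the extra one being the signature decoded as U+FEFF.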

> The input data is 
> represented by this dump, and this is also what binary mode gives:
> 
> 0000: ef bb bf e2 82 ac e2 82 ac e2 82 ac    ............

That matches the bytes I suggested in a previous post:

b'\xef\xbb\xbf\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'

but not the values you quoted, specifically the triples of:

226, 8218, 172 (decimal) or in hex: e2 201a ac

Obviously hex 201a is too big to fit in a byte. I'm not sure how you could
have got that. Human error perhaps?

>> Unicode handling is easy as long as you (a) understand the fundamental
>> difference between text and bytes, and (b) declare your encodings.
>> Python isn't magical. It can't know the encoding without being told.
> 
> Hence the BOM bytes.

Alas, if only it were that simple. But encoding is *metadata*, not data, and
cannot reliably be read from the file itself. It may be a useful heuristic,
which is *mostly* reliable, but it cannot be considered foolproof.

How do you distinguish between a UTF-8 signature and a Latin-1 file that
happens to start with these three characters "ï»¿"? Or a MacRoman file that
happens to start with the three characters "Ôªø"? To mention just a few.

The problem is, any stream of bytes can only be correctly recognised as text
if you know what encoding the bytes represent:

py> dump = b'\xef\xbb\xbf\x2d\x2d\x2d'
py> dump.decode('utf-8-sig')
'---'
py> dump.decode('latin-1')
'ï»¿---'
py> dump.decode('MacRoman')
'Ôªø---'
py> dump.decode('cp1251')
'п»ї---'
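For what it's worth, a BOM-based guess might look something like the sketch below. sniff_bom is a made-up helper, not anything in the standard library, and as the decodings above show, a match is only a hint: a Latin-1 file is allowed to start with those same bytes.

```python
import codecs

def sniff_bom(data, default='utf-8'):
    """Guess an encoding from a leading BOM; hypothetical helper."""
    # Check the longer UTF-32 marks first: the UTF-32-LE BOM
    # starts with the same two bytes as the UTF-16-LE BOM.
    candidates = [
        (codecs.BOM_UTF32_LE, 'utf-32'),
        (codecs.BOM_UTF32_BE, 'utf-32'),
        (codecs.BOM_UTF8, 'utf-8-sig'),
        (codecs.BOM_UTF16_LE, 'utf-16'),
        (codecs.BOM_UTF16_BE, 'utf-16'),
    ]
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    # No BOM found: fall back to an assumed default, which is a guess.
    return default

dump = b'\xef\xbb\xbf\x2d\x2d\x2d'
guess = sniff_bom(dump)
print(guess, repr(dump.decode(guess)))
```

The fallback default is exactly the part that cannot be made reliable: without a BOM the function has nothing to go on.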

> (Isn't it better that it's automatic? Someone sends you a text file that
> you want to open within a Python program. Are you supposed to analyze it
> first, or expect the sender to tell you what it is (they probably won't
> know) then need to hack the program to read it properly?)

You cannot know for sure what encoding a text file uses, unless it has been
recorded somewhere outside of the text file and transmitted "out of
band". That is, you ask the sender. And you are right, they probably won't
know. Then you try to guess, and if you guess wrong, the text you read will
contain mojibake:

https://en.wikipedia.org/wiki/Mojibake

See also:

https://en.wikipedia.org/wiki/Charset_detection

https://en.wikipedia.org/wiki/Bush_hid_the_facts
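A small sketch of a wrong guess and its repair, using the three euro signs from earlier in the thread. The variable names are only for illustration, and the repair step assumes you already know which wrong codec was used, which in practice you often don't.

```python
text = '\u20ac\u20ac\u20ac'          # three euro signs
raw = text.encode('utf-8')

# Wrong guess: the bytes are UTF-8 but we decode them as cp1252.
garbled = raw.decode('cp1252')
print(ascii(garbled))

# If (and only if) you know which wrong codec was used, and every
# byte had a mapping in it, the damage can be reversed.
repaired = garbled.encode('cp1252').decode('utf-8')
print(repaired == text)
```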

-- 
Steven


#104416

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-03-10 01:57 +1100
Message-ID	<56e039ef$0$22140$c3e8da3$5496439d@news.astraweb.com>
In reply to	#104411

On Thu, 10 Mar 2016 01:11 am, Chris Angelico wrote:

> The first three bytes are the "UTF-8 BOM", which suggests you may have
> created this in a broken editor like Notepad.

Notepad may be horribly crippled, but I'm not entirely sure "broken" is the
right word for it. Does it do anything *wrong*? I don't think so.

UTF-8 pseudo-BOM (more of a signature, less of a Byte Order Mark) is
officially unofficially supported by the Unicode consortium, if you know
what I mean. They specifically point out it is not a BOM, and that you can
use it if you wish, but they'd rather you didn't.

-- 
Steven


#104418

From	Chris Angelico <rosuav@gmail.com>
Date	2016-03-10 02:04 +1100
Message-ID	<mailman.80.1457535854.15725.python-list@python.org>
In reply to	#104416

On Thu, Mar 10, 2016 at 1:57 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, 10 Mar 2016 01:11 am, Chris Angelico wrote:
>
>> The first three bytes are the "UTF-8 BOM", which suggests you may have
>> created this in a broken editor like Notepad.
>
> Notepad may be horribly crippled, but I'm not entirely sure "broken" is the
> right word for it. Does it do anything *wrong*? I don't think so.
>

Well, okay, this particular behaviour isn't technically "broken". But
Notepad is broken in enough other ways that it's best to avoid it.
(For example, its text *de*coding is famously buggy.)

ChrisA


#104433

From	BartC <bc@freeuk.com>
Date	2016-03-09 16:53 +0000
Message-ID	<nbpk81$6bs$1@dont-email.me>
In reply to	#104416

On 09/03/2016 14:57, Steven D'Aprano wrote:
> On Thu, 10 Mar 2016 01:11 am, Chris Angelico wrote:
>
>> The first three bytes are the "UTF-8 BOM", which suggests you may have
>> created this in a broken editor like Notepad.
>
> Notepad may be horribly crippled, but I'm not entirely sure "broken" is the
> right word for it. Does it do anything *wrong*? I don't think so.

(Well, it's quite difficult to get it to save to the right filename (it 
insists on sticking ".txt" on the end even when the file name already 
has an extension).

And if you're not looking, it will 'Save As' in some arbitrary 
directory, because it remembers it from the last time you used it a 
month ago for a different project and in a different location. Very 
puzzling when you try and load it again and it's nowhere to be seen!

It doesn't occur to it to use the same directory it's just loaded a 
file from as a default.

But it does have a reasonable choice of output encodings.)

-- 
Bartc


#104415

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-03-10 01:54 +1100
Message-ID	<56e0393e$0$22140$c3e8da3$5496439d@news.astraweb.com>
In reply to	#104410

On Thu, 10 Mar 2016 01:03 am, BartC wrote:

> On 09/03/2016 02:18, Steven D'Aprano wrote:
>> On Wed, 9 Mar 2016 12:28 pm, BartC wrote:
>>
>>> (Which wasn't as painful as I'd expected. However the next project I
>>> have in mind is 20K lines rather than 0.7K. For that I'm looking at some
>>> mechanical translation I think. And probably some library to wrap around
>>> Python's i/o.)
>>
>> You almost certainly don't need another wrapper around Python's I/O,
>> making it slower still. You need to understand what Python's I/O is
>> doing.
> 
> Well, the original project will be using its file i/o library. So it'll
> use the same interface that will be reimplemented on top of Python i/o.

Just don't complain that it's slow :-)

> And input operations mainly consist of grabbing an entire file at once.

with open(pathname) as f:
    data = f.read()

> Output is a little more mixed.

It often is.

> I've just tried a UTF-8 file and getting some odd results. With a file
> containing [three euro symbols]:
> 
> €€€
> 
> (including a 3-byte utf-8 marker at the start), and opened in text mode,
> Python 3 gives me this series of bytes (ie. the ord() of each character):
> 
> 239
> 187
> 191
> 226
> 8218
> 172
> 226
> 8218
> 172
> 226
> 8218
> 172

Er, do you think that 8218 is a *byte*? (Hint: 1 byte = 8 bits, at least on
any platform you are likely to be running.)

Bart, you have a bad habit of giving us the output of your code, with an
implied "explain this", but without showing us the code you used to
generate the output. Without seeing the code you used, I have *no idea* how
you could get that result. If you read the file in binary, you should get
this:

b'\xef\xbb\xbf\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'

Or in decimal:

239, 187, 191, 226, 130, 172, 226, 130, 172, 226, 130, 172

How you are getting 8218 instead of 130, I have no idea!

If you read the file as text, but using the wrong encoding, say Latin-1, you
would get this:

'ï»¿â\x82¬â\x82¬â\x82¬'

or in decimal:

239, 187, 191, 226, 130, 172, 226, 130, 172, 226, 130, 172

Without seeing your code, I cannot possibly diagnose what you are doing.

> And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬. Although this latter
> might depend on my console's code page setting. 

That is very likely to be the reason for printing strange things. Life is
much easier on Linux and OS-X, where the console works with UTF-8 by
default.

> Changing it to UTF-8 
> however (CHCP 65001 in Windows) gives me this error when I run the
> program again:
> 
> ----------
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> LookupError: unknown encoding: cp65001
> 
> This application has requested the Runtime to terminate it in an unusual
> way.
> Please contact the application's support team for more information.
> ----------

I'm afraid I don't know how to deal with that. It's a Windows-specific
issue.

-- 
Steven


#104419

From	Jon Ribbens <jon+usenet@unequivocal.co.uk>
Date	2016-03-09 15:06 +0000
Message-ID	<slrnne0f5l.19u.jon+usenet@wintry.unequivocal.co.uk>
In reply to	#104415

On 2016-03-09, Steven D'Aprano <steve@pearwood.info> wrote:
> generate the output. Without seeing the code you used, I have *no idea* how
> you could get that result. If you read the file in binary, you should get
> this:
>
> b'\xef\xbb\xbf\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'
>
> Or in decimal:
>
> 239, 187, 191, 226, 130, 172, 226, 130, 172, 226, 130, 172
>
> How you are getting 8218 instead of 130, I have no idea!

Decode as "windows-1252".
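A quick sketch of why, using only the bytes already quoted (Python 3; the variable name is just for illustration):

```python
data = b'\xe2\x82\xac'               # UTF-8 encoding of one euro sign

# Latin-1 maps every byte to the code point with the same number.
latin1_points = [ord(c) for c in data.decode('latin-1')]
print(latin1_points)

# cp1252 differs in the 0x80-0x9F range: byte 0x82 becomes U+201A.
cp1252_points = [ord(c) for c in data.decode('cp1252')]
print(cp1252_points)
```

U+201A is 8218 in decimal, which is the value that showed up in place of 130.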

>> Changing it to UTF-8 however (CHCP 65001 in Windows) gives me this
>> error when I run the program again:
>> 
>> ----------
>> Fatal Python error: Py_Initialize: can't initialize sys standard streams
>> LookupError: unknown encoding: cp65001

cp65001 was added in Python 3.3; any earlier version (including any
Python 2) will not understand it.
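If you want to check what a given interpreter supports, something along these lines should do. has_codec is just an illustrative helper name, and no expected result is shown because it depends on the Python version and platform.

```python
import codecs
import sys

def has_codec(name):
    """Return True if this interpreter knows the named codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

print(sys.version_info[:2], has_codec('cp65001'))
```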


#104420

From	Tim Golden <mail@timgolden.me.uk>
Date	2016-03-09 15:15 +0000
Message-ID	<mailman.81.1457536511.15725.python-list@python.org>
In reply to	#104419

On 09/03/2016 15:06, Jon Ribbens wrote:
> On 2016-03-09, Steven D'Aprano <steve@pearwood.info> wrote:
>> generate the output. Without seeing the code you used, I have *no idea* how
>> you could get that result. If you read the file in binary, you should get
>> this:
>>
>> b'\xef\xbb\xbf\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'
>>
>> Or in decimal:
>>
>> 239, 187, 191, 226, 130, 172, 226, 130, 172, 226, 130, 172
>>
>> How you are getting 8218 instead of 130, I have no idea!
> 
> Decode as "windows-1252".
> 
>>> Changing it to UTF-8 however (CHCP 65001 in Windows) gives me this
>>> error when I run the program again:
>>>
>>> ----------
>>> Fatal Python error: Py_Initialize: can't initialize sys standard streams
>>> LookupError: unknown encoding: cp65001
> 
> cp65001 was added in Python 3.3, any earlier version (including any
> Python 2) will not understand it.

It's also somewhat flaky in other ways

  https://mail.python.org/pipermail/python-list/2015-December/700351.html

TJG


#104425

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-03-10 02:38 +1100
Message-ID	<56e04382$0$1603$c3e8da3$5496439d@news.astraweb.com>
In reply to	#104419

On Thu, 10 Mar 2016 02:06 am, Jon Ribbens wrote:

> On 2016-03-09, Steven D'Aprano <steve@pearwood.info> wrote:
>> generate the output. Without seeing the code you used, I have *no idea*
>> how you could get that result. If you read the file in binary, you should
>> get this:
>>
>> b'\xef\xbb\xbf\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'
>>
>> Or in decimal:
>>
>> 239, 187, 191, 226, 130, 172, 226, 130, 172, 226, 130, 172
>>
>> How you are getting 8218 instead of 130, I have no idea!
> 
> Decode as "windows-1252".

Nicely done!


py> b'\xef\xbb\xbf\xe2\x82\xac\xe2\x82\xac\xe2\x82\xac'.decode('cp1252')
'ï»¿â‚¬â‚¬â‚¬'
py> [ord(c) for c in _]
[239, 187, 191, 226, 8218, 172, 226, 8218, 172, 226, 8218, 172]



-- 
Steven


#104426

From	Terry Reedy <tjreedy@udel.edu>
Date	2016-03-09 10:42 -0500
Message-ID	<mailman.84.1457538207.15725.python-list@python.org>
In reply to	#104410

On 3/9/2016 9:03 AM, BartC wrote:

> I've just tried a UTF-8 file and getting some odd results. With a file
> containing [three euro symbols]:
>
> €€€
>
> (including a 3-byte utf-8 marker at the start), and opened in text mode,
> Python 3 gives me this series of bytes (ie. the ord() of each character):
>
> 239
> 187
> 191
> 226
> 8218
> 172
> 226
> 8218
> 172
> 226
> 8218
> 172
>
> And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬. Although this latter
> might depend on my console's code page setting.

It definitely does.

> Changing it to UTF-8 however (CHCP 65001 in Windows)

CP65001 is MS's ugly pretense of unicode compatibility.  It has been 
known to be buggy for over a decade, though some people claim to have 
gotten some use of it.

> gives me this error when I run the program again:
>
> ----------
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> LookupError: unknown encoding: cp65001
>
> This application has requested the Runtime to terminate it in an unusual
> way.
> Please contact the application's support team for more information.
> ----------

> So I think I'll skip Unicode handling to start off with! (I've already
> had plenty of fun and games with it in the past.)

At least on Windows, use IDLE for the BMP subset of unicode.  tk and 
hence tkinter and IDLE can handle any char in the BMP subset.  I believe 
that which are actually displayed and which are shown as boxes depends 
on the font.  On my US Win10 system:

IDLE with Lucida Console:
 >>> s = '€€€'
 >>> s
'€€€'

In the console interpreter: '???' is printed.


-- 
Terry Jan Reedy


#104435

From	wxjmfauth@gmail.com
Date	2016-03-09 09:04 -0800
Message-ID	<e68b94a0-0bf5-45fd-aac5-9b90a160a9ac@googlegroups.com>
In reply to	#104426

Le mercredi 9 mars 2016 16:43:40 UTC+1, Terry Reedy a écrit :
> 
> At least on Windows, use IDLE for the BMP subset of unicode.  tk and 
> hence tkinter and IDLE can handle any char in the BMP subset.  I believe 
> that which are actually displayed and which are shown as boxes depends 
> on the font.  On my US Win10 system:
> 
> IDLE with Lucida Console:
>  >>> s = 'EURO  EURO  EURO'
>  >>> s
> 'EURO  EURO  EURO'
> 
> In the console interpreter: '???' is printed.
> 

To summarize the situation, a Windows user has
the choice between

A non working IDLE [*] (even within the BMP)

and this

????????? ??????????? ??? ??? ?????:

After 20 years of development...
Sad reality, but reality.

[*] All Py2's and all Py3's

jmf


#104387

From	Marko Rauhamaa <marko@pacujo.net>
Date	2016-03-09 08:08 +0200
Message-ID	<87r3fkyyhy.fsf@elektro.pacujo.net>
In reply to	#104364

Steven D'Aprano <steve@pearwood.info>:

> Possibly a really amateurish, lazy job, but still done.
>
> [...] Brilliant! I love helpful tools like that!
>
> How many years did you say you have been programming?

Let's keep it civil, please.


Marko


#104400

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-03-09 22:52 +1100
Message-ID	<56e00e9c$0$1600$c3e8da3$5496439d@news.astraweb.com>
In reply to	#104387

On Wed, 9 Mar 2016 05:08 pm, Marko Rauhamaa wrote:

> Steven D'Aprano <steve@pearwood.info>:
> 
>> Possibly a really amateurish, lazy job, but still done.
>>
>> [...] Brilliant! I love helpful tools like that!
>>
>> How many years did you say you have been programming?
> 
> Let's keep it civil, please.

Oh, you're no fun!

But seriously, I thought I was. BartC is a big boy and I'm sure he can take
some criticism of his code without his feels being hurt and needing a hug.

Even if I used ... sarcasm.

-- 
Steven


#104406

From	Marko Rauhamaa <marko@pacujo.net>
Date	2016-03-09 14:53 +0200
Message-ID	<87vb4vhkx2.fsf@elektro.pacujo.net>
In reply to	#104400

Steven D'Aprano <steve@pearwood.info>:

> On Wed, 9 Mar 2016 05:08 pm, Marko Rauhamaa wrote:
>
>> Steven D'Aprano <steve@pearwood.info>:
>> 
>>> Possibly a really amateurish, lazy job, but still done.
>>>
>>> [...] Brilliant! I love helpful tools like that!
>>>
>>> How many years did you say you have been programming?
>> 
>> Let's keep it civil, please.
>
> Oh, you're no fun!
>
> But seriously, I thought I was. BartC is a big boy and I'm sure he can
> take some criticism of his code without his feels being hurt and
> needing a hug.

It came off as standard schoolyard bullying.

   If your child often tries to explain away misdeeds with 'We were just
   having fun ...', [...] you may just have a bully in the making living
   under your roof.

   <URL: http://www.kidspot.com.au/schoolzone/Bullying-My-child-is-a-bully+4615+395+article.htm>


> Even if I used ... sarcasm.

   The dictionary defines Sarcasm as: "The use of irony to mock or
   convey contempt"

   <URL: http://www.scienceofpeople.com/2011/12/sarcasm-why-it-hurts-us/>


Marko


#104434

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-03-10 03:53 +1100
Message-ID	<56e0551b$0$1591$c3e8da3$5496439d@news.astraweb.com>
In reply to	#104406

On Wed, 9 Mar 2016 11:53 pm, Marko Rauhamaa wrote:

> It came off as standard schoolyard bullying.

You must have lived a privileged life if you think a little sarcasm is what
school bullies do.

If we're going to exchange pop psychology tales, I don't know what it's like
for young girls, but in my experience when it comes to boys sarcasm is far
more likely to be used by the *victims* of bullying as a (pathetically
ineffective) defence mechanism.

Of course, things may be different on the Internet. Bullies cannot, as a
rule, punch you in the face and stomp on your lunch like they can in the
schoolyard, so perhaps they have learned to co-opt the tactics of their
erstwhile victims.

>    If your child often tries to explain away misdeeds with 'We were just
>    having fun ...', [...] you may just have a bully in the making living
>    under your roof.

They might even be a psychopath. They could even grow up to be a serial
killer! Why, they might even be the next Hitler!!! Better drown them in the
bathtub now, before they kill 6 million people and start a world war!!!1!

Or... maybe they actually were just having fun, just like they said, and the
parents are over-protective, over-suspicious, over-reacting killjoys who
misinterpret the rough and tumble of play as abuse.

>> Even if I used ... sarcasm.
> 
>    The dictionary defines Sarcasm as: "The use of irony to mock or
>    convey contempt"

Well duh. I was mocking and conveying my contempt of Bart's error handling
code ("sys.exit()"). Wasn't that obvious? Perhaps I could add some markup
next time:

<sarcasm meaning="I do not actually love this code I think it sucks">
"Brilliant! I love code like this!"
</sarcasm>

Or do you think it would sting less if I were merely brutally honest?

"Your code is bad and you should feel bad for having written it."

Or perhaps "If you were a student of mine, I would fail you for that code."

Or perhaps you think I should have given him an A+ and an elephant stamp for
effort? "We're all winners here, yay!"

I think Bart made the right response under the circumstances: he pointed out
that his code was obviously quick and dirty code for testing purposes and
not intended as production code, and he mocked me for not having noticed
this fact, as he was completely right to do. But I was misled by his
earlier comments, and thought that it was production code (even if his
user-base consisted of just one person, himself).

-- 
Steven


#104370

From	Michael Torrie <torriem@gmail.com>
Date	2016-03-08 17:42 -0700
Message-ID	<mailman.61.1457484162.15725.python-list@python.org>
In reply to	#104243

On 03/07/2016 11:34 AM, BartC wrote:
> (I'm quite pleased with my version: smaller, faster, works on all the 
> Pythons, supports all 3 colour formats and no decoding bugs that I'm 
> aware of, and it's the first Python program I've written that does 
> something useful.)

I think you should be commended for writing it.  It may not be perfect,
nor is it necessarily "Pythonic" but you had fun doing it.  I hope you
aren't discouraged by the criticism of your code.

I hacked on it a bit, and it only took about a minute to change from
using arrays to immutable byte strings.  The only problem with that was
what I noted in my other reply.  That is that indexing a Python 2 string
gives a different result than indexing a Python 3 byte string.  I'm not
sure what kind of logic should be used to efficiently judge between
Python 2 and 3, and modify the answer accordingly.  In Python 2 I can
simply do "ord(fs.data[fs.pos])".  In Python 3 it's just
"fs.data[fs.pos]".  I'm not sure what kind of speed up using a normal
string provides vs the array.  But it has to be at least a bit faster.
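One way to sidestep the version check, sketched here as an assumption rather than as what the decoder actually does, is to hold the data in a bytearray. It is mutable, so it is a different trade-off from the immutable byte strings mentioned above, but indexing it gives an int on both Python 2 and Python 3. The names data and pos stand in for fs.data and fs.pos.

```python
# Works unchanged on Python 2 and Python 3.
data = bytearray(b'\xff\xd8\xff\xe0')   # first bytes of a typical JPEG file
pos = 0

# Indexing a bytearray yields an int on both versions, so no ord()
# and no version check is needed.
marker = data[pos]
print(marker)
```

Whether this is faster or slower than the array or the plain string would need measuring.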

I suppose if others want to direct you in a more Pythonic path, they
could take your code and turn it into something that is more idiomatic
(and perhaps faster).  I may take a crack at it.

If it were me I think I'd make your jpeg decoder into its own class,
and have it operate on any iterable input, be it an open file or
something else.  That may be actually slower though, but more pythonic.
