

Newsgroup: comp.lang.python, thread #53323

UnicodeDecodeError issue

Started by: Ferrous Cranus <nikos@superhost.gr>
First post: 2013-08-31 09:41 +0300
Last post: 2013-09-02 20:49 -0400
Articles: 10 on this page, of 50; 11 participants



Contents

  UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 09:41 +0300
    Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-08-31 16:53 +1000
      Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:02 +0300
        Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:18 +0300
    Re: UnicodeDecodeError issue Peter Otten <__peter__@web.de> - 2013-08-31 09:25 +0200
      Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:58 +0300
        Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 11:31 +0300
          Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-08-31 11:28 +0000
            Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 15:58 +0300
              Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 16:07 +0300
              Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-08-31 15:44 +0000
    Re: UnicodeDecodeError issue Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-08-31 23:50 -0700
      Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 17:12 +1000
        Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 10:23 +0300
          Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 17:28 +1000
          Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 10:35 +0000
            Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 16:59 +0300
              Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 15:40 +0000
          Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 20:51 +1000
      Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-09-01 08:35 +0000
        Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 17:08 +0300
          Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 17:25 +0300
          Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 15:36 +0000
            Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 19:10 +0300
              Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 01:23 +0300
                Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 23:14 +0000
                  Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 07:16 +0300
                    Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 11:38 +0000
                      Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 14:49 +0300
                        Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 12:21 +0000
                          Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 18:05 +0300
                            Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 18:28 +0000
                              Re: UnicodeDecodeError issue Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-09-04 01:35 -0700
                                Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-04 11:26 +0000
                                  Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-04 14:38 +0300
                                    Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-04 12:38 +0000
                                      Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-04 17:29 +0300
                                        Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-05 00:17 +0000
                                          Re: UnicodeDecodeError issue Steven D'Aprano <steve@pearwood.info> - 2013-09-05 03:07 +0000
                                            Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-05 13:59 +1000
                                              Re: UnicodeDecodeError issue Steven D'Aprano <steve@pearwood.info> - 2013-09-05 05:28 +0000
                    Re: UnicodeDecodeError issue MRAB <python@mrabarnett.plus.com> - 2013-09-02 12:56 +0100
                    Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 12:24 +0000
                    Re: UnicodeDecodeError issue MRAB <python@mrabarnett.plus.com> - 2013-09-02 15:44 +0100
                      Re: UnicodeDecodeError issue wxjmfauth@gmail.com - 2013-09-03 08:23 -0700
                        Re: UnicodeDecodeError issue Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-09-04 10:01 +0200
                          Re: UnicodeDecodeError issue wxjmfauth@gmail.com - 2013-09-04 07:08 -0700
                    Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-03 08:45 +1000
                      Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-09-03 14:56 +0000
                    Re: UnicodeDecodeError issue Joel Goldstick <joel.goldstick@gmail.com> - 2013-09-02 20:49 -0400

Page 3 of 3


#53673

From: Steven D'Aprano <steve@pearwood.info>
Date: 2013-09-05 05:28 +0000
Message-ID: <52281667$0$2743$c3e8da3$76491128@news.astraweb.com>
In reply to: #53668

On Thu, 05 Sep 2013 13:59:17 +1000, Chris Angelico wrote:

> On Thu, Sep 5, 2013 at 1:07 PM, Steven D'Aprano <steve@pearwood.info>
> wrote:
>> Technically, it's not ASCII, since ASCII only knows about bytes \x00
>> through \x7F (decimal 0 through 127). That's why it isn't correct to
>> describe Python bytes strings as "ASCII strings". They're byte strings
>> that happen to be displayed as ASCII-plus-other-stuff.
> 
> The line of code is itself entirely ASCII. 
......^^^^^^^^^^^^^^^^^^^^^^


Ah, so it is. Sorry, I got confused about what was being spoken about. 
Apologies to Dave for casting aspersions on his knowledge :-)




-- 
Steven
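
Steven's point that ASCII covers only bytes 0x00 to 0x7F, and that a bytes object merely *displays* like ASCII, can be checked directly. This is a small sketch assuming Python 3.7 or later (for `bytes.isascii`); `is_pure_ascii` is a hypothetical helper and the sample byte strings are illustrative, not taken from the thread.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte lies in the ASCII range 0x00-0x7F."""
    return all(b <= 0x7F for b in data)


line = b"print('hello')"  # every byte is ASCII
mixed = b"caf\xe9"         # 0xE9 lies outside ASCII

print(is_pure_ascii(line), line.isascii())    # True True
print(is_pure_ascii(mixed), mixed.isascii())  # False False

# The repr shows ASCII-range bytes as characters and escapes the rest,
# which is why a bytes object can look like "ASCII plus other stuff".
print(mixed)  # b'caf\xe9'

# Decoding as ASCII fails at the first byte above 0x7F.
try:
    mixed.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc.start, hex(mixed[exc.start]))  # 3 0xe9
```

The hand-written check and the built-in `isascii` agree; the built-in is the idiomatic choice on 3.7+.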



#53476

From: MRAB <python@mrabarnett.plus.com>
Date: 2013-09-02 12:56 +0100
Message-ID: <mailman.485.1378122976.19984.python-list@python.org>
In reply to: #53453

On 02/09/2013 12:38, Dave Angel wrote:
> On 2/9/2013 00:16, Ferrous Cranus wrote:
>>>
>>> Have you tried to decode those bytes in various encodings other than
>>> utf-8 ?
>>
>> No, because I wasn't aware of which string/variable they pertained to.
>>
>    http://pypi.python.org/pypi/chardet
>
> is a package which tries to 'guess' an encoding for a string of bytes.
> I happen to have the 2.7 version installed, but not the 3.x version, so
> the following is in 2.7. Same thing should work in 3.3....
>
>>>> chardet.detect(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2')
> {'confidence': 0.9638983132261467, 'encoding': 'windows-1253'}
>>>> print b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2'.decode('windows-1253')
> ¶γνωστοόνομα συστήματος
>
> I don't have a clue what it might be;  it's not English, and I don't
> know whatever language it may be in.
>
You don't recognise Greek?

> Does that string make any sense to you?  You may want to try it on your
> own machine, since the email may obscure the encoding.  Or you might
> want to do the decode using whatever the default encoding is for that
> server.
>
> The Linux 'file' utility thinks this string is in ISO-8859, so you might
> want to try a decode('ISO-8859-1') as well.  (and maybe  ISO-8859-2, -3,
> -4, and -5)
>
It's ISO-8859-7 (Greek).
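
MRAB's identification can be confirmed with the standard library alone by decoding the same bytes with a few candidate single-byte codecs and comparing the results. A minimal sketch assuming Python 3; the candidate list is an illustrative choice, and `iso8859_7` and `cp1253` are Python's codec names for ISO-8859-7 and windows-1253.

```python
# The byte string from Dave's chardet session.
raw = (b"\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 "
       b"\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2")

# Single-byte codecs worth trying; each maps the bytes 0x80-0xFF differently.
candidates = ["iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7", "cp1253"]

for codec in candidates:
    try:
        print(f"{codec:>10}: {raw.decode(codec)}")
    except UnicodeDecodeError as exc:
        print(f"{codec:>10}: fails at byte {exc.start}")

# windows-1253 and ISO-8859-7 agree on the Greek letters here but differ
# at 0xB6: ISO-8859-7 maps it to U+0386 (capital alpha with tonos), while
# windows-1253 maps it to U+00B6 (pilcrow), hence the leading "¶" above.
greek = raw.decode("iso8859_7")
win = raw.decode("cp1253")
print(hex(ord(greek[0])), hex(ord(win[0])))  # 0x386 0xb6
print(greek[1:] == win[1:])                   # True
```

Because both codecs are single-byte, the decoded text has exactly one character per input byte, so a one-character difference pins down the distinguishing byte.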



#53479

From: Dave Angel <davea@davea.name>
Date: 2013-09-02 12:24 +0000
Message-ID: <mailman.488.1378124707.19984.python-list@python.org>
In reply to: #53453

On 2/9/2013 07:56, MRAB wrote:

> On 02/09/2013 12:38, Dave Angel wrote:

   <snip>

>> ¶γνωστοόνομα συστήματος
>>
>> I don't have a clue what it might be;  it's not English, and I don't
>> know whatever language it may be in.
>>
> You don't recognise Greek?

I recognize most of those as Greek characters, but as I said, I don't
know Greek.  And because I can't recognize words, I can't assume it
might not be some other language that uses the same glyphs.

>
>> Does that string make any sense to you?  You may want to try it on your
>> own machine, since the email may obscure the encoding.  Or you might
>> want to do the decode using whatever the default encoding is for that
>> server.
>>
>> The Linux 'file' utility thinks this string is in ISO-8859, so you might
>> want to try a decode('ISO-8859-1') as well.  (and maybe  ISO-8859-2, -3,
>> -4, and -5)
>>
> It's ISO-8859-7 (Greek).

-- 
DaveA



#53488

From: MRAB <python@mrabarnett.plus.com>
Date: 2013-09-02 15:44 +0100
Message-ID: <mailman.492.1378133055.19984.python-list@python.org>
In reply to: #53453

On 02/09/2013 13:24, Dave Angel wrote:
> On 2/9/2013 07:56, MRAB wrote:
>
>> On 02/09/2013 12:38, Dave Angel wrote:
>
>     <snip>
>
>>> ¶γνωστοόνομα συστήματος
>>>
>>> I don't have a clue what it might be;  it's not English, and I don't
>>> know whatever language it may be in.
>>>
>> You don't recognise Greek?
>
> I recognize most of those as Greek characters, but as I said, I don't
> know Greek.  And because I can't recognize words, I can't assume it
> might not be some other language that uses the same glyphs.
>
I don't know Greek either, and I don't think there's any other language
that uses the Greek alphabet.

>>
>>> Does that string make any sense to you?  You may want to try it on your
>>> own machine, since the email may obscure the encoding.  Or you might
>>> want to do the decode using whatever the default encoding is for that
>>> server.
>>>
>>> The Linux 'file' utility thinks this string is in ISO-8859, so you might
>>> want to try a decode('ISO-8859-1') as well.  (and maybe  ISO-8859-2, -3,
>>> -4, and -5)
>>>
>> It's ISO-8859-7 (Greek).
>



#53571

From: wxjmfauth@gmail.com
Date: 2013-09-03 08:23 -0700
Message-ID: <3510a783-79ac-48b0-90a9-8262d28eeba0@googlegroups.com>
In reply to: #53488

On Monday, 2 September 2013 16:44:34 UTC+2, MRAB wrote:
> On 02/09/2013 13:24, Dave Angel wrote:
> > On 2/9/2013 07:56, MRAB wrote:
> >
> >> On 02/09/2013 12:38, Dave Angel wrote:
> >
> >     <snip>
> >
> >>> ¶γνωστοόνομα συστήματος
> >>>
> >>> I don't have a clue what it might be;  it's not English, and I don't
> >>> know whatever language it may be in.
> >>>
> >> You don't recognise Greek?
> >
> > I recognize most of those as Greek characters, but as I said, I don't
> > know Greek.  And because I can't recognize words, I can't assume it
> > might not be some other language that uses the same glyphs.
> >
> I don't know Greek either, and I don't think there's any other language
> that uses the Greek alphabet.
> 
> >>
> >>> Does that string make any sense to you?  You may want to try it on your
> >>> own machine, since the email may obscure the encoding.  Or you might
> >>> want to do the decode using whatever the default encoding is for that
> >>> server.
> >>>
> >>> The Linux 'file' utility thinks this string is in ISO-8859, so you might
> >>> want to try a decode('ISO-8859-1') as well.  (and maybe  ISO-8859-2, -3,
> >>> -4, and -5)
> >>>
> >> It's ISO-8859-7 (Greek).
> >

--------

The Latin alphabet uses Greek lettering.

The Cyrillic alphabet uses Greek lettering.

Greek: One should not confuse modern Greek
with ancient, polytonic Greek, which is full
of diacritics.

Plenty of European languages (~15) based on the Latin
alphabet use some ancient Greek diacritics.

Now, Unicode.

Everything works very smoothly with the coding
schemes endorsed by Unicode.org.

As expected, it fails (behaves badly) with Python and its
Flexible String Representation, mainly because it relies on
the latin-1 (iso-8859-1) set.

Looking at the problem the other way around, one can use these
linguistic aspects to illustrate the flawed design of
the FSR.

jmf



#53607

From: Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date: 2013-09-04 10:01 +0200
Message-ID: <mailman.34.1378281717.5461.python-list@python.org>
In reply to: #53571

On 03-09-13 17:23, wxjmfauth@gmail.com wrote:

> --------
> 
> The Latin alphabet uses Greek lettering.
> 
> The Cyrillic alphabet uses Greek lettering.
> 
> Greek: One should not confuse modern Greek
> with ancient, polytonic Greek, which is full
> of diacritics.
> 
> Plenty of European languages (~15) based on the Latin
> alphabet use some ancient Greek diacritics.
> 
> Now, Unicode.
> 
> Everything works very smoothly with the coding
> schemes endorsed by Unicode.org.
> 
> As expected, it fails (behaves badly) with Python and its
> Flexible String Representation, mainly because it relies on
> the latin-1 (iso-8859-1) set.

You really seem obsessed. There is no reason at all to think that this
problem is related to the FSR. You are only bringing this up because
you are looking for opportunities to complain about the FSR.

> Looking at the problem the other way around, one can use these
> linguistic aspects to illustrate the flawed design of
> the FSR.

No, you can't. You are just assuming so because you feel it would
confirm your bias against the FSR.

-- 
Antoon Pardon
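
For context, the FSR is CPython's PEP 393 string storage: since 3.3, each `str` is stored with 1, 2 or 4 bytes per code point, chosen from the widest code point in that particular string. The sketch below assumes CPython 3.3 or later; `pep393_width` is a hypothetical helper that restates the PEP 393 selection rule in Python and is not an API CPython exposes. It shows that the storage width changes memory use but not the text's length, value, or UTF-8 encoding.

```python
import sys


def pep393_width(s: str) -> int:
    """Bytes per code point CPython 3.3+ uses to store s (PEP 393 rule)."""
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:      # Latin-1 range: 1 byte each
        return 1
    if widest <= 0xFFFF:    # Basic Multilingual Plane: 2 bytes each
        return 2
    return 4                # supplementary planes: 4 bytes each


latin = "caf\u00e9" * 25                  # highest code point U+00E9
greek = "\u03b1\u03b2\u03b3\u03b4" * 25   # Greek small alpha..delta

for text in (latin, greek):
    # Same length and a lossless UTF-8 round trip, whatever the width.
    same = text.encode("utf-8").decode("utf-8") == text
    print(len(text), pep393_width(text), sys.getsizeof(text), same)
```

The exact `sys.getsizeof` figures depend on the CPython build; only the relative sizes are meaningful.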



#53626

From: wxjmfauth@gmail.com
Date: 2013-09-04 07:08 -0700
Message-ID: <0bce060d-ec91-4739-ac21-9a60795c9611@googlegroups.com>
In reply to: #53607

On Wednesday, 4 September 2013 10:01:50 UTC+2, Antoon Pardon wrote:
> On 03-09-13 17:23, wxjmfauth@gmail.com wrote:
> 
> > --------
> > 
> > The Latin alphabet uses Greek lettering.
> > 
> > The Cyrillic alphabet uses Greek lettering.
> > 
> > Greek: One should not confuse modern Greek
> > with ancient, polytonic Greek, which is full
> > of diacritics.
> > 
> > Plenty of European languages (~15) based on the Latin
> > alphabet use some ancient Greek diacritics.
> > 
> > Now, Unicode.
> > 
> > Everything works very smoothly with the coding
> > schemes endorsed by Unicode.org.
> > 
> > As expected, it fails (behaves badly) with Python and its
> > Flexible String Representation, mainly because it relies on
> > the latin-1 (iso-8859-1) set.
> 
> You really seem obsessed. There is no reason at all to think that this
> problem is related to the FSR. You are only bringing this up because
> you are looking for opportunities to complain about the FSR.
> 
> > Looking at the problem the other way around, one can use these
> > linguistic aspects to illustrate the flawed design of
> > the FSR.
> 
> No, you can't. You are just assuming so because you feel it would
> confirm your bias against the FSR.
> 
> -- 
> Antoon Pardon

--------


jmf



#53537

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-09-03 08:45 +1000
Message-ID: <mailman.522.1378161908.19984.python-list@python.org>
In reply to: #53453

On Tue, Sep 3, 2013 at 12:44 AM, MRAB <python@mrabarnett.plus.com> wrote:
> I don't know Greek either, and I don't think there's any other language
> that uses the Greek alphabet.

Assuming you don't count mathematics as a language.

ChrisA



#53569

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-09-03 14:56 +0000
Message-ID: <5225f8b4$0$6599$c3e8da3$5496439d@news.astraweb.com>
In reply to: #53537

On Tue, 03 Sep 2013 08:45:00 +1000, Chris Angelico wrote:

> On Tue, Sep 3, 2013 at 12:44 AM, MRAB <python@mrabarnett.plus.com>
> wrote:
>> I don't know Greek either, and I don't think there's any other language
>> that uses the Greek alphabet.
> 
> Assuming you don't count mathematics as a language.


There are a few languages which use the Greek alphabet, with variations. 
Coptic is the main one, although Greek and Coptic letters have separate 
Unicode code points, in order to support works which need to distinguish them.

Armenian and, of course, Cyrillic, are derived from the Greek alphabet; 
actually so is the Latin alphabet.

Other languages that used, or use, the Greek alphabet include quite a few 
ancient languages, including Gaulish and Bactrian. Old Nubian in the 
Middle Ages used the Greek alphabet plus a few additional letters. A 
number of Slavic languages used the Greek alphabet, although now they use 
Cyrillic. Some Albanian dialects still use the Greek alphabet, as do a 
couple of Turkic languages from the Balkans. See the Wikipedia entry on 
the Greek alphabet for more.


-- 
Steven
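
The point that Greek, Coptic, Cyrillic and Latin letters are encoded separately can be seen with the standard `unicodedata` module, which reports each code point's official Unicode name. A small sketch assuming Python 3; the choice of letters is illustrative, and the Coptic letter is taken from the Coptic block (U+2C80 to U+2CFF).

```python
import unicodedata

# Related small letters from four scripts; each has its own code point.
letters = ["\u03b1", "\u2c81", "\u0430", "a"]

for ch in letters:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The names printed begin with GREEK, COPTIC, CYRILLIC and LATIN respectively, so software can tell the scripts apart even where the glyphs look alike.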



#53544

From: Joel Goldstick <joel.goldstick@gmail.com>
Date: 2013-09-02 20:49 -0400
Message-ID: <mailman.527.1378169400.19984.python-list@python.org>
In reply to: #53453

On Mon, Sep 2, 2013 at 6:45 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Sep 3, 2013 at 12:44 AM, MRAB <python@mrabarnett.plus.com> wrote:
>> I don't know Greek either, and I don't think there's any other language
>> that uses the Greek alphabet.
>
> Assuming you don't count mathematics as a language.

You need to be rigorous to make mathematical assumptions.  One bad
assumption and proof == poof!

>
> ChrisA
> --
> http://mail.python.org/mailman/listinfo/python-list



-- 
Joel Goldstick
http://joelgoldstick.com
