Groups > comp.lang.python > #53323 > unrolled thread
| Started by | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| First post | 2013-08-31 09:41 +0300 |
| Last post | 2013-09-02 20:49 -0400 |
| Articles | 10 on this page of 50 — 11 participants |
UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 09:41 +0300
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-08-31 16:53 +1000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:02 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:18 +0300
Re: UnicodeDecodeError issue Peter Otten <__peter__@web.de> - 2013-08-31 09:25 +0200
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 10:58 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 11:31 +0300
Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-08-31 11:28 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 15:58 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-08-31 16:07 +0300
Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-08-31 15:44 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-08-31 23:50 -0700
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 17:12 +1000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 10:23 +0300
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 17:28 +1000
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 10:35 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 16:59 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 15:40 +0000
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-01 20:51 +1000
Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-09-01 08:35 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 17:08 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 17:25 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 15:36 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-01 19:10 +0300
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 01:23 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-01 23:14 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 07:16 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 11:38 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 14:49 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 12:21 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-02 18:05 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 18:28 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos.gr33k@gmail.com> - 2013-09-04 01:35 -0700
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-04 11:26 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-04 14:38 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-04 12:38 +0000
Re: UnicodeDecodeError issue Ferrous Cranus <nikos@superhost.gr> - 2013-09-04 17:29 +0300
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-05 00:17 +0000
Re: UnicodeDecodeError issue Steven D'Aprano <steve@pearwood.info> - 2013-09-05 03:07 +0000
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-05 13:59 +1000
Re: UnicodeDecodeError issue Steven D'Aprano <steve@pearwood.info> - 2013-09-05 05:28 +0000
Re: UnicodeDecodeError issue MRAB <python@mrabarnett.plus.com> - 2013-09-02 12:56 +0100
Re: UnicodeDecodeError issue Dave Angel <davea@davea.name> - 2013-09-02 12:24 +0000
Re: UnicodeDecodeError issue MRAB <python@mrabarnett.plus.com> - 2013-09-02 15:44 +0100
Re: UnicodeDecodeError issue wxjmfauth@gmail.com - 2013-09-03 08:23 -0700
Re: UnicodeDecodeError issue Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-09-04 10:01 +0200
Re: UnicodeDecodeError issue wxjmfauth@gmail.com - 2013-09-04 07:08 -0700
Re: UnicodeDecodeError issue Chris Angelico <rosuav@gmail.com> - 2013-09-03 08:45 +1000
Re: UnicodeDecodeError issue Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-09-03 14:56 +0000
Re: UnicodeDecodeError issue Joel Goldstick <joel.goldstick@gmail.com> - 2013-09-02 20:49 -0400
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2013-09-05 05:28 +0000 |
| Message-ID | <52281667$0$2743$c3e8da3$76491128@news.astraweb.com> |
| In reply to | #53668 |
On Thu, 05 Sep 2013 13:59:17 +1000, Chris Angelico wrote:
> On Thu, Sep 5, 2013 at 1:07 PM, Steven D'Aprano <steve@pearwood.info>
> wrote:
>> Technically, it's not ASCII, since ASCII only knows about bytes \x00
>> through \x7F (decimal 0 through 127). That's why it isn't correct to
>> describe Python bytes strings as "ASCII strings". They're byte strings
>> that happen to be displayed as ASCII-plus-other-stuff.
>
> The line of code is itself entirely ASCII.
......^^^^^^^^^^^^^^^^^^^^^^
Ah, so it is. Sorry, I got confused about what was being spoken about.
Apologies to Dave for casting aspersions on his knowledge :-)
--
Steven
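The distinction being drawn in this exchange (the source text of a bytes literal is ASCII, while the bytes it denotes need not be) can be sketched in Python 3 as follows. This is only an illustration: the three-byte sample and all variable names are assumptions, not taken from the code under discussion.

```python
# A bytes literal as it would be typed in a source file (illustrative sample).
source_line = r"b'\xb6\xe3\xed'"

# Every character of that source text lies in the ASCII range 0..127.
source_is_ascii = all(ord(ch) < 128 for ch in source_line)

# The bytes object the literal denotes holds values above 0x7F.
value = b'\xb6\xe3\xed'
value_is_ascii = all(b < 128 for b in value)

# Decoding such bytes as ASCII is rejected.
try:
    value.decode('ascii')
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

print(source_is_ascii, value_is_ascii, ascii_decode_failed)
```

The escapes such as `\xb6` are how Python displays byte values that have no printable ASCII character, which is the "ASCII-plus-other-stuff" display mentioned above.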
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-09-02 12:56 +0100 |
| Message-ID | <mailman.485.1378122976.19984.python-list@python.org> |
| In reply to | #53453 |
On 02/09/2013 12:38, Dave Angel wrote:
> On 2/9/2013 00:16, Ferrous Cranus wrote:
>>>
>>> Have you tried to decode those bytes in various encodings other than
>>> utf-8 ?
>>
>> No, because i wasn't aware of what string/variable they were pertaining at.
>>
> http://pypi.python.org/pypi/chardet
>
> is a package which tries to 'guess' an encoding for a string of bytes.
> I happen to have the 2.7 version installed, but not the 3.x version, so
> the following is in 2.7. Same thing should work in 3.3....
>
> >>> import chardet
> >>> chardet.detect(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2')
> {'confidence': 0.9638983132261467, 'encoding': 'windows-1253'}
> >>> print b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2'.decode('windows-1253')
> ¶γνωστοόνομα συστήματος
>
> I don't have a clue what it might be; it's not English, and I don't
> know whatever language it may be in.
>
You don't recognise Greek?
> Does that string make any sense to you? You may want to try it on your
> own machine, since the email may obscure the encoding. Or you might
> want to do the decode using whatever the default encoding is for that
> server.
>
> The Linux 'file' utility thinks this string is in ISO-8859, so you might
> want to try a decode('ISO-8859-1') as well. (and maybe ISO-8859-2, -3,
> -4, and -5)
>
It's ISO-8859-7 (Greek).
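To make the difference between the two guesses concrete, here is a small Python 3 sketch that decodes the byte string quoted above with both codecs. The bytes are copied from the quoted session; the variable names are illustrative only.

```python
# Byte string copied from the quoted chardet session.
raw = (b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 '
       b'\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2')

# Strict UTF-8 decoding fails: 0xb6 cannot start a UTF-8 sequence.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decode with chardet's guess and with the ISO Greek codec.
as_cp1253 = raw.decode('windows-1253')
as_iso_7 = raw.decode('iso-8859-7')

# Positions at which the two decodings disagree.
differing = [i for i, (a, b) in enumerate(zip(as_cp1253, as_iso_7)) if a != b]

print(utf8_ok)
print(ascii(as_cp1253[0]), ascii(as_iso_7[0]))
print(differing)
```

For this input the two codecs agree on every Greek letter and differ only on the first byte: 0xb6 is the pilcrow (U+00B6) in windows-1253 and capital alpha with tonos (U+0386) in ISO-8859-7, which is why the ISO-8859-7 reading starts with a capital letter instead of ¶.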
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-02 12:24 +0000 |
| Message-ID | <mailman.488.1378124707.19984.python-list@python.org> |
| In reply to | #53453 |
On 2/9/2013 07:56, MRAB wrote:
> On 02/09/2013 12:38, Dave Angel wrote:
<snip>
>> ¶γνωστοόνομα συστήματος
>>
>> I don't have a clue what it might be; it's not English, and I don't
>> know whatever language it may be in.
>>
> You don't recognise Greek?
I recognize most of those as Greek characters, but as I said, I don't
know Greek. And because I can't recognize words, I can't assume it
might not be some other language that uses the same glyphs.
>
>> Does that string make any sense to you? You may want to try it on your
>> own machine, since the email may obscure the encoding. Or you might
>> want to do the decode using whatever the default encoding is for that
>> server.
>>
>> The Linux 'file' utility thinks this string is in ISO-8859, so you might
>> want to try a decode('ISO-8859-1') as well. (and maybe ISO-8859-2, -3,
>> -4, and -5)
>>
> It's ISO-8859-7 (Greek).
--
DaveA
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-09-02 15:44 +0100 |
| Message-ID | <mailman.492.1378133055.19984.python-list@python.org> |
| In reply to | #53453 |
On 02/09/2013 13:24, Dave Angel wrote:
> On 2/9/2013 07:56, MRAB wrote:
>
>> On 02/09/2013 12:38, Dave Angel wrote:
>
> <snip>
>
>>> ¶γνωστοόνομα συστήματος
>>>
>>> I don't have a clue what it might be; it's not English, and I don't
>>> know whatever language it may be in.
>>>
>> You don't recognise Greek?
>
> I recognize most of those as Greek characters, but as I said, I don't
> know Greek. And because I can't recognize words, I can't assume it
> might not be some other language that uses the same glyphs.
>
I don't know Greek either, and I don't think there's any other language
that uses the Greek alphabet.
>>
>>> Does that string make any sense to you? You may want to try it on your
>>> own machine, since the email may obscure the encoding. Or you might
>>> want to do the decode using whatever the default encoding is for that
>>> server.
>>>
>>> The Linux 'file' utility thinks this string is in ISO-8859, so you might
>>> want to try a decode('ISO-8859-1') as well. (and maybe ISO-8859-2, -3,
>>> -4, and -5)
>>>
>> It's ISO-8859-7 (Greek).
>
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2013-09-03 08:23 -0700 |
| Message-ID | <3510a783-79ac-48b0-90a9-8262d28eeba0@googlegroups.com> |
| In reply to | #53488 |
On Monday, 2 September 2013 16:44:34 UTC+2, MRAB wrote:
> On 02/09/2013 13:24, Dave Angel wrote:
> > On 2/9/2013 07:56, MRAB wrote:
> >
> >> On 02/09/2013 12:38, Dave Angel wrote:
> >
> > <snip>
> >
> >>> ¶γνωστοόνομα συστήματος
> >>>
> >>> I don't have a clue what it might be; it's not English, and I don't
> >>> know whatever language it may be in.
> >>>
> >> You don't recognise Greek?
> >
> > I recognize most of those as Greek characters, but as I said, I don't
> > know Greek. And because I can't recognize words, I can't assume it
> > might not be some other language that uses the same glyphs.
> >
> I don't know Greek either, and I don't think there's any other language
> that uses the Greek alphabet.
> >>
> >>> Does that string make any sense to you? You may want to try it on your
> >>> own machine, since the email may obscure the encoding. Or you might
> >>> want to do the decode using whatever the default encoding is for that
> >>> server.
> >>>
> >>> The Linux 'file' utility thinks this string is in ISO-8859, so you might
> >>> want to try a decode('ISO-8859-1') as well. (and maybe ISO-8859-2, -3,
> >>> -4, and -5)
> >>>
> >> It's ISO-8859-7 (Greek).
> >
--------
The Latin alphabet uses Greek lettering.
The Cyrillic alphabet uses Greek lettering.
Greek: One should not confuse modern Greek
with ancient Greek, polytonic Greek full
of diacritics.
Plenty of European languages (~15) based on the Latin
alphabet use some ancient Greek diacritics.
Now unicode.
Everything is working very smoothly with the endorsed coding
schemes of Unicode.org.
Expectedly it fails (behaves badly) with Python and its
Flexible String Representation, mainly because it relies on
the latin-1 (iso-8859-1) set.
To take the problem the other way, one can take these
linguistic aspects to illustrate the wrong design of
the FSR.
jmf
| From | Antoon Pardon <antoon.pardon@rece.vub.ac.be> |
|---|---|
| Date | 2013-09-04 10:01 +0200 |
| Message-ID | <mailman.34.1378281717.5461.python-list@python.org> |
| In reply to | #53571 |
On 03-09-13 17:23, wxjmfauth@gmail.com wrote:
> --------
>
> The Latin alphabet uses Greek lettering.
>
> The Cyrillic alphabet uses Greek lettering.
>
> Greek: One should not confuse modern Greek
> with ancient Greek, polytonic Greek full
> of diacritics.
>
> Plenty of European languages (~15) based on the Latin
> alphabet use some ancient Greek diacritics.
>
> Now unicode.
>
> Everything is working very smoothly with the endorsed coding
> schemes of Unicode.org.
>
> Expectedly it fails (behaves badly) with Python and its
> Flexible String Representation, mainly because it relies on
> the latin-1 (iso-8859-1) set.

You really seem obsessed. There is no reason at all to think that this
problem is related to the FSR. You are only bringing this up because
you are looking for opportunities to complain about the FSR.

> To take the problem the other way, one can take these
> linguistic aspects to illustrate the wrong design of
> the FSR.

No you can't, you are just assuming so because you feel it would
confirm your bias against the FSR.

--
Antoon Pardon
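For readers unfamiliar with the term: the Flexible String Representation (PEP 393, CPython 3.3 and later) stores a string with 1, 2, or 4 bytes per code point, depending on the widest character it contains; the 1-byte form covers code points up to U+00FF. A minimal sketch, assuming CPython 3.3+ (the sample strings and names are illustrative, and exact sizes vary between versions, so only their ordering is compared):

```python
import sys

n = 100
ascii_text = 'a' * n             # every code point fits in 1 byte
greek_text = '\u03b1' * n        # Greek small alpha needs the 2-byte form
astral_text = '\U0001F600' * n   # a code point beyond U+FFFF needs 4 bytes

# The character count is the same regardless of internal width.
lengths = [len(s) for s in (ascii_text, greek_text, astral_text)]

# Memory use grows with the width of the widest character (CPython detail).
sizes = [sys.getsizeof(s) for s in (ascii_text, greek_text, astral_text)]

# Greek text survives an encode/decode round trip unchanged.
round_trip_ok = greek_text.encode('utf-8').decode('utf-8') == greek_text

print(lengths, sizes, round_trip_ok)
```

The internal width affects memory use only; indexing, length, and encoding behave the same for Greek text as for any other text, which is consistent with the point that the decoding error in this thread is unrelated to the FSR.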
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2013-09-04 07:08 -0700 |
| Message-ID | <0bce060d-ec91-4739-ac21-9a60795c9611@googlegroups.com> |
| In reply to | #53607 |
On Wednesday, 4 September 2013 10:01:50 UTC+2, Antoon Pardon wrote:
> On 03-09-13 17:23, wxjmfauth@gmail.com wrote:
> > --------
> >
> > The Latin alphabet uses Greek lettering.
> >
> > The Cyrillic alphabet uses Greek lettering.
> >
> > Greek: One should not confuse modern Greek
> > with ancient Greek, polytonic Greek full
> > of diacritics.
> >
> > Plenty of European languages (~15) based on the Latin
> > alphabet use some ancient Greek diacritics.
> >
> > Now unicode.
> >
> > Everything is working very smoothly with the endorsed coding
> > schemes of Unicode.org.
> >
> > Expectedly it fails (behaves badly) with Python and its
> > Flexible String Representation, mainly because it relies on
> > the latin-1 (iso-8859-1) set.
>
> You really seem obsessed. There is no reason at all to think that this
> problem is related to the FSR. You are only bringing this up because
> you are looking for opportunities to complain about the FSR.
>
> > To take the problem the other way, one can take these
> > linguistic aspects to illustrate the wrong design of
> > the FSR.
>
> No you can't, you are just assuming so because you feel it would
> confirm your bias against the FSR.
>
> --
> Antoon Pardon

--------
jmf
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-09-03 08:45 +1000 |
| Message-ID | <mailman.522.1378161908.19984.python-list@python.org> |
| In reply to | #53453 |
On Tue, Sep 3, 2013 at 12:44 AM, MRAB <python@mrabarnett.plus.com> wrote:
> I don't know Greek either, and I don't think there's any other language
> that uses the Greek alphabet.

Assuming you don't count mathematics as a language.

ChrisA
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-09-03 14:56 +0000 |
| Message-ID | <5225f8b4$0$6599$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #53537 |
On Tue, 03 Sep 2013 08:45:00 +1000, Chris Angelico wrote:
> On Tue, Sep 3, 2013 at 12:44 AM, MRAB <python@mrabarnett.plus.com>
> wrote:
>> I don't know Greek either, and I don't think there's any other language
>> that uses the Greek alphabet.
>
> Assuming you don't count mathematics as a language.

There are a few languages which use the Greek alphabet, with variations.
Coptic is the main one, although Greek and Coptic letters have their own
Unicode symbols, in order to support works which need to distinguish them.

Armenian and, of course, Cyrillic, are derived from the Greek alphabet;
actually so is the Latin alphabet.

Other languages that used, or use, the Greek alphabet include quite a few
ancient languages, including Gaulish and Bactrian. Old Nubian in the
Middle Ages used the Greek alphabet plus a few additional letters. A
number of Slavic languages used the Greek alphabet, although now they use
Cyrillic. Some Albanian dialects still use the Greek alphabet, as do a
couple of Turkic languages from the Balkans.

See the Wikipedia entry on the Greek alphabet for more.

--
Steven
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2013-09-02 20:49 -0400 |
| Message-ID | <mailman.527.1378169400.19984.python-list@python.org> |
| In reply to | #53453 |
On Mon, Sep 2, 2013 at 6:45 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Sep 3, 2013 at 12:44 AM, MRAB <python@mrabarnett.plus.com> wrote:
>> I don't know Greek either, and I don't think there's any other language
>> that uses the Greek alphabet.
>
> Assuming you don't count mathematics as a language.

You need to be rigorous to make mathematical assumptions. One bad
assumption and proof == poof!

>
> ChrisA
> --
> http://mail.python.org/mailman/listinfo/python-list

--
Joel Goldstick
http://joelgoldstick.com