Groups > comp.lang.python > #24571 > unrolled thread

Is there any way to decode String using unknown codec?

Started by: howmuchistoday@gmail.com
First post: 2012-06-27 18:14 -0700
Last post: 2012-06-28 19:18 +0200
Articles: 6 (4 participants)



Contents

  Is there any way to decode String using unknown codec? howmuchistoday@gmail.com - 2012-06-27 18:14 -0700
    Re: Is there any way to decode String using unknown codec? Benjamin Kaplan <benjamin.kaplan@case.edu> - 2012-06-27 19:20 -0700
      Re: Is there any way to decode String using unknown codec? howmuchistoday@gmail.com - 2012-06-28 14:27 -0700
    Re: Is there any way to decode String using unknown codec? MRAB <python@mrabarnett.plus.com> - 2012-06-28 12:28 +0100
    Re: Is there any way to decode String using unknown codec? Dieter Maurer <dieter@handshake.de> - 2012-06-28 19:18 +0200

#24571 — Is there any way to decode String using unknown codec?

From: howmuchistoday@gmail.com
Date: 2012-06-27 18:14 -0700
Subject: Is there any way to decode String using unknown codec?
Message-ID: <c67686b6-4f98-4408-a89c-edc0a6030c24@googlegroups.com>
Hi,
I'm Korean, and when I use modules like sys, os, etc.,
the interpreter sometimes shows me broken strings like
'\x13\xb3\x12\xc8'.
It must be the Korean "alphabet", but I can't decode it the right way.
I tried to decode it using codecs like cp949, mbcs, and utf-8,
but it failed.
The only way I found is eval('\x13\xb3\x12\xc8'):
it raises an error whose message shows the correct Korean.
Is there any way to handle these strings so they are not broken?



#24574

From: Benjamin Kaplan <benjamin.kaplan@case.edu>
Date: 2012-06-27 19:20 -0700
Message-ID: <mailman.1580.1340850037.4697.python-list@python.org>
In reply to: #24571
On Wed, Jun 27, 2012 at 6:14 PM,  <howmuchistoday@gmail.com> wrote:
> Hi,
> I'm Korean, and when I use modules like sys, os, etc.,
> the interpreter sometimes shows me broken strings like
> '\x13\xb3\x12\xc8'.
> It must be the Korean "alphabet", but I can't decode it the right way.
> I tried to decode it using codecs like cp949, mbcs, and utf-8,
> but it failed.
> The only way I found is eval('\x13\xb3\x12\xc8'):
> it raises an error whose message shows the correct Korean.
> Is there any way to handle these strings so they are not broken?

It's not broken. You're just using the wrong encodings. Try utf-16le.
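
A minimal sketch of the suggestion above, assuming Python 3, where the escaped data has to be written as a bytes literal (in the Python 2 of the original question, a plain '...' string already held bytes). The names `raw`, `text`, and `code_points` are illustrative only.

```python
# The four bytes from the question, written as a Python 3 bytes literal.
raw = b'\x13\xb3\x12\xc8'

# UTF-16 little-endian: each pair of bytes is one code unit, low byte first,
# so 13 b3 becomes U+B313 and 12 c8 becomes U+C812.
text = raw.decode('utf-16-le')
print(text)  # 댓젒

# Show the code points to confirm how the byte pairs were combined.
code_points = [hex(ord(ch)) for ch in text]
print(code_points)  # ['0xb313', '0xc812']
```

Both resulting code points fall in the Hangul Syllables block (U+AC00 to U+D7A3), which is consistent with the data being Korean text.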



#24637

From: howmuchistoday@gmail.com
Date: 2012-06-28 14:27 -0700
Message-ID: <mailman.1622.1340918855.4697.python-list@python.org>
In reply to: #24574

On Thursday, June 28, 2012 at 11:20:28 AM UTC+9, Benjamin Kaplan wrote:
> On Wed, Jun 27, 2012 at 6:14 PM,  <howmuchistoday@gmail.com> wrote:
> > Hi,
> > I'm Korean, and when I use modules like sys, os, etc.,
> > the interpreter sometimes shows me broken strings like
> > '\x13\xb3\x12\xc8'.
> > It must be the Korean "alphabet", but I can't decode it the right way.
> > I tried to decode it using codecs like cp949, mbcs, and utf-8,
> > but it failed.
> > The only way I found is eval('\x13\xb3\x12\xc8'):
> > it raises an error whose message shows the correct Korean.
> > Is there any way to handle these strings so they are not broken?
> 
> It's not broken. You're just using the wrong encodings. Try utf-16le.

Thank you guys. The problem is solved!





#24604

From: MRAB <python@mrabarnett.plus.com>
Date: 2012-06-28 12:28 +0100
Message-ID: <mailman.1596.1340882887.4697.python-list@python.org>
In reply to: #24571
On 28/06/2012 02:14, howmuchistoday@gmail.com wrote:
> Hi,
> I'm Korean, and when I use modules like sys, os, etc.,
> the interpreter sometimes shows me broken strings like
> '\x13\xb3\x12\xc8'.
> It must be the Korean "alphabet", but I can't decode it the right way.
> I tried to decode it using codecs like cp949, mbcs, and utf-8,
> but it failed.
> The only way I found is eval('\x13\xb3\x12\xc8'):
> it raises an error whose message shows the correct Korean.
> Is there any way to handle these strings so they are not broken?
>
It might be UTF-16:

 >>> b'\x13\xb3\x12\xc8'.decode("utf16")
'댓젒'

I don't know Korean, but that looks reasonable!
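
When the encoding really is unknown, one option is to try a short list of candidates and compare the results. The sketch below assumes Python 3; `try_decodings` is a hypothetical helper, and the candidate list is an assumption based on the codecs named in this thread (mbcs is left out because it exists only on Windows).

```python
def try_decodings(raw, candidates=("utf-8", "cp949", "utf-16-le", "utf-16-be")):
    """Decode raw with each candidate codec; map codec name to text or None."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # This codec cannot interpret the bytes at all.
            results[name] = None
    return results


raw = b'\x13\xb3\x12\xc8'
results = try_decodings(raw)

for name, text in results.items():
    # ascii() keeps the output printable even on a console with a limited encoding.
    print(name, "->", "failed" if text is None else ascii(text))
```

A decode that succeeds is not proof that the guess is right. Here utf-8 fails outright, but utf-16-be decodes without error and yields characters outside the Hangul block, so the results still need a plausibility check such as the "looks reasonable" test above.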



#24624

From: Dieter Maurer <dieter@handshake.de>
Date: 2012-06-28 19:18 +0200
Message-ID: <mailman.1613.1340903908.4697.python-list@python.org>
In reply to: #24571
howmuchistoday@gmail.com writes:

> I'm Korean, and when I use modules like sys, os, etc.,
> the interpreter sometimes shows me broken strings like
> '\x13\xb3\x12\xc8'.
> It must be the Korean "alphabet", but I can't decode it the right way.
> I tried to decode it using codecs like cp949, mbcs, and utf-8,
> but it failed.
> The only way I found is eval('\x13\xb3\x12\xc8').

This looks as if "sys.stdout/sys.stderr" knew the correct encoding.
Check it like this:

      import sys
      sys.stdout.encoding
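
The check above can be extended into a small script that reports the other encodings Python picks up from the environment. This is only a sketch, assuming Python 3; `report_encodings` is a hypothetical helper, and `getattr` is used because a replaced stdout or stderr object may not have an `encoding` attribute.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python uses for console output, file names, and the locale."""
    return {
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
    }


encodings = report_encodings()
for name, value in encodings.items():
    print(name, "->", value)
```

If the console encoding differs from the encoding of the bytes being decoded, text can look broken in one place and correct in another, which may explain the error message in the original question.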


