Is there any way to decode String using unknown codec?

Started by	howmuchistoday@gmail.com
First post	2012-06-27 18:14 -0700
Last post	2012-06-28 19:18 +0200
Articles	6 — 4 participants

  Is there any way to decode String using unknown codec? howmuchistoday@gmail.com - 2012-06-27 18:14 -0700
    Re: Is there any way to decode String using unknown codec? Benjamin Kaplan <benjamin.kaplan@case.edu> - 2012-06-27 19:20 -0700
      Re: Is there any way to decode String using unknown codec? howmuchistoday@gmail.com - 2012-06-28 14:27 -0700
      Re: Is there any way to decode String using unknown codec? howmuchistoday@gmail.com - 2012-06-28 14:27 -0700
    Re: Is there any way to decode String using unknown codec? MRAB <python@mrabarnett.plus.com> - 2012-06-28 12:28 +0100
    Re: Is there any way to decode String using unknown codec? Dieter Maurer <dieter@handshake.de> - 2012-06-28 19:18 +0200

#24571 — Is there any way to decode String using unknown codec?

From	howmuchistoday@gmail.com
Date	2012-06-27 18:14 -0700
Subject	Is there any way to decode String using unknown codec?
Message-ID	<c67686b6-4f98-4408-a89c-edc0a6030c24@googlegroups.com>

Hi,
I'm Korean, and when I use modules like sys, os, etc.,
the interpreter sometimes shows me broken strings like
'\x13\xb3\x12\xc8'.
It must be the Korean "alphabet", but I can't decode it the right way.
I tried decoding it with codecs like cp949, mbcs, and utf-8,
but they all failed.
The only way I found is eval('\x13\xb3\x12\xc8').
It raises an error that shows the correct Korean.
Is there any way to handle it so that it doesn't come out broken?

#24574

From	Benjamin Kaplan <benjamin.kaplan@case.edu>
Date	2012-06-27 19:20 -0700
Message-ID	<mailman.1580.1340850037.4697.python-list@python.org>
In reply to	#24571

On Wed, Jun 27, 2012 at 6:14 PM,  <howmuchistoday@gmail.com> wrote:
> Hi,
> I'm Korean, and when I use modules like sys, os, etc.,
> the interpreter sometimes shows me broken strings like
> '\x13\xb3\x12\xc8'.
> It must be the Korean "alphabet", but I can't decode it the right way.
> I tried decoding it with codecs like cp949, mbcs, and utf-8,
> but they all failed.
> The only way I found is eval('\x13\xb3\x12\xc8').
> It raises an error that shows the correct Korean.
> Is there any way to handle it so that it doesn't come out broken?

It's not broken. You're just using the wrong encodings. Try utf-16le.
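
To see why cp949 and utf-8 reject these bytes while utf-16le accepts them, here is a minimal sketch that tries several candidate codecs on the bytes from the original post. The helper name try_codecs and the candidate list are illustrative assumptions, not anything from this thread; mbcs is left out of the list because it exists only on Windows.

```python
# Bytes copied from the original post.
raw = b'\x13\xb3\x12\xc8'


def try_codecs(data, candidates):
    """Return {codec name: decoded text, or None if decoding failed}."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this codec.
            # LookupError: the codec is not available on this platform.
            results[name] = None
    return results


results = try_codecs(raw, ["utf-8", "cp949", "utf-16-le"])
for name, text in results.items():
    print(name, "->", repr(text))
```

Note that a successful decode only shows the bytes are valid in that codec, not that the codec is the right one; a human still has to judge whether the text makes sense.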

#24637

From	howmuchistoday@gmail.com
Date	2012-06-28 14:27 -0700
Message-ID	<mailman.1622.1340918855.4697.python-list@python.org>
In reply to	#24574

On Thursday, June 28, 2012 at 11:20:28 AM UTC+9, Benjamin Kaplan wrote:
> On Wed, Jun 27, 2012 at 6:14 PM,  <howmuchistoday@gmail.com> wrote:
> > Hi,
> > I'm Korean, and when I use modules like sys, os, etc.,
> > the interpreter sometimes shows me broken strings like
> > '\x13\xb3\x12\xc8'.
> > It must be the Korean "alphabet", but I can't decode it the right way.
> > I tried decoding it with codecs like cp949, mbcs, and utf-8,
> > but they all failed.
> > The only way I found is eval('\x13\xb3\x12\xc8').
> > It raises an error that shows the correct Korean.
> > Is there any way to handle it so that it doesn't come out broken?
> 
> It's not broken. You're just using the wrong encodings. Try utf-16le.

Thank you guys. The problem is solved!

#24604

From	MRAB <python@mrabarnett.plus.com>
Date	2012-06-28 12:28 +0100
Message-ID	<mailman.1596.1340882887.4697.python-list@python.org>
In reply to	#24571

On 28/06/2012 02:14, howmuchistoday@gmail.com wrote:
> Hi,
> I'm Korean, and when I use modules like sys, os, etc.,
> the interpreter sometimes shows me broken strings like
> '\x13\xb3\x12\xc8'.
> It must be the Korean "alphabet", but I can't decode it the right way.
> I tried decoding it with codecs like cp949, mbcs, and utf-8,
> but they all failed.
> The only way I found is eval('\x13\xb3\x12\xc8').
> It raises an error that shows the correct Korean.
> Is there any way to handle it so that it doesn't come out broken?
>
It might be UTF-16:

 >>> b'\x13\xb3\x12\xc8'.decode("utf16")
'댓젒'

I don't know Korean, but that looks reasonable!
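
One caveat, offered as a hedged note: when the data carries no byte order mark, the plain "utf16" codec falls back to the machine's native byte order, so the result above assumes a little-endian platform. A small sketch (the variable names are illustrative) that makes the byte order explicit:

```python
raw = b'\x13\xb3\x12\xc8'

# Explicit little-endian: gives the same text on every platform.
little = raw.decode("utf-16-le")

# Explicit big-endian: the same bytes also decode without error,
# but to different, non-Hangul code points.
big = raw.decode("utf-16-be")

# With a little-endian byte order mark (FF FE) in front, the generic
# "utf-16" codec reads the mark, drops it, and decodes as little-endian.
with_bom = (b'\xff\xfe' + raw).decode("utf-16")

print(repr(little), repr(big), repr(with_bom))
```

Naming "utf-16-le" directly avoids depending on the platform when the bytes are known to be little-endian.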

#24624

From	Dieter Maurer <dieter@handshake.de>
Date	2012-06-28 19:18 +0200
Message-ID	<mailman.1613.1340903908.4697.python-list@python.org>
In reply to	#24571

howmuchistoday@gmail.com writes:

> I'm Korean, and when I use modules like sys, os, etc.,
> the interpreter sometimes shows me broken strings like
> '\x13\xb3\x12\xc8'.
> It must be the Korean "alphabet", but I can't decode it the right way.
> I tried decoding it with codecs like cp949, mbcs, and utf-8,
> but they all failed.
> The only way I found is eval('\x13\xb3\x12\xc8').

This suggests that "sys.stdout"/"sys.stderr" know the correct encoding.
You can check it like this:

      import sys
      sys.stdout.encoding
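
As a slightly fuller sketch of the same check, the snippet below collects the encodings Python reports for standard output, the filesystem, and the locale. The function name report_encodings is a hypothetical helper introduced here for illustration; the calls it wraps are standard library functions.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings the interpreter reports for this environment."""
    return {
        # May be None if stdout has been replaced by an object
        # that has no "encoding" attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
        # The locale's preferred text encoding.
        "locale": locale.getpreferredencoding(False),
    }


info = report_encodings()
for source, encoding in info.items():
    print(source, "->", encoding)
```

Comparing these values against the codec that actually decodes the bytes can hint at where the mismatch comes from.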
