| Newsgroups | comp.lang.python |
|---|---|
| Date | 2013-08-09 01:30 -0700 |
| References | <mailman.352.1375972418.1251.python-list@python.org> <9781df99-f9c8-4217-aa67-7a714b7f2ebe@googlegroups.com> <5203B841.4060304@gmail.com> <ku0eo0$9v9$1@ger.gmane.org> <mailman.359.1375979258.1251.python-list@python.org> |
| Message-ID | <9018bc25-e25e-47fb-b7ca-05c33a28b76c@googlegroups.com> |
| Subject | Re: right adjusted strings containing umlauts |
| From | wxjmfauth@gmail.com |
On Thursday, 8 August 2013 18:27:06 UTC+2, Kurt Mueller wrote:
> Now I have this small example:
> ----------------------------------------------------------
> #!/usr/bin/env python
> # vim: set fileencoding=utf-8 :
>
> from __future__ import print_function
> import sys, shlex
>
> print( repr( sys.stdin.encoding ) )
>
> strg_form = u'{0:>3} {1:>3} {2:>3} {3:>3} {4:>3}'
> for inpt_line in sys.stdin:
>     proc_line = shlex.split( inpt_line, False, True, )
>     encoding = "utf-8"
>     proc_line = [ strg.decode( encoding ) for strg in proc_line ]
>     print( strg_form.format( *proc_line ) )
> ----------------------------------------------------------
>
> $ echo -e "a b c d e\na ö u 1 2" | file -
> /dev/stdin: UTF-8 Unicode text
> $ echo -e "a b c d e\na ö u 1 2" | ./align_compact.py
> None
> a b c d e
> a ö u 1 2
> $ echo -e "a b c d e\na ö u 1 2" | recode utf8..latin9 | file -
> /dev/stdin: ISO-8859 text
> $ echo -e "a b c d e\na ö u 1 2" | recode utf8..latin9 | ./align_compact.py
> None
> a b c d e
> Traceback (most recent call last):
>   File "./align_compact.py", line 13, in <module>
>     proc_line = [ strg.decode( encoding ) for strg in proc_line ]
>   File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 0: invalid start byte
> muk@mcp20:/sw/prog/scripts/text_manip>
>
> How do I handle this two inputs?
>
> TIA
> --
> Kurt Mueller
--------
It's very easy.
The error message says that you cannot decode your series of bytes
with the utf-8 codec, simply because your string is encoded
in iso-8859-* (you did that explicitly, with recode!).
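To make the point concrete, here is a minimal sketch. It is written in Python 3 syntax, not the Python 2 of the quoted script, and the variable names are only illustrative. It shows that the same "ö" is two bytes in utf-8 but the single byte 0xf6 in latin9, and that 0xf6 cannot start a utf-8 sequence.

```python
# The same character, two different byte encodings.
utf8_bytes = "ö".encode("utf-8")          # two bytes: 0xC3 0xB6
latin9_bytes = "ö".encode("iso-8859-15")  # one byte:  0xF6

# Decoding the latin9 byte with the utf-8 codec fails, as in the traceback.
try:
    latin9_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block

# Decoding with the codec that was actually used to encode works.
decoded = latin9_bytes.decode("iso-8859-15")

print(utf8_bytes, latin9_bytes, error.reason, decoded)
```

The bytes themselves carry no label saying which codec produced them; only the codec passed to decode() decides how they are read.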
Your problem is not Python; your problem is the encoding
of the characters.
You should be aware of the encoding of the strings you are
manipulating (creating) and, if necessary, decode and/or encode
them according to what you need, e.g. a suitable encoding
for the display. It is at this level that Python (or any
language) matters.
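If the script really has to accept both kinds of input without being told which one it gets, one possible approach is to try utf-8 first and fall back to latin9. This is only a sketch under assumptions: it is Python 3, the helper names decode_line and format_line are hypothetical, it uses plain str.split() where the quoted script uses shlex.split(), and trying codecs in order is a heuristic, not real detection. Knowing the encoding up front remains the better solution.

```python
FORM = "{0:>3} {1:>3} {2:>3} {3:>3} {4:>3}"  # same layout as the quoted script


def decode_line(raw, encodings=("utf-8", "iso-8859-15")):
    """Decode raw bytes with the first codec in 'encodings' that succeeds.

    utf-8 goes first because text encoded in latin9 that contains
    non-ASCII characters is rarely valid utf-8, while latin9 accepts
    any byte and would therefore never fail.
    """
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the codecs could decode the input")


def format_line(raw):
    """Decode one input line and right-adjust its five fields."""
    fields = decode_line(raw).split()
    return FORM.format(*fields)


utf8_line = "a ö u 1 2\n".encode("utf-8")
latin9_line = "a ö u 1 2\n".encode("iso-8859-15")

print(format_line(utf8_line))
print(format_line(latin9_line))
```

Both calls give the same aligned line, because the alignment is computed on decoded text (characters), not on bytes.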
The sys.std*.encoding is a different problem.
iso-8859-* ?
iso-8859-1 == latin-1 and latin9 == iso-8859-15.
Apart from "das grosse Eszett" (the capital sharp s), both encodings are
able to handle German (which seems to be your case) and
there are no problems when working directly with these
encodings.
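A small check of that claim, again as a Python 3 sketch; the sample text and the helper can_encode are made up for illustration. German text gives identical bytes in both encodings, the capital sharp s (U+1E9E) fits in neither, and the euro sign is one of the few places where the two differ.

```python
def can_encode(text, codec):
    """Return True if 'text' can be represented in 'codec'."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


german = "Größe, Straße, Übung"
latin1_bytes = german.encode("iso-8859-1")
latin9_bytes = german.encode("iso-8859-15")

print(latin1_bytes == latin9_bytes)          # same bytes in both encodings
print(can_encode("\u1e9e", "iso-8859-1"))    # capital sharp s
print(can_encode("\u1e9e", "iso-8859-15"))
print(can_encode("€", "iso-8859-1"))         # euro sign: latin9 only
print(can_encode("€", "iso-8859-15"))
```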
jmf