Newsgroup: comp.lang.python, thread #63626

unicode troubles and postgres

Started by: Ethan Furman <ethan@stoneleaf.us>
First post: 2014-01-09 10:49 -0800
Last post: 2014-01-10 02:44 -0800
Articles: 2 (2 participants)

Contents

  unicode troubles and postgres Ethan Furman <ethan@stoneleaf.us> - 2014-01-09 10:49 -0800
    Re: unicode troubles and postgres wxjmfauth@gmail.com - 2014-01-10 02:44 -0800

#63626 — unicode troubles and postgres

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2014-01-09 10:49 -0800
Subject: unicode troubles and postgres
Message-ID: <mailman.5279.1389296211.18130.python-list@python.org>

So I'm working with postgres, and I get a data dump which I try to restore to my test system, and I get this:

ERROR:  value too long for type character varying(4)
CONTEXT:  COPY res_currency, line 32, column symbol: "руб"

"py6" sure looks like it should fit, but it don't.  Further investigation revealed that "py6" is made up of the bytes d1 80 d1 83 d0 b1.

Any ideas on what that means, exactly?

--
~Ethan~
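The six bytes from the post can be decoded directly. The following Python 3 sketch (an editorial illustration, not part of the thread) shows that, read as UTF-8, they are the three Cyrillic letters р, у, б, i.e. "руб", which merely looks like the Latin "py6". PostgreSQL counts the limit of varchar(n) in characters, not bytes, so three letters should fit in varchar(4). One plausible explanation for the error, which the thread does not confirm, is that the test database uses a single-byte encoding (for example SQL_ASCII), in which each byte counts as one character. Latin-1 below stands in for such an encoding purely as an assumption.

```python
import unicodedata

# The six bytes reported in the original post.
raw = bytes.fromhex("d1 80 d1 83 d0 b1")

# Read as UTF-8: three Cyrillic letters stored in six bytes.
text = raw.decode("utf-8")
print(text, len(text), len(raw))  # руб 3 6

for ch in text:
    # Code point, its UTF-8 bytes (hex), and its Unicode name.
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(), unicodedata.name(ch))

# Assumption for illustration: a single-byte decoding (Latin-1 here) turns
# every byte into its own character, so the same data is six characters long.
single_byte = raw.decode("latin-1")
print(len(single_byte))  # 6
```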

#63645

From: wxjmfauth@gmail.com
Date: 2014-01-10 02:44 -0800
Message-ID: <d0b97b6c-931a-44ac-a927-0604aeeffed0@googlegroups.com>
In reply to: #63626

On Thursday, 9 January 2014 19:49:27 UTC+1, Ethan Furman wrote:
> So I'm working with postgres, and I get a data dump which I try to restore to my test system, and I get this:
>
> ERROR:  value too long for type character varying(4)
> CONTEXT:  COPY res_currency, line 32, column symbol: "руб"
>
> "py6" sure looks like it should fit, but it don't.  Further investigation revealed that "py6" is made up of the bytes d1 80 d1 83 d0 b1.
>
> Any ideas on what that means, exactly?

When one faces such a characteristic byte sequence (lead bytes d0/d1,
each followed by a byte in the range 80..bf), the first thing to do
is to think "utf-8".

(Not a proof, just an illustration:)

>>> a = list(range(0x0410, 0x0415))
>>> a += list(range(0x0440, 0x0445))
>>> a += list(range(0x0480, 0x0485))
>>> import unicodedata as ud
>>> for i in a:
...     hex(i), chr(i).encode('utf-8'), ud.name(chr(i))
...     
('0x410', b'\xd0\x90', 'CYRILLIC CAPITAL LETTER A')
('0x411', b'\xd0\x91', 'CYRILLIC CAPITAL LETTER BE')
('0x412', b'\xd0\x92', 'CYRILLIC CAPITAL LETTER VE')
('0x413', b'\xd0\x93', 'CYRILLIC CAPITAL LETTER GHE')
('0x414', b'\xd0\x94', 'CYRILLIC CAPITAL LETTER DE')
('0x440', b'\xd1\x80', 'CYRILLIC SMALL LETTER ER')
('0x441', b'\xd1\x81', 'CYRILLIC SMALL LETTER ES')
('0x442', b'\xd1\x82', 'CYRILLIC SMALL LETTER TE')
('0x443', b'\xd1\x83', 'CYRILLIC SMALL LETTER U')
('0x444', b'\xd1\x84', 'CYRILLIC SMALL LETTER EF')
('0x480', b'\xd2\x80', 'CYRILLIC CAPITAL LETTER KOPPA')
('0x481', b'\xd2\x81', 'CYRILLIC SMALL LETTER KOPPA')
('0x482', b'\xd2\x82', 'CYRILLIC THOUSANDS SIGN')
('0x483', b'\xd2\x83', 'COMBINING CYRILLIC TITLO')
('0x484', b'\xd2\x84', 'COMBINING CYRILLIC PALATALIZATION')
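The "think utf-8" heuristic rests on the shape of multi-byte UTF-8: a two-byte sequence is a lead byte of the form 110xxxxx (0xc2 to 0xdf) followed by a continuation byte of the form 10xxxxxx (0x80 to 0xbf), and the code point is the 5 low bits of the lead byte followed by the 6 low bits of the continuation byte. The sketch below applies that rule by hand to the bytes from the original post. `decode_two_byte_utf8` is a hypothetical helper written only for illustration; it handles two-byte sequences only and is no substitute for `bytes.decode`.

```python
def decode_two_byte_utf8(raw: bytes) -> list:
    """Decode bytes made only of 2-byte UTF-8 sequences into code points.

    Hypothetical illustration helper: raises ValueError if the input is not
    a sequence of (110xxxxx, 10xxxxxx) byte pairs.
    """
    if len(raw) % 2:
        raise ValueError("odd number of bytes")
    points = []
    for lead, cont in zip(raw[0::2], raw[1::2]):
        # Lead byte 0xc2..0xdf (0xc0/0xc1 would be overlong forms);
        # continuation byte 0x80..0xbf.
        if not (0xC2 <= lead <= 0xDF and 0x80 <= cont <= 0xBF):
            raise ValueError(f"not a 2-byte UTF-8 sequence: {lead:02x} {cont:02x}")
        # 5 payload bits from the lead byte, 6 from the continuation byte.
        points.append(((lead & 0x1F) << 6) | (cont & 0x3F))
    return points


raw = bytes.fromhex("d180d183d0b1")
points = decode_two_byte_utf8(raw)
print([hex(p) for p in points])  # ['0x440', '0x443', '0x431']

# Cross-check the hand decoding against Python's own UTF-8 decoder.
assert "".join(map(chr, points)) == raw.decode("utf-8")
```

U+0440 and U+0443 appear in the table above (ER and U); U+0431 is CYRILLIC SMALL LETTER BE, which falls between the ranges the session printed.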

jmf