
Re: Least-lossy string.encode to us-ascii?

References <50524F6F.6070604@tim.thechases.com>
Date 2012-09-13 23:44 +0200
Subject Re: Least-lossy string.encode to us-ascii?
From Vlastimil Brom <vlastimil.brom@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.642.1347572661.27098.python-list@python.org>



2012/9/13 Tim Chase <python.list@tim.thechases.com>:
> I've got a bunch of text in Portuguese and to transmit them, need to
> have them in us-ascii (7-bit).  I'd like to keep as much information
> as possible, just stripping accents, cedillas, tildes, etc.  So
> "serviço móvil" becomes "servico movil".  Is there anything stock
> that I've missed?  I can do mystring.encode('us-ascii', 'replace')
> but that doesn't keep as much information as I'd hope.
>
> -tkc
>


Hi,
would something like the following be enough for your needs?
Unfortunately, I can't check it reliably with regard to Portuguese.

>>> import unicodedata
>>> unicodedata.normalize("NFD", u"serviço móvil").encode("ascii", "ignore").decode("ascii")
u'servico movil'
>>>
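
For reuse, the same idea can be wrapped in a small function. This is only a sketch, assuming Python 3 (where every str is Unicode); the name strip_accents_to_ascii is made up for illustration. NFD splits a character such as "ç" into a plain "c" plus a combining mark, and the "ignore" error handler then drops the mark. Note that characters without a canonical decomposition (for example "ø" or "ß") are dropped entirely rather than replaced.

```python
import unicodedata


def strip_accents_to_ascii(text):
    """Return text with accents removed and all other non-ASCII dropped.

    Hypothetical helper name; a sketch of the NFD approach, not a
    complete transliteration scheme.
    """
    # NFD turns precomposed characters into base letter + combining mark(s),
    # e.g. "ç" becomes "c" followed by U+0327 COMBINING CEDILLA.
    decomposed = unicodedata.normalize("NFD", text)
    # "ignore" silently discards every code point outside ASCII, which
    # removes the combining marks but also any character that did not
    # decompose into an ASCII base letter.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents_to_ascii("serviço móvil"))
print(strip_accents_to_ascii("São Paulo"))
```

If losing characters such as "ø" silently is not acceptable, the result should be compared against the input length or a replacement table added for those cases.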

There is also "Unidecode", but I haven't used it myself so far.
http://pypi.python.org/pypi/Unidecode/

hth,
  vbr
