Groups > comp.lang.python > #18594 > unrolled thread
| Started by | Ivan <ivan@llaisdy.com> |
|---|---|
| First post | 2012-01-06 10:03 +0000 |
| Last post | 2012-01-08 12:16 +0100 |
| Articles | 7 — 5 participants |
How to support a non-standard encoding? Ivan <ivan@llaisdy.com> - 2012-01-06 10:03 +0000
Re: How to support a non-standard encoding? Tim Wintle <tim.wintle@teamrubber.com> - 2012-01-06 13:47 +0000
Re: How to support a non-standard encoding? Ivan Uemlianin <ivan@llaisdy.com> - 2012-01-06 14:03 +0000
Re: How to support a non-standard encoding? jmfauth <wxjmfauth@gmail.com> - 2012-01-06 12:00 -0800
Re: How to support a non-standard encoding? Tim Wintle <tim.wintle@teamrubber.com> - 2012-01-06 20:42 +0000
Re: How to support a non-standard encoding? Ivan <ivan@llaisdy.com> - 2012-01-08 08:50 +0000
Re: How to support a non-standard encoding? Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2012-01-08 12:16 +0100
| From | Ivan <ivan@llaisdy.com> |
|---|---|
| Date | 2012-01-06 10:03 +0000 |
| Subject | How to support a non-standard encoding? |
| Message-ID | <je6guf$pns$1@localhost.localdomain> |
Dear All
I'm developing a python application for which I need to support a
non-standard character encoding (specifically ISO 6937/2-1983, Addendum
1-1989). Here are some of the properties of the encoding and its use in
the application:
- I need to read and write data to/from files. The file format
includes two sections in different character encodings (so I
shan't be able to use codecs.open()).
- iso-6937 sections include non-printing control characters
- iso-6937 is a variable-width encoding, e.g. "A" = [0x41],
"Ä" = [0xC8, 0x41]; all non-spacing diacritical marks are in the
range 0xC0-0xCF.
By any chance is there anyone out there working on iso-6937?
Otherwise, I think I need to write a new codec to support reading and
writing this data. Does anyone know of any tutorials or blog posts on
implementing a codec for a non-standard character encoding? Would
anyone be interested in reading one?
With thanks and best wishes
Ivan
--
============================================================
Ivan A. Uemlianin
Llaisdy
Speech Technology Research and Development
ivan@llaisdy.com
www.llaisdy.com
llaisdy.wordpress.com
github.com/llaisdy
www.linkedin.com/in/ivanuemlianin
"Froh, froh! Wie seine Sonnen, seine Sonnen fliegen"
(Schiller, Beethoven)
============================================================
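A minimal sketch of the decoding direction for the scheme described above. It assumes Python 3, that the 7-bit range (control characters included) is plain ASCII, and that each diacritic byte in 0xC0-0xCF modifies the single byte after it. Only the 0xC8 (diaeresis) entry comes from the post; the other table entries and the name `decode_iso6937` are illustrative and should be checked against the standard. The essential step is reordering: ISO 6937 puts the diacritic before the base letter, Unicode puts the combining mark after it, and `unicodedata.normalize("NFC", ...)` then composes the pair into one precomposed character where Unicode has one.

```python
import unicodedata

# Partial, illustrative table: ISO 6937 non-spacing diacritic byte ->
# Unicode combining mark. Only 0xC8 (diaeresis) is taken from the post;
# verify every entry against the standard before relying on it.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
}


def decode_iso6937(data: bytes) -> str:
    """Decode a subset of ISO 6937 (sketch; raises ValueError otherwise)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xC0 <= b <= 0xCF:
            # Non-spacing diacritic: it modifies the *next* byte.
            if i + 1 >= len(data):
                raise ValueError("diacritic byte at end of input")
            mark = DIACRITICS.get(b)
            base = data[i + 1]
            if mark is None or base >= 0x80:
                raise ValueError(f"unsupported sequence at offset {i}")
            # Unicode puts the combining mark after the base letter;
            # NFC then composes e.g. "A" + U+0308 into "Ä".
            out.append(unicodedata.normalize("NFC", chr(base) + mark))
            i += 2
        elif b < 0x80:
            # Assumption: the 7-bit range, control characters included,
            # matches ASCII.
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} not handled in this sketch")
    return "".join(out)
```

With this table, `decode_iso6937(b"\xc8A")` returns `"Ä"` and `decode_iso6937(b"Caf\xc2e")` returns `"Café"`.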
| From | Tim Wintle <tim.wintle@teamrubber.com> |
|---|---|
| Date | 2012-01-06 13:47 +0000 |
| Message-ID | <mailman.4478.1325857654.27778.python-list@python.org> |
| In reply to | #18594 |
On Fri, 2012-01-06 at 10:03 +0000, Ivan wrote:
> Dear All
>
> I'm developing a python application for which I need to support a
> non-standard character encoding (specifically ISO 6937/2-1983, Addendum
> 1-1989).
If your system version of iconv contains that encoding (mine does) then
you could use a wrapped iconv library to avoid re-inventing the wheel.
I've got a forked version of the "iconv" package from pypi available
here:
<https://github.com/timwintle/iconv-python>
.. it should work on python2.5-2.7
Tim
| From | Ivan Uemlianin <ivan@llaisdy.com> |
|---|---|
| Date | 2012-01-06 14:03 +0000 |
| Message-ID | <mailman.4480.1325860009.27778.python-list@python.org> |
| In reply to | #18594 |
Dear Tim
Thanks for your help.
> If your system version of iconv contains that encoding, ...
Alas, it doesn't:
$ iconv -l |grep 6937
$
Also, I'd like to package the app so other people could use it, so I
wouldn't want to depend too much on the local OS.
Best wishes
Ivan
On 06/01/2012 13:47, Tim Wintle wrote:
> On Fri, 2012-01-06 at 10:03 +0000, Ivan wrote:
>> Dear All
>>
>> I'm developing a python application for which I need to support a
>> non-standard character encoding (specifically ISO 6937/2-1983, Addendum
>> 1-1989).
>
> If your system version of iconv contains that encoding (mine does) then
> you could use a wrapped iconv library to avoid re-inventing the wheel.
>
> I've got a forked version of the "iconv" package from pypi available
> here:
>
> <https://github.com/timwintle/iconv-python>
>
> .. it should work on python2.5-2.7
>
> Tim
>
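Ivan's `iconv -l | grep 6937` check has a counterpart inside Python: asking the codec registry whether it knows a name. A small sketch (the helper name `has_codec` is hypothetical). As far as I know, stock CPython ships no ISO 6937 codec in `Lib/encodings`, so the third lookup below fails on any OS, which is why a custom codec is needed whatever the platform's iconv offers.

```python
import codecs


def has_codec(name: str) -> bool:
    """Return True if Python's codec registry can find `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in ("utf-8", "latin-1", "iso6937"):
    print(name, has_codec(name))
```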
| From | jmfauth <wxjmfauth@gmail.com> |
|---|---|
| Date | 2012-01-06 12:00 -0800 |
| Message-ID | <1480875f-d133-40a1-8fd1-dd31a2dd430b@d10g2000vbh.googlegroups.com> |
| In reply to | #18594 |
On 6 jan, 11:03, Ivan <i...@llaisdy.com> wrote:
> Dear All
>
> I'm developing a python application for which I need to support a
> non-standard character encoding (specifically ISO 6937/2-1983, Addendum
> 1-1989). Here are some of the properties of the encoding and its use in
> the application:
>
> - I need to read and write data to/from files. The file format
> includes two sections in different character encodings (so I
> shan't be able to use codecs.open()).
>
> - iso-6937 sections include non-printing control characters
>
> - iso-6937 is a variable-width encoding, e.g. "A" = [0x41],
> "Ä" = [0xC8, 0x41]; all non-spacing diacritical marks are in the
> range 0xC0-0xCF.
>
> By any chance is there anyone out there working on iso-6937?
>
> Otherwise, I think I need to write a new codec to support reading and
> writing this data. Does anyone know of any tutorials or blog posts on
> implementing a codec for a non-standard character encoding? Would
> anyone be interested in reading one?
>
Take a look at the files (Python modules) in ...\Lib\encodings. This is
the place where all codecs are centralized; Python picks them up
automatically as long as they are present in that directory.
I remember that, a long time ago, just for fun, I created such a codec
quite easily. I picked one of the files as a template and modified its
"table". It was a byte <-> byte table.
For a multibyte coding scheme, it may be a little more complicated; you
could take a look at, e.g., the mbcs.py codec.
The distribution of such a codec may be a problem.
----
Another simple, OS-independent approach:
You probably do not write your code in iso-6937; you only need to
encode/decode some byte sequences "on the fly". In that case, work with
bytes and create a couple of coding/decoding functions with a
purpose-built <dict> [*] as a helper. It's not so complicated.
Use <unicode> in Py2 or <str> in Py3 (the recommended way ;-) ) as the
pivot encoding.
[*] I also once created such a dict from
# http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
I never checked whether it corresponds to the "official" cp1252 codec.
jmf
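jmf's dict-plus-pivot suggestion can be sketched for the encoding direction as follows. This assumes Python 3 (`str` as the pivot), a hypothetical, partial table from Unicode combining marks to ISO 6937 prefix bytes (only U+0308 -> 0xC8 is taken from the thread), and ASCII for everything else; the name `encode_iso6937` is also illustrative. `unicodedata.normalize("NFD", ...)` splits a precomposed letter such as "Ä" into "A" plus U+0308, and the loop then emits the prefix byte ahead of the base letter. Characters outside the table raise `ValueError`; a real codec would honour an `errors` argument instead.

```python
import unicodedata

# Hypothetical, partial table: Unicode combining mark -> ISO 6937 prefix
# byte. Only U+0308 -> 0xC8 comes from the thread; verify the rest.
MARK_TO_BYTE = {
    "\u0300": 0xC1,  # grave
    "\u0301": 0xC2,  # acute
    "\u0302": 0xC3,  # circumflex
    "\u0303": 0xC4,  # tilde
    "\u0308": 0xC8,  # diaeresis
}


def encode_iso6937(text: str) -> bytes:
    """Encode a str, using str as the pivot and MARK_TO_BYTE as the table."""
    # NFD splits precomposed letters: "Ä" -> "A" + U+0308.
    chars = unicodedata.normalize("NFD", text)
    out = bytearray()
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ord(ch) >= 0x80:
            # Unmapped marks and non-ASCII letters end up here.
            raise ValueError(f"cannot encode {ch!r} in this sketch")
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if nxt in MARK_TO_BYTE:
            # ISO 6937 writes the diacritic byte *before* the base letter.
            out.append(MARK_TO_BYTE[nxt])
            out.append(ord(ch))
            i += 2
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)
```

For example, `encode_iso6937("Ä")` gives `b"\xc8A"`. Inverting the table (`{v: k for k, v in MARK_TO_BYTE.items()}`) gives the lookup needed for the decoding direction.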
| From | Tim Wintle <tim.wintle@teamrubber.com> |
|---|---|
| Date | 2012-01-06 20:42 +0000 |
| Message-ID | <mailman.4494.1325882556.27778.python-list@python.org> |
| In reply to | #18617 |
On Fri, 2012-01-06 at 12:00 -0800, jmfauth wrote:
> The distribution of such a codec may be a problem.
There is a register_codec method (or similar) in the codecs module.
Tim
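The standard-library function for this is `codecs.register`, which takes a search function; the search function receives a normalised (lower-cased) encoding name and returns a `codecs.CodecInfo`, or `None` for names it does not own. A minimal, self-contained sketch follows. The codec name `iso6937_sketch` is hypothetical, and the toy encode/decode functions handle only ASCII plus "Ä" so that the registration mechanics stay visible. Codec functions return a `(result, length_consumed)` tuple, and only `errors="strict"` is honoured here.

```python
import codecs

CODEC_NAME = "iso6937_sketch"  # hypothetical name for this example


def _encode(text, errors="strict"):
    """Toy encoder: ASCII plus "Ä" -> diaeresis prefix 0xC8 + "A"."""
    out = bytearray()
    for i, ch in enumerate(text):
        if ch == "\u00c4":
            out += b"\xc8A"
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise UnicodeEncodeError(CODEC_NAME, text, i, i + 1, "not in toy table")
    return bytes(out), len(text)


def _decode(data, errors="strict"):
    """Toy decoder, the inverse of _encode."""
    data = bytes(data)  # bytes.decode() may pass a memoryview
    out = []
    i = 0
    while i < len(data):
        if data[i:i + 2] == b"\xc8A":
            out.append("\u00c4")
            i += 2
        elif data[i] < 0x80:
            out.append(chr(data[i]))
            i += 1
        else:
            raise UnicodeDecodeError(CODEC_NAME, data, i, i + 1, "not in toy table")
    return "".join(out), len(data)


def _search(name):
    # The registry passes a normalised name; return None for other codecs.
    if name == CODEC_NAME:
        return codecs.CodecInfo(encode=_encode, decode=_decode, name=CODEC_NAME)
    return None


codecs.register(_search)
```

After registration, `"AÄ".encode("iso6937_sketch")` gives `b"A\xc8A"` and `.decode("iso6937_sketch")` round-trips it. Because registration is just a function call, the codec can live in an ordinary module of the application that calls `codecs.register` when imported, rather than inside `Lib/encodings`.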
| From | Ivan <ivan@llaisdy.com> |
|---|---|
| Date | 2012-01-08 08:50 +0000 |
| Message-ID | <jeblda$5gk$1@localhost.localdomain> |
| In reply to | #18619 |
Dear jmf, Tim
Thanks for these pointers. They look very useful.
I'll have a go and report back (with success I hope).
Best wishes
Ivan
On 06/01/2012 20:42, Tim Wintle wrote:
> On Fri, 2012-01-06 at 12:00 -0800, jmfauth wrote:
>> The distribution of such a codec may be a problem.
>
> There is a register_codec method (or similar) in the codecs module.
>
> Tim
>
>
| From | Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> |
|---|---|
| Date | 2012-01-08 12:16 +0100 |
| Message-ID | <jebtu9$5m2$1@r03.glglgl.gl> |
| In reply to | #18617 |
On 06.01.2012 21:00, jmfauth wrote:
> Another simple approach, os independent.
>
> You probably do not write your code in iso-6937, but
> you only need to encode/decode some bytes sequence
> "on the fly". In that case, work with bytes, create
> a couple of coding / decoding functions with a
> created <dict> [*] as helper. It's not so complicated.
> Use <unicode> Py2 or <str> Py3 (the recommended
> way ;-) ) as pivot encoding.
These coding/decoding functions are exactly the way to create a codec.
I.e., a codec is not much more than that.
Thomas
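Thomas's point holds for whole-buffer conversion. One extra piece matters for a variable-width encoding read in chunks: a chunk may end with a diacritic byte whose base letter only arrives in the next chunk. The standard library's `codecs.BufferedIncrementalDecoder` keeps any unconsumed bytes for the next call, provided `_buffer_decode` reports how many bytes it consumed. A sketch, assuming the prefix-diacritic layout described at the top of the thread, a hypothetical one-entry table (diaeresis only), and hypothetical names `decode_prefix` and `ISO6937IncrementalDecoder`:

```python
import codecs
import unicodedata

# Hypothetical, partial table: only the diaeresis prefix (0xC8) from the thread.
DIACRITICS = {0xC8: "\u0308"}
NAME = "iso6937-sketch"  # label used in error messages only


def decode_prefix(data, errors="strict", final=False):
    """Decode as much of `data` as possible; return (text, bytes_consumed).

    When more input may follow (final=False), a trailing diacritic byte is
    left unconsumed so it can be paired with the next chunk's base letter.
    Only errors="strict" is honoured in this sketch.
    """
    data = bytes(data)
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xC0 <= b <= 0xCF:
            if i + 1 == len(data):
                if not final:
                    break  # wait for the base letter in the next chunk
                raise UnicodeDecodeError(NAME, data, i, i + 1, "truncated diacritic")
            mark = DIACRITICS.get(b)
            base = data[i + 1]
            if mark is None or base >= 0x80:
                raise UnicodeDecodeError(NAME, data, i, i + 2, "unsupported sequence")
            # ISO 6937: diacritic first; Unicode: base letter, then mark.
            out.append(unicodedata.normalize("NFC", chr(base) + mark))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise UnicodeDecodeError(NAME, data, i, i + 1, "unsupported byte")
    return "".join(out), i


class ISO6937IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    # The base class prepends leftover bytes from the previous call and
    # keeps data[consumed:] for the next one.
    def _buffer_decode(self, input, errors, final):
        return decode_prefix(input, errors, final)
```

Feeding `b"A\xc8"` and then `b"A"` to one decoder instance yields `"A"` and then `"Ä"`: the 0xC8 byte waits in the buffer until its base letter arrives. The same object can be passed as `incrementaldecoder=` when building a `codecs.CodecInfo`.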