

Newsgroup: comp.lang.python, thread #39676

Good cross-version ASCII serialisation protocol for simple types

Started by: Paul Moore <p.f.moore@gmail.com>
First post: 2013-02-23 07:45 -0800
Last post: 2013-02-23 11:00 -0800
Articles: 5 (4 participants)



Contents

  Good cross-version ASCII serialisation protocol for simple types Paul  Moore <p.f.moore@gmail.com> - 2013-02-23 07:45 -0800
    Re: Good cross-version ASCII serialisation protocol for simple types Chris Angelico <rosuav@gmail.com> - 2013-02-24 03:00 +1100
    Re: Good cross-version ASCII serialisation protocol for simple types Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2013-02-23 17:05 +0100
    Re: Good cross-version ASCII serialisation protocol for simple types Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-02-23 18:06 +0200
      Re: Good cross-version ASCII serialisation protocol for simple types Paul  Moore <p.f.moore@gmail.com> - 2013-02-23 11:00 -0800

#39676 — Good cross-version ASCII serialisation protocol for simple types

From: Paul Moore <p.f.moore@gmail.com>
Date: 2013-02-23 07:45 -0800
Subject: Good cross-version ASCII serialisation protocol for simple types
Message-ID: <f1d3acb6-94e2-48f6-8ccd-042b929d0ef4@googlegroups.com>
I need to transfer some data (nothing fancy, some dictionaries, strings, numbers and lists, basically) between 2 Python processes. However, the data (string values) is potentially not ASCII, but the transport is (I'm piping between 2 processes, but thanks to nasty encoding issues, the only characters I can be sure won't be mangled are ASCII).

What's the best ASCII-only protocol to use that's portable between versions of Python back to about 2.6/2.7 and in the stdlib, so I don't need external modules?

At the moment, I'm using

encoded = json.dumps([ord(c) for c in json.dumps(obj)])
decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))

The double-encoding ensures that non-ASCII characters don't make it into the result.
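A quick check of the double encoding, as a sketch that assumes Python 3 and uses a made-up `obj`. With its default `ensure_ascii=True`, the inner `json.dumps` call already produces pure ASCII, so the outer layer mostly adds size:

```python
import json

# Hypothetical sample payload with non-ASCII string values.
obj = {"name": "caf\u00e9", "tags": ["\u20ac", "x"], "n": 3}

# Inner layer: json.dumps escapes non-ASCII as \uXXXX by default
# (ensure_ascii=True), so this string is already pure ASCII.
inner = json.dumps(obj)
print(all(ord(c) < 128 for c in inner))  # True

# The double encoding from the post: JSON text -> list of code points -> JSON.
encoded = json.dumps([ord(c) for c in inner])
decoded = json.loads("".join([chr(n) for n in json.loads(encoded)]))
print(decoded == obj)  # True

# The outer layer only makes the payload bigger.
print(len(inner), len(encoded))
```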

This works fine, but is there something simpler (i.e., less of a hack!) that I could use? (Base64 and the like don't work because they encode bytes->strings, not strings->strings).

Thanks,
Paul



#39678

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-02-24 03:00 +1100
Message-ID: <mailman.2353.1361635213.2939.python-list@python.org>
In reply to: #39676
On Sun, Feb 24, 2013 at 2:45 AM, Paul  Moore <p.f.moore@gmail.com> wrote:
> At the moment, I'm using
>
> encoded = json.dumps([ord(c) for c in json.dumps(obj)])
> decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
>
> The double-encoding ensures that non-ASCII characters don't make it into the result.
>
> This works fine, but is there something simpler (i.e., less of a hack!) that I could use? (Base64 and the like don't work because they encode bytes->strings, not strings->strings).

Hmm. How likely is it that you'll have non-ASCII characters in the
input? If they're fairly uncommon, you could use UTF-7 - it's fairly
space-efficient when the input is mostly ASCII, but inefficient on
other characters.

Not sure what the problem is with bytes vs strings; you can always do
an encode("ascii") or decode("ascii") to convert 7-bit strings between
those types.

With that covered, I'd just go with a single JSON packaging, and work
with the resulting Unicode string.

Python 2.6:
>>> s=u"asdf\u1234zxcv"
>>> s.encode("utf-7").decode("ascii")
u'asdf+EjQ-zxcv'

Python 3.3:
>>> s=u"asdf\u1234zxcv"
>>> s.encode("utf-7").decode("ascii")
'asdf+EjQ-zxcv'
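Putting the two suggestions together (a single JSON packaging, then UTF-7 for the pipe) might look like the following minimal Python 3 sketch. `to_ascii` and `from_ascii` are hypothetical helper names. `ensure_ascii=False` is used here only so that UTF-7, rather than JSON, does the escaping:

```python
import json

def to_ascii(obj):
    # Serialise to JSON without \u escapes, then let UTF-7 make it 7-bit.
    text = json.dumps(obj, ensure_ascii=False)
    return text.encode("utf-7").decode("ascii")

def from_ascii(wire):
    # Reverse the steps: ASCII str -> UTF-7 bytes -> JSON text -> object.
    return json.loads(wire.encode("ascii").decode("utf-7"))

obj = {"s": u"asdf\u1234zxcv"}
wire = to_ascii(obj)
print(wire)
print(from_ascii(wire) == obj)  # True
```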

Another option would be to JSON-encode in pure-ASCII mode:

>>> json.dumps([s],ensure_ascii=True)
'["asdf\\u1234zxcv"]'
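For comparison, here is a Python 3 sketch showing the default next to `ensure_ascii=False`. `is_ascii` is a hypothetical helper, not part of the json module:

```python
import json

def is_ascii(text):
    # True if every character is in the 7-bit ASCII range.
    return all(ord(c) < 128 for c in text)

s = u"asdf\u1234zxcv"

escaped = json.dumps([s])                   # ensure_ascii=True is the default
raw = json.dumps([s], ensure_ascii=False)   # keeps U+1234 as a real character

print(escaped)                              # ["asdf\u1234zxcv"]
print(is_ascii(escaped), is_ascii(raw))     # True False
print(json.loads(escaped) == [s])           # True
```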

Would that cover it?

ChrisA



#39680

From: Irmen de Jong <irmen.NOSPAM@xs4all.nl>
Date: 2013-02-23 17:05 +0100
Message-ID: <5128e8ad$0$6843$e4fe514c@news.xs4all.nl>
In reply to: #39676
On 23-2-2013 16:45, Paul Moore wrote:
> I need to transfer some data (nothing fancy, some dictionaries, strings, numbers and
> lists, basically) between 2 Python processes. However, the data (string values) is
> potentially not ASCII, but the transport is (I'm piping between 2 processes, but
> thanks to nasty encoding issues, the only characters I can be sure won't be mangled
> are ASCII).
> 
> What's the best ASCII-only protocol to use that's portable between versions of Python
> back to about 2.6/2.7 and in the stdlib, so I don't need external modules?
> 
> At the moment, I'm using
> 
> encoded = json.dumps([ord(c) for c in json.dumps(obj)])
> decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
> 
> The double-encoding ensures that non-ASCII characters don't make it into the result.

Eww.

> 
> This works fine, but is there something simpler (i.e., less of a hack!) that I could
> use? (Base64 and the like don't work because they encode bytes->strings, not
> strings->strings).

For Python < 3.0, strings and bytes are the same:

>>> import base64
>>> base64.b64encode("hello there")
'aGVsbG8gdGhlcmU='
>>> base64.b64decode(_)
'hello there'
>>>
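On Python 3, base64 works on bytes, so the encode/decode steps that Chris mentions are what bridge text and bytes. The following is a minimal sketch, assuming UTF-8 as the inner encoding. `to_wire` and `from_wire` are hypothetical names:

```python
import base64
import json

def to_wire(obj):
    # str -> UTF-8 bytes -> base64 bytes -> ASCII str
    data = json.dumps(obj, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(data).decode("ascii")

def from_wire(wire):
    # ASCII str -> base64 bytes -> UTF-8 bytes -> str -> object
    data = base64.b64decode(wire.encode("ascii"))
    return json.loads(data.decode("utf-8"))

obj = {"x": "\u20ac", "n": [1, 2]}
wire = to_wire(obj)
print(from_wire(wire) == obj)  # True

# Same result as the Python 2 session, once bytes are made explicit.
print(base64.b64encode(b"hello there").decode("ascii"))  # aGVsbG8gdGhlcmU=
```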


Other than that, maybe a simple repr(stuff) / ast.literal_eval(string) would do the job?
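The repr / literal_eval route has one cross-version catch. In Python 3, repr() leaves printable non-ASCII characters unescaped, while ascii() escapes them. Python 2's repr() of a unicode string already uses \u escapes. Python 3.3 accepts u'' literals again, so literal_eval should also read Python 2 output such as {u'x': u'\u20ac'}, though that is worth testing on the versions involved. The following sketch assumes Python 3.7+ (for dict ordering in the printed output) and uses a made-up `data` dict:

```python
import ast

data = {"x": "\u20ac", "nums": [1, 2.5], "flag": True}

# In Python 3, repr() leaves printable non-ASCII characters as-is,
# so the euro sign would reach the pipe unescaped.
print(all(ord(c) < 128 for c in repr(data)))  # False

# ascii() is repr() with non-ASCII characters escaped (\x, \u, \U).
wire = ascii(data)
print(wire)  # {'x': '\u20ac', 'nums': [1, 2.5], 'flag': True}
print(ast.literal_eval(wire) == data)  # True
```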


Irmen



#39681

From: Jussi Piitulainen <jpiitula@ling.helsinki.fi>
Date: 2013-02-23 18:06 +0200
Message-ID: <qotwqtzf218.fsf@ruuvi.it.helsinki.fi>
In reply to: #39676
Paul  Moore writes:

> I need to transfer some data (nothing fancy, some dictionaries,
> strings, numbers and lists, basically) between 2 Python
> processes. However, the data (string values) is potentially not
> ASCII, but the transport is (I'm piping between 2 processes, but
> thanks to nasty encoding issues, the only characters I can be sure
> won't be mangled are ASCII).
> 
> What's the best ASCII-only protocol to use that's portable between
> versions of Python back to about 2.6/2.7 and in the stdlib, so I
> don't need external modules?
> 
> At the moment, I'm using
> 
> encoded = json.dumps([ord(c) for c in json.dumps(obj)])
> decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
> 
> The double-encoding ensures that non-ASCII characters don't make it
> into the result.
> 
> This works fine, but is there something simpler (i.e., less of a
> hack!) that I could use? (Base64 and the like don't work because
> they encode bytes->strings, not strings->strings).

I don't know much about these things, but I've been using Python's
json.dump and json.load for a couple of weeks now and they seem to use
ASCII-friendly escapes automatically, writing a four-character string
as "\u00e4\u00e4ni" instead of using the UTF-8 characters that my
environment is set to handle. That's written to stdout which is then
directed to a file in a shell script, and I copy-pasted it here from
the resulting file.

I'm using Python 3.3, though.
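The behaviour described above can be reproduced with an in-memory text stream standing in for stdout and the file. This is a small Python 3 sketch, and `io.StringIO` is only a stand-in for the real pipe:

```python
import io
import json

buf = io.StringIO()
# json.dump writes to a text stream; ensure_ascii=True is the default.
json.dump("\u00e4\u00e4ni", buf)
print(buf.getvalue())  # "\u00e4\u00e4ni"

# Reading it back restores the four-character string.
buf.seek(0)
print(json.load(buf) == "\u00e4\u00e4ni")  # True
```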



#39701

From: Paul Moore <p.f.moore@gmail.com>
Date: 2013-02-23 11:00 -0800
Message-ID: <37a63b6b-20f3-48b8-8ba7-b0475961ec2f@googlegroups.com>
In reply to: #39681
On Saturday, 23 February 2013 16:06:11 UTC, Jussi Piitulainen  wrote:
> I don't know much of these things but I've been using Python's
> json.dump and json.load for a couple of weeks now and they seem to use
> ASCII-friendly escapes automatically, writing a four-character string
> as "\u00e4\u00e4ni" instead of using the UTF-8 characters that my
> environment is set to handle.

Thanks. When I tried to write a short program to demo what I was doing, I realised that my problem was actually with my test code, not with json. Here's my test code:

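import json, subprocess
# Not a raw string, so (in a Python 3 parent) \N{EURO SIGN} is expanded
# here, before the child process ever sees the code.
CODE="""
import json
p = {'x': '\N{EURO SIGN}'}
print json.dumps(p)
"""
# 'py -2' is the Windows launcher, running CODE under Python 2.
data_bytes = subprocess.check_output(['py', '-2', '-c', CODE])
data = json.loads(data_bytes.decode('ASCII'))
print(data)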

The problem is that I'm not using a raw string for CODE, so the Euro sign is being put into the string literally, and that causes all sorts of encoding-related fun that I didn't intend!
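The raw-string fix can be sketched as follows. This version is portable, so it swaps the original `py -2` launcher call for `sys.executable`, which means the child here runs Python 3, not Python 2. A Python 2 child would need `u'\N{EURO SIGN}'` (Python 2 recognises `\N{...}` only in unicode literals) and `print json.dumps(p)`:

```python
import json
import subprocess
import sys

# Raw string: the child interpreter, not this one, sees \N{EURO SIGN}
# and turns it into the euro sign.
CODE = r"""
import json
p = {'x': '\N{EURO SIGN}'}
print(json.dumps(p))
"""

# sys.executable stands in for the original 'py -2' launcher call.
data_bytes = subprocess.check_output([sys.executable, "-c", CODE])
# json.dumps escapes non-ASCII by default, so the child's output is ASCII.
data = json.loads(data_bytes.decode("ascii"))
print(data == {'x': '\N{EURO SIGN}'})  # True
```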

As you say, json actually works fine for this application, so thanks for pointing that out. I thought it shouldn't need to be as hard as I was making it!!!

Paul.




