Groups > comp.lang.python > #39676 > unrolled thread
| Started by | Paul Moore <p.f.moore@gmail.com> |
|---|---|
| First post | 2013-02-23 07:45 -0800 |
| Last post | 2013-02-23 11:00 -0800 |
| Articles | 5 — 4 participants |
Good cross-version ASCII serialisation protocol for simple types Paul Moore <p.f.moore@gmail.com> - 2013-02-23 07:45 -0800
Re: Good cross-version ASCII serialisation protocol for simple types Chris Angelico <rosuav@gmail.com> - 2013-02-24 03:00 +1100
Re: Good cross-version ASCII serialisation protocol for simple types Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2013-02-23 17:05 +0100
Re: Good cross-version ASCII serialisation protocol for simple types Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-02-23 18:06 +0200
Re: Good cross-version ASCII serialisation protocol for simple types Paul Moore <p.f.moore@gmail.com> - 2013-02-23 11:00 -0800
| From | Paul Moore <p.f.moore@gmail.com> |
|---|---|
| Date | 2013-02-23 07:45 -0800 |
| Subject | Good cross-version ASCII serialisation protocol for simple types |
| Message-ID | <f1d3acb6-94e2-48f6-8ccd-042b929d0ef4@googlegroups.com> |
I need to transfer some data (nothing fancy, some dictionaries, strings, numbers and lists, basically) between 2 Python processes. However, the data (string values) is potentially not ASCII, but the transport is (I'm piping between 2 processes, but thanks to nasty encoding issues, the only characters I can be sure won't be mangled are ASCII).
What's the best ASCII-only protocol to use that's portable between versions of Python back to about 2.6/2.7 and in the stdlib, so I don't need external modules?
At the moment, I'm using
encoded = json.dumps([ord(c) for c in json.dumps(obj)])
decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
The double-encoding ensures that non-ASCII characters don't make it into the result.
This works fine, but is there something simpler (i.e., less of a hack!) that I could use? (Base64 and the like don't work because they encode bytes->strings, not strings->strings).
Thanks,
Paul
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-02-24 03:00 +1100 |
| Message-ID | <mailman.2353.1361635213.2939.python-list@python.org> |
| In reply to | #39676 |
On Sun, Feb 24, 2013 at 2:45 AM, Paul Moore <p.f.moore@gmail.com> wrote:
> At the moment, I'm using
>
> encoded = json.dumps([ord(c) for c in json.dumps(obj)])
> decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
>
> The double-encoding ensures that non-ASCII characters don't make it into the result.
>
> This works fine, but is there something simpler (i.e., less of a hack!) that I could use? (Base64 and the like don't work because they encode bytes->strings, not strings->strings).
Hmm. How likely is it that you'll have non-ASCII characters in the
input? If they're fairly uncommon, you could use UTF-7 - it's fairly
space-efficient when the input is mostly ASCII, but inefficient on
other characters.
Not sure what the problem is with bytes vs strings; you can always do
an encode("ascii") or decode("ascii") to convert 7-bit strings between
those types.
With that covered, I'd just go with a single JSON packaging, and work
with the resulting Unicode string.
Python 2.6:
>>> s=u"asdf\u1234zxcv"
>>> s.encode("utf-7").decode("ascii")
u'asdf+EjQ-zxcv'
Python 3.3:
>>> s=u"asdf\u1234zxcv"
>>> s.encode("utf-7").decode("ascii")
'asdf+EjQ-zxcv'
Another option would be to JSON-encode in pure-ASCII mode:
>>> json.dumps([s],ensure_ascii=True)
'["asdf\\u1234zxcv"]'
Would that cover it?
ChrisA
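A minimal sketch of the UTF-7 suggestion above, written so the same source should run on Python 2.6+ and 3.3+. The helper names `to_ascii_utf7` and `from_ascii_utf7` are illustrative, not part of any library; only `str.encode`/`bytes.decode` with the standard `utf-7` and `ascii` codecs are used.

```python
def to_ascii_utf7(text):
    """Return `text` encoded as UTF-7, as an ASCII-only text string."""
    # encode() yields bytes containing only 7-bit values;
    # decode("ascii") turns them back into a text string.
    return text.encode("utf-7").decode("ascii")


def from_ascii_utf7(wire):
    """Reverse to_ascii_utf7()."""
    return wire.encode("ascii").decode("utf-7")


s = u"asdf\u1234zxcv"
wire = to_ascii_utf7(s)          # 'asdf+EjQ-zxcv', as in the session above
restored = from_ascii_utf7(wire)
```

Characters that are already ASCII pass through largely unchanged, which is why this is compact for mostly-ASCII input and bulky otherwise.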
| From | Irmen de Jong <irmen.NOSPAM@xs4all.nl> |
|---|---|
| Date | 2013-02-23 17:05 +0100 |
| Message-ID | <5128e8ad$0$6843$e4fe514c@news.xs4all.nl> |
| In reply to | #39676 |
On 23-2-2013 16:45, Paul Moore wrote:
> I need to transfer some data (nothing fancy, some dictionaries, strings, numbers and
> lists, basically) between 2 Python processes. However, the data (string values) is
> potentially not ASCII, but the transport is (I'm piping between 2 processes, but
> thanks to nasty encoding issues, the only characters I can be sure won't be mangled
> are ASCII).
>
> What's the best ASCII-only protocol to use that's portable between versions of Python
> back to about 2.6/2.7 and in the stdlib, so I don't need external modules?
>
> At the moment, I'm using
>
> encoded = json.dumps([ord(c) for c in json.dumps(obj)])
> decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
>
> The double-encoding ensures that non-ASCII characters don't make it into the result.
Eww.
>
> This works fine, but is there something simpler (i.e., less of a hack!) that I could
> use? (Base64 and the like don't work because they encode bytes->strings, not
> strings->strings).
For Python < 3.0, strings and bytes are the same;
>>> import base64
>>> base64.b64encode("hello there")
'aGVsbG8gdGhlcmU='
>>> base64.b64decode(_)
'hello there'
>>>
Other than that, maybe a simple repr(stuff) / ast.literal_eval(string) might do the job?
Irmen
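A rough sketch of both ideas above, under some assumptions. For Base64, the text is first encoded to UTF-8 bytes, which sidesteps the bytes-versus-strings question on Python 3. For the repr approach, note that on Python 3 `repr()` leaves non-ASCII characters unescaped, so the sketch uses the built-in `ascii()` there and falls back to `repr()` on Python 2. The names `text_to_b64`, `b64_to_text` and `to_ascii_repr` are illustrative only.

```python
import ast
import base64


def text_to_b64(text):
    """Text -> UTF-8 bytes -> Base64, returned as an ASCII-only text string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def b64_to_text(wire):
    """Reverse text_to_b64()."""
    return base64.b64decode(wire.encode("ascii")).decode("utf-8")


try:
    to_ascii_repr = ascii   # Python 3: like repr(), but escapes non-ASCII
except NameError:
    to_ascii_repr = repr    # Python 2: repr() already escapes non-ASCII

b64_wire = text_to_b64(u"hello there")

data = {"x": u"\u20ac", "n": [1, 2.5]}
repr_wire = to_ascii_repr(data)            # ASCII-only literal text
restored = ast.literal_eval(repr_wire)     # parses literals only, runs no code
```

One caveat on the repr route: Python 2 writes unicode strings with a `u` prefix, which Python 3 accepts again only from 3.3 onwards.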
| From | Jussi Piitulainen <jpiitula@ling.helsinki.fi> |
|---|---|
| Date | 2013-02-23 18:06 +0200 |
| Message-ID | <qotwqtzf218.fsf@ruuvi.it.helsinki.fi> |
| In reply to | #39676 |
Paul Moore writes:
> I need to transfer some data (nothing fancy, some dictionaries,
> strings, numbers and lists, basically) between 2 Python
> processes. However, the data (string values) is potentially not
> ASCII, but the transport is (I'm piping between 2 processes, but
> thanks to nasty encoding issues, the only characters I can be sure
> won't be mangled are ASCII).
>
> What's the best ASCII-only protocol to use that's portable between
> versions of Python back to about 2.6/2.7 and in the stdlib, so I
> don't need external modules?
>
> At the moment, I'm using
>
> encoded = json.dumps([ord(c) for c in json.dumps(obj)])
> decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
>
> The double-encoding ensures that non-ASCII characters don't make it
> into the result.
>
> This works fine, but is there something simpler (i.e., less of a
> hack!) that I could use? (Base64 and the like don't work because
> they encode bytes->strings, not strings->strings).
I don't know much of these things but I've been using Python's
json.dump and json.load for a couple of weeks now and they seem to use
ASCII-friendly escapes automatically, writing a four-character string
as "\u00e4\u00e4ni" instead of using the UTF-8 characters that my
environment is set to handle. That's written to stdout which is then
directed to a file in a shell script, and I copy-pasted it here from
the resulting file.
I'm using Python 3.3, though.
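The behaviour described above comes from the `ensure_ascii` parameter of `json.dumps`, which defaults to `True`. A small sketch using the same four-character word; the variable names are illustrative:

```python
import json

word = u"\u00e4\u00e4ni"           # four characters, two of them non-ASCII

wire = json.dumps(word)            # non-ASCII characters become \uXXXX escapes
restored = json.loads(wire)        # the escapes are decoded again on load

nested = {"words": [word], "count": 1}
nested_wire = json.dumps(nested)   # the same escaping applies inside containers
```

Because every non-ASCII character is written as an escape sequence, the output can be sent over an ASCII-only pipe without a second encoding step.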
| From | Paul Moore <p.f.moore@gmail.com> |
|---|---|
| Date | 2013-02-23 11:00 -0800 |
| Message-ID | <37a63b6b-20f3-48b8-8ba7-b0475961ec2f@googlegroups.com> |
| In reply to | #39681 |
On Saturday, 23 February 2013 16:06:11 UTC, Jussi Piitulainen wrote:
> I don't know much of these things but I've been using Python's
> json.dump and json.load for a couple of weeks now and they seem to use
> ASCII-friendly escapes automatically, writing a four-character string
> as "\u00e4\u00e4ni" instead of using the UTF-8 characters that my
> environment is set to handle.
Thanks. When I tried to write a short program to demo what I was doing, I realised that my problem was actually with my test code, not with json. Here's my test code:
import json, subprocess
CODE="""
import json
p = {'x': '\N{EURO SIGN}'}
print json.dumps(p)
"""
data_bytes = subprocess.check_output(['py', '-2', '-c', CODE])
data = json.loads(data_bytes.decode('ASCII'))
print(data)
The problem is that I'm not using a raw string for CODE, so the Euro sign is being put into the string literally, and that causes all sorts of encoding-related fun that I didn't intend!
As you say, json actually works fine for this application, so thanks for pointing that out. I thought it shouldn't need to be as hard as I was making it!!!
Paul.
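A sketch of the test with CODE as a raw string, so that `\N{EURO SIGN}` reaches the child process as an escape sequence instead of a literal Euro sign. Several details here are assumptions made so the example is self-contained: it launches the child with `sys.executable` instead of `py -2`, and the child code uses a `u''` literal and `print(...)` with parentheses so that it should be valid on both Python 2.7 and 3.3+.

```python
import json
import subprocess
import sys

# Raw string: the backslash in \N{EURO SIGN} is passed through untouched,
# and the child interpreter is the one that interprets the escape.
CODE = r"""
import json
p = {'x': u'\N{EURO SIGN}'}
print(json.dumps(p))
"""

# Assumption: run the child under the current interpreter; replace the first
# items of the list with ['py', '-2'] to target Python 2 on Windows.
data_bytes = subprocess.check_output([sys.executable, '-c', CODE])

# json.dumps escapes non-ASCII by default, so the pipe carries only ASCII.
data = json.loads(data_bytes.decode('ascii'))
print(data)
```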