Groups > comp.lang.python > #17422 > unrolled thread
| Started by | Evan Driscoll <edriscoll@wisc.edu> |
|---|---|
| First post | 2011-12-17 19:29 -0600 |
| Last post | 2011-12-19 15:18 -0800 |
| Articles | 3 — 2 participants |
Parsing stream of JSON objects incrementally Evan Driscoll <edriscoll@wisc.edu> - 2011-12-17 19:29 -0600
Re: Parsing stream of JSON objects incrementally Miki Tebeka <miki.tebeka@gmail.com> - 2011-12-19 15:18 -0800
| From | Evan Driscoll <edriscoll@wisc.edu> |
|---|---|
| Date | 2011-12-17 19:29 -0600 |
| Subject | Parsing stream of JSON objects incrementally |
| Message-ID | <mailman.3772.1324171874.27778.python-list@python.org> |
I'm interested in writing two programs, A and B, which communicate using
JSON. At a high level, A wants to transfer an array to B.
However, I would very much like to make it possible for A and B to run
in parallel, so my current plan is to have A output and B read a
*sequence* of JSON objects. In other words, instead of
    [ {"a": 0}, {"b": 0}, {"c": 0} ]
it would just send
    {"a": 0}
    {"b": 0}
    {"c": 0}
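As an illustration of the sending side, here is a minimal sketch of how A might write such a stream. The function name `emit_objects` and its `out` parameter are hypothetical and do not come from the thread. It relies only on `json.dumps`, which by default produces no newlines, and it flushes after each object so that B can start reading before A has finished.

```python
import json
import sys

def emit_objects(items, out=sys.stdout):
    """Write each item as its own JSON document, one per line.

    `emit_objects` and `out` are illustrative names, not part of any library.
    """
    for item in items:
        out.write(json.dumps(item) + "\n")
        out.flush()  # push each object out so the reader can work in parallel
```

Calling `emit_objects([{"a": 0}, {"b": 0}, {"c": 0}])` would print the three-line form shown above.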
I know about the raw_decode() method of the json.JSONDecoder class,
and that gets me most of the way there.
However, what I'm *not* sure about is the best way to get the input to
the raw_decode() function, which expects a "string or buffer":
    >>> d = json.JSONDecoder()
    >>> d.raw_decode(sys.stdin)
    Traceback (most recent call last):
      ...
      File "json\scanner.py", line 42, in iterscan
        match = self.scanner.scanner(string, idx).match
    TypeError: expected string or buffer
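For comparison, raw_decode() does work when it is handed a string instead of a file object. The short sketch below, using made-up input text, shows what it returns: the decoded value plus the index at which parsing stopped, which is what makes it usable for pulling several objects out of one buffer.

```python
import json

decoder = json.JSONDecoder()
text = '{"a": 0} {"b": 0}'  # illustrative input: two objects in one string

# raw_decode() parses one value from the start of the string and
# returns it together with the index where parsing stopped.
first, end = decoder.raw_decode(text)

# raw_decode() does not skip leading whitespace, so strip the
# remainder before decoding the next value.
rest = text[end:].lstrip()
second, _ = decoder.raw_decode(rest)
```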
Now I'm not very familiar with the buffer and how it could be used (and
whether a file or stdin could be used as one in an incremental fashion),
but the best way I can come up with is the following:
1. Read a line of input
2. Try to decode it
3. If decoding fails, read another line, append it to what has been
   read so far, and try again
4. Repeat until an object decodes
That seems... inelegant at least.
Some other information:
* I'm looking for a 2.7 solution ideally
* I'd prefer not to use a different JSON library entirely
* As suggested, I *am* willing to wait for a newline to do processing
* However, I don't want to require exactly one object per line (and I
  want to allow both multiple objects on one line and newlines within
  an object)
Evan
| From | Miki Tebeka <miki.tebeka@gmail.com> |
|---|---|
| Date | 2011-12-19 15:18 -0800 |
| Message-ID | <22477795.145.1324336699742.JavaMail.geo-discussion-forums@yqir6> |
| In reply to | #17422 |
You probably need to accumulate a buffer and try to decode it; when decoding succeeds, return the object and read more. See the example below (note that for sockets, select might be a better option for reading data).
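    import json
    import sys
    from time import sleep

    def json_decoder(fo):
        buff = ''
        decode = json.JSONDecoder().raw_decode
        while True:
            line = fo.readline()
            if not line:
                break  # EOF
            # raw_decode() does not skip leading whitespace, so strip it here.
            buff = (buff + line).lstrip()
            print('BUFF: {}'.format(buff))
            # Decode every complete object currently in the buffer, so that
            # several objects on one line are all returned.
            while buff:
                try:
                    obj, i = decode(buff)
                except ValueError as e:
                    # Object is still incomplete: go back and read more input.
                    print('ERR: {}'.format(e))
                    sleep(0.01)  # select will probably be a better option :)
                    break
                buff = buff[i:].lstrip()
                yield obj

    def main():
        for obj in json_decoder(sys.stdin):
            print(obj)

    if __name__ == '__main__':
        main()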
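The select suggestion for sockets could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `json_objects_from_socket` and its `timeout` and `bufsize` parameters are hypothetical, the peer is assumed to send UTF-8, and every top-level value is assumed to be an object or array (a bare number split across two reads could otherwise be decoded too early). Any incomplete data left in the buffer when the peer closes the connection is silently dropped.

```python
import codecs
import json
import select

def json_objects_from_socket(sock, timeout=1.0, bufsize=4096):
    """Yield JSON values as they arrive on a connected socket.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    decode = json.JSONDecoder().raw_decode
    # An incremental decoder copes with a multi-byte UTF-8 character
    # that happens to be split across two recv() calls.
    to_text = codecs.getincrementaldecoder('utf-8')()
    buff = ''
    while True:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            continue  # timed out; a real program could do other work here
        data = sock.recv(bufsize)
        if not data:
            break  # peer closed the connection
        # raw_decode() does not skip leading whitespace, so strip it.
        buff = (buff + to_text.decode(data)).lstrip()
        # Decode every complete value currently in the buffer.
        while buff:
            try:
                obj, end = decode(buff)
            except ValueError:
                break  # value is incomplete: wait for more data
            yield obj
            buff = buff[end:].lstrip()
```

Because it decodes from an accumulated buffer rather than line by line, this handles both several objects arriving in one read and a single object spread over several reads.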