


Parsing stream of JSON objects incrementally

Started by: Evan Driscoll <edriscoll@wisc.edu>
First post: 2011-12-17 19:29 -0600
Last post: 2011-12-19 15:18 -0800
Articles 3 — 2 participants


#17422 — Parsing stream of JSON objects incrementally

From: Evan Driscoll <edriscoll@wisc.edu>
Date: 2011-12-17 19:29 -0600
Subject: Parsing stream of JSON objects incrementally
Message-ID: <mailman.3772.1324171874.27778.python-list@python.org>


I'm interested in writing two programs, A and B, which communicate using
JSON. At a high level, A wants to transfer an array to B.

However, I would very much like to make it possible for A and B to run
in parallel, so my current plan is to have A output and B read a
*sequence* of JSON objects. In other words, instead of
  [ {"a": 0},  {"b":0}, {"c": 0} ]
it would just send
  {"a": 0}
  {"b": 0}
  {"c": 0}
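
For illustration, here is a minimal sketch of what the producer side (program A) might look like. It assumes A writes each object followed by a newline and flushes after every object so that B can start working right away. The function name emit_objects is hypothetical and not part of the original post.

```python
import json
import sys


def emit_objects(items, out=sys.stdout):
    """Write each item as its own JSON document, one per line."""
    for item in items:
        out.write(json.dumps(item) + "\n")
        out.flush()  # push the object to the reader immediately


if __name__ == "__main__":
    emit_objects([{"a": 0}, {"b": 0}, {"c": 0}])
```

Flushing after each object is what lets the two programs overlap; without it, output to a pipe is typically held in a block buffer until it fills up.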

I know about the raw_decode() method of the json.JSONDecoder class,
and that gets me most of the way there.

However, what I'm *not* sure about is the best way to get the input to
the raw_decode() function, which expects a "string or buffer":

  >>> d = json.JSONDecoder()
  >>> d.raw_decode(sys.stdin)
  Traceback (most recent call last):
    ...
    File "json\scanner.py", line 42, in iterscan
      match = self.scanner.scanner(string, idx).match
  TypeError: expected string or buffer
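
For comparison, a small sketch of raw_decode() applied to a string rather than a file object. It returns the decoded value together with the index at which decoding stopped, so the remainder of the string can be decoded next. The sample text is made up for illustration, and the note about leading whitespace reflects CPython's json module as far as I know.

```python
import json

decoder = json.JSONDecoder()
text = '{"a": 0} {"b": 0}'

# raw_decode() returns the decoded value and the index where it stopped.
obj, end = decoder.raw_decode(text)
print(obj, end)

# raw_decode() does not skip leading whitespace, so strip the separator
# before decoding the next object.
rest = text[end:].lstrip()
obj2, end2 = decoder.raw_decode(rest)
print(obj2, end2)
```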

Now I'm not very familiar with buffers and how they could be used (and
whether a file or stdin could serve as one in an incremental fashion),
but the best approach I can come up with is the following:

1. Read a line of input
2. Try to decode it
3. If that fails, read another line, append it to the buffer, and try again
4. Repeat until decoding succeeds

That seems... inelegant at least.


Some other information:

* I'm looking for a 2.7 solution ideally
* I'd prefer not to use a different JSON library entirely
* As suggested, I *am* willing to wait for a newline to do processing
* However, I don't want to require exactly one object per line (and
  want to allow both multiple objects on one line and newlines within
  an object)

Evan



#17528

From: Miki Tebeka <miki.tebeka@gmail.com>
Date: 2011-12-19 15:18 -0800
Message-ID: <22477795.145.1324336699742.JavaMail.geo-discussion-forums@yqir6>
In reply to: #17422

You probably need to accumulate a buffer and try to decode it; when decoding succeeds, return the object and read more. See the example below (note that for sockets, select might be a better option for reading data).

import json
import sys


def json_decoder(fo):
    """Yield JSON objects from file object fo as each one becomes complete."""
    buff = ''
    decode = json.JSONDecoder().raw_decode
    while True:
        line = fo.readline()
        if not line:
            break  # EOF
        # raw_decode() does not skip leading whitespace, so strip it here.
        buff = (buff + line).lstrip()
        # Drain every complete object; one line may hold more than one.
        while buff:
            try:
                obj, i = decode(buff)
            except ValueError:
                break  # object is incomplete: wait for the next line
            buff = buff[i:].lstrip()
            yield obj


def main():
    for obj in json_decoder(sys.stdin):
        print(obj)

if __name__ == '__main__':
    main()
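
If the data arrives in arbitrary chunks rather than whole lines (for example from a socket read loop driven by select), the same accumulate-and-retry idea can be sketched over an iterable of text chunks. This is only an illustration under that assumption; the name iter_json and the sample chunks are hypothetical.

```python
import json


def iter_json(chunks):
    """Yield JSON values from an iterable of text chunks of any size."""
    decode = json.JSONDecoder().raw_decode
    buff = ''
    for chunk in chunks:
        # raw_decode() does not skip leading whitespace, so strip it.
        buff = (buff + chunk).lstrip()
        while buff:
            try:
                obj, end = decode(buff)
            except ValueError:
                break  # value is incomplete: wait for the next chunk
            buff = buff[end:].lstrip()
            yield obj


# Two objects share the first chunk, one spans a chunk boundary,
# and one contains a newline.
chunks = ['{"a": 0} {"b"', ': 0}\n{"c":\n', ' 0}\n']
print(list(iter_json(chunks)))
```

Two limitations are worth keeping in mind. A bare top-level number split across chunks could be decoded too early, so this is safest when every top-level value is an object or array. Malformed input is never reported, because a ValueError is always treated as "need more data" and the buffer simply keeps growing.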




