
Error in processing JSON files in Python

Started by: Karthik Sharma <karthik.sharma@gmail.com>
First post: 2015-03-30 14:27 -0700
Last post: 2015-03-31 19:58 +0000
Articles: 3 (3 participants)


#88349 — Error in processing JSON files in Python

From: Karthik Sharma <karthik.sharma@gmail.com>
Date: 2015-03-30 14:27 -0700
Subject: Error in processing JSON files in Python
Message-ID: <63a251f6-01ff-4662-ab9c-588e3ffcd73a@googlegroups.com>

I have the following Python program, which reads a set of JSON files, does some processing on them, and dumps them back to the same folder. However, when I run the program below and then try to view an output JSON file using

`cat file.json | python -m json.tool`

I get the following error

`extra data: line 1 column 307 - line 1 column 852 (char 306 - 851)`

What is wrong with my program?
 
    #Process 'new' events to extract more info from 'Messages'
    rootDir = '/home/s_parts'
    for dirName, subdirList, fileList in os.walk(rootDir):
        print('Found directory: %s' % dirName)
        for fname in fileList:
            fname='s_parts/'+fname
            with open(fname, 'r+') as f:
                json_data = json.load(f)
                et = json_data['Et']
                ms = json_data['Ms']
                if (et == 'a.b.c.d') or (et == 'e.f.g.h'):
                    url = re.sub('.+roxy=([^& ]*).*', r'\1', ms)
                    nt = re.findall(r"NT:\s*([^,)]*)",ms)[0]
                    bt = re.findall(r"BT:\s*([^,)]*)",ms)[0]
                    xt = re.findall(r"XT:\s*([^,)]*)",ms)[0]
                    appde = ms.split('Appde:')[1].strip().split('<br>')[0]
                    version = ms.split('version:')[1].strip().split('<br>')[0]
                    json_data["url"] = url
                    json_data["BT"] = bt
                    json_data["XT"] = xt
                    json_data["NT"] = nt
                    json_data["Appde"] = appde
                    json_data["version"] = version
                else:
                    json_data["url"] = "null"
                    json_data["BT"] = "null"
                    json_data["XT"] = "null"
                    json_data["NT"] = "null"
                    json_data["Appde"] = "null"
                    json_data["version"] = "null"
                json.dump(json_data,f)

If I do a `file` command on the output file I get
`s_parts/data_95: ASCII text, with very long lines, with no line terminators` 



#88352

From: MRAB <python@mrabarnett.plus.com>
Date: 2015-03-31 00:18 +0100
Message-ID: <mailman.352.1427757495.10327.python-list@python.org>
In reply to: #88349

On 2015-03-30 22:27, Karthik Sharma wrote:
> I have the following Python program, which reads a set of JSON files, does some processing on them, and dumps them back to the same folder. However, when I run the program below and then try to view an output JSON file using
>
> `cat file.json | python -m json.tool`
>
> I get the following error
>
> `extra data: line 1 column 307 - line 1 column 852 (char 306 - 851)`
>
> What is wrong with my program?
>
>      #Process 'new' events to extract more info from 'Messages'
>      rootDir = '/home/s_parts'
>      for dirName, subdirList, fileList in os.walk(rootDir):
>          print('Found directory: %s' % dirName)
>          for fname in fileList:
>              fname='s_parts/'+fname
>              with open(fname, 'r+') as f:
>                  json_data = json.load(f)
>                  et = json_data['Et']
>                  ms = json_data['Ms']
>                  if (event == 'a.b.c.d') or (event == 'e.f.g.h'):
>                      url = re.sub('.+roxy=([^& ]*).*', r'\1', ms)
>                      nt = re.findall(r"NT:\s*([^,)]*)",ms)[0]
>                      bt = re.findall(r"BT:\s*([^,)]*)",ms)[0]
>                      xt = re.findall(r"XT:\s*([^,)]*)",ms)[0]
>                      appde = ms.split('Appde:')[1].strip().split('<br>')[0]
>                      version = ms.split('version:')[1].strip().split('<br>')[0]
>                      json_data["url"] = url
>                      json_data["BT"] = bt
>                      json_data["XT"] = xt
>                      json_data["NT"] = nt
>                      json_data["Appde"] = appde
>                      json_data["version"] = version
>                  else:
>                      json_data["url"] = "null"
>                      json_data["BT"] = "null"
>                      json_data["XT"] = "null"
>                      json_data["NT"] = "null"
>                      json_data["Appde"] = "null"
>                      json_data["version"] = "null"
>                 json.dump(json_data,f)
>
> If I do a `file` command on the output file I get
> `s_parts/data_95: ASCII text, with very long lines, with no line terminators`
>
open(fname, 'r+') opens the file for update. json.load(f) reads to the
end of the file, so json.dump(json_data,f) then writes from that
position, in effect _appending_ to it: the file now contains the old
data followed by the new data.
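
One way to keep the single 'r+' handle is to rewind before writing and cut off whatever is left afterwards. What follows is only a minimal sketch of that idea, not the original poster's program: the helper name update_json_in_place, the updates argument, and the temporary demo file are all assumptions made for illustration.

```python
import json
import os
import tempfile

def update_json_in_place(path, updates):
    """Load a JSON object from path, merge in updates, and overwrite the file."""
    with open(path, 'r+') as f:
        data = json.load(f)      # reading leaves the file position at end of file
        data.update(updates)
        f.seek(0)                # go back to the start before writing
        json.dump(data, f)
        f.truncate()             # drop leftover bytes if the new text is shorter
    return data

# Small demonstration on a throwaway file (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), 'data_demo.json')
with open(demo_path, 'w') as f:
    json.dump({'Et': 'a.b.c.d', 'Ms': 'some long message text'}, f)

update_json_in_place(demo_path, {'Ms': 'short', 'url': None})

with open(demo_path) as f:
    result = json.load(f)        # parses cleanly: the file holds one JSON object
```

The truncate() call matters whenever the rewritten text is shorter than the original; without it the tail of the old data would remain after the new object.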

Another point: "null" is a string and will be written as such. If you
actually want a null in the JSON data, then that should be None.
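
A short sketch of the difference, using json.dumps so the result is visible as a string (the "url" key is taken from the program above; the variable names are just for illustration):

```python
import json

as_string = json.dumps({"url": "null"})   # a four-character JSON string
as_null = json.dumps({"url": None})       # a real JSON null

print(as_string)   # {"url": "null"}
print(as_null)     # {"url": null}

# Reading the data back shows the same distinction on the Python side.
string_value = json.loads(as_string)["url"]
null_value = json.loads(as_null)["url"]
```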



#88400

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2015-03-31 19:58 +0000
Message-ID: <mfeu8b$qs$2@dont-email.me>
In reply to: #88349

On Mon, 30 Mar 2015 14:27:14 -0700, Karthik Sharma wrote:

> I have the following Python program, which reads a set of JSON files,
> does some processing on them, and dumps them back to the same folder.
> However, when I run the program below and then try to view an output
> JSON file using
> 
> `cat file.json | python -m json.tool`
> 
> I get the following error
> 
> `extra data: line 1 column 307 - line 1 column 852 (char 306 - 851)`
> 
> What is wrong with my program?
>  
>     #Process 'new' events to extract more info from 'Messages'
>     rootDir = '/home/s_parts'
>     for dirName, subdirList, fileList in os.walk(rootDir):
>         print('Found directory: %s' % dirName)
>         for fname in fileList:
>             fname='s_parts/'+fname
>             with open(fname, 'r+') as f:
>                 json_data = json.load(f)
>                 # do stuff to the data
>                 json.dump(json_data,f)

You're writing back to the same file that you loaded the data from, having
opened it in "r+" (read/update) mode, so the write lands at the end of the
file, after the data you just read.

This probably leads to a file containing two json objects, the original 
one and the one which you have processed.

That's a bad json file.
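
The effect can be reproduced without touching any files. This is a rough sketch with made-up data: two serialized objects are joined back to back, which is assumed to be what the file looks like after the load and dump, and the result is handed to the parser.

```python
import json

first = json.dumps({"Et": "a.b.c.d"})
second = json.dumps({"Et": "a.b.c.d", "url": None})
combined = first + second      # two JSON objects back to back, no separator

try:
    json.loads(combined)
    error_message = None
except ValueError as exc:      # on Python 3, json.JSONDecodeError subclasses ValueError
    error_message = str(exc)

print(error_message)           # an "Extra data" error, as in the original report
```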

Note the caution in the python documentation:

"Note - Unlike pickle and marshal, JSON is not a framed protocol, so 
trying to serialize multiple objects with repeated calls to dump() using 
the same fp will result in an invalid JSON file."

Writing back to the same file as you read with json.load is the same 
thing. If you want to use the same file, you need to close it after the 
json.load(), and then open it again in write mode before the json.dump(). 

Write is "w", *NOT* "w+".
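
Something along these lines, as a sketch only: rewrite_json and add_fields are hypothetical names, add_fields stands in for the "do stuff to the data" step, and the demo file is a throwaway.

```python
import json
import os
import tempfile

def rewrite_json(path, process):
    """Read a JSON file, close it, then reopen it in 'w' mode to write the result."""
    with open(path, 'r') as f:
        data = json.load(f)
    # The file is closed here; nothing is written until it is reopened.
    data = process(data)
    with open(path, 'w') as f:   # 'w' truncates, so the old contents are discarded
        json.dump(data, f)
    return data

def add_fields(record):
    # Hypothetical stand-in for the "do stuff to the data" step.
    record['url'] = None
    return record

demo_path = os.path.join(tempfile.mkdtemp(), 'data_demo.json')
with open(demo_path, 'w') as f:
    json.dump({'Et': 'a.b.c.d', 'Ms': 'text'}, f)

rewrite_json(demo_path, add_fields)

with open(demo_path) as f:
    reloaded = json.load(f)      # one object, parses without "Extra data"
```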

-- 
Denis McMahon, denismfmcmahon@gmail.com


