| Started by | Karthik Sharma <karthik.sharma@gmail.com> |
|---|---|
| First post | 2015-03-30 14:27 -0700 |
| Last post | 2015-03-31 19:58 +0000 |
| Articles | 3 — 3 participants |
Error in processing JSON files in Python Karthik Sharma <karthik.sharma@gmail.com> - 2015-03-30 14:27 -0700
Re: Error in processing JSON files in Python MRAB <python@mrabarnett.plus.com> - 2015-03-31 00:18 +0100
Re: Error in processing JSON files in Python Denis McMahon <denismfmcmahon@gmail.com> - 2015-03-31 19:58 +0000
| From | Karthik Sharma <karthik.sharma@gmail.com> |
|---|---|
| Date | 2015-03-30 14:27 -0700 |
| Subject | Error in processing JSON files in Python |
| Message-ID | <63a251f6-01ff-4662-ab9c-588e3ffcd73a@googlegroups.com> |
I have the following Python program that reads a set of JSON files, does some processing on them, and dumps them back to the same folder. However, when I run the program below and then try to view the output JSON file using
`cat file.json | python -m json.tool`
I get the following error
`extra data: line 1 column 307 - line 1 column 852 (char 306 - 851)`
What is wrong with my program?
    #Process 'new' events to extract more info from 'Messages'
    rootDir = '/home/s_parts'
    for dirName, subdirList, fileList in os.walk(rootDir):
        print('Found directory: %s' % dirName)
        for fname in fileList:
            fname='s_parts/'+fname
            with open(fname, 'r+') as f:
                json_data = json.load(f)
                et = json_data['Et']
                ms = json_data['Ms']
                if (event == 'a.b.c.d') or (event == 'e.f.g.h'):
                    url = re.sub('.+roxy=([^& ]*).*', r'\1', ms)
                    nt = re.findall(r"NT:\s*([^,)]*)",ms)[0]
                    bt = re.findall(r"BT:\s*([^,)]*)",ms)[0]
                    xt = re.findall(r"XT:\s*([^,)]*)",ms)[0]
                    appde = ms.split('Appde:')[1].strip().split('<br>')[0]
                    version = ms.split('version:')[1].strip().split('<br>')[0]
                    json_data["url"] = url
                    json_data["BT"] = bt
                    json_data["XT"] = xt
                    json_data["NT"] = nt
                    json_data["Appde"] = appde
                    json_data["version"] = version
                else:
                    json_data["url"] = "null"
                    json_data["BT"] = "null"
                    json_data["XT"] = "null"
                    json_data["NT"] = "null"
                    json_data["Appde"] = "null"
                    json_data["version"] = "null"
                json.dump(json_data,f)
If I do a `file` command on the output file I get
`s_parts/data_95: ASCII text, with very long lines, with no line terminators`
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2015-03-31 00:18 +0100 |
| Message-ID | <mailman.352.1427757495.10327.python-list@python.org> |
| In reply to | #88349 |
On 2015-03-30 22:27, Karthik Sharma wrote:
> I have the following Python program that reads a set of JSON files, does some processing on them, and dumps them back to the same folder. However, when I run the program below and then try to view the output JSON file using
>
> `cat file.json | python -m json.tool`
>
> I get the following error
>
> `extra data: line 1 column 307 - line 1 column 852 (char 306 - 851)`
>
> What is wrong with my program?
>
>     #Process 'new' events to extract more info from 'Messages'
>     rootDir = '/home/s_parts'
>     for dirName, subdirList, fileList in os.walk(rootDir):
>         print('Found directory: %s' % dirName)
>         for fname in fileList:
>             fname='s_parts/'+fname
>             with open(fname, 'r+') as f:
>                 json_data = json.load(f)
>                 et = json_data['Et']
>                 ms = json_data['Ms']
>                 if (event == 'a.b.c.d') or (event == 'e.f.g.h'):
>                     url = re.sub('.+roxy=([^& ]*).*', r'\1', ms)
>                     nt = re.findall(r"NT:\s*([^,)]*)",ms)[0]
>                     bt = re.findall(r"BT:\s*([^,)]*)",ms)[0]
>                     xt = re.findall(r"XT:\s*([^,)]*)",ms)[0]
>                     appde = ms.split('Appde:')[1].strip().split('<br>')[0]
>                     version = ms.split('version:')[1].strip().split('<br>')[0]
>                     json_data["url"] = url
>                     json_data["BT"] = bt
>                     json_data["XT"] = xt
>                     json_data["NT"] = nt
>                     json_data["Appde"] = appde
>                     json_data["version"] = version
>                 else:
>                     json_data["url"] = "null"
>                     json_data["BT"] = "null"
>                     json_data["XT"] = "null"
>                     json_data["NT"] = "null"
>                     json_data["Appde"] = "null"
>                     json_data["version"] = "null"
>                 json.dump(json_data,f)
>
> If I do a `file` command on the output file I get
> `s_parts/data_95: ASCII text, with very long lines, with no line terminators`
>
open(fname, 'r+') opens the file for update, json.load(f) reads to the
end of the file, and then json.dump(json_data,f) writes back to the file
at the current position, which is now the end. The new data is therefore
_appended_, so the file contains the old data followed by the new data.
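A minimal sketch of that behaviour, assuming Python 3 and a throwaway file in a temporary directory (the path and the `{"Et": "x"}` record are made up for illustration, not taken from the original data). It first reproduces the doubled-up file, then shows one possible in-place fix: rewind with `seek(0)` before dumping and call `truncate()` afterwards.

```python
import json
import os
import tempfile

# Hypothetical scratch file standing in for one of the s_parts files.
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w") as f:
    json.dump({"Et": "x"}, f)

# Same pattern as the original program: load and dump on one 'r+' handle.
with open(path, "r+") as f:
    data = json.load(f)   # reads to the end of the file
    data["url"] = None
    json.dump(data, f)    # lands at the current position, i.e. the end

with open(path) as f:
    broken = f.read()
print(broken)  # {"Et": "x"}{"Et": "x", "url": null}

# Reset the file, then overwrite in place: rewind, write, drop leftovers.
with open(path, "w") as f:
    json.dump({"Et": "x"}, f)
with open(path, "r+") as f:
    data = json.load(f)
    data["url"] = None
    f.seek(0)             # go back to the start before writing
    json.dump(data, f)
    f.truncate()          # cut off anything left over from the old content

with open(path) as f:
    fixed = json.load(f)  # parses cleanly: a single JSON object
```

The `truncate()` call matters whenever the new JSON text could be shorter than the old one; without it, the tail of the old content would remain after the new object.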
Another point: "null" is a string and will be written as such. If you
actually want a null in the JSON data, then that should be None.
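To illustrate the difference, a short sketch using `json.dumps` (the `"url"` key is just borrowed from the program above):

```python
import json

# The string "null" is serialized as a quoted JSON string.
as_string = json.dumps({"url": "null"})
print(as_string)  # {"url": "null"}

# None is serialized as the JSON literal null.
as_null = json.dumps({"url": None})
print(as_null)    # {"url": null}

# Loading it back gives None again, not the string "null".
round_trip = json.loads(as_null)["url"]
```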
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-03-31 19:58 +0000 |
| Message-ID | <mfeu8b$qs$2@dont-email.me> |
| In reply to | #88349 |
On Mon, 30 Mar 2015 14:27:14 -0700, Karthik Sharma wrote:
> I have the following Python program that reads a set of JSON files, does
> some processing on them, and dumps them back to the same folder. However,
> when I run the program below and then try to view the output JSON file
> using
>
> `cat file.json | python -m json.tool`
>
> I get the following error
>
> `extra data: line 1 column 307 - line 1 column 852 (char 306 - 851)`
>
> What is wrong with my program?
>
>     #Process 'new' events to extract more info from 'Messages'
>     rootDir = '/home/s_parts'
>     for dirName, subdirList, fileList in os.walk(rootDir):
>         print('Found directory: %s' % dirName)
>         for fname in fileList:
>             fname='s_parts/'+fname
>             with open(fname, 'r+') as f:
>                 json_data = json.load(f)
>                 # do stuff to the data
>                 json.dump(json_data,f)
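You're writing back to the same file you loaded the data from, having
opened it in update mode ('r+'). After json.load() the file position is
at the end of the file, so json.dump() adds its output after the
existing content.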
This probably leads to a file containing two json objects, the original
one and the one which you have processed.
That's a bad json file.
Note the caution in the python documentation:
"Note - Unlike pickle and marshal, JSON is not a framed protocol, so
trying to serialize multiple objects with repeated calls to dump() using
the same fp will result in an invalid JSON file."
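The situation in that note can be reproduced without touching any files. A small sketch, using an in-memory `io.StringIO` as the `fp` (my choice for illustration, not something from the original program):

```python
import io
import json

buf = io.StringIO()
json.dump({"a": 1}, buf)
json.dump({"b": 2}, buf)   # second dump on the same fp
text = buf.getvalue()
print(text)  # {"a": 1}{"b": 2}

# Two objects back to back are not one valid JSON document.
try:
    json.loads(text)
    error = None
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    error = str(exc)
print(error)
```

The message printed at the end is the same kind of "Extra data" complaint that `python -m json.tool` gave for the processed file.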
Writing back to the same file as you read with json.load is the same
thing. If you want to use the same file, you need to close it after the
json.load(), and then open it again in write mode before the json.dump().
Write is "w", *NOT* "w+".
--
Denis McMahon, denismfmcmahon@gmail.com
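A sketch of the close-and-reopen approach described above, assuming Python 3. The helper name `rewrite_json`, the `add_defaults` callback, and the temporary file are hypothetical, introduced only to make the example self-contained:

```python
import json
import os
import tempfile


def rewrite_json(path, update):
    """Load a JSON file, apply update(data) in place, and write it back."""
    # Read with a plain "r" handle; the with block closes it afterwards.
    with open(path, "r") as f:
        data = json.load(f)
    update(data)
    # Reopen with "w", which truncates the file before the dump.
    with open(path, "w") as f:
        json.dump(data, f)


def add_defaults(data):
    # Stand-in for the real processing; None becomes a JSON null.
    data["url"] = None


# Hypothetical scratch file standing in for s_parts/data_95.
path = os.path.join(tempfile.mkdtemp(), "data_95")
with open(path, "w") as f:
    json.dump({"Et": "x", "Ms": "y"}, f)

rewrite_json(path, add_defaults)

with open(path) as f:
    result = json.load(f)  # a single object again
```

Because all the processing happens before the file is reopened with "w", an exception while extracting fields leaves the original file untouched.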