Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #54344 > unrolled thread
| Started by | Bryan Britten <britten.bryan@gmail.com> |
|---|---|
| First post | 2013-09-17 18:42 -0700 |
| Last post | 2013-09-18 18:47 -0400 |
| Articles | 8 — 5 participants |
Back to article view | Back to comp.lang.python
*.csv to *.txt after adding columns Bryan Britten <britten.bryan@gmail.com> - 2013-09-17 18:42 -0700
Re: *.csv to *.txt after adding columns Dave Angel <davea@davea.name> - 2013-09-18 02:18 +0000
Re: *.csv to *.txt after adding columns Bryan Britten <britten.bryan@gmail.com> - 2013-09-17 19:28 -0700
Re: *.csv to *.txt after adding columns Dave Angel <davea@davea.name> - 2013-09-18 07:55 +0000
Re: *.csv to *.txt after adding columns Peter Otten <__peter__@web.de> - 2013-09-18 09:14 +0200
Re: *.csv to *.txt after adding columns rusi <rustompmody@gmail.com> - 2013-09-18 00:44 -0700
Re: *.csv to *.txt after adding columns Bryan Britten <britten.bryan@gmail.com> - 2013-09-18 04:42 -0700
Re: *.csv to *.txt after adding columns Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-09-18 18:47 -0400
| From | Bryan Britten <britten.bryan@gmail.com> |
|---|---|
| Date | 2013-09-17 18:42 -0700 |
| Subject | *.csv to *.txt after adding columns |
| Message-ID | <41edba54-31d3-48ad-a50f-41f87f32d251@googlegroups.com> |
Hey, gang, I've got a problem here that I'm sure a handful of you will know how to solve. I've got about 6 *.csv files that I am trying to open; change the header names (to get rid of spaces); add two new columns, which are just the results of a string.split() command; drop the column I just split; and then finally export to *.txt files. Here's the code I'm using:
import os
import csv
fileHandle = 'Path/To/Data'
varNames = 'ID\tCaseNum\tDate\tTime\tBlock\tIUCR\tPrimaryType\tDescription\tLocDesc\tArrest\tDomestic\tBeat\tDistrict\tWard\tCommArea\tFBICode\tXCoord\tYCoord\tYear\tUpdatedOn\tLat\tLong\tLoc\n'
for csvFile in os.listdir(fileHandle):
outFile = open(fileHandle + os.path.splitext(csvFile)[0] + '.txt', 'w')
inFile = open(fileHandle + csvFile, 'rb')
reader = csv.reader(inFile, delimiter=',')
rowNum = 0
for row in reader:
if rowNum == 0:
outFile.write(varNames)
rowNum += 1
else:
date, time = row[2].split()
row.insert(3, date)
row.insert(4, time)
row.remove(row[2])
outFile.write('\t'.join(row) + '\n')
outFile.close()
inFile.close()
The issue I'm having is that the *.txt files I'm generating are empty. I assume some unraised error is being thrown, but I'm new to Python and am self taught, so I don't know where to look or how to troubleshoot.
When I run the code on just one file and print the output instead of writing it, it looks exactly like what I'd want. So I'm at a loss for where the problem is.
Any help is appreciated!
[toc] | [next] | [standalone]
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-18 02:18 +0000 |
| Message-ID | <mailman.99.1379470752.18130.python-list@python.org> |
| In reply to | #54344 |
On 17/9/2013 21:42, Bryan Britten wrote:
> Hey, gang, I've got a problem here that I'm sure a handful of you will know how to solve. I've got about 6 *.csv files that I am trying to open; change the header names (to get rid of spaces); add two new columns, which are just the results of a string.split() command; drop the column I just split; and then finally export to *.txt files. Here's the code I'm using:
>
> import os
> import csv
>
>
> fileHandle = 'Path/To/Data'
> varNames = 'ID\tCaseNum\tDate\tTime\tBlock\tIUCR\tPrimaryType\tDescription\tLocDesc\tArrest\tDomestic\tBeat\tDistrict\tWard\tCommArea\tFBICode\tXCoord\tYCoord\tYear\tUpdatedOn\tLat\tLong\tLoc\n'
>
> for csvFile in os.listdir(fileHandle):
> outFile = open(fileHandle + os.path.splitext(csvFile)[0] + '.txt', 'w')
> inFile = open(fileHandle + csvFile, 'rb')
> reader = csv.reader(inFile, delimiter=',')
> rowNum = 0
> for row in reader:
> if rowNum == 0:
> outFile.write(varNames)
> rowNum += 1
> else:
> date, time = row[2].split()
> row.insert(3, date)
> row.insert(4, time)
> row.remove(row[2])
> outFile.write('\t'.join(row) + '\n')
> outFile.close()
> inFile.close()
>
>
> The issue I'm having is that the *.txt files I'm generating are empty. I assume some unraised error is being thrown,
How can you NOT know if an exception is being raised? How are you
running the code, in a terminal window? Can't you just see what gets
printed (as stderr)?
> but I'm new to Python and am self taught, so I don't know where to look
> or how to troubleshoot.
>
> When I run the code on just one file and print the output instead of writing it, it looks exactly like what I'd want. So I'm at a loss for where the problem is.
You describe two changes, but don't show them. How about you do them
one at a time, or perhaps even better, add prints so it both does the
file(s) AND prints the output?
What I think is happening is that you're missing the path separator
between the "Path/To/Data" and the basename. You should be combining
those with os.path.join(), not with +
For a quick & dirty check, add a trailing slash to the fileHandle. But
that's not the right way to fix it.
--
DaveA
[toc] | [prev] | [next] | [standalone]
| From | Bryan Britten <britten.bryan@gmail.com> |
|---|---|
| Date | 2013-09-17 19:28 -0700 |
| Message-ID | <c7b5595f-b847-4172-9a1d-86287e0804d5@googlegroups.com> |
| In reply to | #54344 |
Dave -
I can't print the output because there are close to 1,000,000 records. It would be extremely inefficient and resource intensive to look at every row. Like I said, when I take just one file and run the code over the first few records I get what I'd expect to see. Here's an example(non-redacted code):
INPUT:
import csv
fileHandle = 'C:/Users/Bryan/Data Analysis/Crime Analysis/Data/'
varNames = 'ID\tCaseNum\tDate\tTime\tBlock\tIUCR\tPrimaryType\tDescription\tLocDesc\tArrest\tDomestic\tBeat\tDistrict\tWard\tCommArea\tFBICode\tXCoord\tYCoord\tYear\tUpdatedOn\tLat\tLong\tLoc\n'
outFile = open(fileHandle + 'ChiCrime01_02.txt', 'w')
inFile = open(fileHandle + 'ChiCrime01_02.csv', 'rb')
reader = csv.reader(inFile, delimiter=',')
rowNum = 0
for row in reader:
if rowNum < 5:
if rowNum >= 1:
date, time = row[2].split()
row.insert(3, date)
row.insert(4, time)
row.remove(row[2])
print '\t'.join(row)
rowNum+=1
else:
print varNames
rowNum+=1
OUTPUT:
ID CaseNum Date Time Block IUCR PrimaryType Description LocDesc Arrest Domestic Beat District Ward CommArea FBICode XCoord YCoord Year UpdatedOn Lat Long Loc
2924745 HJ602602 12/31/2002 23:59 006XX W 117TH ST 841 THEFT FINANCIAL ID THEFT:$300 &UNDER RESIDENCE PORCH/HALLWAY FALSE FALSE 524 5 34 53 6 1173831 1827387 2002 3/30/2006 21:10 41.68175482 -87.63931351 (41.681754819160666, -87.63931350564216)
2523290 HJ101091 12/31/2002 23:59 002XX W 112TH PL 1310 CRIMINAL DAMAGE TO PROPERTY APARTMENT FALSE FALSE 522 34 49 14 1176848 1830383 2002 3/30/2006 21:10 41.68990907 -87.62817988 (41.689909072449474, -87.62817987594765)
2527332 HJ105139 12/31/2002 23:55 005XX E 89TH PL 486 BATTERY DOMESTIC BATTERY SIMPLE RESIDENCE FALSE TRUE 633 6 44 08B 1181369 1845794 2002 3/30/2006 21:10 41.73209609 -87.61115533 (41.732096089465905, -87.61115532670617)
2524251 HJ100175 12/31/2002 23:55 012XX S KARLOV AVE 041A BATTERY AGGRAVATED: HANDGUN SIDEWALK FALSE FALSE 1011 24 29 04B 1149196 1894387 2002 3/30/2006 21:10 41.86612296 -87.72776536 (41.86612295941429, -87.72776535755746)
Like I said, the output is exactly what I want, but it doesn't seem to be writing to the file and I don't know why. I said I didn't know if it was raising an exception because I'm new to Python and I didn't know if there were some methods that included "silent" errors where it would continue the code but produce the wrong results, such as not writing my files.
Lastly, why does everyone seem to push for os.path.join versus the method I have used? Is it just a 'standard' that people like to see?
Thanks for your help
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-09-18 07:55 +0000 |
| Message-ID | <mailman.107.1379490982.18130.python-list@python.org> |
| In reply to | #54348 |
On 17/9/2013 22:28, Bryan Britten wrote: > Dave - > > I can't print the output because there are close to 1,000,000 records. It would be extremely inefficient and resource intensive to look at every row. Not if you made a sample directory with about 3 files, each containing half a dozen lines. > Like I said, when I take just one file and run the code over the first > few records I get what I'd expect to see. Here's an example(non-redacted code): > > INPUT: > > import csv > > fileHandle = 'C:/Users/Bryan/Data Analysis/Crime Analysis/Data/' Now, that directory specification ends with a slash. So "+" will work correctly. But your original did not. > > varNames = 'ID\tCaseNum\tDate\tTime\tBlock\tIUCR\tPrimaryType\tDescription\tLocDesc\tArrest\tDomestic\tBeat\tDistrict\tWard\tCommArea\tFBICode\tXCoord\tYCoord\tYear\tUpdatedOn\tLat\tLong\tLoc\n' > > outFile = open(fileHandle + 'ChiCrime01_02.txt', 'w') > inFile = open(fileHandle + 'ChiCrime01_02.csv', 'rb') Instead of changing this code, you could have switched to a directory containing only one file. > > Like I said, the output is exactly what I want, but it doesn't seem to be writing to the file and I don't know why. I said I didn't know if it was raising an exception because I'm new to Python and I didn't know if there were some methods that included "silent" errors where it would continue the code but produce the wrong results, such as not writing my files. > The only "silent" exception I know of is the one triggered by sys.exit(), used to terminate the process. > Lastly, why does everyone seem to push for os.path.join versus the method I have used? Is it just a 'standard' that people like to see? > Because os.path.join will be smart enough to add the slashes only if they are necessary. That can be useful, especially if the directory you're using as a prefix came from the user. I think Peter's suggestion is probably right on; you don't limit your infiles to *.csv, so you will be processing *.txt files the second time. Another useful debugging aid would have been to add print statements right after opening the files, perhaps something like: print "Reading :", infile.name print "Writing:", outfile.name If those names had been missing slashes, I would have been vindicated, while if they were the same, you'd know Peter had nailed it. -- DaveA
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-09-18 09:14 +0200 |
| Message-ID | <mailman.106.1379488457.18130.python-list@python.org> |
| In reply to | #54344 |
Bryan Britten wrote:
> Hey, gang, I've got a problem here that I'm sure a handful of you will
> know how to solve. I've got about 6 *.csv files that I am trying to open;
> change the header names (to get rid of spaces); add two new columns, which
> are just the results of a string.split() command; drop the column I just
> split; and then finally export to *.txt files. Here's the code I'm using:
>
> import os
> import csv
>
>
> fileHandle = 'Path/To/Data'
> varNames =
>
'ID\tCaseNum\tDate\tTime\tBlock\tIUCR\tPrimaryType\tDescription\tLocDesc\tArrest\tDomestic\tBeat\tDistrict\tWard\tCommArea\tFBICode\tXCoord\tYCoord\tYear\tUpdatedOn\tLat\tLong\tLoc\n'
>
You may be processing not just the .csv files, but also the .txt files from
a previous run of your script. Remove them manually to verify my guess.
If I'm right you can (in order of my preference)
(1) write the output into another directory and also do (2)
(2) filter the input files with glob.glob("/path/to/data/*.csv")
(3) or just add the two lines below to your current code.
> for csvFile in os.listdir(fileHandle):
if not csvFile.endswith(".csv"):
continue
> outFile = open(fileHandle + os.path.splitext(csvFile)[0] + '.txt',
> 'w')
> inFile = open(fileHandle + csvFile, 'rb')
> reader = csv.reader(inFile, delimiter=',')
> rowNum = 0
> for row in reader:
> if rowNum == 0:
> outFile.write(varNames)
> rowNum += 1
> else:
> date, time = row[2].split()
> row.insert(3, date)
> row.insert(4, time)
> row.remove(row[2])
> outFile.write('\t'.join(row) + '\n')
> outFile.close()
> inFile.close()
>
>
> The issue I'm having is that the *.txt files I'm generating are empty. I
> assume some unraised error is being thrown, but I'm new to Python and am
> self taught, so I don't know where to look or how to troubleshoot.
>
> When I run the code on just one file and print the output instead of
> writing it, it looks exactly like what I'd want. So I'm at a loss for
> where the problem is.
>
> Any help is appreciated!
[toc] | [prev] | [next] | [standalone]
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-09-18 00:44 -0700 |
| Message-ID | <19f72b93-27c8-4426-9a97-25d88a9be702@googlegroups.com> |
| In reply to | #54344 |
On Wednesday, September 18, 2013 7:12:21 AM UTC+5:30, Bryan Britten wrote: > Hey, gang, I've got a problem here that I'm sure a handful of you will know how to solve. I've got about 6 *.csv files that I am trying to open; change the header names (to get rid of spaces); add two new columns, which are just the results of a string.split() command; drop the column I just split; and then finally export to *.txt files. Here's the code I'm using: Its generally a good idea in programming to separate separable problems. Since you say: > The issue I'm having is that the *.txt files I'm generating are empty. I > assume some unraised error is being thrown, but I'm new to Python and am self > taught, so I don't know where to look or how to troubleshoot. > When I run the code on just one file and print the output instead of writing > it, it looks exactly like what I'd want. So I'm at a loss for where the > problem is. you should probably write and have checked the code that handles one file. For example a function that takes as parameters the two: the input/output files (maybe names or handles), and then - check that that function works for one file - call it in your loop [BTW your code seems to mixup the variable-names for file-names and handles]
[toc] | [prev] | [next] | [standalone]
| From | Bryan Britten <britten.bryan@gmail.com> |
|---|---|
| Date | 2013-09-18 04:42 -0700 |
| Message-ID | <a8549e34-648f-4c74-9879-c8b46f6a2ab3@googlegroups.com> |
| In reply to | #54344 |
Peter nailed it. Adding in the two lines of code to ensure I was just working with *.csv files fixed the problem. Thanks to everyone for the help and suggestions on best practices.
[toc] | [prev] | [next] | [standalone]
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2013-09-18 18:47 -0400 |
| Message-ID | <mailman.140.1379544469.18130.python-list@python.org> |
| In reply to | #54344 |
On Tue, 17 Sep 2013 18:42:21 -0700 (PDT), Bryan Britten
<britten.bryan@gmail.com> declaimed the following:
>Hey, gang, I've got a problem here that I'm sure a handful of you will know how to solve. I've got about 6 *.csv files that I am trying to open; change the header names (to get rid of spaces); add two new columns, which are just the results of a string.split() command; drop the column I just split; and then finally export to *.txt files. Here's the code I'm using:
>
>import os
>import csv
>
>
>fileHandle = 'Path/To/Data'
That seems a misnomer... It is not a "file handle" of any sort, just a
base path.
>varNames = 'ID\tCaseNum\tDate\tTime\tBlock\tIUCR\tPrimaryType\tDescription\tLocDesc\tArrest\tDomestic\tBeat\tDistrict\tWard\tCommArea\tFBICode\tXCoord\tYCoord\tYear\tUpdatedOn\tLat\tLong\tLoc\n'
>
>for csvFile in os.listdir(fileHandle):
> outFile = open(fileHandle + os.path.splitext(csvFile)[0] + '.txt', 'w')
Note that your "fileHandle" does not end with a / -- so I have no idea
what types of names you are trying to open...
Recommend you use
os.path.join(fileHande, os.path.splitext(csvFile)[0] + ".txt")
> inFile = open(fileHandle + csvFile, 'rb')
ditto
> reader = csv.reader(inFile, delimiter=',')
> rowNum = 0
> for row in reader:
> if rowNum == 0:
> outFile.write(varNames)
> rowNum += 1
> else:
> date, time = row[2].split()
> row.insert(3, date)
> row.insert(4, time)
> row.remove(row[2])
> outFile.write('\t'.join(row) + '\n')
Apparently your "text file" is a tab-separated file...
The CSV module is capable of processing TSV just as easily as CSV.
I'd also drop the whole confusion of rowNum
-=-=-=-=- pseudo-code, not runnable
fin = open(os.path.join(basepath, filename), "rb")
csvin = csv.reader(fin, delimiter=",")
fout = open(os.path.join(basepath, otherfilename), "wb")
csvout = csv.writer(fout, delimiter="\t")
junk = reader.next() #skip header
csvout.write(["ID", "caseNum", ... ])
for row in reader:
out = row[:2]
out.extend(row[2].split())
out.extend(row[3:])
csvout.write(row)
fout.close()
fin.close()
del csvout #just paranoia, free up structures
del csvin #ditto
-=-=-=-=-
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web