| Started by | Victor Hooi <victorhooi@gmail.com> |
|---|---|
| First post | 2013-10-29 18:02 -0700 |
| Last post | 2013-10-30 13:23 +0000 |
| Articles | 8 — 6 participants |
Using "with open(filename, 'ab'):" and calling code only if the file is new? Victor Hooi <victorhooi@gmail.com> - 2013-10-29 18:02 -0700
RE: Using "with open(filename, 'ab'):" and calling code only if the file is new? "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-10-30 01:42 +0000
RE: Using "with open(filename, 'ab'):" and calling code only if the file is new? Dave Angel <davea@davea.name> - 2013-10-30 02:13 +0000
RE: Using "with open(filename, 'ab'):" and calling code only if the file is new? "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-10-30 02:55 +0000
Re: Using "with open(filename, 'ab'):" and calling code only if the file is new? Victor Hooi <victorhooi@gmail.com> - 2013-10-29 20:22 -0700
Fwd: Using "with open(filename, 'ab'):" and calling code only if the file is new? Zachary Ware <zachary.ware+pylist@gmail.com> - 2013-10-29 22:28 -0500
Re: Using "with open(filename, 'ab'):" and calling code only if the file is new? Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-10-30 08:53 +0100
Re: Using "with open(filename, 'ab'):" and calling code only if the file is new? Neil Cerutti <neilc@norwich.edu> - 2013-10-30 13:23 +0000
| From | Victor Hooi <victorhooi@gmail.com> |
|---|---|
| Date | 2013-10-29 18:02 -0700 |
| Subject | Using "with open(filename, 'ab'):" and calling code only if the file is new? |
| Message-ID | <68bd6cb6-44b2-446c-b0e2-043e3ac1c35b@googlegroups.com> |
Hi,
I have a CSV file that I will be repeatedly appending to.
I'm using the following to open the file:
    with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
        fieldnames = (...)
        csv_writer = DictWriter(output, fieldnames)
        # Call csv_writer.writeheader() if file is new.
        csv_writer.writerows(my_dict)
I'm wondering what's the best way of calling writeheader() only if the file is new?
My understanding is that I don't want to use os.path.exists(), since that opens me up to race conditions.
I'm guessing I can't use try-except with IOError, since the open(..., 'ab') will work whether the file exists or not.
Is there another way I can execute code only if the file is new?
Cheers,
Victor
| From | "Joseph L. Casale" <jcasale@activenetwerx.com> |
|---|---|
| Date | 2013-10-30 01:42 +0000 |
| Message-ID | <mailman.1783.1383097380.18130.python-list@python.org> |
| In reply to | #57986 |
> with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>     fieldnames = (...)
>     csv_writer = DictWriter(output, fieldnames)
>     # Call csv_writer.writeheader() if file is new.
>     csv_writer.writerows(my_dict)
>
> I'm wondering what's the best way of calling writeheader() only if the file is
> new?
>
> My understanding is that I don't want to use os.path.exists(), since that opens
> me up to race conditions.
What stops you from checking before and setting a flag?
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-10-30 02:13 +0000 |
| Message-ID | <mailman.1786.1383099209.18130.python-list@python.org> |
| In reply to | #57986 |
On 29/10/2013 21:42, Joseph L. Casale wrote:
You forgot the attribution line: "Victor says"
>> with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>>     fieldnames = (...)
>>     csv_writer = DictWriter(output, fieldnames)
>>     # Call csv_writer.writeheader() if file is new.
>>     csv_writer.writerows(my_dict)
>>
>> I'm wondering what's the best way of calling writeheader() only if the file is
>> new?
>>
>> My understanding is that I don't want to use os.path.exists(), since that opens
>> me up to race conditions.
>
> What stops you from checking before and setting a flag?
Like Victor says, that opens him up to race conditions.
Victor: You need to specify your environment more completely. Are there multiple instances of this or a similar program running simultaneously? If so, you've got lots more problems than a missing or duplicated header line. You could get partial lines intermixing between the two outputs.
Chances are that if you really need to support more than one program at the same time, you'll need to use a lower-level open, perhaps from the os module. Some form of locking is called for. And if the data SHOULD be interleaved, you'll have to arrange it so it gets done in whole number increments.
--
DaveA
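One way to read the locking suggestion is an advisory lock held across both the emptiness check and the write. The following is a minimal sketch, not something posted in the thread: it assumes Python 3 on a Unix-like system (the `fcntl` module is not available on Windows) and that every writer cooperates by taking the same lock. The name `append_rows_locked` and its parameters are hypothetical.

```python
import csv
import fcntl
import os


def append_rows_locked(path, fieldnames, rows):
    """Append rows to a CSV, writing the header only if the file is empty.

    Hypothetical helper: advisory locking only protects against other
    writers that also call flock() on the same file.
    """
    # newline='' is what the csv module expects for text-mode files.
    with open(path, 'a', newline='') as output:
        # Block until this process holds an exclusive lock on the file.
        fcntl.flock(output.fileno(), fcntl.LOCK_EX)
        try:
            writer = csv.DictWriter(output, fieldnames)
            # Check the size only after the lock is held, so no cooperating
            # writer can slip in between the check and the header write.
            is_empty = os.fstat(output.fileno()).st_size == 0
            if is_empty:
                writer.writeheader()
            writer.writerows(rows)
            # Push buffered data to the OS before releasing the lock.
            output.flush()
        finally:
            fcntl.flock(output.fileno(), fcntl.LOCK_UN)
    return is_empty
```

Because each call writes and flushes all of its rows while holding the lock, cooperating writers should not interleave partial lines either.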
| From | "Joseph L. Casale" <jcasale@activenetwerx.com> |
|---|---|
| Date | 2013-10-30 02:55 +0000 |
| Message-ID | <mailman.1788.1383101757.18130.python-list@python.org> |
| In reply to | #57986 |
> Like Victor says, that opens him up to race conditions.
Slim chance; it's no more likely than it happening in the time try/except
takes to fall back to an alternative procedure.
    import os
    with open('in_file') as in_file, open('out_file', 'ab') as out_file:
        if os.path.getsize('out_file'):
            print('file not empty')
        else:
            # write header
            print('file was empty')
And if that's still not acceptable (you did say new), then open the out_file 'r+' and seek
and read to check for a header.
But if your file is not new and lacks a header, then what?
jlc
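The size check above can be wired into a `DictWriter` roughly as follows. This is a sketch under stated assumptions: Python 3 (text mode with `newline=''`), a single writer, and a hypothetical helper name `append_with_header_if_empty` that does not appear in the thread.

```python
import csv
import os


def append_with_header_if_empty(path, fieldnames, rows):
    # Hypothetical helper name; assumes a single writer, so the gap between
    # the size check and the write is not guarded against other processes.
    with open(path, 'a', newline='') as output:
        writer = csv.DictWriter(output, fieldnames)
        # Opening in append mode creates a missing file, so getsize() is
        # safe to call here; a size of 0 means nothing has been written yet.
        if os.path.getsize(path) == 0:
            writer.writeheader()
        writer.writerows(rows)
```

An existing file that has data but no header is treated as "not new" here, which is exactly the open question raised above.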
| From | Victor Hooi <victorhooi@gmail.com> |
|---|---|
| Date | 2013-10-29 20:22 -0700 |
| Message-ID | <63fdb9ff-314d-4177-9dbe-58d9213cae7b@googlegroups.com> |
| In reply to | #57997 |
Hi,
In theory, it *should* just be our script writing to the output CSV file.
However, I wanted it to be robust - e.g. in case somebody spins up two copies of this script running concurrently.
I suppose the timing would have to be pretty unlucky to hit a race condition there, right?
As in, somebody would have to open the new file and write to it somewhere in between the check line (os.path.getsize) and the following line (writeheader).
However, you're saying the only way to be completely safe is some kind of file locking?
Another person (Zachary Ware) suggested using .tell() on the file as well - I suppose that's similar enough to using os.path.getsize(), right?
But basically, I can call .tell() or os.path.getsize() on the file to see if it's zero, and then just call writeheader() on the following line.
In the future - we may be moving to storing results in something like SQLite, or MongoDB and outputting a CSV directly from there.
Cheers,
Victor
| From | Zachary Ware <zachary.ware+pylist@gmail.com> |
|---|---|
| Date | 2013-10-29 22:28 -0500 |
| Message-ID | <mailman.1789.1383104100.18130.python-list@python.org> |
| In reply to | #57986 |
On Tue, Oct 29, 2013 at 8:02 PM, Victor Hooi <victorhooi@gmail.com> wrote:
> Hi,
>
> I have a CSV file that I will be repeatedly appending to.
>
> I'm using the following to open the file:
>
>     with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>         fieldnames = (...)
>         csv_writer = DictWriter(output, fieldnames)
>         # Call csv_writer.writeheader() if file is new.
>         csv_writer.writerows(my_dict)
>
> I'm wondering what's the best way of calling writeheader() only if the file is new?
>
> My understanding is that I don't want to use os.path.exists(), since that opens me up to race conditions.
>
> I'm guessing I can't use try-except with IOError, since the open(..., 'ab') will work whether the file exists or not.
>
> Is there another way I can execute code only if the file is new?
>
> Cheers,
> Victor
I've not tested, but you might try
    with ... open(...) as output:
        ...
        if output.tell() == 0:
            csv_writer.writeheader()
        ...
HTH
--
Zach
(failed to send to the list first time around...)
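Filled in, the `.tell()` idea might look like the sketch below. It assumes Python 3, where `open()` in append mode seeks to the end of the file, so a position of 0 means the file is empty; the position reported right after opening in append mode on Python 2 may differ by platform and is worth checking there. The helper name `append_using_tell` is hypothetical.

```python
import csv


def append_using_tell(path, fieldnames, rows):
    # Hypothetical helper name. Assumes Python 3, where append mode starts
    # positioned at the end of the file, so tell() == 0 means it is empty.
    with open(path, 'a', newline='') as output:
        writer = csv.DictWriter(output, fieldnames)
        if output.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)
```

Like the size check, this does not protect against a second process writing between the check and the header.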
| From | Antoon Pardon <antoon.pardon@rece.vub.ac.be> |
|---|---|
| Date | 2013-10-30 08:53 +0100 |
| Message-ID | <mailman.1796.1383119629.18130.python-list@python.org> |
| In reply to | #57986 |
Op 30-10-13 02:02, Victor Hooi schreef:
> Hi,
>
> I have a CSV file that I will be repeatedly appending to.
>
> I'm using the following to open the file:
>
>     with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>         fieldnames = (...)
>         csv_writer = DictWriter(output, fieldnames)
>         # Call csv_writer.writeheader() if file is new.
>         csv_writer.writerows(my_dict)
>
> I'm wondering what's the best way of calling writeheader() only if the file is new?
If you are using 3.3 you could use something like this:
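    with open(self.full_path, 'r') as input:
        try:
            # 'x' (exclusive creation, new in 3.3) fails if the file exists.
            # It cannot be combined with 'a' in one mode string, and the
            # Python 3 csv module wants text mode with newline=''.
            output = open(self.output_csv, 'x', newline='')
            new_file = True
        except FileExistsError:
            output = open(self.output_csv, 'a', newline='')
            new_file = False
        with output:  # make sure the output file gets closed
            fieldnames = (...)
            csv_writer = DictWriter(output, fieldnames)
            if new_file:
                csv_writer.writeheader()
            csv_writer.writerows(my_dict)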
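As a self-contained function, the exclusive-creation idea could be sketched like this. It assumes Python 3.3 or later; note that `open()` accepts only one of 'r', 'w', 'a', and 'x' in a mode string, so the sketch uses 'x' on its own and falls back to 'a'. The name `append_exclusive` is hypothetical.

```python
import csv


def append_exclusive(path, fieldnames, rows):
    # Hypothetical helper name; assumes Python 3.3+ for 'x' and
    # FileExistsError.
    try:
        # Exclusive creation: succeeds only for the process that actually
        # creates the file.
        output = open(path, 'x', newline='')
        new_file = True
    except FileExistsError:
        output = open(path, 'a', newline='')
        new_file = False
    with output:
        writer = csv.DictWriter(output, fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerows(rows)
    return new_file
```

Only one process can win the creation, so the header is written at most once. A second process could still append rows before the creator has flushed its header, so this alone does not guarantee that the header comes first.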
| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Date | 2013-10-30 13:23 +0000 |
| Message-ID | <bdcfj8Fk2pU1@mid.individual.net> |
| In reply to | #57986 |
On 2013-10-30, Victor Hooi <victorhooi@gmail.com> wrote:
> Hi,
>
> I have a CSV file that I will be repeatedly appending to.
>
> I'm using the following to open the file:
>
>     with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>         fieldnames = (...)
>         csv_writer = DictWriter(output, fieldnames)
>         # Call csv_writer.writeheader() if file is new.
>         csv_writer.writerows(my_dict)
>
> I'm wondering what's the best way of calling writeheader() only
> if the file is new?
>
> My understanding is that I don't want to use os.path.exists(),
> since that opens me up to race conditions.
>
> I'm guessing I can't use try-except with IOError, since the
> open(..., 'ab') will work whether the file exists or not.
>
> Is there another way I can execute code only if the file is new?
A heavy-duty approach involves prepending the old contents to a
temporary file.
    import csv, os, shutil, tempfile
    fieldnames = (...)
    with tempfile.TemporaryDirectory() as temp:
        tempname = os.path.join(temp, 'output.csv')
        # 'wb'/'rb' follow the Python 2 csv convention; on Python 3 use
        # text mode with newline='' instead.
        with open(tempname, 'wb') as output:
            writer = csv.DictWriter(output, fieldnames=fieldnames)
            writer.writeheader()
            try:
                # Copy any existing rows across; no old file is fine.
                with open(self.output_csv, 'rb') as old_data:
                    reader = csv.DictReader(old_data)
                    for rec in reader:
                        writer.writerow(rec)
            except IOError:
                pass
            with open(self.full_path, 'rb') as infile:
                # etc...
        shutil.copy(tempname, self.output_csv)
This avoids clobbering output_csv unless new data is successfully
written. I believe TemporaryDirectory isn't available in Python 2, so
some other way of creating that path will be needed, and I'm too
lazy to look up how. ;)
--
Neil Cerutti
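A runnable variant of the temporary-file approach, as a sketch only: it assumes Python 3 (where `tempfile.TemporaryDirectory` exists and csv files are opened in text mode with `newline=''`), assumes the old file uses the same field names, and takes the new rows as an argument instead of reading them from an input file. The name `rewrite_with_header` is hypothetical.

```python
import csv
import os
import shutil
import tempfile


def rewrite_with_header(output_csv, fieldnames, new_rows):
    # Hypothetical helper; assumes Python 3 and that any existing file
    # already uses the same field names.
    with tempfile.TemporaryDirectory() as temp:
        tempname = os.path.join(temp, 'output.csv')
        with open(tempname, 'w', newline='') as output:
            writer = csv.DictWriter(output, fieldnames=fieldnames)
            writer.writeheader()
            try:
                with open(output_csv, newline='') as old_data:
                    # DictReader consumes the old header row as its
                    # field names, so only data rows are copied.
                    for rec in csv.DictReader(old_data):
                        writer.writerow(rec)
            except FileNotFoundError:
                pass  # no previous output yet
            writer.writerows(new_rows)
        # Replace the real file only after everything was written.
        shutil.copy(tempname, output_csv)
```

The whole file is rewritten on every call, so the cost grows with the size of the existing output.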