


Using "with open(filename, 'ab'):" and calling code only if the file is new?

Started by: Victor Hooi <victorhooi@gmail.com>
First post: 2013-10-29 18:02 -0700
Last post: 2013-10-30 13:23 +0000
Articles: 8 (6 participants)



Contents

  Using "with open(filename, 'ab'):" and calling code only if the file is new? Victor Hooi <victorhooi@gmail.com> - 2013-10-29 18:02 -0700
    RE: Using "with open(filename, 'ab'):" and calling code only if the file is new? "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-10-30 01:42 +0000
    RE: Using "with open(filename, 'ab'):" and calling code only if the file is new? Dave Angel <davea@davea.name> - 2013-10-30 02:13 +0000
    RE: Using "with open(filename, 'ab'):" and calling code only if the file is new? "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-10-30 02:55 +0000
      Re: Using "with open(filename, 'ab'):" and calling code only if the file is new? Victor Hooi <victorhooi@gmail.com> - 2013-10-29 20:22 -0700
    Fwd: Using "with open(filename, 'ab'):" and calling code only if the file is new? Zachary Ware <zachary.ware+pylist@gmail.com> - 2013-10-29 22:28 -0500
    Re: Using "with open(filename, 'ab'):" and calling code only if the file is new? Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-10-30 08:53 +0100
    Re: Using "with open(filename, 'ab'):" and calling code only if the file is new? Neil Cerutti <neilc@norwich.edu> - 2013-10-30 13:23 +0000

#57986 — Using "with open(filename, 'ab'):" and calling code only if the file is new?

From: Victor Hooi <victorhooi@gmail.com>
Date: 2013-10-29 18:02 -0700
Subject: Using "with open(filename, 'ab'):" and calling code only if the file is new?
Message-ID: <68bd6cb6-44b2-446c-b0e2-043e3ac1c35b@googlegroups.com>

Hi,

I have a CSV file that I will be repeatedly appending to.

I'm using the following to open the file:

    with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
        fieldnames = (...)
        csv_writer = DictWriter(output, fieldnames)
        # Call csv_writer.writeheader() if file is new.
        csv_writer.writerows(my_dict)

I'm wondering what's the best way of calling writeheader() only if the file is new?

My understanding is that I don't want to use os.path.exists(), since that opens me up to race conditions.

I'm guessing I can't use try-except with IOError, since the open(..., 'ab') will work whether the file exists or not.

Is there another way I can execute code only if the file is new?

Cheers,
Victor



#57990

From: "Joseph L. Casale" <jcasale@activenetwerx.com>
Date: 2013-10-30 01:42 +0000
Message-ID: <mailman.1783.1383097380.18130.python-list@python.org>
In reply to: #57986

>     with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>         fieldnames = (...)
>         csv_writer = DictWriter(output, fieldnames)
>         # Call csv_writer.writeheader() if file is new.
>         csv_writer.writerows(my_dict)
> 
> I'm wondering what's the best way of calling writeheader() only if the file is
> new?
> 
> My understanding is that I don't want to use os.path.exists(), since that opens
> me up to race conditions.

What stops you from checking before and setting a flag?



#57994

From: Dave Angel <davea@davea.name>
Date: 2013-10-30 02:13 +0000
Message-ID: <mailman.1786.1383099209.18130.python-list@python.org>
In reply to: #57986

On 29/10/2013 21:42, Joseph L. Casale wrote:

You forgot the attribution line:  "Victor says"
>>     with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>>         fieldnames = (...)
>>         csv_writer = DictWriter(output, fieldnames)
>>         # Call csv_writer.writeheader() if file is new.
>>         csv_writer.writerows(my_dict)
>> 
>> I'm wondering what's the best way of calling writeheader() only if the file is
>> new?
>> 
>> My understanding is that I don't want to use os.path.exists(), since that opens
>> me up to race conditions.
>
> What stops you from checking before and setting a flag?

Like Victor says, that opens him up to race conditions.

Victor:

You need to specify your environment more completely.  Are there
multiple instances of this or a similar program running simultaneously?
If so, you've got lots more problems than a missing or duplicated header
line.  You could get partial lines intermixing between the two outputs.

Chances are if you really need to support more than one program at the
same time, you'll need to use a lower-level open, perhaps from the os
module.  Some form of locking is called for.  And if the data SHOULD be
interleaved, you'll have to arrange it so it gets done in whole number
increments.
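
A rough sketch of that kind of locking, assuming Python 3 on a Unix-like system (fcntl is not available on Windows). The name append_rows_locked and its parameters are made up for illustration, and the lock is only advisory: it helps only if every writer takes the same lock.

```python
import csv
import fcntl  # Unix-only; assumption: all writers run on a POSIX system
import os


def append_rows_locked(path, fieldnames, rows):
    """Append rows to a CSV file, writing the header only if the file is empty.

    Hypothetical helper: the emptiness check and the writes happen while
    holding an exclusive advisory lock, so cooperating processes cannot
    interleave between the check and the header write.
    """
    # Lower-level open: create the file if needed, always write at the end.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    with os.fdopen(fd, 'a', newline='') as output:
        fcntl.flock(output, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            writer = csv.DictWriter(output, fieldnames)
            if os.fstat(output.fileno()).st_size == 0:
                writer.writeheader()
            writer.writerows(rows)
            output.flush()  # push buffered data out before releasing the lock
        finally:
            fcntl.flock(output, fcntl.LOCK_UN)
```

Each call writes complete rows under the lock, so output from two copies of the script should end up interleaved only at row boundaries.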

-- 
DaveA



#57997

From: "Joseph L. Casale" <jcasale@activenetwerx.com>
Date: 2013-10-30 02:55 +0000
Message-ID: <mailman.1788.1383101757.18130.python-list@python.org>
In reply to: #57986

> Like Victor says, that opens him up to race conditions.

Slim chance; it's no more likely than the same thing happening in the time a
try/except takes to fall back to an alternative procedure.

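import os

with open('in_file') as in_file, open('out_file', 'ab') as out_file:
    # The size is checked after the open, so a missing file has just been
    # created and is empty.
    if os.path.getsize('out_file'):
        print('file not empty')
    else:
        # write header
        print('file was empty')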

And if that's still not acceptable (you did say new) then open the out_file 'r+' and seek
and read to check for a header.
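
One way that header check might look, as a sketch for Python 3. The helper name header_status and its return values are invented here. It opens with 'a+' instead of 'r+' so that a missing file is created, not reported as an error, and it is still open to the race discussed in this thread because nothing is locked.

```python
import csv


def header_status(path, fieldnames):
    """Classify a CSV file as 'empty', 'has_header' or 'no_header'.

    Hypothetical helper: compares the first row of the file with the
    expected field names. Opening with 'a+' creates the file if missing.
    """
    with open(path, 'a+', newline='') as f:
        f.seek(0)  # 'a+' starts at the end of the file; rewind to read
        first_row = next(csv.reader(f), None)
    if first_row is None:
        return 'empty'
    if first_row == list(fieldnames):
        return 'has_header'
    return 'no_header'
```

The 'no_header' result is the awkward case: the file has data but no header, and the caller has to decide what to do with it.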

But if your file is not new and lacks a header, then what?
jlc



#57999

From: Victor Hooi <victorhooi@gmail.com>
Date: 2013-10-29 20:22 -0700
Message-ID: <63fdb9ff-314d-4177-9dbe-58d9213cae7b@googlegroups.com>
In reply to: #57997

Hi,

In theory, it *should* just be our script writing to the output CSV file.

However, I wanted it to be robust - e.g. in case somebody spins up two copies of this script running concurrently.

I suppose the timing would have to be pretty unlucky to hit a race condition there, right?

As in, somebody would have to open the new file and write to it somewhere in between the check line (os.path.getsize) and the following line (writeheader).

However, you're saying the only way to be completely safe is some kind of file locking?

Another person (Zachary Ware) suggested using .tell() on the file as well - I suppose that's similar enough to using os.path.getsize(), right?

But basically, I can call .tell() or os.path.getsize() on the file to see if it's zero, and then just call writeheader() on the following line.

In the future, we may be moving to storing results in something like SQLite or MongoDB, and outputting a CSV directly from there.

Cheers,
Victor

On Wednesday, 30 October 2013 13:55:53 UTC+11, Joseph L. Casale wrote:
> > Like Victor says, that opens him up to race conditions.
>
> Slim chance, it's no more possible than it happening in the time try/except
> takes to recover an alternative procedure.
>
> with open('in_file') as in_file, open('out_file', 'ab') as outfile_file:
>     if os.path.getsize('out_file'):
>         print('file not empty')
>     else:
>         #write header
>         print('file was empty')
>
> And if that's still not acceptable (you did say new) then open the out_file 'r+' and seek
> and read to check for a header.
>
> But if your file is not new and lacks a header, then what?
>
> jlc



#58000

From: Zachary Ware <zachary.ware+pylist@gmail.com>
Date: 2013-10-29 22:28 -0500
Message-ID: <mailman.1789.1383104100.18130.python-list@python.org>
In reply to: #57986

On Tue, Oct 29, 2013 at 8:02 PM, Victor Hooi <victorhooi@gmail.com> wrote:
> Hi,
>
> I have a CSV file that I will be repeatedly appending to.
>
> I'm using the following to open the file:
>
>     with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>         fieldnames = (...)
>         csv_writer = DictWriter(output, fieldnames)
>         # Call csv_writer.writeheader() if file is new.
>         csv_writer.writerows(my_dict)
>
> I'm wondering what's the best way of calling writeheader() only if the file is new?
>
> My understanding is that I don't want to use os.path.exists(), since that opens me up to race conditions.
>
> I'm guessing I can't use try-except with IOError, since the open(..., 'ab') will work whether the file exists or not.
>
> Is there another way I can execute code only if the file is new?
>
> Cheers,
> Victor

I've not tested, but you might try

with ... open(...) as output:
    ...
    if output.tell() == 0:
        csv_writer.writeheader()
...
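
Filled in as a complete function, the idea might look like the sketch below. It assumes Python 3, where opening in append mode positions the stream at the end of the file, so tell() returns 0 only for an empty file. The function name append_rows and its parameters are placeholders, and the check is not protected against a second writer.

```python
import csv


def append_rows(path, fieldnames, rows):
    """Append rows to a CSV file, writing the header first if the file is empty.

    Hypothetical helper. On Python 3 the csv module wants a text-mode
    file opened with newline='' (not 'ab').
    """
    with open(path, 'a', newline='') as output:
        writer = csv.DictWriter(output, fieldnames)
        if output.tell() == 0:  # position 0 in append mode means an empty file
            writer.writeheader()
        writer.writerows(rows)
```

Calling it twice on the same path should leave a single header line followed by the rows from both calls.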

HTH

--
Zach

(failed to send to the list first time around...)



#58010

From: Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date: 2013-10-30 08:53 +0100
Message-ID: <mailman.1796.1383119629.18130.python-list@python.org>
In reply to: #57986

On 30-10-13 02:02, Victor Hooi wrote:
> Hi,
> 
> I have a CSV file that I will be repeatedly appending to.
> 
> I'm using the following to open the file:
> 
>     with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>         fieldnames = (...)
>         csv_writer = DictWriter(output, fieldnames)
>         # Call csv_writer.writeheader() if file is new.
>         csv_writer.writerows(my_dict)
> 
> I'm wondering what's the best way of calling writeheader() only if the file is new?

If you are using 3.3 you could use something like this:

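with open(self.full_path, 'r') as input:
    try:
        # 'x' (exclusive creation, Python 3.3+) fails if the file already
        # exists; it cannot be combined with 'a'. In Python 3 the csv
        # module needs a text-mode file opened with newline=''.
        output = open(self.output_csv, 'x', newline='')
        new_file = True
    except FileExistsError:
        output = open(self.output_csv, 'a', newline='')
        new_file = False
    with output:  # make sure the output file gets closed
        fieldnames = (...)
        csv_writer = DictWriter(output, fieldnames)
        if new_file:
            csv_writer.writeheader()
        csv_writer.writerows(my_dict)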



#58037

From: Neil Cerutti <neilc@norwich.edu>
Date: 2013-10-30 13:23 +0000
Message-ID: <bdcfj8Fk2pU1@mid.individual.net>
In reply to: #57986

On 2013-10-30, Victor Hooi <victorhooi@gmail.com> wrote:
> Hi,
>
> I have a CSV file that I will be repeatedly appending to.
>
> I'm using the following to open the file:
>
>     with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:
>         fieldnames = (...)
>         csv_writer = DictWriter(output, fieldnames)
>         # Call csv_writer.writeheader() if file is new.
>         csv_writer.writerows(my_dict)
>
> I'm wondering what's the best way of calling writeheader() only
> if the file is new?
>
> My understanding is that I don't want to use os.path.exists(),
> since that opens me up to race conditions.
>
> I'm guessing I can't use try-except with IOError, since the
> open(..., 'ab') will work whether the file exists or not.
>
> Is there another way I can execute code only if the file is new?

A heavy-duty approach involves writing the header and the old contents
to a temporary file, adding the new data, and then copying the result
over the original.

fieldnames = (...)

# tempfile.TemporaryDirectory needs Python 3.2+, where csv files are
# opened in text mode with newline=''.
with tempfile.TemporaryDirectory() as temp:
    tempname = os.path.join(temp, 'output.csv')
    with open(tempname, 'w', newline='') as output:
        writer = csv.DictWriter(output, fieldnames=fieldnames)
        writer.writeheader()
        try:
            with open(self.output_csv, 'r', newline='') as old_data:
                reader = csv.DictReader(old_data)
                for rec in reader:
                    writer.writerow(rec)
        except IOError:
            # No existing output file: start from just the header.
            pass
        with open(self.full_path, 'r') as infile:
            # etc...
    shutil.copy(tempname, self.output_csv)

This avoids clobbering output_csv unless new data is successfully
written. I believe TemporaryDirectory isn't available in Python 2, so
some other way of creating that path will be needed, and I'm too
lazy to look up how. ;)
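
For what it's worth, one possible way to get such a path on Python 2 is tempfile.mkdtemp() wrapped in a small context manager. This is only a sketch; the name temporary_directory is made up, and unlike the standard library version it does nothing special if cleanup fails.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_directory():
    """Create a temporary directory and remove it (with its contents) on exit.

    Hypothetical stand-in for a temporary-directory context manager,
    built from tempfile.mkdtemp() and shutil.rmtree().
    """
    path = tempfile.mkdtemp()
    try:
        yield path
    finally:
        shutil.rmtree(path)


with temporary_directory() as temp:
    tempname = os.path.join(temp, 'output.csv')
    # ... write tempname here, then copy it to its final location ...
```

The copy to the final location has to happen inside the with block, since the directory is gone once the block exits.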

-- 
Neil Cerutti




