Groups > comp.lang.python > #90950 > unrolled thread

Best approach to create humongous amount of files

Started by	Parul Mogra <scoria.799@gmail.com>
First post	2015-05-20 17:14 +0530
Last post	2015-05-21 18:28 +0200
Articles	4 — 4 participants

  Best approach to create humongous amount of files Parul Mogra <scoria.799@gmail.com> - 2015-05-20 17:14 +0530
    Re: Best approach to create humongous amount of files Denis McMahon <denismfmcmahon@gmail.com> - 2015-05-20 18:08 +0000
      Re: Best approach to create humongous amount of files "Mario R. Osorio" <nimbiotics@gmail.com> - 2015-05-21 08:54 -0700
        Re: Best approach to create humongous amount of files Peter Otten <__peter__@web.de> - 2015-05-21 18:28 +0200

#90950 — Best approach to create humongous amount of files

From	Parul Mogra <scoria.799@gmail.com>
Date	2015-05-20 17:14 +0530
Subject	Best approach to create humongous amount of files
Message-ID	<mailman.166.1432123945.17265.python-list@python.org>

Hello everyone,
My objective is to create a large number of data files (say a million *.json
files), using a pre-existing template file (*.json). Each file would have a
unique name, possibly by incorporating timestamp information. The files
have to be generated in a specified folder.

What is the best strategy to achieve this task, so that the files will be
generated in the shortest possible time? Say within an hour.

Thanks

#90972

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2015-05-20 18:08 +0000
Message-ID	<mjiij7$h2l$2@dont-email.me>
In reply to	#90950

On Wed, 20 May 2015 17:14:15 +0530, Parul Mogra wrote:

> Hello everyone,
> My objective is to create large amount of data files (say a million
> *.json files), using a pre-existing template file (*.json). Each file
> would have a unique name, possibly by incorporating time stamp
> information. The files have to be generated in a folder specified.

> What is the best strategy to achieve this task, so that the files will
> be generated in the shortest possible time? Say within an hour.
	
Timestamps are normally Unix time in whole seconds, and there are only 3600
seconds in an hour. A million files in an hour works out to roughly 278
files per second, so you'll have a hard job giving a million files unique
timestamp-based names inside of an hour.
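
To put a number on that, here is a quick back-of-the-envelope sketch. The constant names are only illustrative, and the target of a million files comes from the original post:

```python
# How many files per second does "a million files within an hour" require?
TOTAL_FILES = 1000000        # target from the original post
SECONDS_PER_HOUR = 60 * 60   # 3600

# float() keeps the division exact on both Python 2 and Python 3.
files_per_second = TOTAL_FILES / float(SECONDS_PER_HOUR)

# With one-second timestamps there are only 3600 distinct names per hour,
# so on average this many files would have to share each timestamp.
print(round(files_per_second, 1))  # 277.8
```

So a name built from a one-second timestamp needs some extra component to stay unique.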

-- 
Denis McMahon, denismfmcmahon@gmail.com

#91012

From	"Mario R. Osorio" <nimbiotics@gmail.com>
Date	2015-05-21 08:54 -0700
Message-ID	<4a0e2c39-ff50-400f-b488-86a6cb37dbd6@googlegroups.com>
In reply to	#90972

On Wednesday, May 20, 2015 at 2:09:59 PM UTC-4, Denis McMahon wrote:
> On Wed, 20 May 2015 17:14:15 +0530, Parul Mogra wrote:
> 
> > Hello everyone,
> > My objective is to create large amount of data files (say a million
> > *.json files), using a pre-existing template file (*.json). Each file
> > would have a unique name, possibly by incorporating time stamp
> > information. The files have to be generated in a folder specified.
> 
> > What is the best strategy to achieve this task, so that the files will
> > be generated in the shortest possible time? Say within an hour.
> 	
> timestamps are normally unixtime in seconds. There are 3600 seconds in an 
> hour. You'll have a hard job creating a million files with timestamp 
> based naming inside of an hour.
> 
> -- 
> Denis McMahon, denismfmcmahon@gmail.com

I would combine the two, a timestamp followed by a serial number that restarts each second, such as:
201505201425440000
201505201425440001
201505201425440002
201505201425440003
201505201425450000
201505201425450001
201505201425460000
.. and so on ..
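
A minimal sketch of how such names could be built, assuming a 14-digit YYYYMMDDHHMMSS timestamp followed by a four-digit serial, as in the examples above. The function name `serial_name` is made up for illustration:

```python
from datetime import datetime


def serial_name(moment, serial):
    """Return a 14-digit timestamp followed by a 4-digit serial number."""
    if not 0 <= serial <= 9999:
        raise ValueError("serial must fit in four digits")
    return "{}{:04d}".format(moment.strftime("%Y%m%d%H%M%S"), serial)


moment = datetime(2015, 5, 20, 14, 25, 44)
print(serial_name(moment, 0))  # 201505201425440000
print(serial_name(moment, 3))  # 201505201425440003
```

Four digits allow up to 10,000 names per second, comfortably above the few hundred per second that a million files an hour would need.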

#91013

From	Peter Otten <__peter__@web.de>
Date	2015-05-21 18:28 +0200
Message-ID	<mailman.202.1432225768.17265.python-list@python.org>
In reply to	#91012

Mario R. Osorio wrote:

> On Wednesday, May 20, 2015 at 2:09:59 PM UTC-4, Denis McMahon wrote:
>> On Wed, 20 May 2015 17:14:15 +0530, Parul Mogra wrote:
>> 
>> > Hello everyone,
>> > My objective is to create large amount of data files (say a million
>> > *.json files), using a pre-existing template file (*.json). Each file
>> > would have a unique name, possibly by incorporating time stamp
>> > information. The files have to be generated in a folder specified.
>> 
>> > What is the best strategy to achieve this task, so that the files will
>> > be generated in the shortest possible time? Say within an hour.
>> 
>> timestamps are normally unixtime in seconds. There are 3600 seconds in an
>> hour. You'll have a hard job creating a million files with timestamp
>> based naming inside of an hour.
>> 
>> --
>> Denis McMahon, denismfmcmahon@gmail.com
> 
> I would use a combination of both, timestamp and a serial number, such as:
> 201505201425440000
> 201505201425440001
> 201505201425440002
> 201505201425440003
> 201505201425450000
> 201505201425450001
> 201505201425460000
> .. and so on ..

Like this?

import time
import itertools
from operator import itemgetter

try:
    from itertools import imap as map
except ImportError:
    pass

INDEX_TEMPLATE = "{}-{:02}-{:02}-{:02}-{:02}-{:02}-{i:03}"


def unique_names(template):
    return (
        template.format(INDEX_TEMPLATE.format(*t, i=i))
        for g in map(itemgetter(1),
                     itertools.groupby(iter(time.gmtime, ())))
        for i, t in enumerate(g, 1))


if __name__ == "__main__":
    import random

    for name in unique_names("foo-{}.txt"):
        print(name)
        time.sleep(random.random())

I mean, readability counts...
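
For comparison, the same idea can be spelled out with an explicit loop. This is only a sketch, not a drop-in replacement: the `clock` parameter is an addition for illustration so that the generator can be driven by something other than `time.gmtime`, and the name format is the one from `INDEX_TEMPLATE` above.

```python
import itertools
import time

INDEX_TEMPLATE = "{}-{:02}-{:02}-{:02}-{:02}-{:02}-{i:03}"


def unique_names(template, clock=time.gmtime):
    """Yield names made of the current UTC second plus a counter
    that restarts at 1 whenever the second changes."""
    previous = None
    while True:
        # Keep only (year, month, day, hour, minute, second).
        now = tuple(clock()[:6])
        if now != previous:
            previous = now
            serial = 0
        serial += 1
        yield template.format(INDEX_TEMPLATE.format(*now, i=serial))


if __name__ == "__main__":
    # Take just a few names; the generator itself never ends.
    for name in itertools.islice(unique_names("foo-{}.txt"), 3):
        print(name)
```

As with the original, the counter field is three digits wide, so names only stay the same length for up to 999 files per second.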
