Groups > comp.lang.python > #90950 > unrolled thread
| Started by | Parul Mogra <scoria.799@gmail.com> |
|---|---|
| First post | 2015-05-20 17:14 +0530 |
| Last post | 2015-05-21 18:28 +0200 |
| Articles | 4 — 4 participants |
Best approach to create humongous amount of files Parul Mogra <scoria.799@gmail.com> - 2015-05-20 17:14 +0530
Re: Best approach to create humongous amount of files Denis McMahon <denismfmcmahon@gmail.com> - 2015-05-20 18:08 +0000
Re: Best approach to create humongous amount of files "Mario R. Osorio" <nimbiotics@gmail.com> - 2015-05-21 08:54 -0700
Re: Best approach to create humongous amount of files Peter Otten <__peter__@web.de> - 2015-05-21 18:28 +0200
| From | Parul Mogra <scoria.799@gmail.com> |
|---|---|
| Date | 2015-05-20 17:14 +0530 |
| Subject | Best approach to create humongous amount of files |
| Message-ID | <mailman.166.1432123945.17265.python-list@python.org> |
Hello everyone,

My objective is to create large amount of data files (say a million *.json files), using a pre-existing template file (*.json). Each file would have a unique name, possibly by incorporating time stamp information. The files have to be generated in a folder specified.

What is the best strategy to achieve this task, so that the files will be generated in the shortest possible time? Say within an hour.

Thanks
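One way to read the task described above, as a minimal sketch only: read the template once, then write it out under counter-based names. The function name `write_copies`, the `data-NNN.json` naming pattern, and the paths in the usage note are assumptions for illustration, not something given in the post. Whether this meets the one-hour target depends on the disk and filesystem and has not been measured here.

```python
import os


def write_copies(template_path, out_dir, count):
    """Write `count` copies of the template into out_dir under unique names."""
    # Read the template a single time instead of once per output file.
    with open(template_path, "rb") as f:
        payload = f.read()

    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)

    # Zero-pad the counter so the names sort in creation order.
    width = len(str(count - 1))
    for i in range(count):
        name = "data-{:0{w}d}.json".format(i, w=width)  # hypothetical pattern
        with open(os.path.join(out_dir, name), "wb") as out:
            out.write(payload)
```

A call such as `write_copies("template.json", "out", 1000000)` would use hypothetical paths; a plain counter already guarantees unique names, and a timestamp can be added to the pattern if it is needed.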
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-05-20 18:08 +0000 |
| Message-ID | <mjiij7$h2l$2@dont-email.me> |
| In reply to | #90950 |
On Wed, 20 May 2015 17:14:15 +0530, Parul Mogra wrote:

> Hello everyone,
> My objective is to create large amount of data files (say a million
> *.json files), using a pre-existing template file (*.json). Each file
> would have a unique name, possibly by incorporating time stamp
> information. The files have to be generated in a folder specified.

> What is the best strategy to achieve this task, so that the files will
> be generated in the shortest possible time? Say within an hour.

timestamps are normally unixtime in seconds. There are 3600 seconds in an hour. You'll have a hard job creating a million files with timestamp based naming inside of an hour.

--
Denis McMahon, denismfmcmahon@gmail.com
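The arithmetic behind that objection can be spelled out in a couple of lines. This is only an illustration using the figures from the thread (a million files, one hour); the variable names are made up here.

```python
FILES = 1000000
SECONDS_PER_HOUR = 60 * 60  # 3600

# Average number of files that would have to share each one-second timestamp.
files_per_second = FILES / float(SECONDS_PER_HOUR)
print(round(files_per_second, 1))  # about 277.8
```

So a name built from a one-second timestamp alone would be reused roughly 278 times per second on average, which is why some extra distinguishing part is needed.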
| From | "Mario R. Osorio" <nimbiotics@gmail.com> |
|---|---|
| Date | 2015-05-21 08:54 -0700 |
| Message-ID | <4a0e2c39-ff50-400f-b488-86a6cb37dbd6@googlegroups.com> |
| In reply to | #90972 |
On Wednesday, May 20, 2015 at 2:09:59 PM UTC-4, Denis McMahon wrote:
> On Wed, 20 May 2015 17:14:15 +0530, Parul Mogra wrote:
>
> > Hello everyone,
> > My objective is to create large amount of data files (say a million
> > *.json files), using a pre-existing template file (*.json). Each file
> > would have a unique name, possibly by incorporating time stamp
> > information. The files have to be generated in a folder specified.
>
> > What is the best strategy to achieve this task, so that the files will
> > be generated in the shortest possible time? Say within an hour.
>
> timestamps are normally unixtime in seconds. There are 3600 seconds in an
> hour. You'll have a hard job creating a million files with timestamp
> based naming inside of an hour.
>
> --
> Denis McMahon, denismfmcmahon@gmail.com

I would use a combination of both, timestamp and a serial number, such as:

201505201425440000
201505201425440001
201505201425440002
201505201425440003
201505201425450000
201505201425450001
201505201425460000
.. and so on ..
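A minimal sketch of that timestamp-plus-serial scheme, assuming the names are a 14-digit `YYYYmmddHHMMSS` stamp followed by a 4-digit serial that restarts at 0000 every second, as the sample list suggests. The function name `serial_names` and its `clock` parameter are made up here; `clock` exists only so a fake clock can be substituted.

```python
import time


def serial_names(count, clock=time.gmtime):
    """Yield `count` names: a YYYYmmddHHMMSS stamp plus a 4-digit serial."""
    last_stamp = None
    serial = 0
    for _ in range(count):
        stamp = time.strftime("%Y%m%d%H%M%S", clock())
        if stamp != last_stamp:
            # A new second has started, so the serial restarts at zero.
            last_stamp, serial = stamp, 0
        yield "{}{:04d}".format(stamp, serial)
        serial += 1


for name in serial_names(3):
    print(name)
```

Four digits leave room for 10000 names per second; past that the serial simply grows to five digits, so names stay unique but no longer have a fixed width.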
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-05-21 18:28 +0200 |
| Message-ID | <mailman.202.1432225768.17265.python-list@python.org> |
| In reply to | #91012 |
Mario R. Osorio wrote:
> On Wednesday, May 20, 2015 at 2:09:59 PM UTC-4, Denis McMahon wrote:
>> On Wed, 20 May 2015 17:14:15 +0530, Parul Mogra wrote:
>>
>> > Hello everyone,
>> > My objective is to create large amount of data files (say a million
>> > *.json files), using a pre-existing template file (*.json). Each file
>> > would have a unique name, possibly by incorporating time stamp
>> > information. The files have to be generated in a folder specified.
>>
>> > What is the best strategy to achieve this task, so that the files will
>> > be generated in the shortest possible time? Say within an hour.
>>
>> timestamps are normally unixtime in seconds. There are 3600 seconds in an
>> hour. You'll have a hard job creating a million files with timestamp
>> based naming inside of an hour.
>>
>> --
>> Denis McMahon, denismfmcmahon@gmail.com
>
> I would use a combination of both, timestamp and a serial number, such as:
> 201505201425440000
> 201505201425440001
> 201505201425440002
> 201505201425440003
> 201505201425450000
> 201505201425450001
> 201505201425460000
> .. and so on ..
Like this?
import time
import itertools
from operator import itemgetter

try:
    from itertools import imap as map
except ImportError:
    pass

INDEX_TEMPLATE = "{}-{:02}-{:02}-{:02}-{:02}-{:02}-{i:03}"


def unique_names(template):
    return (
        template.format(INDEX_TEMPLATE.format(*t, i=i))
        for g in map(itemgetter(1), itertools.groupby(iter(time.gmtime, ())))
        for i, t in enumerate(g, 1))


if __name__ == "__main__":
    import random

    for name in unique_names("foo-{}.txt"):
        print(name)
        time.sleep(random.random())
I mean, readability counts...
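The generator above packs the whole scheme into one expression. A minimal sketch of the same naming logic written as a plain loop, reusing the `INDEX_TEMPLATE` from the post: the `clock` parameter is an assumption added here so the function can be driven by a fake clock, and is not part of the original code.

```python
import time

INDEX_TEMPLATE = "{}-{:02}-{:02}-{:02}-{:02}-{:02}-{i:03}"


def unique_names(template, clock=time.gmtime):
    """Yield names forever: a UTC timestamp plus a per-second counter."""
    previous = None
    index = 0
    while True:
        # Keep year, month, day, hour, minute and second only.
        now = tuple(clock())[:6]
        if now != previous:
            # A new second has started, so the counter restarts.
            previous, index = now, 0
        index += 1
        yield template.format(INDEX_TEMPLATE.format(*now, i=index))
```

As in the original, the counter starts at 001 within each second, and the generator never terminates on its own, so the caller decides when to stop (for example with `itertools.islice`).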