
Re: Best approach to create humongous amount of files

Started by	Tim Chase <python.list@tim.thechases.com>
First post	2015-05-20 11:12 -0500
Last post	2015-05-20 11:12 -0500
Articles	1 — 1 participant


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


#90965 — Re: Best approach to create humongous amount of files

From	Tim Chase <python.list@tim.thechases.com>
Date	2015-05-20 11:12 -0500
Subject	Re: Best approach to create humongous amount of files
Message-ID	<mailman.175.1432138870.17265.python-list@python.org>

On 2015-05-20 17:59, Peter Otten wrote:
> Tim Chase wrote:
> >   wordlist[:] = [ # just lowercase all-alpha words
> >     word
> >     for word in wordlist
> >     if word.isalpha() and word.islower()
> >     ]
> 
> Just a quick reminder: if the data is user-provided you have to
> sanitize it:

Hence the isalpha() + islower() filter in my sample: it already
restricts the list to lowercase, all-alphabetic words.
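
For anyone skimming, here is a minimal standalone sketch of what that filter lets through. The function name keep_safe_words and the sample list are made up for illustration; only the isalpha()/islower() test comes from the quoted snippet.

```python
def keep_safe_words(words):
    """Return only the words that are entirely alphabetic and entirely lowercase."""
    return [w for w in words if w.isalpha() and w.islower()]


# Hypothetical input mixing ordinary words with things you would not
# want to end up in a filename (path separators, dots, digits, empty string).
sample = ["apple", "Banana", "../etc/passwd", "foo.txt", "pear", "", "abc123"]

safe = keep_safe_words(sample)
print(safe)  # ['apple', 'pear']
```

One caveat: on Python 3, str.isalpha() also accepts non-ASCII letters, so this is not an ASCII-only check.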

> I expect that performance will be dominated by I/O; if that's
> correct the extra work of serializing the JSON should not do much
> harm.

I seem to recall that there was a change-over: an older JSON library
was particularly slow, and a later replacement sped things up
immensely.  So performance may depend heavily on which version you're
running.
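
If in doubt, it is easy enough to measure on the interpreter you actually have. A rough sketch, assuming a made-up payload of 1000 short strings (the real data will differ, and so will the numbers):

```python
import json
import sys
import timeit

# Hypothetical payload standing in for the real word list.
payload = ["word%d" % i for i in range(1000)]


def serialize():
    """Serialize the sample payload to a JSON string."""
    return json.dumps(payload)


# Time 100 serializations; the figure is only meaningful relative to
# the same run on another version or another JSON library.
elapsed = timeit.timeit(serialize, number=100)
print("Python %s: %.4f s for 100 dumps" % (sys.version.split()[0], elapsed))
```

Comparing that figure against the time spent writing the files should show whether the JSON step matters at all.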

[to the OP] But yes, if you're trusting unsanitized data, Peter's
suggestion would be the way to go.

-tkc
