


Re: Best approach to create humongous amount of files

Date 2015-05-20 11:12 -0500
From Tim Chase <python.list@tim.thechases.com>
Subject Re: Best approach to create humongous amount of files
References <CAPkZ3MS5SiGH9OCe9RSTmakF681O+qM572y49FuDBmBix=aiFg@mail.gmail.com> <CAPTjJmppiMpVjTBt5CH_6DGSdCWw5aoDU+jY-3wMs5Ai7tPdKw@mail.gmail.com> <20150520100723.3a34a775@bigbox.christie.dr> <mjib17$g32$1@ger.gmane.org>
Newsgroups comp.lang.python
Message-ID <mailman.175.1432138870.17265.python-list@python.org>



On 2015-05-20 17:59, Peter Otten wrote:
> Tim Chase wrote:
> >   wordlist[:] = [ # just lowercase all-alpha words
> >     word
> >     for word in wordlist
> >     if word.isalpha() and word.islower()
> >     ]
> 
> Just a quick reminder: if the data is user-provided you have to
> sanitize it:

That's why my sample filtered the list down to words passing both isalpha() and islower().
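To see what the quoted isalpha()+islower() filter keeps and drops, here is a minimal sketch that wraps the same list comprehension in a function. The name sanitize() and the sample words are hypothetical, chosen to include path-like and punctuated entries.

```python
def sanitize(wordlist):
    """Filter wordlist in place (same list object), keeping only
    words that are entirely alphabetic and entirely lowercase."""
    wordlist[:] = [  # slice assignment mutates the caller's list
        word
        for word in wordlist
        if word.isalpha() and word.islower()
    ]
    return wordlist


# Hypothetical sample input mixing safe and unsafe entries.
words = ["apple", "Banana", "../etc/passwd", "it's", "cherry", ""]
print(sanitize(words))  # ['apple', 'cherry']
```

Because str.isalpha() and str.islower() are Unicode-aware in Python 3, non-ASCII lowercase words such as "café" also pass the filter; add a word.isascii() check (Python 3.7+) if only ASCII is acceptable.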

> I expect that performance will be dominated by I/O; if that's
> correct the extra work of serializing the JSON should not do much
> harm.

I seem to recall a change-over at some point: an older JSON library
was particularly slow, and a later replacement sped things up
immensely.  So performance may depend heavily on which version
you're running.
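Since JSON speed may differ between versions, a quick way to check your own installation is to time json.dumps directly. This is a minimal sketch, assuming a list of words as the payload; the helper name time_json_dumps() and the workload size are arbitrary. It also reports whether json's optional C encoder (json.encoder.c_make_encoder) was importable, since the module falls back to a pure-Python encoder without it.

```python
import json
import json.encoder
import sys
import timeit


def time_json_dumps(words, repeat=5, number=20):
    """Return the best observed time, in seconds, for one json.dumps(words) call."""
    timer = timeit.Timer(lambda: json.dumps(words))
    # repeat() returns one total per round of `number` calls; take the
    # fastest round and divide to get a per-call figure.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    # Hypothetical workload: 100,000 short lowercase words.
    sample = ["humongous"] * 100000
    print(sys.version)
    # c_make_encoder is None when the C accelerator could not be imported.
    print("C encoder available:", json.encoder.c_make_encoder is not None)
    print("json.dumps: %.4f s per call" % time_json_dumps(sample))
```

Comparing the per-call figure against the time spent actually writing the files is a direct way to test whether I/O really dominates.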

[to the OP] But yes, if you're dealing with untrusted, unsanitized
data, Peter's suggestion would be the way to go.
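Peter's code isn't quoted in this message, but his remark about "serializing the JSON" suggests it involved JSON encoding. Assuming that is the idea, a minimal sketch of JSON-encoding untrusted words before writing them out could look like this; encode_word() and the sample inputs are hypothetical.

```python
import json


def encode_word(word):
    """Serialize an untrusted string as a JSON string literal.

    json.dumps escapes double quotes, backslashes and control characters
    (and, by default, non-ASCII characters), so the result can be embedded
    in generated text and recovered unchanged with json.loads.
    """
    return json.dumps(word)


# Hypothetical inputs that the isalpha()/islower() filter would reject.
untrusted = ['quote"inside', 'back\\slash', 'new\nline', "it's"]
for word in untrusted:
    encoded = encode_word(word)
    assert json.loads(encoded) == word  # lossless round trip
    print(encoded)
```

This only escapes the text being written; a JSON-encoded string can still contain characters such as "/", so it is not by itself a safe file name.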

-tkc



