From: Tim Chase
Newsgroups: comp.lang.python
Subject: Re: Storing a big amount of path names
Date: Thu, 11 Feb 2016 19:13:32 -0600

On 2016-02-12 00:31, Paulo da Silva wrote:
> What is the best (shortest memory usage) way to store lots of
> pathnames in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There are many different dirnames but much less than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.

Well, you can create a dict that maps dirname->list(filenames), which
will reduce each dirname to a single instance. You could store that
dict in the class, shared by all of the instances, though that starts
to pick up a code-smell.

But unless you're talking about an obscenely large number of dirnames
& filenames, or a severely resource-limited machine, just use the
default built-ins. If you start to push the boundaries of system
resources, then I'd try the "anydbm" module (named "dbm" in Python 3)
or use the "shelve" module to marshal them out to disk. Finally, you
*could* create an actual sqlite database on disk if size really does
exceed reasonable system specs.

-tkc
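
A minimal sketch of the dirname->list(filenames) idea and the sqlite
fallback mentioned above, assuming POSIX-style paths (hence posixpath;
os.path would be the choice for native paths). The function names
group_paths, iter_paths and store_in_sqlite, and the two-table layout,
are made up for illustration and are not from the original question.

```python
import posixpath
import sqlite3
from collections import defaultdict


def group_paths(pathnames):
    """Map each dirname to the list of filenames found under it.

    Each dirname string is stored once, as a dict key, instead of
    being repeated in every full pathname.
    """
    by_dir = defaultdict(list)
    for pathname in pathnames:
        dirname, filename = posixpath.split(pathname)
        by_dir[dirname].append(filename)
    return dict(by_dir)


def iter_paths(by_dir):
    """Rebuild the full pathnames from the grouped structure."""
    for dirname, filenames in by_dir.items():
        for filename in filenames:
            yield posixpath.join(dirname, filename)


def store_in_sqlite(by_dir, db_path=":memory:"):
    """Write the grouped paths to a sqlite database (hypothetical schema).

    db_path defaults to an in-memory database so the sketch leaves no
    file behind; pass a filename to put the data on disk.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE dirs (id INTEGER PRIMARY KEY, dirname TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE files (dir_id INTEGER REFERENCES dirs(id), filename TEXT)"
    )
    for dirname, filenames in by_dir.items():
        cur = conn.execute("INSERT INTO dirs (dirname) VALUES (?)", (dirname,))
        dir_id = cur.lastrowid  # id of the dirname row just inserted
        conn.executemany(
            "INSERT INTO files (dir_id, filename) VALUES (?, ?)",
            [(dir_id, name) for name in filenames],
        )
    conn.commit()
    return conn
```

For example, group_paths(["/usr/lib/a.so", "/usr/lib/b.so"]) keeps one
"/usr/lib" key with both filenames in its list. Whether this actually
saves memory over a plain list of strings depends on how long the
dirnames are and how many files share each one, so it is worth
measuring before committing to it.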