From: srinivas devaki
Newsgroups: comp.lang.python
Subject: Re: Storing a big amount of path names
Date: Fri, 12 Feb 2016 11:46:41 +0530

On Feb 12, 2016 6:05 AM, "Paulo da Silva" wrote:
>
> Hi!
>
> What is the best (shortest memory usage) way to store lots of pathnames
> in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There many different dirnames but much less than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.
>
> More realistically not only the pathnames are stored but objects each
> object being a MyFile containing
> self.name -
> getPathname(self) -
> other stuff
>
> class MyFile:
>
>     __allfiles=[]
>
>     def __init__(self,dirname,filename):
>         self.dirname=dirname # But I want to share this with other files
>         self.name=filename
>         MyFile.__allfiles.append(self)
>         ...
>
>     def getPathname(self):
>         return os.path.join(self.dirname,self.name)
>

What you want is a trie data structure, which won't use extra memory for
the parts of the path that your strings have in common. Instead of
constructing a character trie, make it a string trie, i.e. each directory
name is a node and all the files and folders inside it are its children.
Each node can be one of two types: a file or a folder.

If you think about it, this is the most intuitive way to represent the
file structure in your program. You can recover the directory name of a
file object by traversing its parents.

I hope this helps.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
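
P.S. Here is a rough sketch of the string trie described above, in case it
makes the idea more concrete. The names Node, child, add_file and
get_pathname are made up for illustration, and the use of __slots__ is an
extra assumption of this sketch to keep the per-node overhead down. Whether
it actually beats sharing plain dirname strings depends on your data, so
measure before committing to it.

```python
import os


class Node:
    """One path component, either a folder or a file (hypothetical class)."""

    __slots__ = ("name", "parent", "children")

    def __init__(self, name, parent=None, is_dir=True):
        self.name = name
        self.parent = parent
        # Folders hold a dict of their children; files hold None.
        self.children = {} if is_dir else None

    @property
    def is_dir(self):
        return self.children is not None

    def child(self, name, is_dir=True):
        """Return the existing child called name, or create it."""
        node = self.children.get(name)
        if node is None:
            node = self.children[name] = Node(name, self, is_dir)
        return node

    def get_pathname(self):
        """Rebuild the full path by walking up through the parents."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return os.path.join(*reversed(parts))


def add_file(root, dirname, filename):
    """Insert dirname/filename below root, reusing existing folder nodes."""
    node = root
    for part in dirname.split(os.sep):
        if part:  # skip the empty pieces produced by leading separators
            node = node.child(part)
    return node.child(filename, is_dir=False)


# Usage: two files in the same directory share every folder node.
root = Node(os.sep)
docdir = os.path.join(os.sep, "usr", "share", "doc")
a = add_file(root, docdir, "README")
b = add_file(root, docdir, "NEWS")
print(a.get_pathname())
print(a.parent is b.parent)  # True: the "doc" node is stored only once
```

Each directory name is stored once, no matter how many files live under it,
and the full pathname is rebuilt on demand instead of being kept per file.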