From: Dennis Lee Bieber
Subject: Re: Using filepath method to identify an .html page
Date: Tue, 22 Jan 2013 17:01:26 -0500
Newsgroups: comp.lang.python

On Tue, 22 Jan 2013 10:07:21 -0800 (PST), Ferrous Cranus declaimed the following in gmane.comp.python.general:

> No, because i DO NOT WANT to store LOTS OF BIGS absolute paths in the database.

Why not? What is "BIG"? 10,000 paths of 255 characters each (presuming ASCII at 1 byte per character) come to 2,550,000 characters. That's LESS THAN THREE MB for all the file paths. Add in a 2-byte short integer ID and you've got 20,000 bytes of IDs. Creating unique indices (ID should already be a unique auto-increment column) doubles the data usage, plus maybe 160,000 bytes for the pointers from the index to the data record.

    2,550,000 +    20,000 => 2,570,000   raw data
    2,570,000 +   160,000 => 2,730,000   indices
    2,570,000 + 2,730,000 => 5,300,000   total

5MB maximum. I could store all that on my ancient PDA! We've probably generated that much text in the two discussion threads alone!

The safest way to generate your four digit integer, without running the risk of collision from hashing, is a simple database table with a unique ID column and a unique filepath column.

-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
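
For anyone following along, the table suggested above could be sketched roughly like this. It is only an illustration and assumes SQLite through Python's standard sqlite3 module; the table name `pages`, the helpers `open_db` and `page_id`, and the example paths are all made up for the sketch and are not from the thread.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the lookup database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pages (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique auto-increment ID
            filepath TEXT NOT NULL UNIQUE                -- unique absolute path
        )
        """
    )
    return conn


def page_id(conn, filepath):
    """Return the ID for filepath, inserting a new row the first time it is seen."""
    row = conn.execute(
        "SELECT id FROM pages WHERE filepath = ?", (filepath,)
    ).fetchone()
    if row is not None:
        return row[0]
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO pages (filepath) VALUES (?)", (filepath,)
        )
    return cur.lastrowid


if __name__ == "__main__":
    conn = open_db()
    # Hypothetical paths, purely for demonstration.
    for p in ("/home/user/www/index.html",
              "/home/user/www/about.html",
              "/home/user/www/index.html"):
        print(p, "->", page_id(conn, p))
```

The same path always maps back to the same ID and two different paths can never share one, because the database enforces uniqueness on both columns. No hashing is involved, so there is nothing to collide.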