Date: Wed, 14 Nov 2012 15:06:10 +1100
Subject: Re: Generate unique ID for URL
From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python

On Wed, Nov 14, 2012 at 2:25 PM, Richard wrote:
> So the use case - I'm storing webpages on disk and want a quick retrieval system based on URL.
> I can't store the files in a single directory because of OS limitations so have been using a sub folder structure.
> For example to store data at URL "abc": a/b/c/index.html
> This data is also viewed locally through a web app.
>
> If you can suggest a better approach I would welcome it.

The cost of a crypto hash on the URL will be completely dwarfed by the cost of storing/retrieving on disk. You could probably do some arithmetic and figure out exactly how many URLs (at an average length of, say, 100 bytes) you can hash in the time of one disk seek.

ChrisA
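For reference, here is a minimal sketch of the hash-then-fan-out layout the reply points toward, plus a rough way to do the "hashes per disk seek" arithmetic. Everything in it is an assumption for illustration, not something from the thread: the helper names (url_to_path, save_page, load_page), the choice of SHA-1, the two-level / two-hex-digit directory fan-out, and the 10 ms seek time.

```python
import hashlib
import os
import timeit


def url_to_path(root, url, levels=2, width=2):
    """Map a URL to root/<h[0:2]>/<h[2:4]>/<h>.html, where h is the hex digest.

    Hypothetical helper: SHA-1 is an arbitrary choice; any stable hash works.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # Use the leading hex digits as nested subdirectory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, digest + ".html")


def save_page(root, url, html):
    """Write a page to its hashed location, creating directories as needed."""
    path = url_to_path(root, url)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path


def load_page(root, url):
    """Read a page back by recomputing its hashed path from the URL."""
    with open(url_to_path(root, url), encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Rough version of the arithmetic: how many ~100-byte URLs hash in one seek?
    url = "http://example.com/" + "x" * 81  # exactly 100 characters
    n = 100_000
    per_hash = timeit.timeit(lambda: url_to_path("pages", url), number=n) / n
    seek = 0.010  # assumed ~10 ms for one spinning-disk seek
    print(f"{per_hash * 1e6:.2f} us per hash; ~{seek / per_hash:,.0f} hashes per seek")
```

With two levels of two hex digits there are 65,536 leaf directories, so even a few million pages leave only a few dozen files in each. Because the hash is one-way, you would need to keep the original URL somewhere (for example, in a small index file) if you ever want to list what has been stored.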