
Re: fseek In Compressed Files

Date Fri, 31 Jan 2014 00:49:03 +1100
Subject Re: fseek In Compressed Files
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python



On Fri, Jan 31, 2014 at 12:34 AM, Ayushi Dalmia
<ayushidalmia2604@gmail.com> wrote:
> where temp.txt is the posting list file which is first written in a compressed format and then read later.

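Unless you specify otherwise, a compressed file is likely to be packed
at the bit level, so a line of the original text won't necessarily
start on a byte boundary in the compressed data. It might not be
possible to seek to a specific line.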

What you could do, though, is explicitly compress each line, then
write out separately-compressed blocks. You can then seek to any one
that you want, read it, and decompress it. But at this point, you're
probably going to do better with a database; PostgreSQL, for instance,
will automatically compress any content that it believes it's
worthwhile to compress (as long as it's in a VARCHAR field or similar
and the table hasn't been configured to prevent that, yada yada). All
you have to do is tell Postgres to store this, retrieve that, and
it'll worry about the details of compression and decompression. As an
added benefit, you can divide the text up and let it do the hard work
of indexing, filtering, sorting, etc. I suspect you'll find that
deploying a database is a much more efficient use of your development
time than recreating all of that.
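
For the separately-compressed-blocks approach, here is a minimal sketch,
assuming zlib as the compressor and an in-memory list of (offset, length)
pairs as the index. The names write_blocks and read_block and the sample
posting lines are made up for illustration; none of them come from the
original code.

```python
import io
import zlib

def write_blocks(lines, f):
    """Compress each line on its own and append it to the binary file f.

    Returns a list of (offset, length) pairs, one per line, so that any
    block can be located later without reading the ones before it.
    """
    index = []
    for line in lines:
        block = zlib.compress(line.encode("utf-8"))
        index.append((f.tell(), len(block)))
        f.write(block)
    return index

def read_block(f, index, n):
    """Seek to the n-th compressed block, read it, and decompress it."""
    offset, length = index[n]
    f.seek(offset)
    return zlib.decompress(f.read(length)).decode("utf-8")

# Hypothetical posting-list lines; BytesIO stands in for a file opened in "wb+" mode.
postings = ["apple 1 4 9\n", "banana 2 3\n", "cherry 7\n"]
buf = io.BytesIO()
index = write_blocks(postings, buf)
line = read_block(buf, index, 1)
```

The index has to be kept somewhere as well (a second file, for example),
and very short lines may not shrink at all when compressed one at a time,
so grouping several lines into each block is probably worth considering.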

ChrisA


