


RE: Fast file data retrieval?

From "Prasad, Ramit" <ramit.prasad@jpmorgan.com>
To "python-list@python.org" <python-list@python.org>
Subject RE: Fast file data retrieval?
Date Mon, 12 Mar 2012 21:09:05 +0000
Newsgroups comp.lang.python



> > header line
> > 9 nonblank lines with alphanumeric data
> > header line
> > 9 nonblank lines with alphanumeric data
> > ...
> > ...
> > ...
> > header line
> > 9 nonblank lines with alphanumeric data
> > EOF
> >
> > where a data set contains 10 lines (header + 9 nonblank) and there can
> > be several thousand data sets in a single file. In addition, *each header
> > has a unique ID code*.

> Alternatively, you could scan the file, recording the ID and the file
> offset in a dict so that, given an ID, you can seek directly to that
> file position.
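
The scan-and-seek idea quoted above could be sketched roughly as follows. This is only an illustration under assumptions not stated in the thread: the ID is taken to be the first whitespace-separated token of each header line, every data set is exactly 10 lines, and the file is ASCII. The names `build_index` and `read_record` are made up for the example.

```python
RECORD_LINES = 10  # header + 9 data lines, as described in the quoted post


def build_index(path):
    """Scan the file once and map each header ID to its byte offset."""
    index = {}
    # Binary mode keeps tell()/seek() positions as plain byte offsets.
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.readline()
            if not header:
                break  # EOF
            if not header.strip():
                continue  # tolerate a stray blank line, e.g. at the end
            # Assumption: the ID is the first token of the header line.
            record_id = header.split()[0].decode("ascii")
            index[record_id] = offset
            # Skip the 9 data lines that belong to this header.
            for _ in range(RECORD_LINES - 1):
                f.readline()
    return index


def read_record(path, index, record_id):
    """Seek straight to a data set and return its 10 lines."""
    with open(path, "rb") as f:
        f.seek(index[record_id])
        return [
            f.readline().decode("ascii").rstrip("\r\n")
            for _ in range(RECORD_LINES)
        ]
```

The index costs one full pass over the file; after that, each lookup is a dict access plus a single seek.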

If you can grep for the header lines, you can retrieve the headers
and their line numbers for seeking. grep is (probably) faster than Python, so
I would do it in 2 steps:
1. grep > temp.txt
2. python: check if the ID is in temp.txt and then process
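
A rough sketch of step 2, under some assumptions of mine rather than anything stated in the post: step 1 is run as something like `grep -b 'PATTERN' data.txt > temp.txt`, where `PATTERN` is a placeholder for whatever matches only header lines and `-b` (GNU grep) prints byte offsets instead of the line numbers mentioned above, since `seek` needs a byte position. Each line of temp.txt then looks like `<offset>:<header line>`. The ID is again assumed to be the first token of the header, and `load_grep_index` and `fetch` are hypothetical names.

```python
def load_grep_index(grep_output_path):
    """Parse `grep -b` output lines of the form '<byte offset>:<header line>'."""
    index = {}
    with open(grep_output_path, "r", encoding="ascii") as f:
        for line in f:
            offset, _, header = line.partition(":")
            if not header.strip():
                continue  # skip blank or malformed lines
            # Assumption: the ID is the first token of the header line.
            index[header.split()[0]] = int(offset)
    return index


def fetch(data_path, index, record_id, record_lines=10):
    """Return the data set for record_id, or None if the ID is unknown."""
    if record_id not in index:
        return None
    with open(data_path, "rb") as f:
        f.seek(index[record_id])
        return [
            f.readline().decode("ascii").rstrip("\r\n")
            for _ in range(record_lines)
        ]
```

Whether this beats a pure-Python scan depends on the file size and how often the index is rebuilt, so it is worth timing both.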

Ramit


Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423

