Path: csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
To: python-list@python.org
From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Subject: Re: Discussion on some Code Issues
Date: Sun, 08 Jul 2012 15:07:06 -0400
Organization: > Bestiaria Support Staff <
References: <a4f0e2a9-cc3b-4081-beb9-82f229e95ba1@googlegroups.com> <3c4e2ef9-bf7e-4fbc-bf12-6780fdc3e5d4@googlegroups.com> <mailman.1901.1341694286.4697.python-list@python.org> <09adb3cf-f3f2-4acc-b561-a36dcf15ecc7@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1924.1341774432.4697.python-list@python.org>
Lines: 52
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:25050

On Sat, 7 Jul 2012 22:42:13 -0700 (PDT), subhabangalore@gmail.com
declaimed the following in gmane.comp.python.general:

> 
> Thanks for pointing out the mistakes. Your points are right. So I am trying to revise it,
> 
> file_open=open("/python32/doc1.txt","r")
> for line in file_open:
>          line_word=line.split()
>          print (line_word)
> 
> To store them the best way is to assign a blank list and append but is there any alternate
> method for huge data it becomes tough as the list becomes huge if any way variables may be assigned.
> 
	Well, first to copy from an earlier post (just so I can trim the
unneeded)...

> > > I like to store in some variable,so that I may print line of my choice and manipulate them at my choice.
> > > Is there any way out to this problem?

	It is still not clear exactly what the task itself is supposed to
be.

	After all, you are splitting the line into a LIST of words, and then
here state the goal is to "print line of" choice... The line and not the
list? There is no hint of what "manipulate them" involves.

	If the files are of any size, I would not even attempt to store them
internally... I'd be more likely to run a preprocess phase which opens
the file in binary mode (maybe reading it in chunks) and builds a list
of /offsets/ to the start of each line. Processing any specific line
later would then use a seek() to the start of that line, followed by a
read of just the number of bytes up to the start of the next line.
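
	A minimal sketch of that offset index, assuming lines end in "\n"
(so "\r\n" endings stay attached to the line as read). The names
build_line_offsets and read_line, and the chunk size, are invented for
illustration only:

```python
def build_line_offsets(path, chunk_size=1 << 16):
    """Return the byte offset of the start of each line.

    One extra entry (the file size) is appended at the end, so line i
    always spans offsets[i] up to offsets[i + 1]; lines count from 0.
    """
    offsets = [0]
    position = 0  # byte offset of the current chunk within the file
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                newline = chunk.find(b"\n", start)
                if newline == -1:
                    break
                # The next line starts just past this newline.
                offsets.append(position + newline + 1)
                start = newline + 1
            position += len(chunk)
    # position now equals the file size; add it as the end marker
    # unless the file ended in a newline (or was empty).
    if offsets[-1] != position:
        offsets.append(position)
    return offsets


def read_line(path, offsets, index):
    """Fetch line `index` as bytes using one seek() and one read()."""
    if not 0 <= index < len(offsets) - 1:
        raise IndexError("no such line: %d" % index)
    start, end = offsets[index], offsets[index + 1]
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(end - start)
```

	The result is bytes, so decode() it if text is wanted. Only the
list of integers stays in memory, not the lines themselves.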

	Doing an mmap() of the file may even speed up the later processing,
as you wouldn't be using I/O seeks, but just asking for slices from the
mmap'd file. The OS would be responsible for making sure the file
contents were in memory.
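
	Roughly, the mmap() variant could look like the following. This is
only a sketch: open_mapped, line_offsets_mmap and get_line are made-up
names, and the mapping is read-only.

```python
import mmap


def open_mapped(path):
    """Map the whole file read-only.

    Note: mmap raises ValueError for an empty file. The mapping stays
    valid after the file object is closed.
    """
    with open(path, "rb") as handle:
        return mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)


def line_offsets_mmap(mapped):
    """Offsets of each line start, plus the total length as end marker."""
    offsets = [0]
    pos = mapped.find(b"\n")
    while pos != -1:
        offsets.append(pos + 1)
        pos = mapped.find(b"\n", pos + 1)
    if offsets[-1] != len(mapped):
        offsets.append(len(mapped))
    return offsets


def get_line(mapped, offsets, index):
    """Line `index` (counting from 0) as bytes: a slice, no seek()."""
    if not 0 <= index < len(offsets) - 1:
        raise IndexError("no such line: %d" % index)
    return mapped[offsets[index]:offsets[index + 1]]
```

	Call close() on the mapping when finished with it.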

	This won't work if the manipulation requires making a line longer or
shorter, since every offset after the changed line would shift. In that
case, the preprocessing step would instead write the lines to a simple
BSD-DB style "database", in which the "line number" is the key; any
manipulation would work on records fetched by line number, and written
back.
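
	As a sketch of the keyed-record idea, here is a version using the
standard library's dbm module as a stand-in for BSD-DB (the old bsddb
module is no longer in the Python 3 standard library). The function
names and the choice of 1-based line numbers are my own assumptions:

```python
import dbm


def _key(line_number):
    # Keys are stored as bytes; line numbers start at 1 in this sketch.
    return str(line_number).encode("ascii")


def load_lines(source_path, db_path):
    """Copy each line of the source file into a fresh database.

    Returns the number of lines stored. Flag "n" always starts with a
    new, empty database. (Using dbm.open() in a with statement needs
    Python 3.4 or later; on older versions call close() explicitly.)
    """
    count = 0
    with open(source_path, "rb") as source, dbm.open(db_path, "n") as db:
        for count, line in enumerate(source, start=1):
            db[_key(count)] = line
    return count


def update_line(db_path, line_number, transform):
    """Fetch one record, apply transform(bytes) -> bytes, write it back."""
    with dbm.open(db_path, "w") as db:
        key = _key(line_number)
        new_value = transform(db[key])
        db[key] = new_value
        return new_value
```

	Since each record is stored on its own, a rewritten line may be
any length without disturbing its neighbours.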

	If you also store a "process date" in the BSD-DB database, you could
match it to the last modified time of the source file and skip
reprocessing if the source has not changed.
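
	One possible shape for that check, again with the standard dbm
module standing in for BSD-DB. Note the small variation: rather than a
wall-clock "process date", this sketch records the source file's own
modification time and tests for an exact match. The key name and the
function names are invented for the example:

```python
import dbm
import os

# Hypothetical reserved key; it must not collide with a line-number key.
MTIME_KEY = b"__source_mtime__"


def _source_stamp(source_path):
    # Modification time in integer nanoseconds, stored as ASCII bytes.
    return str(os.stat(source_path).st_mtime_ns).encode("ascii")


def needs_rebuild(source_path, db_path):
    """True if the database is missing or was built from another version."""
    stamp = _source_stamp(source_path)
    try:
        with dbm.open(db_path, "r") as db:
            return MTIME_KEY not in db or db[MTIME_KEY] != stamp
    except dbm.error:
        # No usable database yet, so the preprocessing has to run.
        return True


def record_processed(source_path, db_path):
    """Remember which version of the source the database reflects."""
    with dbm.open(db_path, "c") as db:
        db[MTIME_KEY] = _source_stamp(source_path)
```

	Call record_processed() once the preprocessing has completed, and
needs_rebuild() at the start of each later run.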
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/