From: Dennis Lee Bieber
Newsgroups: comp.lang.python
Subject: Re: Discussion on some Code Issues
Date: Sun, 08 Jul 2012 15:07:06 -0400

On Sat, 7 Jul 2012 22:42:13 -0700 (PDT), subhabangalore@gmail.com
declaimed the following in gmane.comp.python.general:

>
> Thanks for pointing out the mistakes. Your points are right. So I am trying to revise it,
>
> file_open=open("/python32/doc1.txt","r")
> for line in file_open:
>     line_word=line.split()
>     print (line_word)
>
> To store them the best way is to assign a blank list and append but is there any alternate
> method for huge data it becomes tough as the list becomes huge if any way variables may be assigned.
>

Well, first to copy from an earlier post (just so I can trim the unneeded)...

> > > I like to store in some variable,so that I may print line of my choice and manipulate them at my choice.
> > > Is there any way out to this problem?

It is still not clear exactly what the task itself is supposed to be.
After all, you are splitting the line into a LIST of words, and then here state the goal is to "print line of" choice... The line and not the list? There is no hint of what "manipulate them" involves.

If the files are of any size, I would not even attempt to store them internally... I'd be more likely to run a preprocess phase which opens the file in binary mode (maybe reads it in chunks) and builds a list of /offsets/ to the start of each line. Processing any specific line later would use a seek() to the start of that line, followed by a read of just the length up to the start of the next line.

Doing an mmap() of the file may even speed up the later processing, as you wouldn't be using I/O seeks, but just asking for slices from the mmap'd file. The OS would be responsible for making sure the file contents were in memory.

The offset approach won't work if the manipulation requires making a line longer or shorter. In that case, preprocessing would be writing the lines to a simple BSD-DB style "database", in which the "line number" is the key; any manipulation would work on records fetched by line number, and written back.

If you also store a "process date" in the BSD-DB database, you could match it to the last modified time of the source file and skip reprocessing if the source has not changed.

-- 
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
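
For illustration, the offset-index idea described in the post could look roughly like the sketch below. It assumes Python 3 and "\n" line endings. The function names (build_line_offsets, read_line, read_line_mmap) and the default chunk size are invented for this example and do not come from the thread.

```python
import mmap


def build_line_offsets(path, chunk_size=64 * 1024):
    """Preprocess pass: return the byte offset of the start of each line.

    The list ends with the end-of-file offset, so line n occupies
    bytes offsets[n] up to (not including) offsets[n + 1].
    """
    offsets = [0]
    position = 0  # absolute file position of the current chunk
    with open(path, "rb") as f:  # binary mode: offsets are plain byte counts
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                newline = chunk.find(b"\n", start)
                if newline == -1:
                    break
                offsets.append(position + newline + 1)
                start = newline + 1
            position += len(chunk)
    # A last line without a trailing newline still needs an end marker.
    if offsets[-1] != position:
        offsets.append(position)
    return offsets


def read_line(path, offsets, line_number):
    """Fetch one line (as bytes) with seek() plus a read of known length."""
    start, end = offsets[line_number], offsets[line_number + 1]
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


def read_line_mmap(mapped, offsets, line_number):
    """Same lookup against an mmap'd file: just a slice, no explicit seek."""
    return mapped[offsets[line_number]:offsets[line_number + 1]]
```

The number of lines is len(offsets) - 1. For the mmap variant, the caller would open the file in binary mode and map it with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ); note that an empty file cannot be mapped. As the post says, none of this helps once a line has to grow or shrink, since every later offset would shift.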