


Re: Discussion on some Code Issues

Newsgroups comp.lang.python
Date Thu, 5 Jul 2012 07:33:27 -0700 (PDT)
In-Reply-To <mailman.1814.1341473418.4697.python-list@python.org>
References <a4f0e2a9-cc3b-4081-beb9-82f229e95ba1@googlegroups.com> <34484d3d-d4c2-463b-8f83-dba57ce0511d@googlegroups.com> <mailman.1814.1341473418.4697.python-list@python.org>
Subject Re: Discussion on some Code Issues
From subhabangalore@gmail.com
Message-ID <mailman.1828.1341498810.4697.python-list@python.org>



Dear Peter,
That is a nice one. I am wondering whether I can write "for line in f" style code, which is easy, but then how would I find the slices? By the way, do you know whether I can convert an index position in the file to the corresponding position in a list, provided I am building the list from the same file we are reading?

Best Regards,
Subhabrata. 
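
A minimal sketch of the "for line in f" idea, assuming the combined text is read line by line and "$" marks the document boundaries as in the quoted post below. It keeps a running character offset, so the (start, end) slice positions of each document within the whole text are still known. The name iter_documents and the sample string are hypothetical, and the offsets are character positions in the decoded text, not byte positions in the file.

```python
import io


def iter_documents(lines, sep="$"):
    """Yield (start, end, text) for each sep-delimited document.

    start and end are character offsets into the concatenation of all
    lines, so whole_text[start:end] == text.
    """
    offset = 0   # offset of the current line's first character
    start = 0    # offset where the current document begins
    parts = []   # pieces of the current document collected so far
    for line in lines:
        pos = 0
        while True:
            hit = line.find(sep, pos)
            if hit == -1:
                # No more separators on this line: keep the remainder.
                parts.append(line[pos:])
                break
            parts.append(line[pos:hit])
            yield start, offset + hit, "".join(parts)
            parts = []
            pos = hit + len(sep)
            start = offset + pos
        offset += len(line)
    # Emit the text after the last separator, if there is any.
    tail = "".join(parts)
    if tail:
        yield start, offset, tail


sample = "first doc$ second\ndoc$ third"
# io.StringIO stands in for an open text file here.
for start, end, text in iter_documents(io.StringIO(sample)):
    print(start, end, repr(text))
```

Because each tuple carries the offsets, the slices can be recovered later from the full text without looping over it a second time.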

On Thursday, July 5, 2012 1:00:12 PM UTC+5:30, Peter Otten wrote:
> subhabangalore@gmail.com wrote:
> 
> > On Thursday, July 5, 2012 4:51:46 AM UTC+5:30, (unknown) wrote:
> >> Dear Group,
> >> 
> >> I am Sri Subhabrata Banerjee, writing from Gurgaon, India, to
> >> discuss some coding issues. If anyone in this learned room can shed
> >> some light, I would be grateful.
> >> 
> >> I have to code a bunch of documents which are combined together.
> >> Like,
> >> 
> >> 1)A Mumbai-bound aircraft with 99 passengers on board was struck by
> >> lightning on Tuesday evening that led to complete communication failure
> >> in mid-air and forced the pilot to make an emergency landing. 2) The
> >> discovery of a new sub-atomic particle that is key to understanding how
> >> the universe is built has an intrinsic Indian connection. 3) A bomb
> >> explosion outside a shopping mall here on Tuesday left no one injured,
> >> but Nigerian authorities put security agencies on high alert fearing more
> >> such attacks in the city.
> >> 
> >> The task is to separate the documents on the fly and to parse each of the
> >> documents with a definite set of rules.
> >> 
> >> Now, the way I am processing is:
> >> I am clubbing all the documents together, as,
> >> 
> >> A Mumbai-bound aircraft with 99 passengers on board was struck by
> >> lightning on Tuesday evening that led to complete communication failure
> >> in mid-air and forced the pilot to make an emergency landing.The
> >> discovery of a new sub-atomic particle that is key to understanding how
> >> the universe is built has an intrinsic Indian connection. A bomb
> >> explosion outside a shopping mall here on Tuesday left no one injured,
> >> but Nigerian authorities put security agencies on high alert fearing more
> >> such attacks in the city.
> >> 
> >> But they are separated by a tag set, like,
> >> A Mumbai-bound aircraft with 99 passengers on board was struck by
> >> lightning on Tuesday evening that led to complete communication failure
> >> in mid-air and forced the pilot to make an emergency landing.$ The
> >> discovery of a new sub-atomic particle that is key to understanding how
> >> the universe is built has an intrinsic Indian connection.$ A bomb
> >> explosion outside a shopping mall here on Tuesday left no one injured,
> >> but Nigerian authorities put security agencies on high alert fearing more
> >> such attacks in the city.
> >> 
> >> To detect the document boundaries, I am splitting them into a bag of
> >> words and using a simple for loop as,
> >> 
> >>     for i in range(len(bag_words)):
> >>         if bag_words[i] == "$":
> >>             print(bag_words[i], i)
> >> 
> >> There is no issue. I am segmenting it nicely. I am using annotated corpus
> >> so applying parse rules.
> >> 
> >> The confusion comes next,
> >> 
> >> As per my problem statement the size of the file (of documents combined
> >> together) won’t increase on the fly. So, just to support all kinds of
> >> combinations I am appending in a list the “I” values, taking its length,
> >> and using slice. Works perfect. Question is, is there a smarter way to
> >> achieve this, and a curious question if the documents are on the fly with
> >> no preprocessed tag set like “$” how may I do it? From a bunch without
> >> EOF isn’t it a classification problem?
> >> 
> >> There is no question on parsing it seems I am achieving it independent of
> >> length of the document.
> >> 
> >> If any one in the group can suggest how I am dealing with the problem and
> >> which portions should be improved and how?
> >> 
> >> Thanking You in Advance,
> >> 
> >> Best Regards,
> >> Subhabrata Banerjee.
> > 
> > 
> > Hi Steven, it is nice to see your posts. I have learnt so many things
> > from you. "I" is the index of the loop. Now, my clarification: I
> > thought of doing "import os" and processing files in a loop, but that is
> > not my problem statement. I have to make a big lump of text and detect
> > one chunk. I am not looping over the line numbers of the file because
> > then I may not be able to take the slices, which I need. I thought of
> > giving re.findall a try, but that does not give me the slices. Slice
> > spreads here. The power issue of string! I would definitely give it a
> > try.
> > Happy Day Ahead
> > Regards,
> > Subhabrata Banerjee.
> 
> Then use re.finditer():
> 
> import re
> 
> start = 0
> for match in re.finditer(r"\$", data):
>     end = match.start()
>     print(start, end)
>     print(data[start:end])
>     start = match.end()
> 
> This will omit the last text. The simplest fix is to put another "$" 
> separator at the end of your data.
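
An alternative to appending an extra "$" is to emit whatever remains after the last match once the loop ends. A minimal sketch, assuming the data is a single string already in memory; the function name split_spans and the sample text are hypothetical.

```python
import re


def split_spans(data, sep=r"\$"):
    """Return (start, end) slice positions for each separator-delimited chunk."""
    spans = []
    start = 0
    for match in re.finditer(sep, data):
        spans.append((start, match.start()))
        start = match.end()
    # The text after the final separator is not covered by any match,
    # so add it explicitly (skipped when the data ends with a separator).
    if start < len(data):
        spans.append((start, len(data)))
    return spans


data = "landing.$ connection.$ city."
for start, end in split_spans(data):
    print(start, end, repr(data[start:end]))
```

Returning the spans rather than the substrings keeps the slice positions available for later use, which is what re.findall alone does not provide.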



