


Re: Discussion on some Code Issues

From subhabangalore@gmail.com
Newsgroups comp.lang.python
Subject Re: Discussion on some Code Issues
Date 2012-07-04 20:25 -0700
Organization http://groups.google.com
Message-ID <34484d3d-d4c2-463b-8f83-dba57ce0511d@googlegroups.com>
References <a4f0e2a9-cc3b-4081-beb9-82f229e95ba1@googlegroups.com>



On Thursday, July 5, 2012 4:51:46 AM UTC+5:30, (unknown) wrote:
> Dear Group,
> 
> I am Sri Subhabrata Banerjee, writing from Gurgaon, India, to discuss some coding issues. If anyone in this learned room can shed some light, I would be grateful.
> 
> I have to code a bunch of documents which are combined together.
> Like, 
> 
> 1) A Mumbai-bound aircraft with 99 passengers on board was struck by lightning on Tuesday evening that led to complete communication failure in mid-air and forced the pilot to make an emergency landing.
> 2) The discovery of a new sub-atomic particle that is key to understanding how the universe is built has an intrinsic Indian connection.
> 3) A bomb explosion outside a shopping mall here on Tuesday left no one injured, but Nigerian authorities put security agencies on high alert fearing more such attacks in the city.
> 
> The task is to separate the documents on the fly and to parse each of the documents with a definite set of rules. 
> 
> Now, the way I am processing is: 
> I am clubbing all the documents together, as,
> 
> A Mumbai-bound aircraft with 99 passengers on board was struck by lightning on Tuesday evening that led to complete communication failure in mid-air and forced the pilot to make an emergency landing.The discovery of a new sub-atomic particle that is key to understanding how the universe is built has an intrinsic Indian connection. A bomb explosion outside a shopping mall here on Tuesday left no one injured, but Nigerian authorities put security agencies on high alert fearing more such attacks in the city.
> 
> But they are separated by a tag set, like, 
> A Mumbai-bound aircraft with 99 passengers on board was struck by lightning on Tuesday evening that led to complete communication failure in mid-air and forced the pilot to make an emergency landing.$
> The discovery of a new sub-atomic particle that is key to understanding how the universe is built has an intrinsic Indian connection.$
> A bomb explosion outside a shopping mall here on Tuesday left no one injured, but Nigerian authorities put security agencies on high alert fearing more such attacks in the city.
> 
> To detect the document boundaries, I am splitting them into a bag of words and using a simple for loop:
> for i in range(len(bag_words)):
>     if bag_words[i] == "$":
>         print(bag_words[i], i)
> 
> There is no issue. I am segmenting it nicely. I am using an annotated corpus, so I am applying parse rules.
> 
> The confusion comes next, 
> 
> As per my problem statement, the size of the file (of documents combined together) won’t increase on the fly. So, just to support all kinds of combinations, I am appending the “I” values to a list, taking its length, and using slices. It works perfectly. The question is: is there a smarter way to achieve this? And a curious question: if the documents arrive on the fly with no preprocessed tag set like “$”, how may I do it? Given a bunch without EOF, isn’t it a classification problem?
> 
> There is no question on parsing; it seems I am achieving it independent of the length of the document.
> 
> Can anyone in the group comment on how I am dealing with the problem, and suggest which portions should be improved and how?
> 
> Thanking You in Advance,
> 
> Best Regards,
> Subhabrata Banerjee.
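
A minimal sketch of the index-and-slice approach described in the quoted post, assuming the text has already been split into a list of tokens in which "$" appears as its own token. The sample tokens and the names split_documents and boundaries are illustrative only; they are not from the original code.

```python
def split_documents(bag_words, tag="$"):
    """Return the token slices that lie between `tag` tokens."""
    # Indices of every boundary token, as collected by the quoted for loop.
    boundaries = [i for i, word in enumerate(bag_words) if word == tag]
    documents = []
    start = 0
    # Treat the end of the list as one more boundary so the last
    # document (which has no trailing tag) is not lost.
    for end in boundaries + [len(bag_words)]:
        documents.append(bag_words[start:end])
        start = end + 1  # skip over the tag itself
    return documents


# Made-up sample data standing in for the real corpus.
bag_words = "A plane landed . $ A particle was found . $ A bomb exploded .".split()
docs = split_documents(bag_words)
for number, doc in enumerate(docs, 1):
    print(number, " ".join(doc))
```

If the input ends with a tag, this sketch returns an empty final document, which the caller may want to filter out.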


Hi Steven,

It is nice to see your post. Your posts are always good, and I have learnt many things from you. "I" is the index of the loop.

Now, my clarification: I thought of doing "import os" and processing the files in a loop, but that is not my problem statement. I have to make one big lump of text and detect one chunk. I am not looping over the line numbers of the file, because then I may not be able to take the slices, and the slices are what I need. I thought of giving re.findall a try, but that does not give me the slices. Slice spreads here. The power issue of string! I will definitely give it a try.

Happy Day Ahead,
Regards,
Subhabrata Banerjee.
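
On the point that re.findall does not give slices: re.finditer yields match objects whose start() and end() methods return character offsets, and those offsets can be used directly for slicing. The following is only a sketch, assuming "$" is the separator; the sample text and the helper name document_spans are made up for illustration.

```python
import re


def document_spans(text, tag="$"):
    """Return (start, end) character offsets of each chunk between tags."""
    spans = []
    start = 0
    # re.escape is needed because "$" is a regex metacharacter.
    for match in re.finditer(re.escape(tag), text):
        spans.append((start, match.start()))
        start = match.end()
    spans.append((start, len(text)))  # chunk after the last tag
    return spans


# Made-up sample text standing in for the combined documents.
text = "First report.$Second report.$Third report."
spans = document_spans(text)
chunks = [text[a:b] for a, b in spans]
print(spans)
print(chunks)
```

The spans are kept separate from the chunks so that the same offsets can be reused to slice the original text later.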



