


Re: What do I do to read html files on my pc?

Newsgroups comp.lang.python
Date 2012-08-29 03:22 -0700
References <1c7cd833-b6ad-4a17-8ffe-a0ce20c8f400@googlegroups.com>
Message-ID <c92e1fdc-ff14-4484-9e03-e322de9ba82b@googlegroups.com>
Subject Re: What do I do to read html files on my pc?
From mikcec82 <michele.cecere@gmail.com>



On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hello,
>
> I have an HTML file on my PC and I want to read it to extract some text.
>
> Can you tell me which libraries I should use, and how to do it?
>
> Thank you so much.
>
> Michele
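For the question as asked (reading a local HTML file and getting at its text), Python's standard library is enough. The following is a minimal Python 3 sketch, not taken from the thread: it subclasses html.parser.HTMLParser to collect the text between tags while skipping <script> and <style> content. TextExtractor, html_to_text and read_html_text are hypothetical names, and UTF-8 is an assumed default encoding; adjust it to match the actual file.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []        # non-empty text fragments, in document order
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible text; whitespace-only fragments are dropped
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text fragments of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def read_html_text(path, encoding="utf-8"):
    """Read a local HTML file and return its text (encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return html_to_text(f.read())
```

For example, html_to_text("<p>Test 1: <b>NOT PASSED</b></p>") gives "Test 1:" and "NOT PASSED" on separate lines. Because each fragment is stripped and put on its own line, spacing between inline elements is not preserved; that is a deliberate simplification of this sketch.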

Hi Peter, and thanks for your valuable help.
Fortunately, my file contains no runs of "X" other than runs of 2 or 4.
Starting from your code, I wrote the following (I'm posting it in case it helps other people):
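Whether a file really contains only runs of 2 or 4 "X" characters can be checked rather than assumed. A minimal Python 3 sketch, not part of the original post (count_x_runs is a hypothetical helper name): it tallies the length of every maximal run of consecutive "X" characters. This matters because a search that tests for "XXXX" first, otherwise takes "XX", and then resumes after the match would report a run of 3 as one "XX", a run of 5 as one "XXXX", and a run of 6 as "XXXX" plus "XX".

```python
import re
from collections import Counter


def count_x_runs(text):
    """Map each run length of consecutive "X" characters to how often it occurs."""
    # r"X+" is greedy, so each match is a whole (maximal) run of X's
    return Counter(len(m.group()) for m in re.finditer(r"X+", text))
```

The assumption holds exactly when set(count_x_runs(text)) <= {2, 4}. Note that a lone uppercase "X" (as in "XML") shows up as a run of length 1.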
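from __future__ import print_function  # print() behaves the same on Python 2.6+ and 3

# fileorig is the path of the HTML file to scan; "with" closes the file automatically
with open(fileorig, 'r') as f:
    nomefile = f.read()

start = nomefile.find("XX")
start2 = nomefile.find("NOT PASSED")
c0 = 0  # number of "XXXX"
c1 = 0  # number of "XX"
c2 = 0  # number of "NOT PASSED"

# Loop while at least one of the two searches still has a match
while start != -1 or start2 != -1:

    if start != -1:
        # Check the longer pattern first so "XXXX" is not counted as two "XX"
        if nomefile[start:start+4] == "XXXX":
            print("XXXX       found at location", start)
            start += 4
            c0 += 1
        else:  # find() guarantees "XX" at this position
            print("XX         found at location", start)
            start += 2
            c1 += 1
        start = nomefile.find("XX", start)

    if start2 != -1:
        print("NOT PASSED found at location", start2)
        start2 += 10
        c2 += 1
        start2 = nomefile.find("NOT PASSED", start2)

print("XXXX       %s found" % c0)
print("XX         %s found" % c1)
print("NOT PASSED %s found" % c2)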

Now I can find all occurrences of the strings "XXXX", "XX" and "NOT PASSED".
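The same counts can also be collected in a single pass with a regular expression. This is a minimal Python 3 sketch, not from the thread; find_markers and report are hypothetical names. Because the re module tries alternatives left to right and finditer returns non-overlapping matches, a run of four X's is reported once as "XXXX" rather than as two "XX", which matches the behaviour of the loop in the post. One difference: matches come out in document order instead of being interleaved per loop iteration.

```python
import re
from collections import Counter

# Alternatives are tried left to right, so "XXXX" wins over "XX" at the same offset
PATTERN = re.compile(r"XXXX|XX|NOT PASSED")


def find_markers(text):
    """Return (positions, counts): (offset, marker) pairs in document order, and a Counter per marker."""
    positions = [(m.start(), m.group()) for m in PATTERN.finditer(text)]
    counts = Counter(marker for _, marker in positions)
    return positions, counts


def report(text):
    """Print each match and the totals in roughly the same format as the loop in the post."""
    positions, counts = find_markers(text)
    for pos, marker in positions:
        print("%-10s found at location %d" % (marker, pos))
    for marker in ("XXXX", "XX", "NOT PASSED"):
        print("%-10s %d found" % (marker, counts[marker]))
```

To use it on the file from the post: with open(fileorig) as f: report(f.read()).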


Thank you so much.



Thread

What do I do to read html files on my pc? mikcec82 <michele.cecere@gmail.com> - 2012-08-27 03:59 -0700
  Re: What do I do to read html files on my pc? Chris Angelico <rosuav@gmail.com> - 2012-08-27 21:58 +1000
  Re: What do I do to read html files on my pc? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-27 13:05 +0100
  Re: What do I do to read html files on my pc? mikcec82 <michele.cecere@gmail.com> - 2012-08-27 06:51 -0700
    Re: What do I do to read html files on my pc? Joel Goldstick <joel.goldstick@gmail.com> - 2012-08-27 10:21 -0400
    Re: What do I do to read html files on my pc? Chris Angelico <rosuav@gmail.com> - 2012-08-28 00:41 +1000
    Re: What do I do to read html files on my pc? Jean-Michel Pichavant <jeanmichel@sequans.com> - 2012-08-27 18:57 +0200
  Re: What do I do to read html files on my pc? mikcec82 <michele.cecere@gmail.com> - 2012-08-28 03:09 -0700
    Re: What do I do to read html files on my pc? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-08-28 13:31 +0100
  Re: What do I do to read html files on my pc? Peter Otten <__peter__@web.de> - 2012-08-28 17:38 +0200
  Re: What do I do to read html files on my pc? mikcec82 <michele.cecere@gmail.com> - 2012-08-29 03:22 -0700
  Re: What do I do to read html files on my pc? Umesh Sharma <usharma01@gmail.com> - 2012-08-29 05:00 -0700
