| Header | Value |
|---|---|
| Received | by 10.224.223.77 with SMTP id ij13mr833175qab.1.1346235746192; Wed, 29 Aug 2012 03:22:26 -0700 (PDT) |
| Received | by 10.52.180.202 with SMTP id dq10mr64631vdc.17.1346235746165; Wed, 29 Aug 2012 03:22:26 -0700 (PDT) |
| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!r1no2729879qas.0!news-out.google.com!da15ni48885608qab.0!nntp.google.com!r1no2729877qas.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail |
| Newsgroups | comp.lang.python |
| Date | Wed, 29 Aug 2012 03:22:26 -0700 (PDT) |
| In-Reply-To | <1c7cd833-b6ad-4a17-8ffe-a0ce20c8f400@googlegroups.com> |
| Complaints-To | groups-abuse@google.com |
| Injection-Info | glegroupsg2000goo.googlegroups.com; posting-host=2.229.102.162; posting-account=5n9o1QoAAAClfHc4fQdLBCN15pyoz8MM |
| NNTP-Posting-Host | 2.229.102.162 |
| References | <1c7cd833-b6ad-4a17-8ffe-a0ce20c8f400@googlegroups.com> |
| User-Agent | G2/1.0 |
| MIME-Version | 1.0 |
| Message-ID | <c92e1fdc-ff14-4484-9e03-e322de9ba82b@googlegroups.com> |
| Subject | Re: What do I do to read html files on my pc? |
| From | mikcec82 <michele.cecere@gmail.com> |
| Injection-Date | Wed, 29 Aug 2012 10:22:26 +0000 |
| Content-Type | text/plain; charset=ISO-8859-1 |
| Content-Transfer-Encoding | quoted-printable |
| Xref | csiph.com comp.lang.python:28052 |
On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hello,
>
> I have an HTML file on my PC and I want to read it to extract some text.
>
> Can you tell me which libraries I should use and how to do it?
>
> Thank you so much.
>
> Michele
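The quoted question asks which library can read a local HTML file and pull text out of it. A minimal sketch, assuming Python 3 and only the standard library (the replies in the thread may have suggested other tools), subclasses `html.parser.HTMLParser` to collect the text found outside `<script>` and `<style>` elements. The names `TextExtractor`, `html_to_text` and `read_html_text` are hypothetical, and the UTF-8 default is an assumption about the file's encoding.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: "&amp;" arrives as "&"
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text nodes of an HTML string joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def read_html_text(path, encoding="utf-8"):
    # Assumption: the file is UTF-8; pass the real charset if it differs
    with open(path, encoding=encoding) as f:
        return html_to_text(f.read())
```

The returned string can then be searched with ordinary string methods such as `find` or `count`.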
Hi Peter, and thanks for your invaluable help.
Fortunately, there are no runs of "X" other than those 2 or 4 characters long.
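The counts are only meaningful if, as stated, every run of "X" is 2 or 4 long (a run of 3, for instance, would be reported as one "XX"). A hedged sketch for checking that assumption, assuming Python 3: `re.finditer` finds each run of two or more "X" characters and `collections.Counter` tallies their lengths. Single "X" characters (for example inside "XHTML") are ignored, since searching for "XX" never matches them anyway. Both function names are hypothetical.

```python
import re
from collections import Counter


def x_run_lengths(text):
    """Map each length of a run of two or more "X" to how often it occurs."""
    return Counter(len(m.group()) for m in re.finditer(r"X{2,}", text))


def only_runs_of_2_or_4(text):
    """True if every run of two or more "X" is exactly 2 or 4 long."""
    return set(x_run_lengths(text)) <= {2, 4}
```

Running `x_run_lengths` on the file contents shows at a glance whether any unexpected run lengths are present.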
Starting from your code, I wrote the following (I'm posting it in case it helps other people):
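from __future__ import print_function  # same print() behaviour on Python 2 and 3

# fileorig is the path of the HTML file, set earlier in the script
with open(fileorig, 'r') as f:  # "with" closes the file automatically
    nomefile = f.read()  # the whole file contents as one string

start = nomefile.find("XX")  # find() returns -1 when there is no match
start2 = nomefile.find("NOT PASSED")
c0 = 0  # number of "XXXX"
c1 = 0  # number of "XX" not part of an "XXXX"
c2 = 0  # number of "NOT PASSED"
while start != -1 or start2 != -1:
    # At an "XX" match, check for the longer "XXXX" first
    if nomefile[start:start+4] == "XXXX":
        print("XXXX found at location", start)
        start += 4
        c0 += 1
    elif nomefile[start:start+2] == "XX":
        print("XX found at location", start)
        start += 2
        c1 += 1
    if nomefile[start2:start2+10] == "NOT PASSED":
        print("NOT PASSED found at location", start2)
        start2 += 10
        c2 += 1
    # Continue each search just after the previous match
    start = nomefile.find("XX", start)
    start2 = nomefile.find("NOT PASSED", start2)
print("XXXX %s found" % c0, "\nXX %s found" % c1, "\nNOT PASSED %s found" % c2)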
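The same scan can be written more compactly with regular expressions. As a sketch, assuming Python 3 (the hypothetical `find_markers` is an alternative to the loop above, not a replacement endorsed in the thread): `re.finditer` with the pattern `XXXX|XX` tries the longer marker first at each position, just as the `if`/`elif` does, and never returns overlapping matches, so the lengths of the returned lists should equal `c0`, `c1` and `c2`.

```python
import re


def find_markers(text):
    """Return the start positions of "XXXX", standalone "XX" and "NOT PASSED"."""
    found = {"XXXX": [], "XX": [], "NOT PASSED": []}
    # Alternation is tried left to right, so "XXXX" wins over "XX" at a position
    for m in re.finditer(r"XXXX|XX", text):
        found[m.group()].append(m.start())
    for m in re.finditer(r"NOT PASSED", text):
        found["NOT PASSED"].append(m.start())
    return found
```

For example, `len(find_markers(nomefile)["XX"])` corresponds to `c1` in the loop.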
Now I'm able to find all occurrences of the strings "XXXX", "XX" and "NOT PASSED".
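When only the totals are needed and not the locations, `str.count` is enough. In this sketch (hypothetical `count_markers`, assuming Python 3), `str.count` counts non-overlapping matches from left to right, so a run of n "X" characters yields n // 4 "XXXX" and n // 2 "XX". Subtracting two "XX" per "XXXX" leaves only the standalone pairs, which is the same number the loop counts.

```python
def count_markers(text):
    """Return (number of "XXXX", standalone "XX", "NOT PASSED") in text."""
    n_xxxx = text.count("XXXX")
    # Each "XXXX" also contains two non-overlapping "XX"; remove them
    n_xx = text.count("XX") - 2 * n_xxxx
    n_not_passed = text.count("NOT PASSED")
    return n_xxxx, n_xx, n_not_passed
```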
Thank you so much.
What do I do to read html files on my pc? mikcec82 <michele.cecere@gmail.com> - 2012-08-27 03:59 -0700
Re: What do I do to read html files on my pc? Chris Angelico <rosuav@gmail.com> - 2012-08-27 21:58 +1000
Re: What do I do to read html files on my pc? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-27 13:05 +0100
Re: What do I do to read html files on my pc? mikcec82 <michele.cecere@gmail.com> - 2012-08-27 06:51 -0700
Re: What do I do to read html files on my pc? Joel Goldstick <joel.goldstick@gmail.com> - 2012-08-27 10:21 -0400
Re: What do I do to read html files on my pc? Chris Angelico <rosuav@gmail.com> - 2012-08-28 00:41 +1000
Re: What do I do to read html files on my pc? Jean-Michel Pichavant <jeanmichel@sequans.com> - 2012-08-27 18:57 +0200
Re: What do I do to read html files on my pc? mikcec82 <michele.cecere@gmail.com> - 2012-08-28 03:09 -0700
Re: What do I do to read html files on my pc? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-08-28 13:31 +0100
Re: What do I do to read html files on my pc? Peter Otten <__peter__@web.de> - 2012-08-28 17:38 +0200
Re: What do I do to read html files on my pc? mikcec82 <michele.cecere@gmail.com> - 2012-08-29 03:22 -0700
Re: What do I do to read html files on my pc? Umesh Sharma <usharma01@gmail.com> - 2012-08-29 05:00 -0700