Newsgroups: comp.lang.python
Subject: Re: What do I do to read html files on my pc?
From: mikcec82
Date: Wed, 29 Aug 2012 03:22:26 -0700 (PDT)

On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hallo,
>
> I have an html file on my pc and I want to read it to extract some text.
> Can you help on which libs I have to use and how can I do it?
>
> thank you so much.
>
> Michele

Hi Peter, and thanks for your valuable help.

Fortunately, there are no runs of "X" with lengths other than 2 or 4.
Starting from your code, I wrote the following (I'm posting it in case it helps other people):

    f = open(fileorig, 'r')   # fileorig: path to the HTML file (defined elsewhere)
    nomefile = f.read()       # whole file contents as one string
    f.close()

    start = nomefile.find("XX")
    start2 = nomefile.find("NOT PASSED")
    c0 = 0   # number of "XXXX" found
    c1 = 0   # number of "XX" found
    c2 = 0   # number of "NOT PASSED" found

    while (start != -1) or (start2 != -1):
        # find() returns -1 once a pattern is exhausted, so skip that branch
        if start != -1:
            if nomefile[start:start+4] == "XXXX":
                print("XXXX found at location %d" % start)
                start += 4
                c0 += 1
            else:
                print("XX found at location %d" % start)
                start += 2
                c1 += 1
            start = nomefile.find("XX", start)
        if start2 != -1:
            print("NOT PASSED found at location %d" % start2)
            start2 += 10
            c2 += 1
            start2 = nomefile.find("NOT PASSED", start2)

    print("XXXX %s found\nXX %s found\nNOT PASSED %s found" % (c0, c1, c2))

Now I'm able to find all occurrences of the strings "XXXX", "XX" and "NOT PASSED".

Thank you so much.
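
For anyone finding this later: the same counting could probably also be done in one pass with the `re` module. This is only a sketch, and it relies on the assumption stated above that runs of "X" are always 2 or 4 characters long. The function name `count_markers` and the sample string are made up for illustration; they are not from Peter's code.

```python
import re
from collections import Counter

# Alternatives are tried left to right, so "XXXX" must come before "XX";
# otherwise each run of four X's would be reported as two "XX" matches.
PATTERN = re.compile(r"XXXX|XX|NOT PASSED")


def count_markers(text):
    """Return a Counter mapping each marker to how many times it occurs."""
    return Counter(match.group(0) for match in PATTERN.finditer(text))


if __name__ == "__main__":
    # Hypothetical sample standing in for the contents of the HTML file.
    sample = "<td>XXXX</td><td>XX</td><td>NOT PASSED</td><td>XX</td>"
    counts = count_markers(sample)
    for marker in ("XXXX", "XX", "NOT PASSED"):
        print("%s found %d time(s)" % (marker, counts[marker]))
```

With a real file, the string returned by `f.read()` would be passed in place of `sample`. A `Counter` returns 0 for markers that never appear, so no special case is needed for a missing string.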