How to extract contents of inner text of html tag?

Date Sat, 1 Mar 2014 10:10:58 -0800 (PST)
From "Golam Md. Shibly" <shiblydu60@yahoo.com>
Subject How to extract contents of inner text of html tag?
To "python-list@python.org" <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.7534.1393704366.18130.python-list@python.org>

Hi,

###in.txt
<kbd class="command">
    cp -v --remove-destination /usr/share/zoneinfo/
    <em class="replaceable"><code><xxx></code></em>
       \
    /etc/localtime
</kbd>

import sys
import unicodedata
from bs4 import BeautifulSoup

file_name = "in.txt"
html_doc = open(file_name, 'r')
soup = BeautifulSoup(html_doc)
#print soup.prettify().encode('utf-8')
#file_to_write.writelines( soup.prettify().encode() )

all_kbd = soup.find_all('kbd')

for line in all_kbd:
    if line.string is None:
        extract_code = line.code.extract().string
        #store_code=line.code.decompose()
        for inside_line in line:
            if "<<" not in inside_line and "EOF" not in inside_line:
                if len(inside_line) > 0:
                    print inside_line
                    print extract_code

Expected output:
    cp -v --remove-destination /usr/share/zoneinfo/<xxx>\
    /etc/localtime


Got output:
    cp -v --remove-destination /usr/share/zoneinfo/
    
None

       \
    /etc/localtime

None 

shibly
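
A likely cause of the None lines: an HTML parser reads the literal <xxx> in in.txt as an unknown element rather than as text, so the <code> element ends up with no text of its own and its .string is None. The script then prints that value after every child of <kbd>. Below is a minimal sketch of one way to get the expected output. It swaps in the standard-library html.parser for BeautifulSoup and targets Python 3, neither of which the post uses. The names KNOWN_INLINE, KbdTextExtractor, extract_kbd and SAMPLE are made up for illustration, and the whitespace handling (gluing inline content to the neighbouring text) is an assumption inferred from the expected output.

```python
from html.parser import HTMLParser

# Assumption: these are the only real inline tags expected inside <kbd>.
# Any other tag (such as <xxx>) is treated as literal placeholder text.
KNOWN_INLINE = {"em", "code"}


class KbdTextExtractor(HTMLParser):
    """Collect the text of each <kbd> element as one string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.commands = []         # one entry per <kbd> element
        self._buf = None           # None while outside <kbd>
        self._lstrip_next = False  # drop whitespace after an inline element

    def handle_starttag(self, tag, attrs):
        if tag == "kbd":
            self._buf = ""
            self._lstrip_next = False
        elif self._buf is not None:
            # Glue inline content to the text that precedes it.
            self._buf = self._buf.rstrip()
            if tag not in KNOWN_INLINE:
                # Unknown tag: put it back into the output as text.
                self._buf += "<%s>" % tag

    def handle_endtag(self, tag):
        if self._buf is None:
            return
        if tag == "kbd":
            # Drop the newlines next to the <kbd> tags, keep the indentation.
            self.commands.append(self._buf.strip("\n").rstrip())
            self._buf = None
        else:
            self._lstrip_next = True

    def handle_data(self, data):
        if self._buf is None:
            return
        if self._lstrip_next:
            data = data.lstrip()
            if not data:
                return
            self._lstrip_next = False
        self._buf += data


def extract_kbd(html):
    """Return the text of every <kbd> element found in html."""
    parser = KbdTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.commands


# Raw string so that the backslash before the line break is kept as-is.
SAMPLE = r"""<kbd class="command">
    cp -v --remove-destination /usr/share/zoneinfo/
    <em class="replaceable"><code><xxx></code></em>
       \
    /etc/localtime
</kbd>
"""

commands = extract_kbd(SAMPLE)
for command in commands:
    print(command)
```

For the sample above this prints the two lines given under the expected output, without trailing spaces. If in.txt can instead be changed to contain &lt;xxx&gt;, the placeholder arrives as ordinary text and the unknown-tag branch is no longer needed.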

Thread

How to extract contents of inner text of html tag? "Golam Md. Shibly" <shiblydu60@yahoo.com> - 2014-03-01 10:10 -0800
  Re: How to extract contents of inner text of html tag? Jesse Adam <jaahush@gmail.com> - 2014-06-27 10:36 -0700
