Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!xlned.com!feeder1.xlned.com!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type; b=OU2pU6f9G/Iz8s3P220yOTrIIC6JLJXLdHpi9oVjttDlvoE/kpLoDp+xZV4jTxH6lrP+LGwsDPgon1uNMX8oy5iI/PvGXgp30pqIii8C+/glUVuEEkTKMjzDo/S3qtaKYlA9x+FAMKyTV9Bd0Bno6mkruOFqfp+9tzaP0PAB2cU=;
Date: Sat, 1 Mar 2014 10:10:58 -0800 (PST)
From: "Golam Md. Shibly" <shiblydu60@yahoo.com>
Subject: How to extract contents of inner text of html tag?
To: "python-list@python.org" <python-list@python.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="1688457910-1520518465-1393697458=:5147"
Precedence: list
Reply-To: "Golam Md. Shibly" <shiblydu60@yahoo.com>
Newsgroups: comp.lang.python
Message-ID: <mailman.7534.1393704366.18130.python-list@python.org>
Lines: 101
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:67346

--1688457910-1520518465-1393697458=:5147
Content-Type: text/plain; charset=us-ascii

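Hi,

I am trying to extract the inner text of an HTML tag, including the text of the tags nested inside it, with BeautifulSoup. My input file, script, and output are below.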

###in.txt
<kbd class="command">
    cp -v --remove-destination /usr/share/zoneinfo/
    <em class="replaceable"><code><xxx></code></em>
       \
    /etc/localtime
</kbd>

Script:

from bs4 import BeautifulSoup

file_name = "in.txt"
with open(file_name, 'r') as html_doc:
    soup = BeautifulSoup(html_doc)
#print soup.prettify().encode('utf-8')
#file_to_write.writelines( soup.prettify().encode() )

all_kbd = soup.find_all('kbd')

for line in all_kbd:
    # .string is None when <kbd> has more than one child
    if line.string is None:
        # detach the nested <code> tag from the tree and keep its .string
        extract_code = line.code.extract().string
        #store_code=line.code.decompose()
        for inside_line in line:
            # skip children that contain "<<" or "EOF"
            if "<<" not in inside_line and "EOF" not in inside_line:
                if len(inside_line) > 0:
                    print(inside_line)
                    print(extract_code)

Expected output:
    cp -v --remove-destination /usr/share/zoneinfo/<xxx>\
    /etc/localtime


Got output:
    cp -v --remove-destination /usr/share/zoneinfo/
    
None

       \
    /etc/localtime

None 
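
As far as I can tell, the None appears because the parser treats <xxx> as an empty tag, so <code> holds no string at all. One direction that might give the expected output is sketched below. It is a swapped-in technique, not a BeautifulSoup fix: it uses only the standard library's html.parser and re-emits unknown tags as literal text. The names KbdTextExtractor, MARKUP_TAGS, join_pieces and extract_kbd_commands are made up for illustration, and the rule of starting a new line only after a trailing backslash is an assumption based on this one input. It targets Python 3.

```python
from html.parser import HTMLParser

# Sample input copied from in.txt (raw string so the backslash is kept as-is).
SAMPLE = r"""<kbd class="command">
    cp -v --remove-destination /usr/share/zoneinfo/
    <em class="replaceable"><code><xxx></code></em>
       \
    /etc/localtime
</kbd>
"""


def join_pieces(text):
    """Strip each line, drop blank ones, and break only after a trailing backslash."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        out.append(line)
        if line.endswith("\\"):
            out.append("\n")
    return "".join(out)


class KbdTextExtractor(HTMLParser):
    """Collect the text inside <kbd> elements, keeping unknown tags as literal text."""

    # Assumption: only these tags are real markup; anything else inside <kbd>
    # (such as <xxx>) is placeholder text that belongs in the output.
    MARKUP_TAGS = {"kbd", "em", "code"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0       # nesting level of open <kbd> elements
        self.pieces = []     # text fragments of the <kbd> being read
        self.commands = []   # one finished string per <kbd>

    def handle_starttag(self, tag, attrs):
        if tag == "kbd":
            self.depth += 1
        elif self.depth and tag not in self.MARKUP_TAGS:
            # Re-emit the raw source of the tag, e.g. "<xxx>", as plain text.
            self.pieces.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "kbd" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.commands.append(join_pieces("".join(self.pieces)))
                self.pieces = []

    def handle_data(self, data):
        if self.depth:
            self.pieces.append(data)


def extract_kbd_commands(html):
    """Return one cleaned-up string per <kbd> element found in html."""
    parser = KbdTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.commands


if __name__ == "__main__":
    for command in extract_kbd_commands(SAMPLE):
        print(command)
```

Because convert_charrefs is enabled, an input that spells the placeholder as &lt;xxx&gt; should come out the same way, since the entity is decoded to text before handle_data sees it.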

shibly

--1688457910-1520518465-1393697458=:5147--