From: "Golam Md. Shibly"
Subject: How to extract contents of inner text of html tag?
Date: Sat, 1 Mar 2014 10:10:58 -0800 (PST)
To: "python-list@python.org"
Newsgroups: comp.lang.python

Hi,

###in.txt
<kbd class="command">
    cp -v --remove-destination /usr/share/zoneinfo/
    <em class="replaceable"><code><xxx></code></em>
       \
    /etc/localtime
</kbd>

import sys
import unicodedata
from bs4 import BeautifulSoup

file_name="in.txt"
html_doc=open(file_name,'r')
soup=BeautifulSoup(html_doc)
#print soup.prettify().encode('utf-8')
#file_to_write.writelines( soup.prettify().encode() )
all_kbd=soup.find_all('kbd')
for line in all_kbd:
    if line.string == None:
        extract_code=line.code.extract().string
        #store_code=line.code.decompose()
        for inside_line in line:
            if "<<" not in inside_line and "EOF" not in inside_line:
                if len(inside_line)>0:
                    print inside_line
                    print extract_code

Expected output:

cp -v --remove-destination /usr/share/zoneinfo/<xxx>\
/etc/localtime

Got output:

cp -v --remove-destination /usr/share/zoneinfo/
None
\
/etc/localtime
None

shibly
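
A likely cause of the None values: the parser treats <xxx> inside <code> as an empty tag rather than as text, so line.code.extract().string has no string to return. The sketch below is one possible way to get the expected output. It is not the BeautifulSoup code from the post: it swaps in the standard library's html.parser (Python 3) so that it runs without extra packages, and it assumes that any tag found inside <code> should be printed literally. The names KbdText, kbd_commands and SAMPLE are made up for illustration.

```python
from html.parser import HTMLParser

# Same markup as in.txt from the post, built line by line.
SAMPLE = "\n".join([
    '<kbd class="command">',
    '    cp -v --remove-destination /usr/share/zoneinfo/',
    '    <em class="replaceable"><code><xxx></code></em>',
    '       \\',
    '    /etc/localtime',
    '</kbd>',
])


class KbdText(HTMLParser):
    """Collect the text of each <kbd>, keeping tags inside <code> as literal text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_kbd = False
        self.in_code = False
        self.pieces = []    # text fragments of the <kbd> being read
        self.commands = []  # one joined string per finished <kbd>

    def handle_starttag(self, tag, attrs):
        if tag == "kbd":
            self.in_kbd, self.pieces = True, []
        elif self.in_kbd and tag == "code":
            self.in_code = True
        elif self.in_code:
            # Assumption: a tag inside <code>, such as <xxx>, is really a
            # placeholder meant to be shown, so keep its source text.
            self.pieces.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "code":
            self.in_code = False
        elif tag == "kbd" and self.in_kbd:
            self.in_kbd = False
            self.commands.append("".join(self.pieces))

    def handle_data(self, data):
        if self.in_kbd:
            # Drop the indentation of every line in this fragment.
            lines = [ln.strip() for ln in data.strip().splitlines()]
            self.pieces.append("\n".join(lines))


def kbd_commands(html):
    """Return the reconstructed text of every <kbd> element in html."""
    parser = KbdText()
    parser.feed(html)
    parser.close()
    return parser.commands


print(kbd_commands(SAMPLE)[0])
# prints:
# cp -v --remove-destination /usr/share/zoneinfo/<xxx>\
# /etc/localtime
```

If the input file can be edited, writing the placeholder as &lt;xxx&gt; is the simpler fix: it is then ordinary text to any HTML parser, and .string on the <code> element returns "<xxx>".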