


Re: editing a HTML file

Path csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <davea@davea.name>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.001
X-Spam-Evidence '*H*': 1.00; '*S*': 0.00; 'beginner': 0.05; 'modified': 0.05; 'rewrite': 0.07; 'subject:file': 0.07; 'suppose': 0.07; 'messing': 0.09; 'modifies': 0.09; 'file,': 0.15; '(created': 0.16; 'crashes': 0.16; 'eliminating': 0.16; 'losing': 0.16; 'renames': 0.16; 'simplest': 0.16; 'soup': 0.16; 'string': 0.17; 'wrote:': 0.17; 'written,': 0.17; 'appropriate': 0.20; 'changes': 0.20; 'file.': 0.20; 'written': 0.20; 'all,': 0.21; 'parse': 0.22; "i'd": 0.22; 'changes,': 0.23; 'somebody': 0.23; 'this:': 0.23; "haven't": 0.23; 'machine': 0.24; 'script': 0.24; 'header:In-Reply-To:1': 0.25; 'header:User-Agent:1': 0.26; 'question': 0.27; 'appreciated': 0.27; 'functions.': 0.27; 'library.': 0.27; 'replace': 0.27; 'regular': 0.27; 'points': 0.29; "i'm": 0.29; 'file': 0.32; 'could': 0.32; 'to:addr:python- list': 0.33; 'text': 0.34; 'thanks': 0.34; 'whatever': 0.35; 'especially': 0.35; 'open': 0.35; 'pm,': 0.35; 'something': 0.35; 'there': 0.35; 'really': 0.36; 'but': 0.36; 'closing': 0.36; 'data.': 0.36; 'subject:: ': 0.38; 'some': 0.38; 'things': 0.38; 'delete': 0.38; 'to:addr:python.org': 0.39; 'received:192': 0.39; 'notice': 0.39; 'skip:" 10': 0.40; 'received:192.168': 0.40; 'think': 0.40; 'back': 0.62; 'close': 0.63; 'received:74.208': 0.71; 'mechanics': 0.84; 'original.': 0.84
Date Thu, 14 Mar 2013 21:31:19 -0400
From Dave Angel <davea@davea.name>
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130221 Thunderbird/17.0.3
MIME-Version 1.0
To python-list@python.org
Subject Re: editing a HTML file
References <51421253$0$26783$4fafbaef@reader2.news.tin.it>
In-Reply-To <51421253$0$26783$4fafbaef@reader2.news.tin.it>
Content-Type text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding 7bit
X-Provags-ID V02:K0:Xt98oUKXvVLVBXoEhOsdyFYdg5ILZead/aJcb2HeNzI tqNwayTJabvqAGRjx5VRECN5YD+6NzINzgNOI/O3CEuSkv4w0j NxXAocE+1OM+QirQ/DVyaPO/10kMMdReRbIWaexd5zm9juVMed zDOLryJSQdQ1U6XRLmGmXedSebM5qRgi+OiEYliEPk072VjgbV 0juPCARnzKjfwJFPdmlaQk5nHFqZBgClDFqkthztmU7RHNulJv F/oRvrja+e3Jh8EU19zy77yIQRzB3i0VIlTZVOEfPyJm18NA/1 yh1Ll8OcgFtjnS3WN4lgoOutGqWMgmf2HZkg+kFFDAz74KQjA= =
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.3327.1363311098.2939.python-list@python.org>
Lines 53
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1363311098 news.xs4all.nl 6919 [2001:888:2000:d::a6]:39587
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:41253



On 03/14/2013 02:09 PM, Tracubik wrote:
> Hi all,
>
> I would like to make a script that automatically changes some text in an
> HTML file.
>
> I need to make some changes in the text of <p> tags
>
> My question is: is there a way to just "update/substitute" the text in
> the HTML <p> tags, or do I have to make a new modified copy of the HTML
> file?
>
> To be clear, I'd like to do something like this:
>
> open html file
> for every <p> tag:
>    if "foo" in text:
>      change "foo" to "bar"
> close html file
>
> Any sample would be really appreciated.
> I'm really a beginner as you can see
>
> Thanks


As JM points out, you can use Beautiful Soup to parse HTML.  Then you 
can make structural changes and write the result back out.  Beautiful Soup is 
NOT part of the standard library.
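Beautiful Soup is the usual tool for this. If you'd rather not install anything, the standard library's html.parser.HTMLParser can do a cruder version of the same job. The sketch below (a swapped-in technique, not what JM suggested) copies the document through unchanged and replaces "foo" with "bar" only in text that sits inside <p> elements. The names ParagraphTextReplacer and replace_in_paragraphs are made up for illustration, and the sketch assumes every <p> is explicitly closed with </p>.

```python
from html.parser import HTMLParser


class ParagraphTextReplacer(HTMLParser):
    """Hypothetical helper: re-emit HTML, replacing `old` with `new` only inside <p> text.

    Rough sketch: assumes every <p> is closed with </p>; end tags are
    re-emitted in lowercase because HTMLParser lowercases tag names.
    """

    def __init__(self, old, new):
        # convert_charrefs=False keeps entities such as &amp; intact in the output.
        super().__init__(convert_charrefs=False)
        self.old = old
        self.new = new
        self.p_depth = 0  # > 0 while we are inside a <p> element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        self.out.append(self.get_starttag_text())  # original text of the tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied as-is and never change p_depth.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.p_depth > 0:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def replace_in_paragraphs(html_text, old, new):
    """Return a copy of `html_text` with `old` replaced by `new` in <p> text only."""
    parser = ParagraphTextReplacer(old, new)
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

For example, replace_in_paragraphs('<h1>foo</h1><p>foo</p>', "foo", "bar") returns '<h1>foo</h1><p>bar</p>'. The function only builds a new string; writing it back to disk safely is a separate problem. Beautiful Soup copes with sloppy real-world HTML much better than this.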

But if you haven't already written something that modifies regular text 
files, I'd do that long before I even started messing with HTML.  In 
general you cannot update a file in place (at least not when the new text 
is a different length from the old), so you have to think about the 
mechanics of updating, and about minimizing or eliminating the chance 
of losing data.
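To see why in-place updates bite, here's a small demonstration (file name and contents are made up) that naively writes a longer word over a shorter one at the same offset. Nothing shifts the rest of the file along, so the extra characters overwrite whatever came next.

```python
import os
import tempfile


def overwrite_at_same_offset(path, old, new):
    """Write `new` where `old` starts, in place. Illustration only: don't do this."""
    with open(path, "r+b") as f:
        data = f.read()
        offset = data.index(old.encode("ascii"))
        f.seek(offset)
        f.write(new.encode("ascii"))  # overwrites existing bytes; inserts nothing


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")
    with open(path, "w", newline="") as f:
        f.write("Hi Bob, bye.\n")
    overwrite_at_same_offset(path, "Bob", "Robert")
    with open(path, newline="") as f:
        result = f.read()

print(repr(result))  # 'Hi Robertye.\n' (the ", b" after "Bob" was clobbered)
```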

For example, suppose you have a text file (created with any text editor) 
that has just one occurrence of the string "Sammy".  You want to replace 
that with the word "Gazelda".  Notice the replacement string is longer 
than the original.  Think about how you'd go about it, and write the 
simplest program that would accomplish it.  Then think about what could 
go wrong.  What about if somebody shuts the machine off just as you're 
starting to rewrite the file, or the program crashes just then, or 
whatever?  So plan to write the replacement file under a new name and, 
once it's completely written, do the appropriate renames and delete the old one.
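Here's one way to follow that plan, as a sketch assuming Python 3.3 or later (for os.replace) and a UTF-8 file; both are assumptions, and replace_in_file is a hypothetical name. It writes the new contents to a temporary file in the same directory, flushes it to disk, and only then renames it over the original. os.replace does the "rename and delete the old one" step in a single call, and on POSIX systems that rename is atomic.

```python
import os
import shutil
import tempfile


def replace_in_file(path, old, new, encoding="utf-8"):
    """Replace `old` with `new` in a text file without ever truncating the original."""
    # newline="" reads and writes line endings untouched.
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()

    # Create the temporary file next to the original so the final rename
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as tmp:
            tmp.write(text.replace(old, new))
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # mkstemp creates the file with restrictive permissions; copy the original's.
        shutil.copymode(path, tmp_path)
        # The temp file is closed now; swap it in place of the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: discard the partial copy and
        # leave the original file untouched.
        os.remove(tmp_path)
        raise
```

Usage would be replace_in_file("letter.txt", "Sammy", "Gazelda"). If the machine dies partway through, you should be left with either the untouched original or the complete new file (at worst plus a stray .tmp file), never a half-rewritten original.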

Don't forget about closing each file, especially if you're going to 
manipulate it with other functions.
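A quick way to see why closing matters: in CPython, file writes are buffered, so until the file is closed (or flushed) another reader, or a function that copies the file, may see stale contents; on Windows, renaming or deleting a file that is still open typically fails. The demo below uses a made-up file name. The with open(...) form, which closes the file automatically when the block ends, is the easy way to avoid the problem.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.txt")

    f = open(path, "w")
    f.write("Gazelda\n")
    # Still open: the text is sitting in Python's write buffer, not on disk yet.
    with open(path) as reader:
        before_close = reader.read()

    f.close()  # flushes the buffer and releases the OS file handle
    with open(path) as reader:
        after_close = reader.read()

print(repr(before_close))  # '' on CPython with default buffering
print(repr(after_close))   # 'Gazelda\n'
```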



-- 
DaveA



Thread

editing a HTML file Tracubik <affdfsdfdsfsd@b.com> - 2013-03-14 19:09 +0100
  Re: editing a HTML file Dave Angel <davea@davea.name> - 2013-03-14 21:31 -0400
