Re: extract data between two regex

Path csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!selfless.tophat.at!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Date Thu, 09 Jun 2011 16:50:32 +0200
Message-ID <WTMSE7F8165D@wtms.nl>
From Wil Taphoorn <wil@nogo.wtms.nl>
User-Agent Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23
MIME-Version 1.0
Newsgroups comp.os.linux.development.system
Subject Re: extract data between two regex
References <1307627361.3593.4.camel@roddur>
In-Reply-To <1307627361.3593.4.camel@roddur>
Content-Type text/plain; charset=UTF-8
Content-Transfer-Encoding 7bit
Lines 22
NNTP-Posting-Host 83.163.46.61
X-Trace 1307631049 news.xs4all.nl 49174 [::ffff:83.163.46.61]:19246
X-Complaints-To abuse@xs4all.nl
Xref x330-a1.tempe.blueboxinc.net comp.os.linux.development.system:164


On 9-6-2011 15:49, Rudra Banerjee wrote:
> Dear friends,
> How can I extract data sandwiched between two regexes? Say, for a
> file (snipped from a GCstar HTML export) like the one pasted below.
> What I want to do is extract the title (sandwiched between <strong>
> and </strong>) and export it to LaTeX (or another format).
> Hoping for your help.
> 
> <table border="0" cellspacing="10" cellpadding="0" width="100%">
> 	<tr><td colspan="3"><strong>Band Theory and Electronic Properties of
> Solids (Oxford Master Series in Condensed Matter Physics)
> (9780198506447)</strong></td></tr>
> 	<tr><td rowspan="5" width="80"><img
> src="booklist_images/Band_Theory_and_Electronic_Properties_of_Solids__Oxford_Master_Series_in_Condensed_Matter_Physics___9780198506447__0.jpg" height="160" alt="Band Theory and Electronic Properties of Solids (Oxford Master Series in Condensed Matter Physics) (9780198506447)" title="Band Theory and Electronic Properties of Solids (Oxford Master Series in Condensed Matter Physics) (9780198506447)" border="0"/></td></tr>
> </table>

Try this:

$ lua -e 'io.read("*all"):gsub("<strong>([^<]+)</strong>",function(x)print(x)end)' <my_file
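
Since the title in the sample spans several lines and the goal is LaTeX output, the one-liner can be expanded into a small script. The following is only a sketch under a few assumptions: titles contain no nested tags (the same assumption the `[^<]+` pattern makes), HTML entities such as `&amp;` are not decoded, and only the most common LaTeX special characters are escaped. The names `extract_titles` and `latex_escape` are hypothetical helpers introduced for illustration.

```lua
-- Sketch: pull every <strong>...</strong> title out of an HTML string
-- and print the titles as a LaTeX itemize list.

-- Escape a few characters that LaTeX treats specially: # $ % & _ { }
-- (hypothetical helper; backslash, ~ and ^ are not handled here).
local function latex_escape(s)
  -- Parentheses drop gsub's second return value (the match count).
  return (s:gsub("[#$%%&_{}]", "\\%0"))
end

-- Return a list of the texts found between <strong> and </strong>.
local function extract_titles(html)
  local titles = {}
  for title in html:gmatch("<strong>([^<]+)</strong>") do
    -- Collapse newlines and runs of whitespace into single spaces,
    -- then trim one leading and one trailing space if present.
    title = title:gsub("%s+", " "):gsub("^ ", ""):gsub(" $", "")
    titles[#titles + 1] = title
  end
  return titles
end

-- Demo input modelled on the table in the question; to process a real
-- file, replace this string with io.read("*all") and redirect the file
-- to standard input.
local sample = [[
<tr><td colspan="3"><strong>Band Theory and Electronic Properties of
Solids (Oxford Master Series in Condensed Matter Physics)
(9780198506447)</strong></td></tr>
]]

print("\\begin{itemize}")
for _, title in ipairs(extract_titles(sample)) do
  print("\\item " .. latex_escape(title))
end
print("\\end{itemize}")
```

Collapsing whitespace keeps each title on one `\item` line even when the HTML wraps it, as it does in the export shown in the question.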

-- 
Wil


Thread

extract data between two regex Rudra Banerjee <bnrj.rudra@gmail.com> - 2011-06-09 15:49 +0200
  Re: extract data between two regex Jorgen Grahn <grahn+nntp@snipabacken.se> - 2011-06-09 14:26 +0000
    Re: extract data between two regex Richard Kettlewell <rjk@greenend.org.uk> - 2011-06-09 16:02 +0100
  Re: extract data between two regex Wil Taphoorn <wil@nogo.wtms.nl> - 2011-06-09 16:50 +0200