| Date | 2011-06-09 16:50 +0200 |
|---|---|
| Message-ID | <WTMSE7F8165D@wtms.nl> |
| From | Wil Taphoorn <wil@nogo.wtms.nl> |
| Newsgroups | comp.os.linux.development.system |
| Subject | Re: extract data between two regex |
| References | <1307627361.3593.4.camel@roddur> |
On 9-6-2011 15:49, Rudra Banerjee wrote:
> Dear friends,
> How can I extract data sandwiched between two regexes? Say, for a
> file (snipped from a GCstar HTML export) like the one pasted below.
> What I want to do is extract the title (sandwiched between <strong>
> and </strong>) and export it to LaTeX (or another format).
> Hoping for your help.
>
> <table border="0" cellspacing="10" cellpadding="0" width="100%">
> <tr><td colspan="3"><strong>Band Theory and Electronic Properties of
> Solids (Oxford Master Series in Condensed Matter Physics)
> (9780198506447)</strong></td></tr>
> <tr><td rowspan="5" width="80"><img
> src="booklist_images/Band_Theory_and_Electronic_Properties_of_Solids__Oxford_Master_Series_in_Condensed_Matter_Physics___9780198506447__0.jpg" height="160" alt="Band Theory and Electronic Properties of Solids (Oxford Master Series in Condensed Matter Physics) (9780198506447)" title="Band Theory and Electronic Properties of Solids (Oxford Master Series in Condensed Matter Physics) (9780198506447)" border="0"/></td></tr>
> </table>
Try this:
$ lua -e 'io.read("*all"):gsub("<strong>([^<]+)</strong>",function(x)print(x)end)' <my_file
--
Wil
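
For anyone who would rather do this in Python and go on to the LaTeX step, here is a minimal sketch of the same idea. It reuses the pattern from the Lua one-liner above; the function names, the choice of an itemize environment, and the set of escaped characters are assumptions made for illustration, not something from the original posts.

```python
import re

# Same pattern as the Lua one-liner: capture everything between <strong> and
# </strong> that contains no '<'. [^<] also matches newlines, so a title that
# wraps across several lines is still captured in one piece.
STRONG_RE = re.compile(r"<strong>([^<]+)</strong>")


def extract_titles(html):
    """Return the text of each <strong> element with whitespace collapsed."""
    return [" ".join(match.split()) for match in STRONG_RE.findall(html)]


def latex_escape(text):
    """Backslash-escape the common LaTeX specials & % $ # _ { }.

    Backslash, ~ and ^ are deliberately not handled in this sketch.
    """
    return re.sub(r"([&%$#_{}])", r"\\\1", text)


def to_latex_itemize(titles):
    """Format titles as an itemize environment (hypothetical output layout)."""
    lines = [r"\begin{itemize}"]
    lines += [r"  \item " + latex_escape(title) for title in titles]
    lines.append(r"\end{itemize}")
    return "\n".join(lines)
```

To use it on a file, something like print(to_latex_itemize(extract_titles(sys.stdin.read()))) with the input redirected from my_file should do (after an import sys). Like the Lua version, this only works because the titles contain no nested tags; for messier HTML a real parser is the safer route.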