| From | Roy Smith <roy@panix.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: What's the best way to parse this HTML tag? |
| Date | 2012-03-11 20:28 -0400 |
| Organization | PANIX Public Access Internet and UNIX, NYC |
| Message-ID | <roy-0C82FF.20283311032012@news.panix.com> |
| References | <239c4ad7-ac93-45c5-98d6-71a434e1c5aa@r21g2000yqa.googlegroups.com> |

In article <239c4ad7-ac93-45c5-98d6-71a434e1c5aa@r21g2000yqa.googlegroups.com>,
John Salerno <johnjsal@gmail.com> wrote:

> Getting the time that the song is played is easy, because the time is
> wrapped in a <div> tag all by itself with a class attribute that has a
> specific value I can search for. But the actual song title and artist
> information is harder, because the HTML isn't quite as precise. Here's
> a sample:
>
> <div class="cmPlaylistContent">
> <strong>
> <a href="/lsp/t2995/">
> Love Without End, Amen
> </a>
> </strong>
> <br/>
> <a href="/lsp/a436/">
> George Strait
> </a>
> [...]
> Therefore, I appeal to your greater wisdom in these matters. Given
> this HTML, is there a "best practice" for how to refer to the song
> title and artist?

Obviously, any attempt at screen scraping is fraught with peril.
Beautiful Soup is a great tool but it doesn't negate the fact that
you've made a pact with the devil. That being said, if I had to guess,
here's your puppy:

> <a href="/lsp/t2995/">
> Love Without End, Amen
> </a>

the thing to look for is an "a" element with an href that starts with
"/lsp/t", where "t" is for "track". Likewise:

> <a href="/lsp/a436/">
> George Strait
> </a>

an href starting with "/lsp/a" is probably an artist link.

You owe the Oracle three helpings of tag soup.
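
For what it's worth, the href-prefix idea can be sketched roughly like this. It uses only the standard library's html.parser so that it runs without anything installed. The names PlaylistLinkParser and extract_track_and_artist are made up for illustration, and the "/lsp/t" = track, "/lsp/a" = artist reading is still a guess based on the one sample above, not something the site documents.

```python
from html.parser import HTMLParser

# The sample from the original question, closed off so it parses on its own.
SAMPLE = """
<div class="cmPlaylistContent">
<strong>
<a href="/lsp/t2995/">
Love Without End, Amen
</a>
</strong>
<br/>
<a href="/lsp/a436/">
George Strait
</a>
</div>
"""


class PlaylistLinkParser(HTMLParser):
    """Collect the text of <a> elements whose href starts with a known prefix."""

    # Assumption: "t" means track and "a" means artist, as guessed in the post.
    PREFIXES = {"/lsp/t": "track", "/lsp/a": "artist"}

    def __init__(self):
        super().__init__()
        self.found = {}      # kind -> link text (first match wins)
        self._kind = None    # kind of the <a> currently open, if it matched
        self._chunks = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        for prefix, kind in self.PREFIXES.items():
            if href.startswith(prefix):
                self._kind = kind
                self._chunks = []
                break

    def handle_data(self, data):
        if self._kind is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._kind is not None:
            # Collapse the newlines and indentation around the link text.
            text = " ".join("".join(self._chunks).split())
            self.found.setdefault(self._kind, text)
            self._kind = None


def extract_track_and_artist(html):
    """Return (track, artist); either may be None if no matching link exists."""
    parser = PlaylistLinkParser()
    parser.feed(html)
    parser.close()
    return parser.found.get("track"), parser.found.get("artist")


if __name__ == "__main__":
    print(extract_track_and_artist(SAMPLE))
```

With Beautiful Soup the same filter should come out much shorter, something along the lines of soup.find("a", href=re.compile(r"^/lsp/t")), but the caveat is the same either way: the moment the site changes its URL scheme, this breaks.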
What's the best way to parse this HTML tag? John Salerno <johnjsal@gmail.com> - 2012-03-11 15:53 -0700
Re: What's the best way to parse this HTML tag? Roy Smith <roy@panix.com> - 2012-03-11 20:28 -0400
Re: What's the best way to parse this HTML tag? John Salerno <johnjsal@gmail.com> - 2012-03-11 19:35 -0700
Re: What's the best way to parse this HTML tag? Roy Smith <roy@panix.com> - 2012-03-12 09:27 -0400