
Re: How can I count word frequency in a web site?

Newsgroups comp.lang.python
Date Sun, 29 Nov 2015 18:54:11 -0800 (PST)
Message-ID <88ec2ba2-6b06-421b-89d5-ece408bb4c8e@googlegroups.com>
Subject Re: How can I count word frequency in a web site?
From ryguy7272 <ryanshuell@gmail.com>


On Sunday, November 29, 2015 at 9:32:22 PM UTC-5, Cem Karan wrote:
> You might want to look into Beautiful Soup (https://pypi.python.org/pypi/beautifulsoup4), which is an HTML screen-scraping tool.  I've never used it, but I've heard good things about it.
> 
> Good luck,
> Cem Karan
> 
> On Nov 29, 2015, at 7:49 PM, ryguy7272 wrote:
> 
> > I'm trying to figure out how to count words in a web site.  Here is a sample of the link I want to scrape data from and count specific words.
> > http://finance.yahoo.com/q/h?s=STRP+Headlines
> > 
> > I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want to have a way to control for specific words.  I have a couple Python scripts that do this for a text file, but not for a web site.  I can post that, if that's helpful.
> > 
> > -- 
> > https://mail.python.org/mailman/listinfo/python-list

OK, this small script grabs the whole page behind that link, markup included.

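import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it with Python's built-in HTML parser.
r = requests.get("http://finance.yahoo.com/q/h?s=STRP+Headlines")
soup = BeautifulSoup(r.content, "html.parser")
# prettify() returns the full document, tags and all, as an indented string.
htmltext = soup.prettify()
print(htmltext)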


Now, how can I count specific words like 'fraud' and 'lawsuit'?
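
For what it's worth, here is a minimal sketch of one possible way to do the counting, not a definitive answer. It assumes the visible text is pulled out with soup.get_text() instead of soup.prettify(), so that tag and attribute names are not counted. The function name count_words and the sample sentence are made up for illustration; only the standard library is used, so it runs without touching the network.

```python
import re
from collections import Counter

def count_words(text, targets):
    """Return {word: count} for each target word, matched case-insensitively."""
    # Lowercase the text and pull out runs of letters/apostrophes,
    # so punctuation does not stay glued to the words.
    words = re.findall(r"[a-z']+", text.lower())
    tally = Counter(words)
    # A Counter returns 0 for words it never saw.
    return {w: tally[w.lower()] for w in targets}

# Hypothetical stand-in for the page text; with the script above this
# would be soup.get_text() rather than a hard-coded string.
sample = "Fraud alleged in lawsuit. Second lawsuit filed; no fraud found, says firm."
print(count_words(sample, ["fraud", "lawsuit", "merger"]))
# {'fraud': 2, 'lawsuit': 2, 'merger': 0}
```

This matches whole words only, so 'lawsuits' would not be counted as 'lawsuit'. If plurals and other variants should count too, they would have to be added to the target list or handled with some kind of stemming.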



Thread

How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-29 16:49 -0800
  Re: How can I count word frequency in a web site? Cem Karan <cfkaran2@gmail.com> - 2015-11-29 21:31 -0500
    Re: How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-29 18:54 -0800
      Re: How can I count word frequency in a web site? Michiel Overtoom <motoom@xs4all.nl> - 2015-11-30 08:56 +0100
  Re: How can I count word frequency in a web site? Laura Creighton <lac@openend.se> - 2015-11-30 03:51 +0100
    Re: How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-30 07:04 -0800
  Re: How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-30 07:04 -0800
