Newsgroups: comp.lang.python
Date: Sun, 29 Nov 2015 18:54:11 -0800 (PST)
From: ryguy7272
Subject: Re: How can I count word frequency in a web site?

On Sunday, November 29, 2015 at 9:32:22 PM UTC-5, Cem Karan wrote:
> You might want to look into Beautiful Soup (https://pypi.python.org/pypi/beautifulsoup4), which is an HTML screen-scraping tool. I've never used it, but I've heard good things about it.
>
> Good luck,
> Cem Karan
>
> On Nov 29, 2015, at 7:49 PM, ryguy7272 wrote:
>
> > I'm trying to figure out how to count words in a web site. Here is a sample of the link I want to scrape data from and count specific words.
> > http://finance.yahoo.com/q/h?s=STRP+Headlines
> >
> > I only want to count certain words, like 'fraud', 'lawsuit', etc. I want to have a way to control for specific words. I have a couple Python scripts that do this for a text file, but not for a web site. I can post that, if that's helpful.

Ok, this small script will grab everything from the link.

import requests
from bs4 import BeautifulSoup

r = requests.get("http://finance.yahoo.com/q/h?s=STRP+Headlines")
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.content, "html.parser")
htmltext = soup.prettify()
print(htmltext)

Now, how can I count specific words like 'fraud' and 'lawsuit'?
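
For what it's worth, here is a rough sketch of how the counting step might look once the page text is in hand. It is not a tested solution for the Yahoo page: the function name `count_target_words` and the sample string are made up for illustration, and it assumes the visible text would come from something like `soup.get_text()` rather than `soup.prettify()`, so that tag names and attributes are not counted as words.

```python
import re
from collections import Counter


def count_target_words(text, targets):
    """Return {word: count} for each target, matched case-insensitively as whole words."""
    # Lowercase the text and split it into runs of letters (apostrophes kept).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # A Counter gives 0 for words that never appear, so missing targets are safe.
    return {word: counts[word.lower()] for word in targets}


# Hypothetical sample text; with the script above this would be soup.get_text().
sample = "Lawsuit filed over alleged fraud. Fraud claims denied; second lawsuit expected."
print(count_target_words(sample, ["fraud", "lawsuit"]))  # {'fraud': 2, 'lawsuit': 2}
```

Because this matches whole words only, 'lawsuits' or 'fraudulent' would not be counted under 'lawsuit' or 'fraud'; each variant would need to be added to the target list, or the matching loosened.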