| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Cem Karan <cfkaran2@gmail.com> |
| Newsgroups | comp.lang.python |
| Subject | Re: How can I count word frequency in a web site? |
| Date | Sun, 29 Nov 2015 21:31:49 -0500 |
| Lines | 22 |
| Message-ID | <mailman.14.1448850720.14615.python-list@python.org> |
| References | <6851e3b8-0d46-4808-9f7f-372b71bf327c@googlegroups.com> |
You might want to look into Beautiful Soup (https://pypi.python.org/pypi/beautifulsoup4), which is an HTML screen-scraping tool. I've never used it, but I've heard good things about it.

Good luck,
Cem Karan

On Nov 29, 2015, at 7:49 PM, ryguy7272 <ryanshuell@gmail.com> wrote:

> I'm trying to figure out how to count words in a web site. Here is a sample of the link I want to scrape data from and count specific words.
> http://finance.yahoo.com/q/h?s=STRP+Headlines
>
> I only want to count certain words, like 'fraud', 'lawsuit', etc. I want to have a way to control for specific words. I have a couple Python scripts that do this for a text file, but not for a web site. I can post that, if that's helpful.
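
As a rough illustration of the task in the quoted question, here is a minimal sketch that counts a chosen set of words in a page's visible text. It uses only the standard library so it runs without installing anything; with Beautiful Soup installed, the text-extraction step could instead be `BeautifulSoup(html, "html.parser").get_text()`. The names `TextExtractor`, `count_target_words`, and `fetch_html` are hypothetical, and the sample HTML is made up for illustration, not taken from the page in the question.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def fetch_html(url):
    """Download a page and decode it (hypothetical helper, not called below)."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def count_target_words(html, targets):
    """Return {word: count} for each target word, matched case-insensitively."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    counts = Counter(re.findall(r"[a-z']+", text))
    return {word: counts[word.lower()] for word in targets}


if __name__ == "__main__":
    # Made-up sample markup standing in for a downloaded page.
    sample = (
        "<html><head><script>var fraud = 1;</script></head>"
        "<body><h1>Fraud lawsuit filed</h1>"
        "<p>The lawsuit alleges fraud. No fraud was proven.</p></body></html>"
    )
    print(count_target_words(sample, ["fraud", "lawsuit", "settlement"]))
```

To run it against a live page, pass the result of `fetch_html(url)` as the first argument. Only whole words are counted, so "fraudulent" would not count toward "fraud"; whether that is the desired behavior depends on the analysis.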
How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-29 16:49 -0800
Re: How can I count word frequency in a web site? Cem Karan <cfkaran2@gmail.com> - 2015-11-29 21:31 -0500
Re: How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-29 18:54 -0800
Re: How can I count word frequency in a web site? Michiel Overtoom <motoom@xs4all.nl> - 2015-11-30 08:56 +0100
Re: How can I count word frequency in a web site? Laura Creighton <lac@openend.se> - 2015-11-30 03:51 +0100
Re: How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-30 07:04 -0800