| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Michiel Overtoom <motoom@xs4all.nl> |
| Newsgroups | comp.lang.python |
| Subject | Re: How can I count word frequency in a web site? |
| Date | Mon, 30 Nov 2015 08:56:32 +0100 |
| Lines | 43 |
| Message-ID | <mailman.20.1448870261.14615.python-list@python.org> (permalink) |
| References | <6851e3b8-0d46-4808-9f7f-372b71bf327c@googlegroups.com> <mailman.14.1448850720.14615.python-list@python.org> <88ec2ba2-6b06-421b-89d5-ece408bb4c8e@googlegroups.com> |
| Mime-Version | 1.0 (Mac OS X Mail 8.2 \(2104\)) |
| Content-Type | text/plain; charset=us-ascii |
| Content-Transfer-Encoding | quoted-printable |
| X-Trace | news.uni-berlin.de YIH6vWRR+sZMrvgGqlxILA3+6RHrFa72sFfDT0MIe7/Q== |
| Return-Path | <motoom@xs4all.nl> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| In-Reply-To | <88ec2ba2-6b06-421b-89d5-ece408bb4c8e@googlegroups.com> |
| X-Mailer | Apple Mail (2.2104) |
| Xref | csiph.com comp.lang.python:99722 |
> On 30 Nov 2015, at 03:54, ryguy7272 <ryanshuell@gmail.com> wrote:
>
> Now, how can I count specific words like 'fraud' and 'lawsuit'?
- convert the page to plain text
- remove any punctuation
- split into words
- see what words occur
- enumerate all the words and increase a counter for each word
Something like this:
s = """Today we're rounding out our planetary tour with ice giants Uranus
and Neptune. Both have small rocky cores, thick mantles of ammonia, water,
and methane, and atmospheres that make them look greenish and blue. Uranus
has a truly weird rotation and relatively dull weather, while Neptune has
clouds and storms whipped by tremendous winds. Both have rings and moons,
with Neptune's Triton probably being a captured iceball that has active
geology."""
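import collections
# Lowercase, join lines, drop periods and commas, and turn apostrophes into
# spaces so that "Neptune's" is counted as "neptune".
cleaned = s.lower().replace("\n", " ").replace(".", "").replace(",", "").replace("'", " ")
# split() with no argument also swallows runs of whitespace.
count = collections.Counter(cleaned.split())
for interesting in ("neptune", "and"):
    print("The word '%s' occurs %d times" % (interesting, count[interesting]))
# Outputs:
# The word 'neptune' occurs 3 times
# The word 'and' occurs 7 times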
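The snippet above starts from plain text, so the first step (converting the page to plain text) is not shown. A minimal sketch of that step for Python 3, using only the standard library, could look like the following. The names TextExtractor, count_words and sample_html are made up for this illustration, and the HTML is an invented sample rather than a real page. Fetching the page (for example with urllib.request) is left out. The regular expression keeps only runs of the letters a-z, which is an assumption that suits English text and will drop accented characters and digits.

```python
import collections
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def count_words(html):
    """Return a Counter of lowercase words found in the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Runs of a-z only: punctuation, digits and apostrophes act as separators.
    words = re.findall(r"[a-z]+", text)
    return collections.Counter(words)


# Invented sample page, standing in for the downloaded HTML.
sample_html = """<html><head><style>p { color: red }</style></head>
<body><h1>Fraud lawsuit filed</h1>
<p>The lawsuit alleges fraud. A second lawsuit is expected.</p>
<script>var fraud = 1;</script></body></html>"""

count = count_words(sample_html)
for interesting in ("fraud", "lawsuit"):
    print("The word '%s' occurs %d times" % (interesting, count[interesting]))
```

For this sample, 'fraud' is counted 2 times and 'lawsuit' 3 times. The 'fraud' inside the script element is ignored because it is not visible text. For messy real-world pages a dedicated HTML library would probably be more robust than this hand-rolled parser.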
How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-29 16:49 -0800
Re: How can I count word frequency in a web site? Cem Karan <cfkaran2@gmail.com> - 2015-11-29 21:31 -0500
Re: How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-29 18:54 -0800
Re: How can I count word frequency in a web site? Michiel Overtoom <motoom@xs4all.nl> - 2015-11-30 08:56 +0100
Re: How can I count word frequency in a web site? Laura Creighton <lac@openend.se> - 2015-11-30 03:51 +0100
Re: How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-30 07:04 -0800
Re: How can I count word frequency in a web site? ryguy7272 <ryanshuell@gmail.com> - 2015-11-30 07:04 -0800