From: Cem Karan
Newsgroups: comp.lang.python
Subject: Re: How can I count word frequency in a web site?
Date: Sun, 29 Nov 2015 21:31:49 -0500

You might want to look into Beautiful Soup (https://pypi.python.org/pypi/beautifulsoup4), which is an HTML screen-scraping tool. I've never used it, but I've heard good things about it.

Good luck,
Cem Karan

On Nov 29, 2015, at 7:49 PM, ryguy7272 wrote:

> I'm trying to figure out how to count words in a web site. Here is a sample of the link I want to scrape data from and count specific words.
> http://finance.yahoo.com/q/h?s=STRP+Headlines
>
> I only want to count certain words, like 'fraud', 'lawsuit', etc. I want to have a way to control for specific words. I have a couple Python scripts that do this for a text file, but not for a web site. I can post that, if that's helpful.
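For reference, here is a minimal sketch of the general approach in Python, written as an illustration under stated assumptions rather than something tested against the Yahoo page. It uses only the standard library (html.parser.HTMLParser and urllib.request) in place of Beautiful Soup so it runs without extra installs, and the helper names TextExtractor, count_target_words and fetch_html are hypothetical. It pulls the visible text out of the HTML, skips <script> and <style> contents, and counts case-insensitive whole-word matches for a fixed list of target words such as 'fraud' and 'lawsuit'.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text from HTML, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_target_words(html, targets):
    """Count case-insensitive whole-word hits for each word in `targets`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"\w+", text)  # "fraud's" splits into "fraud", "s"
    wanted = {w.lower() for w in targets}
    counts = Counter(w for w in words if w in wanted)
    # Include every target, even ones that never appear (count 0).
    return {w: counts[w] for w in sorted(wanted)}


def fetch_html(url):
    """Download a page and decode it, assuming UTF-8 if no charset is sent."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Offline sample so the demo does not depend on any live site.
    sample = (
        "<html><head><script>var fraud = 1;</script></head>"
        "<body><h1>Fraud probe</h1><p>A lawsuit alleges fraud.</p></body></html>"
    )
    print(count_target_words(sample, ["fraud", "lawsuit"]))
    # For a real page: count_target_words(fetch_html(url), ["fraud", "lawsuit"])
```

If Beautiful Soup is installed, the TextExtractor class could be swapped for BeautifulSoup(html, "html.parser").get_text(). Keep in mind that the Yahoo Finance URL above dates from 2015 and may no longer serve the same page, and headlines that a page loads with JavaScript will not appear in the raw HTML that urlopen returns.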