From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Whittle it on down
Date: Fri, 06 May 2016 17:44:29 +0200

DFS wrote:

> There are up to 4 levels of categorization:
> http://www.usdirectory.com/cat/g0 shows 21 Level 1 categories, and 390
> Level 2. To get the Level 3 and 4 you have to drill-down using the
> hyperlinks.
>
> How to do it in python code is beyond my skills at this point. Get the
> hrefs and load them and parse, then get the next level and load them and
> parse, etc.?

Yes, that should work ;)
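
The loop described in the quoted text (get the hrefs, load them, parse, repeat for the next level) could be sketched roughly as below, using only the standard library. This is an illustration under assumptions, not code from the thread: the names LinkCollector, fetch, crawl and is_category are made up for the example, the "/cat/" filter is a guess based on the one URL quoted above, pages are assumed to decode as UTF-8, and each page is assumed to link to the next level down.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative hrefs into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def fetch(url):
    """Download a page and return it as text (no retries, no throttling)."""
    with urlopen(url) as response:
        # Assumption: the pages are UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_level=4, fetch=fetch,
          is_category=lambda url: "/cat/" in url):
    """Return {url: level}, following category links breadth-first.

    The start page counts as level 0; links found on it are level 1.
    Pages at max_level are recorded but not fetched.
    """
    levels = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if level >= max_level:
            continue
        parser = LinkCollector(url)
        parser.feed(fetch(url))
        for link in parser.links:
            # Assumption: category pages are recognisable by their URL.
            if link not in seen and is_category(link):
                seen.add(link)
                levels[link] = level + 1
                queue.append((link, level + 1))
    return levels
```

Calling crawl("http://www.usdirectory.com/cat/g0") would hit the real site once per category page, so a pause between requests would be polite. Passing your own fetch function makes it possible to try the logic on canned HTML first.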