From: Marko Rauhamaa
Newsgroups: comp.lang.python
Subject: Re: Screen scraper to get all 'a title' elements
Date: Thu, 26 Nov 2015 01:53:26 +0200
Message-ID: <87y4dl3abt.fsf@elektro.pacujo.net>
References: <23ed6f4b-0ef2-4c9e-ade6-e597e7e03ca2@googlegroups.com>

Grobu:

> Sorry, I wasn't aware of regex being on the dark side :-)

No, regular expressions are great for many purposes. Parsing
context-free syntax isn't one of them. See:

Most modern computer languages, including HTML, are context-free. Their
structure is too rich for regular expressions to capture.

Regular expressions can handle any regular language just fine. They are
commonly used to define the lexical tokens of a language.


Marko
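
To illustrate the split between lexical tokens and nested structure, here is a minimal sketch. It assumes a toy tag language with no attributes, comments, or self-closing tags, so it is not a real HTML parser; the names TOKEN, tokenize and is_well_nested are made up for the example. A regular expression recognises the individual tokens, and a stack on top of it tracks the nesting that a single regex cannot.

```python
import re

# Lexical level: one regular expression is enough to recognise each token.
# Assumption: tags carry no attributes; characters that match neither
# alternative (such as a stray "<") are silently skipped by finditer.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")


def tokenize(text):
    """Yield ('open', name), ('close', name) or ('text', data) tokens."""
    for match in TOKEN.finditer(text):
        slash, name, data = match.groups()
        if name is None:
            yield ("text", data)
        elif slash:
            yield ("close", name)
        else:
            yield ("open", name)


def is_well_nested(text):
    """Syntactic level: a stack tracks how deeply the tags are nested."""
    stack = []
    for kind, value in tokenize(text):
        if kind == "open":
            stack.append(value)
        elif kind == "close":
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != value:
                return False
    return not stack


sample = "<div>a<div>b</div>c</div>"

# A non-greedy regex stops at the first closing tag, so it cuts the
# outer element in half instead of matching it as a whole.
naive = re.search(r"<div>(.*?)</div>", sample).group(1)

print(naive)                                  # a<div>b
print(is_well_nested(sample))                 # True
print(is_well_nested("<div><p></div></p>"))   # False
```

For real pages, an actual HTML parser (the standard library has html.parser) is the safer choice; the sketch only shows where the regex stops being enough.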