From: inhahe
Newsgroups: comp.lang.python
Subject: Question about how to do something in BeautifulSoup?
Date: Fri, 22 Jan 2016 09:01:37 -0500

I hope this is an appropriate mailing list for BeautifulSoup questions. It's been a long time since I've used python-list, and I don't remember whether third-party modules are on topic. I did try posting to the BeautifulSoup mailing list on Google Groups, but I've waited a day or two and my message hasn't been approved yet.

Say I have the following HTML (I hope this shows up as plain text here rather than formatting):

"Is today the day?"

And I want to extract the "Is today the day?" part. There are other places in the document with and , but this is the only place that uses color #000000, so I want to extract anything that's within a color #000000 style, even if it's nested multiple levels deep within that.

- Sometimes the color is defined as RGB(0, 0, 0) and sometimes it's defined as #000000
- Sometimes the is within the and sometimes the is within the .
- There may be other discrepancies I haven't noticed yet

How can I do this in BeautifulSoup (or is this better done in lxml.html)?

Thanks
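
To make the requirements above concrete, here is a minimal sketch of one way this might be approached in BeautifulSoup. It rests on assumptions: the markup in the post was stripped in transit, so the sketch assumes the color is set through an inline style attribute, and it recognizes only the two spellings mentioned above (#000000 and RGB(0, 0, 0)). The names BLACK_VALUES, is_black_style, and extract_black_text are hypothetical helpers, not part of BeautifulSoup.

```python
import re

# The two spellings of black mentioned in the post, normalized to
# lowercase with all whitespace removed. Other spellings (for example
# "#000" or "black") are not handled in this sketch.
BLACK_VALUES = {"#000000", "rgb(0,0,0)"}


def is_black_style(style):
    """Return True if an inline style string sets `color` to black."""
    if not style:
        return False  # tag has no style attribute, or it is empty
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if not sep:
            continue  # not a "property: value" pair
        if name.strip().lower() != "color":
            continue  # skips background-color, border-color, etc.
        normalized = re.sub(r"\s+", "", value).lower()
        if normalized in BLACK_VALUES:
            return True
    return False


def extract_black_text(html):
    """Return the text of each outermost element styled with black text."""
    # Imported here so is_black_style stays usable without bs4 installed.
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Passing a function as an attribute filter makes BeautifulSoup call it
    # with each tag's style value (or None when the attribute is missing).
    for tag in soup.find_all(style=is_black_style):
        # Skip tags nested inside another black-styled tag so the same
        # text is not reported twice.
        if tag.find_parent(style=is_black_style) is not None:
            continue
        # get_text() collects text from all descendants, however deep.
        results.append(tag.get_text(" ", strip=True))
    return results
```

Because get_text() gathers text from every descendant, it should not matter which element is nested inside which, or how many levels deep the text sits. Calling extract_black_text(html) on the page source would return a list of strings, one per outermost black-styled element.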