| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Stephen Hansen <me+python@ixokai.io> |
| Newsgroups | comp.lang.python |
| Subject | Re: Python3 html scraper that supports javascript |
| Date | Mon, 02 May 2016 11:00:24 -0700 |
| Message-ID | <mailman.327.1462212027.32212.python-list@python.org> |
On Mon, May 2, 2016, at 08:33 AM, zljubisic@gmail.com wrote:
> I tried to use the following code:
>
> from bs4 import BeautifulSoup
> from selenium import webdriver
>
> PHANTOMJS_PATH = 'C:\\Users\\Zoran\\Downloads\\Obrisi\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe'
>
> url = 'https://hrti.hrt.hr/#/video/show/2203605/trebizat-prica-o-jednoj-vodi-i-jednom-narodu-dokumentarni-film'
>
> browser = webdriver.PhantomJS(PHANTOMJS_PATH)
> browser.get(url)
>
> soup = BeautifulSoup(browser.page_source, "html.parser")
>
> x = soup.prettify()
>
> print(x)
>
> When I print x variable, I would expect to see something like this:
> <video
> src="mediasource:https://hrti.hrt.hr/2e9e9c45-aa23-4d08-9055-cd2d7f2c4d58"
> id="vjs_video_3_html5_api" class="vjs-tech" preload="none"><source
> type="application/x-mpegURL"
> src="https://prd-hrt.spectar.tv/player/get_smil/id/2203605/video_id/2203605/token/Cny6ga5VEQSJ2uZaD2G8pg/token_expiration/1462043309/asset_type/Movie/playlist_template/nginx/channel_name/trebiat__pria_o_jednoj_vodi_i_jednom_narodu_dokumentarni_film/playlist.m3u8?foo=bar">
> </video>
>
> but I can't come to that point. Why?

As important as it is to show code, you need to show what actually
happens and what error message is produced.

--
Stephen Hansen
m e @ i x o k a i . i o
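
One possible way to collect "what actually happens" for a report like this is sketched below. It is only an illustration, not something from the thread: run_and_report, fake_fetch and broken_fetch are hypothetical names, and the two fetch functions stand in for the real browser.get(url) / browser.page_source step so the sketch runs without Selenium installed. The idea is to capture either the full traceback or the start of the output, so either one can be pasted into a follow-up message.

```python
import traceback


def run_and_report(fetch_page, max_chars=500):
    """Call fetch_page() and return a text report of what actually happened.

    fetch_page is any zero-argument callable returning the page HTML as a
    string (hypothetical helper, e.g. a wrapper around the Selenium calls).
    """
    try:
        html = fetch_page()
    except Exception:
        # Keep the complete traceback, exactly as Python would print it.
        return "FAILED\n" + traceback.format_exc()
    # On success, report the size and the first part of the real output.
    return "OK, got {} characters\n{}".format(len(html), html[:max_chars])


def fake_fetch():
    # Stand-in for a run that works but lacks the expected <video> tag.
    return "<html><body>no video tag here</body></html>"


def broken_fetch():
    # Stand-in for a run that raises, e.g. the browser failing to start.
    raise RuntimeError("browser did not start")


if __name__ == "__main__":
    print(run_and_report(fake_fetch))
    print(run_and_report(broken_fetch))
```

Posting the report text together with the code shows readers whether the script failed outright or ran and simply returned different HTML than expected.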
Python3 html scraper that supports javascript zljubisic@gmail.com - 2016-05-01 07:19 -0700
Re: Python3 html scraper that supports javascript Bob Gailer <bgailer@gmail.com> - 2016-05-01 13:01 -0400
Re: Python3 html scraper that supports javascript zljubisic@gmail.com - 2016-05-02 08:33 -0700
Re: Python3 html scraper that supports javascript DFS <nospam@dfs.com> - 2016-05-02 12:39 -0400
Re: Python3 html scraper that supports javascript Stephen Hansen <me+python@ixokai.io> - 2016-05-02 11:00 -0700
Re: Python3 html scraper that supports javascript zljubisic@gmail.com - 2016-05-02 13:11 -0700