Pls help me...I want to save scraped data automatically to my database(cleaner version)

Date Sat, 25 Jan 2014 13:22:14 -0800
From Max Cuban <edzeame@gmail.com>
Newsgroups comp.lang.python


I asked this question earlier, but this version should make more sense. I
don't want anyone who could potentially help to be put off by the initial
mess, even though I posted a cleaner version there as a reply.

I want to save the scraped links in my database so that, on subsequent
runs, the script appends only the new links to the stored list.

My code is below, but at the end of the day my database is empty. What
changes can I make to fix this? Thanks in advance.


    from django.template.loader import get_template
    from django.shortcuts import render_to_response
    from bs4 import BeautifulSoup
    import urllib2, sys
    import urlparse
    import re
    from listing.models import jobLinks

    # this function extracts the links
    def businessghana():
        site = "http://www.businessghana.com/portal/jobs"
        hdr = {'User-Agent' : 'Mozilla/5.0'}
        req = urllib2.Request(site, headers=hdr)
        jobpass = urllib2.urlopen(req)
        soup = BeautifulSoup(jobpass)
        for tag in soup.find_all('a', href = True):
            tag['href'] = urlparse.urljoin('http://www.businessghana.com/portal/', tag['href'])
        return map(str, soup.find_all('a', href = re.compile('.getJobInfo')))

    # result from businessghana() saved to a variable to make it iterable as a list
    all_links  = businessghana()

    # this function should save the links to the database unless the link already exists
    def save_new_links(all_links):
        current_links = jobLinks.objects.all()
        for i in all_links:
            if i not in current_links:
                jobLinks.objects.create(url=i)

    # I called the above function here hoping that it will save to database
    save_new_links(all_links)

    # return my httpResponse with this function
    def display_links(request):
        name = all_links()
        return render_to_response('jobs.html', {'name' : name})
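
A note on `businessghana()` above: `map(str, ...)` turns each matched `<a>` tag into its full HTML markup rather than just its URL, so the values passed on are probably not plain links. A minimal sketch of collecting only the absolute `href` values is given here. It assumes Python 3 and swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages. `JobLinkParser`, `extract_job_links`, and the sample HTML (including its paths) are made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://www.businessghana.com/portal/"


class JobLinkParser(HTMLParser):
    """Collects absolute URLs of <a> tags whose href matches a pattern."""

    def __init__(self, base_url, pattern):
        super().__init__()
        self.base_url = base_url
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only hrefs that match, resolved against the base URL.
        if href and self.pattern.search(href):
            self.links.append(urljoin(self.base_url, href))


def extract_job_links(html, base_url=BASE_URL, pattern=r".getJobInfo"):
    """Return a list of absolute URL strings (hypothetical helper)."""
    parser = JobLinkParser(base_url, pattern)
    parser.feed(html)
    return parser.links


# Made-up sample markup; the real page structure is not shown in the post.
sample_html = """
<a href="jobs/getJobInfo/101">Job one</a>
<a href="about.html">About</a>
<a href="jobs/getJobInfo/102">Job two</a>
"""
links = extract_job_links(sample_html)
```

With BeautifulSoup, the same idea would be to read `tag['href']` from each matched tag instead of calling `str` on the tag.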

My django models.py looks like this:

    from django.db import models

    class jobLinks(models.Model):
        links = models.URLField()
        pub_date = models.DateTimeField('date retrieved')

        def __unicode__(self):
            return self.links
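
Comparing the view code with this model suggests a few likely reasons nothing is saved. The model field is named `links`, but `create()` is called with `url=`. `pub_date` has no default and is never supplied. And `i not in current_links` compares a string with model instances, so it can never find a match. The sketch below shows the "save only new links" logic on its own, under stated assumptions: it uses the standard library's `sqlite3` with an in-memory table in place of the Django ORM so that it can run anywhere, and the table name `job_links` and the example URLs are made up. It has not been tested against the project in the post.

```python
import sqlite3
from datetime import datetime, timezone


def save_new_links(conn, scraped_links):
    """Insert only links that are not stored yet; return the ones added."""
    # Compare strings with strings: load the stored URLs into a set.
    existing = {row[0] for row in conn.execute("SELECT links FROM job_links")}
    added = []
    for link in scraped_links:
        if link not in existing:
            conn.execute(
                "INSERT INTO job_links (links, pub_date) VALUES (?, ?)",
                (link, datetime.now(timezone.utc).isoformat()),
            )
            existing.add(link)  # also guards against duplicates within one run
            added.append(link)
    conn.commit()
    return added


# Stand-in for the jobLinks table (assumed schema mirroring the model fields).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_links (links TEXT UNIQUE, pub_date TEXT)")

first_run = save_new_links(conn, ["http://example.com/a", "http://example.com/b"])
second_run = save_new_links(conn, ["http://example.com/b", "http://example.com/c"])
stored = [row[0] for row in conn.execute("SELECT links FROM job_links ORDER BY links")]
```

In Django, the set of stored URLs could be built with `jobLinks.objects.values_list('links', flat=True)`, and each new row created with both `links=` and `pub_date=` filled in. Calling the save step from inside the view or a management command, rather than at module import time, would also make it easier to see any errors.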
