

Groups > comp.lang.python > #60211 > unrolled thread

Web Page Parsing/Downloading

Started by TheRandomPast <wishingforsam@gmail.com>
First post 2013-11-22 02:10 -0800
Last post 2013-11-22 23:02 +1100
Articles 2 — 2 participants


#60211 — Web Page Parsing/Downloading

From: TheRandomPast <wishingforsam@gmail.com>
Date: 2013-11-22 02:10 -0800
Subject: Web Page Parsing/Downloading
Message-ID: <fd8d82d5-f5cc-4a5b-99d1-a93a20895f63@googlegroups.com>

Hi. I'm self-taught in Python. I used http://www.codecademy.com/ to learn, which was a great help I must say, but now I'm attempting it all on my own and need a little help.

I have three scripts and this is what I'm trying to do with them:


Download from webpage
Parse Links from Page
Output summary of total links
Format a list of matched links
Parse and Print Email addresses
Crack Hashed Passwords
Exception Handling
Parse and Print links to image files/.doc files
Save file into specified folder and alert when files don't save

Can anyone help because I've become a little stuck? None of the scripts are running for me and I can't see where I'm having issues


WebPage script:
import sys, urllib
def getWebpage(url):
    print '[*] getWebpage()'
    url_file = urllib.urlopen(url)
    page = url_file.read()
    return page
def main():
    sys.argv.append('http://www.funeralformyfat.tumblr.com')
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    
print getWebpage(sys.argv[1])

if __name__ == '__main__':
    main()

getLinks

def print_links(page):
    print '[*] print_links()'
    links = re.findall(r'\<a.*href\=.*http\:.+', page)
    links.sort()
    print '[+]', str(len(links)), 'HyperLinks Found:'
    
for link in links:
    print link
    
def main():
    sys.argv.append('http://www.funeralformyfat.tumblr.com')
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_getlinks URL'
        return
        page = webpage_get.wget(sys.argv[1])
        print_links(page)

from os.path import join

    directory = join('/home/', y, '/newdir/')
    file_name = url.split('/')[-1]
    file_name = join(directory, file_name)



        
if __name__ == '__main__':
    main()

getParser 

 import md5

 oldpasswd_byuser=str("tom")
 oldpasswd_db="sha1$c60da$1835a9c3ccb1cc436ccaa577679b5d0321234c6f"
 opw=     md5.new(oldpasswd_byuser)
 #opw=     md5.new(oldpasswd_byuser).hexdigest()
 if(opw ==      oldpasswd_db):
    print "same password"
 else:
     print "Invalid password"

from email.parser import Parser


#headers = Parser().parse(open(messagefile, 'r'))


headers = Parser().parsestr('From: <user@example.com>\n'
        'To: <someone_else@example.com>\n'
        'Subject: Test message\n'
        '\n'
        'Body would go here\n')
print 'To: %s' % headers['to']
print 'From: %s' % headers['from']
print 'Subject: %s' % headers['subject']



Thanks for any help! 



#60215

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-11-22 23:02 +1100
Message-ID: <mailman.3034.1385121767.18130.python-list@python.org>
In reply to: #60211

On Fri, Nov 22, 2013 at 9:10 PM, TheRandomPast <wishingforsam@gmail.com> wrote:
> Can anyone help because I've become a little stuck? None of the scripts are running for me and I can't see where I'm having issues

I'm rather lost in what you're trying to accomplish here. The first
thing to do would be to separate out your three scripts and just look
at one at a time; then cut each one down to just what it really needs
to be doing. Once you've done that, you'll have a simple example - see
http://sscce.org/ for tips on that - and you can figure out what it's
doing wrong. If you can't figure it out on your own, the short example
will be far more suitable for posting here, along with its error
backtrace (if it's throwing one), than a more verbose program listing.
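
As a rough illustration of what a cut-down example might look like, here
is a sketch of the link-listing script reduced to one job. It is not the
original poster's code: it is written for Python 3, it swaps the regular
expression for the standard library's html.parser module, and the names
LinkCollector and get_links (and the sample HTML) are made up for the
example. It works on a string, so it can be tried without a network
connection.

```python
# Python 3 sketch: one script, one job (list the links in a page).
# LinkCollector, get_links and the sample HTML are hypothetical names/data.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def get_links(page):
    """Return a sorted list of the link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(page)
    return sorted(collector.links)


if __name__ == "__main__":
    sample = ('<p><a href="http://example.com/b">B</a> '
              '<a href="http://example.com/a">A</a></p>')
    links = get_links(sample)
    print("[+]", len(links), "hyperlinks found:")
    for link in links:
        print(link)
```

Something of this size, plus the exact traceback if it fails, is about
the amount of code that is easy for others to read and run.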

Two general points of advice. Firstly, if you're just starting out, I
strongly recommend you use Python 3 instead of Python 2. All sorts of
things have been improved, and it's far easier to learn on the new
version than to learn on the old and then have to change your habits
later.
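
For a sense of the difference, here is a hedged sketch of how the
page-fetching script from the original post might look in Python 3. Two
changes are certain: print is a function, and urlopen lives in
urllib.request rather than urllib. The rest is an assumption made for the
example: the snake_case names, the UTF-8 fallback when the server does
not name a charset, and taking the URL from the command line instead of
appending one to sys.argv.

```python
# Python 3 sketch of the page-fetching script (names are illustrative).
import sys
import urllib.request


def get_webpage(url):
    """Fetch a URL and return its body decoded to text."""
    with urllib.request.urlopen(url) as response:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def main(argv):
    # Expect exactly one argument after the script name: the URL.
    if len(argv) != 2:
        print("[-] Usage: webpage_get URL")
        return
    print(get_webpage(argv[1]))


if __name__ == "__main__":
    main(sys.argv)
```

Unlike the posted version, the call that fetches and prints the page sits
inside main(), so it only runs after the argument check has passed.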

And secondly, please read this and take note:
https://wiki.python.org/moin/GoogleGroupsPython - otherwise, you'll
find that a lot of people don't want to see your post. Best would be
to avoid Google Groups altogether, as it's very approximately the
worst newsgroup client I've ever seen posts from.

ChrisA


