Groups > comp.lang.python > #60211 > unrolled thread
| Started by | TheRandomPast <wishingforsam@gmail.com> |
|---|---|
| First post | 2013-11-22 02:10 -0800 |
| Last post | 2013-11-22 23:02 +1100 |
| Articles | 2 — 2 participants |
Web Page Parsing/Downloading TheRandomPast <wishingforsam@gmail.com> - 2013-11-22 02:10 -0800
Re: Web Page Parsing/Downloading Chris Angelico <rosuav@gmail.com> - 2013-11-22 23:02 +1100
| From | TheRandomPast <wishingforsam@gmail.com> |
|---|---|
| Date | 2013-11-22 02:10 -0800 |
| Subject | Web Page Parsing/Downloading |
| Message-ID | <fd8d82d5-f5cc-4a5b-99d1-a93a20895f63@googlegroups.com> |
Hi. I'm self-taught in Python; I used http://www.codecademy.com/ to learn, which was a great help, I must say. Now I'm attempting it all on my own and need a little help.
I have three scripts and this is what I'm trying to do with them:
Download from webpage
Parse Links from Page
Output summary of total links
Format a list of matched links
Parse and Print Email addresses
Crack Hashed Passwords
Exception Handling
Parse and Print links to image files/.doc
Save file into specified folder and alert when files don't save
Can anyone help because I've become a little stuck? None of the scripts are running for me and I can't see where I'm having issues
WebPage script;
import sys, urllib
def getWebpage(url):
    print '[*] getWebpage()'
    url_file = urllib.urlopen(url)
    page = url_file.read()
    return page
def main():
    sys.argv.append('http://www.funeralformyfat.tumblr.com')
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    print getWebpage(sys.argv[1])
if __name__ == '__main__':
    main()
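For comparison, here is a minimal Python 3 sketch of the same download step. It is not a drop-in fix: the names `get_webpage` and `main(argv)` are illustrative, and falling back to UTF-8 when no charset is declared is an assumption. One thing it changes on purpose: the script above appends a URL to `sys.argv` and then requires exactly two entries, so passing a real URL on the command line yields three entries and triggers the usage message.

```python
import sys
import urllib.request


def get_webpage(url):
    """Fetch a URL and return its body decoded to text."""
    with urllib.request.urlopen(url) as response:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def main(argv):
    """Print the page named by the single command-line argument."""
    # Require exactly one URL rather than appending a default to sys.argv.
    if len(argv) != 2:
        print("[-] Usage: webpage_get URL")
        return 1
    print(get_webpage(argv[1]))
    return 0
```

To run it as a script, call `main(sys.argv)` under the usual `if __name__ == '__main__':` guard.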
getLinks
def print_links(page):
    print '[*] print_links()'
    links = re.findall(r'\<a.*href\=.*http\:.+', page)
    links.sort()
    print '[+]', str(len(links)), 'HyperLinks Found:'
    for link in links:
        print link
def main():
    sys.argv.append('http://www.funeralformyfat.tumblr.com')
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_getlinks URL'
        return
    page = webpage_get.wget(sys.argv[1])
    print_links(page)
from os.path import join
directory = join('/home/', y, '/newdir/')
file_name = url.split('/')[-1]
file_name = join(directory, file_name)
if __name__ == '__main__':
    main()
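As posted, this script never imports `re`, `sys`, or `webpage_get`; it calls `webpage_get.wget` although the first script defines `getWebpage`; and `y` and `url` are never assigned. Separately from those errors, the regular expression returns everything from `<a` to the end of the line rather than just the link. The sketch below shows one possible alternative in Python 3, using the standard library's `html.parser` to collect `href` values. `LinkCollector` and `find_links` are hypothetical names, and removing duplicate links is a design choice that the original does not make.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href, or with an empty one.
            if name == "href" and value:
                self.links.append(value)


def find_links(page):
    """Return a sorted list of the unique href values in an HTML string."""
    collector = LinkCollector()
    collector.feed(page)
    return sorted(set(collector.links))


def print_links(page):
    """Print a count of the links found, then one link per line."""
    links = find_links(page)
    print("[+]", len(links), "HyperLinks Found:")
    for link in links:
        print(link)
```

Filtering the result by extension, for example `link.lower().endswith(('.jpg', '.png', '.doc'))`, would be one way to approach the image and .doc step.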
getParser
import md5
oldpasswd_byuser=str("tom")
oldpasswd_db="sha1$c60da$1835a9c3ccb1cc436ccaa577679b5d0321234c6f"
opw= md5.new(oldpasswd_byuser)
#opw= md5.new(oldpasswd_byuser).hexdigest()
if(opw == oldpasswd_db):
    print "same password"
else:
    print "Invalid password"
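The comparison above can never succeed: `opw` is an md5 hash object, not a string, and the stored value is labelled `sha1` and carries a salt. Below is a hedged Python 3 sketch of checking one candidate password against a string in that `algorithm$salt$hexdigest` layout. It assumes the digest is SHA-1 over the salt followed by the password, which the post does not confirm, and `check_password` is a hypothetical name.

```python
import hashlib
import hmac


def check_password(candidate, stored):
    """Check a candidate against an 'algorithm$salt$hexdigest' string."""
    algorithm, salt, expected = stored.split("$")
    if algorithm != "sha1":
        raise ValueError("unsupported algorithm: " + algorithm)
    # Assumption: the digest is SHA-1 of the salt followed by the password.
    digest = hashlib.sha1((salt + candidate).encode("utf-8")).hexdigest()
    # Compare without leaking timing information.
    return hmac.compare_digest(digest, expected)
```

Cracking would then mean calling `check_password` for each word in a candidate list until one returns `True`.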
from email.parser import Parser
#headers = Parser().parse(open(messagefile, 'r'))
headers = Parser().parsestr('From: <user@example.com>\n'
        'To: <someone_else@example.com>\n'
        'Subject: Test message\n'
        '\n'
        'Body would go here\n')
print 'To: %s' % headers['to']
print 'From: %s' % headers['from']
print 'Subject: %s' % headers['subject']
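`email.parser` reads the headers of a mail message. If the goal is instead to pull email addresses out of a downloaded page, a regular expression over the page text is one common approach. The Python 3 sketch below uses a deliberately simplified pattern that will not match every valid address; `EMAIL_RE` and `find_emails` are hypothetical names.

```python
import re

# Simplified pattern (an assumption): it covers common addresses only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return a sorted list of the unique email-like strings in text."""
    return sorted(set(EMAIL_RE.findall(text)))
```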
Thanks for any help!
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-22 23:02 +1100 |
| Message-ID | <mailman.3034.1385121767.18130.python-list@python.org> |
| In reply to | #60211 |
On Fri, Nov 22, 2013 at 9:10 PM, TheRandomPast <wishingforsam@gmail.com> wrote:
> Can anyone help because I've become a little stuck? None of the scripts are running for me and I can't see where I'm having issues

I'm rather lost in what you're trying to accomplish here. The first thing to do would be to separate out your three scripts and just look at one at a time; then cut each one down to just what it really needs to be doing. Once you've done that, you'll have a simple example - see http://sscce.org/ for tips on that - and you can figure out what it's doing wrong. If you can't figure it out on your own, the short example will be far more suitable for posting here, along with its error backtrace (if it's throwing one), than a more verbose program listing.

Two general points of advice. Firstly, if you're just starting out, I strongly recommend you use Python 3 instead of Python 2. All sorts of things have been improved, and it's far easier to learn on the new version than to learn on the old and then have to change your habits later.

And secondly, please read this and take note: https://wiki.python.org/moin/GoogleGroupsPython - otherwise, you'll find that a lot of people don't want to see your post. Best would be to avoid Google Groups altogether, as it's very approximately the worst newsgroup client I've ever seen posts from.

ChrisA