Groups > comp.lang.python > #98978 > unrolled thread
| Started by | ryguy7272 <ryanshuell@gmail.com> |
|---|---|
| First post | 2015-11-18 08:37 -0800 |
| Last post | 2015-11-18 16:38 -0500 |
| Articles | 11 — 7 participants |
How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 08:37 -0800
Re: How can I export data from a website and write the contents to a text file? Chris Angelico <rosuav@gmail.com> - 2015-11-19 03:57 +1100
Re: How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 09:03 -0800
Re: How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 09:15 -0800
Re: How can I export data from a website and write the contents to a text file? Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-18 17:19 +0000
Re: How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 09:40 -0800
Re: How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 09:43 -0800
Re: How can I export data from a website and write the contents to a text file? Patrick Hess <patrickhess@gmx.net> - 2015-11-19 20:17 +0100
Re: How can I export data from a website and write the contents to a text file? Michael Torrie <torriem@gmail.com> - 2015-11-20 10:44 -0700
Re: How can I export data from a website and write the contents to a text file? Rob Gaddi <rgaddi@technologyhighland.invalid> - 2015-11-18 18:05 +0000
Re: How can I export data from a website and write the contents to a text file? Random832 <random832@fastmail.com> - 2015-11-18 16:38 -0500
| From | ryguy7272 <ryanshuell@gmail.com> |
|---|---|
| Date | 2015-11-18 08:37 -0800 |
| Subject | How can I export data from a website and write the contents to a text file? |
| Message-ID | <9365cf2f-e9c7-4338-83b4-ce3d1d7ce1d6@googlegroups.com> |
I'm trying the script below, and it simply writes the last line to a text file. I want to add a '\n' after each line is written, so I don't overwrite all the lines.
from bs4 import BeautifulSoup
import urllib2
var_file = urllib2.urlopen("http://www.imdb.com/chart/top")
var_html = var_file.read()
var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        print(link)
        text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
        z = str(link)
        text_file.write(z + "\n")
        text_file.write("\n")
        text_file.close()
Can someone please help me get this working?
Thanks!!
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-11-19 03:57 +1100 |
| Message-ID | <mailman.418.1447865881.16136.python-list@python.org> |
| In reply to | #98978 |
On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272 <ryanshuell@gmail.com> wrote:
> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> z = str(link)
> text_file.write(z + "\n")
> text_file.write("\n")
> text_file.close()
You're opening the file every time you go through the loop,
overwriting each time. Instead, open the file once, then start the
loop, and then close it at the end. You can use a 'with' statement to
do the closing for you, or you can do it the way you are here.
ChrisA
| From | ryguy7272 <ryanshuell@gmail.com> |
|---|---|
| Date | 2015-11-18 09:03 -0800 |
| Message-ID | <099133ed-c6df-4f5c-b47b-f1cf464511f6@googlegroups.com> |
| In reply to | #98982 |
On Wednesday, November 18, 2015 at 11:58:17 AM UTC-5, Chris Angelico wrote:
> On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272 <ryanshuell@gmail.com> wrote:
> > text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> > z = str(link)
> > text_file.write(z + "\n")
> > text_file.write("\n")
> > text_file.close()
>
> You're opening the file every time you go through the loop,
> overwriting each time. Instead, open the file once, then start the
> loop, and then close it at the end. You can use a 'with' statement to
> do the closing for you, or you can do it the way you are here.
>
> ChrisA
Thanks. What would the code look like? I tried the code below, and got the same results.
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        #print(link)
        z = str(link)
        text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
        text_file.write(z + "\n")
        text_file.close()
| From | ryguy7272 <ryanshuell@gmail.com> |
|---|---|
| Date | 2015-11-18 09:15 -0800 |
| Message-ID | <9ddeb643-292f-4d5a-a891-83bee1d35c2f@googlegroups.com> |
| In reply to | #98983 |
On Wednesday, November 18, 2015 at 12:04:16 PM UTC-5, ryguy7272 wrote:
> On Wednesday, November 18, 2015 at 11:58:17 AM UTC-5, Chris Angelico wrote:
> > On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272 <> wrote:
> > > text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> > > z = str(link)
> > > text_file.write(z + "\n")
> > > text_file.write("\n")
> > > text_file.close()
> >
> > You're opening the file every time you go through the loop,
> > overwriting each time. Instead, open the file once, then start the
> > loop, and then close it at the end. You can use a 'with' statement to
> > do the closing for you, or you can do it the way you are here.
> >
> > ChrisA
>
>
>
> Thanks. What would the code look like? I tried the code below, and got the same results.
>
>
> for item in soup.find_all(class_='lister-list'):
>     for link in item.find_all('a'):
>         #print(link)
>         z = str(link)
>         text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
>         text_file.write(z + "\n")
>         text_file.close()
Oh, I see, it's like this:
text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        #print(link)
        z = str(link)
        text_file.write(z + "\n")
text_file.close()
However, it's not organized very well, and it's hard to read. I thought the '\n' would create a new line after one line was written. Now, it seems like everything is jumbled together. Kind of weird. Am I missing something?
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-11-18 17:19 +0000 |
| Message-ID | <n2ibuh$bm9$2@dont-email.me> |
| In reply to | #98978 |
On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:
> I'm trying the script below...
The problem isn't that you're over-writing the lines (although it may
seem that way to you), the problem is that you're overwriting the whole
file every time you write a link to it. This is because you open and
close the file for every link you write, and you do so in file mode "wb"
which restarts writing at the first byte of the file every time.
You only need to open and close the text file once, instead of for every
link you output. Try moving the lines to open and close the file outside
the outer for loop to change the loop from:
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        # open file
        # write link to file
        # close file
to:
# open file
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        # write link to file
# close file
Alternatively, use the with form:
with open("blah","wb") as text_file:
    for item in soup.find_all(class_='lister-list'):
        for link in item.find_all('a'):
            # write link to file
--
Denis McMahon, denismfmcmahon@gmail.com
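For anyone who wants to see the difference without hitting IMDb, here is a minimal sketch of the two loop shapes described above. It assumes Python 3 and text mode ("w"); the `links` list and the temporary file names are hypothetical stand-ins for the real BeautifulSoup results and output path.

```python
import os
import tempfile

# Hypothetical stand-ins for the <a> tags the real script gets from BeautifulSoup.
links = [
    '<a href="/one">One</a>',
    '<a href="/two">Two</a>',
    '<a href="/three">Three</a>',
]

out_dir = tempfile.mkdtemp()
reopened_path = os.path.join(out_dir, "reopened.txt")
opened_once_path = os.path.join(out_dir, "opened_once.txt")

# Reopening in "w" mode inside the loop truncates the file on every pass,
# so only the last link survives.
for link in links:
    text_file = open(reopened_path, "w")
    text_file.write(link + "\n")
    text_file.close()

# Opening once, outside the loop, keeps every link; "with" closes the file.
with open(opened_once_path, "w") as text_file:
    for link in links:
        text_file.write(link + "\n")

with open(reopened_path) as f:
    reopened_lines = f.read().splitlines()
with open(opened_once_path) as f:
    opened_once_lines = f.read().splitlines()

print(len(reopened_lines), len(opened_once_lines))
```

The first file should end up holding a single line and the second one line per link, which matches the "only the last line is written" symptom from the original post.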
| From | ryguy7272 <ryanshuell@gmail.com> |
|---|---|
| Date | 2015-11-18 09:40 -0800 |
| Message-ID | <6e0f470b-f896-43ae-8f83-b20f22a9db8d@googlegroups.com> |
| In reply to | #98985 |
On Wednesday, November 18, 2015 at 12:21:47 PM UTC-5, Denis McMahon wrote:
> On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:
>
> > I'm trying the script below...
>
> The problem isn't that you're over-writing the lines (although it may
> seem that way to you), the problem is that you're overwriting the whole
> file every time you write a link to it. This is because you open and
> close the file for every link you write, and you do so in file mode "wb"
> which restarts writing at the first byte of the file every time.
>
> You only need to open and close the text file once, instead of for every
> link you output. Try moving the lines to open and close the file outside
> the outer for loop to change the loop from:
>
> for item in soup.find_all(class_='lister-list'):
>     for link in item.find_all('a'):
>         # open file
>         # write link to file
>         # close file
>
> to:
>
> # open file
> for item in soup.find_all(class_='lister-list'):
>     for link in item.find_all('a'):
>         # write link to file
> # close file
>
> Alternatively, use the with form:
>
> with open("blah","wb") as text_file:
>     for item in soup.find_all(class_='lister-list'):
>         for link in item.find_all('a'):
>             # write link to file
>
> --
> Denis McMahon,
Yes, I just figured it out. Thanks.
It doesn't seem like the '\n' is doing anything useful. All the text is jumbled together. When I open the file in Excel or Notepad++, it is easy to read. However, when I open it as a regular text file, everything is jumbled together. Is there an easy way to fix this?
| From | ryguy7272 <ryanshuell@gmail.com> |
|---|---|
| Date | 2015-11-18 09:43 -0800 |
| Message-ID | <e0edf996-9ce8-404e-b4e0-1e9a7b9af706@googlegroups.com> |
| In reply to | #98987 |
On Wednesday, November 18, 2015 at 12:41:19 PM UTC-5, ryguy7272 wrote:
> On Wednesday, November 18, 2015 at 12:21:47 PM UTC-5, Denis McMahon wrote:
> > On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:
> >
> > > I'm trying the script below...
> >
> > The problem isn't that you're over-writing the lines (although it may
> > seem that way to you), the problem is that you're overwriting the whole
> > file every time you write a link to it. This is because you open and
> > close the file for every link you write, and you do so in file mode "wb"
> > which restarts writing at the first byte of the file every time.
> >
> > You only need to open and close the text file once, instead of for every
> > link you output. Try moving the lines to open and close the file outside
> > the outer for loop to change the loop from:
> >
> > for item in soup.find_all(class_='lister-list'):
> >     for link in item.find_all('a'):
> >         # open file
> >         # write link to file
> >         # close file
> >
> > to:
> >
> > # open file
> > for item in soup.find_all(class_='lister-list'):
> >     for link in item.find_all('a'):
> >         # write link to file
> > # close file
> >
> > Alternatively, use the with form:
> >
> > with open("blah","wb") as text_file:
> >     for item in soup.find_all(class_='lister-list'):
> >         for link in item.find_all('a'):
> >             # write link to file
> >
> > --
> > Denis McMahon,
>
>
> Yes, I just figured it out. Thanks.
>
> It doesn't seem like the '\n' is doing anything useful. All the text is jumbled together. When I open the file in Excel or Notepad++, it is easy to read. However, when I open it as a regular text file, everything is jumbled together. Is there an easy way to fix this?
I finally got it working. It's like this:
"\r\n"
Thanks everyone!!
| From | Patrick Hess <patrickhess@gmx.net> |
|---|---|
| Date | 2015-11-19 20:17 +0100 |
| Message-ID | <mailman.486.1447964619.16136.python-list@python.org> |
| In reply to | #98988 |
ryguy7272 wrote:
> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> [...]
> It doesn't seem like the '\n' is doing anything useful. All the text is jumbled together.
> [...]
> I finally got it working. It's like this:
> "\r\n"
The better solution would be to open text files in actual text mode:
open("filename", "wb") # binary mode
open("filename", "w") # text mode
In text mode, the correct line-ending characters, which will vary
depending on the operating system, are chosen automatically.
with open("test.txt", "w") as textfile:
    textfile.write("line 1\n")
    textfile.write("line 2")
This produces "line 1\nline 2" on Unix systems and "line 1\r\nline 2"
on Windows.
Also involves less typing this way. ;-)
Patrick
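To check what actually lands on disk, one option is to write in text mode and read the bytes back in binary mode. The sketch below assumes Python 3, where `open()` also accepts a `newline` argument that pins the line ending regardless of platform; the temporary path is a placeholder, not the path from the original script.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Text mode with default newline handling: "\n" is written as os.linesep.
with open(path, "w") as textfile:
    textfile.write("line 1\n")
    textfile.write("line 2")

# Binary mode shows the bytes exactly as stored, with no translation.
with open(path, "rb") as raw:
    default_bytes = raw.read()

# newline="\r\n" forces Windows-style endings on any platform.
with open(path, "w", newline="\r\n") as textfile:
    textfile.write("line 1\nline 2")

with open(path, "rb") as raw:
    windows_bytes = raw.read()

print(repr(default_bytes))
print(repr(windows_bytes))
```

The first result depends on the operating system, while the second should contain "\r\n" everywhere, which can be handy when a file written on Unix has to open cleanly in old Windows tools.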
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2015-11-20 10:44 -0700 |
| Message-ID | <mailman.12.1448041450.2291.python-list@python.org> |
| In reply to | #98988 |
On 11/19/2015 12:17 PM, Patrick Hess wrote:
> ryguy7272 wrote:
>> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
>> [...]
>> It doesn't seem like the '\n' is doing anything useful. All the text is jumbled together.
>> [...]
>> I finally got it working. It's like this:
>> "\r\n"
>
> The better solution would be to open text files in actual text mode:
>
> open("filename", "wb") # binary mode
> open("filename", "w") # text mode
>
> In text mode, the correct line-ending characters, which will vary
> depending on the operating system, are chosen automatically.
It's not just a matter of line endings. It's a matter of text encoding
also. This is critical in Python3 where everything is unicode and
encoding is essential. You have to use the text mode when writing
files here, and it's also a good idea to specify what encoding you wish
to write with (UTF-8 is a good default).
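As a rough illustration of the encoding point, the sketch below writes a title containing a non-ASCII character with an explicit UTF-8 encoding and then inspects the stored bytes. It assumes Python 3; the title and the temporary path are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "titles.txt")

# Hypothetical scraped title with a non-ASCII character (e with acute accent).
title = "L\u00e9on: The Professional"

# Naming the encoding avoids depending on the platform's default.
with open(path, "w", encoding="utf-8") as text_file:
    text_file.write(title + "\n")

# The accented letter is stored as the two bytes 0xC3 0xA9 in UTF-8.
with open(path, "rb") as raw:
    data = raw.read()

# Reading back with the same encoding recovers the original text.
with open(path, encoding="utf-8") as text_file:
    round_trip = text_file.read()

print(repr(data))
```

Without the `encoding` argument, the result depends on the locale, so the same script can produce different bytes (or raise an error) on different machines.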
| From | Rob Gaddi <rgaddi@technologyhighland.invalid> |
|---|---|
| Date | 2015-11-18 18:05 +0000 |
| Message-ID | <n2ield$bbq$2@dont-email.me> |
| In reply to | #98987 |
On Wed, 18 Nov 2015 09:40:58 -0800, ryguy7272 wrote:
> It doesn't seem like the '\n' is doing anything useful. All the text is
> jumbled together. When I open the file in Excel, or Notepad++, it is
> easy to read. However, when I open it as a regular text file,
> everything is jumbled together. Is there an easy way to fix this?
You're suffering cause-effect inversion. Windows default Notepad is a
fundamentally crippled text editor that only knows how to handle
Windows/DOS-style text files, where the line ending is '\r\n'.
Notepad++, along with many other excellent editors available for
Windows, is smart enough to figure out from the file whether it's
Windows style or UNIX style, where line endings are just a bare '\n'.
So the problem wasn't with what you were writing, it's with how you
define "open it as a regular text file". On my Windows machine I long
ago switched the default editor to Notepad++ for everything and was far
happier for it.
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
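For the curious, a rough sketch of that kind of check is below. It is only an illustration of telling the two styles apart from raw bytes, not a description of how Notepad++ or any other editor actually does it; the function name and return values are made up for the example.

```python
def line_ending_style(data):
    """Classify raw file bytes by the line endings they contain."""
    if b"\r\n" in data:
        return "windows"  # carriage return followed by line feed
    if b"\n" in data:
        return "unix"     # bare line feed
    return "none"         # no line breaks at all


unix_style = line_ending_style(b"line 1\nline 2\n")
windows_style = line_ending_style(b"line 1\r\nline 2\r\n")
single_line = line_ending_style(b"just one line")

print(unix_style, windows_style, single_line)
```

Running it on the bytes of the output file (opened with "rb") would show which style the script produced.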
| From | Random832 <random832@fastmail.com> |
|---|---|
| Date | 2015-11-18 16:38 -0500 |
| Message-ID | <mailman.427.1447882725.16136.python-list@python.org> |
| In reply to | #98978 |
ryguy7272 <ryanshuell@gmail.com> writes:
> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
Remove the "b" from this line. This is causing it to omit the
platform-specific translation of "\n", which means some Windows
applications will not recognize the line endings.