

Groups > comp.lang.python > #98978 > unrolled thread

How can I export data from a website and write the contents to a text file?

Started by: ryguy7272 <ryanshuell@gmail.com>
First post: 2015-11-18 08:37 -0800
Last post: 2015-11-18 16:38 -0500
Articles: 11 (7 participants)



Contents

  How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 08:37 -0800
    Re: How can I export data from a website and write the contents to a text file? Chris Angelico <rosuav@gmail.com> - 2015-11-19 03:57 +1100
      Re: How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 09:03 -0800
        Re: How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 09:15 -0800
    Re: How can I export data from a website and write the contents to a text file? Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-18 17:19 +0000
      Re: How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 09:40 -0800
        Re: How can I export data from a website and write the contents to a text file? ryguy7272 <ryanshuell@gmail.com> - 2015-11-18 09:43 -0800
          Re: How can I export data from a website and write the contents to a text file? Patrick Hess <patrickhess@gmx.net> - 2015-11-19 20:17 +0100
          Re: How can I export data from a website and write the contents to a text file? Michael Torrie <torriem@gmail.com> - 2015-11-20 10:44 -0700
        Re: How can I export data from a website and write the contents to a text file? Rob Gaddi <rgaddi@technologyhighland.invalid> - 2015-11-18 18:05 +0000
    Re: How can I export data from a website and write the contents to a text file? Random832 <random832@fastmail.com> - 2015-11-18 16:38 -0500

#98978 — How can I export data from a website and write the contents to a text file?

From: ryguy7272 <ryanshuell@gmail.com>
Date: 2015-11-18 08:37 -0800
Subject: How can I export data from a website and write the contents to a text file?
Message-ID: <9365cf2f-e9c7-4338-83b4-ce3d1d7ce1d6@googlegroups.com>

I'm trying the script below, and it simply writes the last line to a text file.  I want to add a '\n' after each line is written, so I don't overwrite all the lines.


from bs4 import BeautifulSoup
import urllib2

var_file = urllib2.urlopen("http://www.imdb.com/chart/top")

var_html  = var_file.read()

var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        print(link)
        text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
        z = str(link)
        text_file.write(z + "\n")
        text_file.write("\n")
        text_file.close()


Can someone please help me get this working?
Thanks!!



#98982

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-11-19 03:57 +1100
Message-ID: <mailman.418.1447865881.16136.python-list@python.org>
In reply to: #98978

On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272 <ryanshuell@gmail.com> wrote:
>       text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
>         z = str(link)
>         text_file.write(z + "\n")
>         text_file.write("\n")
>         text_file.close()

You're opening the file every time you go through the loop,
overwriting each time. Instead, open the file once, then start the
loop, and then close it at the end. You can use a 'with' statement to
do the closing for you, or you can do it the way you are here.
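
A minimal sketch of that advice, assuming Python 3 (the original script uses Python 2's urllib2). The `save_links` helper and the `links` list are hypothetical stand-ins for the tags the real script gets from BeautifulSoup:

```python
import os
import tempfile


def save_links(links, path):
    """Write every item in `links` to `path`, one per line."""
    # Open the file once, before the loop starts; "w" is text mode.
    with open(path, "w") as text_file:
        for link in links:
            text_file.write(str(link) + "\n")
    # Leaving the 'with' block closes the file automatically.


# Stand-in data; the real script would pass the <a> tags it scraped.
links = ['<a href="/title/1">First</a>', '<a href="/title/2">Second</a>']
out_path = os.path.join(tempfile.mkdtemp(), "Text1.txt")
save_links(links, out_path)
```

Because the file is opened only once, every item ends up in it rather than just the last one.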

ChrisA



#98983

From: ryguy7272 <ryanshuell@gmail.com>
Date: 2015-11-18 09:03 -0800
Message-ID: <099133ed-c6df-4f5c-b47b-f1cf464511f6@googlegroups.com>
In reply to: #98982

On Wednesday, November 18, 2015 at 11:58:17 AM UTC-5, Chris Angelico wrote:
> On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272 <ryanshuell@gmail.com> wrote:
> >       text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> >         z = str(link)
> >         text_file.write(z + "\n")
> >         text_file.write("\n")
> >         text_file.close()
> 
> You're opening the file every time you go through the loop,
> overwriting each time. Instead, open the file once, then start the
> loop, and then close it at the end. You can use a 'with' statement to
> do the closing for you, or you can do it the way you are here.
> 
> ChrisA



Thanks.  What would the code look like?  I tried the code below, and got the same results.


for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        #print(link)
        z = str(link)
        text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
        text_file.write(z + "\n")
        text_file.close()




#98984

From: ryguy7272 <ryanshuell@gmail.com>
Date: 2015-11-18 09:15 -0800
Message-ID: <9ddeb643-292f-4d5a-a891-83bee1d35c2f@googlegroups.com>
In reply to: #98983

On Wednesday, November 18, 2015 at 12:04:16 PM UTC-5, ryguy7272 wrote:
> On Wednesday, November 18, 2015 at 11:58:17 AM UTC-5, Chris Angelico wrote:
> > On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272 <> wrote:
> > >       text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> > >         z = str(link)
> > >         text_file.write(z + "\n")
> > >         text_file.write("\n")
> > >         text_file.close()
> > 
> > You're opening the file every time you go through the loop,
> > overwriting each time. Instead, open the file once, then start the
> > loop, and then close it at the end. You can use a 'with' statement to
> > do the closing for you, or you can do it the way you are here.
> > 
> > ChrisA
> 
> 
> 
> Thanks.  What would the code look like?  I tried the code below, and got the same results.
> 
> 
> for item in soup.find_all(class_='lister-list'):
>     for link in item.find_all('a'):
>         #print(link)
>         z = str(link)
>         text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
>         text_file.write(z + "\n")
>         text_file.close()


Oh, I see, it's like this:

text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        #print(link)
        z = str(link)
        text_file.write(z + "\n")
text_file.close()


However, it's not organized very well, and it's hard to read.  I thought the '\n' would create a new line after one line was written.  Now, it seems like everything is jumbled together.  Kind of weird.  Am I missing something?



#98985

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2015-11-18 17:19 +0000
Message-ID: <n2ibuh$bm9$2@dont-email.me>
In reply to: #98978

On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:

> I'm trying the script below...

The problem isn't that you're over-writing the lines (although it may 
seem that way to you), the problem is that you're overwriting the whole 
file every time you write a link to it. This is because you open and 
close the file for every link you write, and you do so in file mode "wb" 
which restarts writing at the first byte of the file every time.

You only need to open and close the text file once, instead of for every 
link you output. Try moving the lines to open and close the file outside 
the outer for loop to change the loop from:

for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        # open file
        # write link to file
        # close file

to:

# open file
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        # write link to file
# close file

Alternatively, use the with form:

with open("blah","wb") as text_file:
    for item in soup.find_all(class_='lister-list'):
        for link in item.find_all('a'):
            # write link to file
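
The overwriting described above can be reproduced without any scraping. This is a rough sketch assuming Python 3; the helper names and the sample `items` list are made up for illustration. The append mode ("a") shown for contrast is not something suggested in this thread; opening the file once remains the simpler fix.

```python
import os
import tempfile


def write_reopening(items, path):
    # Reopens in "w" mode for every item, so each open truncates the file.
    for item in items:
        with open(path, "w") as f:
            f.write(item + "\n")


def write_appending(items, path):
    # "a" mode adds to the end of the file instead of truncating it.
    for item in items:
        with open(path, "a") as f:
            f.write(item + "\n")


def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()


items = ["one", "two", "three"]
tmp_dir = tempfile.mkdtemp()
reopened = os.path.join(tmp_dir, "reopened.txt")
appended = os.path.join(tmp_dir, "appended.txt")
write_reopening(items, reopened)
write_appending(items, appended)
print(read_lines(reopened))  # only the last item survives
print(read_lines(appended))  # all three items are kept
```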

-- 
Denis McMahon, denismfmcmahon@gmail.com



#98987

From: ryguy7272 <ryanshuell@gmail.com>
Date: 2015-11-18 09:40 -0800
Message-ID: <6e0f470b-f896-43ae-8f83-b20f22a9db8d@googlegroups.com>
In reply to: #98985

On Wednesday, November 18, 2015 at 12:21:47 PM UTC-5, Denis McMahon wrote:
> On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:
> 
> > I'm trying the script below...
> 
> The problem isn't that you're over-writing the lines (although it may 
> seem that way to you), the problem is that you're overwriting the whole 
> file every time you write a link to it. This is because you open and 
> close the file for every link you write, and you do so in file mode "wb" 
> which restarts writing at the first byte of the file every time.
> 
> You only need to open and close the text file once, instead of for every 
> link you output. Try moving the lines to open and close the file outside 
> the outer for loop to change the loop from:
> 
> for item in soup.find_all(class_='lister-list'):
>     for link in item.find_all('a'):
>         # open file
>         # write link to file
>         # close file
> 
> to:
> 
> # open file
> for item in soup.find_all(class_='lister-list'):
>     for link in item.find_all('a'):
>         # write link to file
> # close file
> 
> Alternatively, use the with form:
> 
> with open("blah","wb") as text_file:
>     for item in soup.find_all(class_='lister-list'):
>         for link in item.find_all('a'):
>             # write link to file
> 
> -- 
> Denis McMahon, 


Yes, I just figured it out.  Thanks.  

It doesn't seem like the '\n' is doing anything useful.  All the text is jumbled together.  When I open the file in Excel, or Notepad++, it is easy to read.  However, when I open it as a regular text file, everything is jumbled together.  Is there an easy way to fix this?



#98988

From: ryguy7272 <ryanshuell@gmail.com>
Date: 2015-11-18 09:43 -0800
Message-ID: <e0edf996-9ce8-404e-b4e0-1e9a7b9af706@googlegroups.com>
In reply to: #98987

On Wednesday, November 18, 2015 at 12:41:19 PM UTC-5, ryguy7272 wrote:
> On Wednesday, November 18, 2015 at 12:21:47 PM UTC-5, Denis McMahon wrote:
> > On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:
> > 
> > > I'm trying the script below...
> > 
> > The problem isn't that you're over-writing the lines (although it may 
> > seem that way to you), the problem is that you're overwriting the whole 
> > file every time you write a link to it. This is because you open and 
> > close the file for every link you write, and you do so in file mode "wb" 
> > which restarts writing at the first byte of the file every time.
> > 
> > You only need to open and close the text file once, instead of for every 
> > link you output. Try moving the lines to open and close the file outside 
> > the outer for loop to change the loop from:
> > 
> > for item in soup.find_all(class_='lister-list'):
> >     for link in item.find_all('a'):
> >         # open file
> >         # write link to file
> >         # close file
> > 
> > to:
> > 
> > # open file
> > for item in soup.find_all(class_='lister-list'):
> >     for link in item.find_all('a'):
> >         # write link to file
> > # close file
> > 
> > Alternatively, use the with form:
> > 
> > with open("blah","wb") as text_file:
> >     for item in soup.find_all(class_='lister-list'):
> >         for link in item.find_all('a'):
> >             # write link to file
> > 
> > -- 
> > Denis McMahon, 
> 
> 
> Yes, I just figured it out.  Thanks.  
> 
> It doesn't seem like the '\n' is doing anything useful.  All the text is jumbled together.  When I open the file in Excel, or Notepad++, it is easy to read.  However, when I open it in as a regular text file, everything is jumbled together.  Is there an easy way to fix this?

I finally got it working.  It's like this:
"\r\n"

Thanks everyone!!



#99093

From: Patrick Hess <patrickhess@gmx.net>
Date: 2015-11-19 20:17 +0100
Message-ID: <mailman.486.1447964619.16136.python-list@python.org>
In reply to: #98988

ryguy7272 wrote:
> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> [...]
> It doesn't seem like the '\n' is doing anything useful.  All the text is jumbled together.
> [...]
> I finally got it working.  It's like this:
> "\r\n"

The better solution would be to open text files in actual text mode:

    open("filename", "wb")   # binary mode
    open("filename", "w")    # text mode

In text mode, the correct line-ending characters, which will vary
depending on the operating system, are chosen automatically.

    with open("test.txt", "w") as textfile:
        textfile.write("line 1\n")
        textfile.write("line 2")

This produces "line 1\nline 2" on Unix systems and "line 1\r\nline 2"
on Windows.

Also involves less typing this way. ;-)
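
To see both results on a single machine, one option in Python 3 is the `newline` argument of `open()`, which overrides the platform default. This is only a sketch for inspecting the bytes; the `write_with_newline` helper is hypothetical, and in normal use the argument can be left out so the platform's own convention applies.

```python
import os
import tempfile


def write_with_newline(path, newline):
    # `newline` controls how "\n" is translated when writing in text mode.
    with open(path, "w", newline=newline) as textfile:
        textfile.write("line 1\n")
        textfile.write("line 2")
    # Read back in binary mode to see the bytes actually stored.
    with open(path, "rb") as raw:
        return raw.read()


path = os.path.join(tempfile.mkdtemp(), "test.txt")
unix_bytes = write_with_newline(path, "\n")       # Unix-style output
windows_bytes = write_with_newline(path, "\r\n")  # Windows-style output
print(repr(unix_bytes))
print(repr(windows_bytes))
```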

Patrick



#99174

From: Michael Torrie <torriem@gmail.com>
Date: 2015-11-20 10:44 -0700
Message-ID: <mailman.12.1448041450.2291.python-list@python.org>
In reply to: #98988

On 11/19/2015 12:17 PM, Patrick Hess wrote:
> ryguy7272 wrote:
>> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
>> [...]
>> It doesn't seem like the '\n' is doing anything useful.  All the text is jumbled together.
>> [...]
>> I finally got it working.  It's like this:
>> "\r\n"
> 
> The better solution would be to open text files in actual text mode:
> 
>     open("filename", "wb")   # binary mode
>     open("filename", "w")    # text mode
> 
> In text mode, the correct line-ending characters, which will vary
> depending on the operating system, are chosen automatically.

It's not just a matter of line endings. It's a matter of text encoding
also.  This is critical in Python 3, where everything is Unicode and
encoding is essential.  You have to use text mode when writing
files here, and it's also a good idea to specify what encoding you wish
to write with (UTF-8 is a good default).
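
As a small illustration, assuming Python 3, the encoding can be passed straight to `open()`. The file name and the sample string below are invented for the example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "titles.txt")
sample = "café"  # made-up sample text containing a non-ASCII character

# Text mode with an explicit encoding, so the result does not depend on
# the platform's default encoding.
with open(path, "w", encoding="utf-8") as text_file:
    text_file.write(sample + "\n")

# Read the raw bytes back to confirm how the text was stored.
with open(path, "rb") as raw:
    stored = raw.read()
print(repr(stored))
```

Reading the file back with the same `encoding="utf-8"` argument returns the original string.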



#98991

From: Rob Gaddi <rgaddi@technologyhighland.invalid>
Date: 2015-11-18 18:05 +0000
Message-ID: <n2ield$bbq$2@dont-email.me>
In reply to: #98987

On Wed, 18 Nov 2015 09:40:58 -0800, ryguy7272 wrote:
> 
> It doesn't seem like the '\n' is doing anything useful.  All the text is
> jumbled together.  When I open the file in Excel, or Notepad++, it is
> easy to read.  However, when I open it in as a regular text file,
> everything is jumbled together.  Is there an easy way to fix this?

You're suffering cause-effect inversion.  Windows default Notepad is a 
fundamentally crippled text editor that only knows how to handle 
Windows/DOS style text files, where the line ending is '\r\n'.  Notepad++, 
along with many other excellent editors available for Windows, is smart 
enough to figure out from the file whether it's Windows style or UNIX 
style, where line endings are just a bare '\n'.

So the problem wasn't with what you were writing, it's with how you 
define "open it as a regular text file".  On my Windows machine I long 
ago switched the default editor to Notepad++ for everything and was far 
happier for it.
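
For the curious, a crude version of that kind of detection might look like the sketch below. It is only a guess at the general idea, not how Notepad++ or any other editor actually does it, and the `detect_line_ending` function is hypothetical. A file with mixed endings is simply reported as Windows style.

```python
def detect_line_ending(data):
    """Guess the line-ending style of `data`, given as bytes."""
    if b"\r\n" in data:
        return "windows"  # carriage return followed by line feed
    if b"\n" in data:
        return "unix"  # bare line feed
    return "none"  # no line break found at all


print(detect_line_ending(b"line 1\r\nline 2"))
print(detect_line_ending(b"line 1\nline 2"))
```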

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.



#99000

From: Random832 <random832@fastmail.com>
Date: 2015-11-18 16:38 -0500
Message-ID: <mailman.427.1447882725.16136.python-list@python.org>
In reply to: #98978

ryguy7272 <ryanshuell@gmail.com> writes:
> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")

Remove the "b" from this line. This is causing it to omit the
platform-specific translation of "\n", which means some Windows
applications will not recognize the line endings.
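
The difference can be checked by comparing the bytes each mode stores against `os.linesep`. This sketch assumes Python 3, where binary mode takes bytes rather than str; the file names are arbitrary:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
binary_path = os.path.join(tmp_dir, "binary.txt")
text_path = os.path.join(tmp_dir, "text.txt")

# Binary mode: bytes are stored exactly as given, so "\n" stays a bare LF.
with open(binary_path, "wb") as f:
    f.write(b"a\nb")

# Text mode: "\n" is translated to os.linesep when it is written.
with open(text_path, "w") as f:
    f.write("a\nb")

with open(binary_path, "rb") as f:
    binary_bytes = f.read()
with open(text_path, "rb") as f:
    text_bytes = f.read()

print(repr(binary_bytes))
print(repr(text_bytes))
```

On Unix-like systems the two results are identical; on Windows only the text-mode file contains "\r\n".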


