counting how often the same word appears in a txt file...But my code only prints the last line entry in the txt file

Started by: dgcosgrave@gmail.com
First post: 2012-12-19 02:45 -0800
Last post: 2012-12-19 13:29 -0500
Articles: 9 (5 participants)

#35091 — counting how often the same word appears in a txt file...But my code only prints the last line entry in the txt file

From: dgcosgrave@gmail.com
Date: 2012-12-19 02:45 -0800
Subject: counting how often the same word appears in a txt file...But my code only prints the last line entry in the txt file
Message-ID: <f91585d2-ca8d-4b01-96a0-db817c419858@googlegroups.com>

Hi, I am just starting out with Python. My code below changes the txt file into a list, adds the words to an empty dictionary and prints how often each word occurs, but it only seems to recognise and print the last entry of the txt file. Any help would be great.

tm =open('ask.txt', 'r')
dict = {}
for line in tm:
	line = line.strip()
	line = line.translate(None, '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
	line = line.lower()
	list = line.split(' ')
for word in list:	
		if word in dict:
			count = dict[word]
			count += 1
			dict[word] = count
else:
	dict[word] = 1
for word, count in dict.iteritems():
	print word + ":" + str(count)



#35093

From: Jussi Piitulainen <jpiitula@ling.helsinki.fi>
Date: 2012-12-19 12:55 +0200
Message-ID: <qot4njinwf3.fsf@ruuvi.it.helsinki.fi>
In reply to: #35091

dgcosgrave@gmail.com writes:

> Hi Iam just starting out with python...My code below changes the txt
> file into a list and add them to an empty dictionary and print how
> often the word occurs, but it only seems to recognise and print the
> last entry of the txt file. Any help would be great.
> 
> tm =open('ask.txt', 'r')
> dict = {}
> for line in tm:
> 	line = line.strip()
> 	line = line.translate(None, '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
> 	line = line.lower()
> 	list = line.split(' ')
> for word in list:	
> 		if word in dict:
> 			count = dict[word]
> 			count += 1
> 			dict[word] = count
> else:
> 	dict[word] = 1
> for word, count in dict.iteritems():
> 	print word + ":" + str(count)

The "else" clause is mis-indented (rather, mis-unindented).

Python's "for" statement does have an optional "else" clause. That's
why you don't get a syntax error. The "else" clause is used after the
loop finishes normally. That's why it catches the last word.
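
To make that concrete, here is a minimal sketch of what the posted code effectively does once the first loop has ended. The word list is made up for illustration, and the names `words` and `counts` are placeholders rather than the original poster's names. It is written so that it should also run on Python 3.

```python
# Hypothetical data standing in for the words of the LAST line of the file.
words = ["the", "cat", "sat"]
counts = {}

for word in words:
    if word in counts:      # never true here: counts starts out empty
        counts[word] += 1
else:
    # This "else" belongs to the "for", not to the "if". It runs once,
    # after the loop finishes without a break, and "word" still holds
    # the final item the loop visited.
    counts[word] = 1

print(counts)  # only the last word was recorded
```

Indenting the `else:` to the level of the `if` turns it back into an ordinary if/else that runs for every word.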



#35099

From: dgcosgrave@gmail.com
Date: 2012-12-19 03:28 -0800
Message-ID: <17964ae5-dc49-4f46-a004-d5c665d250dc@googlegroups.com>
In reply to: #35093

Thanks for the quick reply, Jussi... fixing the indentation solved the problem :-)



#35095

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-12-19 11:03 +0000
Message-ID: <50d19ef9$0$29991$c3e8da3$5496439d@news.astraweb.com>
In reply to: #35091

On Wed, 19 Dec 2012 02:45:13 -0800, dgcosgrave wrote:

> Hi Iam just starting out with python...My code below changes the txt
> file into a list and add them to an empty dictionary and print how often
> the word occurs, but it only seems to recognise and print the last entry
> of the txt file. Any help would be great.
> 
> tm =open('ask.txt', 'r')
> dict = {}
> for line in tm:
> 	line = line.strip()
> 	line = line.translate(None, '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
> 	line = line.lower()
> 	list = line.split(' ')

Note: you should use descriptive names. Since this is a list of WORDS, a 
much better name would be "words" rather than list. Also, list is a
built-in function, and you may run into trouble when you accidentally
re-use that as a name. Same with using "dict" as you do.
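
As an illustration of the kind of trouble meant here, the following sketch shows the built-in being hidden. The function names `broken` and `fixed` are invented for this example, and the shadowing is kept inside a function so it does not leak into the rest of the script.

```python
def broken():
    # Rebinding "list" hides the built-in inside this function.
    list = "some words here".split(' ')
    # Intended: call the built-in list(). Actual: calls a list object.
    return list("abc")

def fixed():
    # A descriptive name leaves the built-in untouched.
    words = "some words here".split(' ')
    return words, list("abc")

try:
    broken()
except TypeError as exc:
    error_message = str(exc)
    print("TypeError: " + error_message)

print(fixed())
```

The same thing happens with `dict`: once the name is rebound, `dict(...)` no longer means the built-in in that scope.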

Apart from that, so far so good. For each line, you generate a list of 
words. But that's when it goes wrong, because you don't do anything with 
the list of words! The next block of code is *outside* the for-loop, so 
it only runs once the for-loop is done. So it only sees the last list of 
words.

> for word in list:

The problem here is that you lost the indentation. You need to indent the 
"for word in list" (better: "for word in words") so that it starts level 
with the line above it.

> 		if word in dict:
> 			count = dict[word]
> 			count += 1
> 			dict[word] = count

This bit is fine.

> else:
> 	dict[word] = 1

But this fails for the same reason! You have lost the indentation.

A little-known fact: Python for-loops take an "else" block too! It's a 
badly named statement, but sometimes useful. You can write:


for value in values:
    do_something_with(value)
    if condition:
        break  # skip to the end of the for...else
else:
    print "We never reached the break statement"
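
A runnable variant of the same pattern might look like the sketch below. The function `first_negative` and its sample inputs are made up for illustration, and the code is written so that it should run on Python 3 as well.

```python
def first_negative(values):
    """Describe the first negative number in values, if there is one."""
    for value in values:
        if value < 0:
            result = "found %d" % value
            break               # leaving via break skips the else block
    else:
        # Reached only when the loop ran to completion without a break
        # (this includes the case of an empty sequence).
        result = "no negative values"
    return result

print(first_negative([3, 1, -4, 1, -5]))
print(first_negative([3, 1, 4]))
```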

So by pure accident, you lined up the "else" statement with the for loop, 
instead of what you needed:

for line in tm:
    ... blah blah blah
    for word in words:
        if word in word_counts:  # better name than "dict"
            ... blah blah blah
        else:
            ...
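
Filling in the blanks, one possible complete version is sketched below. Several things here are assumptions rather than part of the thread: the function name `count_words`, the sample lines, and the Python 3 spelling of the punctuation removal (`str.maketrans` with `string.punctuation`, which covers the same characters as the literal in the original post). On Python 2 the original `line.translate(None, ...)` call would be used instead.

```python
import string

# Python 3 way to delete punctuation; an assumption, not the thread's code.
STRIP_PUNCTUATION = str.maketrans('', '', string.punctuation)

def count_words(lines):
    """Count how often each word occurs across all the given lines."""
    word_counts = {}
    for line in lines:
        line = line.strip().translate(STRIP_PUNCTUATION).lower()
        # split() with no argument also copes with repeated spaces
        # and blank lines.
        words = line.split()
        for word in words:          # nested inside the line loop
            if word in word_counts:
                word_counts[word] += 1
            else:                   # lines up with the "if"
                word_counts[word] = 1
    return word_counts

# Made-up sample input standing in for the contents of ask.txt.
sample = ["The cat sat.", "The dog sat, too!"]
for word, count in sorted(count_words(sample).items()):
    print(word + ":" + str(count))
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines.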


> for word, count in dict.iteritems():
> 	print word + ":" + str(count)

And this bit is okay too.


Good luck!


-- 
Steven



#35100

From: dgcosgrave@gmail.com
Date: 2012-12-19 03:34 -0800
Message-ID: <88298877-1213-4305-96bf-2e3f99a88856@googlegroups.com>
In reply to: #35095

Thanks Steven, I appreciate the great info for future coding. I have changed the names to be more descriptive and corrected the indentation... all works! Cheers



#35097

From: Thomas Bach <thbach@students.uni-mainz.de>
Date: 2012-12-19 12:21 +0100
Message-ID: <mailman.1040.1355916192.29569.python-list@python.org>
In reply to: #35091

Hi,

just as a side-note

On Wed, Dec 19, 2012 at 02:45:13AM -0800, dgcosgrave@gmail.com wrote:
> for word in list:	
> 		if word in dict:
> 			count = dict[word]
> 			count += 1
> 			dict[word] = count
> else:
> 	dict[word] = 1

Once you have the indentation and names right, you can restate this as

import collections
counter = collections.Counter(words)

in Python 2.7 or as

import collections
counter = collections.defaultdict(int)
for word in words:
    counter[word] += 1

in Python 2.6
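
For comparison, here is a small sketch that runs both variants on the same made-up word list and checks that they agree. The sample sentence and the names `tally`, `top_two` and `missing` are only for illustration.

```python
import collections

# Made-up input; in the real script this would be the words from the file.
words = "the cat sat on the mat the cat".split()

# Counter does the counting in one call.
counter = collections.Counter(words)

# The defaultdict version: missing keys start at int(), which is 0.
tally = collections.defaultdict(int)
for word in words:
    tally[word] += 1

print(dict(counter) == dict(tally))   # both approaches give the same counts

top_two = counter.most_common(2)      # the two most frequent words
print(top_two)
missing = counter["dog"]              # a Counter returns 0 for unseen words
```

`most_common` is a convenience the plain dictionary version lacks, which is handy when only the most frequent words matter.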

Regards,
	Thomas.



#35101

From: dgcosgrave@gmail.com
Date: 2012-12-19 03:37 -0800
Message-ID: <0e63ae65-1607-494e-9e76-2ed7cd4e3e19@googlegroups.com>
In reply to: #35097

Thanks for your time, Thomas... I'm using 2.7, great!


#35148

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2012-12-19 13:29 -0500
Message-ID: <mailman.1069.1355941724.29569.python-list@python.org>
In reply to: #35091

On Wed, 19 Dec 2012 02:45:13 -0800 (PST), dgcosgrave@gmail.com declaimed
the following in gmane.comp.python.general:


> tm =open('ask.txt', 'r')
> dict = {}
> for line in tm:
> 	line = line.strip()
> 	line = line.translate(None, '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
> 	line = line.lower()
> 	list = line.split(' ')

	You could (though it gets a bit long) combine some of the above...

	list = line.strip().translate(
				None,
				"the punctuation set"
				).lower().split()

	# taking advantage of the fact that an open ()/[]/{} automatically
	# continues onto the next lines
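
One detail worth noting in the combined version is that it calls `split()` with no argument, whereas the original post used `split(' ')`. The sketch below shows the difference on a made-up line. The Python 3 `str.maketrans` call is an assumption standing in for the Python 2 `translate(None, ...)` form used in the thread.

```python
import string

# Made-up input line with irregular spacing.
line = "  Hello,   world!  It's  me.\n"

# Python 3 way to delete punctuation (assumption; the thread uses Python 2).
table = str.maketrans('', '', string.punctuation)

# The chained form, continued across lines inside the open parenthesis.
cleaned = line.strip().translate(
    table
    ).lower()

by_space = cleaned.split(' ')   # splits on every single space
by_whitespace = cleaned.split() # splits on runs of whitespace

print(by_space)
print(by_whitespace)
```

With `split(' ')`, every extra space produces an empty string, and those empty strings would be counted as a "word" in the dictionary.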

> for word in list:	

	INDENTATION!  As coded, you first do the strip/translate/lower/split
on EACH line of the file... THEN you are processing the words in the
LAST line processed in the previous loop.

> 		if word in dict:
> 			count = dict[word]
> 			count += 1
> 			dict[word] = count
> else:
> 	dict[word] = 1

	More indentation -- I suspect you want the else: and following line
to be indented the same as the if line...

	Though the whole block can be simplified to

		dict[word] = dict.get(word, 0) + 1
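
A minimal sketch of that one-liner in context, using the name `word_counts` instead of `dict` and a made-up word list:

```python
# Made-up input for illustration.
words = ["spam", "eggs", "spam", "spam"]

word_counts = {}
for word in words:
    # get() returns the current count, or 0 if the word has not been
    # seen yet, so no if/else is needed.
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts)
```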

> for word, count in dict.iteritems():
> 	print word + ":" + str(count)
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
