


Read and count

Started by: Val Krem <valkrem@yahoo.com>
First post: 2016-03-09 21:30 +0000
Last post: 2016-03-10 11:09 -0600
Articles: 4 (3 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Read and count Val Krem <valkrem@yahoo.com> - 2016-03-09 21:30 +0000
    Re: Read and count Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-03-10 11:11 +0200
      Re: Read and count Peter Otten <__peter__@web.de> - 2016-03-10 10:33 +0100
      Re: Read and count Val Krem <valkrem@yahoo.com> - 2016-03-10 11:09 -0600

#104489 — Read and count

From: Val Krem <valkrem@yahoo.com>
Date: 2016-03-09 21:30 +0000
Subject: Read and count
Message-ID: <mailman.116.1457599150.15725.python-list@python.org>
Hi all,

I am new to Python (moving from R) and am trying to read a data file and count the number of observations by year for each city.


The data set looks like this:
city year  x 

XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10

Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10

The output should be:

city
xc1  2001  2
xc1  2002  3
yv2  2001  1
yv2  2002  4


Below is my starting code
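count = 0
# the with statement closes the file automatically
with open("dat") as fo:
    text = fo.read()  # avoid the name "str", which shadows the built-in type
print("Read String is : ", text)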


Many thanks



#104491

From: Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date: 2016-03-10 11:11 +0200
Message-ID: <lf58u1qy9y1.fsf@ling.helsinki.fi>
In reply to: #104489
Below's some of the basics that you want to study. Also look up the csv
module in Python's standard library. You will want to learn these things
even if you end up using some sort of third-party data-frame library (I
don't know those but they exist).

from collections import Counter

# collections.Counter is a special dictionary type for just this
counts = Counter()

# with statement ensures closing the file
with open("dat") as fo:
    # file object provides lines
    next(fo) # skip header line
    for line in fo:
        # skip blank lines: str.isspace() is False for an empty string,
        # but every line the file yields has at least a newline, so this is safe
        if line.isspace():
            continue
        # .split() at whitespace, omits empty fields
        city, year, x = line.split()
        # collections.Counter has default 0,
        # key is a tuple (city, year), parentheses omitted here
        counts[city, year] += 1

print("city")
for city, year in sorted(counts): # iterate over keys
    print(city.lower(), year, counts[city, year], sep = "\t")

# Alternatively:
# for cy, n in sorted(counts.items()):
#   city, year = cy
#   print(city.lower(), year, n, sep = "\t")
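
To try the counting logic above without a file on disk, here is a minimal sketch that feeds the sample rows from the question through the same Counter approach. The SAMPLE string and the io.StringIO wrapper are assumptions made for illustration; they only stand in for the real "dat" file.

```python
import io
from collections import Counter

# Sample rows copied from the question. SAMPLE and io.StringIO are
# stand-ins for the real "dat" file (an assumption of this sketch).
SAMPLE = """\
city year  x

XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10

Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10
"""

counts = Counter()
# A missing key reads as 0 instead of raising KeyError,
# and reading it does not add the key to the Counter.
print(counts["XC1", "2001"])

fo = io.StringIO(SAMPLE)
next(fo)  # skip header line
for line in fo:
    if line.isspace():
        continue  # blank separator line
    city, year, x = line.split()
    counts[city, year] += 1  # same key as counts[(city, year)]

# Tuple keys sort by city first, then by year.
for (city, year), n in sorted(counts.items()):
    print(city.lower(), year, n, sep="\t")
```

With the sample data this should report two rows for XC1 in 2001, three for XC1 in 2002, one for Yv2 in 2001 and four for Yv2 in 2002.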



#104494

From: Peter Otten <__peter__@web.de>
Date: 2016-03-10 10:33 +0100
Message-ID: <mailman.119.1457602409.15725.python-list@python.org>
In reply to: #104491
Jussi Piitulainen wrote:

> Below's some of the basics that you want to study. Also look up the csv
> module in Python's standard library. You will want to learn these things
> even if you end up using some sort of third-party data-frame library (I
> don't know those but they exist).

With pandas:
 
$ cat sample.txt
city year  x 
XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10
Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10
$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> table = pandas.read_csv("sample.txt", delimiter=r"\s+")
>>> table
  city  year   x
0  XC1  2001  10
1  XC1  2001  20
2  XC1  2002  20
3  XC1  2002  10
4  XC1  2002  10
5  Yv2  2001  10
6  Yv2  2002  20
7  Yv2  2002  20
8  Yv2  2002  10
9  Yv2  2002  10

[10 rows x 3 columns]
>>> table.groupby(["city", "year"])["x"].count()
city  year
XC1   2001    2
      2002    3
Yv2   2001    1
      2002    4
dtype: int64




#104535

From: Val Krem <valkrem@yahoo.com>
Date: 2016-03-10 11:09 -0600
Message-ID: <mailman.141.1457629772.15725.python-list@python.org>
In reply to: #104491
Thank you very much for the help.

First, I wanted to count by city and year:

City  year  count
Xc1   2001  1
Xc1   2002  3
Yv2   2001  1
Yv2   2002  4

This worked fine!

Now I want to count by city only:

City  count
Xc1   4
Yv2   5

Then I want to combine these two sets of counts with the original data and write the result to a file called "detout" with these columns:

"City", "year", "x", "cycount", "citycount"

I tried to count by city only and to combine the three objects together.

Many thanks again
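
One possible way to approach the follow-up request, sketched with the standard library only: keep the rows in a list, build one Counter keyed by (city, year) and another keyed by city, then attach both counts to every row. The helper names read_rows, annotate and write_detout are hypothetical, the tab-separated layout of "detout" is an assumption (the post does not specify a format), and the sample string again stands in for the real input file.

```python
import csv
import io
from collections import Counter

# Stand-in for the real input file (an assumption of this sketch).
SAMPLE = """\
city year  x
XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10
Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10
"""


def read_rows(lines):
    """Return (city, year, x) tuples, skipping the header and blank lines."""
    it = iter(lines)
    next(it, None)  # skip header line; tolerate an empty input
    rows = []
    for line in it:
        if line.isspace():
            continue
        city, year, x = line.split()
        rows.append((city, year, x))
    return rows


def annotate(rows):
    """Append the per-(city, year) count and the per-city count to each row."""
    cy_counts = Counter((city, year) for city, year, _ in rows)
    city_counts = Counter(city for city, _, _ in rows)
    return [
        (city, year, x, cy_counts[city, year], city_counts[city])
        for city, year, x in rows
    ]


def write_detout(annotated, out):
    """Write a header plus the annotated rows as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["city", "year", "x", "cycount", "citycount"])
    writer.writerows(annotated)


rows = read_rows(io.StringIO(SAMPLE))
annotated = annotate(rows)

# Written to an in-memory buffer here; for the real file one would use
# something like: with open("detout", "w", newline="") as out: ...
buffer = io.StringIO()
write_detout(annotated, buffer)
print(buffer.getvalue())
```

Keeping the original rows in a list is what makes the combination step simple: both counters are complete before any output row is built, so each row can look up its own counts directly.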


