


Re: Read and count

From Peter Otten <__peter__@web.de>
Newsgroups comp.lang.python
Subject Re: Read and count
Date Thu, 10 Mar 2016 10:33:09 +0100
Message-ID <mailman.119.1457602409.15725.python-list@python.org>



Jussi Piitulainen wrote:

> Val Krem writes:
> 
>> Hi all,
>>
>> I am a new learner about python (moving from R to python) and trying
>> read and count the number of observation by year for each city.
>>
>>
>> The data set look like
>> city year  x
>>
>> XC1 2001  10
>> XC1   2001  20
>> XC1   2002   20
>> XC1   2002   10
>> XC1 2002   10
>>
>> Yv2 2001   10
>> Yv2 2002   20
>> Yv2 2002   20
>> Yv2 2002   10
>> Yv2 2002   10
>>
>> out put will be
>>
>> city
>> xc1  2001  2
>> xc1   2002  3
>> yv1  2001  1
>> yv2  2002  3
>>
>>
>> Below is my starting code
>> count=0
>> fo=open("dat", "r+")
>> str = fo.read();
>> print "Read String is : ", str
>>
>> fo.close()
> 
> Below's some of the basics that you want to study. Also look up the csv
> module in Python's standard library. You will want to learn these things
> even if you end up using some sort of third-party data-frame library (I
> don't know those but they exist).

With pandas:
 
$ cat sample.txt
city year  x 
XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10
Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10
$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> table = pandas.read_csv("sample.txt", delimiter=r"\s+")
>>> table
  city  year   x
0  XC1  2001  10
1  XC1  2001  20
2  XC1  2002  20
3  XC1  2002  10
4  XC1  2002  10
5  Yv2  2001  10
6  Yv2  2002  20
7  Yv2  2002  20
8  Yv2  2002  10
9  Yv2  2002  10

[10 rows x 3 columns]
>>> table.groupby(["city", "year"])["x"].count()
city  year
XC1   2001    2
      2002    3
Yv2   2001    1
      2002    4
dtype: int64
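
To get closer to the flat, lowercased table asked for in the original post, something along these lines might do. This is only a sketch: the data is inlined through io.StringIO so the snippet runs on its own, and the column name "n" is my own choice, not anything from the thread.

```python
import io
import pandas

# Same rows as sample.txt above, inlined so the snippet is self-contained.
sample = io.StringIO("""\
city year  x
XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10
Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10
""")

table = pandas.read_csv(sample, sep=r"\s+")

# size() counts the rows in each (city, year) group; reset_index() turns
# the grouped index back into ordinary columns. "n" is an arbitrary name.
counts = table.groupby(["city", "year"]).size().reset_index(name="n")
counts["city"] = counts["city"].str.lower()

print(counts.to_string(index=False))
```

size() counts rows regardless of missing values, while count() as used above counts only the non-missing entries of "x"; for this data the two agree.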


> from collections import Counter
> 
> # collections.Counter is a special dictionary type for just this
> counts = Counter()
> 
> # with statement ensures closing the file
> with open("dat") as fo:
>     # file object provides lines
>     next(fo) # skip header line
>     for line in fo:
>         # test requires non-empty string, but lines
>         # contain at least newline character so ok
>         if line.isspace(): continue
>         # .split() at whitespace, omits empty fields
>         city, year, x = line.split()
>         # collections.Counter has default 0,
>         # key is a tuple (city, year), parentheses omitted here
>         counts[city, year] += 1
> 
> print("city")
> for city, year in sorted(counts): # iterate over keys
>     print(city.lower(), year, counts[city, year], sep = "\t")
> 
> # Alternatively:
> # for cy, n in sorted(counts.items()):
> #   city, year = cy
> #   print(city.lower(), year, n, sep = "\t")
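
For comparison, the csv module mentioned above can do the splitting in the Counter approach. A minimal sketch, assuming the columns are separated by one or more spaces (not tabs); the data is inlined via io.StringIO instead of being read from "dat" so that the snippet runs as-is.

```python
import csv
import io
from collections import Counter

# Stand-in for the "dat" file, including the blank line from the original post.
data = io.StringIO("""\
city year  x
XC1 2001  10
XC1   2001  20
XC1   2002   20
XC1   2002   10
XC1 2002   10

Yv2 2001   10
Yv2 2002   20
Yv2 2002   20
Yv2 2002   10
Yv2 2002   10
""")

counts = Counter()

# skipinitialspace=True makes runs of spaces act as a single separator.
reader = csv.reader(data, delimiter=" ", skipinitialspace=True)
next(reader)  # skip header line
for row in reader:
    if not row:  # csv.reader yields an empty list for a blank line
        continue
    city, year = row[:2]
    counts[city.lower(), year] += 1

print("city")
for (city, year), n in sorted(counts.items()):
    print(city, year, n, sep="\t")
```

For purely whitespace-separated data line.split() is arguably simpler; csv pays off once fields are quoted or contain the delimiter.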



Thread

Read and count Val Krem <valkrem@yahoo.com> - 2016-03-09 21:30 +0000
  Re: Read and count Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-03-10 11:11 +0200
    Re: Read and count Peter Otten <__peter__@web.de> - 2016-03-10 10:33 +0100
    Re: Read and count Val Krem <valkrem@yahoo.com> - 2016-03-10 11:09 -0600
