| Path | csiph.com!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Terry Reedy <tjreedy@udel.edu> |
| Newsgroups | comp.lang.python |
| Subject | Re: Different sources of file |
| Date | Mon, 14 Mar 2016 17:15:55 -0400 |
| Lines | 88 |
| Message-ID | <mailman.131.1457990191.12893.python-list@python.org> (permalink) |
| References | <391627201.251709.1457795607818.JavaMail.yahoo.ref@mail.yahoo.com> <391627201.251709.1457795607818.JavaMail.yahoo@mail.yahoo.com> <1903250354.1594913.1457989004279.JavaMail.yahoo@mail.yahoo.com> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=utf-8; format=flowed |
| Content-Transfer-Encoding | 7bit |
| X-Trace | news.uni-berlin.de hulSXLWGObIv2o8vdONnow8Dpuz276/Q8xen4bVbrvFw== |
| Return-Path | <python-python-list@m.gmane.org> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| User-Agent | Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 |
| In-Reply-To | <1903250354.1594913.1457989004279.JavaMail.yahoo@mail.yahoo.com> |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.21 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Xref | csiph.com comp.lang.python:104863 |
On 3/14/2016 4:56 PM, Val Krem via Python-list wrote:
> Hi all,
>
> I have made a little progress using Python.
> 1. I have five files from different sources to read and concatenate into one file. From each file I want to pick only a few columns (x1, x2 and x3). However, these columns are recorded differently. Say x3 is a date: in one file it was recorded as a character string (2015/12/26), in another as (20151226), and in another as (26122015). How do I standardize these into one form (yyyymmdd, e.g. 20151226)? If there is no date, then delete that record.
>
> 2. The other variable is x2. In one of the files it was recorded as "M" and "F"; in another file x2 is 1 for male and 2 for female. I want to change all of them to 1 or 2. If this variable is anything other than M/F or 1/2, then delete that record.
>
> 3. After doing all this, I want to combine all files into one and write it to an output file.
>
> Finally, I want some statistics, such as the number of records read from each file, the distribution of sex, and the total number of records written to the output file.
>
> Below is my attempt, but it is not great:
> #!/usr/bin/python
> import sys
> import csv
> from collections import Counter
>
> N = 10
> count = 0
> with open("file1") as f1:
>     for line in f1:
>         count += 1
> print("Total Number of records read", count)
> # I want to see the first few lines of the data
>
>
> file1
> Name x2 x3
> Alex1 F 2015/02/11
> Alex2 M 2012/01/27
> Alex3 F 2011/10/20
> Alex4 M .
> Alex5 N 2003/11/14
>
> file2
> Name x2 x3
> Bob1 1 2010-02-10
> Bob2 2 2001-01-07
> Bob3 1 2002-10-21
> Bob4 2 2004-11-17
> bob5 0 2009-11-19
>
> file3
> Name x2 x3
> Alexa1 0 12102013
> Alexa2 2 20012007
> Alexa3 1 11052002
> Alexa4 2 26112004
> Alexa5 2 15072009
Your examples are not comma-separated values but column-delimited
values, so the csv module is neither appropriate nor needed. Assuming
that your examples do not mislead, slice out the values by column:
for line in file1:
    name = line[0:5]
    sex = line[7:8]
    date = line[11:21]  # a date such as 2015/02/11 is 10 characters wide
    # <transform to your standard format>
    # <update stats>
    outfile.write("{} {} {}\n".format(name, sex, date))  # add width specs as needed
etc.
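For the <transform to your standard format> step, a minimal sketch in Python 3 follows. It assumes each file uses a single, known date layout, inferred from the samples above (2015/02/11, 2010-02-10, and 12102013 read as ddmmyyyy), so the layout is passed in per file rather than guessed: 20151226 and 26122015 are both eight digits and cannot reliably be told apart by shape alone. The names DATE_FORMATS, SEX_CODES, normalize_date and normalize_sex are hypothetical.

```python
from datetime import datetime

# Hypothetical per-file date layouts, inferred from the quoted samples.
DATE_FORMATS = {
    "file1": "%Y/%m/%d",  # 2015/02/11
    "file2": "%Y-%m-%d",  # 2010-02-10
    "file3": "%d%m%Y",    # 12102013 = 12 Oct 2013
}

# Both codings map onto 1 = male, 2 = female.
SEX_CODES = {"M": "1", "F": "2", "1": "1", "2": "2"}


def normalize_date(raw, fmt):
    """Return raw as a yyyymmdd string, or None if it does not match fmt."""
    try:
        return datetime.strptime(raw.strip(), fmt).strftime("%Y%m%d")
    except ValueError:
        return None  # missing (".") or malformed date: drop the record


def normalize_sex(raw):
    """Return "1" or "2", or None for anything other than M/F/1/2."""
    return SEX_CODES.get(raw.strip().upper())
```

Inside the loop above, skip the record when either function returns None; otherwise write the normalized values. A side benefit of strptime is that it also rejects impossible dates such as 2015/02/30.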
> Output to a file
> Name x2 x3
> Alex1 2 20150211
> Alex2 1 20120127
> Alex3 2 20111020
> Bob1 1 20100210
> Bob2 2 20010107
> Bob3 1 20021021
> Bob4 2 20041117
> Alexa2 2 20070120
> Alexa3 1 20020511
> Alexa4 2 20041126
> Alexa5 2 20090715
>
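Putting the pieces together, a sketch of the whole job (clean, combine, count) might look like the following. It swaps column slicing for str.split(), assuming the fields are whitespace-separated as in the quoted samples and that each file starts with a "Name x2 x3" header line. SOURCES, clean_record and combine are hypothetical names, and the date layouts are again inferred from the samples. Run against the three sample files above, it should produce the desired output shown.

```python
from collections import Counter
from datetime import datetime

# Hypothetical list of (file name, date layout) pairs, inferred from the
# quoted samples; add the remaining files and their layouts here.
SOURCES = [
    ("file1", "%Y/%m/%d"),  # 2015/02/11
    ("file2", "%Y-%m-%d"),  # 2010-02-10
    ("file3", "%d%m%Y"),    # 20012007 -> 2007-01-20
]
SEX_CODES = {"M": "1", "F": "2", "1": "1", "2": "2"}  # 1 = male, 2 = female


def clean_record(fields, date_fmt):
    """Return (name, sex, yyyymmdd), or None if the record must be dropped."""
    if len(fields) != 3:
        return None
    name, raw_sex, raw_date = fields
    sex = SEX_CODES.get(raw_sex.upper())
    if sex is None:
        return None  # sex outside M/F/1/2
    try:
        date = datetime.strptime(raw_date, date_fmt).strftime("%Y%m%d")
    except ValueError:
        return None  # missing (".") or malformed date
    return name, sex, date


def combine(sources, out):
    """Write the cleaned records of every source to out; return summary counts."""
    read_counts = Counter()  # data records read, per file
    sex_counts = Counter()   # distribution of the normalized sex code
    written = 0
    out.write("Name x2 x3\n")
    for filename, date_fmt in sources:
        with open(filename) as f:
            rows = [line.split() for line in f]
        data = [r for r in rows if r and r[0] != "Name"]  # skip blank and header lines
        read_counts[filename] = len(data)
        for fields in data:
            record = clean_record(fields, date_fmt)
            if record is None:
                continue
            out.write("{} {} {}\n".format(*record))
            sex_counts[record[1]] += 1
            written += 1
    return read_counts, sex_counts, written
```

To run it, open the output file for writing, pass it to combine(SOURCES, out), and print the three returned values. Passing an open file (or an io.StringIO) rather than a file name keeps the function easy to test.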
--
Terry Jan Reedy