question about csv.DictReader

Started by	Norman Clerman <norm.clerman@gmail.com>
First post	2013-04-03 18:26 -0700
Last post	2013-04-04 12:09 -0700
Articles	4 — 3 participants

  question about csv.DictReader Norman Clerman <norm.clerman@gmail.com> - 2013-04-03 18:26 -0700
    Re: question about csv.DictReader MRAB <python@mrabarnett.plus.com> - 2013-04-04 03:46 +0100
    Re: question about csv.DictReader Tim Chase <python.list@tim.thechases.com> - 2013-04-03 21:52 -0500
    Re: question about csv.DictReader Norman Clerman <norm.clerman@gmail.com> - 2013-04-04 12:09 -0700

#42721 — question about csv.DictReader

From	Norman Clerman <norm.clerman@gmail.com>
Date	2013-04-03 18:26 -0700
Subject	question about csv.DictReader
Message-ID	<846339ea-366a-4bb2-b234-0e03bf87489e@googlegroups.com>

Hello,

I have the following Python script (some of the lines are wrapped):

#! /usr/bin/env python

import csv

def dict_test_1():
    """ csv test program  """

    # Open the file Holdings_EXA.csv
    HOLDING_FILE = 'Holdings_EXA.csv'
    try:
        csv_file = open(HOLDING_FILE, 'rt')
    except IOError:
        # .format() is applied to the string itself, not to the result of print()
        print('Problem opening {0}\nExiting'.format(HOLDING_FILE))
        exit()

    # create a dictionary reader
    try:
        csv_reader = csv.DictReader(csv_file)
    except NameError:
        print('Cannot find file {0} to create a dictionary reader \nExiting'.format(HOLDING_FILE))
        exit()

    # Print the keys in each row
    i_row = 1
    for row in csv_reader:
        print('There are {0} keys in row {1}'.format(len(row.keys()), i_row))
        print('The keys in  row {0} are \n{1}'.format(i_row, list(row.keys())))
        i_row += 1

dict_test_1()

Here are the lines in file Holdings_EXA.csv:
Please note that the first field in the first row is "Holdings"

"Holdings","Weighting","Type","Ticker","Style","First Bought","Shares Owned","Shares Change","Sector","Price","Day Change","Day high/low","Volume","52-Wk high/low","Country","3-Month Return","1-Year Return","3-Year Return","5-Year Return","Market Cap Mil","Currency","Morningstar Rating","YTD Return","P/E","Maturity Date","Coupon %","Yield to Maturity"
"Nestle SA","1.91","EQUITY","NESN","Large Core","1999-12-31","3732276","197810","Consumer Defensive","67.65","-","67.75-67.35","1211531","67.75-53.8","Switzerland","10.42","21.25","10.5","8.84","213475.59","CHF","2","12.92","21.69","-","-","-"
"HSBC Holdings PLC","1.75","EQUITY","HSBA","Large Value","1999-12-31","21120203","1711934","Financial Services","733.3","-1.4|-0","738.8-731","7839724","739.9-501.2","United Kingdom","14.51","37.17","3.88","2.77","132694.66","GBP","3","13.93","15.55","-","-","-"
"Novartis AG","1.33","EQUITY","NOVN","Large Core","2003-06-30","2669523","206851","Healthcare","65.95","0.5|0.01","66-65.4","1121549","66-48.29","Switzerland","15.1","36.5","6.16","8.53","158671.66","CHF","4","16.7","17.76","-","-","-"
"Roche Holding AG","1.31","EQUITY","ROG","Large Growth","2003-05-31","817830","59352","Healthcare","214.8","1.4|0.01","215.2-213.1","684173","220.4-148.4","Switzerland","17.45","37.95","7.78","4.09","34000","CHF","3","18.09","19.05","-","-","-"

Finally, here are the results of running the script:


norm@lima:~/python/overlap$ python dict_test_1.py 
There are 27 keys in row 1
The keys in  row 1 are 
['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date', '1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
There are 27 keys in row 2
The keys in  row 2 are 
['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date', '1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
There are 27 keys in row 3
The keys in  row 3 are 
['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date', '1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
There are 27 keys in row 4
The keys in  row 4 are 
['Style', 'Day Change', 'Coupon %', 'Yield to Maturity', 'P/E', 'Type', 'Weighting', 'Price', '3-Month Return', 'Volume', '\xef\xbb\xbf"Holdings"', 'Ticker', 'Shares Change', 'Shares Owned', 'YTD Return', '5-Year Return', 'Market Cap Mil', 'Country', '3-Year Return', 'Day high/low', 'Maturity Date', '1-Year Return', 'Sector', 'Morningstar Rating', 'Currency', '52-Wk high/low', 'First Bought']
norm@lima:~/python/overlap$ 


Can anyone explain the presence of the characters "\xref\xbb\xbf" before the first field contents "Holdings" ?

Thanks,
Norm

#42730

From	MRAB <python@mrabarnett.plus.com>
Date	2013-04-04 03:46 +0100
Message-ID	<mailman.87.1365043571.3114.python-list@python.org>
In reply to	#42721

On 04/04/2013 02:26, Norman Clerman wrote:
> Can anyone explain the presence of the characters "\xref\xbb\xbf" before the first field contents "Holdings" ?
>
Microsoft Windows indicates that a text file contains text encoded as
UTF-8 by including a signature at its start. (Does the file also have
"\r\n" line endings? Presumably it was created on a Windows system.)

Try opening the file with the "utf-8-sig" encoding instead; this will 
drop the signature if present.
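
The suggestion above can be sketched as follows. This is only a minimal illustration, assuming Python 3, where open() accepts an encoding argument (the original script appears to have been run under Python 2). The helper name header_fields and the two-column sample file are made up for the demonstration and are not part of the original script.

```python
import csv
import os
import tempfile

def header_fields(csv_path, encoding):
    """Return the header row of csv_path as decoded with the given encoding."""
    # newline='' is what the csv module documentation recommends for reader input
    with open(csv_path, 'r', encoding=encoding, newline='') as csv_file:
        return csv.DictReader(csv_file).fieldnames

# Build a small sample file that starts with the UTF-8 signature (hypothetical data).
sample = b'\xef\xbb\xbf"Holdings","Weighting"\r\n"Nestle SA","1.91"\r\n'
with tempfile.NamedTemporaryFile(suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

with_plain_utf8 = header_fields(sample_path, 'utf-8')    # signature stays in the first key
with_utf8_sig = header_fields(sample_path, 'utf-8-sig')  # signature is dropped
os.remove(sample_path)

print(with_plain_utf8)
print(with_utf8_sig)
```

With plain "utf-8" the signature is decoded as the character U+FEFF and ends up glued to the first field name, quotes included, which mirrors the output in the original post. With "utf-8-sig" the first key should come out as just 'Holdings'.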

#42731

From	Tim Chase <python.list@tim.thechases.com>
Date	2013-04-03 21:52 -0500
Message-ID	<mailman.88.1365043854.3114.python-list@python.org>
In reply to	#42721

On 2013-04-03 18:26, Norman Clerman wrote:
> Can anyone explain the presence of the characters "\xref\xbb\xbf"
> before the first field contents "Holdings" ?

(you mean "\xef", not "\xref")

This is a byte order mark (BOM), which you can read about at [1].  In
this case, it marks the file as UTF-8 encoded.  Certain programs
insert these, though a BOM is more important with the UTF-16 or
UTF-32 encodings, where the byte order (endianness) actually matters.
I believe Notepad and Visual Studio on Win32 were both offenders when
it came to inserting unbidden BOMs.

-tkc

[1]
http://en.wikipedia.org/wiki/Byte_order_mark
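
To make the point about BOMs concrete, here is a rough sketch that checks the first bytes of some data against the BOM constants in Python's standard codecs module. It assumes Python 3; the function name detect_bom and the returned labels are illustrative choices, not an established API.

```python
import codecs

# Longer signatures are checked first, because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]

def detect_bom(data):
    """Return a label for the BOM that data (bytes) starts with, or None."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None

print(detect_bom(b'\xef\xbb\xbf"Holdings","Weighting"'))  # the bytes from the post
print(detect_bom(b'"Holdings","Weighting"'))              # no signature present
```

For UTF-8 the three bytes carry no byte-order information and only act as a signature, whereas for UTF-16 and UTF-32 the order of the BOM bytes tells a reader which endianness was used.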

#42774

From	Norman Clerman <norm.clerman@gmail.com>
Date	2013-04-04 12:09 -0700
Message-ID	<e39c1e23-2cb2-4253-8f2e-2edee4784dd5@googlegroups.com>
In reply to	#42721

Thanks for your replies. Greatly appreciated.

Norm
