issue with CVS module

Started by: Fillmore <fillmore_remove@hotmail.com>
First post: 2016-03-11 14:41 -0500
Last post: 2016-03-11 16:29 -0500
Articles: 13 (8 participants)



Contents

  issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 14:41 -0500
    Re: issue with CVS module Joel Goldstick <joel.goldstick@gmail.com> - 2016-03-11 15:05 -0500
      Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 15:32 -0500
        Re: issue with CVS module mm0fmf <none@invalid.com> - 2016-03-11 21:04 +0000
        Re: issue with CVS module Ben Finney <ben+python@benfinney.id.au> - 2016-03-12 08:13 +1100
    Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 15:49 -0500
      Re: issue with CVS module MRAB <python@mrabarnett.plus.com> - 2016-03-11 21:14 +0000
        Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 16:23 -0500
      Re: issue with csv module (subject module name spelling correction, too) "Martin A. Brown" <martin@linux-ip.net> - 2016-03-11 13:20 -0800
    Re: issue with CVS module Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-11 21:15 +0000
      Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 16:26 -0500
        Re: issue with CVS module alister <alister.ware@ntlworld.com> - 2016-03-12 09:49 +0000
    Re: issue with CVS module Fillmore <fillmore_remove@hotmail.com> - 2016-03-11 16:29 -0500

#104631 — issue with CVS module

From: Fillmore <fillmore_remove@hotmail.com>
Date: 2016-03-11 14:41 -0500
Subject: issue with CVS module
Message-ID: <nbv71v$1fj8$1@gioia.aioe.org>
I have a TSV file containing a few strings like this (double quotes are 
part of the string):

'"pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52"'

After Python and the csv module have read the file and re-printed the 
value, the string has become:

'pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52'

which is NOT good for me. I went back to Perl and noticed that Perl was 
correctly leaving the original string intact.

This is what I am using to read the file:


with open(file, newline='') as csvfile:

     myReader = csv.reader(csvfile, delimiter='\t')
     for row in myReader:

and this is what I use to write the cell value

     sys.stdout.write(row[0])

Is there some directive I can give the csv reader to tell it to stop 
screwing with my text?

Thanks



#104634

From: Joel Goldstick <joel.goldstick@gmail.com>
Date: 2016-03-11 15:05 -0500
Message-ID: <mailman.3.1457726734.12893.python-list@python.org>
In reply to: #104631
On Fri, Mar 11, 2016 at 2:41 PM, Fillmore <fillmore_remove@hotmail.com>
wrote:

>
> I have a TSV file containing a few strings like this (double quotes are
> part of the string):
>
> '"pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52"'
>
> After Python and the CVS module has read the file and re-printed the
> value, the string has become:
>
> 'pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52'
>
> which is NOT good for me. I went back to Perl and noticed that Perl was
> correctly leaving the original string intact.
>
> This is what I am using to read the file:
>
>
> with open(file, newline='') as csvfile:
>
>     myReader = csv.reader(csvfile, delimiter='\t')
>     for row in myReader:
>
> and this is what I use to write the cell value
>
>     sys.stdout.write(row[0])
>
> Is there some directive I can give CVS reader to tell it to stop screwing
> with my text?
>
> Thanks

Open the Python shell, run import csv, then type help(csv).

The module is highly configurable.

-- 
Joel Goldstick
http://joelgoldstick.com/ <http://joelgoldstick.com/stats/birthdays>
http://cc-baseballstats.info/



#104635

From: Fillmore <fillmore_remove@hotmail.com>
Date: 2016-03-11 15:32 -0500
Message-ID: <nbva0l$1m7a$1@gioia.aioe.org>
In reply to: #104634
On 3/11/2016 3:05 PM, Joel Goldstick wrote:
>
> Enter the python shell.  Import csv
>
> then type help(csv)
>
> It is highly configurable
>

Possibly, but I am having a hard time letting it know that it should 
leave each and every char alone, ignore quoting and just handle strings 
as strings. I tried playing with the quoting related parameters, to no 
avail:

Traceback (most recent call last):
   File "./myscript.py", line 47, in <module>
     myReader = csv.reader(csvfile, delimiter='\t',quotechar='')
TypeError: quotechar must be set if quoting enabled

I tried adding csv.QUOTE_NONE, but things get messy :(

Traceback (most recent call last):
   File "./myscript.py", line 64, in <module>
     sys.stdout.write("\t"+row[h])
IndexError: list index out of range

Sorry for being a pain, but I am porting from Perl, where 
split /\t/, $line; was doing the job for me. Maybe I should go back to 
splitting on '\t' in Python too...
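
One possible cause of the IndexError above (the thread does not confirm it) is that with quoting disabled the reader no longer treats a quoted field as one unit, so a line break inside quotes starts a new, shorter row. A minimal sketch under that assumption; the sample lines and the helper name field are hypothetical, not taken from the original script:

```python
import csv

# Hypothetical sample: the first field is quoted and contains a line break.
lines = ['"a\n', 'b"\tc\td\n']

# Default quoting: the quoted field spans both physical lines, giving one row.
default_rows = list(csv.reader(lines, delimiter='\t'))

# QUOTE_NONE: quotes are ordinary characters, so each physical line is a row.
raw_rows = list(csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE))


def field(row, index, default=''):
    """Return row[index], or default when the row is too short."""
    return row[index] if index < len(row) else default


print(default_rows)                  # one row with three fields
print(raw_rows)                      # two rows; the first has a single field
print(repr(field(raw_rows[0], 2)))   # falls back to '' instead of raising
```

Checking len(row) before indexing, as field does, is one way to survive rows that come out shorter than the header; whether that matches the real data here is a guess.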



#104637

From: mm0fmf <none@invalid.com>
Date: 2016-03-11 21:04 +0000
Message-ID: <nbvbo4$drk$1@dont-email.me>
In reply to: #104635
On 11/03/2016 20:32, Fillmore wrote:
> myReader = csv.reader(csvfile, delimiter='\t',quotechar='')

From reading that, the quotechar is empty: you have two single quotes 
with nothing between them.

Try this:

myReader = csv.reader(csvfile, delimiter='\t',quotechar="'")

i.e. doublequote singlequote doublequote

or the other way

myReader = csv.reader(csvfile, delimiter='\t',quotechar='"')

I haven't tried this, so it may be nonsense.



#104638

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2016-03-12 08:13 +1100
Message-ID: <mailman.4.1457730844.12893.python-list@python.org>
In reply to: #104635
Fillmore <fillmore_remove@hotmail.com> writes:

> Possibly, but I am having a hard time letting it know that it should
> leave each and every char alone

You're using the wrong module, then. To use the ‘csv’ module is to have
the sequence of characters parsed to extract component values, which
cannot also “leave each and every character alone”.

If you want to “leave each and every character alone”, don't parse the
data as CSV. Instead, read the file as a simple text file.
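
Reading the file as plain text could look roughly like the sketch below. The function name read_tsv_raw is hypothetical, and io.StringIO stands in for the real file, whose name the thread never gives:

```python
import io


def read_tsv_raw(stream):
    """Yield each line as a list of tab-separated fields, quotes untouched."""
    for line in stream:
        # Strip only the line ending; every other character is kept as-is.
        yield line.rstrip('\r\n').split('\t')


# Stand-in for an open file; real code would pass the object from open(...).
sample = io.StringIO('"Please preserve my doublequotes"\ttext1\ttext2\n')
rows = list(read_tsv_raw(sample))
print(rows[0][0])
```

This only suits data where a tab always separates fields and a newline always ends a record; it has no notion of quoting or escaping at all.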

-- 
 \       “I have always wished for my computer to be as easy to use as |
  `\       my telephone; my wish has come true because I can no longer |
_o__)          figure out how to use my telephone.” —Bjarne Stroustrup |
Ben Finney



#104636

From: Fillmore <fillmore_remove@hotmail.com>
Date: 2016-03-11 15:49 -0500
Message-ID: <nbvb0s$1nps$1@gioia.aioe.org>
In reply to: #104631
On 3/11/2016 2:41 PM, Fillmore wrote:
> Is there some directive I can give CVS reader to tell it to stop
> screwing with my text?

OK, I think I reproduced my problem at the REPL:

>>> import csv
>>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>>> reader = csv.reader([s], delimiter='\t')
>>> for row in reader:
...     print(row[0])
...
Please preserve my doublequotes
>>>

:(

How do I instruct the reader to preserve my doublequotes?

As an aside, split() performs the job correctly...

>>> allVals = s.split("\t")
>>> print(allVals[0])
"Please preserve my doublequotes"
>>>



#104639

From: MRAB <python@mrabarnett.plus.com>
Date: 2016-03-11 21:14 +0000
Message-ID: <mailman.5.1457730898.12893.python-list@python.org>
In reply to: #104636
On 2016-03-11 20:49, Fillmore wrote:
> On 3/11/2016 2:41 PM, Fillmore wrote:
>> Is there some directive I can give CVS reader to tell it to stop
>> screwing with my text?
>
> OK, I think I reproduced my problem at the REPL:
>
>   >>> import csv
>   >>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>   >>> reader = csv.reader([s], delimiter='\t')
>   >>> for row in reader:
> ...     print(row[0])
> ...
> Please preserve my doublequotes
>   >>>
>
> :(
>
> How do I instruct the reader to preserve my doublequotes?
>
> As an aside. split() performs the job correctly...
>
>   >>> allVals = s.split("\t")
>   >>> print(allVals[0])
> "Please preserve my doublequotes"
>   >>>
>
>>> import csv
>>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>>> reader = csv.reader([s], delimiter='\t', quotechar=None)
>>> for row in reader:
...     print(row[0])
...
"Please preserve my doublequotes"
>>>



#104642

From: Fillmore <fillmore_remove@hotmail.com>
Date: 2016-03-11 16:23 -0500
Message-ID: <nbvd1b$1rak$1@gioia.aioe.org>
In reply to: #104639
On 3/11/2016 4:14 PM, MRAB wrote:
>>
>  >>> import csv
>  >>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>  >>> reader = csv.reader([s], delimiter='\t', quotechar=None)
>  >>> for row in reader:
> ...     print(row[0])
> ...
> "Please preserve my doublequotes"
>  >>>


This worked! thank you MRAB



#104641 — Re: issue with csv module (subject module name spelling correction, too)

From: "Martin A. Brown" <martin@linux-ip.net>
Date: 2016-03-11 13:20 -0800
Subject: Re: issue with csv module (subject module name spelling correction, too)
Message-ID: <mailman.7.1457731250.12893.python-list@python.org>
In reply to: #104636
Good afternoon Fillmore,

> >>> import csv
> >>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
> >>> reader = csv.reader([s], delimiter='\t')

> How do I instruct the reader to preserve my doublequotes?

Change the quoting used by the dialect on the csv reader instance:

  reader = csv.reader([s], delimiter='\t', quoting=csv.QUOTE_NONE)

You can use the same technique for the writer.

If you cannot create your particular (required) variant of csv by 
tuning the available parameters in the csv module's dialect control, 
I'd be a touch surprised, but, it is possible that your other csv
readers and writers are more finicky.

Did you see the parameters that are available to you for tuning how 
the csv module turns your csv data into records?

  https://docs.python.org/3/library/csv.html#dialects-and-formatting-parameters

Judging from your example, you definitely want to use 
quoting=csv.QUOTE_NONE, because you don't want the module to do much 
more than split('\t').
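
A small sketch comparing the three approaches on the sample line from the thread; the variable names are illustrative only:

```python
import csv

line = '"Please preserve my doublequotes"\ttext1\ttext2'

# Default quoting (QUOTE_MINIMAL): the surrounding double quotes are consumed.
default_row = next(csv.reader([line], delimiter='\t'))

# QUOTE_NONE: the quote character gets no special treatment on input.
raw_row = next(csv.reader([line], delimiter='\t', quoting=csv.QUOTE_NONE))

# Plain string splitting, as in the Perl original.
split_row = line.split('\t')

print(default_row[0])
print(raw_row[0])
print(raw_row == split_row)
```

For this input the QUOTE_NONE reader and split('\t') agree field for field, while the default reader drops the quotes.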

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/



#104640

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2016-03-11 21:15 +0000
Message-ID: <mailman.6.1457730962.12893.python-list@python.org>
In reply to: #104631
On 11/03/2016 19:41, Fillmore wrote:
>
> I have a TSV file containing a few strings like this (double quotes are
> part of the string):
>
> '"pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52"'
>
> After Python and the CVS module has read the file and re-printed the
> value, the string has become:
>
> 'pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52'
>
> which is NOT good for me. I went back to Perl and noticed that Perl was
> correctly leaving the original string intact.
>
> This is what I am using to read the file:
>
>
> with open(file, newline='') as csvfile:
>
>      myReader = csv.reader(csvfile, delimiter='\t')
>      for row in myReader:
>
> and this is what I use to write the cell value
>
>      sys.stdout.write(row[0])
>
> Is there some directive I can give CVS reader to tell it to stop
> screwing with my text?
>
> Thanks

https://docs.python.org/3/library/csv.html#csv.Dialect.doublequote

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#104643

From: Fillmore <fillmore_remove@hotmail.com>
Date: 2016-03-11 16:26 -0500
Message-ID: <nbvd5b$1rak$2@gioia.aioe.org>
In reply to: #104640
On 3/11/2016 4:15 PM, Mark Lawrence wrote:
>
> https://docs.python.org/3/library/csv.html#csv.Dialect.doublequote
>

thanks, but my TSV is not using any particular dialect as far as I 
understand...

Thank you, anyway



#104681

From: alister <alister.ware@ntlworld.com>
Date: 2016-03-12 09:49 +0000
Message-ID: <oCREy.1473073$wX5.813049@fx40.am4>
In reply to: #104643
On Fri, 11 Mar 2016 16:26:02 -0500, Fillmore wrote:

> On 3/11/2016 4:15 PM, Mark Lawrence wrote:
>>
>> https://docs.python.org/3/library/csv.html#csv.Dialect.doublequote
>>
>>
> thanks, but my TSV is not using any particular dialect as far as I
> understand...
> 
> Thank you, anyway

Every variation of a language/format is a dialect even if (at present) 
there is only one.

CSV/TSV has many (TSV is simply a dialect of CSV, or vice versa).
You also have the variables of quoting (as you have found) and line 
separation (which may or may not be an issue).

Whenever you are processing an external file it is essential to know the 
exact format it is using, to avoid errors.
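
In the csv module, writing the format down can take the form of a named dialect. A minimal sketch, assuming a tab-delimited format with no quoting at all; the dialect name raw_tsv and the sample lines are made up for illustration:

```python
import csv

# Register the assumed format once, under a hypothetical name.
csv.register_dialect('raw_tsv', delimiter='\t', quoting=csv.QUOTE_NONE)

lines = ['"quoted"\ttext1\n', 'plain\ttext2\n']

# Any reader can now refer to the format by name.
rows = list(csv.reader(lines, dialect='raw_tsv'))
print(rows)
```

Naming the dialect keeps the format decisions in one place instead of repeating delimiter and quoting arguments at every reader call.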

-- 
So, you better watch out!
You better not cry!
You better not pout!
I'm telling you why,
Santa Claus is coming, to town.

He knows when you've been sleeping,
He know when you're awake.
He knows if you've been bad or good,
He has ties with the CIA.
So...



#104644

From: Fillmore <fillmore_remove@hotmail.com>
Date: 2016-03-11 16:29 -0500
Message-ID: <nbvdc9$1rr0$1@gioia.aioe.org>
In reply to: #104631
On 3/11/2016 2:41 PM, Fillmore wrote:
>
> I have a TSV file containing a few strings like this (double quotes are
> part of the string):


A big thank you to everyone who helped with this and with other 
questions. My porting of one of my Perl scripts to Python is over now 
that the two scripts produce virtually the same result:

$ wc -l test2.txt test3.txt
    70823 test2.txt
    70822 test3.txt
   141645 total
$ diff test2.txt test3.txt
69351d69350
<

there's only an extra empty line at the bottom that I'll leave as a tip 
to Perl ;)

It was instructive.



