Groups > comp.lang.python > #98647 > unrolled thread

new to python, help please !!

Started by	Anas Belemlih <anas.belemlih@gmail.com>
First post	2015-11-11 08:34 -0800
Last post	2015-11-12 21:24 +0000
Articles	15 — 10 participants


  new to python, help please !! Anas Belemlih <anas.belemlih@gmail.com> - 2015-11-11 08:34 -0800
    Re: new to python, help please !! John Gordon <gordon@panix.com> - 2015-11-11 16:58 +0000
    Re: new to python, help please !! Tim Chase <python.list@tim.thechases.com> - 2015-11-11 11:06 -0600
    Re: new to python, help please !! Ben Finney <ben+python@benfinney.id.au> - 2015-11-12 04:16 +1100
    Re: new to python, help please !! Quivis <quivis@domain.invalid> - 2015-11-11 17:48 +0000
      Re: new to python, help please !! Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-12 13:58 +1100
        Re: new to python, help please !! Marko Rauhamaa <marko@pacujo.net> - 2015-11-12 08:21 +0200
          Re: new to python, help please !! Tim Chase <python.list@tim.thechases.com> - 2015-11-12 05:48 -0600
          Re: new to python, help please !! <paul.hermeneutic@gmail.com> - 2015-11-12 07:27 -0700
        Re: new to python, help please !! Quivis <quivis@domain.invalid> - 2015-11-12 17:55 +0000
          Re: new to python, help please !! Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-12 19:49 +0000
    Re: new to python, help please !! Peter Otten <__peter__@web.de> - 2015-11-12 15:56 +0100
    Re: new to python, help please !! Tim Chase <python.list@tim.thechases.com> - 2015-11-12 09:00 -0600
    Re: new to python, help please !! Peter Otten <__peter__@web.de> - 2015-11-12 16:41 +0100
    Re: new to python, help please !! Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-12 21:24 +0000

#98647 — new to python, help please !!

From	Anas Belemlih <anas.belemlih@gmail.com>
Date	2015-11-11 08:34 -0800
Subject	new to python, help please !!
Message-ID	<93aef8e5-3d6f-41f4-a625-cd3c2007686e@googlegroups.com>

I am a beginning programmer. I am trying to write some simple code to compare two character sets in 2 separate files (2 hash value files, basically).
The idea is:
 open both files, measure the length of each, then loop over them.

If the lengths don't match, the files do not match.

If the lengths match, loop while comparing each character from each file to see if they match.
 Please tell me what I am doing wrong? I am using Python 2.7

**********************************
hash1= open ("file1.md5", "r")
line1 =hash1.read()
hash2 = open("file2.md5","r")
line2= hash2.read()

number1 = len(line1)
number2 = len(line2)

#**************************
i=0
s1=line1[i]
s2=line2[i]
count = 0

if number1 != number2:
	print " hash table not the same size"
else:
    while count < number1:
	if s1 == s2:
		print " character", line1[i]," matchs"
		i=i+1
	count=count+1
	else
		print "Hash values corrupt"


#98649

From	John Gordon <gordon@panix.com>
Date	2015-11-11 16:58 +0000
Message-ID	<n1vs3e$6ih$1@reader1.panix.com>
In reply to	#98647

In <93aef8e5-3d6f-41f4-a625-cd3c2007686e@googlegroups.com> Anas Belemlih <anas.belemlih@gmail.com> writes:

> i=0
> s1=line1[i]
> s2=line2[i]
> count = 0

> if number1 != number2:
> 	print " hash table not the same size"
> else:
>     while count < number1:
> 	if s1 == s2:
> 		print " character", line1[i]," matchs"
> 		i=i+1
> 	count=count+1
> 	else
> 		print "Hash values corrupt"

It looks like you're expecting s1 and s2 to automatically update their
values when i gets incremented, but it doesn't work like that.  When you
increment i, you also have to reassign s1 and s2.
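
A minimal sketch of that fix, written as a function so it is easy to try out. The name compare_hashes is made up for illustration, and the sketch also repairs the indentation and stops at the first mismatch, which goes a little beyond the point above. It is an assumption about what the loop was meant to do, not the original poster's final code.

```python
def compare_hashes(line1, line2):
    """Return True if the two strings match character for character."""
    if len(line1) != len(line2):
        print("hash values are not the same size")
        return False
    i = 0
    while i < len(line1):
        # Look the characters up on every pass; s1 and s2 are
        # reassigned here rather than once before the loop.
        s1 = line1[i]
        s2 = line2[i]
        if s1 != s2:
            print("Hash values differ at position %d" % i)
            return False
        i = i + 1
    return True
```

Calling compare_hashes(line1, line2) with the two strings read from the files then gives a single True/False answer.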

-- 
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"


#98651

From	Tim Chase <python.list@tim.thechases.com>
Date	2015-11-11 11:06 -0600
Message-ID	<mailman.246.1447261727.16136.python-list@python.org>
In reply to	#98647

On 2015-11-11 08:34, Anas Belemlih wrote:
> I am a beginning programmer. I am trying to write some simple code
> to compare two character sets in 2 separate files (2 hash value
> files, basically). The idea is: open both files, measure the length of
> each, then loop over them.
> 
> If the lengths don't match, the files do not match.
> 
> If the lengths match, loop while comparing each character from each
> file to see if they match. Please tell me what I am doing wrong? I am
> using Python 2.7
> 
> **********************************
> hash1= open ("file1.md5", "r")
> line1 =hash1.read()
> hash2 = open("file2.md5","r")
> line2= hash2.read()
> 
> number1 = len(line1)
> number2 = len(line2)
> 
> #**************************
> i=0
> s1=line1[i]
> s2=line2[i]
> count = 0
> 
> if number1 != number2:
> 	print " hash table not the same size"
> else:
>     while count < number1:
> 	if s1 == s2:
> 		print " character", line1[i]," matchs"
> 		i=i+1
> 	count=count+1
> 	else
> 		print "Hash values corrupt"

Well, the immediate answer is that you don't update s1 or s2 inside
your loop.  Also, the indent on "count=count+1" is wrong.  Finally,
if the hashes don't match, you don't break out of your while loop.
That said, the pythonesque way of writing this would likely look
something much more like

  with open("file1.md5") as a, open("file2.md5") as b:
    for s1, s2 in zip(a, b):
      if s1 != s2:
        print("Files differ")

You can compare the strings to get the actual offset if you want, or
check the lengths if you really want a more verbatim translation of
your code:

  with open("file1.md5") as a, open("file2.md5") as b:
    for s1, s2 in zip(a, b):
      if len(s1) != len(s2):
        print("not the same size")
      else:
        for i, (c1, c2) in enumerate(zip(s1, s2)):
          if c1 == c2:
            print(" character %s matches" %  c1)
          else:
            print(" %r and %r differ at position %i" % (s1, s2, i))

-tkc


#98652

From	Ben Finney <ben+python@benfinney.id.au>
Date	2015-11-12 04:16 +1100
Message-ID	<mailman.247.1447262197.16136.python-list@python.org>
In reply to	#98647

Anas Belemlih <anas.belemlih@gmail.com> writes:

> I am a beginning programmer. I am trying to write some simple code to
> compare two character sets in 2 separate files (2 hash value files,
> basically)

Welcome, and congratulations on arriving at Python for your programming!

As a beginning programmer, you will benefit from joining the ‘tutor’
forum <URL:https://mail.python.org/mailman/listinfo/tutor>, which is
much better suited to collaborative teaching of newcomers.

-- 
 \     “As scarce as truth is, the supply has always been in excess of |
  `\                                       the demand.” —Josh Billings |
_o__)                                                                  |
Ben Finney


#98655

From	Quivis <quivis@domain.invalid>
Date	2015-11-11 17:48 +0000
Message-ID	<obL0y.222880$6i2.63495@fx35.am4>
In reply to	#98647

On Wed, 11 Nov 2015 08:34:30 -0800, Anas Belemlih wrote:

> md5

If those are md5 values stored inside files, wouldn't it be easier to 
just hash them?

import hashlib

m1 = hashlib.sha224(open('f1').read()).hexdigest()
m2 = hashlib.sha224(open('f2').read()).hexdigest()

if m1 == m2:
    print 'Equal!'
else:
    print 'Different!'
-- 
  _____  __ __ __ __ __ __   __
 ((   )) || || || \\ // ||  ((
  \\_/X| \\_// ||  \V/  || \_))
   Omnia paratus  *~*~*~*~*~*~*


#98666

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-11-12 13:58 +1100
Message-ID	<5644005e$0$2932$c3e8da3$76491128@news.astraweb.com>
In reply to	#98655

On Thursday 12 November 2015 04:48, Quivis wrote:

> On Wed, 11 Nov 2015 08:34:30 -0800, Anas Belemlih wrote:
> 
>> md5
> 
> If those are md5 values stored inside files, wouldn't it be easier to
> just hash them?
> 
> import hashlib
> 
> m1 = hashlib.sha224(open('f1').read()).hexdigest()
> m2 = hashlib.sha224(open('f2').read()).hexdigest()

I presume that the purpose of the exercise is to learn basic Python skills 
like looping.

Also, using sha224 when all you want is a simple "different"/"equal" is 
horribly inefficient. Sha224 needs to read the entire file, every single 
byte, *and* perform a bunch of expensive cryptographic operations. Consider 
reading two five GB files, the first starting with byte \x30 and the second 
starting with byte \x60. The two bytes are different, so we know the files 
differ, but sha224 still needs to do a massive amount of work.
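
To make the early-exit point concrete, here is a minimal sketch that compares two files chunk by chunk and stops at the first difference. The function name first_difference and the 4096-byte default are assumptions for illustration, not anything from the posts above.

```python
def first_difference(path1, path2, chunk_size=4096):
    """Return the byte offset of the first difference, or None if equal."""
    offset = 0
    # Binary mode so we compare raw bytes, not decoded text.
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            c1 = f1.read(chunk_size)
            c2 = f2.read(chunk_size)
            if c1 != c2:
                # Walk the differing chunk to find the exact position.
                for i, (b1, b2) in enumerate(zip(c1, c2)):
                    if b1 != b2:
                        return offset + i
                # One chunk is a prefix of the other: lengths differ.
                return offset + min(len(c1), len(c2))
            if not c1:  # both reads came back empty: EOF on both files
                return None
            offset += len(c1)
```

For the two-file example above, where the very first bytes differ, this reads one chunk from each file and returns 0 without touching the rest.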

-- 
Steve


#98674

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-11-12 08:21 +0200
Message-ID	<8737wbu49x.fsf@elektro.pacujo.net>
In reply to	#98666

Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

> On Thursday 12 November 2015 04:48, Quivis wrote:
>
>> On Wed, 11 Nov 2015 08:34:30 -0800, Anas Belemlih wrote:
>> 
>>> md5
>> 
>> If those are md5 values stored inside files, wouldn't it be easier to
>> just hash them?
>> 
>> import hashlib
>> 
>> m1 = hashlib.sha224(open('f1').read()).hexdigest()
>> m2 = hashlib.sha224(open('f2').read()).hexdigest()
>
> I presume that the purpose of the exercise is to learn basic Python
> skills like looping.

And if you really wanted to compare two files that are known to contain
MD5 checksums, the simplest way is:

   with open('f1.md5') as f1, open('f2.md5') as f2:
       if f1.read() == f2.read():
           ...
       else:
           ...


Marko


#98696

From	Tim Chase <python.list@tim.thechases.com>
Date	2015-11-12 05:48 -0600
Message-ID	<mailman.269.1447337476.16136.python-list@python.org>
In reply to	#98674

On 2015-11-12 08:21, Marko Rauhamaa wrote:
> And if you really wanted to compare two files that are known to
> contain MD5 checksums, the simplest way is:
> 
>    with open('f1.md5') as f1, open('f2.md5') as f2:
>        if f1.read() == f2.read():
>            ...
>        else:
>            ...

Though that suffers if the files are large.  Might try

  CHUNK_SIZE = 4 * 1024 # read 4k chunks
  # chunk_offset = 0
  with open('f1.md5') as f1, open('f2.md5') as f2:
    while True:
      c1 = f1.read(CHUNK_SIZE)
      c2 = f2.read(CHUNK_SIZE)
      if c1 or c2:
        # chunk_offset += 1
        if c1 != c2:
          not_the_same(c1, c2)
          # not_the_same(chunk_offset * CHUNK_SIZE, c1, c2)
          break
      else: # EOF
        the_same()
        break

which should perform better if the files are huge.

-tkc


#98697

From	<paul.hermeneutic@gmail.com>
Date	2015-11-12 07:27 -0700
Message-ID	<mailman.270.1447338456.16136.python-list@python.org>
In reply to	#98674

Would some form of subprocess.Popen() on cmp or fc /b be easier?
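
For comparison, the standard library's filecmp module can do a similar job without spawning an external process. This is a swapped-in alternative, not the subprocess approach asked about; it is a minimal sketch that assumes byte-for-byte equality is what is wanted, and files_match is a made-up name.

```python
import filecmp

def files_match(path1, path2):
    """Return True if both files have identical contents."""
    # shallow=False makes filecmp compare the contents rather than
    # trusting matching os.stat() signatures (type, size, mtime).
    return filecmp.cmp(path1, path2, shallow=False)
```
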
On Nov 12, 2015 7:13 AM, "Tim Chase" <python.list@tim.thechases.com> wrote:

> On 2015-11-12 08:21, Marko Rauhamaa wrote:
> > And if you really wanted to compare two files that are known to
> > contain MD5 checksums, the simplest way is:
> >
> >    with open('f1.md5') as f1, open('f2.md5') as f2:
> >        if f1.read() == f2.read():
> >            ...
> >        else:
> >            ...
>
> Though that suffers if the files are large.  Might try
>
>   CHUNK_SIZE = 4 * 1024 # read 4k chunks
>   # chunk_offset = 0
>   with open('f1.md5') as f1, open('f2.md5') as f2:
>     while True:
>       c1 = f1.read(CHUNK_SIZE)
>       c2 = f2.read(CHUNK_SIZE)
>       if c1 or c2:
>         # chunk_offset += 1
>         if c1 != c2:
>           not_the_same(c1, c2)
>           # not_the_same(chunk_offset * CHUNK_SIZE, c1, c2)
>           break
>       else: # EOF
>         the_same()
>         break
>
> which should perform better if the files are huge
>
> -tkc
>
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>


#98709

From	Quivis <quivis@domain.invalid>
Date	2015-11-12 17:55 +0000
Message-ID	<po41y.186836$wR.71600@fx43.am4>
In reply to	#98666

On Thu, 12 Nov 2015 13:58:35 +1100, Steven D'Aprano wrote:

> horribly inefficient

Assuming it was md5 values, who cares? Those are small.
-- 
  _____  __ __ __ __ __ __   __
 ((   )) || || || \\ // ||  ((
  \\_/X| \\_// ||  \V/  || \_))
   Omnia paratus  *~*~*~*~*~*~*


#98712

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2015-11-12 19:49 +0000
Message-ID	<n22qh2$m26$1@dont-email.me>
In reply to	#98709

On Thu, 12 Nov 2015 17:55:33 +0000, Quivis wrote:

> On Thu, 12 Nov 2015 13:58:35 +1100, Steven D'Aprano wrote:
> 
>> horribly inefficient
> 
> Assuming it was md5 values, who cares? Those are small.

A file of 160 million md5 hashes as 32 character hex strings is a huge 
file. Your method calculates the hash over both files to test whether the 
contents are different. If the input files are both lists of 160 million 
md5 hashes, you're calculating the hash of two 5 gigabyte files.

In your method the size of the lines of data is irrelevant to the 
execution time; the execution time varies with the size of the data files.
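
The file-size figure can be checked with a few lines of arithmetic. The one-newline-per-hash layout is an assumption; the post only gives the hash count and the 32-character hex length.

```python
HASH_COUNT = 160 * 10**6   # 160 million hashes, as in the post
HEX_CHARS = 32             # one MD5 digest written as a hex string

raw_bytes = HASH_COUNT * HEX_CHARS
with_newlines = HASH_COUNT * (HEX_CHARS + 1)  # assumes one "\n" per hash

print(raw_bytes)       # 5120000000
print(with_newlines)   # 5280000000
```

Either way each file comes to a little over 5 GB.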
-- 
Denis McMahon, denismfmcmahon@gmail.com


#98698

From	Peter Otten <__peter__@web.de>
Date	2015-11-12 15:56 +0100
Message-ID	<mailman.271.1447340210.16136.python-list@python.org>
In reply to	#98647

Tim Chase wrote:

>   with open("file1.md5") as a, open("file2.md5") as b:
>     for s1, s2 in zip(a, b):
>       if s1 != s2:
>         print("Files differ")

Note that this will not detect extra lines in one of the files.
I recommend that you use itertools.zip_longest (izip_longest in Python 2) 
instead of the built-in zip().
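
A minimal sketch of that suggestion, with an import that should work on both Python 2 and Python 3. The function name lines_differ is made up for illustration.

```python
try:
    from itertools import zip_longest                    # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest    # Python 2

def lines_differ(path1, path2):
    """Return True if the files differ in any line or in line count."""
    with open(path1) as a, open(path2) as b:
        # zip_longest pads the shorter file with None, so an extra
        # line shows up as a mismatch instead of being dropped.
        for s1, s2 in zip_longest(a, b):
            if s1 != s2:
                return True
    return False
```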


#98699

From	Tim Chase <python.list@tim.thechases.com>
Date	2015-11-12 09:00 -0600
Message-ID	<mailman.272.1447341014.16136.python-list@python.org>
In reply to	#98647

On 2015-11-12 15:56, Peter Otten wrote:
> Tim Chase wrote:
> 
> >   with open("file1.md5") as a, open("file2.md5") as b:
> >     for s1, s2 in zip(a, b):
> >       if s1 != s2:
> >         print("Files differ")
> 
> Note that this will not detect extra lines in one of the files.
> I recommend that you use itertools.zip_longest (izip_longest in
> Python 2) instead of the built-in zip().

Yeah, I noticed that after pushing <send> but then posted a later
version that just read chunks of the file which should catch that
file-size difference.  Or, as in that other message, prefix it with
an fstat() check to compare file-sizes so that you don't even have to
open the files if the sizes differ.
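
A minimal sketch of such a size pre-check. It uses os.stat() on the paths rather than os.fstat(), since os.fstat() needs an already-open file descriptor; sizes_differ is a made-up name. Equal sizes prove nothing about the contents, so this only works as a cheap first filter before a real comparison.

```python
import os

def sizes_differ(path1, path2):
    """Return True if the two files have different sizes in bytes."""
    # os.stat() takes a path, so neither file has to be opened.
    return os.stat(path1).st_size != os.stat(path2).st_size
```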

-tkc


#98701

From	Peter Otten <__peter__@web.de>
Date	2015-11-12 16:41 +0100
Message-ID	<mailman.274.1447342924.16136.python-list@python.org>
In reply to	#98647

Tim Chase wrote:

> On 2015-11-12 15:56, Peter Otten wrote:
>> Tim Chase wrote:
>> 
>> >   with open("file1.md5") as a, open("file2.md5") as b:
>> >     for s1, s2 in zip(a, b):
>> >       if s1 != s2:
>> >         print("Files differ")
>> 
>> Note that this will not detect extra lines in one of the files.
>> I recommend that you use itertools.zip_longest (izip_longest in
>> Python 2) instead of the built-in zip().
> 
> Yeah, I noticed that after pushing <send> but then posted a later
> version that just read chunks of the file which should catch that
> file-size difference.  Or, as in that other message, prefix it with
> an fstat() check to compare file-sizes so that you don't even have to
> open the files if the sizes differ.
> 
> -tkc

>>> os.path.getsize("file1.md5")
10
>>> os.path.getsize("file2.md5")
10
>>> with open("file1.md5") as a, open("file2.md5") as b:
...     for s, t in zip(a, b):
...         if s != t: print("different")
... 
>>> from itertools import zip_longest
>>> with open("file1.md5") as a, open("file2.md5") as b:
...     for s, t in zip_longest(a, b):
...         if s != t: print("different")
... 
different

I admit I cheated and used Python 3 ;)


#98715

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2015-11-12 21:24 +0000
Message-ID	<n23020$m26$2@dont-email.me>
In reply to	#98647

On Wed, 11 Nov 2015 08:34:30 -0800, Anas Belemlih wrote:

> I am a beginning programmer. I am trying to write some simple code to
> compare two character sets in 2 separate files (2 hash value files,
> basically)

Why? If you simply wish to compare two files, most operating systems 
provide executable tools at the OS level which are more efficient than 
anything you will write in a scripting language.

Lesson 1 of computing. Use the right tool for the job. Writing a new 
program is not always the right tool.
-- 
Denis McMahon, denismfmcmahon@gmail.com

