

Newsgroup: comp.lang.python

Re: program to generate data helpful in finding duplicate large files

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2014-09-19 23:59 +1000
Last post: 2014-09-20 16:29 +1000
Articles: 5 (4 participants)


This thread began before the archive's indexed window, so earlier articles are not shown. The article credited under "Started by" is the oldest one visible, not the original post.


Contents

  Re: program to generate data helpful in finding duplicate large files Chris Angelico <rosuav@gmail.com> - 2014-09-19 23:59 +1000
    Re: program to generate data helpful in finding duplicate large files Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-09-20 02:22 +1000
      Re: program to generate data helpful in finding duplicate large files Chris Angelico <rosuav@gmail.com> - 2014-09-20 03:07 +1000
      Re: program to generate data helpful in finding duplicate large files Cameron Simpson <cs@zip.com.au> - 2014-09-20 10:30 +1000
      Re: program to generate data helpful in finding duplicate large files Ben Finney <ben+python@benfinney.id.au> - 2014-09-20 16:29 +1000

#78079 — Re: program to generate data helpful in finding duplicate large files

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-09-19 23:59 +1000
Subject: Re: program to generate data helpful in finding duplicate large files
Message-ID: <mailman.14147.1411135178.18130.python-list@python.org>

On Fri, Sep 19, 2014 at 11:32 PM, David Alban <extasia@extasia.org> wrote:
> thanks for the responses.   i'm having quite a good time learning python.

Awesome! But while you're at it, you may want to consider learning
English on the side; capitalization does make your prose more
readable. Also, it makes you look careless - you appear not to care
about your English, so it's logical to expect that you may not care
about your Python either. That may be completely false, but it's still
the impression you're creating.

> On Thu, Sep 18, 2014 at 11:45 AM, Chris Kaynor <ckaynor@zindagigames.com>
> wrote:
>>
>> Additionally, you may want to specify binary mode by using open(file_path,
>> 'rb') to ensure platform-independence ('r' uses Universal newlines, which
>> means on Windows, Python will convert "\r\n" to "\n" while reading the
>> file). Additionally, some platforms will treat binary files differently.
>
> would it be good to use 'rb' all the time?

Only if you're reading binary files. In the program you're writing here,
yes; you want binary mode.
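A minimal sketch of why a checksumming program wants binary mode, assuming Python 3 (where text mode applies universal-newline translation on every platform by default, not just Windows). The temporary file is purely illustrative: the same four bytes come back differently depending on the mode.

```python
import os
import tempfile

# Illustrative file containing a Windows-style line ending (4 bytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a\r\nb")
    path = tmp.name

try:
    # Binary mode: bytes come back exactly as stored, which is what
    # a checksum such as MD5 needs to see.
    with open(path, "rb") as f:
        raw = f.read()
    # Text mode: with the default newline=None, "\r\n" is turned into "\n".
    with open(path, "r", encoding="ascii") as f:
        text = f.read()
finally:
    os.remove(path)

print(raw)         # b'a\r\nb'
print(repr(text))  # 'a\nb'
```

Hashing `text` (after re-encoding) would give a different digest from hashing the file's real contents, so two byte-identical files could still be compared correctly only if both are read the same way; `'rb'` sidesteps the question entirely.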

> if you omit the exit statement in this example, and
> $report_mode is not set, your shell program will give a non-zero return code
> and appear to have terminated with an error.  in shell the last expression
> evaluated determines the return code to the os.

IMO that's a terrible misfeature. If you actually want the return
value to be propagated, you should have to say so - something like:

#!/bin/sh
run_program
exit $?

Fortunately, Python isn't like that.
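For comparison, a small sketch (assuming Python 3.5+ for `subprocess.run`) showing that a Python script's exit status does not come from its last expression: it is 0 on normal completion, 1 on an uncaught exception, and otherwise whatever you pass to `sys.exit()`. The helper name `exit_status` is hypothetical, used only for illustration.

```python
import subprocess
import sys

def exit_status(source):
    """Run `source` in a fresh Python interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", source]).returncode

print(exit_status("False"))                    # 0: last expression is ignored
print(exit_status("import sys; sys.exit(3)"))  # 3: status given explicitly
print(exit_status("1 / 0"))                    # 1: uncaught exception

# The Python analogue of the shell snippet's `run_program; exit $?`
# (`run_program` is a placeholder command name, not a real program):
#   sys.exit(subprocess.run(["run_program"]).returncode)
```

Propagating a child's status therefore has to be spelled out, which is the behaviour Chris argues for.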

> style question:  if there is only one, possibly short statement in a block,
> do folks usually move it up to the line starting the block?
>
>   if not S_ISREG( mode ) or S_ISLNK( mode ):
>     return
>
> vs.
>
>   if not S_ISREG( mode ) or S_ISLNK( mode ): return
>
> or even:
>
>   with open( file_path, 'rb' ) as f: md5sum = md5_for_file( file_path )

Only if it's really short AND it makes very good sense that way. Some
people would say "never". In the first case I might do it, but not in
the second. (Though the open() isn't needed at all there: md5_for_file
opens and closes the file itself, so you don't need to open it
redundantly before calling it.)
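A minimal sketch of both points, assuming `md5_for_file` takes a path. Its real body isn't shown in this thread, so the version below is a hypothetical stand-in: it opens the file itself in binary mode and hashes it in chunks, so the caller needs no `with open(...)`. The check uses the plain block form of `if`. With `os.lstat` (which does not follow symlinks) a symlink is never `S_ISREG`, so a separate `S_ISLNK` test is unnecessary; `md5_if_regular` is likewise a hypothetical name.

```python
import hashlib
import os
import stat

def md5_for_file(path, chunk_size=1 << 20):
    """Hypothetical stand-in: hex MD5 of a file, read in binary chunks.

    It opens and closes the file itself, so callers need no
    `with open(...)` around the call.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def md5_if_regular(path):
    """Return the MD5 of a regular file, or None for anything else."""
    mode = os.lstat(path).st_mode  # lstat: a symlink reports its own mode
    if not stat.S_ISREG(mode):     # symlinks, directories, devices, ...
        return None
    return md5_for_file(path)      # no redundant open() around this call
```

Reading in fixed-size chunks keeps memory use flat, which matters when the goal is finding duplicate *large* files.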

ChrisA



#78081

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-09-20 02:22 +1000
Message-ID: <541c5855$0$30003$c3e8da3$5496439d@news.astraweb.com>
In reply to: #78079

Chris Angelico wrote:

> On Fri, Sep 19, 2014 at 11:32 PM, David Alban <extasia@extasia.org> wrote:
>> thanks for the responses.   i'm having quite a good time learning python.
> 
> Awesome! But while you're at it, you may want to consider learning
> English on the side; capitalization does make your prose more
> readable. Also, it makes you look careless - you appear not to care
> about your English, so it's logical to expect that you may not care
> about your Python either. That may be completely false, but it's still
> the impression you're creating.

Speaking of learning English... http://bash.org/?949621

I used to work with programmers whose spelling is awful. I know for a fact
that at least some of them had Vim's on-the-fly spell checking turned on:

http://www.linux.com/learn/tutorials/357267:using-spell-checking-in-vim

nevertheless, their commit messages and documentation were full of things
like "make teh function reqire a posative index".

(No wonder we ended up stuck with 'referer'.)

I heard one of them mention that even though he sees the words are
misspelled, he deliberately doesn't bother fixing them because it's not
important. I guess he just liked the look of his text having highlighted
words scattered throughout the editor.

Some other things programmers have taught me are unimportant:

- accurate, up-to-date documentation;
- error checking;
- correctness;
- code that meets functional requirements;
- telling the project manager that you're going to miss a hard deadline;
- turning up to work the day after deploying a new system, so you will 
  be on hand if something goes wrong.

But I'm not bitter.

And apropos of nothing:

http://bash.org/?835030


-- 
Steven



#78082

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-09-20 03:07 +1000
Message-ID: <mailman.14149.1411146479.18130.python-list@python.org>
In reply to: #78081

On Sat, Sep 20, 2014 at 2:22 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> I heard one of them mention that even though he sees the words are
> misspelled, he deliberately doesn't bother fixing them because it's not
> important. I guess he just liked the look of his text having highlighted
> words scattered throughout the editor.
>
> Some other things programmers have taught me are unimportant:
>
> - accurate, up-to-date documentation;
> - error checking;
> - correctness;
> - code that meets functional requirements;
> - telling the project manager that you're going to miss a hard deadline;
> - turning up to work the day after deploying a new system, so you will
>   be on hand if something goes wrong.

We all have our different perspectives on what's important. I see code
and coding as more important than sleep, so I've been known to keep
coding long after everyone else is in bed. (Though extreme states of
sleep deprivation do tend to result in code that needs to be corrected
later, so there are limits.) Maybe your bovine orkers just have
different priorities. Like KLOC production, or their TPS reports.

Also: Agree with your bash.org links.

ChrisA



#78093

From: Cameron Simpson <cs@zip.com.au>
Date: 2014-09-20 10:30 +1000
Message-ID: <mailman.14157.1411174972.18130.python-list@python.org>
In reply to: #78081

On 20Sep2014 02:22, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>[...] I used to work with programmers whose spelling is awful. [...]
>nevertheless, their commit messages and documentation were full of things
>like "make teh function reqire a posative index". [...]
>I heard one of them mention that even though he sees the words are
>misspelled, he deliberately doesn't bother fixing them because it's not
>important. I guess he just liked the look of his text having highlighted
>words scattered throughout the editor.

I guess he just liked the idea of having terrible search results :-(

Cheers,
Cameron Simpson <cs@zip.com.au>

There is one evil which...should never be passed over in silence but be
continually publicly attacked, and that is corruption of the language...
         - W.H. Auden



#78098

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2014-09-20 16:29 +1000
Message-ID: <mailman.14162.1411194605.18130.python-list@python.org>
In reply to: #78081

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

> I heard one [programmer] mention that even though he sees the words
> are misspelled, he deliberately doesn't bother fixing them because it's
> not important. I guess he just liked the look of his text having
> highlighted words scattered throughout the editor.

If it's who I'm thinking of (or, heck, any one of a hundred similar
cow-orkers), the text editor would not show English spelling errors
since they're not interested in the English text in their program code
:-)

-- 
 \        “Spam will be a thing of the past in two years' time.” —Bill |
  `\                                                 Gates, 2004-01-24 |
_o__)                                                                  |
Ben Finney




