Re: What is the most efficient way to find similarities and differences between the contents of two lists?

From	nn <pruebauno@latinmail.com>
Newsgroups	comp.lang.python
Subject	Re: What is the most efficient way to find similarities and differences between the contents of two lists?
Date	2011-06-13 08:26 -0700
Organization	http://groups.google.com
Message-ID	<f16c71e6-31b4-4e04-98c6-69dbcf90a582@e17g2000prj.googlegroups.com>
References	<fe8105b6-cbb0-4df1-beab-4bc66b40b335@a10g2000vbz.googlegroups.com>

On Jun 13, 11:06 am, Zachary Dziura <zcdzi...@gmail.com> wrote:
> Hi all.
>
> I'm writing a Python script that will be used to compare two database
> tables. Currently, those two tables are dumped into .csv files,
> whereby my code goes through both files and makes comparisons. Thus
> far, I only have functionality coded to make comparisons on the
> headers to check for similarities and differences. Here is the code
> for that functionality:
>
> similar_headers = 0
> different_headers = 0
> source_headers = sorted(source_mapping.headers)
> target_headers = sorted(target_mapping.headers)
>
> # Check if the headers between the two mappings are the same
> if set(source_headers) == set(target_headers):
>     similar_headers = len(source_headers)
> else:
>     # We're going to do two run-throughs of the tables, to find the
>     # different and similar header names. Start with the source
>     # headers...
>     for source_header in source_headers:
>         if source_header in target_headers:
>             similar_headers += 1
>         else:
>             different_headers += 1
>     # Now check target headers for any differences
>     for target_header in target_headers:
>         if target_header in source_headers:
>             pass
>         else:
>             different_headers += 1
>
> As you can probably tell, I make two iterations: one for the
> 'source_headers' list, and another for the 'target_headers' list.
> During the first iteration, if a specific header (bound to the variable
> 'source_header') exists in both lists, then the 'similar_headers'
> variable is incremented by one. Otherwise, 'different_headers' is
> incremented by one. The second iteration only counts target headers
> that are missing from the source list.
>
> My code works as expected and there are no bugs; however, I get the
> feeling that I'm not doing this comparison in the most efficient way
> possible. Is there another way that I can make this same comparison
> while making my code more Pythonic and efficient? I would prefer not
> to have to install an external module from elsewhere, though if I have
> to then I will.
>
> Thanks in advance for any and all answers!

How about:

# Count the shared and differing headers with set operations
source_headers_set = set(source_headers)
target_headers_set = set(target_headers)

similar_headers = len(source_headers_set & target_headers_set)
different_headers = len(source_headers_set ^ target_headers_set)
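
To try this out, here is a minimal self-contained sketch. The header lists are made up for illustration (they stand in for source_mapping.headers and target_mapping.headers, whose real contents I don't know). It also shows how to see which headers differ, not just how many.

```python
# Hypothetical header lists standing in for source_mapping.headers
# and target_mapping.headers; the real values are not known.
source_headers = ["id", "name", "email", "created"]
target_headers = ["id", "name", "phone"]

source_headers_set = set(source_headers)
target_headers_set = set(target_headers)

# Intersection: headers present in both tables
shared = source_headers_set & target_headers_set
# Set difference: headers present on one side only
only_in_source = source_headers_set - target_headers_set
only_in_target = target_headers_set - source_headers_set

similar_headers = len(shared)
# Symmetric difference: headers present in exactly one of the two tables
different_headers = len(source_headers_set ^ target_headers_set)

print(similar_headers, different_headers)
print(sorted(shared), sorted(only_in_source), sorted(only_in_target))
```

One caveat: sets collapse duplicates, so if the same header name can appear more than once in a table, these counts can differ from what the loop version produces. For unique column names the results should match.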
