| From | nn <pruebauno@latinmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: What is the most efficient way to find similarities and differences between the contents of two lists? |
| Date | 2011-06-13 08:26 -0700 |
| Organization | http://groups.google.com |
| Message-ID | <f16c71e6-31b4-4e04-98c6-69dbcf90a582@e17g2000prj.googlegroups.com> (permalink) |
| References | <fe8105b6-cbb0-4df1-beab-4bc66b40b335@a10g2000vbz.googlegroups.com> |
On Jun 13, 11:06 am, Zachary Dziura <zcdzi...@gmail.com> wrote:
> Hi all.
>
> I'm writing a Python script that will be used to compare two database
> tables. Currently, those two tables are dumped into .csv files,
> whereby my code goes through both files and makes comparisons. Thus
> far, I only have functionality coded to make comparisons on the
> headers to check for similarities and differences. Here is the code
> for that functionality:
>
> similar_headers = 0
> different_headers = 0
> source_headers = sorted(source_mapping.headers)
> target_headers = sorted(target_mapping.headers)
>
> # Check if the headers between the two mappings are the same
> if set(source_headers) == set(target_headers):
>     similar_headers = len(source_headers)
> else:
>     # We're going to do two run-throughs of the tables, to find the
>     # different and similar header names. Start with the source
>     # headers...
>     for source_header in source_headers:
>         if source_header in target_headers:
>             similar_headers += 1
>         else:
>             different_headers += 1
>     # Now check target headers for any differences
>     for target_header in target_headers:
>         if target_header in source_headers:
>             pass
>         else:
>             different_headers += 1
>
> As you can probably tell, I make two iterations: one for the
> 'source_headers' list, and another for the 'target_headers' list.
> During the first iteration, if a specific header (mapped to a variable
> 'source_header') exists in both lists, then the 'similar_headers'
> variable is incremented by one. Similarly, if it doesn't exist in both
> lists, 'different_headers' is incremented by one. For the second
> iteration, it only checks for different headers.
>
> My code works as expected and there are no bugs, however I get the
> feeling that I'm not doing this comparison in the most efficient way
> possible. Is there another way that I can make this same comparison
> while making my code more Pythonic and efficient? I would prefer not
> to have to install an external module from elsewhere, though if I have
> to then I will.
>
> Thanks in advance for any and all answers!

how about:

    # Check if the headers between the two mappings are the same
    source_headers_set = set(source_headers)
    target_headers_set = set(target_headers)
    similar_headers = len(source_headers_set & target_headers_set)
    different_headers = len(source_headers_set ^ target_headers_set)
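For anyone who wants to try the set-based approach on its own, here is a minimal sketch. The header lists are made up for illustration (the original post reads them from `source_mapping.headers` and `target_mapping.headers`), and the names `common`, `mismatched`, `only_in_source` and `only_in_target` are hypothetical additions, not part of either post. It assumes the header names within each table are unique; with duplicates, the loop in the quoted code would count repeats while sets would not.

```python
# Hypothetical header lists, standing in for source_mapping.headers
# and target_mapping.headers from the quoted post.
source_headers = ["id", "name", "email", "created"]
target_headers = ["id", "name", "phone"]

source_headers_set = set(source_headers)
target_headers_set = set(target_headers)

# Headers present in both tables (set intersection).
common = source_headers_set & target_headers_set
# Headers present in exactly one of the tables (symmetric difference).
mismatched = source_headers_set ^ target_headers_set

# One-sided differences, useful for reporting which table lacks what.
only_in_source = source_headers_set - target_headers_set
only_in_target = target_headers_set - source_headers_set

similar_headers = len(common)
different_headers = len(mismatched)

print(similar_headers, different_headers)                # 2 3
print(sorted(only_in_source), sorted(only_in_target))    # ['created', 'email'] ['phone']
```

Set membership tests are hash-based, so this avoids the repeated `in` scans over a list that the two-loop version performs, and it needs only the standard library.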
What is the most efficient way to find similarities and differences between the contents of two lists? Zachary Dziura <zcdziura@gmail.com> - 2011-06-13 08:06 -0700
  Re: What is the most efficient way to find similarities and differences between the contents of two lists? nn <pruebauno@latinmail.com> - 2011-06-13 08:26 -0700