| Header | Value |
|---|---|
| In-Reply-To | <CALyJZZWmAWP=STfZGpvN_NexxS=7DA9GrxOAC+2ibUNgkJtM1Q@mail.gmail.com> |
| Date | Tue, 25 Mar 2014 06:44:39 +1100 |
| Subject | Re: Merge/append CSV files with different headers |
| From | Chris Angelico <rosuav@gmail.com> |
| Cc | "python-list@python.org" <python-list@python.org> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.8462.1395690282.18130.python-list@python.org> |
On Tue, Mar 25, 2014 at 4:50 AM, Vincent Davis <vincent@vincentdavis.net> wrote:
> I have several csv file I need to append (vertically). They have different
> but overlapping headers. For example;
> file1 headers ['a', 'b', 'c']
> file2 headers ['d', 'e']
> file3 headers ['c', 'd']
>
> Is there a better way than this

Summary of your code:

1) Build up a set of all headers used, by opening each file and reading the headers.
2) Go through each file a second time and write them out.

That seems like the best approach, broadly. You might be able to improve it a bit (it might be tidier to open each file once, but since you're using two different CSV readers, it'd probably not be), but by and large, I'd say you have the right technique. Your processing time here is going to be dominated by the actual work of copying.

The only thing you might want to consider is order. The headers all have a set order to them, and it'd make sense to have the output come out as ['a', 'b', 'c', 'd', 'e'] - the first three from the first file, then adding in everything from subsequent files in the order they were found. Could be done easily enough by using 'in' and .append() on a list, rather than using a set. But if that doesn't matter to you, or if something simple like "sort the headers alphabetically" will do, then I think you basically have what you want.

ChrisA
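For illustration, here is a rough sketch of the two-pass approach with the ordered header list built using 'in' and .append(). It is not the original poster's code (which isn't shown in this message); the function names `collect_headers` and `merge_csv` are made up for the example, and it assumes every file has a header row and no row has more fields than its header.

```python
import csv


def collect_headers(paths):
    """First pass: gather every header, keeping first-seen order."""
    headers = []
    for path in paths:
        with open(path, newline="") as f:
            # next(..., []) tolerates an empty file (assumption: treat as no headers)
            for name in next(csv.reader(f), []):
                if name not in headers:      # 'in' check on a list, not a set
                    headers.append(name)
    return headers


def merge_csv(paths, out_path, restval=""):
    """Second pass: append all rows vertically under the combined header."""
    headers = collect_headers(paths)
    with open(out_path, "w", newline="") as out:
        # restval fills columns that a given input file does not have
        writer = csv.DictWriter(out, fieldnames=headers, restval=restval)
        writer.writeheader()
        for path in paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
    return headers
```

With the three example files this should give the column order ['a', 'b', 'c', 'd', 'e']. The 'in' test on a list is linear in the number of headers, which is unlikely to matter next to the cost of copying the rows. If alphabetical order is good enough, replacing `headers` with `sorted(headers)` would do.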