From: Terry Reedy
To: python-list@python.org
Subject: Re: how to split this kind of text into sections
Date: Fri, 25 Apr 2014 13:52:56 -0400
Newsgroups: comp.lang.python

On 4/25/2014 9:07 AM, oyster wrote:

> I have a long text, which should be splitted into some sections, where
> all sections have a pattern like following with different KEY.

Computers are worse at reading your mind than humans. If you can write rules that another person could follow, THEN we could help you translate the rules to Python. If you have one moderate-length file or a few short files, I would edit them by hand to remove the lines to be ignored and put a blank line between sections. A program to do the rest would then be easy.

> the above text should be splitted as a LIST with 3 items, and I also
> need to know the KEY for LIST is ['I am section', 'let's continue', 'I
> am using']:

This suggests that the rule for keys is 'first 3 words of a line, with contractions counted as 2 words'. Correct? Another possible rule is 'a member of the following list: ...', as you gave above but presumably expanded.

-- 
Terry Jan Reedy
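
For illustration only, here is a rough sketch of what "a program to do the rest" might look like once the file has been hand-edited so that blank lines separate the sections. It assumes the first candidate rule for keys (first 3 words of a section's first line, with a contraction counted as 2 words), which the original poster has not confirmed. The names split_sections and key_of and the sample text are made up for this example; they are not from the original question.

```python
def split_sections(text):
    """Split hand-edited text into sections separated by blank lines.

    Each section is returned as a list of its non-blank lines.
    """
    sections, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the section collected so far.
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


def key_of(line, target=3):
    """Return the leading words of a line as its key.

    Assumed rule: take words until 3 have been counted, where any word
    containing an apostrophe (a contraction) counts as 2 words.
    """
    words, count = [], 0
    for word in line.split():
        if count >= target:
            break
        words.append(word)
        count += 2 if "'" in word else 1
    return " ".join(words)


# Hypothetical sample text, already cleaned up by hand.
sample = """I am section one
some body text

let's continue with more
another body line

I am using Python
last line
"""

for section in split_sections(sample):
    print(key_of(section[0]), "->", len(section), "lines")
```

If the real rule turns out to be 'a member of the following list', key_of could instead check whether the line starts with one of the listed keys; which rule applies is still the open question above.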