Re: how to split this kind of text into sections

From Terry Reedy <tjreedy@udel.edu>
Subject Re: how to split this kind of text into sections
Date Fri, 25 Apr 2014 13:52:56 -0400
Newsgroups comp.lang.python
Message-ID <mailman.9502.1398448424.18130.python-list@python.org>

On 4/25/2014 9:07 AM, oyster wrote:
> I have a long text, which should be splitted into some sections, where
> all sections have a pattern like following with different KEY.

Computers are worse at reading your mind than humans. If you can write 
rules that another person could follow, THEN we could help you translate 
the rules to Python.

If you have one moderate-length file or a few short files, I would edit
them by hand to remove the lines to be ignored and put a blank line
between sections. A program to do the rest would then be easy.
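
A minimal sketch of that remaining program, assuming the file has already
been cleaned by hand so that sections are separated by one or more blank
lines and the first line of each section carries its key. The function
name and the sample text are made up for illustration; only the three
keys come from the original question.

```python
import re


def split_on_blank_lines(text):
    """Return the sections of text that are separated by blank lines."""
    # A "blank" line may still hold stray spaces or tabs, so allow them.
    chunks = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Hypothetical, hand-cleaned input: the body lines are invented.
sample = (
    "I am section\n"
    "first body line\n"
    "\n"
    "let's continue\n"
    "second body line\n"
    " \n"
    "I am using\n"
    "third body line\n"
)

sections = split_on_blank_lines(sample)
# Assumption: the first line of each section is its key.
keys = [section.splitlines()[0] for section in sections]
```

Here `sections` is a list with 3 items and `keys` is
['I am section', "let's continue", 'I am using'].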

> the above text should be splitted as a LIST with 3 items, and I also
> need to know the KEY for LIST is ['I am section', 'let's continue', 'I
> am using']:

This suggests that the rule for keys is 'first 3 words of a line, with 
contractions counted as 2 words'. Correct?
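
If that guess is right, the key rule could be sketched as below. This is
only one reading of the rule: `key_of` is a hypothetical helper, and
treating any word with an apostrophe inside it as a contraction is an
assumption, not something the original poster confirmed.

```python
def key_of(line, n_words=3):
    """Return the leading words of line that add up to n_words,
    counting a contraction such as "let's" as 2 words."""
    taken = []
    count = 0
    for word in line.split():
        # Assumption: an apostrophe inside the word marks a contraction.
        # Quotes at either end of the word are stripped first, so a
        # 'quoted' word still counts as 1.
        count += 2 if "'" in word.strip("'") else 1
        taken.append(word)
        if count >= n_words:
            break
    return " ".join(taken)


# Sample lines are invented; only the resulting keys come from the question.
print(key_of("I am section one of the text"))  # I am section
print(key_of("let's continue with the text"))  # let's continue
print(key_of("I am using Python"))             # I am using
```

This only extracts a candidate key from a line. It does not say which
lines start a section, so that still needs its own rule.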

Another possible rule is 'a member of the following list: ...', as you 
gave above but presumably expanded.
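
The list-membership rule could be sketched like this, assuming a section
starts at any line that begins with one of the known keys and runs until
the next such line. `KNOWN_KEYS` holds only the three keys quoted above,
and `split_by_known_keys` and the sample text are hypothetical.

```python
KNOWN_KEYS = ["I am section", "let's continue", "I am using"]


def split_by_known_keys(text, keys=KNOWN_KEYS):
    """Return (key, section_text) pairs, starting a new section at
    every line that begins with one of the known keys."""
    sections = []  # list of (key, list_of_lines) pairs
    for line in text.splitlines():
        # First known key that this line starts with, or None.
        key = next((k for k in keys if line.startswith(k)), None)
        if key is not None:
            sections.append((key, [line]))
        elif sections:
            sections[-1][1].append(line)
        # Lines before the first key are ignored (an assumption).
    return [(key, "\n".join(lines)) for key, lines in sections]


# Hypothetical input: everything except the keys is invented.
sample = (
    "some preamble to ignore\n"
    "I am section 1\n"
    "body a\n"
    "let's continue here\n"
    "body b\n"
    "I am using this\n"
    "body c\n"
)

result = split_by_known_keys(sample)
found_keys = [key for key, _ in result]
```

With this input, `found_keys` equals `KNOWN_KEYS` and `result` has 3
items, each pairing a key with the text of its section.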

-- 
Terry Jan Reedy
