| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Peter Otten <__peter__@web.de> |
| Newsgroups | comp.lang.python |
| Subject | Re: function to remove and punctuation |
| Date | Sun, 10 Apr 2016 14:35:55 +0200 |
| Organization | None |
| Lines | 59 |
| Message-ID | <mailman.2.1460291770.6211.python-list@python.org> (permalink) |
| References | <3af95726-6f5c-4a2d-bf42-061efedd13b1@googlegroups.com> <nedhbf$all$1@ger.gmane.org> |
geshdus@gmail.com wrote:
> how to write a function taking a string parameter, which returns it after
> you delete the spaces, punctuation marks, accented characters in python ?
Looks like you want to remove more characters than you want to keep. In this
case I'd decide which characters to keep first, e.g. (assuming Python 3):
>>> import string
>>> keep = string.ascii_letters + string.digits
>>> keep
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
Now you can iterate over the characters and check for each one whether you
want to preserve it:
>>> def clean(s, keep):
...     return "".join(c for c in s if c in keep)
...
>>> clean("<alpha> äöü ::42", keep)
'alpha42'
>>> clean("<alpha> äöü ::42", string.ascii_letters)
'alpha'
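For longer inputs the membership test can run against a set rather than a string. The following is only a sketch of that variation; the name clean_with_set and the default value for keep are made up here and are not part of the original post:

```python
import string


def clean_with_set(s, keep=string.ascii_letters + string.digits):
    """Return s with every character that is not in keep removed."""
    # Hypothetical variation: build the set once so each lookup is a
    # hash probe instead of a scan through the keep string.
    allowed = frozenset(keep)
    return "".join(c for c in s if c in allowed)


print(clean_with_set("<alpha> äöü ::42"))
print(clean_with_set("<alpha> äöü ::42", string.ascii_letters))
```

With only 62 characters to keep the difference is likely to be small, so this matters mostly when keep is large.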
If you are dealing with a lot of text you can make this a bit more efficient
with the str.translate() method. Create a mapping that maps all characters
that you want to keep to themselves
>>> m = str.maketrans(keep, keep)
>>> m[ord("a")]
97
>>> m[ord(">")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 62
and all characters that you want to discard to None
>>> from collections import defaultdict
>>> trans = defaultdict(lambda: None, m)
>>> trans[ord("s")]
115
>>> trans[ord("ß")] # returns None, so nothing is printed
>>>
Now pass it to the translate() method:
>>> "<alpha> äöü ::42".translate(trans)
'alpha42'
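The steps above can be put together in a small script. It also shows why the default is needed: a table from str.maketrans() alone leaves unmapped characters untouched instead of deleting them. One side effect of defaultdict is that it stores a None entry for every new code point it is asked about, so this sketch swaps in a small dict subclass with __missing__ instead. The class name DeleteMissing is an assumption made for illustration, not something from the original post:

```python
import string


class DeleteMissing(dict):
    """Translation table that deletes every character without an entry.

    Hypothetical helper: unlike defaultdict, __missing__ here does not
    store the looked-up key, so the table does not grow in translate().
    """

    def __missing__(self, key):
        return None  # None tells str.translate() to drop the character


keep = string.ascii_letters + string.digits
plain = str.maketrans(keep, keep)  # maps kept characters to themselves
table = DeleteMissing(plain)

text = "<alpha> äöü ::42"
print(text.translate(plain))  # unmapped characters pass through unchanged
print(text.translate(table))  # unmapped characters are deleted
```

Whether the extra class is worth it depends on the input; for text with few distinct characters the defaultdict version is just as good.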
You changed your mind and want to translate " " to "_"? Here's how:
>>> trans[ord(" ")] = "_"
>>> "<alpha> äöü ::42".translate(trans)
'alpha__42'
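If the replacements are known up front, they can go into the table when it is built. A minimal sketch, assuming a helper called make_table (a name invented here) that combines the kept characters with a dict of replacements:

```python
import string
from collections import defaultdict


def make_table(keep, replacements=None):
    """Build a translate() table: keep `keep`, apply `replacements`,
    and delete every other character."""
    table = defaultdict(lambda: None, str.maketrans(keep, keep))
    # str.maketrans() with a single dict argument accepts one-character
    # string keys and converts them to code points.
    table.update(str.maketrans(replacements or {}))
    return table


table = make_table(string.ascii_letters + string.digits, {" ": "_"})
print("<alpha> äöü ::42".translate(table))
```

Replacements are applied after the keep mapping, so a character listed in both is replaced rather than kept.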