


Re: function to remove and punctuation

From Peter Otten <__peter__@web.de>
Newsgroups comp.lang.python
Subject Re: function to remove and punctuation
Date 2016-04-10 14:35 +0200
Organization None
Message-ID <mailman.2.1460291770.6211.python-list@python.org> (permalink)
References <3af95726-6f5c-4a2d-bf42-061efedd13b1@googlegroups.com> <nedhbf$all$1@ger.gmane.org>



geshdus@gmail.com wrote:

> how to write a function taking a string parameter, which returns it after
> you delete the spaces, punctuation marks, accented characters in python ?

Looks like you want to remove more characters than you want to keep. In this
case I'd decide which characters to keep first, e.g. (assuming Python 3):

>>> import string
>>> keep = string.ascii_letters + string.digits
>>> keep
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

Now you can iterate over the string and check for each character whether you
want to preserve it:

>>> def clean(s, keep):
...     return "".join(c for c in s if c in keep)
... 
>>> clean("<alpha> äöü ::42", keep)
'alpha42'
>>> clean("<alpha> äöü ::42", string.ascii_letters)
'alpha'
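
For use outside the interactive prompt, the same idea can be sketched as a
small script. The KEEP constant and the default argument are assumptions added
here for illustration, not part of the session above; a frozenset is used
because membership tests on a set do not have to scan the whole string of
allowed characters.

```python
import string

# Assumption for this sketch: keep ASCII letters and digits only.
KEEP = frozenset(string.ascii_letters + string.digits)


def clean(s, keep=KEEP):
    """Return s with every character that is not in keep removed."""
    return "".join(c for c in s if c in keep)


if __name__ == "__main__":
    print(clean("<alpha> äöü ::42"))                        # alpha42
    print(clean("<alpha> äöü ::42", string.ascii_letters))  # alpha
```

Passing a plain string as keep still works, as in the session above; the set
only changes how fast each lookup is.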

If you are dealing with a lot of text you can make this a bit more efficient 
with the str.translate() method. Create a mapping that maps all characters 
that you want to keep to themselves

>>> m = str.maketrans(keep, keep)
>>> m[ord("a")]
97
>>> m[ord(">")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 62

and all characters that you want to discard to None

>>> from collections import defaultdict
>>> trans = defaultdict(lambda: None, m)
>>> trans[ord("s")]
115
>>> trans[ord("ß")] # returns None, so nothing is printed
>>> 

Now pass it to the translate() method:

>>> "<alpha> äöü ::42".translate(trans)
'alpha42'

You changed your mind and want to translate " " to "_"? Here's how:

>>> trans[ord(" ")] = "_"
>>> "<alpha> äöü ::42".translate(trans)
'alpha__42'
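
The translate() steps can also be collected into one helper. This is only a
sketch: make_table and its replacements parameter are hypothetical names
introduced here, not something from the standard library.

```python
import string
from collections import defaultdict


def make_table(keep, replacements=None):
    """Build a table for str.translate() that keeps the characters in
    `keep`, applies `replacements` (a dict mapping one character to its
    substitute string), and deletes every other character."""
    # Kept characters map to themselves; any missing key defaults to
    # None, which tells translate() to drop that character.
    table = defaultdict(lambda: None, str.maketrans(keep, keep))
    for char, substitute in (replacements or {}).items():
        table[ord(char)] = substitute
    return table


if __name__ == "__main__":
    table = make_table(string.ascii_letters + string.digits, {" ": "_"})
    print("<alpha> äöü ::42".translate(table))  # alpha__42
```

Note that a defaultdict stores an entry for every missing key it is asked
about, so the table grows by one item per distinct discarded character it
encounters.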




