From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: function to remove and punctuation
Date: Sun, 10 Apr 2016 14:35:55 +0200

geshdus@gmail.com wrote:

> how to write a function taking a string parameter, which returns it after
> you delete the spaces, punctuation marks, accented characters in python ?

Looks like you want to remove more characters than you want to keep. In this
case I'd decide which characters to keep first, e. g. (assuming Python 3)

>>> import string
>>> keep = string.ascii_letters + string.digits
>>> keep
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

Now you can iterate over the characters and check for each one whether you
want to preserve it:

>>> def clean(s, keep):
...     return "".join(c for c in s if c in keep)
...
>>> clean("<alpha> äöü ::42", keep)
'alpha42'
>>> clean("<alpha> äöü ::42", string.ascii_letters)
'alpha'

If you are dealing with a lot of text you can make this a bit more efficient
with the str.translate() method. Create a mapping that maps all characters
that you want to keep to themselves

>>> m = str.maketrans(keep, keep)
>>> m[ord("a")]
97
>>> m[ord(">")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 62

and all characters that you want to discard to None

>>> from collections import defaultdict
>>> trans = defaultdict(lambda: None, m)
>>> trans[ord("s")]
115
>>> trans[ord("ß")]  # returns None, so nothing is printed
>>>

Now pass it to the translate() method:

>>> "<alpha> äöü ::42".translate(trans)
'alpha42'

You changed your mind and want to translate " " to "_"? Here's how:

>>> trans[ord(" ")] = "_"
>>> "<alpha> äöü ::42".translate(trans)
'alpha__42'
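
For reference, the steps above could be collected into a small script along
these lines. This is only a sketch, assuming Python 3; the names KEEP,
make_table and the replacements parameter are made up for this example and
are not part of any library.

```python
import string
from collections import defaultdict

# Characters to preserve; everything else is discarded.
KEEP = string.ascii_letters + string.digits


def clean(s, keep=KEEP):
    """Return s with every character that is not in keep removed."""
    return "".join(c for c in s if c in keep)


def make_table(keep=KEEP, replacements=None):
    """Build a table for str.translate().

    Characters in keep map to themselves, any other character maps to
    None (which translate() treats as "delete"). replacements is an
    optional dict mapping single characters to replacement strings.
    """
    table = defaultdict(lambda: None, str.maketrans(keep, keep))
    for char, new in (replacements or {}).items():
        table[ord(char)] = new
    return table


sample = "<alpha> äöü ::42"
print(clean(sample))                                          # alpha42
print(sample.translate(make_table()))                         # alpha42
print(sample.translate(make_table(replacements={" ": "_"})))  # alpha__42
```

One thing to be aware of: a defaultdict stores an entry for every missing key
it is asked about, so the table grows by one item per distinct discarded
character. For ordinary text that is rarely a problem, and building the table
once and reusing it avoids repeating the setup work.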