| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Newsgroups | comp.lang.ruby |
| Subject | Re: regex and decomposed character |
| Date | 2013-12-16 20:58 +0100 |
| Message-ID | <bh94c2Fma4mU1@mid.individual.net> |
| References | <l8n8jl$oev$1@shakotay.alphanet.ch> |
On 16.12.2013 17:09, Une Bévue wrote:
> Running on Mac OS X the UTF-8 chars are decomposed; for example, é is
> represented as:
> 65 301
> instead of E9 (precomposed char)
On Linux with locale en_US.UTF-8:
$ echo 'é' | od -t x1c
0000000  c3  a9  0a
        303 251  \n
0000003
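The same check can be done from Ruby itself. The following is only a minimal sketch for comparing the two forms; the helper names hex_bytes and hex_codepoints are made up for this illustration, and the strings are written with \u escapes so the result does not depend on how an editor saved the file.

```ruby
# Two ways to write "é"; \u escapes keep the source file pure ASCII.
precomposed = "\u00E9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed  = "e\u0301"  # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# Hypothetical helper: the UTF-8 bytes of a string as lowercase hex.
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

# Hypothetical helper: the code points of a string as uppercase hex.
def hex_codepoints(str)
  str.codepoints.map { |c| format("%04X", c) }.join(" ")
end

puts hex_bytes(precomposed)       # c3 a9
puts hex_bytes(decomposed)        # 65 cc 81
puts hex_codepoints(decomposed)   # 0065 0301
puts precomposed == decomposed    # false: the byte sequences differ
```

The "65 301" quoted above corresponds to the two code points of the decomposed form, while "c3 a9" is the UTF-8 encoding of the single precomposed code point E9.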
> Then, in a script, we could have a mixture of both representations.
>
> Normally iconv (only on the Mac, not on Linux and other OSes) has the
> ability to transform from UTF-8-MAC (decomposed) to UTF-8, but this
> doesn't work with a regex, I don't know why.
>
> For example, in French the screenshots are named "Capture d'écran..."
> and I have been unable, until now, to write a working regex for that;
> neither the "é" nor the "'" used by Apple is recognised.
>
> I found a workaround by changing the default name for screenshots to
> "Capture ecran..." (no "é", no "'"); however, I wonder whether there is
> a better solution.
>
> With Ruby 2, is there a way to switch between decomposed and precomposed
> chars?
Can you put a zip up somewhere (e.g. GitHub) with the original text and a
Ruby file you wrote just for matching? Then we could use that as a
starting point for our own experiments.
Also, did you try to give the source file an explicit encoding like so?
#!/usr/bin/ruby
# encoding: utf-8
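As a starting point for such experiments, here is a rough sketch of normalizing to the precomposed form before matching. It rests on several assumptions: it needs Ruby 2.2 or later for String#unicode_normalize (which may be newer than the Ruby in use here), the file name is invented for illustration, the helper nfc is hypothetical, and the apostrophe is assumed to be the typographic U+2019 rather than the ASCII "'".

```ruby
# encoding: utf-8
# Sketch only; assumes Ruby >= 2.2 for String#unicode_normalize.

# Hypothetical file names, written with \u escapes to make the form explicit.
nfd_name = "Capture d\u2019e\u0301cran.png"  # decomposed: e + combining acute
nfc_name = "Capture d\u2019\u00E9cran.png"   # precomposed: single U+00E9

# Pattern written with the precomposed é and the assumed U+2019 apostrophe.
pattern = /\ACapture d\u2019\u00E9cran/

# Hypothetical helper: bring any input to the precomposed (NFC) form.
def nfc(str)
  str.unicode_normalize(:nfc)
end

p(nfd_name =~ pattern)        # nil: "e" + U+0301 is not the same as U+00E9
p(nfc(nfd_name) =~ pattern)   # 0: matches once the name is normalized
p(nfc(nfd_name) == nfc_name)  # true
```

Normalizing the input once and writing every pattern in a single form seems simpler than trying to make each regex accept both representations.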
Kind regards
robert