Re: Question about Tashaphyne package in python

Date 2013-03-03 19:01 +0000
From MRAB <python@mrabarnett.plus.com>
Subject Re: Question about Tashaphyne package in python
References <0d0a40de-052c-49df-b43e-dff935071b49@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.2825.1362337289.2939.python-list@python.org>


On 2013-03-03 03:06, yomnasalah91@gmail.com wrote:
> I have Python code that takes an Arabic word, gets its root, and removes diacritics, but I have a problem with the output. For example, when the input is "العربيه" the output is "عرب", which is the right answer, but when the input is "كاتب" the output is "ب", and when the input is "يخاف" the output is " خف".
>
> This is my code:
>
> # -*- coding=utf-8 -*-
>
> import re
> from arabic_const import *
> import Tashaphyne
> from Tashaphyne import *
> import enum
> from enum import Enum
> search_type=Enum('unvoc_word','voc_word','root_word')
>
> HARAKAT_pat = re.compile(ur"[" + u"".join([FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SUKUN, SHADDA]) + u"]")
> HAMZAT_pat = re.compile(ur"[" + u"".join([WAW_HAMZA, YEH_HAMZA]) + u"]");
> ALEFAT_pat = re.compile(ur"[" + u"".join([ALEF_MADDA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, HAMZA_ABOVE, HAMZA_BELOW]) + u"]");
> LAMALEFAT_pat = re.compile(ur"[" + u"".join([LAM_ALEF, LAM_ALEF_HAMZA_ABOVE, LAM_ALEF_HAMZA_BELOW, LAM_ALEF_MADDA_ABOVE]) + u"]");
>
[snip]
When you're using Unicode with re in Python 2, you should include the
re.UNICODE flag. For example:

HARAKAT_pat = re.compile(ur"[" + u"".join([FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SUKUN, SHADDA]) + u"]", flags=re.UNICODE)

or:

HARAKAT_pat = re.compile(ur"(?u)[" + u"".join([FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SUKUN, SHADDA]) + u"]")

I don't know whether that will make a difference in this case because I
don't know Tashaphyne or Arabic.
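
For anyone wanting to try the two forms above without arabic_const installed, here is a minimal sketch. The constants are assumed stand-ins for the arabic_const names, written as Unicode code points, and strip_harakat is a hypothetical helper that is not part of Tashaphyne. It uses plain u"" literals instead of ur"" so that it also runs on Python 3, where re.UNICODE is already the default for text patterns. On Python 2 the flag mainly changes how classes such as \w and \b behave, so an explicit character class like this one may well match the same with or without it.

```python
# -*- coding: utf-8 -*-
import re

# Assumed stand-ins for the arabic_const names (Unicode code points).
FATHATAN, DAMMATAN, KASRATAN = u"\u064b", u"\u064c", u"\u064d"
FATHA, DAMMA, KASRA = u"\u064e", u"\u064f", u"\u0650"
SHADDA, SUKUN = u"\u0651", u"\u0652"

HARAKAT = u"".join([FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SUKUN, SHADDA])

# The same character class, once with the flags argument and once with inline (?u).
HARAKAT_pat = re.compile(u"[" + HARAKAT + u"]", flags=re.UNICODE)
HARAKAT_pat_inline = re.compile(u"(?u)[" + HARAKAT + u"]")


def strip_harakat(word, pattern=HARAKAT_pat):
    """Hypothetical helper: return word with the matched diacritics removed."""
    return pattern.sub(u"", word)


# kaf + fatha + alef + teh + kasra + beh
vocalized = u"\u0643\u064e\u0627\u062a\u0650\u0628"
bare = strip_harakat(vocalized)

# On Python 2, \w only matches Arabic letters when re.UNICODE is set.
words = re.findall(u"\\w+", u"\u0643\u0627\u062a\u0628 \u064a\u062e\u0627\u0641", flags=re.UNICODE)
```

If this strips the diacritics as expected, the wrong roots are more likely coming from the stemming step than from these patterns, but that is only a guess without seeing the snipped code.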
