From: Markus Koschany
Newsgroups: linux.debian.bugs.dist,linux.debian.devel,linux.debian.maint.java
Subject: Bug#1018100: ITP: liblanguage-detector-java -- Language Detection Library for Java
Date: Thu, 25 Aug 2022 19:10:01 +0200

Package: wnpp
Severity: wishlist
Owner: Markus Koschany
X-Debbugs-Cc: debian-devel@lists.debian.org, apo@debian.org, debian-java@lists.debian.org

* Package name    : liblanguage-detector-java
  Version         : 0.6
  Upstream Author : Nakatani Shuyo, Francois ROLAND, Fabian Kessler, Nicole Torres, Robert Theis
* URL             : https://github.com/optimaize/language-detector
* License         : Apache-2.0
  Programming Lang: Java
  Description     : Language Detection Library for Java

This software uses language profiles which were created from common text in each language. N-grams (contiguous sequences of n items from a given sample of text) were extracted from that text and stored in the profiles.
To determine the language of a given text, the program goes through the same process: it creates the same kind of n-grams from the input text, compares their relative frequencies with those in the profiles, and picks the language that matches best. Currently 71 languages are supported.
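
The profile comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's actual API or scoring method: the class name NgramSketch, its methods, the choice of character trigrams, the cosine similarity measure, and the two toy sample sentences are all made up for this example. The real package ships trained profiles and may score candidates differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from language-detector.
class NgramSketch {

    // Build a profile: relative frequency of each character n-gram in the text.
    static Map<String, Double> profile(String text, int n) {
        String s = text.toLowerCase();
        Map<String, Double> freq = new HashMap<>();
        int total = 0;
        for (int i = 0; i + n <= s.length(); i++) {
            freq.merge(s.substring(i, i + n), 1.0, Double::sum);
            total++;
        }
        // Turn raw counts into relative frequencies (no-op for empty input).
        for (Map.Entry<String, Double> e : freq.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return freq;
    }

    // Cosine similarity between two profiles (assumed measure, for illustration).
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    // Return the key of the stored profile that best matches the input text.
    static String detect(String input, Map<String, Map<String, Double>> profiles, int n) {
        Map<String, Double> inputProfile = profile(input, n);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Double>> e : profiles.entrySet()) {
            double score = similarity(inputProfile, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy "training" sentences; real profiles are built from much larger corpora.
        Map<String, Map<String, Double>> profiles = new HashMap<>();
        profiles.put("en", profile("the quick brown fox jumps over the lazy dog and then the cat", 3));
        profiles.put("de", profile("der schnelle braune fuchs springt ueber den faulen hund und die katze", 3));
        System.out.println(detect("the dog and the fox", profiles, 3));
    }
}
```

With profiles this small the result is only meaningful for inputs that share many trigrams with one of the sample sentences; it is meant to show the shape of the process, not its accuracy.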