From unknown Thu Jun 19 14:02:35 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#42602 <42602@debbugs.gnu.org> To: bug#42602 <42602@debbugs.gnu.org> Subject: Status: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist Reply-To: bug#42602 <42602@debbugs.gnu.org> Date: Thu, 19 Jun 2025 21:02:35 +0000

retitle 42602 Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist
reassign 42602 emacs
submitter 42602 Sebastian Urban
severity 42602 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 29 12:12:10 2020 Received: (at submit) by debbugs.gnu.org; 29 Jul 2020 16:12:10 +0000 Received: from localhost ([127.0.0.1]:33538 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k0ogc-0000h6-2I for submit@debbugs.gnu.org; Wed, 29 Jul 2020 12:12:10 -0400 Received: from lists.gnu.org ([209.51.188.17]:47822) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k0ogZ-0000gy-GQ for submit@debbugs.gnu.org; Wed, 29 Jul 2020 12:12:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58328) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k0ogZ-0004m6-AM for bug-gnu-emacs@gnu.org; Wed, 29 Jul 2020 12:12:07 -0400 Received: from mail-ej1-x62e.google.com ([2a00:1450:4864:20::62e]:36990) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1k0ogU-0002Fi-Dg for bug-gnu-emacs@gnu.org; Wed, 29 Jul 2020 12:12:06 -0400 Received: by mail-ej1-x62e.google.com with SMTP id qc22so10146223ejb.4 for ; Wed, 29 Jul 2020 09:12:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:to:message-id:date:user-agent:mime-version :content-language:content-transfer-encoding;
bh=v0X/1WNo+KnIWuodDVqYYu/vE8+2OafT/zqRJReAYXI=; b=cvjQiXpeXZhF4q/kLaJeQ73C0KhygabkYTfVVV7HAm/lSnw+SQ+/Wq8INUl4Vfv1mm D4/uWWqLSOFSMrf1kNrKT174biY3OqaPEOQ0BOKSqKGHQKY2FWgbh13/QQbEMYSgg2OE JYrbJHTCKmX13URBlZf21aJ9vkh7/VBsXsWPGHUJ7comUlBuqmTc1UMfbJ4/evAwFLsh zuCZG7dWgbbszUP7VLkKjjnBGKSynmORuN637dCVPGCNWtXm/yN+otQaW5aLPK+I15dU IdOLUJbcFTbMXRkrnuEE7FaxRzomz0yXXyuzBo760sSgcdblxbBQTcPCrPLA/K6kqpOj MgHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:message-id:date:user-agent :mime-version:content-language:content-transfer-encoding; bh=v0X/1WNo+KnIWuodDVqYYu/vE8+2OafT/zqRJReAYXI=; b=ZYGnN/kmrDpf3bDDKVI+rRnFc0dcTDxX+ojgk5JKA15GXdcQOD/wZhR8x2ke1lRJNH 9tshzGjIPCoOUAatJ2tYBiUMu6V84tFah21GjzHVL/wS7ScraEGECpthuVmuGl6u8S9y Mm4z8kOM1BVvVQqJN6Hnil/sJa+2DsfpREM5J6QU6aHpbbmxxDBc8ZqqLKI2ATT7DusK fu6fR05HcE5sIglfTTP08vftVr7atOWBLJ8eLQepDYq3osDwxFzZeZSy8g6CSEE077IH ACVeEgD0nNDDtt7x3p7Atdr6ZJb7ZcRMSiQcinveD54c9xs5YfK0CTGwiTNqtCKWHBHv 2pRQ== X-Gm-Message-State: AOAM5328UOc9LEqmNkIbon8iUhn1Petn/qFVEhX2vP0nRlgJ3U36AY/B 0fgl5EOvyFC6NAUahgDM9hvkRfzR X-Google-Smtp-Source: ABdhPJzl/kqe4gHK7b6XHVJ0uyIAsZEvCK+mWXzGRqEs0FRjDPliHp49lQ04dUW5VNtTJl2FqDGoDg== X-Received: by 2002:a17:906:37d2:: with SMTP id o18mr15966828ejc.162.1596039119874; Wed, 29 Jul 2020 09:11:59 -0700 (PDT) Received: from ?IPv6:2a00:f41:184b:2a09:2d6f:1042:9358:c5b? 
([2a00:f41:184b:2a09:2d6f:1042:9358:c5b]) by smtp.gmail.com with ESMTPSA id cn16sm2131924edb.86.2020.07.29.09.11.59 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jul 2020 09:11:59 -0700 (PDT) From: Sebastian Urban Subject: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist To: Bug GNU Emacs Message-ID: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> Date: Wed, 29 Jul 2020 18:12:02 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2a00:1450:4864:20::62e; envelope-from=mrsebastianurban@gmail.com; helo=mail-ej1-x62e.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--)

Hello,

for words like:
  męski
  miód
  klątwa
  ślad
  łuk
  żaba
  źrebak
  grzać
  bańka
ispell.el sends to Aspell only part of the word, e.g. "lad" instead of
"ślad", or "kl"/"twa" (depending on the cursor position) instead of
"klątwa".

I think this is because of the wrong value of (NOT-)CASECHARS, which is
ASCII A-z letters and a few chars of which only ó/Ó is valid for Polish.

Although, for some reason, it doesn't recognize "ó" in the word "miód",
sending "mi" or "d".
It is on the list of CASECHARS under \363, so it should work.
Moreover, if I type in regexp-builder "[\363\323]" it won't recognize
ó/Ó, but it doesn't have a problem with other Polish chars, like "ł"
("[\502]") or "ż" ("[\574]").

If I put in my init.el:
--8<---------------cut here---------------start------------->8---
(setq ispell-program-name "C:/cygwin64/bin/aspell")
(add-hook 'ispell-initialize-spellchecker-hook
          (lambda ()
            (add-to-list 'ispell-local-dictionary-alist
                         '("pl"
                           ;; "[[:alpha:]]"
                           ;; "[^[:alpha:]]"
                           ;; ęóąśłżźćńĘÓĄŚŁŻŹĆŃ
                           "[A-Za-z\431\363\405\533\502\574\572\407\504\430\323\404\532\501\573\571\406\503]"
                           "[^A-Za-z\431\363\405\533\502\574\572\407\504\430\323\404\532\501\573\571\406\503]"
                           "[.]" nil nil nil iso-8859-2))))
(setq ispell-dictionary "pl")
--8<---------------cut here---------------end--------------->8---

everything seems to work, even ó/Ó are recognised. "[[:alpha:]]"
works as well, so I left it as an alternative. Changing from
iso-8859-2 to utf-8 doesn't break anything.

Tested on:
- GNU Emacs 26.3 (build 1, x86_64-w64-mingw32) of 2019-08-29,
- GNU Emacs 28.0.50 (build 1, x86_64-w64-mingw32) of 2020-07-05,
with Aspell from Cygwin installation.

S. U.

From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 29 14:43:35 2020 Received: (at 42602) by debbugs.gnu.org; 29 Jul 2020 18:43:35 +0000 Received: from localhost ([127.0.0.1]:33716 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k0r39-0004MO-J7 for submit@debbugs.gnu.org; Wed, 29 Jul 2020 14:43:35 -0400 Received: from eggs.gnu.org ([209.51.188.92]:58640) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k0r37-0004MC-Mt for 42602@debbugs.gnu.org; Wed, 29 Jul 2020 14:43:33 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:51932) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1k0r32-00077p-E2; Wed, 29 Jul 2020 14:43:28 -0400 Received: from [176.228.60.248] (port=1083 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1k0r31-0001Oq-20; Wed, 29 Jul 2020 14:43:27 -0400 Date: Wed, 29 Jul 2020 21:43:22 +0300 Message-Id: <83h7tqf9h1.fsf@gnu.org> From: Eli Zaretskii To: Sebastian Urban In-Reply-To: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> (message from Sebastian Urban on Wed, 29 Jul 2020 18:12:02 +0200) Subject: Re: bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist References: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 42602 Cc: 42602@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

> From: Sebastian Urban
> Date: Wed, 29 Jul 2020 18:12:02 +0200
>
> for words like:
>   męski
>   miód
>   klątwa
>   ślad
>   łuk
>   żaba
>   źrebak
>   grzać
>   bańka
> ispell.el sends to Aspell only part of the word, e.g. "lad" instead of
> "ślad", or "kl"/"twa" (depending on the cursor position) instead of
> "klątwa".
>
> I think this is because of the wrong value of (NOT-)CASECHARS, which is
> ASCII A-z letters and a few chars of which only ó/Ó is valid for Polish.
>
> Although, for some reason, it doesn't recognize "ó" in the word "miód",
> sending "mi" or "d". It is on the list of CASECHARS under \363, so it
> should work. Moreover, if I type in regexp-builder "[\363\323]" it
> won't recognize ó/Ó, but it doesn't have a problem with other Polish
> chars, like "ł" ("[\502]") or "ż" ("[\574]").
>
> If I put in my init.el:
> --8<---------------cut here---------------start------------->8---
> (setq ispell-program-name "C:/cygwin64/bin/aspell")
> (add-hook 'ispell-initialize-spellchecker-hook
>           (lambda ()
>             (add-to-list 'ispell-local-dictionary-alist
>                          '("pl"
>                            ;; "[[:alpha:]]"
>                            ;; "[^[:alpha:]]"
>                            ;; ęóąśłżźćńĘÓĄŚŁŻŹĆŃ
>                            "[A-Za-z\431\363\405\533\502\574\572\407\504\430\323\404\532\501\573\571\406\503]"
>                            "[^A-Za-z\431\363\405\533\502\574\572\407\504\430\323\404\532\501\573\571\406\503]"
>                            "[.]" nil nil nil iso-8859-2))))
> (setq ispell-dictionary "pl")
> --8<---------------cut here---------------end--------------->8---
>
> everything seems to work, even ó/Ó are recognised.

I don't understand this change. Values above octal 377 cannot be
right in the above regexps, because they are supposed to be in
Latin-2 encoding, which is a single-byte encoding, and so can only
handle values below octal 400. How did you come up with those
values?

Anyway, I'm quite sure some other factor is at work here.

> Tested on:
> - GNU Emacs 26.3 (build 1, x86_64-w64-mingw32) of 2019-08-29,
> - GNU Emacs 28.0.50 (build 1, x86_64-w64-mingw32) of 2020-07-05,
> with Aspell from Cygwin installation.

Your Emacs is a native MinGW build, whereas Aspell seems to be a
Cygwin build? If so, you could have incompatibility in character
encoding. What is your Windows locale? And what does

  M-: (getenv "LANG") RET

yield inside Emacs?

From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 30 07:40:01 2020 Received: (at 42602) by debbugs.gnu.org; 30 Jul 2020 11:40:01 +0000 Received: from localhost ([127.0.0.1]:35193 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k16un-0000H5-7h for submit@debbugs.gnu.org; Thu, 30 Jul 2020 07:40:01 -0400 Received: from mail-lf1-f46.google.com ([209.85.167.46]:37957) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k16ul-0000Gp-3c for 42602@debbugs.gnu.org; Thu, 30 Jul 2020 07:40:00 -0400 Received: by mail-lf1-f46.google.com with SMTP id 140so14766409lfi.5 for <42602@debbugs.gnu.org>; Thu, 30 Jul 2020 04:39:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:to:cc:references:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=iaFWtsTJk6zifwAJQqp4rJ/U/IHT2Pfzz6lyPVK3lBk=; b=aMYvJXDrOwUWTISfCZ0FPXCtC/otpi2Gud4jABlAOppT7Elwf36ji+jRgH1O9yCLMT oRV6ZS7fDurZXTala4c2wZdXwCsXv8nz8q2rYvbQ2NQPI5zaM5C92CVfhee+lHWBFhwt 2pff1fjpAsmnuAjUc9tAZRduFouGb7Oye4+UaSTMbAXyZeAy8ezI4lulkWkI4JMIrBMl ht4OAyT/3j8hWJkOceLf7H6avlp+7UCH3pW2RsJbCEF44/JJfqzj+1FME8YfHxHjWb7T Em0Ib6JTGES0D5fOganXC5hLN4AX3zcyUofqySowUtle2vGcpgUyZ5Ni9pIukTAm1OKr 1huA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=iaFWtsTJk6zifwAJQqp4rJ/U/IHT2Pfzz6lyPVK3lBk=; b=Btm8psBE+K64hSlr2Krk2l6glYP7VIn4hV+FK+BrPjIzEJvGNRjPT98lftcdMcvO7r JEtrgZEG9OkbWvn7bcI95w32CGoZv8Nwuy1FlZ9knoFuIbnc6eYKYS5SbwU9c5+/dMVL Jt7789s6V02upt6qfQFy5p6eDoCkMeUHt93UGUjyCRG/fIFMPbxkk/F0LrNjV8ielMFL T+btJdqvdTkXKOJt1PwoUECAlkwklejEB0VIBICj6EPMw4sEGQXcabjMsLgU+gjHCI7P M8NtjMeic1D45GUGyS5n4zjXKC2b0cnmicwTck84BjryHCgqqRPzxN+MNjH61BJ4V5Qm RWtg== X-Gm-Message-State: 
AOAM53122xulvgguDQm/UmSJt7C/v4HhlTSZz16mXFdxTB0YjYGXIRSl vmFY0ng7tLAG/EFBDklgMNlPi8CY X-Google-Smtp-Source: ABdhPJw1qFV7juztOa729vHhRvVo0nkDOoCATXosP9KaEPk3+LjQRcf5De3QGGVys/7xgiuQQhpdjA== X-Received: by 2002:a19:c752:: with SMTP id x79mr1329033lff.197.1596109191800; Thu, 30 Jul 2020 04:39:51 -0700 (PDT) Received: from [192.168.1.100] (ip-89-161-2-137.tel.tkb.net.pl. [89.161.2.137]) by smtp.gmail.com with ESMTPSA id k12sm1128313lfe.68.2020.07.30.04.39.50 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 30 Jul 2020 04:39:51 -0700 (PDT) From: Sebastian Urban Subject: Re: bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist To: Eli Zaretskii References: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> <83h7tqf9h1.fsf@gnu.org> Message-ID: Date: Thu, 30 Jul 2020 13:39:55 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <83h7tqf9h1.fsf@gnu.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 8bit X-Spam-Score: -0.8 (/) X-Debbugs-Envelope-To: 42602 Cc: 42602@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.8 (-)

> I don't understand this change. Values above octal 377 cannot be
> right in the above regexps, because they are supposed to be in
> Latin-2 encoding, which is a single-byte encoding, and so can only
> handle values below octal 400. How did you come up with those
> values?

Basically, C-x = on a char, which gave me octal values. I thought it
was recognising only A-z + ó/Ó and some other chars that I'm not
interested in, so I swapped those values for the ones corresponding to
the Polish chars. That's the whole story.

> Anyway, I'm quite sure some other factor is at work here.

Well, I did some tests, e.g. switched back to the original value of
"polish" in my "pl" dictionary, and... it works. And if I change from
iso-8859-2 to utf-8 in my "pl" (with original value from "polish") it
doesn't work. So, as you later wrote - wrong character encoding,
I guess.

Looking for a cause (in default settings), I think I found it in
ispell-dictionary-base-alist and ispell-dictionary-alist. During
"transfer" from *-base-* to ispell-dictionary-alist, the value of
CHARACTER-SET is changed in all cases from iso-* or cp1255 to utf-8,
then ispell uses these (from ispell-dictionary-alist) when it "talks"
with Aspell.

On the other hand, if I use Emacs 26.3 from Cygwin, everything works
out of the box, I don't even have to set "polish" as default
dictionary. But there, in Cygwin command line, "env | grep LANG" gives
"LANG=pl_PL.UTF-8".

> Your Emacs is a native MinGW build, whereas Aspell seems to be
> a Cygwin build?

Both Emacses are official Win builds, and Aspell is installed through
Cygwin.

> If so, you could have incompatibility in character encoding. What
> is your Windows locale?

"Polish" everywhere in "Control Panel" -> "Regional and Language".

> And what does M-: (getenv "LANG") RET yield inside Emacs?

"PLK"

S. U.

P.S.
> Moreover, if I type in regexp-builder "[\363\323]" it won't
> recognize ó/Ó, but it doesn't have a problem with other Polish
> chars, like "ł" ("[\502]") or "ż" ("[\574]").

In the "Character List" buffer for unicode-bmp, regexp-builder (numbers
are octal values):
- 0-177 and 400-777 - highlights chars
- 240-377 - doesn't highlight chars (it highlights them if I use hex
  value, or insert them directly)

I didn't check "80h-9Fh" chars. Chars like C-a were checked by
inserting them with quoted-insert in another buffer.

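The working setup described in the message above (the stock "polish" values, registered under the name "pl" with iso-8859-2 left in place) could be written without retyping the character lists. This is only a sketch: the function name my/ispell-add-pl-dictionary is hypothetical, and it assumes that the running Emacs still ships a "polish" entry in ispell-dictionary-base-alist. If that entry is missing, the function does nothing.

```elisp
(require 'ispell)

;; Hypothetical helper, not part of ispell.el.
(defun my/ispell-add-pl-dictionary ()
  "Register \"pl\" as a copy of the stock \"polish\" entry.
CASECHARS, NOT-CASECHARS and CHARACTER-SET are taken unchanged
from `ispell-dictionary-base-alist', so the byte values stay in
Latin-2 and the coding system stays iso-8859-2."
  (let ((polish (assoc "polish" ispell-dictionary-base-alist)))
    (when polish
      (add-to-list 'ispell-local-dictionary-alist
                   (cons "pl" (cdr polish))))))

;; Same hook as in the original report.
(add-hook 'ispell-initialize-spellchecker-hook
          #'my/ispell-add-pl-dictionary)
```

Copying the entry rather than rewriting it keeps the regexps and the coding system consistent with each other, which is the property the tests in the message above depended on.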
From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 30 09:26:34 2020 Received: (at 42602) by debbugs.gnu.org; 30 Jul 2020 13:26:35 +0000 Received: from localhost ([127.0.0.1]:35348 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k18Zu-0002pv-HN for submit@debbugs.gnu.org; Thu, 30 Jul 2020 09:26:34 -0400 Received: from eggs.gnu.org ([209.51.188.92]:59228) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k18Zs-0002ph-AI for 42602@debbugs.gnu.org; Thu, 30 Jul 2020 09:26:34 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:37839) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1k18Zm-0006Nw-HD; Thu, 30 Jul 2020 09:26:26 -0400 Received: from [176.228.60.248] (port=1891 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1k18Zl-0002rK-JQ; Thu, 30 Jul 2020 09:26:26 -0400 Date: Thu, 30 Jul 2020 16:26:07 +0300 Message-Id: <831rktf828.fsf@gnu.org> From: Eli Zaretskii To: Sebastian Urban In-Reply-To: (message from Sebastian Urban on Thu, 30 Jul 2020 13:39:55 +0200) Subject: Re: bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist References: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> <83h7tqf9h1.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 42602 Cc: 42602@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

> From: Sebastian Urban
> Cc: 42602@debbugs.gnu.org
> Date: Thu, 30 Jul 2020 13:39:55 +0200
>
> > I don't understand this change. Values above octal 377 cannot be
> > right in the above regexps, because they are supposed to be in
> > Latin-2 encoding, which is a single-byte encoding, and so can only
> > handle values below octal 400.
> > How did you come up with those values?
>
> Basically, C-x = on a char, which gave me octal values.

This gives you the Unicode codepoint, not its Latin-2 encoding. They
are different. The database in ispell.el uses Latin-2 encodings of
Polish characters.

> Well, I did some tests, e.g. switched back to the original value of
> "polish" in my "pl" dictionary, and... it works. And if I change from
> iso-8859-2 to utf-8 in my "pl" (with original value from "polish") it
> doesn't work. So, as you later wrote - wrong character encoding,
> I guess.
>
> Looking for a cause (in default settings), I think I found it in
> ispell-dictionary-base-alist and ispell-dictionary-alist. During
> "transfer" from *-base-* to ispell-dictionary-alist, the value of
> CHARACTER-SET is changed in all cases from iso-* or cp1255 to utf-8,
> then ispell uses these (from ispell-dictionary-alist) when it "talks"
> with Aspell.
>
> On the other hand, if I use Emacs 26.3 from Cygwin, everything works
> out of the box, I don't even have to set "polish" as default
> dictionary. But there, in Cygwin command line, "env | grep LANG" gives
> "LANG=pl_PL.UTF-8".

Native MinGW builds cannot use the UTF-8 encoding.

So, do we have a problem to solve, or can this issue be closed?

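The difference between the code point that C-x = reports and the byte used in Latin-2 can be checked directly in Emacs. The snippet below is an illustrative sketch and not code from ispell.el; it assumes the file or buffer it is evaluated from is read as UTF-8, so that the literal characters are decoded correctly.

```elisp
;; For a few Polish letters, print the Unicode code point (what `C-x ='
;; shows) next to the single byte the letter becomes in ISO-8859-2.
(dolist (ch '(?ó ?ś ?ł ?ż))
  (let ((latin-2 (encode-coding-string (string ch) 'iso-8859-2)))
    (message "%c  code point: octal %o  Latin-2 byte: octal %o"
             ch ch (aref latin-2 0))))
```

Under that assumption, ó should show octal 363 in both columns, because its code point and its Latin-2 byte coincide, while ś should show 533 against 266, ł 502 against 263, and ż 574 against 277. This is why only the \363 and \323 values from the custom list are also meaningful as Latin-2 bytes.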
From debbugs-submit-bounces@debbugs.gnu.org Fri Jul 31 06:52:53 2020 Received: (at 42602) by debbugs.gnu.org; 31 Jul 2020 10:52:53 +0000 Received: from localhost ([127.0.0.1]:37378 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k1Sej-0004LF-4C for submit@debbugs.gnu.org; Fri, 31 Jul 2020 06:52:53 -0400 Received: from mail-lf1-f41.google.com ([209.85.167.41]:34167) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k1Seh-0004L3-CL for 42602@debbugs.gnu.org; Fri, 31 Jul 2020 06:52:52 -0400 Received: by mail-lf1-f41.google.com with SMTP id d2so10988791lfj.1 for <42602@debbugs.gnu.org>; Fri, 31 Jul 2020 03:52:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:to:cc:references:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=9y3le19LG2o6USa+3sGSEay/BKB2pZBjjhPVstT7XUo=; b=HArNB3yrRD/NWv1KzZqqbDEGA/FZD5igj5U2T45oLID+g/MOV7e12DM1LgNzcLnGC8 er5A/f7ZGuyv1VphkpY0CBWFGw7jK2OlG3lU6Ct8RGkZaA6D8Amlt5A2a6qGR1M1PR2K DldvV5yTdkHVN4qER4VcEzADFmdFZ5+wHL7eJZv6/e/WhJ3Nkl7t5FptTy7+lc04sn4y fdGHganoebRbMalRkwlHLlSSrz3DbNOyhy0zVyIY+63WJFmuiBa0cauB4fzTTw1mPQ/f +RxN/tRswJnu4dKBUzWaG7Pb6jYwS6x5AYZYh89ku6LrHZCjvIYdlYf3dYe2fMGQtWcv npNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=9y3le19LG2o6USa+3sGSEay/BKB2pZBjjhPVstT7XUo=; b=GQnAkDOcS9AZiHIYF7o23RS55xBUpjtmQ+FI2yw/Mzr6LdpDCsfesjTFba06M7EUIs Pl15SWd/sKvsvdMwvvb6fsyR4YUhliw6jzUQGE6kYb8PO+VwxQljQdZc1nsKqQxbeaLR 0Z+TLCjz6xai4bUlF6nSOWVOvsFHP74HIzuIu0S5rcwPjLgv5RWr2IPUrjeu8sBcb1NE o4sUSIAhh6Klm7xuVo+XaOwzJmrI/zWR+mgo0IvVcIXmrj/4xGvlA0HBel6n1RvgT4fl HfD9hF8sjorgN6eNrnTi3Y8yW8mDItPEyaifJ5jAQhcO4hPd+xmTFYkM2qkFks/rZM4t X3aw== X-Gm-Message-State: 
AOAM5331BUYZfpNTGAp4qDa9TPWMuA5Hh8Uz40FfVxCIyi78Gn+81UEG BWbHV2fhwjZc/mABCyhLZGb45sld X-Google-Smtp-Source: ABdhPJxQG6t2UNfX+jq77fTYHXUPPdt7NSD+eAIdam0ShEBtW4GOx1g0nXk+aY1nhiKHNoyxiPZ8Fg== X-Received: by 2002:ac2:44d4:: with SMTP id d20mr1727029lfm.137.1596192764771; Fri, 31 Jul 2020 03:52:44 -0700 (PDT) Received: from [192.168.1.100] (ip-89-161-2-137.tel.tkb.net.pl. [89.161.2.137]) by smtp.gmail.com with ESMTPSA id 186sm1846245lfn.66.2020.07.31.03.52.43 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 31 Jul 2020 03:52:43 -0700 (PDT) From: Sebastian Urban Subject: Re: bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist To: Eli Zaretskii References: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> <83h7tqf9h1.fsf@gnu.org> <831rktf828.fsf@gnu.org> Message-ID: Date: Fri, 31 Jul 2020 12:52:47 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0 MIME-Version: 1.0 In-Reply-To: <831rktf828.fsf@gnu.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 7bit X-Spam-Score: -0.8 (/) X-Debbugs-Envelope-To: 42602 Cc: 42602@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.8 (-)

>>> I don't understand this change. Values above octal 377 cannot be
>>> right in the above regexps, because they are supposed to be in
>>> Latin-2 encoding, which is a single-byte encoding, and so can only
>>> handle values below octal 400. How did you come up with those
>>> values?
>>
>> Basically, C-x = on a char, which gave me octal values.
>
> This gives you the Unicode codepoint, not its Latin-2 encoding.
> They are different.

So, it would work even if I added "\999999999", because Emacs would
not recognize it and would simply ignore it, which means the only
reason it worked was the explicitly set encoding (iso-8859-2)?

> The database in ispell.el uses Latin-2 encodings of Polish
> characters.

As base, but before ispell.el sends the string to Aspell it
translates it to utf-8, right? Because that's the only difference
between my custom "pl" dictionary and the value of "polish" in
ispell-dictionary-alist.

> Native MinGW builds cannot use the UTF-8 encoding.

So, with my setup (not saying that it's the best one, it's just the
current one, if there is a better one I can change), for Polish lang,
I have to define a local dictionary with iso-8859-2 coding?

> So, do we have a problem to solve, or can this issue be closed?

If it's a problem of MinGW, and my setup, then I guess it's not an
Emacs problem, so yes, it can be closed.

S. U.

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 12 20:07:58 2020 Received: (at 42602-done) by debbugs.gnu.org; 13 Aug 2020 00:07:58 +0000 Received: from localhost ([127.0.0.1]:45963 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k60mk-0001p2-12 for submit@debbugs.gnu.org; Wed, 12 Aug 2020 20:07:58 -0400 Received: from mail-yb1-f182.google.com ([209.85.219.182]:38366) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k60mh-0001oo-TV for 42602-done@debbugs.gnu.org; Wed, 12 Aug 2020 20:07:56 -0400 Received: by mail-yb1-f182.google.com with SMTP id e187so2333515ybc.5 for <42602-done@debbugs.gnu.org>; Wed, 12 Aug 2020 17:07:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:in-reply-to:references:user-agent :mime-version:date:message-id:subject:to:cc; bh=bN6dItIBfK4EvOp3r41lNAmHzVLmPqfpWri8QsYqUVk=; b=DBQ0qvxSBBo9taPydafN4GCUujFq8nErLlYzGt46jjwF/IigKosnJfLyscq2CTrubC y02GAdJ6nNfu4NIt+JLU6FVTVNdOD5WFrnDcYscXYrRyHLeUStO3uJVcuPFCEamaRVB0
03SmwvY9IccK+NfM7zqxRhOV10KaiAKQLYrdgyuGx5TzD2XjnpzxO+4icFsiE8Sv+Pp+ P+PD0Cp371r4/EPC8Iw+fOiqOwCHaCESmxgdk4DE3lGWYMP8KBWT8CXb0wvOSuHrbYZ5 0vWrwBFt+9G1lszluzLSwBKmhSvFucXc+UFbBhzg5sk8u/PXmq/grcXI/Ipxh0uKF275 AzeA== X-Gm-Message-State: AOAM532Eu6b/3UxYZXKZilro+WK11Jcek5jLXsbw9gTvSNqUawGXnKio W9td2EQQS73YuwJKSPqKH9cJFmcL9Jt5YdDnTwU= X-Google-Smtp-Source: ABdhPJwehWO/lf0ZpAnNtmNEbUDajD22uH+ZIjEZ1HVXhucyOWYbfQCIc/twTScZfECgUkFKDMy5GMK3byeXggkKz+s= X-Received: by 2002:a25:4609:: with SMTP id t9mr2791636yba.231.1597277270563; Wed, 12 Aug 2020 17:07:50 -0700 (PDT) Received: from 753933720722 named unknown by gmailapi.google.com with HTTPREST; Wed, 12 Aug 2020 17:07:50 -0700 From: Stefan Kangas In-Reply-To: (Sebastian Urban's message of "Fri, 31 Jul 2020 12:52:47 +0200") References: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> <83h7tqf9h1.fsf@gnu.org> <831rktf828.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) MIME-Version: 1.0 Date: Wed, 12 Aug 2020 17:07:50 -0700 Message-ID: Subject: Re: bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist To: Sebastian Urban Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 42602-done Cc: Eli Zaretskii , 42602-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

Sebastian Urban writes:

>> So, do we have a problem to solve, or can this issue be closed?
>
> If it's a problem of MinGW, and my setup, then I guess it's not an
> Emacs problem, so yes, it can be closed.

I'm therefore closing this bug report.

Best regards,
Stefan Kangas

From unknown Thu Jun 19 14:02:35 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 10 Sep 2020 11:24:04 +0000 User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator