I found another issue. My files are stored as UTF-8, on Windows as well.
Unless I enable the Windows setting "Beta: Use UTF-8 everywhere", tools
like ripgrep somehow interpret the files as latin-1. So I cannot search
for the special characters in my language, and I even remember crashes
when searching documents that include them.

On Wed, May 15, 2024 at 12:25 PM Simen Endsjø wrote:
>
> > I suggest to remove them, and see if the crashes keep happening.
>
> No crashes yet at least, so let's hope.
>
> > If removing these hacks makes something stop working, describe the
> > problems with the details: there are definitely ways to solve them
> > without these dangerous customizations.
>
> Nothing has stopped working per se, but I encounter encoding problems,
> which is probably why I added this in the first place.
> I tested using `emacs -Q`, so the default settings.
>
> When running in a regular terminal, I get the output:
> ┌───────────────────────────────────────────────────────────┬───────────────────────────────┬─────────────────┬───────────┬─────────────────┬───────────┐
> │ Package │ Installed │ Released │ Latest │ Released │ Age (y) │
>
> Tested with Git Bash, msys2, Powershell 5, Powershell 7 in Windows
> Terminal, Powershell 7, Command Prompt.
>
> But in eshell, I get:
> ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄ¿
> ³ Package ³ Installed ³ Released ³ Latest ³ Released ³ Age (y) ³
>
> And in shell:
> +------------------------------------------------------------------------------+
> Package Installed Released Latest Released Age (y)
>
>
> Guess I'll have to dig into encoding in emacs and integration with Windows.
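
One possible explanation for the eshell output above, offered as a guess
and not something confirmed in this thread: the program writes the
box-drawing characters in an OEM code page such as cp437, and Emacs
decodes those bytes as Latin-1. Under that assumption, ┌ becomes the
single byte 0xDA, which Latin-1 reads as Ú. A minimal Python sketch of
that round trip (the helper name `misdecode` is made up for illustration):

```python
def misdecode(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


# Box-drawing characters as they appear in the terminal output.
box = "┌─┬─┐│"

# Assumption: the subprocess emits cp437 bytes; the reader assumes Latin-1.
garbled = misdecode(box, "cp437", "latin-1")

# Show the raw bytes in hex so the mapping is visible (ASCII-only output).
print(box.encode("cp437").hex(" "))

# The result is the same set of characters seen in the eshell output.
assert garbled == "ÚÄÂÄ¿³"
```

The bytes printed are `da c4 c2 c4 bf b3`, and the garbled string uses
the same characters (Ú, Ä, Â, ¿, ³) that show up in eshell.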
>
> ┌ and Ú:
>
> position: 1 of 155 (0%), column: 0
> character: ┌ (displayed as ┌) (codepoint 9484, #o22414, #x250c)
> charset: unicode-bmp (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
> code point in charset: 0x250C
> script: symbol
> syntax: _ which means: symbol
> category: .:Base, P:Haskell symbol constituent characters, c:Chinese, h:Korean, j:Japanese
> to input: type "C-x 8 RET 250c" or "C-x 8 RET BOX DRAWINGS LIGHT DOWN AND RIGHT"
> buffer code: #xE2 #x94 #x8C
> file code: #xE2 #x94 #x8C (encoded by coding system utf-8-dos)
> display: by this font (glyph code):
>     harfbuzz:-outline-Iosevka Slab Regular-regular-normal-normal-mono-24-*-*-*-c-*-iso8859-1 (#x605F)
>
> Character code properties: customize what to show
>   name: BOX DRAWINGS LIGHT DOWN AND RIGHT
>   old-name: FORMS LIGHT DOWN AND RIGHT
>   general-category: So (Symbol, Other)
>   decomposition: (9484) ('┌')
>
>
> position: 155 of 1140 (14%), column: 0
> character: Ú (displayed as Ú) (codepoint 218, #o332, #xda)
> charset: unicode-bmp (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
> code point in charset: 0xDA
> script: latin
> syntax: w which means: word
> category: .:Base, L:Strong L2R, j:Japanese, l:Latin, v:Viet
> to input: type "C-x 8 RET da" or "C-x 8 RET LATIN CAPITAL LETTER U WITH ACUTE"
> buffer code: #xC3 #x9A
> file code: #xC3 #x9A (encoded by coding system utf-8-dos)
> display: by this font (glyph code):
>     harfbuzz:-outline-Iosevka Slab Regular-regular-normal-normal-mono-24-*-*-*-c-*-iso8859-1 (#x9B)
>
> Character code properties: customize what to show
>   name: LATIN CAPITAL LETTER U WITH ACUTE
>   old-name: LATIN CAPITAL LETTER U ACUTE
>   general-category: Lu (Letter, Uppercase)
>   decomposition: (85 769) ('U' '́')
>
> On Tue, May 14, 2024 at 4:18 PM Eli Zaretskii wrote:
> >
> > > From: Simen Endsjø
> > > Date: Tue, 14 May 2024 15:58:48 +0200
> > > Cc: 70914@debbugs.gnu.org
> > >
> > > I'm not really sure why I've added these anymore.
> > > I've added them over time
> > > since 2016 first using Spacemacs, then Doom Emacs.
> > >
> > > >> ;; Windows doesn't set this, but some packages might depend on the variable
> > > >> (setenv "LANG" "en_US")
> > > >
> > > > The comment is not correct. To see for yourself, ensure LANG is not
> > > > set in the system-wide environment, start "emacs -Q", and then type
> > > >
> > > >   M-: (getenv "LANG") RET
> > >
> > > That's interesting. I usually just { M-x getenv }, and LANG isn't listed there.
> > > (getenv "LANG") returns "ENU" though. Looking at the environment variables for
> > > the process, I see LANG listed there. How is getenv *not* listing the variable?
> > > Has it marked it as special somehow and filtered it out?
> >
> > It's a Windows-specific trick: we add a few environment variables at
> > startup such that getenv can access them, but don't want them to appear
> > in process-environment explicitly, and so the function that prompts
> > for the variable when you invoke getenv interactively doesn't know
> > about them.
> >
> > > > This is a very bad idea, IME. The clipboard on Windows uses UTF-16,
> > > > and Emacs knows how to decode it correctly. Customizing
> > > > clipboard-coding-system to something else just gets in the way.
> > >
> > > Probably something I did after changing Windows to use utf-8, which also
> > > includes the clipboard.
> > >
> > > > I don't know where the comment about latin-1 by default comes from
> > > > (maybe from Windows 9X days?), but it has not been true on Windows for a
> > > > very long time. The default value of selection-coding-system on
> > > > Windows is utf-16le-dos, you can again verify that in "emacs -Q".
> > >
> > > Maybe I broke something else when trying to get text to work properly and added
> > > that hack as a workaround..? I really have no idea. Don't want to dig through my
> > > git commits to find out ;)
> > >
> > > > Again, I'm not sure this is relevant to the crashes.
> > > > But it doesn't
> > > > do any harm to make your Emacs configuration healthier ;-)
> > >
> > > Yes, thanks a lot for the help! I'm a bit scared to remove these hacks I've
> > > accumulated over time as I probably added them there for a reason though. But
> > > hopefully the workarounds were just for some symptoms and not the root cause --
> > > we'll see.
> >
> > I suggest to remove them, and see if the crashes keep happening.
> >
> > If removing these hacks makes something stop working, describe the
> > problems with the details: there are definitely ways to solve them
> > without these dangerous customizations.