From unknown Sat Jun 21 10:33:59 2025
From: bug#31149 <31149@debbugs.gnu.org>
To: bug#31149 <31149@debbugs.gnu.org>
Subject: Status: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
Reply-To: bug#31149 <31149@debbugs.gnu.org>
Date: Sat, 21 Jun 2025 17:33:59 +0000

retitle 31149 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
reassign 31149 emacs
submitter 31149 Stefan Monnier
severity 31149 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 13 16:55:43 2018
From: Stefan Monnier
To: bug-gnu-emacs@gnu.org
Subject: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
X-Debbugs-Cc: Lars Ingebrigtsen
Date: Fri, 13 Apr 2018 16:55:26 -0400

Package: Emacs
Version: 27.0.50

(gui-get-selection nil 'text/html)

returns utf-16 text when the primary selection is owned by Mozilla, but
we decode it as latin-1 instead, so it looks like garbage.

I don't know why we're getting utf-16.  Is that what standards say it
should do?
If so, we should adjust our code (which currently knows
nothing about the `text/html` target-type).

As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
using something else because he's getting something with a `charset`
property which I don't get here) because:
- selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
  the property `foreign-selection` set to `STRING` when the actual
  string type is not known (as opposed to COMPOUND-TEXT and
  UTF8-STRING, basically).
- in gui-get-selection we then have a mapping from `STRING` to
  `iso-8859-1` (which is apparently the right thing for the official
  `STRING` target-type in X11).

I can't figure out if/where these kinds of things about the X11
selection protocol is described, but at least in `xclip` they have
a hack specifically for this case:

    [...]
    if (html != None && sel_type == html) {
        /* if the buffer contains UCS-2 (UTF-16), convert to
         * UTF-8.  Mozilla-based browsers do this for the
         * text/html target.
         */
    [...]

and according to the subsequent code it's not even always the
same endianness.

I don't know what is the difference between the `target-type` passed to
x-get-selection-internal and the `foreign-selection` property we get on
the returned string (they seem to be the same in my tests, except when
the type is not one of the known ones, and where we then force
`foreign-selection` to be `STRING`).


        Stefan


In GNU Emacs 27.0.50 (build 1, i686-pc-linux-gnu, GTK+ Version 2.24.32)
 of 2018-03-23 built on ceviche
Repository revision: ef4cd3805771e2cccd395d0f0b35f56816940508
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: Debian GNU/Linux buster/sid

Recent messages:
Saving file /home/monnier/src/emacs/work/src/xselect.c...
Wrote /home/monnier/src/emacs/work/src/xselect.c
Mark set
user-error: Minibuffer window is not active
Mark set
Mark saved where search started
Mark set
Making completion list... [2 times]
Quit [2 times]
Mark set

Configured using:
 'configure -C --enable-checking --with-modules
 --enable-check-lisp-object-type 'CFLAGS=-Wall -g3 -Og
 -Wno-pointer-sign' PKG_CONFIG_PATH=/home/monnier/lib/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND GPM DBUS GSETTINGS NOTIFY GNUTLS
LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS GTK2 X11
MODULES THREADS

Important settings:
  value of $LANG: fr_CH.UTF-8
  locale-coding-system: utf-8-unix

Major mode: InactiveMinibuffer

Minor modes in effect:
  csv-field-index-mode: t
  shell-dirtrack-mode: t
  diff-auto-refine-mode: t
  electric-pair-mode: t
  global-reveal-mode: t
  reveal-mode: t
  auto-insert-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  global-compact-docstrings-mode: t
  url-handler-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  global-prettify-symbols-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/home/monnier/src/emacs/elpa/packages/svg/svg hides /home/monnier/src/emacs/work/lisp/svg
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-xref hides /home/monnier/src/emacs/work/lisp/progmodes/ada-xref
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-mode hides /home/monnier/src/emacs/work/lisp/progmodes/ada-mode
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-stmt hides /home/monnier/src/emacs/work/lisp/progmodes/ada-stmt
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-prj hides /home/monnier/src/emacs/work/lisp/progmodes/ada-prj
/home/monnier/src/emacs/elpa/packages/hyperbole/set hides /home/monnier/src/emacs/work/lisp/emacs-lisp/set
/home/monnier/src/emacs/elpa/packages/landmark/landmark hides /home/monnier/src/emacs/work/lisp/obsolete/landmark
/home/monnier/src/emacs/elpa/packages/crisp/crisp hides /home/monnier/src/emacs/work/lisp/obsolete/crisp

Features:
(mule-diag csv-mode mailcap
reporter debian-bug debian-el-loaddefs image-file iimage skeleton html5-schema rng-xsd xsd-regexp rng-cmpct rng-nxml nxml-mode nxml-outln nxml-rap sgml-mode dom reftex-dcr reftex reftex-loaddefs reftex-vars latexenc sort mail-extr emacsbug tildify rst rng-valid refer refer-to-bibtex refbib printing picture nroff-mode enriched ebnf2ps ps-print ps-print-loaddefs ps-def lpr delim-col bib-mode view cal-china lunar solar cal-dst cal-bahai cal-islam cal-hebrew holidays hol-loaddefs cal-french diary-lib diary-loaddefs cal-move battery log-view srecode/document semantic/doc srecode/semantic semantic/senator semantic/decorate semantic/ctxt semantic/format srecode/extract srecode/insert srecode/filters srecode/find srecode/map srecode/ctxt semantic/tag-ls semantic/find srecode/compile semantic/util-modes semantic/util semantic semantic/tag semantic/lex semantic/fw srecode/args ede/speedbar ede/files ede ede/detect ede/base ede/auto ede/source eieio-speedbar eieio-custom cedet srecode/dictionary srecode/table eieio-base srecode mode-local informat texinfo tex-mode vc-dir grep rect gdb-mi bindat gud ffap cl-print ox-odt rng-loc rng-uri rng-parse rng-match rng-dt rng-util rng-pttrn nxml-parse nxml-ns nxml-enc xmltok nxml-util ox-latex ox-icalendar ox-html table ox-ascii ox-publish ox org-protocol org-mouse org-mobile org-agenda org-indent org-feed org-crypt org-capture org-attach org-id org-rmail org-mhe org-irc org-info org-gnus nnir gnus-sum gnus-group gnus-undo gnus-start gnus-cloud nnimap nnmail mail-source tls gnutls utf7 netrc nnoo parse-time gnus-spec gnus-int gnus-range gnus-win gnus nnheader org-docview org-bibtex bibtex org-bbdb org-w3m org-element avl-tree generator org org-macro org-footnote org-pcomplete org-list org-faces org-entities org-version ob-emacs-lisp ob ob-tangle org-src ob-ref ob-lob ob-table ob-keys ob-exp ob-comint ob-core ob-eval org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs autorevert filenotify doc-view jka-compr image-mode vc-bzr 
vc-src vc-sccs vc-svn vc-cvs vc-rcs dabbrev log-edit message sendmail rmc puny dired dired-loaddefs format-spec rfc822 mml mml-sec gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils mailheader pcvs-util bug-reference add-log sh-script make-mode autoload shell pcomplete pulse etags xref project epa-file epa derived epg sm-c-mode smie whitespace misearch multi-isearch eieio-opt speedbar sb-image ezimage dframe cl-extra help-fns radix-tree executable copyright lisp-mnt xscheme unsafep trace testcover shadow scheme re-builder profiler inf-lisp ielm gmm-utils ert pp find-func ewoc debug elp edebug cl-indent cus-edit cus-start cus-load wid-edit vc vc-dispatcher smerge-mode vc-git diff-mode filecache server time-date flymake-proc flymake compile comint ansi-color ring warnings noutline outline easy-mmode flyspell ispell checkdoc thingatpt help-mode load-dir elec-pair reveal autoinsert proof-site proof-autoloads cl pg-vars savehist minibuf-eldef disp-table compact-docstrings cl-seq inline kotl-autoloads advice info realgud-recursive-autoloads finder-inf url-auth package easymenu epg-config url-handlers url-parse auth-source eieio eieio-core cl-macs eieio-loaddefs password-cache json map url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite charscript charprop case-table epa-hook jka-cmpr-hook 
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote dbusbind inotify dynamic-setting system-font-setting font-render-setting move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 8 904625 146270)
 (symbols 24 56914 156)
 (miscs 20 15608 1993)
 (strings 16 269351 14086)
 (string-bytes 1 8339699)
 (vectors 12 109056)
 (vector-slots 4 3333709 279700)
 (floats 8 1341 1410)
 (intervals 28 57426 412)
 (buffers 536 153))

From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 13 17:05:51 2018
From: Lars Ingebrigtsen
To: Stefan Monnier
Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
Date: Fri, 13 Apr 2018 23:05:34 +0200
In-Reply-To: (Stefan Monnier's message of "Fri, 13 Apr 2018 16:55:26 -0400")
Message-ID: <871sfizujl.fsf@mouse.gnus.org>
Cc: 31149@debbugs.gnu.org

Stefan Monnier writes:

> As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> using something else because he's getting something with a `charset`
> property which I don't get here) because:

I'm also running under GNU/Linux -- it's the latest Debian (9, which
is... stretch?), but not with Gnome.  Instead I'm using xfce -- I guess
Gnome could get involved with the selection stuff somehow.

Another data point: If I select some HTML in Chromium,
(gui-get-selection nil 'text/html) returns nil.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 14 02:33:18 2018
Date: Sat, 14 Apr 2018 09:32:41 +0300
Message-Id: <83vacu47sm.fsf@gnu.org>
From: Eli Zaretskii
To: Stefan Monnier, Kenichi Handa
In-reply-to: (message from Stefan Monnier on Fri, 13 Apr 2018 16:55:26 -0400)
Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
Cc: larsi@gnus.org, 31149@debbugs.gnu.org

> From: Stefan Monnier
> Date: Fri, 13 Apr 2018 16:55:26 -0400
> Cc: Lars Ingebrigtsen
>
> (gui-get-selection nil 'text/html)
>
> returns utf-16 text when the primary selection is owned by Mozilla, but
> we decode it as latin-1 instead, so it looks like garbage.
>
> I don't know why we're getting utf-16.  Is that what standards say it
> should do?  If so, we should adjust our code (which currently knows
> nothing about the `text/html` target-type).
>
> As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> using something else because he's getting something with a `charset`
> property which I don't get here) because:
> - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
>   the property `foreign-selection` set to `STRING` when the actual
>   string type is not known (as opposed to COMPOUND-TEXT and
>   UTF8-STRING, basically).
> - in gui-get-selection we then have a mapping from `STRING` to
>   `iso-8859-1` (which is apparently the right thing for the official
>   `STRING` target-type in X11).
>
> I can't figure out if/where these kinds of things about the X11
> selection protocol is described, but at least in `xclip` they have
> a hack specifically for this case:
>
>     [...]
>     if (html != None && sel_type == html) {
>         /* if the buffer contains UCS-2 (UTF-16), convert to
>          * UTF-8.  Mozilla-based browsers do this for the
>          * text/html target.
>          */
>     [...]
>
> and according to the subsequent code it's not even always the
> same endianness.
>
> I don't know what is the difference between the `target-type` passed to
> x-get-selection-internal and the `foreign-selection` property we get on
> the returned string (they seem to be the same in my tests, except when
> the type is not one of the known ones, and where we then force
> `foreign-selection` to be `STRING`).

I hope Handa-san (CC'ed) could comment on this.

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 24 14:11:39 2018
Date: Tue, 24 Apr 2018 21:11:10 +0300
Message-Id: <83zi1sv5j5.fsf@gnu.org>
From: Eli Zaretskii
To: Kenichi Handa
In-reply-to: <83vacu47sm.fsf@gnu.org> (message from Eli Zaretskii on Sat, 14 Apr 2018 09:32:41 +0300)
Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
References: <83vacu47sm.fsf@gnu.org>
Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA

Ping!

> Date: Sat, 14 Apr 2018 09:32:41 +0300
> From: Eli Zaretskii
> Cc: larsi@gnus.org, 31149@debbugs.gnu.org
>
> > From: Stefan Monnier
> > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > Cc: Lars Ingebrigtsen
> >
> > (gui-get-selection nil 'text/html)
> >
> > returns utf-16 text when the primary selection is owned by Mozilla, but
> > we decode it as latin-1 instead, so it looks like garbage.
> >
> > I don't know why we're getting utf-16.  Is that what standards say it
> > should do?  If so, we should adjust our code (which currently knows
> > nothing about the `text/html` target-type).
> >
> > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > using something else because he's getting something with a `charset`
> > property which I don't get here) because:
> > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> >   the property `foreign-selection` set to `STRING` when the actual
> >   string type is not known (as opposed to COMPOUND-TEXT and
> >   UTF8-STRING, basically).
> > - in gui-get-selection we then have a mapping from `STRING` to
> >   `iso-8859-1` (which is apparently the right thing for the official
> >   `STRING` target-type in X11).
> >
> > I can't figure out if/where these kinds of things about the X11
> > selection protocol is described, but at least in `xclip` they have
> > a hack specifically for this case:
> >
> >     [...]
> >     if (html != None && sel_type == html) {
> >         /* if the buffer contains UCS-2 (UTF-16), convert to
> >          * UTF-8.  Mozilla-based browsers do this for the
> >          * text/html target.
> >          */
> >     [...]
> >
> > and according to the subsequent code it's not even always the
> > same endianness.
> >
> > I don't know what is the difference between the `target-type` passed to
> > x-get-selection-internal and the `foreign-selection` property we get on
> > the returned string (they seem to be the same in my tests, except when
> > the type is not one of the known ones, and where we then force
> > `foreign-selection` to be `STRING`).
>
> I hope Handa-san (CC'ed) could comment on this.
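
[Illustration, not part of the original messages.]  The xclip comment
quoted above describes converting Mozilla's UTF-16 `text/html` data to
UTF-8.  A minimal C sketch of that idea follows.  It is not the code
from xclip or from Emacs; the names `read_unit`, `guess_little_endian`
and `utf16_to_utf8` are hypothetical.  It assumes a byte-order mark
decides the endianness when present; without one it merely guesses, on
the assumption that the markup is mostly ASCII, so the guess can be
wrong for other text.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit code unit from p in the given byte order. */
static uint32_t read_unit(const unsigned char *p, int little_endian)
{
    return little_endian ? (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                         : ((uint32_t)p[0] << 8) | (uint32_t)p[1];
}

/* Hypothetical helper: return 1 for little-endian, 0 for big-endian.
   A BOM decides if present (*skip is then 2).  Otherwise assume mostly
   ASCII text and check which byte of each pair is zero more often;
   this is a heuristic and can be wrong for non-ASCII content. */
static int guess_little_endian(const unsigned char *buf, size_t len,
                               size_t *skip)
{
    size_t zero_even = 0, zero_odd = 0;

    *skip = 0;
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) { *skip = 2; return 1; }
        if (buf[0] == 0xFE && buf[1] == 0xFF) { *skip = 2; return 0; }
    }
    for (size_t i = 0; i + 1 < len; i += 2) {
        if (buf[i] == 0) zero_even++;
        if (buf[i + 1] == 0) zero_odd++;
    }
    /* Little-endian ASCII has its zero byte second in each pair. */
    return zero_odd >= zero_even;
}

/* Convert UTF-16 bytes in buf to UTF-8 in out.  Returns the number of
   bytes written, or (size_t)-1 if out is too small.  Unpaired
   surrogates are replaced by U+FFFD; a trailing odd byte is ignored. */
size_t utf16_to_utf8(const unsigned char *buf, size_t len,
                     char *out, size_t cap)
{
    size_t skip, n = 0;
    int le = guess_little_endian(buf, len, &skip);

    for (size_t i = skip; i + 1 < len; i += 2) {
        uint32_t cp = read_unit(buf + i, le);

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with a following low surrogate. */
            uint32_t lo = (i + 3 < len) ? read_unit(buf + i + 2, le) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;        /* stray low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need > cap)
            return (size_t)-1;

        if (need == 1) {
            out[n++] = (char)cp;
        } else if (need == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    return n;
}
```

A caller would pass the raw selection bytes and an output buffer of at
least twice the input length (each two input bytes produce at most
three output bytes, and each surrogate pair of four bytes produces
four).  Decoding the same bytes as iso-8859-1 instead would map every
zero byte to a NUL character, which matches the "garbage" described in
the report.
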

From debbugs-submit-bounces@debbugs.gnu.org Sat May 05 05:37:57 2018
Date: Sat, 05 May 2018 12:37:24 +0300
Message-Id: <83h8nmsasr.fsf@gnu.org>
From: Eli Zaretskii
To: Kenichi Handa
In-reply-to: <83zi1sv5j5.fsf@gnu.org> (message from Eli Zaretskii on Tue, 24 Apr 2018 21:11:10 +0300)
Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
References: <83vacu47sm.fsf@gnu.org> <83zi1sv5j5.fsf@gnu.org>
Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA

Ping! Ping!

> Date: Tue, 24 Apr 2018 21:11:10 +0300
> From: Eli Zaretskii
> Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA
>
> Ping!
>
> > Date: Sat, 14 Apr 2018 09:32:41 +0300
> > From: Eli Zaretskii
> > Cc: larsi@gnus.org, 31149@debbugs.gnu.org
> >
> > > From: Stefan Monnier
> > > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > > Cc: Lars Ingebrigtsen
> > >
> > > (gui-get-selection nil 'text/html)
> > >
> > > returns utf-16 text when the primary selection is owned by Mozilla, but
> > > we decode it as latin-1 instead, so it looks like garbage.
> > >
> > > I don't know why we're getting utf-16.  Is that what standards say it
> > > should do?  If so, we should adjust our code (which currently knows
> > > nothing about the `text/html` target-type).
> > >
> > > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > > using something else because he's getting something with a `charset`
> > > property which I don't get here) because:
> > > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> > >   the property `foreign-selection` set to `STRING` when the actual
> > >   string type is not known (as opposed to COMPOUND-TEXT and
> > >   UTF8-STRING, basically).
> > > - in gui-get-selection we then have a mapping from `STRING` to
> > >   `iso-8859-1` (which is apparently the right thing for the official
> > >   `STRING` target-type in X11).
> > >
> > > I can't figure out if/where these kinds of things about the X11
> > > selection protocol is described, but at least in `xclip` they have
> > > a hack specifically for this case:
> > >
> > >     [...]
> > >     if (html != None && sel_type == html) {
> > >         /* if the buffer contains UCS-2 (UTF-16), convert to
> > >          * UTF-8.  Mozilla-based browsers do this for the
> > >          * text/html target.
> > >          */
> > >     [...]
> > >
> > > and according to the subsequent code it's not even always the
> > > same endianness.
> > >
> > > I don't know what is the difference between the `target-type` passed to
> > > x-get-selection-internal and the `foreign-selection` property we get on
> > > the returned string (they seem to be the same in my tests, except when
> > > the type is not one of the known ones, and where we then force
> > > `foreign-selection` to be `STRING`).
> >
> > I hope Handa-san (CC'ed) could comment on this.
> >
> >

From debbugs-submit-bounces@debbugs.gnu.org Fri May 11 05:18:47 2018
Date: Fri, 11 May 2018 12:18:13 +0300
Message-Id: <83vabuo8iy.fsf@gnu.org>
From: Eli Zaretskii
To: Kenichi Handa
In-reply-to: <83h8nmsasr.fsf@gnu.org> (message from Eli Zaretskii on Sat, 05 May 2018 12:37:24 +0300)
Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
References: <83vacu47sm.fsf@gnu.org> <83zi1sv5j5.fsf@gnu.org> <83h8nmsasr.fsf@gnu.org>
Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA

Ping! Ping! Ping!

> Date: Sat, 05 May 2018 12:37:24 +0300
> From: Eli Zaretskii
> Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA
>
> Ping! Ping!
>
> > Date: Tue, 24 Apr 2018 21:11:10 +0300
> > From: Eli Zaretskii
> > Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA
> >
> > Ping!
> >
> > > Date: Sat, 14 Apr 2018 09:32:41 +0300
> > > From: Eli Zaretskii
> > > Cc: larsi@gnus.org, 31149@debbugs.gnu.org
> > >
> > > > From: Stefan Monnier
> > > > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > > > Cc: Lars Ingebrigtsen
> > > >
> > > > (gui-get-selection nil 'text/html)
> > > >
> > > > returns utf-16 text when the primary selection is owned by Mozilla, but
> > > > we decode it as latin-1 instead, so it looks like garbage.
> > > >
> > > > I don't know why we're getting utf-16.  Is that what standards say it
> > > > should do?  If so, we should adjust our code (which currently knows
> > > > nothing about the `text/html` target-type).
> > > >
> > > > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > > > using something else because he's getting something with a `charset`
> > > > property which I don't get here) because:
> > > > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> > > >   the property `foreign-selection` set to `STRING` when the actual
> > > >   string type is not known (as opposed to COMPOUND-TEXT and
> > > >   UTF8-STRING, basically).
> > > > - in gui-get-selection we then have a mapping from `STRING` to
> > > >   `iso-8859-1` (which is apparently the right thing for the official
> > > >   `STRING` target-type in X11).
> > > >
> > > > I can't figure out if/where these kinds of things about the X11
> > > > selection protocol is described, but at least in `xclip` they have
> > > > a hack specifically for this case:
> > > >
> > > >     [...]
> > > >     if (html != None && sel_type == html) {
> > > >         /* if the buffer contains UCS-2 (UTF-16), convert to
> > > >          * UTF-8.  Mozilla-based browsers do this for the
> > > >          * text/html target.
> > > >          */
> > > >     [...]
> > > >
> > > > and according to the subsequent code it's not even always the
> > > > same endianness.
> > > >
> > > > I don't know what is the difference between the `target-type` passed to
> > > > x-get-selection-internal and the `foreign-selection` property we get on
> > > > the returned string (they seem to be the same in my tests, except when
> > > > the type is not one of the known ones, and where we then force
> > > > `foreign-selection` to be `STRING`).
> > >
> > > I hope Handa-san (CC'ed) could comment on this.
> > From debbugs-submit-bounces@debbugs.gnu.org Sat May 19 04:51:07 2018 Received: (at 31149) by debbugs.gnu.org; 19 May 2018 08:51:07 +0000 Received: from localhost ([127.0.0.1]:40482 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fJxZy-0007Vc-0c for submit@debbugs.gnu.org; Sat, 19 May 2018 04:51:07 -0400 Received: from eggs.gnu.org ([208.118.235.92]:50004) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fJxZt-0007V6-FN for 31149@debbugs.gnu.org; Sat, 19 May 2018 04:51:04 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fJxZl-0006Bf-0r for 31149@debbugs.gnu.org; Sat, 19 May 2018 04:50:56 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_05 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:42936) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fJxZX-00068b-Bx; Sat, 19 May 2018 04:50:39 -0400 Received: from [176.228.60.248] (port=1950 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1fJxZW-0007Yr-MB; Sat, 19 May 2018 04:50:39 -0400 Date: Sat, 19 May 2018 11:50:37 +0300 Message-Id: <83po1sghb6.fsf@gnu.org> From: Eli Zaretskii To: Kenichi Handa In-reply-to: <83vabuo8iy.fsf@gnu.org> (message from Eli Zaretskii on Fri, 11 May 2018 12:18:13 +0300) Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <83vacu47sm.fsf@gnu.org> <83zi1sv5j5.fsf@gnu.org> <83h8nmsasr.fsf@gnu.org> <83vabuo8iy.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 31149 Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) Ping! Ping! Ping! Ping! > Date: Fri, 11 May 2018 12:18:13 +0300 > From: Eli Zaretskii > Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA > > Ping! Ping! Ping! > > > Date: Sat, 05 May 2018 12:37:24 +0300 > > From: Eli Zaretskii > > Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA > > > > Ping! Ping! > > > > > Date: Tue, 24 Apr 2018 21:11:10 +0300 > > > From: Eli Zaretskii > > > Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA > > > > > > Ping! > > > > > > > Date: Sat, 14 Apr 2018 09:32:41 +0300 > > > > From: Eli Zaretskii > > > > Cc: larsi@gnus.org, 31149@debbugs.gnu.org > > > > > > > > > From: Stefan Monnier > > > > > Date: Fri, 13 Apr 2018 16:55:26 -0400 > > > > > Cc: Lars Ingebrigtsen > > > > > > > > > > (gui-get-selection nil 'text/html) > > > > > > > > > > returns utf-16 text when the primary selection is owned by Mozilla, but > > > > > we decode it as latin-1 instead, so it looks like garbage. > > > > > > > > > > I don't know why we're getting utf-16. Is that what standards say it > > > > > should do? If so, we should adjust our code (which currently knows > > > > > nothing about the `text/html` target-type). > > > > > > > > > > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be > > > > > using something else because he's getting something with a `charset` > > > > > property which I don't get here) because: > > > > > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with > > > > > the property `foreign-selection` set to `STRING` when the actual > > > > > string type is not known (as opposed to COMPOUND-TEXT and > > > > > UTF8-STRING, basically). 
> > > > > - in gui-get-selection we then have a mapping from `STRING` to > > > > > `iso-8859-1` (which is apparently the right thing for the official > > > > > `STRING` target-type in X11). > > > > > > > > > > I can't figure out if/where these kinds of things about the X11 > > > > > selection protocol is described, but at least in `xclip` they have > > > > > a hack specifically for this case: > > > > > > > > > > [...] > > > > > if (html != None && sel_type == html) { > > > > > /* if the buffer contains UCS-2 (UTF-16), convert to > > > > > * UTF-8. Mozilla-based browsers do this for the > > > > > * text/html target. > > > > > */ > > > > > [...] > > > > > > > > > > and according to the subsequent code it's not even always the > > > > > same endianness. > > > > > > > > > > I don't know what is the difference between the `target-type` passed to > > > > > x-get-selection-internal and the `foreign-selection` property we get on > > > > > the returned string (they seem to be the same in my tests, except when > > > > > the type is not one of the known ones, and where we then force > > > > > `foreign-selection` to be `STRING`). > > > > > > > > I hope Handa-san (CC'ed) could comment on this. 
> > > > > > > From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 04:44:56 2019 Received: (at 31149) by debbugs.gnu.org; 29 Sep 2019 08:44:56 +0000 Received: from localhost ([127.0.0.1]:52216 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEUp6-0001pt-32 for submit@debbugs.gnu.org; Sun, 29 Sep 2019 04:44:56 -0400 Received: from quimby.gnus.org ([80.91.231.51]:48908) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEUp4-0001pj-3z for 31149@debbugs.gnu.org; Sun, 29 Sep 2019 04:44:55 -0400 Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=marnie) by quimby.gnus.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1iEUoy-0006ej-Em; Sun, 29 Sep 2019 10:44:50 +0200 From: Lars Ingebrigtsen To: Stefan Monnier Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: Date: Sun, 29 Sep 2019 10:44:48 +0200 In-Reply-To: (Stefan Monnier's message of "Fri, 13 Apr 2018 16:55:26 -0400") Message-ID: <87h84vqynz.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Stefan Monnier writes: > (gui-get-selection nil 'text/html) > > returns utf-16 text when the primary selection is owned by Mozilla, but > we decode it as latin-1 instead, so it looks like garbage. 
Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Stefan Monnier writes: > (gui-get-selection nil 'text/html) > > returns utf-16 text when the primary selection is owned by Mozilla, but > we decode it as latin-1 instead, so it looks like garbage. This is still the case on the trunk: #("ÿþM^@e^@r^@g^@e^@d^@" 0 14 (foreign-selection STRING charset iso-8859-1)) [...] > I can't figure out if/where these kinds of things about the X11 > selection protocol is described, but at least in `xclip` they have > a hack specifically for this case: > > [...] > if (html != None && sel_type == html) { > /* if the buffer contains UCS-2 (UTF-16), convert to > * UTF-8. Mozilla-based browsers do this for the > * text/html target. > */ > [...] > > and according to the subsequent code it's not even always the > same endianness. I think it would make sense for us to do the same here. It should be easy enough for us to detect that the string is utf-16, I think? The data has a BOM and everything... -- (domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 05:32:12 2019 Received: (at 31149) by debbugs.gnu.org; 29 Sep 2019 09:32:12 +0000 Received: from localhost ([127.0.0.1]:52240 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEVYq-00050N-4m for submit@debbugs.gnu.org; Sun, 29 Sep 2019 05:32:12 -0400 Received: from eggs.gnu.org ([209.51.188.92]:53694) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEVYo-000508-Hz for 31149@debbugs.gnu.org; Sun, 29 Sep 2019 05:32:10 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:46111) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1iEVYi-0007KY-VZ; Sun, 29 Sep 2019 05:32:05 -0400 Received: from [176.228.60.248] (port=2601 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1iEVYi-00038j-6w; Sun, 29 Sep 2019 05:32:04 -0400 Date: Sun, 29 Sep 2019 12:31:58 +0300 Message-Id: <83o8z3fnxt.fsf@gnu.org> From: Eli Zaretskii To: Lars Ingebrigtsen In-reply-to: <87h84vqynz.fsf@gnus.org> (message from Lars Ingebrigtsen on Sun, 29 Sep 2019 10:44:48 +0200) Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87h84vqynz.fsf@gnus.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Lars Ingebrigtsen > Date: Sun, 29 Sep 2019 10:44:48 +0200 > Cc: 31149@debbugs.gnu.org > > > if (html != None && sel_type == html) { > > /* if the buffer contains UCS-2 (UTF-16), convert to > > * UTF-8. 
Mozilla-based browsers do this for the > > * text/html target. > > */ > > [...] > > > > and according to the subsequent code it's not even always the > > same endianness. > > I think it would make sense for us to do the same here. It should be > easy enough for us to detect that the string is utf-16, I think? I think you want to use auto-coding-regexp-alist-lookup. > The data has a BOM Does it? It doesn't have to, at least not in principle. From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 05:37:35 2019 Received: (at 31149) by debbugs.gnu.org; 29 Sep 2019 09:37:35 +0000 Received: from localhost ([127.0.0.1]:52250 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEVe3-00058S-3C for submit@debbugs.gnu.org; Sun, 29 Sep 2019 05:37:35 -0400 Received: from quimby.gnus.org ([80.91.231.51]:49922) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEVe1-00058J-9Z for 31149@debbugs.gnu.org; Sun, 29 Sep 2019 05:37:33 -0400 Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=marnie) by quimby.gnus.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1iEVdx-00076h-0R; Sun, 29 Sep 2019 11:37:31 +0200 From: Lars Ingebrigtsen To: Eli Zaretskii Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87h84vqynz.fsf@gnus.org> <83o8z3fnxt.fsf@gnu.org> Date: Sun, 29 Sep 2019 11:37:28 +0200 In-Reply-To: <83o8z3fnxt.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 29 Sep 2019 12:31:58 +0300") Message-ID: <87y2y7phnr.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. 
Content preview: Eli Zaretskii writes: >> I think it would make sense for us to do the same here. It should be >> easy enough for us to detect that the string is utf-16, I think? > > I think you want to use auto-coding-regexp-alist-lookup. Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Eli Zaretskii writes: >> I think it would make sense for us to do the same here. It should be >> easy enough for us to detect that the string is utf-16, I think? > > I think you want to use auto-coding-regexp-alist-lookup. Ah, thanks. So should I go ahead and make this change? It looks pretty trivial, but I guess there could be interop problems with code that assumes the current odd behaviour. >> The data has a BOM > > Does it? It doesn't have to, at least not in principle. It doesn't have to, but it always does on the systems I've tested on. But I guess it doesn't matter, `auto-coding-regexp-alist-lookup' will figure it out in any case... -- (domestic pets only, the antidote for overdose, milk.) 
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 05:52:33 2019 Received: (at 31149) by debbugs.gnu.org; 29 Sep 2019 09:52:33 +0000 Received: from localhost ([127.0.0.1]:52256 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEVsX-0005WH-GR for submit@debbugs.gnu.org; Sun, 29 Sep 2019 05:52:33 -0400 Received: from eggs.gnu.org ([209.51.188.92]:55002) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEVsV-0005W2-LU for 31149@debbugs.gnu.org; Sun, 29 Sep 2019 05:52:32 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:46195) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1iEVsQ-0002TB-7i; Sun, 29 Sep 2019 05:52:26 -0400 Received: from [176.228.60.248] (port=3837 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1iEVsN-0006ia-LV; Sun, 29 Sep 2019 05:52:24 -0400 Date: Sun, 29 Sep 2019 12:52:19 +0300 Message-Id: <83lfu7fmzw.fsf@gnu.org> From: Eli Zaretskii To: Lars Ingebrigtsen In-reply-to: <87y2y7phnr.fsf@gnus.org> (message from Lars Ingebrigtsen on Sun, 29 Sep 2019 11:37:28 +0200) Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87h84vqynz.fsf@gnus.org> <83o8z3fnxt.fsf@gnu.org> <87y2y7phnr.fsf@gnus.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Lars Ingebrigtsen > Cc: monnier@IRO.UMontreal.CA, 31149@debbugs.gnu.org > Date: Sun, 29 Sep 2019 11:37:28 +0200 > > > I think you want to use 
auto-coding-regexp-alist-lookup. > > Ah, thanks. So should I go ahead and make this change? It looks pretty > trivial, but I guess there could be interop problems with code that > assumes the current odd behaviour. What odd behavior is that? I understood that we just display binary garbage, something that no one should miss. From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 06:02:51 2019 Received: (at 31149) by debbugs.gnu.org; 29 Sep 2019 10:02:51 +0000 Received: from localhost ([127.0.0.1]:52262 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEW2V-0005oB-JX for submit@debbugs.gnu.org; Sun, 29 Sep 2019 06:02:51 -0400 Received: from quimby.gnus.org ([80.91.231.51]:50540) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEW2T-0005o2-1X for 31149@debbugs.gnu.org; Sun, 29 Sep 2019 06:02:49 -0400 Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=marnie) by quimby.gnus.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1iEW2N-0007LY-08; Sun, 29 Sep 2019 12:02:45 +0200 From: Lars Ingebrigtsen To: Eli Zaretskii Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87h84vqynz.fsf@gnus.org> <83o8z3fnxt.fsf@gnu.org> <87y2y7phnr.fsf@gnus.org> <83lfu7fmzw.fsf@gnu.org> Date: Sun, 29 Sep 2019 12:02:42 +0200 In-Reply-To: <83lfu7fmzw.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 29 Sep 2019 12:52:19 +0300") Message-ID: <87tv8vpghp.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Eli Zaretskii writes: >> Ah, thanks. 
So should I go ahead and make this change? It looks pretty >> trivial, but I guess there could be interop problems with code that >> assumes the current odd behaviour. > > What odd beha [...] Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Eli Zaretskii writes: >> Ah, thanks. So should I go ahead and make this change? It looks pretty >> trivial, but I guess there could be interop problems with code that >> assumes the current odd behaviour. > > What odd behavior is that? I understood that we just display binary > garbage, something that no one should miss. We don't have any commands to yank HTML, so we don't display anything, but I've got code like the following in one of my out-of-tree packages (which will fail after the fix). I'm fine with that, but I have no idea how many other people would be impacted. (defun ewp-yank-html () [...] (let ((data (loop for type in '(PRIMARY CLIPBOARD) for data = (x-get-selection-internal type 'text/html) [...] ;; Somehow the selection is UTF-16 when selecting text in ;; Firefox. (decode-coding-string data 'utf-16-le) -- (domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 06:21:27 2019 Received: (at 31149) by debbugs.gnu.org; 29 Sep 2019 10:21:27 +0000 Received: from localhost ([127.0.0.1]:52280 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEWKV-0006Gf-8H for submit@debbugs.gnu.org; Sun, 29 Sep 2019 06:21:27 -0400 Received: from eggs.gnu.org ([209.51.188.92]:58134) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEWKT-0006GM-6y for 31149@debbugs.gnu.org; Sun, 29 Sep 2019 06:21:25 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:46368) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1iEWKN-00016h-Rc; Sun, 29 Sep 2019 06:21:19 -0400 Received: from [176.228.60.248] (port=1623 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1iEWKN-0000Dz-9m; Sun, 29 Sep 2019 06:21:19 -0400 Date: Sun, 29 Sep 2019 13:21:13 +0300 Message-Id: <83k19rflnq.fsf@gnu.org> From: Eli Zaretskii To: Lars Ingebrigtsen In-reply-to: <87tv8vpghp.fsf@gnus.org> (message from Lars Ingebrigtsen on Sun, 29 Sep 2019 12:02:42 +0200) Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87h84vqynz.fsf@gnus.org> <83o8z3fnxt.fsf@gnu.org> <87y2y7phnr.fsf@gnus.org> <83lfu7fmzw.fsf@gnu.org> <87tv8vpghp.fsf@gnus.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Lars Ingebrigtsen > Cc: monnier@IRO.UMontreal.CA, 31149@debbugs.gnu.org > Date: Sun, 29 Sep 2019 12:02:42 +0200 > > 
(defun ewp-yank-html () > > [...] > > (let ((data (loop for type in '(PRIMARY CLIPBOARD) > for data = (x-get-selection-internal type 'text/html) > > [...] > > ;; Somehow the selection is UTF-16 when selecting text in > ;; Firefox. > (decode-coding-string data 'utf-16-le) And you do that for _every_ text you get from the clipboard? Or do you have some means of detecting those selected in Firefox? In general, if the text is already decoded, I don't expect this extra decoding to do anything bad, does it? From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 07:48:41 2019 Received: (at 31149) by debbugs.gnu.org; 29 Sep 2019 11:48:41 +0000 Received: from localhost ([127.0.0.1]:52454 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEXgt-0002EE-3u for submit@debbugs.gnu.org; Sun, 29 Sep 2019 07:48:41 -0400 Received: from quimby.gnus.org ([80.91.231.51]:52458) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEXgq-0002E5-Qf for 31149@debbugs.gnu.org; Sun, 29 Sep 2019 07:48:37 -0400 Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=marnie) by quimby.gnus.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1iEXgl-0008Uc-Vr; Sun, 29 Sep 2019 13:48:34 +0200 From: Lars Ingebrigtsen To: Eli Zaretskii Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87h84vqynz.fsf@gnus.org> <83o8z3fnxt.fsf@gnu.org> <87y2y7phnr.fsf@gnus.org> <83lfu7fmzw.fsf@gnu.org> <87tv8vpghp.fsf@gnus.org> <83k19rflnq.fsf@gnu.org> Date: Sun, 29 Sep 2019 13:48:31 +0200 In-Reply-To: <83k19rflnq.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 29 Sep 2019 13:21:13 +0300") Message-ID: <87blv3nx0w.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", 
has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Eli Zaretskii writes: >> ;; Somehow the selection is UTF-16 when selecting text in >> ;; Firefox. >> (decode-coding-string data 'utf-16-le) > > And you do that for _every_ text you get from the clipboard? Or do > you have [...] Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Eli Zaretskii writes: >> ;; Somehow the selection is UTF-16 when selecting text in >> ;; Firefox. >> (decode-coding-string data 'utf-16-le) > > And you do that for _every_ text you get from the clipboard? Or do > you have some means of detecting those selected in Firefox? In this package, I'm assuming all the text comes from Firefox because that's what I use. :-) > In general, if the text is already decoded, I don't expect this extra > decoding to do anything bad, does it? That's true, so perhaps this isn't a problem: (decode-coding-string "fóo" 'utf-16-le) => "fóo" -- (domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 07 20:07:53 2021 Received: (at 31149) by debbugs.gnu.org; 8 Nov 2021 01:07:53 +0000 Received: from localhost ([127.0.0.1]:55359 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mjt8b-0007mE-B9 for submit@debbugs.gnu.org; Sun, 07 Nov 2021 20:07:53 -0500 Received: from quimby.gnus.org ([95.216.78.240]:40800) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mjt8R-0007lo-4X for 31149@debbugs.gnu.org; Sun, 07 Nov 2021 20:07:52 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date: References:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=7Xr4h7slCLeSriXubvcY5lU5rm46SdKS7QX+w4Gf+nw=; b=smlFf2FY10wvqjAx4pMhOCRBQ0 GM2MbvZua9bY6BVimqOdhPKzjUqr3ZTxF5RIf30sFz6FuvQq6cdk/t7+NKDICagpCQWIUvxKS7LPx rx32HeaU4blrNcZF5BctUCEVG8/6WSFN54E2Zak+pY15Efcj0KNGFk/tH8NNhLmEBYdA=; Received: from [84.212.220.105] (helo=elva) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mjt8H-0005Q7-Dg; Mon, 08 Nov 2021 02:07:35 +0100 From: Lars Ingebrigtsen To: Stefan Monnier Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: X-Now-Playing: Dntel's _Away_: "No Common" Date: Mon, 08 Nov 2021 02:07:32 +0100 In-Reply-To: (Stefan Monnier's message of "Fri, 13 Apr 2018 16:55:26 -0400") Message-ID: <87ee7rcv57.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. 
The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Stefan Monnier writes: > (gui-get-selection nil 'text/html) > > returns utf-16 text when the primary selection is owned by Mozilla, but > we decode it as latin-1 instead, so it looks like garbage. Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Stefan Monnier writes: > (gui-get-selection nil 'text/html) > > returns utf-16 text when the primary selection is owned by Mozilla, but > we decode it as latin-1 instead, so it looks like garbage. This should now be fixed on the trunk, and hopefully I didn't regress anything, but I have not regressed anything. (I've only tested on Debian and Macos.) Let me know whether I broke something. -- (domestic pets only, the antidote for overdose, milk.) 
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 07 20:07:54 2021 Received: (at control) by debbugs.gnu.org; 8 Nov 2021 01:07:55 +0000 Received: from localhost ([127.0.0.1]:55361 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mjt8c-0007mR-N0 for submit@debbugs.gnu.org; Sun, 07 Nov 2021 20:07:54 -0500 Received: from quimby.gnus.org ([95.216.78.240]:40814) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mjt8Y-0007lz-Bw for control@debbugs.gnu.org; Sun, 07 Nov 2021 20:07:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Subject:From:To:Message-Id:Date:Sender:Reply-To:Cc: MIME-Version:Content-Type:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=VW8Latkz3epOlhWyUIhWYuzMOOsnITlsr0j1Wvh05wY=; b=k5OxCUHKocc/adZQbfqrcpCPui bEq44SHJzR+E8PU2xA0QwZN9Er/Kequazc2vFjHMClBjHrLwwikP3aR/0f8vU0b/+Zp2u5Qc24M+e DWWp6OZ+oWiqjJ1PhB9haEwRkm13PMrE+NpKfDMeRaHxoYZw5k8KK7DR0I22UZRgnzyo=; Received: from [84.212.220.105] (helo=elva) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mjt8Q-0005QE-DE for control@debbugs.gnu.org; Mon, 08 Nov 2021 02:07:44 +0100 Date: Mon, 08 Nov 2021 02:07:41 +0100 Message-Id: <87cznbcv4y.fsf@gnus.org> To: control@debbugs.gnu.org From: Lars Ingebrigtsen Subject: control message for bug #31149 X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. 
Content preview: close 31149 29.1 quit Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) close 31149 29.1 quit From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 07 20:12:27 2021 Received: (at 31149) by debbugs.gnu.org; 8 Nov 2021 01:12:27 +0000 Received: from localhost ([127.0.0.1]:55377 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mjtD1-0007u1-Dd for submit@debbugs.gnu.org; Sun, 07 Nov 2021 20:12:27 -0500 Received: from quimby.gnus.org ([95.216.78.240]:40868) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mjtD0-0007to-2y for 31149@debbugs.gnu.org; Sun, 07 Nov 2021 20:12:26 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date: References:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=ag6bI7+F6YQK5taOejZeyjcFK/BGZVr75C1N59R0N3I=; b=VPPzzVT6S1cC1cApzPVFrEHZ7V IMvfK+ruH9rQyKhlCuFLo8s2RjcyszEZJiFJqJ9xIcppc7pwBuAf3sUTbFRbz/kIn49QP/STAuKPW BdZqJsH22xNfLkt5Qua+/WVl3SYg9X8rlrRzL+DiqjA+HZ0/7dCf9BgYE2gir0jgLpjk=; Received: from [84.212.220.105] (helo=elva) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) 
id 1mjtCq-0005Sb-OH; Mon, 08 Nov 2021 02:12:19 +0100 From: Lars Ingebrigtsen To: Stefan Monnier Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87ee7rcv57.fsf@gnus.org> X-Now-Playing: Dntel's _Away_: "Bridge" Date: Mon, 08 Nov 2021 02:12:16 +0100 In-Reply-To: <87ee7rcv57.fsf@gnus.org> (Lars Ingebrigtsen's message of "Mon, 08 Nov 2021 02:07:32 +0100") Message-ID: <874k8ncuxb.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Lars Ingebrigtsen writes: > This should now be fixed on the trunk, and hopefully I didn't regress > anything, but I have not regressed anything. Urr. I think it's getting a bit late in the day. Substitute with sentences that makes sense instead. Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Lars Ingebrigtsen writes: > This should now be fixed on the trunk, and hopefully I didn't regress > anything, but I have not regressed anything. Urr. I think it's getting a bit late in the day. Substitute with sentences that makes sense instead. 
-- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no From unknown Sat Jun 21 10:33:59 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug No longer marked as fixed in versions 29.1 and reopened. Date: Tue, 09 Nov 2021 03:44:02 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug No longer marked as fixed in versions 29.1 and reopened. thanks # This fakemail brought to you by your local debbugs # administrator From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 08 22:44:35 2021 Received: (at 31149) by debbugs.gnu.org; 9 Nov 2021 03:44:35 +0000 Received: from localhost ([127.0.0.1]:60173 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mkI3n-0001q1-F3 for submit@debbugs.gnu.org; Mon, 08 Nov 2021 22:44:35 -0500 Received: from quimby.gnus.org ([95.216.78.240]:54674) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mkI3l-0001pj-9t for 31149@debbugs.gnu.org; Mon, 08 Nov 2021 22:44:34 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date: References:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=Dgt7987iXn2uft54x5UhvXtvLLeqoN1btlihqqd9afc=; b=fwW2oDFCA+OUcC+8Xw3rm42hhI MkpHBWuh0yYBvz/PqS4lLgYanCbyk5SdmELyv6iL/g00uid+NQNsPViQJwkNNFzxX+KrQT64zsO6r BBuFiaXjrMW8cYp8KhwBy5xcp0+cp7Kcq0ILnL+rp0G0zKooq0bQUNLI3rSr5W4zfqX0=; Received: from [84.212.220.105] (helo=xo) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mkI3X-0008Gl-Ev; Tue, 09 Nov 2021 04:44:26 +0100 From: Lars 
Ingebrigtsen To: Stefan Monnier Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87ee7rcv57.fsf@gnus.org> X-Now-Playing: IDM Theft Able's _Rogue Pulse: Gravity Collapse (10)_: "Exhalations in a Flashlight Beam" Date: Tue, 09 Nov 2021 04:44:17 +0100 In-Reply-To: <87ee7rcv57.fsf@gnus.org> (Lars Ingebrigtsen's message of "Mon, 08 Nov 2021 02:07:32 +0100") Message-ID: <87sfw6t2lq.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Lars Ingebrigtsen writes: > Stefan Monnier writes: > >> (gui-get-selection nil 'text/html) >> >> returns utf-16 text when the primary selection is owned by Mozilla, but >> we decode it as latin-1 ins [...] Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Lars Ingebrigtsen writes: > Stefan Monnier writes: > >> (gui-get-selection nil 'text/html) >> >> returns utf-16 text when the primary selection is owned by Mozilla, but >> we decode it as latin-1 instead, so it looks like garbage. 
>
> This should now be fixed on the trunk, and hopefully I didn't regress
> anything, but I have not regressed anything. (I've only tested on
> Debian and Macos.) Let me know whether I broke something.

It broke selection on Windows, so I've reverted and reopened this bug
report.

From the discussion on emacs-devel:

> > > Does reverting 5e66c75e0 fix the issue?
> >
> > I've reverted it now and will have to reexamine the problem before
> > attempting a new fix.
>
> Whatever you do, don't decode the selection text on MS-Windows. It is
> already decoded (see w32-get-clipboard-data), and
> selection-coding-system is UTF-16 on MS-Windows, so decoding a decoded
> string by that will not do anything useful ;-)
>
> The existing code carefully side-steps the decoding by looking at the
> foreign-selection property on the string, which the Windows code
> doesn't set. But your changes removed that test, and thus caused the
> clipboard text to be decoded on Windows.

So I'll be re-exploring this issue once I get my Windows VMs back up and
can do some testing on Windows, too.

-- 
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 10 23:25:09 2021 Received: (at 31149) by debbugs.gnu.org; 11 Nov 2021 04:25:09 +0000 Received: from localhost ([127.0.0.1]:39762 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ml1e8-00048t-W4 for submit@debbugs.gnu.org; Wed, 10 Nov 2021 23:25:09 -0500 Received: from quimby.gnus.org ([95.216.78.240]:49504) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ml1e6-00048L-Qx for 31149@debbugs.gnu.org; Wed, 10 Nov 2021 23:25:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date: References:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=NuaApPPiu5eprMyEgW3w6arjErO62VeHXLaaSQsEMYU=; b=ZaqZHdVKjIeoRyj2FJ7rS5vUmT 0PoklynRJGlejm3cSWNAHtwDaqNyThvNdA4I/agqHL0/qYuFyR9+3Y/ChhECRO4HS3a/Yh17GaiVt /gs+nhfJwV/IvAAdeXMJUIRJogE+u+VIEa+ctQOtGfRXq9ie4bsQSpLN95HISYZQaPTM=; Received: from [84.212.220.105] (helo=xo) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1ml1dx-0002Cl-2J; Thu, 11 Nov 2021 05:24:59 +0100 From: Lars Ingebrigtsen To: Stefan Monnier Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text References: <87ee7rcv57.fsf@gnus.org> <87sfw6t2lq.fsf@gnus.org> X-Now-Playing: Lolina's _Fast Fashion_: "Looking for a Charger but Only Works on Batteries" Date: Thu, 11 Nov 2021 05:24:52 +0100 In-Reply-To: <87sfw6t2lq.fsf@gnus.org> (Lars Ingebrigtsen's message of "Tue, 09 Nov 2021 04:44:17 +0100") Message-ID: <87tugjb9pn.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain 
X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Lars Ingebrigtsen writes: > It broke selection on Windows, so I've reverted and reopened this bug > report. I've now reapplied a tweaked version of the patch, and checked that it doesn't break selection on Windows. Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 31149 Cc: 31149@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Lars Ingebrigtsen writes: > It broke selection on Windows, so I've reverted and reopened this bug > report. I've now reapplied a tweaked version of the patch, and checked that it doesn't break selection on Windows. -- (domestic pets only, the antidote for overdose, milk.) 
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 10 23:25:13 2021 Received: (at control) by debbugs.gnu.org; 11 Nov 2021 04:25:13 +0000 Received: from localhost ([127.0.0.1]:39765 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ml1eD-00049D-7E for submit@debbugs.gnu.org; Wed, 10 Nov 2021 23:25:13 -0500 Received: from quimby.gnus.org ([95.216.78.240]:49520) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ml1eB-00048l-RD for control@debbugs.gnu.org; Wed, 10 Nov 2021 23:25:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Subject:From:To:Message-Id:Date:Sender:Reply-To:Cc: MIME-Version:Content-Type:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=VW8Latkz3epOlhWyUIhWYuzMOOsnITlsr0j1Wvh05wY=; b=YplbPbZHlFE5znKFQtgGHm9aQB rrsSGXVpZ/8UrPa3raKpONi7zrs7EgrFTlvKNjDVw3ixHHFUvWJB93y4NvWvZcq7I4CmvW0W8t4xD B+Vdyg5MPN79Oi44LlCEAlkCgORgZicih+II5vZ+LhTqIFaJZ0rk8WFM+AO4wJnHUyAM=; Received: from [84.212.220.105] (helo=xo) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1ml1e4-0002Cv-1L for control@debbugs.gnu.org; Thu, 11 Nov 2021 05:25:06 +0100 Date: Thu, 11 Nov 2021 05:25:03 +0100 Message-Id: <87sfw3b9pc.fsf@gnus.org> To: control@debbugs.gnu.org From: Lars Ingebrigtsen Subject: control message for bug #31149 X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. 
Content preview: close 31149 29.1 quit Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) close 31149 29.1 quit From unknown Sat Jun 21 10:33:59 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 09 Dec 2021 12:24:06 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator