GNU bug report logs - #31149
27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text


Package: emacs

Reported by: Stefan Monnier <monnier <at> IRO.UMontreal.CA>

Date: Fri, 13 Apr 2018 20:56:02 UTC

Severity: normal

Found in version 27.0.50

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>, Kenichi Handa <handa <at> gnu.org>
Cc: larsi <at> gnus.org, 31149 <at> debbugs.gnu.org
Subject: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
Date: Sat, 14 Apr 2018 09:32:41 +0300
> From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> Date: Fri, 13 Apr 2018 16:55:26 -0400
> Cc: Lars Ingebrigtsen <larsi <at> gnus.org>
> 
> (gui-get-selection nil 'text/html)
> 
> returns utf-16 text when the primary selection is owned by Mozilla, but
> we decode it as latin-1 instead, so it looks like garbage.
> 
> I don't know why we're getting utf-16.  Is that what standards say it
> should do?  If so, we should adjust our code (which currently knows
> nothing about the `text/html` target-type).
> 
> As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> using something else because he's getting something with a `charset`
> property which I don't get here) because:
> - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
>   the property `foreign-selection` set to `STRING` when the actual
>   string type is not known (as opposed to COMPOUND-TEXT and
>   UTF8-STRING, basically).
> - in gui-get-selection we then have a mapping from `STRING` to
>   `iso-8859-1` (which is apparently the right thing for the official
>   `STRING` target-type in X11).
> 
> I can't figure out if/where these kinds of things about the X11
> selection protocol is described, but at least in `xclip` they have
> a hack specifically for this case:
> 
>     [...]
>     if (html != None && sel_type == html) {
> 	/* if the buffer contains UCS-2 (UTF-16), convert to
> 	 * UTF-8.  Mozilla-based browsers do this for the
> 	 * text/html target.
> 	 */
>     [...]
> 
> and according to the subsequent code it's not even always the
> same endianness.
> 
> I don't know what is the difference between the `target-type` passed to
> x-get-selection-internal and the `foreign-selection` property we get on
> the returned string (they seem to be the same in my tests, except when
> the type is not one of the known ones, and where we then force
> `foreign-selection` to be `STRING`).

I hope Handa-san (CC'ed) can comment on this.
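
For illustration only, the conversion that the quoted xclip comment
describes (UTF-16 to UTF-8, with either endianness) might be sketched
as below.  This is neither the xclip code nor anything in Emacs: the
function name utf16_to_utf8 is hypothetical, and treating BOM-less
input as little-endian is an assumption made just for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit code unit at P in the given byte order.  */
static uint32_t
read16 (const unsigned char *p, int big_endian)
{
  return big_endian ? (uint32_t) (p[0] << 8 | p[1])
                    : (uint32_t) (p[1] << 8 | p[0]);
}

/* Hypothetical helper: convert the UTF-16 buffer IN (IN_LEN bytes) to
   UTF-8 in OUT (capacity OUT_CAP bytes).  A leading BOM selects the
   byte order and is dropped; without a BOM, little-endian is assumed
   (an assumption of this sketch).  A trailing odd byte is ignored.
   Returns the number of bytes written, or (size_t) -1 if OUT is too
   small.  */
size_t
utf16_to_utf8 (const unsigned char *in, size_t in_len,
               unsigned char *out, size_t out_cap)
{
  int big_endian = 0;
  size_t i = 0, o = 0;

  assert (in != NULL && out != NULL);

  if (in_len >= 2)
    {
      if (in[0] == 0xFE && in[1] == 0xFF)
        { big_endian = 1; i = 2; }
      else if (in[0] == 0xFF && in[1] == 0xFE)
        i = 2;
    }

  while (i + 1 < in_len)
    {
      unsigned char buf[4];
      size_t n, k;
      uint32_t cp = read16 (in + i, big_endian);
      i += 2;

      /* Combine a high surrogate with a following low surrogate.  */
      if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in_len)
        {
          uint32_t lo = read16 (in + i, big_endian);
          if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
              cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
              i += 2;
            }
        }
      /* An unpaired surrogate becomes U+FFFD.  */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        cp = 0xFFFD;

      if (cp < 0x80)
        { buf[0] = (unsigned char) cp; n = 1; }
      else if (cp < 0x800)
        {
          buf[0] = (unsigned char) (0xC0 | (cp >> 6));
          buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
          n = 2;
        }
      else if (cp < 0x10000)
        {
          buf[0] = (unsigned char) (0xE0 | (cp >> 12));
          buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
          n = 3;
        }
      else
        {
          buf[0] = (unsigned char) (0xF0 | (cp >> 18));
          buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
          buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
          n = 4;
        }

      if (o + n > out_cap)
        return (size_t) -1;
      for (k = 0; k < n; k++)
        out[o++] = buf[k];
    }
  return o;
}
```

Decoding the same bytes as latin-1 instead maps every byte, including
the interleaved NULs and the BOM, to its own character, which would
match the "garbage" described in the report.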




This bug report was last modified 3 years and 193 days ago.


