GNU bug report logs - #20499
[PROPOSED PATCH] C-x 8 shorthands for curved quotes, Euro, etc.

Package: emacs;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Mon, 4 May 2015 01:15:03 UTC

Severity: wishlist

Tags: patch

Merged with 16082

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


Message #108 received at 20499 <at> debbugs.gnu.org (full text, mbox):

From: Ivan Shmakov <ivan <at> siamics.net>
To: 20499 <at> debbugs.gnu.org
Subject: Re: bug#20499: C-x 8 shorthands for curved quotes, Euro, etc. 
Date: Thu, 07 May 2015 10:00:38 +0000
>>>>> Paul Eggert <eggert <at> cs.ucla.edu> writes:

[…]

 >> … Also, did you consider generating this list automatically, based
 >> on the codepoint properties already known to Emacs?  Something along
 >> the lines of the function MIMEd, which readily produces a list of
 >> entries for the following 133 characters.  (Three spaces added for
 >> symmetry purposes.)

 >> À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
 >> à á â ã ä è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý
 >> ÿ   Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ĥ ĥ Ĩ ĩ Ī ī Ĵ ĵ Ĺ ĺ
 >> Ľ ľ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Š š Ť ť Ũ ũ Ū ū Ŵ ŵ Ŷ ŷ
 >> Ÿ   Ź ź Ž ž Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǧ ǧ Ǩ ǩ   ǰ Ǵ ǵ Ǹ ǹ Ș ș Ț ț
 >> Ȟ ȟ Ȳ ȳ

 > Sorry, I don't really follow the code that you attached.

	Which part, specifically?

	It just iterates over the range given (or U+00A8 through U+02AF
	by default) and maps “LATIN + COMBINING” decompositions to
	'iso-transl entries.  For example, it maps the (?g #x327)
	decomposition (U+0327 being COMBINING CEDILLA) for U+0123 into
	an (",g" . ģ) entry.

	Or, rather, it /should/, for my code has an obvious typo:

 		    (`(,c #x30c) (string ?v c))
 		    (`(,c #x326) (string 59 c))
-		    (`(,c #x326) (string ?, c)))))
+		    (`(,c #x327) (string ?, c)))))

	Other possible additions (assuming we’ll agree on C-x 8 u,
	C-x 8 .) are:

                   (`(,c #x304) (string ?= c))
+                  (`(,c #x306) (string ?u c))
+                  (`(,c #x307) (string ?. c))
                   (`(,c #x308) (string 34 c))
+                  (`(,c #x30b) (string ?2 c))
                   (`(,c #x30c) (string ?v c))
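
The mapping described above can be sketched outside Emacs as well. What follows is a minimal illustration in Python, not the attached Emacs Lisp function: it reads canonical decompositions from the standard `unicodedata` module and includes only the combining marks whose clauses are quoted in this message. The names `PREFIX` and `entries` are hypothetical.

```python
import unicodedata

# Combining mark -> C-x 8 prefix character.  Only the clauses quoted in
# this message are listed; the rest of the original table is not shown.
PREFIX = {
    0x304: "=",   # COMBINING MACRON
    0x306: "u",   # COMBINING BREVE (proposed addition)
    0x307: ".",   # COMBINING DOT ABOVE (proposed addition)
    0x308: '"',   # COMBINING DIAERESIS (written as 34 in the original)
    0x30B: "2",   # COMBINING DOUBLE ACUTE ACCENT (proposed addition)
    0x30C: "v",   # COMBINING CARON
    0x326: ";",   # COMBINING COMMA BELOW (written as 59 in the original)
    0x327: ",",   # COMBINING CEDILLA (the corrected clause)
}

def entries(first=0xA8, last=0x2AF):
    """Yield (key sequence, character) pairs for base + mark decompositions."""
    for cp in range(first, last + 1):
        fields = unicodedata.decomposition(chr(cp)).split()
        # Keep only canonical two-element decompositions; compatibility
        # decompositions carry a leading "<tag>" field.
        if len(fields) != 2 or fields[0].startswith("<"):
            continue
        base, mark = (int(f, 16) for f in fields)
        if mark in PREFIX and chr(base).isascii() and chr(base).isalpha():
            yield PREFIX[mark] + chr(base), chr(cp)
```

With the default range, `dict(entries())[",g"]` is ģ (U+0123); calling `entries(0x1E00, 0x1EFF)` scans the Latin Extended Additional block the same way.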

 > Although I suppose it comes from a decomposition table, I don't know
 > what the table was designed for, and it's not clear to me how it's
 > relevant.

	I hope someone more knowledgeable could comment on this.  Still,
	this (ab)use of the data seems to work well in practice.

 > Anyway, most of those letters are either in iso-transl.el now,

	The point is to /remove/ them from 'iso-transl, as these entries
	duplicate, in a way, a part of the decomposition table already
	present in Emacs.

[…]

 >> Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǹ ǹ

 > These are for toned Pinyin but this list is incomplete.  If we wanted
 > to cover toned Pinyin, we'd also need Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ.  Coming up
 > with two-character abbreviations for all these might be tricky.

	But are we actually limited to two-character abbreviations only?
	Why not allow for, say, C-x 8 " ' u?
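
One way to picture such longer sequences is to decompose a character fully (NFD) and emit one prefix key per combining mark, followed by the base letter. The Python sketch below is only an illustration of that idea. The `'` and `` ` `` keys for acute and grave are assumed from the existing iso-transl conventions, and `MARKS` and `key_sequence` are hypothetical names.

```python
import unicodedata

# Combining mark -> prefix key.  "=", '"' and "v" follow the clauses in
# this thread; "'" and "`" are assumed from existing C-x 8 bindings.
MARKS = {0x300: "`", 0x301: "'", 0x304: "=", 0x308: '"', 0x30C: "v"}

def key_sequence(char):
    """Return the keys typed after C-x 8 for CHAR, or None if it does not fit."""
    base, *marks = unicodedata.normalize("NFD", char)
    # Require an ASCII letter carrying at least one known combining mark.
    if not (base.isascii() and base.isalpha()) or not marks:
        return None
    if any(ord(m) not in MARKS for m in marks):
        return None
    return "".join(MARKS[ord(m)] for m in marks) + base
```

Under these assumptions, `key_sequence("ǘ")` gives the three keys `"`, `'`, `u`, which matches the C-x 8 " ' u suggestion above.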

[…]

 >> ǰ

 > What language uses this?  I couldn't find one.

	To quote NamesList.txt:

01F0	LATIN SMALL LETTER J WITH CARON
	* IPA and many languages

 >> Ǵ ǵ

 > Good catch.  These are used for transliteration from Serbian and
 > Macedonian.  We should also include Ḱ ḱ as they are also needed.
 > Included in the attached patch.

	The code I’ve suggested could be used to scan the U+1Exx range
	just as well, thus resulting in the following set.

    Ḑ ḑ Ḡ ḡ Ḧ ḧ Ḩ ḩ Ḱ ḱ Ḿ ḿ Ṕ ṕ Ṽ ṽ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ẍ ẍ Ẑ ẑ ẗ Ẽ ẽ Ỳ ỳ Ỹ ỹ

[…]

 > Anyway, part of what's going on here is that the proposed list
 > doesn't cover every Latin character in the ISO 10646 repertoire
 > (that'd be a large set), but instead is limited to what appear to be
 > reasonably common letters.  Admittedly this is not universal but
 > one must cut things off somewhere, and it would be odd to add only
 > partial coverage for toned Pinyin, Livonian, etc.

	When it comes to the LATIN … LETTER WITH … letters, my proposal
	for such a cutoff is to require /both/ of the following
	criteria:

	• only cover specific Unicode ranges; such as, for instance,
	  U+00A8 … U+02AF, U+1E00 … U+1EFF, perhaps U+2C60 … U+2C7F;

	• only cover the letters which can be represented with a
	  sufficiently general C-x 8 ⟨diacritic⟩+ ⟨ASCII-latin⟩ pattern.

	Other characters deemed common may be added to the list.

 >>> --------------090904020002020306060104
 >>> Content-Type: text/x-patch;
 >>>  name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"

 >> This MIME part sure wants ‘; charset=UTF-8’.  Otherwise, Gnus does
 >> no decoding, and Emacs shows the contents with the likes of
 >> \304\260.

 > Hmm, it works for me.  I use Thunderbird to read the top level
 > message, and it spins off an Emacs to display the attachment with no
 > problem.

	I can “spin off” cat(1) to read the offending MIME part, too:
	Emacs will feed it raw-text, and interpret the result as UTF-8
	(the default.)

	It still does /not/ comply with the MIME specification.
	Consider section 4.1.2 of RFC 2046:

 RFC> […] The default character set, which must be assumed in the
 RFC> absence of a charset parameter, is US-ASCII.

	RFC 6657 updates this as follows:

 RFC> Each subtype of the "text" media type that uses the "charset"
 RFC> parameter can define its own default value for the "charset"
 RFC> parameter, including the absence of any default.

	However, given that ‘text/x-patch’ is not a /registered/ MIME
	type, I believe the RFC 6657 update does not apply to it, and
	the US-ASCII default of RFC 2046 still stands.
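
The effect of the missing parameter can be checked with any MIME parser. The Python sketch below uses the standard `email` package on a reduced copy of the part header quoted above. The body text is a placeholder, and `declared_charset` and `effective_charset` are hypothetical helper names that apply the RFC 2046 default as read in this message.

```python
from email import message_from_string

# Header of the MIME part as quoted above; the body is a placeholder.
RAW = (
    "Content-Type: text/x-patch;\n"
    ' name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"\n'
    "\n"
    "patch body elided\n"
)

def declared_charset(part_source):
    """Return the charset parameter of a MIME part, or None if absent."""
    return message_from_string(part_source).get_content_charset()

def effective_charset(part_source):
    """Apply the RFC 2046 section 4.1.2 default when no charset is given."""
    return declared_charset(part_source) or "us-ascii"
```

For the header as sent, `declared_charset(RAW)` is `None`, so the part falls back to US-ASCII. The octets \304\260 mentioned earlier are the UTF-8 encoding of U+0130 (İ), which US-ASCII cannot represent.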

 > The web-site archive at <http://bugs.gnu.org/20499#60> also works for
 > me with Firefox.

 > It's common for people to send the output of "git send-email" as
 > attachments;

	If Thunderbird /knows/ the encoding (“character set”) of the
	contents of the MIME part, it /should/ specify it in the MIME
	part header.  If the said contents are strictly 7-bit, it /could/
	omit that (given that it’s more than likely to be US-ASCII.)
	Otherwise, I guess Thunderbird should either ask the user for
	the encoding /or/ send the part as application/octet-stream.

[…]

-- 
FSF associate member #7257  np. Satellite one — Purple Motion  B6A0 230E 334A




This bug report was last modified 4 years and 343 days ago.
