From unknown Fri Jun 20 07:19:15 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#47885 <47885@debbugs.gnu.org> To: bug#47885 <47885@debbugs.gnu.org> Subject: Status: [PATCH] org-table-import: Make it more smarter for interactive use Reply-To: bug#47885 <47885@debbugs.gnu.org> Date: Fri, 20 Jun 2025 14:19:15 +0000 retitle 47885 [PATCH] org-table-import: Make it more smarter for interactiv= e use reassign 47885 org-mode submitter 47885 Utkarsh Singh severity 47885 normal tag 47885 patch thanks From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 19 00:43:36 2021 Received: (at submit) by debbugs.gnu.org; 19 Apr 2021 04:43:36 +0000 Received: from localhost ([127.0.0.1]:48165 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYLl1-0008Oj-SV for submit@debbugs.gnu.org; Mon, 19 Apr 2021 00:43:36 -0400 Received: from lists.gnu.org ([209.51.188.17]:58124) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYLl1-0008Oc-0y for submit@debbugs.gnu.org; Mon, 19 Apr 2021 00:43:35 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57392) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lYLkz-0000gn-Fq; Mon, 19 Apr 2021 00:43:34 -0400 Received: from mail-pg1-x533.google.com ([2607:f8b0:4864:20::533]:46661) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1lYLkx-00056P-Jb; Mon, 19 Apr 2021 00:43:33 -0400 Received: by mail-pg1-x533.google.com with SMTP id 31so7916331pgn.13; Sun, 18 Apr 2021 21:43:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:mime-version; bh=HwWhWBjz/uah/jBqo0i2tmAhiMv2OVugd6VkwcEUpVE=; b=s6q6v5+KzuLUKWxTbAj8LT9UM7Yn9iCOr74UKgB1f107Own9mxDLtD96EpMOkcskDl rspgh1mO3w9ebfE07UOO6DLJaBKZ6rnQ5T798gPDGgOIR4HpGSgwueZv9zVsSONwW+4/ RSwYsRe7ZijVAtUq/BMeT1Cbghih51ZTJlw+LKwfvZ8ck2eP2G6AP0kXjs4CqyiCht1f JM/VtWA6bM0NbOuUMK9Vv7GJH+mQ66ESkXhrxLlEwaqT/5GJR91OYIpc59SWJ9LwTARS MI8XYgquVUWbAMu/fUkx0jUM99h/nP8Lb0gC+DzyiazK2qGdgzgkoeF/HeP7C/O2Fhy/ qlJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:mime-version; bh=HwWhWBjz/uah/jBqo0i2tmAhiMv2OVugd6VkwcEUpVE=; b=fjT/8wzry0m7Et5XXJto4bcduquMMvwa2HUhGD5u6jpv7LkLPh9eYSyJyV3BS2JHxc 6lswTFLQKfCzoHNWA1TxEMIrUR5P8Hh4pErbMd2CZaM6LAoeATBxV1Ku7VUCrlDWgF/w DGfxU94EICDO1uap9QvZ/1f1UjuXYp97Xm84/ZRyVPuj1zJgx4IJhvnKpwAcbMmmbzRX AQMuoBvGPsOK9nYDk+7+W3PvsBKZ23JTPU6DT+kE8pKqv1qPtM7276kahpALI+QoqEdz 59t1UGzmd0bmpiVZw2HDgjjQFktsq1PZwtG+TYyAc/xQSVk5ib3kP+TeIHKGCM7WcvFW wp6Q== X-Gm-Message-State: AOAM530/J9VJ5PsnEPcUprViYcelPisEFoLHrIgtW3EDqdQ0YxsJBPjz Wtaeg0SOM9dxxOFBHli0d8B0C5rZG9k= X-Google-Smtp-Source: ABdhPJwFDP7In9IVrNIHl9U14KeH+UTLS3nA+0g4RDFzRi9Vl7y7HlLVMWqk6RJ1nIkCGL/7LdkMeQ== X-Received: by 2002:aa7:946b:0:b029:24c:57ea:99bf with SMTP id t11-20020aa7946b0000b029024c57ea99bfmr18322610pfq.63.1618807408837; Sun, 18 Apr 2021 21:43:28 -0700 (PDT) Received: from localhost ([45.251.50.123]) by smtp.gmail.com with ESMTPSA id v8sm10820886pfm.128.2021.04.18.21.43.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 18 Apr 2021 21:43:28 -0700 (PDT) From: Utkarsh Singh To: emacs-orgmode@gnu.org, bug-gnu-emacs@gnu.org Subject: [PATCH] org-table-import: Make it more smarter for interactive use Date: Mon, 19 Apr 2021 10:13:31 +0530 Message-ID: <87czuq9958.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain Received-SPF: pass client-ip=2607:f8b0:4864:20::533; envelope-from=utkarsh190601@gmail.com; helo=mail-pg1-x533.google.com X-Spam_score_int: 1 X-Spam_score: 0.1 X-Spam_bar: / X-Spam_report: (0.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, PDS_OTHER_BAD_TLD=1.999, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.1 (/) Hi, My previous patch proposed to add support for importing file with arbitrary name and building upon that this patch tries to make use of it by making org-table-import smarter by simply adding more separators (delimiters). Currently org-table-import 'smartly' guesses only COMMA, TAB and SPACE as separator whereas this patch tries to add support for ';'(SEMICOLON) and ':' (COLON). Here is an example org-table generated using =M-x org-table-import= /etc/passwd (uses COLON as separator) with private information removed. | bin | x | 1 | 1 | | / | /usr/bin/nologin | | daemon | x | 2 | 2 | | / | /usr/bin/nologin | | mail | x | 8 | 12 | | /var/spool/mail | /usr/bin/nologin | | ftp | x | 14 | 11 | | /srv/ftp | /usr/bin/nologin | | http | x | 33 | 33 | | /srv/http | /usr/bin/nologin | | nobody | x | 65534 | 65534 | Nobody | / | /usr/bin/nologin | | dbus | x | 81 | 81 | System Message Bus | / | /usr/bin/nologin | | systemd-journal-remote | x | 981 | 981 | systemd Journal Remote | / | /usr/bin/nologin | | systemd-network | x | 980 | 980 | systemd Network Management | / | /usr/bin/nologin | | systemd-oom | x | 979 | 979 | systemd Userspace OOM Killer | / | /usr/bin/nologin | | systemd-resolve | x | 978 | 978 | systemd Resolver | / | /usr/bin/nologin | | systemd-timesync | x | 977 | 977 | systemd Time Synchronization | / | /usr/bin/nologin | | systemd-coredump | x | 976 | 976 | systemd Core Dumper | / | /usr/bin/nologin | | avahi | x | 974 | 974 | Avahi mDNS/DNS-SD daemon | / | /usr/bin/nologin | | colord | x | 973 | 973 | Color management daemon | /var/lib/colord | /usr/bin/nologin | | rtkit | x | 133 | 133 | RealtimeKit | /proc | /usr/bin/nologin | | transmission | x | 169 | 169 | Transmission BitTorrent Daemon | /var/lib/transmission | /usr/bin/nologin | | geoclue | x | 972 | 972 | Geoinformation service | /var/lib/geoclue | /usr/bin/nologin | | usbmux | x | 140 | 140 | usbmux user | / | /usr/bin/nologin | diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el index ab66859d6a..5ee4af612b 100644 --- a/lisp/org/org-table.el +++ b/lisp/org/org-table.el @@ -846,6 +846,35 @@ org-table-create (goto-char pos)) (org-table-align))) + +(defun org-table-guess-separator (beg0 end0) + "Guess separator for `org-table-convert-region' for region BEG0 to END0. + +List of preferred separator: +comma, TAB, ';', ':' or SPACE + +If region contains a line which doesn't contain the required +separator then discard the separator and search again using next +separator." + (let ((beg (save-excursion + (goto-char (min beg0 end0)) + (beginning-of-line 1) + (point))) + (end (save-excursion + (goto-char (max beg0 end0)) + (end-of-line 1) + (if (bolp) (backward-char 1) (end-of-line 1)) + (point)))) + (save-excursion + (goto-char beg) + (cond + ((not (re-search-forward "^[^\n,]+$" end t)) '(4)) + ((not (re-search-forward "^[^\n\t]+$" end t)) '(16)) + ((not (re-search-forward "^[^\n;]+$" end t)) ";") + ((not (re-search-forward "^[^\n:]+$" end t)) ":") + ((not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" end t)) " ") + (t nil))))) + ;;;###autoload (defun org-table-convert-region (beg0 end0 &optional separator) "Convert region to a table. @@ -862,10 +891,7 @@ org-table-convert-region integer When a number, use that many spaces, or a TAB, as field separator regexp When a regular expression, use it to match the separator nil When nil, the command tries to be smart and figure out the - separator in the following way: - - when each line contains a TAB, assume TAB-separated material - - when each line contains a comma, assume CSV material - - else, assume one or more SPACE characters as separator." + separator using `org-table-guess-seperator'." (interactive "r\nP") (let* ((beg (min beg0 end0)) (end (max beg0 end0)) @@ -881,14 +907,9 @@ org-table-convert-region (goto-char end) (if (bolp) (backward-char 1) (end-of-line 1)) (setq end (point-marker)) - ;; Get the right field separator - (unless separator - (goto-char beg) - (setq separator - (cond - ((not (re-search-forward "^[^\n\t]+$" end t)) '(16)) - ((not (re-search-forward "^[^\n,]+$" end t)) '(4)) - (t 1)))) + (if (and (not separator) + (not (setq separator (org-table-guess-separator beg end)))) + (error "Unable to guess suitable separator.")) (goto-char beg) (if (equal separator '(4)) (while (< (point) end) @@ -921,12 +942,8 @@ org-table-convert-region (defun org-table-import (file separator) "Import FILE as a table. -The command tries to be smart and figure out the separator in the -following way: - -- when each line contains a TAB, assume TAB-separated material; -- when each line contains a comma, assume CSV material; -- else, assume one or more SPACE characters as separator. +The command tries to be smart and figure out the separator using +`org-table-guess-seperator'. When non-nil, SEPARATOR specifies the field separator in the lines. It can have the following values: -- Utkarsh Singh http://utkarshsingh.xyz From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 19 04:19:12 2021 Received: (at submit) by debbugs.gnu.org; 19 Apr 2021 08:19:12 +0000 Received: from localhost ([127.0.0.1]:48341 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYP7g-0001M3-8V for submit@debbugs.gnu.org; Mon, 19 Apr 2021 04:19:12 -0400 Received: from lists.gnu.org ([209.51.188.17]:37902) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYP7f-0001Lw-2V for submit@debbugs.gnu.org; Mon, 19 Apr 2021 04:19:11 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:34370) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lYP7e-0006hm-Ph; Mon, 19 Apr 2021 04:19:10 -0400 Received: from relay1-d.mail.gandi.net ([217.70.183.193]:31161) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lYP7c-0004Dm-Gw; Mon, 19 Apr 2021 04:19:10 -0400 X-Originating-IP: 185.131.40.67 Received: from localhost (40-67.ipv4.commingeshautdebit.fr [185.131.40.67]) (Authenticated sender: admin@nicolasgoaziou.fr) by relay1-d.mail.gandi.net (Postfix) with ESMTPSA id 7E96624000C; Mon, 19 Apr 2021 08:19:04 +0000 (UTC) From: Nicolas Goaziou To: Utkarsh Singh Subject: Re: [PATCH] org-table-import: Make it more smarter for interactive use References: <87czuq9958.fsf@gmail.com> Mail-Followup-To: Utkarsh Singh , emacs-orgmode@gnu.org, bug-gnu-emacs@gnu.org Date: Mon, 19 Apr 2021 10:19:03 +0200 In-Reply-To: <87czuq9958.fsf@gmail.com> (Utkarsh Singh's message of "Mon, 19 Apr 2021 10:13:31 +0530") Message-ID: <8735vmelfs.fsf@nicolasgoaziou.fr> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Received-SPF: pass client-ip=217.70.183.193; envelope-from=mail@nicolasgoaziou.fr; helo=relay1-d.mail.gandi.net X-Spam_score_int: -25 X-Spam_score: -2.6 X-Spam_bar: -- X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.7 (-) X-Debbugs-Envelope-To: submit Cc: bug-gnu-emacs@gnu.org, emacs-orgmode@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.7 (--) Hello, Utkarsh Singh writes: > My previous patch proposed to add support for importing file with > arbitrary name and building upon that this patch tries to make use of it > by making org-table-import smarter by simply adding more separators > (delimiters). Good idea, thank you. Some comments follow. > +(defun org-table-guess-separator (beg0 end0) > + "Guess separator for `org-table-convert-region' for region BEG0 to END0. > + > +List of preferred separator: > +comma, TAB, ';', ':' or SPACE I suggest to use full names everywhere: comma, TAB, semicolon, colon, or SPACE. > +If region contains a line which doesn't contain the required > +separator then discard the separator and search again using next > +separator." > + (let ((beg (save-excursion > + (goto-char (min beg0 end0)) > + (beginning-of-line 1) > + (point))) (beginning-of-line 1) + (point) -> (line-beginning-position) since you don't intent to move point. > + (end (save-excursion > + (goto-char (max beg0 end0)) > + (end-of-line 1) > + (if (bolp) (backward-char 1) (end-of-line 1)) I'm not sure about what you mean above. First, the second call to end-of-line is useless, since you're already at the end of the line. Second, what is wrong if point is at an empty line? Why do you want to move it back? > + (point)))) You may want to use `line-end-position'. > + (save-excursion > + (goto-char beg) > + (cond > + ((not (re-search-forward "^[^\n,]+$" end t)) '(4)) > + ((not (re-search-forward "^[^\n\t]+$" end t)) '(16)) > + ((not (re-search-forward "^[^\n;]+$" end t)) ";") > + ((not (re-search-forward "^[^\n:]+$" end t)) ":") > + ((not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" end t)) " ") > + (t nil))))) I think you need to wrap `save-excursion' around each `re-search-forward' call. Otherwise each test starts at the first line containing the separator previously tested. > + > ;;;###autoload > (defun org-table-convert-region (beg0 end0 &optional separator) > "Convert region to a table. > @@ -862,10 +891,7 @@ org-table-convert-region > integer When a number, use that many spaces, or a TAB, as field separator > regexp When a regular expression, use it to match the separator > nil When nil, the command tries to be smart and figure out the > - separator in the following way: > - - when each line contains a TAB, assume TAB-separated material > - - when each line contains a comma, assume CSV material > - - else, assume one or more SPACE characters as separator." > + separator using `org-table-guess-seperator'." I wonder if creating a new function is warranted here. You could add the news checks after those already present in the code, couldn't you? Regards, -- Nicolas Goaziou From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 19 10:23:33 2021 Received: (at 47885) by debbugs.gnu.org; 19 Apr 2021 14:23:33 +0000 Received: from localhost ([127.0.0.1]:51475 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYUoH-0007nc-3z for submit@debbugs.gnu.org; Mon, 19 Apr 2021 10:23:33 -0400 Received: from mail-pf1-f174.google.com ([209.85.210.174]:41514) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYUoD-0007nN-NX for 47885@debbugs.gnu.org; Mon, 19 Apr 2021 10:23:32 -0400 Received: by mail-pf1-f174.google.com with SMTP id w6so8748273pfc.8 for <47885@debbugs.gnu.org>; Mon, 19 Apr 2021 07:23:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:references:cc:date:in-reply-to:message-id :user-agent:mime-version; bh=13IPtaGmOOwaxYsyBhyaECe/uc6zsM4sOyXxKnHgWtc=; b=IxswMqqIZMLZ4uE5f/ycRpJsi8wS2MxLhL9LEEHVtkl4elLep2TVIyPHKPfhYizAA3 yjdBCfszxXE2JGkksxHBWhvSFdq4mc9X5PvtJqZIP37hxrFsi8zwqjbrJXCkyvphKFwo u3lb1WU7ZhGLpd7EYe9/fWPcLqMfELgB82X7hLSoPVbe679q46YKw3hF16uv6jXrQBmG mCMOJ5zGlo3YrSozN0pAH3eSvbi5JpcYszDqXkuFWif6C+TS6W8W89uKKzWK35hJW+53 tgmpr+5lm1jQsDaKtKf4kn3+UTTtirH6IjNyoQJY6zO4ituIKKRF7fcpltH9DRprU7gR 2D2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:references:cc:date:in-reply-to :message-id:user-agent:mime-version; bh=13IPtaGmOOwaxYsyBhyaECe/uc6zsM4sOyXxKnHgWtc=; b=hhliLT4Bs1yd3tWnbNxwQ190w8XRe7bo+ECoHrW5DXiRskqCPLcBuuiYMw9NXXEG07 YsgEbetk/PRI7dJWIFqcWxXLibmlT/kYB3GHKge8a0AP3hKU8c56uwsiRkUjfpJTDE17 Ej6N8880DlDjJle03WutM2JHQIM8SnzsQKXvY6wGixC2xGVtgiN1n5ucjoEjsPlJ12fH tqJFx8lZNRMDVIkMVKDzgj9DV3M0NEVpeU2pYL+LDpioha6cJILllEkmzhLBS95AeJ9h lWGZ5fJzeuA92Vb1Lzq9b8QUONAZTUCE4jlRzzaQYCUeEW+I34Bm4FCxBzuOFwi1pGp3 HtHA== X-Gm-Message-State: AOAM533d0fwa4hFa8uMY2PLV8qJvD9sHyhc2KsTSL3qI+vHkbXbyd/mB j7nkv7qnqZCtMEjXGHDxjy0= X-Google-Smtp-Source: ABdhPJyQBhJX/SrGLyTgwDPXMwEkzgMBXOZrqQFzRxpPqNGH6iclHZGN664s/1F9gcnSEmkeNcBekw== X-Received: by 2002:a63:f443:: with SMTP id p3mr12122015pgk.378.1618842203762; Mon, 19 Apr 2021 07:23:23 -0700 (PDT) Received: from localhost ([45.251.50.123]) by smtp.gmail.com with ESMTPSA id i66sm10538396pgd.8.2021.04.19.07.23.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Apr 2021 07:23:23 -0700 (PDT) From: Utkarsh Singh To: Nicolas Goaziou Subject: Re: [PATCH] org-table-import: Make it more smarter for interactive use References: <87czuq9958.fsf@gmail.com> <8735vmelfs.fsf@nicolasgoaziou.fr> Date: Mon, 19 Apr 2021 19:53:25 +0530 In-Reply-To: <8735vmelfs.fsf@nicolasgoaziou.fr> (Nicolas Goaziou's message of "Mon, 19 Apr 2021 10:19:03 +0200") Message-ID: <87k0oyfj4y.fsf@gmail.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 2.3 (++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: On 2021-04-19, 10:19 +0200, Nicolas Goaziou wrote: >> My previous patch proposed to add support for importing file with >> arbitrary name and building upon that this patch tries to make use of it >> by making org-table-import smarter by simply adding [...] Content analysis details: (2.3 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (utkarsh190601[at]gmail.com) 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: utkarshsingh.xyz (xyz)] -0.0 SPF_PASS SPF: sender matches SPF record 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (utkarsh190601[at]gmail.com) 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.210.174 listed in wl.mailspike.net] -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [209.85.210.174 listed in list.dnswl.org] 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-Debbugs-Envelope-To: 47885 Cc: 47885@debbugs.gnu.org, emacs-orgmode@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.3 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: On 2021-04-19, 10:19 +0200, Nicolas Goaziou wrote: >> My previous patch proposed to add support for importing file with >> arbitrary name and building upon that this patch tries to make use of it >> by making org-table-import smarter by simply adding [...] Content analysis details: (1.3 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.210.174 listed in wl.mailspike.net] -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [209.85.210.174 listed in list.dnswl.org] 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (utkarsh190601[at]gmail.com) 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: utkarshsingh.xyz (xyz)] -0.0 SPF_PASS SPF: sender matches SPF record 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (utkarsh190601[at]gmail.com) -1.0 MAILING_LIST_MULTI Multiple indicators imply a widely-seen list manager 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders On 2021-04-19, 10:19 +0200, Nicolas Goaziou wrote: >> My previous patch proposed to add support for importing file with >> arbitrary name and building upon that this patch tries to make use of it >> by making org-table-import smarter by simply adding more separators >> (delimiters). > > Good idea, thank you. Some comments follow. > >> +(defun org-table-guess-separator (beg0 end0) >> + "Guess separator for `org-table-convert-region' for region BEG0 to END0. >> + >> +List of preferred separator: >> +comma, TAB, ';', ':' or SPACE > > I suggest to use full names everywhere: comma, TAB, semicolon, colon, or > SPACE. > >> +If region contains a line which doesn't contain the required >> +separator then discard the separator and search again using next >> +separator." >> + (let ((beg (save-excursion >> + (goto-char (min beg0 end0)) >> + (beginning-of-line 1) >> + (point))) > > (beginning-of-line 1) + (point) -> (line-beginning-position) > > since you don't intent to move point. > >> + (end (save-excursion >> + (goto-char (max beg0 end0)) >> + (end-of-line 1) >> + (if (bolp) (backward-char 1) (end-of-line 1)) > > I'm not sure about what you mean above. First, the second call to > end-of-line is useless, since you're already at the end of the line. > Second, what is wrong if point is at an empty line? Why do you want to > move it back? > >> + (point)))) > > You may want to use `line-end-position'. > >> + (save-excursion >> + (goto-char beg) >> + (cond >> + ((not (re-search-forward "^[^\n,]+$" end t)) '(4)) >> + ((not (re-search-forward "^[^\n\t]+$" end t)) '(16)) >> + ((not (re-search-forward "^[^\n;]+$" end t)) ";") >> + ((not (re-search-forward "^[^\n:]+$" end t)) ":") >> + ((not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" end t)) " ") >> + (t nil))))) > > I think you need to wrap `save-excursion' around each > `re-search-forward' call. Otherwise each test starts at the first line > containing the separator previously tested. >> + >> ;;;###autoload >> (defun org-table-convert-region (beg0 end0 &optional separator) >> "Convert region to a table. >> @@ -862,10 +891,7 @@ org-table-convert-region >> integer When a number, use that many spaces, or a TAB, as field separator >> regexp When a regular expression, use it to match the separator >> nil When nil, the command tries to be smart and figure out the >> - separator in the following way: >> - - when each line contains a TAB, assume TAB-separated material >> - - when each line contains a comma, assume CSV material >> - - else, assume one or more SPACE characters as separator." >> + separator using `org-table-guess-seperator'." Thanks for reviewing the patch! > I wonder if creating a new function is warranted here. You could add the > news checks after those already present in the code, couldn't you? At first I was also reluctant in creating a new function but decided to do so because: + org-table-convert-region is currently doing two thing 'guessing the separator' and 'converting the region'. I thought it was a good idea to separate out function into it's atomic operations. + Current guessing technique is quite basic as it assumes that data (file that has to be imported) has no error/inconsistency in it. I would like to show you the doc string of Python's CSV library implementation to guess separator (region inside """): """ Looks for text enclosed between two identical quotes (the probable quotechar) which are preceded and followed by the same character (the probable delimiter). For example: ,'some text', The quote with the most wins, same with the delimiter. If there is no quotechar the delimiter can't be determined this way. """ And if this functions fails then we have: """ The delimiter /should/ occur the same number of times on each row. However, due to malformed data, it may not. We don't want an all or nothing approach, so we allow for small variations in this number. 1) build a table of the frequency of each character on every line. 2) build a table of frequencies of this frequency (meta-frequency?), e.g. 'x occurred 5 times in 10 rows, 6 times in 1000 rows, 7 times in 2 rows' 3) use the mode of the meta-frequency to determine the /expected/ frequency for that character 4) find out how often the character actually meets that goal 5) the character that best meets its goal is the delimiter For performance reasons, the data is evaluated in chunks, so it can try and evaluate the smallest portion of the data possible, evaluating additional chunks as necessary. """ I tried to do similar in Elisp but currently facing some issues due to my inexperience in functional programming. Also moving the 'guessing' part out the function may lead to development of even better algorithm than Python counterpart. Modified version of concerned function: (defun org-table-guess-separator (beg0 end0) "Guess separator for `org-table-convert-region' for region BEG0 to END0. List of preferred separator: comma, TAB, semicolon, colon or SPACE. If region contains a line which doesn't contain the required separator then discard the separator and search again using next separator." (let* ((beg (save-excursion (goto-char (min beg0 end0)) (line-beginning-position))) (end (save-excursion (goto-char (max beg0 end0)) (line-end-position))) (sep-rexp '(("," "^[^\n,]+$") ("\t" "^[^\n\t]+$") (";" "^[^\n;]+$") (":" "^[^\n:]+$") (" " "^\\([^'\"][^\n\s][^'\"]\\)+$"))) (tmp (car sep-rexp)) sep) (save-excursion (goto-char beg) (while (and (not sep) (if (save-excursion (not (re-search-forward (nth 1 tmp) end t))) (setq sep (nth 0 tmp)) (setq sep-rexp (cdr sep-rexp)) (setq tmp (car sep-rexp))))) sep))) Version without using iteration: (defun org-table-guess-separator (beg0 end0) "Guess separator for `org-table-convert-region' for region BEG0 to END0. List of preferred separator: COMMA, TAB, SEMICOLON, COLON or SPACE. If region contains a line which doesn't contain the required separator then discard the separator and search again using next separator." (let ((beg (save-excursion (goto-char (min beg0 end0)) (line-beginning-position))) (end (save-excursion (goto-char (max beg0 end0)) (line-end-position)))) (save-excursion (goto-char beg) (cond ((save-excursion (not (re-search-forward "^[^\n,]+$" end t))) ",") ((save-excursion (not (re-search-forward "^[^\n\t]+$" end t))) "\t") ((save-excursion (not (re-search-forward "^[^\n;]+$" end t))) ";") ((save-excursion (not (re-search-forward "^[^\n:]+$" end t))) ":") ((save-excursion (not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" end t))) " ") (t nil))))) -- Utkarsh Singh http://utkarshsingh.xyz From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 20 09:40:21 2021 Received: (at 47885) by debbugs.gnu.org; 20 Apr 2021 13:40:21 +0000 Received: from localhost ([127.0.0.1]:53362 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYqc1-0001St-Ew for submit@debbugs.gnu.org; Tue, 20 Apr 2021 09:40:21 -0400 Received: from relay7-d.mail.gandi.net ([217.70.183.200]:41307) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYqbz-0001Sf-WE for 47885@debbugs.gnu.org; Tue, 20 Apr 2021 09:40:20 -0400 X-Originating-IP: 185.131.40.67 Received: from localhost (40-67.ipv4.commingeshautdebit.fr [185.131.40.67]) (Authenticated sender: admin@nicolasgoaziou.fr) by relay7-d.mail.gandi.net (Postfix) with ESMTPSA id B3C1D20003; Tue, 20 Apr 2021 13:40:13 +0000 (UTC) From: Nicolas Goaziou To: Utkarsh Singh Subject: Re: [PATCH] org-table-import: Make it more smarter for interactive use References: <87czuq9958.fsf@gmail.com> <8735vmelfs.fsf@nicolasgoaziou.fr> <87k0oyfj4y.fsf@gmail.com> Mail-Followup-To: Utkarsh Singh , 47885@debbugs.gnu.org, emacs-orgmode@gnu.org Date: Tue, 20 Apr 2021 15:40:12 +0200 In-Reply-To: <87k0oyfj4y.fsf@gmail.com> (Utkarsh Singh's message of "Mon, 19 Apr 2021 19:53:25 +0530") Message-ID: <87im4h9irn.fsf@nicolasgoaziou.fr> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 47885 Cc: 47885@debbugs.gnu.org, emacs-orgmode@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) Hello, Utkarsh Singh writes: > At first I was also reluctant in creating a new function but decided to > do so because: > > + org-table-convert-region is currently doing two thing 'guessing the > separator' and 'converting the region'. I thought it was a good idea to > separate out function into it's atomic operations. I understand, but there is sometimes a (difficult) line to draw between "separating concerns" and "function proliferation". Anyway, that's fine here. > + Current guessing technique is quite basic as it assumes that data > (file that has to be imported) has no error/inconsistency in it. I > would like to show you the doc string of Python's CSV library > implementation to guess separator (region inside """): > > """ > Looks for text enclosed between two identical quotes > (the probable quotechar) which are preceded and followed > by the same character (the probable delimiter). > For example: > ,'some text', > The quote with the most wins, same with the delimiter. > If there is no quotechar the delimiter can't be determined > this way. > """ > > And if this functions fails then we have: > > """ > The delimiter /should/ occur the same number of times on > each row. However, due to malformed data, it may not. We don't want > an all or nothing approach, so we allow for small variations in this > number. > 1) build a table of the frequency of each character on every line. > 2) build a table of frequencies of this frequency (meta-frequency?), > e.g. 'x occurred 5 times in 10 rows, 6 times in 1000 rows, > 7 times in 2 rows' > 3) use the mode of the meta-frequency to determine the /expected/ > frequency for that character > 4) find out how often the character actually meets that goal > 5) the character that best meets its goal is the delimiter > For performance reasons, the data is evaluated in chunks, so it can > try and evaluate the smallest portion of the data possible, evaluating > additional chunks as necessary. > """ For the problem we're trying to solve, this sounds like over-engineering to me. Do we want so badly to guess a separator? > I tried to do similar in Elisp but currently facing some issues due to > my inexperience in functional programming. Also moving the 'guessing' > part out the function may lead to development of even better algorithm > than Python counterpart. > > Modified version of concerned function: > > (defun org-table-guess-separator (beg0 end0) > "Guess separator for `org-table-convert-region' for region BEG0 to END0. > > List of preferred separator: > comma, TAB, semicolon, colon or SPACE. > > If region contains a line which doesn't contain the required > separator then discard the separator and search again using next > separator." > (let* ((beg (save-excursion > (goto-char (min beg0 end0)) > (line-beginning-position))) > (end (save-excursion > (goto-char (max beg0 end0)) > (line-end-position))) Thinking again about it, this needs extra care, as end0 might end up on an empty line. You tried to avoid this in your first function, but I think this was not sufficient either. Actually, beg0 could also start on an empty line. This needs to be tested extensively, but as a first approximation, I think `beg' needs to be defined as: (save-excursion (goto-char (min beg0 end0)) (skip-chars-forward " \t\n") (if (eobp) (point) (line-beginning-position))) and `end' as (save-excursion (goto-char (max beg end0)) (skip-chars-backward " \t\n" beg) (if (= beg (point)) (point) (line-end-position))) Then you need to bail out if beg = end. > (sep-rexp '(("," "^[^\n,]+$") sep-rexp -> sep-regexp > ("\t" "^[^\n\t]+$") > (";" "^[^\n;]+$") > (":" "^[^\n:]+$") > (" " "^\\([^'\"][^\n\s][^'\"]\\)+$"))) At this point, I suggest to use `rx' macro instead. > (tmp (car sep-rexp)) > sep) > (save-excursion > (goto-char beg) > (while (and (not sep) > (if (save-excursion > (not (re-search-forward (nth 1 tmp) end t))) > (setq sep (nth 0 tmp)) > (setq sep-rexp (cdr sep-rexp)) > (setq tmp (car sep-rexp))))) I suggest this (yes, I like pattern-matching, `car' and `cdr' are so 80's) instead: (save-excursion (goto-char beg) (catch :found (pcase-dolist (`(,sep ,regexp) sep-regexp) (save-excursion (unless (re-search-forward regexp end t) (throw :found sep)))) nil)) Again all this needs to extensively tested, as there are a lot of dangers lurking around. Regards, -- Nicolas Goaziou From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 20 13:15:29 2021 Received: (at 47885) by debbugs.gnu.org; 20 Apr 2021 17:15:29 +0000 Received: from localhost ([127.0.0.1]:55864 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYtyC-0000zl-MC for submit@debbugs.gnu.org; Tue, 20 Apr 2021 13:15:29 -0400 Received: from mail-pf1-f172.google.com ([209.85.210.172]:39769) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lYtyA-0000zX-To for 47885@debbugs.gnu.org; Tue, 20 Apr 2021 13:15:27 -0400 Received: by mail-pf1-f172.google.com with SMTP id c17so26021095pfn.6 for <47885@debbugs.gnu.org>; Tue, 20 Apr 2021 10:15:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:in-reply-to:references:cc:date:message-id :mime-version; bh=NukRM5CJ9bRA54iEIh/uGDnqEAtbdbMrVJmPi5g7L1s=; b=B11A6MV1bKSsQOtaRnHrFhpUcMcR6Gw36jvcBsILbTn8K3poKLNCWGQ7MQwqHA1pFw fQjLKkdBWMRtwvJOL8LdBG/uFpTfiJ6gSc0hjvIXp7hBHH5R10JloDUK1ZMoKz0ByWZD +G2tilfU+Bx2SWF7XllS2bp8n0egmHsHEj1oFLjETU1lNqZgR/h8jw5G0PRSrKgi3Zf3 5TWqEMtSLQw6MN/weaK3+nLZajW6Q+8ZCxIxh5mD94V4DiCD+GsiEdmurLBBh37o/xs5 ppdc6MbJiQR5zG0i4asvYg039L80nPTYIY8QmS8WyrSGyQucK8yUh9Fw7QUKPu+mGq36 23jQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:in-reply-to:references:cc:date :message-id:mime-version; bh=NukRM5CJ9bRA54iEIh/uGDnqEAtbdbMrVJmPi5g7L1s=; b=CBxNCnwhvSumObao3iz7z7GIkKD9Yb4ve3t/yVEW98LimCku8hJG6nmaPzMck//ZOc 1hLmjDzvKWEIH+5S7mymGXNDDzCj8xOMopgodSwk5BNn6j1FWK8p7jXYOE+OvkejqisL K//8Tx8Inq3EH2zQp9WuOqEuSi27vC6yOnUukqYSMc9DSAcj3DCOSNl3qL1aGhZrVwnu Q0NBFSPDRL6e6/wKJ/IR0gMNU2ZTaMOwRA1oiwSQ8Ghzymiarznffs5No8fCgMieStAN sQsdYgizlFx6pQ4ueX2eZ5Nfieujsf4bcMqZPLebASPB+DuwndsyF2vRZtlQf3Wft6Hb 5k4A== X-Gm-Message-State: AOAM530mwMbwaTmfilV8SZXxcV/yTOb7Ecu3AQH2A3K6d5NKful0ytSK WiValQ6SlwiqQbnJmUAF1GIQ/eQnX+4= X-Google-Smtp-Source: ABdhPJy+pErFuieS7YszOwMxguHhXEazQ3/xqgbi4yWGIM7JK6PSf83UMOrBbD4JSeLBqFPFgneFWQ== X-Received: by 2002:aa7:9f08:0:b029:25b:70c0:a31b with SMTP id g8-20020aa79f080000b029025b70c0a31bmr17852213pfr.61.1618938920844; Tue, 20 Apr 2021 10:15:20 -0700 (PDT) Received: from localhost ([45.251.50.123]) by smtp.gmail.com with ESMTPSA id a20sm6751156pfi.138.2021.04.20.10.15.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Apr 2021 10:15:20 -0700 (PDT) From: Utkarsh Singh To: Nicolas Goaziou Subject: Re: [PATCH] org-table-import: Make it more smarter for interactive use In-Reply-To: <87im4h9irn.fsf@nicolasgoaziou.fr> References: <87czuq9958.fsf@gmail.com> <8735vmelfs.fsf@nicolasgoaziou.fr> <87k0oyfj4y.fsf@gmail.com> <87im4h9irn.fsf@nicolasgoaziou.fr> Date: Tue, 20 Apr 2021 22:45:22 +0530 Message-ID: <87r1j4ri6t.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 2.2 (++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Hi, On 2021-04-20, 15:40 +0200, Nicolas Goaziou wrote: > For the problem we're trying to solve, this sounds like over-engineering > to me. Do we want so badly to guess a separator? Content analysis details: (2.2 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (utkarsh190601[at]gmail.com) 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: utkarshsingh.xyz (xyz)] 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (utkarsh190601[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record -0.0 RCVD_IN_MSPIKE_H2 RBL: Average reputation (+2) [209.85.210.172 listed in wl.mailspike.net] -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [209.85.210.172 listed in list.dnswl.org] X-Debbugs-Envelope-To: 47885 Cc: 47885@debbugs.gnu.org, emacs-orgmode@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.2 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Hi, On 2021-04-20, 15:40 +0200, Nicolas Goaziou wrote: > For the problem we're trying to solve, this sounds like over-engineering > to me. Do we want so badly to guess a separator? Content analysis details: (1.2 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [209.85.210.172 listed in list.dnswl.org] -0.0 RCVD_IN_MSPIKE_H2 RBL: Average reputation (+2) [209.85.210.172 listed in wl.mailspike.net] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (utkarsh190601[at]gmail.com) 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: utkarshsingh.xyz (xyz)] 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (utkarsh190601[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record -1.0 MAILING_LIST_MULTI Multiple indicators imply a widely-seen list manager Hi, On 2021-04-20, 15:40 +0200, Nicolas Goaziou wrote: > For the problem we're trying to solve, this sounds like over-engineering > to me. Do we want so badly to guess a separator? Earlier I took is as an assignment to learn Elisp but now I don't think we should increase complexity this much. > Thinking again about it, this needs extra care, as end0 might end up on > an empty line. You tried to avoid this in your first function, but > I think this was not sufficient either. Actually, beg0 could also start > on an empty line. > > This needs to be tested extensively, but as a first approximation, > I think `beg' needs to be defined as: > > (save-excursion > (goto-char (min beg0 end0)) > (skip-chars-forward " \t\n") > (if (eobp) (point) (line-beginning-position))) > > and `end' as > > (save-excursion > (goto-char (max beg end0)) > (skip-chars-backward " \t\n" beg) > (if (= beg (point)) (point) (line-end-position))) > > Then you need to bail out if beg = end. > >> (sep-rexp '(("," "^[^\n,]+$") > > sep-rexp -> sep-regexp > >> ("\t" "^[^\n\t]+$") >> (";" "^[^\n;]+$") >> (":" "^[^\n:]+$") >> (" " "^\\([^'\"][^\n\s][^'\"]\\)+$"))) > > At this point, I suggest to use `rx' macro instead. > > I suggest this (yes, I like pattern-matching, `car' and `cdr' are so > 80's) instead: > > (save-excursion > (goto-char beg) > (catch :found > (pcase-dolist (`(,sep ,regexp) sep-regexp) > (save-excursion > (unless (re-search-forward regexp end t) > (throw :found sep)))) > nil)) > Thanks! I was not aware of pcase-dolist function. Function after doing the necessary changes: (defun org-table-guess-separator (beg0 end0) "Guess separator for `org-table-convert-region' for region BEG0 to END0. List of preferred separator: comma, TAB, semicolon, colon or SPACE. If region contains a line which doesn't contain the required separator then discard the separator and search again using next separator." (let* ((beg (save-excursion (goto-char (min beg0 end0)) (skip-chars-forward " \t\n") (if (eobp) (point) (line-beginning-position)))) (end (save-excursion (goto-char (max beg end0)) (skip-chars-backward " \t\n" beg) (if (= beg (point)) (point) (line-end-position)))) (sep-regexp '(("," (rx bol (1+ (not (or ?\n ?,))) eol)) ("\t" (rx bol (1+ (not (or ?\n ?\t))) eol)) (";" (rx bol (1+ (not (or ?\n ?\;))) eol)) (":" (rx bol (1+ (not (or ?\n ?:))) eol)) (" " (rx bol (1+ (not (or ?' ?\" )) (not (or ?\s ?\;)) (not (or ?' ?\"))) eol)))) sep) (unless (= beg end) (save-excursion (goto-char beg) (catch :found (pcase-dolist (`(,sep ,regexp) sep-regexp) (save-excursion (unless (re-search-forward (eval regexp) end t) (throw :found sep)))) nil))))) > Again all this needs to extensively tested, as there are a lot of > dangers lurking around. Summary of things that still requires a review: + Setting boundary right + When using SPACE as separator is it sufficient to check for all for all non quoted SPACE's? -- Utkarsh Singh http://utkarshsingh.xyz From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 23 00:58:30 2021 Received: (at 47885) by debbugs.gnu.org; 23 Apr 2021 04:58:30 +0000 Received: from localhost ([127.0.0.1]:35996 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lZnte-0002iN-Dc for submit@debbugs.gnu.org; Fri, 23 Apr 2021 00:58:30 -0400 Received: from mail-pl1-f170.google.com ([209.85.214.170]:41545) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lZntb-0002i9-Rv for 47885@debbugs.gnu.org; Fri, 23 Apr 2021 00:58:29 -0400 Received: by mail-pl1-f170.google.com with SMTP id e2so20333246plh.8 for <47885@debbugs.gnu.org>; Thu, 22 Apr 2021 21:58:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=xznZQK2tcTDx3ofiT9a/6PP6K4yrTb9plv41Xw+QtRQ=; b=fIhH5IdFOLgbe/SbNfq6y1YEhlmTUYWlAQkyUBH4g91W6TDbB1MFUC17eeYj3KTPMe PDT/gEYCSqAmfK0j3SQt8KlWYLgTR4vJR62oJ03ehJsN6AvMFbI5ZON2/bGkHBETKXM6 kbJjUYU2aMEtKK8GwRYGaI59JJ/0gGSX4/wDxxIBxqVH5eAjtmcy/uQyJmS7KXNYL73d AMrbALcHxCN2JGW/afzzSJL96PTzFfiHxPz6mCMmcuXtNLg2256RTdMNM1VkPsS0s46c D0Wd3HfFxtzR2R8YvJWWMMJ9N5jJNprUm64ky7h0UjNWrjuCQ+MOikqVkhHA7/+qYjGb i/3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version; bh=xznZQK2tcTDx3ofiT9a/6PP6K4yrTb9plv41Xw+QtRQ=; b=X7ztIn1lio2Nsu4Nj7tJdXIzQqVgTGVUjcDz3DlMwpsnVhCqhpb/Nk0Gc1tThHVitl ojs5mpMa+XO2OlFRDYH6FQgyh6MyTykDVss67WjfPxEYq7e0SZUw4Vpj2pm5ZnIrnkO2 hHsVYuqaRHfP0KJGXfxKnpg8U2kOMo161L5kmBDWTI9AreayAtb9QRlhzeaTaybWFOO8 gUsADovX0jdH4nbFrnNRuXpDZQRG4wBncclj3mtwawfeVgIVTq5/trfE8aUnLescQgmr /1f/QEnK9f8Z0RCgON2IIzoRM6WFyATqcb+mgI50LHiqcX1CWU7xvmkTCK53Ru6BLbeA 5Lew== X-Gm-Message-State: AOAM531nrChajGV5DHqxaxMzepe4CirMNb6Z+roOqxe/PxIXRy2qfPvn iHWBxegKMHScOY7PJk8zDvs= X-Google-Smtp-Source: ABdhPJwrp0Mf40X39ayDYQaZPIKEHa33geCndDy3k+6AHVrSFU8iRBOFCl8TM5cvWBsMCjPX3/6o1w== X-Received: by 2002:a17:90b:4504:: with SMTP id iu4mr2405377pjb.76.1619153901799; Thu, 22 Apr 2021 21:58:21 -0700 (PDT) Received: from localhost ([103.77.0.212]) by smtp.gmail.com with ESMTPSA id i14sm3358231pfa.156.2021.04.22.21.58.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 22 Apr 2021 21:58:21 -0700 (PDT) From: Utkarsh Singh To: Nicolas Goaziou Subject: Re: [PATCH] org-table-import: Make it more smarter for interactive use References: <87czuq9958.fsf@gmail.com> <8735vmelfs.fsf@nicolasgoaziou.fr> <87k0oyfj4y.fsf@gmail.com> <87im4h9irn.fsf@nicolasgoaziou.fr> Date: Fri, 23 Apr 2021 10:28:24 +0530 In-Reply-To: <87im4h9irn.fsf@nicolasgoaziou.fr> (Nicolas Goaziou's message of "Tue, 20 Apr 2021 15:40:12 +0200") Message-ID: <87zgxpwqa7.fsf@gmail.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: 2.3 (++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Hi, On 2021-04-20, 15:40 +0200, Nicolas Goaziou wrote: > Again all this needs to extensively tested, as there are a lot of > dangers lurking around. Content analysis details: (2.3 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: utkarshsingh.xyz (xyz)] 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (utkarsh190601[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (utkarsh190601[at]gmail.com) -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [209.85.214.170 listed in list.dnswl.org] 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.214.170 listed in wl.mailspike.net] 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-Debbugs-Envelope-To: 47885 Cc: 47885@debbugs.gnu.org, emacs-orgmode@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.3 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Hi, On 2021-04-20, 15:40 +0200, Nicolas Goaziou wrote: > Again all this needs to extensively tested, as there are a lot of > dangers lurking around. Content analysis details: (1.3 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [209.85.214.170 listed in list.dnswl.org] 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.214.170 listed in wl.mailspike.net] 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: utkarshsingh.xyz (xyz)] 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (utkarsh190601[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (utkarsh190601[at]gmail.com) -1.0 MAILING_LIST_MULTI Multiple indicators imply a widely-seen list manager 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders --=-=-= Content-Type: text/plain Hi, On 2021-04-20, 15:40 +0200, Nicolas Goaziou wrote: > Again all this needs to extensively tested, as there are a lot of > dangers lurking around. I am attaching my patch which also include my previous suggestion of including yes-or-no prompt to org-table-import to allow file which don't have csv, tsv or txt as extension. Here are some concerns with require your attention: + When using org-table-import interactively if we failed to guess separator then we will be left with a user-error message and an 'unconverted table'. We can make use of 'temp-buffer' to import our file after successfully conversion. + Conversion part of org-table-convert-region make a distinction between '(4) (comma separator) and rest of the separator we should either string version of comma as AND condition or rewrite to simplify it. I am willing to do these possible changes but currently waiting for your review for org-table-guess-separator as there can be more serious bugs lurking around on my code which I am considering base for these changes. All the best, Utkarsh --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=org-table.patch Content-Description: org-table diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el index 0e93fb271f..84bc981fec 100644 --- a/lisp/org/org-table.el +++ b/lisp/org/org-table.el @@ -846,6 +846,42 @@ org-table-create (goto-char pos)) (org-table-align))) + +(defun org-table-guess-separator (beg0 end0) + "Guess separator for `org-table-convert-region' for region BEG0 to END0. + +List of preferred separator: +comma, TAB, semicolon, colon or SPACE. + +If region contains a line which doesn't contain the required +separator then discard the separator and search again using next +separator." + (let* ((beg (save-excursion + (goto-char (min beg0 end0)) + (skip-chars-forward " \t\n") + (if (eobp) (point) (line-beginning-position)))) + (end (save-excursion + (goto-char (max beg end0)) + (skip-chars-backward " \t\n" beg) + (if (= beg (point)) (point) (line-end-position)))) + (sep-regexp '(("," (rx bol (1+ (not (or ?\n ?,))) eol)) + ("\t" (rx bol (1+ (not (or ?\n ?\t))) eol)) + (";" (rx bol (1+ (not (or ?\n ?\;))) eol)) + (":" (rx bol (1+ (not (or ?\n ?:))) eol)) + (" " (rx bol (1+ (not (or ?' ?\" )) + (not (or ?\s ?\;)) + (not (or ?' ?\"))) eol)))) + sep) + (unless (= beg end) + (save-excursion + (goto-char beg) + (catch :found + (pcase-dolist (`(,sep ,regexp) sep-regexp) + (save-excursion + (unless (re-search-forward (eval regexp) end t) + (throw :found sep)))) + nil))))) + ;;;###autoload (defun org-table-convert-region (beg0 end0 &optional separator) "Convert region to a table. @@ -859,20 +895,19 @@ org-table-convert-region (4) Use the comma as a field separator (16) Use a TAB as field separator (64) Prompt for a regular expression as field separator -integer When a number, use that many spaces, or a TAB, as field separator -regexp When a regular expression, use it to match the separator -nil When nil, the command tries to be smart and figure out the - separator in the following way: - - when each line contains a TAB, assume TAB-separated material - - when each line contains a comma, assume CSV material - - else, assume one or more SPACE characters as separator." +integer When a number, use that many spaces, or a TAB, as field separator +regexp When a regular expression, use it to match the separator +nil When nil, the command tries to be smart and figure out the + separator using `org-table-guess-seperator'." (interactive "r\nP") (let* ((beg (min beg0 end0)) (end (max beg0 end0)) re) + (if (> (count-lines beg end) org-table-convert-region-max-lines) (user-error "Region is longer than `org-table-convert-region-max-lines' (%s) lines; not converting" org-table-convert-region-max-lines) + (when (equal separator '(64)) (setq separator (read-regexp "Regexp for field separator"))) (goto-char beg) @@ -881,17 +916,13 @@ org-table-convert-region (goto-char end) (if (bolp) (backward-char 1) (end-of-line 1)) (setq end (point-marker)) - ;; Get the right field separator - (unless separator - (goto-char beg) - (setq separator - (cond - ((not (re-search-forward "^[^\n\t]+$" end t)) '(16)) - ((not (re-search-forward "^[^\n,]+$" end t)) '(4)) - (t 1)))) + (when (and (not separator) + (not (setq separator + (org-table-guess-separator (beg end))))) + (user-error "Failed to guess separator")) (goto-char beg) (if (equal separator '(4)) - (while (< (point) end) + (while (< (point) end) ;; parse the csv stuff (cond ((looking-at "^") (insert "| ")) @@ -905,7 +936,7 @@ org-table-convert-region (setq re (cond ((equal separator '(4)) "^\\|\"?[ \t]*,[ \t]*\"?") ((equal separator '(16)) "^\\|\t") - ((integerp separator) + ((integerp separator) (if (< separator 1) (user-error "Number of spaces in separator must be >= 1") (format "^ *\\| *\t *\\| \\{%d,\\}" separator))) @@ -921,12 +952,8 @@ org-table-convert-region (defun org-table-import (file separator) "Import FILE as a table. -The command tries to be smart and figure out the separator in the -following way: - -- when each line contains a TAB, assume TAB-separated material; -- when each line contains a comma, assume CSV material; -- else, assume one or more SPACE characters as separator. +The command tries to be smart and figure out the separator using +`org-table-guess-seperator'. When non-nil, SEPARATOR specifies the field separator in the lines. It can have the following values: @@ -938,7 +965,8 @@ org-table-import - regexp When a regular expression, use it to match the separator." (interactive "f\nP") (when (and (called-interactively-p 'any) - (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))) + (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)) + (not (yes-or-no-p "File does not havs .txt .txt .csv as extension. Do you still want to continue? "))) (user-error "Cannot import such file")) (unless (bolp) (insert "\n")) (let ((beg (point)) --=-=-= Content-Type: text/plain -- Utkarsh Singh http://utkarshsingh.xyz --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 27 16:21:32 2021 Received: (at 47885) by debbugs.gnu.org; 27 Apr 2021 20:21:32 +0000 Received: from localhost ([127.0.0.1]:51011 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lbUD5-0006y1-Lb for submit@debbugs.gnu.org; Tue, 27 Apr 2021 16:21:32 -0400 Received: from relay8-d.mail.gandi.net ([217.70.183.201]:60909) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lbUD2-0006xh-TQ for 47885@debbugs.gnu.org; Tue, 27 Apr 2021 16:21:30 -0400 X-Originating-IP: 185.131.40.67 Received: from localhost (40-67.ipv4.commingeshautdebit.fr [185.131.40.67]) (Authenticated sender: admin@nicolasgoaziou.fr) by relay8-d.mail.gandi.net (Postfix) with ESMTPSA id CDAC51BF208; Tue, 27 Apr 2021 20:21:21 +0000 (UTC) From: Nicolas Goaziou To: Utkarsh Singh Subject: Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use References: <87czuq9958.fsf@gmail.com> <8735vmelfs.fsf@nicolasgoaziou.fr> <87k0oyfj4y.fsf@gmail.com> <87im4h9irn.fsf@nicolasgoaziou.fr> <87zgxpwqa7.fsf@gmail.com> Date: Tue, 27 Apr 2021 22:21:20 +0200 In-Reply-To: <87zgxpwqa7.fsf@gmail.com> (Utkarsh Singh's message of "Fri, 23 Apr 2021 10:28:24 +0530") Message-ID: <875z07jx6n.fsf@nicolasgoaziou.fr> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 47885 Cc: 47885@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) Hello, Utkarsh Singh writes: > I am attaching my patch which also include my previous suggestion of > including yes-or-no prompt to org-table-import to allow file which don't > have csv, tsv or txt as extension. I suggest to make several patches. Do not try to stuff as many changes as possible in a single one, please. > + When using org-table-import interactively if we failed to guess > separator then we will be left with a user-error message and an > 'unconverted table'. We can make use of 'temp-buffer' to import our > file after successfully conversion. I'm not sure to understand what you mean. > + Conversion part of org-table-convert-region make a distinction between > '(4) (comma separator) and rest of the separator we should either string > version of comma as AND condition or rewrite to simplify it. Ditto. But it can be the object of another patch. Let's concentrate on `org-table-guess-separator' first. > I am willing to do these possible changes but currently waiting for your > review for org-table-guess-separator as there can be more serious bugs > lurking around on my code which I am considering base for these > changes. You should definitely write tests for this function. Here's a start: (ert-deftest test-org-table/guess-separator () "Test `test-org-table/guess-separator'." ;; Test space separator. (should (equal " " (org-test-with-temp-text "a b\nc d" (org-table-guess-separator (point-min) (point-max))))) (should (equal " " (org-test-with-temp-text "a b\nc d" (org-table-guess-separator (point-min) (point-max))))) ;; Test "inverted" region. (should (equal " " (org-test-with-temp-text "a b\nc d" (org-table-guess-separator (point-max) (point-min))))) ;; Do not error on empty region. (should-not (org-test-with-temp-text "" (org-table-guess-separator (point-max) (point-min)))) (should-not (org-test-with-temp-text " \n" (org-table-guess-separator (point-max) (point-min))))) > + (end (save-excursion > + (goto-char (max beg end0)) This should be beg0 instead of beg above. > + (sep-regexp '(("," (rx bol (1+ (not (or ?\n ?,))) eol)) > + ("\t" (rx bol (1+ (not (or ?\n ?\t))) eol)) > + (";" (rx bol (1+ (not (or ?\n ?\;))) eol)) > + (":" (rx bol (1+ (not (or ?\n ?:))) eol)) > + (" " (rx bol (1+ (not (or ?' ?\" )) > + (not (or ?\s ?\;)) > + (not (or ?' ?\"))) eol)))) Use (sep-regexp (list (list "," (rx bol (1+ (not (or ?\n ?,))) eol)) (list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol)) (list ";" (rx bol (1+ (not (or ?\n ?\;))) eol)) (list ":" (rx bol (1+ (not (or ?\n ?:))) eol)) (list " " (rx bol (1+ (not (or ?' ?\" )) (not (or ?\s ?\;)) (not (or ?' ?\"))) eol)))) so you don't need eval below, and rx forms become constants when compiled. > + sep) This `sep' binding can be removed. > + (unless (= beg end) > + (save-excursion > + (goto-char beg) > + (catch :found > + (pcase-dolist (`(,sep ,regexp) sep-regexp) > + (save-excursion > + (unless (re-search-forward (eval regexp) end t) You can drop the `eval'. > (when (and (called-interactively-p 'any) > - (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))) > + (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)) > + (not (yes-or-no-p "File does not havs .txt .txt .csv as extension. Do you still want to continue? "))) "does not have" and ".txt" -> ".tsv" I guess. Also please provide a patch with a commit message, possibly using `git format-patch'. Thanks! Regards, -- Nicolas Goaziou From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 28 04:37:45 2021 Received: (at 47885) by debbugs.gnu.org; 28 Apr 2021 08:37:45 +0000 Received: from localhost ([127.0.0.1]:51676 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lbfhY-0002V1-LQ for submit@debbugs.gnu.org; Wed, 28 Apr 2021 04:37:45 -0400 Received: from mail-pf1-f176.google.com ([209.85.210.176]:36848) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lbfhS-0002Uj-Om for 47885@debbugs.gnu.org; Wed, 28 Apr 2021 04:37:43 -0400 Received: by mail-pf1-f176.google.com with SMTP id c3so24475455pfo.3 for <47885@debbugs.gnu.org>; Wed, 28 Apr 2021 01:37:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=lDvzLBEA8+XP2xpA2a1Xwy7He2PmxTZ3ce+S38iHNg0=; b=Gs1Ehbr2LFvv4Rohu0Pj8yB/qTh94aAaXbtPOS1vusGfzY8eS7A3OdB/1paeFSOeo7 Lyj9lVV+uIz5fK6RGDmidoUCEmbi0MzkRU3yceMlvMVTxV/t4E7BxyZFx1550zdG1w2b ftyHaz9NOBX2lm91CvJmMw/aaKPemU+UMkdnJtlZ0jAwc2jzPsq0BFT2JZl/W/6TL/l2 RtAZmewaswtNHIzfXU3jnMRExT72h5pfY/T8sXZ6PE6rr4GgqeIEu9EFcHi0Wzgk+voY BEMIep81qBpetuKrAkX1gniQqhVklLj5evjmb5HeLD8OwaiZBBrPFEAbB0D3lNIhEUVV 6eEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version; bh=lDvzLBEA8+XP2xpA2a1Xwy7He2PmxTZ3ce+S38iHNg0=; b=Lhp34zcpa563ud4c+GDWfVvZ4vYkuB4+ihiq+AnQWmaNjcnMTEB5Z//RnFqiHaQ6Hm zM9SSRuxq1TCInPw8sbGve/FnJfz/XhfuU43wjo/0balvS4U2N4zbbUXeCvkLIzfj8Rf DXzqPNP2TfAOTgBSkpc48C0qg62mrMnhu8cCdUy7oa3G4+sdCypjWfk26VNejygkIa6S V1GLI1W245tcWJcKaRK87S5Rcr7Uvm7oitcx6FDQvZAFJos6PTXy13QUR9Z5JeRuRMor 249jGTGHZ38b/PMN5gUPwcA8jZToSJQZ772Mpco7cZ4x+YQtqeP0koJpiV8De1qSb0Gy E7ZQ== X-Gm-Message-State: AOAM5316NhxGoQ+NXP2u7o8UhDJwHF0YtUPwK/OM2wCzrdEnsmt3HMuS au16xxIkLrXkvwwsH6n84RULTcRJTfE= X-Google-Smtp-Source: ABdhPJxj5kiAqgPquQINHbjSBetKh8WIIGh/NILBgmSKjVw0jC+V+Ej2l1Vu/CMMUOJKbQ1+qlVGbA== X-Received: by 2002:a05:6a00:1a12:b029:272:bfa6:867f with SMTP id g18-20020a056a001a12b0290272bfa6867fmr20131640pfv.6.1619599052719; Wed, 28 Apr 2021 01:37:32 -0700 (PDT) Received: from localhost ([43.230.65.99]) by smtp.gmail.com with ESMTPSA id j23sm4586075pfh.179.2021.04.28.01.37.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Apr 2021 01:37:32 -0700 (PDT) From: Utkarsh Singh To: Nicolas Goaziou Subject: Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use References: <87czuq9958.fsf@gmail.com> <8735vmelfs.fsf@nicolasgoaziou.fr> <87k0oyfj4y.fsf@gmail.com> <87im4h9irn.fsf@nicolasgoaziou.fr> <87zgxpwqa7.fsf@gmail.com> <875z07jx6n.fsf@nicolasgoaziou.fr> Date: Wed, 28 Apr 2021 14:07:37 +0530 In-Reply-To: <875z07jx6n.fsf@nicolasgoaziou.fr> (Nicolas Goaziou's message of "Tue, 27 Apr 2021 22:21:20 +0200") Message-ID: <87tunqby9a.fsf@gmail.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: 2.3 (++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Hi, On 2021-04-27, 22:21 +0200, Nicolas Goaziou wrote: >> + When using org-table-import interactively if we failed to guess >> separator then we will be left with a user-error message and an >> 'unconverted table'. We can make use of 'temp-buffer' to impo [...] Content analysis details: (2.3 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_MSPIKE_H2 RBL: Average reputation (+2) [209.85.210.176 listed in wl.mailspike.net] -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [209.85.210.176 listed in list.dnswl.org] 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (utkarsh190601[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (utkarsh190601[at]gmail.com) 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: utkarshsingh.xyz (xyz)] 0.0 T_SPF_HELO_TEMPERROR SPF: test of HELO record failed (temperror) X-Debbugs-Envelope-To: 47885 Cc: 47885@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.2 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Hi, On 2021-04-27, 22:21 +0200, Nicolas Goaziou wrote: >> + When using org-table-import interactively if we failed to guess >> separator then we will be left with a user-error message and an >> 'unconverted table'. We can make use of 'temp-buffer' to impo [...] Content analysis details: (1.2 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_MSPIKE_H2 RBL: Average reputation (+2) [209.85.210.176 listed in wl.mailspike.net] -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [209.85.210.176 listed in list.dnswl.org] 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit (utkarsh190601[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (utkarsh190601[at]gmail.com) 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: utkarshsingh.xyz (xyz)] -1.0 MAILING_LIST_MULTI Multiple indicators imply a widely-seen list manager --=-=-= Content-Type: text/plain Hi, On 2021-04-27, 22:21 +0200, Nicolas Goaziou wrote: >> + When using org-table-import interactively if we failed to guess >> separator then we will be left with a user-error message and an >> 'unconverted table'. We can make use of 'temp-buffer' to import our >> file after successfully conversion. > > I'm not sure to understand what you mean. Note: I will advice you to apply patch no. 2 before trying out the following example. 1. Download the attached CSV file. We can call this example.csv 2. Go to *scratch* buffer. 3. Use 'M-x org-table-import' to import example.csv as org-table. You will see even thought org-table-guess-separator failed in guessing separator we are still left with unconverted region added to our buffer. >> + Conversion part of org-table-convert-region make a distinction between >> '(4) (comma separator) and rest of the separator we should either string >> version of comma as AND condition or rewrite to simplify it. > > Ditto. But it can be the object of another patch. Let's concentrate on > `org-table-guess-separator' first. > >> I am willing to do these possible changes but currently waiting for your >> review for org-table-guess-separator as there can be more serious bugs >> lurking around on my code which I am considering base for these >> changes. > > You should definitely write tests for this function. Here's a start: > > (ert-deftest test-org-table/guess-separator () > "Test `test-org-table/guess-separator'." > ;; Test space separator. > (should > (equal " " > (org-test-with-temp-text "a b\nc d" > (org-table-guess-separator (point-min) (point-max))))) > (should > (equal " " > (org-test-with-temp-text "a b\nc d" > (org-table-guess-separator (point-min) (point-max))))) > ;; Test "inverted" region. > (should > (equal " " > (org-test-with-temp-text "a b\nc d" > (org-table-guess-separator (point-max) (point-min))))) > ;; Do not error on empty region. > (should-not > (org-test-with-temp-text "" > (org-table-guess-separator (point-max) (point-min)))) > (should-not > (org-test-with-temp-text " \n" > (org-table-guess-separator (point-max) (point-min))))) > I will surely do more testing. I would also like to simplify the condition for guessing SPACE as separator due to following cases: + field1 'this is field2' 'this is field3' :: In this case we still have SPACE inside quote (' in this case). + Since SPACE is our last valid separator I think searching for a line which doesn't contains space is more than enough. Required patch: --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-org-table.el-org-table-import-add-yes-and-no-prompt.patch Content-Description: patch1 >From 6b112927de73c43edfd08254217808ebff42772a Mon Sep 17 00:00:00 2001 From: Utkarsh Singh Date: Wed, 28 Apr 2021 10:26:46 +0530 Subject: [PATCH 1/3] org-table.el (org-table-import): add yes-and-no prompt Add a yes and no prompt for files which don't have .txt, .tsv OR .csv as file extensions. --- lisp/org/org-table.el | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el index 0e93fb271f..e0b2be6892 100644 --- a/lisp/org/org-table.el +++ b/lisp/org/org-table.el @@ -938,7 +938,8 @@ org-table-import - regexp When a regular expression, use it to match the separator." (interactive "f\nP") (when (and (called-interactively-p 'any) - (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))) + (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)) + (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension. Do you still want to continue? "))) (user-error "Cannot import such file")) (unless (bolp) (insert "\n")) (let ((beg (point)) -- 2.31.1 --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0002-org-table.el-org-table-convert-region-move-out-separ.patch Content-Description: patch2 >From 9bb017cfc8284075e04faf5496ed560ba48d5bbc Mon Sep 17 00:00:00 2001 From: Utkarsh Singh Date: Wed, 28 Apr 2021 10:42:32 +0530 Subject: [PATCH 2/3] org-table.el (org-table-convert-region): move out separator-guessing 1. Move separator guessing code to org-table-guess-separator (new function). 2. Add semicolon, colon and SPACE to the list of know separator (separator which we can guess). --- lisp/org/org-table.el | 49 +++++++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 11 deletions(-) diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el index e0b2be6892..295f7a9b90 100644 --- a/lisp/org/org-table.el +++ b/lisp/org/org-table.el @@ -846,6 +846,39 @@ org-table-create (goto-char pos)) (org-table-align))) +(defun org-table-guess-separator (beg0 end0) + "Guess separator for region BEG0 to END0. + +List of preferred separator (in order of preference): +comma, TAB, semicolon, colon or SPACE. + +Search for a line which doesn't contain a separator if found +search again using next preferred separator or else return +separator as string." + (let* ((beg (save-excursion + (goto-char (min beg0 end0)) + (skip-chars-forward " \t\n") + (if (eobp) (point) (line-beginning-position)))) + (end (save-excursion + (goto-char (max beg0 end0)) + (skip-chars-backward " \t\n" beg) + (if (= beg (point)) (point) (line-end-position)))) + (sep-regexp + (list (list "," (rx bol (1+ (not (or ?\n ?,))) eol)) + (list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol)) + (list ";" (rx bol (1+ (not (or ?\n ?\;))) eol)) + (list ":" (rx bol (1+ (not (or ?\n ?:))) eol)) + (list " " (rx bol (1+ (not (or ?\n ?\s))) eol))))) + (unless (= beg end) + (save-excursion + (goto-char beg) + (catch :found + (pcase-dolist (`(,sep ,regexp) sep-regexp) + (save-excursion + (unless (re-search-forward regexp end t) + (throw :found sep)))) + nil))))) + ;;;###autoload (defun org-table-convert-region (beg0 end0 &optional separator) "Convert region to a table. @@ -862,10 +895,7 @@ org-table-convert-region integer When a number, use that many spaces, or a TAB, as field separator regexp When a regular expression, use it to match the separator nil When nil, the command tries to be smart and figure out the - separator in the following way: - - when each line contains a TAB, assume TAB-separated material - - when each line contains a comma, assume CSV material - - else, assume one or more SPACE characters as separator." + separator using `org-table-guess-seperator'." (interactive "r\nP") (let* ((beg (min beg0 end0)) (end (max beg0 end0)) @@ -882,13 +912,10 @@ org-table-convert-region (if (bolp) (backward-char 1) (end-of-line 1)) (setq end (point-marker)) ;; Get the right field separator - (unless separator - (goto-char beg) - (setq separator - (cond - ((not (re-search-forward "^[^\n\t]+$" end t)) '(16)) - ((not (re-search-forward "^[^\n,]+$" end t)) '(4)) - (t 1)))) + (when (and (not separator) + (not (setq separator + (org-table-guess-separator beg end)))) + (user-error "Failed to guess separator")) (goto-char beg) (if (equal separator '(4)) (while (< (point) end) -- 2.31.1 --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0003-org-table.el-org-table-import-add-file-prompt.patch Content-Description: patch3 >From fef97ffe27ff908647c45f1b066a845e71a0926f Mon Sep 17 00:00:00 2001 From: Utkarsh Singh Date: Wed, 28 Apr 2021 14:01:31 +0530 Subject: [PATCH 3/3] org-table.el (org-table-import): add file prompt --- lisp/org/org-table.el | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el index 295f7a9b90..e904903576 100644 --- a/lisp/org/org-table.el +++ b/lisp/org/org-table.el @@ -963,7 +963,8 @@ org-table-import - (64) Prompt for a regular expression as field separator. - integer When a number, use that many spaces, or a TAB, as field separator. - regexp When a regular expression, use it to match the separator." - (interactive "f\nP") + (interactive (list (read-file-name "Import file: ") + (prefix-numeric-value current-prefix-arg))) (when (and (called-interactively-p 'any) (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)) (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension. Do you still want to continue? "))) -- 2.31.1 --=-=-= Content-Type: application/octet-stream Content-Disposition: attachment; filename=example.csv Content-Transfer-Encoding: base64 Content-Description: csv file dGhpcyxpcyxhbixleGFtcGxlLGNzdgpvbixmaXJzdCx0d28sbGluZSx3ZSxoYXZlLGNvbW1hLGFz LHNlcGVyYXRvcgpidXQ6bm93OndlOnN3aXRjaGVkOnRvOmNvbG9uOmFzOnNlcGFyYXRvcgo= --=-=-= Content-Type: text/plain -- Utkarsh Singh http://utkarshsingh.xyz --=-=-=--