From: Hugo Buddelmeijer
Date: Fri, 24 Mar 2023 16:38:44 +0100
Subject: IRC channel log search results are not chronological for recent logs
To: bug-guix@gnu.org, Ricardo Wurmus

Hi all, Ricardo,

Searching through the IRC channel logs on https://logs.guix.gnu.org/
shows a list of matches sorted by date in descending order, except for
matches from this February or March: those are at the bottom, often
beyond the 100-match limit.

For example, 'vdirsyncer' results in 31 matches (at the time of writing):
https://logs.guix.gnu.org/guix/search?query=vdirsyncer

> 2023-01-10 [15:09:09] <elb> this machine has installed emacs, emacs-guix, ...
> 2023-01-10 [15:12:26] <nckx> For context, ‘guix size emacs emacs-guix ...
> 2022-01-18 [04:43:24] <lfam> At least, vdirsyncer builds when you simply ...
> 2022-01-17 [16:29:41] <johnhamelink> Hey there :) I'm currently an Arch ...
> 2020-11-30 [23:29:57] <lfam> jonsger: No, I'm not using radicale. It was ...
> 2020-11-30 [23:31:08] <lfam> sneek: later tell jonsger: No, I'm not using ...
> 2020-04-29 [09:34:26] <efraim> it also came up in vdirsyncer on ...
> ...
> 2016-01-24 [22:45:28] <lfam> I don't even think you can run vdirsyncer ...
> 2015-12-10 [00:10:51] <lfam> All that and vdirsyncer doesn't even build ...
> 2015-12-09 [22:39:51] <lfam> https://github.com/untitaker/vdirsyncer/ ...
> 2023-02-25 [03:03:54] <fruit-loops> "#61557 - vdirsyncer fails to verify ...
> 2023-02-25 [03:08:01] <fruit-loops> "vdirsyncer fails to verify ...
> 2023-02-25 [03:09:41] <elb> nckx: hmmm when I searched mobile ...
> 2023-02-25 [03:10:49] <elb> ok yeah, it's just not tagged or ...
> 2023-02-25 [03:36:53] <fruit-loops> "vdirsyncer fails to verify ...
> 2023-02-25 [03:38:16] <elb> lechner: no, against vdirsyncer
> 2023-02-25 [03:46:06] <fruit-loops> "vdirsyncer fails to verify certificates"

All hits from February and March of this year are at the bottom of the
list, while the rest is sorted by date, newest first. (The 'vdirsyncer'
example was chosen because it occurs regularly, but not too often.)

The list cuts off after about 100 matches, so it is impossible to find
recent matches for more popular terms. The most recent chats are usually
the more interesting ones, for example when debugging an issue that
occurred recently. E.g. a search for Python shows nothing more recent
than 2023-01-31:
https://logs.guix.gnu.org/guix/search?query=python

So my question is: can we improve the sort order of the IRC logs?

I did a bit of investigating myself and discovered the maintenance
repository with the hydra directory. There is so much to learn from that
directory.

However, I could not really figure out what could be the problem. My
hypothesis, which is more like a wild guess:

- It seems the sorting is done implicitly by Xapian, which will just
  return the matching lines in whatever order they were inserted.
- Something went wrong at the transition between January 31st and
  February 1st that required manual cleanup. Evidence: there are logs
  with a tilde in the filename, 2023-01-31.log~ and 2023-02-01.log~.
- The database was emptied and repopulated to prevent entries from early
  in the morning of 2023-02-01 from being counted as beyond-midnight on
  2023-01-31. This put all the lines in the correct order, hence the
  correct sorting up to that point.
- Subsequent lines are added by the mcron job and are therefore at the
  end of the database, and thus at the end of the result set (beyond the
  limit of 100).

Side note: the ~ files cause some lines to show up three times, e.g.
https://logs.guix.gnu.org/guix/search?query=557816d497d3e9d25901370903d512d6f6991aa3

> 2023-01-31 [04:52:19] dcunit3d: here's another great config: https://github.com/jsoo1/dotfiles/blob/557816d497d3e9d25901370903d512d6f6991aa3/emacs/init.el
> 2023-02-01.log~ [04:52:19] dcunit3d: here's another great config: https://github.com/jsoo1/dotfiles/blob/557816d497d3e9d25901370903d512d6f6991aa3/emacs/init.el
> 2023-01-31.log~ [04:52:19] dcunit3d: here's another great config: https://github.com/jsoo1/dotfiles/blob/557816d497d3e9d25901370903d512d6f6991aa3/emacs/init.el

Side side note: those ~ entries cannot be clicked on, because
(define stamp (basename file-name ".log")) does not strip anything from
a name ending in ".log~", so goggles treats the ".log~" as part of the
date.

What I don't understand is why the matches are not sorted correctly. It
seems to me that (Enquire-set-sort-by-value enq 0 #f) would sort by the
value of slot 0, which seems to be the date-stamp. But I don't really
have a good mental model of how Xapian works or what value slots
actually are. (Maybe value slots start at 1 and selecting 0 means do not
use any of them?)

I tried to compare the results of #guix with those of other channels,
but it seems that the logs of most other channels are either not indexed
at all, or indexed inconsistently.
For example, searching for ACTION (which seems to be a "/me" command) in
#spritely shows only 11 matches spread over 5 days, although it is a
very common occurrence:
https://logs.guix.gnu.org/spritely/search?query=ACTION

Cheers,
Hugo
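
The side side note about basename can be illustrated with a minimal
Guile sketch. It only shows how basename treats a suffix argument; the
helper log-file->stamp is a hypothetical name introduced here for
illustration and is not taken from the goggles source, and whether
skipping the ~ files is the right fix is an assumption.

```scheme
;; Illustration only: `log-file->stamp' is a hypothetical helper,
;; not a procedure from the goggles source.

;; With a suffix argument, `basename' strips the suffix only when the
;; file name actually ends with it.
(display (basename "2023-01-31.log" ".log"))   ; prints 2023-01-31
(newline)
(display (basename "2023-01-31.log~" ".log"))  ; prints 2023-01-31.log~
(newline)

;; One possible guard: derive a stamp only from names that end in
;; ".log", and return #f for anything else (such as the ~ files).
(define (log-file->stamp file-name)
  (and (string-suffix? ".log" file-name)
       (basename file-name ".log")))

(display (log-file->stamp "2023-01-31.log"))   ; prints 2023-01-31
(newline)
(display (log-file->stamp "2023-02-01.log~"))  ; prints #f
(newline)
```

A caller that gets #f back could skip the file during indexing, which
would presumably also avoid the triplicate matches mentioned in the side
note.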