GNU bug report logs - #32592
s with i modifier seems to work incorrectly

Package: sed;

Reported by: Saito Takaaki <tails.saito <at> gmail.com>

Date: Thu, 30 Aug 2018 14:44:01 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

Message #17 received at 32592 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Saito Takaaki <tails.saito <at> gmail.com>, 32592 <at> debbugs.gnu.org,
 bug-gnulib <at> gnu.org
Cc: bill-auger <bill-auger <at> peers.community>, Eric Blake <eblake <at> redhat.com>,
 Jim Meyering <jim <at> meyering.net>
Subject: bug#32592: heap-use-after-free in regex module (was: s with i
 modifier seems to work incorrectly)
Date: Wed, 5 Sep 2018 01:32:27 -0600

(adding gnulib)

On 04/09/18 07:02 PM, Saito Takaaki wrote:
[... discussing a sed bug ...]
> However, a friend showed me a more complex case which is
> problematic even with sed 4.4 on ideone.  The last two lines of the
> output (for the identical input lines) are particularly interesting.
> https://ideone.com/Sq5xJX
> 
> I hope this helps even a bit.

Thank you for persisting with this bug.

The linked snippet you provided exposed a heap-use-after-free bug
in gnulib's regex module (possibly in glibc as well).

A simple way to reproduce with latest sed:

  cd sed
  ./bootstrap
  ./configure --with-included-regex
  make
  echo 'abcdefghijklmns!!!!!!!!!!' \
     | valgrind ./sed/sed -E 'h;G;s/((.).+(.))(.*\n.*\1)/\2-\3\4/i'

This results in a heap-use-after-free involving the back-references
(valgrind output below). The input length matters: if the exclamation
marks are removed, the bug is not triggered. It is also not triggered
without the case-insensitive flag (s///i).
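
To make the role of the 'h;G' prefix in the reproducer concrete, here is a
minimal sketch of what the regex actually sees. 'h' copies the pattern space
to the hold space and 'G' appends a newline plus the hold space, so the
pattern space holds the input line twice and \1 can match inside the second
copy. The function name show_pattern_space is hypothetical and not part of
the original report.

```shell
# h: copy the pattern space into the hold space.
# G: append a newline plus the hold space to the pattern space.
# Net effect: the regex is matched against "LINE\nLINE".
show_pattern_space() {
    # -n suppresses auto-print; l prints the pattern space unambiguously,
    # showing the embedded newline as \n and the end of data as $.
    sed -n 'h;G;l'
}

printf 'abc\n' | show_pattern_space
# prints: abc\nabc$
```

This only illustrates the doubled buffer; it does not exercise the
back-reference code and is not expected to trigger the bug.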

This is easier to trigger with gnulib's regex (hence --with-included-regex),
but it also happens with glibc's regex module.

This could also mean that the bug you reported earlier, which I surmised
was fixed, is not fixed at all. It may simply have become much harder to
trigger with later sed versions.

I'm still learning the code, so I don't have a fix yet.

Comments welcome,
 - assaf

=========================

==13408== Memcheck, a memory error detector
==13408== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13408== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==13408== Command: ./sed/sed -E h;G;s/((.).+(.))(.*\\n.*\\1)/\\2-\\3\\4/i
==13408==
==13408== Invalid read of size 1
==13408==    at 0x123857: get_subexp (regexec.c:2747)
==13408==    by 0x123857: transit_state_bkref.isra.32 (regexec.c:2561)
==13408==    by 0x123BDC: merge_state_with_log (regexec.c:2345)
==13408==    by 0x1248B8: check_matching (regexec.c:1135)
==13408==    by 0x1248B8: re_search_internal (regexec.c:802)
==13408==    by 0x12921E: re_search_stub (regexec.c:424)
==13408==    by 0x12995F: rpl_re_search (regexec.c:289)
==13408==    by 0x111C84: match_regex (regexp.c:358)
==13408==    by 0x110205: do_subst (execute.c:1015)
==13408==    by 0x110205: execute_program (execute.c:1536)
==13408==    by 0x11145A: process_files (execute.c:1673)
==13408==    by 0x10B23B: main (sed.c:360)
==13408==  Address 0x56096d0 is 16 bytes inside a block of size 42 free'd
==13408==    at 0x4C2DDCF: realloc (vg_replace_malloc.c:785)
==13408==    by 0x11BF43: re_string_realloc_buffers (regex_internal.c:167)
==13408==    by 0x11CA8C: extend_buffers (regexec.c:4057)
==13408==    by 0x11CBBA: clean_state_log_if_needed (regexec.c:1697)
==13408==    by 0x123967: get_subexp (regexec.c:2778)
==13408==    by 0x123967: transit_state_bkref.isra.32 (regexec.c:2561)
==13408==    by 0x123BDC: merge_state_with_log (regexec.c:2345)
==13408==    by 0x1248B8: check_matching (regexec.c:1135)
==13408==    by 0x1248B8: re_search_internal (regexec.c:802)
==13408==    by 0x12921E: re_search_stub (regexec.c:424)
==13408==    by 0x12995F: rpl_re_search (regexec.c:289)
==13408==    by 0x111C84: match_regex (regexp.c:358)
==13408==    by 0x110205: do_subst (execute.c:1015)
==13408==    by 0x110205: execute_program (execute.c:1536)
==13408==    by 0x11145A: process_files (execute.c:1673)
==13408==  Block was alloc'd at
==13408==    at 0x4C2DDCF: realloc (vg_replace_malloc.c:785)
==13408==    by 0x11BF43: re_string_realloc_buffers (regex_internal.c:167)
==13408==    by 0x11CA8C: extend_buffers (regexec.c:4057)
==13408==    by 0x124A1A: check_matching (regexec.c:1125)
==13408==    by 0x124A1A: re_search_internal (regexec.c:802)
==13408==    by 0x12921E: re_search_stub (regexec.c:424)
==13408==    by 0x12995F: rpl_re_search (regexec.c:289)
==13408==    by 0x111C84: match_regex (regexp.c:358)
==13408==    by 0x110205: do_subst (execute.c:1015)
==13408==    by 0x110205: execute_program (execute.c:1536)
==13408==    by 0x11145A: process_files (execute.c:1673)
==13408==    by 0x10B23B: main (sed.c:360)
==13408==
==13408== Invalid read of size 1
==13408==    at 0x12385C: get_subexp (regexec.c:2747)
==13408==    by 0x12385C: transit_state_bkref.isra.32 (regexec.c:2561)
==13408==    by 0x123BDC: merge_state_with_log (regexec.c:2345)
==13408==    by 0x1248B8: check_matching (regexec.c:1135)
==13408==    by 0x1248B8: re_search_internal (regexec.c:802)
==13408==    by 0x12921E: re_search_stub (regexec.c:424)
==13408==    by 0x12995F: rpl_re_search (regexec.c:289)
==13408==    by 0x111C84: match_regex (regexp.c:358)
==13408==    by 0x110205: do_subst (execute.c:1015)
==13408==    by 0x110205: execute_program (execute.c:1536)
==13408==    by 0x11145A: process_files (execute.c:1673)
==13408==    by 0x10B23B: main (sed.c:360)
==13408==  Address 0x56096ea is 0 bytes after a block of size 42 free'd
==13408==    at 0x4C2DDCF: realloc (vg_replace_malloc.c:785)
==13408==    by 0x11BF43: re_string_realloc_buffers (regex_internal.c:167)
==13408==    by 0x11CA8C: extend_buffers (regexec.c:4057)
==13408==    by 0x11CBBA: clean_state_log_if_needed (regexec.c:1697)
==13408==    by 0x123967: get_subexp (regexec.c:2778)
==13408==    by 0x123967: transit_state_bkref.isra.32 (regexec.c:2561)
==13408==    by 0x123BDC: merge_state_with_log (regexec.c:2345)
==13408==    by 0x1248B8: check_matching (regexec.c:1135)
==13408==    by 0x1248B8: re_search_internal (regexec.c:802)
==13408==    by 0x12921E: re_search_stub (regexec.c:424)
==13408==    by 0x12995F: rpl_re_search (regexec.c:289)
==13408==    by 0x111C84: match_regex (regexp.c:358)
==13408==    by 0x110205: do_subst (execute.c:1015)
==13408==    by 0x110205: execute_program (execute.c:1536)
==13408==    by 0x11145A: process_files (execute.c:1673)
==13408==  Block was alloc'd at
==13408==    at 0x4C2DDCF: realloc (vg_replace_malloc.c:785)
==13408==    by 0x11BF43: re_string_realloc_buffers (regex_internal.c:167)
==13408==    by 0x11CA8C: extend_buffers (regexec.c:4057)
==13408==    by 0x124A1A: check_matching (regexec.c:1125)
==13408==    by 0x124A1A: re_search_internal (regexec.c:802)
==13408==    by 0x12921E: re_search_stub (regexec.c:424)
==13408==    by 0x12995F: rpl_re_search (regexec.c:289)
==13408==    by 0x111C84: match_regex (regexp.c:358)
==13408==    by 0x110205: do_subst (execute.c:1015)
==13408==    by 0x110205: execute_program (execute.c:1536)
==13408==    by 0x11145A: process_files (execute.c:1673)
==13408==    by 0x10B23B: main (sed.c:360)
==13408==
a-!!!!!!!!!!
abcdefghijklmns!!!!!!!!!!
==13408==
==13408== HEAP SUMMARY:
==13408==     in use at exit: 1,840 bytes in 5 blocks
==13408==   total heap usage: 1,131 allocs, 1,126 frees, 205,127 bytes allocated
==13408==
==13408== LEAK SUMMARY:
==13408==    definitely lost: 0 bytes in 0 blocks
==13408==    indirectly lost: 0 bytes in 0 blocks
==13408==      possibly lost: 0 bytes in 0 blocks
==13408==    still reachable: 1,840 bytes in 5 blocks
==13408==         suppressed: 0 bytes in 0 blocks
==13408== Rerun with --leak-check=full to see details of leaked memory
==13408==
==13408== For counts of detected and suppressed errors, rerun with: -v
==13408== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
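
When re-running the reproducer against different builds (for example with and
without --with-included-regex), it may help to reduce a saved valgrind log to
a single number. The following is only a sketch: it assumes the log is
supplied on standard input, and the helper name count_invalid_reads is
hypothetical.

```shell
# Hypothetical helper: count "Invalid read" reports in a valgrind log
# read from standard input.
count_invalid_reads() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the helper's own status at 0.
    grep -c 'Invalid read of size' || true
}
```

For the log above this would report 2, matching the ERROR SUMMARY line.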




This bug report was last modified 6 years and 281 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.