GNU bug report logs -
#60544
sort hangs on lengthy line with invalid UTF8 characters
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
sort seems to do excessive computation on long lines containing invalid UTF-8 sequences, and could hang for days on just two lines.
Here is the minimal example I could make to reproduce the bug:
$ perl -e 'print "\xcd\xe5\xe0"; print "\n"' > file1
$ perl -e 'print "\xcd\xe5\xe0"x1000; print "\n"' > file2
To verify:
$ ls -l file*
-rw-rw-r-- 1 u u 4 Jan 4 12:13 file1
-rw-rw-r-- 1 u u 3001 Jan 4 12:13 file2
$ xxd -p file1
cde5e00a
$ xxd -p file2
cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0
[...]
cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0cde5e0
0a
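That these bytes are not valid UTF-8 can also be checked directly: 0xcd is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80-0xbf, but 0xe5 is itself a lead byte. The following is a minimal sketch, assuming an iconv that rejects malformed input when converting UTF-8 to UTF-8 (as glibc's does); it recreates file1 with printf and octal escapes instead of perl.

```shell
# Recreate the 4-byte file from the report: cd e5 e0 0a
# (octal 315 345 340, followed by a newline).
printf '\315\345\340\n' > file1

# iconv exits with a non-zero status when the input is not valid
# in the declared source encoding (assumption: glibc-style iconv).
if iconv -f UTF-8 -t UTF-8 file1 > /dev/null 2>&1; then
    echo "file1 is valid UTF-8"
else
    echo "file1 is not valid UTF-8"
fi
```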
Then:
$ export LC_ALL=en_US.UTF8
$ time sort --debug file1 file2
sort: using 'en_US.UTF8' sorting rules
[...]
real 0m1.951s
user 0m1.951s
sys 0m0.000s
It took nearly two seconds to sort two lines from two files.
If I replace the \xe0 with \x61 in the first (small) file, the time drops to milliseconds:
$ perl -e 'print "\xcd\xe5\x61"; print "\n"' > file3
$ time sort --debug file3 file2
sort: using 'en_US.UTF8' sorting rules
[...]
real 0m0.007s
user 0m0.003s
sys 0m0.003s
The time increases as one of the files gets larger. With 2,000 repetitions, for instance:
$ perl -e 'print "\xcd\xe5\xe0"x2000; print "\n"' > file4
$ time sort --debug file1 file4
sort: using 'en_US.UTF8' sorting rules
[...]
real 0m7.696s
user 0m7.690s
sys 0m0.004s
I would expect sort to take a few milliseconds at most to sort two moderately long lines, whatever their content.
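Going from 1000 to 2000 repetitions raised the run time from 1.951 s to 7.696 s, roughly a factor of four, which would be consistent with quadratic growth in the line length. The sketch below is one way to check that trend over several lengths. It is not part of the original report: the helper names make_line and time_sort_ms are hypothetical, and it assumes GNU date (for %N) and an installed en_US.UTF8 locale. If that locale is missing, sort may fall back to plain byte comparison and the slowdown will not appear.

```shell
#!/bin/sh
# Sketch: time `sort` on one short line plus one long line made of the
# same invalid byte triplet, for several repetition counts.

# Write the bytes cd e5 e0 (octal 315 345 340) $1 times, then a newline.
make_line() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '\315\345\340'
        i=$((i + 1))
    done
    printf '\n'
}

# Print the elapsed milliseconds for sorting files $2 and $3 under locale $1.
# Assumption: `date +%s%N` yields nanoseconds since the epoch (GNU date).
time_sort_ms() {
    start=$(date +%s%N)
    LC_ALL=$1 sort "$2" "$3" > /dev/null
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

dir=$(mktemp -d)
make_line 1 > "$dir/short"
for n in 250 500 1000; do
    make_line "$n" > "$dir/long_$n"
    echo "$n repetitions: $(time_sort_ms en_US.UTF8 "$dir/short" "$dir/long_$n") ms"
done
rm -rf "$dir"
```

If the growth is quadratic, each doubling of the repetition count should roughly quadruple the reported time. Passing C instead of en_US.UTF8 to time_sort_ms gives a baseline for comparison, which can help show whether the slowdown depends on the locale.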
$ uname -a
Linux 5.13.0-51-generic #58~20.04.1-Ubuntu SMP Tue Jun 14 11:29:12 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ apt list --installed coreutils
coreutils/focal,now 8.30-3ubuntu2 amd64 [installed]
$ sort --version
sort (GNU coreutils) 8.30
Xavier de Carné de Carnavalet
[file1 (application/octet-stream, attachment)]
[file2 (application/octet-stream, attachment)]
[file3 (application/octet-stream, attachment)]
[file4 (application/octet-stream, attachment)]
This bug report was last modified 2 years and 130 days ago.