Hi all,
I'm not sure if this is a bug.
If I download this file [1], extract it, and run:
grep "<title>" wikiindexorg-20110409-history.xml | sort | uniq -D
It shows:
<title>Felix Pleşoianu Wiki</title>
<title>Felix Pleșoianu Wiki</title>
<title>ᐧᐃᑭᐱᑎᔭ</title>
<title>위키낱말사전</title>
<title>ウィクショナリー</title>
<title>언사이클로피디어</title>
<title>ไทย Wikipedia</title>
<title>한국어 Wikipedia</title>
But these lines are obviously all different from each other, so why does uniq -D report them as duplicates?
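One cross-check that might narrow this down is to run the same pipeline with the C locale forced, so that lines are compared byte by byte. This is only a sketch: it assumes GNU sort and uniq, which may compare lines using the current locale's collation rules, and I have not confirmed that this is the cause here. The helper name dup_titles is made up for illustration.

```shell
# Hypothetical helper: print repeated <title> lines using byte-wise comparison.
# LC_ALL=C is an assumption-driven workaround: it makes sort and uniq compare
# raw bytes instead of applying the current locale's collation rules.
dup_titles() {
    grep "<title>" "$1" | LC_ALL=C sort | LC_ALL=C uniq -D
}

# Run it on the dump only if the file is present in the current directory.
if [ -f wikiindexorg-20110409-history.xml ]; then
    dup_titles wikiindexorg-20110409-history.xml
fi
```

If the titles listed above no longer appear in this output, the difference would point at locale-dependent comparison rather than at the dump itself.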
Thanks,
emijrp
[1] http://code.google.com/p/wikiteam/downloads/detail?name=wikiindexorg-20110409-history.xml.7z