Ticket #999 (assigned defect)
makeId() and Unicode
Reported by: DarTar | Owned by: BrianKoontz
Description (last modified by BrianKoontz)
Version 1.3 introduces automatic fragment linking of headings via a new core method written by JavaWoman, Wakka::makeId(). The trouble with this method is that it generates meaningless ids for extended Latin (non-ASCII) content. (The problem does not arise for non-Latin scripts such as Chinese or Arabic, where the id is a hash and is not expected to be meaningful.)
To give an example, the Polish heading "Użyteczne strony" produces the following id via makeId(): hn_Uyteczne_strony. The ż character is missing, so the id is meaningless in Polish.
The method correctly applies the HTML 4.0 specification, which gives the following naming rules for ids:
- Must begin with a letter (A-Z or a-z)
- Can be followed by letters (A-Za-z), digits (0-9), hyphens ("-"), underscores ("_"), colons (":"), and periods (".")
- Values are case-sensitive
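For reference, the naming rules above can be written as a single regular expression. This is only a sketch in Python (not Wikka's PHP); the names HTML4_ID and is_valid_html4_id are made up for illustration and are not part of Wakka.

```python
import re

# HTML 4.0 id: one ASCII letter, then any number of ASCII letters,
# digits, hyphens, underscores, colons, or periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_valid_html4_id(value: str) -> bool:
    """Return True if value satisfies the HTML 4.0 naming rules for ids."""
    return HTML4_ID.fullmatch(value) is not None


print(is_valid_html4_id("hn_Uyteczne_strony"))       # True
print(is_valid_html4_id("hn_U\u017cyteczne_strony"))  # False: contains ż
print(is_valid_html4_id("1st_heading"))              # False: starts with a digit
```

This shows why the current output passes the HTML 4.0 check while an id that keeps the ż would not.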
We should fix this in one of the following ways:
- (1) use a conversion table to ASCIIfy extended Latin characters, so that every ż is converted to z
- (2) keep extended Latin characters in the fragment id, if the XHTML specs allow this
- (3) escape every non-ASCII character, as MediaWiki does
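To compare the three options side by side, here is a rough sketch in Python (not Wikka's PHP). None of this is the actual makeId() code: the function names and the EXTRA_ASCII table are hypothetical, and the dot-style byte escaping in option 3 is an assumption about what "as MediaWiki does" means here. Only the hn_ prefix and the Polish heading are taken from the example in this ticket.

```python
import re
import unicodedata

PREFIX = "hn_"  # prefix seen in the example id in this ticket

# Hypothetical table for letters that have no Unicode decomposition,
# which is why option 1 asks for a conversion table at all.
EXTRA_ASCII = {"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "ß": "ss"}


def _join(text):
    """Prefix the text and turn runs of whitespace into underscores."""
    return PREFIX + re.sub(r"\s+", "_", text.strip())


def id_asciify(heading):
    """Option 1: fold extended Latin letters to their ASCII base letter."""
    mapped = "".join(EXTRA_ASCII.get(ch, ch) for ch in heading)
    decomposed = unicodedata.normalize("NFKD", mapped)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    ascii_only = no_marks.encode("ascii", "ignore").decode("ascii")
    return _join(re.sub(r"[^A-Za-z0-9\-_:.\s]", "", ascii_only))


def id_keep_latin(heading):
    """Option 2: keep the letters as written; only whitespace changes."""
    return _join(unicodedata.normalize("NFC", heading))


def id_escape(heading):
    """Option 3: replace each disallowed character by its UTF-8 bytes as .XX."""
    out = []
    for ch in re.sub(r"\s+", "_", heading.strip()):
        if re.fullmatch(r"[A-Za-z0-9\-_:.]", ch):
            out.append(ch)
        else:
            out.extend(".%02X" % b for b in ch.encode("utf-8"))
    return PREFIX + "".join(out)


heading = "U\u017cyteczne strony"  # "Użyteczne strony"
print(id_asciify(heading))     # hn_Uzyteczne_strony
print(id_keep_latin(heading))  # hn_Użyteczne_strony
print(id_escape(heading))      # hn_U.C5.BCyteczne_strony
```

Options 1 and 3 stay within the HTML 4.0 character set; option 2 gives the most readable id but depends on the open question of whether the XHTML specs allow it.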
I am afraid that this will need to be addressed in 1.3 if automatic fragment linking is introduced in this release, as we won't be able to change the method once people start using these ids: any change would alter the generated ids and break existing links to them.
#970 Document 1.3 features. Not currently documented as it appears to be broken.