Ticket #191 (closed enhancement: fixed)

Opened 8 years ago

Last modified 3 years ago

Making CamelCase optional

Reported by: NilsLindenberg Owned by: BrianKoontz
Priority: normal Milestone: 1.3.1
Component: formatters Version: 1.1.6.1
Severity: normal Keywords: CamelCase
Cc:

Description (last modified by BrianKoontz) (diff)

There are people who do not like CamelCase, and environments where it is impractical. So CamelCase as a whole should be optional (perhaps with a message in the edit window saying whether it is enabled).

Perhaps with an option to turn it on or off per page, and/or a list of words that should not be formatted as CamelCase.


discussion from the SuggestionBox:

Popular Scottish surnames such as McKenna are wrongly being interpreted as WikiNames - is there any way to prevent that happening? -- host86-133-223-194.range86-133.btcentralplus.com (2005-07-09 13:23:06)

How about something like a lookup table where WikiAdmins enter a list of words that will be ignored for formatting on all pages? -- JsnX

Sounds like a nice addition. Two suggestions: 1) CamelCase parsing should be made optional in the config file. 2) the list of words to be skipped could be implemented as a wiki page (something similar has been proposed for menus, acronyms and group management): this would make it a lot easier to maintain the list. My 2 cents -- DarTar
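A minimal Python sketch of JsnX's and DarTar's suggestions (a config switch for CamelCase parsing plus a list of words to skip). The `WIKI_NAME` pattern, the `find_wiki_names` function and its `camelcase_enabled`/`skip_words` parameters are hypothetical. In Wikka the skip list would come from the config file or a wiki page, and the real WikiName regex is not given in this ticket.

```python
import re

# Simplified WikiName pattern (assumption, not Wikka's actual regex):
# an uppercase letter, lowercase letters, then another uppercase letter.
WIKI_NAME = re.compile(r"\b[A-Z][a-z]+[A-Z][A-Za-z0-9]*\b")


def find_wiki_names(text, camelcase_enabled=True, skip_words=()):
    """Return the CamelCase words in text that would become links."""
    if not camelcase_enabled:
        # CamelCase parsing switched off in the (hypothetical) config.
        return []
    skip = set(skip_words)
    return [m.group(0) for m in WIKI_NAME.finditer(text)
            if m.group(0) not in skip]


text = "Ask McKenna about the HomePage."
print(find_wiki_names(text))                           # ['McKenna', 'HomePage']
print(find_wiki_names(text, skip_words={"McKenna"}))   # ['HomePage']
print(find_wiki_names(text, camelcase_enabled=False))  # []
```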

Displaying words with capital letters in the "middle" as WikiNames is a real problem. I don't want to turn off the automatic parsing because these words ARE WikiNames in most cases. It seems to me that a lookup table (e.g. a list of words as a wiki page) takes a lot of time to maintain. My suggestion is simple: "McKenna" should be interpreted as a wiki name; "Mc_Kenna" (with an underscore) should NOT be interpreted as a wiki name but displayed without the underscore; "Mc__Kenna" (with multiple underscores) should be displayed as "Mc_Kenna". -- MichaelSchams
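One way to read the underscore proposal, as a hypothetical Python sketch: a word that would be a WikiName once its underscores are removed is not linked, and one underscore is dropped from each run of underscores. A single underscore therefore disappears, while a double one leaves a literal underscore. The simplified `WIKI_NAME` pattern and the `render_word` function are assumptions for illustration only.

```python
import re

# Simplified WikiName pattern (assumption, not Wikka's actual regex).
WIKI_NAME = re.compile(r"^[A-Z][a-z]+[A-Z][A-Za-z0-9]*$")


def render_word(word):
    """Return (display_text, is_link) for a single word."""
    if WIKI_NAME.match(word):
        return word, True
    if "_" in word and WIKI_NAME.match(word.replace("_", "")):
        # Escaped WikiName: drop one underscore from every run of them.
        return re.sub(r"_(_*)", r"\1", word), False
    # Ordinary word (including snake_case text): leave it alone.
    return word, False


for w in ["McKenna", "Mc_Kenna", "Mc__Kenna", "snake_case"]:
    print(w, render_word(w))
```

Restricting the rule to words that become WikiNames without their underscores keeps ordinary underscored text untouched.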

OK, this is probably stupid to point out, but if you put double double quotes around words they aren't parsed by the wiki, so if you don't want McKenna to become a link you can write ""McKenna"" and voilà... -- DanielMcNair (who has this problem regularly...) (-:

See also

  • #864 Less restrictions for (possible) user names
  • #61 Case insensitive CamelCase (Link creation, page loading)
  • #966 Unlimited Article-Names
  • #431 Using UTF-8 to store data in the DB
  • #1003 IsWikiName() review

Change History

  Changed 8 years ago by DarTar

  • owner changed from unassigned to DarTar
  • status changed from new to assigned
  • summary changed from Making Camelcase optional to Making CamelCase optional
  • component changed from core to formatters
  • milestone set to 1.1.6.3

  Changed 8 years ago by DarTar

  • milestone changed from 1.1.7 to 1.1.7.1

I've tested the effects of enabling/disabling CamelCase, but preserving all the functionality when CamelCase parsing is toggled off requires more work than I imagined.

  Changed 8 years ago by BrianKoontz

  • description modified (diff)

  Changed 8 years ago by JavaWoman

  • keywords CamelCase added

  Changed 6 years ago by NilsLindenberg

  • description modified (diff)

  Changed 5 years ago by BrianKoontz

  • milestone changed from 1.2.1 to 1.3

  Changed 5 years ago by BrianKoontz

  • milestone changed from 1.3 to blue-sky

  Changed 4 years ago by BrianKoontz

  • description modified (diff)
  • milestone changed from blue-sky to 1.3

  Changed 4 years ago by BrianKoontz

  • description modified (diff)

  Changed 4 years ago by BrianKoontz

  • description modified (diff)

  Changed 4 years ago by BrianKoontz

  • owner changed from DarTar to BrianKoontz
  • status changed from accepted to assigned

(In [1630]) Removed CamelCase requirements for page names. Refs #431.

  Changed 4 years ago by BrianKoontz

(In [1631]) Modified Link() to accept non-CC pagenames; sanitize input in clone handler. Refs #191.

  Changed 4 years ago by BrianKoontz

(In [1632]) Modified forced links ([[...]]) to correctly parse non-CC page names. In order to do this while maintaining backwards compatibility, the following cases, which should account for the vast majority of forced links, are now checked in the order specified. We should consider deprecating the use of forced links that contain a URL and text but no separator (currently whitespace; the proposed separator is the pipe character "|"):

Case 1: First part is a URL, followed by one or more whitespaces, followed by link text (deprecated; backwards compatible)

Case 2: First part is a CC string, followed by one or more whitespaces, followed by link text (deprecated; backwards compatible)

Case 3: Text (possibly containing embedded whitespaces) that matches an existing internal wiki page (must not contain "|" symbol; backwards compatible)

Case 4: First part is a URL, followed by a "|" symbol, followed by link text (new)

Refs #191

  Changed 4 years ago by BrianKoontz

(In [1633]) Fixed Case 2 to properly detect CC references. Refs #191.

  Changed 4 years ago by DotMG

Being lazy, I often write forced links like this: [[pageindex index]], i.e. all lowercase. Upgrading to 1.3 will break such links in my wiki.

1) We may need to update the installer to replace the first space in every such forced link with a pipe symbol, so that the upgrade doesn't break anything.

2) We need to update the default FormattingRules page.
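A rough Python sketch of the installer conversion in 1), assuming each forced link is written as [[...]] on a single line. It inserts a pipe at the first run of whitespace in any forced link that does not already contain one. Splitting at the first whitespace is only safe for pre-1.3 content because whitespace was never permitted in the link part of a forced link before 1.3. The `add_pipe_separator` name is hypothetical.

```python
import re

# A forced link: [[ ... ]] with no "]]" inside (single line assumed).
FORCED_LINK = re.compile(r"\[\[(.*?)\]\]")


def add_pipe_separator(text):
    """Rewrite [[target text]] as [[target|text]] where no pipe is present."""
    def fix(match):
        inner = match.group(1).strip()
        if "|" in inner:
            return match.group(0)  # already uses the new separator
        parts = inner.split(None, 1)
        if len(parts) < 2:
            return match.group(0)  # bare link, nothing to separate
        return "[[" + parts[0] + "|" + parts[1] + "]]"

    return FORCED_LINK.sub(fix, text)


print(add_pipe_separator("See [[pageindex index]] and [[HomePage]]."))
# See [[pageindex|index]] and [[HomePage]].
```

Because links that already contain a pipe are left alone, running the conversion twice gives the same result as running it once.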

  Changed 4 years ago by DotMG

3) Making CamelCase optional doesn't mean allowing any character in page names.

 - The pipe symbol should not be allowed in page names.
 - Any character that would introduce an ampersand into the resulting URL (< > & " ') should not be allowed.
 - The question mark and equal sign should not be allowed.

  Changed 4 years ago by MasinAlDujaili

If I interpret it correctly, [1633] breaks forced links where no CC has been used, e.g. [[Wikkawiki What is Wikkawiki?]]. Such links were possible before and have surely been used. Reasons for such usage might be laziness or the urge to have the tags nicely formatted: 'Germany' looks different from 'GerMany', even if the latter might later be used to link to an existing page tag 'Germany'.

Search-and-replace during the upgrade might work, but in my experience it takes longer than the script execution time allowed on many shared hosting systems; I did this once with category stuff. Maybe there's a workaround for this (e.g. a script calling itself, thus resetting the countdown).

And finally: links such as [explained|What is CamelCase?] will never be parsed correctly. We should consider dropping the old syntax altogether, making the behaviour configurable (defaulting to the new syntax for new installations and the old syntax for upgrades), or providing a means of converting back and forth. A user may want to downgrade again, and it would be stupid if that turned out to be a dead end for him.

I'd suggest making it configurable so that only one syntax is ever used: either Cases 1 & 2 or Cases 3 & 4, but not all at once. A conversion script for old->new might be provided independently.

  Changed 4 years ago by GeorgePetsagourakis

In the process of upgrading Wikka, there could be a script that goes through the page entries in the database and simply reformats the wiki syntax. Would this be too hard?

  Changed 4 years ago by BrianKoontz

Replying to GeorgePetsagourakis:

In the process of upgrading Wikka, there could be a script that goes through the page entries in the database and simply reformats the wiki syntax. Would this be too hard?

As Masin points out, there is the issue of the script possibly timing out before all the DB changes have been made. Perhaps a standalone script could be included with the release (to be run locally), or a script that's smart enough to figure out where it left off in the event of a timeout, so that it can be rerun from the point where it quit the previous time.
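A minimal sketch of the "figure out where it left off" idea, in Python. Each run converts at most `batch_size` pages in a fixed (sorted) order and records the last finished tag in a checkpoint file, so a rerun after a timeout resumes from there. The in-memory `pages` dict stands in for the pages table. `convert_pages`, the checkpoint file, and the batch size are all assumptions, not part of any Wikka release.

```python
import os


def convert_pages(pages, convert, checkpoint_path, batch_size=50):
    """Convert up to batch_size pages; return True once all are done.

    pages maps tag -> body (a stand-in for the database table).
    convert should be idempotent: a page converted just before a
    timeout, but not yet checkpointed, is converted again on rerun.
    """
    last_done = None
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            last_done = f.read() or None

    remaining = [t for t in sorted(pages) if last_done is None or t > last_done]
    for tag in remaining[:batch_size]:
        pages[tag] = convert(pages[tag])
        # Record progress after every page so a timeout loses little work.
        with open(checkpoint_path, "w", encoding="utf-8") as f:
            f.write(tag)
    return len(remaining) <= batch_size
```

Calling it repeatedly (for example from a script that calls itself, as Masin suggests) until it returns True would convert the whole wiki without any single run having to fit every page into one execution time limit.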

  Changed 4 years ago by BrianKoontz

(In [1680]). Prohibit use of certain symbols in page names (|?=<>'"&). Refs #191.

  Changed 4 years ago by BrianKoontz

I believe it's safe to modify Case 2 to include non-CC strings (still restricted to the allowable subset of characters). This would take care of the "lazy" links that DotMG and Masin bring up:

HomePage == homepage == HoMePaGe == homePage == etc...

Page creation is really case-insensitive; any combination of upper/lower case letters match the same page. The user just doesn't have to adhere to any case conventions during page creation.

Update to follow. I believe by modifying Case 2, there is no longer a need to worry about en masse changes to a user's database to convert links.
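To illustrate the case-insensitive matching described above, here is a hypothetical in-memory Python sketch. Tags are indexed by their case-folded form, so every capitalization reaches the same page, and the tag keeps the spelling it was first saved with. In Wikka the behaviour presumably comes from MySQL's case-insensitive comparison of the tag column (an assumption; the ticket does not say). The `PageStore` class only imitates it.

```python
class PageStore:
    """Hypothetical page table where tags are matched case-insensitively."""

    def __init__(self):
        # case-folded tag -> (tag as first saved, page body)
        self._pages = {}

    def save(self, tag, body):
        key = tag.casefold()
        first_spelling = self._pages.get(key, (tag, None))[0]
        self._pages[key] = (first_spelling, body)

    def load(self, tag):
        """Return (tag, body), or None if no page matches."""
        return self._pages.get(tag.casefold())


store = PageStore()
store.save("HomePage", "Welcome")
print(store.load("homepage"))   # ('HomePage', 'Welcome')
print(store.load("HoMePaGe"))   # ('HomePage', 'Welcome')
```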

  Changed 4 years ago by BrianKoontz

(In [1684]) Case 2 checks are now case-insensitive to accommodate existing wiki links that are not formatted in camel case. Refs #191.

  Changed 4 years ago by BrianKoontz

  • status changed from assigned to testing

  Changed 4 years ago by BrianKoontz

(In [1687]) Added new table markup/elided comment markup/new link markup. Refs #191, #970.

  Changed 4 years ago by DarTar

Brian, the docs page on CamelCase seems to imply that a space is permitted as part of a page name. Also, can you please remind me what happens to forced links using a space as a separator as of 1.3? Sorry for asking these questions if they have been addressed elsewhere!

  Changed 4 years ago by BrianKoontz

Dario, space is now permitted in a page name. This is the underlying issue behind having to change the separator from ws to a | symbol. Previous implementations should not be affected, as ws was never permitted in the link section of a forced link. However, as new forced links are created, those containing ws will require the | separator. As suggested elsewhere, we simply adopt a new markup using the | symbol for all forced links.

In 1.3, there are a couple of cases that are checked (in order) when parsing a forced link:

Case 1: Link is a URL, description is a string. Since it is assumed a URL will never contain ws, the link is parsed according to the first instance of ws.

Case 2: Link is a Wikka pagename, description is a string. Since it is assumed that all previous forced links (prior to 1.3) contain links in CamelCase format, it is safe to parse this forced link based upon the first instance of ws.

Case 3: Link is a Wikka pagename, no description. Since the description is optional in a forced link, this simply parses as a Wikka pagename.

Case 4: Link and description separated by |. This accommodates all new forced links using the | symbol as the link/description separator.
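The four cases above can be sketched in Python as follows. This is an illustration under assumptions, not the code from the changesets. The `URL_RE` and `CAMEL_RE` patterns are simplified. `existing_pages` is a hypothetical collection of stored tags. Case 2 accepts either a CamelCase word or, as one reading of [1684], any existing tag matched case-insensitively. Because Cases 1-3 only apply to links without a "|", the sketch tests for the pipe first.

```python
import re

# Simplified patterns (assumptions, not Wikka's actual regexes).
URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://\S+$")
CAMEL_RE = re.compile(r"^[A-Z][a-z]+[A-Z][A-Za-z0-9]*$")


def parse_forced_link(body, existing_pages):
    """Split the inside of [[...]] into (link, description)."""
    body = body.strip()
    # Index stored tags by lower case so lookups ignore case.
    by_lower = {tag.lower(): tag for tag in existing_pages}

    # Case 4: link and description separated by "|".
    if "|" in body:
        link, _, text = body.partition("|")
        link, text = link.strip(), text.strip()
        return link, text or link

    parts = body.split(None, 1)
    first = parts[0] if parts else ""
    rest = parts[1] if len(parts) == 2 else ""

    # Case 1 (deprecated): URL, whitespace, description.
    if rest and URL_RE.match(first):
        return first, rest
    # Case 2 (deprecated): page name, whitespace, description.
    if rest and (CAMEL_RE.match(first) or first.lower() in by_lower):
        return by_lower.get(first.lower(), first), rest
    # Case 3: the whole text, spaces included, names an existing page.
    if body.lower() in by_lower:
        return by_lower[body.lower()], body
    # Fallback: treat the whole text as a (possibly new) page name.
    return body, body


pages = {"HomePage", "Wikkawiki", "A day in the life"}
print(parse_forced_link("http://wikkawiki.org Wikka site", pages))
print(parse_forced_link("Wikkawiki What is Wikkawiki?", pages))
print(parse_forced_link("A day in the life", pages))
print(parse_forced_link("A day in the life|Beatles song", pages))
```

The ambiguity the ticket discusses shows up directly here: without a pipe, a multi-word page name only survives if its first word is not itself a CamelCase word or an existing page.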

  Changed 4 years ago by BrianKoontz

I forgot to add that cases 1 and 2 are now deprecated (although I don't see these cases going away anytime soon...eventually we may just have to force the issue and convert/flag all forced links not containing a | symbol...but that's an issue to be tackled in a future release).

  Changed 4 years ago by GeorgePetsagourakis

Why not just check whether | exists and then split the string into a link and a description, each of which is then dealt with on its own?

  Changed 4 years ago by DarTar

Another thought: spaces are now allowed in URLs but they are generally discouraged (think of what happens when mailing a URL that ends with multiple spaces...). Shouldn't Wikka at least:

  • trim whitespace at the beginning and end of a page name
  • automatically convert spaces to underscores when generating a URL (as MediaWiki does)

  Changed 4 years ago by DarTar

Oh, the same should probably apply to URL-encoded spaces in page names (i.e. %20). A page stored with the title "A day in the life" currently loads the following URL: A%20day%20in%20the%20life. Again, MW does this pretty neatly; maybe there's code that can be applied directly to the Wikka codebase.

  Changed 4 years ago by DarTar

Another issue: / should also be prohibited in pagenames as it's a reserved separator in Wikka, used to call specific handlers.

  Changed 4 years ago by BrianKoontz

Replying to GeorgePetsagourakis:

Why not just check whether | exists and then split the string into a link and a description, each of which is then dealt with on its own?

That's what case 4 does. But we still have to take care of "legacy" forced links...

  Changed 4 years ago by BrianKoontz

(In [1697]) Include / in list of prohibited pagename chars. Refs #191.

  Changed 4 years ago by BrianKoontz

Replying to DarTar:

Another thought: spaces are now allowed in URLs but they are generally discouraged (think of what happens when mailing a URL that ends with multiple spaces...). Shouldn't Wikka at least:
  • trim whitespace at the beginning and end of a page name
  • automatically convert spaces to underscores when generating a URL (as MediaWiki does)

This would seem the most straightforward and least intrusive. I don't like the idea of forcing underscores on people who may not want them.

  Changed 4 years ago by BrianKoontz

Replying to BrianKoontz:

Replying to DarTar:

Another thought: spaces are now allowed in URLs but they are generally discouraged (think of what happens when mailing a URL that ends with multiple spaces...). Shouldn't Wikka at least:
  • trim whitespace at the beginning and end of a page name
  • automatically convert spaces to underscores when generating a URL (as MediaWiki does)

This would seem the most straightforward and least intrusive. I don't like the idea of forcing underscores on people who may not want them.

OK, I've tested this out on a demo MW site (check dev list for URL). It's not a bad approach. I'll try this...

  Changed 4 years ago by BrianKoontz

(In [1698]). First attempt at MW-style handling of whitespace in pagenames. Refs #191.

  Changed 4 years ago by BrianKoontz

  • description modified (diff)

  Changed 4 years ago by BrianKoontz

  • description modified (diff)

  Changed 4 years ago by BrianKoontz

(In [1704]) Minor edit. Refs #191

  Changed 4 years ago by BrianKoontz

(In [1703]). Second attempt at MW-style handling of ws in pagenames. MW changes all underscores to ws when saving to the DB. Displayed pagenames are always without underscores. Underscores only appear in URLs. Refs #191.
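A sketch of the mapping described in [1703], in Python. The tag is stored with spaces, and spaces become underscores only when building a URL. Reading a URL reverses this, which also covers the %20 form DarTar mentions. Trimming outer whitespace follows DarTar's suggestion. Collapsing runs of inner whitespace is an extra assumption, and the function names are hypothetical.

```python
from urllib.parse import quote, unquote


def normalize_tag(raw):
    """Tag as stored: underscores become spaces, outer whitespace is
    trimmed, and runs of inner whitespace collapse to one space."""
    return " ".join(raw.replace("_", " ").split())


def tag_to_url(tag):
    """Spaces appear as underscores in URLs only; other unsafe
    characters are percent-encoded."""
    return quote(normalize_tag(tag).replace(" ", "_"), safe="")


def url_to_tag(segment):
    """Decode %xx escapes, then map underscores back to spaces."""
    return normalize_tag(unquote(segment))


print(tag_to_url(" A day in the life "))         # A_day_in_the_life
print(url_to_tag("A_day_in_the_life"))           # A day in the life
print(url_to_tag("A%20day%20in%20the%20life"))   # A day in the life
```

One consequence, as in MediaWiki, is that "A_day" and "A day" name the same page, so a literal underscore cannot be part of a stored tag.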

  Changed 4 years ago by BrianKoontz

(In [1705]) The only time ws should occur in page names is when it's displayed. This will require scouring the codebase for all instances of GetPageTag() or $this->tag, and evaluating each usage to determine if the tag is being used for display purposes. Some core Wikka methods require changes as well. I've checked in the core changes and examples of what will need to be modified in both templates and actions/headers if we choose to continue down this road. Refs #191

  Changed 4 years ago by DarTar

Replying to BrianKoontz:

(In [1705]) The only time ws should occur in page names is when it's displayed.

I would have put it the other way round: the only time ws should not occur is when it's part of the URL, which sounds consistent with your comment above for [1703]. If we go this way, there is no need to treat pagenames specially depending on where they appear (display or not), right? I think that treating ws as a non-default case that requires a special method for display purposes will only cause trouble (and break the vast majority of user-contributed plugins). What do you think?

  Changed 4 years ago by DarTar

the edit toolbar disappeared as a result of the last 2-3 changesets

  Changed 4 years ago by DarTar

please disregard my previous comment, I had been running JS-less since this afternoon!

  Changed 4 years ago by BrianKoontz

Is there a good reason why we can't have whitespace in the tag field in wikka_pages? If a good reason doesn't exist, then I agree, we should simply display the underscores in URLs.

  Changed 4 years ago by DarTar

Replying to BrianKoontz:

Is there a good reason why we can't have whitespace in the tag field in wikka_pages?

the only issue I can think of is with the files action, which currently stores uploads in a folder named after the tag (and whitespace or other non-ASCII characters may cause issues depending on the underlying filesystem). There are also obvious security issues to be considered as a result of dropping the CC requirement.

  Changed 4 years ago by BrianKoontz

(In [1706]) Undoing [1705]. Refs #191

  Changed 4 years ago by BrianKoontz

(Reference)  Allowable page name characters in MediaWiki

  Changed 4 years ago by BrianKoontz

(In [1707]) Added % to list of prohibited Wikiname chars. Refs #191

  Changed 4 years ago by DarTar

I noticed that MW allows quotes/apostrophes, which makes page names such as L'Oréal valid. Do we have a reason to exclude this scenario?

  Changed 4 years ago by BrianKoontz

Allowing quotes is just another thing we have to check for to make sure it's been properly escaped/sanitized. Since we've always depended upon a preg_match against a CamelCase pattern to determine if a pagename is valid, can we be sure that there isn't a bit of code somewhere that does not properly sanitize a pagename (because it's been assumed that a pagename has already been filtered)?

MW seems rather liberal with what it permits in pagenames. I'm not so sure that's a path we want to follow.

  Changed 4 years ago by TormodHaugen

When the MySQL server is set to expect something other than UTF-8 from the client, upper- and lower-case versions of the same (extended) character do not register as the same character in page tags.

This can possibly be fixed by setting the charset for the client to UTF-8 just after connecting to the database. In that case, however, any UTF-8 characters already in the database will break. (This is done with mysql_query('SET NAMES UTF8').)

  Changed 4 years ago by BrianKoontz

Replying to TormodHaugen:

This can possibly be fixed by setting the charset for the client to UTF-8 just after connecting to the database. In that case, however, any UTF-8 characters already in the database will break. (This is done with mysql_query('SET NAMES UTF8').)

This also fixes the problem of UTF-8 characters being saved in the MySQL default character set (usually latin-1). Fixed in #431.
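A small Python analogy (not MySQL itself) for why the charset mismatch breaks case-insensitive matching. If the UTF-8 bytes of a tag are read as latin-1, the two bytes of an extended character become two unrelated characters, and their case no longer folds together. Declaring the connection charset, as the SET NAMES UTF8 fix does, avoids the misreading. The function names are illustrative.

```python
def tags_match(a, b):
    """Case-insensitive comparison of two tags."""
    return a.casefold() == b.casefold()


def misread_as_latin1(tag):
    """Simulate UTF-8 bytes being interpreted as latin-1."""
    return tag.encode("utf-8").decode("latin-1")


upper, lower = "Ärger", "ärger"
print(tags_match(upper, lower))                                        # True
print(tags_match(misread_as_latin1(upper), misread_as_latin1(lower)))  # False
```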

  Changed 4 years ago by BrianKoontz

Replying to BrianKoontz:

(Reference)  Allowable page name characters in MediaWiki

(Reference)  Allowable page name characters in Wikipedia: http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29

(Reference)  Another wikipedia link on the topic

  Changed 3 years ago by BrianKoontz

  • milestone changed from 1.3 to 1.3.1

Updated milestone to 1.3.1

  Changed 3 years ago by BrianKoontz

(In [1765]) The following characters are not permitted as part of usernames or pagenames: [ ] { } % + | ? = < > ' " / 0x00-0x1f 0x7f , Refs #191, #843
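A minimal Python check against the character list in [1765], for illustration. The regex and `is_valid_page_name` are assumptions, not the committed code. The sketch follows this later list rather than the one in [1680] (for example, & is not repeated here).

```python
import re

# Characters listed in [1765]: [ ] { } % + | ? = < > ' " / and
# the control characters 0x00-0x1f and 0x7f.
PROHIBITED = re.compile(r"[\[\]{}%+|?=<>'\"/\x00-\x1f\x7f]")


def is_valid_page_name(name):
    """True if name is not blank and has no prohibited character."""
    return bool(name.strip()) and PROHIBITED.search(name) is None


for name in ["HomePage", "A day in the life", "L'Oréal", "What?", "Home/edit"]:
    print(repr(name), is_valid_page_name(name))
```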

  Changed 3 years ago by BrianKoontz

  • status changed from testing to commit

  Changed 3 years ago by BrianKoontz

  • status changed from commit to closed
  • resolution set to fixed

  Changed 3 years ago by BrianKoontz

(In [1784]) Merged 1.3.1 changes into trunk ([1765]-[1780],[1782]). Refs #191, #843, #1040, #1041, #38, #1042, #1043, #1018, #1045, #208, #415, #1039, #189
