Ticket #34 (new task)
Use central regex library for valid patterns
| Reported by: | dartar | Owned by: | unassigned |
|---|---|---|---|
| Priority: | high | Milestone: | blue-sky |
| Component: | core | Version: | 1.1.6.0 |
| Severity: | major | Keywords: | CamelCase validation regex |
| Cc: | | | |
Description (last modified by DarTar)
Currently, strings used for different purposes are validated against regex patterns that are hardcoded in each module. To preserve overall consistency (and optionally to allow some flexibility, e.g. customization of patterns for valid strings), we should centralize all relevant patterns (for valid page names, valid links, valid usernames, etc.) in a single regex library.
See also
Related comments migrated from WikkaBugs
Yet more formatter bugs
Looking at formatters/wikka.php to find the cause of the two bugs listed below (found one), I notice to my horror that a lot of the regular expressions used there are actually incorrect. They allow such things as using a comma to indent a line (in addition to a tab or ~ at the start of a line), or a comma in a WikiName (even at the start) or in an InterWiki link. That can't have been the intention - it's simply a matter of incorrect RE syntax. I'd become sort of "sensitized" to this phenomenon looking at DarTar's RE on http://wikka.jsnx.com/ValidPageNames earlier today - now I see where he found an example (see my comment on that page on his RE!). See also [[,My,Page]] - yup, that's a real page now. ;-)
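The comment above does not quote the faulty patterns, so the following is only a guess at the kind of mistake involved. Inside a character class a comma is a literal character, not a separator between ranges, so a class written as `[A-Z,a-z]` also matches `,`. A minimal sketch with hypothetical patterns (not copied from formatters/wikka.php):

```php
<?php
// Hypothetical patterns for illustration only; the real ones in
// formatters/wikka.php are not quoted in this ticket.
// Inside [...] a comma is an ordinary character, not a range separator.
$faulty  = '/^[A-Z,a-z]+$/';  // accepts letters AND commas
$correct = '/^[A-Za-z]+$/';   // accepts letters only

foreach (array('MyPage', ',My,Page') as $name) {
    // preg_match() returns 1 on a match and 0 on no match.
    printf("%-10s faulty=%d correct=%d\n",
        $name,
        preg_match($faulty, $name),
        preg_match($correct, $name));
}
```

With the faulty class, a name such as `,My,Page` passes validation; with the corrected class it is rejected.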
Rather than simply fixing all the REs (and other REs all over the place...) I'd like to propose a more fundamental solution:
- create a "library" of RE building blocks to be used in the Wikka core (for an example of what I mean with building blocks see my proposal for an alternative RE on ValidPageNames); simply create a separate file with define()s for these building blocks, and include them at the start of the main wikka.php file;
- gather all REs used in the Wikka core here (extensions/plugins could have their own set of defines - as long as they don't have conflicting names);
- now use only these building blocks when using REs anywhere in the Wikka core.
This should make it much easier to create regular expressions that are both correct and consistent; any (near-)duplicates will be much easier to discover and fix.
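As a rough sketch of what such a library file might look like: the file name, constant names, and helper function below are all assumptions made for illustration, and the CamelCase rule is deliberately simplified rather than the pattern discussed on ValidPageNames.

```php
<?php
// regex_library.php (hypothetical file name). Every identifier below is an
// illustrative assumption, not a name taken from the Wikka code base.

// Building blocks: fragments without delimiters or anchors.
define('RE_UPPER', '[A-Z]');
define('RE_LOWER', '[a-z]');
define('RE_ALNUM', '[A-Za-z0-9]');

// Simplified CamelCase word: one uppercase letter, one or more lowercase
// letters, another uppercase letter, then any number of alphanumerics.
define('RE_CAMEL_WORD', RE_UPPER . RE_LOWER . '+' . RE_UPPER . RE_ALNUM . '*');

// Complete, anchored pattern assembled from the building blocks.
define('RE_VALID_PAGE_NAME', '/^' . RE_CAMEL_WORD . '$/');

// Hypothetical helper showing how core code would use the shared pattern
// instead of a locally hardcoded one.
function is_valid_page_name($name)
{
    return preg_match(RE_VALID_PAGE_NAME, $name) === 1;
}
```

Under these assumptions the main wikka.php file would include this file once at startup, and every module would refer to `RE_VALID_PAGE_NAME` rather than spelling out its own pattern, so a correction to one building block reaches every pattern built from it.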
Probably best to leave this to just after the coming release, so we have a stable code base again. I volunteer to undertake this work. (Unlike some people, I happen to like REs.) :) --JavaWoman