Ticket #34 (new task)

Opened 5 years ago

Last modified 7 months ago

Use central regex library for valid patterns

Reported by: dartar Owned by: unassigned
Priority: high Milestone: blue-sky
Component: core Version: 1.1.6.0
Severity: major Keywords: CamelCase validation regex
Cc:

Description (last modified by DarTar)

Currently, strings used for different purposes are validated against regex patterns that are hardcoded in each module. To preserve overall consistency (and optionally to allow some flexibility, e.g. customization of the patterns for valid strings), we should centralize all relevant patterns (for valid page names, valid links, valid usernames, etc.) in a single regex library.

See also

Related comments migrated from WikkaBugs

Yet more formatter bugs

Looking at formatters/wikka.php to find the cause of the two bugs listed below (found one), I notice to my horror that a lot of the regular expressions used there are actually incorrect. They allow such things as using a comma to indent a line (in addition to a tab or ~ at the start of a line), or a comma in a WikiName (even at the start) or in an InterWiki link. That can't have been the intention - it's simply a matter of incorrect RE syntax. I'd become sort of "sensitized" to this phenomenon looking at DarTar's RE on http://wikka.jsnx.com/ValidPageNames earlier today - now I see where he found an example (see my comment on that page on his RE!). See also [[,My,Page]] - yup, that's a real page now. ;-)
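A minimal sketch of how this class of bug can arise, assuming the faulty REs used commas inside character classes as if they were list separators. The patterns below are hypothetical illustrations, not copied from formatters/wikka.php. Inside `[...]` a comma is an ordinary literal character, so it silently becomes one more allowed character.

```python
import re

# Hypothetical patterns, NOT taken from formatters/wikka.php.
# Inside a character class a comma is a literal, not a separator,
# so [\t,~] accepts a tab, a comma, or a tilde.
BUGGY_INDENT = re.compile(r"[\t,~]")          # meant: tab or ~ at line start
FIXED_INDENT = re.compile(r"[\t~]")

# Likewise [A-Z,a-z,0-9] accepts letters, digits AND commas.
BUGGY_NAME = re.compile(r"[A-Z,a-z,0-9]+")    # meant: letters or digits only
FIXED_NAME = re.compile(r"[A-Za-z0-9]+")

for text in ["\tIndented", ",Indented", ",My,Page", "MyPage"]:
    # re.match anchors at the start of the string; fullmatch requires the whole string
    print(repr(text),
          "indent:", bool(BUGGY_INDENT.match(text)), bool(FIXED_INDENT.match(text)),
          "name:", bool(BUGGY_NAME.fullmatch(text)), bool(FIXED_NAME.fullmatch(text)))
```

With the buggy classes, `,Indented` counts as an indented line and `,My,Page` as a valid name; dropping the commas restores the intended behaviour. PCRE in PHP treats commas inside character classes the same way.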

Rather than simply fixing all the REs (and other REs all over the place...) I'd like to propose a more fundamental solution:

  • create a "library" of RE building blocks to be used in the Wikka core (for an example of what I mean by building blocks, see my proposal for an alternative RE on ValidPageNames); simply create a separate file with define()s for these building blocks, and include it at the start of the main wikka.php file;
  • gather all REs used in the Wikka core here (extensions/plugins could have their own set of defines, as long as they don't have conflicting names);
  • now use only these building blocks when using REs anywhere in the Wikka core.
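The building-block approach proposed above could look roughly like this. It is a sketch in Python rather than PHP, and the constant names (`RE_UPPER`, `RE_WIKINAME`, etc.), the exact character sets and the InterWiki shape are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical building blocks; names and character sets are illustrative,
# not taken from the Wikka code base.
RE_UPPER = "[A-Z]"
RE_LOWER = "[a-z]"
RE_DIGIT = "[0-9]"
RE_ALNUM = "[A-Za-z0-9]"

# Composite patterns are assembled only from the blocks above, so fixing a
# block fixes every pattern built from it.
# WikiName: uppercase letter, one or more lowercase letters, then an
# uppercase letter or digit, then any alphanumerics.
RE_WIKINAME = RE_UPPER + RE_LOWER + "+(?:" + RE_UPPER + "|" + RE_DIGIT + ")" + RE_ALNUM + "*"

# InterWiki link (hypothetical shape): alphanumeric prefix starting with an
# uppercase letter, a colon, then a non-empty run of non-whitespace.
RE_INTERWIKI = RE_UPPER + RE_ALNUM + "*:" + r"\S+"


def is_valid(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


print(is_valid(RE_WIKINAME, "MyPage"))    # True
print(is_valid(RE_WIKINAME, "My,Page"))   # False
```

In PHP the blocks would be `define()`d constants in a separate file included from wikka.php, concatenated the same way; the point is that no module writes its own character classes by hand.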

This should make it much easier to create regular expressions that are both correct and consistent; any (near-)duplicates will be much easier to discover and fix.

Probably best to leave this until just after the coming release, so we have a stable code base again. I volunteer to undertake this work. (Unlike some people, I happen to like REs.) :) --JavaWoman

Change History

Changed 5 years ago by dartar

  • description modified

Changed 5 years ago by dartar

  • description modified

Changed 5 years ago by dartar

  • description modified

Changed 4 years ago by MovieLady

I was just going to offer to go through the code (after the official 1.1.6.2 release?) to make a list of all the regexps in all the core files to start moving this forward for the next major version (1.1.7 is the target, I'm assuming?), if that would help. :)
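Such a list could be started semi-automatically. The sketch below assumes the core regexes are passed to PHP's `preg_*` functions or the older `ereg` family (`ereg`, `eregi`, `ereg_replace`, `eregi_replace`); patterns assembled in variables or stored elsewhere would still need a manual look. The function names `find_regex_calls` and `inventory` are hypothetical.

```python
import re
from pathlib import Path

# Matches calls such as preg_match( , preg_replace_callback( , ereg( , eregi_replace(
# The leading \b keeps it from firing inside names like my_ereg(.
REGEX_CALL = re.compile(r"\b(preg_[a-z_]+|eregi?(?:_replace)?)\s*\(")


def find_regex_calls(source: str):
    """Yield (line_number, function_name) for each regex call in PHP source text."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in REGEX_CALL.finditer(line):
            yield lineno, m.group(1)


def inventory(root: str):
    """Walk a source tree and list (path, line, function) for every *.php file."""
    results = []
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, func in find_regex_calls(text):
            results.append((str(path), lineno, func))
    return results
```

For example, `for row in inventory("path/to/wikka"): print(row)` gives a file/line list that can then be checked by hand for what each pattern is meant to validate.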

Changed 4 years ago by NilsLindenberg

That's a great idea MovieLady :)

Changed 4 years ago by vincent.fretin@…

Wikini (http://www.wikini.net/) uses constants WN_UPPER, WN_LOWER, WN_UPPER_NUM, WN_CHAR... for REs. They were externalized in 0.5.0-dev2 (2006-08-25); see:

 http://cvs.gna.org/cvsweb/wikini/includes/constants.php?rev=1.1;content-type=text%2Fplain;cvsroot=wikini

The formatter then uses these constants; see:

 http://cvs.gna.org/cvsweb/wikini/formatters/wakka.php.diff?r1=1.45;r2=1.46;cvsroot=wikini;f=h

Changed 4 years ago by DarTar

Vincent,

yes this is precisely the idea.

Changed 4 years ago by JavaWoman

I just found that the Link() method in Wakka.class uses REs to recognize InterWiki links and Wiki links that are inconsistent with the parallel REs used by the Formatter: in the Formatter the (old) commas in the REs have been removed, but in the Link() method they have not.

See also the comment on #71.
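Until there is a single shared definition, drift like this can at least be detected by running both copies of a pattern over the same sample strings. The two patterns below are hypothetical stand-ins (the real REs in Link() and the Formatter are not reproduced here): the first still has stray commas in its character classes, the second does not. The helper name `disagreements` is also hypothetical.

```python
import re

# Hypothetical stand-ins for two copies of the same WikiName rule.
LINK_METHOD_WIKINAME = re.compile(r"[A-Z][a-z]+[A-Z,0-9][A-Za-z0-9,]*")  # stale commas
FORMATTER_WIKINAME = re.compile(r"[A-Z][a-z]+[A-Z0-9][A-Za-z0-9]*")      # commas removed


def disagreements(p1, p2, samples):
    """Return the samples that exactly one of the two patterns fully matches."""
    return [s for s in samples
            if (p1.fullmatch(s) is None) != (p2.fullmatch(s) is None)]


samples = ["MyPage", "My,Page", "Ab,", "Page2", "lowercase"]
print(disagreements(LINK_METHOD_WIKINAME, FORMATTER_WIKINAME, samples))
# prints ['My,Page', 'Ab,']
```

A check like this only catches drift on the samples it is given; centralizing the pattern, as this ticket proposes, removes the second copy altogether.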

Changed 4 years ago by JavaWoman

  • keywords CamelCase added; camelcase removed

Changed 3 years ago by DarTar

  • description modified

adding related ticket

Changed 7 months ago by BrianKoontz

  • type changed from defect to task
  • milestone changed from 1.3 to blue-sky