Ticket #13 (accepted enhancement)

Opened 4 years ago

Last modified 22 months ago

Improved formatter

Reported by: dartar Owned by: JavaWoman
Priority: high Milestone: 1.3
Component: formatters Version: 1.1.6.0
Severity: normal Keywords:
Cc:

Description (last modified by DotMG) (diff)

While our current formatter is quite capable, it has some quirks and bugs, doesn't always generate valid XHTML (though it tries hard), and lacks a few features that would be nice to have or that would enable other useful things (such as page TOCs). The improved version presented here tries to address some of these issues.

Status

Coded as beta

What it does

  • using single quotes wherever possible making RegExes and generated HTML easier to read;
  • better closing of open tags at end of document, including open indents and lists (a long-standing bug!) Now improved
  • better handling of nested lists so change of list "type" is actually detected and coded correctly; also produces nicely-formatted HTML code for lists and indents now, especially more readable for nested lists. New!
  • escaping single & (not part of an entity) (another long-standing problem); See #410
  • ability to nest one type of float within another (so a right float can contain a left float and vice versa)
  • handling ids (and making them unique) as provided in embedded code, using the makeId() method;
  • creating ids for headings based on content ('afterburner' type formatting so this includes originally embedded code); this code not only uses the makeId() method but also the html_entity_decode() method in PHP versions older than 4.3. See also #20
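
As an illustration of the "escaping single &" item above, here is a minimal sketch of one way to do it. It is not the formatter's actual code: the function name escapeLoneAmpersands() and the exact pattern are assumptions made for this example. The negative lookahead leaves an ampersand alone when it already starts a named, decimal or hexadecimal entity reference.

```php
<?php
// Hypothetical helper (not taken from the Wikka source): escape "&" unless it
// already starts a named (&amp;), decimal (&#38;) or hexadecimal (&#x26;)
// entity reference that is terminated by a semicolon.
function escapeLoneAmpersands(string $text): string
{
    $pattern = '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)/';
    return preg_replace($pattern, '&amp;', $text);
}

echo escapeLoneAmpersands('Tom & Jerry &amp; friends &#169; 2005'), "\n";
// prints: Tom &amp; Jerry &amp; friends &#169; 2005
```

An unterminated reference such as "&copy" (no semicolon) is treated as a lone ampersand and escaped, which seems the safer choice for producing valid XHTML.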

Refs

Related tickets / subtickets


Related comments moved from WikkaBugs

Wakka formatter: Indenting on first line

// indented text
		elseif (preg_match("/\n([\t~]+)(-|&|([0-9a-zA-ZÄÖÜßäöü]+)\))?(\n|$)/s", $thing, $matches))

This simply doesn't match indents on the first line. I know there is a problem in the edit handler as well with indents, but even if that is fixed (working solution for that problem at WikkaBugsResolved) we still will have the problem here. -- TimoK

Fixed and implemented as part of the beta ImprovedFormatter on this site; see http://wikkawiki.org/ImprovedFormatter#hn_Better_handling_of_nested_lists_and_indents

--JavaWoman

List parsing bug?

Have a look at the source of WikkaDevelopment: you will see that tabs and unordered lists are for some reason not parsed correctly (in fact, after one edit, tabs were added at the beginning of each line). I'll try to figure out why this happens... -- DarTar

Possible clue: I just stumbled over the fact that the little list of "Useful pages" at the end of the default (installation-generated) HomePage had incorrect coding: the last list item (li) was not terminated, nor was the list itself (no closing ul tag). It took me a bit of trial and error to reproduce this, but have a look at my test code at the end of SandBox (now). Originally I thought that any list at the end of a page would be unterminated, but that turned out not to be the case. Then I hit on something else: the last list element on that default HomePage (not the one here) ends with a period. My test version on SandBox now also has the last element ending in a period ... and if you look at the (HTML) page source, you'll see it is indeed an unterminated list. Somehow that ending period (maybe in combination with end-of-page?) causes the Formatter never to close the list; take that ending period away, and it works normally. I tried a few variants to check whether it might be an odd-even problem, but no: whether the number of list items is odd or even, if the last one ends in a period, the list isn't terminated.

It makes no difference whether it's an ordered (ol) or unordered (ul) list; the formatter behaves the same way.

And, BTW, if you look at the preview when editing, the HTML source of the **preview** actually contains a lot of Wiki code that shouldn't be there!

I haven't dug in the Formatter yet to find the cause or whether end-of-page makes a difference... --JavaWoman

This is indeed an "end of page" problem: the formatter has some code to close tags inadvertently left open, but does not do this for lists and indents (nor even for *all* open tags). Fix coded and tested - this will be in 1.1.6.2. --JavaWoman
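
The general idea of closing everything still open at the end of a page can be sketched as follows. This is only an illustration under assumptions, not the fix that went into the formatter: it assumes open elements are tracked on a stack (innermost last), and the name closeOpenTags() is hypothetical.

```php
<?php
// Hypothetical sketch: $openTags is a stack of element names that were opened
// but not yet closed, innermost last. Emitting the closing tags in reverse
// order keeps the resulting markup properly nested.
function closeOpenTags(array $openTags): string
{
    $html = '';
    while (($tag = array_pop($openTags)) !== null) {
        $html .= '</' . $tag . '>';
    }
    return $html;
}

// A page that ends inside bold text in a list item of an unordered list:
echo closeOpenTags(['ul', 'li', 'strong']), "\n";
// prints: </strong></li></ul>
```

Treating lists and indents as ordinary entries on the same stack means they are closed together with any other open tags, whatever the last list item ends with.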

Change History

Changed 4 years ago by dartar

  • priority changed from normal to high

Changed 3 years ago by dartar

  • description modified (diff)

Changed 3 years ago by dartar

  • description modified (diff)

Changed 3 years ago by dartar

  • description modified (diff)

Changed 3 years ago by dartar

  • description modified (diff)

Changed 2 years ago by DarTar

  • description modified (diff)

adding #20 as related ticket

Changed 2 years ago by DarTar

  • description modified (diff)

Adding #410 as related ticket

Changed 2 years ago by DotMG

(In [490]) fixes #490 refs #13

Just ported the code at wikkawiki.org (added $trigger_floatr, and suppressing \n as suggested).

Changed 2 years ago by DotMG

(In [494]) fixes #491 refs #13

wakka2callback now returns the string, and never uses echo or print().

Changed 2 years ago by DotMG

(In [497]) closes #494 refs #13

Ported JW's code at wikkawiki.org (see link in ticket).

Note a subtle change in the $patTagWithId pattern:

'((<[a-z][^>]*)((?<=\\s)id=("|\')(.*?)\\4)(.*?>))';

Replacing the .*? with [^>]* makes it behave correctly with something like

<span>this id is not param: id="JW"</span>

The (?<=\\s) prepended to id= prevents it from modifying params like grid="JW", in case such param exists.
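
The effect of both changes can be checked with a short script. The first pattern is the one quoted above with '#' delimiters added for this example; the second, "loose" variant with .*? is a reconstruction of the earlier form and is an assumption, not code taken from the repository.

```php
<?php
// Pattern as quoted in the comment above; the '#' delimiters are added here.
$patTagWithId = '#((<[a-z][^>]*)((?<=\\s)id=("|\')(.*?)\\4)(.*?>))#';
// Assumed earlier form (a reconstruction): ".*?" where "[^>]*" is used now.
$patLoose = '#((<[a-z].*?)((?<=\\s)id=("|\')(.*?)\\4)(.*?>))#';

$samples = [
    '<div class="x" id="JW">',                     // a real id attribute
    '<span>this id is not param: id="JW"</span>',  // id= appears in text content
    '<table grid="JW">',                           // attribute name ending in "id"
];

foreach ($samples as $sample) {
    printf(
        "%-45s strict=%d loose=%d\n",
        $sample,
        preg_match($patTagWithId, $sample),
        preg_match($patLoose, $sample)
    );
}
```

Only the first sample should match the strict pattern. The loose variant also matches the second sample, because .*? can run past the closing > of the span tag into the text content.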

Changed 2 years ago by DotMG

(In [516]) fixes #497 refs #13

Ported JW's code at wikkawiki.org, just replaced callLevel with formatter_recursion...

Changed 2 years ago by DotMG

  • description modified (diff)

Changed 2 years ago by DotMG

(In [528]) refs #13

Matching lists at start of the page. (Adding ^ to the anchor in regexp.)
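
Why the anchor matters can be seen with two simplified stand-ins for the list/indent pattern quoted earlier in this ticket. Both patterns below are reduced, ASCII-only versions written for this example and are assumptions, not the code committed in [528].

```php
<?php
// Simplified stand-ins for the list/indent pattern (illustration only).
// The first requires a preceding newline, so it cannot match on line one.
$needsNewline = '/\n([\t~]+)(-|&|([0-9a-zA-Z]+)\))?/';
// The second also accepts the start of the page in place of the newline.
$anchored     = '/(^|\n)([\t~]+)(-|&|([0-9a-zA-Z]+)\))?/';

$page = "~-first item\n~-second item";

echo preg_match_all($needsNewline, $page), "\n"; // prints: 1
echo preg_match_all($anchored, $page), "\n";     // prints: 2
```

With the unanchored pattern the list item on the first line is never seen, which matches the "indenting on first line" report above.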

Changed 2 years ago by DarTar

Mahefa, it looks like your latest commits broke some functionality in the formatter: take a look at FormattingRules:

  • the page title is no longer parsed correctly (2. Headers instead of Wikka Formatting Guide);
  • ordered lists are no longer rendered with the correct type (numerals, roman numerals, letters);

Changed 2 years ago by DotMG

Hmmm, for the 1st one, beware of quotes!!!

The previous headers contain the letter n, that's why they weren't chosen. The regex

(=){3,5}([^=\n]+)(=){3,5}

selects only those titles that do not contain an equals sign, a backslash or the letter n. We should have used double quotes here.

Anyway, another problem should be taken care of: if we only replaced the single quotes with double quotes, any header containing the character = would be skipped. I'm correcting this now!

The second problem is caused by my last commits, I'm also correcting it.

Thank you for the alert!

Changed 2 years ago by DotMG

Sorry for the last comment! I meant the PageTitle() method in libs/Wakka.class.php

The regexp is not correct!

Changed 2 years ago by DotMG

(In [535]) refs #13 and #500 and #490

Supporting roman types for ordered list, differentiating capital and lowercases :

If the first letter of the indent type is I, V or X, it should use roman type (i or I)! Otherwise, use latin character type (a or A)
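
The rule described in this comment can be sketched as a small helper. The function name orderedListType() is hypothetical, and treating a leading digit as decimal numbering is an assumption added for completeness; only the I/V/X rule and the upper/lower case distinction come from the comment above.

```php
<?php
// Hypothetical helper (not the committed code). $marker is the text in front
// of the ")" in an ordered list item, e.g. "1", "a", "B", "iv" or "X".
// Returns the value for the HTML list type: 1, a, A, i or I.
function orderedListType(string $marker): string
{
    $first = $marker[0] ?? '';
    if (ctype_digit($first)) {
        return '1';                    // assumption: digits mean decimal numbering
    }
    $upper = ctype_upper($first);
    if (in_array(strtoupper($first), ['I', 'V', 'X'], true)) {
        return $upper ? 'I' : 'i';     // roman numerals
    }
    return $upper ? 'A' : 'a';         // latin letters
}

echo orderedListType('iv'), ' ', orderedListType('B'), ' ', orderedListType('1'), "\n";
// prints: i A 1
```

One consequence of the rule is that a lettered list starting at "i)", "v)" or "x)" is rendered with roman numerals.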

Changed 2 years ago by DarTar

Thanks for the fix.

One note: the current page generation time for FormattingRules in trunk is 3-5 seconds (at least on my local server), whereas the same page in 1.1.6.3 takes about 0.2-0.3 seconds to load.

Obviously the new formatter is adding workload to page generation, but maybe we should keep an eye on performance (which is one of our current selling points).

Changed 2 years ago by DotMG

On localhost/FormattingRules, I got 2.4 seconds ONCE, and after that it dropped to 0.44s. The highest value I saw (after the exceptional 2.4s) was 0.55s. Maybe something to do with Maintenance() ... Did you try more than once?

The heading ids generation is tolerable (0.003767)...

In fact, I believe we've (paradoxically) made an improvement in performance (page generation). E.g. closetags() is now called only once, ...

Still to be compared, on the same host: the beta formatter at wikkawiki.org and the current one in trunk (after removing markup that isn't supported by both formatters, like the new table markup).

Changed 23 months ago by BrianKoontz

Memory issues with formatter:

From MReimer_ on #wikka:

After updating to the latest Wikka version we got the following error in our logfile:

PHP Fatal error: Allowed memory size of 8388608 bytes exhausted (tried to allocate 19456 bytes) in .../wiki/formatters/wakka.php on line 38

Wikka fails for *very* big pages. We had to fix this by adding the line

ini_set("memory_limit", "16M");

at the top of wikka.config.php. Someone has to find a way to deliver the content dynamically. It's bad to store the whole content in memory...

(Originally reported in #521)

Changed 22 months ago by JavaWoman

  • owner changed from unassigned to JavaWoman

Changed 22 months ago by JavaWoman

  • status changed from new to assigned

Seems to be mine... :)
