This page presents the implementation of the dictionary decorator, which transforms specific keywords on a page into links or other forms of content. The decorator also uses a light version of the lexical transform engine developed internally by Pelican Design & Development.

Technical aspects

Two approaches were tested, and a third was considered but not implemented:

  1. I started with tree traversal: walking through every leaf, looking for the keywords, and replacing them with new leaves. This approach was too complicated, especially without a framework, and wasn't straightforward for text inside nodes that also have children.
  2. The second attempt was flat regular expression replacement on the whole HTML code. It has some caveats (see below) and is, by the way, a terrible idea (see Parsing Html The Cthulhu Way or search for similar articles from the same blog). It is still easier than the first approach and leads to much shorter code.
  3. In an ideal world, there would be a third way to do it: parse the code with an HTML parser, and then highlight the words. I didn't do it. If you do it and are ready to share your code under the Microsoft Public License, please contact us by e-mail.

So I've chosen the second approach.
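
To make the second approach concrete, here is a minimal sketch of a flat regular expression replacement over an HTML string. It is not the decorator's actual code: the names escapeRegExp and decorateHtml, the wrap callback, and the pattern itself are all assumptions made for illustration. The pattern matches tags first and copies them through unchanged, so keywords inside attribute values are left alone, but it still knows nothing about comments or CDATA.

```javascript
// Hypothetical helper: escape characters that are special inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical sketch of a flat replacement over the whole HTML string.
// html:     the markup to decorate
// keywords: array of plain-text keywords
// wrap:     function receiving the matched keyword, returning its replacement
function decorateHtml(html, keywords, wrap) {
  if (keywords.length === 0) {
    return html;
  }
  // Longer keywords first, so "regular expression" wins over "regular".
  const words = [...keywords]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
  // First alternative: a tag, copied through untouched.
  // Second alternative: a whole-word keyword, captured in group 1.
  const pattern = new RegExp('<[^>]*>|\\b(' + words + ')\\b', 'gi');
  return html.replace(pattern, (match, keyword) =>
    keyword === undefined ? match : wrap(keyword));
}

const decorated = decorateHtml(
  '<p title="parser">A parser reads text.</p>',
  ['parser'],
  (keyword) => '<b>' + keyword + '</b>'
);
```

In this example only the occurrence in the paragraph text is wrapped; the one inside the title attribute is skipped because the whole opening tag is consumed by the first alternative.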

There are still some caveats.

Caveats and limitations

  1. As noted above, the solution uses regular expressions, and regular expressions cannot reliably parse HTML. The XHTML I use on my pages is very basic most of the time, so I'm fairly sure regular expressions will work for me. Remember that the solution doesn't understand even basic comments or CDATA sections. In all cases, don't use this code:
  2. The performance is below my expectations. On a quad-core machine at 5% load with 8 GB of RAM at 40% usage, Google Chrome reports 40 to 55 ms to decorate the keywords of the current page. I would cry if I ever saw the metrics for a much larger page on an average customer's PC, not to mention that, instead of manipulating the DOM, the current solution rewrites document.body.innerHTML directly.
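
The first caveat is easy to reproduce. The sketch below assumes a deliberately naive replacement with no awareness of markup; the function name naiveDecorate and the pattern are hypothetical, and the decorator's real pattern may behave differently. It only illustrates why text that looks like a keyword inside an attribute or a comment is at risk.

```javascript
// Hypothetical naive replacement: wraps every whole-word occurrence of the
// keyword, wherever it appears in the string. The keyword is assumed to be a
// plain word with no RegExp metacharacters.
function naiveDecorate(html, keyword) {
  const pattern = new RegExp('\\b' + keyword + '\\b', 'g');
  return html.replace(pattern, '<a class="keyword">$&</a>');
}

const input = '<img alt="parser diagram"><!-- parser notes --><p>A parser.</p>';
const output = naiveDecorate(input, 'parser');

// The replacement is applied three times: inside the alt attribute, inside
// the comment, and in the paragraph text. Only the last one is wanted; the
// first one breaks the img tag.
console.log(output);
```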

Source code

The minified source code is available for production environments.

If you're interested in getting the original source code, you can access the repository at the SVN server.

In both cases, make sure you've read the copyright notice at the bottom of this page. As usual for me, it's tricky, and different parts are covered by different licenses.

Where to start

Decorator customizations

g.js initializes the dictionary decorator. If you want to customize the template used to decorate the keywords, specify:

var template = '<a href="?k={0}" class="keyword" title="{2}">{1}</a>';
DictionaryDecorator.DecoratorTemplate(template);

before initializing the decorator, changing the string according to your needs.
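
For readers unfamiliar with numbered placeholders, the sketch below shows one way such a template could be expanded. It is an illustration only: formatTemplate is a hypothetical helper, not part of the decorator, and the meaning assigned to each placeholder ({0} as the value used in the link, {1} as the displayed text, {2} as the tooltip) is an assumption inferred from where they appear in the template above.

```javascript
// Hypothetical helper: replace each {n} placeholder with the n-th value.
// Placeholders without a matching value are left as they are.
function formatTemplate(template, ...values) {
  return template.replace(/\{(\d+)\}/g, (placeholder, digits) => {
    const index = Number(digits);
    return index < values.length ? String(values[index]) : placeholder;
  });
}

const template = '<a href="?k={0}" class="keyword" title="{2}">{1}</a>';

// Assumed argument order: link value, displayed text, tooltip.
const link = formatTemplate(template, 'dom', 'DOM', 'Document Object Model');
```

A real implementation would also need to escape the values before placing them in attributes; that step is left out here to keep the sketch short.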

Dictionary data

class.DictionaryData.Flat.php is the plugin used to retrieve the list of keywords. You can create your own plugin for your own needs, for example to load the keywords from the database.

By default, the first file corresponding to the dictionary data plugin is used. To change this behavior, you can specify the custom plugin name in dictionary.js.php.