The path to MXR

There are a couple of ways of doing development work. Hg and DVCS enable some of them, especially private local commits and working on multiple unfinished features. There's a general understanding that you should write code first and optimize later, but you usually try not to ship a version which has a performance regression.

Sadly, when I started working on MXR, I didn't have a DVCS, so I just made live changes on landfill, lots of them. One of the changes I made was to make MXR understand #include's (see, e.g., nsCSSFrameConstructor.cpp). This meant both recognizing that a file named nsCRT.h might not live in the same directory (and working out where it would live), and also recognizing that a file named nsIAtom.h might really be nsIAtom.idl. If you hover over the links from the highlighted lines in MXR, you'll see that they're correct.

The first implementation of this merely read a list of all files in the tree (.glimpse_filenames), which glimpse had generated to enable searching (and which LXR used for find). I've been trying to optimize this for a while using various tools and caching data. But essentially it meant that if there was a missed file, the entire file index would have to be read into memory, and each included file meant a linear search of that list. There's actually a file named .glimpse_filenames_index, which presumably is what I would really have liked to use, but I don't know how to use it. Instead, I've made an extra index of the files list and taught MXR to use it (thankfully the code was fairly modular, so I was able to drop it into a single function and everything else just worked).
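
To make the difference concrete, here is a minimal sketch of the idea in Python. It is not MXR's actual code (MXR is written in Perl), and the function names and sample paths are hypothetical. It builds a lookup keyed on basename once, so each #include becomes a dictionary lookup instead of a linear scan of the file list, and it falls back from a .h name to the matching .idl name.

```python
import posixpath
from collections import defaultdict


def build_basename_index(paths):
    """Map each basename to every path in the tree that has that name."""
    index = defaultdict(list)
    for path in paths:
        index[posixpath.basename(path)].append(path)
    return index


def resolve_include(index, including_file, included_name):
    """Return candidate targets for an #include, best guess first."""
    base = posixpath.basename(included_name)
    candidates = list(index.get(base, []))
    # Assumption: a header with no match may be generated from an .idl file.
    if not candidates and base.endswith(".h"):
        candidates = list(index.get(base[:-2] + ".idl", []))
    # Prefer a match in the same directory as the including file.
    here = posixpath.dirname(including_file)
    candidates.sort(key=lambda p: posixpath.dirname(p) != here)
    return candidates


# Hypothetical file list standing in for .glimpse_filenames.
tree = [
    "layout/base/nsCSSFrameConstructor.cpp",
    "xpcom/ds/nsCRT.h",
    "xpcom/ds/nsIAtom.idl",
]
index = build_basename_index(tree)
source = "layout/base/nsCSSFrameConstructor.cpp"
crt_targets = resolve_include(index, source, "nsCRT.h")
atom_targets = resolve_include(index, source, "nsIAtom.h")
missing_targets = resolve_include(index, source, "nsNoSuchFile.h")
```

The point of the design is that the expensive pass over the file list happens once, at index time, rather than once per included file at page-load time.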

Anyway, for a long time people had asked me to commit my changes to LXR back into cvs.mozilla.org, but I wasn't willing to do so until I had fixed the performance regression of my initial #include feature. This April, Benjamin landed MXR into Hg for the first time in order to enable Mozilla IT to deploy it in parallel to LXR. In July, Reed decommissioned LXR. At the end of July, while at the Summit, I started doing real optimization work using The New York Times Perl Profiler. Finally, today I added what amounts to 25 lines in the source viewer and another 10 or so lines to the code which generates the index (this part of the indexing process is much faster than the step that builds the primary search index), and pages should now load nearly as fast as from LXR.

MXR Features

  1. MXR tries to handle #include; this logic also works for some other languages, including Perl and Python.
  2. The indexer has a vague understanding of a number of languages, including IDL. (MXR has had parsing for IDL for a while, but a long-standing bug, fixed last week, prevented the interface name from being recognized as being declared in the idl file.)
  3. Thanks to db48x, there is support once again for Doxygen.
  4. As you've probably seen, there is support for ?mark= in the source viewer; this feature is something that Mozilla developers were used to using with bonsai. Please note that MXR only has a single version of a file in a given directory in a given root, which means that MXR mark URLs will not be as stable as similar bonsai mark URLs.
  5. While search has a 29 character "limit", it will now try to deal with longer searches by overflowing a portion of the search into the filter field.
  6. As with bonsai integration, MXR will try to link to "changes to this directory in the last ...". (This feature will be updated when the Mozilla pushloghtml script is improved to actually support directory queries.)
  7. MXR supports mixing and nesting repositories managed by different or even multiple version control systems.
  8. Directory descriptions are built dynamically from various sources including README files and debian/control files.
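
As a rough illustration of the search overflow in item 5, here is a sketch in Python. How MXR actually divides a long query is not described above, so the split point and the function name are assumptions: this version simply keeps the first 29 characters as the search term and moves the remainder into the filter.

```python
SEARCH_LIMIT = 29  # the search "limit" mentioned in item 5


def split_search(query, limit=SEARCH_LIMIT):
    """Return (search, filter) with any overflow moved into the filter.

    Hypothetical helper: the real MXR logic may split differently.
    """
    if len(query) <= limit:
        return query, ""
    return query[:limit], query[limit:]


short_search, short_filter = split_search("nsIAtom")
long_query = "nsCSSFrameConstructor::ConstructFrameInternal"
long_search, long_filter = split_search(long_query)
```

Nothing is lost in the split: joining the two parts gives back the original query.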