Minify your HTML on the fly!
Posted 09/18/2013 by asura
Minification is the process of removing unnecessary characters from the source code without changing its functionality. In some cases, this process drastically reduces the size of the files.
We commonly minify CSS and JavaScript files with libraries such as SquishIt. These libraries not only minify at runtime but also let you combine multiple CSS/JS requests into one. This improves performance by reducing both the page size and the number of requests to the server.
HTML minification is a different story, especially for Sitecore and ASP.NET websites. Minifying at design time is not acceptable because it makes the HTML difficult to change or extend. Minifying at runtime is the best option, but should it be done in the page render, in global.asax, or higher up?
Our goal was a solution that integrates into any Sitecore implementation without any code changes. It installs a DLL in the bin folder and a config file in the App_Config\Include folder; depending on the IIS version, you may also need to add configuration to web.config.
We will keep improving this module and adding useful features as we use it in more of our projects.
Once installed, you can tweak the configuration file NTT.Minifier.Config. Below is an example:
The module code will not execute if the HTMLContentType is set to false or if the Debug flag is set to true. At this point in time, we are only minifying HTML on the fly.
To accomplish this, we created an HTTP module that handles the PostReleaseRequestState event. ASP.NET raises this event after it has finished executing all request event handlers and has stored the request state data. It is one of the last events raised before the generated output is sent to the client's web browser.
For each request, we check that HTMLContentType is true and Debug is false. If so, we make sure the URL doesn't contain any of the ExcludedKeywords, such as /sitecore, and that its extension is not listed in ExcludeExtensions. These checks save a lot of processing time. On average, a request takes 38 milliseconds to process; the performance gain comes when the content is downloaded to the client. The HTML payload was typically reduced by 10-20%.
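The checks described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the class name MinifyRules, the method ShouldMinify and its parameters are hypothetical, and the path is assumed to be passed without its query string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; the real module's class and method names are not shown in this post.
public static class MinifyRules
{
    // Returns true when a request path should be minified.
    // Assumption: "path" is the URL path only, without a query string.
    public static bool ShouldMinify(
        string path,
        bool htmlContentType,
        bool debug,
        IEnumerable<string> excludedKeywords,
        IEnumerable<string> excludedExtensions)
    {
        // Cheapest checks first: the two configuration switches.
        if (!htmlContentType || debug)
            return false;

        // Skip any path that contains an excluded keyword such as "/sitecore".
        if (excludedKeywords.Any(k => path.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0))
            return false;

        // Skip paths whose extension is on the excluded list (for example ".js").
        string extension = Path.GetExtension(path);
        return !excludedExtensions.Any(
            e => string.Equals(e, extension, StringComparison.OrdinalIgnoreCase));
    }
}
```

Putting the switches first means a disabled module returns before any string comparison happens, which is in keeping with the aim of keeping the per-request overhead small.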
[Screenshots: Before Compression / After Compression]
Once all the checks are done, we minify the HTML by running multiple regular expression replacements. Each one is defined by a Regex and a ReplaceWith string, held in a custom class called Expression. Here is a sample:
using System.Text.RegularExpressions;

namespace NTT.Minifier
{
    // Pairs a regular expression with the text that replaces each match.
    public class Expression
    {
        public Expression(Regex expr, string replaceWith = "")
        {
            RegularExpression = expr;
            ReplaceWith = replaceWith;
        }

        public Regex RegularExpression { get; private set; }
        public string ReplaceWith { get; private set; }
    }
}
Here is a sample of the Regex load:
//tabs
expressionList.Add(new Expression(new Regex("\t", RegexOptions.Compiled | RegexOptions.Multiline)));
//whitespace before a tag, collapsed to a single space
expressionList.Add(new Expression(new Regex(@"\s+<", RegexOptions.Multiline | RegexOptions.Compiled), " <"));
//whitespace after a tag, collapsed to a single space
expressionList.Add(new Expression(new Regex(@">\s+", RegexOptions.Compiled | RegexOptions.Multiline), "> "));
//leading whitespace at the start of each line
expressionList.Add(new Expression(new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled)));
//comment tags except IE if statements
expressionList.Add(new Expression(new Regex(@"", RegexOptions.Singleline | RegexOptions.Compiled), string.Empty));
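To see how rules like these behave together, here is a small self-contained sketch that applies them in the same order to a string. The class name MinifySketch is hypothetical, and the comment pattern in particular is an assumption of mine (one possible way to strip comments while keeping IE conditional comments), not necessarily the pattern the module uses.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-alone demo of the replacement rules; not the module's code.
public static class MinifySketch
{
    // Patterns and their replacements, applied in the order listed.
    private static readonly Regex[] Patterns =
    {
        new Regex("\t"),                              // tabs
        new Regex(@"\s+<"),                           // whitespace before a tag
        new Regex(@">\s+"),                           // whitespace after a tag
        new Regex(@"^\s+", RegexOptions.Multiline),   // leading whitespace on each line
        // Assumed pattern: any comment that does not start with "[if".
        new Regex(@"<!--(?!\[if\b).*?-->", RegexOptions.Singleline)
    };

    private static readonly string[] Replacements = { "", " <", "> ", "", "" };

    public static string Minify(string html)
    {
        for (int i = 0; i < Patterns.Length; i++)
        {
            html = Patterns[i].Replace(html, Replacements[i]);
        }
        return html;
    }
}
```

Given markup that contains both an ordinary comment and a conditional comment such as <!--[if IE]>...<![endif]-->, Minify removes the former and keeps the latter. The assumed pattern is deliberately simple and would need more care for downlevel-revealed conditional comments.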
In the Write method, we loop through the expressions and run each Regex on the buffer string:
//convert only the bytes passed to this call (count is the third parameter of Stream.Write)
string bufferString = Encoding.Default.GetString(buffer, offset, count);
//loop all the expressions
foreach (Expression expr in expressionList)
{
    bufferString = expr.RegularExpression.Replace(bufferString, expr.ReplaceWith);
}
//convert back to bytes and write the whole minified chunk from the start of the new array
byte[] output = Encoding.Default.GetBytes(bufferString);
outputStream.Write(output, 0, output.Length);
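One thing to keep in mind with a per-call conversion is that a stream's Write method may be called several times for one response, so a tag, a comment or even a multi-byte character can be split across calls. A possible variation, sketched below under my own assumptions, collects all the bytes first and transforms them once when the stream is closed. The class name BufferingFilterStream and the transform delegate are hypothetical, and the sketch assumes the host closes the filter when the response is complete.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical write-only filter: buffers everything, transforms once on close.
public sealed class BufferingFilterStream : Stream
{
    private readonly Stream _inner;
    private readonly Encoding _encoding;
    private readonly Func<string, string> _transform;
    private readonly MemoryStream _pending = new MemoryStream();
    private bool _completed;

    public BufferingFilterStream(Stream inner, Encoding encoding, Func<string, string> transform)
    {
        _inner = inner;
        _encoding = encoding;
        _transform = transform;
    }

    // Only collect the bytes here; nothing is decoded until the stream is closed.
    public override void Write(byte[] buffer, int offset, int count)
    {
        _pending.Write(buffer, offset, count);
    }

    // Close() calls Dispose(true), so the transform runs exactly once at the end.
    protected override void Dispose(bool disposing)
    {
        if (disposing && !_completed)
        {
            _completed = true;
            string text = _encoding.GetString(_pending.ToArray());
            byte[] output = _encoding.GetBytes(_transform(text));
            _inner.Write(output, 0, output.Length);
            _inner.Flush(); // the inner stream is left open for its owner to close
            _pending.Dispose();
        }
        base.Dispose(disposing);
    }

    public override void Flush() { } // intentionally empty: output is written on close
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

The trade-off is memory: the whole page is held in the buffer until the end, which is usually acceptable for HTML pages but worth measuring for very large responses.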
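Notice that we use the same encoding, Encoding.Default, to decode and re-encode the content. On the .NET Framework, Encoding.Default is the system's ANSI code page, which may not match the encoding of the response, so you can force UTF8 encoding if you choose to.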
In general, I found that removing newline characters (\r\n) causes issues with form posts in the Google Chrome web browser. In particular, the onsubmit JavaScript handler doesn't get triggered, so all validations are skipped and the form submits.