Minify your HTML on the fly!

Posted 09/18/2013 by asura

Minification is the process of removing unnecessary characters from the source code without changing its functionality. In some cases, this process drastically reduces the size of the files.

We commonly minify CSS and JavaScript files using libraries such as SquishIt. These libraries not only minify at runtime but also let you combine multiple CSS/JS references into one. This improves performance by reducing both the page size and the number of requests to the server.

HTML minification is a different story, especially for Sitecore websites. Minifying HTML at design time is not acceptable, since it makes the HTML difficult to change or extend. Minifying at runtime is the best option, but should it be done during page rendering, in Global.asax, or higher up?

Our goal was to make the module easy to integrate into any Sitecore implementation without any code changes. It installs a DLL in the bin folder and a config file in the App_Config\Include folder; depending on the IIS version, you may also need to add configuration to web.config.

We will be improving this module as we use it more in our projects and add useful features in the near future.

Once installed, you can tweak the configuration file NTT.Minifier.Config. Below is an example:


The module code does not execute if HTMLContentType is set to false or if the Debug flag is set to true. For now, we only minify HTML on the fly.
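The checks that decide whether a response gets minified at all can be sketched as follows. This is a minimal sketch that assumes the settings from NTT.Minifier.Config have already been read into a plain object. The MinifierSettings class, its ShouldMinify method and the matching rules (case-insensitive substring match for ExcludedKeywords, file-extension match for ExcludeExtensions) are hypothetical; only the setting names come from the module.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical in-memory view of the settings in NTT.Minifier.Config.
public class MinifierSettings
{
    public bool HTMLContentType { get; set; } = true;
    public bool Debug { get; set; }

    // e.g. "/sitecore"; matched as case-insensitive substrings of the URL (assumption).
    public List<string> ExcludedKeywords { get; set; } = new List<string>();

    // e.g. ".css"; compared with the URL's file extension (assumption).
    public List<string> ExcludeExtensions { get; set; } = new List<string>();

    // Returns true only when the response for this URL should be minified.
    public bool ShouldMinify(string url)
    {
        // Cheapest checks first: the feature switch and the Debug flag.
        if (!HTMLContentType || Debug)
            return false;

        // Skip URLs that contain an excluded keyword, such as /sitecore.
        if (ExcludedKeywords.Any(k => url.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0))
            return false;

        // Skip URLs whose extension is excluded; drop any query string first.
        string path = url.Split('?')[0];
        string extension = Path.GetExtension(path);
        return !ExcludeExtensions.Any(e => string.Equals(e, extension, StringComparison.OrdinalIgnoreCase));
    }
}
```

Ordering the checks from cheapest to most expensive keeps the cost for excluded requests close to zero, which is the point of filtering before any regex runs.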

To accomplish this, we created an HTTP module that handles the PostReleaseRequestState event. ASP.NET raises this event once all request event handlers have finished executing and the request state data has been stored. Response filters are applied right after it, which makes it a natural place to hook into the generated output before it is sent to the client's browser.

For each request, we check that HTMLContentType is true and Debug is false. If so, we make sure the URL doesn't contain any of the ExcludedKeywords (such as /sitecore) and that its extension isn't listed in ExcludeExtensions. These checks save a lot of processing time. On average, the processing takes 38 milliseconds per request, but the performance gain comes while the content is downloaded to the client: the HTML payload was typically reduced by 10-20%.

Before Compression:

[Image: Before Compression]

After Compression:

[Image: After Compression]

Once all the checks are done, we minify the HTML by running a series of Regex replacements. Each replacement is defined by a custom class called Expression, which pairs a Regex with its ReplaceWith string. Here is a sample:

using System.Text.RegularExpressions;

namespace NTT.Minifier
{
    public class Expression
    {
        public Expression(Regex expr, string replaceWith = "")
        { RegularExpression = expr; ReplaceWith = replaceWith; }

        public Regex RegularExpression { get; private set; }
        public string ReplaceWith { get; private set; }
    }
}

Here is a sample of how the expressions are loaded:

            var expressionList = new List<Expression>();

            //tabs
            expressionList.Add(new Expression(new Regex("\t", RegexOptions.Compiled | RegexOptions.Multiline)));

            //whitespace before a tag (collapsed to one space)
            expressionList.Add(new Expression(new Regex(@"\s+<", RegexOptions.Multiline | RegexOptions.Compiled), " <"));
            //whitespace after a tag (collapsed to one space)
            expressionList.Add(new Expression(new Regex(@">\s+", RegexOptions.Compiled | RegexOptions.Multiline), "> "));

            //leading whitespace at the start of each line
            expressionList.Add(new Expression(new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled)));

            //comment tags except IE conditional comments
            expressionList.Add(new Expression(new Regex(@"", RegexOptions.Singleline | RegexOptions.Compiled), string.Empty));
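To see what rules like these do end to end, here is a self-contained sketch that applies them in the same order. The comment pattern in the sample above is empty, so the one used here (<!--(?!\s*\[if)(?:(?!-->).)*-->) is a hypothetical stand-in that removes ordinary HTML comments but keeps comments starting with [if; it is rough and does not cover every conditional-comment variant. HtmlMinifierSketch and Minify are illustrative names, not part of NTT.Minifier.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same shape as the Expression class above: a regex plus its replacement.
public class Expression
{
    public Expression(Regex expr, string replaceWith = "")
    { RegularExpression = expr; ReplaceWith = replaceWith; }

    public Regex RegularExpression { get; private set; }
    public string ReplaceWith { get; private set; }
}

public static class HtmlMinifierSketch
{
    private static readonly List<Expression> expressionList = new List<Expression>
    {
        // tabs are removed
        new Expression(new Regex("\t", RegexOptions.Compiled)),
        // whitespace before a tag collapses to one space
        new Expression(new Regex(@"\s+<", RegexOptions.Compiled), " <"),
        // whitespace after a tag collapses to one space
        new Expression(new Regex(@">\s+", RegexOptions.Compiled), "> "),
        // leading whitespace at the start of each line is removed
        new Expression(new Regex(@"^\s+", RegexOptions.Multiline | RegexOptions.Compiled)),
        // HYPOTHETICAL pattern: removes <!-- ... --> unless it starts with [if
        new Expression(new Regex(@"<!--(?!\s*\[if)(?:(?!-->).)*-->", RegexOptions.Singleline | RegexOptions.Compiled)),
    };

    // Runs every expression over the HTML, in list order.
    public static string Minify(string html)
    {
        foreach (Expression expr in expressionList)
            html = expr.RegularExpression.Replace(html, expr.ReplaceWith);
        return html;
    }
}
```

For example, Minify turns "<div>\n\t<p>Hello   world</p>\n\t<!-- note -->\n</div>" (where \n and \t stand for a newline and a tab) into "<div> <p>Hello   world</p>  </div>". Line breaks and indentation around tags collapse to single spaces and the comment disappears, but the run of spaces inside "Hello   world" survives, because ^\s+ only strips whitespace at the start of a line.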
In the Write method, we loop through the expressions and run each Regex over the buffer string:

            //convert only the bytes that belong to this call (offset/count) to a string
            string bufferString = Encoding.Default.GetString(buffer, offset, count);

            //loop all the expressions
            foreach (Expression expr in expressionList)
                bufferString = expr.RegularExpression.Replace(bufferString, expr.ReplaceWith);

            //convert back to bytes; the new array starts at 0, so the caller's offset no longer applies
            byte[] output = Encoding.Default.GetBytes(bufferString);
            outputStream.Write(output, 0, output.Length);
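Notice that we use Encoding.Default for the conversion. On .NET Framework this is the server's ANSI code page, which is not necessarily the document's encoding, so you may prefer to use the response's encoding (HttpResponse.ContentEncoding) or force UTF-8 instead.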
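The Write method above minifies each chunk it receives on its own. ASP.NET can call a filter's Write more than once per response, so a tag, comment or whitespace run split across two chunks would not be matched. Below is a minimal sketch of an alternative, assuming the page fits comfortably in memory: buffer every Write and minify once when the filter is closed, decoding with an encoding supplied by the caller rather than Encoding.Default. MinifyingFilterStream, its constructor parameters and the wiring shown in the comment are hypothetical, not the module's actual code.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical response filter: buffers every Write and minifies the complete
// document once, when the filter is closed.
// In an HTTP module (System.Web, .NET Framework) it could be attached from the
// PostReleaseRequestState handler, roughly:
//   HttpResponse response = ((HttpApplication)sender).Response;
//   if (response.ContentType == "text/html")
//       response.Filter = new MinifyingFilterStream(response.Filter, Minify, response.ContentEncoding);
// where Minify is a hypothetical method running the regex loop.
public class MinifyingFilterStream : Stream
{
    private readonly Stream outputStream;          // next stream in the filter chain
    private readonly Func<string, string> minify;  // e.g. the regex loop
    private readonly Encoding encoding;            // the response's encoding
    private readonly MemoryStream buffer = new MemoryStream();
    private bool closed;

    public MinifyingFilterStream(Stream outputStream, Func<string, string> minify, Encoding encoding)
    {
        this.outputStream = outputStream;
        this.minify = minify;
        this.encoding = encoding;
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => buffer.Length;
    public override long Position
    {
        get => buffer.Position;
        set => throw new NotSupportedException();
    }

    // Only bytes [offset, offset + count) belong to this call; keep them for later.
    public override void Write(byte[] bytes, int offset, int count) =>
        buffer.Write(bytes, offset, count);

    // Nothing is sent before the filter is closed, so there is nothing to flush yet.
    public override void Flush() { }

    public override int Read(byte[] bytes, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing && !closed)
        {
            closed = true;
            // Decode the whole document at once, so no match is split across chunks.
            string html = encoding.GetString(buffer.ToArray());
            byte[] output = encoding.GetBytes(minify(html));
            // The new array starts at index 0 and is output.Length bytes long.
            outputStream.Write(output, 0, output.Length);
            outputStream.Flush();
            outputStream.Dispose();
            buffer.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Making Flush a no-op means an explicit Response.Flush() no longer streams partial output to the browser; that is the trade-off for seeing the whole document before minifying it.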

In general, I found that removing newline characters (\r\n) causes issues with form posts in the Google Chrome web browser. In particular, the OnSubmit JavaScript handler doesn't get triggered, so all validations are skipped and the form is submitted anyway.