Brotli Compression – How Much Will It Reduce Your Content?

A few years ago Brotli compression entered the webperf spotlight with impressive gains of up to 25% over Gzip compression. The algorithm was created by Google, which initially introduced it as a way to compress web fonts via the WOFF2 format. Later, in 2015, it was released as a compression library to optimize the delivery of web content. Despite Brotli being a completely different format from Gzip, it was quickly supported by most modern web browsers.

By mid-2016 some websites and CDNs started to support it. However, adoption across the web is still quite low. Based on both Desktop and Mobile data from the HTTP Archive’s June 2018 dataset, only 71% of text-based compressible resources (HTML, JavaScript, CSS, JSON, XML, SVG, etc.) are actually being compressed: 11% are Brotli encoded and 60% are Gzip encoded.

If you are not using Brotli today, have you ever wondered how much Brotli compression could reduce your content? Read on to find out!

Compression Levels

Both Gzip and Brotli offer multiple compression levels, which indicate how aggressively the algorithm will work to compress a file. For both, the levels start at 1 and increase: the higher the compression level, the smaller the file (and the more computationally expensive the compression). The graph below, generated from Squash benchmark results, makes this very clear.

Many popular web servers default to a mid-range Gzip level, because it compresses files adequately while keeping CPU costs in check. For example, Apache defaults to zlib’s default (level 6), IIS defaults to level 7, and NGINX defaults to level 1. When sites can afford to, compressing at Gzip level 9 will shave off a few more bytes. The same is true for Brotli, although the CPU costs are much higher than Gzip’s. Brotli compression levels 10 and 11 are far more computationally expensive, but the savings are significant. If you are able to precompress resources, then Brotli level 11 is the way to go. If not, then Brotli level 4 or 5 should produce a smaller payload than the highest Gzip compression level, with a reasonable processing-time tradeoff.
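If you want to get a feel for this size vs. CPU tradeoff on your own files, a minimal Python sketch like the one below (using the standard library’s gzip module and the Brotli bindings; the filename is just a placeholder) compares a few levels of each:

```python
import gzip
import time

import brotli  # pip install Brotli

# Placeholder filename: point this at any text asset you want to test.
with open("jquery-ui.min.js", "rb") as f:
    data = f.read()

def measure(label, compress):
    """Compress `data` once and report the output size and elapsed time."""
    start = time.perf_counter()
    size = len(compress(data))
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {size} bytes in {elapsed_ms:.1f} ms")

measure("gzip 6 (common default)", lambda d: gzip.compress(d, compresslevel=6))
measure("gzip 9 (maximum)", lambda d: gzip.compress(d, compresslevel=9))
measure("brotli 5", lambda d: brotli.compress(d, quality=5))
measure("brotli 11 (maximum)", lambda d: brotli.compress(d, quality=11))
```

On a typical minified JavaScript bundle, Brotli 11 should come out noticeably smaller than Gzip 9, and also noticeably slower to produce, which is exactly why it pairs so well with precompression.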

Note: Akamai is able to compress to Brotli level 11 because of the way Resource Optimizer is architected. Brotli resources are only served from Akamai’s cache, and cache misses are precompressed prior to being served to an end user. This allows us to provide the most byte savings without the user incurring any processing delays.

How Much Will My Files Be Compressed With Brotli?

I’ve created a new tool, which you can access here: https://tools.paulcalvano.com/compression.php

Here’s how it works:

  1. Find a URL that you want to test the compression levels for. For example, you may want to choose the largest JS or CSS file on a page.
  2. Paste the URL for the object you want to test in the text box, and click the “Compression Test” button.
  3. On the server side, the file is downloaded three times: once without an Accept-Encoding header, once with an Accept-Encoding: gzip header, and finally with an Accept-Encoding: br header. This lets us log the uncompressed file size, and whether Gzip and Brotli are currently supported.
  4. The uncompressed version of the file is then compressed at each Gzip and Brotli level. Compression ratios are calculated, and the compression levels used by the website are estimated based on file size. (A rough sketch of this flow is shown below.)
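The tool itself is a PHP page, but the server-side logic is roughly equivalent to this Python sketch (the URL is only an example, and the per-level comparison in step 4 works just like the snippet in the previous section):

```python
import requests

# Example URL; substitute the asset you want to test.
url = "https://code.jquery.com/ui/1.12.1/jquery-ui.min.js"

# Step 3: download the object three times with different Accept-Encoding headers.
plain = requests.get(url, headers={"Accept-Encoding": "identity"})
gz = requests.get(url, headers={"Accept-Encoding": "gzip"})
br = requests.get(url, headers={"Accept-Encoding": "br"})

print("uncompressed size:", len(plain.content), "bytes")
print("gzip currently served:", gz.headers.get("Content-Encoding") == "gzip")
print("brotli currently served:", br.headers.get("Content-Encoding") == "br")
```

Step 4 then recompresses plain.content at every Gzip level (1 through 9) and Brotli level (1 through 11) and compares those sizes against what the server actually returned, which is how the tool estimates the compression level currently in use.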

Example:

Let’s take a look at a common JavaScript file that many sites are currently using. Below I’ve tested the minified jQuery UI 1.12 library from https://code.jquery.com/. I can see that this file is 253KB uncompressed, and that Gzip compression reduces the size to 83KB. However, the Gzip compression level is quite low, and increasing it to level 9 would shave another 16KB off the download. Furthermore, Brotli compression drops the file down to 57KB, which is 26KB (32%) smaller than the Gzip-compressed version that was served!

Now let’s examine another site with a very large JavaScript file. I ran a query against the HTTP Archive for the largest JavaScript file used by a site in the Alexa top 1000. I won’t shame the lucky winner here, but the file was 4.3MB uncompressed and was Gzip compressed down to 1.04MB. Brotli level 11 would reduce it another 27%, to 776KB.

Go ahead and try it on your content! https://tools.paulcalvano.com/compression.php

Implementing Brotli @ Akamai

At Akamai, we support Brotli in two ways, and both implementations are enabled via a simple On/Off toggle:

  • Brotli Served from Origin
  • Resource Optimizer

By default, Akamai sends an Accept-Encoding header to origins advertising support for Gzip compression. The Brotli Support feature instructs the edge servers to send an Accept-Encoding header that advertises support for both Brotli and Gzip, and to cache both the Brotli and Gzip versions of your assets.

Resource Optimizer is a simple turnkey feature that lets you serve Gzip compressed resources from your origin while Akamai compresses the resources for you with Brotli. The Brotli resources are precompressed at level 11 and loaded into cache the first time they are requested, so no user experiences a processing delay when downloading Brotli compressed resources. To turn it on, simply enable the “Resource Optimizer” feature of Adaptive Acceleration.

Conclusion

Many sites are using Gzip compression already, and there are existing tools that will tell you if you are not compressing your content. The tool I created will tell you what Gzip and Brotli compression levels you are most likely using and what is possible for your assets at each compression level. If you are unsure of how much Brotli can benefit your content, then hopefully this helps!

Impact of Page Weight on Load Time

Over the years it has been fun to track website page weight by comparing it to milestones such as the size of a floppy disk (1.44MB), the original install size of DOOM (2.39MB), and the point last summer when the average page weight hit 3MB.

When we talk about page weight, we are often talking about high resolution images, large hero videos, excessive 3rd party content, JavaScript bloat – and the list goes on.

I recently did some research showing that sites with more 3rd party content are more likely to be slower. A few days later, USAToday gave us an extreme example by publishing a GDPR-friendly version of their site for EU visitors. The EU version has no 3rd party content and substantially less page weight, and it is blazing fast compared to the US version.

Shortly after the 3MB average page weight milestone was reached last summer, I did some analysis to try to understand the sudden jump in page weight. It turns out that the largest 5% of pages were skewing the average, which is a perfect example of how averages can mislead us.

These days we focus more on percentiles, histograms and statistical distributions to represent page weight. For example, in the image below you can see how this is being represented in the recently redesigned HTTP Archive reports.
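As a tiny illustration of why the average is so easy to skew (these numbers are made up), compare the mean and the median of a set of page weights where a few heavyweight outliers dominate:

```python
from statistics import mean, median

# Made-up page weights in MB: most pages are modest, a handful are enormous.
page_weights = [1.2, 1.4, 1.6, 1.8, 2.0, 2.1, 2.3, 2.5, 12.0, 25.0]

print(f"average page weight: {mean(page_weights):.2f} MB")   # dragged up by the outliers
print(f"median page weight:  {median(page_weights):.2f} MB")  # closer to a 'typical' page
```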

Is page weight still something we should care about in 2018?

Thanks to the HTTP Archive, we have the ability to analyze how the web is built. And by combining it with the Chrome User Experience Report (CrUX), we can also see how this translates into the actual end user experiences across these sites. This is an extremely powerful combination.

Note: If you are not familiar with CrUX, it is Real User Measurement data collected by Google from Chrome users who have opted in to syncing their browsing history and have usage statistic reporting enabled. I wrote about CrUX and created some overview videos if you are interested in learning more about the data and how to work with it.

By leveraging CrUX and the HTTP Archive together, we can analyze performance across many websites and look for trends. For example, below you can see how often the Alexa top 10 sites are able to load pages in less than 2 seconds, 2-4 seconds, 4-6 seconds and greater than 6 seconds. It’s easy to glance at this chart and see which sites have a large percentage of slower pages. I wrote another post about how you can use CrUX data like this to compare yourself to competitors.

But what happens if we look at the real user performance for 1,000 popular sites this way? The results are oddly symmetrical, with almost as many fast sites as slow ones. In the graph below I sorted the onLoad metrics from fast (left) to slow (right). There are 1000 tiny bars – each representing a summary of a single website’s real user experiences on Chrome browsers. The most consistently fast site in this list is the webcomic XKCD – with an impressive 93.5% of users loading pages in < 2 seconds. Some other sites in the “extremely fast” category are Google, Bing, CraigsList, Gov.uk, etc. Many of the slow sites (far right of this graph) have large page weights, videos, advertisements and numerous 3rd parties. Where do you think your site’s performance stacks up?
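If you want to reproduce this kind of bucketing, CrUX exposes each metric as a histogram of bins; a minimal sketch of collapsing one origin’s onLoad histogram into the four buckets above could look like this (the bin boundaries and densities here are invented for illustration):

```python
# Made-up CrUX-style histogram for one origin's onLoad metric.
# Each bin is (start_ms, end_ms, density); the densities sum to ~1.0.
bins = [
    (0, 1000, 0.42), (1000, 2000, 0.23), (2000, 3000, 0.12),
    (3000, 4000, 0.08), (4000, 6000, 0.09), (6000, None, 0.06),
]

buckets = {"< 2s": 0.0, "2-4s": 0.0, "4-6s": 0.0, "> 6s": 0.0}
for start, end, density in bins:
    if end is not None and end <= 2000:
        buckets["< 2s"] += density
    elif end is not None and end <= 4000:
        buckets["2-4s"] += density
    elif end is not None and end <= 6000:
        buckets["4-6s"] += density
    else:
        buckets["> 6s"] += density  # open-ended final bin

for label, share in buckets.items():
    print(f"{label}: {share:.1%}")
```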

Since we’re interested in investigating the relationship of page weight to performance, let’s look at the top 1000 pages that are less than 1MB and the top 1000 pages that are greater than 3MB. The pattern in load times is quite revealing. A few notable observations:

  • The distribution of fast vs. slow seems to be cut down the middle: sites appear to be either mostly fast or mostly slow.
  • There are far more fast <1MB pages compared to slow ones.
  • There are far more slow >3MB pages compared to fast ones.
  • The fact that there are still some fast >3MB pages and slow <1MB pages proves that page weight isn’t everything, and it is possible to optimize rich experiences for performance.

Note: I’ve also applied the same logic to the top 10,000 sites, and the pattern was identical.

What About the Other Metrics that CrUX Collects?

Since CrUX contains additional metrics, I also looked at the relationship of page weight to DOM Content Loaded and First Contentful Paint. The set of graphs below compares the fastest range (<2s for onLoad, <1s for FCP and DCL) for the top 1000 sites. Across these three metrics, we see the highest correlation between load times and page weight with the onLoad metric.

(Note: the higher %s mean that more pages experienced faster load times. So higher=better in these graphs.)

What Aspects of Page Weight Impact Performance the Most?

We’ve seen a strong correlation of performance to page weight, and we’ve learned that it is more measurable via onLoad than via First Contentful Paint. But which contributing factors of page weight impact load times the most?

If we examine the top 1000 sites again and pull some of the page weight statistics from the HTTP Archive, we can once again compare HTTP Archive data with CrUX. The graphs below summarize the percentage of pages with onLoad times of less than 2 seconds. The Y axis is the median number of bytes, and the X axis represents the percentage of sites with fast page loads.

In the top left graph, page weight shows a strong correlation, and the sites with fewer fast pages tended to have larger page weights. The remaining three graphs show how JavaScript, CSS and images contribute to page weight and performance. Based on these graphs, images and JavaScript are the most significant contributors to the page weights that affect load time. And for some slow sites, the amount of compressed JavaScript actually exceeds the amount of image bytes!
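To put a rough number on that correlation yourself, you could correlate each site’s median page weight with the share of its real-user page loads finishing in under 2 seconds. The sketch below uses made-up rows purely to show the calculation, not the actual data behind the graphs:

```python
# Made-up per-site rows: (median page weight in KB,
# share of real-user page loads with onLoad < 2s).
sites = [(900, 0.71), (1400, 0.62), (2100, 0.45), (3200, 0.31), (4800, 0.18)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

weights, fast_share = zip(*sites)
print(f"page weight vs. fast loads: r = {pearson(weights, fast_share):.2f}")
```

With rows like these, the coefficient comes out strongly negative: the heavier the site, the smaller its share of fast page loads.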

Conclusion

Page weight is an important metric to track, but we should always consider using appropriate statistical methods when tracking it on the web as a whole. Certainly track it for your sites – because as you’ve seen here, the size of content does matter. It’s not the only thing that matters – but the correlation is strong enough that a spike in page weight should merit some investigation.

If your site has a page weight problem, there are a few things you can do:

  • Akamai’s Image Manager can help optimize images with advanced techniques such as perceptual quality compression. This is also a great way to ensure that you don’t get any surprises when a marketing promo drops a 2MB hero image on your homepage.
  • Limit the use of large video files, or defer their loading to avoid critical resources competing for bandwidth. Check out Doug Sillars’ blog post on videos embedded into web pages.
  • Lazy load images that are not in the viewport of your screen. Jeremy Wagner wrote a nice guide on this recently.
  • Ensure that you are compressing text-based content. Gzip compression at a minimum should be enabled. Brotli compression is widely supported and can help reduce content size further. (Akamai Ion customers can automatically serve Brotli compressed resources via Resource Optimizer.)
  • Use Lighthouse and Chrome Dev Tools to audit your pages. Find unused CSS and JS with the Coverage feature and attempt to optimize.
  • Audit your 3rd parties. Many sites do not realize how much content their 3rd parties add to their site, or how inconsistent their performance may become as a result. Harry Roberts wrote a helpful guide here! Also, Akamai’s Script Manager service can help manage third parties based on performance.
  • Track your site’s page weight over time and alert on increases. If you use Akamai’s mPulse RUM service, you can do this with resource timing data (if TAO is permitted).

Thanks to Yoav Weiss and Ilya Grigorik for reviewing this and providing feedback.

HTTP Heuristic Caching (Missing Cache-Control and Expires Headers) Explained

Have you ever wondered why WebPageTest can sometimes show that a repeat view loaded with fewer bytes downloaded, while also triggering warnings related to browser caching? It can seem like the test is reporting an issue that does not exist, but in fact it’s often a sign of a more serious issue that should be investigated. Often the issue is not the lack of caching, but rather the lack of control over how your content is cached.
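The “heuristic” part refers to what browsers do when a response carries a Last-Modified header but no explicit freshness information (no Cache-Control: max-age and no Expires): they guess a freshness lifetime, commonly 10% of the object’s age, which is the fraction RFC 7234 suggests. A minimal sketch of that rule of thumb:

```python
from datetime import datetime, timedelta

def heuristic_freshness(response_date: datetime, last_modified: datetime) -> timedelta:
    """Treat the object as fresh for 10% of the time elapsed between
    Last-Modified and the response Date (the fraction RFC 7234 suggests)."""
    return (response_date - last_modified) * 0.1

# A file last modified 30 days ago is considered fresh for roughly 3 days,
# so a repeat view can reuse it from cache even though the server never
# said how long it should be cached.
now = datetime(2018, 7, 1)
print(heuristic_freshness(now, now - timedelta(days=30)))
```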

If you have not run into this issue before, then examine the screenshot below to see an example:

Continue reading

Adoption of HTTP Security Headers on the Web

Over the past few weeks the topic of security related HTTP headers has come up in numerous discussions – both with customers I work with as well as other colleagues that are trying to help improve the security posture of their customers. I’ve often felt that these headers were underutilized, and a quick test on Scott Helme’s excellent securityheaders.io site usually proves this to be true. I decided to take a deeper look at how these headers are being used on a large scale.

Looking at this data through the lens of the HTTP Archive, I thought it would be interesting to see if we could give the web a scorecard for security headers. I’ll dive deeper into how each of these headers is implemented below, but let’s start off by looking at the percentage of sites that are using these security headers. As I suspected, adoption is quite low. Furthermore, adoption is marginally higher for some of the most popular sites, but not by much.

Continue reading

Cache Control Immutable – A Year Later

In January 2017, Facebook wrote about a new Cache-Control directive, immutable, which was designed to tell supporting browsers not to attempt to revalidate an object on a normal reload during its freshness lifetime. For example, a response served with “Cache-Control: max-age=31536000, immutable” tells the browser it can reuse the object for up to a year without revalidating it on reload. Firefox 49 implemented the directive, while Chrome went ahead with a different approach by changing the behavior of the reload button. It also seems that WebKit has implemented the immutable directive since then.

So it’s been a year – let’s see where Cache-Control immutable is being used in the wild!

Continue reading

Measuring the Performance of Firefox Quantum with RUM

On Nov 14th, Mozilla released Firefox Quantum. On launch day, I personally felt that the new version was rendering pages faster and I heard anecdotal reports indicating the same. There have also been a few benchmarks which seem to show that this latest Firefox version is getting content to screens faster than its predecessor. But I wanted to try a different approach to measurement.

Given the vast amount of performance information that we collect at Akamai, I thought it would be interesting to benchmark the performance of Firefox Quantum with a large set of real end-user performance data. The results were dramatic: the new browser improved DOM Content Loaded time by an extremely impressive 24%. Let’s take a look at how those results were achieved.



Continue reading

Which 3rd Party Content Loads Before Render Start?

Since the HTTP Archive captures timing information for each request, I thought it would be interesting to correlate request timings (i.e., when an object was loaded) with page timings. The idea is that we can categorize resources as loaded before or after an event.
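A stripped-down version of that categorization, using made-up request data and field names, looks something like this:

```python
# Made-up request data and field names, purely for illustration.
render_start = 1800  # page's renderStart, in ms

requests_on_page = [
    {"url": "https://ads.example.com/tag.js", "started_ms": 600},
    {"url": "https://cdn.example.com/app.js", "started_ms": 900},
    {"url": "https://social.example.com/widget.js", "started_ms": 2500},
]

before = [r["url"] for r in requests_on_page if r["started_ms"] < render_start]
after = [r["url"] for r in requests_on_page if r["started_ms"] >= render_start]

print("loaded before render start:", before)
print("loaded after render start:", after)
```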

Content Type Loaded Before/After Render Start

It’s generally well known that third party content impacts performance. We see this with both resource loading and JavaScript execution blocking the browser from loading other content. While we don’t have the data to evaluate script execution timings per resource captured here, we can definitely look at when resources were loaded with respect to certain timings and get an idea of what is being loaded before a page starts rendering.

Continue reading

Exploring Relationships Between Performance Metrics in HTTP Archive Data

I thought it would be interesting to explore how some of the page metrics we use to analyze web performance compare with each other. In the HTTP Archive “pages” table, metrics such as TTFB, renderStart, VisuallyComplete, onLoad and fullyLoaded are tracked. And recently, some of the newer metrics such as Time to Interactive, First Meaningful Paint and First Contentful Paint have appeared in the HAR file tables.

But first, a warning about using response time data from the HTTP Archive. While the accuracy has improved since the change to Chrome-based browsers on Linux agents, we’re still looking at a single measurement per site, all run from a single location with a single browser or emulated mobile device (Moto G4). For this reason, I’m not looking at any specific website’s performance, but rather analyzing the full dataset for patterns and insights.

Continue reading