Tutorial: Using BigQuery to Analyze Chrome User Experience Report Data

Last week I wrote a blog post showing some examples of how you can use the Chrome User Experience Report to compare your site’s RUM data to that of competitors. In this post I’d like to share some brief videos to help you quickly get started exploring the data via Google BigQuery.

Accessing the Chrome User Experience Report (CrUX) Data

The Chrome User Experience Report data is available on Google BigQuery, which is part of the Google Cloud Platform. To get started, log into Google Cloud, create a project for your CrUX work, navigate to the BigQuery console, add the chrome-ux-report dataset, and explore how the tables are structured. Here’s a short video that walks you through the process.
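If you’d rather dive straight in, here’s the shape of a first query (a minimal sketch; the table month and origin are placeholders – CrUX tables are named by year and month, e.g. 201803 for March 2018):

    -- Fraction of page loads with First Contentful Paint under one second
    -- for a single origin (table month and origin are placeholders).
    SELECT
      SUM(bin.density) AS fast_fcp_density
    FROM
      `chrome-ux-report.all.201803`,
      UNNEST(first_contentful_paint.histogram.bin) AS bin
    WHERE
      bin.start < 1000
      AND origin = 'https://www.example.com'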

Brief CrUX Overview

Now that we’ve accessed the CrUX data, let’s explore the table structure and where you can find some resources for additional help:

Comparing Form Factor and Connection Type Distributions For Different Sites

Next let’s explore what we can do with the form factor and effective connection type dimensions. In this video we’ll compare how traffic is distributed across these dimensions for two sites.
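As a rough sketch of the kind of query the video covers (the month and origins are placeholders), you can sum the histogram densities grouped by a dimension to get its distribution per site:

    -- Share of each form factor per origin. Densities across all bins and
    -- dimensions sum to 1 per origin, so grouped sums give the distribution.
    SELECT
      origin,
      form_factor.name AS form_factor,
      ROUND(SUM(bin.density), 4) AS density
    FROM
      `chrome-ux-report.all.201803`,
      UNNEST(first_contentful_paint.histogram.bin) AS bin
    WHERE
      origin IN ('https://www.example.com', 'https://www.example.org')
    GROUP BY
      origin, form_factor
    ORDER BY
      origin, density DESC

Swapping form_factor.name for effective_connection_type.name gives the connection type breakdown instead.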

Competitive Histograms

In this next video we’ll take a look at how to analyze the performance of a single metric across multiple sites in BigQuery, then export the results and graph them.
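A sketch of that kind of query (month and origins are placeholders) – one histogram per origin, ready to export and graph:

    -- onLoad histogram per origin for a competitive comparison.
    SELECT
      origin,
      bin.start,
      SUM(bin.density) AS density
    FROM
      `chrome-ux-report.all.201803`,
      UNNEST(onload.histogram.bin) AS bin
    WHERE
      origin IN ('https://www.example.com', 'https://www.example.org')
    GROUP BY
      origin, bin.start
    ORDER BY
      origin, bin.start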

Graphing all the Metrics

Finally, let’s UNION together a bunch of queries and examine the performance of a single site by creating histograms for first paint, first contentful paint, DOM Content Loaded and onLoad.
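A sketch of that UNION query (month and origin are placeholders):

    -- One histogram per metric for a single origin, stacked with UNION ALL.
    SELECT 'first_paint' AS metric, bin.start, SUM(bin.density) AS density
    FROM `chrome-ux-report.all.201803`, UNNEST(first_paint.histogram.bin) AS bin
    WHERE origin = 'https://www.example.com'
    GROUP BY bin.start
    UNION ALL
    SELECT 'first_contentful_paint', bin.start, SUM(bin.density)
    FROM `chrome-ux-report.all.201803`, UNNEST(first_contentful_paint.histogram.bin) AS bin
    WHERE origin = 'https://www.example.com'
    GROUP BY bin.start
    UNION ALL
    SELECT 'dom_content_loaded', bin.start, SUM(bin.density)
    FROM `chrome-ux-report.all.201803`, UNNEST(dom_content_loaded.histogram.bin) AS bin
    WHERE origin = 'https://www.example.com'
    GROUP BY bin.start
    UNION ALL
    SELECT 'onload', bin.start, SUM(bin.density)
    FROM `chrome-ux-report.all.201803`, UNNEST(onload.histogram.bin) AS bin
    WHERE origin = 'https://www.example.com'
    GROUP BY bin.start
    ORDER BY metric, start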

Conclusion

I hope this helps you get started digging into the CrUX data. As I mentioned in my earlier blog post on CrUX, Akamai is working on building this functionality into mPulse so that it will be easy for our customers to quickly analyze this data in the future. But for now these videos should give you a starting point to explore the data via BigQuery.

Using Google’s CrUX to Compare Your Site’s RUM Data w/ Competitors

For years Real User Measurement (RUM) has been the gold standard for measuring the performance of web applications. The reason is quite simple: there is no better measure of how users are experiencing your site than the users’ actual experiences themselves.

Akamai’s mPulse service is one of the more popular commercial RUM offerings, and it’s based on the open source Boomerang library. It is implemented by adding some JavaScript to a page (via an asynchronous, non-blocking loader) that collects this performance data.

Google’s Chrome User Experience Report

CrUX is short for Google’s “Chrome User Experience Report”, a new and interesting type of real user measurement data. You don’t have to instrument anything for Google to collect these measurements, and the report most likely already includes some performance data for your site!

All performance data included in CrUX is gathered under real-world conditions, aggregated from Chrome users who have opted in to syncing their browsing history and who have usage statistic reporting enabled.

The specific elements that Google is sharing are:

  • “Origin”, which consists of the protocol and hostname
  • Effective Connection Type (4G, 3G, etc)
  • Form Factor (desktop, mobile, tablet)
  • Percentile Histogram data for First Paint, First Contentful Paint, DOM Content Loaded and onLoad

The CrUX data is made available to query via BigQuery and is also included in PageSpeed Insights reports.
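One quick way to check whether your site is included is to search for your origin (a sketch; the month and domain are placeholders):

    -- List origins matching your domain in a given monthly table.
    SELECT DISTINCT origin
    FROM `chrome-ux-report.all.201803`
    WHERE origin LIKE '%example.com'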

Don’t Have RUM? Now You Do!

In my work at Akamai, I often meet with customers to help them understand and optimize their website’s performance. Sometimes I find myself working with folks that have not used RUM data before. Recently I’ve started creating graphs from Google’s CrUX to show them what RUM data looks like for their sites. For example, the graph below shows the Desktop and Mobile load times for an ecommerce website during March 2018.
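The data behind that kind of graph comes from a query along these lines (a sketch; the origin is a placeholder, and 201803 is the March 2018 table):

    -- onLoad histogram split by form factor for one origin in March 2018.
    -- CrUX form factor names are 'desktop', 'phone' and 'tablet'.
    SELECT
      form_factor.name AS form_factor,
      bin.start,
      SUM(bin.density) AS density
    FROM
      `chrome-ux-report.all.201803`,
      UNNEST(onload.histogram.bin) AS bin
    WHERE
      origin = 'https://www.example.com'
      AND form_factor.name IN ('desktop', 'phone')
    GROUP BY
      form_factor, bin.start
    ORDER BY
      form_factor, start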

CrUX data is specific to the Google Chrome browser, and is reported only from Chrome users who have opted in to syncing their browsing history and have usage statistic reporting enabled. It’s also a very high-level snapshot of one month’s data across all pages on the site. But it’s an excellent starting point that you can use to see how your site performs, and to compare it against the stats you are already using to evaluate your performance. If you decide that RUM data would be beneficial to you, then Akamai mPulse has a Lite version that is available for free.

Comparing Akamai mPulse Data to CrUX

This raises some interesting questions:

  • How does Google’s CrUX data compare to the data we are already collecting with mPulse?
  • Are there ways that Akamai can use CrUX data to add functionality to mPulse?

The histograms I’ve been creating with CrUX are based on the queries used in one of the HTTP Archive’s reports on user experience. The query aggregates metrics into 100ms bins, and uses a clever JavaScript function to spread the wider bins that CrUX uses at higher response times back into uniform 100ms bins.
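Here’s a sketch of that approach (my approximation of the idea, not the HTTP Archive’s exact function): a temporary JavaScript UDF that splits each wide bin into uniform 100ms slices, dividing its density evenly among them:

    -- Spread wide histogram bins into uniform 100ms bins. The last CrUX
    -- bin is open-ended (no end), so it is passed through unchanged.
    CREATE TEMPORARY FUNCTION spreadBins(
        bins ARRAY<STRUCT<start FLOAT64, `end` FLOAT64, density FLOAT64>>)
    RETURNS ARRAY<STRUCT<start FLOAT64, density FLOAT64>>
    LANGUAGE js AS """
      var out = [];
      bins.forEach(function(bin) {
        if (bin.end == null) {
          out.push({start: bin.start, density: bin.density});
          return;
        }
        var slices = Math.max(1, (bin.end - bin.start) / 100);
        for (var s = bin.start; s < bin.end; s += 100) {
          out.push({start: s, density: bin.density / slices});
        }
      });
      return out;
    """;

    SELECT
      bin.start,
      SUM(bin.density) AS density
    FROM
      `chrome-ux-report.all.201803`,
      UNNEST(spreadBins(onload.histogram.bin)) AS bin
    WHERE
      origin = 'https://www.example.com'
    GROUP BY bin.start
    ORDER BY bin.start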

I created a similar percentage histogram from mPulse data and layered the two on top of each other. The results are very close.

If we look at the performance of this site across other popular browsers, we can see some variation.

This is just one example, but I’ve run this comparison across a number of different sites and can confirm that the CrUX data closely matches what our RUM data shows for Chrome browsers.

How Do You Stack Up Against the Competition?

One area where RUM has always been lacking is the ability to see a competitive benchmark. Google CrUX changes that, since it gives everyone the ability to look at the performance of 3 million different sites.

Akamai is working on incorporating CrUX data into mPulse so that customers can easily compare their site’s performance to others. For example, in the graphs below I’ve compared 4 different sites within the same industry. The height of the histograms provides an indication of the distribution of Desktop vs Mobile traffic, and the data aligned to the X axis indicates how their performance compares.

Since CrUX contains dimensions for form_factor and effective_connection_type, we can also compare the relative densities to see the distribution of connection and device types across multiple sites.

I’m really excited to be partnering with Google on this – as it continues to increase the value that mPulse is able to provide our customers.

I Want To Run Some CrUX Queries. Where Do I Start?

Rick Viscomi recently presented a talk titled “Not My RUM”, in which he discussed different ways to query CrUX data. He is also maintaining a really cool repository of CrUX queries, called the “CrUX Cookbook”. Check it out for some examples on how to get started.

This research is ongoing. For inquiries, or to engage on ongoing projects, please contact me at pacalvan@akamai.com. In the next few days I’ll be releasing a video that walks through the process of getting set up with CrUX. We’ll start from a new Google account, set up BigQuery, execute a query and then graph the results in Google Docs.

Thanks to Ilya Grigorik, Rick Viscomi and Ellen Li for their help with this.

HTTP Heuristic Caching (Missing Cache-Control and Expires Headers) Explained

Have you ever wondered why WebPageTest can sometimes show that a repeat view loaded with fewer bytes downloaded, while also triggering warnings related to browser caching? It can seem like the test is reporting an issue that does not exist, but in fact it’s often a sign of a more serious issue that should be investigated. Often the issue is not a lack of caching, but rather a lack of control over how your content is cached.
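For context: when a response has neither a Cache-Control nor an Expires header, the browser is allowed to compute a freshness lifetime heuristically (commonly 10% of the time since Last-Modified, per RFC 7234). A rough way to gauge how common this is (a sketch against the legacy HTTP Archive schema; the table and column names are assumptions):

    -- Count responses carrying neither Cache-Control nor Expires, which
    -- leaves freshness up to each browser's heuristics.
    SELECT
      COUNT(*) AS total_responses,
      COUNTIF(COALESCE(resp_cache_control, '') = ''
              AND COALESCE(resp_expires, '') = '') AS heuristically_cached
    FROM
      `httparchive.runs.2018_04_15_requests`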

If you have not run into this issue before, then examine the screenshot below to see an example:

Continue reading

Adoption of HTTP Security Headers on the Web

Over the past few weeks the topic of security-related HTTP headers has come up in numerous discussions, both with customers I work with and with colleagues who are trying to help improve the security posture of their customers. I’ve often felt that these headers were underutilized, and a quick test on Scott Helme’s excellent securityheaders.io site usually proves this to be true. I decided to take a deeper look at how these headers are being used at a large scale.

Looking at this data through the lens of the HTTP Archive, I thought it would be interesting to see if we could give the web a scorecard for security headers. I’ll dive deeper into how each of these headers is implemented below, but let’s start off by looking at the percentage of sites that are using these security headers. As I suspected, adoption is quite low. Furthermore, it seems that adoption is marginally higher for some of the most popular sites – but not by much.
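The queries behind that scorecard look roughly like this (a sketch; I’m assuming the legacy HTTP Archive requests table, where respOtherHeaders flattens the less common response headers into a single string):

    -- Approximate adoption of a few security headers on HTML responses.
    SELECT
      ROUND(100 * COUNTIF(LOWER(respOtherHeaders) LIKE '%strict-transport-security%') / COUNT(*), 2) AS hsts_pct,
      ROUND(100 * COUNTIF(LOWER(respOtherHeaders) LIKE '%content-security-policy%') / COUNT(*), 2) AS csp_pct,
      ROUND(100 * COUNTIF(LOWER(respOtherHeaders) LIKE '%x-content-type-options%') / COUNT(*), 2) AS xcto_pct,
      ROUND(100 * COUNTIF(LOWER(respOtherHeaders) LIKE '%x-frame-options%') / COUNT(*), 2) AS xfo_pct
    FROM
      `httparchive.runs.2018_04_15_requests`
    WHERE
      firstHtml = true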

Continue reading

Cache Control Immutable – A Year Later

In January 2017, Facebook wrote about a new Cache-Control directive – immutable – which was designed to tell supported browsers not to attempt to revalidate an object on a normal reload during its freshness lifetime. Firefox 49 implemented it, while Chrome took a different approach by changing the behavior of the reload button. It seems that WebKit has also implemented the immutable directive since then.

So it’s been a year – let’s see where Cache-Control immutable is being used in the wild!
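A sketch of the kind of query I’m using to check (legacy HTTP Archive schema assumed; the table and column names are assumptions):

    -- How often does Cache-Control include the immutable directive?
    SELECT
      COUNT(*) AS total_responses,
      COUNTIF(LOWER(resp_cache_control) LIKE '%immutable%') AS immutable_responses
    FROM
      `httparchive.runs.2018_01_15_requests`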

Continue reading

Measuring the Performance of Firefox Quantum with RUM

On Nov 14th, Mozilla released Firefox Quantum. On launch day, I personally felt that the new version was rendering pages faster and I heard anecdotal reports indicating the same. There have also been a few benchmarks which seem to show that this latest Firefox version is getting content to screens faster than its predecessor. But I wanted to try a different approach to measurement.

Given the vast amount of performance information that we collect at Akamai, I thought it would be interesting to benchmark the performance of Firefox Quantum with a large set of real end-user performance data. The results were dramatic: the new browser improved DOM Content Loaded time by an extremely impressive 24%. Let’s take a look at how those results were achieved.

Continue reading

Which 3rd Party Content Loads Before Render Start?

Since the HTTP Archive is capturing the timing information for each request, I thought it would be interesting to correlate request timings (ie, when an object was loaded) with page timings. The idea is that we can categorize resources as being loaded before or after an event.

Content Type Loaded Before/After Render Start

It’s generally well known that third party content impacts performance. We see this with both resource loading and JavaScript execution blocking the browser from loading other content. While we don’t have the data to evaluate script execution timings per resource captured here, we can definitely look at when resources were loaded with respect to certain timings, and get an idea of what is being loaded before a page starts rendering.
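In spirit, the correlation works like this (a rough sketch; the legacy tables, the join key and the second-granularity startedDateTime columns are all assumptions):

    -- Bucket requests as starting before or after the page's renderStart
    -- (renderStart is in ms; startedDateTime is seconds since epoch).
    SELECT
      r.mimeType AS content_type,
      COUNTIF((r.startedDateTime - p.startedDateTime) * 1000 < p.renderStart) AS before_render,
      COUNTIF((r.startedDateTime - p.startedDateTime) * 1000 >= p.renderStart) AS after_render
    FROM
      `httparchive.runs.2018_04_15_requests` r
    JOIN
      `httparchive.runs.2018_04_15_pages` p
    USING (pageid)
    GROUP BY
      content_type
    ORDER BY
      before_render DESC

Continue reading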

Exploring Relationships Between Performance Metrics in HTTP Archive Data

I thought it would be interesting to explore how some of the page metrics we use to analyze web performance compare with each other. In the HTTP Archive “pages” table, metrics such as TTFB, renderStart, VisuallyComplete, onLoad and fullyLoaded are tracked. And recently some of the newer metrics – Time to Interactive, First Meaningful Paint, First Contentful Paint, etc. – are available in the HAR file tables.

But first, a warning about using response time data from the HTTP Archive. While the accuracy has improved since the change to Chrome-based browsers on Linux agents, we’re still looking at a single measurement per site, all run from a single location on a single browser or mobile device (Moto G4). For this reason, I’m not looking at any specific website’s performance, but rather analyzing the full dataset for patterns and insights.
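As a starting point, BigQuery’s CORR aggregate gives a quick pairwise Pearson correlation between metrics (a sketch; the legacy pages table and its column names are assumptions):

    -- Pairwise correlations between page-level timing metrics.
    SELECT
      CORR(TTFB, onLoad) AS ttfb_vs_onload,
      CORR(renderStart, onLoad) AS render_vs_onload,
      CORR(onLoad, fullyLoaded) AS onload_vs_fullyloaded
    FROM
      `httparchive.runs.2018_04_15_pages`
    WHERE
      TTFB > 0 AND renderStart > 0 AND onLoad > 0 AND fullyLoaded > 0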

Continue reading

Tracking Page Weight Over Time

As of July 2017, the “average” page weight is 3MB. @Tammy wrote an excellent blog post about HTTP Archive page stats and trends. Last year @igrigorik published an analysis on page weight using CDF plots. And of course, we can view the trends over time on the HTTP Archive trends page. Since this is all based on HTTP Archive data, I thought I’d start a thread here to continue the discussion on how to gauge the increase in page weight over time.
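One way to put numbers behind the discussion (a sketch; the table names are assumptions, and APPROX_QUANTILES yields an approximate median):

    -- Approximate median page weight, in KB, for two 2017 crawls.
    SELECT '2017_01_01' AS crawl,
           ROUND(APPROX_QUANTILES(bytesTotal, 100)[OFFSET(50)] / 1024) AS median_kb
    FROM `httparchive.runs.2017_01_01_pages`
    UNION ALL
    SELECT '2017_07_01',
           ROUND(APPROX_QUANTILES(bytesTotal, 100)[OFFSET(50)] / 1024)
    FROM `httparchive.runs.2017_07_01_pages`
    ORDER BY crawl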

Continue reading