<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.1.1">Jekyll</generator><link href="https://paulcalvano.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://paulcalvano.com/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-02-09T12:27:11+00:00</updated><id>https://paulcalvano.com/feed.xml</id><title type="html">Paul Calvano</title><subtitle>Paul Calvano is a Performance Architect at Etsy, where he helps optimize the performance of their online marketplace.
</subtitle><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><entry><title type="html">Serving Static Content with Cloud Storage? Don’t Forget the CDN!</title><link href="https://paulcalvano.com/2026-02-09-serving-static-content-with-cloud-storage-dont-forget-the-cdn/" rel="alternate" type="text/html" title="Serving Static Content with Cloud Storage? Don’t Forget the CDN!" /><published>2026-02-09T04:00:00+00:00</published><updated>2026-02-09T12:26:57+00:00</updated><id>https://paulcalvano.com/serving-static-content-with-cloud-storage-dont-forget-the-cdn</id><content type="html" xml:base="https://paulcalvano.com/2026-02-09-serving-static-content-with-cloud-storage-dont-forget-the-cdn/">&lt;p&gt;Many websites use cloud storage as part of how they deliver static content to end users. However using these services without a CDN in front of them has the potential to negatively impact performance. You might be surprised by how many sites do just that - about 8.5% of websites that use a CDN for their primary content, based on &lt;a href=&quot;https://httparchive.org/&quot;&gt;HTTP Archive&lt;/a&gt; data from January 2026!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Content Delivery Networks (CDNs) are an essential component of website delivery, especially when it comes to web performance. According to the &lt;a href=&quot;https://almanac.httparchive.org/en/2025/&quot;&gt;2025 Web Almanac&lt;/a&gt;, approximately &lt;a href=&quot;https://almanac.httparchive.org/en/2025/cdn#fig-3&quot;&gt;70% of popular websites use them&lt;/a&gt;. CDNs enable users to connect to servers that are geographically close to them (and therefore at lower latencies), provide distributed caching to offload backend/origin servers, and offer other services such as image optimization, edge compute and security. Some CDNs offer cloud storage services, but pretty much all of them can sit in front of another cloud provider’s storage when it is configured for web delivery.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/serving-web-content-with-cloud-storage-dont-forget-the-cdn/httparchive-cdn-usage.jpg&quot; alt=&quot;WebAlmanac CDN Usage&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Many websites with a cloud backend/origin host their static content on cloud storage services such as &lt;a href=&quot;https://cloud.google.com/storage?hl=en&quot;&gt;Google Cloud Storage&lt;/a&gt;, &lt;a href=&quot;https://aws.amazon.com/pm/serv-s3/&quot;&gt;Amazon S3&lt;/a&gt;, &lt;a href=&quot;https://www.oracle.com/cloud/storage/object-storage/&quot;&gt;Oracle Cloud ObjectStorage&lt;/a&gt; and &lt;a href=&quot;https://azure.microsoft.com/en-us/products/storage/blobs&quot;&gt;Azure Blob Storage&lt;/a&gt;. However it’s important to understand that these services operate as backends and do not automatically provide CDN functionality. Put another way: your content is only behind a CDN if you configure it to be!&lt;/p&gt;

&lt;p&gt;In numerous web performance audits over the years, I’ve found cloud storage hostnames being used to deliver static content to end users. During my &lt;a href=&quot;https://paulcalvano.com/speaking/#:~:text=I%E2%80%99m%20presenting%20in.-,Performance%20Mistakes,-(2024)%20%2D%20Performance%20Now&quot;&gt;Performance Mistakes talk&lt;/a&gt; in November 2024, I shared that there were 580K websites serving content directly from Amazon S3. This has the potential to negatively impact performance, since those resources are generally served from the region they are hosted in rather than from a CDN edge close to the user.&lt;/p&gt;

&lt;p&gt;I queried the HTTP Archive to see how common this still is and found a few surprising statistics -&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Overall, 5.8% of websites serve at least one request directly from cloud storage!&lt;/li&gt;
  &lt;li&gt;8.53% of websites that use a CDN to deliver their primary content serve at least one request directly from cloud storage!&lt;/li&gt;
  &lt;li&gt;The number of websites serving content directly from Amazon S3 is now 629K - an increase of 8.5% since November 2024!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/serving-web-content-with-cloud-storage-dont-forget-the-cdn/websites-delivering-assets-via-cloud-storage.jpg&quot; alt=&quot;Websites Delivering Assets via Cloud Storage&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud Storage is not distributed like CDNs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you configure cloud storage services, you usually have to define which region your content will be hosted in. This is essentially where your static content will live. When you deliver that content directly from a cloud storage solution, the browser fetches the asset from that region - just as it would if the asset were hosted on a single web server - no matter where the user is located.&lt;/p&gt;

&lt;p&gt;For example, below is a request that was delivered from a European news site. I’m browsing it from the northeast US. The hostname &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s3.eu-central-1.amazonaws.com&lt;/code&gt; indicates that this content is being delivered from Amazon’s S3 region in Frankfurt, and I can confirm the same via a traceroute.&lt;/p&gt;

&lt;p&gt;When I examine this in Chrome DevTools, I can see very high TCP and TLS connection times for this request!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/serving-web-content-with-cloud-storage-dont-forget-the-cdn/latency-example.jpg&quot; alt=&quot;Latency Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What type of content is delivered directly via Cloud Storage?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this analysis, I’ve searched for 4 different popular cloud storage providers based on their documented URL structures for web delivery. For Amazon S3 I’m looking for hostnames ending in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amazonaws.com&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s3&lt;/code&gt; optionally within the hostname. For Google Cloud Storage, I’m looking for a subdomain of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;storage.googleapis.com&lt;/code&gt;. For Azure, a subdomain of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;windows.net&lt;/code&gt; and for Oracle a subdomain of either &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;oraclecloud.com&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer-oci.com&lt;/code&gt;.&lt;/p&gt;
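
&lt;p&gt;If you’d like to apply the same heuristic to your own request logs or HAR exports, the matching logic is easy to express outside of SQL. Below is a rough TypeScript sketch of the hostname patterns described above - the function name and shape are my own, not part of any HTTP Archive tooling.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
// Classify a request hostname into a cloud storage provider, mirroring the
// hostname patterns described above. Returns null when nothing matches.
function classifyCloudStorage(hostname: string): string | null {
  const host = hostname.toLowerCase();
  if (host.endsWith(&quot;.amazonaws.com&quot;)) return &quot;Amazon S3&quot;;
  if (host.endsWith(&quot;storage.googleapis.com&quot;)) return &quot;Google Cloud Storage&quot;;
  if (host.endsWith(&quot;.windows.net&quot;)) return &quot;Azure Blob Storage&quot;;
  if (host.endsWith(&quot;.oraclecloud.com&quot;) || host.endsWith(&quot;.customer-oci.com&quot;)) {
    return &quot;Oracle Cloud Object Storage&quot;;
  }
  return null;
}

// Example: classify the host of a request URL found in a HAR file
console.log(classifyCloudStorage(new URL(&quot;https://mybucket.s3.eu-central-1.amazonaws.com/logo.png&quot;).hostname));
&lt;/code&gt;&lt;/pre&gt;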

&lt;p&gt;The graph below illustrates the number of requests delivered by these services across all measured sites, grouped by content type. Amazon S3 is by far the most commonly used cloud storage service when it comes to this type of delivery, followed by Google Cloud Storage.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/serving-web-content-with-cloud-storage-dont-forget-the-cdn/static-asset-types-delivered-by-cloud-storage.jpg&quot; alt=&quot;Static Asset Types Delivered by Cloud Storage&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If we look at the same data by the percentage of requests to each cloud storage service, we can see that, regardless of the service, almost 75% of the content delivered from cloud storage consists of scripts and images. This type of content is best served to users via a CDN rather than a centralized storage solution.&lt;/p&gt;

&lt;p&gt;Delivery of JSON content is also common. While JSON can be dynamically generated, if it’s being served from cloud storage then it is likely cacheable. HTML delivery is less common, except for Oracle Cloud where it represents 11.8% of cloud storage requests!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/serving-web-content-with-cloud-storage-dont-forget-the-cdn/distribution-of-static-asset-types-delivered-by-cloud-storage.jpg&quot; alt=&quot;Distribution of Static Asset Types Delivered by Cloud Storage&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caching and Compression&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One thing that may not be apparent when configuring cloud storage for delivery of web content is that compression and downstream caching are often not enabled by default.&lt;/p&gt;
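
&lt;p&gt;A quick way to check your own assets is to request one and inspect the response headers. Here’s a minimal sketch using the Fetch API in Node 18+ (which ships a global &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetch&lt;/code&gt;) - the URL is a placeholder for one of your own cloud storage assets.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
// Report whether caching and compression headers are present on an asset
// served from cloud storage. The URL is a placeholder - use one of your own.
async function checkAsset(url: string): Promise&amp;lt;void&amp;gt; {
  const res = await fetch(url);
  console.log(&quot;status:&quot;, res.status);
  console.log(&quot;cache-control:&quot;, res.headers.get(&quot;cache-control&quot;) ?? &quot;(missing)&quot;);
  // Some runtimes strip content-encoding after decompressing the body, so a
  // missing value here is a hint, not proof, that compression is disabled.
  console.log(&quot;content-encoding:&quot;, res.headers.get(&quot;content-encoding&quot;) ?? &quot;(none reported)&quot;);
  console.log(&quot;content-type:&quot;, res.headers.get(&quot;content-type&quot;));
}

checkAsset(&quot;https://example-bucket.s3.eu-central-1.amazonaws.com/js/app.js&quot;).catch(console.error);
&lt;/code&gt;&lt;/pre&gt;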

&lt;p&gt;Based on the HTTP Archive, a majority of compressible content types are not being served compressed when delivered directly from Cloud Storage. This is a critical performance mistake, and is often overlooked because most web servers and CDNs compress responses by default!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/serving-web-content-with-cloud-storage-dont-forget-the-cdn/static-asset-compression-when-delivered-via-cloud-storage.jpg&quot; alt=&quot;Static Asset Compression When Delivered via Cloud Storage&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Most content being delivered from cloud storage should be cacheable unless personalized. However, the percentage of cloud storage responses that include Cache-Control headers is incredibly low (with the exception of Google Cloud Storage). That means that clients will have to frequently re-request content from these cloud storage services.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;&lt;strong&gt;Service&lt;/strong&gt;&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;&lt;strong&gt;Cacheable Requests&lt;/strong&gt;&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;&lt;strong&gt;Non-Cacheable Requests&lt;/strong&gt;&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;&lt;strong&gt;% Cacheable&lt;/strong&gt;&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;&lt;strong&gt;% Non Cacheable&lt;/strong&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;Amazon S3&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;736,950&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;2,587,320&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;22.2%&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;77.8%&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;Google Cloud Storage&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;969,660&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;259,629&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;78.9%&lt;/td&gt;
   &lt;td style=&quot;background-color: #fff4e0; text-align: right;&quot;&gt;21.1%&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;Azure Blob Storage&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;102,857&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;376,291&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;21.5%&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;78.5%&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;Oracle Cloud Object Storage&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;6,763&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;27,877&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;19.5%&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;80.5%&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Digging a bit deeper, we can see that images, JS, JSON and CSS account for most of the non-cacheable content delivered via Amazon S3. From Google Cloud Storage, these asset types are more often delivered with cache-control headers that allow caching.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot; colspan=&quot;2&quot;&gt;Amazon S3&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot; colspan=&quot;2&quot;&gt;Google Cloud Storage&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot; colspan=&quot;2&quot;&gt;Azure Blob Storage&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot; colspan=&quot;2&quot;&gt;Oracle Cloud Object Storage&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;&lt;em&gt;type&lt;/em&gt;&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;cacheable&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;not-cacheable&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;cacheable&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;not-cacheable&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;cacheable&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;not-cacheable&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;cacheable&lt;/td&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;not-cacheable&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;image&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;561,592&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;1,743,876&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;489,387&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;131,988&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;84,912&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;284,536&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;4,668&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;18,132&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;script&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;91,890&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;272,434&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;370,127&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;45,621&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;5,378&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;26,061&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;1,658&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;2,599&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;json&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;41,046&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;201,543&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;48,534&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;27,917&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;303&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;6,083&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;263&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;1,126&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;css&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;26,926&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;103,007&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;30,893&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;1,340&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;2,919&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;18,666&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;141&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;863&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;xml&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;40&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;85,778&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;193&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;11,909&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;9&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;9,186&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;33&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;other&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;2,111&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;82,043&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;4,623&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;2,170&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;408&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;14,539&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;903&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;font&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;13,002&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;33,652&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;24,619&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;1,344&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;8,242&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;8,511&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;30&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;31&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;video&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;191&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;37,699&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;441&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;19,435&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;680&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;5,183&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;79&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;text&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;148&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;16,395&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;386&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;4,245&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;2&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;2,840&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;37&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;html&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;4,350&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;20&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;10,382&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;541&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;4,071&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;audio&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;3&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;6,514&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;431&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;3,278&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;4&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;138&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;3&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;3&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&quot;text-align: center;&quot;&gt;wasm&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;1&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;29&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;6&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;background-color: #ffe5e5; text-align: right;&quot;&gt;7&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
   &lt;td style=&quot;text-align: right;&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Here’s an example from a popular movie theater chain in the US. This content appears to be loaded by a 3rd party (Unbounce), and that third party is configured to load images directly from S3. There are no cache headers present, which means that the browser will &lt;a href=&quot;https://paulcalvano.com/2018-03-14-http-heuristic-caching-missing-cache-control-and-expires-headers-explained/&quot;&gt;heuristically cache&lt;/a&gt; the resources. On this particular page, this third party’s S3 content accounted for over 9MB - none of it served with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cache-control&lt;/code&gt; header! Beyond that, there are a few opportunities for image optimization that could be applied.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/serving-web-content-with-cloud-storage-dont-forget-the-cdn/caching-example.jpg&quot; alt=&quot;Caching Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In an example from another movie theater’s website, we can see render-blocking CSS and JS loaded from Amazon S3, with no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cache-control&lt;/code&gt; header to indicate caching, and no compression. This is perhaps the worst-case scenario - content critical to the rendering of your website, loaded slowly from a centralized location, with unnecessarily large payloads and unpredictable caching (or no caching) due to a missing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cache-control&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/serving-web-content-with-cloud-storage-dont-forget-the-cdn/caching-compression-example.jpg&quot; alt=&quot;Caching Compression Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;
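
&lt;p&gt;If some assets do need to stay on cloud storage, you can at least set caching and compression metadata yourself at upload time, since the storage service generally won’t add it for you. Here’s a minimal sketch using the AWS SDK for JavaScript (v3) - the bucket name, key, region and max-age values are placeholders, and the file is pre-compressed because S3 won’t compress responses on the fly.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import { readFileSync } from &quot;node:fs&quot;;
import { gzipSync } from &quot;node:zlib&quot;;
import { S3Client, PutObjectCommand } from &quot;@aws-sdk/client-s3&quot;;

// Pre-compress the asset and upload it with explicit caching and compression
// metadata. Bucket, key, region and max-age below are placeholder values.
const s3 = new S3Client({ region: &quot;eu-central-1&quot; });

async function uploadAsset(): Promise&amp;lt;void&amp;gt; {
  const body = gzipSync(readFileSync(&quot;dist/app.js&quot;));
  await s3.send(new PutObjectCommand({
    Bucket: &quot;example-bucket&quot;,
    Key: &quot;js/app.js&quot;,
    Body: body,
    ContentType: &quot;application/javascript&quot;,
    ContentEncoding: &quot;gzip&quot;,                             // tell clients the body is gzipped
    CacheControl: &quot;public, max-age=31536000, immutable&quot;, // long cache lifetime for versioned assets
  }));
}

uploadAsset().catch(console.error);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even with that metadata in place, the asset is still served from a single region - putting a CDN in front of the bucket is what actually moves it closer to your users.&lt;/p&gt;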

&lt;p&gt;&lt;strong&gt;Cloud Storage vs CDN Delivery Costs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s also worth evaluating how much delivering traffic directly from these cloud storage solutions is costing you. Depending on your contracts with the providers, you may find that delivering directly via cloud storage is more expensive than CDN delivery - especially if you are not caching or compressing the content! If that is the case, you can get a double-win by saving money while improving performance!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s incredible that 8.5% of websites that utilize a CDN are delivering content to users directly from cloud storage services. Discovering and fixing these could provide a quick performance boost. When you audit your website’s performance, if you notice cloud storage hostnames then you should definitely investigate how they got there, and move that content behind your CDNs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTTP Archive queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This section provides some details on how this analysis was performed, including SQL queries. Please be warned that some of the SQL queries process a significant number of bytes, which can make them very expensive to run.&lt;/p&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;What percentage of CDN-delivered sites have requests for Cloud Storage hosted assets?&lt;/b&gt;&lt;/summary&gt;
   This query counts the number of sites that contain at least one request for an asset delivered directly via a Cloud Storage service (without a CDN).
  &lt;pre&gt;&lt;code&gt;
 SELECT
    IF(JSON_VALUE(p.summary.cdn) IS NOT NULL AND NOT JSON_VALUE(p.summary.cdn) = &quot;&quot;,true, false) AS usesCDN,
    CASE
       WHEN NET.HOST(url) LIKE &quot;%.amazonaws.com&quot; THEN &quot;Amazon S3&quot;
       WHEN NET.HOST(url) LIKE &quot;%storage.googleapis.com&quot; THEN &quot;Google Cloud Storage&quot;
       WHEN NET.HOST(url) LIKE &quot;%.windows.net&quot; THEN &quot;Azure Blob Storage&quot;
       WHEN NET.HOST(url) LIKE &quot;%.oraclecloud.com&quot; THEN &quot;Oracle Cloud Object Storage&quot;
       WHEN NET.HOST(url) LIKE &quot;%.customer-oci.com&quot; THEN &quot;Oracle Cloud Object Storage&quot;
       ELSE &quot;Unknown&quot;
    END AS CloudStorage,
    COUNT(DISTINCT p.page) AS sites
FROM `httparchive.crawl.requests` AS r
INNER JOIN `httparchive.crawl.pages` AS p
ON r.page = p.page
WHERE
   p.date = &quot;2026-01-01&quot; AND r.date = &quot;2026-01-01&quot;
   AND p.client = &quot;mobile&quot; AND r.client = &quot;mobile&quot;
   AND p.is_root_page = TRUE AND r.is_root_page = TRUE
   AND (
     NET.HOST(url) LIKE &quot;%.amazonaws.com&quot;
     OR NET.HOST(url) LIKE &quot;%storage.googleapis.com&quot;
     OR NET.HOST(url) LIKE &quot;%.windows.net&quot;
     OR NET.HOST(url) LIKE &quot;%.oraclecloud.com&quot;
     OR NET.HOST(url) LIKE &quot;%.customer-oci.com&quot;
   )
GROUP BY 1,2
&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Caching and Compression of Cloud Storage Delivered Assets&lt;/b&gt;&lt;/summary&gt;
   This query counts requests by content type, compression type, and cacheability.
  &lt;pre&gt;&lt;code&gt;
 
SELECT
  IF(JSON_VALUE(p.summary.cdn) IS NOT NULL AND NOT JSON_VALUE(p.summary.cdn) = &quot;&quot;,true, false) AS usesCDN,
  CASE
    WHEN NET.HOST(url) LIKE &quot;%.amazonaws.com&quot; THEN &quot;Amazon S3&quot;
    WHEN NET.HOST(url) LIKE &quot;%storage.googleapis.com&quot; THEN &quot;Google Cloud Storage&quot;
    WHEN NET.HOST(url) LIKE &quot;%.windows.net&quot; THEN &quot;Azure Blob Storage&quot;
    WHEN NET.HOST(url) LIKE &quot;%.oraclecloud.com&quot; THEN &quot;Oracle Cloud Object Storage&quot;
    WHEN NET.HOST(url) LIKE &quot;%.customer-oci.com&quot; THEN &quot;Oracle Cloud Object Storage&quot;
    ELSE &quot;Unknown&quot;
  END AS CloudStorage,
  JSON_VALUE(r.summary.type) AS type,
  CASE JSON_VALUE(r.payload._contentEncoding)
    WHEN 'gzip' THEN 'Gzip'
    WHEN 'br' THEN 'Brotli'
    WHEN 'zstd' THEN 'zStandard'
    ELSE 'No compression'
  END AS compression_type,
  IF(SAFE_CAST(JSON_VALUE(r.payload._cache_time) AS INT64) &amp;gt; 0, &quot;cacheable&quot;, &quot;not-cacheable&quot;) AS cacheable,
  COUNT(DISTINCT p.page) AS sites,
  COUNT(*) AS requests
FROM `httparchive.crawl.requests` AS r
INNER JOIN `httparchive.crawl.pages` AS p
ON r.page = p.page
WHERE
  p.date = &quot;2026-01-01&quot; AND r.date = &quot;2026-01-01&quot;
  AND p.client = &quot;mobile&quot; AND r.client = &quot;mobile&quot;
  AND p.is_root_page = TRUE AND r.is_root_page = TRUE
  AND (
      NET.HOST(url) LIKE &quot;%.amazonaws.com&quot;
      OR NET.HOST(url) LIKE &quot;%storage.googleapis.com&quot;
      OR NET.HOST(url) LIKE &quot;%.windows.net&quot;
      OR NET.HOST(url) LIKE &quot;%.oraclecloud.com&quot;
      OR NET.HOST(url) LIKE &quot;%.customer-oci.com&quot;
  )
GROUP BY 1,2,3,4,5
&lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Examples of Sites that load content from cloud storage services&lt;/b&gt;&lt;/summary&gt;
   Detailed examples of sites that are loading content from cloud storage services. 
  &lt;pre&gt;&lt;code&gt;
 SELECT
   p.rank,
   r.page AS page,
   JSON_VALUE(p.summary.cdn) AS pageCDN,
   NET.HOST(r.url) AS hostname,
   COUNT(*) AS requests,
   SUM(CAST(JSON_VALUE(r.summary.respBodySize) AS INT64)/1024/1024) AS responseMB,
   STRING_AGG(DISTINCT JSON_VALUE(r.summary.type)) AS types,
   FROM `httparchive.crawl.requests` AS r
INNER JOIN `httparchive.crawl.pages` AS p
ON r.page = p.page
WHERE
   p.date = &quot;2026-01-01&quot; AND r.date = &quot;2026-01-01&quot;
   AND p.client = &quot;mobile&quot; AND r.client = &quot;mobile&quot;
   AND p.is_root_page = TRUE AND r.is_root_page = TRUE
   AND p.rank &amp;lt;= 1000 AND r.rank &amp;lt;= 1000
   AND (
     NET.HOST(url) LIKE &quot;%.amazonaws.com&quot;
     OR NET.HOST(url) LIKE &quot;%storage.googleapis.com&quot;
     OR NET.HOST(url) LIKE &quot;%.windows.net&quot;
     OR NET.HOST(url) LIKE &quot;%.oraclecloud.com&quot;
     OR NET.HOST(url) LIKE &quot;%.customer-oci.com&quot;
   )
GROUP BY 1,2,3,4
ORDER BY 6 DESC

  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html">Many websites use cloud storage as part of how they deliver static content to end users. However using these services without a CDN in front of them has the potential to negatively impact performance. You might be surprised by how many sites do just that - about 8.5% of websites that use a CDN for their primary content, based on HTTP Archive data from January 2026!</summary></entry><entry><title type="html">Third Parties and Single Points of Failure</title><link href="https://paulcalvano.com/2025-12-29-third-parties-and-single-points-of-failure/" rel="alternate" type="text/html" title="Third Parties and Single Points of Failure" /><published>2025-12-29T04:00:00+00:00</published><updated>2026-02-07T18:51:58+00:00</updated><id>https://paulcalvano.com/third-parties-and-single-points-of-failure</id><content type="html" xml:base="https://paulcalvano.com/2025-12-29-third-parties-and-single-points-of-failure/">&lt;p&gt;You’ve heard it many times - third party content can easily cause an otherwise well-performing website to become sluggish and slow. And depending on how this content is loaded, it can also introduce single points of failure (SPOFs). When a large cloud provider or content delivery network (CDN) experiences a disruption, the impact is felt across the world and often triggers headlines about the many websites that were affected. However, there are numerous secondary impacts triggered by third party content, which can be disruptive even to companies that don’t use the affected provider.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/spof-warning.jpg&quot; alt=&quot;Caution - Single Points of Failure &quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In this blog post I’ll discuss some of the performance and availability risks associated with third party content and how you can test for these single points of failure (SPOFs) on your websites. I’ll use &lt;a href=&quot;https://httparchive.org/&quot;&gt;HTTP Archive&lt;/a&gt; data to explore how many sites could be at risk and &lt;a href=&quot;https://rumarchive.com/&quot;&gt;RUM Archive&lt;/a&gt; to get a sense of how prone to slowdowns some of these third parties are!&lt;/p&gt;

&lt;h1 id=&quot;third-party-failure--spof&quot;&gt;Third Party Failure = SPOF&lt;/h1&gt;

&lt;p&gt;A third-party single point of failure (SPOF) can occur when a website depends on an external service for critical or render-blocking resources. If the third party content fails to load, then the page load may be stalled until the request times out. For example, in the two filmstrips below (taken from a &lt;a href=&quot;https://docs.webpagetest.org/private-instances/&quot;&gt;WebPageTest private instance&lt;/a&gt;), you can see how a third-party failure affects the user experience. The SPOF measurement in this example simulates what a client would see if a render-blocking third party, such as a consent management service, became unavailable. In this case, the user would essentially see a blank screen until the third party request times out.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/spof-filmstrip-wpt-private-instance.jpg&quot; alt=&quot;WebPageTest Private Instance Filmstrip showing a SPOF&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Most recently, people have been noticing SPOFs due to outages experienced by large cloud providers and CDNs. For example, a major CDN recently experienced a few high-profile outages, and many of its customers were impacted. However, many sites that don’t use its services directly were also affected due to third party content that was being delivered through it. That could have been avoided by self-hosting content that is critical to the loading of the website.&lt;/p&gt;

&lt;p&gt;This risk isn’t just isolated to CDNs, nor is it a new issue. For example, many years ago websites started adding Facebook Like buttons to their pages via synchronously loaded scripts. When Facebook experienced &lt;a href=&quot;https://www.forbes.com/sites/ericsavitz/2012/06/01/facebook-outage-slowed-1000s-of-retail-content-sites/&quot;&gt;an outage way back in 2012&lt;/a&gt;, the failed third party requests slowed down a large number of websites, as browsers stalled while waiting for the script to load. Facebook fixed this a long time ago, and even introduced a &lt;a href=&quot;https://calendar.perfplanet.com/2012/the-non-blocking-script-loader-pattern/&quot;&gt;non-blocking script loader pattern&lt;/a&gt; for other websites to adopt. Additionally, &lt;a href=&quot;https://developers.facebook.com/docs/plugins/like-button/&quot;&gt;the like button is being deprecated&lt;/a&gt; in February 2026. They plan to return an empty response to avoid issues on sites that will inevitably forget to remove them!&lt;/p&gt;
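
&lt;p&gt;The general idea behind a non-blocking loader is to inject the third party script asynchronously, so that a slow or failed response can’t hold up rendering. Here’s a minimal sketch of that pattern - the URL is a placeholder, and real widget loaders typically layer their own queueing logic on top.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
// Inject a third party script asynchronously so a slow or failed response
// cannot block rendering. The URL passed in below is a placeholder.
function loadThirdPartyScript(src: string, timeoutMs = 5000): void {
  const script = document.createElement(&quot;script&quot;);
  script.src = src;
  script.async = true;

  // Degrade gracefully instead of stalling the page if the script never loads.
  const timer = window.setTimeout(function () {
    console.warn(&quot;Third party script timed out:&quot;, src);
  }, timeoutMs);

  script.onload = function () { window.clearTimeout(timer); };
  script.onerror = function () {
    window.clearTimeout(timer);
    console.warn(&quot;Third party script failed to load:&quot;, src);
  };

  document.head.appendChild(script);
}

loadThirdPartyScript(&quot;https://widgets.example-thirdparty.com/sdk.js&quot;);
&lt;/code&gt;&lt;/pre&gt;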

&lt;p&gt;There have also been occurrences where a third party stops providing its service or is compromised. Using the HTTP Archive, we can see how many websites are still referencing this old content. For example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;RawGit &lt;a href=&quot;https://rawgit.com/&quot;&gt;announced&lt;/a&gt; it was shutting down in 2018. Their webpage recommends alternatives, but based on the November 2025 HTTP Archive data, there are over 25K websites still requesting content from it!&lt;/li&gt;
  &lt;li&gt;In February 2024, the Polyfill.io service was sold to a third party and concerns were raised immediately about what that meant for sites using the service. By June 2024 it was used to deliver malicious content via a &lt;a href=&quot;https://thehackernews.com/2024/07/polyfillio-attack-impacts-over-380000.html&quot;&gt;supply chain attack&lt;/a&gt;. Based on HTTP Archive data, I can see references to this third party on 161K websites in February 2024, and then it gradually decreased from 115K to 109K through June 2024!  All occurrences of its use stopped after July 2024 – possibly due to intervention by browser vendors and their domain registrar taking down the domain name.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Non-web applications can also experience issues from third party failures as well. For example, a while back one of Etsy’s CDNs experienced an incident and I needed to fail over our traffic to our other CDN. During the failover, I discovered that part of the process for this CDN failover actually relied on a third party which had a dependency on the same CDN I needed to route away from! To avoid a similar occurrence, a break-glass failover mechanism was created to bypass all dependencies on CDNs during a failover event.&lt;/p&gt;

&lt;h1 id=&quot;but-i-dont-use-provider&quot;&gt;But I Don’t Use &amp;lt;Provider&amp;gt;?&lt;/h1&gt;

&lt;p&gt;When you add a third party to your website, you are telling the browser to load their content as if it were your own. Depending on how that content is loaded, it could block the page from rendering anything to the screen. If a render-blocking third party happens to be served by a provider that is experiencing a performance degradation or outage, then the site’s performance may be significantly affected.&lt;/p&gt;

&lt;p&gt;Any third party content added to your website carries risks, and should be added with care and tested thoroughly. However, extra care should be taken when it comes to render-blocking requests. This isn’t new advice either - Steve Souders first &lt;a href=&quot;https://www.stevesouders.com/blog/2010/06/01/frontend-spof/&quot;&gt;wrote about this in 2010&lt;/a&gt;! There have been PerfPlanet articles about this dating back to 2011 (just search for &lt;a href=&quot;https://calendar.perfplanet.com/?s=SPOF&quot;&gt;SPOF&lt;/a&gt;)! A simple recommendation remains relevant 15 years later: &lt;em&gt;avoid third party single points of failure, and test/monitor your pages to ensure that none are introduced.&lt;/em&gt;&lt;/p&gt;

&lt;h1 id=&quot;how-prevalent-are-third-party-spofs-today&quot;&gt;How Prevalent are Third Party SPOFs Today?&lt;/h1&gt;

&lt;p&gt;I’m raising a lot of concern about a problem that has been well-known and talked about for over 15 years. You might wonder how much of an issue this can still be on today’s web. According to the December 2025 &lt;a href=&quot;https://httparchive.org/&quot;&gt;HTTP Archive&lt;/a&gt; dataset, a shocking 67.7% (of 15.78 million websites) request at least one render-blocking third party! &lt;em&gt;(Note: This classification is based on request URLs that are from a different domain name than the page URL. This does not include subdomains of the page URL’s host, which means the actual numbers may be higher!)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Additionally, 60% (9.4 million) of sites load at least one render-blocking third party through a different CDN from their primary content! Even more shocking is that there are over 1 million websites loading render-blocking content via a third party that does not use a CDN at all!&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td colspan=&quot;3&quot; align=&quot;center&quot;&gt;&lt;strong&gt;Sites with Render Blocking Third Parties - HTTP Archive December 2025&lt;/strong&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;strong&gt;Category&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;Number of Sites&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;% of Sites&lt;/strong&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Number of Sites&lt;/td&gt;
   &lt;td&gt;15,780,490&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Sites with a Render Blocking Third Party&lt;/td&gt;
   &lt;td&gt;10,683,209&lt;/td&gt;
   &lt;td&gt;67.70%&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Sites with a Render Blocking Third Party on a different CDN&lt;/td&gt;
   &lt;td&gt;9,444,516&lt;/td&gt;
   &lt;td&gt;59.85%&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Sites with a Render Blocking Third Party not using a CDN&lt;/td&gt;
   &lt;td&gt;1,075,294&lt;/td&gt;
   &lt;td&gt;6.81%&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;If we focus on the websites that have a render-blocking third party hosted on a different CDN, you can see that the request types skew towards CSS and JavaScript (with many sites loading both of these types of requests).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/third-party-render-blocking-requests-http-archive.jpg&quot; alt=&quot;Third Party Render Blocking Requests by Content Type - HTTP Archive Dec 2025&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When we break this down by hostname and content type, you can see that Google Fonts is one of the larger sources of render-blocking third parties. There are other font providers such as Typekit and Font Awesome as well. Since CSS is generally render-blocking, including third party CSS for font loading may introduce a SPOF risk!&lt;/p&gt;

&lt;p&gt;Additionally, there are hundreds of thousands of sites that are utilizing third parties to load libraries onto their sites. Examples include &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdnjs.cloudflare.com&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdn.jsdelivr.net&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ajax.googleapis.com&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;code.jquery.com&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;maxcdn.bootstrapcdn.com&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unpkg.com&lt;/code&gt;. If you find yourself using any of these to deliver content that is critical to the loading of your website, I highly encourage you to read Harry Roberts’ excellent writeup about why you should &lt;a href=&quot;https://csswizardry.com/2019/05/self-host-your-static-assets/&quot;&gt;self host your static assets&lt;/a&gt;.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td colspan=&quot;6&quot; align=&quot;center&quot;&gt;&lt;strong&gt;Websites w/ Third Party Content - HTTP Archive December 2025&lt;/strong&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;strong&gt;hostname&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;css&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;script&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;other&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;html&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;text&lt;/strong&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;fonts.googleapis.com&lt;/td&gt;
   &lt;td&gt;5,837,138&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;22&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;www.gstatic.com&lt;/td&gt;
   &lt;td&gt;1,277,310&lt;/td&gt;
   &lt;td&gt;1,273,686&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;cdnjs.cloudflare.com&lt;/td&gt;
   &lt;td&gt;675,493&lt;/td&gt;
   &lt;td&gt;390,541&lt;/td&gt;
   &lt;td&gt;9&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;115&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;cdn.jsdelivr.net&lt;/td&gt;
   &lt;td&gt;660,322&lt;/td&gt;
   &lt;td&gt;351,682&lt;/td&gt;
   &lt;td&gt;20&lt;/td&gt;
   &lt;td&gt;34&lt;/td&gt;
   &lt;td&gt;6&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;www.youtube.com&lt;/td&gt;
   &lt;td&gt;772,925&lt;/td&gt;
   &lt;td&gt;39,762&lt;/td&gt;
   &lt;td&gt;5&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;ajax.googleapis.com&lt;/td&gt;
   &lt;td&gt;57,354&lt;/td&gt;
   &lt;td&gt;702,890&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;code.jquery.com&lt;/td&gt;
   &lt;td&gt;67,244&lt;/td&gt;
   &lt;td&gt;368,220&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;maxcdn.bootstrapcdn.com&lt;/td&gt;
   &lt;td&gt;332,198&lt;/td&gt;
   &lt;td&gt;49,794&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;use.fontawesome.com&lt;/td&gt;
   &lt;td&gt;315,930&lt;/td&gt;
   &lt;td&gt;21,300&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;6&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;tpc.googlesyndication.com&lt;/td&gt;
   &lt;td&gt;173&lt;/td&gt;
   &lt;td&gt;299,810&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;use.typekit.net&lt;/td&gt;
   &lt;td&gt;251,913&lt;/td&gt;
   &lt;td&gt;43,365&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;6&lt;/td&gt;
   &lt;td&gt;71&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;static.xx.fbcdn.net&lt;/td&gt;
   &lt;td&gt;147,085&lt;/td&gt;
   &lt;td&gt;136,605&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p.typekit.net&lt;/td&gt;
   &lt;td&gt;272,158&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;static1.squarespace.com&lt;/td&gt;
   &lt;td&gt;236,853&lt;/td&gt;
   &lt;td&gt;6,458&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;definitions.sqspcdn.com&lt;/td&gt;
   &lt;td&gt;204,076&lt;/td&gt;
   &lt;td&gt;37,235&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;unpkg.com&lt;/td&gt;
   &lt;td&gt;127,853&lt;/td&gt;
   &lt;td&gt;84,910&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;13,738&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;pagead2.googlesyndication.com&lt;/td&gt;
   &lt;td&gt;435&lt;/td&gt;
   &lt;td&gt;203,536&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;204&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;www.google.com&lt;/td&gt;
   &lt;td&gt;326&lt;/td&gt;
   &lt;td&gt;186,648&lt;/td&gt;
   &lt;td&gt;146&lt;/td&gt;
   &lt;td&gt;14,379&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;assets.squarespace.com&lt;/td&gt;
   &lt;td&gt;135,575&lt;/td&gt;
   &lt;td&gt;41,561&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;c0.wp.com&lt;/td&gt;
   &lt;td&gt;85,064&lt;/td&gt;
   &lt;td&gt;90,345&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;stackpath.bootstrapcdn.com&lt;/td&gt;
   &lt;td&gt;101,810&lt;/td&gt;
   &lt;td&gt;27,460&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;kit.fontawesome.com&lt;/td&gt;
   &lt;td&gt;4,497&lt;/td&gt;
   &lt;td&gt;101,627&lt;/td&gt;
   &lt;td&gt;14&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;maps.googleapis.com&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;98,646&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;a.amxrtb.com&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;97,849&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;cdn-cookieyes.com&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;91,836&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;10&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;consent.cookiebot.com&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;79,041&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;t1.daumcdn.net&lt;/td&gt;
   &lt;td&gt;22,278&lt;/td&gt;
   &lt;td&gt;54,285&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;h1 id=&quot;third-party-slowdowns&quot;&gt;Third Party Slowdowns&lt;/h1&gt;

&lt;p&gt;While major cloud provider failures are not a frequent occurrence, slowdowns and micro-outages absolutely are. If your content is delivered by a third party, then the performance of that content is out of your control. Self-hosting where possible will put you in control of the delivery of this content and reduce the risk of a slowdown triggered by a third party. If you have to load any content from a third party, you can (and should) test for single points of failure and monitor their performance.&lt;/p&gt;

&lt;p&gt;Some Real User Monitoring (RUM) services have the ability to collect resource timing data, which makes it possible to monitor third party content across all your site’s visitors. You can also monitor via synthetic measurements, which may be helpful in case payload sizes drastically increase. At Etsy I use &lt;a href=&quot;https://www.speedcurve.com/blog/performance-budgets/&quot;&gt;Speedcurve’s performance budgets&lt;/a&gt; to detect shifts in quantifiable metrics (such as requests, sizes, etc). I often correlate alerts with our RUM data to determine if a third party change resulted in a slowdown, and also add custom metrics for third parties that may present a SPOF risk. In the example below, you can see that the number of blocking scripts and stylesheets is consistently low - but the budget is set so that any shift from the high-water mark will trigger an alert.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/speedcurve-performance-budgets.jpg&quot; alt=&quot;Speedcurve Performane Budgets&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Speedcurve also provides the ability to &lt;a href=&quot;https://support.speedcurve.com/docs/first-third-parties&quot;&gt;track individual third parties&lt;/a&gt; as part of their performance budgets - which can be helpful if you have identified a third party that you want to track over time.&lt;/p&gt;
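
&lt;p&gt;If you don’t have a RUM product wired up yet, the browser’s Resource Timing API exposes the same underlying per-request data, and you can aggregate it by third party host yourself. Here’s a minimal sketch to run in the browser console on a loaded page - note that cross-origin resources only expose detailed connection timings when the third party sends a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Timing-Allow-Origin&lt;/code&gt; header, although the overall duration is always reported.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
// Group resource timings by third party hostname and report request counts
// and median durations. Run in the browser console on a loaded page.
function summarizeThirdParties(): void {
  const firstPartyHost = location.hostname;
  const byHost = new Map&amp;lt;string, number[]&amp;gt;();

  const entries = performance.getEntriesByType(&quot;resource&quot;) as PerformanceResourceTiming[];
  for (const entry of entries) {
    const host = new URL(entry.name).hostname;
    if (host === firstPartyHost) continue;   // skip first party requests
    const durations = byHost.get(host) ?? [];
    durations.push(entry.duration);          // full fetch duration in ms
    byHost.set(host, durations);
  }

  byHost.forEach(function (durations, host) {
    durations.sort(function (a, b) { return a - b; });
    const median = durations[Math.floor(durations.length / 2)];
    console.log(host, &quot;requests:&quot;, durations.length, &quot;median ms:&quot;, Math.round(median));
  });
}

summarizeThirdParties();
&lt;/code&gt;&lt;/pre&gt;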

&lt;p&gt;I’m also using &lt;a href=&quot;https://www.catchpoint.com/synthetic-monitoring&quot;&gt;Catchpoint&lt;/a&gt; to trend third party domains over time. In the graph below, I’m aggregating results from a large number of Chrome synthetic measurements, grouping the results by hostname, and excluding first party domains. This type of reporting provides the ability to monitor all discovered third parties over time, collecting insights into their performance, availability, request counts and payload sizes. In the example below you can see that one of the third parties experienced a large outage on November 25th, and another experienced a smaller outage on December 5th. Fortunately these were not render-blocking and did not impact the user experience.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/catchpoint-third-party-monitoring.jpg&quot; alt=&quot;Catchpoint Third Party Monitoring&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://rumarchive.com/&quot;&gt;The RUM Archive&lt;/a&gt; is also an interesting project for analyzing third party performance data. This &lt;a href=&quot;https://rumarchive.com/datasets/&quot;&gt;dataset&lt;/a&gt; comes from the mPulse data of approximately 100 Akamai customers. While this will not capture as many third parties as we can observe in the HTTP Archive, it does provide the ability to look at their real-world performance data! When I combined these two data sources for render-blocking third parties, I found 15 third parties that had at least 1 million RUM measurements. The table below breaks down their DNS, TCP and TTFB times. Many of these third parties have relatively fast DNS times, but there’s a lot of room for improvement when it comes to TCP connection times and TTFB.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/rumarchive-third-party-resource-timings.jpg&quot; alt=&quot;RUM Archive Third Party Resource Timings&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;While the table above illustrates the p75 timings, we can also look at other percentiles to get a fuller picture of the overall performance impact of these third parties. The graph below illustrates the inter-quartile range for TTFB, with the bars representing the p25 through p75 - essentially 50% of all measurements. The whiskers represent the p5 and p95. A few third parties stand out for having very poor TTFB, while CookieLaw and Criteo seem to have the fastest and most consistent performance.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/rumarchive-third-party-ttfb-percentiles.jpg&quot; alt=&quot;RUM Archive Third Party TTFB Percentiles&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;how-to-detect-render-blocking-third-parties&quot;&gt;How to Detect Render-Blocking Third Parties&lt;/h1&gt;

&lt;p&gt;When you run a test in &lt;a href=&quot;https://www.webpagetest.org/&quot;&gt;WebPageTest&lt;/a&gt;, you can see a yellow circle with an X in the waterfall next to each request that is render-blocking. A similar visual is used in the legacy UI if you are using a &lt;a href=&quot;https://docs.webpagetest.org/private-instances/&quot;&gt;WebPageTest private instance&lt;/a&gt;. If you notice any render blocking third party domains, then make a note of them so that you can run SPOF tests.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/wpt-render-blocking-requests.jpg&quot; alt=&quot;WebPageTest Render Blocking Requests&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There’s also my &lt;a href=&quot;https://tools.paulcalvano.com/wpt-third-party-analysis/&quot;&gt;Third Party Analyzer&lt;/a&gt; tool, which will highlight SPOF risks based on resources that are render-blocking and load before FCP. This works with any WebPageTest private instance test, Catchpoint WPT shared URLs, or Speedcurve tests. You can read more about this tool &lt;a href=&quot;https://paulcalvano.com/2024-09-03-discovering-third-party-performance-risks/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/third-party-analyzer-render-blocking-requests.jpg&quot; alt=&quot;Third Party Analyzer Render Blocking Requests&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you run a measurement with &lt;a href=&quot;https://www.debugbear.com/test/website-speed&quot;&gt;DebugBear&lt;/a&gt;, you can see a “Blocking” indicator next to every resource that is render-blocking. They also illustrate the priority of each request as well as the domain name. Similar to the previous examples, if you see render-blocking content from a third party domain, then make a note of it so that you can run SPOF tests.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/debugbear-render-blocking-requests.jpg&quot; alt=&quot;Debugbear Render Blocking Requests&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And lastly, you can also see render-blocking requests via a &lt;a href=&quot;https://developer.chrome.com/docs/lighthouse/overview&quot;&gt;Lighthouse&lt;/a&gt; test. As with the other examples, just look at the list of requests and determine which of these are third parties.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/lighthouse-render-blocking-requests.jpg&quot; alt=&quot;Lighthouse Render Blocking Requests&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;how-to-test-for-third-party-spofs&quot;&gt;How to Test for Third Party SPOFs&lt;/h1&gt;

&lt;p&gt;The most common way of testing for third party single points of failure is to identify which content is render-blocking, and then to test what would happen if that content fails. For years, the easiest way of testing for SPOFs was via WebPageTest as it had a SPOF simulation feature. Today, there are a handful of other methods of testing for them and below I’ll show a few examples.&lt;/p&gt;

&lt;h2 id=&quot;webpagetest-private-instance&quot;&gt;WebPageTest Private Instance&lt;/h2&gt;

&lt;p&gt;In my earlier example I used a WebPageTest private instance, which has a feature that simulates a SPOF. When you add a hostname to the list, it will run two tests: one without any changes and the other with the hostname routed to a server that silently drops the request (simulating a failure). The results show two filmstrips, so you can easily determine whether page rendering stalls when the third party fails to load.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/wpt-private-instance-spof-test.jpg&quot; alt=&quot;WebPageTest Private Instance SPOF Test&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/wpt-private-instance-spof-results.jpg&quot; alt=&quot;WebPageTest Private Instance SPOF Results&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;catchpoint-webpagetest&quot;&gt;Catchpoint WebPageTest&lt;/h2&gt;

&lt;p&gt;There is also a SPOF feature in the &lt;a href=&quot;https://webpagetest.org&quot;&gt;public WebPageTest&lt;/a&gt; instance hosted by Catchpoint. At the time of writing, the SPOF feature is not working. Once it is fixed, it should work the same as the above example. Until then you can still use WebPageTest to simulate a third party failure with a &lt;a href=&quot;https://docs.webpagetest.org/spof/&quot;&gt;script&lt;/a&gt; that overrides DNS for the third party, pointing it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blackhole.webpagetest.org&lt;/code&gt;. This emulates how the SPOF feature in WebPageTest works.&lt;/p&gt;
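
&lt;p&gt;Based on the scripting documentation linked above, a minimal SPOF script would look roughly like the sketch below (the hostname and URL here are placeholders; substitute the render-blocking third party and the page you want to test):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;setDnsName cdn.cookielaw.org blackhole.webpagetest.org
navigate https://www.example.com/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;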

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/catchpoint-wpt-spof-test.jpg&quot; alt=&quot;Catchpoint WebPageTest SPOF Test&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;hosts-file-entries&quot;&gt;Hosts File Entries&lt;/h2&gt;

&lt;p&gt;You can use a hosts file to override DNS and route one or more third parties to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blackhole.webpagetest.org&lt;/code&gt;. In order to do this, look up the IP address of the blackhole hostname. At the time of writing, it resolved to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;3.219.212.117&lt;/code&gt;.&lt;/p&gt;
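
&lt;p&gt;That IP address can change over time, so it’s worth resolving the hostname yourself right before testing. For example, from a terminal:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# macOS / Linux
dig +short blackhole.webpagetest.org

# Windows
nslookup blackhole.webpagetest.org
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;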

&lt;p&gt;Next, update your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/hosts&lt;/code&gt; file (macOS and Linux) or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C:\Windows\System32\drivers\etc\hosts&lt;/code&gt; file (Windows), adding one or more third party hostnames. Once the browser picks up the new hosts file entries, you’ll be able to test for failure by simply browsing to the site you are testing.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;3.219.212.117 cdn.cookielaw.org cdn.optimizely.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
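
&lt;p&gt;Depending on your operating system and browser, the new entries may not take effect immediately; you may need to flush the DNS cache (and possibly restart the browser) first. For example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder

# Windows
ipconfig /flushdns
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;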

&lt;h2 id=&quot;chrome-devtools&quot;&gt;Chrome DevTools&lt;/h2&gt;

&lt;p&gt;Chrome DevTools recently added an &lt;a href=&quot;https://developer.chrome.com/blog/throttle-individual-network-requests?hl=en&quot;&gt;individual request throttling feature&lt;/a&gt;. It is expected to be enabled by default in Chrome 144, but if you are using an earlier version (or Chrome Canary) you can turn on an experimental flag to allow individual request throttling. You can find this flag at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;chrome://flags/#devtools-individual-request-throttling&lt;/code&gt;. DebugBear shared a great blog post about how to use this feature &lt;a href=&quot;https://www.debugbear.com/blog/chrome-devtools-throttle-individual-request&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/chrome-devtools-request-throttling-flag.jpg&quot; alt=&quot;Chrome DevTools Request Throttling Flag&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Once enabled, you’ll be able to throttle a request or domain by selecting a network profile for the content. The default options are Block, Fast 4G, Slow 4G and 3G.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/chrome-devtools-request-throttling.jpg&quot; alt=&quot;Chrome DevTools Request Throttling&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To simulate a single point of failure, I wanted something even slower than 3G. So in the network throttling profile section, I defined a profile that adds 10 seconds of latency. When you refresh the page with this profile, you can experience the site in your browser as if the third party was struggling to load the content.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/chrome-devtools-request-throttling-profile.jpg&quot; alt=&quot;Chrome DevTools Request Throttling Profile&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Try configuring the network throttling for a render-blocking third party, and then go to the Performance panel in DevTools and capture a trace with filmstrips. If you see a long gap in page rendering, then you know you have a SPOF! Here’s an example from a popular US ecommerce site I found in the HTTP Archive. It’s loading a JavaScript request for a consent management service from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdn-ukwest.onetrust.com&lt;/code&gt;. If I slow that request down by 10 seconds and measure the page load, I can see that the FCP is delayed by the same amount. Simply adding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;async&lt;/code&gt; attribute to that script tag eliminates the risk of a SPOF!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-single-points-of-failure/chrome-devtools-request-throttling-spof-test.jpg&quot; alt=&quot;Chrome DevTools Request Throttling SPOF Test&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;
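
&lt;p&gt;As a rough sketch of that fix (the script URL below is a placeholder, not the site’s actual consent script), the change is as simple as:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;!-- Before: synchronous script in the head, render-blocking --&amp;gt;
&amp;lt;script src=&quot;https://cdn-ukwest.onetrust.com/example/consent.js&quot;&amp;gt;&amp;lt;/script&amp;gt;

&amp;lt;!-- After: async, so a slow or failed response no longer blocks rendering --&amp;gt;
&amp;lt;script async src=&quot;https://cdn-ukwest.onetrust.com/example/consent.js&quot;&amp;gt;&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Whether &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;async&lt;/code&gt; is appropriate depends on the third party; for something like consent management you’ll want to verify the functionality still behaves correctly before shipping the change.&lt;/p&gt;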

&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;Third Party Single Points of Failure have been talked about for 15 years now. The problem is well understood, and there’s plenty of guidance and testing tooling around it. However, that hasn’t stopped 67% of websites from loading third parties this way!&lt;/p&gt;

&lt;p&gt;Cloud provider and CDN outages happen, and it’s unfortunate for all involved when they do. However, someone else’s cloud provider outage shouldn’t take down your website. I highly recommend auditing your third party domains to ensure that a slowdown or failure will not result in a disruption of service on your websites. Beyond that, as a general preventive measure, try to self-host as much of your render-blocking content as possible.&lt;/p&gt;

&lt;p&gt;This article represents my own views and opinions and not those of Etsy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at https://calendar.perfplanet.com/2025/third-parties-and-single-points-of-failure/&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTTP Archive queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This section provides some details on how this analysis was performed, including SQL queries. Please be warned that some of the SQL queries process a significant amount of data, which can be very expensive to run.&lt;/p&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Number of sites containing render blocking third parties&lt;/b&gt;&lt;/summary&gt;
   This query counts the number of sites that contain a render-blocking third party, and breaks out whether those third parties are served from a different CDN (or no CDN at all).
  &lt;pre&gt;&lt;code&gt;
   SELECT
    &quot;Number of Sites&quot; AS category,
    COUNT(DISTINCT page) AS sites
  FROM
    `httparchive.crawl.pages` 
  WHERE
    date = &quot;2025-12-01&quot;
    AND is_root_page = TRUE 
    AND client = &quot;mobile&quot; 
  

UNION ALL

 SELECT
    &quot;Sites with a Render Blocking Third Party&quot; AS category,
    COUNT(DISTINCT page) AS sites
  FROM
    `httparchive.crawl.requests`
  WHERE
    date = &quot;2025-12-01&quot;
    AND is_root_page = TRUE 
    AND client = &quot;mobile&quot; 
    AND NET.REG_DOMAIN(page) != NET.REG_DOMAIN(url)
    AND JSON_VALUE(payload._renderBlocking) = &quot;blocking&quot;

UNION ALL


 SELECT
    &quot;Sites with a Render Blocking Third Party on a different CDN&quot; AS category,
    COUNT(DISTINCT p.page) AS sites
  FROM
    `httparchive.crawl.requests` AS r
    INNER JOIN `httparchive.crawl.pages`  AS p
    ON r.page = p.page
  WHERE
    r.date = &quot;2025-12-01&quot; AND p.date = &quot;2025-12-01&quot;
    AND r.is_root_page = TRUE AND p.is_root_page = TRUE
    AND r.client = &quot;mobile&quot; AND p.client = &quot;mobile&quot;
    AND JSON_VALUE(r.payload._renderBlocking) = &quot;blocking&quot;
    AND NET.REG_DOMAIN(p.page) != NET.REG_DOMAIN(r.url)
    AND JSON_VALUE(p.summary.cdn) != JSON_VALUE(r.payload._cdn_provider)

UNION ALL

SELECT
    &quot;Sites with a Render Blocking Third Party not using a CDN&quot; AS category,
    COUNT(DISTINCT p.page) AS sites
  FROM
    `httparchive.crawl.requests` AS r
    INNER JOIN `httparchive.crawl.pages`  AS p
    ON r.page = p.page
  WHERE
    r.date = &quot;2025-12-01&quot; AND p.date = &quot;2025-12-01&quot;
    AND r.is_root_page = TRUE AND p.is_root_page = TRUE
    AND r.client = &quot;mobile&quot; AND p.client = &quot;mobile&quot;
    AND JSON_VALUE(r.payload._renderBlocking) = &quot;blocking&quot;
    AND NET.REG_DOMAIN(p.page) != NET.REG_DOMAIN(r.url)
    AND (
        JSON_VALUE(r.payload._cdn_provider) IS NULL
        OR JSON_VALUE(r.payload._cdn_provider) = &quot;&quot;)

  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Third Party Render Blocking Requests by Content Type&lt;/b&gt;&lt;/summary&gt;
  This summarizes the request types used for render-blocking third party requests.
  &lt;pre&gt;&lt;code&gt;
 SELECT
    r.type AS requestType,
    COUNT(DISTINCT p.page) AS sites,
    COUNT(*) AS requests
  FROM
    `httparchive.crawl.requests` AS r
    INNER JOIN `httparchive.crawl.pages`  AS p
    ON r.page = p.page
  WHERE
    r.date = &quot;2025-12-01&quot; AND p.date = &quot;2025-12-01&quot;
    AND r.is_root_page = TRUE AND p.is_root_page = TRUE
    AND r.client = &quot;mobile&quot; AND p.client = &quot;mobile&quot;
    AND JSON_VALUE(r.payload._renderBlocking) = &quot;blocking&quot;
    AND NET.REG_DOMAIN(p.page) != NET.REG_DOMAIN(r.url)
    AND JSON_VALUE(p.summary.cdn) != JSON_VALUE(r.payload._cdn_provider)
    AND is_main_document = FALSE
GROUP BY 1
ORDER BY 2 DESC
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Third Party Hostnames w/ Render Blocking Content&lt;/b&gt;&lt;/summary&gt;
  This query breaks down popular third party hostnames that are being used for render-blocking content, with a focus on requests that use a different CDN from the primary website. 
  &lt;pre&gt;&lt;code&gt;

 SELECT
    NET.HOST(url) as hostname,
    r.type AS requestType,
    COUNT(DISTINCT p.page) AS sites,
    COUNT(*) AS requests
  FROM
    `httparchive.crawl.requests` AS r
    INNER JOIN `httparchive.crawl.pages`  AS p
    ON r.page = p.page
  WHERE
    r.date = &quot;2025-12-01&quot; AND p.date = &quot;2025-12-01&quot;
    -- root pages
    AND r.is_root_page = TRUE AND p.is_root_page = TRUE
    -- mobile
    AND r.client = &quot;mobile&quot; AND p.client = &quot;mobile&quot;
    -- render blocking requests
    AND JSON_VALUE(r.payload._renderBlocking) = &quot;blocking&quot;
    -- with a different domain name from the page
    AND NET.REG_DOMAIN(p.page) != NET.REG_DOMAIN(r.url)
    -- and a different CDN
    AND JSON_VALUE(p.summary.cdn) != JSON_VALUE(r.payload._cdn_provider)
GROUP BY 1,2
ORDER BY 3 DESC 
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;RawGit usage&lt;/b&gt;&lt;/summary&gt;
  &lt;pre&gt;&lt;code&gt;
 
 SELECT
    COUNT(DISTINCT page)
  FROM
    `httparchive.crawl.requests` AS r    
  WHERE
    date = &quot;2025-11-01&quot;
    AND is_root_page = TRUE
    AND client = &quot;mobile&quot; 
  AND NET.HOST(url) LIKE &quot;%rawgit.com&quot;

  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Polyfill.io Usage&lt;/b&gt;&lt;/summary&gt;
  &lt;pre&gt;&lt;code&gt;
 
 SELECT
    date,
    COUNT(DISTINCT page)
  FROM
    `httparchive.crawl.requests` AS r
    
  WHERE
    date BETWEEN &quot;2024-01-01&quot; AND &quot;2025-01-01&quot;
    AND is_root_page = TRUE
    AND client = &quot;mobile&quot; 
  AND NET.HOST(url) LIKE &quot;%polyfill.io&quot;
GROUP BY date
ORDER BY date
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;RUM Archive stats for Render Blocking Third Parties&lt;/b&gt;&lt;/summary&gt;
  &lt;pre&gt;&lt;code&gt;
 SELECT 
  NET.HOST(url) AS hostname, 
  SUM(fetches) AS freq,

  -- DNS
  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(DNSHISTOGRAM),
    [0.75],
    10,
    false
  )) / 1000 AS dns_p75,

  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(DNSHISTOGRAM),
    [0.95],
    10,
    false
  )) / 1000 AS dns_p95,

  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(DNSHISTOGRAM),
    [0.99],
    10,
    false
  )) / 1000 AS dns_p99,

  -- TCP
  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TCPHISTOGRAM),
    [0.75],
    10,
    false
  )) / 1000 AS tcp_p75,

  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TCPHISTOGRAM),
    [0.95],
    10,
    false
  )) / 1000 AS tcp_p95,

  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TCPHISTOGRAM),
    [0.99],
    10,
    false
  )) / 1000 AS tcp_p99,

  -- TLS
  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TLSHISTOGRAM),
    [0.75],
    10,
    false
  )) / 1000 AS tls_p75,

  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TLSHISTOGRAM),
    [0.95],
    10,
    false
  )) / 1000 AS tls_p95,

  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TLSHISTOGRAM),
    [0.99],
    10,
    false
  )) / 1000 AS tls_p99,

  -- TTFB
  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TTFBHISTOGRAM),
    [0.75],
    10,
    false
  )) / 1000 AS ttfb_p75,

  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TTFBHISTOGRAM),
    [0.95],
    10,
    false
  )) / 1000 AS ttfb_p95,

  PARSE_NUMERIC(`akamai-mpulse-rumarchive.rumarchive.PERCENTILE_APPROX`(
    ARRAY_AGG(TTFBHISTOGRAM),
    [0.99],
    10,
    false
  )) / 1000 AS ttfb_p99

FROM `akamai-mpulse-rumarchive.rumarchive.rumarchive_resources` 
WHERE date = &quot;2025-12-20&quot; 
  AND NET.HOST(url) IN (
    SELECT
        NET.HOST(url) as hostname,
      FROM
        `httparchive.crawl.requests` AS r
        INNER JOIN `httparchive.crawl.pages`  AS p
        ON r.page = p.page
      WHERE
        r.date = &quot;2025-12-01&quot; AND p.date = &quot;2025-12-01&quot;
        -- root pages
        AND r.is_root_page = TRUE AND p.is_root_page = TRUE
        -- mobile
        AND r.client = &quot;mobile&quot; AND p.client = &quot;mobile&quot;
        -- render blocking requests
        AND JSON_VALUE(r.payload._renderBlocking) = &quot;blocking&quot;
        -- with a different domain name from the page
        AND NET.REG_DOMAIN(p.page) != NET.REG_DOMAIN(r.url)
        -- and a different CDN
        AND JSON_VALUE(p.summary.cdn) != JSON_VALUE(r.payload._cdn_provider)
    GROUP BY 1
    ORDER BY COUNT(DISTINCT p.page) DESC
    )
GROUP BY 1
HAVING SUM(fetches) &amp;gt;= 1000000
ORDER BY 2 DESC;

  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html">You’ve heard it many times - third party content can easily cause an otherwise well performing website to become sluggish and slow. And depending on how this content is loaded, it can also introduce single points of failure (SPOFs). When a large cloud provider or content delivery network (CDN) experiences a disruption, their impacts are felt across the world and often triggers headlines about the many websites that were affected. However, there are numerous secondary impacts triggered by third party content, which can be disruptive even to companies that don’t use the affected provider.</summary></entry><entry><title type="html">AI Bots and Robots.txt</title><link href="https://paulcalvano.com/2025-08-21-ai-bots-and-robots-txt/" rel="alternate" type="text/html" title="AI Bots and Robots.txt" /><published>2025-08-21T04:00:00+00:00</published><updated>2025-09-01T15:16:23+00:00</updated><id>https://paulcalvano.com/ai-bots-and-robots-txt</id><content type="html" xml:base="https://paulcalvano.com/2025-08-21-ai-bots-and-robots-txt/">&lt;p&gt;There’s been a lot of discussion lately around AI crawlers and bots, which are used to train LLMs and/or fetch content on behalf of their users. In the past few weeks I’ve seen blog posts about the amount of traffic from these crawlers, techniques and products to control how and what they can crawl, reports of misbehaving crawlers and more. Ironically, there’s even AI based services to mitigate AI crawler bots! Given how much interest there is, I thought I’d try and explore some &lt;a href=&quot;https://httparchive.org/&quot; target=&quot;_blank&quot;&gt;HTTP Archive&lt;/a&gt; data to see how sites are using robots.txt to state their preferences on AI crawling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Robots.txt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A robots.txt file is located at the root of an origin, and provides instructions for how bots should interact with a website. This practice began in 1994 and quickly became a &lt;a href=&quot;https://www.robotstxt.org/robotstxt.html&quot; target=&quot;_blank&quot;&gt;de facto standard&lt;/a&gt;. Years later, Google proposed formalizing the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9309&quot; target=&quot;_blank&quot;&gt;Robots Exclusion Protocol&lt;/a&gt;, which was standardized as RFC 9309 in 2022.&lt;/p&gt;

&lt;p&gt;A simple example of a robots.txt directive is below. This directive tells a User-Agent with the string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GPTBot&lt;/code&gt; that it is not permitted to crawl the site.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;User-Agent: GPTBot
Disallow: /
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It’s important to note that robots.txt does not restrict access to bots by itself, as adherence to it is voluntary. But analyzing the content of these files can provide some insight on the overall sentiment towards AI bots.&lt;/p&gt;

&lt;h2 id=&quot;how-many-sites-use-robotstxt-files&quot;&gt;How Many Sites use Robots.txt Files?&lt;/h2&gt;

&lt;p&gt;The HTTP Archive collects page details from millions of websites each month, and uses a custom metric to fetch a robots.txt file from each site. The data from July 2025 shows that 94% of the roughly 12.9 million websites measured have a robots.txt file containing at least one directive. The HTTP Archive’s &lt;a href=&quot;https://almanac.httparchive.org/&quot; target=&quot;_blank&quot;&gt;Web Almanac&lt;/a&gt; has an entire &lt;a href=&quot;https://almanac.httparchive.org/en/2024/seo&quot; target=&quot;_blank&quot;&gt;chapter on SEO&lt;/a&gt; containing more details around the contents of robots.txt, and is definitely worth a read.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;Sites&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;% of Sites&lt;/strong&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Serves a Robots.txt file&lt;/td&gt;
   &lt;td&gt;12,155,217&lt;/td&gt;
   &lt;td&gt;94.12%&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;No Robots.txt&lt;/td&gt;
   &lt;td&gt;759,409&lt;/td&gt;
   &lt;td&gt;5.88%&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Some bots choose to identify themselves in the User-Agent string of an HTTP request. Others attempt to hide what they are doing, which is one of the reasons why many sites utilize services to block unwanted traffic. This research will focus on robots.txt files, which tell us whether site owners would like to restrict AI bots based on their advertised User-Agent strings.&lt;/p&gt;

&lt;p&gt;Many AI services publish their User-Agent strings for this purpose, and also provide guidance on how they adhere to robots.txt directives. Additionally, it’s common for a service to advertise multiple User-Agents, since they can be used for different purposes (crawling, responding to user input, etc.).&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://platform.openai.com/docs/bots&quot; target=&quot;_blank&quot;&gt;ChatGPT&lt;/a&gt; advertises its bots as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ChatGPT-User&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GPTBot&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OAI-SearchBot&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler&quot; target=&quot;_blank&quot;&gt;Anthropic&lt;/a&gt; advertises its bots as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ClaudeBot&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Claude-User&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Claude-SearchBot&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://support.apple.com/en-us/119829&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt; advertises its bot as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Applebot-Extended&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
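
&lt;p&gt;In practice this means that a site wanting to fully restrict a single service needs a directive for each of its advertised User-Agents. For example, a sketch covering all of the OpenAI bots listed above would look something like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;User-Agent: GPTBot
Disallow: /

User-Agent: ChatGPT-User
Disallow: /

User-Agent: OAI-SearchBot
Disallow: /
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;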

&lt;p&gt;There are many more AI agents, with new ones showing up all the time, and unfortunately not all of them respect robots.txt directives. For this research, I’ve used a list of AI bots from the &lt;a href=&quot;https://github.com/ai-robots-txt/ai.robots.txt&quot; target=&quot;_blank&quot;&gt;AI Robots.txt GitHub repository&lt;/a&gt; to determine which robots.txt entries are targeted towards the various AI services.&lt;/p&gt;

&lt;h2 id=&quot;user-agents-referenced-in-robotstxt&quot;&gt;User-Agents Referenced in Robots.txt&lt;/h2&gt;

&lt;p&gt;The most popular User-Agent referenced in robots.txt files is simply the wildcard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*&lt;/code&gt;. In fact, 97.4% of robots.txt files have at least one directive using this wildcard, often to allow and/or disallow access to part or all of the site’s content for all bots. Typically the next most frequent groups of User-Agents are search bots (googlebot, bingbot, etc.) and SEO bots (mj12bot, ahrefsbot, semrushbot, etc.).&lt;/p&gt;
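
&lt;p&gt;As a simple illustration, a robots.txt using only the wildcard might look like this (the paths are just examples):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;User-Agent: *
Allow: /
Disallow: /admin/
Disallow: /search
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;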

&lt;p&gt;Over the past few years, User-Agents for AI bots have been added to many sites’ robots.txt files. As of July 2025, AI bots top the list of User-Agents referenced across popular sites. In fact, almost 21% of the top 1000 websites have rules for ChatGPT’s “GPTBot” in their robots.txt file. There’s also an interesting shift by site popularity: a greater percentage of popular sites include AI bot directives, while SEO bot directives make up a larger share on less popular sites.&lt;/p&gt;

&lt;p&gt;The table below breaks this out by site popularity (using Google’s &lt;a href=&quot;https://developer.chrome.com/docs/crux/methodology/metrics#popularity-metric&quot; target=&quot;_blank&quot;&gt;CrUX rank&lt;/a&gt;). In the darker shaded areas of this table, you can see many references to bots operated by popular AI services - ChatGPT, Claude, Google, Perplexity, Anthropic, etc.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/ai-bots-and-robots-txt/pct-sites-user-agents-in-robotstxt.jpg&quot; alt=&quot;Percent of Sites Referencing Specific User-Agents in robots.txt files&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;when-did-sites-start-adding-ai-crawlers-to-robotstxt&quot;&gt;When Did Sites Start Adding AI Crawlers to robots.txt&lt;/h2&gt;

&lt;p&gt;In August 2023, ChatGPT added &lt;a href=&quot;https://platform.openai.com/docs/bots&quot; target=&quot;_blank&quot;&gt;documentation&lt;/a&gt; about its crawler, including instructions on how site owners can block it. Shortly afterwards, articles with &lt;a href=&quot;https://www.theverge.com/2023/8/7/23823046/openai-data-scrape-block-ai&quot; target=&quot;_blank&quot;&gt;instructions&lt;/a&gt; on how to block ChatGPT started appearing, it was discussed on &lt;a href=&quot;https://news.ycombinator.com/item?id=37030568&quot; target=&quot;_blank&quot;&gt;Hacker News&lt;/a&gt;, and there were also claims that some sites were &lt;a href=&quot;https://arstechnica.com/information-technology/2023/08/openai-details-how-to-keep-chatgpt-from-gobbling-up-website-data/&quot; target=&quot;_blank&quot;&gt;scrambling to block&lt;/a&gt; AI agents. While the claims may have sounded sensational, the numbers supported them. In August 2023 the number of sites that included rules for GPTBot in their robots.txt files went from 0 to almost 125k! A month later it was 299k sites. By November, GPTBot was referenced on 578k websites! That’s a massive increase in a short period of time.&lt;/p&gt;

&lt;p&gt;In the tables below you can see the number of websites referencing specific User-Agents in their robots.txt files month to month. ClaudeBot first appeared in December 2023 on just 2,382 sites (increasing to 30k within 4 months), and PerplexityBot appeared in January 2024 on just 157 sites (increasing to 31k in April 2024). These were not picked up as quickly as GPTBot, which may have been due to limited public awareness of the rise of these models.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/ai-bots-and-robots-txt/user-agents-in-robotstxt-2023.jpg&quot; alt=&quot;User Agents Referenced in robots.txt files - 2023&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Throughout 2024 you can see that more and more sites include directives for the AI bots. Apple’s &lt;a href=&quot;https://support.apple.com/en-us/119829&quot; target=&quot;_blank&quot;&gt;crawler&lt;/a&gt; was revealed in May 2024, and news reports about it started showing up in June. By September there were almost 262k sites including it in their robots.txt files. Perplexity and Claude also started appearing in over 100k sites’ robots.txt files in May 2024. And a handful of new bots started appearing as they gained popularity.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/ai-bots-and-robots-txt/user-agents-in-robotstxt-2024.jpg&quot; alt=&quot;User Agents Referenced in robots.txt files - 2024&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That brings us to 2025, where you can see that ChatGPT, Claude, Facebook and others appear in the robots.txt files of over 560k sites! A few newer bots started showing up as well, belonging to Meta, DuckDuckGo, and Quora. This might be due to some services and platforms updating robots.txt files automatically, but it could also be due to greater awareness, as there have been frequent articles about AI crawlers and bots throughout 2025.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/ai-bots-and-robots-txt/user-agents-in-robotstxt-2025.jpg&quot; alt=&quot;User Agents Referenced in robots.txt files - 2025&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can explore this data more in &lt;a href=&quot;https://public.tableau.com/views/UserAgentsAppearinginRobots_txtFiles-HTTPArchiveJanuary2023-July2025/Sheet1?:language=en-US&amp;amp;publish=yes&amp;amp;:sid=&amp;amp;:redirect=auth&amp;amp;:display_count=n&amp;amp;:origin=viz_share_link&quot; target=&quot;_blank&quot;&gt;this interactive visualization&lt;/a&gt;. There are many more User-Agents than what I listed here, and if you scroll down you might spot some newer ones.&lt;/p&gt;

&lt;h2 id=&quot;to-allow-or-not-to-allow&quot;&gt;To Allow or Not to Allow…&lt;/h2&gt;

&lt;p&gt;So far we’ve looked at how many websites are including directives for various User-Agents, and we’ve observed substantial growth in the references to AI crawlers and bots. But we’ve also observed a difference in how sites are using them based on popularity. Now let’s look at the types of rules being applied to them.&lt;/p&gt;

&lt;p&gt;It’s not uncommon for rules to exist that both allow a bot and disallow certain paths. The presence of both could indicate that site owners want to allow those bots to access portions of their sites. For example, in Vimeo’s robots.txt file you can find:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Block Open AI - Allowing specific marketing/informational paths
User-agent: GPTBot
Disallow: /
Allow: /features/
Allow: /solutions/
Allow: /enterprise/
Allow: /integrations/
Allow: /blog/
Allow: /create/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Other sites look to restrict AI traffic entirely, such as in this excerpt from Healthline’s robots.txt file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;User-agent: GPTBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The graph below shows AI crawler directives for popular websites. From this data we can see that it’s very common for popular websites to attempt to restrict access to AI bots. However, they are not uniform in the bots that they choose to disallow, most of the time only including directives for the most popular AI services.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/ai-bots-and-robots-txt/ai-crawler-directives-popularsites.jpg&quot; alt=&quot;AI Crawler Directives for Popular Sites&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If we look at this across all sites, a new pattern emerges. We can see a lot more sites that have both allow and disallow directives for AI agents. This may be due to large platforms automatically managing these directives in their customers’ robots.txt files. For example, &lt;a href=&quot;https://support.squarespace.com/hc/en-us/articles/360022347072-Request-that-AI-models-exclude-your-site&quot; target=&quot;_blank&quot;&gt;Squarespace&lt;/a&gt; and &lt;a href=&quot;https://developers.cloudflare.com/bots/additional-configurations/managed-robots-txt/&quot; target=&quot;_blank&quot;&gt;Cloudflare&lt;/a&gt; both have solutions that appear to automatically add AI crawlers to a site’s robots.txt file.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/ai-bots-and-robots-txt/ai-crawler-directives-allsites.jpg&quot; alt=&quot;AI Crawler Directives for All Sites&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can explore this data more in these interactive visualizations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://public.tableau.com/app/profile/paul.calvano8666/viz/AICrawlerrobots_txtfileDirectivesforPopularSites/Sheet1?publish=yes&quot; target=&quot;_blank&quot;&gt;Popular Sites&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://public.tableau.com/views/AICrawlerrobots_txtfileDirectivesforAllSites/Sheet12?:language=en-US&amp;amp;:sid=&amp;amp;:redirect=auth&amp;amp;:display_count=n&amp;amp;:origin=viz_share_link&quot; target=&quot;_blank&quot;&gt;All Sites&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI crawlers and bots have become a significant source of synthetic traffic for many sites, and awareness has been growing over the last few years. More and more bots are being introduced, which makes it a challenge to keep up. It’s interesting to see how trends in the news cause major shifts in website strategy across the web. New technologies or features are often adopted at a gradual rate, so the appearance of AI bot user agents in so many websites’ robots.txt files over a short period of time reflects the general sentiment that site owners have towards the scraping of their content for training models.&lt;/p&gt;

&lt;p&gt;However, there isn’t uniformity in how AI bots are being treated across the web, especially when compared to the month-to-month consistency of the SEO and search agents. The most popular AI services’ bots appear in more robots.txt files because they have been noticed, and newer ones aren’t picked up as quickly. One can interpret this to show that although the intent to manage how these bots crawl websites is clear, the effectiveness for an individual site is limited by how well that site maintains its list of bots and crawlers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTTP Archive queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This section provides some details on how this analysis was performed, including SQL queries. Please be warned that some of the SQL queries process a significant amount of data, which can be very expensive to run.&lt;/p&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Number of sites containing robots.txt files&lt;/b&gt;&lt;/summary&gt;
   This query counts the number of websites that contain a robots.txt file. In order to ensure that we are counting an actual robots.txt file and not error pages, this query only counts robots.txt files that return an HTTP 200 status code and contain at least one of the following directives: allow, disallow, crawl_delay, noindex, sitemap or user_agent. 
  &lt;pre&gt;&lt;code&gt;
SELECT
 sites,
 sites_with_robots_txt,
 ROUND(sites_with_robots_txt / sites,4) AS pct_sites_with_robots_txt
FROM (
 SELECT
 COUNT(*) AS sites,
 COUNTIF(CAST(JSON_VALUE(custom_metrics.robots_txt.record_counts.by_type, &quot;$.allow&quot;) AS INT64)
   + CAST(JSON_VALUE(custom_metrics.robots_txt.record_counts.by_type, &quot;$.disallow&quot;) AS INT64)
   + CAST(JSON_VALUE(custom_metrics.robots_txt.record_counts.by_type, &quot;$.crawl_delay&quot;) AS INT64)
   + CAST(JSON_VALUE(custom_metrics.robots_txt.record_counts.by_type, &quot;$.noindex&quot;) AS INT64)
   + CAST(JSON_VALUE(custom_metrics.robots_txt.record_counts.by_type, &quot;$.sitemap&quot;) AS INT64) 
   + CAST(JSON_VALUE(custom_metrics.robots_txt.record_counts.by_type, &quot;$.user_agent&quot;) AS INT64) &amp;gt; 0)
   AS sites_with_robots_txt
 FROM `httparchive.crawl.pages` AS pages
 WHERE date = &quot;2025-07-01&quot;
 AND client = &quot;mobile&quot;
 AND CAST(JSON_VALUE(custom_metrics.robots_txt, &quot;$.status&quot;) AS INT64) = 200
 AND is_root_page = true
)
  
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Percent of Sites with User-Agents Appearing in robots.txt Files, by popularity rank&lt;/b&gt;&lt;/summary&gt;
   The HTTP Archive stores robots.txt information in a custom metrics object. In this SQL script, we’re UNNESTing each user-agent and then searching for its stats within the custom metric.
  &lt;pre&gt;&lt;code&gt;
CREATE TEMP FUNCTION GetByAgent(json STRING, agent STRING)
RETURNS STRING
LANGUAGE js AS r&quot;&quot;&quot;
 try {
   const obj = JSON.parse(json || '{}');
   const byua = (((obj || {}).record_counts || {}).by_useragent) || {};
   const body = byua[agent] || byua[String(agent).toLowerCase()] || byua[String(agent).toUpperCase()];
   return body ? JSON.stringify(body) : null;
 } catch (e) { return null; }
&quot;&quot;&quot;;


WITH robots_txt_ua AS (
 SELECT
   rank,
   page,
   agent,
   GetByAgent(TO_JSON_STRING(custom_metrics.robots_txt), agent) AS agent_obj
 FROM `httparchive.crawl.pages`,
 UNNEST(
   REGEXP_EXTRACT_ALL(
     TO_JSON_STRING(JSON_QUERY(custom_metrics.robots_txt, '$.record_counts.by_useragent')),
     r'&quot;([^&quot;]+)&quot;:\{'
   )
 ) AS agent
 WHERE date = &quot;2025-07-01&quot;
   AND client = &quot;mobile&quot;
   AND is_root_page = TRUE
),
robots_txt_rule_counts_by_ua AS (
 SELECT
   rank,
   page,
   LOWER(agent) AS agent,
   agent_obj,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.allow') AS INT64)        AS allow_cnt,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.crawl_delay') AS INT64)  AS crawl_delay_cnt,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.disallow') AS INT64)     AS disallow_cnt,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.noindex') AS INT64)      AS noindex_cnt,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.other') AS INT64)        AS other_cnt
 FROM robots_txt_ua
),
pages_in_rank_group AS (
 SELECT
   rank,
   COUNT(DISTINCT page) AS total_sites
 FROM `httparchive.crawl.pages`
 WHERE
   date = &quot;2025-07-01&quot;
   AND client = &quot;mobile&quot;
   AND is_root_page = TRUE
GROUP BY 1
ORDER BY 1
)

SELECT
 robots_txt_rule_counts_by_ua.rank,
 agent,
 total_sites,
 COUNT(DISTINCT page) AS sites
FROM robots_txt_rule_counts_by_ua
LEFT JOIN pages_in_rank_group
ON robots_txt_rule_counts_by_ua.rank = pages_in_rank_group.rank
GROUP BY 1,2,3
ORDER BY 1,4 DESC  
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;2023-2025 Trend of Sites with User-Agents Appearing in robots.txt Files&lt;/b&gt;&lt;/summary&gt;
   This query builds on top of the previous one, but runs against multiple dates.
   &lt;p /&gt;
   &lt;b&gt;Warning&lt;/b&gt;: As of July 2025, this SQL query processes approximately 330GB of data. Running multiple queries like this can be costly.
  &lt;pre&gt;&lt;code&gt;
CREATE TEMP FUNCTION GetByAgent(json STRING, agent STRING)
RETURNS STRING
LANGUAGE js AS r&quot;&quot;&quot;
 try {
   const obj = JSON.parse(json || '{}');
   const byua = (((obj || {}).record_counts || {}).by_useragent) || {};
   const body = byua[agent] || byua[String(agent).toLowerCase()] || byua[String(agent).toUpperCase()];
   return body ? JSON.stringify(body) : null;
 } catch (e) { return null; }
&quot;&quot;&quot;;


WITH robots_txt_ua AS (
 SELECT
   date,
   page,
   rank,
   agent,
   GetByAgent(TO_JSON_STRING(custom_metrics.robots_txt), agent) AS agent_obj
 FROM `httparchive.crawl.pages`,
 UNNEST(
   REGEXP_EXTRACT_ALL(
     TO_JSON_STRING(JSON_QUERY(custom_metrics.robots_txt, '$.record_counts.by_useragent')),
     r'&quot;([^&quot;]+)&quot;:\{'
   )
 ) AS agent
 WHERE
   date &amp;gt;= &quot;2023-01-01&quot;
   AND client = &quot;mobile&quot;
   AND is_root_page = TRUE
)

SELECT
 date,
 agent,
 COUNT(DISTINCT page) AS sites
FROM robots_txt_ua
GROUP BY 1,2
HAVING COUNT(DISTINCT page) &amp;gt; 100
ORDER BY 1,3 DESC  
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;AI Bot Directives&lt;/b&gt;&lt;/summary&gt;
    This query uses the &lt;a href=&quot;https://github.com/ai-robots-txt/ai.robots.txt&quot; target=&quot;_blank&quot;&gt;AI Robots.txt GitHub repository&lt;/a&gt; to classify User-Agents by AI service. It then counts the number of sites that include different directives and combinations of directives in their robots.txt files.
  &lt;pre&gt;&lt;code&gt;
CREATE TEMP FUNCTION GetByAgent(json STRING, agent STRING)
RETURNS STRING
LANGUAGE js AS r&quot;&quot;&quot;
 try {
   const obj = JSON.parse(json || '{}');
   const byua = (((obj || {}).record_counts || {}).by_useragent) || {};
   const body = byua[agent] || byua[String(agent).toLowerCase()] || byua[String(agent).toUpperCase()];
   return body ? JSON.stringify(body) : null;
 } catch (e) { return null; }
&quot;&quot;&quot;;


WITH bots AS (
 SELECT 'AddSearchBot' AS name,'Unclear at this time.' AS operator,'Unclear at this time.' AS respect_robotstxt,'AI Search Crawlers' AS `function`
 UNION ALL SELECT 'AI2Bot','Ai2','Yes','Content is used to train open language models.'
 UNION ALL SELECT 'Ai2Bot-Dolma','Ai2','Yes','Content is used to train open language models.'
 UNION ALL SELECT 'aiHitBot','aiHit','Yes','A massive, artificial intelligence/machine learning, automated system.'
 UNION ALL SELECT 'Amazonbot','Amazon','Yes','Service improvement and enabling answers for Alexa users.'
 UNION ALL SELECT 'Andibot','Andi','Unclear at this time','Search engine using generative AI, AI Search Assistant'
 UNION ALL SELECT 'anthropic-ai','Anthropic','Unclear at this time.','Scrapes data to train Anthropics AI products.'
 UNION ALL SELECT 'Applebot','Apple','Unclear at this time.','AI Search Crawlers'
 UNION ALL SELECT 'Applebot-Extended','Apple','Yes','Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.'
 UNION ALL SELECT 'Awario','Awario','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'bedrockbot','Amazon','Yes','Data scraping for custom AI applications.'
 UNION ALL SELECT 'bigsur.ai','Big Sur AI','Unclear at this time.','AI Assistants'
 UNION ALL SELECT 'Brightbot 1.0','Browsing.ai','Unclear at this time.','LLM/AI training.'
 UNION ALL SELECT 'Bytespider','ByteDance','No','LLM training.'
 UNION ALL SELECT 'CCBot','Common Crawl Foundation','Yes','Provides open crawl dataset, used for many purposes, including Machine Learning/AI.'
 UNION ALL SELECT 'ChatGPT Agent','OpenAI','Yes','AI Agents'
 UNION ALL SELECT 'ChatGPT-User','OpenAI','Yes','Takes action based on user prompts.'
 UNION ALL SELECT 'Claude-SearchBot','Anthropic','Yes','Claude-SearchBot navigates the web to improve search result quality.'
 UNION ALL SELECT 'Claude-User','Anthropic','Yes','Claude-User supports Claude AI users by fetching pages for questions.'
 UNION ALL SELECT 'Claude-Web','Anthropic','Unclear at this time.','Undocumented AI Agents'
 UNION ALL SELECT 'ClaudeBot','Anthropic','Yes','Scrapes data to train Anthropics AI products.'
 UNION ALL SELECT 'CloudVertexBot','Unclear at this time.','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'cohere-ai','Cohere','Unclear at this time.','Retrieves data to provide responses to user-initiated prompts.'
 UNION ALL SELECT 'cohere-training-data-crawler','Cohere','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'Cotoyogi','ROIS','Yes','AI LLM Scraper.'
 UNION ALL SELECT 'Crawlspace','Crawlspace','Yes','Scrapes data'
 UNION ALL SELECT 'Datenbank Crawler','Datenbank','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'Devin','Devin AI','Unclear at this time.','AI Assistants'
 UNION ALL SELECT 'Diffbot','Diffbot','At the discretion of Diffbot users.','Aggregates structured web data for monitoring and AI model training.'
 UNION ALL SELECT 'DuckAssistBot','Unclear at this time.','Unclear at this time.','AI Assistants'
 UNION ALL SELECT 'Echobot Bot','Echobox','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'EchoboxBot','Echobox','Unclear at this time.','Data collection to support AI-powered products.'
 UNION ALL SELECT 'FacebookBot','Meta/Facebook','Yes','Training language models'
 UNION ALL SELECT 'facebookexternalhit','Meta/Facebook','No','Ostensibly only for sharing, but likely used as an AI crawler as well'
 UNION ALL SELECT 'Factset_spyderbot','Factset','Unclear at this time.','AI model training.'
 UNION ALL SELECT 'FirecrawlAgent','Firecrawl','Yes','AI scraper and LLM training'
 UNION ALL SELECT 'FriendlyCrawler','Unknown','Yes','We are using the data from the crawler to build datasets for machine learning experiments.'
 UNION ALL SELECT 'Gemini-Deep-Research','Google','Unclear at this time.','AI Assistants'
 UNION ALL SELECT 'Google-CloudVertexBot','Google','Yes','Build and manage AI models for businesses employing Vertex AI'
 UNION ALL SELECT 'Google-Extended','Google','Yes','LLM training.'
 UNION ALL SELECT 'GoogleAgent-Mariner','Google','Unclear at this time.','AI Agents'
 UNION ALL SELECT 'GoogleOther','Google','Yes','Scrapes data.'
 UNION ALL SELECT 'GoogleOther-Image','Google','Yes','Scrapes data.'
 UNION ALL SELECT 'GoogleOther-Video','Google','Yes','Scrapes data.'
 UNION ALL SELECT 'GPTBot','OpenAI','Yes','Scrapes data to train OpenAIs products.'
 UNION ALL SELECT 'iaskspider/2.0','iAsk','No','Crawls sites to provide answers to user queries.'
 UNION ALL SELECT 'ICC-Crawler','NICT','Yes','Scrapes data to train and support AI technologies.'
 UNION ALL SELECT 'ImagesiftBot','ImageSift','Yes','Scrapes the internet for publicly available images.'
 UNION ALL SELECT 'img2dataset','img2dataset','Unclear at this time.','Scrapes images for use in LLMs.'
 UNION ALL SELECT 'ISSCyberRiskCrawler','ISS-Corporate','No','Scrapes data to train machine learning models.'
 UNION ALL SELECT 'Kangaroo Bot','Unclear at this time.','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'LinerBot','Unclear at this time.','Unclear at this time.','AI Assistants'
 UNION ALL SELECT 'meta-externalagent','Meta/Facebook','Yes','Used to train models and improve products.'
 UNION ALL SELECT 'meta-externalfetcher','Meta/Facebook','Unclear at this time.','AI Assistants'
 UNION ALL SELECT 'MistralAI-User','Mistral','Unclear at this time.','AI Assistants'
 UNION ALL SELECT 'MistralAI-User/1.0','Mistral AI','Yes','Takes action based on user prompts.'
 UNION ALL SELECT 'MyCentralAIScraperBot','Unclear at this time.','Unclear at this time.','AI data scraper'
 UNION ALL SELECT 'netEstate Imprint Crawler','netEstate','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'NovaAct','Unclear at this time.','Unclear at this time.','AI Agents'
 UNION ALL SELECT 'OAI-SearchBot','OpenAI','Yes','Search result generation.'
 UNION ALL SELECT 'omgili','Webz.io','Yes','Data is sold.'
 UNION ALL SELECT 'omgilibot','Webz.io','Yes','Data is sold.'
 UNION ALL SELECT 'OpenAI','OpenAI','Yes','Unclear at this time.'
 UNION ALL SELECT 'Operator','Unclear at this time.','Unclear at this time.','AI Agents'
 UNION ALL SELECT 'PanguBot','Huawei','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'Panscient','Panscient','Yes','Data collection and analysis using machine learning and AI.'
 UNION ALL SELECT 'panscient.com','Panscient','Yes','Data collection and analysis using machine learning and AI.'
 UNION ALL SELECT 'Perplexity-User','Perplexity','No','Used to answer queries at the request of users.'
 UNION ALL SELECT 'PerplexityBot','Perplexity','Yes','Search result generation.'
 UNION ALL SELECT 'PetalBot','Huawei','Yes','Used to provide recommendations in Hauwei assistant and AI search services.'
 UNION ALL SELECT 'PhindBot','phind','Unclear at this time.','AI-enhanced search engine.'
 UNION ALL SELECT 'Poseidon Research Crawler','Poseidon Research','Unclear at this time.','AI research crawler'
 UNION ALL SELECT 'QualifiedBot','Qualified','Unclear at this time.','Company offers AI agents and other related products.'
 UNION ALL SELECT 'QuillBot','QuillBot','Unclear at this time.','Company offers AI detection, writing tools and other services.'
 UNION ALL SELECT 'quillbot.com','QuillBot','Unclear at this time.','Company offers AI detection, writing tools and other services.'
 UNION ALL SELECT 'SBIntuitionsBot','SB Intuitions','Yes','Uses data gathered in AI development and information analysis.'
 UNION ALL SELECT 'Scrapy','Zyte','Unclear at this time.','Scrapes data for a variety of uses including training AI.'
 UNION ALL SELECT 'SemrushBot-OCOB','Semrush','Yes','Crawls your site for ContentShake AI tool.'
 UNION ALL SELECT 'SemrushBot-SWA','Semrush','Yes','Checks URLs on your site for SEO Writing Assistant.'
 UNION ALL SELECT 'Sidetrade indexer bot','Sidetrade','Unclear at this time.','Extracts data for a variety of uses including training AI.'
 UNION ALL SELECT 'Thinkbot','Thinkbot','No','Insights on AI integration and automation.'
 UNION ALL SELECT 'TikTokSpider','ByteDance','Unclear at this time.','LLM training.'
 UNION ALL SELECT 'Timpibot','Timpi','Unclear at this time.','Scrapes data for use in training LLMs.'
 UNION ALL SELECT 'VelenPublicWebCrawler','Velen Crawler','Yes','Scrapes data for business data sets and machine learning models.'
 UNION ALL SELECT 'WARDBot','WEBSPARK','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'Webzio-Extended','Unclear at this time.','Unclear at this time.','AI Data Scrapers'
 UNION ALL SELECT 'wpbot','QuantumCloud','Unclear at this time.','Live chat support and lead generation.'
 UNION ALL SELECT 'YaK','Meltwater','Unclear at this time.','AI-enabled consumer intelligence'
 UNION ALL SELECT 'YandexAdditional','Yandex','Yes','Scrapes/analyzes data for the YandexGPT LLM.'
 UNION ALL SELECT 'YandexAdditionalBot','Yandex','Yes','Scrapes/analyzes data for the YandexGPT LLM.'
 UNION ALL SELECT 'YouBot','You','Yes','Scrapes data for search engine and LLMs.'
),
robots_txt_ua AS (
 SELECT
   page,
   agent,
   GetByAgent(TO_JSON_STRING(custom_metrics.robots_txt), agent) AS agent_obj
 FROM `httparchive.crawl.pages`,
 UNNEST(
   REGEXP_EXTRACT_ALL(
     TO_JSON_STRING(JSON_QUERY(custom_metrics.robots_txt, '$.record_counts.by_useragent')),
     r'&quot;([^&quot;]+)&quot;:\{'
   )
 ) AS agent
 WHERE date = &quot;2025-07-01&quot;
   AND client = &quot;mobile&quot;
   AND is_root_page = TRUE
),
robots_txt_rule_counts_by_ua AS (
 SELECT
   page,
   LOWER(agent) AS agent,
   agent_obj,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.allow') AS INT64)        AS allow_cnt,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.crawl_delay') AS INT64)  AS crawl_delay_cnt,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.disallow') AS INT64)     AS disallow_cnt,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.noindex') AS INT64)      AS noindex_cnt,
   SAFE_CAST(JSON_VALUE(agent_obj, '$.other') AS INT64)        AS other_cnt
 FROM robots_txt_ua
)




SELECT
 agent,
 operator,
 respect_robotstxt,
 SUM(freq) AS sites,
 SUM(IF(directives=&quot;both allow and disallow&quot;, freq, NULL)) AS both,
 SUM(IF(directives=&quot;allow&quot;, freq, NULL)) AS allow,
 SUM(IF(directives=&quot;disallow&quot;, freq, NULL)) AS disallow,
 SUM(IF(directives=&quot;crawl_delay&quot;, freq, NULL)) AS crawl_delay,
 SUM(IF(directives=&quot;noindex&quot;, freq, NULL)) AS noindex
FROM (
 SELECT
   agent,
   operator,
   respect_robotstxt,
   CASE
     WHEN allow_cnt &amp;gt; 0 AND disallow_cnt &amp;gt; 0 THEN &quot;both allow and disallow&quot;
     WHEN allow_cnt &amp;gt; 0 THEN &quot;allow&quot;
     WHEN disallow_cnt &amp;gt; 0 THEN &quot;disallow&quot;
     WHEN crawl_delay_cnt &amp;gt; 0 THEN &quot;crawl_delay&quot;
     WHEN noindex_cnt &amp;gt; 0 THEN &quot;noindex&quot;
     ELSE &quot;other&quot;
   END as directives,
   COUNT(DISTINCT page) AS freq,
 FROM robots_txt_rule_counts_by_ua r
 JOIN bots b
 ON LOWER(r.agent) = LOWER(b.name)
 GROUP BY 1,2,3,4
 ORDER BY 5 DESC
)
GROUP BY 1,2,3
ORDER BY 4 DESC  
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html">There’s been a lot of discussion lately around AI crawlers and bots, which are used to train LLMs and/or fetch content on behalf of their users. In the past few weeks I’ve seen blog posts about the amount of traffic from these crawlers, techniques and products to control how and what they can crawl, reports of misbehaving crawlers and more. Ironically, there’s even AI based services to mitigate AI crawler bots! Given how much interest there is, I thought I’d try and explore some HTTP Archive data to see how sites are using robots.txt to state their preferences on AI crawling.</summary></entry><entry><title type="html">Discovering Third Party Performance Risks</title><link href="https://paulcalvano.com/2024-09-03-discovering-third-party-performance-risks/" rel="alternate" type="text/html" title="Discovering Third Party Performance Risks" /><published>2024-09-03T04:00:00+00:00</published><updated>2024-09-04T11:06:33+00:00</updated><id>https://paulcalvano.com/discovering-third-party-performance-risks</id><content type="html" xml:base="https://paulcalvano.com/2024-09-03-discovering-third-party-performance-risks/">&lt;p&gt;It likely comes as no surprise that third party content can be a significant contributor to slow loading websites and poor user experience. As performance engineers, we often need to find ways to balance requirements for their features with the strain that they can put on user experience. Unfortunately, for many sites this becomes a reaction to slowdowns and failures detected in production.&lt;/p&gt;

&lt;p&gt;I thought it might be interesting to attempt to identify third parties that could pose a performance risk, so that they could be proactively analyzed. That led to building a tool called &lt;a href=&quot;https://tools.paulcalvano.com/wpt-third-party-analysis/&quot;&gt;Third Party Explorer&lt;/a&gt;, which leverages &lt;a href=&quot;https://www.webpagetest.org&quot;&gt;WebpageTest&lt;/a&gt; data to help analyze a third party’s impact on a page load. The idea behind this tool is that some of the insights already collected during a WebPageTest measurement may enable you to prioritize a list of domains to evaluate proactively.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://tools.paulcalvano.com/wpt-third-party-analysis/&quot; loading=&quot;lazy&quot;&gt;&lt;img src=&quot;/assets/img/blog/discovering-third-party-performance-risks/third-party-explorer.jpg&quot; alt=&quot;Third Party Explorer&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Are Third Parties Slow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;According to the &lt;a href=&quot;https://almanac.httparchive.org&quot;&gt;Web Almanac&lt;/a&gt;, &lt;a href=&quot;https://almanac.httparchive.org/en/2022/third-parties#fig-1&quot;&gt;94% of websites&lt;/a&gt; utilize at least one third party domain, and the median popular site uses 43 third parties. Not all third parties impact performance in the same way, but all it takes is one poorly configured third party. Often when someone discovers third party issues, it’s due to availability, loading performance, and interactivity delays caused by them.&lt;/p&gt;

&lt;p&gt;When I think about how or whether a third party’s content may impact performance, I usually consider the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;When is the third party loaded?
    &lt;ul&gt;
      &lt;li&gt;Is it render blocking?&lt;/li&gt;
      &lt;li&gt;Does it load before important rendering metrics, such as First Contentful Paint or Largest Contentful Paint?&lt;/li&gt;
      &lt;li&gt;Are there gaps in loading first party content that correlate to the third party?&lt;/li&gt;
      &lt;li&gt;Are there gaps in loading other third party content that correlate to the third party?&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;How is it delivered?
    &lt;ul&gt;
      &lt;li&gt;How much content is served to the client?&lt;/li&gt;
      &lt;li&gt;What type of content is served to the client?&lt;/li&gt;
      &lt;li&gt;Is it using a Content Delivery Network to deliver resources?&lt;/li&gt;
      &lt;li&gt;Is it allowing the browser to cache static resources?&lt;/li&gt;
      &lt;li&gt;Are its resources compressed adequately?&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;What is it doing?
    &lt;ul&gt;
      &lt;li&gt;How much CPU time is used?&lt;/li&gt;
      &lt;li&gt;Does it result in excessive long tasks?&lt;/li&gt;
      &lt;li&gt;Does it result in excessive requests?&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you go down that list and answer these questions for a particular domain, you are likely to start forming an opinion of whether that third party could be a performance risk. It’s important to note that your analysis does not stop here, but rather it’s just beginning. If you suspect that a particular third party is a performance risk, testing, validating and ongoing monitoring should come after discovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analyzing Third Party Performance in WebPageTest&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you run a measurement via &lt;a href=&quot;https://www.webpagetest.org/&quot;&gt;WebPageTest&lt;/a&gt;, the existing reports can help you quickly identify some important third parties to analyze. For example, the graph below is a “Connection View”, which I find helpful when looking for gaps that correlate to the loading of a particular third party. Dark shaded bars also indicate data transfer, so it may be easy to spot third parties that have an excessive amount of darker shades in this view. Looking at &lt;a href=&quot;https://www.webpagetest.org/result/240630_AiDcQX_6FH/&quot;&gt;this example&lt;/a&gt;, there are no gaps in the loading of first party content, most of the third party content is loading after LCP, and there are a few third parties that appear to be loading a lot of content.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/discovering-third-party-performance-risks/wpt-connection-view.jpg&quot; alt=&quot;WebPageTest Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If this type of analysis is unfamiliar, then I highly recommend checking out &lt;a href=&quot;https://twitter.com/TheRealNooshu&quot;&gt;Matt Hobbs’s&lt;/a&gt; comprehensive &lt;a href=&quot;https://nooshu.com/blog/2020/12/31/how-to-run-a-webpagetest-test/&quot;&gt;guide to&lt;/a&gt; WebPageTest!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There are other interesting reports in WebPageTest that can be useful for analyzing third party content. For example, the “Domains Breakdown” report shows a summary of domains loaded on the site, as well as their request and byte count. The “Performance Optimization Overview” report shows a report card for each request, indicating whether a few performance best practices are met. And the “Opportunities &amp;amp; Experiments” section provides a comprehensive summary of some performance issues that were identified, with the option of running experiments using &lt;a href=&quot;https://product.webpagetest.org/experiments&quot;&gt;WebPageTest Pro&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third Party Explorer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While WebPageTest provides a great deal of insight into third parties, sometimes it can be challenging to get a sense of which third parties to focus on. I wanted to build on top of the Domain Breakdown feature by creating a dashboard view that enables you to dive deeper. Fortunately all of this data exists in WebPageTest measurement results, so no further instrumentation was needed.&lt;/p&gt;

&lt;p&gt;After navigating to the &lt;a href=&quot;https://tools.paulcalvano.com/wpt-third-party-analysis/&quot;&gt;Third Party Explorer&lt;/a&gt; tool, enter a WebPageTest test result URL and click submit. This will download and parse the results from your WebPageTest measurement.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/discovering-third-party-performance-risks/enter-wpt-url.jpg&quot; alt=&quot;WebPageTest Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;
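
&lt;p&gt;Under the hood, the tool simply works from the JSON that WebPageTest exposes for each test. If you want to poke at the same data yourself, something like the sketch below works, assuming curl and jq are installed. The jsonResult.php endpoint and the requests parameter are part of WebPageTest’s API; the exact jq paths are an assumption and may need adjusting for your WebPageTest version.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Fetch the raw JSON for a WebPageTest result. The test ID below is the
# example linked earlier in this post - substitute your own.
TEST_ID=&quot;240630_AiDcQX_6FH&quot;
curl -s &quot;https://www.webpagetest.org/jsonResult.php?test=${TEST_ID}&amp;amp;requests=1&quot; -o result.json

# Rough per-host byte totals. The field names and structure can differ
# between WebPageTest versions, so adjust the jq paths to match your payload.
jq -r '.data.median.firstView.requests[] | &quot;\(.host) \(.bytesIn)&quot;' result.json \
  | awk '{bytes[$1]+=$2} END {for (h in bytes) printf &quot;%10d  %s\n&quot;, bytes[h], h}' \
  | sort -rn | head -20
&lt;/code&gt;&lt;/pre&gt;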

&lt;p&gt;Once the test results are parsed, you can see a summary of the domains, including potential performance and SPOF risks. &lt;em&gt;Note that I emphasize “potential” since you’ll really need to test your site to determine whether they are truly an issue.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/discovering-third-party-performance-risks/request-summary.jpg&quot; alt=&quot;WebPageTest Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Scroll down a little further, and you’ll see a list of domains. For each domain you can see the number of KB or requests loaded, and filter the results by content type. Each column is sortable. In the table you can see a lot of information (you may need to scroll to the right), such as:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Whether the domain is using a CDN&lt;/li&gt;
  &lt;li&gt;Number of requests and total bytes transferred&lt;/li&gt;
  &lt;li&gt;Render blocking content&lt;/li&gt;
  &lt;li&gt;A summary of third party requests or bytes between specific time ranges, such as before FCP, between FCP and LCP, etc.&lt;/li&gt;
  &lt;li&gt;A summary of requests that have not been compressed&lt;/li&gt;
  &lt;li&gt;Requests grouped by cache TTLs&lt;/li&gt;
  &lt;li&gt;Requests grouped by CPU overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;The two checkboxes in this table are for you to use. They are prepopulated with some suggestions based on the results, and allow you to note which domains to follow up on. You can check and uncheck domains as you work through your analysis, and then review the flagged ones later.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/discovering-third-party-performance-risks/domain-summary.jpg&quot; alt=&quot;WebPageTest Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For example, sorting through these results, I can see that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Google Tag Manager and TikTok both have high CPU execution times, and load a lot of scripts between LCP and Page Load.&lt;/li&gt;
  &lt;li&gt;The domain cdn.pdst.fm is loading 22KB of JavaScript, which is not compressed.&lt;/li&gt;
  &lt;li&gt;The domain js.adsrvr.org is render blocking, but loads towards the end of the HTML body.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While it seems like we’re in pretty good shape, there are still a few things we could improve on here. When I started building this tool there were a few more third parties in this list - some with significant performance issues such as inadequate compression levels, large JavaScript payloads that were not cached on a CDN, and cacheable content delivered via S3 instead of a CDN. Fortunately the third parties acted on feedback I shared with them and addressed the issues. I’ve started using this tool to proactively assess the performance impact of third parties prior to integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are Perf Risks and SPOF Risks?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the domain summary, you’ll find two checkboxes next to each domain, labeled “Perf Risk” and “SPOF Risk”. If you find one checked for a domain, it doesn’t necessarily mean that it’s a problem - but that it’s worth reviewing the domain to determine whether it is. I used a simple rubric based on my own experiences/opinions for determining which domains to label as a performance risk or a single point of failure risk:&lt;/p&gt;

&lt;p&gt;You’ll find SPOF risk checked if the domain has at least one render blocking request.&lt;/p&gt;

&lt;p&gt;You’ll find Perf risk checked if:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Overall
    &lt;ul&gt;
      &lt;li&gt;A domain has at least one render blocking request&lt;/li&gt;
      &lt;li&gt;A domain has more than 10KB of text based content that is not compressed with gzip, brotli or zstd&lt;/li&gt;
      &lt;li&gt;A domain delivers more than 20KB of content (excluding XHRs) without a CDN&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Before LCP
    &lt;ul&gt;
      &lt;li&gt;A domain loads more than 30KB of content&lt;/li&gt;
      &lt;li&gt;A domain loads any requests (excluding XHRs) that are not cacheable, have no cache policy or have a TTL of 0s&lt;/li&gt;
      &lt;li&gt;A domain’s scripts use 30ms or more of CPU time (compile + evaluate + execute)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Between LCP and Page Load
    &lt;ul&gt;
      &lt;li&gt;A domain loads more than 50KB of content&lt;/li&gt;
      &lt;li&gt;A domain loads requests (excluding XHRs) that are not cacheable, have no cache policy or have a TTL of 0s&lt;/li&gt;
      &lt;li&gt;A domain’s scripts use 50ms or more of CPU time (compile + evaluate + execute)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;After Page Load
    &lt;ul&gt;
      &lt;li&gt;A domain loads more than 100KB of content&lt;/li&gt;
      &lt;li&gt;A domain’s scripts use 100ms or more of CPU time (compile + evaluate + execute)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feel free to use the checkboxes to select or deselect domains that you feel are not applicable to your site. And if you have ideas for a better rubric for third party performance risks, I’m open to suggestions!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bringing it Back to WebPageTest for a Deeper Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we’ve reviewed how to use the tool, go run a WebPageTest measurement and see what you can find. Once you have some third parties to investigate, there’s a few things you can try:&lt;/p&gt;

&lt;p&gt;At the bottom of the Third Party Explorer tool you will find sections for “SPOF Evaluation” and “Performance Evaluation”. If you use the checkboxes in the tool to mark up which third parties you are concerned about, then you can use these sections to launch a SPOF test via WebPageTest or to populate a list of domains to block. This will enable you to run WebPageTest measurements to see what your user experience would be like if the third party failed or if it was removed.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/discovering-third-party-performance-risks/evaluation.jpg&quot; alt=&quot;WebPageTest Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you subscribe to WebPageTest Pro, check out the list of “Opportunities &amp;amp; Experiments” and try to see what optimizing a particular third party might do. From there you can experiment with blocking third parties, running them as first party content, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Third party performance is no doubt a significant cause of poor user experience on the web. While it can be challenging to identify poorly performing third parties proactively, attempting to do so during the evaluation stage of implementing one can prove to be mutually beneficial for both you and the third party.&lt;/p&gt;

&lt;p&gt;There’s no shortage of great tools out there that will help you identify when a third party is slowing down your site. The goal of this article was to help you proactively assess whether a third party meets a criteria that merits some further review and analysis.&lt;/p&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html">It likely comes as no surprise that third party content can be a significant contributor to slow loading websites and poor user experience. As performance engineers, we often need to find ways to balance requirements for their features with the strain that they can put on user experience. Unfortunately, for many sites this becomes a reaction to slowdowns and failures detected in production.</summary></entry><entry><title type="html">Third Parties and Certificate Revocations</title><link href="https://paulcalvano.com/2024-08-04-third-parties-and-certificate-revocations/" rel="alternate" type="text/html" title="Third Parties and Certificate Revocations" /><published>2024-08-04T04:00:00+00:00</published><updated>2024-08-05T03:12:24+00:00</updated><id>https://paulcalvano.com/third-parties-and-certificate-revocations</id><content type="html" xml:base="https://paulcalvano.com/2024-08-04-third-parties-and-certificate-revocations/">&lt;p&gt;On Monday July 29th, DigiCert &lt;a href=&quot;https://www.digicert.com/support/certificate-revocation-incident&quot;&gt;announced&lt;/a&gt; the need to revoke a large number of certificates due to a bug in domain validation. The CA/B Forum’s &lt;a href=&quot;https://cabforum.org/working-groups/server/baseline-requirements/documents/&quot;&gt;strict requirements&lt;/a&gt; to revoke these certificates within 24 hours resulted in a pretty busy Monday and Tuesday for a lot of folks. For some others, the deadline was moved to August 3rd due to exceptional circumstances. What remained a mystery was how many sites and third parties would be affected, how many would be prepared in time and what the impact of a mass revocation might look like across the web. In this blog post we’ll use the &lt;a href=&quot;https://httparchive.org/&quot;&gt;HTTP Archive&lt;/a&gt; to explore the impact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which hostnames were affected?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=1910322&quot;&gt;bugzilla&lt;/a&gt; post, DigiCert shared a list of 86k serial numbers for certificates that were impacted and needed to be revoked. The list did not include hostnames, so it wasn’t easy to see which domains would be affected from the files alone. The &lt;a href=&quot;https://httparchive.org/&quot;&gt;HTTP Archive&lt;/a&gt; contains details for every certificate it encounters, so I was able to write some queries to correlate the list of serial numbers with the certificates used across 16 million websites. &lt;i&gt;If you are interested in the queries used to do this analysis, you’ll find them at the end of this post.&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;In my analysis, I found 13,823 of the certificate serial numbers on publicly available web pages during last month’s HTTP Archive crawl. Many of these were first party resources, but a few hundred hostnames belonged to popular third parties. Overall I found that &lt;b&gt;1,241,943 websites would have been impacted by this revocation in some way&lt;/b&gt;, meaning they either made a first or third party request for a resource that used at least one of the affected certificates! Here’s a list of some of the more popular domains that were affected. The list contains an apex domain, the number of sites requesting resources from it, and the number of impacted subdomains (containing certificates needing to be revoked).&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td colspan=&quot;3&quot; align=&quot;center&quot;&gt;&lt;b&gt;Third Parties with Certificates Affected by DigiCert Revocations&lt;/b&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Domain&lt;/td&gt;
   &lt;td&gt;Number of Sites&lt;/td&gt;
   &lt;td&gt;Number of Hostnames&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;yahoo.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;467,827&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;49&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;rubiconproject.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;387,241&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;20&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;fontawesome.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;299,959&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;10&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;pinterest.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;281,145&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;57&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;taboola.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;133,309&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;43&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;pinimg.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;91,573&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;14&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;ib-ibi.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;60,557&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;snapchat.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;49,390&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;15&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;advertising.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;41,815&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;datadoghq-browser-agent.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;35,404&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;sift.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;13,962&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;scdn.co&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;10,949&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;usonar.jp&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;7,402&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;4&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;sojern.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;7,371&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;4&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;olark.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;6,966&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Looking at the most popular third party domains impacted by this revocation, you can see that many of them reissued their certificates on July 30th based on the validity dates. The initial deadline to reissue certificates was July 30th at 19:30 UTC.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td colspan=&quot;3&quot; align=&quot;center&quot;&gt;&lt;b&gt;Third Party Hostnames Affected by DigiCert Revocation&lt;/b&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;strong&gt;Host&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;ValidFrom&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;ValidTo&lt;/strong&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;pixel.rubiconproject.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Apr 3 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;pr-bh.ybp.yahoo.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Jan 22 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;ups.analytics.yahoo.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Jan 22 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;kit.fontawesome.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Jan 27 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;token.rubiconproject.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Apr 3 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;eus.rubiconproject.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Apr 3 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;pixel-us-east.rubiconproject.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Apr 3 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;secure-assets.rubiconproject.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Apr 3 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;ct.pinterest.com&lt;/td&gt;
   &lt;td&gt;Aug 2 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Aug 7 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;cms.analytics.yahoo.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Jan 22 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;fastlane.rubiconproject.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Apr 3 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;ka-p.fontawesome.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Jan 27 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;log.pinterest.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Aug 7 23:59:59 2024 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;s.pinimg.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Aug 7 23:59:59 2024 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;assets.pinterest.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Aug 7 23:59:59 2024 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;sync.taboola.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Dec 31 23:59:59 2024 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;prebid-server.rubiconproject.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Apr 3 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;global.ib-ibi.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Apr 2 23:59:59 2025 GMT&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;cdn.taboola.com&lt;/td&gt;
   &lt;td&gt;Jul 30 00:00:00 2024 GMT&lt;/td&gt;
   &lt;td&gt;Dec 31 23:59:59 2024 GMT&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;After the final deadline of August 3rd, 2024 19:30 UTC had passed, I ran a test against the list of 13,823 hostnames I discovered. I found that 78.51% of them had reissued their certificates prior to the initial deadline or switched to using certificates not subject to revocation. Another 5.3% of the hostnames reissued their certificates during the extension. However 9.3% of hostnames - 1,291 - had failed to get their certificates reissued and were revoked on Aug 3rd. Since then, 156 hostnames have reissued - but there are still 1,135 (8.21%) hostnames delivering a revoked certificate!&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td colspan=&quot;3&quot; align=&quot;center&quot;&gt;&lt;b&gt;Validity Start Dates for Affected Certificates&lt;/b&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;strong&gt;Certificate Validity Date&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;Certificates&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;Percent of Certificates&lt;/strong&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Jul 29 2024&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;121&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.88%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Jul 30 2024&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;9,680&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;70.03%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Jul 31 2024&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;977&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;7.07%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Aug 1 2024&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;426&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;3.08%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Aug 2 2024&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;267&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1.93%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Aug 3 2024&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;75&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.54%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Aug 4 2024&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;90&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.65%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;  
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Using Different Cert&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1,052&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;7.61%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;Not Updated&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1,135&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;8.21%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;When the certificates were revoked, only two major third parties were affected. One was a security service used by approximately 14k sites. The other was a live chat system used by ~200 sites. Fortunately the failure of those third parties did not impact the functionality of the sites, and they have since reissued their certificates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitoring for third party failures&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Often we don’t have the liberty of advance notice of impending failures - such as the recent CrowdStrike outages, CDN failures, and other major internet platform incidents. In this case, we had at least 24 hours notice that a massive certificate revocation event would occur (and then a few additional days after the extension).&lt;/p&gt;

&lt;p&gt;Using the HTTP Archive data I could see that none of the third parties used by my employer were impacted. However to be absolutely certain I configured a Catchpoint dashboard to monitor for third party availability issues. This dashboard displays the % availability for each third party host, the number of failures for each third party, and some load time metrics. The idea was that if a particular third party we use experienced an issue, we’d be able to identify it quickly.&lt;/p&gt;

&lt;p&gt;The dashboard was created by using a line chart broken down by host. Enabling “host data” allowed me to chart some host metrics such as availability and number of failures, as well as exclude first party content. You can see some more details on how to do this in this &lt;a href=&quot;https://www.catchpoint.com/blog/how-to-filter-out-the-noise-with-zones-and-hosts-a-catchpoint-differentiator&quot;&gt;blog post from Catchpoint&lt;/a&gt;.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-certificate-revocations/catchpoint-1.jpg&quot; width=&quot;200&quot; alt=&quot;Catchpoint dashboard configuration screenshot&quot; loading=&quot;lazy&quot; /&gt;&lt;/td&gt;
   &lt;td&gt;&lt;img src=&quot;/assets/img/blog/third-parties-and-certificate-revocations/catchpoint-2.jpg&quot; width=&quot;200&quot; alt=&quot;Catchpoint dashboard configuration screenshot&quot; loading=&quot;lazy&quot; /&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;You may ask why not use real user monitoring (RUM) data for this? RUM can give you timing information on third party requests and additional metrics if the third party sets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Timing-Allow-Origin&lt;/code&gt; header. It’s great for detecting performance issues related to third party content. However, detecting failures in loading third party resources is not as easy since a failure simply won’t show up in resource timing data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preparing for Third Party Failures&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a popular third party fails or degrades, sometimes you’ll read about it in the news, especially if it breaks functionality on a large number of websites. Far too often organizations handle third party performance/failure risks reactively. There are a few things you can do to prepare though.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Identify third party single points of failure (SPOFs); a local simulation sketch follows this list.
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://product.webpagetest.org/tutorials/how-to-simulate-a-single-point-of-failure-spof-using-webpagetest&quot;&gt;WebPageTest’s SPOF feature&lt;/a&gt; is great for this!&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Identify third party performance risks.
    &lt;ul&gt;
      &lt;li&gt;Test to see what happens when you block or remove their resources. (&lt;a href=&quot;https://andydavies.me/blog/2018/02/19/using-webpagetest-to-measure-the-impact-of-3rd-party-tags/&quot;&gt;WebPageTest&lt;/a&gt; or &lt;a href=&quot;https://developer.chrome.com/docs/devtools/network-request-blocking&quot;&gt;Chrome&lt;/a&gt;)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Identify third parties that might impact functionality on your site.
    &lt;ul&gt;
      &lt;li&gt;Test to see what happens when you block or remove their resources.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Monitor third party performance and availability.
    &lt;ul&gt;
      &lt;li&gt;A combination of RUM for performance and Synthetic measurements for availability can be helpful here.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
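
&lt;p&gt;As a rough local complement to WebPageTest’s SPOF feature, you can blackhole a third party hostname yourself and watch how your site behaves. The sketch below is just one way to do it, using Chrome’s host resolver mapping flag; the hostnames are placeholders, and mapping to a non-routable address makes requests hang rather than fail fast, which is closer to a real SPOF.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Rough local SPOF simulation (WebPageTest's SPOF feature is the more
# realistic option). The hostnames below are placeholders. 10.255.255.1
# is a non-routable address, so requests to the mapped host should hang
# instead of failing fast.
# Use google-chrome, chromium, or whatever Chrome binary your system has.
google-chrome --user-data-dir=/tmp/spof-test \
  --host-resolver-rules=&quot;MAP static.example-thirdparty.com 10.255.255.1&quot; \
  &quot;https://www.your-site.example/&quot;
&lt;/code&gt;&lt;/pre&gt;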

&lt;p&gt;I’ve also been working on a tool that will help identify potential third parties that are worth investigating for performance or single point of failure risks. Hoping to share that with you all very soon!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While 86k certificates may not sound like a huge amount compared to the scale of the web, the way those certificates were used across some very popular third parties could have impacted over a million websites. There’s been a lot of negativity about DigiCert regarding this, but I have a lot of empathy for what they’ve been dealing with this past week. It was no doubt frustrating for folks to frantically update certificates. This could have been incredibly disruptive to a large part of the web due to third party failures, but it wasn’t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTTP Archive queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This section provides some details on how this analysis was performed, including SQL queries and commands for testing the certificates. Please be warned that some of the SQL queries process a significant amount of data - which can be very expensive to run (particularly the first one).&lt;/p&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Extract certificate serial numbers from HTTP Archive&lt;/b&gt;&lt;/summary&gt;
   The list of affected certificates that DigiCert provided included the serial numbers, but not a hostname.  In order to identify the hostnames, this query was written to collect serial numbers from all DigiCert certificates found in the HTTP Archive requests table. The approach involved base64 decoding the certificate, converting it to bytes and extracting the substring where the serial number exists. This is a bit of a hack, but it worked!
   &lt;p&gt;&amp;nbsp;&lt;/p&gt;
   &lt;b&gt;Warning&lt;/b&gt;: this query processes 13 TB of data, which is much higher than the 1 TB free tier. The results for it have been saved in another BigQuery table for analysis: `httparchive.scratchspace.2024_07_01_cert_serials`.
  &lt;pre&gt;&lt;code&gt;

CREATE TEMPORARY FUNCTION extractCertHex(cert_block STRING)
RETURNS STRING
LANGUAGE js AS '''
   // Extract the Base64 content between the certificate tags
   const base64Match = cert_block.match(/-----BEGIN CERTIFICATE-----\\s*([A-Za-z0-9+/=\\s]+)\\s*-----END CERTIFICATE-----/);
   if (!base64Match) {
       return 'Invalid Certificate Block';
   }
   const base64Content = base64Match[1].replace(/\\s+/g, '');

   // Base64 decode
   function base64ToBytes(base64) {
       const chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
       let bytes = [];
       let buffer = 0, bits = 0;

       for (let i = 0; i &amp;lt; base64.length; i++) {
           if (base64[i] === '=') break;
           const val = chars.indexOf(base64[i]);
           buffer = (buffer &amp;lt;&amp;lt; 6) | val;
           bits += 6;
           if (bits &amp;gt;= 8) {
               bits -= 8;
               bytes.push((buffer &amp;gt;&amp;gt; bits) &amp;amp; 0xFF);
           }
       }

       return bytes;
   }

   // Decode the Base64 content
   const cert_der = base64ToBytes(base64Content);

   // Convert DER to hexadecimal
   let cert_hex = '';
   for (let i = 0; i &amp;lt; cert_der.length; i++) {
       const hex = cert_der[i].toString(16).padStart(2, '0');
       cert_hex += hex;
   }

   return cert_hex;
''';

SELECT
 DISTINCT NET.HOST(url) AS host,
 SUBSTR(extractCertHex(JSON_EXTRACT_SCALAR(payload, &quot;$._certificates[0]&quot;)),31,32) AS serial_num
FROM 
   `httparchive.requests.2024_07_01_mobile`
WHERE
 JSON_EXTRACT_SCALAR(payload, &quot;$._securityDetails.issuer&quot;) LIKE &quot;%DigiCert%&quot;
 AND JSON_EXTRACT_SCALAR(payload, &quot;$._certificates[0]&quot;) IS NOT NULL

  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Identify hostnames subject to revocation&lt;/b&gt;&lt;/summary&gt;
   In order to identify the hostnames subject to revocation, I uploaded a copy of the revocation serial numbers to a table: `httparchive.scratchspace.digicert_revocation_20240730`.  Then I performed a simple `INNER JOIN` on the output from the previous query to identify hostnames that had a serial number in the revocation list.
  &lt;pre&gt;&lt;code&gt;
SELECT DISTINCT 
   host, 
   d.serial AS serial
FROM 
   `httparchive.scratchspace.2024_07_01_cert_serials` ha
INNER JOIN 
   `httparchive.scratchspace.digicert_revocation_20240730` as d
ON ha.serial_num = d.serial
WHERE 
   d.serial IS NOT NULL  
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Summarize popular third party domains that had hostnames impacted by the revocation&lt;/b&gt;&lt;/summary&gt;
   This query summarizes domain names from the requests table by the number of sites loading a resource from each domain, and the number of hostnames that appeared in DigiCert's list of revoked certificates. The previous query is used in the `IN()` clause of this query.
  &lt;pre&gt;&lt;code&gt;
SELECT 
   NET.REG_DOMAIN(url) domain, 
   COUNT(DISTINCT page) sites, 
   COUNT(DISTINCT NET.HOST(url)) hostnames
FROM 
   `httparchive.all.requests` AS r
WHERE 
  date = &quot;2024-07-01&quot;
  AND client = &quot;mobile&quot;
  AND is_root_page = true
  AND NET.HOST(url) IN (
    SELECT DISTINCT host
    FROM `httparchive.scratchspace.2024_07_01_cert_serials` ha
    INNER JOIN `httparchive.scratchspace.digicert_revocation_20240730` as d 
    ON ha.serial_num = d.serial
    WHERE d.serial IS NOT NULL    
  )
GROUP BY 1
ORDER BY 2 DESC

  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Summarize popular third party hostnames impacted by the revocation&lt;/b&gt;&lt;/summary&gt;
   This query is similar to the previous one, but summarizes by hostname instead of domain name.
  &lt;pre&gt;&lt;code&gt;
SELECT 
   NET.HOST(url) hostname, 
   COUNT(DISTINCT page) sites
FROM 
   `httparchive.all.requests` AS r
WHERE 
  date = &quot;2024-07-01&quot;
  AND client = &quot;mobile&quot;
  AND is_root_page = true
  AND NET.HOST(url) IN (
    SELECT DISTINCT host
    FROM `httparchive.scratchspace.2024_07_01_cert_serials` ha
    INNER JOIN `httparchive.scratchspace.digicert_revocation_20240730` as d 
    ON ha.serial_num = d.serial
    WHERE d.serial IS NOT NULL    
  )
GROUP BY 1
ORDER BY 2 DESC
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Bash Script to test certificates for affected hosts&lt;/b&gt;&lt;/summary&gt;
   This script will loop through a list of hostnames and extract the validity dates and serial numbers for each certificate, timing out after 3 seconds if the host is unresponsive.
  &lt;pre&gt;&lt;code&gt;
for i in $(cat all_hosts.txt); do 
  timeout 3 echo | openssl s_client -connect &quot;$i:443&quot; 2&amp;gt;/dev/null | 
  openssl x509 -noout -startdate -enddate -serial 2&amp;gt;/dev/null | 
  awk -F= -v host=&quot;$i&quot; '
    /^notBefore/ { start = $2 } 
    /^notAfter/  { end = $2 } 
    /^serial/    { serial = $2 } 
    END { 
      print host &quot;,&quot; start &quot;,&quot; end &quot;,&quot; serial 
    }'
done &amp;gt; all_hosts_checked_20240803_1930UTC.csv
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html">On Monday July 29th, DigiCert announced the need to revoke a large number of certificates due to a bug in domain validation. The CA/B Forum’s strict requirements to revoke these certificates within 24 hours resulted in a pretty busy Monday and Tuesday for a lot of folks. For some others, the deadline was moved to August 3rd due to exceptional circumstances. What remained a mystery was how many sites and third parties would be affected, how many would be prepared in time and what the impact of a mass revocation might look like across the web. In this blog post we’ll use the HTTP Archive to explore the impact.</summary></entry><entry><title type="html">Choosing Between gzip, Brotli and zStandard Compression</title><link href="https://paulcalvano.com/2024-03-19-choosing-between-gzip-brotli-and-zstandard-compression/" rel="alternate" type="text/html" title="Choosing Between gzip, Brotli and zStandard Compression" /><published>2024-03-19T04:00:00+00:00</published><updated>2024-03-19T14:18:39+00:00</updated><id>https://paulcalvano.com/choosing-between-gzip-brotli-and-zstandard-compression</id><content type="html" xml:base="https://paulcalvano.com/2024-03-19-choosing-between-gzip-brotli-and-zstandard-compression/">&lt;p&gt;HTTP compression is a mechanism that allows a web server to deliver text based content using less bytes, and it’s been supported on the web for a very long time. In fact the first web browser to support gzip compression was &lt;a href=&quot;https://web.archive.org/web/19990824044001/https://www.desy.de/web/mosaic/help-on-version-2.1.html&quot;&gt;NCSA Mosaic v2.1&lt;/a&gt; way back in 1993! The web has obviously come a long way since then, but today pretty much every web server and browser still supports gzip compression.&lt;/p&gt;

&lt;p&gt;In recent years, new and innovative compression methods have gained browser support. One in particular that has achieved widespread adoption is &lt;a href=&quot;https://github.com/google/brotli&quot;&gt;Brotli&lt;/a&gt;. First supported in Chrome 50 (2016), it was &lt;a href=&quot;https://caniuse.com/brotli&quot;&gt;supported&lt;/a&gt; by all modern browsers a year later. Brotli is able to compress files much smaller than gzip, albeit with a higher computational overhead. Based on HTTP Archive data from January 2024, Brotli is actually used more than gzip for JavaScript and CSS!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/compression_by_content_type.jpg&quot; alt=&quot;Compression by Content Type&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Facebook’s &lt;a href=&quot;https://facebook.github.io/zstd/&quot;&gt;zStandard&lt;/a&gt; compression is another promising new method which aims to serve smaller payloads compared to gzip while also being faster. zStandard was added to the &lt;a href=&quot;https://www.iana.org/assignments/http-parameters/http-parameters.xml#content-coding&quot;&gt;IANA’s list of HTTP content encodings&lt;/a&gt; with an identifier of “zstd”, and support for it was &lt;a href=&quot;https://chromestatus.com/feature/6186023867908096&quot;&gt;added to Chrome in version 123&lt;/a&gt;, which was released this month. Facebook recently shared some &lt;a href=&quot;https://facebook.github.io/zstd/#benchmarks&quot;&gt;benchmarks&lt;/a&gt; that show it performing significantly faster than gzip.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/zStandard_benchmarks.jpg&quot; alt=&quot;zStandard Benchmarks&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Beyond this, there’s also &lt;a href=&quot;https://developer.chrome.com/blog/shared-dictionary-compression&quot;&gt;shared dictionaries&lt;/a&gt; for Brotli and zStandard, which have the potential to significantly reduce byte sizes.&lt;/p&gt;

&lt;p&gt;While it may take some time for browsers, web servers and CDNs to catch up, it’s worth pondering which compression method is right for your content. A few years ago I wrote a &lt;a href=&quot;https://paulcalvano.com/2018-07-25-brotli-compression-how-much-will-it-reduce-your-content/&quot;&gt;blog post about Brotli compression&lt;/a&gt; as well as a tool to help you determine how Brotli could compress your content relative to gzip. I’ve updated this tool to include zStandard compression as well as show the relative latency incurred at each compression level. You can find the new/updated tool at &lt;a href=&quot;https://tools.paulcalvano.com/compression-tester/&quot;&gt;https://tools.paulcalvano.com/compression-tester/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://tools.paulcalvano.com/compression-tester/&quot; loading=&quot;lazy&quot;&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/compression_tester.jpg&quot; alt=&quot;Compression Tester&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How HTTP Compression Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a client makes an HTTP request, it includes an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Accept-Encoding&lt;/code&gt; header to advertise the compression encodings it can support. The web server then selects one of the advertised encodings that it also supports and serves a compressed response with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Content-Encoding&lt;/code&gt; header to indicate which compression was used.&lt;/p&gt;

&lt;p&gt;In the example below, the client advertised support for gzip, Brotli, and Deflate compression. The server returned a gzip compressed response containing a text/html document.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    GET / HTTP/2
    Host: httparchive.org
    Accept-Encoding: gzip, deflate, br

    HTTP/2 200
    content-type: text/html; charset=utf-8
    content-encoding: gzip
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If a client sends multiple encodings in its Accept-Encoding header, then the server will have to choose one. For example, if I send an HTTP request to Facebook’s homepage and advertise support for gzip, Brotli and zStandard - their server chooses to deliver the response via zStandard.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    GET / HTTP/2
    Host: www.facebook.com
    accept-encoding: gzip, deflate, br, zstd
    
    HTTP/2 200 
    content-encoding: zstd
    content-type: text/html; charset=&quot;utf-8&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
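
&lt;p&gt;You can reproduce this negotiation from the command line by varying the Accept-Encoding request header and checking which Content-Encoding comes back. The sketch below uses HEAD requests for brevity; some servers handle HEAD differently than GET, so treat it as a rough check rather than a definitive answer.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Vary the advertised encodings and see which one the server picks.
# -I sends a HEAD request; -s suppresses progress output.
for enc in &quot;gzip&quot; &quot;gzip, deflate, br&quot; &quot;gzip, deflate, br, zstd&quot;; do
  echo &quot;Accept-Encoding: $enc&quot;
  curl -sI -H &quot;Accept-Encoding: $enc&quot; https://www.facebook.com/ \
    | grep -i &quot;^content-encoding&quot;
done
&lt;/code&gt;&lt;/pre&gt;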

&lt;p&gt;&lt;strong&gt;Gzip Compression&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gzip is universally supported by web servers, browsers and intermediaries (CDNs, proxies, etc), mostly by default. Despite how easy it is to serve content gzip compressed, there are a few things to keep in mind:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Most web servers and CDNs default to gzip compression level 6, which is a reasonable default. Some servers (e.g., NGINX) &lt;a href=&quot;https://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip&quot;&gt;default to gzip level 1&lt;/a&gt;, which usually results in faster compression times but produces a larger file. Make sure to check your compression levels!&lt;/li&gt;
  &lt;li&gt;Many CDNs can gzip compress resources for you, which is helpful if you missed something on your origin server. Some CDNs enable this by default.&lt;/li&gt;
  &lt;li&gt;Some CDNs may decompress and recompress content for you if they need to inspect or manipulate the contents. This may have an impact on your time to first byte (TTFB) especially if the content that needs to be recompressed is large.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Brotli Compression&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/google/brotli&quot;&gt;Brotli&lt;/a&gt; compression was created by Google and is supported across all major web browsers. At its highest compression level files can often be reduced 15-25% more than gzip. Higher compression levels come at a significant latency cost though.&lt;/p&gt;

&lt;p&gt;Many popular web servers support Brotli via modules. For example, Apache has a &lt;a href=&quot;https://httpd.apache.org/docs/2.4/mod/mod_brotli.html&quot;&gt;mod_brotli module&lt;/a&gt; which defaults to level 5. NGINX has a &lt;a href=&quot;https://github.com/google/ngx_brotli&quot;&gt;module called ngx_brotli&lt;/a&gt; which defaults to level 6. Brotli is supported by a variety of CDNs, albeit in slightly different ways. It’s best to understand how your specific CDN handles this compression algorithm. For example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Most CDNs have the ability to serve Brotli compressed payloads from an origin server that supports Brotli.
    &lt;ul&gt;
      &lt;li&gt;This is done by varying the content by the Content-Encoding response header.&lt;/li&gt;
      &lt;li&gt;Some CDNs do this by default, others require additional steps.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Some CDNs can Brotli compress responses even if an origin server does not support Brotli.
    &lt;ul&gt;
      &lt;li&gt;Usually they fetch the result via gzip from your origin.&lt;/li&gt;
      &lt;li&gt;Some CDNs can perform on the fly Brotli compression for static and dynamic content, usually at specific compression levels they define.&lt;/li&gt;
      &lt;li&gt;Some CDNs can pre-compress static content at higher compression levels.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While the highest compression levels might be ideal for static content, you’ll want to be careful with dynamic content to avoid impacting your TTFB. Additionally if your CDN offers dynamic Brotli compression, then you may want to determine if the byte savings are worth the latency overhead from decompressing and re-compressing the response at the edge. In some cases it may be best to Brotli compress dynamic content from the origin, or stick with Gzip if your origin doesn’t support Brotli.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;zStandard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://facebook.github.io/zstd/&quot;&gt;zStandard&lt;/a&gt; is a newer compression method developed by Facebook. It was designed to compress at ratios similar to gzip compression, but with faster compression and decompression speeds. Historically its usage has been mostly filesystem related, however Chrome is the first browser to support it as of March 2024. While most CDNs do not have support for zStandard yet, many could feasibly vary the cache key for origin compressed responses similar to the way they support Brotli.&lt;/p&gt;

&lt;p&gt;When examining the top 10k sites in the HTTP Archive, zStandard compression usage appears to be mostly confined to sites owned by Meta (such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;www.facebook.com&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;www.instagram.com&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;www.messenger.com&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;www.whatsapp.com&lt;/code&gt;, etc.) and Netflix. The Meta sites are delivering zStandard encoded content to Chrome browsers. Netflix appears to default to gzip compression, possibly implementing zStandard as a test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compression Tester&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A few years ago I wrote a compression tool that was designed to help you determine whether gzip is sufficient for your content, or if Brotli could provide a reduction in payload sizes. I’ve released a new version of this tool that includes zStandard compression as well as compression times for each individual test.&lt;/p&gt;

&lt;p&gt;You can find the new/updated tool at &lt;a href=&quot;https://tools.paulcalvano.com/compression-tester/&quot;&gt;https://tools.paulcalvano.com/compression-tester/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we’re looking at these results, a few things you’ll want to keep in mind are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;If your content is dynamic / non-cacheable, then it will be very sensitive to any additional latency you add. Higher Brotli and zStandard compression levels can be computationally expensive, so it’s best to use a more moderate compression level.&lt;/li&gt;
  &lt;li&gt;If your content is static / cacheable, then you may want to use higher compression levels.&lt;/li&gt;
  &lt;li&gt;Compression settings are typically set on a server and not per request.  If you do not have flexibility with setting this at a more granular level, you should go with the lowest common denominator required for all of your content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive deeper into a few examples:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Facebook&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When browsing Facebook, the largest JavaScript resource I saw was 2.37 MB uncompressed. It was delivered to my browser as a 514 KB zStandard compressed response. When testing this file via the Compression Tester tool, it was served a 645 KB gzip and 526 KB Brotli response.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/fb_script_example.jpg&quot; alt=&quot;Facebook Script Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Comparing the compression test results to the delivered responses, we can see that the server likely compressed in the middle range. For example it delivered a 526 KB Brotli payload (presumably level 5), but it could have delivered Brotli level 11. This would come at a higher computational cost though - which may have been a factor in their selection. zStandard also appears to be delivering a smaller file, but based on this test the computational overhead is more than double what gzip level 6 costs.&lt;/p&gt;

&lt;p&gt;For this response, Brotli level 9 would provide the best compression ratio with a CPU overhead similar to zStandard 12.  If it’s possible to precompress payloads, then the highest compression levels would reduce the payloads further. However Brotli appears to outperform zStandard in both compression ratio and compression time up until level 9 for this response.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/fb_script_example2.jpg&quot; alt=&quot;Facebook Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The base HTML page for Facebook is 62 KB uncompressed. They support gzip, Brotli and zStandard - and it seems to be compressed at the lowest compression levels. While compression level 1 is often undesired, in this case there doesn’t appear to be much of an advantage to applying higher levels of compression due to limited byte savings. Additionally, for Facebook’s HTML all 3 compression algorithms produce a similar size payload - but Brotli and zStandard both compress their HTML faster than gzip.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/facebook_html_example.jpg&quot; alt=&quot;Facebook HTML Compression Results&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandals.com Homepage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Out of the top 10 thousand websites, Sandals has the largest HTML payload - clocking in at almost 7.9MB (delivered as a 804 KB gzip compressed payload). Additionally they have a 9.4MB script bundle (gzip compressed to 2.5 MB). In the waterfall graph below you can see the impact that these large payloads are having on the experience. Reducing them solves only part of the problem, as that’s still a huge amount of content for the browser to parse, evaluate, and execute. But let’s see how these compression algorithms do.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/sandals_wpt_example.png&quot; alt=&quot;Sandals WPT Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Sandals appears to be gzip compressing at the highest possible compression level. If we assume that this page is dynamically generated and try to stay within the relative compression times:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Brotli level 9 or zStandard level 15 would result in an approximately 75% smaller payload with faster compression times compared to gzip 9.&lt;/li&gt;
  &lt;li&gt;Brotli level 5 appears to be a good tradeoff between compression ratio and time (209 KB, and 73% faster to compress)&lt;/li&gt;
  &lt;li&gt;In this particular example, zStandard is producing a larger payload compared to Brotli.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sandals is also using Cloudflare’s CDN, which &lt;a href=&quot;https://developers.cloudflare.com/speed/optimization/content/brotli/&quot;&gt;supports Brotli compression&lt;/a&gt;, so enabling it could be a quick performance win for them.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/sandals_homepage_compression.jpg&quot; alt=&quot;Sandals Homepage compression&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compression Levels vs Compression Times&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When considering a compression level, it’s important to balance the time it takes to compress a payload with the estimated byte savings. For example, utilizing the maximum compression level will often produce the smallest payload, but will do so at a higher computational cost. Likewise, the lowest compression levels are often really fast to compress, but might not be as effective.&lt;/p&gt;
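
&lt;p&gt;If you want a rough feel for this tradeoff on your own payloads, the command line encoders make it easy to compare locally. The sketch below assumes the gzip, brotli and zstd CLIs are installed; page.html is a placeholder for whatever response body you want to test, and the levels shown roughly mirror the ones discussed in this post.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Rough local comparison of compressed size vs. compression time.
# page.html is a placeholder for the payload you want to test.
FILE=&quot;page.html&quot;

echo &quot;original: $(wc -c &amp;lt; &quot;$FILE&quot;) bytes&quot;
time gzip -6 -c &quot;$FILE&quot; | wc -c      # common web server default
time brotli -q 5 -c &quot;$FILE&quot; | wc -c  # mid-range Brotli
time brotli -q 11 -c &quot;$FILE&quot; | wc -c # maximum Brotli (precompression territory)
time zstd -12 -c &quot;$FILE&quot; | wc -c     # mid-range zStandard
&lt;/code&gt;&lt;/pre&gt;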

&lt;p&gt;I tested the base HTML of the top 10 thousand websites and their largest first party request. The results below show that the majority of gzip compression seems to fall within the estimated range of 4-6 (likely 6 since that is a common default). However ~30% of sites are utilizing gzip level 1, which often provides inadequate compression. The majority of Brotli compression appears to be level 4, though ~25% of sites are delivering Brotli level 1. And finally, the majority of Brotli level 11 usage seems to be for static content.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/choosing-between-gzip-brotli-and-zstandard-compression/gzip_br_compression_levels.jpg&quot; alt=&quot;gzip and Brotli compression levels&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The table below details a few sites that are serving their HTML using either gzip or Brotli level 1. Using such a low compression level will likely result in a larger payload. In many of these examples, the Brotli compressed payload is actually larger than gzip. Fortunately some of these sites are leveraging services that will automatically deliver gzip because of the byte discrepancy - but they could still benefit from increasing the compression level.&lt;/p&gt;

&lt;table&gt;
   &lt;tr&gt;
      &lt;td&gt;&lt;/td&gt;
      &lt;td colspan=&quot;3&quot; style=&quot;text-align: center&quot;&gt;&lt;b&gt;Content Length Delivered (KB)&lt;/b&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;&lt;b&gt;url&lt;/b&gt;&lt;/td&gt;
      &lt;td&gt;&lt;b&gt;Uncompressed&lt;/b&gt;&lt;/td&gt;
      &lt;td&gt;&lt;b&gt;gzip&lt;/b&gt;&lt;/td&gt;
      &lt;td&gt;&lt;b&gt;Brotli&lt;/b&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.epicurious.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 3877&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 393&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 684&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.anker.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 3273&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 628&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 232&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.bonappetit.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 1889&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 230&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 348&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.allure.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 1817&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 239&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 372&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.gq.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 1665&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 226&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 340&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.newyorker.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 1638&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 252&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 342&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.vanityfair.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 1593&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 231&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 339&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://seekingalpha.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 1453&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 279&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.vogue.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 1446&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 204&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 312&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
      &lt;td&gt;https://www.cntraveler.com&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 1422&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 205&lt;/p&gt;&lt;/td&gt;
      &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt; 311&lt;/p&gt;&lt;/td&gt;
   &lt;/tr&gt;
&lt;/table&gt;
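
&lt;p&gt;If you want to run this kind of comparison against your own pages, one rough approach is to request the same URL with different Accept-Encoding headers and measure the raw response body. The sketch below uses only the Python standard library (urllib does not decompress responses when you set the header yourself, so the byte count reflects what was sent over the wire); the URL is a placeholder:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Compare the size a server delivers for different Accept-Encoding values.
# urllib does not auto-decompress, so len(body) is the on-the-wire size.
import urllib.request

URL = 'https://www.example.com/'  # placeholder - test your own page

for encoding in ('identity', 'gzip', 'br', 'zstd'):
    req = urllib.request.Request(URL, headers={
        'Accept-Encoding': encoding,
        'User-Agent': 'compression-size-check',
    })
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        served = resp.headers.get('Content-Encoding', 'identity')
        print(f'requested {encoding:8s} served {served:8s} {len(body) / 1024:8.1f} KB')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;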

&lt;p&gt;The table below shows the results of applying gzip level 6, Brotli level 5 and zStandard level 12 to the base HTML of these pages. A few observations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Gzip level 6 reduces most of the gzipped payloads by 25-30% compared to the size delivered via gzip level 1.&lt;/li&gt;
  &lt;li&gt;Brotli level 5 was able to reduce payload sizes by almost 75% compared to gzip level 1. In many cases the compression time overhead is comparable to gzip level 6 - but this varies.&lt;/li&gt;
  &lt;li&gt;zStandard level 12 was able to provide similar compression levels to Brotli level 5 while maintaining compression times similar to gzip level 6.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on these examples, real-time zStandard compression seems to provide a slight advantage over Brotli - achieving the same sizes with faster compression times.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td colspan=&quot;3&quot; style=&quot;text-align: center&quot;&gt;After applying higher compression levels (KB)&lt;/td&gt;
   &lt;td colspan=&quot;3&quot; style=&quot;text-align: center&quot;&gt;Compression Time (seconds)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;url&lt;/td&gt;
   &lt;td&gt;gzip (L6)&lt;/td&gt;
   &lt;td&gt;Brotli (L5)&lt;/td&gt;
   &lt;td&gt;zStd (L12)&lt;/td&gt;
   &lt;td&gt;gzip (L6)&lt;/td&gt;
   &lt;td&gt;Brotli (L5)&lt;/td&gt;
   &lt;td&gt;zStd (L12)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.epicurious.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;271&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;145&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;145&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.083&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.093&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.068&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.anker.com
   &lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;412&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;219&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;219&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.099&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.081&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.088&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.bonappetit.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;169&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;113&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;114&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.044&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.061&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.054&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.allure.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;179&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;135&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;135&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.051&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.067&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.043&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.gq.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;168&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;118&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;119&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.049&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.053&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.050&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.newyorker.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;191&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;127&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;128&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.053&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.071&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.043&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.vanityfair.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;173&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;128&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;128&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.058&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.098&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.042&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://seekingalpha.com
   &lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;227&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;171&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;174&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.049&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.073&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.079&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.vogue.com&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;152&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;115&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;115&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.053&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.075&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.047&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;https://www.cntraveler.com
   &lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;155&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;117&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;117&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.052&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.052&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;0.048&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Cacheable objects can often be compressed at higher levels - especially if they are able to be precompressed.  When evaluating the largest first party objects hosted on the top 10 thousand sites, I found that Brotli level 5 and zStandard level 12 resulted in similar file sizes and compression times - much like the results above.  However when evaluating Brotli compression level 11 vs zStandard 19, the smallest files are almost always generated by Brotli level 11. zStandard’s compression times are 4x faster than Brotli 11 though.   So if you have the ability to precompress your objects, Brotli 11 may still be preferred.  If not, then zStandard may be the better option.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td colspan=&quot;4&quot; style=&quot;text-align: center&quot;&gt;% File size reduction compared to gzip level 6&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;br5&lt;/td&gt;
   &lt;td&gt;zstd12&lt;/td&gt;
   &lt;td&gt;br11&lt;/td&gt;
   &lt;td&gt;zstd19&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Average&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;8.85%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;9.07%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;19.18%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;14.11%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p50&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;6.99%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;7.33%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;17.53%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;12.40%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p75&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;10.25%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;10.94%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;22.21%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;17.10%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p90&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;15.29%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;15.74%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;27.21%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;22.40%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
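
&lt;p&gt;If your build pipeline allows you to precompress static assets, the cost of the highest compression levels only needs to be paid once. Here is a minimal sketch of that idea - it writes .br and .zst variants next to each file so a web server or CDN can serve them directly. It assumes the brotli and zstandard packages are installed, and the ./static directory is a placeholder:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Precompress static text assets at build time, so the highest levels are a one-off cost.
# Assumes: pip install brotli zstandard ; './static' is a placeholder directory.
from pathlib import Path

import brotli
import zstandard as zstd

TEXT_EXTENSIONS = {'.html', '.css', '.js', '.svg', '.json'}

for path in Path('./static').rglob('*'):
    if path.suffix not in TEXT_EXTENSIONS:
        continue
    data = path.read_bytes()
    # Brotli level 11: smallest output, slowest to produce.
    path.with_name(path.name + '.br').write_bytes(brotli.compress(data, quality=11))
    # Zstandard level 19: slightly larger output, but much faster to generate.
    path.with_name(path.name + '.zst').write_bytes(zstd.ZstdCompressor(level=19).compress(data))
    print(f'precompressed {path}')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;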

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HTTP compression is an incredibly important feature, and has long been overlooked due to the universal support of gzip compression across all web servers and browsers. It’s great to see innovation in this space, and the addition of another compression encoding along with the possibility of shared dictionary compression in the future.&lt;/p&gt;

&lt;p&gt;The research I’ve shared in this article also shows that for many sites Brotli will provide better compression for static content. zStandard could potentially provide some benefits for dynamic content due to its faster compression speeds. Additionally:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A surprising number of sites are using low-level gzip compression and should consider increasing their compression levels.&lt;/li&gt;
  &lt;li&gt;For dynamic content
    &lt;ul&gt;
      &lt;li&gt;Brotli level 5 usually results in smaller payloads, at similar or slightly slower compression times.&lt;/li&gt;
      &lt;li&gt;zStandard level 12 often produces similar payloads to Brotli level 5, with compression times faster than gzip and Brotli.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;For static content
    &lt;ul&gt;
      &lt;li&gt;Brotli level 11 produces the smallest payloads.&lt;/li&gt;
      &lt;li&gt;zStandard is able to apply its highest compression levels much faster than Brotli, but the payloads are still smaller with Brotli.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course your mileage will vary and there’s really no universal answer. It’s worth running some tests on your site to see whether your content would benefit, and which compression levels to consider. And then experimenting with RUM data to evaluate whether the approach you decide on is successful. I hope that tool I created helps you get started on this analysis for your site!&lt;/p&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html">HTTP compression is a mechanism that allows a web server to deliver text based content using less bytes, and it’s been supported on the web for a very long time. In fact the first web browser to support gzip compression was NCSA Mosaic v2.1 way back in 1993! The web has obviously come a long way since then, but today pretty much every web server and browser still supports gzip compression.</summary></entry><entry><title type="html">Identifying Font Subsetting Opportunities with Web Font Analyzer</title><link href="https://paulcalvano.com/2024-02-16-identifying-font-subsetting-opportunities/" rel="alternate" type="text/html" title="Identifying Font Subsetting Opportunities with Web Font Analyzer" /><published>2024-02-16T14:00:00+00:00</published><updated>2024-02-17T19:20:07+00:00</updated><id>https://paulcalvano.com/identifying-font-subsetting-opportunities</id><content type="html" xml:base="https://paulcalvano.com/2024-02-16-identifying-font-subsetting-opportunities/">&lt;p&gt;10 years ago, custom web fonts were a niche feature &lt;a href=&quot;https://almanac.httparchive.org/en/2022/fonts#fig-1&quot;&gt;used by ~10% of websites&lt;/a&gt;. Today they are used by over 83% of websites! Fonts are generally loaded as a &lt;a href=&quot;https://docs.google.com/document/d/1bCDuq9H1ih9iNjgzyAL0gpwNFiEP4TZS-YLRp_RuMlc/edit#&quot;&gt;high priority resource&lt;/a&gt;, and some sites use techniques such as &lt;a href=&quot;https://web.dev/articles/codelab-preload-web-fonts&quot;&gt;preload&lt;/a&gt; and &lt;a href=&quot;https://developer.chrome.com/docs/web-platform/early-hints&quot;&gt;early hints&lt;/a&gt; to get them to load as quickly as possible. Custom web fonts are important to many sites, since rendering with a specific typography is often preferred from a design perspective. However, this can easily become a performance issue when a large amount of fonts are loaded.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/font-loading-example.jpg&quot; alt=&quot;Image of a WebPageTest waterfall where a site is loading a large amount of fonts&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore some potential issues around font loading and the performance benefits of a lesser used feature - font subsetting. We’ll look at &lt;a href=&quot;https://httparchive.org/&quot;&gt;HTTP Archive&lt;/a&gt; data to understand how prevalent the issue is, and then examine a few case studies. And finally, I created a new tool - &lt;a href=&quot;https://tools.paulcalvano.com/wpt-font-analysis/&quot;&gt;Web Font Analyzer&lt;/a&gt; - that may help you explore whether font subsetting is something to consider for your site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There’s been a lot written about web fonts over the years. In fact the &lt;a href=&quot;https://almanac.httparchive.org/&quot;&gt;HTTP Archive’s Web Almanac&lt;/a&gt; has an &lt;a href=&quot;https://almanac.httparchive.org/en/2022/fonts&quot;&gt;entire chapter&lt;/a&gt; dedicated to this topic. A few years ago Zach Leatherman wrote a fantastic &lt;a href=&quot;https://www.zachleat.com/web/font-checklist/&quot;&gt;checklist&lt;/a&gt; on font loading strategies, which included using preload to load fonts earlier. Way back in 2016 the CSS &lt;a href=&quot;https://developer.chrome.com/blog/font-display/&quot;&gt;font-display&lt;/a&gt; attribute was introduced, and today it is &lt;a href=&quot;https://caniuse.com/?search=font-display&quot;&gt;supported&lt;/a&gt; on all modern browsers and used by almost a third of websites! &lt;a href=&quot;https://almanac.httparchive.org/en/2022/fonts#variable-fonts&quot;&gt;Variable fonts&lt;/a&gt; are heavily used by Google Fonts which are widely used across the web. Barry Pollard wrote a great article on &lt;a href=&quot;https://www.tunetheweb.com/blog/should-you-self-host-google-fonts/&quot;&gt;self hosting Google Fonts&lt;/a&gt;. And just last month Stoyan Stefanov wrote an article about &lt;a href=&quot;https://www.phpied.com/bytes-normal-web-font-study-google-fonts/&quot;&gt;google font sizes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The font-display feature was a major step forward in font loading performance, as it gave developers control over whether to prioritize rendering or typography. Using font-display:swap would avoid a rendering delay by painting text using a system font and then swapping the actual font after it’s loaded.&lt;/p&gt;

&lt;p&gt;Font optimization strategies are great, but when combined the results can be confusing. For example, preloading fonts is a great way to get them to load earlier, but using font-display:swap at the same time may result in a less effective use of bandwidth early in the page load. It’s a good idea to understand exactly how your fonts are loading, how they are being used and what they contain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Font sizes across popular sites&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using the HTTP Archive we can explore font usage across millions of websites. For the purpose of this analysis, we’ll focus on the top million sites. As of January 2024, 81.5% of the top million sites are utilizing at least one custom web font. Usage varies widely, with the average site loading 238 KB of fonts.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/font-usage-by-site-popularity.jpg&quot; alt=&quot;Font usage by site popularity&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Depending on the font loading strategy used, fonts may be delivered at different parts of the page loading lifecycle. For the purpose of this analysis, I’m going to break this up into 4 parts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Before FCP - Means that the fonts finished downloading before the First Contentful Paint was measured. This could indicate that a font was render blocking, fetched with a high priority or preloaded.&lt;/li&gt;
  &lt;li&gt;FCP to LCP - Means that the font was loaded in between the First and Largest Contentful Paint. These fonts were loaded while other resources critical to the user experience were loading.&lt;/li&gt;
  &lt;li&gt;LCP to onLoad - The fonts were loaded after the Largest Contentful Paint but before the onLoad event.&lt;/li&gt;
  &lt;li&gt;After onLoad - This could indicate that the font was either delayed or not used by the DOM until much later on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Out of the top million sites that are loading fonts, 63.6% are loading at least 1 custom web font prior to FCP. I would expect this to be high, especially considering how often preloading fonts is recommended.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/font-downloads-relative-to-perf-metrics.jpg&quot; alt=&quot;Font downloads relative to performance metrics&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;However, 25% of these sites are loading more than 75 KB of fonts before the FCP, and over 4500 sites are loading more than 500 KB of fonts during this time! Regardless of the benefits of the font loading strategies applied - I’d say that there is likely some waste occurring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Glyphs vs Characters on Pages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A font is essentially a typographical representation of a character. One can display the same text with different fonts and it will appear differently on a web page - however the Unicode value for each character will always be the same. For example, the space character is always code point 32, and the characters A, B and C are 65, 66 and 67 respectively, regardless of which font renders them.&lt;/p&gt;
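
&lt;p&gt;You can verify this in a couple of lines of Python - the code point belongs to the character, not to the font that renders it:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# A character's code point is independent of the font used to render it.
for ch in ' ABC':
    print(repr(ch), ord(ch), 'U+%04X' % ord(ch))
# ' ' 32 U+0020
# 'A' 65 U+0041
# 'B' 66 U+0042
# 'C' 67 U+0043
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;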

&lt;p&gt;Some font packages are designed to display icons, and others are designed for text. However the one thing that is not always apparent to developers is how many glyphs are included in each font package. For example, a popular Google font called &lt;a href=&quot;https://fonts.gstatic.com/s/materialicons/v140/flUhRq6tzZclQEJ-Vdg-IuiaDsNcIhQ8tQ.woff2&quot;&gt;“Material Icons”&lt;/a&gt; is used on 114K websites. It contains 2229 glyphs and adds 128 KB to websites using it. It’s very unlikely that sites are making use of all these glyphs.&lt;/p&gt;

&lt;p&gt;The most popular Google font is &lt;a href=&quot;https://fonts.gstatic.com/s/opensans/v36/memvYaGs126MiZpBA-UvWbX2vVnXBbObj2OVTS-mu0SC55I.woff2&quot;&gt;Open Sans&lt;/a&gt;, and it’s used by over 1.5 million websites. You can use a tool like &lt;a href=&quot;https://fontdrop.info/&quot;&gt;FontDrop&lt;/a&gt; to explore the contents of your fonts, and you might be surprised! This font contains 280 glyphs and adds 43 KB to pages using it. Loading a few fonts like this can really add up.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/open-sans.jpg&quot; alt=&quot;Open Sans Font Glyphs&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;
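
&lt;p&gt;If you prefer a script to a drag-and-drop UI, the fontTools library (used again later in this post for subsetting) can also report how many glyphs a font contains. A minimal sketch, assuming the fonttools and brotli packages are installed (brotli is required to open WOFF2 files) and using a placeholder file name:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Count the glyphs and mapped characters in a font file.
# Assumes: pip install fonttools brotli ; 'OpenSans.woff2' is a placeholder file name.
from fontTools.ttLib import TTFont

font = TTFont('OpenSans.woff2')
num_glyphs = font['maxp'].numGlyphs  # total glyph outlines in the font
cmap = font.getBestCmap()            # maps Unicode code points to glyph names
print(f'{num_glyphs} glyphs, {len(cmap)} mapped characters')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;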

&lt;p&gt;So how does that compare with the rendered HTML of a page? Using the HTTP Archive, I was able to write a query that extracts and summarizes the visible glyphs in rendered HTML pages for the top 10K sites. We can then compare this to the minimum and maximum number of glyphs in a page’s custom web fonts. The results might surprise you!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The median popular site contains 3 fonts, totalling 95 KB. The rendered HTML has 101 glyphs, while the smallest font has 248 glyphs.&lt;/li&gt;
  &lt;li&gt;At every percentile, the number of glyphs in the smallest font has 2-3x the number of glyphs compared to the rendered HTML.&lt;/li&gt;
&lt;/ul&gt;

&lt;table&gt;
  &lt;tr style=&quot;font-weight:bold&quot;&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td colspan=&quot;5&quot;&gt;&lt;p style=&quot;text-align: center&quot;&gt;Font Glyphs vs Rendered HTML&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style=&quot;font-weight:bold&quot;&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td&gt;Font Count&lt;/td&gt;
   &lt;td&gt;Font Weight&lt;/td&gt;
   &lt;td&gt;Visible Glyphs&lt;/td&gt;
   &lt;td&gt;Min Font Glyphs&lt;/td&gt;
   &lt;td&gt;Max Font Glyphs&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p50&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;3&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;95 KB&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;101&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;248&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;524&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;

  &lt;tr&gt;
   &lt;td&gt;p75&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;5&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;193 KB&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;124&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;444&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;861&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;

  &lt;tr&gt;
   &lt;td&gt;p95&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;9&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;542 KB&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;221&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;901&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;2229&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;

  &lt;tr&gt;
   &lt;td&gt;p99&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;15&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1001 KB&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;631&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1530&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;3248&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;      
&lt;/table&gt;
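
&lt;p&gt;The visible glyph counts above come from counting the unique characters in the rendered HTML (the exact BigQuery function is included at the end of this post). A rough Python equivalent of that logic, with a placeholder file name, looks like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Count unique visible characters in an HTML document, roughly mirroring
# the JavaScript UDF used in the BigQuery analysis at the end of this post.
import re

html = open('page.html', encoding='utf-8').read()  # placeholder file name
text = re.sub(r'&amp;lt;[^&amp;gt;]+&amp;gt;', '', html)  # naive tag strip (ignores script/style contents)
text = re.sub(r'\s+', ' ', text).strip()
print(len(set(text)), 'unique visible glyphs')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;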

&lt;p&gt;&lt;strong&gt;How to Test Your Page&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There’s a few tools in this area that provide some insight into font usage:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://fontdrop.info/&quot;&gt;FontDrop&lt;/a&gt; provides a useful way to explore the contents of your fonts. Simply drop the font file onto their UI and it will show you the metadata and all the glyphs contained in it.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://yellowlab.tools/&quot;&gt;YellowLabs&lt;/a&gt; has a font audit that will tell you if you have unused glyphs and summarize it by character sets. This can also be useful when deciding to subset fonts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As part of this research, I created a tool named “Web Font Analyzer” that will leverage results from a WebPageTest measurement to show you a summary of the glyphs used on a page, when they are loaded relative to some performance metrics and how many glyphs are supported by each font. I’ve also attempted to provide some guidance on how you can use this information in the tool.&lt;/p&gt;

&lt;p&gt;You can find the tool at &lt;a href=&quot;https://tools.paulcalvano.com/wpt-font-analysis/&quot;&gt;https://tools.paulcalvano.com/wpt-font-analysis/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Using the HTTP Archive we can identify sites that exhibit some of these issues, and there are quite a few. Let’s look at a few examples using this tool alongside WebPageTest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example - Mayoclinic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After running a &lt;a href=&quot;https://www.webpagetest.org/result/240113_BiDcGH_4W6/1/details/&quot;&gt;WebPageTest&lt;/a&gt; measurement for Mayoclinic’s homepage, I copied and pasted the URL of the test results into the font analyzer tool. The tool will fetch the measurement data and create a summary of the fonts that were loaded.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/web-font-analyzer.jpg&quot; alt=&quot;Web Font Analyzer&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The page summary section tells you how many fonts you are loading, and how large they are. It also provides a summary of the number of visible characters in the rendered HTML. In this example, there were 366 KB of fonts loaded on a page that contained only 85 visible glyphs. You can also click on “show glyphs” to see a summary of the actual glyphs in the rendered HTML.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/char-summary-mayoclinic-example.jpg&quot; alt=&quot;Character Summary - Mayoclinic&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Next we can see where these fonts are loading. The majority of these fonts are loading prior to the First Contentful Paint. And all of them are loading prior to the Largest Contentful Paint. It’s very likely that these font assets are competing with other resources for bandwidth.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/when-are-fonts-loading-mayoclinic-example.jpg&quot; alt=&quot;When are Fonts Loading - Mayoclinic&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If we examine the summary of fonts used on the page, we can see that many of them contain more than 500 glyphs!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/font-usage-summary-mayoclinic-example.jpg&quot; alt=&quot;Font Usage Summary - Mayoclinic&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the WebPageTest measurement we can see 8 font files loading immediately after the HTML.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/wpt-mayoclinic-example-1.jpg&quot; alt=&quot;Mayoclinic WebPageTest Measurement&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the HTTP response header for the base page, there is a Link header with preloads for 5 font files. Of these preloads, 4 are for fonts hosted on www.mayoclinic.com, and the other is on design.mayoclinic.com. While the preloaded fonts are downloading, the HTML initiates 4 preloads for the font files on www.mayoclinic.com (however those are already in flight). Then the fonts.css file loads, which references fonts from design.mayoclinic.com. During this page load, approximately 700ms was spent loading fonts, half of which were unused. Each of these font files was ~42 KB and contained &amp;gt; 500 glyphs. Meanwhile the page contained less than 100 glyphs in the rendered HTML.&lt;/p&gt;
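
&lt;p&gt;If you want to check what a page is preloading via its Link response header, a few lines of Python (standard library only, placeholder URL) will print it:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Print any Link response headers (e.g. rel=preload hints) returned with a page.
import urllib.request

req = urllib.request.Request('https://www.example.com/',  # placeholder URL
                             headers={'User-Agent': 'link-header-check'})
with urllib.request.urlopen(req) as resp:
    for value in resp.headers.get_all('Link') or []:
        print(value)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;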

&lt;p&gt;Recommendations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Update CSS to use the fonts from www.mayoclinic.com.&lt;/li&gt;
  &lt;li&gt;Subset the font files to a latin character set to reduce their weight by up to 90%.&lt;/li&gt;
  &lt;li&gt;Continue preloading the (much smaller) font files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example - Kia&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can see another example on the Kia.com US homepage. The page weight is 21.5 MB, and 3 MB of that are font files. There are 85 visible glyphs on the page, and expanding the glyphs we can see that there are two Korean glyphs (codepoints 54620 and 44544) that are used to render the “Korean” language switch link in the footer.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/char-summary-kia-example.jpg&quot; alt=&quot;Character Summary - Kia&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Prior to LCP, there was almost 7 MB of content loaded, including all of the fonts. Font weight accounts for 14% of overall page weight, but a whopping 42% of bytes loaded prior to LCP! The fonts are almost certainly competing for bandwidth.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/when-are-fonts-loading-kia-example.jpg&quot; alt=&quot;When are Fonts Loading - Kia&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When looking at the individual font assets loaded, we can see that these are not using font-display:swap, and that 3 of them contain 15,190 glyphs! It also appears that all of the KiaSignature fonts are delivered as WOFF files (converting them to WOFF2 would reduce some of the bytes). Their icons font is also delivered as a TTF file, and not gzip compressed.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/font-usage-summary-kia-example.jpg&quot; alt=&quot;Font Usage Summary - Kia&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The fonts are referenced in clientlib-base.css and they are not being preloaded. At the start of the waterfall we can see the first-party CSS and JS loading, but then clientlib-base.js gets interrupted by the higher priority KiaSignatureRegular.woff font file. This font is 965 KB, which delays the JavaScript and ultimately the first contentful paint. Additionally, the font is only cached for 1 day - so repeat visitors will need to download the fonts again.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/wpt-kia-example-1.jpg&quot; alt=&quot;Kia WebPageTest Example - Part 1&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Further down the waterfall we can see numerous PNG images. However they are being delayed by 2.1 MB of fonts. At the same time, a 2 MB hero image loaded via Adobe Experience Cloud (Scene7) is fetched and used as the poster image for a hero video.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/wpt-kia-example-2.png&quot; alt=&quot;Kia WebPageTest Example - Part 2&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;While fonts are not the only factor affecting the performance of this page, they are clearly holding back the FCP and LCP by competing with other resources for bandwidth. Applying some easy to implement performance optimizations on the images such as lazy loading, using optimal image formats, and better cache directives will help - but ultimately the font loading will still cause delays - so subsetting them would be ideal.&lt;/p&gt;

&lt;p&gt;Recommendations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Subset the KiaSignature fonts for regional websites so that they are using only the necessary glyphs. For example on the US site, using latin + extended latin + the specific KR characters needed.&lt;/li&gt;
  &lt;li&gt;Convert WOFF files to WOFF2 (see the conversion sketch after this list)&lt;/li&gt;
  &lt;li&gt;Enable gzip compression for kia-icons.ttf. This would reduce the file to 104 KB (45% smaller). Subset the font to reduce the size even further.&lt;/li&gt;
  &lt;li&gt;Fonts are only cached on the browser for 1 day. TTL should be increased.&lt;/li&gt;
  &lt;li&gt;Not font related, but consider image and video compression, alternate image formats, and lazy loading images.&lt;/li&gt;
&lt;/ul&gt;
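
&lt;p&gt;Converting an existing WOFF file to WOFF2 is straightforward with fontTools. A minimal sketch, assuming the fonttools and brotli packages are installed (WOFF2 output uses Brotli) and reusing the file name from this example:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Re-save a WOFF font as WOFF2 to benefit from its Brotli-based compression.
# Assumes: pip install fonttools brotli
from fontTools.ttLib import TTFont

font = TTFont('KiaSignatureRegular.woff')
font.flavor = 'woff2'
font.save('KiaSignatureRegular.woff2')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;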

&lt;p&gt;&lt;strong&gt;How to Subset Fonts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In some of the examples above, font subsetting would be an excellent optimization. However based on the data from the HTTP Archive, it doesn’t seem that this is being used very often. Zach Leatherman wrote a great tool for subsetting fonts, called &lt;a href=&quot;https://github.com/zachleat/glyphhanger/&quot;&gt;Glyphhanger&lt;/a&gt;.  You can also use the &lt;a href=&quot;https://github.com/fonttools/fonttools&quot;&gt;fonttools&lt;/a&gt; command line library to subset your fonts.&lt;/p&gt;

&lt;p&gt;To use fonttools, you need to install the library.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sudo apt-get install fonttools 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once installed, you’ll have the pyftsubset application, which can be used to subset fonts. You can see some examples of its usage in &lt;a href=&quot;https://markoskon.com/creating-font-subsets/&quot;&gt;this blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When I was creating the WPT Font Analyzer tool, I started using a &lt;a href=&quot;https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/webfonts/fa-solid-900.woff2&quot;&gt;fontawesome font&lt;/a&gt; to render a checkbox, an exclamation point and an information circle icon. This resulted in a 124 KB font, which was many times the size of the rest of the tool! I was able to reduce this to a small 1 KB font by running the following command to create a subsetted font with the glyphs I needed.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pyftsubset fa-solid-900.woff2 \
        --unicodes=U+f05a+f058+f071 \
        --flavor=woff2 \
        --output-file=fa-solid-900-subset.woff2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s try this against the 3 Kia fonts we saw in the previous example. In this example, I’m subsetting:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Basic Latin characters (20-7E)&lt;/li&gt;
  &lt;li&gt;Copyright symbol (A9)&lt;/li&gt;
  &lt;li&gt;Registered Trademark symbol (AE)&lt;/li&gt;
  &lt;li&gt;Trademark Symbol (2122)&lt;/li&gt;
  &lt;li&gt;Double Quotes (201C-201D)&lt;/li&gt;
  &lt;li&gt;Korean Hangul syllables (D55D, ADC0)&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pyftsubset KiaSignatureRegular.woff \
          --flavor=woff2 \
          --unicodes=&quot;U+0020-007E,U+00A9,U+00AE,U+2122,U+201C-201D,U+D55D,U+ADC0&quot; \
          --output-file=KiaSignatureRegular_subset.woff2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The resulting file is 10 KB for each of the KiaSignature fonts. So in this example 3 MB of font weight could be reduced to 30 KB! You can download this subsetted font &lt;a href=&quot;/assets/img/blog/identifying-font-subsetting-opportunities/KiaSignatureRegular_subset.woff2&quot;&gt;here&lt;/a&gt; and examine it in &lt;a href=&quot;https://fontdrop.info/&quot;&gt;FontDrop&lt;/a&gt;.&lt;/p&gt;
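
&lt;p&gt;If you would rather subset fonts from a build script than from a shell, the same functionality is available programmatically. Here is a minimal sketch that simply drives pyftsubset’s entry point from Python, reusing the unicode ranges from the command above:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Programmatic equivalent of the pyftsubset command above, e.g. for a build pipeline.
# Assumes: pip install fonttools brotli
from fontTools import subset

subset.main([
    'KiaSignatureRegular.woff',
    '--flavor=woff2',
    '--unicodes=U+0020-007E,U+00A9,U+00AE,U+2122,U+201C-201D,U+D55D,U+ADC0',
    '--output-file=KiaSignatureRegular_subset.woff2',
])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;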

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fonts can be challenging to support from a web performance perspective, especially as their placement on modern web pages occurs at the intersection of design and web operations. Over the years there have been some innovative approaches to font loading, designed to limit their performance overhead. There has also been a lot of great research in the web performance industry on this topic, and many best practices have been published. It’s always worth evaluating the end user experience to ensure that the tools and optimizations put in place are having the desired impact. I’m hopeful that the &lt;a href=&quot;https://tools.paulcalvano.com/wpt-font-analysis/&quot;&gt;Web Font Analyzer&lt;/a&gt; tool I created adds another lens for you to evaluate font loading through.&lt;/p&gt;

&lt;p&gt;Many thanks to &lt;a href=&quot;https://twitter.com/tunetheweb&quot;&gt;Barry Pollard&lt;/a&gt; and &lt;a href=&quot;https://twitter.com/TimVereecke&quot;&gt;Tim Vereecke&lt;/a&gt; for reviewing this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Interested in seeing some of the HTTP Archive Queries behind this analysis? Here’s a few of the queries I used.  Please be aware that some of these queries  will exceed the free tier quota - so be careful when running them!  (You can read more on how to minimize query costs at &lt;a href=&quot;https://har.fyi/guides/minimizing-costs/&quot;&gt;har.fyi&lt;/a&gt;.)&lt;/p&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Percent of sites using fonts by rank&lt;/b&gt;&lt;/summary&gt;
  This query will calculate the percentage of sites that are using at least 1 custom web font, and group the results by rank.
  &lt;pre&gt;&lt;code&gt;
SELECT 
  rank, 
  IF(SAFE_CAST(JSON_EXTRACT(summary, &quot;$.reqFont&quot;) AS INT64) &amp;gt;0,&quot;Fonts&quot;, &quot;No Fonts&quot;) as f,
  COUNT(*),
  COUNT(0) / SUM(COUNT(0)) OVER (PARTITION BY rank) AS pct

FROM `httparchive.all.pages`
WHERE 
  date = &quot;2024-01-01&quot;
  AND is_root_page = true
  AND client = &quot;mobile&quot;
GROUP BY rank,f
ORDER BY rank ASC
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Average font weight across top million sites&lt;/b&gt;&lt;/summary&gt;
  This query will calculate the average, median and p75 font weight of sites with a rank &amp;lt;= 1 million, which are using at least 1 custom web font.
  &lt;pre&gt;&lt;code&gt;
SELECT 
  COUNT(*) AS freq,
  ROUND(AVG(SAFE_CAST(JSON_EXTRACT(summary, &quot;$.bytesFont&quot;) AS INT64)),2) AS avgFontSize,
  ROUND(APPROX_QUANTILES(SAFE_CAST(JSON_EXTRACT(summary, &quot;$.bytesFont&quot;) AS INT64), 100)[SAFE_ORDINAL(50)],2) p50FontSIze,
  ROUND(APPROX_QUANTILES(SAFE_CAST(JSON_EXTRACT(summary, &quot;$.bytesFont&quot;) AS INT64), 100)[SAFE_ORDINAL(75)],2) p75FontSIze
FROM `httparchive.all.pages`
WHERE 
  date = &quot;2024-01-01&quot;
  AND is_root_page = true
  AND client = &quot;mobile&quot;
  AND rank &amp;lt;= 1000000
  AND SAFE_CAST(JSON_EXTRACT(summary, &quot;$.reqFont&quot;) AS INT64) &amp;gt; 0
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;When do fonts start loading?&lt;/b&gt;&lt;/summary&gt;
  This query will run against the top 100K sites to identify the FCP, LCP and onLoad time for each measurement. It JOINs the requests table to search for the earliest start time for a font download.  Then it groups the results by time interval - such as &quot;Before FCP&quot;, &quot;Between FCP and LCP&quot;, etc. This query processes ~1TB of data.
  &lt;pre&gt;&lt;code&gt;
CREATE TEMP FUNCTION GetLcpTime(json_data STRING)
RETURNS INT64
LANGUAGE js AS &quot;&quot;&quot;
  var data = JSON.parse(json_data);

  if (data &amp;amp;&amp;amp; Array.isArray(data)) {
    for (var i = 0; i &amp;lt; data.length; i++) {
      if (data[i] &amp;amp;&amp;amp; data[i].name === 'LargestContentfulPaint') {
        return data[i].time;
      }
    }
  }
  
  return null;
&quot;&quot;&quot;;

SELECT 
CASE
    WHEN FCP IS null THEN 'error - no FCP'
    WHEN LCP IS null THEN 'error - no LCP'
    WHEN onLoad IS null THEN 'error - no onLoad'
    WHEN firstFontStartTime IS null THEN 'error - no firstFontStartTime'
    WHEN firstFontStartTime &amp;lt; FCP THEN &quot;Before FCP&quot;
    WHEN firstFontStartTime BETWEEN FCP AND LCP THEN &quot;Between FCP and LCP&quot;
    WHEN firstFontStartTime BETWEEN LCP AND onLoad THEN &quot;Between LCP and onLoad&quot;
    WHEN firstFontStartTime &amp;gt;  onLoad THEN &quot;After onLoad&quot;
    ELSE 'Unhandled Error'
  END AS test,
  COUNT(*)
FROM (
  SELECT 
    p.page,
    SAFE_CAST(JSON_EXTRACT(p.payload, &quot;$._firstContentfulPaint&quot;) AS INT64) AS FCP,
    getLcpTime(JSON_EXTRACT(p.payload, &quot;$._chromeUserTiming&quot;)) AS LCP,
    SAFE_CAST(JSON_EXTRACT(p.summary, &quot;$.onLoad&quot;) AS INT64) AS onLoad,
    MIN(SAFE_CAST(JSON_EXTRACT(r.payload, &quot;$._all_start&quot;) AS INT64)) AS firstFontStartTime,
  FROM `httparchive.all.pages` AS p
  INNER JOIN `httparchive.all.requests` AS r
  ON CAST(JSON_EXTRACT(p.summary, &quot;$.pageid&quot;) AS INT64) = CAST(JSON_EXTRACT(r.summary, &quot;$.pageid&quot;) AS INT64)
  WHERE 
    p.date = &quot;2024-01-01&quot; AND r.date = &quot;2024-01-01&quot;
    AND p.is_root_page = true AND r.is_root_page = true
    AND p.client = &quot;mobile&quot; AND r.client = &quot;mobile&quot;
    AND rank &amp;lt;= 100000
    AND r.type = &quot;font&quot;
    AND SAFE_CAST(JSON_EXTRACT(p.summary, &quot;$.reqFont&quot;) AS INT64) &amp;gt; 0
  GROUP BY 1,2,3,4
)
GROUP BY 1
ORDER BY 2 DESC
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;How many font bytes are downloaded before FCP?&lt;/b&gt;&lt;/summary&gt;
  This query will run against the top 1 million sites to aggregate the number of bytes loaded before FCP. This query processes ~1.3TB of data.
  &lt;pre&gt;&lt;code&gt;

SELECT 
  COUNT(*),
  ROUND(APPROX_QUANTILES(fontBytesBeforeFCP, 100)[SAFE_ORDINAL(50)],2) p50FontSize,
  ROUND(APPROX_QUANTILES(fontBytesBeforeFCP, 100)[SAFE_ORDINAL(75)],2) p75FontSize,
  ROUND(APPROX_QUANTILES(fontBytesBeforeFCP, 100)[SAFE_ORDINAL(95)],2) p95FontSize,
  ROUND(APPROX_QUANTILES(fontBytesBeforeFCP, 100)[SAFE_ORDINAL(99)],2) p99FontSize,
FROM (
  SELECT 
    p.page,
    SUM(SAFE_CAST(JSON_EXTRACT(r.summary, &quot;$.respSize&quot;) AS INT64)) AS fontBytesBeforeFCP,
  FROM `httparchive.all.pages` AS p
  INNER JOIN `httparchive.all.requests` AS r
  ON CAST(JSON_EXTRACT(p.summary, &quot;$.pageid&quot;) AS INT64) = CAST(JSON_EXTRACT(r.summary, &quot;$.pageid&quot;) AS INT64)
  WHERE 
    p.date = &quot;2023-11-01&quot; AND r.date = &quot;2023-11-01&quot;
    AND p.is_root_page = true AND r.is_root_page = true
    AND p.client = &quot;mobile&quot; AND r.client = &quot;mobile&quot;
    AND rank &amp;lt;= 1000000
    AND r.type = &quot;font&quot;
    AND SAFE_CAST(JSON_EXTRACT(p.summary, &quot;$.reqFont&quot;) AS INT64) &amp;gt; 0
    AND SAFE_CAST(JSON_EXTRACT(r.payload, &quot;$._all_start&quot;) AS INT64) &amp;lt; SAFE_CAST(JSON_EXTRACT(p.payload, &quot;$._firstContentfulPaint&quot;) AS INT64)
  GROUP BY 1
)
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Font glyphs vs rendered HTML&lt;/b&gt;&lt;/summary&gt;
  This query provides a list of the top 10K web sites, individual font URLs, the number of glyphs in the font, and the number of visible glyphs in the rendered HTML.  It also provides a link to the WebPageTest results from the HTTP Archive run for further analysis. This query processes almost 4 TB of data. I stored the results of the query in a scratchspace table in the HTTP Archive `httparchive.scratchspace.2024_01_01_font_glyphs_top100k`
  &lt;pre&gt;&lt;code&gt;
CREATE TEMPORARY FUNCTION  CountVisibleGlyphs(html STRING)
RETURNS INT64
LANGUAGE js
AS &quot;&quot;&quot;
  var extractedText = '';
  if (html) {
    // Remove HTML tags and keep only text content
    extractedText = html.replace(/&amp;lt;[^&amp;gt;]+&amp;gt;/g, '');

    // Remove extra spaces and newlines
    extractedText = extractedText.replace(/\\s+/g, ' ');

    // Remove leading and trailing spaces
    extractedText = extractedText.trim();

    // Count unique characters
    var uniqueChars = new Set(extractedText.split('')).size;
    return uniqueChars;
  } else {
    return 0; // Handle cases with empty HTML content
  }
&quot;&quot;&quot;;

SELECT 
  pages.page, 
  rank,
  fontRequests.url,
  CAST(JSON_EXTRACT_SCALAR(fontRequests.summary, &quot;$.respBodySize&quot;) AS INT64) AS font_size,
  CAST(JSON_EXTRACT_SCALAR(fontRequests.payload, &quot;$._font_details.counts.num_glyphs&quot;) AS INT64) AS glyphs,
  CountVisibleGlyphs(htmlRequests.response_body) AS visibleGlyphs,
  CONCAT(&quot;https://webpagetest.httparchive.org/result/&quot;, wptid, &quot;/&quot;) AS webpagetest, 

FROM `httparchive.all.pages` AS pages
INNER JOIN `httparchive.all.requests` AS fontRequests
  ON CAST(JSON_EXTRACT(pages.summary, &quot;$.pageid&quot;) AS INT64) = CAST(JSON_EXTRACT(fontRequests.summary, &quot;$.pageid&quot;) AS INT64)
INNER JOIN `httparchive.all.requests` AS htmlRequests
  ON CAST(JSON_EXTRACT(pages.summary, &quot;$.pageid&quot;) AS INT64) = CAST(JSON_EXTRACT(htmlRequests.summary, &quot;$.pageid&quot;) AS INT64)

WHERE 
  pages.date = &quot;2024-01-01&quot; 
  AND fontRequests.date = &quot;2024-01-01&quot;
  AND htmlRequests.date = &quot;2024-01-01&quot;
    
  -- mobile
  AND pages.client=&quot;mobile&quot; 
  AND fontRequests.client=&quot;mobile&quot;
  AND htmlRequests.client=&quot;mobile&quot;
  
  -- root pages
  AND pages.is_root_page = true
  AND fontRequests.is_root_page = true
  AND htmlRequests.is_root_page = true

  -- font requests and HTML request
  AND fontRequests.type = &quot;font&quot;
  AND htmlRequests.is_main_document = true

  AND rank &amp;lt;= 10000

  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;

&lt;details&gt;
  &lt;summary&gt;&lt;b&gt;Font glyphs vs rendered HTML - analysis&lt;/b&gt;&lt;/summary&gt;
  This query uses the results from the previous table to compare the minimum and maximum number of glyphs in a font to the glyphs in the rendered HTML. Since this query uses the saved results from the previous query, it processes a very small amount of data (~1 MB)
  &lt;pre&gt;&lt;code&gt;
WITH fontStats AS (
  SELECT page, 
  COUNT(*) AS fonts,
  MIN(CAST(font_size AS INT64)) AS minSize, 
  MAX(CAST(font_size AS INT64)) AS maxSize, 
  SUM(CAST(font_size AS INT64)) AS totalSize, 
  min(CAST(glyphs AS INT64)) AS minGlyphs, 
  MAX(CAST(glyphs AS INT64)) AS maxGlyphs, 
  AVG(visibleGlyphs) AS visibleGlyphs 
FROM `httparchive.scratchspace.2024_01_01_font_glyphs_top100k` 
GROUP BY 1
)

SELECT 
  ROUND(APPROX_QUANTILES(fonts, 100)[SAFE_ORDINAL(50)],2) p50FontCount,
  ROUND(APPROX_QUANTILES(fonts, 100)[SAFE_ORDINAL(75)],2) p75FontCount,
  ROUND(APPROX_QUANTILES(fonts, 100)[SAFE_ORDINAL(95)],2) p95FontCount,
  ROUND(APPROX_QUANTILES(fonts, 100)[SAFE_ORDINAL(99)],2) p99FontCount,
  ROUND(APPROX_QUANTILES(totalSize, 100)[SAFE_ORDINAL(50)],2) p50FontWeight,
  ROUND(APPROX_QUANTILES(totalSize, 100)[SAFE_ORDINAL(75)],2) p75FontWeight,
  ROUND(APPROX_QUANTILES(totalSize, 100)[SAFE_ORDINAL(95)],2) p95FontWeight,
  ROUND(APPROX_QUANTILES(totalSize, 100)[SAFE_ORDINAL(99)],2) p99FontWeight,  
  ROUND(APPROX_QUANTILES(visibleGlyphs, 100)[SAFE_ORDINAL(50)],2) p50VisibleGlyphs,
  ROUND(APPROX_QUANTILES(visibleGlyphs, 100)[SAFE_ORDINAL(75)],2) p75VisibleGlyphs,
  ROUND(APPROX_QUANTILES(visibleGlyphs, 100)[SAFE_ORDINAL(95)],2) p95VisibleGlyphs,
  ROUND(APPROX_QUANTILES(visibleGlyphs, 100)[SAFE_ORDINAL(99)],2) p99VisibleGlyphs,
  ROUND(APPROX_QUANTILES(minGlyphs, 100)[SAFE_ORDINAL(50)],2) p50MinGlyphs,
  ROUND(APPROX_QUANTILES(minGlyphs, 100)[SAFE_ORDINAL(75)],2) p75MinGlyphs,
  ROUND(APPROX_QUANTILES(minGlyphs, 100)[SAFE_ORDINAL(95)],2) p95MinGlyphs,
  ROUND(APPROX_QUANTILES(minGlyphs, 100)[SAFE_ORDINAL(99)],2) p99MinGlyphs,  
  ROUND(APPROX_QUANTILES(maxGlyphs, 100)[SAFE_ORDINAL(50)],2) p50MaxGlyphs,
  ROUND(APPROX_QUANTILES(maxGlyphs, 100)[SAFE_ORDINAL(75)],2) p75MaxGlyphs,
  ROUND(APPROX_QUANTILES(maxGlyphs, 100)[SAFE_ORDINAL(95)],2) p95MaxGlyphs,
  ROUND(APPROX_QUANTILES(maxGlyphs, 100)[SAFE_ORDINAL(99)],2) p99MaxGlyphs,   
FROM fontStats
  &lt;/code&gt;&lt;/pre&gt;
&lt;/details&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html">10 years ago, custom web fonts were a niche feature used by ~10% of websites. Today they are used by over 83% of websites! Fonts are generally loaded as a high priority resource, and some sites use techniques such as preload and early hints to get them to load as quickly as possible. Custom web fonts are important to many sites, since rendering with a specific typography is often preferred from a design perspective. However, this can easily become a performance issue when a large amount of fonts are loaded.</summary></entry><entry><title type="html">Internet Explorer’s Decline in Usage in 2021</title><link href="https://paulcalvano.com/2022-01-31-internet-explorer-decline-in-2021/" rel="alternate" type="text/html" title="Internet Explorer’s Decline in Usage in 2021" /><published>2022-01-31T14:00:00+00:00</published><updated>2022-01-31T17:50:26+00:00</updated><id>https://paulcalvano.com/internet-explorer-decline-in-2021</id><content type="html" xml:base="https://paulcalvano.com/2022-01-31-internet-explorer-decline-in-2021/">&lt;p&gt;In May 2021 Microsoft &lt;a href=&quot;https://blogs.windows.com/windowsexperience/2021/05/19/the-future-of-internet-explorer-on-windows-10-is-in-microsoft-edge/&quot;&gt;announced&lt;/a&gt; that it would be officially retiring Internet Explorer in favor of the Chromium based &lt;a href=&quot;https://www.microsoft.com/en-us/edge&quot;&gt;Microsoft Edge&lt;/a&gt;. Usage for the legacy browser had been very low over the past few years, although many websites have still maintained polyfills for the older browser. In fact a number of my clients have recently told me that supporting IE 11  is required by their business, and is still a consideration when adding new features to their sites. I’m sure that the official retirement of Internet Explorer will help numerous organizations embrace modern web features and move on from some expensive &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Glossary/Polyfill&quot;&gt;polyfills&lt;/a&gt;. The official retirement date is June 15, 2022.&lt;/p&gt;

&lt;p&gt;A few years ago &lt;a href=&quot;https://twitter.com/philwalton&quot;&gt;Philip Walton&lt;/a&gt; wrote an excellent blog post about &lt;a href=&quot;https://philipwalton.com/articles/loading-polyfills-only-when-needed/&quot;&gt;loading polyfills only when needed&lt;/a&gt;. His strategy focused on optimizing the experience for users on modern browsers, and loading polyfills based on browser support for required features (or lack thereof). One thing I really like about this approach is that he recommends creating separate bundles for modern browsers. This avoids unnecessary CPU load on modern browsers, since they won’t have to &lt;a href=&quot;https://v8.dev/blog/cost-of-javascript-2019&quot;&gt;parse, compile and evaluate&lt;/a&gt; the polyfill JavaScript.&lt;/p&gt;

&lt;p&gt;More recently, &lt;a href=&quot;https://twitter.com/AlanGDavalos&quot;&gt;Alan Davalos&lt;/a&gt; published a blog post where he made the argument that the &lt;a href=&quot;https://engineering.linecorp.com/en/blog/the-baseline-for-web-development-in-2022/&quot;&gt;baseline for web development in 2022 &lt;/a&gt;should change as a result of this. He provided numerous examples of websites that have officially dropped support for the legacy browser in 2021.&lt;/p&gt;

&lt;p&gt;This made me curious about the current traffic levels for Internet Explorer, and how that has changed over the past year. Looking at &lt;a href=&quot;https://www.akamai.com/products/mpulse-real-user-monitoring&quot;&gt;Akamai’s mPulse&lt;/a&gt; data, which measures real user performance data for users across all browsers, I can see a steady decline since March 2021. Most of the Internet Explorer traffic was from the IE 11 browser, with earlier browsers being a small fraction (&amp;lt; 0.01% of pages).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/internet-explorer-decline-in-2021/internet_explorer_monthly_pct.jpg&quot; alt=&quot;Internet Explorer Traffic - Monthly&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Looking at this data daily, you can see that the decline in Internet Explorer usage was linear throughout the year. It also seems that the decline accelerated after the announcement in May 2021. Usage dropped even further in November 2021.&lt;/p&gt;

&lt;p&gt;The zig-zag pattern in the daily traffic indicates that the majority of users utilizing IE 11 are doing so during the Monday-Friday work week, with traffic dropping by two thirds on the weekends. If business users are the primary source of Internet Explorer usage, then they could be at risk of 0-day exploits once official support ends.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/internet-explorer-decline-in-2021/internet_explorer_daily_pct.jpg&quot; alt=&quot;Internet Explorer Traffic - Daily&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I thought it would be interesting to see which countries had the highest percentage of Internet Explorer traffic during January 2022.  Sure enough this varied from country to country. For example, in the United States 0.97% of pages were loaded from an Internet Explorer browser. In Great Britain, it only accounted for 0.29% of pages.&lt;/p&gt;

&lt;p&gt;Some countries had a noticeably higher percentage of Internet Explorer traffic - for example South Korea (2.36%), China (1.76%), Hong Kong (1.78%). And then there are some countries with a shockingly high percentage of Internet Explorer traffic: Haiti (26.15%), Belize (9.12%), Jamaica (7.13%), Cambodia (6.8%) and more. The graph below shows a summary of Internet Explorer usage per country. You can also view an interactive version of it &lt;a href=&quot;https://public.tableau.com/app/profile/paul.calvano8666/viz/InternetExplorerUsagebyCountry-Jan2022/Sheet1&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/internet-explorer-decline-in-2021/internet_explorer_pct_by_country_jan2022.jpg&quot; alt=&quot;Internet Explorer Usage by Country - Jan 2022&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Internet Explorer is going away, and that is ultimately a good thing for the web. As support officially ends, website owners should consider removing expensive polyfills for the browser and adopting technologies that modern browsers support natively. However this won’t happen by itself, and sites that have been shipping polyfills in a large monolithic bundle will need to put in the effort to analyze which ones are still needed. Fortunately there are some great resources on bundle analysis, such as &lt;a href=&quot;https://twitter.com/thegreengreek&quot;&gt;Sia Karamalegos’&lt;/a&gt; guide to &lt;a href=&quot;https://sia.codes/posts/lighthouse-treemap/&quot;&gt;Lighthouse Treemap&lt;/a&gt; and &lt;a href=&quot;https://nolanlawson.com/about/&quot;&gt;Nolan Lawson’s&lt;/a&gt; guide to &lt;a href=&quot;https://nolanlawson.com/2021/02/23/javascript-performance-beyond-bundle-size/&quot;&gt;bundle analysis tools&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Organizations should also look to remove legacy Internet Explorer browsers from their employees’ machines to avoid security risks down the line. Microsoft provides guides and resources for doing so, which are available &lt;a href=&quot;https://www.microsoft.com/en-us/download/details.aspx?id=102119&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If a tree falls in the forest and no one is around, does it make a sound? Likewise, if a web page loads in a background tab then does its load time really matter? As a user, the time it takes for background tabs to load may seem irrelevant since you are unlikely to notice delays. However if you are managing a website and measuring user experience, then it’s important to understand how visibility state can influence the data you are analyzing.&lt;/p&gt;

&lt;p&gt;In this blog post we’ll explore the &lt;a href=&quot;https://w3c.github.io/page-visibility&quot;&gt;Page Visibility API&lt;/a&gt; as well as some data from &lt;a href=&quot;https://www.akamai.com/us/en/products/performance/mpulse-real-user-monitoring.jsp&quot;&gt;Akamai’s mPulse&lt;/a&gt; to understand the visibility states of real users loading billions of page views, and what it means for web performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page Visibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://w3c.github.io/page-visibility/#visibilitystate-attribute&quot;&gt;Page Visibility API&lt;/a&gt; defines a programmatic way of determining the visibility state of a top-level browsing context, as well as a method of measuring visibility state changes over time. Web developers can use this information to determine whether a page is visible to an end user. It also gives them the ability to scale back the work being performed on a page load. The Page Visibility API is also &lt;a href=&quot;https://caniuse.com/?search=page%20visibility&quot;&gt;supported in all modern web browsers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Using this API is very simple. The attribute &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;document.visibilityState&lt;/code&gt; will return visible or hidden depending on whether the page is currently visible. If you want to see the value changing as you switch tabs, you can also monitor the visibility changes. For example,&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Log the current visibility state, then log it again on every change
console.log(document.visibilityState + ': ' + Date())
document.onvisibilitychange = () =&amp;gt; console.log(document.visibilityState + ': ' + Date())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image5.jpg&quot; alt=&quot;Visibility State Example&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The W3C specification also provides an example of using this API to decide whether to autoplay a video on page load based on visibility state. It adds an event listener for visibility state changes so that video playback can start automatically once the page is visible.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image9.jpg&quot; alt=&quot;Example of Page Visibility API usage from specification&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;
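
&lt;p&gt;A simplified sketch of that idea, assuming a video element with the id videoElement, might look like the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Simplified sketch of the spec's autoplay example: only play while the page is visible.
var videoElement = document.getElementById('videoElement');

function handleVisibilityChange() {
  if (document.visibilityState === 'hidden') {
    videoElement.pause();
  } else {
    videoElement.play();
  }
}

document.addEventListener('visibilitychange', handleVisibilityChange);
handleVisibilityChange(); // apply the correct state on the initial load
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;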

&lt;p&gt;RUM tools can collect this data as well. For example, mPulse collects the visibility state of a page once the page load has completed and also measures for changes in visibility throughout the page load.&lt;/p&gt;
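
&lt;p&gt;As a rough sketch (and not mPulse’s actual implementation), a RUM script could record the state at key points and beacon it when the page finishes loading. The /beacon endpoint below is hypothetical:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Rough sketch: track the visibility state and any changes during the page load,
// then send it along with the beacon once the load event fires.
var visibilityLog = [{ state: document.visibilityState, ts: performance.now() }];

document.addEventListener('visibilitychange', function () {
  visibilityLog.push({ state: document.visibilityState, ts: performance.now() });
});

window.addEventListener('load', function () {
  var payload = {
    visibilityStateAtLoad: document.visibilityState,
    visibilityChanges: visibilityLog
  };
  // '/beacon' is a hypothetical collection endpoint.
  navigator.sendBeacon('/beacon', JSON.stringify(payload));
});
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;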

&lt;p&gt;The data in this blog post is based on billions of page views across all sites using mPulse during the month of November 2021.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visibility States&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;How often do you right click on a link and load a page in a background tab? Or click on a link on your mobile device and switch applications before it finishes loading? Or lock the screen on your mobile device while waiting for a page to load?&lt;/p&gt;

&lt;p&gt;The graph below breaks down visibility states by device type using RUM data from mPulse. The visibility state is measured as soon as the onLoad event is fired. 11.18% of all Desktop page views were loaded in a hidden visibility state. Similarly, 9.59% of Mobile page views were loaded in a hidden visibility state.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image8.jpg&quot; alt=&quot;Distribution of Visibility States&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Less than 1% of pages were loaded in a prerender state. This feature has been deprecated in the Page Visibility API, is supported inconsistently across browsers, and is therefore not as useful for this analysis.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is Visibility State Important for Web Performance?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most modern browsers prioritize work being done in the foreground, and as such one would expect that a page loaded in a background tab may be slower. You can examine this behavior yourself by loading a page and looking at its Navigation Timing data. Load that same page in a non-visible tab a few times and look at the difference in timing.&lt;/p&gt;
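
&lt;p&gt;For example, once the page has finished loading you can read its Navigation Timing entry from the console and note the visibility state alongside it (a quick sketch, run after the load event):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Quick sketch: compare these numbers for a foreground load vs a background-tab load.
var nav = performance.getEntriesByType('navigation')[0];
console.log('visibilityState: ' + document.visibilityState);
console.log('onLoad time: ' + Math.round(nav.loadEventEnd) + ' ms');
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;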

&lt;p&gt;When analyzing the median page load time (onLoad metric) in mPulse, I can see a significant difference in performance based on visibility state. For example, the median time to load a page on Desktop was 2.8 seconds. The median load time for pages in a visible state was 2.7 seconds, while the median load time in a hidden state was 4 seconds. Overall, the median load times for pages loaded in a hidden visibility state were 32% slower on Desktop and 37% slower on Mobile!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image1.jpg&quot; alt=&quot;Median Load Time by Visibility State&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Going back to the tree falling in a forest analogy, does this really matter? I’d say yes and no, for the following reasons:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;From a user experience perspective, if the page is not visible then the user is not impacted by the delay.&lt;/li&gt;
  &lt;li&gt;As performance engineers we look to RUM data to tell us what our users are experiencing. If a large enough percentage of users are loading tabs in the background, then it may impact the metrics we are analyzing. This is exacerbated even further if you are looking at upper percentile stats.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The graph below illustrates the median load times as well as the p75, p95 and p99 based on visibility state. The p95 for all Desktop pages loaded in a visible state was 14.37 seconds. Comparatively, the p95 for pages loaded in a hidden state was 37.65 seconds, which is more than twice as slow!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image2.jpg&quot; alt=&quot;Desktop Load Time Percentiles by Visibility State&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Taking this one step further, the graph below shows the distribution of load times in a histogram, for both hidden and visible states. As the response times increase, hidden visibility states account for a larger percentage of experiences. At the 95th percentile, 25% of all pages were loaded in a hidden visibility state. (Note: the x axis in this graph ends at 18 seconds, which is around the 95th percentile).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image4.jpg&quot; alt=&quot;Distribution of Desktop Response Times by Visibility State&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Now let’s explore the upper percentiles. The graph below shows the same data for the slowest 5% of experiences. The percentage of hidden visibility states increases with respect to the load time, eventually approaching 36%. If you are analyzing upper percentiles to tackle some of your long tail performance issues - then not filtering out hidden visibility states will leave a significant amount of noise in your data.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image3.jpg&quot; alt=&quot;Distribution of the slowest 5% of Desktop Response Times by Visibility State&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page visibility by desktop browser&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During the month of November 2021, Chrome, Edge, Safari, Firefox and Internet Explorer accounted for 96.3% of all desktop page views. Chrome was the dominant browser, with 63.9% of all page views. All of these browsers support the Page Visibility API.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image6.jpg&quot; alt=&quot;Desktop Browser Distribution&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The table below details the distribution of visibility states by Desktop browser. The percentage of hidden visibility states varies widely by browser. This may be influenced by a variety of factors, such as browser UI features (for example, tabbed browsing) or the end user switching between applications on their machine.&lt;/p&gt;

&lt;p&gt;Chrome had the highest percentage of hidden visibility state page loads (12.9%) compared to other desktop browsers. Interestingly, Edge (which is now Chromium based) had a hidden visibility state on just 7.86% of page loads. Given that legacy Internet Explorer (version 11 and earlier) had only 4.4% hidden visibility states, it’s possible that some users of Edge and Internet Explorer are less likely to utilize the tabbed browsing features.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td colspan=&quot;2&quot;&gt;% of page loads by visibility state&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Browser&lt;/td&gt;
   &lt;td&gt;hidden&lt;/td&gt;
   &lt;td&gt;visible&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Chrome&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;12.91%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;87.09%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Edge
   &lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;7.86%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;92.14%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Safari&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;6.86%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;92.89%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Firefox&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;9.25%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;90.75%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;&lt;td&gt;IE&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;4.44%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;95.56%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;It’s worth noting that if you are measuring your site with synthetic testing, the page is almost always loaded in a visible state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page visibility by mobile browser&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mobile web traffic is split between browser apps, WebViews, and in-app browsers. Mobile Safari is the dominant mobile browser, with 39.2% of page loads. WebViews also represented a significant traffic share, accounting for 17% of all mobile page views (or 20% if we include Chrome Mobile and Firefox Mobile’s iOS apps).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image10.jpg&quot; alt=&quot;Mobile Browser Distribution&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The table below breaks down the visibility states measured by mobile web browsers. Overall, 5.98% of pages on Mobile Safari were loaded in a hidden visibility state, which is comparable to Safari on Desktop. However Chrome Mobile had fewer hidden visibility states (8.88% mobile vs 12.91% desktop). Interestingly, some Chromium based mobile browsers (such as Samsung Internet, MiuiBrowser and Edge Mobile) had a much higher percentage of hidden visibility states - likely due to differences in their UI and user base.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td colspan=&quot;2&quot;&gt;% of page loads by visibility state&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Browser&lt;/td&gt;
   &lt;td&gt;hidden&lt;/td&gt;
   &lt;td&gt;visible&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Mobile Safari&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;5.98%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;93.99%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Chrome Mobile&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;8.88%&lt;/p&gt;&lt;/td&gt;

   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;91.12%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Samsung Internet&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;12.34%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;87.66%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Crosswalk&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;2.93%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;97.07%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Firefox Mobile&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;2.48%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;97.52%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;MiuiBrowser&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;16.43%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;83.57%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Opera Mobile
   &lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;12.45%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;87.55%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Yandex Browser&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;6.16%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;93.83%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Edge Mobile&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;10.10%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;89.89%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;UC Browser&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;9.74%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;90.25%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;When we look at the distribution of visibility states across WebViews and in-app browsers, we can see that WebViews have the highest percentage of hidden visibility states. This could be due to mobile users switching apps or locking their screens before a page is finished loading.&lt;/p&gt;

&lt;p&gt;Another interesting observation is that the social media apps Instagram and Snapchat have a higher % of hidden visibility states compared to Facebook and Pinterest. Could this be due to differences in demographics of these platforms?&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;/td&gt;
   &lt;td colspan=&quot;2&quot;&gt;% of page loads by visibility state&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Browser&lt;/td&gt;
   &lt;td&gt;hidden&lt;/td&gt;
   &lt;td&gt;visible&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Mobile Safari UI/WKWebView&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;26.61%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;73.20%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Chrome Mobile WebView&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;12.73%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;87.27%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Facebook&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;2.78%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;97.22%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Chrome Mobile iOS&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;7.77%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;92.20%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Instagram&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;7.16%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;92.83%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;LINE&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;1.87%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;98.12%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Firefox iOS&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;6.24%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;93.75%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Snapchat&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;9.46%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;90.48%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Android Webkit&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;4.41%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;95.59%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Pinterest&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;3.69%&lt;/p&gt;&lt;/td&gt;
   &lt;td&gt;&lt;p style=&quot;text-align: right&quot;&gt;96.31%&lt;/p&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;One thing in common across all browsers - desktop and mobile - is that a significant share of page loads happen in a hidden visibility state, and these are worth accounting for in our performance measurements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Beyond page load time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When we look at other performance metrics, we can see that times measured for metrics such as Total Blocking Time, Largest Contentful Paint and First Contentful Paint were also impacted by visibility state. The graph below illustrates this for Chrome Desktop pages. The metric that was most impacted was Total Blocking Time, which was almost twice as slow when hidden. This makes sense since the browser is likely not prioritizing the execution of JavaScript on the page, which also explains the 46% increase in page load times.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image11.jpg&quot; alt=&quot;Chrome Desktop Performance Metrics by Visibility State&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Largest Contentful Paint is one of the Core Web Vitals, which Google is using as a signal for search ranking. The mPulse data shows us that the p75 LCP for a hidden visibility state is 23% slower than when it is visible. At the p95, the LCP is almost twice as slow when hidden.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/page-visibility-if-a-tree-falls-in-the-forest/image7.jpg&quot; alt=&quot;Chrome Desktop LCP Percentiles by Visibility State&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Google’s Chrome User Experience Report (CrUX) &lt;a href=&quot;https://web.dev/lcp/#differences-between-the-metric-and-the-api&quot;&gt;already filters out hidden visibility states&lt;/a&gt;, which means that your search ranking will not be impacted by slow non-visible page loads. However the tools you are using to monitor these thresholds may have a blind spot here. Fortunately it’s easy enough to collect visibility state data using the Page Visibility API. For example, in mPulse, the visibility state is a dimension that you can filter in any dashboard. You can also create custom dashboards to track the distribution of experiences based on visibility states if that interests you.&lt;/p&gt;
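
&lt;p&gt;As a rough sketch of what that could look like, you can observe LCP yourself and flag any measurement where the page was hidden beforehand, so those samples can be excluded from reports later:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Rough sketch: record LCP, but flag samples where the page was hidden at any point.
var wasHidden = document.visibilityState === 'hidden';
document.addEventListener('visibilitychange', function () {
  if (document.visibilityState === 'hidden') {
    wasHidden = true;
  }
});

new PerformanceObserver(function (list) {
  var entries = list.getEntries();
  var lastEntry = entries[entries.length - 1];
  console.log('LCP: ' + Math.round(lastEntry.startTime) + ' ms' +
              (wasHidden ? ' (page was hidden - consider excluding)' : ''));
}).observe({ type: 'largest-contentful-paint', buffered: true });
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;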

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Page Visibility API is an incredibly useful way of determining the visibility state of a page load. It can be used to provide developers with the ability to fine tune experiences based on visibility state, which can conserve CPU and battery usage. It’s also measurable with RUM, and based on the data from mPulse we can see that page load times are slower across all browsers when the visibility state is hidden.&lt;/p&gt;

&lt;p&gt;While this may not matter as much for end user experience, it’s happening at a high enough frequency that it can influence your performance metrics. If you are optimizing for the long tail of web performance, you may want to filter out hidden visibility states.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at https://calendar.perfplanet.com/2021/page-visibility-if-a-tree-falls-in-the-forest/&lt;/em&gt;&lt;/p&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html"></summary></entry><entry><title type="html">What can the HTTP Archive tell us about Largest Contentful Paint?</title><link href="https://paulcalvano.com/2021-06-07-lcp-httparchive/" rel="alternate" type="text/html" title="What can the HTTP Archive tell us about Largest Contentful Paint?" /><published>2021-06-07T16:30:00+00:00</published><updated>2021-06-08T03:57:26+00:00</updated><id>https://paulcalvano.com/lcp-httparchive</id><content type="html" xml:base="https://paulcalvano.com/2021-06-07-lcp-httparchive/">&lt;p&gt;&lt;a href=&quot;https://web.dev/lcp/&quot;&gt;Largest Contentful Paint (LCP)&lt;/a&gt; is an important metric that measures when the largest element in the browser’s viewport becomes visible. This could be an image, a background image, a poster image for a video, or even a block of text. The metric is measured with the &lt;a href=&quot;https://wicg.github.io/largest-contentful-paint/&quot;&gt;Largest Contentful Paint API&lt;/a&gt;, which is &lt;a href=&quot;https://caniuse.com/?search=largestcontentfulpaint&quot;&gt;supported&lt;/a&gt; in Chromium browsers. Optimizing for this metric is critical to end user experience, since it affects their ability to visualize your content.&lt;/p&gt;

&lt;p&gt;Google has promoted this metric as one of the three &lt;a href=&quot;https://web.dev/vitals/&quot;&gt;“Core Web Vitals”&lt;/a&gt; that affect user experience on the web. It is also slated to become a &lt;a href=&quot;https://developers.google.com/search/blog/2021/04/more-details-page-experience&quot;&gt;search ranking signal over the next few weeks&lt;/a&gt;, which has created a lot of awareness about it. The suggested target for a good Largest Contentful Paint is less than 2.5 seconds for at least 75% of page loads.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image9.jpg&quot; alt=&quot;Largest Contentful Paint Overview&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;Source: &lt;a href=&quot;https://web.dev/lcp/&quot;&gt;https://web.dev/lcp/&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;Some of the recent posts on &lt;a href=&quot;https://wpostats.com/tags/core%20web%20vitals/&quot;&gt;WPOStats&lt;/a&gt; feature interesting case studies about this metric.  For example,&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Google’s &lt;a href=&quot;https://blog.chromium.org/2020/05/the-science-behind-web-vitals.html&quot;&gt;research&lt;/a&gt; found that when Core Web Vitals are met, users are 24% less likely to abandon a page before it finishes loading.&lt;/li&gt;
  &lt;li&gt;Vodafone improved LCP by 31% and saw an 8% increase in sales.&lt;/li&gt;
  &lt;li&gt;NDTV improved their LCP by 55% and saw a 50% reduction in bounce rate.&lt;/li&gt;
  &lt;li&gt;Tokopedia improved their LCP by 55% and saw a 23% increase in session duration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Identifying the Largest Contentful Paint Element&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The name of this metric implies that size is used as a proxy for importance. Because of this, you may be wondering specifically which image or text triggered it as well as the percentage of the viewport it consumed. There are a few ways to examine this:&lt;/p&gt;

&lt;p&gt;One way to visualize the Largest Contentful Paint is to look at a &lt;a href=&quot;https://webpagetest.org/&quot;&gt;WebPageTest&lt;/a&gt; filmstrip. You’ll be able to see when visual changes occurred (yellow outline) as well as when the Largest Contentful Paint event occurred (red outline).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image7.jpg&quot; alt=&quot;WebPageTest Filmstrip showing LCP Element&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In Chrome DevTools, you can also click on the LCP indicator in the “Performance” tab to examine the Largest Contentful Paint element in your browser. Using this method you can see and inspect the exact element (image, text, etc) that triggered it.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image11.gif&quot; alt=&quot;Chrome DevTools Performance Tab&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Lighthouse also has an audit that identifies the Largest Contentful Paint element. If you examine the screenshot below you’ll notice that there is a yellow box around the largest element, as well as an HTML snippet.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image5.jpg&quot; alt=&quot;Lighthouse LCP Element&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Large is the Largest Contentful Paint?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://httparchive.org/&quot;&gt;HTTP Archive&lt;/a&gt; runs Lighthouse audits for approximately 7.2 million websites every month. In the May 2021 dataset, Lighthouse was able to identify an LCP element in 97.35% of the tests. Since we have the ability to query all of these Lighthouse test results, we can analyze the result of the LCP audits and get more insight into what drives this metric across the web.&lt;/p&gt;

&lt;p&gt;Using the same boundaries that Lighthouse uses to draw the rectangle around the LCP element, it’s possible to calculate the area of it. In the above example, the product of the LCP image’s height (191) and width (340) was 64,940 pixels. Since the Lighthouse test was run with an emulated &lt;a href=&quot;https://almanac.httparchive.org/en/2020/methodology#webpagetest&quot;&gt;Moto G4 user agent&lt;/a&gt; with a screen size of 640x360, we can also calculate that this particular LCP image took up 28% of the viewport.&lt;/p&gt;
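
&lt;p&gt;The arithmetic behind that is simple. Here is a minimal sketch using the bounding box from the example above and the emulated Moto G4 viewport:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Minimal sketch: LCP element area as a percentage of the viewport.
function lcpViewportCoverage(rect, viewport) {
  var elementArea = rect.width * rect.height;
  var viewportArea = viewport.width * viewport.height;
  return (100 * elementArea / viewportArea).toFixed(1) + '%';
}

// A 340x191 px LCP image on a 360x640 emulated Moto G4 screen
console.log(lcpViewportCoverage({ width: 340, height: 191 }, { width: 360, height: 640 }));
// prints 28.2%
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;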

&lt;p&gt;The graph below shows the cumulative distribution of the LCP element as a percentage of screen size. The median LCP element takes up 31% of the screen size! At the 75th percentile the LCP element is nearly twice as large, taking up 59% of the screen size. Additionally 10.6% of sites actually had an LCP element that exceeded the viewport (which is why the y axis doesn’t reach 100%).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image10.jpg&quot; alt=&quot;Distribution of LCP Element Size as a Percent of Screen Size&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The graph below illustrates the same data in a histogram. From this we can see that 4.03% of sites (285,751) had an LCP element that took up 0 pixels. Upon further inspection, the 0 pixel elements appear to have been used in carousels, so by the time the audit completed the LCP element had slid out of the viewport.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image3.jpg&quot; alt=&quot;Histogram of LCP Element Size as a Percent of Screen Size&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Node Paths of LCP Elements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another interesting aspect of the Largest Contentful Paint audit is the nodePath of the element, which shows you where in the DOM this element was. In the example we looked at earlier, the nodePath was:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1,HTML,1,BODY,8,DIV,2,SECTION,1,DIV,0,DIV,0,DIV,0,UL,0,LI,0,ARTICLE,1,DIV,0,DIV,0,A,0,IMG
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we look at the last element in the node path, we can get some insight into the type of element that triggered the Largest Contentful Paint. The most common node that triggered the Largest Contentful Paint was &amp;lt;IMG&amp;gt;, which accounted for 42% of all sites.   Next was &amp;lt;DIV&amp;gt; at 27% (which could include text or images). The &amp;lt;H1&amp;gt; through &amp;lt;H5&amp;gt; header elements accounted for 7.18% of all Largest Contentful Paints.&lt;/p&gt;
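
&lt;p&gt;Grouping the audits this way only requires splitting the nodePath on commas and keeping the last entry. For example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// The last entry in the Lighthouse nodePath is the element that triggered LCP.
var nodePath = '1,HTML,1,BODY,8,DIV,2,SECTION,1,DIV,0,DIV,0,DIV,0,UL,0,LI,0,ARTICLE,1,DIV,0,DIV,0,A,0,IMG';
console.log(nodePath.split(',').pop()); // prints IMG
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;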

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;LCP Node (last element in path)&lt;/td&gt;
      &lt;td&gt;Number of Sites&lt;/td&gt;
      &lt;td&gt;Percent of Sites&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;IMG&lt;/td&gt;
      &lt;td&gt;3,067,354&lt;/td&gt;
      &lt;td&gt;42.12%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;DIV&lt;/td&gt;
      &lt;td&gt;1,981,416&lt;/td&gt;
      &lt;td&gt;27.21%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;P&lt;/td&gt;
      &lt;td&gt;766,977&lt;/td&gt;
      &lt;td&gt;10.53%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;H1&lt;/td&gt;
      &lt;td&gt;291,091&lt;/td&gt;
      &lt;td&gt;4.00%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;192,498&lt;/td&gt;
      &lt;td&gt;2.64%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SECTION&lt;/td&gt;
      &lt;td&gt;182,267&lt;/td&gt;
      &lt;td&gt;2.50%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;H2&lt;/td&gt;
      &lt;td&gt;144,534&lt;/td&gt;
      &lt;td&gt;1.98%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;A&lt;/td&gt;
      &lt;td&gt;107,501&lt;/td&gt;
      &lt;td&gt;1.48%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SPAN&lt;/td&gt;
      &lt;td&gt;85,245&lt;/td&gt;
      &lt;td&gt;1.17%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HEADER&lt;/td&gt;
      &lt;td&gt;67,762&lt;/td&gt;
      &lt;td&gt;0.93%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;LI&lt;/td&gt;
      &lt;td&gt;64,212&lt;/td&gt;
      &lt;td&gt;0.88%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;H3&lt;/td&gt;
      &lt;td&gt;60,679&lt;/td&gt;
      &lt;td&gt;0.83%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;RS-SBG&lt;/td&gt;
      &lt;td&gt;51,623&lt;/td&gt;
      &lt;td&gt;0.71%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;TD&lt;/td&gt;
      &lt;td&gt;48,470&lt;/td&gt;
      &lt;td&gt;0.67%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;H4&lt;/td&gt;
      &lt;td&gt;19,039&lt;/td&gt;
      &lt;td&gt;0.26%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;VIDEO&lt;/td&gt;
      &lt;td&gt;15,649&lt;/td&gt;
      &lt;td&gt;0.21%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ARTICLE&lt;/td&gt;
      &lt;td&gt;12,860&lt;/td&gt;
      &lt;td&gt;0.18%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;FIGURE&lt;/td&gt;
      &lt;td&gt;9,208&lt;/td&gt;
      &lt;td&gt;0.13%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;BODY&lt;/td&gt;
      &lt;td&gt;8,859&lt;/td&gt;
      &lt;td&gt;0.12%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;image&lt;/td&gt;
      &lt;td&gt;8,077&lt;/td&gt;
      &lt;td&gt;0.11%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;CENTER&lt;/td&gt;
      &lt;td&gt;7,960&lt;/td&gt;
      &lt;td&gt;0.11%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The &amp;lt;VIDEO&amp;gt; element only accounted for 0.21% of sites. According to the Web Almanac, &lt;a href=&quot;https://almanac.httparchive.org/en/2020/media#videos&quot;&gt;the &amp;lt;video&amp;gt; element was used on 0.49% of mobile websites&lt;/a&gt; - so from this we can estimate that half of sites loading videos are triggering LCP with video poster images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image Weight for the LCP&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the Lighthouse audits looks for opportunities to preload the Largest Contentful Paint element, and estimates the potential savings in performance. This audit also identifies the URL for the LCP element - which can give us some insights into what type of images are being loaded as a LCP element. In the HTTP Archive data, only 67% of the Lighthouse tests were able to identify a URL for an LCP element. Based on this, we can infer that text nodes are used for the LCP on approximately 33% of sites.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image8.jpg&quot; alt=&quot;Lighthouse Preload LCP Element Recommendation&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The graph below shows the distribution of sizes for the image element that was associated with the Largest Contentful Paint. The median LCP element size was 80KB. At the 90th percentile, the LCP element size was 512KB.   If you have a large LCP image then you should consider optimizing it before you attempt to follow the Lighthouse preload recommendation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image12.jpg&quot; alt=&quot;Distribution of LCP Element Size&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Additionally, 70% of the LCP element images were JPEG and 25% were PNG.  Only 3% of sites served a webp as their LCP element.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;format&lt;/td&gt;
      &lt;td&gt;sites&lt;/td&gt;
      &lt;td&gt;% of Sites&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jpg&lt;/td&gt;
      &lt;td&gt;3,161,991&lt;/td&gt;
      &lt;td&gt;69.37%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;png&lt;/td&gt;
      &lt;td&gt;1,122,585&lt;/td&gt;
      &lt;td&gt;24.63%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;webp&lt;/td&gt;
      &lt;td&gt;141,441&lt;/td&gt;
      &lt;td&gt;3.10%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gif&lt;/td&gt;
      &lt;td&gt;84,829&lt;/td&gt;
      &lt;td&gt;1.86%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;svg&lt;/td&gt;
      &lt;td&gt;34,123&lt;/td&gt;
      &lt;td&gt;0.75%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Other&lt;/td&gt;
      &lt;td&gt;13,272&lt;/td&gt;
      &lt;td&gt;0.29%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;When we look at the LCP element as a percentage of page weight, we can see that the median LCP element is 4.17% of the total page weight. At the higher percentiles, the LCP elements are larger and also account for a larger percentage of page weight.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image1.jpg&quot; alt=&quot;LCP Element as a Percent of Page Weight&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;percentile&lt;/td&gt;
      &lt;td&gt;ImageRequests&lt;/td&gt;
      &lt;td&gt;ImageKB&lt;/td&gt;
      &lt;td&gt;TotalKB&lt;/td&gt;
      &lt;td&gt;LCP as a % of Page Weight&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;p25&lt;/td&gt;
      &lt;td&gt;15&lt;/td&gt;
      &lt;td&gt;422&lt;/td&gt;
      &lt;td&gt;1,138&lt;/td&gt;
      &lt;td&gt;3.01%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;p50&lt;/td&gt;
      &lt;td&gt;26&lt;/td&gt;
      &lt;td&gt;1,142&lt;/td&gt;
      &lt;td&gt;2,185&lt;/td&gt;
      &lt;td&gt;4.17%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;p75&lt;/td&gt;
      &lt;td&gt;45&lt;/td&gt;
      &lt;td&gt;2,692&lt;/td&gt;
      &lt;td&gt;4,108&lt;/td&gt;
      &lt;td&gt;5.58%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;p95&lt;/td&gt;
      &lt;td&gt;103&lt;/td&gt;
      &lt;td&gt;8,008&lt;/td&gt;
      &lt;td&gt;10,036&lt;/td&gt;
      &lt;td&gt;8.42%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Since images account for 52% of the median page weight (for the sites that have an LCP image element), we can infer that at the median, roughly 8% of a page’s image weight is used to render content to 31% of the screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does this change based on Site Popularity?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The HTTP Archive now contains rank groupings, obtained from the Chrome User Experience Report. This enables us to segment this analysis based on the popularity of sites. The rank groupings bucket sites into the top 1K, 10K, 100K, 1 million and 10 million.&lt;/p&gt;

&lt;p&gt;When we look at the Largest Contentful Paint image size based on popularity, it’s interesting to note that the most popular sites tend to be serving smaller images for the LCP element. While there may be numerous reasons for this, I suspect that the more popular sites are investing in image optimization solutions.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image2.jpg&quot; alt=&quot;LCP Image Size by Site Popularity&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Page weight follows the same pattern, with the least popular websites having some of the largest page weights. If we look at the LCP element based on the percentage of page weight, you can see that within the top 100K sites the ratios are very close. In the less popular sites, the LCP element tends to be a much greater percentage of page weight.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;rank&lt;/td&gt;
      &lt;td&gt;p25&lt;/td&gt;
      &lt;td&gt;p50&lt;/td&gt;
      &lt;td&gt;p75&lt;/td&gt;
      &lt;td&gt;p95&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Top 1k&lt;/td&gt;
      &lt;td&gt;1.61%&lt;/td&gt;
      &lt;td&gt;2.12%&lt;/td&gt;
      &lt;td&gt;2.85%&lt;/td&gt;
      &lt;td&gt;5.67%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Top 10k&lt;/td&gt;
      &lt;td&gt;1.76%&lt;/td&gt;
      &lt;td&gt;2.27%&lt;/td&gt;
      &lt;td&gt;3.00%&lt;/td&gt;
      &lt;td&gt;4.96%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Top 100k&lt;/td&gt;
      &lt;td&gt;2.07%&lt;/td&gt;
      &lt;td&gt;2.87%&lt;/td&gt;
      &lt;td&gt;3.77%&lt;/td&gt;
      &lt;td&gt;5.78%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Top 1 million&lt;/td&gt;
      &lt;td&gt;2.53%&lt;/td&gt;
      &lt;td&gt;3.49%&lt;/td&gt;
      &lt;td&gt;4.60%&lt;/td&gt;
      &lt;td&gt;6.95%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Top 10 million&lt;/td&gt;
      &lt;td&gt;3.11%&lt;/td&gt;
      &lt;td&gt;4.30%&lt;/td&gt;
      &lt;td&gt;5.75%&lt;/td&gt;
      &lt;td&gt;8.65%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;We can also make some interesting observations about how popular sites are optimizing their LCP assets. Looking at the various image formats, JPG images are the most common LCP element. Some other formats such as PNG, WebP, GIF and SVG are used more frequently in the more popular sites.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/blog/lcp-httparchive/image6.jpg&quot; alt=&quot;Largest Contentful Paint Element Format by Rank&quot; loading=&quot;lazy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Largest Contentful Paint is an important metric that helps illustrate when a page’s most significant content is rendered to the screen. In reviewing the HTTP Archive data, we can see that this area represents between 30% and 60% of a mobile viewport for a majority of sites.&lt;/p&gt;

&lt;p&gt;A shocking number of sites have an LCP element that consumes a large percentage of the viewport and is delivered as a large, unoptimized image. Site owners should evaluate both what is triggering the Largest Contentful Paint as well as how it is loaded. Optimizing for the Largest Contentful Paint will ensure that the browser has the opportunity to load and render this content as quickly as possible.&lt;/p&gt;

&lt;p&gt;If you are interested in seeing some of the SQL queries and raw data used in this analysis, I’ve created a post with all the details in the &lt;a href=&quot;https://discuss.httparchive.org/t/analyzing-largest-contentful-paint-stats-via-lighthouse-audits/2166&quot;&gt;HTTP Archive discussion forums&lt;/a&gt;. You can also see all the data used for these graphs in this &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1fI_16nby3Yn1LHxWVd4QRyOuPqqMLmBvBU31l5kGF-8/edit?usp=sharing&quot;&gt;Google Sheet&lt;/a&gt;.&lt;/p&gt;</content><author><name>Paul Calvano</name><email>paulcalvano@yahoo.com</email></author><summary type="html">Largest Contentful Paint (LCP) is an important metric that measures when the largest element in the browser’s viewport becomes visible. This could be an image, a background image, a poster image for a video, or even a block of text. The metric is measured with the Largest Contentful Paint API, which is supported in Chromium browsers. Optimizing for this metric is critical to end user experience, since it affects their ability to visualize your content.</summary></entry></feed>