Third Parties and Certificate Revocations

On Monday July 29th, DigiCert announced the need to revoke a large number of certificates due to a bug in domain validation. The CA/B Forum’s strict requirements to revoke these certificates within 24 hours resulted in a pretty busy Monday and Tuesday for a lot of folks. For some others, the deadline was moved to August 3rd due to exceptional circumstances. What remained a mystery was how many sites and third parties would be affected, how many would be prepared in time and what the impact of a mass revocation might look like across the web. In this blog post we’ll use the HTTP Archive to explore the impact.

Which hostnames were affected?

In a bugzilla post, DigiCert shared a list of 86k serial numbers for certificates that were impacted and needed to be revoked. The list did not include hostnames, so it wasn’t easy to see which domains would be affected from the files alone. The HTTP Archive contains details for every certificate it encounters, so I was able write some queries to correlate the list of serial numbers with certificate used across 16 million websites. If you are interested in the queries used to do this analysis, you’ll find them at the end of this post.

In my analysis, I found 13,823 of the certificate serial numbers on publicly available web pages during last month’s HTTP Archive crawl. Many of these were first party resources, but a few hundred hostnames belonged to popular third parties. Overall I found that 1,241,943 websites would have been impacted by this revocation in some way, meaning they either made a first or third party request for a resource that used at least one of the affected certificates! Here’s a list of some of the more popular domains that were affected. The list contains an apex domain, the number of sites requesting resources from it, and the number of impacted subdomains (containing certificates needing to be revoked).

Third Party with Certificates Affected by DigiCert Recovations
DomainNumber of SitesNumber of Hostnames
yahoo.com

467,827

49

rubiconproject.com

387,241

20

fontawesome.com

299,959

10

pinterest.com

281,145

57

taboola.com

133,309

43

pinimg.com

91,573

14

ib-ibi.com

60,557

1

snapchat.com

49,390

15

advertising.com

41,815

1

datadoghq-browser-agent.com

35,404

1

sift.com

13,962

1

scdn.co

10,949

1

usonar.jp

7,402

4

sojern.com

7,371

4

olark.com

6,966

1

 

Looking the most popular third party domains impacted by this revocation, you can see that many of them reissued their certificates on July 30th based on the validity dates. The initial deadline to reissue certificates was July 30th 19:30 UTC.

Third Party Hostnames Affected by DigiCert Revocation
HostValidFromValidTo
pixel.rubiconproject.comJul 30 00:00:00 2024 GMTApr 3 23:59:59 2025 GMT
pr-bh.ybp.yahoo.comJul 30 00:00:00 2024 GMTJan 22 23:59:59 2025 GMT
ups.analytics.yahoo.comJul 30 00:00:00 2024 GMTJan 22 23:59:59 2025 GMT
kit.fontawesome.comJul 30 00:00:00 2024 GMTJan 27 23:59:59 2025 GMT
token.rubiconproject.comJul 30 00:00:00 2024 GMTApr 3 23:59:59 2025 GMT
eus.rubiconproject.comJul 30 00:00:00 2024 GMTApr 3 23:59:59 2025 GMT
pixel-us-east.rubiconproject.comJul 30 00:00:00 2024 GMTApr 3 23:59:59 2025 GMT
secure-assets.rubiconproject.comJul 30 00:00:00 2024 GMTApr 3 23:59:59 2025 GMT
ct.pinterest.comAug 2 00:00:00 2024 GMTAug 7 23:59:59 2025 GMT
cms.analytics.yahoo.comJul 30 00:00:00 2024 GMTJan 22 23:59:59 2025 GMT
fastlane.rubiconproject.comJul 30 00:00:00 2024 GMTApr 3 23:59:59 2025 GMT
ka-p.fontawesome.comJul 30 00:00:00 2024 GMTJan 27 23:59:59 2025 GMT
log.pinterest.comJul 30 00:00:00 2024 GMTAug 7 23:59:59 2024 GMT
s.pinimg.comJul 30 00:00:00 2024 GMTAug 7 23:59:59 2024 GMT
assets.pinterest.comJul 30 00:00:00 2024 GMTAug 7 23:59:59 2024 GMT
sync.taboola.comJul 30 00:00:00 2024 GMTDec 31 23:59:59 2024 GMT
prebid-server.rubiconproject.comJul 30 00:00:00 2024 GMTApr 3 23:59:59 2025 GMT
global.ib-ibi.comJul 30 00:00:00 2024 GMTApr 2 23:59:59 2025 GMT
cdn.taboola.comJul 30 00:00:00 2024 GMTDec 31 23:59:59 2024 GMT

 

After the final deadline of August 3rd, 2024 19:30 UTC had passed, I ran an test against a list of the 13,823 hostames I discovered. I found that 78.51% of them had reissued their certificates prior to the initial deadline or switched to a using certificates not subject to revocation. Another 5.3% of the hostnames reissued their certificates during the extension. However 9.3% of hostnames - 1,291 - had failed to get their certificates reissued and were revoked on Aug 3rd. Since then, 156 hostnames reissued - but there are still 1,135 (8.21%) hostnames delivering a revoked certificate!

Validity Start Dates for Affected Certificates
Certificate Validity DateCertificatesPercent of Certificates

Jul 29 2024

121

0.88%

Jul 30 2024

9,680

70.03%

Jul 31 2024

977

7.07%

Aug 1 2024

426

3.08%

Aug 2 2024

267

1.93%

Aug 3 2024

75

0.54%

Aug 4 2024

90

0.65%

Using Different Cert

1,052

7.61%

Not Updated

1,135

8.21%

 

When the certificates were revoked, there were only 2 major third parties that were affected. One was a security service used by approximately 14k sites. Another was a live chat system that was used by a ~200 sites. Fortunately the failure of those third parties did not impact functionality of the sites, and they have since reissued their certificates.

Monitoring for third party failures

Often we don’t have the liberty of advance notice of impending failures - such as the recent Crowdstrike outages, CDN failures, and other major internet platform incidents. In this case, we had at least 24 hours notice that a massive certificate revocation event would occur (and then a few additional days after the extension).

Using the HTTP Archive data I could see that none of the third parties used by my employer were impacted. However to be absolutely certain I configured a Catchpoint dashboard to monitor for third party availability issues. This dashboard displays the % availability for each third party host, the number of failures for each third party, and some load time metrics. The idea was that if a particular third party we use experienced an issue, we’d be able to identify it quickly.

The dashboard was created by using a line chart broken down by host. Enabling “host data” allowed me to chart some host metrics such as availability and number of failures, as well as exclude first party content. You can see some more details on how to do this in this blog post from Catchpoint.

Catchpoint dashboard configuration screenshotCatchpoint dashboard configuration screenshot

You may ask why not use real user monitoring (RUM) data for this? RUM can give you timing information on third party requests and additional metrics if the third party sets the Timing-Allow-Origin header. It’s great for detecting performance issues related to their party content. However, detecting failures in loading third party resources is not as easy since a failure simply won’t show up in resource timing data.

Preparing for Third Party Failures

When a popular third party fails or degrades, sometimes you’ll read about it in the news, especially if it breaks functionality on a large number of websites. Far too often organizations handle third party performance/failure risks reactively. There’s a few things you can do to prepare though.

  • Identify third party single poin of failures (SPOFs).
  • Identify third party performance risks.
    • Test to see what happens when you block or remove their resources. (WebPageTest or Chrome)
  • Identify third parties that might impact functionality on your site.
    • Test to see what happens when you block or remove their resources.
  • Monitor third party performance and availability.
    • A combination of RUM for performance and Synthetic measurements for availability can be helpful here.

I’ve also been working on a tool that will help identify potential third parties that are worth investigating for performance or single point of failure risks. Hoping to share that with you all very soon!

Conclusion

While 86k certificates may not sound like a huge amount compared to the scale of the web, the way those certificates were used across some very popular third parties could have impacted over a million websites. There’s been a lot of negativity about DigiCert regarding this, but I have a lot of empathy for what they’ve been dealing with this past week. It was no doubt frustrating for folks to frantically update certificates. This could have been incredibly disruptive to a large part of the web due to third party failures, but it wasn’t.

HTTP Archive queries

This section provides some details on how this analysis was performed, including SQL queries and commands for testing the certificates. Please be warned that some of the SQL queries process a signicant amount of bytes - which can be very expensive to run (particularly the first one).

Extract certificate serial numbers from HTTP Archive The list of affected certificates that DigiCert provided included the serial numbers, but not a hostname. In order to identify the hostnames, this query was written to collect serial numbers from all DigiCert certificates found in the HTTP Archive requests table. The approach involved base64 decoding the certificate, converting it to bytes and extracting the substring where the serial number exists. This is a bit of a hack, but it worked!

  Warning: this query processes 13 TB of data, which is much higher than the 1 TB free tier. The results for it have been saved in another BigQuery table for analysis: `httparchive.scratchspace.2024_07_01_cert_serials`.



CREATE TEMPORARY FUNCTION extractCertHex(cert_block STRING)
RETURNS STRING
LANGUAGE js AS '''
   // Extract the Base64 content between the certificate tags
   const base64Match = cert_block.match(/-----BEGIN CERTIFICATE-----\\s*([A-Za-z0-9+/=\\s]+)\\s*-----END CERTIFICATE-----/);
   if (!base64Match) {
       return 'Invalid Certificate Block';
   }
   const base64Content = base64Match[1].replace(/\\s+/g, '');

   // Base64 decode
   function base64ToBytes(base64) {
       const chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
       let bytes = [];
       let buffer = 0, bits = 0;

       for (let i = 0; i < base64.length; i++) {
           if (base64[i] === '=') break;
           const val = chars.indexOf(base64[i]);
           buffer = (buffer << 6) | val;
           bits += 6;
           if (bits >= 8) {
               bits -= 8;
               bytes.push((buffer >> bits) & 0xFF);
           }
       }

       return bytes;
   }

   // Decode the Base64 content
   const cert_der = base64ToBytes(base64Content);

   // Convert DER to hexadecimal
   let cert_hex = '';
   for (let i = 0; i < cert_der.length; i++) {
       const hex = cert_der[i].toString(16).padStart(2, '0');
       cert_hex += hex;
   }

   return cert_hex;
''';

SELECT
 DISTINCT NET.HOST(url) AS host,
 SUBSTR(extractCertHex(JSON_EXTRACT_SCALAR(payload, "$._certificates[0]")),31,32) AS serial_num
FROM 
   `httparchive.requests.2024_07_01_mobile`
WHERE
 JSON_EXTRACT_SCALAR(payload, "$._securityDetails.issuer") LIKE "%DigiCert%"
 AND JSON_EXTRACT_SCALAR(payload, "$._certificates[0]") IS NOT NULL

  
Identify hostnames subject to revocation In order to identify the hostnames subject to revocation, I uploaded a copy of the revocation serial numbers to a table: `httparchive.scratchspace.digicert_revocation_20240730`. Then I performed a simple `INNER JOIN` on the output from the previous query to identify hostnames that had a serial number in the revocation list.

SELECT DISTINCT 
   host, 
   d.serial AS serial
FROM 
   `httparchive.scratchspace.2024_07_01_cert_serials` ha
INNER JOIN 
   `httparchive.scratchspace.digicert_revocation_20240730` as d
ON ha.serial_num = d.serial
WHERE 
   d.serial IS NOT NULL  
  
Summarize popular third party domains that had hostnames impacted by the revocation This query summarized domain names from the requests table by the number of sites loading a resource from it, and the number of hostnames that appeared in DigiCert's list of revoked hostnames. The previous query is used in the `IN()` clause of this query.

SELECT 
   NET.REG_DOMAIN(url) domain, 
   COUNT(DISTINCT page) sites, 
   COUNT(DISTINCT NET.HOST(url)) hostnames
FROM 
   `httparchive.all.requests` AS r
WHERE 
  date = "2024-07-01"
  AND client = "mobile"
  AND is_root_page = true
  AND NET.HOST(url) IN (
    SELECT DISTINCT host
    FROM `httparchive.scratchspace.2024_07_01_cert_serials` ha
    INNER JOIN `httparchive.scratchspace.digicert_revocation_20240730` as d 
    ON ha.serial_num = d.serial
    WHERE d.serial IS NOT NULL    
  )
GROUP BY 1
ORDER BY 2 DESC

  
Summarize popular third party hostnames that had hostnames impacted by the revocation This query is similar to the previous one, but summarizes by hostname instead of domain name.

SELECT 
   NET.HOST(url) hostname, 
   COUNT(DISTINCT page) sites
FROM 
   `httparchive.all.requests` AS r
WHERE 
  date = "2024-07-01"
  AND client = "mobile"
  AND is_root_page = true
  AND NET.HOST(url) IN (
    SELECT DISTINCT host
    FROM `httparchive.scratchspace.2024_07_01_cert_serials` ha
    INNER JOIN `httparchive.scratchspace.digicert_revocation_20240730` as d 
    ON ha.serial_num = d.serial
    WHERE d.serial IS NOT NULL    
  )
GROUP BY 1
ORDER BY 2 DESC
  
Bash Script to test certificates for affected hosts This script will loop through a list of hostnames and extract the validity dates and serial numbers for each certificate, timing out after 3 seconds if the host is unresponsive.

for i in $(cat all_hosts.txt); do 
  timeout 3 echo | openssl s_client -connect "$i:443" 2>/dev/null | 
  openssl x509 -noout -startdate -enddate -serial 2>/dev/null | 
  awk -F= -v host="$i" '
    /^notBefore/ { start = $2 } 
    /^notAfter/  { end = $2 } 
    /^serial/    { serial = $2 } 
    END { 
      print host "," start "," end "," serial 
    }'
done > all_hosts_checked_20240803_1930UTC.csv
  

© 2024 Paul Calvano. All rights reserved.

Powered by Hydejack v9.0.2