Loading reCAPTCHA performantly in Drupal

Recently, I was tasked with mitigating the performance impact of loading reCAPTCHA on page view. Typing out reCAPTCHA is rather tedious, so I am going to abbreviate reCAPTCHA v3 as "r3" and reCAPTCHA v2 as "r2" for the rest of this post. I'm also going to make a few assumptions about the reader:

  1. They are educated in the loading and use of both r2 and r3.
  2. They are familiar with the front-end JS bits of Drupal (8+ in particular).
  3. They are familiar with ES6 JavaScript features.

Background

In mid-2019, we launched a custom implementation of r3 that integrates with the Webform module for Drupal 8+, with the intention of gathering metrics that could help us better understand the scores that our particular user demographic receives from Google. We silently gathered scores for a little over a year before we analyzed the score distribution and started enforcing a threshold on some forms of "lesser importance" - it's a political debate, don't get me started...

On these select forms, we determined that, on average, fewer than 3% of users scored below the threshold, so in theory more than 97% of users would never be subject to further scrutiny. It was decided that the remaining users (roughly 3%) would be subjected to an r2 challenge. I have intentionally withheld exact score metrics and distributions, and I would like to reinforce that Your Mileage May Vary. Be sure to conduct your own analysis so that you can make data-driven decisions!
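As a rough illustration of that kind of analysis, here is a minimal sketch that computes the share of recorded scores falling below a candidate threshold. The helper name, the threshold of 0.5, and the sample scores are all made up for the example; they are not our real data.

```javascript
// Hypothetical helper: fraction of recorded r3 scores (0.0 to 1.0) below a threshold.
function shareBelowThreshold(scores, threshold) {
  if (scores.length === 0) {
    return 0;
  }
  const below = scores.filter((score) => score < threshold).length;
  return below / scores.length;
}

// Made-up sample data, for illustration only.
const sampleScores = [0.9, 0.7, 0.9, 0.3, 0.9, 0.1, 0.7, 0.9, 0.9, 0.7];
const share = shareBelowThreshold(sampleScores, 0.5);
console.log((share * 100).toFixed(1) + '% of submissions would see an r2 challenge');
```

Running something like this against a year of real scores, for several candidate thresholds, is what lets you predict how many users a given threshold will inconvenience.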

Functional Requirements

  • It is imperative that the solution works with a globally distributed full page caching CDN.
  • If any unexpected server-side error occurs, then a submission must be allowed to pass unchallenged.
    • Is Google down?
    • Can our hosting providers' networks talk to the public internet?
    • Our logs indicate that these conditions have actually happened ~10 times!
  • R3 scores must be collected with every submission for ingestion by a CRM system.
  • R3 scores must be provided to GA.
    • Any abrupt or suspicious score fluctuations should proactively raise red flags (preferably by an automated monitoring system).
  • If a user passes the r3 check, but gets other validation errors, they must not be re-challenged.
  • If a user fails the r3 check, then they must be challenged with r2.
  • If a user fails the r3 check, and then passes the r2 check, but gets other validation errors, then they must not be re-challenged.

I've intentionally omitted any strange / legacy requirements here that the rest of the world wouldn't care about.
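The challenge-related requirements above boil down to a small decision table. The sketch below is one way to express it; the function name, the shape of the state object, and the return values are assumptions made for illustration, and our real implementation lives server-side in PHP.

```javascript
// Hypothetical decision helper. `state` is an assumed shape:
//   serverError:   true if verification could not be completed (e.g. Google unreachable)
//   alreadyPassed: true if this user passed r3 or r2 on an earlier attempt
//   r3Score:       the score returned for the current token
//   threshold:     the per-form threshold
function nextStep(state) {
  // Fail open: unexpected server-side errors never block a submission.
  if (state.serverError) {
    return 'allow';
  }
  // Users who already passed r3 or r2 are not re-challenged on other validation errors.
  if (state.alreadyPassed) {
    return 'allow';
  }
  // Scores below the threshold fall back to an r2 challenge.
  return state.r3Score < state.threshold ? 'challenge_r2' : 'allow';
}

console.log(nextStep({ serverError: false, alreadyPassed: false, r3Score: 0.2, threshold: 0.5 }));
```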

Performance Implications

This system was working fine. We had eliminated virtually all spam on the forms that this feature was enabled on, but after the announcement that Core Web Vitals play a role in SEO ranking, it was deemed that the implementation needed some optimization. The dependent library is heavy. Weighing in at around 446 kB, it is one of the heavier third-party assets loaded on the front-end. As previously implemented, it was render-blocking!

Fixing it was simple. I Promise.

All puns aside, the solution is pretty simple if one discounts IE11. Essentially, there are two parts: a library loader behavior and the actual behavior(s) that invoke it.

The Loader: A promise fulfilled.

The loader behavior is a simple Drupal behavior that omits the usual attach / detach methods and exposes a single load_recaptcha method. This method returns a promise whose executor appends a script tag to the end of the body element. The script tag attempts to load the API library, and the promise asynchronously resolves or rejects based on whether the library could be loaded. This is important because we definitely do not want to block rendering with this costly operation. Just as importantly, the loader ensures that the library is only loaded once per page-view. Subsequent calls to load_recaptcha simply return the same promise, so their then callbacks run without the third-party asset being loaded a second time.

(function(Drupal, drupalSettings) {

  Drupal.behaviors.recaptchaLoader = {

    // Initially false, used to prevent loading third party assets more than once.
    r3_promise: false,

    // Returns a promise.  Chain a 'then' on it and operate with the library therein.
    load_recaptcha: function() {
      if (Drupal.behaviors.recaptchaLoader.r3_promise === false) {
        Drupal.behaviors.recaptchaLoader.r3_promise = new Promise(function(resolve, reject) {

          // The site key is provided through a custom config form and attached via drupalSettings.
          const script = Object.assign(document.createElement('script'), {
            type: 'text/javascript',
            src: 'https://www.google.com/recaptcha/api.js?render=' + drupalSettings.recaptcha_sitekey,
            onload: resolve,
            onerror: reject,
          });
          document.body.appendChild(script);
        });
      }

      return Drupal.behaviors.recaptchaLoader.r3_promise;
    }
  };
})(Drupal, drupalSettings);
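The load-once guarantee comes entirely from caching the promise. The following sketch isolates that idea from Drupal and the DOM so it can be run anywhere; createOnceLoader and fakeLoad are hypothetical names used only for this illustration.

```javascript
// Hypothetical factory: wraps any promise-returning loader so it runs at most once.
function createOnceLoader(loadFn) {
  let promise = null;
  return function load() {
    if (promise === null) {
      // First call: start the (expensive) load and remember the promise.
      promise = loadFn();
    }
    // Every call, first or later, gets the same promise back.
    return promise;
  };
}

// Stand-in for the script-tag loader; counts how often it is actually invoked.
let calls = 0;
const fakeLoad = function() {
  calls += 1;
  return Promise.resolve('loaded');
};

const load = createOnceLoader(fakeLoad);
const first = load();
const second = load();
console.log(first === second, calls); // prints "true 1"
```

Note that a rejected promise is cached as well, so with this approach a failed load is not retried within the same page-view.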

Optimized Loading for R3

As the reader likely knows, r3 operates invisibly. Apart from the badge (or the privacy policy notice that site owners are required to display if they opt to hide the badge), the end-user should never be aware that they're being evaluated as they're completing a form. This allows for the optimization of deferring the inclusion of the heavy asset until a user starts interacting with a form.

On attach, this library finds all webforms present on the page and adds a focusin listener to each. When a webform element first receives focus, the listener fires. The listener does a few things:

  • Invokes the library loader, Drupal.behaviors.recaptchaLoader.load_recaptcha.
  • Upon success, calls through to the usual grecaptcha.execute and assigns the token to a hidden form field.
  • Adds an interval callback that refreshes the r3 token every 100 seconds. The reason for this is that r3 tokens are only valid for 120 seconds, after which the respective scores cannot be retrieved from Google.

(function(Drupal, drupalSettings, $) {
  Drupal.behaviors.myWebformLibrary = {
    // Focus listener: loads the library once, then keeps the r3 token fresh.
    recaptcha_loader: function() {
      Drupal.behaviors.myWebformLibrary.remove_focus_listeners();

      Drupal.behaviors.recaptchaLoader.load_recaptcha().then(function() {
        // Runs once immediately and returns itself so setInterval can reuse it.
        const handler = (function refresh_token() {
          // Event listeners do not receive a Drupal context, so search the whole document.
          $(document).find('[data-drupal-selector="edit-recaptcha-token"]').each(function() {
            const $element = $(this);
            grecaptcha.execute(drupalSettings.recaptcha_sitekey, {action: 'recaptcha_protected_form'}).then(function(token) {
              $element.val(token);
            });
          });
          return refresh_token;
        })();

        // Tokens are only good for 120 seconds, so refresh it every so often...
        setInterval(handler, 100000);
      });
    },

    // Remove focus listeners from all webforms.
    remove_focus_listeners: function() {
      const webforms = document.getElementsByClassName('webform-submission-form');
      for (let i = 0; i < webforms.length; ++i) {
        webforms[i].removeEventListener('focusin', Drupal.behaviors.myWebformLibrary.recaptcha_loader, true);
      }
    },

    // Load up the recaptcha library as soon as a user starts interacting with the form.
    attach: function() {
      const webforms = document.getElementsByClassName('webform-submission-form');
      for (let i = 0; i < webforms.length; ++i) {
        webforms[i].addEventListener('focusin', Drupal.behaviors.myWebformLibrary.recaptcha_loader, true);
      }
    }
  };
})(Drupal, drupalSettings, jQuery);
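The refresh_token construct in the listener is easy to misread: it is a named function expression that is invoked immediately and returns itself, so the token is fetched right away and the very same function is then handed to setInterval. The stripped-down sketch below shows only that pattern; the counter and the names are hypothetical and stand in for the real grecaptcha.execute call.

```javascript
// Counts how many times the "refresh" has run; stands in for fetching a token.
let runs = 0;

// Named function expression, invoked immediately, returning itself.
const handler = (function refresh() {
  runs += 1;
  return refresh;
})();

console.log(runs, typeof handler); // prints "1 function"

// In the real listener this is where setInterval(handler, 100000) goes.
// Calling it by hand shows that each tick performs another refresh.
handler();
console.log(runs); // prints "2"
```

Refreshing every 100 seconds leaves a 20 second margin before the 120 second token lifetime runs out.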

Optimized Loading for R2

R2 is quite a bit simpler, and leagues more aggravating for the end-user. I mean...can you tell the difference between a scone and a crumpet in a blurry image?

Fortunately, the library is quite a bit simpler as well. The biggest distinction is that since it is not invisible, it's expected that the "I am not a robot" checkbox should be visible as soon as the page loads as opposed to when the user first interacts with a form.

This means that in the event that an r3 challenge and an r2 challenge are rendered on the same page-view, the library will be loaded immediately, and the r3 loader will receive an already-resolved promise when the user first interacts with a webform!

(function (Drupal, drupalSettings, $) {
  Drupal.behaviors.recaptchaV2 = {
    attach: function(context) {
      Drupal.behaviors.recaptchaLoader.load_recaptcha().then(function() {
        $(context).find('.g-recaptcha').once('recaptcha-v2').each(function (k, v) {
          grecaptcha.render(v, {
            'sitekey': drupalSettings.recaptcha_v2_sitekey
          });
        });
      });
    }
  };
})(Drupal, drupalSettings, jQuery);
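When r2 has already triggered the load, the r3 code path does not need any special handling: a then callback attached to an already-resolved promise still runs, just asynchronously after the current code finishes. The sketch below demonstrates this with a plain resolved promise standing in for the loader's cached promise; the variable names and strings are hypothetical.

```javascript
const order = [];

// Stands in for the loader's cached promise after r2 has already loaded the library.
const libraryReady = Promise.resolve('library ready');

// Stands in for the r3 focus listener chaining onto the loader.
libraryReady.then(function() {
  order.push('r3 callback');
});

// Synchronous code after the then() call still runs first.
order.push('rest of the focus handler');

// Logs [ 'rest of the focus handler', 'r3 callback' ]
libraryReady.then(function() {
  console.log(order);
});
```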

Conclusions

Our custom implementation of reCAPTCHA was definitely not the easiest solution to implement, but it does offer a lot more in return for taking the path less traveled. We are able to:

  • Work flawlessly with a blazing fast full page caching global CDN
  • Toggle the r2 fallback challenge on/off on a per-webform basis
  • Control r2 challenge thresholds on a per-webform basis
  • Confidently monitor and project how many end-users will be challenged by an annoying r2
  • Associate score metrics with prospects within the CRM
  • Make data-driven decisions when creating new forms and setting up individualized protection thresholds