Sep 23, 2016 - Nathaniel Catchpole

Asset aggregation in Drupal 8 core

Drupal core has supported CSS and JavaScript file aggregation since around 2007. Dozens of small files are concatenated into a handful of larger files to save on TCP connections, and CSS files are preprocessed to remove whitespace etc. which saves bandwidth.

Before talking about asset aggregation in Drupal 8, it’s worth considering whether it’s still relevant after ten years; why improve something that’s about to become obsolete? HTTP/2 allows multiple asset files to share a single TCP connection, which reduces the overhead of requesting many separate files compared to HTTP/1.1.

However, asset aggregation doesn’t only reduce the number of network requests; it saves on overall bandwidth too. Minification of CSS in Drupal significantly reduces file size, and zlib compression is much more efficient with one larger file than a few dozen smaller ones. Additionally, Drupal core doesn’t support JavaScript minification, and most ‘Drupal’ JavaScript from contributed and custom modules and themes is provided unminified: neither project packaging nor many site builds pre-minify JavaScript. So for now, asset aggregation remains a useful aspect of web performance, albeit something we should keep under review as HTTP/2 comes into general use.
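The compression point is easy to check with a small experiment. The following is only an illustrative sketch in Python, not anything taken from Drupal: the thirty stylesheets are made up, and the exact numbers will depend on the real files involved.

```python
import zlib

# Hypothetical stylesheets: thirty small files that share a lot of vocabulary,
# as CSS from different modules typically does.
files = [
    (
        f".block-{i} {{ margin: 0; padding: 0 1em; color: #333; }}\n"
        f".block-{i} a {{ text-decoration: none; color: #0071b3; }}\n"
    ).encode("utf-8")
    for i in range(30)
]

# Compress every file on its own, as happens when each is a separate response.
separate_total = sum(len(zlib.compress(data)) for data in files)

# Compress the concatenation, as happens when one aggregate is served.
aggregate_total = len(zlib.compress(b"".join(files)))

print("separate:", separate_total, "bytes; aggregated:", aggregate_total, "bytes")
```

The aggregate comes out smaller because the compressor can reuse patterns seen earlier in the same stream, whereas each separately compressed file starts from scratch and pays its own header overhead.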

While there have obviously been changes to Drupal’s asset aggregation since 2007, the implementation in core has remained remarkably similar:

  • When building an HTML page, modules and themes can add CSS and JavaScript to be served as part of the overall request.

  • When rendering CSS and JS in the HTML <head> and the footer, files are put into groups (for example to allow different groups for different media queries). Each group is hashed to create a unique file name. If the file doesn’t exist, it is generated and saved to disk within the main page request. The web server can then serve the aggregated files to the browser straight from disk.
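As a rough illustration of the hashing step, here is a minimal Python sketch. It assumes the hash is taken over the ordered list of file paths in a group; the function name, the file paths and the naming scheme are all hypothetical, and core’s actual hash input is not described here.

```python
import hashlib


def aggregate_filename(group, extension):
    """Derive a stable file name from the ordered list of files in a group."""
    # Order matters for CSS and JavaScript, so the list is hashed as given.
    digest = hashlib.sha256("\n".join(group).encode("utf-8")).hexdigest()
    return f"{extension}_{digest[:32]}.{extension}"


# Hypothetical group of stylesheets destined for one aggregate.
screen_group = ["core/misc/normalize.css", "themes/example/css/layout.css"]
print(aggregate_filename(screen_group, "css"))
```

The same group always maps to the same file name, so a page can tell whether its aggregate already exists on disk just by checking for that name.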

This has a number of drawbacks:

  • If no JavaScript or CSS files exist on disk, the HTML page itself can’t be served until they’ve been created.
  • In a cold cache situation on a busy site, this means no HTML page can be served until some assets have been written to disk, and each asset has to be created serially.
  • This in turn leads to stampedes, with multiple pages trying to generate the same files.
  • Because everything is done in the page request, features like on-the-fly JavaScript minification would be prohibitively slow to add.
  • The ability to add arbitrary files under arbitrary conditions makes it hard to predict what the aggregates will be without the full context of the main page request.

In 2011 I opened a Drupal core issue to suggest a different approach, based on the way that image derivatives in Drupal core work.

Instead of creating files in the main page request, we’d only generate the URLs to the files. Then a page controller at the file path intercepts any requests for missing files and creates them lazily, writing to disk so that the next request can get the file straight from disk. This approach was implemented in two Drupal 7 modules, agrcache and advagg, but for various reasons it has not yet been adopted for core.
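The lazy pattern can be sketched in a few lines of Python. This is not the code from the core patch or from either Drupal 7 module; serve_aggregate and its arguments are hypothetical, and a real controller would also need to validate the request and guard against concurrent writers.

```python
from pathlib import Path


def serve_aggregate(path, sources):
    """Return the aggregate's bytes, building the file only if it is missing."""
    path = Path(path)
    if not path.exists():
        # Cache miss: concatenate the sources and persist the result so that
        # later requests can be answered from disk without rebuilding.
        path.parent.mkdir(parents=True, exist_ok=True)
        combined = b"\n".join(Path(source).read_bytes() for source in sources)
        path.write_bytes(combined)
    return path.read_bytes()
```

The main page request only has to print the aggregate’s URL; the cost of building the file moves to whichever request asks for it first.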

Drupal 8 requires all assets to be registered via a library, which means instead of dozens or hundreds of individual files added per page, there are usually a handful of libraries (which may include dozens or hundreds of files). From the list of library definitions for a page, it’s possible to recreate the order and groupings of the individual assets.

During 8.0.x’s development cycle we discussed using the library information to build asset aggregate URLs - encoding the information needed to produce the aggregate in the filename itself. This would massively reduce the necessary work in the main request, even compared to agrcache and advagg. However, the issue stalled due to numerous prerequisites and the difficulty of implementation.
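To make the idea concrete, here is a hedged Python sketch of carrying the library list in the URL and recovering it on the other side. The query parameter name, the base path and the example/layout library are invented for illustration, and the sketch uses a query string rather than the filename; it does not reflect any format agreed in the core issue.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def aggregate_url(base, libraries):
    """Build a URL whose query string names the libraries to aggregate."""
    return f"{base}?{urlencode({'include': ','.join(libraries)})}"


def libraries_from_url(url):
    """Recover the ordered library list from a URL built by aggregate_url."""
    query = parse_qs(urlsplit(url).query)
    return query["include"][0].split(",")


# Hypothetical base path and library names.
url = aggregate_url(
    "/sites/default/files/css/aggregate.css",
    ["core/drupal", "example/layout"],
)
print(url)
print(libraries_from_url(url))
```

Because the URL alone identifies the libraries, the request that builds the aggregate no longer needs any state from the page request that linked to it.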

One of the advantages of the new Drupal 8 release cycle is that even though an issue might not have got into 8.0.x, it can be released in a later minor version as long as it maintains backwards compatibility for the public API, and minor releases come out every six months.

So as part of my funded core time for Third and Grove, I’ve revisited that 2011 issue to implement lazy asset generation, and the patch is currently passing tests. This approach stops file generation from blocking the main page request and allows individual asset aggregates to be built in parallel, meaning that the full HTML page and all assets should be served considerably faster on cache misses. It also opens up the possibility of adding JavaScript minification to core, since any extra processing when minifying JavaScript files should be outweighed by the ability to do that work in parallel and serve smaller files.

The patch still needs some reviews, but with luck might be included in the 8.3.x minor release.