Surviving the Slashdot Effect: Caching Web Traffic with Rails and Cloudflare

So you've got a content-oriented website, maybe your own blog or something, and maybe you've (like me) decided to ignore the advice about using static site generators. You build your Rails site, and it is wonderful and beautiful and dynamic, and every page it replies with delights readers with your artisanally-crafted HTML. Maybe you've got some internal caching (Rails has you covered here), maybe it's all roundtrips to the database. But who cares! Your site is up and receiving traffic!

Then, suddenly, a storm hits. Congratulations! You've made it to the top of Reddit/Slashdot/Hacker News! You now have thousands, if not millions, of people beating down your door to read your content. But now your site is down! The link's comment thread is filling up with complaints about being hugged to death, and a few helpful souls are posting the archive.org equivalent of your link and siphoning away your traffic.

How do we fix this?

You could throw more compute resources at it -- think "scaling vertically/horizontally" -- which I'm sure your server/application host would ab$olutely love.

Or, you could install some sort of proxy cache in front of it. The traditional answer here is to use nginx or Varnish as a caching proxy, but if you use a content delivery network (such as Cloudflare) it may be better to use that CDN's caching features instead. (Some might recommend using both your own cache and your CDN's cache, but I wouldn't advise this because troubleshooting cache issues is already difficult enough, and having multiple layers only makes debugging even more confusing. If you do this, you should understand your web application thoroughly.)

Since this site is fronted by Cloudflare, I want to make use of its page cache: it's free and comes with the service!

However, setting this up is not as simple as it may first appear: in a default configuration, Rails doesn't permit caching (the Cache-Control headers it sends don't allow for it), and as a result, nearly every request you receive bypasses the cache and gets passed directly to the app server. This is a screenshot of my Cloudflare dashboard showing the percentage of page requests cached before I applied the fixes I describe here (those peaks top out at ~10%):

Uh.... that's not very good!


Now, you can set up rules in the Cloudflare dashboard to override Rails' requested caching behavior, but this does not solve the root cause: Rails is requesting no caching, because the Cache-Control response header it sets explicitly forbids it:

Cache-Control: NO CACHE!
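
To make that concrete, here is a rough plain-Ruby sketch of how a shared cache (a CDN, say) might read a Cache-Control header. The method names parse_cache_control and shared_cacheable? are made up for illustration, and real caches like Cloudflare apply many more rules than this. The first example string is what Rails typically sends by default; the second is what we are aiming for.

```ruby
# Simplified sketch of a shared cache's decision. Not Cloudflare's actual logic.

# Parse "max-age=3600, public" into { "max-age" => "3600", "public" => true }.
def parse_cache_control(header)
  header.to_s.split(",").each_with_object({}) do |part, directives|
    name, value = part.strip.downcase.split("=", 2)
    next if name.nil? || name.empty?
    directives[name] = value || true
  end
end

# A shared cache may store the response only if it is not marked
# private/no-store and it has a positive freshness lifetime.
def shared_cacheable?(header)
  directives = parse_cache_control(header)
  return false if directives.key?("private") || directives.key?("no-store")
  lifetime = directives["s-maxage"] || directives["max-age"]
  lifetime.to_s.to_i.positive?
end

shared_cacheable?("max-age=0, private, must-revalidate") # => false
shared_cacheable?("max-age=3600, public")                # => true
```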


Setting the Correct Cache-Control Headers with Rails

NOTE: The directions given here apply to Ruby on Rails version 7, though expires_in and fresh_when have existed since at least version 2, and concerns have been available since version 4.

Luckily, Rails makes changing this behavior fairly simple. We don't even need to really dive into how Cache-Control works! (Though here is a good guide if you want to know.) You simply call the expires_in and/or fresh_when methods in your controller, supplying an expiry and ensuring that you set public to true. Like this:

expires_in 1.hour, public: true
# or
fresh_when(@article, public: true)

However, setting this for every route is both tedious and a pretty egregious violation of DRY. Instead, we can set as much as we can once and then propagate it through our application using either class inheritance or composition (via ActiveSupport's concerns feature). And while inheritance may be slightly easier, composition is a bit more modern and flexible; here we will be taking the latter approach.

To start, we will want to make a new concern and call it "Cacheable". The easiest way to do this is to simply go to the $RAILS_ROOT/app/controllers/concerns folder and create a new file, naming it cacheable.rb. In this file, we want to define one small callback method (called "set_cache_headers") and call expires_in within it. Here is a very basic and usable example, which also prevents page caching when a user is logged in:

# app/controllers/concerns/cacheable.rb
module Cacheable
  extend ActiveSupport::Concern

  included do
    # Pass the method name as a symbol so it is evaluated per request, not at class load
    before_action :set_cache_headers, unless: :current_user
  end

  def set_cache_headers
    expires_in 1.hour, public: true
  end

end

Then, for each controller whose content you wish to cache, simply add "include Cacheable" at the top right below the class declaration. Here is an example pulled directly from this site's code, for the controller that powers the "About" feature:

# app/controllers/static_controller.rb
class StaticController < ApplicationController
  include Cacheable
  def about
    @page_title = "About"
  end
end

Once this is done you will see that Cache-Control is indeed being set correctly:

Objective achieved!


But! You are not finished yet! You may notice that while your 'cached' stats are going up, they aren't going up as much as one might think. This is because there is another component to page caching that we have not yet discussed: etags.

Setting the Correct Etag Headers with Rails

This is where things get a bit more tricky: Rails generates another header, called an Etag, that, in theory, is supposed to be unique for each page. (For the more technically inclined, you can think of an Etag as like a SHA256 hash for your page.) But Rails, by default, makes this tag unique per request. Both your browser cache and your CDN cache read this header to determine whether a given request is a cache hit or cache miss, and so we will need to configure Rails to set it correctly, based on our rendered page content (or other context).

Enter fresh_when, which provides further direction to Rails on how to render the correct etag header. You provide it with an object that describes what the page renders -- generally the model instance for the given page (the Rails docs use @article in their examples) -- and it generates a hash that is used for the Etag header. 
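
If the mechanics are unfamiliar, here is a minimal plain-Ruby sketch of the idea behind Etags and conditional requests. It is not how Rails builds its tags internally (Rails has its own digesting scheme), and the names etag_for and conditional_get are invented for this example. The point is just that a tag derived from content stays stable across requests, so a cache that sends it back can be answered with an empty 304 instead of the whole page.

```ruby
require "digest"

# Derive the validator from the content, not from the request.
def etag_for(body)
  %("#{Digest::SHA256.hexdigest(body)}")
end

# Returns [status, headers, body], like a tiny Rack-style response.
# if_none_match is the Etag the client or CDN already holds, if any.
def conditional_get(body, if_none_match: nil)
  etag = etag_for(body)
  headers = { "ETag" => etag, "Cache-Control" => "max-age=3600, public" }
  return [304, headers, ""] if if_none_match == etag
  [200, headers, body]
end

first  = conditional_get("<h1>About</h1>")
second = conditional_get("<h1>About</h1>", if_none_match: first[1]["ETag"])
first[0]  # => 200
second[0] # => 304
```

If the tag changed on every request instead, the second call could never match, and every revalidation would turn into a full response.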

Using fresh_when with Dynamic Pages

For dynamic pages backed by a model instance, such as the blog post example described above and in the Rails docs, simply call fresh_when inside the controller action and pass it your model instance plus a value for public. Like so:

# app/controllers/articles_controller.rb
def show_article
  @article = Article.find(params[:id]) # assumed lookup; load your record however your app does
  fresh_when @article, public: !current_user
end

When combined with the aforementioned Cacheable, this is sufficient to avoid page-caching in the case of logged-in users, as the expires_in directive is never called when current_user exists, and Rails reverts to its default, zero-expiry "private" cache behavior.

If you aren't using Cacheable as described above, you will need to consult the Rails documentation for fresh_when, because on its own it does not set an expiry and you will have to supply that information yourself.

Using fresh_when with Collections or ActiveRecord Relations

If your first parameter to fresh_when is an ActiveRecord relation (such as what is returned with a .where() call), passing the relation object will cause the entire collection it represents to be loaded into memory. For other kinds of collections, you will again be feeding it the entire array. This is not ideal!

Instead, you will want to pass it a single value representing the freshness of that collection, such as the most recent updated_at value. For ActiveRecord relations, that can be retrieved with something like @relation.maximum(:updated_at).
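
As a rough sketch of that idea: the helper below (CollectionFreshness and key_for are hypothetical names, not part of Rails) reduces a collection to a small key. It only assumes the object responds to maximum(:updated_at) and count, which ActiveRecord relations do. Adding the count is my own assumption, so that deleting a record also changes the key.

```ruby
# Hypothetical helper: builds a cheap freshness key for a collection
# without loading its records.
module CollectionFreshness
  # relation: anything responding to maximum(:updated_at) and count,
  # for example an ActiveRecord relation.
  def self.key_for(relation)
    newest = relation.maximum(:updated_at) # nil when the collection is empty
    [newest&.to_i, relation.count]
  end
end

# In a controller action this could be used along these lines
# (assuming @articles is an ActiveRecord relation):
#
#   fresh_when etag: CollectionFreshness.key_for(@articles), public: !current_user
```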

Using fresh_when with Static Pages

Static pages are a little bit simpler, as all we really need to include in the etag is controller_name and action_name, as well as something to identify the logged-in user.

So, we craft another concern, this time called StaticCacheable, and include this in each controller serving static content. Once again, like Cacheable, this is a controller-level solution; if you need something per-action that is an exercise left up to you.

To make this concern, create a new file called static_cacheable.rb and save it to your $RAILS_ROOT/app/controllers/concerns folder, right next to cacheable.rb. Note that we will include a reference to Cacheable from within StaticCacheable, so that you only need to include StaticCacheable on your static controllers. In the method it defines we simply grab the controller name, action name, and current user status and feed that into fresh_when:

# app/controllers/concerns/static_cacheable.rb
module StaticCacheable
  extend ActiveSupport::Concern

  included do
    include Cacheable
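    # Call current_user here; a bare :current_user symbol is always truthy and would skip the callback every time
    before_action :set_static_cache_headers, unless: -> { current_user || Rails.env.development? } # don't do this during development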
  end

  def set_static_cache_headers
    # Build the ETag from the controller name, action name, and current user
    # The template option ensures ETags update when views change
    fresh_when(
      etag: [controller_name, action_name, current_user&.email],
      template: action_name,
      public: !current_user
    )
  end
end

Finally!

Once your Cache-Control and Etag headers are under control and correctly set, and you have correctly configured your proxy service, your site should be well-equipped to handle large volumes of traffic without falling over. Hurray!

A Quick Note About Cloudflare and Implementing This

It's worth noting that Cloudflare seems to strip the Etag header when Cache-Control renders it useless, as is the case when Cache-Control is set to private. This may seem annoying, but it takes your browser cache out of the picture, presumably to ease troubleshooting.

You can still see the Etag header if you pull requests directly from your webserver (by its IP address), and it will also be visible during development. Unfortunately, it seems like you will have to rely on unit tests or Cloudflare's provided statistics to verify your cache strategy is working.
