ModPageSpeed in front of Varnish

At work, we have been using Varnish, a very good reverse proxy, for many years.
Varnish does a lot for us. One of its major benefits is the ability to cache the responses of our rather slow PHP backends. Getting a page in 16 ms is always better than 300 ms :)

Since my visit to the Velocity conference in London, I have been quite obsessed with testing Google ModPagespeed.
The product is not focused on the frontend response time but more on the user experience.
By providing many filters, it can optimize a web page by applying rules such as CSS minification, image optimization, deferred JavaScript and many others. Nothing we can’t do ourselves, but it can save dev teams time.

I ran my first test of ModPageSpeed a few weeks ago. The results seem very good, but one thing bothers me. ModPageSpeed optimizes in two stages: on the first call of a page, ModPageSpeed gets the page from PHP and returns the response to the client without any optimization. In parallel, it launches a thread to optimize the page for the next client.
As it doesn’t optimize the first call, it forces a Cache-Control “no-cache” on the response.
The problem is that no page with a no-cache directive will go into Varnish’s cache.
One last thing: ModPageSpeed sends different optimizations depending on the User-Agent.
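
To make the caching problem concrete, here is a minimal Python sketch of the decision a shared cache makes when it sees such a response. It is a deliberately simplified model and not Varnish's actual logic: the function names, the example header values and the 120-second default TTL are assumptions made for illustration only.

```python
# Simplified model of a shared cache deciding how long to keep a response.
# Assumption: this only illustrates the idea, it is NOT Varnish's real logic.

def parse_cache_control(value):
    """Split a Cache-Control header into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip() or None
    return directives


def cacheable_ttl(headers, default_ttl=120):
    """Return how many seconds the response may be cached (0 means never)."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    # Any of these directives keeps the response out of the shared cache.
    if "no-cache" in cc or "no-store" in cc or "private" in cc:
        return 0
    if cc.get("max-age") is not None:
        return max(int(cc["max-age"]), 0)
    # No explicit lifetime: fall back to the (hypothetical) default TTL.
    return default_ttl


# Example header values (assumed for illustration).
plain_backend = {"Cache-Control": "max-age=300"}
through_pagespeed = {"Cache-Control": "max-age=0, no-cache"}

print(cacheable_ttl(plain_backend))      # 300
print(cacheable_ttl(through_pagespeed))  # 0
```

With the second set of headers the model never stores the page, so every request ends up on the slow backend.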

So, on one side we have optimized the HTML of the response, improving the user experience, but on the other side we have put more load on our backend.

Thinking about it, I asked myself whether we could put ModPageSpeed in front of Varnish. If so, we could have optimized HTML and less load on the backend.

The answer is yes, we can put ModPageSpeed in front of Varnish!

We just need to install Apache and configure it with “ProxyPass”.

These are all the modules we need:

LoadModule authz_host_module /usr/lib64/httpd/modules/mod_authz_host.so
LoadModule deflate_module /usr/lib64/httpd/modules/mod_deflate.so
LoadModule log_config_module /usr/lib64/httpd/modules/mod_log_config.so
LoadModule setenvif_module /usr/lib64/httpd/modules/mod_setenvif.so
LoadModule proxy_module /usr/lib64/httpd/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib64/httpd/modules/mod_proxy_http.so
LoadModule status_module /usr/lib64/httpd/modules/mod_status.so
LoadModule vhost_alias_module /usr/lib64/httpd/modules/mod_vhost_alias.so

All other modules can be safely disabled.

Since ModPagespeed uses threads, remember to start Apache in worker mode:

HTTPD=/usr/sbin/httpd.worker

in

/etc/sysconfig/httpd

If you have more than one vhost, you need at least version 1.1.23.2-2191 of ModPagespeed to support multiple vhosts.

A default setup could be:

DocumentRoot "/var/www/htdocs"

<Directory />
  Order allow,deny
  Allow from all
</Directory>

Include conf.d/pagespeed.conf

NameVirtualHost *:80

<VirtualHost *:80>
  #Default VHost : ModPagespeed disabled
  ModPagespeed Off
  CustomLog /var/log/httpd/default_access.log combined
  ErrorLog /var/log/httpd/default_error.log
</VirtualHost>

<VirtualHost *:80>
  #mywebsite.com VHost : ModPagespeed enabled
  ServerName www.mywebsite.com
  ServerAlias s1.mystaticwebsite.com
  ServerAlias s2.mystaticwebsite.com
  ModPagespeed On
  ModPagespeedEnableFilters lazyload_images,collapse_whitespace,combine_javascript,defer_javascript
  ModPagespeedMapOriginDomain http://localhost:8000 http://www.mywebsite.com
  ModPagespeedMapOriginDomain http://localhost:8000 http://mystaticwebsite.com
  ModPagespeedShardDomain mystaticwebsite.com s1.mystaticwebsite.com,s2.mystaticwebsite.com
  CustomLog /var/log/httpd/www.mywebsite.com_access.log combined
  ErrorLog /var/log/httpd/www.mywebsite.com_error.log
</VirtualHost>

ProxyRequests Off
ProxyPreserveHost On
ProxyPass / http://127.0.0.1:8000/

This configuration is fully operational but has not yet been tested in production.
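
One way to check such a setup before going to production is to look at the response headers that come back through the whole chain. The Python sketch below is only an illustration: `summarize` and `fetch_headers` are hypothetical helper names, and it assumes that ModPagespeed adds an `X-Mod-Pagespeed` header and that Varnish adds an `X-Varnish` header carrying two transaction IDs on a cache hit. Both assumptions should be verified against the versions actually in use.

```python
import urllib.request


def summarize(headers):
    """Report which layers touched a response, based on its headers."""
    lower = {k.lower(): v for k, v in headers.items()}
    # Assumption: X-Varnish holds one ID on a miss and two IDs on a hit.
    xids = lower.get("x-varnish", "").split()
    return {
        "pagespeed": "x-mod-pagespeed" in lower,
        "varnish": bool(xids),
        "varnish_hit": len(xids) == 2,
        "cache_control": lower.get("cache-control"),
    }


def fetch_headers(url, user_agent="Mozilla/5.0"):
    """Fetch url and return its response headers as a plain dict."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

Calling `summarize(fetch_headers("http://www.mywebsite.com/"))` twice in a row should show whether the second request was served from Varnish's cache while still passing through ModPagespeed. Changing `user_agent` makes it possible to compare what different browsers receive.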

Has anyone already tested this architecture in production?

10 thoughts on “ModPageSpeed in front of Varnish”

  1. Martin

    But then, why do you still use Varnish? Why not use mod_proxy for Apache instead of installing another piece of software? mod_pagespeed can be configured to access static files directly, instead of going through the web server. Have you tried it? Is Varnish quicker even when it is not used as the frontend web server?

  2. petitchevalroux

    It seems weird to me, but I’m curious to see the response time…

    The main benefit of Varnish is its ability to cache…

    From my previous tests, it delivers a static file a lot faster than Apache and scales better when concurrency increases…

    So without testing, I am guessing that with this architecture you will lose these benefits and your front end will deliver content like a “normal” Apache server with PageSpeed delivering static files.

    But I may be wrong, so keep me posted if you try (by email or via @petitchevalroux on Twitter) ;)

  3. Vincent ROBERT Post author

    I will try to answer all your good questions.

    First, we are not ready to use this configuration in production. This is just a concept, an idea, and I wrote this post to get feedback from you.

    To answer Martin: we have not tried what you suggest. For the moment, it would be very difficult for us to do without Varnish. Varnish is not just a cache, it is a real Swiss Army knife for HTTP; we do a lot of things with it that Apache can’t do easily.

    To answer Romain: HTTPS is not supported by Varnish, so if we add Apache in front of Varnish, we have a solution for that.

    To answer petitchevalroux: the idea is precisely to keep a cache. When ModPagespeed runs on a backend, Varnish can’t cache the HTML because of the no-cache headers. With this solution, the response of our slow backend is already cached in Varnish. I’m not the only one thinking of this kind of infrastructure. Take a look at “http://www.webperformancetoday.com/2012/11/26/were-teaming-up-with-amazon-web-services-to-bring-advanced-feo-to-cloudfront-customers/”: what they are doing is putting an optimization stage between the client and the server.

  4. ops42

    sorry but that just doesn’t make any sense. all the benefits you get from using varnish are gone by squeezing that apache in front. you’re just one peak of PIs away from having your apache server thrash its disk swapping like there is no tomorrow. if you can handle your load with apache in the front line then maybe varnish is overkill for you.

  5. Vincent ROBERT Post author

    Ok, I understand.

    What do you propose in order to keep a cache with Varnish and use ModPagespeed?

    I want to use both of them, so how can I do that?

  6. ops42

    i’m not sure i would use that module in the first place. i’m a big fan of beating the knowledge of “webdev best practice” into my developers’ heads. a quick scan of some filters just made me realize that there seems to be some risk involved in almost all of these filters in the sense that they can’t and they don’t know so at the end of the day they will screw you over one way or another. seems easier and less trouble to make devs read up on best practice and do it right in the first place thus having no need for this module.

    but since you asked; quick thought without asking my magic 8 ball if it is feasible: if there is some kind of marker that the no-cache header is because the response is the first and mod_pagespeed is about to do its magic i could remove the no-cache, set a ttl of 1 minute (thus serving unoptimized for a while), log the request and have a special daemon pick up on this log, request the very same resource again, this time receiving the optimized version, and finally replace the varnish cache with it and its longer cache time.

    but then again, what do i know? seems more fun to whip some developers into shape, making the world a better place while doing so.

  7. Lukas

    I’ve sent a suggestion to pagespeed team, and posted on forums: Varnish, mod_pagespeed, additional header for reverse proxy caching fully optimized html.

    I don’t think apache->varnish is such a bad idea, since apache is only a proxy, without heavy mods to eat up memory. Varnish is probably faster than direct file fetch, because files are cached in memory.
    Still, it would be better to cache the fully optimized page.

    btw. i think you should also add memcached as a cache backend to your apache+MPS setup.

  8. genewitch

    We use layer 7 balancers to run varnish to the side of apache, rather than in front. The balancers know if content can be cached (roughly) for example, ssl traffic isn’t cached. if there’s a cookie, that’s not cached either. So we send only stuff we can safely assume varnish might have to varnish, which in turn has the balancer as the default backend. The balancer sends all non-cachable traffic on ports 443 and 80 to the app servers (apache, most of the time).

    The problem comes in if you can’t afford a layer 7 balancer (only $80,000!)… We’re still trying to engineer a solution similar to a traffic manager that uses only FLOSS and maybe some scripts. Layer 7 balancers have ASICs and FPGAs specifically programmed to take apart a packet and quickly route it to the proper backend VIP, be it varnish, or otherwise. They also can decrypt (terminate) SSL traffic for proper routing as well. Doing this in software is very difficult, and so far only a couple of open source projects can even come close to hardware with just basic proxying – never mind scanning the packet and routing based on payload.

    Good luck!

  9. Waheed Barghouthi

    To answer Vincent: I think if you want to use modpagespeed and varnish you must use the setup below:

    Varnish->Apache(modpagespeed)->varnish->Webserver application

    This way you will have the power of both, and it can all be done in one box :)

    All the best,
