Compression is a simple, effective way to save bandwidth and speed up your site. I used to hold back from recommending gzip compression for speeding up JavaScript because of problems in older browsers.
But it's 2009 now, and practically nobody uses those obsolete browsers anymore. Surely someone out there is still running IE 4.0 on an old Windows 95 machine, but I'm not going to slow everyone else down for a minority that will keep shrinking over time. Google and Yahoo use gzip compression, and these days enjoying modern web content at modern speeds takes a modern computer (pardon the repetition). Here's how to turn it on.
Wait, wait, wait: Why are we doing this?
Before we start I should explain what content encoding is. When you request a file like http://www.yahoo.com/index.html, your browser talks to a web server. The conversation goes a little like this:
1. Browser: Hey, GET me /index.html
2. Server: Ok, let me see if index.html is lying around!
3. Server: Found it! Here’s your response code (200 OK) and I’m sending the file.
4. Browser: 100KB? Ouch! waiting, waiting! ok, it’s loaded.
Of course, the actual headers and protocols are much more formal (monitor them with Live HTTP Headers if you’re so inclined).
But it worked, and you got your file.
So what’s the problem?
Well, the system works, but it’s not that efficient. 100KB is a lot of text, and frankly, HTML is redundant. Every <html>, <table> and <div> tag has a closing tag that’s almost the same. Words are repeated throughout the document. Any way you slice it, HTML (and its beefy cousin, XML) is not lean.
And what’s the plan when a file’s too big? Zip it!
If we could send a .zip file to the browser (index.html.zip) instead of plain old index.html, we’d save on bandwidth and download time. The browser could download the zipped file, extract it, and then show it to user, who’s in a good mood because the page loaded quickly. The browser-server conversation might look like this:
1. Browser: Hey, can I GET index.html? I’ll take a compressed version if you’ve got it.
2. Server: Let me find the file! yep, it’s here. And you’ll take a compressed version? Awesome.
3. Server: Ok, I’ve found index.html (200 OK), am zipping it and sending it over.
4. Browser: Great! It’s only 10KB. I’ll unzip it and show the user.
The formula is simple: Smaller file = faster download = happy user.
Don’t believe me? The HTML portion of the Yahoo home page goes from 101 KB to 15 KB after compression.
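If you want to see the effect on your own machine, here is a minimal sketch in Python using the standard gzip module. The page below is made up (a table with a thousand near-identical rows), so the exact numbers will differ from the Yahoo figures; the point is only that repetitive markup shrinks a lot.

```python
import gzip

# Hypothetical page: a table with many near-identical rows, standing in
# for the kind of repetitive markup described above.
rows = "".join(
    f'<tr><td class="name">Item {i}</td><td class="price">{i}.00</td></tr>\n'
    for i in range(1000)
)
html = f"<html><body><table>\n{rows}</table></body></html>".encode("utf-8")

# Compress the whole page in memory, the way a server would before sending it.
compressed = gzip.compress(html)
ratio = len(compressed) / len(html)

print(f"original:   {len(html)} bytes")
print(f"compressed: {len(compressed)} bytes ({ratio:.0%} of original)")
```

Swap in the bytes of one of your own pages to get a rough idea of what compression would save you.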
The (not so) hairy details
The tricky part of this exchange is the browser and server knowing it’s ok to send a zipped file over. The agreement has two parts:
- The browser sends a header telling the server it accepts compressed content (gzip and deflate are two compression schemes): Accept-Encoding: gzip, deflate
- The server sends a response if the content is actually compressed: Content-Encoding: gzip
If the server doesn’t send the Content-Encoding response header, it means the file is not compressed (the default on many servers). The Accept-Encoding header is just a request by the browser, not a demand. If the server doesn’t want to send back compressed content, the browser has to make do with the heavy regular version.
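The two-part agreement can be sketched in a few lines of Python. This is an illustration, not how Apache does it: the function names (client_accepts_gzip, build_response) are invented for this example, headers are plain dictionaries, and the parsing is simplified (it ignores deflate and the "*" wildcard).

```python
import gzip
from typing import Dict, Tuple

def client_accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        # "gzip;q=0" means the client explicitly refuses gzip.
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False

def build_response(body: bytes, request_headers: Dict[str, str]) -> Tuple[Dict[str, str], bytes]:
    """Compress the body only when the request advertised gzip support."""
    headers = {"Content-Type": "text/html"}
    if client_accepts_gzip(request_headers.get("Accept-Encoding", "")):
        body = gzip.compress(body)
        # This header is how the browser knows it has to unzip the body.
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```

A request without the Accept-Encoding header gets the plain body back and no Content-Encoding header, which matches the "request, not a demand" behavior described above.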
Setting up the server
The “good news” is that we can’t control the browser, so there is nothing to set up on that side. It either sends the Accept-Encoding: gzip, deflate header or it doesn’t.
Our job is to configure the server so it returns zipped content if the browser can handle it, saving bandwidth for everyone (and giving us a happy user).
In Apache, enabling output compression is fairly straightforward. Add the following to your .htaccess file:
# compress all text & html:
AddOutputFilterByType DEFLATE text/html text/plain text/xml
# Or, compress certain file types by extension:
<Files *.html>
SetOutputFilter DEFLATE
</Files>
Apache actually has two compression options:
- mod_deflate is easier to set up and is standard.
- mod_gzip seems more powerful: you can pre-compress content.
Deflate is quick and works, so I use it; use mod_gzip if that floats your boat. In either case, Apache checks if the browser sent the Accept-encoding header and returns the compressed or regular version of the file. However, some older browsers may have trouble (more below) and there are special directives you can add to correct this.
If you can’t change your .htaccess file, you can use PHP to return compressed content. Give your HTML file a .php extension and add this code to the top:
<?php
// Gzip the output only if the browser advertises gzip support.
if (isset($_SERVER['HTTP_ACCEPT_ENCODING']) && substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip'))
    ob_start("ob_gzhandler");
else
    ob_start();
?>
We check the Accept-encoding header and return a gzipped version of the file (otherwise the regular version). This is almost like building your own webserver (what fun!). But really, try to use Apache to compress your output if you can help it. You don’t want to monkey with your files.
Verify Your Compression
Once you’ve configured your server, check to make sure you’re actually serving up compressed content.
- Online: Use the online gzip test to check whether your page is compressed.
- In your browser: Use Web Developer Toolbar > Information > View Document Size (like I did for Yahoo, above) to see whether the page is compressed.
- View the headers: Use Live HTTP Headers to examine the response. Look for a line that says Content-encoding: gzip.
Be prepared to marvel at the results. The instacalc homepage shrank from 36k to 10k, a reduction of roughly 72%.
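If you would rather script the header check, something like the following Python sketch does the same job. The helper name describe_response is made up for this example, and it assumes you already have the response headers and the raw, still-compressed body bytes in hand. Many HTTP clients unzip the body for you, so make sure you are looking at what actually came over the wire.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def describe_response(headers: dict, raw_body: bytes) -> str:
    """Summarize whether a response was gzip-compressed and how much it saved."""
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}
    encoding = normalized.get("content-encoding", "").lower()
    if encoding != "gzip":
        return f"not compressed: {len(raw_body)} bytes on the wire"
    if not raw_body.startswith(GZIP_MAGIC):
        return "header says gzip, but the body is not a gzip stream"
    original_size = len(gzip.decompress(raw_body))
    # Guard against an empty page so the percentage never divides by zero.
    saved = 1 - len(raw_body) / original_size if original_size else 0.0
    return (
        f"gzip: {len(raw_body)} bytes on the wire, "
        f"{original_size} unpacked ({saved:.0%} saved)"
    )
```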
Try Some Examples
I’ve set up some pages and a downloadable example:
- index.html – No explicit compression (on this server, I am using compression by default).
- index.htm – Explicitly compressed with Apache .htaccess using *.htm as a rule
- index.php – Explicitly compressed using the PHP header
Feel free to download the files, put them on your server and tweak the settings.
Caveats
As exciting as it may appear, HTTP Compression isn’t all fun and games. Here’s what to watch out for:
- Older browsers: Yes, some browsers still may have trouble with compressed content (they say they can accept it, but really they can’t). If your site absolutely must work with Netscape 1.0 on Windows 95, you may not want to use HTTP Compression. Apache mod_deflate has some rules to avoid compression for older browsers.
- Already-compressed content: Most images, music and videos are already compressed. Don’t waste time compressing them again. In fact, you probably only need to compress the “big 3” (HTML, CSS and JavaScript).
- CPU-load: Compressing content on-the-fly uses CPU time and saves bandwidth. Usually this is a great tradeoff given the speed of compression. There are ways to pre-compress static content and send over the compressed versions. This requires more configuration; even if it’s not possible, compressing output may still be a net win. Using CPU cycles for a faster user experience is well worth it, given the short attention spans on the web.
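To make the last two caveats concrete, here is a rough Python sketch of pre-compressing static content ahead of time. The function name precompress and the extension list are my own assumptions, not part of any server. It only writes .gz copies next to the originals; getting your server to actually serve those copies is the extra configuration mentioned above and is not shown here.

```python
import gzip
from pathlib import Path

# Assumption: only these text-based extensions are worth compressing.
COMPRESSIBLE = {".html", ".htm", ".css", ".js", ".xml", ".txt"}

def precompress(root: Path) -> list:
    """Write a .gz copy next to each compressible file under root; return the new paths."""
    written = []
    # sorted() snapshots the file list before any .gz files are created.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in COMPRESSIBLE:
            continue  # skips images, music, video, and existing .gz files
        data = path.read_bytes()
        # Highest compression level: slow, but the cost is paid once, not per request.
        packed = gzip.compress(data, compresslevel=9)
        if len(packed) >= len(data):
            continue  # tiny files can grow when gzipped; leave them alone
        target = path.with_name(path.name + ".gz")
        target.write_bytes(packed)
        written.append(target)
    return written
```

Run it over a copy of your site first, for example precompress(Path("public_html")) where public_html is whatever directory holds your static files.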
Enabling compression is one of the fastest ways to improve your site’s performance. Go forth, set it up, and let your users enjoy the benefits.