Mike Ferrier

I beat code into submission.

How to Use Tabelog for English Speakers

If you’re in Japan and looking for something tasty to eat, your first stop will be Tabelog. This giant database of restaurants contains a mind-boggling amount of information on what seems like every restaurant in existence. All the standard Yelp-like data is there, like restaurant names, addresses, reviews and ratings. There are also photos of restaurant interiors and exteriors, labeled and categorized photos of individual menu items, as well as the menus themselves, the average price of lunch and dinner, and map views where you can filter by restaurant genre, price, and average rating. It’s a food-lover’s dream.

The Japanese love to document and rate their meals with wanton meticulousness, which makes Tabelog extremely detailed and thorough. Any food-serving establishment is fair game, whether it’s a restaurant with three Michelin stars or a simple takoyaki stall.

Unfortunately, the site is only in Japanese. But through trial and error and liberal use of Google Translate, I slowly figured out how to use it.

Here are tips on using Tabelog for English speakers:

Tip #1: Google Translate Chrome Extension

When using web sites in other languages, the Google Translate Chrome Extension can be a lifesaver. It adds a little button on your Chrome toolbar that will translate the page you’re looking at to English:

Great for translating the text of links, which will help you figure out how to navigate through Tabelog.

Tip #2: Find a map of your area

From the Tabelog homepage, you’re first presented with a map of Japan, with clickable regions:

If you know where in Japan you are, you can drill down into that region by clicking the appropriate spot on the map.

If you’re not familiar with the map of Japan and aren’t sure how to get to where you are, an easier thing to do is click the link in the top right corner:

Which will take you to a Google Maps interface at http://tabelog.com/map/:

This is similar to the map view on Yelp, and is the most useful interface for browsing for restaurants in an area. The underlying Google Map has place names in English, with pins representing rated restaurants.

So for instance, if I zoom in on Tokyo:

You can see we’ve got a list of establishments on the left, and pins corresponding to them on the map on the right.

Tip #3: Reading restaurant info

Each restaurant on the map shows a small infobox when clicked. Let’s take a look at what each restaurant infobox shows:

  1. Restaurant name
  2. Address
  3. Review excerpt
  4. Overall rating
  5. Dinner rating
  6. Lunch rating
  7. Average price of dinner
  8. Average price of lunch
  9. Flags

The flags are useful for getting an idea of the type of place it is. From left to right, if the icon is colored it means this restaurant is good for:

  1. Friends
  2. Dates
  3. Business meals
  4. Parties
  5. Families
  6. Single people

If you click the name of the restaurant, it will open the restaurant’s full entry. It looks like this:

All the same stuff from the infobox is there, plus a lot of other stuff like pictures of food, time-limited deals, and user reviews. Google Translate isn’t quite good enough to make the deals or reviews understandable, so I usually don’t pay much attention to them.

There’s a navbar across the top with five items:

From left to right, they are:

  1. Top - the homepage of the restaurant’s listing, where you start
  2. Menu - usually this is user photos of the menu, which isn’t much help unless the menu has photos. Sometimes there will be a transcribed menu, which is marginally more helpful as you can Google Translate the entries.
  3. Photos - there are four categories of photos, from top to bottom (or left to right on the Photos page):
    1. Photos of the food
    2. Photos of the restaurant interior
    3. Photos of the restaurant exterior (very useful when trying to find the place in person)
    4. Other photos
  4. Reviews - as I alluded to before, reviews are hard to read even when translated. I usually don’t read them.
  5. Map of Location - shows a Google Map with the location, as well as the full written address which can be entered into Google Maps

Tip #4: Searching and Filtering

On every page there are two search fields at the top:

These are, from left to right, Area and Keyword. Unfortunately you have to enter your search in Japanese, so these aren’t very helpful.

On the map view, there’s a button in the top right:

Clicking it brings up the search dialog:

This is where the Google Translate extension comes in handy. Translate the page to see what each search field is for:

As you can see, you can filter by things like budget, flags (translates as “Use”), and hours. There’s also a dropdown to search by “Genre”, which has very general categories like Restaurant, Ramen, Bar, Cafe, etc.

The thing I’m usually looking to filter by is cuisine type, which is conspicuously missing. You’re actually supposed to use the Keyword field for that, but again, it requires you to enter Japanese text.

Now, you could use Google Translate to translate “sushi” from English to Japanese and then paste that into the keyword field, but there’s an easier way.

The link next to the Genre dropdown that translates as “List” will take you to the massive category list:

This page has a very long list of cuisines, and the great thing is that the page translates nicely to English:

Click the cuisine you want and you’ll be taken to the restaurant list for that cuisine. For example, here’s the page for tonkatsu, which is delicious deep fried pork cutlets:

This page lists all the restaurants for that cuisine, and allows you to sort the list by rating, but it’s not geographically restricted, so it’s of limited use to us.

If you scroll down a bit, on the left side you will eventually see this:

Click that map link, and you’ll be taken back to the big map, but it’ll be filtered by the cuisine you chose. So then you can zoom back in on wherever you are, and see the highest-rated tonkatsu restaurants there:

Now you should be able to browse through nearby restaurants of the cuisine you want to eat. Woo!

Finding the place

Once you’ve picked a place to eat, you can enter the address into Google Maps to navigate to the place. The address is at the bottom of every page of the restaurant listing:

You can copy the address into Google Maps, or you can click the embedded map to open a bigger map, then click the “Google” link in its bottom left corner to be taken to the regular Google Maps site with the restaurant location pinned. From there you can save the location to your starred places or email yourself a link, so that later on you can look it up on your phone.

Even with all this information, finding restaurants in Japan can still be a bit tricky. Oftentimes restaurants have very minimal signage, and they can be just about anywhere in a building:

  • on the ground floor
  • on a random higher floor requiring taking an elevator
  • in a food court
  • in the basement
  • down a winding hallway, through some unlabeled doors, and up a random staircase that makes you think you’re going to pop out in someone’s living room or end up in a broom closet

It’s not always easy, but that’s part of the adventure!

Things to pay attention to so you find what you’re looking for:

  1. The exterior photos in the Photos section. Often users have uploaded photos of the front door of the restaurant, and sometimes they’ll also upload photos of the route to get to the restaurant when it’s buried deep inside some other building.
  2. Check out Google Street View before you go and try to match up what you’re seeing with the exterior photos.
  3. Translate the address in Google Translate and look for things like the name of the building, and the floor it’s on.

Finding a building by name

Building names are usually the first part of a street address, and almost every building with multiple tenants will be labeled at street level. Understanding the name of the building is a huge help in finding something.

So for example this restaurant has the Japanese address:

東京都中央区銀座8-5-8 かわばたビル 3F

Translated:

Tokyo, Chuo-ku, Ginza 8-5-8 Kawabata building 3F

So once you’re in the area, you just have to find the Kawabata building and get to the third floor.

It’s helpful to find the building name in Japanese so that you’ll recognize it on the street. You’ll notice the translation reads “Kawabata building”; in the Japanese address this is “かわばたビル”. I know this because “ビル” (pronounced “biru”) is Japanese for building. So the part before ビル is the building name. The sign for the building will always include ビル, so when you go there in person, you’ll know you’re at the right building when you see a sign that says かわばたビル.

I looked this up on Google Streetview and lo and behold:

Zoom in on the top of that sign:

かわばたビル!

Now all you have to do is find the elevator or stairs. Often easier said than done!

Wrapping up

So I hope this guide helps you enjoy all the culinary wonders Japan has to offer. It’s one of those rare places in the world where the culinary bar is set high and, as long as you employ some common sense and light internet research, every meal you have will be fantastic.

Finding restaurants in a foreign country where you don’t speak or read the language definitely has its challenges, and this guide is most certainly not foolproof, but using these methods I can pretty reliably get to anywhere I’m trying to go.

Leave me a comment with any questions, comments, or tips of your own.

Using Nmap and Socat to Get Around Public Internet Port Restrictions

In a previous post, I detailed how I set up a VPN server so that I could internet securely while traveling and using public internet access points. Public internet is convenient, but is usually insecure by default, so tunnelling all your traffic through a VPN is a smart bet.

However, some public networks can be a bit restrictive with what kinds of traffic they allow. For example, when we were staying in Fukuoka we stayed at the interestingly-named Hotel Active! which, though it was a great hotel and had free internet, would only allow you to send traffic out on a few ports. I had a sneaking suspicion going in that the free internet might give me problems as I had read other reviews of this hotel chain that suggested it might be troublesome.

Not only would this mean that I couldn’t use my secure VPN, but it would also prevent me from using SSH to connect to Github and other work-related secure connections. Annoying!

Thankfully there are ways around these restrictions. As long as the network allows outgoing traffic on at least one port, you can run a relay on a remote server to receive that traffic and pass it along to your VPN server. To do this, we’re going to run nmap to figure out which outgoing port we can use, then run socat on that port on the same machine as our VPN server, relaying the traffic locally to the VPN daemon.

  1. Use nmap to figure out whitelisted ports
  2. Run socat to relay traffic to your VPN server
  3. Update your VPN client to use the relay port
  4. Troubleshooting

1. Use nmap to figure out whitelisted ports

The first thing you have to do is figure out on which ports the network will allow outgoing data. For this you can use the excellent nmap security scanner. Once you’re on the restrictive network you can use it to scan a bunch of regular ports and see which ones are allowed through and which ones aren’t.

WARNING: While port scanning is an invaluable tool for debugging and troubleshooting networks, it can also look like malicious activity, for example scanning for a vulnerable piece of software with the intention of exploiting it. Suspicious port scanning can get you in trouble with your ISP or network administrator, so use it sparingly, and if you’re not sure what you’re doing, don’t do it at all.

To install nmap on OSX, you can just use homebrew and run:

$ brew install nmap

Once you have nmap installed, you’re ready to start scanning.

On an unfiltered network, the results of the scan will show open and closed ports, like this example where we probe ports 75 to 85:

$ nmap mikeferrier.com -p 75-85 -Pn --reason

Starting Nmap 6.25 ( http://nmap.org ) at 2013-07-14 14:44 EDT
PORT   STATE  SERVICE    REASON
75/tcp closed priv-dial  conn-refused
76/tcp closed deos       conn-refused
77/tcp closed priv-rje   conn-refused
78/tcp closed unknown    conn-refused
79/tcp closed finger     conn-refused
80/tcp open   http       syn-ack
81/tcp closed hosts2-ns  conn-refused
82/tcp closed xfer       conn-refused
83/tcp closed mit-ml-dev conn-refused
84/tcp closed ctf        conn-refused
85/tcp closed mit-ml-dev conn-refused

The extra argument -Pn tells nmap not to ping the host, just to scan it, and --reason prints out why each port state was resolved to the value shown. This will come in handy later.

However, on a filtered network you’ll usually be able to see which ports are filtered. In this example, I’ve manually filtered outgoing ports 75-80 to show what it looks like:

$ nmap mikeferrier.com -p 75-85 -Pn --reason

Starting Nmap 6.25 ( http://nmap.org ) at 2013-07-14 14:44 EDT
PORT   STATE    SERVICE    REASON
75/tcp filtered priv-dial  no-response
76/tcp filtered deos       no-response
77/tcp filtered priv-rje   no-response
78/tcp filtered unknown    no-response
79/tcp filtered finger     no-response
80/tcp filtered http       no-response
81/tcp closed   hosts2-ns  conn-refused
82/tcp closed   xfer       conn-refused
83/tcp closed   mit-ml-dev conn-refused
84/tcp closed   ctf        conn-refused
85/tcp closed   mit-ml-dev conn-refused

You can see that nmap got “no response” from ports 75-80, and so marked them as “filtered.” Depending on how the firewall is configured, sometimes instead of “filtered” you’ll see filtered ports marked as “closed” but the reason will be “reset,” which is a different way for firewalls to deny traffic but amounts to the same thing: you can’t send traffic on those ports.

What you’re looking for here is an outgoing port that isn’t filtered, so that you can use it to send out all your tunneled VPN traffic and bypass the firewall. When I scanned from the hotel, I noticed every port from 1-100 was filtered except 53 (DNS), 67 (DHCPS), and 80 (HTTP). It looked something like this:

$ nmap mikeferrier.com -p 1-100 -Pn --reason

Starting Nmap 6.25 ( http://nmap.org ) at 2013-07-14 14:54 EDT
Not shown: 97 filtered ports
Reason: 97 no-response
PORT   STATE   SERVICE REASON
53/tcp closed  domain  conn-refused
67/tcp closed  dhcps   conn-refused
80/tcp open    http    syn-ack

So 97 ports were filtered, one was open (the web port), and two were closed but actively responding as closed. That’s your clue that traffic to these ports is allowed out by the network. For my purposes, I chose port 67 to work with.

2. Run socat to relay traffic to your VPN server

socat is an excellent multipurpose relay tool. It can pretty much relay traffic from anywhere to anywhere, and so it’s the perfect choice for relaying our traffic from the unrestricted port to our VPN server. Installing socat is left as an exercise for the reader, but you need to install it on a remote server, ideally on the same server as your VPN server so that you’re relaying locally and not across the internet.
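
For reference, installing it is usually a one-liner; for example, with Homebrew on OSX or apt on Debian/Ubuntu:

# OSX
$ brew install socat

# Debian/Ubuntu
$ sudo apt-get install socat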

I’m using port 67 as the relay port, and my VPN server listens on UDP port 1194, so socat needs to listen for UDP packets on port 67 and relay them to localhost UDP port 1194:

$ sudo socat UDP-LISTEN:67,fork UDP:localhost:1194

The fork directive tells socat to fork a process for each connection so that it keeps listening on port 67. Without it, socat terminates after the first connection ends.

You won’t see any output from this command, but if you want to troubleshoot or verify it’s working, you can add a -v flag to make socat’s output more verbose. It’ll spit out a ton of garbage when you connect so you can tell the connection is being made.

3. Update your VPN client to use the relay port

Once this relay is set up, you’re ready to connect. You have to reconfigure your VPN client to use the new relay port, though. This is generally done either through a configuration GUI or by editing a config file.

I use the very intuitive Tunnelblick OpenVPN frontend GUI for OSX, so for me the process is:

  1. From the Tunnelblick config screen, select your VPN configuration and click the gear icon, and select Edit OpenVPN Configuration File

  2. Find where the port is specified, and switch it to your relay port:
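
If you’re editing the file by hand, the exact line varies from config to config, but the change looks something like this (the hostname and original port here are just placeholders, not my actual setup):

# before: OpenVPN client pointed at the standard port
remote vpn.example.com 1194
proto udp

# after: point it at the whitelisted relay port instead
remote vpn.example.com 67
proto udp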

Once that’s done and saved, connect to your VPN and you should be up and running.

4. Troubleshooting

If for some reason it doesn’t work, you can troubleshoot by running nmap on the relay port to make sure you can connect to it from your local machine:

$ sudo nmap mikeferrier.com -p 67 -sU -Pn --reason

You can also start socat in verbose mode so that any connection activity is output to the screen:

$ sudo socat -v UDP-LISTEN:67,fork UDP:localhost:1194

Happy relaying!

Living in Japan: the Cellphone Situation

I love Tokyo, but it can be a nightmare to navigate. It has the same problem that any ancient city has, which is that the people laying out the streets made things up as they went along.

Compare Manhattan’s carefully planned and easy to use grid system…

… with Tokyo’s sprawling labyrinth of streets:

The upshot of this is that, when in Tokyo, bring a GPS. Luckily, just about everyone now has a GPS-enabled map in their pocket on their smartphone. So all you need to do is bring an unlocked phone, get a data plan, and you’ll be good to go.

As for which cellphone provider to use, there are plenty of MNOs and MVNOs to choose from, but the problem with the big three MNOs (Docomo, KDDI, and SoftBank) is that they all want you to buy a new handset from them, enter into a multi-year contract, and buy full package Voice + SMS + Data plans. Blech.

In this post I’ll be focusing on the excellent MVNO B-Mobile as they seem to offer the best prices on both short- and long-term data plans.

I’ve found MVNO data plans generally come in two flavors:

  • Low speed (usually 30kB/s or less) - good for things like email and data-based messaging, not so good for data-heavy stuff like web browsing. These plans also explicitly block “streamed” services like music, video, and Skype.
  • High speed (3G/4G/LTE speeds) - as fast as the network you’re on can go, and usually allows streaming services (check the fine print).

If you’re staying in Japan for a month or less, it’s generally a good idea to just get a visitor’s prepaid SIM card. That way you can sidestep the paperwork that goes with getting a longer term plan. Trust me, it’s just easier.

For prepaid SIM cards, 1 GB of high-speed data will run you about 4,000 yen, or you can get 14 days of unlimited low speed for the same price — info here. Heads up though, the 1 GB of data expires 14 days from the day it is shipped, so if you screw up on the address or can’t get it working, the clock’s still ticking. An alternative to consider is Econnect Japan. Their 1 GB prepaid plan is slightly better than B-Mobile’s as it’s the same price but your prepaid data lasts 30 days instead of B-Mobile’s 14 days.

Compared to long-term data plans, prepaid data plans are kind of expensive. So if you’re going to be here for a month or more, you’re going to want a recurring subscription data plan. There’s one caveat, though: there are government regulations on these longer-term plans in order to curb cellphone fraud. Basically, providers need to verify your address through a call from you to them on a Japanese cellphone or landline, or through them mailing you a confirmation code to a Japanese residential address. So if you’re staying in a hotel during your stay, you may have to employ some creativity (e.g. using your hotel’s landline) to fulfill the requirements.

For long-term plans, B-Mobile seems to have the best prices and widest variety of deals right now. Since my wife and I both wanted data we opted for the PairGB SIM which is kind of a great deal: you buy the two SIM cards for 3,150 yen and then you sign up for a monthly subscription: 2,970 yen for 2 GB between both SIM cards per month. $15 per month for a GB of data with no contract? That’s a good deal no matter what country you’re in.

You have three options for purchasing:

I went with the Yodobashi option. Make sure you get the appropriate SIM format for whichever phone you have. We each have an iPhone 4 so we got the PairGB Micro SIM format (“マイクロ” is Japanese for “Micro”).

Eventually, you’ll have the B-Mobile SIM card(s) grasped firmly in your hot little hands. They’ll actually be Docomo SIM cards as that’s who B-Mobile is reselling, and they’ll look something like this:

That silver thing is the SIM card pop-out pin that you got with your iPhone. You did remember to bring it with you, right? Don’t worry, an unbent paper clip will work too.

Now simply follow the instructions inside the package and go to the appropriate activation URL. Do your best to stumble your way through the Japanese forms (the automatic page translation in Chrome helps a ton) and when you finally finish, it tells you at what time the cards will be activated (about 45 minutes from the completion of the form in our case) and also that a confirmation code is being mailed to your address.

Now you can put the SIM cards into your phone. Be sure to enter the correct APN settings for the product you bought, as they’re different for each product. A list can be found here.

A good habit to get into is to reset your “cellular usage” meter each month so that you can keep tabs on how much data you use. You can get there in iOS from the Settings app -> General -> Usage -> Cellular Usage. It looks like this:

The logged-in area of B-Mobile’s website will also tell you how much you’ve used up so far in the month. Also there you will find how many days left you have to enter the confirmation code. You remember the confirmation code, don’t you? The one they mailed you? If you don’t enter it in 30 days, they’ll cancel your subscription.

A word of warning: my SIM cards didn’t work out of the box for some reason, and I had to call the B-Mobile English help line a bunch of times to get it fixed. By some separate but equally mysterious glitch I couldn’t get through to that line through Skype, so I had to use Melissa’s Japanese crap phone. Considering her phone was like $20 and it saved my ass, I’d also recommend getting one if you’re spending more than a month here. A post on that coming up soon.

Hope that helps you out, and feel free to ask questions in the comments.

Living in Japan

Melissa and I decided last year to spend the first 6 months of 2013 living in Japan. We’ve both always wanted to spend some time living abroad, and it just so happens that, at this point in our lives, all the stars seemed to align at once to provide this opportunity. We’re both able to work remotely, our condo lease was up, and we’re both eligible for the Working Holiday Visa, so we figured it was time.

We’ve finally made it here and settled in, and after a crazy first two weeks I decided I should start posting about the things we’ve had to figure out that might be useful to others who come here to live as well.

While we’re here, we’re going to be pretty much exclusively using AirBnB for accommodations, which is significant: without AirBnB this excursion would have been much more difficult and expensive. Before AirBnB, renting a place in Japan involved such unpleasantries as

  • hiring an agent to find rental properties
  • negotiating with a prospective landlord in broken Japanese
  • paying 2-3 months’ rent as a refundable deposit
  • paying 1-2 months’ rent as unrefundable “key money”

Key money is one of those things about Japan that, if you come from most other places in the world, blows your mind when it’s first explained to you. Basically it’s a gift of 1-2 months’ rent to the landlord. That’s right, not a deposit but a gift. For the privilege of allowing you to rent from them. Fucking bonkers.

Apparently it’s a practice that dates from the end of WW2 when rebuilding efforts were still being undertaken and housing was scarce. More info here.

AirBnB listings for Tokyo have been slowly growing this year, but there are currently around 150 listings, which is a pretty healthy selection to choose from. With AirBnB you can book long-term accommodations and pay much less than hotels – the place we’re living in was around $60/day.

One thing to remember if you go this route is that gift giving is an important part of the social niceties that are expected of the Japanese. Be sure to bring or buy gifts for whoever you’re renting from to show them you’re a thoughtful foreigner and not a bum.

Rescuing Multiple Exception Types in Ruby and Binding to Local Variable

Took me a few minutes to figure this out and wasn’t easy to Google, so hopefully this helps someone out.

Rescuing multiple exceptions in one rescue clause is pretty intuitive:

begin
  rand(2) == 0 ? ([] + '') : (foo)
rescue TypeError, NameError
  puts "oops"
end

I wanted to also bind the exception, whatever it is, to a local variable. To do that for a single exception is like:

begin
  [] + ''
rescue TypeError => e
  puts "oops: #{e.message}"
end

To combine the two, list the exceptions and then name the local variable with the last type in the list:

begin
  rand(2) == 0 ? ([] + '') : (foo)
rescue TypeError, NameError => e
  puts "oops: #{e.message}"
end
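
One more variation that’s worth knowing (plain Ruby, nothing specific to this example): if the list of exception classes lives in a constant, you can splat it into the rescue clause and still bind the exception:

# the exceptions to rescue, kept in one place
ERRORS = [TypeError, NameError]

begin
  rand(2) == 0 ? ([] + '') : (foo)
rescue *ERRORS => e
  puts "oops: #{e.message}"
end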

Using Google Latitude to Map Your Travel

Melissa and I just got back from our mini-rtw trip to Asia, during which we had data roaming on our phones. I kept the Google Latitude app open in the background, which pings your location to Google every so often.

When I got back and took a look at the map, the results were pretty cool. Some highlights:

Here’s us going all over the place in Tokyo, with our home base in Shibuya. You can see the Yamanote Line loop mapped out clearly. You can also see our trip to the Imperial Palace in Chiyoda for the Emperor’s New Year’s address.

During our Hong Kong stop we took the Turbo Jetfoil to Macau – you can see the ferry’s path above.

In Hong Kong we stayed in Aberdeen, which is on the south side of Hong Kong island. Every day we would take a taxi up to the north side which would cost about $60 HKD, or around $8 CAD. As you can see, sometimes the cab took the toll tunnel, other times the winding mountain roads. Also apparently we hung out in Wan Chai and Causeway Bay a lot.

Here’s the entire trip if you’d like to play around with it:


View Asia Trip 2011 in a larger map

Fake Cookie Store for Rails Unit Tests

While writing tests for an A/B testing library, I needed to simulate a Rails cookie store, but I didn’t want to do any controller requests. Without doing a controller request, you aren’t given any cookie store to test with, so I whipped up this fake cookie store that seems to work pretty well.

class FakeCookieStore < Hash
  # allow for things like cookies.permanent.signed etc
  def method_missing(name, *args, &block); self; end

  def []=(key, value)
    # simulate cookies[key] = {:value => 'foo'}
    if value.is_a?(Hash) && (v = value[:value])
      super(key, v)
    else
      super
    end
  end
end
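
Here’s a rough sketch of how it behaves in a test (the keys and values are made up for illustration):

cookies = FakeCookieStore.new

# chained scopes like cookies.permanent.signed just return the store itself
cookies.permanent.signed[:ab_variant] = {:value => "blue_button"}
cookies[:plain] = "hello"

cookies[:ab_variant] # => "blue_button"
cookies[:plain]      # => "hello"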

Granting Access to a Single S3 Bucket Using Amazon IAM

If you’ve ever used Amazon’s AWS console then you probably know that, though it can sometimes be clunky, it has a ton of functionality for interacting with the various AWS services. So when I needed to give one of my coworkers at 4ormat access to one of our S3 buckets, I immediately investigated the laziest option: figuring out how they could log in to the S3 console and use that to manage the bucket.

The S3 console is pretty great. Uploading, downloading, creating folders, managing permissions, even copying and pasting files between buckets is a snap. If I could figure this out, I would save myself all the work of setting up S3Fox or, even worse, writing an interface from scratch.

After some trial and error, success! I’ve written a quick guideline on how to do this below.

1. Login to the IAM AWS console

Login here as the owner of the AWS account. Click the IAM tab.

2. Create an account alias

This step is optional, but it gives you a nice login URL for your users. Add an account alias in the AWS Account Alias section of the IAM console. Then, your login URL will be youralias.signin.aws.amazon.com.

If you don’t do this, your login page URL will be a bunch of random numbers.

3. Create a new group or a new user

With IAM you can create a group that has certain permissions, and then assign users to that group. Or, you can just create users piecemeal, but then you can’t reuse permissions.

If you want a group, create it first. Then create a user and assign it to that group.

4. Set a password for the new user

Click the new user you’ve created and then click the Security Credentials tab. On that page, you can click Manage Password to add a password for your user. Without a password, the user won’t be able to log in to the AWS console.

Make sure your user knows to use the login page from step #2 in order to log in — they can’t use the regular AWS login page.

You’ll notice your user also has an AWS access key created: API clients using this key will have the same permissions as the user would in the AWS console.

5. Add permissions for your user

Permissions are added either on the group the user is in, or if you decided not to create a group, the user account itself.

Click the user or group, then click the Permissions tab. Here you can see which permissions policies are currently attached to the group or user. Click the Attach Policy button. You’ll get a pop-up where you can Manage User Permissions. Here you can select a prerolled policy, use the Policy Generator, or just paste in a custom policy.

There are two permissions that need to be added in order for your user to be able to log in, see the bucket list in the S3 console, and manage the one bucket you’ve assigned.

To manage the bucket, you need to grant the s3:* action for the bucket you designate. AWS policies designate resources by their Amazon Resource Name, or ARN; for S3 buckets, these look like arn:aws:s3:::bucket-name-here. So to grant your user full access to your bucket, you’d paste the policy:

{
  "Statement": [
    {
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::4ormat-knowledge-base",
        "arn:aws:s3:::4ormat-knowledge-base/*"
      ]
    }
  ]
}

Now, you would think that this would be enough to enable the user to use the S3 console to manage the bucket, but you’d be wrong. Turns out the user needs one more permission to do the initial listing of the buckets in order to be able to select a bucket, and it’s called s3:ListAllMyBuckets. You need to add that permission too, and it looks like this:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "arn:aws:s3:::*"
    }
  ]
}

6. Done!

You’re done. Give the user their credentials and the login page, and then bask in the glory of laziness.
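
If you’d rather sanity-check the new credentials from the command line instead of the console, something like the AWS CLI should work (the profile name is arbitrary, and the bucket is the one from the policy above):

# configure a profile with the new user's access key and secret
$ aws configure --profile bucket-user

# listing all buckets exercises s3:ListAllMyBuckets
$ aws s3 ls --profile bucket-user

# listing the bucket itself exercises the s3:* grant on it
$ aws s3 ls s3://4ormat-knowledge-base/ --profile bucket-user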

Update Jul 31st, 2013 — Jay Klehr in the comments posted the full merged JSON object for those having trouble merging them together:
{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::4ormat-knowledge-base",
                "arn:aws:s3:::4ormat-knowledge-base/*"
            ]
        }
    ]
}

Compiling EEE for Rubyscript2Exe on Mac OSX

If you’re having problems compiling EEE for Rubyscript2Exe or AllInOneRuby on OSX, this post will explain how to modify the source to compile properly.

Rubyscript2Exe is a framework to “transform your Ruby application into a standalone, compressed Windows, Linux or Mac OS X (Darwin) executable.” To do so, it depends on a little Pascal program called Environment Embedding Executable, or EEE. It doesn’t look like this stuff has been updated in a while, because I had problems compiling it on my OSX system. It seems that the EEE source code is not compatible with the latest FreePascal compiler.

After downloading and installing the latest fpc, compiling barfs out some errors:

$ fpc -Xs -B eee.pas
Free Pascal Compiler version 2.4.4 [2011/05/01] for i386
Copyright (c) 1993-2010 by Florian Klaempfl
Target OS: Darwin for i386
Compiling eee.pas
eee.pas(151,49) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(172,29) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(204,43) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(395,52) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(395,70) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(395,90) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(421,52) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(421,70) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(421,90) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(444,52) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(444,70) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(444,90) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(489,52) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(489,70) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(489,90) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(512,52) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(512,70) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(512,90) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(535,52) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(535,70) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(535,90) Error: Parameters cannot contain local type definitions. Use a separate type definition in a type block.
eee.pas(1164) Fatal: There were 21 errors compiling module, stopping
Fatal: Compilation aborted
Error: /usr/local/bin/ppc386 returned an error exitcode (normal if you did not specify a source file to be compiled)

$ 

Turns out fpc has a problem with local type definitions in function parameters, and it considers static-length string identifiers, such as string[255], to be type definitions. Apparently this was not so when eee.pas was first written, because the function definitions are full of string[1] and string[255].

To fix it, we simply define each of those types on its own, then replace all references to the inline definitions with the new type names. Here’s how I did it.

First I added the types, one for string[1] and one for string[255]:

type
  string255 = string[255];

type
  string1 = string[1];

Then I replaced all occurrences of string[1] with string1, and string[255] with string255, e.g.:

type

  header =  record
    klasse  : string1;
    tekst    : string255;
    datalength  : longint;
  end;
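
If you’d rather not do the replacements by hand, a sed one-liner along these lines should work on OSX (a rough sketch; run it before adding the new type definitions so they don’t get rewritten too):

$ sed -i '' -e 's/string\[1\]/string1/g' -e 's/string\[255\]/string255/g' eee.pas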

Once these changes are made, compiling is no problem:

$ fpc -Xs -B eee_fixed.pas
Free Pascal Compiler version 2.4.4 [2011/05/01] for i386
Copyright (c) 1993-2010 by Florian Klaempfl
Target OS: Darwin for i386
Compiling eee_fixed.pas
Assembling (pipe) eee_fixed.s
Linking eee_fixed
1169 lines compiled, 0.4 sec

$ 

You can download eee.pas with these changes made here.

My Pascal is pretty rusty, so if anyone has any pointers on improving this code, please let me know in the comments.

My Beautiful Dark Twisted Reverse-proxy LRU Cache

Reverse-proxy caching is generally some of the lowest-hanging fruit when scaling a site. If your reverse proxy is nginx, then you’ve probably seen the modules HttpMemcachedModule and HttpRedis, both of which are pretty good at fetching from memcached or redis based on a simple key.

The config would look a little something like this:

server {
  location / {
    set $memcached_key $uri;
    memcached_pass     memcached_server:11211;
    default_type       text/html;
    error_page         404 @fallback;
  }

  location @fallback {
    proxy_pass http://backend;
  }
}

But what if you want to do something a bit more complex? Say you wanted to do namespaced caching, where there are two cache operations per request: one to fetch the version number of the requested resource, and a second to fetch the actual cached data from a key which interpolates the value of the first operation. Since HttpRedis only allows you to fetch values from redis based on a key computed in your nginx script, this isn’t possible.

Enter the nginx modules Redis2 and Lua. The former allows you to make any call you like to redis, as opposed to HttpRedis which only allows the plain old GET command. The latter allows you to embed Lua scripts in your nginx config, effectively giving nginx a bigger brain and allowing you to do some pretty fancy stuff.

In this post we’ll set nginx and redis up to serve as a reverse-proxy LRU cache. We recently started using this setup on 4ormat and it’s sped up our site considerably and reduced hits to the application server by around 91%!

  1. Install lua and redis.parser
  2. Build nginx with module support
  3. Redis structure
  4. Configure nginx
  5. Cache script
  6. Configure Redis
  7. Configure your app
  8. Conclusion and Benchmark

1. Install lua and redis.parser

In order to build the nginx lua module, you’ll need lua installed on your system. We also need the redis.parser library in order to easily parse raw redis responses.

On most systems, lua is already installed or is easily installed with your local package manager.

OSX:

brew install lua

Ubuntu:

sudo apt-get install lua

Gentoo:

sudo emerge lua

Once that’s done, you just need the redis.parser library:

$ curl https://github.com/agentzh/lua-redis-parser/tarball/v0.04 -s -L -o lua-redis-parser.tar.gz

$ tar zxvf lua-redis-parser.tar.gz
...untar output...

$ cd agentzh-lua-redis-parser-ceffe35

On Linux, you can just type make to build, but on OSX I found you have to do it by hand:

# linux:
$ make
# osx:
$ gcc -I/usr/local/Cellar/lua/include/ -O2 -fPIC -Wall -Werror -o parser.lo -c redis-parser.c
$ gcc -o parser.so -bundle -undefined dynamic_lookup -fomit-frame-pointer parser.lo

Then, install the library:

$ sudo make INSTALL_PATH=/usr/local/lib/lua/5.1/ install

2. Build nginx with module support

Nginx does not support dynamic module loading, so in order to build new functionality into nginx you need to recompile it with the appropriate modules.

Here’s how I do it:

$ curl http://nginx.org/download/nginx-1.0.0.tar.gz -s -O
$ curl http://mdounin.ru/hg/ngx_http_upstream_keepalive/archive/tip.tar.gz -s -o ngx_http_upstream_keepalive.tar.gz
$ curl https://github.com/chaoslawful/lua-nginx-module/tarball/v0.1.6rc5 -s -L -o lua-nginx-module.tar.gz
$ curl https://github.com/agentzh/set-misc-nginx-module/tarball/v0.21rc3 -s -L -o set-misc-nginx-module.tar.gz
$ curl https://github.com/simpl/ngx_devel_kit/tarball/v0.2.17rc2 -s -L -o ngx_devel_kit.tar.gz
$ curl https://github.com/agentzh/redis2-nginx-module/tarball/v0.06 -s -L -o redis2-nginx-module.tar.gz
$ for f in *.gz; do tar xzvf $f; rm -f $f; done
...untar output...

$ ls -1
agentzh-redis2-nginx-module-62f5b6a
agentzh-set-misc-nginx-module-4b0512a
chaoslawful-lua-nginx-module-0e0b0fc
nginx-1.0.0
ngx_http_upstream_keepalive-c6396fef9295
simpl-ngx_devel_kit-bc97eea

$ cd nginx-1.0.0
$ ./configure \
>     --prefix=/opt \
>     --conf-path=/etc/nginx/nginx.conf \
>     --add-module=../agentzh-redis2-nginx-module-62f5b6a \
>     --add-module=../agentzh-set-misc-nginx-module-4b0512a \
>     --add-module=../chaoslawful-lua-nginx-module-0e0b0fc \
>     --add-module=../ngx_http_upstream_keepalive-c6396fef9295 \
>     --add-module=../simpl-ngx_devel_kit-bc97eea \
>     --add-module=/usr/local/rvm/gems/ruby-1.8.7-p330@default/gems/passenger-3.0.6/ext/nginx

...configure output...

Configuration summary
  + using system PCRE library
  + OpenSSL library is not used
  + md5: using system crypto library
  + sha1: using system crypto library
  + using system zlib library

  nginx path prefix: "/opt"
  nginx binary file: "/opt/sbin/nginx"
  nginx configuration prefix: "/etc/nginx"
  nginx configuration file: "/etc/nginx/nginx.conf"
  nginx pid file: "/opt/logs/nginx.pid"
  nginx error log file: "/opt/logs/error.log"
  nginx http access log file: "/opt/logs/access.log"
  nginx http client request body temporary files: "client_body_temp"
  nginx http proxy temporary files: "proxy_temp"
  nginx http fastcgi temporary files: "fastcgi_temp"
  nginx http uwsgi temporary files: "uwsgi_temp"
  nginx http scgi temporary files: "scgi_temp"

$ make
...make output...

$ make install
...install output...

A lot of these modules were written by the very awesome agentzh, so a big thanks to him. He’s working on something very exciting, a full-fledged web application server in nginx and lua! Will definitely be keeping my eye on that project.

You’ll notice I add the Phusion Passenger module in with the path to my gem. You can omit that if you’re not using Passenger.

3. Redis structure

This is a good time to figure out where things are going to live in redis. We have two bits of information we need to store:

  1. Resource versions, which will be numbers that we increment when something changes in the app in order to invalidate the cache
  2. Cached content, which will be written by the app after a cacheable request is complete, so that future requests can be served from cache

For this post, I’m storing the resource version numbers in “resource_versions”, which will be a hash keyed by the request URI. The cached content itself will live in the root of the redis database, keyed by the request URI plus “:version=n”, where n is the resource version. Like this:

{
  "resource_versions": {
    "http://myproject.dev/": 1,
    "http://myproject.dev/foos": 2,
    "http://myproject.dev/foo/123": 3,
  },
  "http://myproject.dev/:version=1": "<html>...</html>",
  "http://myproject.dev/foos:version=2": "<html>...</html>",
  "http://myproject.dev/foo/123:version=3": "<html>...</html>",
}

Why not also store the cached content in a hash? The redis LRU eviction policy evicts top-level keys when the memory limit is reached; if there were only two top-level keys (two big hashes), then either all our cached content would be evicted at once, or all our resource versions. Storing each piece of cached content as its own top-level key ensures that it will be a least recently used bit of HTML that gets evicted.
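
To make the lookup flow concrete, here’s roughly what nginx will ask redis for a request to http://myproject.dev/foos, given the example data above (sketched with redis-cli):

$ redis-cli hget resource_versions "http://myproject.dev/foos"
"2"

$ redis-cli get "http://myproject.dev/foos:version=2"
"<html>...</html>"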

4. Configure nginx

The first thing you need to do is add an upstream block in your nginx.conf pointing to your redis server:

# keepalive connection pool to a single redis running on localhost
upstream redis {
  server localhost:6379;

  # a pool with at most 1024 connections
  # and do not distinguish the servers:
  keepalive 1024 single;
}

Keepalive connections are more efficient than creating a new connection to redis every time; this is enabled by the ngx_http_upstream_keepalive module compiled into nginx.

Nginx has a powerful internal system called subrequests. It basically allows complex request rerouting internally to nginx that remains transparent to both the client and application. In this nginx config, internal locations are used to pass off requests further down the chain as necessary.

server {
  listen 80;
  server_name myproject.dev;
  root /Users/mferrier/dev/myproject/public;

  location / {
    # try to serve files directly from root, otherwise pass to @cache
    try_files $uri @cache;
  }

  location @cache {
    internal;
    default_type   text/html;

    set $full_uri $scheme://$host$request_uri;

    content_by_lua_file '/Users/mferrier/dev/myproject/config/nginx.cache.lua';

    error_page     404 = @app;
  }

  location @app {
    internal;
    passenger_enabled on;
  }
}

First we check for the existence of the requested file in the filesystem. If it doesn’t exist, we pass control to @cache. @cache uses the content_by_lua_file directive from the Lua nginx module to specify that the content for this request should be generated by an external lua script. If that script returns a 404, then we pass control down to @app. @app is our application, in this case a Rails app handled by Passenger.

The equals sign = after the error code specifies to ultimately use the response code returned by @app rather than the 404 returned by @cache.

We also need some internal locations to simplify the querying of redis. Our structure will require GET requests for the cached content in the root of the database, and HGET requests for the resource versions stored at resource_versions.

# requires the "key" argument
# http://redis.io/commands/get
location /redis_get {
  internal;
  set_unescape_uri $key $arg_key;
  redis2_query get $key;

  redis2_connect_timeout 200ms;
  redis2_send_timeout 200ms;
  redis2_read_timeout 200ms;
  redis2_pass redis;

  error_page 500 501 502 503 504 505 @empty_string;
}

# requires the "hash_key and "key" argument
# http://redis.io/commands/hget
location /redis_hget {
  internal;
  set_unescape_uri $key $arg_key;
  redis2_query hget $arg_hash_key $key;

  redis2_connect_timeout 200ms;
  redis2_send_timeout 200ms;
  redis2_read_timeout 200ms;
  redis2_pass redis;

  error_page 500 501 502 503 504 505 @empty_string;
}

# returns an empty string
location @empty_string {
  internal;
  content_by_lua 'ngx.print("")';
}

Both /redis_get and /redis_hget are meant to be used in an ngx_lua block with ngx.location.capture and redis.parser.parse_reply (more on that in the next step.)

You’ll also notice some sensible timeouts, and we pass control of any errors to the @empty_string internal location, defined last. This location serves as a rescue, and returns an empty string back to @cache, which treats an empty string as a cache miss. So if the redis server goes away or takes too long, it ends up being a cache miss rather than a 500 on the request.

All that is needed now is the lua script which will check for cached content in redis, and either return that content or return a 404 if it wasn’t found.

5. Cache script

In the previous step, we specified that the content for the @cache internal location was to be handled by the script nginx.cache.lua. Every incoming request which isn’t a request for a static file will be handled by this script, and it will either return some cached content, or pass the request along to @app.

This script needs to:

  1. Determine if the request is cacheable based on the request method and request path, and return HTTP_NOT_FOUND if it isn’t
  2. Try to grab the current version of the requested resource from redis, and return HTTP_NOT_FOUND if it doesn’t exist
  3. Construct the cached content key for the requested resource from the result of the previous step
  4. Try to grab the cached content from redis using the cache key from the previous step, and return HTTP_NOT_FOUND if no cached content is found; otherwise, we have a cache hit and can return the cached content

Remember, a 404 response code in the @cache location causes control to be passed to @app.

if (ngx.var.request_method ~= "GET" and ngx.var.request_method ~= "HEAD") then
  ngx.log(ngx.NOTICE, "@cache: skipping uncacheable request method: ", ngx.var.request_method)
  ngx.exit(ngx.HTTP_NOT_FOUND)
end

-- lua patterns, not regexes
local cacheable_resource_matchers = {
  "^/$",
  "^/foo",
}

-- whether the request is cacheable
local cacheable = false

for i, matcher in ipairs(cacheable_resource_matchers) do
  if (string.find(ngx.var.uri, matcher)) then
    ngx.log(ngx.NOTICE, "@cache: cacheable request found: ", ngx.var.host, ngx.var.uri)
    cacheable = true
    break
  end
end

if (cacheable == false) then
  ngx.log(ngx.NOTICE, "@cache: skipping uncacheable request: ", ngx.var.host, ngx.var.uri)
  ngx.exit(ngx.HTTP_NOT_FOUND)
end

-- parser object that will receive redis responses and parse them into lua objects
local parser = require("redis.parser")

-- key in redis that stores the resource version numbers
local version_hash_key = "resource_versions"

-- full uri of the request, used to construct the cache key
local full_uri = ngx.var.scheme.."://"..ngx.var.host..ngx.var.request_uri

-- key under which the current version of this resource is stored; this has to
-- match the key the app writes into the resource_versions hash, which in our
-- case is the full URI
local version_key = full_uri
local response_body = ngx.location.capture("/redis_hget",
    {args = {hash_key = version_hash_key, key = version_key}}).body
local res, typ = parser.parse_reply(response_body)

if (typ == parser.BULK_REPLY and not(res == nil) and (#res > 0)) then
  ngx.log(ngx.NOTICE, "@cache: cache HIT on version key ", version_key, ", value: ", res)

  local version_value = tonumber(res)
  local cache_key = string.format("%s:version=%s", full_uri, version_value)
  local response_body = ngx.location.capture("/redis_get",
      {args = {key = cache_key}}).body
  local res, typ = parser.parse_reply(response_body)

  if (typ == parser.BULK_REPLY and not(res == nil) and (#res > 0)) then
    ngx.log(ngx.NOTICE, "@cache: cache HIT on cache key: ", cache_key, ", content length: ", #res)
    ngx.print(res)
    ngx.exit(ngx.OK)
  else
    ngx.log(ngx.NOTICE, "@cache: cache MISS on cache key: ", cache_key)
  end
else
  ngx.log(ngx.NOTICE, "@cache: cache MISS on version key: ", version_key)
end

ngx.exit(ngx.HTTP_NOT_FOUND)

In this script we do opt-in caching: only the locations we specify in cacheable_resource_matchers are eligible to be cached. The opposite of this would be opt-out caching, where everything is cached unless we specify it shouldn’t be. I find opt-in caching to be safer, because it’s generally worse to cache something that shouldn’t be rather than not cache something that should be.

The cacheable resource matchers are Lua Patterns, which are kind of like regular expressions, but a bit less powerful: you can’t use the logical OR operator “|”. If you need full-blown regular expressions, there are lua regex libraries you can install.
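
As a quick illustration of how the matchers above behave, here’s what string.find does with them in a standalone lua shell:

-- string.find returns the match offsets, or nil when there's no match
print(string.find("/foo/123", "^/foo"))  -- 1    4
print(string.find("/",        "^/$"))    -- 1    1
print(string.find("/bar",     "^/foo"))  -- nil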

One of the great things about the lua nginx module is that code is automatically cached between requests. That means you don’t have to worry about the require statement in the lua script; it will only ever happen once.

6. Configure Redis

Adding these two lines to your redis config will ensure that your memory usage never exceeds the limit you set:

maxmemory 64mb
maxmemory-policy allkeys-lru
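
If you want to double-check that the settings took effect, redis-cli can read them back at runtime (64mb shows up as its byte count):

$ redis-cli config get maxmemory
1) "maxmemory"
2) "67108864"

$ redis-cli config get maxmemory-policy
1) "maxmemory-policy"
2) "allkeys-lru"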

7. Configure your app

All that remains to be done at this point is to set up your app so that it will do two things:

  1. Write back the result of a cacheable request to redis
  2. Increment the resource version when a resource is updated

Most of this should be customized to your particular app, but here’s the basic framework I use in Rails.

First we need to set up redis and the around_filter in ApplicationController:

config/initializers/redis.rb

REDIS_CONFIG = YAML.load_file(Rails.root.join('config', 'redis.yml'))[Rails.env].with_indifferent_access
$REDIS = Redis.new REDIS_CONFIG

app/controllers/application_controller.rb:

class ApplicationController < ActionController::Base
  around_filter Cache
end

app/controllers/cache.rb:

module Cache
  # expose these as module methods so that `around_filter Cache` and
  # `Cache.invalidate_cache_for_url` work when called on the module directly
  extend self

  CACHEABLE_RESOURCE_MATCHERS = [
    "^/$",
    "^/foo",
  ]

  VERSION_HASH_KEY = "resource_versions"
  CACHE_KEY_FORMAT   = "%s:version=%s"

  def request_cacheable?(request)
    return false unless (request.method == "GET" || request.method == "HEAD")
    CACHEABLE_RESOURCE_MATCHERS.any? do |pattern|
      request.path =~ /#{pattern}/
    end
  end

  def response_cacheable?(response)
    response.status == 200
  end

  def version_key_for_request(request)
    "#{request.scheme}://#{request.host}#{request.fullpath}"
  end

  def version_value_for_request(request)
    query_safely do |cache|
      cache.hget(VERSION_HASH_KEY, version_key_for_request(request))
    end
  end

  def cache_key_for_request(request, version = 0)
    full_url = "#{request.scheme}://#{request.host}#{request.fullpath}"
    CACHE_KEY_FORMAT % [full_url, version]
  end

  def cache_response(request, response, version_value)
    query_safely do |cache|
      # if version value is nil, re-check that it hasn't been set since the
      # beginning of this request so that we don't clobber it. this read has
      # to happen before the MULTI block: commands queued inside a MULTI
      # don't return their values.
      version_value ||= version_value_for_request(request)

      cache.multi do
        if version_value.nil?
          version_key = version_key_for_request(request)
          log %{initializing version key "#{version_key}" with value "0"}
          cache.hset(VERSION_HASH_KEY, version_key, 0)
          version_value = 0
        end

        cache_key = cache_key_for_request(request, version_value)
        log %{writing cache: key = "#{cache_key}", content length #{response.body.length}}
        cache.set(cache_key, response.body)
      end
    end
  end


  def filter(controller)
    if request_cacheable?(controller.request)
      version_value = version_value_for_request(controller.request)
      yield
      if response_cacheable?(controller.response)
        cache_response(controller.request, controller.response, version_value)
      end
    else
      yield
    end
  end

  def invalidate_cache_for_url(url)
    query_safely do |cache|
      cache.hincrby(VERSION_HASH_KEY, url, 1)
    end
  end

  def log(msg, level = :info)
    Rails.logger.send(level, %{[CACHE] #{msg.to_s}})
  end

  # yields the redis client, but catches connection errors. use this when you 
  # don't mind if your cache operation fails silently.
  def query_safely
    begin
      yield $REDIS
    rescue Errno::ECONNREFUSED
      log %{ERROR: Connection refused while connecting to redis! Discarding query silently.}, :warn
    rescue Errno::EAGAIN
      log %{ERROR: Connection timeout while querying redis! Discarding query silently.}, :warn
    end
  end
end

It has been said that there are only two hard problems in computer science: naming things and cache invalidation. Implementing effective cache invalidation for your particular application is left as an exercise for the reader, but as an example let’s say you have a Foo model whose current state affects the paths /foos and /foo/:id. You could write an observer like this:

app/models/foo_observer.rb:

class FooObserver < ActiveRecord::Observer
  def after_save(record)
    return unless record.changed?

    affected_urls(record).each do |u|
      Cache.invalidate_cache_for_url u
    end
  end

  def after_destroy(record)
    affected_urls(record).each do |u|
      Cache.invalidate_cache_for_url u
    end
  end

  def affected_urls(record)
    [
      "http://myproject.com/foos",
      "http://myproject.com/foo/#{record.to_param}",
    ]
  end
end

config/application.rb

module MyProject
  class Application < Rails::Application
    config.active_record.observers = :foo_observer
  end
end

8. Conclusion and Benchmark

By serving content from nginx, you can easily see 1000% improvement in requests per second served. From a baseline Rails app served on a High CPU Medium instance on EC2 that serves around 100 requests per second, here’s a benchmark once the caching is added:

$ ab -n 1000 -c 100 http://testserver/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking ocadportfolio.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        nginx/0.8.54
Server Hostname:        testserver
Server Port:            80

Document Path:          /features
Document Length:        8357 bytes

Concurrency Level:      100
Time taken for tests:   0.970 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      8540096 bytes
HTML transferred:       8394800 bytes
Requests per second:    1031.12 [#/sec] (mean)
Time per request:       96.982 [ms] (mean)
Time per request:       0.970 [ms] (mean, across all concurrent requests)
Transfer rate:          8599.47 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        9   11   2.5     10      41
Processing:    22   83  32.2     84     263
Waiting:       11   73  32.4     74     254
Total:         32   94  32.6     95     274

Percentage of the requests served within a certain time (ms)
  50%     95
  66%    117
  75%    119
  80%    121
  90%    129
  95%    142
  98%    163
  99%    168
 100%    274 (longest request)

Over 1000 requests per second, with a 96ms mean request time, and less than 1ms mean time across all concurrent requests. This setup also has the nice side effect of being able to serve cached content even if the application is completely down. Not too shabby!

It’s also worth noting that recently Salvatore Sanfilippo, author of redis, added experimental lua scripting to a side branch of redis. This seems to have gotten a lot of positive feedback from the community, and it should find its way into the master branch pretty soon. This basically adds the same embedded lua scripting support to redis, which means the same sorts of things implemented in nginx in this post could be achieved on the server side of redis without any special nginx modules.

Code for this post can be found here. If you have any questions, feel free to post in the comments.

If you enjoyed this post, feel free to upvote on hackernews.