Memcached vs MySQL

I recently had lunch with Dan Weinreb, whom I met at the Xconomy cloud computing event back in June.  We talked about many topics, mostly scalable database architectures, but also caching.  He mentioned that he'd been doing some work with memcached lately, which I found very interesting.  Now, memcached certainly has some nice features, but I mentioned to him that I've found its performance to be surprisingly lackluster.  People still rave about it, though, and use it in really big installations (e.g. Facebook).  Yes, we do use memcached in production at StyleFeeder, but it's not in widespread use here.  Instead, we rely on sharding our data across 100 MySQL databases.  This works really well for a number of reasons, not least of which is the fact that we cannot fit all of our data in memory cost-effectively.  We also have stringent performance requirements for our site, which means that we need very simple data access paths.  Most pages on our site can be loaded with a single database query.
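If you're curious what that looks like, here's a minimal sketch of key-based shard routing; the class, connection details and naming convention are hypothetical, not our actual code:

```java
// Minimal sketch of key-based shard routing across 100 MySQL databases.
// Class name, JDBC URL and credentials are hypothetical, for illustration only.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ShardRouter {
    private static final int SHARD_COUNT = 100;

    /** Map an id to one of the 100 shards. */
    public int shardFor(long id) {
        return (int) (id % SHARD_COUNT);
    }

    /** Open a connection to the shard that owns this id. */
    public Connection connectionFor(long id) throws SQLException {
        int shard = shardFor(id);
        // Hypothetical naming convention: one database per shard.
        String url = "jdbc:mysql://db-host/shard_" + shard;
        return DriverManager.getConnection(url, "user", "password");
    }
}
```

Because every lookup resolves to exactly one shard, a page can usually be served with that single query.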

Dan mentioned that someone he knows ran some basic benchmarks that clocked in at around 700 requests per second.  I wanted to see what our numbers were like.

(I'm not ready to hang my hat on these numbers yet, but I figured I'd share them for comments.)

100,000 get requests executed serially:

Memcached: 684 requests per second
MySQL: 884 requests per second

Surprising, eh?  This is for the same data coming out of one of our shards and out of memcached.
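For the curious, here's roughly the shape of a serial test like this (a sketch, not the actual harness; the Runnable stands in for whichever client is being measured):

```java
// Rough sketch of the serial benchmark: N gets issued from a single thread.
// The Runnable stands in for a memcached get or a MySQL SELECT by primary key.
public class SerialBenchmark {

    static double requestsPerSecond(int requests, Runnable fetch) {
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            fetch.run();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return requests / seconds;
    }
}
```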

I have more unanswered questions: instead of doing this serially, what happens when I have 20 concurrent threads pulling data out?  Does the memcached client library make a big difference?
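A sketch of what that concurrent test might look like (just a fixed-size thread pool splitting the same number of gets; again, the fetch call stands in for whichever client is being measured):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the same benchmark with N worker threads pulling data out at once.
public class ConcurrentBenchmark {

    static double requestsPerSecond(int threads, int totalRequests, Runnable fetch) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int perThread = totalRequests / threads;
        int actualRequests = perThread * threads;
        long start = System.nanoTime();
        List<Future<?>> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            workers.add(pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    fetch.run();
                }
            }));
        }
        for (Future<?> worker : workers) {
            worker.get(); // wait for every worker to finish
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        pool.shutdown();
        return actualRequests / seconds;
    }
}
```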

I also wonder in what cases it makes sense to use memcached.  If you're like us and have more data than you can reasonably hold in memory, you probably can't use memcached unless you're able to hit your main data store without a big penalty.  If your data does fit in memory, you should use something like Whirlycache (only relevant if you're using a JVM), which did 2,500 requests per millisecond on the same test.
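That's not Whirlycache's actual API, but the reason an in-process cache is so much faster is easy to see in a bare-bones sketch: a get is just a map lookup inside the same JVM, with no network round trip and no serialization.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Bare-bones in-process cache (illustrative only, not Whirlycache's API):
// a get is a local map lookup, so there's no network hop or serialization cost.
public class InProcessCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<K, V>();

    public void put(K key, V value) {
        map.put(key, value);
    }

    public V get(K key) {
        return map.get(key);
    }
}
```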

If you simply need to share data across a wide range of nodes, does memcached even make sense at that point?  Perhaps in the case of a more dynamic architecture, memcached and libketama are pretty key.  Rigging that machinery manually with a MySQL backend is possible, but not the kind of thing you’d want to focus on unless you’re doing systems work.
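For the unfamiliar, the trick libketama brings is consistent hashing, which keeps most keys on the same server when nodes come and go.  A minimal ring in that spirit (my own sketch, not libketama's actual implementation) might look like:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring in the spirit of libketama (illustrative sketch only).
public class HashRing {
    private static final int VIRTUAL_NODES = 160; // replicas per server smooth the distribution
    private final SortedMap<Long, String> ring = new TreeMap<Long, String>();

    public HashRing(List<String> servers) {
        for (String server : servers) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    /** Pick the first server clockwise from the key's position on the ring. */
    public String serverFor(String key) {
        long h = hash(key);
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xff); // fold the first 8 bytes into a long
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Add or remove a server and only the keys on the affected arc of the ring move; with a naive hash(key) % N scheme, nearly every key would land on a different server.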

I'm curious to hear what people think, because there's certainly a lot of conventional wisdom behind memcached that I can't square with these numbers right now.  François seems to be in the same camp as me.

6 Comments

  1. […] put some thoughts and scrappy data about memcached over on the StyleFeeder tech […]

  2. Yoav Shapira says:

    Interesting post with interesting numbers. Thank you for sharing.

    I think the type of situation where memcached is best is exactly what it was designed for: when the database design is not clean, and complex queries (or sets of queries) are required as part of page load. In that case, with data that can be somewhat stale, you can off-load the servers a bit by caching the final results of those queries in memcached.

    In defense of memcached, it was designed before massive sharding across many commodity servers became available to the masses ;)

  3. Thanks for the link back. I can shed some light (I think) on a couple of questions.

    Last time I looked at the memcached code, it was clearly single-threaded, relying on libevent to deliver kick-ass polling performance. This is all well and good, but there is still only a single thread to handle the requests (quick sidebar: never, ever cross the beams and cause memcached to swap). So I would suspect that you would get a performance boost on your way up to running 20 concurrent threads pulling data out, but that it will cap at some point and stay there.

    MySQL on the other hand is multi-threaded (duh!) and so will handle running 20 concurrent threads pulling data out in a more graceful fashion.

    At its most basic, if the data is already in memory on the machine, pulling it out using memcached or MySQL should not make one whit of difference. Where things will diverge (broadly) is with disk access (a well-tuned RDBMS should be CPU bound anyway) and with concurrency.

    I also ran some psycho tests on memcached such as putting lots and lots of small 8-16 byte objects in there and found that I could only use about 10% of the memory I allocated to memcached before stuff started being aged out, something to do with slab allocation.

  4. Just a quick comment on Yoav's comment. Because of the variability in design he talks about, getting reliable and definitive stats on whether memcached is good or bad will be impossible (or at least the subject of a good debate); it just depends on your architecture.

  5. […] Development — François Schiettecatte @ 1:41 pm Some interesting observations about Memcached Vs. MySQL over on the StyleFeeder Tech […]

  6. […] several hundred requests per second. That's the takeaway from this post, where we learn that MySQL can handle close to 900 requests per second, versus 700 for […]