Archive for January 2010

Hadoop for the lone analyst, part 2: patching and releasing to yourself

We left off, in part one of this series, at the point where we had Hadoop running with the Cloudera distribution, version 0.20.1+152. That’s Apache Hadoop release 0.20.1, plus 152 patches that Cloudera’s copious experience tells them they need for work in the real world. But perhaps we’re using, say, Hadoop streaming, and we read […]

Better Categories in Data Feeds

Category data is useful for building navigation structures on affiliate sites that consume data feeds. Sadly, there are few standards or established practices for filling in category fields. Here are a few pointers: 1. Don’t reinvent the wheel. There are examples of rich category hierarchies out there, be it Amazon/Zappos/PopShops/GoldenCAN that […]


INTRO: Here at StyleFeeder, we spend a lot of time figuring out what our users are doing and what they want. One of the tools we have brought to bear on these questions is Hadoop. Among the technical tools these days, Hadoop is like the prettiest girl in school, and it’s […]