139 RR Riak with Sean Cribbs and Bryce Kerley

by Charles Max Wood on January 15, 2014

Get your Ruby Rogues T-Shirt or hoodie!! Ladies’ sizes available as well!

Panel

Discussion

01:32 – Job Replacement Guide by David Brady

03:28 – Sean Cribbs Introduction

04:31 – Bryce Kerley Introduction

04:45 – Riak and Advantages

08:51 – The CAP Theorem

10:27 – What is Riak?

14:07 – Introducing Riak 2.0: Data Types, Strong Consistency, Full-Text Search, and Much More

16:05 – Autocomplete

27:50 – Scaling

30:02 – Guidelines for Designing Code

37:39 – HTTP 2.0 Support

41:40 – MapReduce

46:24 – Full-Text Search

49:50 – Primary Data Store

52:27 – Programming Riak

Picks

Book Club

Ruby Under a Microscope by Pat Shaughnessy! We will be interviewing Pat on February 27, 2014. The episode will air on March 6th, 2014. No Starch was kind enough to provide a coupon code that listeners can use to get a discount on Ruby Under a Microscope. Use the coupon code ROGUE for 40% off! (Coupon expires April 1, 2014.)

Next Week

Heroku with Richard Schneeman

Transcript

DAVID:  If Liz was up and dressed, I’d send her to the mailbox and then I could squee in the middle of the show, “My shirt came! My shirt came!”

[Laughter]

[Hosting and bandwidth provided by the Blue Box Group. Check them out at BlueBox.net.] 

[This podcast is sponsored by New Relic. To track and optimize your application performance, go to RubyRogues.com/NewRelic.]

[This episode is sponsored by Code Climate. Code Climate automated code reviews ensure that your projects stay on track. Fix and find quality and security issues in your Ruby code sooner. Try it free at RubyRogues.com/CodeClimate.]

[This episode is sponsored by SendGrid, the leader in transactional email and email deliverability. SendGrid helps eliminate the cost and complexity of owning and maintaining your own email infrastructure by handling ISP monitoring, DKIM, SPF, feedback loops, whitelabeling, link customization and more. If you’d rather focus on your business than on scaling your email infrastructure, then visit www.SendGrid.com.]

CHUCK:  Hey everybody and welcome to episode 139 of the Ruby Rogues podcast. This week on our panel, we have Avdi Grimm.

AVDI:  Hello.

CHUCK:  James Edward Gray.

JAMES:  I’m Batman.

CHUCK:  David Brady.

DAVID:  Wait, I thought I was Batman.

CHUCK:  I’m Charles Max Wood from DevChat.TV. We also have two special guests this week, Sean Cribbs.

SEAN:  Howdy.

CHUCK:  And Bryce Kerley.

BRYCE:  Good afternoon.

CHUCK:  Before we get started, I’m hearing that Dave has an announcement he’d like to make. So, I’m going to let him go and then we’ll have you guys introduce yourselves.

DAVID:  Thanks, Chuck. So, we always do introductions and then Chuck tells us what course he’s working on. And I wanted to do that today. I’m putting together an emergency job replacement guide. Basically, if you are looking for work and want to know all the dirty tricks that I have learned — not the dirty tricks but the really clever tricks that I’ve learned over the years to get that next job fast, how to get your resume not to the top of the pile but make your resume destroy the pile so that it is the pile, I’ve got exactly the product that you are looking for.

And you can watch me write this eBook and sign up to get an early copy of it at JobReplacementGuide.com. So, come over there and sign up and let me know how interested you are. [Chuckles] And I hope you don’t need it right now. But if you do, come let me know and I’ll get cracking and get that out to you.

JAMES:  I’m holding out for the sequel, Brain Replacement Guide.

DAVID:  Yeah.

[Laughter]

DAVID:  That sounds like the plot to a really bad piece of downloadable content for [inaudible].

JAMES:  Yeah. [Laughs] Yup. That’s awesome though. Cool, sounds cool. I like the book. Chuck, you forgot to let our guests introduce themselves in your excitement to get to Dave.

DAVID:  Oh, yeah.

CHUCK:  Yeah, I said that I would let Dave talk and then I would let them introduce themselves. So Sean, why don’t you introduce yourself?

DAVID:  This was important because Bryce is probably looking for a job.

JAMES:  [Chuckles]

DAVID:  So, [inaudible] for me. I’ll stop, sorry.

CHUCK:  He will be after he’s on this show.

[Laughter]

BRYCE:  I was already forecasting having to talk about a career-limiting move, but…

[Laughter]

CHUCK:  Alright. Sean, why don’t you introduce yourself?

SEAN:  Well, yeah, what should I say? I’ve been doing Ruby since about 2006. So it’s pretty exciting to finally be on this podcast with all you fun guys. And you may have known me from some of my previous regrettable exploits including Radiant CMS and the library Ripple, which I started four years ago. So I’ve done a number of Rails projects too that were fun. But nowadays, I’m mostly doing Erlang and Python and JavaScript. So I sort of have a bit of a longer view [chuckles], or Ruby’s a little bit in my past, although I still do review Bryce’s pull requests, so I’ll be fine.

DAVID:  We still love you.

[Laughter]

SEAN:  Well, I still love Ruby.

DAVID:  Come back. Come back.

[Laughter]

BRYCE:  I’m trying to make him not love Ruby with my pull requests, but we’ll see how that goes.

[Laughter]

SEAN:  You got to try harder, Bryce. [Laughs]

CHUCK:  He went to Erlang because Ruby doesn’t scale, right?

JAMES:  [Chuckles] That’s right.

CHUCK:  Alright Bryce, how about you?

BRYCE:  So, I’ve been doing Ruby since about 2005 and I’ve been working at Basho for about two and a half years. For the last several months, I’ve been working on the Riak Ruby client gem.

JAMES:  Wow, very cool.

CHUCK:  So, Riak is just a fancy version of MongoDB, right?

JAMES:  [Laughs] Wow.

BRYCE:  Well, except non-crashy.

[Laughter]

BRYCE:  Well, I hate to break it to you, they all suffer crashes. It’s just what does it do when it crashes, right?

JAMES:  Yeah right.

DAVID:  Yeah, that’s true. Mongo eats your data and Riak maybe, I don’t know. Sorry.

[Laughter]

DAVID:  Let’s just pull back from the ‘everybody hates Mongo’ thread. Let’s do a show about ‘we love Riak’. Let’s keep it positive, keep it light.

[Laughter]

SEAN:  We could talk about compare and contrast. But if we wanted to go from first principles, Riak is designed to be distributed from the beginning, versus adding a sharding or redistribution layer on top. So, the flipside of that is it tends to be hard to run Riak on a single server because you get three copies of all your data. [Laughs] So, we usually recommend people go minimum five machines. But basically what that means is the reason you’re picking Riak is you know you have a big dataset or you know you need resiliency or you know that you’re going to need to, in the case of our product, our enterprise version, you need to replicate across multiple data centers for locality’s sake or for disaster recovery. So, those are the reasons you pick Riak.

I don’t have the pitch for Mongo, so, I’m sorry I can’t give that in contrast. But generally Mongo seems to be an easier switch if you’re already using a relational database. But the reasons you pick it are totally different than the reasons you pick Riak.

JAMES:  So, let’s talk about that a little. Can you give us just the 10,000-foot view what’s the problem domain where Riak just cleans up?

SEAN:  Yeah. So, Riak does really well in big, high-volume applications. So, you’re doing a lot, especially lots of writes. Relational databases are really good at making reads fast but if you were just pumping a ton of data into your system, they can be problematic under certain circumstances. So, what Riak gives you is the ability to add more machines to get more capacity. But you’re also not increasing latency as you add machines. You’re actually probably keeping it constant or decreasing it.

So, a lot of people will run benchmarks, usually micro-benchmarks, and say, “Oh I get X number of queries per second or whatever,” but a typically overlooked thing in those benchmarks is the latency distribution of those requests. Not just how many can you pump through but how long does each one take and what is the shape of the curve of those latencies? And what we find with a lot of the big applications is it’s not just that they’ve got a ton of data coming through. It’s also that they need every single user to have a good experience. And if they get a bad experience for one user that may mean a lost sale or somebody gets really mad and writes a bad review, that sort of thing. So, it’s really for those high-volume applications that also need low latency and high availability.
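To make the point about latency distributions concrete, here is a minimal sketch that summarizes a set of request timings by percentile rather than by average alone. The sample numbers and the `latency_summary` helper are invented for illustration and do not come from any Riak benchmark.

```python
import statistics


def latency_summary(latencies_ms):
    """Summarize per-request latencies (in milliseconds)."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceiling
        return ordered[rank - 1]

    return {
        "mean": statistics.mean(ordered),
        "p50": percentile(50),
        "p99": percentile(99),
        "max": ordered[-1],
    }


# Hypothetical sample: 98 fast requests and 2 very slow ones.
samples = [10] * 98 + [900, 1200]
summary = latency_summary(samples)
print(summary)
```

The median looks healthy here while the tail does not, which is the shape that a plain queries-per-second figure hides.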

So, the origin of Riak really comes from Amazon’s Dynamo paper. We’ve since diverged from that of course. But their problem was shopping carts. And a lot of people I know by now, since the NoSQL movement really started in 2009, a lot of people have read the Dynamo paper already. But the brief overview of it is they needed a system that made their shopping carts highly available to every single customer with a consistently low latency. And they were willing to sacrifice some consistency in order to get that. And so, that’s where Riak comes from, at least philosophically.

JAMES:  So, since you’re talking about it a lot, I guess what we’re discussing here is the CAP theorem, which says that of the three things consistency, availability and tolerance, is that the other one?

SEAN:  Partition tolerance, yeah.

JAMES:  Partition tolerance. We can only have two, right? Isn’t that the way it works?

SEAN:  [Laughs] Well, I think that only have two is a bit misleading. What it says more is that it’s impossible to have all of them. It is possible to have one of them, not the other two.

[Laughter]

SEAN:  But more generally the thing is, and there’s this great post on this by Coda Hale, who should be pretty well-known among Rubyists for his work early on in the Rails community as well, and it’s called You Can’t Sacrifice Partition Tolerance. And essentially what it says is that you can have a distributed system, because we’re all building distributed systems nowadays whether we like it or not. But when you have a network failure, which happens very frequently, a lot more frequently than people like to admit, you have to choose.

So, it’s not about pick two, it’s when you have a partition, are you preferring to be consistent which means that you might have to reject requests, or are you preferring to be available which means that you might have inconsistency when the partition heals? So, the typical choice there is one or the other, or in this circumstance this, or in that circumstance that.

JAMES:  That’s interesting.

CHUCK:  So, I want to just back up a little bit. I know that some people are familiar with Riak, but many people aren’t. So, I just want to talk about the basic ideas behind Riak in the sense that it’s a key-value store. So, you can store JSON or whatever in there, right? It’s not a document database and it’s not a columnar database. It’s just a key-value store that scales?

SEAN:  Yup.

JAMES:  [Chuckles] That scales. I love it.

[Laughter]

DAVID:  And it’s got a picture of a ring.

SEAN:  Yes.

DAVID:  Which is actually really important.

SEAN:  We love that picture of it too much, actually. [Laughs]

DAVID:  Yeah, actually that picture of the ring is actually really important. I’m actually bringing it up as a soft pitch to you guys to talk about the ring and an answer to Chuck’s question of what is Riak? The ring is actually the Dynamo pattern, is that correct?

SEAN:  Yeah. So, everybody knows, most people probably who listen to this podcast use Git, right? So, you know those SHAs that you get every time you make a commit? That’s like a hash of the commit you make. So, what Riak does is it takes that basically same SHA hash and says for every key we’re going to store, for the value for that key, the hash of that key says this is the logical location where that key is stored. And if you think of that as a 160-bit integer that tops out at 2^160 – 1 and then wraps around at 0, then you have a ring. So, that gives you a way to determine, if you split that ring up into fixed-size partitions like we do, where to put that key when you store it. I’ll stop for questions and then we can go on. I don’t want to steal the whole time here.
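As a rough illustration of the ring Sean describes, the sketch below hashes a key with SHA-1, treats the digest as a 160-bit integer, and finds which fixed-size slice of the ring it falls into. The function names and the partition count of 64 are assumptions made for this example. Real Riak hashes more than the bare key string (the bucket is involved too), so this is not its exact scheme.

```python
import hashlib

# SHA-1 digests are 160 bits, so ring positions run from 0 to 2**160 - 1
# and conceptually wrap back around to 0.
RING_SIZE = 2 ** 160


def ring_position(key: bytes) -> int:
    """Interpret the SHA-1 digest of the key as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big")


def partition_index(key: bytes, num_partitions: int = 64) -> int:
    """Return the index of the fixed-size partition that covers the key."""
    width = RING_SIZE // num_partitions  # every partition spans the same range
    return ring_position(key) // width


position = ring_position(b"user:42")
index = partition_index(b"user:42")
print(hex(position), index)
```

Because the hash depends only on the key, any machine can compute the same location without asking a central coordinator.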

JAMES:  Gotcha.

DAVID:  In the database?

JAMES:  [Chuckles] In the database.

SEAN:  Yeah, yeah.

[Laughter]

SEAN:  But we’re not talking about just one file or directory on disk. We’re talking about multiple machines in your data center or AWS that are storing your data. So, it’s like which machines do these data go on? So, what Riak will do is it will split this, what we call the consistent hash ring or consistent hashing space, up into fixed-size buckets. We call them partitions. And then you hash the key and that points to a range which is owned by one of those partitions. And then you pick the next two, let’s say if you’re using the default replication factor of three, you pick the next two around the ring and that says where the other two replicas go.

Now, you can take those partitions and map them to individual machines in the cluster. And then that tells you, “Okay, when I write this thing, I know that the hash of the key is this, so it corresponds to these three partitions. Look up which machines those partitions are mapped to and then you can send those writes out to those machines, or reads for that matter.”
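The lookup described above can be sketched as follows: hash the key, find its partition, take the next two partitions around the ring, and look up which machine owns each. The node names, the eight-partition ring, and the round-robin ownership table are all assumptions for illustration. Riak's real ownership assignment is more involved than this.

```python
import hashlib

NUM_PARTITIONS = 8   # hypothetical, kept small so the output is readable
N_VAL = 3            # replication factor of three, as in the discussion
NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical machines

# Assumed ownership table: partitions dealt out to nodes round-robin.
OWNERS = [NODES[i % len(NODES)] for i in range(NUM_PARTITIONS)]


def primary_partition(key: bytes) -> int:
    """Partition whose slice of the 160-bit ring contains the key's hash."""
    position = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return position // (2 ** 160 // NUM_PARTITIONS)


def preference_list(key: bytes, n_val: int = N_VAL):
    """The key's partition plus the next n_val - 1 around the ring."""
    first = primary_partition(key)
    partitions = [(first + step) % NUM_PARTITIONS for step in range(n_val)]
    return [(p, OWNERS[p]) for p in partitions]


replicas = preference_list(b"cart:1234")
print(replicas)
```

With round-robin ownership and at least three nodes, consecutive partitions land on different machines, so the three replicas do not share a failure domain in this toy setup.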

JAMES:  Gotcha. So, that’s how you get the big fault tolerance, because the data is replicated several times across the partitions?

SEAN:  Right. And you can choose reduced or increased availability based on how many of these do I want to wait for? So, you can say, “I want to wait to make sure that it’s written to at least two before I say yeah, this write succeeded,” because distributed systems, they fail all the time. You have to make a tradeoff of how much assurance do I need that this was written or I’m just going to fire and forget. So, those are the sorts of things that you can decide at an application level that also have effects on the consistency of your data.
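A minimal sketch of that tradeoff: send the write to every replica, then report success only if at least `w` of them acknowledged it. The `Replica` class and `quorum_write` function are hypothetical stand-ins written for this example, not the Riak client API.

```python
class Replica:
    """In-memory stand-in for one machine holding a copy of the data."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, key, value):
        if not self.up:
            return False  # an unreachable replica never acknowledges
        self.data[key] = value
        return True


def quorum_write(replicas, key, value, w):
    """Attempt the write everywhere; succeed if at least w replicas ack."""
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= w


cluster = [Replica("a"), Replica("b"), Replica("c", up=False)]
ok_with_two = quorum_write(cluster, "cart:1", ["book"], w=2)
ok_with_three = quorum_write(cluster, "cart:1", ["book"], w=3)
print(ok_with_two, ok_with_three)
```

Note that the failed `w=3` attempt still left the value on two replicas. In this sketch nothing is rolled back, which is one reason the choice of `w` affects consistency as well as availability.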

JAMES:  So, we talked about how Riak is at its heart a key-value store. But it’s grown beyond that too, right? Don’t links provide an almost graph database like feature? I don’t want to say it is. And then I’ve been looking at the 2.0 release and where you’re going with that and you’re getting some interesting new data types there too. Want to talk about that?

BRYCE:  So, I can talk a lot about the Riak 2.0 data types. I’ve actually been working on the support for those in the Ruby client for the last several months because they’re a complicated and interesting feature that I want to represent correctly and completely, I guess, in Ruby. So, the high-level view, and I gave a talk about this at the Scottish Ruby Conference fringe a couple of years ago, but the high-level view is that the Riak data types are built on convergent replicated data types or CRDTs. And what these provide are sets, counters and some other data structures that you can write to in multiple places, see something consistent from every participant in the party, and then merge them independently at different data centers, say, and still get an accurate representation of what the structure should be based on what you can see.

And there’s a lot of hand-waving in there because this comes down to the light cones between data centers or different users interacting with the system and what actually is correct, and if correctness is actually a thing you give up doing distributed systems. Yeah, so the short version of that is yes, counters, sets and maps containing the above, other maps, Booleans and strings, they’re coming soon in Riak 2.
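To give a feel for how a convergent counter can be written in several places and merged later, here is a sketch of a grow-only counter, one of the simplest CRDTs. The class and actor names are invented for this example and say nothing about how Riak 2.0 implements its data types internally.

```python
class GCounter:
    """Grow-only counter: each actor tracks only its own increments."""

    def __init__(self, actor):
        self.actor = actor
        self.counts = {}  # actor name -> total increments seen from it

    def increment(self, amount=1):
        self.counts[self.actor] = self.counts.get(self.actor, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the per-actor maximum makes merging safe to repeat
        # and independent of the order in which replicas are merged.
        for actor, count in other.counts.items():
            self.counts[actor] = max(self.counts.get(actor, 0), count)


# Two hypothetical data centers accept writes independently.
dc1, dc2 = GCounter("dc1"), GCounter("dc2")
dc1.increment(3)
dc2.increment(2)

# Later they exchange state, in either direction, possibly more than once.
dc1.merge(dc2)
dc2.merge(dc1)
dc2.merge(dc1)
print(dc1.value(), dc2.value())
```

Both sides end up at the same total without coordinating at write time, which is the property the discussion refers to as convergence.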

DAVID:  You said something that really interests me, and let me know if this is derailing the topic or if it’s a great logical place to go next. But you just said if correctness is something you’re willing to give up. And hopefully every formally trained computer scientist just had an involuntary full body shudder when you said that, because there’s an old joke in computer science. If my program doesn’t actually have to be correct, then I can write it in 0 bytes and it will take up 0 time.

JAMES:  I write that program a lot.

[Laughter]

DAVID:  Yeah, exactly. Can you talk a little bit about what a system would look like that does not depend on the data being correct? I’m assuming it’s fault-tolerant of data consistency. But I can’t even get my head around what a system would look like that had three different copies of my customer’s credit card number.

JAMES:  [Laughs] Sure you can. You’re just not thinking about it the right way.

DAVID:  Well, I can think about it, but I just automatically remember all the beatings I got in school.

JAMES:  [Laughs] So, just before I did this call, I was working on a feature in an app that’s autocomplete. And so, somebody starts typing something and then we give them the possible choices matching out of our data in our database and we’re using Elasticsearch for something like that. And then Elasticsearch has this feature where you can have it learn, help it learn by what they choose. So then, you record what they choose in some way. And so, depending on the amount of traffic there is, you basically need to make a checkmark in the database or something of that kind every time they make a choice.

So, they typed these three characters and then they chose this. So, these three characters to this, check. And then when you’re ranking things, you can basically add up all the checkmarks and see which one people typically choose and it makes your autocomplete better. So, if you have a really simple system and don’t have a lot of traffic or whatever, you might just store those as a record in the database and then you can do a count, grouping by the type of item or the query or whatever and get the count. But if you have a ton of traffic, then storing that in a relational database is going to suck. So, this is a great example of data that’s nice to have because it will make the autocomplete better, but does it have to be 100% correct? No, not really.
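The checkmark idea can be sketched with an in-memory tally: record one mark per (typed text, chosen item) pair, then rank suggestions by how often each was chosen. The function names and sample terms are invented for illustration, and a real system would keep the tally in an external store rather than a Python object.

```python
from collections import Counter

# One "checkmark" per (typed prefix, chosen suggestion) pair.
selections = Counter()


def record_selection(typed, chosen):
    selections[(typed, chosen)] += 1


def ranked_choices(typed):
    """Suggestions previously chosen for this prefix, most popular first."""
    matches = [
        (chosen, count)
        for (prefix, chosen), count in selections.items()
        if prefix == typed
    ]
    # Sort by descending count, then alphabetically to break ties.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [chosen for chosen, _ in matches]


record_selection("rub", "ruby")
record_selection("rub", "rubber")
record_selection("rub", "ruby")
record_selection("ria", "riak")
print(ranked_choices("rub"))
```

Losing or miscounting a few marks would only nudge the ranking, which is why this kind of data tolerates being slightly out of date.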

DAVID:  Okay.

JAMES:  Most people throw it in a data store for a period of time and then they reindex once a day or something like that to make the autocomplete smarter. And if you do have a big enough volume like we do in our case where it’s not a good idea to shove this in a relational database, then you put it somewhere else like Redis maybe is a good choice because it’s pretty quick or whatever. But then if your Redis instance goes down, then you’re going to lose all those checkmarks. But that’s not a super big deal. It just means your autocomplete won’t be as smart as it was yesterday and it will build back up.

DAVID:  Right.

AVDI:  I want to zero in on something you just said and clarify a little bit. So, when you say it’s not a good idea to store that in a relational database, do you mean because of the write volume is going to hurt other writes? Do you mean because of just the space that it’s going to take up? What specifically do you mean by that?

JAMES:  Look Avdi, I was trying to do a lot of hand-waving there. Don’t make me [inaudible].

[Laughter]

AVDI:  So, let me specify the background that that comes from. After years of doing various projects that involved integrating different systems, my stress level goes up anytime somebody talks about adding another system to integrate with.

JAMES:  Okay. So, that’s a great point. I just mentioned Elasticsearch and Redis in the same conversation.

[Laughter]

SEAN:  And Avdi, I think that’s a great point because part of the motivation, beyond just solving our customers’ pain and our open source users’ pain, for adding these data structures that Bryce mentioned is that that means there are fewer systems. If they choose Riak, there are fewer other systems that they have to add in. And I think that back to what they mean and what they’re about, I don’t think that Bryce was suggesting that they would be incorrect. I think what he was trying to say is that all of the operations you send to mutate these data structures are commutative. So, what that means is anybody can apply them in any order. And then when they all come back together at the end and everybody’s happy and consistent, then there’s actually a representation of the data structure that makes sense that reflects all of the operations.
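A small sketch of what commutative means here: apply the same set of operations in every possible order and see how many distinct results come out. The operation names are made up for this example. Additions commute, while a plain overwrite does not.

```python
from itertools import permutations


def apply_ops(ops, start=0):
    """Apply (kind, amount) operations to a number in the given order."""
    value = start
    for kind, amount in ops:
        if kind == "add":
            value += amount   # order-independent
        elif kind == "set":
            value = amount    # overwrites whatever came before
    return value


def outcomes(ops):
    """Every distinct final value over all orderings of the operations."""
    return {apply_ops(order) for order in permutations(ops)}


commutative_ops = [("add", 5), ("add", -2), ("add", 10)]
mixed_ops = [("add", 5), ("set", 0), ("add", 10)]

print(outcomes(commutative_ops))
print(sorted(outcomes(mixed_ops)))
```

Every ordering of the additions gives the same answer, so replicas that receive them in different orders still agree. Once an overwrite is in the mix the result depends on arrival order, which is the situation these data types are designed to avoid.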

DAVID:  Okay.

AVDI:  Right. Well, I was actually responding to James. I was wondering if he could just expand a tiny bit on what he meant when he said that it’s not a good idea to store that data in a relational database.

JAMES:  Yes.

DAVID:  Actually, I wanted to add an extra point to that. Do you mean the relational database or do you mean the operations database?

JAMES:  Yeah, yeah. It’s all good questions. And I was trying to give an example without too much detail. But just to clarify, I meant it was not a good idea to store those autocomplete conversions in our primary relational database because in this particular case, we do a substantial amount of volume and there are two autocomplete fields on every form. So, they’re coming in very fast and basically they’re doubled because there are two of them.

AVDI:  Okay. So, it’s the write load?

JAMES:  Yeah, it’s the write load, that we would be pumping in so many entries and that would be detracting from our ability to do other database operations. And the database is heavily used.

AVDI:  Right, because you can’t say, “Oh prioritize these writes over those writes.”

DAVID:  Right.

JAMES:  Right. And to be fair, you also pinned me on, “Oh but you’re adding a bunch of services to get around it,” which is true. But we already use Redis for several other things so Redis is already there and available. And one of the fun parts of doing Redis in this particular problem, and I think we should get into this more with Riak, is all of these NoSQL databases have their sweet spot and this is one of the cases where I think Redis is neat. Because the default “persistency mechanism” in Redis is none normally and then periodically just grab the whole thing and shove it in a file [inaudible] so that if you had to you could roll back to that point. With something like autocompletion conversion data, that’s almost ideal. We don’t need to be consistent to the very last entry. Who cares? But being able to restore to some point would be great.
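The persistence model James describes (keep everything in memory, periodically write the whole thing to a file) can be sketched like this. The `SnapshotStore` class and the JSON file are assumptions for illustration. This is not Redis's actual on-disk format or API.

```python
import json
import os
import tempfile


class SnapshotStore:
    """In-memory counters with explicit whole-dataset snapshots to a file."""

    def __init__(self, path):
        self.path = path
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1

    def snapshot(self):
        # Write to a temporary file first, then swap it into place, so a
        # crash mid-write never leaves a half-written snapshot behind.
        temporary = self.path + ".tmp"
        with open(temporary, "w") as handle:
            json.dump(self.data, handle)
        os.replace(temporary, self.path)

    @classmethod
    def restore(cls, path):
        store = cls(path)
        if os.path.exists(path):
            with open(path) as handle:
                store.data = json.load(handle)
        return store


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "dump.json")
    store = SnapshotStore(path)
    store.incr("rub->ruby")
    store.incr("rub->ruby")
    store.snapshot()          # the periodic save point
    store.incr("rub->ruby")   # arrives after the snapshot
    recovered = SnapshotStore.restore(path)  # simulate restart after a crash
    recovered_count = recovered.data["rub->ruby"]

print(recovered_count)
```

The write that arrived after the last snapshot is gone after the restart, which is acceptable for conversion tallies and unacceptable for something like an order.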

DAVID:  I think it’s interesting that I’ve known for a couple of years now that you should not do reporting from your operations database because you have to join so many tables together and it just kills performance. You really need to offload that to another service, ideally to a vertical database or column store or something like that or to a data warehouse where you can actually do proper reporting. And it feels like there’s another thing that you need to not be, it’s a common theme in NoSQL, there’s another thing that you need to not be doing in your operations database and that’s logging. This is like a special case of logging, right?

JAMES:  Yeah.

AVDI:  Almost.

SEAN:  What do you mean by logging?

DAVID:  Just streaming a whole bunch of micro-events that really aren’t of huge importance to the customer perhaps or to the business but are of great importance to development or to people planning out the next thing that they want to give to the customer. And so usually, it’s characterized by a very high write load and it’s stuff you just want to go, I want to shove 20,000 of these events into the database every second and I’d really like the shopping cart to not stop working while I’m doing it.

JAMES:  [Chuckles] I think you’re mostly referring to metrics. But yeah.

DAVID:  Yeah. Metrics is a really good example of that. If you’ve got a JavaScript dingbat that’s gathering up mouse move heat maps, you really don’t want to be locking tables to write next to the shopping cart in the database under the same thing. And I feel like I’m reducing Riak to a toy logger and that’s not my intent.

[Chuckles]

DAVID:  [Sighs] But that’s what I’m doing because it is just a toy logger and, I’m kidding. Anyway, my point is that yeah, if you’ve got a very high write load, and I want to call back to James, thank you for basically answering the key question which was you could actually have some data that is eventually consistent but before it’s eventually consistent, it’s good enough. The autocomplete data, if one shard read, I can’t remember what Riak calls them, but if one of them returns back, “Oh yeah this is 15 times,” and another one says, “No this is 17 times,” that’s okay. That’s good enough.

JAMES:  Avdi brought up a good point I think I’d like to throw to the Riak team here. There is this overhead anytime you say, “Oh we’re going to introduce autocomplete so we’re going to need Elasticsearch and Redis,” and then you got to think, “Okay, so the sysadmins, they’re going to have to do some installs and we may need to provision some new boxes or whatever.” Riak seems to be on a higher end of that. If by default the recommendation is something like five machines or something like that, do you think that puts it at a slightly higher barrier to entry when people decide to go to it? I’m curious.

SEAN:  Yeah, I can’t count how many times we have had to answer the why is it so slow on one machine question on our mailing list.

[Laughter]

SEAN:  And we have to say sorry, you’re storing three copies of your data every time. And it’s been a hard thing, I think, because when you’re trying to sell a product, it’s hard to be honest about the shortcomings. We’ve tried really hard to be honest about Riak and say you’re not going to need it if you have a single machine running your web application and your database and your web server. You’re not going to need Riak. So, what we tend to get is people who have already crossed into the multiple machine territory and some form of people who are just hobbyists or just looking at Riak for fun. So, big enterprises use Riak, companies with large volume applications, and there are a lot of systems similar to Riak. Voldemort, which came out of LinkedIn, and Cassandra are in some ways pretty similar to Riak, so the use cases for those often overlap.

CHUCK:  I want to talk a little bit about the scaling, because I’ve used Voldemort and Cassandra. And the scaling is painful for those. I was just wondering, is Riak any easier to get the scaling up on? I hear this about most databases. “We need it on a couple of machines. It’s hard.”

SEAN:  Well, I’ve heard, depending on the version, I’ve heard sometimes in Cassandra you have to double the cluster size when you grow it. And I don’t know why that is. There’s some technical reason I don’t understand. But Riak, you tend to grow it by multiple machines at a time, but you can grow it by one machine at a time if you have to. The challenge there is every time you add a machine, Riak has to pick where each of those partitions in the ring we talked about before, where they go now.

And so you see, if you add a machine one at a time you’re going to tend to get more of that data shuffling around. And that data shuffling around takes time too. It takes your network bandwidth. It takes disk I/O. So, adding machines is never free. What you get in the end is that you have more capacity. But our awesome support engineers have spent a decent amount of time helping customers grow their clusters.
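To see why adding a machine is never free, the sketch below hands a new node its fair share of partitions by taking them from the most loaded existing owners, and counts how many partitions have to move. The rebalancing rule here is a deliberately naive assumption for illustration. Riak's real algorithm also has to keep replicas on distinct machines and is not reproduced here.

```python
from collections import Counter


def add_node(owners, new_node):
    """Return (new ownership list, number of partitions that moved)."""
    owners = list(owners)
    node_count = len(set(owners) | {new_node})
    target = len(owners) // node_count  # fair share for the new node
    load = Counter(owners)
    moved = 0
    while load[new_node] < target:
        # Take one partition from whichever existing node owns the most.
        donor = max(
            (node for node in load if node != new_node),
            key=lambda node: (load[node], node),
        )
        owners[owners.index(donor)] = new_node
        load[donor] -= 1
        load[new_node] += 1
        moved += 1
    return owners, moved


# Hypothetical cluster: 64 partitions spread round-robin over three nodes.
before = [["node-a", "node-b", "node-c"][i % 3] for i in range(64)]
after, moved = add_node(before, "node-d")
print(moved, Counter(after))
```

In this toy model a quarter of the partitions change owner when the fourth node joins, and each of those moves means copying data over the network and off disk.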

CHUCK:  So, would it be cheaper then, to just add more disk space to your existing servers?

SEAN:  Yeah. And people do that too. It usually means taking it offline to do the disk copy type thing. But those don’t necessarily solve the same problem. So, if it’s a matter of how much can I pump through one machine, if that’s the problem they’re trying to solve or how much space do I have on this machine, then yeah take down, add more disk, bring it back up is the answer. But one of our customers, the one with the largest cluster, has done both. They have both upgraded the hardware on the machines and added more machines as time goes on and they needed to serve more traffic and hold more data.

AVDI:  I want to talk a little bit about this from the coding side. I’m curious. Since this is pretty clearly something that you add in when you realize you need the capacity, it’s not something that you’re necessarily going to know from the get go, that you need this level of capacity. And so, you’re going to hit a point where you’re writing information like those autocomplete accepts and you suddenly realize that your main store doesn’t cut it anymore. Are there some sort of broad guidelines for designing your code that make it easier to bring this kind of second store in that you found?

JAMES:  That’s a good question.

BRYCE:  So, in my experience with Rails apps and Rails-like apps, the MVC structure that everybody tends to gravitate to feels like it works really well for this. And because it just provides a nice integration point for something. And especially if you’re using lots of just the plain Ruby objects to manage how your other models are interacting with each other. So, let’s talk about I’m logging into a website. Am I loading a user model and checking the password or am I creating a login instance?

Now if you want to start keeping these logins in a more structured format where I want to be able to see a list of my existing sessions, kill these sessions, and I want these sessions to live in Riak, that existing model class, the login model class, feels like a really good spot to integrate Riak to track all these different sessions and to destroy them if I think that somebody’s got my password and then snuck in or if I left myself logged in on a public computer or something like that.

AVDI:  What if you’re already putting some user session? I’m thinking about the case where in a typical Rails app you have a lot of models that are very tightly tied to Active Record which means they’re tied to your primary relational store.

SEAN:  Yeah, I think that the biggest conceptual shift that people encounter is that you’re not focusing on getting everything normalized and as pure and deduplicated as possible. You tend to optimize for “Here’s the thing that I have to display on this screen. So, how can I reduce this to the fewest number of requests to Riak or to whatever key value store you’re using, just to display this page?” So often what we find is that there will be data duplication, things will be copied, you will tend to put larger blobs of information rather than having smaller records with lots of foreign key references. So, there’s a big conceptual shift there.

But usually like Bryce mentioned, people get into it via session store because session store is just a place to put your stuff. It’s a single name for the session being the key and then a value, which is whatever you want it to be. So, session store is a very common use case. If your users are uniquely identified, like UUID or some kind of hash or whatever, or if you only allow them one email address, that’s a good use case for Riak. So, those things where you can immediately say I know what the key for this thing is or I can take these other pieces of information and construct the key, those are the best use cases for Riak.
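As a sketch of that access pattern, the example below builds the key from information the application already has and stores one page-ready blob under it, so displaying the page takes a single lookup. The key formats, the function names, and the plain dictionary standing in for the key-value store are all assumptions for illustration.

```python
import json

store = {}  # stand-in for a key-value store; not a real client


def session_key(session_id):
    return f"session:{session_id}"


def user_key(email):
    # One email address per user, so the normalized address is the key.
    return "user:" + email.strip().lower()


def save_profile_page(email, name, recent_orders):
    """Store everything the profile page needs as one denormalized blob."""
    document = {"name": name, "email": email, "recent_orders": recent_orders}
    store[user_key(email)] = json.dumps(document)


def load_profile_page(email):
    """A single fetch by key, with no joins or foreign-key lookups."""
    raw = store.get(user_key(email))
    return json.loads(raw) if raw is not None else None


save_profile_page("Pat@example.com", "Pat", [{"id": 17, "total": "12.50"}])
page = load_profile_page("pat@example.com ")
print(page)
```

The order summaries are copied into the blob rather than referenced, which trades some duplication for fewer round trips.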

Now we have the other things like secondary indexes and in 2.0 we’re getting full Solr-based search which is pretty awesome, and these data structures that we talked about previously. But for the most part its bread and butter is can you give me the exact key that you want and I’ll get that back to you in a really reasonable latency.

JAMES:  It’s interesting to hear you say session store. I don’t know why. I guess I often think of the user session as not super vital, not like ridiculously mission critical. Losing it means I’ll lose somebody’s login, but it doesn’t mean I’m probably going to lose a lot of data or whatever, whereas I tend to think of Riak, that I should use it in situations where I’m much more paranoid about my data safety level. Am I thinking about that wrong?

SEAN:  No, I think that’s fair. On the other hand, what we typically think about with Rails sessions and especially because so many people had pain in the past. I don’t know if any of you have used active record sessions before. I made that mistake. [Laughs] But so much of what we put in sessions in most apps is really small. If it can fit in a cookie or if you have to put it in an external store, you put it in something that’s ephemeral, like memcached or Redis or something. But there are plenty of applications that have larger session objects. Take for example something where you are filling out a form, like a sequence of steps.

JAMES:  Wizard.

SEAN:  Like a wizard, right? [Chuckles] You’ve got some ugly government form to fill in and you need to take multiple steps to fill in all the pieces. That’s something that if you aborted it, if you logged out, maybe you don’t want to keep that around. But while you’re still there, if there’s a lot of data there that needs to persist through your time through the wizard, then yeah maybe you want it for that.

I think the biggest example of somebody using Riak as a session store is Wikia. They provide white-label MediaWiki instances basically. They’re sort of an offshoot of Wikipedia back in the day. But for their power users especially, they replicate those people’s sessions across multiple data centers using Riak, probably because their sessions are so big. They said, “We can’t keep this in memcached or we can’t keep it in a cookie. And if one of these data centers fails, we want somebody to be able to still be logged in on the other data center and we can redirect their traffic there.”

JAMES:  Yeah, that’s cool. That’s a good example.

BRYCE:  And related to that there are a lot of sites, and I’m particularly thinking of GitHub here, where one of the big features and conveniences aside is that you generally don’t get logged out for no good reason. So, I’m just looking at my GitHub security history and I have a session that I signed in back in October and it’s January 9th right now.

[Laughter]

BRYCE:  And these sessions, they materialize these and keep enough of the data persistent on their server that I can go in and log myself out. Say my phone got stolen and I never set a fingerprint or a passcode on it, I wouldn’t want that to be continuously logged into GitHub and be able to access my proprietary code. So, having that session distributed so it would be accessible but at the same time be able to remotely destroy it seems valuable.

JAMES:  Oh no, they can have my GitHub login if they’re going to do my work for me.

CHUCK:  [Laughs]

DAVID:  Yeah, but what if it was me?

JAMES:  Alright, good point.

DAVID:  You see the problem here.

[Laughter]

CHUCK:  I’ll do your work for you as long as I get paid.

JAMES:  [Laughs] That’s cool. Riak’s got multiple interfaces, right? An HTTP interface and then there’s a binary? The other one?

BRYCE:  Yeah. It’s a binary interface based on the protocol buffers standard.

JAMES:  Gotcha. And didn’t the Ruby client recently drop the HTTP interface? Is that being phased out in favor of the protocol buffers?

BRYCE:  So, for Riak 2, in the Riak Ruby client 2 we are dropping the HTTP interface. So, the history there is that originally, Riak I believe was only HTTP. Sean, is that correct?

SEAN:  Yes, that is correct.

BRYCE:  So, over time more features got added to the new protocol buffers interface. But it wasn’t really until Riak 1.4 where protocol buffers and HTTP were at pretty complete parity. So before that, if you were using protocol buffers for some features, you may still need HTTP for all the features. With Riak 1.4 we finally got all the features to parity on both interfaces. And with that, we noticed that one of the big code maintenance issues was that when a feature would change or a feature would get added, it would be more than twice the work to make sure it got added correctly to both backends in this case. Actually, more than that considering we have two different HTTP backends in 1.4.

So with 2.0, with a lot of the new features, the new Solr-based search which we call Yokozuna and the CRDT stuff, there’s a lot of new work there. And making the new work only work with protocol buffers seems to be a good way to make sure that it would get done quickly and correctly. And we’d have time, or it turns out it’s mostly just me, would have time to work on other features that are getting added. So, we made the decision that we remove HTTP support from Riak 2, the gem for it.

AVDI:  How much of an efficiency boost are you seeing from using binary protocols?

BRYCE:  I don’t have any handy benchmarks about that personally.

DAVID:  Dun, dun, dun.

CHUCK:  [Laughs]

DAVID:  So, all it really does is it made the code harder to use.

[Laughter]

SEAN:  Well, no. Actually, so what it would do is it makes it harder to poke at the database without a client.

DAVID:  Yeah.

SEAN:  We’re still thinking about maybe we should have something like a Mongo shell or your PostgreSQL shell just for poking.

JAMES:  Redis CLI.

SEAN:  Yeah, Redis CLI. Those are good examples of how other databases have made it really friendly for users. And sadly, all of the examples on our documentation that aren’t language-specific are, “Hey use curl from the command line.” And the flipside of that is now you have to know all these switches for curl and how to set the correct header and all those crazy things. I think in a lot of cases the protocol buffers, the binary protocol is more efficient just because it has smaller payloads and you’re not doing text parsing. You can make the decoding and encoding routines more efficient.
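
[Editor's sketch: a small comparison of a text payload and a binary payload for the same record, to make the "smaller payloads, no text parsing" point concrete. This uses fixed-layout packing with Ruby's `Array#pack`, which is a simpler technique than protocol buffers and is only a stand-in for it. The record fields and function names are assumptions for this example.]

```ruby
require "json"

# Text encoding: field names and digits travel as characters and must be
# parsed on the receiving side.
def encode_text(record)
  JSON.generate(record)
end

# Binary encoding: two 32-bit and one 8-bit unsigned integers in a fixed
# order both sides agree on ("N" = 32-bit big-endian, "C" = 8-bit).
def encode_binary(record)
  [record["user_id"], record["score"], record["level"]].pack("NNC")
end

# Decoding the binary form is a fixed-layout unpack with no text parsing.
def decode_binary(bytes)
  user_id, score, level = bytes.unpack("NNC")
  { "user_id" => user_id, "score" => score, "level" => level }
end

# Hypothetical record with three small integer fields.
record = { "user_id" => 123_456, "score" => 9001, "level" => 7 }

puts encode_text(record).bytesize
puts encode_binary(record).bytesize
puts decode_binary(encode_binary(record)) == record
```

[The cost of the binary form in this sketch is the one raised in the conversation: the bytes are not readable without a client that knows the layout.]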

JAMES:  It seems to be a recurring theme for us lately. We did the HTTP 2.0 episode recently which was basically about this.

CHUCK:  Yeah, but we were mean to them.

JAMES:  [Laughs]

CHUCK:  So, I want to ask a little bit about the MapReduce and the full-text search that it says right on your front page that you have. How do you do that if it’s a key-value store? Are you just parsing the contents and saying, “Yeah there’s text in there somewhere,” or is there more to it than that?

SEAN:  So, do you want to start with MapReduce or search? Which one do you want to take first? Because they’re actually very different.

CHUCK:  How about MapReduce?

SEAN:  Okay. So the same, and Bryce feel free to jump in here at any point, but the same concepts that we use to distribute the data around the cluster can be used to distribute the work of a MapReduce job. So, you basically feed it here are some keys. There are multiple ways to feed it with keys. And then you apply different phases to it. Map phases will load data from disk and transform it. Reduce phases collect the results of previous phases into a single value or a list of values. And so, what we can do is ship the code to the data. So, if you have a large cluster, in addition to getting that extra storage capacity you get extra processing power.

Now the challenge of this is often what people want to do is scan across all their data and feed it into this job, which David you were talking a bit ago about operational versus analytics databases and this is sort of the same problem. If you were using Riak as an operations database and then you throw this big batch query at it, it’s not going to feel very good for your application. [Chuckles]
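
[Editor's sketch: a toy, single-process model of the map and reduce phases described above. Each Hash stands in for the data held by one node, and the map function runs against whichever node holds a key. The data, key names, and function names are assumptions for illustration; this is not Riak's MapReduce API.]

```ruby
# Each inner Hash stands in for the data held by one node (an assumption
# for illustration; a real cluster decides placement by hashing keys).
NODES = [
  { "order:1" => { "total" => 10 }, "order:2" => { "total" => 25 } },
  { "order:3" => { "total" => 5 } },
  { "order:4" => { "total" => 40 }, "order:5" => { "total" => 20 } }
]

# Map phase: runs where the data lives, loads each value by key and
# transforms it.
def map_phase(node, keys)
  keys.select { |key| node.key?(key) }.map { |key| node[key]["total"] }
end

# Reduce phase: collects the outputs of the previous phase into one value.
def reduce_phase(values)
  values.sum
end

input_keys = %w[order:1 order:3 order:4 order:5]

# Every node maps over only the keys it holds; the results are gathered
# before the reduce can produce its answer.
mapped = NODES.flat_map { |node| map_phase(node, input_keys) }
puts reduce_phase(mapped)
```

[Feeding every key in the store into `input_keys` is the "big batch query" case from the conversation: each node would have to read everything it holds.]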

JAMES:  [Laughs] Goes about as well as the seven table join in SQL, right?

SEAN:  Right, right. [Chuckles] Actually there was a customer of ours who used to have an 18 table join inside of a stored procedure. And they switched it out for Riak and precomputing things.

[Laughter]

CHUCK:  I was going to say I think I just threw up in my mouth.

SEAN:  Yeah, you probably should, right? [Laughs] But now if you’re doing a small number of keys, maybe that may fit inside the context of a web request or some kind of AJAX thing. So, we typically say, “You know what? Only do this if you have to. You’re better off just fetching keys. But you can’t do that sort of open-ended processing.”

CHUCK:  So, in the case of say analytics for example, you could map across all of the entries for a given day or week or month or whatever and then have it reduce to just the counts or whatever information you need and it just distributes out across the different nodes so that they all do their work. And then does it aggregate it somewhere? How exactly is that?

SEAN:  Yeah, the results stream back to the client request coordinator. So, as it gets results, it will just send it straight back to the client. But if you have a reduce phase in there, that’s a [fan] endpoint so you will have delay. And it will have to aggregate all the results before it sends it to the client.

CHUCK:  Okay. The other question I have related to that is you were talking about it almost like you could only map across keys. Can you map across the values as well?

SEAN:  Oh, I’m sorry. I didn’t mean to mislead there. A map function is run against a single value. But the input to a phase that has a map function, it has to be a key. So, that could be you do list all the keys in this bucket and that’s the input. And then across each key that it finds, it will run that map function. Or it could be perform this secondary index query and pass the keys to the map phase. Or it could be do this full-text search and then feed the resulting document IDs into the map phase, which those document IDs should correspond to a key. So, there are multiple ways to give that information to a map phase but the map phase just takes a key, loads the data that corresponds to that key off of the disk, and then processes it.
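
[Editor's sketch: the distinction drawn above, where the input to a map phase is a list of keys and the map function itself sees one loaded value at a time. Two ways of producing the key list are shown: listing all keys, and asking a secondary index. Hashes stand in for the store and the index; all names and data are assumptions for this example.]

```ruby
# Primary data: key => value (a Hash stands in for the store).
BUCKET = {
  "user:1" => { "name" => "Ann",  "plan" => "pro"  },
  "user:2" => { "name" => "Bob",  "plan" => "free" },
  "user:3" => { "name" => "Cleo", "plan" => "pro"  }
}

# A secondary index built at write time: index value => keys.
# (Hypothetical layout for illustration.)
PLAN_INDEX = Hash.new { |hash, plan| hash[plan] = [] }
BUCKET.each { |key, value| PLAN_INDEX[value["plan"]] << key }

# Input option 1: list every key in the bucket (expensive on real data).
def all_keys
  BUCKET.keys
end

# Input option 2: ask the secondary index for matching keys.
def keys_for_plan(plan)
  PLAN_INDEX.fetch(plan, [])
end

# The map function sees one value at a time, loaded by key.
def map_names(keys)
  keys.map { |key| BUCKET.fetch(key)["name"] }
end

puts map_names(keys_for_plan("pro")).inspect
puts map_names(all_keys).length
```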

CHUCK:  And so, you can have secondary indexes so you have the key that’s one thing and then you can have another index on your objects that is another thing that you can query against.

SEAN:  Yeah, so…

JAMES:  I would say that’s a common tactic for key-value databases in general basically, is to precalculate how you’re going to need this data or something.

SEAN:  Yup.

CHUCK:  Yeah. When you mentioned it, it just seemed like that was, “Oh okay. So, I can get my dataset another way if I’m proactive about collecting it that way.”

SEAN:  Right.

JAMES:  And SQL’s doing that too, right? It’s just doing it behind the scenes for us. We specify to index on this particular key and it builds a lookup map so that it can get at those records faster going through that key, blah, blah, blah. How about the new full-text search in 2.0? That looks pretty neat.

BRYCE:  It is quite neat, actually. So, it’s completely built around the Solr and Lucene projects in Java. So now Riak, if you want to use Yokozuna, the new 2.0 search, you have to bring along a JVM for the ride. But it’s rather nice. You just throw normal Solr queries at it and it runs the query against each node and then joins the results together in a sensible way to make sure you don’t have the same result multiple times because it matched on multiple nodes. It manages indexing correctly.

So, you basically tell the client, or you tell the Riak server through the client that you want this bucket to be indexed into this index and it just works. It can read all sorts of different documents: JSON, XML, HTML. I believe it can do Microsoft Word as of the last couple of versions, though I haven’t actually tested it. It also works on the CRDTs or the data types.
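
[Editor's sketch: a simplified picture of the "run the query on each node, then join the results without duplicates" idea. The arrays stand in for per-node hits, and the merge here deduplicates after the fact, which is an assumption made for clarity and not a description of how Riak coordinates this internally.]

```ruby
# Each array stands in for the hits one node returns for the same query.
# Because data is replicated, the same key can match on several nodes.
# (Keys and scores are made up for illustration.)
NODE_RESULTS = [
  [{ key: "doc:1", score: 0.9 }, { key: "doc:2", score: 0.4 }],
  [{ key: "doc:2", score: 0.4 }, { key: "doc:3", score: 0.7 }],
  [{ key: "doc:1", score: 0.9 }]
]

# Combine per-node hits, keep one entry per key, and order by score.
def merge_results(node_results)
  node_results
    .flatten
    .uniq { |hit| hit[:key] }
    .sort_by { |hit| -hit[:score] }
end

merge_results(NODE_RESULTS).each { |hit| puts "#{hit[:key]} #{hit[:score]}" }
```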

JAMES:  So, that’s interesting. So, when you say read JSON, I can specify some arbitrary field in there that’s the thing I need to search on?

BRYCE:  That’s correct.

JAMES:  That’s cool.

CHUCK:  And you’re basically using Solr and the Lucene engine and the data store that it’s using is Riak. Is that what I gather or did I misunderstand?

BRYCE:  It’s using the engine at a low-level but then Riak has to do some extra work to make sure that since the query’s being run in multiple instances of the engine all around the cluster, that you get coherent results back.

CHUCK:  That makes sense.

SEAN:  Right. And there is Solr Cloud out there. Some people may think this is that, but Riak does the distribution for this like Bryce said. And then Solr does the query.

JAMES:  That’s cool.

BRYCE:  And a few months ago, I threw together a gem that lets you string the queries together just like Rails 3 Active Record queries.

JAMES:  Oh, like the dot syntax where you just keep tacking things on?

BRYCE:  Yeah. @bucket.query.where name is this value, order by created at.
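
[Editor's sketch: what a chainable, Active Record style query builder can look like in miniature. This is not the gem mentioned above, whose API is not shown in the conversation. The `Query` class, its methods, and the Solr-style parameter output are all assumptions for illustration, and values are not escaped.]

```ruby
# A tiny immutable query builder. Each call returns a new object, so
# calls can be chained. The class name and output format are assumptions.
class Query
  def initialize(conditions = {}, ordering = nil)
    @conditions = conditions
    @ordering = ordering
  end

  # Add field/value conditions and return a new Query.
  def where(new_conditions)
    Query.new(@conditions.merge(new_conditions), @ordering)
  end

  # Set the sort field and return a new Query.
  def order(field)
    Query.new(@conditions, field)
  end

  # Render Solr-style parameters: "q" with field:value terms, and "sort".
  def to_params
    terms = @conditions.map { |field, value| "#{field}:\"#{value}\"" }
    params = { "q" => terms.empty? ? "*:*" : terms.join(" AND ") }
    params["sort"] = "#{@ordering} asc" if @ordering
    params
  end
end

query = Query.new.where(name: "Bryce").order(:created_at)
puts query.to_params.inspect
```

[Returning a new object from each call, instead of mutating the receiver, lets a partially built query be reused as the base for several different searches.]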

JAMES:  Interesting.

DAVID:  Oh, cool.

CHUCK:  So, it’s like NoSQL SQL.

JAMES:  [Laughs]

BRYCE:  Right. And that’s what I’ve been going for. And I feel like since I’ve been at Basho, I’ve been gravitating around this idea of bringing Active Record like stuff, or the cutting edge Active Record stuff to work on Riak. And there’s been flirting at but not actually making a query planner that can emit MapReduce jobs, trying to and failing to retrofit some stuff on top of Ripple, the old Active Model layer that we used to maintain and haven’t for about two years now. But it’s been fascinating, especially with the Riak secondary indexes, learning how to write those and design those for specific queries. I feel like I’ve gotten a lot better at writing Postgres indexes as well.

JAMES:  That’s interesting.

SEAN:  Yeah, because you have to do them all manually in Riak. [Laughs]

JAMES:  Right, I know right?

BRYCE:  Right.

DAVID:  Yeah, yeah.

BRYCE:  But even the things like understanding the cardinality of the different components of the index.

JAMES:  So, Riak probably isn’t used as the primary data store replacement, I would assume. I’m assuming it’s generally brought in in some project to handle the special case where the super [inaudible] and key-value store benefits and yet the main store typically stays SQL or whatever. Would you think I have that right or do you see lots of people using it as their primary data store?

BRYCE:  I’ve personally worked on a few projects where Riak was the primary and only data store. But I don’t know if it’s necessarily the best fit for that role. In particular, the case that I always worry about is whenever you’re creating a new user with an email address and a password, what happens if two people think they own the same email address? There’s a mirror universe Bryce K. out there with brycek@gmail.com or they think they have it and they sign up for lots of sites with my email. So, what happens if we both sign up for the same site with the same email and different passwords? Who wins in that case? And the tools that Riak gives you or Riak 1.4 today gives you don’t give you a good answer for that.
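
[Editor's sketch: a simplified model of the signup problem described above, where two writes land on the same key. Two policies are simulated with Hashes: the later write replacing the earlier one, and both values being kept for the application to sort out. Neither policy says which signup is legitimate, which is the point being made. The function names and data are assumptions, and this is not Riak's implementation.]

```ruby
# Policy 1, last write wins: the later write silently replaces the
# earlier one.
def last_write_wins(store, key, value)
  store[key] = value
end

# Policy 2, keep siblings: conflicting values are all retained, and the
# application has to decide what to do when it reads them back.
def keep_siblings(store, key, value)
  (store[key] ||= []) << value
end

# Two people sign up with the same email and different passwords.
SIGNUPS = [
  { "email" => "user@example.com", "password_hash" => "aaa" },
  { "email" => "user@example.com", "password_hash" => "bbb" }
]

lww_store = {}
sibling_store = {}
SIGNUPS.each do |signup|
  last_write_wins(lww_store, signup["email"], signup)
  keep_siblings(sibling_store, signup["email"], signup)
end

puts lww_store["user@example.com"]["password_hash"]
puts sibling_store["user@example.com"].length
```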

JAMES:  That’s a good point. So really, the ideal application it seems what I’m hearing you say, is something where I can SHA and that’s always going to be this unique identifier in some way.

BRYCE:  Well, you don’t have to SHA it. Riak can SHA it for you.

JAMES:  Right, yeah.

BRYCE:  But yeah, content-addressable data is a great fit for Riak. And there’s been a lot of buzz around Datomic and they have last year unveiled that they support Riak as a storage engine for Datomic. And the way they do that is all their data is immutable. So once it’s written, which includes not just the data but all the indexes, and they’re just like, “Hey this block of data and indexes, its content is its identifier. So, we’ll just hash the content and store that. And if it’s not there, we’ll just keep fetching it ‘til it’s there, if we expect it to be there.” So, there are lots of ways that, especially if you don’t mind duplicating data or rewriting if you are less concerned with mutating existing data and you just want to write new stuff, immutability is a great use case for storing things in Riak.
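
[Editor's sketch: the "content is its identifier" idea in a few lines. A block is stored under a hash of its own bytes, so writing the same content twice produces the same key and nothing is ever updated in place. A Hash stands in for the store, SHA-256 is an arbitrary choice of hash for this example, and the function names are assumptions.]

```ruby
require "digest"

# A Hash stands in for the key-value store.
BLOCKS = {}

# Store a block under the SHA-256 of its content. Writing the same
# content twice yields the same key, so rewrites are harmless.
def put_block(content)
  key = Digest::SHA256.hexdigest(content)
  BLOCKS[key] ||= content.dup.freeze
  key
end

# Fetch by key; the content can be verified against its own key.
def get_block(key)
  content = BLOCKS[key]
  return nil unless content
  raise "corrupt block" unless Digest::SHA256.hexdigest(content) == key
  content
end

first_key = put_block("index segment 1")
second_key = put_block("index segment 1")
puts first_key == second_key
puts BLOCKS.size
puts get_block(first_key)
```

[Because a key can only ever map to one possible value in this scheme, two writers can never disagree about what belongs under it, which sidesteps the conflict problem discussed earlier.]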

AVDI:  We’ve been addressing this mostly from the point of view of client programmers of Riak. I’m curious about your experience programming Riak. First of all, what’s the language breakdown that you’re using?

BRYCE:  So, Riak itself is mostly Erlang. There are a few parts, the LevelDB on-disk format is implemented in C++. And there may be some JavaScript bits in the MapReduce stuff.

AVDI:  Okay. What’s your experience been coding in Erlang? Has that been good?

BRYCE:  My experience, yeah. I think a lot of people complain about Erlang with the syntax. That seems to be the first thing that comes up.

DAVID:  Well, obviously.

BRYCE:  Yeah, it’s so ugly.

[Laughter]

BRYCE:  Actually, I think harder than the syntax is the mental model.

SEAN:  Yeah.

BRYCE:  But once you’ve embraced the idea of actors and message passing and immutable data, always copying things, then the syntax falls away. On the other hand,

DAVID:  The syntax is just like a warning, right?

[Laughter]

DAVID:  This looks like awful…

BRYCE:  [Inaudible] is that we’re used to being in a different mode, right?

DAVID:  Yeah. What the hell is this crap? Okay, good. Now you’re in the right mind frame.

[Laughter]

DAVID:  Yeah, I was talking about concurrency.

BRYCE:  Jose’s Elixir is softening that blow for people. And so, that’s a good thing.

AVDI:  Which may or may not be a good thing.

BRYCE:  Right.

JAMES:  What’s the counter [inaudible]?

BRYCE:  Well, it might be a disservice, yeah.

AVDI:  It’s interesting because if you’re a Ruby programmer, it gives you a lot of apparently Ruby constructs with utterly alien semantics.

JAMES:  I see what you’re saying. So, it looks like a def but it behaves differently so you don’t get what you’re expecting basically.

AVDI:  Exactly.

JAMES:  Yeah, that’s a good point.

CHUCK:  Alright. Well, I hate to be a party pooper but I had some guy invite me to an Erlang podcast. So, I need to start wrapping up so I’m ready on time.

JAMES:  Aha! Those Erlang people.

DAVID:  Just spin up another 15 copies of yourself, Chuck.

CHUCK:  There we go. Working on that.

JAMES:  We’ll say this though. Doesn’t Riak seem to be squarely in the middle of the Erlang problem set, right? It’s almost an ideal fit.

SEAN:  Yeah.

BRYCE:  Yeah.

JAMES:  The super ridiculous fault tolerance and concurrency and stuff.

BRYCE:  Right, so all of those things that we want from a total system level also go down to the individual components in terms of reliability and being able to recover from failure. Yeah, you’re right. It’s such a great fit.

CHUCK:  Alright. Well, let’s do the picks. Avdi, do you want to start us with picks?

AVDI:  Alright. So, one thing that I’ve been using a ton and really, really appreciating for the last few weeks is IRCCloud. And IRCCloud is pretty much exactly what it sounds like. It is a cloud-based web frontend to IRC. And it’s pretty much single-handedly gotten me back onto IRC after many years because it addresses just about all of the problems that I’ve had with IRC that have kept me from getting back into it. It keeps me logged in all the time, whether or not I’ve got a client open. It’s got a web client and then it’s got mobile clients.

And whether or not I’m logged in, it has me logged in so I’m not losing chat logs and losing messages that people have sent me because I wasn’t logged in. And if I join a room on my computer then I’m also joined on my phone and if I join on my phone it’s also joined in the computer. So, it’s just a seamless experience where it’s not like different clients, different places which are subscribed to different rooms. And a lot of other cool stuff like lately they added the ability where if somebody pastes in a gist, it actually just inlines the gist right in the chat stream so you can just look at it instead of clicking through the link. And a lot of other neat stuff. Pings me…

JAMES:  [inaudible] awesome.

AVDI:  Yeah, it pings me on my phone when somebody mentions me or tries to direct message me. And yeah, I could go on and on. It’s pretty great. I’m back on IRC because of it. So, they’ve got some slight limitations if you use the free version and then there’s a $5 a month version which doesn’t have any limitations. So yeah, IRCCloud is pretty sweet.

Something else I’ve been using a lot lately, moving off of technical stuff, is Google One Today which is a phone app. I’m not sure if there’s a web app as well, but there are apps for Android and iOS. And basically what it does is it gives you a different vetted charity every day, pops up on your phone and gives you a little bit of information about it and gives you the opportunity to donate a dollar to that charity. And actually, if you scroll through, I think it actually presents several charities a day. So, if you’re not too into the one that it pops up, you can look around for another interesting one.

It also has some fun options for setting up small matches so that you can post a match and then when other people contribute, you’ll match them dollar for dollar, up to a small, low limit. And it’s very cool because if you’re like me, you are always a little bit concerned that maybe some of the charities aren’t as good, maybe they’re not putting all the money towards, or enough of the money towards the actual work. And this gives me a way to spread money around to a lot of different interesting charities.

JAMES:  That’s awesome.

CHUCK:  Cool. James, what are your picks?

JAMES:  I feel like mine are really boring now that I’ve heard Avdi’s. [Chuckles] I want to go play with his stuff.

[Chuckles]

CHUCK:  Was that an, “I don’t have any picks”?

JAMES:  No, no. I have them.

[Laughter]

JAMES:  I collected a bunch while I was on vacation so I’ll spread them out over the next couple of weeks. But I saw this just before we recorded. There’s a new blog post on the Code Climate blog about “When Is It Time to Refactor?” and it’s really great. Just some basic tips about is it time yet or can it wait? I think one of those decisions we eternally wrestle with or eternally get wrong, one of the two. So, that was interesting.

Another interesting thing I saw on vacation is the sandi_meter. So, when Sandi Metz was on the show, she gave us a bunch of rules, preferably no classes over a hundred lines, no methods over [a fork], et cetera, et cetera. And the sandi_meter is a static analysis tool that will run through your code and tell you how much you’re violating Sandi meter. And the cool part is it has an HTML output mode complete with graphs and pretty pictures.

DAVID:  That’s awesome. I wrote a poor man’s version of that a month ago to format, to keep me safe in my code. I’m going to use that. Thank you.

JAMES:  Yeah, it’s cool. You should check it out. And then while I was on break I did many things. And like I said, I’ll spread out the content, but one of the things I did was play Gone Home off of Steam. It’s a game. And essentially, I’ve actually been a little hesitant to describe it as a game. It’s very, I wouldn’t say it’s got conflict in or anything like that. It’s just you find yourself in this situation where it’s very reasonable why you don’t know what’s going on and you explore it until you’ve figured out what’s going on. And so, it’s very cool. It’s more of an interactive story, I would say, than specifically a game. But really good, you can play through it in a couple of afternoons in your free time. It’s not lengthy and it’s really well done and has these side things you can figure out to pay you off for paying attention. So, cool game. I enjoyed it. Those are my picks.

DAVID:  Cool.

CHUCK:  Awesome. David, what are your picks?

DAVID:  Okay. So, my first pick is MailChimp. And there’s a bunch of mail services out there. If you want to build a mailing list and put together email campaigns and send drip stuff out there, there are a bunch of good services out there. There’s AWeber, however you pronounce it. There’s ConvertKit. And this is by no means a comprehensive review of all of them, but I looked at them and I had a coupon for three months free for one of the services and I signed in, and their webpage crashed. And so, I filled it out again and then they said, “No that email’s already been taken.” And there was no way to fix it.

So, I went to MailChimp and it just worked. And I had a list up and running in 15 minutes and I knew nothing. I’ve never actually run a mailing list before. And I had a working mailing list in 15 minutes. That includes time sitting reading through the concepts of building your subscriber base into groups and making different mailing lists and having campaigns, all that stuff. 15 minutes and there was an email in my inbox and in my wife’s inbox spamming us with some mythological product that I had thought up for the test list. So, MailChimp. I really, really like them. Very, very easy to use and the web interface for designing the emails is actually quite pleasant to use.

My second pick is ‘The Gentle Art of Verbal Self-Defense’. Katrina picked this a few months ago. I’m specifically picking the revised and updated version for 2009 which Amazon, hang on, UPS has literally just pulled up outside my door. They might actually be delivering my copy right now. The 2009 version is more updated. The original version was 1985 or something like that. So, there’s a lot more internet stuff like email and dealing with stuff like that in the new version. Suzette Haden Elgin is the woman who wrote it.

And it turns out that I have been chatting with her online through an anonymous handle as she’s just somebody on LiveJournal that I knew. And she was this very kind lady that knew a lot about linguistics. And it turned out that when I went looking for her books, somebody said you should go talk to so and so, and I’m like, “Wait a minute. I’ve been talking to her. You’re telling me that she’s my hero and she’s been my hero this entire time?” So, that was a fun moment for me.

My last pick is a game, or a pair of games. I suggest you buy them both. On Steam, you can get Metro 2033 and Metro: Last Light. These are basically if you think of the Fallout games as post-apocalyptic fantasy, Metro is post-apocalyptic hardcore sci-fi. The mutants are freaky monsters. There are thousands of types of species of mutations that have come because of the apocalypse. There’s a compelling story line. It’s set in the subway underneath Moscow, in the metro 20 years after the apocalypse has happened. And the first game is Metro 2033 and it actually forces you, if you want the good ending, you have to go to extreme lengths.

If you play it like a regular first person shooter and you shoot the bad guys and you save the good guys, you’re going to get the bad ending because there are only 10,000 people left alive in the world. So, killing bad people is not really all that ethical. And so, if you want the huge ethical good ending, you actually have to go out of your way to not kill people who are your enemies just because they have a different political ideology. It’s okay to kill evil people, but it’s not okay to kill people who are merely strongly opinionated and different than you. And it’s very, very exciting. Metro: Last Light, you can buy these both in a single bundle.

Sorry, this review is taking too long. These games are just fantastic. I can’t say enough good things about them. Metro: Last Light is the sequel and it has the same kind of ethical conundrum. You finish Metro 2033 by committing genocide on the bad guys and Metro: Last Light is the sequel and you basically realize that you really shouldn’t have genocided the bad guys, that that was actually a very, very bad moral thing to do and you spend the entire game trying to save humanity from itself. And yeah, it’s been a long time since I’ve had a video game ask the question is humanity even worth saving. And you don’t really have a quick answer. And the game finally ends, in my opinion, it ends with a yes but not on merit. They’re worth saving just because, but not because they’re good people or not because we’re good people.

So, I just can’t say enough good things about Metro 2033 and Metro: Last Light. First person shooters. There aren’t any cheat codes but there are hacks and cheats if you like to play casually like I do. You can contact me if you want help, because the games are quite hard. And I don’t like hard games. I like games that I can read like a novel. And so, I highly recommend those.

And then one quick throw-in thing, Avdi recommended a bunch of charities. I love CharityNavigator.org. It’s a website that gives transparency into charities. Basically they’ll go in and they’ll say, “This particular charity has not exposed their finances to the SEC or they have. And we’ve calculated how much of a percentage of overhead they’re taking.” And there are people out there that have, “Oops, yeah,” they got totally busted. The CEO of the charity is taking 98% of the money given to the charity as his personal salary. And 2% is being given to the actual relief fund. And other charities get much higher ratings and that sort of thing. So, CharityNavigator.org is a great place to look up how charities are doing. So, them’s my picks.

CHUCK:  Awesome. Alright, I’m going to go real fast. The last week or so, I spent at Disneyland and then at New Media Expo. And so, those are my picks. Disneyland, which is way fun, over in California. It’s the one in California, not the one in Florida or wherever else. And yeah, it was just great to spend time with my family. It was a little less great to spend time with my wife’s family.

JAMES:  [Laughs]

CHUCK:  But it was fun. And then New Media Expo is the big conference for blogging, podcasting, video production, web TV, et cetera. And anyway, it was a lot of fun. I met some terrific people including some of my podcasting heroes. And so, just a terrific time. And so, if there are any listeners, I know there’s at least one listener I met at New Media Expo, it was terrific to meet you. Anyway, that’s pretty much all I’ve got for picks. Sean, what are your picks?

SEAN:  Okay. So, I’ve got a bunch but I’ll make them short. Lately I’ve been looking for an alternative to this one app we have that the UI’s written with EmberJS. And I came upon a project by David Nolen called Om which works with Facebook’s React.js library. And it’s all ClojureScript. [Chuckles] So, I’ve been looking at that, learning ClojureScript, and trying to see if this will work for us to replace Ember. It might or might not. We’ll see.

I really love to write good tests when given the possibility so one of my favorite tools for a long time, we’ve used at Basho, it’s called QuickCheck. It was originally written in Haskell. There’s a commercial version that we use for Erlang, just a couple of open source versions for Erlang.

And then there’s actually, I haven’t vetted it yet, but there’s one for Ruby called rantly. So, the basic idea is it’s a different testing paradigm. You write properties, or invariants is probably a better word at times, over your code rather than writing little happy path unit tests or fail path unit tests. And it will actually find, by generating input to your algorithm, it will find edge cases and bugs for you. And a number of people have actually used QuickCheck in whatever language to find bugs in the standard library, in the runtime. So, it’s actually really awesome at finding bugs you would never expect.
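
[Editor's sketch: the property-testing idea in plain Ruby, hand-rolled so as not to guess at the API of QuickCheck or rantly. Random inputs are generated from a seeded generator, an invariant is checked against each one, and the first failing input is returned as a counterexample. Real tools also shrink counterexamples to minimal cases, which this sketch does not attempt. All names here are assumptions.]

```ruby
# Generate a random array of integers from a seeded generator.
def random_int_array(rng, max_length: 20)
  Array.new(rng.rand(0..max_length)) { rng.rand(-1000..1000) }
end

# Returns nil if the property held for every trial, otherwise the first
# failing input (the counterexample).
def check_property(trials: 200, seed: 1234)
  rng = Random.new(seed)
  trials.times do
    input = random_int_array(rng)
    return input unless yield(input)
  end
  nil
end

# Invariant: sorting is idempotent and preserves length.
sorted_ok = check_property do |xs|
  sorted = xs.sort
  sorted.sort == sorted && sorted.length == xs.length
end

# A deliberately false property: "reversing never changes an array".
counterexample = check_property { |xs| xs.reverse == xs }

puts sorted_ok.inspect
puts counterexample.inspect
```

[The second call is there to show the failure path: the checker hands back a concrete input that breaks the claimed property, which is the kind of edge case a hand-written happy path test tends to miss.]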

And so, a blog that I love to read, there’s this awesome grad student from UC Berkeley who goes by the name of Peter Bailis, @pbailis on Twitter. And he has a lot of great stuff about distributed systems and databases and whatnot, so I love to read his work. He’s also, unlike a lot of academics, really open about the stuff he’s working on and usually posts a blog post before he posts a paper to a conference, which is not common in academia. So, go check out Peter Bailis and his work.

And finally, and this is not technical, but recently a friend of mine bought me this game for iOS called Bastion. It came out a few years ago.

DAVID:  Yeah.

SEAN:  But really this awesome game. But I think what really tickles the music nerd in me is the soundtrack.

DAVID:  Yes! Yes!

SEAN:  It’s so awesome. It is sort of a combination of electronica and the cowboy wild west vibe. And it is just really an incredible piece of work. So, definitely [check that out].

DAVID:  Does it make you think of Firefly?

SEAN:  Absolutely. That’s the first thing I thought of.

[Laughter]

DAVID:  Yes! Yes! Yes! Sorry, carry on.

SEAN:  And it is really great. Those are my picks.

CHUCK:  Alright. Bryce, what are your picks?

BRYCE:  So, I don’t want to upstage you because I went to Epcot at Walt Disney World last week and had a good time.

CHUCK:  [Laughs]

BRYCE:  So, the day after that Sam Elliott, a coworker here at Basho, and I went to the Kennedy Space Center visitor complex. And I’ve been there many, many times. But the new space shuttle Atlantis exhibit there is simply fantastic. The entrance to it, the presentation of the spacecraft itself, just beautifully done, and I love the space center a lot.

Besides that, this week I’ve been doing a lot of debugging of the Beefcake gem in JRuby. I’ve been using the free VisualVM tool for inspecting memory and CPU usage in JVMs. And it’s been very valuable and I don’t know if I could have found the problem as quickly as I did without it.

CHUCK:  Alright. Well, thanks for coming, guys. It was a great discussion and I hope we can get some people to go out and try Riak and see what it’s all about.

JAMES:  Very cool, thank you.

CHUCK:  Alright, I just want to remind everybody about our Book Club book. We’re reading ‘Ruby Under a Microscope’ so go check that out. And besides that, we’ll wrap this up and we’ll catch you all next week.
