Cloud and NoSQL: A use case
March 5, 2012 at 20:23
Nico in Clojure, Clojure, CouchDB, Heroku, PostgreSQL

This article is inspired by the many questions I see in forums about which language, NoSQL database, or cloud service is the best to use today.

A frequent reply to these questions is “What is your exact goal and use case?”, which people often find difficult to answer. Writing a use case is indeed a tricky exercise. How much detail should you go into? What is meaningful and what is not? How do you make it fit inside a forum post? And how do you keep your startup's end goal private?

Another recurring and objective reply is “Experiment to find what works best for you”. Still, someone who is new to the Cloud and NoSQL space cannot experiment with everything. Aiming to reduce one's experimentation space is legitimate, which brings us back to the question of the use case.

I therefore thought that I would share my personal use case, and how I decided to experiment with Clojure, CouchDB, Heroku and PostgreSQL before anything else. My hope is that it will help you formulate your own use case and put you on the track of finding your first tool set, in particular by showing you in which way a use case and a first technological bet connect together.

A word of warning: objectivity is not the point of this article. Everything that follows is personal and subjective. It is based on my very own experience (or lack thereof), understanding (or lack thereof) and intuition (or lack thereof). This is intentional. Making your first bet should involve a huge amount of your own subjectivity and intuition. Experimentation will bring objective answers.

In the end, writing about this is a risky exercise. At best I will change my mind. At worst I got it all wrong... but I don't think so.

Interestingly, I found that my needs could be expressed in pretty generic terms, and without revealing the details of what I am up to; so if you want to ask for help on a forum, there is a lot you can say without selling your soul.

My use case

I have a few projects in the pipeline now, and here is my reality in a nutshell.

If you are an online shop, a news publisher, an online RPG, or an S&P 500 company, then bits of your equation should be different from this. You might have a supply of developers and administrators, you might be working with in-house data only, you might soon be snowed under by petabytes of data, you might be streaming digital data.

A McKinsey analysis would be more structured and polished, but this use case is really good enough for the purpose of narrowing which technologies to start experimenting with.

What about the technologies I know most?

Technically, these are things that Azure, .NET, SQL Server and MATLAB would do for me. But I think they would benefit projects that are a bit more mature than mine, if only because of the higher upfront costs that they entail. Also:

“Plan A”

What follows is my first shot at an overall platform that could work for my projects, and the reasons why. The choice is very much an overall one, not just a choice of individual parts, and the process of arriving there was very iterative.

I will also describe technologies that I have put aside for now. I may well end up using them as plan B if plan A doesn't work as well as I hope.

Software: Open Source

Cloud infrastructure: Amazon Web Services

Preferred to:

Hosting: Heroku

Preferred to:

SQL: PostgreSQL

Preferred to:

NoSQL: CouchDB

Preferred to:

Functional programming: Clojure

Preferred to:

Object programming: Java

Preferred to:

Summary

In a nutshell, Plan A is very much:

a bet on the trio Heroku, Clojure, CouchDB
with PostgreSQL as a safe SQL bet
and Hadoop over AWS as a long-term cloud scaling environment.

What I need to assess now is whether Heroku and hosted CouchDB deliver good enough performance, and whether CouchDB meets all my high expectations (too high?).

To summarize how the tools match the use case (requirement: tools):

Data analysis projects
Heterogeneous and source-driven data: CouchDB
Data can be stored as text: CouchDB
Rapidity of development: CouchDB, Clojure
Robustness, data redundancy: CouchDB
Process redundancy: Heroku // in-house
Easy data harvesting: CouchDB, Clojure
Harvesting in the cloud, research in-house, production in the cloud: CouchDB, Clojure, Heroku
In-house research data mirrors production data in the cloud: CouchDB

Web application projects
Application-driven data: CouchDB, PostgreSQL
Rapidity of development, ease of deployment: Clojure, Heroku
Robustness: CouchDB, PostgreSQL
Single environment from data exploration to deployment: Clojure, CouchDB
Big data: CouchDB, AWS
Minimal administration: Heroku, CouchDB
Managing several environments: CouchDB
Minimal upfront cost, linear costs: Open source and freemium
Recognised ecosystem: AWS, Java
Good personal understanding: Java, Clojure, CouchDB, PostgreSQL, Heroku
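
If you want to play with a summary like this for your own use case, one option is to write it down as data and count how much each tool is carrying. What follows is only a minimal sketch in Python, not something the choices above were derived from. The requirement names and tools are transcribed from the summary; the names `REQUIREMENTS`, `tool_coverage` and `uncovered_without` are made up for illustration, and recording "Heroku // in-house" as Heroku alone is an assumption.

```python
from collections import Counter

# Requirement -> tools, transcribed from the summary above.
# Assumption: "Heroku // in-house" is recorded as Heroku only.
REQUIREMENTS = {
    "Heterogeneous and source-driven data": ["CouchDB"],
    "Data can be stored as text": ["CouchDB"],
    "Rapidity of development": ["CouchDB", "Clojure"],
    "Robustness, data redundancy": ["CouchDB"],
    "Process redundancy": ["Heroku"],
    "Easy data harvesting": ["CouchDB", "Clojure"],
    "Harvesting in the cloud, research in-house, production in the cloud":
        ["CouchDB", "Clojure", "Heroku"],
    "In-house research data mirrors production data in the cloud": ["CouchDB"],
    "Application-driven data": ["CouchDB", "PostgreSQL"],
    "Rapidity of development, ease of deployment": ["Clojure", "Heroku"],
    "Robustness": ["CouchDB", "PostgreSQL"],
    "Single environment from data exploration to deployment":
        ["Clojure", "CouchDB"],
    "Big data": ["CouchDB", "AWS"],
    "Minimal administration": ["Heroku", "CouchDB"],
    "Managing several environments": ["CouchDB"],
    "Minimal upfront cost, linear costs": ["Open source and freemium"],
    "Recognised ecosystem": ["AWS", "Java"],
    "Good personal understanding":
        ["Java", "Clojure", "CouchDB", "PostgreSQL", "Heroku"],
}


def tool_coverage(requirements):
    """Count how many requirements each tool is listed against."""
    return Counter(tool for tools in requirements.values() for tool in tools)


def uncovered_without(requirements, tool):
    """List the requirements left with no tool at all if `tool` is dropped."""
    return [req for req, tools in requirements.items() if set(tools) <= {tool}]


if __name__ == "__main__":
    # Tools ordered from most to least relied upon.
    for tool, count in tool_coverage(REQUIREMENTS).most_common():
        print(f"{tool}: {count}")
    # Requirements that rest on CouchDB alone, i.e. where the bet is riskiest.
    for req in uncovered_without(REQUIREMENTS, "CouchDB"):
        print("CouchDB only:", req)
```

A tally like this says nothing about whether a tool is any good. It only shows where a bet is concentrated, which is a reasonable hint about what to experiment with first.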

Your story?

I hope the above will help you to formulate your own use case and to find technologies that fulfil it. In particular, I hope this gave you an idea of how advanced an analysis you can perform based on the information that can be found on the web, in forums and in books (and a little bit of experimentation on the side). I would love to hear your own stories, and to advertise use cases and solutions that are different from the above.

In the end, the one piece of advice I would give when approaching the space is: understand what you plan to use, and feel certain that you and your team can become intimate with it. For example: I don't think I can become intimate enough with EC2 in the short term, so I am happily giving up its flavour of elasticity for now. I see myself becoming intimate with PostgreSQL more easily than with MySQL. I felt intimate with CouchDB soon after I started reading about it, and less so with other NoSQL solutions.

Now back to experimenting with all this. And I am looking forward (am I?) to telling you whether I change my mind or not!

Article originally appeared on Chaomancy (http://www.chaomancy.com/).
See website for complete article licensing information.