
Join us Thursday January 19 at 6:30 PM at our brand new Headquarters (aka Fort Awesome). Join and RSVP here.

by Ricky Ho (noreply@blogger.com) at January 16, 2012 12:41 AM
So apparently my last entry ruffled some feathers, so maybe I should explain why I think Couchbase is the future?
Simple Fast Elastic.
That's pretty much it. We make it very simple to get started, we are extremely fast (and getting faster), and we really are "web scale", with the ability to add and remove machines from a cluster to rapidly scale your capacity to your workload.
The Membase product was very fast and scalable, but a bit too simple, with no reporting capability or cross-datacenter replication capability.
The CouchDB product has a lot of features, but is too slow, unable to keep up with high loads and unable to scale out on its own.
The combination of the 2 will hit a sweet spot to allow developers to quickly get their apps up and running, along with the reliability, speed and low cost that make running it in production cheap and worry free.
Our 2.0 product is coming soon, adding CouchDB style views and reporting with a nifty trick for extremely fast failover while maintaining full coherency with the underlying distributed data storage (we are calling it our B-Superstar index). We'll of course have lightning fast reads (same as Memcached) but also very fast durable writes. For 2kb docs, we are currently getting sustained random insert/update rates of 25k writes/sec, fully durable, with compaction in the background so it can go all day and all night. We've got some more write work coming soon which we are hoping will give us another performance boost before 2.0. Stay tuned.
And so right now the focus is on the features and customers that pay, which allows us to build a real, sustainable business. And that's REAL DAMN IMPORTANT. It's not enough to build some cool technology, not enough to build a community of excited technologists. You need to cross the chasm and build a real business. A business that provides support, training, documentation and of course a reliable product. A business you can call up when you have difficulty upgrading from an old version, or are getting some weird error you've never seen before at 3am. A business you know will be around to support you for years to come.
And so while we focus on the features and customers that most quickly make us a viable business (and it's growing fast), we are still looking to build the features and technology to expand our use cases and get customers and developers excited. Future versions are planned to have full CouchDB compatible replication technology, with the ability to support all sorts of mobile and embedded databases, such as our new TouchDB projects for iOS and Android. So with Couchbase you can have a fast, scalable database in the cloud that also supports the offline use of thousands or millions of apps on devices that drop in and out of internet connectivity, syncing when connected but still completely usable when disconnected.
That's some cool shit. Simple Fast Elastic. And Reliable. And Mobile. That's why Couchbase.
… is not Damien Katz.
The blog post Damien Katz wrote earlier today doesn't mean much, if anything, for the Apache CouchDB project (or the memcache project for that matter). If anything, it's a public note in which Damien Katz acknowledges that he moved (on) from CouchDB to Couchbase.
I'm not a contributor to CouchDB by means of code, (but) I blog a lot, I maintain the FreeBSD port, wrote a book and have an opinion on many things CouchDB. I've been a CouchDB-user for something like four years (since pre-0.8 times) and a BigCouch-user (and Cloudant customer) of about 1.5-two years.
I am not sure what Damien Katz tried to achieve when he posted his message to the community and while I personally find it ignorant (to say the least), it worries me how it is perceived by the general public.
Of course it may sound like the end of the world when the creator of CouchDB quits his own project, but truth be told, Damien Katz left CouchDB a long time ago. Couchbase moved past CouchDB long before they announced it, basically when they started integrating Membase, though there are all kinds of notable contributions from Couchbase employees (e.g. Filipe and Jan).
Damien himself hasn't (actually) committed to CouchDB in over a year. That makes his move no real surprise; only the way he decided to communicate it surprised me. Especially since he said he has no regrets, I find the tone and statements in his blog post rather questionable. That is both from a personal and professional perspective.
I just attended a Couchbase event in Berlin last year where talks about CouchDB were given along with newer Couchbase developments. So personally, while I welcome clarity, all too sudden changes in strategy don't make me happy. If I was new to CouchDB and/or Couchbase, this would look like a headless chicken (excuse the image) and way too much drama to get into.
And on a professional level, Damien's post invalidates the efforts of many people who both contribute to and work with Apache CouchDB on a daily basis.
Then later today Damien shared this:
TIL, if you create an open source project, you should stick with it forever and ever. Family can live off unicorns and stardust. — Damien Katz
First off, most people in this discussion (excluding HN, of course ;-)) are actually active Open Source contributors one way or the other. Many of us have other projects (plural) besides CouchDB. It's not that we troll about something we have no idea about.
Secondly, it's not the fact that Damien left, it's how he left.
No one blames people for moving on: it happens all the time. I do it all the time — write code, push it out, move on.
If code is good enough it'll be picked up; if not, it'll rot on GitHub forever. It happened to other projects and it happened to CouchDB. But why would anyone pronounce a project dead when he is no longer invested in it?
Anyway: I wish Damien good luck in the future.
I wrote a blog post about the current state of CouchDB last year (2011) in early December.
A few things have changed since then:
Overall, I still see contributions from notable community members all around. Not sure if it's my own perception, but there has been more activity as of late. A new release of Apache CouchDB is just around the corner. Overall, good times for Apache CouchDB indeed.
The other thing that happened after my blog post is that Couchbase said in their 2011 review (which was published in late December) that it would officially step off Apache CouchDB and contribute documentation and OSX builds (aka CouchDBX or Couchbase Single Server) to the Apache CouchDB project.
This is great news and announcing that they will step off the turf is fine too since it clears up a couple misconceptions people may have about Apache CouchDB and the former Couchbase Single Server.
And all in, this makes Damien's blog post even more unnecessary and confusing to many users out there. Especially confusing for those who are not neck-deep in Apache CouchDB — and by that I mean: they are neither subscribed to a mailing list, take part on IRC, read the CouchDB planet or know any of the contributors directly.
In the end his blog post confuses people because it contains absolutely nothing but fear, uncertainty and doubt (FUD).
I can't reveal too much because it's not my business to announce anything — let me just say that there are good times ahead for Apache CouchDB.
I'd like to get past all the drama and re-focus on what is important: CouchDB and our data. I don't care for the rest, I want to see exciting things from Apache CouchDB in 2012.
by Till Klampaeckel (till@php.net) at January 09, 2012 10:51 PM
The CouchDB world is currently full of “The future of CouchDB” blog posts. It started with the blog post from Damien Katz the creator of CouchDB. Of course people were also concerned about the future of GeoCouch. No worries, it will be good.
The reactions were quite different. People who are not deeply involved with the CouchDB community think that this means the end of Apache CouchDB. My reaction was positive, I tweeted:
“It’s good to see the Damien is so open to [the] world”
The reason was, that for me it was pretty clear that it would happen, and I was just happy that Damien officially made the cut.
The reactions from CouchDB community members were pretty much what Till Klampäckel describes in his blog post. You could see it coming after Couchbase announced that they are not the CouchDB company and that their product won’t be Apache CouchDB compatible.
I agree with Till here, the way Damien wrote his blog post, isn’t the best imaginable. For outsiders, it really seems to be the end of Apache CouchDB, but it is not. For me it just shows, why foundations like the Apache Foundation are such a great idea. Even if the original creator leaves the project, it still lives on.
Apache CouchDB has a lot of contributors, and the mailing lists and IRC channel are as busy as always. That CouchDB has a future is also shown by the blog post from Cloudant. They will keep supporting Apache CouchDB.
After this quick recap of what has happened so far, it’s time to talk about the future of GeoCouch. As you may know, I work for Couchbase on the integration of spatial functionality into their product.
Currently the overlap between Apache CouchDB and the version Couchbase uses internally is still quite huge, but it will diverge more and more in the future. Thus it will get harder and harder to maintain a single version that supports Apache CouchDB and Couchbase.
The good news is, that GeoCouch is pretty much a data structure only. It's an R-tree that stores JSON documents. This can easily be used by CouchDB and Couchbase. Perhaps small wrappers will be needed, but those should be minimal.
The easiest way to understand what the future looks like is a small illustration:
GeoCouch's core is the R-tree, it's the same code for CouchDB and Couchbase. On top of it there will be code that is specific to either CouchDB or Couchbase.
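To make the "R-tree that stores JSON documents" idea a little more concrete, here is a minimal sketch of the query side of such a tree. It is not GeoCouch's code: the node layout, the `[minX, minY, maxX, maxY]` bounding-box format and the names `intersects` and `search` are assumptions made for illustration, and the tree is built by hand rather than by insertion.

```javascript
// Hypothetical sketch, not GeoCouch's implementation.
// Bounding boxes are assumed to be [minX, minY, maxX, maxY].
function intersects(a, b) {
  return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
}

// Inner nodes hold `children`; leaf nodes hold `entries` of { bbox, doc }.
// Subtrees whose bounding box misses the query are skipped entirely.
function search(node, query, results = []) {
  if (!intersects(node.bbox, query)) return results;
  if (node.entries) {
    for (const entry of node.entries) {
      if (intersects(entry.bbox, query)) results.push(entry.doc);
    }
  } else {
    for (const child of node.children) search(child, query, results);
  }
  return results;
}

// A tiny hand-built tree holding three JSON documents.
const tree = {
  bbox: [0, 0, 20, 20],
  children: [
    { bbox: [0, 0, 5, 5], entries: [
      { bbox: [1, 1, 1, 1], doc: { _id: "a", name: "cafe" } },
      { bbox: [4, 5, 4, 5], doc: { _id: "b", name: "park" } },
    ] },
    { bbox: [10, 10, 20, 20], entries: [
      { bbox: [12, 15, 12, 15], doc: { _id: "c", name: "museum" } },
    ] },
  ],
};

const hits = search(tree, [0, 0, 3, 3]).map((doc) => doc._id);
console.log(hits); // ["a"]
```

A CouchDB-specific or Couchbase-specific wrapper would then only need to decide how documents get into and out of such a structure, which is the split the post describes.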
This means that the majority of the development I do for Couchbase will also improve the GeoCouch you can use for CouchDB.
The future of all three, Apache CouchDB, Couchbase and GeoCouch looks bright.
What's the future of CouchDB? It's Couchbase.
Huh? So what about Apache CouchDB? Well, that's a great project. I founded it, coded the earliest versions almost completely myself, I've spent a huge amount of blood, sweat and tears on it. I'm very proud of it and the impact it's had. And now I, and the Couchbase team, are mostly moving on. It's not that we think CouchDB isn't awesome. It's that we are creating the successor to it: Couchbase Server. A product and project with similar capabilities and goals, but faster, more scalable, more customer and developer focused. And definitely not part of Apache.
With Apache CouchDB, much of the focus has been around creating a consensus based, developer community that helps govern and move the project forward. Apache has done, and is doing a good job of that. But for us, it's no longer enough. CouchDB was something I created because I thought an easy to use, peer based, replicating document store was something the world would find useful. And it proved a lot of the ideas were possible and useful and it's been successful beyond my wildest ambitions. But if I had it all to do again, I'd do many things different.
If it sounds like I'm saying Apache was a mistake, I'm not. Apache was a big part in the success of CouchDB, without it CouchDB would not have enjoyed the early success it did. But in my opinion it's reached a point where the consensus based approach has limited the competitiveness of the project. It's not personal, it's business.
And now, as it turns out, I have a chance to do it all again, without the pain of starting from scratch. Building on the previous Apache CouchDB and Membase projects, throwing out what didn't work, and strengthening what does, and advancing great technologies to make something that is developer friendly, high performance, designed for mission critical deployment and mobile integration, and can move faster and more responsively to users and customers needs than a community based project.
Apache CouchDB, as project and community, is in fine shape. And many of us at Couchbase are still contributing back to it. But the future, the one I'm pushing forward on, is Couchbase Server.
And what is my part in building Couchbase? Right now I'm focusing on getting Couchbase 2.0 ready for serious production use. I'm once again an engineer and coder, back in the trenches, designing and writing code, reviewing code and designs, helping other engineers and solving tough problems. And I'm dead serious about making it the easiest, fastest and most reliable NoSQL database. Easy for developers to use, easy to deploy, reliable on single machines or large clusters, and fast as hell. We are building something you can put your mission critical, customer facing business data on, and not feel like you're running a dirty hack.
Soon, to work more closely with the team (and get rid of my nasty Oakland commute), I'll be relocating my family to the Mountain View area. Shit just got real!
And I'm really excited about the work we've got in the pipeline. We are moving more and more of the core database to C/C++, while still using many of the concurrency and reliability design principles we've proven with the Erlang codebase. And Erlang is still going to be part of the product as well, particularly with cluster management, but most of the performance sensitive portions will be moving over to C code. Erlang is still a great language, but when you need top performance and low level control, C is hard to beat.
Anyway, there's so much to talk about, too much for one blog post. One of my New Year's resolutions is to blog more, and I've got a ton of interesting things to talk about. The trials and tribulations of building a startup and an engineering culture. What's wrong (and right) with Erlang. Bringing forth UnQL. TouchDB for Mobile. And yes, we'll still interoperate with Apache CouchDB and Memcached. But the future is Couchbase.
Ride with me.
Edit
As J. Chris Anderson notes in the comments, Couchbase is completely open source and Apache licensed:
Everything Couchbase does is open source, we have 2 github pages that are very active: https://github.com/couchbaselabs
Probably the most fun place to jump into development is the code review: http://review.couchbase.org/
Let me clarify, if you like Apache CouchDB, stick with it. I'm working on something I think you'll like a lot better. If not, well, there's still Apache CouchDB.
Update, 2011-12-21: Couchbase posted their review of 2011 (the other day) — TL;DR: Couchbase Single Server (their Apache CouchDB distribution) is discontinued and its documentation (and its buildtools) will be contributed to Apache CouchDB.
When Ubuntu1 dropped CouchDB two weeks ago, there were a couple of things which annoy (present tense) me a lot. Add to that the general echo from various media outlets and blogs which pronounced CouchDB dead, and a general misconception about how this situation, or CouchDB in general, is dealt with.
Some people said I am caremad about CouchDB and that is probably true. Let me try to work through these things without offending more people.
What annoy[ed,s] me about this situation is that I wrote a chapter about Ubuntu1 in my CouchDB book. And while I realize that as soon as a book is published the information is outdated, I also want to say that I could have used the space for another project.
I talked to a couple of people about CouchDB at Ubuntu1 on IRC and no one made it sound like they are having huge or for that matter any issues.
Of course I work for neither Canonical nor Couchbase. I haven't signed any NDAs etc., but looking back a week or two, my well-educated guess is that not even the people at Couchbase knew there were fundamental issues with CouchDB and Ubuntu1.
The NDA-part is of course an assumption: don't quote me on it.
Scumbag Ubuntu1 drops CouchDB and doesn't say why. — myself on Twitter
First off: I'm not really sorry. I was abusing a meme and if you read my Twitter bio, you should not take things personally.
I also should have known better since it's not like I expect anything transparent from Canonical. (Just said it.)
When people are compelled to write a press release and put it out like that, they should expect a backlash. The reason why I reacted harshly is that Canonical didn't share any valuable information on why they discontinued using CouchDB except for: it doesn't scale.
And I'm not aware of anything concrete to date.
Please take a look at the following email: https://lists.launchpad.net/u1db-discuss/msg00043.html
This email contains a lot of criticism. And it's all valid as well.
Other examples:
These are great emails because they contain extremely valuable feedback.
In my (humble) opinion, these kinds of emails are exactly what is necessary in CouchDB-land, and in many other open source projects: criticism and a little time to reflect on not so awesome features. And then moving on to make it better. If the feedback cycle doesn't happen, there's no development or evolution, just stagnation.
And in retrospect I wish more people would share their opinion on CouchDB and this situation more often. Since I'm personally invested in CouchDB, it's hard to say certain things. Honesty is sometimes brutal, but it's necessary.
In summary, a CouchDB user like Ubuntu1 (or Canonical) doesn't have the civic duty to give feedback, but to desert a project while pretending to be an Open Source vendor, and not talking to the community of the project or sharing your issues in public, that is extremely unhelpful.
Overall it strikes me that the only thing to date known about Canonical's collaboration with CouchDB is the support for OAuth in CouchDB. And most people don't even know about that (or wouldn't know how to use it). It worries me personally to not know the kind of problems Canonical ran into because they seem so messed up that they couldn't be discussed in public.
One thing I was able to extract is: CouchDB doesn't scale.
Thanks! But no thanks.
I wrote a book on CouchDB and I pretty much used it all, or at least looked at it very, very closely. I also get plenty of experience with CouchDB due to my job. Indeed, there are many situations where CouchDB doesn't scale or where it becomes extremely hard to make it scale. Situations where the user is better off putting data somewhere else.
I (and I'm assuming others) enjoy learning the reasons why things break, so we can take this experience and use it going forward. If this doesn't happen we might as well all subscribe to the koolaid of a closed source vendor and purchase update subscriptions, install security packs and happily live ever after.
Another piece of information I gathered from the various emails written is that Canonical maintained CouchDB-specific patches for Ubuntu1. However, it's unknown what the purpose of these patches was. For example, whether these patches made CouchDB scale (magically) for Ubuntu1 or whether the patchset added a new feature.
What I'd really like to know is why these patches were not discussed in the open and why no one worked with the project on incorporating them into upstream. The upstream is the Apache CouchDB project.
This is another example of where communication went horribly wrong or just didn't happen.
I'm a little torn here and I don't want to offend anyone (further) especially since I know a couple Couchbase'rs or original CouchOne'rs (Hello to Jan, JChris and Mikeal) in person, but seriously: a lot of people realized that CouchOne stopped being The CouchDB company a long time ago.
This is not to say that the CouchDB project members who are employed by CouchOne/Couchbase are not dedicated to CouchDB. But if I take a look at the mobile strategy and the more or less recent integration of CouchDB with Membase/Memcache, I must notice that these strategies are far away from Apache CouchDB. Big data (whatever that means), to mobile and back.
The conclusion is that the majority of work done will not be merged into Apache CouchDB and this is one of the reasons why the Apache CouchDB project hasn't evolved much in a long time.
I realize that when a company has a different strategy, not everything they do can be sent upstream. After all, most if not all companies operate in a world where money is to be made and goals are to be met. Nothing wrong there.
But let's take a look at the one project which could have been dedicated to Apache CouchDB: the documentation project.
CouchOne hired an ex-MySQL'er to write really great documentation for CouchDB. The documentation made sense, it was up to date with releases, contained lots of examples and what not. But it was never contributed to the open source project. The documentation is still online today, though it's now the documentation of the Couchbase Server HTTP API.
So in my opinion the biggest news is not that Canonical stopped using CouchDB and it's also not outrageous to think that there can be one CouchDB company. The biggest news is that Couchbase officially said: "It's not us!".
Having said that, and also not knowing much about Canonical's setup and scale, I still fail to even remotely understand why they didn't work with Cloudant, who have specialized in making CouchDB scale all along.
Of course it is unfair to single them (Couchbase employees) out like that. For the record, there are pretty vivid projects such as GeoCouch which are also funded by Couchbase and while being devoted to the project, these guys also have to meet goals for their company.
Add to that, that other CouchDB contributors involved have not driven substantial user-facing changes in Apache CouchDB either. CouchDB is still a very technical project with a lot of technical issues to solve. The upside to this situation is that while other NoSQL vendors add new buzzwords to each and every CHANGELOG, CouchDB is very conservative and stability driven. I appreciate that a lot.
User-facing changes on the other side are just as important for the health of a project. Subtle changes aside, but today's talks on for example querying CouchDB are extremely similar to those talks given a year or two ago. Whatever happens in this regard is not visible to users at all.
Take URL rewriting, virtualhosts and range queries as examples for features. I question:
Users need to have the ability to grasp what's on the roadmap for CouchDB. There needs to be a way for not so technical users to provide feedback which is actually incorporated into the project. All of these things aside from opening issues in a monster like Jira.
Since no one bothers currently, this is not going to happen soon.
Pretty candid stuff.
In terms of marketing and with a lack of an official CouchDB company, the CouchDB project has taken a PostgreSQL-attitude in the last two years.
In a nutshell:
We don't give a damn if you don't realize that our database is better than this other database.
This is a little dangerous for the project itself because when I look at the cash other NoSQL vendors pour into marketing for their NoSQL database, I quickly realize that, with the lack of support, this project can go away pretty soon.
CouchDB being an Apache project doesn't save me or anyone either: clean intellectual property, deserted, for forever.
The various larger companies (let's say Cloudant and Meebo) are basically occupied with their own forks, with maybe too little reason to merge anything back upstream yet. There are independent contributors such as Enki Multimedia who contribute to core but also to subprojects like CouchApp.
And then, there's Couchbase, which is trying to tie CouchDB behind Memcached. And from what I can tell, it pretty much abandons HTTP and other slower CouchDB principles in the process.
You saw it coming: it depends!
Dear Jan, I'm still thinking about the email you wrote while I write my own blog entry. And honestly, that email and the general response raised more questions for myself and others than it answered.
I'd like to emphasize a difference I see (thanks, Lukas):
Is the core of Apache CouchDB alive? — It's not dead.
There is a lot of innovation going on in CouchDB's ecosystem.
Most notable, the following projects come to mind:
Need more? Check out CouchDB in the wild which I think is more or less up to date.
Hate it or love it, there is plenty of innovation going on. And many (if not all) CouchDB committers are a part of it.
The innovation just doesn't happen in CouchDB's core.
My closing words are that I don't plan on migrating anywhere else. If anything, we have mostly migrated to BigCouch.
For Apache CouchDB, I think it's important that someone fills that void. That can be either a company, a BDFL or more engaging project leaders (plural). I think this is required so the project continues vividly.
Because I would really like to see the project survive.
by Till Klampaeckel (till@php.net) at December 21, 2011 04:15 PM
If you've ever wondered whether there's a tool that will automatically compress JavaScript files when you're pushing your CouchApp somewhere, the answer is that there exists at least one: couchapp-compress.
Basically, couchapp-compress is a small Ruby script that wraps the couchapp command line tool. It compresses a CouchApp's JavaScript files and puts them altogether into one file, and temporarily changes the CouchApp so that instead of all the single uncompressed JavaScript files the compressed one is used. It pushes the CouchApp and then restores the previous state, so that again everything looks like before couchapp-compress was executed.
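The swap, push and restore flow described above can be sketched in a few lines. This is only an illustration in JavaScript: the real tool is a Ruby script wrapping the couchapp command, and everything here (the in-memory `app` object, `pushCompressed`, the `all.min.js` name, the whitespace-only "minifier") is a hypothetical stand-in rather than couchapp-compress's actual behavior.

```javascript
// Hypothetical sketch of the flow; the real tool works on files and shells out to couchapp.
function pushCompressed(app, minify, push) {
  const original = app.scripts; // remember the uncompressed files
  const bundle = Object.keys(original)
    .sort()
    .map((name) => minify(original[name]))
    .join("\n");
  app.scripts = { "all.min.js": bundle }; // temporarily swap in the single compressed file
  try {
    push(app);
  } finally {
    app.scripts = original; // restore the previous state, even if the push fails
  }
}

// Toy stand-ins: a whitespace squeezer (not a real minifier) and a push that just records.
const squeeze = (src) => src.replace(/\s+/g, " ").trim();
const pushed = [];
const app = { scripts: { "a.js": "var a = 1;\n", "b.js": "var  b = 2;\n" } };

pushCompressed(app, squeeze, (a) => pushed.push({ ...a.scripts }));

console.log(pushed[0]);                // { "all.min.js": "var a = 1;\nvar b = 2;" }
console.log(Object.keys(app.scripts)); // ["a.js", "b.js"]
```

The `finally` block is the part worth copying: the working copy should look untouched afterwards regardless of how the push went.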
Check out the README for more details if you're curious and want to give it a try.
parse() method:
Invitees = Backbone.Collection.extend({
model: Models.Invitee,
url: function( models ) {
return '/invitees?' + ( models ? 'ids=' + _.pluck( models, 'id' ).join(',') : '' );
},
parse: function(response) {
debugger;
return _(response.rows).map(function(row) { return row.doc; });
}
});
There is nothing actually wrong with that parse statement. It converts the results of a CouchDB query into a list of attributes that can be used to build individual Invitee models. As can be seen from the parse() method, the CouchDB response contains a "rows" attribute with the actual invitee documents / attributes. The "rows" in that response contain several attributes. The only one that is needed to create an Invitee Backbone model is the "doc" attribute, which contains the entire JSON representation of a Person/Invitee:
{
"_id": "6bb06bde80925d1a058448ac4d006f6e",
"_rev": "3-231654f6914afe2e20eb57a41ec8497a",
"firstName": "Black",
"lastName": "Francis",
"type": "Person"
}
Checking one of the objects in the response in the debugger, I find that, indeed, the person is included in the list of results, so the parse method should work fine. And it does. The problem is that the existing models in the collection look like:
attributes: Object
  id: "6bb06bde80925d1a058448ac4d006f6e"
That is just the model placeholder until the real thing can be fetched from the server. But, when the response is fetched from CouchDB, it comes back as:
doc: Object
  _id: "6bb06bde80925d1a058448ac4d006f6e"
  _rev: "3-231654f6914afe2e20eb57a41ec8497a"
  firstName: "Black"
  lastName: "Francis"
  type: "Person"
Do you see the problem there? I did not. Not for quite some time. The problem is that the existing placeholder has an "id" attribute, but the replacement from CouchDB has an "_id" attribute. CouchDB puts an underscore in front of the ID to indicate that it is metadata. Far be it from me to argue the wisdom of doing so, but it sure screws things up for me here.
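The "id" versus "_id" mismatch can be reproduced without Backbone at all. The sketch below assumes a response shaped like the one in the debugger (trimmed to one row) and uses a hypothetical helper, `toAttributes`, that is not part of the application. Backbone models also document an `idAttribute` option for this kind of naming difference; whether it would fit this placeholder setup is not something tested here.

```javascript
// Trimmed-down stand-in for a CouchDB response fetched with include_docs=true.
const response = {
  rows: [
    {
      id: "6bb06bde80925d1a058448ac4d006f6e",
      doc: {
        _id: "6bb06bde80925d1a058448ac4d006f6e",
        firstName: "Black",
        lastName: "Francis",
        type: "Person",
      },
    },
  ],
};

// The placeholder already sitting in the collection only knows "id".
const placeholder = { id: "6bb06bde80925d1a058448ac4d006f6e" };

// Returning row.doc untouched: nothing in the result carries an "id" to match on.
const rawDocs = response.rows.map((row) => row.doc);
const matchedRaw = rawDocs.some((doc) => doc.id === placeholder.id);

// Hypothetical helper: copy "_id" to "id" so the lookup can succeed.
function toAttributes(res) {
  return res.rows.map((row) => ({ ...row.doc, id: row.doc._id }));
}
const matchedFixed = toAttributes(response).some((doc) => doc.id === placeholder.id);

console.log(matchedRaw, matchedFixed); // false true
```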
In parse(), I copy the "_id" attribute to "id":
Invitees = Backbone.Collection.extend({
// ...
parse: function(response) {
return _(response.rows).map(function(row) {
var doc = row.doc;
doc['id'] = doc['_id'];
return doc;
});
}
});
And with that, I have no more phantom invitees.
by Chris Strom (noreply@blogger.com) at November 11, 2011 05:31 AM
{
"_id": "6bb06bde80925d1a058448ac4d004fb9",
"_rev": "2-7fb2e6109fa93284c19696dc89753102",
"title": "Test #7",
"description": "asdf",
"startDate": "2011-11-17",
"invitees": [
"6bb06bde80925d1a058448ac4d0062d6",
"6bb06bde80925d1a058448ac4d006758",
"6bb06bde80925d1a058448ac4d006f6e"
]
}
And, to get that invitees attribute loading actual Invitee models, I had to add a relations attribute to my Appointment relational-model:
Appointment = Backbone.RelationalModel.extend({
// ...
relations: [
{
type: Backbone.HasMany,
key: 'invitees',
relatedModel: 'Invitee',
collectionType: 'Invitees'
}
],
// ...
});
At this point, I can load the invitees in the JavaScript console, but they are no longer showing up in the appointment dialog. For that, I need to replace the loadInvitees method call from my pre-backbone-relational days with the fetchRelated() method from backbone-relational:
Appointment = Backbone.RelationalModel.extend({
// ...
initialize: function(attributes) {
if (!this.id)
this.id = attributes['_id'];
this.fetchRelated("invitees");
// this.loadInvitees();
},
// ...
});
With that change, I have an invitees collection in the "invitees" attribute of my model. To make use of that, I pass said collection to the collection view (but only if people have been invited):
var AppointmentEdit = new (Backbone.View.extend({
// ...
showInvitees: function() {
$('.invitees', this.el).remove();
$('#edit-dialog').append('<div class="invitees"></div>');
if (this.model.get("invitees").length == 0) return this;
var view = new Invitees({collection: this.model.get("invitees")});
$('.invitees').append(view.render().el);
return this;
}
}));
Amazingly, that works! It turns out that I was not that far away from success last night after all.
fetch() model that I wrote a few nights back. Specifically, the collection overrides fetch to individually retrieve the models specified by the list of IDs, manually triggering the "reset" event when complete.
app.get('/invitees', function(req, res){
var options = {
method: 'POST',
host: 'localhost',
port: 5984,
path: '/calendar/_all_docs?include_docs=true'
};
var couch_req = http.request(options, function(couch_response) {
console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
couch_response.pipe(res);
}).on('error', function(e) { /* ... */ });
var ids = req.param('ids').split(/,/);
couch_req.write(JSON.stringify({"keys":ids}));
couch_req.end();
});
Admittedly, that is somewhat exotic, but it works. Now, I need to be able to GET the /appointments resource with a query parameter of a comma separated list of strings.
url() method of my collection (url can be either property or method in Backbone):
Invitees = Backbone.Collection.extend({
model: Models.Invitee,
url: function( models ) {
return '/invitees' + ( models ? '?ids=' + _.pluck( models, 'id' ).join(',') : '' );
},
parse: function(response) {
return _(response.rows).map(function(row) { return row.doc;});
}
});
(I also have to parse() the results coming back from CouchDB to ensure that they map into an array of Model attributes).
by Chris Strom (noreply@blogger.com) at November 10, 2011 05:08 AM
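The two halves of the /invitees round trip in the post above (the collection's url() on the client and the keys body sent on to CouchDB's _all_docs) can be checked in isolation. The helper names `inviteesUrl` and `allDocsBody` are made up for this sketch, and the empty-list fallback when no ids are supplied is an assumption, not something the original route does.

```javascript
// Client side: mirrors the collection's url() without needing Underscore.
function inviteesUrl(models) {
  return "/invitees" + (models ? "?ids=" + models.map((m) => m.id).join(",") : "");
}

// Server side: turn the comma separated "ids" query parameter into the JSON body
// that is POSTed to /calendar/_all_docs?include_docs=true.
// Assumption: a missing parameter becomes an empty key list instead of throwing.
function allDocsBody(idsParam) {
  const ids = idsParam ? idsParam.split(",") : [];
  return JSON.stringify({ keys: ids });
}

const url = inviteesUrl([
  { id: "6bb06bde80925d1a058448ac4d0062d6" },
  { id: "6bb06bde80925d1a058448ac4d006758" },
]);
console.log(url);
// /invitees?ids=6bb06bde80925d1a058448ac4d0062d6,6bb06bde80925d1a058448ac4d006758

console.log(allDocsBody("a,b")); // {"keys":["a","b"]}
```

Note that CouchDB document IDs like the ones above are plain hex strings; IDs containing commas or other reserved characters would need encoding, which this sketch does not attempt.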
draw_calendar(), which takes an ISO 8601 string that can draw the basic calendar (without the Backbone application adding appointments):
function draw_calendar(year_and_month) {
$('.year-and-month', 'h1').html(' (' + year_and_month + ') ');
reset_calendar();
add_dates_to_calendar(year_and_month);
};
The first line in there now displays the calendar date:
draw_calendar('2011-09');
Then I am treated to a blank September calendar:
startDate:
2011-10:
➜ calendar git:(pagination) curl http://localhost:5984/calendar/_design/appointments/_view/by_month\?key\='"2011-10"'
Yay!
{"total_rows":18,"offset":12,"rows":[
{"id":"4a5600f5b2e36fc99d24fe9b8700037d",
"key":"2011-10",
"value":{
"_id":"4a5600f5b2e36fc99d24fe9b8700037d",
"_rev":"5-181418fffeacdc08b5fdda91525978b6",
"title":"Update dialog errors",
"description":"asdf1",
"startDate":"2011-10-05"}},
{"id":"8b5c80c0211068428272af4784000451",
"key":"2011-10",
"value":{
"_id":"8b5c80c0211068428272af4784000451",
"_rev":"1-87f5be84687eada2cd178c8dab7aa34c",
"title":"Finish Beta",
"description":"Book important",
"startDate":"2011-10-31"}},
{"id":"8b5c80c0211068428272af478400f7bb",
"key":"2011-10",
"value":{
"_id":"8b5c80c0211068428272af478400f7bb",
"_rev":"2-b1ce8c6c815a11f7b78b7db412988451",
"title":"Go to bed early",
"description":"asdf",
"startDate":"2011-10-22"}},
{"id":"8b5c80c0211068428272af478400fc93",
"key":"2011-10",
"value":{
"_id":"8b5c80c0211068428272af478400fc93",
"_rev":"1-b19359520433df557ad2fa0d56165f24",
"title":"Validations",
"description":"description should be required",
"startDate":"2011-10-03"}},
{"id":"956a5c19fd866a6a024bbb4c39002e3b",
"key":"2011-10",
"value":{
"_id":"956a5c19fd866a6a024bbb4c39002e3b",
"_rev":"2-0bbb8660f7f116cc813e0fd9093cec6a",
"title":"Has Description",
"description":"asdf",
"startDate":"2011-10-13"}},
{"id":"956a5c19fd866a6a024bbb4c390031a2",
"key":"2011-10",
"value":{
"_id":"956a5c19fd866a6a024bbb4c390031a2",
"_rev":"2-0568aded9df520a0c1c8bb2ee8961156",
"title":"In-dialog errors","description":"asdf",
"startDate":"2011-10-04"}}
]}
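For reference, rows shaped like the ones above could come from a map function along these lines. This is only a sketch: the by_month map function itself is not shown here, so the body below (keying on the first seven characters of startDate and emitting the whole document as the value) is an assumption inferred from the output. The name byMonthMap is made up, and the emit() stub exists only so that the snippet runs outside of CouchDB.

```javascript
// Hypothetical map function for the appointments/by_month view.
// Inside a design document this would be an anonymous function(doc) {...}.
// Assumption: startDate is an ISO 8601 date string such as "2011-10-05".
function byMonthMap(doc) {
  if (doc.startDate) {
    // Key is the "YYYY-MM" prefix; value is the document itself,
    // matching the "value" attribute in the rows above.
    emit(doc.startDate.substring(0, 7), doc);
  }
}

// Stand-in for CouchDB's emit() so the sketch can be run in node.js.
var emitted = [];
function emit(key, value) {
  emitted.push({key: key, value: value});
}

byMonthMap({_id: 'a', title: 'Finish Beta', startDate: '2011-10-31'});
byMonthMap({_id: 'b', title: 'No start date'});
console.log(emitted);
```

Documents without a startDate are skipped, so they never show up under any month key.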
_all_doc resource it had been using.

Appointments Backbone collection how to interact with this new resource requires almost no changes at all. Only the parse() method needs to be updated to return appointments from the value attribute of my query:
var Collections = (function() {
var Appointments = Backbone.Collection.extend({
model: Models.Appointment,
url: '/appointments',
parse: function(response) {
return _(response.rows).map(function(row) { return row.value ;});
}
});
return {Appointments: Appointments};
})();
To initialize the collection, I now need to pass the date query parameter to the Appointments URL. This is done by passing a data option to the collection's fetch() method (just like with jQuery's ajax() call):
// ...
// Initialize the app
var appointments = new Collections.Appointments();
new Views.Application({collection: appointments});
var today = new Date(),
year = today.getFullYear(),
month = today.getMonth() + 1,
year_and_month = year + '-' + pad(month);
appointments.fetch({data: {date: year_and_month}});
With that working, all that remains is the ability to switch months.

september to last month's date, then draw_calendar(september) (to get the calendar itself drawn correctly). Lastly, I tell the collection to load in September's appointments via a calendar.appointments.fetch({data: {date: september}}):

by Chris Strom (noreply@blogger.com) at October 09, 2011 04:01 AM
Got response: 400 localhost:5984/calendar/undefined
{ error: 'bad_request', reason: 'Invalid rev format' }
Hrm... Checking the CouchDB logs, I see that I have at least got the PUT part of the update working:
[info] 127.0.0.1 - - 'PUT' /calendar/undefined 400
The PUT is correct, but the "undefined" is a clear indication that whatever I am updating does not have an "id" attribute.

Backbone.sync() is not sending the JSON representation of the model, but rather it is sending the model itself:
var faye = new Faye.Client('/faye');
Backbone.sync = function(method, model, options) {
faye.publish("/calendars/" + method, model);
}
So adding a toJSON() ought to fix the problem:
var faye = new Faye.Client('/faye');
Backbone.sync = function(method, model, options) {
if (model.toJSON) model = model.toJSON();
faye.publish("/calendars/" + method, model);
}
Except it has no effect whatsoever. And really, it should not have an effect. If I am always sending a Backbone model object to my overridden Backbone.sync() method, then something must already be calling toJSON() on my models. Since Backbone.sync() is sending directly to faye, it must be faye that is calling toJSON(). And in fact it is. Looking through the faye source, for each of the various transports, I see Faye.toJSON:
Faye.Transport.WebSocket = Faye.extend(Faye.Class(Faye.Transport, {
// ...
request: function(messages, timeout) {
this._timeout = this._timeout || timeout;
this._messages = this._messages || {};
Faye.each(messages, function(message) {
this._messages[message.id] = message;
}, this);
this.withSocket(function(socket) { socket.send(Faye.toJSON(messages)) });
},
// ...
});
Ah, so I am lucky that Faye has that toJSON() wrapper. That or faye is just a fabulous choice for a Backbone transport. At the very least, it is not the cause of my woes.

"_id" or "_rev" attributes needed on the server:
client.subscribe('/calendars/update', function(message) {
// HTTP request options
var options = {
method: 'PUT',
host: 'localhost',
port: 5984,
path: '/calendar/' + message._id,
headers: {
'content-type': 'application/json',
'if-match': message._rev
}
};
// ...
});
So I finally take the advice of my Recipes with Backbone co-author and put the mapping of "_id" and "_rev" attributes into the Backbone.sync() method:
Backbone.sync = function(method, model, options) {
var message = model.toJSON();
if (!message._id && message.id) message._id = message.id
if (!message._rev && message.rev) message._rev = message.rev
faye.publish("/calendars/" + method, message);
}
I assign an intermediate message variable so that setting attributes does not affect the real model.

Got response: 201 localhost:5984/calendar/8b5c80c0211068428272af478400df1e
{ ok: true,
id: '8b5c80c0211068428272af478400df1e',
rev: '4-2929cf0e4933a4e32474145eb3e79f02' }
That is a fine stopping point for tonight. I think that I might have resolved all of my faye transport issues (and issues that I noticed after better testing with faye than in the original). I will do some more monkey testing tomorrow and then possibly move on to other areas to explore.
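The "_id" / "_rev" mapping from Backbone.sync() above could also be pulled out into a small, separately testable helper. A minimal sketch, assuming plain attribute objects; the name toCouchMessage is hypothetical and not part of Backbone or faye:

```javascript
// Hypothetical helper: copy id/rev onto the underscore-prefixed names that
// CouchDB expects, without modifying the object that was passed in.
function toCouchMessage(attributes) {
  var message = {};
  for (var key in attributes) {
    if (Object.prototype.hasOwnProperty.call(attributes, key)) {
      message[key] = attributes[key];
    }
  }
  if (!message._id && message.id) message._id = message.id;
  if (!message._rev && message.rev) message._rev = message.rev;
  return message;
}

// Attributes as they look right after a create (no underscores yet).
var created = {id: '8b5c', rev: '1-abc', title: 'Finish Beta'};
var message = toCouchMessage(created);
console.log(message._id, message._rev);
```

Because the helper works on a copy, the original attributes never pick up the underscore-prefixed keys.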
by Chris Strom (noreply@blogger.com) at October 04, 2011 04:09 AM
Backbone.sync(), in my Backbone application, I already have delete requests being sent to the /calendars/delete faye channel:
var faye = new Faye.Client('/faye');
Backbone.sync = function(method, model, options) {
faye.publish("/calendars/" + method, model);
}
Thanks to simple client logging:
_(['create', 'update', 'delete', 'read', 'changes']).each(function(method) {
faye.subscribe('/calendars/' + method, function(message) {
console.log('[/calendars/' + method + ']');
console.log(message);
});
});
...I can already see that, indeed, deletes are being published as expected:

/calendars/delete faye channel in my express.js server. I already have this working for adding appointments, so I can adapt the overall structure of the /calendars/add listener for /calendars/delete:
client.subscribe('/calendars/delete', function(message) {
// HTTP request options
var options = {...};
// The request object
var req = http.request(options, function(response) {...});
// Rudimentary connection error handling
req.on('error', function(e) {...});
// Send the request
req.end();
});
The HTTP options for the DELETE request are standard node.js HTTP request parameters:
// HTTP request options
var options = {
method: 'DELETE',
host: 'localhost',
port: 5984,
path: '/calendar/' + message._id,
headers: {
'content-type': 'application/json',
'if-match': message._rev
}
};
Experience has taught me that I need to send the CouchDB revisions along with operations on existing records. The if-match HTTP header ought to work. The _rev attribute on the record/message sent to /calendars/delete holds the latest revision that the client holds. Similarly, I can delete the correct object from CouchDB by specifying the object ID in the path attribute, with the ID coming from the _id attribute of the record/message.

/calendars/remove channel.

http.request(). This accumulator callback accumulates chunks of the reply into a local data variable. When all of the data has been received, the response is parsed as JSON (of course it's JSON; this is a CouchDB data store), and the JSON object is sent back to the client:
var req = http.request(options, function(response) {
console.log("Got response: %s %s:%d%s", response.statusCode, options.host, options.port, options.path);
// Accumulate the response and publish when done
var data = '';
response.on('data', function(chunk) { data += chunk; });
response.on('end', function() {
var couch_response = JSON.parse(data);
console.log(couch_response)
client.publish('/calendars/remove', couch_response);
});
});
Before hooking a Backbone subscription to this /calendars/remove channel, I supply a simple logging tracer bullet:
faye.subscribe('/calendars/remove', function(message) {
console.log('[/calendars/remove]');
console.log(message);
});
So, if I have done this correctly, clicking the delete icon in the calendar UI should send a message on the /calendars/delete channel (which we have already seen working above). The faye subscription on the server should be able to use this to remove the object from the CouchDB database. Finally, this should result in another message being broadcast on the /calendars/remove channel. This /calendars/remove message should be logged in the browser. So let's see what actually happens...

remove() method of the collection?
faye.subscribe('/calendars/remove', function(message) {
console.log('[/calendars/remove]');
console.log(message);
calendar.appointments.remove(message);
});
Well, no. It is not that simple. That has no effect on the page. But wait...

remove() method looks up models to be deleted via get():
_remove : function(model, options) {
options || (options = {});
model = this.getByCid(model) || this.get(model);
// ...
}
The get() method looks up models by the id attribute:
// Get a model from the set by id.
get : function(id) {
if (id == null) return null;
return this._byId[id.id != null ? id.id : id];
},
The id attribute was set on the message/record received on the /calendars/remove channel:
var Appointment = Backbone.View.extend({
initialize: function(options) {
// ....
options.model.bind('destroy', this.remove, this);
options.model.bind('error', this.deleteError, this);
options.model.bind('change', this.render, this);
},
// ...
});
It turns out that the destroy event is only emitted if the default Backbone.sync is in place. If the XHR DELETE is successful, a success callback is fired that emits destroy. Since I have replaced Backbone.sync, that event never fires.

remove event. So all I need to change in order to make this work is to replace destroy with remove:
var Appointment = Backbone.View.extend({
initialize: function(options) {
// ....
options.model.bind('remove', this.remove, this);
options.model.bind('error', this.deleteError, this);
options.model.bind('change', this.render, this);
},
// ...
});
And it works! I can now remove all of those fake appointments:

by Chris Strom (noreply@blogger.com) at October 02, 2011 09:19 PM
sync() method in my Backbone.js model to achieve an added layer of persistence. In addition to the normal REST persistence, my model also persists newly created appointments on a Faye pub-sub channel:
var Appointment = Backbone.Model.extend({
urlRoot : '/appointments',
initialize: function(attributes) {
// ...
this.faye = new Faye.Client('/faye');
},
// ...
sync: function(method, model, options) {
if (method == "create") {
this.faye.publish("/calendars/public", model);
}
Backbone.sync.call(this, method, this, options);
}
});
As my esteemed Recipes with Backbone co-author pointed out yesterday, it might make sense to switch entirely to Faye for persistence. It is hard for me to wrap my brain around all of the implications for such a change. At the very least, it is going to break my tests, which stub out XHR REST calls (via sinon.js). That aside, will it clean up my backend code?

Backbone.sync() to send any sync requests for create, update, delete or read to a faye channel named accordingly:
var faye = new Faye.Client('/faye');
Backbone.sync = function(method, model, options) {
faye.publish("/calendars/" + method, model);
}
// Simple logging of Backbone sync messages
_(['create', 'update', 'delete', 'read']).each(function(method) {
faye.subscribe('/calendars/' + method, function(message) {
console.log('[/calendars/' + method + ']');
console.log(message);
});
});
With that, when I reload my funky calendar Backbone application, I see an empty calendar:

fetch() of the collection:
// Initialize the app
var appointments = new Collections.Appointments;
new Views.Application({collection: appointments});
appointments.fetch();
It can be argued that I should not be fetching here, which requires a round trip to the server. The Backbone documentation itself suggests fetching the data in the backend (node.js / express.js in my case). The data can then be interpolated into the page as a Backbone reset() call. Personally, I prefer serving up a static file that shows something almost immediately, followed by quick requests to populate the page with useful, actionable information.

/calendars/read channel. Doing faye things on the server is relatively easy. I already have the faye node.js adapter hooked in to my express.js application. I can then call getClient() to gain access to client actions like subscribe():
// Faye server
var bayeux = new faye.NodeAdapter({mount: '/faye', timeout: 45});
bayeux.attach(app);
// Faye clients
var client = bayeux.getClient();
client.subscribe('/calendars/read', function() {
// do awesome stuff here
});
Now, when the client receives a message on the "read" channel, I can do awesome stuff. In this case, I need to read from my CouchDB backend store:
client.subscribe('/calendars/read', function() {
// CouchDB connection options
var options = {
host: 'localhost',
port: 5984,
path: '/calendar/_all_docs?include_docs=true'
};
// Send a GET request to CouchDB
var req = http.get(options, function(couch_response) {
console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
// Accumulate the response and publish when done
var data = '';
couch_response.on('data', function(chunk) { data += chunk; });
couch_response.on('end', function() {
var all_docs = JSON.parse(data);
client.publish('/calendars/reset', all_docs);
});
});
// If anything goes wrong, log it (TODO: publish to the /errors ?)
req.on('error', function(e) {
console.log("Got error: " + e.message);
});
});
This is a bit more work than with the normal REST interface. With pure REST, I could make the request to CouchDB and pipe() the response back to the client. Backbone (more accurately jQuery) itself takes care of parsing the JSON. Here, I have to accumulate the data response from CouchDB and parse it into a JSON object to be published on a Faye channel. I could send back a JSON string, requiring the client to parse, but that feels like bad form. Faye channels can transmit actual data structures, so that is what I ought to do.

/calendars/reset channel because that is what the client will do with this information: reset the currently empty appointments collection:
window.calendar = new Cal();
faye.subscribe('/calendars/reset', function(all_docs) {
console.log('[/calendars/reset]');
console.log(all_docs);
calendar.appointments.reset(all_docs);
});
Upon reloading the page, however, I still see no appointments on the calendar. In the JavaScript console, I can see that the /calendars/read message is still going out. I also see that I am getting a response back that includes the ten appointments already scheduled for this month:

/calendars/reset channel as expected. It is the CouchDB query results as expected, but something is going wrong in the call to reset() on the appointments collection. Probably, something related to the "Uncaught ReferenceError: description is not defined" error message at the bottom of the JavaScript console.

reset() or add() need to be run through parse() first. Well, not always, just when parse() does something with the data. Something like I had to do with the CouchDB results:
var Appointments = Backbone.Collection.extend({
model: Models.Appointment,
parse: function(response) {
return _(response.rows).map(function(row) { return row.doc ;});
}
});
Anyhow, the fix is easy enough: just run the results through parse():
faye.subscribe('/calendars/reset', function(message) {
console.log('[/calendars/reset]');
console.log(message);
var all_docs = calendar.appointments.parse(message);
calendar.appointments.reset(all_docs);
});
With that, I have my calendar appointments again populating my calendar:

sync() function.

Backbone.sync() did the trick. On the minus side, I had to do a little more work to convert CouchDB responses into real JavaScript data structures. That is not a huge negative (and one that I can easily push into a helper function). Still outstanding is how this will affect the entire application. Also, I have the feeling that I could choose faye channel names better. Questions for another day...

by Chris Strom (noreply@blogger.com) at October 02, 2011 08:42 PM
/calendars/update faye channel on my backend. My debug subscription in the client verifies that the message is being published to that channel:
client.subscribe('/calendars/update', function(message) {
// HTTP request options
var options = {...};
// The request object
var req = http.request(options, function(response) {...});
// Rudimentary connection error handling
req.on('error', function(e) {...});
// Write the PUT body and send the request
req.write(JSON.stringify(message));
req.end();
});
The pattern is to set HTTP options, here a PUT, to update the existing record:
// HTTP request options
var options = {
method: 'PUT',
host: 'localhost',
port: 5984,
path: '/calendar/' + message._id,
headers: {
'content-type': 'application/json',
'if-match': message._rev
}
};
(the if-match is a CouchDB optimistic locking thing)

/calendars/changes channel:
// The request object
var req = http.request(options, function(response) {
console.log("Got response: %s %s:%d%s", response.statusCode, options.host, options.port, options.path);
// Accumulate the response and publish when done
var data = '';
response.on('data', function(chunk) { data += chunk; });
response.on('end', function() {
var couch_response = JSON.parse(data);
client.publish('/calendars/changes', couch_response);
});
});
Last, I send the message/record via the request object and close the request so that the CouchDB server knows that I have no more HTTP PUT data to send:
// Write the PUT body and send the request
req.write(JSON.stringify(message));
req.end();
If all goes according to plan, the messages that I already know are being sent on /calendars/update will be seen by my server-side subscription, which will tell CouchDB to update the record and finally the browser will see the update on the /calendars/changes channel.

Got response: 201 localhost:5984/calendar/66543e3457df7597f0e41764e500067c
{ ok: true,
id: '66543e3457df7597f0e41764e500067c',
rev: '3-243c4d7084fcdcb7c728917a73e94b97' }
And I even see that response back in the browser:

Got response: 409 localhost:5984/calendar/66543e3457df7597f0e41764e500067c
{ error: 'conflict',
reason: 'Document update conflict.' }
This is because the revision ID that is stored in the Backbone model is now out of date. I need to take the revision returned from the first update and ensure that the model becomes aware of it. Otherwise, CouchDB's optimistic locking kicks in, rejecting the update.

by Chris Strom (noreply@blogger.com) at October 02, 2011 08:40 PM
/calendars/update faye channel. The updated information is sent back from the server on the /calendars/changes faye channel:

rev attribute. If a subsequent update is sent with the old rev, CouchDB's optimistic locking will kick in and reject the update:
Got response: 409 localhost:5984/calendar/cd823d4aaaf358069f9a800410000b95
{ error: 'conflict',
reason: 'Document update conflict.' }
So, in my client, I subscribe to the /calendars/changes channel. In the subscription's callback, I use the revision published by the server to update the Backbone model:
faye.subscribe('/calendars/changes', function(message) {
console.log('[/calendars/changes]');
console.log(message);
var model = calendar.appointments.get(message);
model.set({rev: message.rev});
model.set({_rev: message.rev});
});
With that, I can update an appointment. And update it again... and it works:
Got response: 201 localhost:5984/calendar/cd823d4aaaf358069f9a800410000b95
{ ok: true,
id: 'cd823d4aaaf358069f9a800410000b95',
rev: '7-55b3b95b6996072aa1519b6070cf12b6' }
Got response: 201 localhost:5984/calendar/cd823d4aaaf358069f9a800410000b95
{ ok: true,
id: 'cd823d4aaaf358069f9a800410000b95',
rev: '8-a06160ea8bef50bd71d92a73db61c14b' }
Yay! Except... immediately after I see the successful update, I see:
Got response: 201 localhost:5984/calendar/cd823d4aaaf358069f9a800410000b95
{ ok: true,
id: 'cd823d4aaaf358069f9a800410000b95',
rev: '8-a06160ea8bef50bd71d92a73db61c14b' }
Got response: 409 localhost:5984/calendar/cd823d4aaaf358069f9a800410000b95
{ error: 'conflict',
reason: 'Document update conflict.' }
When I update again, I successfully update the record, which is followed immediately by two conflicts:
Got response: 201 localhost:5984/calendar/cd823d4aaaf358069f9a800410000b95
{ ok: true,
id: 'cd823d4aaaf358069f9a800410000b95',
rev: '9-c09e4a0a8bbd5cc25a9e67b19b6a254b' }
Got response: 409 localhost:5984/calendar/cd823d4aaaf358069f9a800410000b95
{ error: 'conflict',
reason: 'Document update conflict.' }
Got response: 409 localhost:5984/calendar/cd823d4aaaf358069f9a800410000b95
{ error: 'conflict',
reason: 'Document update conflict.' }
Ah. I know what that is. And I will fix it. But first, I need to finish proofreading the recipes that taught me how to solve it.
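One related detail: the server publishes every CouchDB reply on /calendars/changes, conflicts included, and a conflict reply carries no id. A more defensive version of the subscription callback might look like the sketch below. It does not address the repeated responses seen above; it only keeps the handler from calling set() on a model that was never found. The name applyRevision is hypothetical, and the collection argument is assumed to be anything with a Backbone-style get(id).

```javascript
// Hypothetical handler body for /calendars/changes messages.
// `collection` is any object with a Backbone-style get(id) method.
function applyRevision(collection, message) {
  // Conflict replies look like {error: ..., reason: ...} and carry no id.
  if (!message || message.error || !message.id) return false;

  var model = collection.get(message.id);
  if (!model) return false; // not a record that this client holds

  // Keep both spellings in step, as in the subscription above.
  model.set({rev: message.rev, _rev: message.rev});
  return true;
}
```

In the subscription this would be called as applyRevision(calendar.appointments, message).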
by Chris Strom (noreply@blogger.com) at October 02, 2011 08:37 PM
At Couchbase we are looking for experienced hackers to help us build the fastest, most reliable distributed database on the planet. You don't need to be an expert already, but you should be ready to learn the ins and outs of distributed database systems, including:
More info here: http://www.couchbase.com/company/jobs Or you can send your resume and qualifications to me here: damien@couchbase.com
>On Sep 23, 2011, at 1:40 AM, XXXX XXXXX wrote:
>
>Hi Damien,
>
>Greeting from XXXXX XXXXXX;
>
>Im running a small company with history in the mobile enterprise space
>
>We are just about to get some seed funding to build sqllite sync
>technology for mobile devices;
>
>I came across CouchBase extremely cool;
>
>We are planning to offer some of same features;
>
>Offline access
>Smart sync
>Bandwidth optimisation
>
>It would be good to get any advice or pointers you might have in
>terms of building sync technology for mobile
>
>All the best,
>
>XXXX XXXXX,
Hello! I would say that mobile sync is a deceptively hard problem to get all the nice properties you want. I suggest you look at how Couchbase replication works and try to duplicate it, and ideally, try to interoperate with it.
Some of the properties you probably want:
Incremental replication - The ability to stop and restart replication and not lose all your progress. Vital in a mobile environment where connections are slow and flaky.
Concurrency - You want to be able to use both the local and the remote databases while they are being synced/replicated, with no global locking, so the app is usable at all times while syncing happens in the background.
Conflict management - You need a plan for how you'll deal with and manage edit conflicts.
Partial replication - Having replicas that only hold an interesting subset of other replicas. Important when sharing a large data set, but mobile clients only need a portion of it.
Ad hoc Topology - Couchbase supports ad hoc topology, any machine can sync with any other machine without prior knowledge. This is much more flexible than a single centralized sync point or fixed topology. Though many deployments will only need a single sync point, often new ones will need to be added.
Schema upgrade - Couchbase is schemaless, so it's easy to add new fields/properties without breaking things. If using a schema, it's difficult to upgrade remote clients when they have new data in older schemas, etc.
Security - The ability to refuse updates if they come from unauthorized sources.
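To make the first property concrete, here is a minimal sketch of checkpointed, incremental replication. It assumes an in-memory source with an ordered list of changes and uses made-up names (replicate, checkpoint, batchSize). It is not how Couchbase implements replication, only an illustration of resuming from a saved sequence number.

```javascript
// Sketch of incremental replication between two in-memory stores.
// All names here are illustrative; this is not Couchbase's API.
function replicate(source, target, checkpoint, batchSize) {
  // source.changes is an ordered list of {seq, id, doc} entries.
  var pending = source.changes.filter(function(change) {
    return change.seq > checkpoint;
  }).slice(0, batchSize);

  pending.forEach(function(change) {
    target.docs[change.id] = change.doc;
    checkpoint = change.seq; // progress is tracked per change
  });
  return checkpoint; // persist this so that a restart can resume here
}

var source = {changes: [
  {seq: 1, id: 'a', doc: {n: 1}},
  {seq: 2, id: 'b', doc: {n: 2}},
  {seq: 3, id: 'a', doc: {n: 3}}
]};
var target = {docs: {}};

var checkpoint = replicate(source, target, 0, 2);      // interrupted after two changes
checkpoint = replicate(source, target, checkpoint, 2); // resumes after seq 2
console.log(checkpoint, target.docs);
```

The second call only transfers the change that the first call did not reach, which is the point of keeping a checkpoint on a slow or flaky connection.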
Anyway, Couchbase and CouchDB have worked out these problems and are successful in production on millions of machines. It's not the only way to build a sync scheme, but it's one of the most successful.
-Damien
The purpose of business analytics is to find data from the company's information systems that can be used to support decision making. What customers buy most? What do they do before a buying decision? What are the signs that a customer may be leaving?
For the last month we've been working in Salzburg to build such a system, the Intelligent Project Controlling Tool needed for running large collaborative research projects like IKS. Since the design we went with can be reused for other business analytics needs, I wanted to write a bit about it.
But first, here is what our system looks like:

There are many ways to gather business data. Often the information systems already contain the data needed. But it may also be hidden in a jungle of spreadsheets. Or maybe some data is simply not available, and has to be filled in manually.
Handling all these cases in one system is a tricky question. To solve it, we went with a two-layered strategy:
In IKS's case, much of the data was available in a series of spreadsheets. With these, we built the necessary workflows for first converting the spreadsheets into XML with Apache Tika, and then extracting the information from them in a sensible subset of JSON-LD.
Because IKS is a collaborative project, information needs to be gathered from a diverse group of partner organizations. Some of them have systems that provide the needed APIs (like Basecamp, which we use), and we can just periodically import the data. But with many we decided on a simple data interchange approach: spreadsheets handled over email.
In this approach, a user files a data request into the system. This gets picked up by NoFlo, which sends an email with the appropriate spreadsheet template to the partner. Then it starts waiting for a reply. When a reply arrives, it extracts the data from the attached spreadsheet and imports it to the system.
Our NoFlo processes are mostly initiated by the CouchDB change notification API. We keep them running persistently using forever Node, so whenever some operation needs to be run it happens nearly immediately.
With any automation, and especially with the email-based data interchange, things can go wrong. Because of this we tag all data that we receive with its origin, whether it was some automated operation or an imported spreadsheet. These origins are called execution documents. Users can browse all completed workflow executions and see what data came in from them. These can then be either accepted or rejected.
This way if some partner accidentally sends faulty data, or something else breaks, the incorrect information received can be easily removed. CouchDB's versioning capabilities help here.
CouchDB is built on top of the concept of map/reduce. Here you can modify and combine the data in lots of different ways using simple JavaScript functions. In our case we elected to write all our CouchDB code in CoffeeScript for simplicity. For example, here is the reduce function in CoffeeScript that counts totals of time planned, time used, and time left per task or partner in a project:
(keys, values, rereduce) ->
  roundNumber = (rnum, rlength) ->
    Math.round(parseFloat(rnum) * Math.pow(10, rlength)) / Math.pow(10, rlength)
  data =
    planned: 0.0
    spent: 0.0
    left: 0.0
  if rereduce
    for reducedData in values
      data.planned += reducedData.planned
      data.spent += reducedData.spent
    data.left = data.planned - data.spent
    return data
  for doc in values
    if doc['@type'] is 'effortallocation'
      data.planned += roundNumber doc.value, 1
    if doc['@type'] is 'effort'
      data.spent += roundNumber doc.value, 1
  data.left = roundNumber data.planned - data.spent, 1
  return data
If you figure out a new way to look at the data you have, simply write the needed map and reduce functions and save them into the database. CouchDB will then run them against existing data and produce numbers.
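To see how the two branches of that reduce function fit together, here is a plain JavaScript port that can be run outside CouchDB. It is a sketch for illustration: the function name effortReduce and the sample documents are made up, and only the logic follows the CoffeeScript above. Two batches are reduced separately and then combined with rereduce set to true, which is how CouchDB merges partial results.

```javascript
// JavaScript port of the reduce function above (names are illustrative).
function roundNumber(rnum, rlength) {
  return Math.round(parseFloat(rnum) * Math.pow(10, rlength)) / Math.pow(10, rlength);
}

function effortReduce(keys, values, rereduce) {
  var data = {planned: 0.0, spent: 0.0, left: 0.0};
  if (rereduce) {
    // values are results of earlier reduce calls
    values.forEach(function(reduced) {
      data.planned += reduced.planned;
      data.spent += reduced.spent;
    });
    data.left = data.planned - data.spent;
    return data;
  }
  // values are the documents emitted by the map function
  values.forEach(function(doc) {
    if (doc['@type'] === 'effortallocation') data.planned += roundNumber(doc.value, 1);
    if (doc['@type'] === 'effort') data.spent += roundNumber(doc.value, 1);
  });
  data.left = roundNumber(data.planned - data.spent, 1);
  return data;
}

// Made-up sample data: two batches reduced separately, then combined.
var first = effortReduce(null, [
  {'@type': 'effortallocation', value: 10},
  {'@type': 'effort', value: 4}
], false);
var second = effortReduce(null, [
  {'@type': 'effort', value: 2.5}
], false);
var total = effortReduce(null, [first, second], true);
console.log(total);
```

The rereduce branch recomputes left from the summed planned and spent values instead of summing the partial left values.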
Numbers are good, but to really see the information buried in them you need some visualizations. For this we decided to follow the CouchApp idea where the user interface code is stored in the database together with the data itself. This way no application servers are needed, and you can take the whole system with you just by replicating the database. Think of the possibility of doing some analysis on your company while flying to a meeting!
The visuals are in our case provided by JavaScript InfoVis Toolkit, a nice, MIT-licensed interactive graph library.
CouchDB views handle the number crunching, then CouchDB list functions process the numbers into the format needed for visualization. This leaves only a minimal amount of work for the client side.
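A list function for this kind of pipeline could look roughly like the following sketch. getRow() and send() are the functions CouchDB's query server provides to list functions. Everything else is an assumption made for illustration: the function name, the sample row, and the output shape (a label array plus one values array per row), which is not taken from our actual code or from any particular charting library.

```javascript
// Sketch of a CouchDB list function that reshapes reduced rows for a chart.
// Assumption: each row value has planned, spent and left totals.
function effortChartList(head, req) {
  var row;
  var values = [];
  while ((row = getRow())) {
    values.push({
      label: String(row.key),
      values: [row.value.planned, row.value.spent, row.value.left]
    });
  }
  send(JSON.stringify({label: ['planned', 'spent', 'left'], values: values}));
}

// Stand-ins for the query server so that the sketch can run in node.js.
var rows = [{key: 'task-1', value: {planned: 10, spent: 6.5, left: 3.5}}];
var output = '';
function getRow() { return rows.shift(); }
function send(chunk) { output += chunk; }

effortChartList({}, {});
console.log(output);
```

Doing the reshaping on the server means the browser can hand the response directly to the charting code.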
For consistency our application has been built with CoffeeApp, so all the database and user interface code is in CoffeeScript.
Any business analytics system dealing with moderate amounts of data can be built following this approach.

This way you have a business analytics environment that is easy to extend with more data when it becomes available. New analysis can be done by writing reasonably simple map/reduce functions, and CouchDB's replication capabilities allow you to take the system and data with you.
Using JSON-LD for the data storage makes a lot of sense, as this way the relations between different pieces of information are easy to handle. And using URIs for data identifiers means you can easily mash up information coming from different sources together.
The two-layered approach of using NoFlo for data imports, and CouchDB for analysis also allows for clean separation of concerns. In our case, I did the workflow part of things, and Szaby built the visualizations.
by Henri Bergius (henri.bergius@iki.fi) at September 21, 2011 05:52 PM
The FOSS4G 2011 is over now. Time for a small report. The crowd was amazing and it was again the ultimate gathering of the Free and Open Source for Geospatial developer tribe. Solid presentations and great evenings.
I'm really happy with how my talk went, I really enjoyed it. There were lots of people (although there was a talk from Frank Warmerdam at the same time) asking interesting questions at the end.
The talk is not only about GeoCouch but also gives you an overview of some of the features it leverages from Apache CouchDB. In the end you should have an overview why you might want to use GeoCouch for your next project.
You can get the slides right here.
I was happy to see that there was another talk about GeoCouch. Other talks I really enjoyed were:
And of course there were also great talks in the plenary sessions: Paul Ramsey's Why do you do that? An exploration of open source business models, and Schuyler Erle's very funny lightning talk, Pivoting to Monetize Mobile Hyperlocal Social Gamification by Going Viral.
At the code sprint I was working on MapQuery together with Steven Ottens and Justin Penka. Steven was working on TMS support, Justin on a 6-minute tutorial and I on making it possible to add features manually.
The OpenLayers developers did the migration from Subversion to Git for their development. OpenLayers is now available on Github.
And luckily there was a fire alarm in between to take a group photograph.
I really hope there won't be a yearly FOSS4G conference for the whole of the US. There should be regional events, as I think one big one would draw the attention away from the international conference. Why should you fly to Beijing for the FOSS4G 2012 if you can meet the majority of the developers in the US as well?
The FOSS4G was great. It was organized well and people were always out in the evenings. The only minor nitpick is that many people working remotely had the city of their company on the name badge and not the one they live in. It seems that the original form you had to fill in was confusing. So for next year it should perhaps say “Location where you live”. Hence I still don't believe that there were more Dutch than German people at the conference (Tik hem aan, ouwe! ;)
window.Appointment = Backbone.Model.extend({
  // ...
  destroy: function() {
    Backbone.Model.prototype.destroy.call(this, {
      headers: {'If-Match': this.get("_rev")}
    });
  }
});
In fact, that works when I delete a pre-existing record. It is just newly created records that throw me for a loop. So what gives?
The response from CouchDB when a record is created looks like:
{"ok":true,"id":"7acf98778a669f4d6fc33d6b340106de","rev":"1-21662a1368aa1592d1e5d1df710f6d8c"}
But the actual data is stored as:
{
  "_id": "7acf98778a669f4d6fc33d6b340106de",
  "_rev": "1-21662a1368aa1592d1e5d1df710f6d8c",
  "title": "Delete me #2",
  "description": "asdf",
  "startDate": "2011-09-15"
}
CouchDB normally represents metadata with a leading underscore ("_id", "_rev"). In the POST / create response, however, the ID and revision returned are not metadata. Rather, they are the actual data returned describing the newly created record.
As a result, for a newly created record, appointment.get("_rev") will not work but appointment.get("rev") will.
One workaround is to check for either _rev or rev:
destroy: function() {
  Backbone.Model.prototype.destroy.call(this, {
    headers: {'If-Match': this.get("_rev") || this.get("rev")}
  });
}
The problem with this approach is twofold. First, I have to remember to do this everywhere that I want to access the revision (which will definitely be necessary when I add updates). Second, I have to remember CouchDB's metadata policy any time I want to access these attributes. I would much rather call this.get("rev") and have it just work, regardless of update, create or delete.
window.Appointment = Backbone.Model.extend({
  get: function(attribute) {
    return Backbone.Model.prototype.get.call(this, attribute) ||
           Backbone.Model.prototype.get.call(this, "_" + attribute);
  },
  // ...
});
In my new get() method, I call the get() method directly on the Backbone.Model.prototype. Since I am invoking it directly, and not as a method on an instantiated object, I have to supply the object context to be used inside the method. After all, the get() method expects to be called on an object and, as such, it expects the this variable to refer to that object.
The call() method does just this: it sets the this variable inside the function to the first argument supplied. In this case, I supply the this variable from my Appointment model. So, in the end, the original get() is called with the same this variable with which it would have otherwise been called.
With that, I can write a destroy() method that works with both CouchDB updates and deletes:
window.Appointment = Backbone.Model.extend({
  get: function(attribute) {
    return Backbone.Model.prototype.get.call(this, attribute) ||
           Backbone.Model.prototype.get.call(this, "_" + attribute);
  },
  destroy: function() {
    Backbone.Model.prototype.destroy.call(this, {
      headers: {'If-Match': this.get("rev")}
    });
  },
  // ...
});
If I try to destroy a pre-existing record, the first Backbone.Model.prototype.get.call() in my get() will return undefined but the second one ("_rev") will return the current revision number for the record.
If I try to destroy a newly created record, the first Backbone.Model.prototype.get.call() will return the revision ID.
by Chris Strom (noreply@blogger.com) at September 08, 2011 04:12 AM
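The get() fallback from the post above can be tried without loading Backbone at all. The sketch below is only an illustration: Model is a hypothetical stand-in for Backbone.Model that implements just attributes and get(), and the attribute values are made up.

```javascript
// Stand-in for Backbone.Model, just enough to show the delegation (not the real library).
function Model(attributes) { this.attributes = attributes; }
Model.prototype.get = function (attribute) { return this.attributes[attribute]; };

function Appointment(attributes) { Model.call(this, attributes); }
Appointment.prototype = Object.create(Model.prototype);
Appointment.prototype.constructor = Appointment;

// Try the plain name first, then fall back to CouchDB's underscore-prefixed form.
// call(this, ...) makes the parent's get() run against this Appointment instance.
Appointment.prototype.get = function (attribute) {
  return Model.prototype.get.call(this, attribute) ||
         Model.prototype.get.call(this, '_' + attribute);
};

const created = new Appointment({ id: 'abc', rev: '1-aaa' });   // shape of a create response
const loaded = new Appointment({ _id: 'abc', _rev: '2-bbb' });  // shape of a stored document
console.log(created.get('rev'), loaded.get('rev'));             // prints "1-aaa 2-bbb"
```

One caveat of the `||` fallback: a legitimately falsy attribute value (0, an empty string, false) is treated as missing, so the underscore-prefixed lookup runs for it as well.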
I add a <span> with a class of "delete" to my calendar event template:
<script type="text/template" id="calendar-event-template">
  <span class="event" title="<%= description %>">
    <%= title %>
    <span class="delete">X</span>
  </span>
</script>
On the page, the "X" displays like:
window.EventView = Backbone.View.extend({
  // ...
  events: {
    'click .delete': 'test'
  },
  test: function() {console.log("delete")}
});
So, when a click event is received in this view for an element with the delete class, the "test" function is called. Sure enough, clicking on the X inside the delete <span> logs "delete" messages to the console:
Next, I replace the test function with a real handler for the "click .delete" event:
window.EventView = Backbone.View.extend({
  // ...
  events: {
    'click .delete': 'deleteClick'
  },
  deleteClick: function() {
    this.model.destroy();
  },
  remove: function() {
    $(this.el).find('.event').remove();
  }
});
It feels a little awkward having a remove() method and a deleteClick(). The former removes the UI element from the page. The latter handles clicks that should signal the model to delete itself, which will, in turn, tell the view to remove itself from the page. I will worry about the odd feeling another day. For now, I am not quite done with my delete.
CouchDB needs the revision of the document being deleted, so the view would have to call something like:
e.destroy({headers: {'If-Match':e.model.get("_rev")} })
It seems really wrong to me that the View should be responsible for knowing about this. But how to get the model to do this? I could create a new destroyWithRevision method on the model, but the view would still need to know to call this instead of the conventional destroy() method.
window.Event = Backbone.Model.extend({
  // ...
  destroy: function() {
    Backbone.Model.prototype.destroy.call(this, {
      headers: {'If-Match': this.get("_rev")}
    });
  }
});
That is slick. I call the destroy function that resides on the Backbone.Model prototype. Since I am invoking that function directly, I need to supply an object instance so that the method has a this (or self if you're a Rubyist) to which it can refer. That is the first argument to a Javascript call method. Then I can supply the arguments that inform CouchDB of the revision being deleted.
The view can go on calling the plain destroy() method, completely unaware of this complexity. As an added bonus, I am not losing any of the benefits of optimistic locking: if the loaded model was superseded before the user clicked the "X", the delete would fail. And yes, when I click the little "X", the calendar event goes away from the UI:
by Chris Strom (noreply@blogger.com) at September 05, 2011 07:32 PM
!!!
html
  head
    title= title
    link(rel='stylesheet', href='/stylesheets/style.css')
    link(rel='stylesheet', href='/stylesheets/blitzer/jquery-ui.css')
    script(src='/javascripts/jquery.min.js')
    script(src='/javascripts/jquery-ui.min.js')
    script(src='/javascripts/underscore.js')
    script(src='/javascripts/backbone.js')
  body!= body
Next, I create a very simple dialog in the Jade template:
#dialog(title="Add calendar event")
  #calendar-event-start-date
  p title
  p
    input#calendar-event-title(type="text", name="title")
  p description
  p
    input#calendar-event-description(type="text", name="description")
I have the intention of eventually grabbing the values for appointments from the two dialog fields and from the #calendar-event-start-date <div> (which I will populate from the date clicked). But before I reach that point, I need to make this a jQuery-ui dialog:
script
  $(function() {
    $('#dialog').dialog({
      autoOpen: false,
      modal: true,
      buttons: [
        { text: "OK",
          click: function() { $(this).dialog("close"); } },
        { text: "Cancel",
          click: function() { $(this).dialog("close"); } }
      ]
    });
  });
So far, there is absolutely nothing Backbone-y about this. I change that by adding an AppView Backbone View class:
window.AppView = Backbone.View.extend({
  el: $("#dialog"),
  events: {
    'click .ok': 'create'
  },
  create: function() {
    console.log("here");
    Events.create({
      title: "foo",
      description: "bar",
      startDate: "2011-09-01"});
  }
});
window.AppView = new AppView;
After reloading the page, I open that dialog from the Javascript console:
$('#dialog').dialog('open')
And I am greeted with a right proper jQuery-ui dialog:
Clicking OK closes the dialog, as set up in the dialog() invocation. But a new Event is not created. Even the console.log() statement is not reached.
To fix this, I change the el attribute to the dialog's parent:
window.AppView = Backbone.View.extend({
  el: $("#dialog").parent(),
  // ...
});
This way, the wrapper divs added by jQuery-ui become the element for this view. Also, I need to add a class to the OK button:
script
  $(function() {
    $('#dialog').dialog({
      autoOpen: false,
      modal: true,
      buttons: [
        { text: "OK",
          class: "ok",
          click: function() { $(this).dialog("close"); } },
        { text: "Cancel",
          click: function() { $(this).dialog("close"); } }
      ]
    });
  });
With that, I reach my console.log statement and I try to create my event:
For the create to work, my express.js app needs a POST route that passes the new event on to CouchDB:
app.post('/events', function(req, res){
  var options = {
    method: 'POST',
    host: 'localhost',
    port: 5984,
    path: '/calendar',
    headers: {'content-type': 'application/json'}
  };
  var couch_req = http.request(options, function(couch_response) {
    console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
    couch_response.pipe(res);
  }).on('error', function(e) {
    console.log("Got error: " + e.message);
  });
  couch_req.write(JSON.stringify(req.body));
  couch_req.end();
});
Aside from the headers and the write() of the JSON data to the CouchDB request, the remainder of this route looks very similar to stuff that I have been writing for GETs and DELETEs over the past few days. It may be time to investigate adding an abstraction layer in my express app. Another day, perhaps.
by Chris Strom (noreply@blogger.com) at September 05, 2011 07:31 PM
Got response: 201 localhost:5984/calendar
Got response: 409 localhost:5984/calendar/7acf98778a669f4d6fc33d6b3400e480
Got response: 200 localhost:5984/calendar/_all_docs?include_docs=true
There are, in fact, two bugs here. The first is that my Backbone app is not sending the revision number of the newly created appointment when it comes time to delete the record. That is a somewhat understandable oversight on my part. What is not so OK is the lack of error handling that I have built. The frontend responded to the 409 as if nothing had gone wrong: the appointment was simply removed from the calendar.
app.delete('/appointments/:id', function(req, res){
  var options = { /* Connection Options */ };
  var couch_req = http.request(options, function(couch_response) {
    console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
    couch_response.pipe(res);
  }).on('error', function(e) {
    console.log("Got error: " + e.message);
  });
  couch_req.end();
});
Interesting. I had expected the 409 response from CouchDB to be considered an error by node.js's http.request(). But I am not seeing the "Got error" message logged. I am seeing the "Got response" message:
Got response: 409 localhost:5984/calendar/7acf98778a669f4d6fc33d6b3400e480
Ah, looking at the http.request documentation, I see that:
"If any error is encountered during the request (be that with DNS resolution, TCP level errors, or actual HTTP parse errors) an 'error' event is emitted on the returned request object."
The failure here is not a connection error and not technically a parse error, so I suppose that the error event should not be fired after all.
Meanwhile, the response that my express.js app sends back to the browser claims success:
HTTP/1.1 200 OK
X-Powered-By: Express
Connection: keep-alive
Transfer-Encoding: chunked
Looking at the actual body of the response, however, there clearly was an error:
{"error":"conflict","reason":"Document update conflict."}
Well, I have the correct 409 statusCode in the couch_response already. It seems that the solution here is simple enough. I set the HTTP response from my express.js app to be that of the CouchDB response that I am proxying:
  // ...
  var couch_req = http.request(options, function(couch_response) {
    console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
    res.statusCode = couch_response.statusCode;
    couch_response.pipe(res);
  }). // ...
Now, when I delete, the response back in the browser is the expected 409:
That leaves the Backbone side. Can I bind to the error event from the model?
window.AppointmentView = Backbone.View.extend({
  initialize: function(options) {
    this.container = $('#' + this.model.get('startDate'));
    options.model.bind('destroy', this.remove, this);
    options.model.bind('error', this.deleteError, this);
  },
  deleteError: function(model, error) {
    // TODO: blame the user instead of the programmer...
    if (error.status == 409) {
      alert("This site does not understand CouchDB revisions.");
    }
    else {
      alert("This site was made by an idiot.");
    }
  },
  // ...
});
Yup. It's exactly that easy. Now, when I click delete, an alert pops up informing me that I'm an idiot:
by Chris Strom (noreply@blogger.com) at September 05, 2011 07:29 PM
by Ricky Ho (noreply@blogger.com) at September 04, 2011 11:35 PM
...
tr#week4
  td.sunday
  td.monday 22
  td.tuesday
  td.wednesday
  td#2011-08-25.thursday
  td#2011-08-26.friday
  td.saturday
...
The resultant HTML is then:
The /events resource in my app returns:
{"total_rows":2,"offset":0,"rows":[
{"id":"fdbed27594feb433c74e82eb910015e0",
"key":"fdbed27594feb433c74e82eb910015e0",
"value":{"rev":"2-b7c22d428e648a6cdd2978c213f79ec0"},
"doc":{"_id":"fdbed27594feb433c74e82eb910015e0",
"_rev":"2-b7c22d428e648a6cdd2978c213f79ec0",
"startDate":"2011-08-25",
"title":"create blog post",
"description":"talk about node and CouchDB"}},
{"id":"fdbed27594feb433c74e82eb91001f45",
"key":"fdbed27594feb433c74e82eb91001f45",
"value":{"rev":"1-2b18432cf6e63b82c6507ff28af9724c"},
"doc":{"_id":"fdbed27594feb433c74e82eb91001f45",
"_rev":"1-2b18432cf6e63b82c6507ff28af9724c",
"startDate":"2011-08-26",
"title":"blog again",
"description":"add backbone into the node + couch mix"}}
]}
(this is just a pass-thru to CouchDB's _all_docs?include_docs=true)
I load these events with a jQuery getJSON call inside a document-ready:
$(function() {
  $.getJSON('/events', function(data) {
    $.each(data.rows, function(i, rec) { add_event(rec.doc) });
  });
});
For each of the rows in the events returned from CouchDB, I extract the document (the event itself) and make a call to add_event().
The add_event() function then exploits the fact that the cells in my calendar are identified with an ISO 8601 date:
function add_event(event) {
  var date = event.startDate,
      title = event.title,
      description = event.description;
  $('#' + date).html(
    '<span title="' + description + '">' +
      title +
    '</span>'
  );
}
If the startDate from CouchDB is 2011-08-26, then this function finds the correct cell via a jQuery $('#2011-08-26') selector. If that selector matches, then the inner HTML is replaced with the event's title (and the description in a <span> title attribute). If the calendar event is for a date not currently displayed, no worries, the selector returns an empty wrapped set, in which case the html() has nothing to do.
by Chris Strom (noreply@blogger.com) at September 01, 2011 01:14 PM
First, I need a delete route in my express.js app. Nothing too fancy ought to be required. I just need to make an HTTP request with a method of "DELETE" to my CouchDB backend. The response from CouchDB can then be piped directly to my Backbone app. Something like this ought to do:
app.delete('/events/:id', function(req, res){
  var options = {
    method: 'DELETE',
    host: 'localhost',
    port: 5984,
    path: '/calendar/' + req.params.id
  };
  // Send the HTTP request with the DELETE options
  var couch_req = http.request(options, function(couch_response) {
    console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
    // Pipe the response from CouchDB to the browser
    couch_response.pipe(res);
  }).on('error', function(e) {
    console.log("Got error: " + e.message);
  });
  // Send the complete request.
  couch_req.end();
});
Now to my Backbone application. As usual when I am exploring, I use Chrome's Javascript console for interacting with page elements and Javascript objects. In this case, I would like to delete the Backbone model responsible for the "foo" calendar event on the first of next month:
Once I have the Event model, all I need do is call its destroy method and the offending event should be stricken from existence:
> e.destroy()
=> child
Hrm... Dunno what I expected. I suppose it has to be chainable, so maybe it worked..? Actually, no, it did not. Examining the express.js app's log, I see no log entries. Reloading the page, I see that the calendar event is still there. So what gives?
I take a look at the Event model itself:
window.Event = Backbone.Model.extend({});
Say, that looks a bit spartan. Perhaps more is needed for Backbone to know how to delete a thing from the database.
As it turns out, Backbone needs both a URL root (/events) and a record ID. In retrospect, both make all kinds of sense. How else is Backbone supposed to infer the resource to be DELETEd?
With that, my Event model looks like:
window.Event = Backbone.Model.extend({
  urlRoot : '/events',
  initialize: function(attributes) { this.id = attributes['_id']; }
});
The urlRoot property is fairly self-explanatory. The initialize method for setting the model's id is less so. CouchDB stores document IDs in the "_id" attribute:
{
  "_id": "a38f51509190f265959bbb2b5d001128",
  "_rev": "1-174f31204df52e79a92c1ac875ac09a2",
  "startDate": "2011-09-01",
  "title": "foo",
  "description": "bar"
}
This is available in the model's attributes (I could call event.get("_id") to retrieve it), but Backbone has no way to tie it to the special id property of a Backbone model. So I link the two manually in the model's initializer.
Now, when I call e.destroy(), the request shows up in the express.js log:
Got response: 409 localhost:5984/calendar/a38f51509190f265959bbb2b5d001128
That is certainly progress, but 409?!
CouchDB requires the revision of the document being deleted. It can be supplied as a rev query parameter:
DELETE /calendar/a38f51509190f265959bbb2b5d001128?rev=1-174f31204df52e79a92c1ac875ac09a2 HTTP/1.0
Or as an If-Match header:
DELETE /calendar/a38f51509190f265959bbb2b5d001128 HTTP/1.0
If-Match: "1-174f31204df52e79a92c1ac875ac09a2"
Hrm... I tend to think it would be easier to transmit the revision via the If-Match header. If I could set that in my Backbone application, then I ought to be able to pass that directly through to CouchDB:
app.delete('/events/:id', function(req, res){
  var options = {
    method: 'DELETE',
    host: 'localhost',
    port: 5984,
    path: '/calendar/' + req.params.id,
    headers: req.headers
  };
  var couch_req = http.request(options, function(couch_response) {
  // ...
  couch_req.end();
});
I rather like that one line change. No futzing with query parameters just feels cleaner.
In Backbone, anything passed to the destroy (or update or create) method is sent along to jQuery as an option. Since jQuery AJAX requests recognize a headers attribute, something like this ought to work:
> e.destroy({headers: {'If-Match':'1-174f31204df52e79a92c1ac875ac09a2'} })
And, finally, I see a change in the CouchDB response:
Got response: 200 localhost:5984/calendar/a38f51509190f265959bbb2b5d001128?rev=1-174f31204df52e79a92c1ac875ac09a2
Most importantly, I no longer have a bogus entry on the first:
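The two ways of handing the revision to CouchDB discussed above (the rev query parameter and the If-Match header) can be compared side by side in a small sketch. couchDeleteOptions is a hypothetical helper, not part of express.js, node.js or the original app; it only builds an options object of the kind that http.request() accepts, using the same host, port and database name as the route above.

```javascript
// Hypothetical helper: builds http.request-style options for a CouchDB delete.
function couchDeleteOptions(id, rev, useHeader) {
  const options = {
    method: 'DELETE',
    host: 'localhost',
    port: 5984,
    path: '/calendar/' + encodeURIComponent(id),
  };
  if (useHeader) {
    // Variant 1: send the revision in an If-Match header.
    options.headers = { 'If-Match': rev };
  } else {
    // Variant 2: send the revision as a rev query parameter.
    options.path += '?rev=' + encodeURIComponent(rev);
  }
  return options;
}

console.log(couchDeleteOptions('a38f', '1-174f', true));
console.log(couchDeleteOptions('a38f', '1-174f', false));
```

Either object could then be passed to http.request() in place of the hand-written options in the delete route.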
by Chris Strom (noreply@blogger.com) at September 01, 2011 01:11 PM
➜ repos express calendar
   create : calendar
   create : calendar/package.json
   create : calendar/app.js
   create : calendar/public/stylesheets
   create : calendar/public/stylesheets/style.css
   create : calendar/public/javascripts
   create : calendar/public/images
   create : calendar/views
   create : calendar/views/layout.jade
   create : calendar/views/index.jade
In the new "calendar" app directory, I need to install the express and jade packages from npm:
➜ calendar git:(master) ✗ npm install express jade
jade@0.14.2 ./node_modules/jade
express-unstable@2.4.3 ./node_modules/express
├── mime@1.2.2
├── connect@1.6.0
└── qs@0.3.1
My next step is to use the Futon admin interface to create a database. I fancy a calendar app, so I name my database accordingly:
I use the _all_docs resource in CouchDB to pull back all records in the calendar DB (I prolly would not do that in a larger DB). Once the CouchDB response comes back, I write the data back to the browser:
app.get('/events', function(req, res){
  var options = {
    host: 'localhost',
    port: 5984,
    path: '/calendar/_all_docs'
  };
  http.get(options, function(couch_response) {
    console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
    res.contentType('json');
    // Send all couch data to the client
    couch_response.on('data', function (chunk) {
      res.write(chunk);
    });
    // When couch is done, so is this request
    couch_response.on('end', function (chunk) {
      res.end();
    });
  }).on('error', function(e) {
    console.log("Got error: " + e.message);
  });
});
That's a bit of work establishing 'data' and 'end' listeners. Fortunately, node has an answer for this case in the pipe method for all stream objects:
  http.get(options, function(couch_response) {
    console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
    couch_response.pipe(res)
  })
Any events emitted by couch_response will be sent to the original Response object. The result of accessing the /events resource is:
A route for an individual event looks much the same:
app.get('/events/:id', function(req, res){
  var options = {
    host: 'localhost',
    port: 5984,
    path: '/calendar/' + req.params.id
  };
  http.get(options, function(couch_response) {
    console.log("Got response: %s %s:%d%s", couch_response.statusCode, options.host, options.port, options.path);
    couch_response.pipe(res);
  }).on('error', function(e) {
    console.log("Got error: " + e.message);
  });
});
This route makes use of some nifty parameter assignments in express.js. Named parameters in the route (the :id in /events/:id) are made available in the request params object (e.g. req.params.id). That bit of coolness aside, this is nearly identical to the all-events route from above.
Finally, I go back to the /events route and add include_docs=true to the URL. This includes the documents (which are relatively small) along with the metadata:
by Chris Strom (noreply@blogger.com) at September 01, 2011 12:52 PM
by Ricky Ho (noreply@blogger.com) at August 29, 2011 05:14 AM
Really excited about a new project I started recently to enable phone-based speech recognition for 311 service requests.
Here is a screen cast demonstrating the solution.

partition(key) {
  range = (KEY_MAX - KEY_MIN) / NUM_OF_REDUCERS
  reducer_no = (key - KEY_MIN) / range
  return reducer_no
}
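As a runnable counterpart to the range partitioner above, here is a minimal sketch. The bounds and reducer count are made-up values, and the flooring and clamping are additions the pseudocode leaves implicit: without them, a fractional quotient or a key equal to KEY_MAX would not map to a valid reducer index.

```javascript
// Hypothetical bounds and reducer count, chosen only for illustration.
const KEY_MIN = 0;
const KEY_MAX = 100;
const NUM_OF_REDUCERS = 4;

// Map a numeric key to a reducer index in [0, NUM_OF_REDUCERS - 1].
function partition(key) {
  const range = (KEY_MAX - KEY_MIN) / NUM_OF_REDUCERS;
  const reducerNo = Math.floor((key - KEY_MIN) / range);
  // Clamp so that key === KEY_MAX lands in the last reducer instead of one past it.
  return Math.min(Math.max(reducerNo, 0), NUM_OF_REDUCERS - 1);
}

console.log([0, 24, 25, 99, 100].map(partition)); // [ 0, 0, 1, 3, 3 ]
```

Because the ranges are contiguous and ordered, concatenating the reducers' sorted outputs in reducer order yields a globally sorted result.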
map(key, container) {
  for each element in container {
    element_meta =
      extract_metadata(element, container)
    emit(element, [container_id, element_meta])
  }
}
reduce(element, container_ids) {
  element_stat =
    compute_stat(container_ids)
  emit(element, [element_stat, container_ids])
}
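To make the element/container pattern above concrete, the following sketch builds a tiny inverted index in memory. Everything here is an assumption for illustration: runMapReduce is a toy stand-in for the shuffle phase rather than a real framework, documents play the role of containers, words the role of elements, the word position is the metadata, and the posting count is the statistic.

```javascript
// Minimal in-memory stand-in for the shuffle phase: groups emitted values by key.
function runMapReduce(inputs, map, reduce) {
  const groups = new Map();
  for (const [key, value] of inputs) {
    map(key, value, (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    });
  }
  const output = {};
  for (const [k, values] of groups) {
    reduce(k, values, (outKey, outValue) => { output[outKey] = outValue; });
  }
  return output;
}

// map: for each word (element) in a document (container), emit word -> [docId, position].
function indexMap(docId, text, emit) {
  text.split(/\s+/).forEach((word, position) => emit(word, [docId, position]));
}

// reduce: attach a simple statistic (the number of postings) to each word.
function indexReduce(word, postings, emit) {
  emit(word, { count: postings.length, postings });
}

const index = runMapReduce(
  [['doc1', 'couch db'], ['doc2', 'couch base']],
  indexMap,
  indexReduce
);
console.log(JSON.stringify(index.couch));
```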
Computing avg is done in a similar way, except that instead of computing the local avg, we compute the local sum and local count. The reducer then divides the final sum by the final count to come up with the final avg.
class Mapper {
  buffer

  map(key, number) {
    buffer.append(number)
    if (buffer.is_full) {
      max = compute_max(buffer)
      emit(1, max)
      buffer.clear()   # start a fresh batch so values are not re-emitted
    }
  }

  # Flush the last, partially filled buffer
  close() {
    if (!buffer.is_empty) {
      emit(1, compute_max(buffer))
    }
  }
}
class Reducer {
  reduce(key, list_of_local_max) {
    global_max = -INFINITY   # starting at 0 would be wrong for all-negative input
    for local_max in list_of_local_max {
      if local_max > global_max {
        global_max = local_max
      }
    }
    emit(1, global_max)
  }
}
class Combiner {
  combine(key, list_of_local_max) {
    local_max = maximum(list_of_local_max)
    emit(1, local_max)
  }
}
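The averaging variant described above (local sum and local count, with a single division in the reducer) might look like the sketch below. The function names and sample splits are invented for illustration. The point is that averaging the local averages would give the wrong answer when splits have different sizes.

```javascript
// Each mapper summarizes its own split as a partial [sum, count] pair.
function mapSplit(numbers) {
  const sum = numbers.reduce((acc, n) => acc + n, 0);
  return [sum, numbers.length];
}

// The reducer adds up the partial pairs and divides once at the very end.
function reduceAverage(partials) {
  let totalSum = 0;
  let totalCount = 0;
  for (const [sum, count] of partials) {
    totalSum += sum;
    totalCount += count;
  }
  return totalSum / totalCount;
}

const splits = [[1, 2, 3], [10]];        // two uneven input splits
const partials = splits.map(mapSplit);   // [[6, 3], [10, 1]]
console.log(reduceAverage(partials));    // 4, the true mean of 1, 2, 3, 10
```

Averaging the two local averages (2 and 10) would have produced 6 instead of 4, which is why the sum and the count travel separately.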
class Mapper {
  interval_start = [0, 20, 40, 60, 80]

  map(key, number) {
    # Scan from the highest interval down to find the one containing number
    i = NO_OF_INTERVALS - 1
    while (i >= 0) {
      if (number >= interval_start[i]) {
        emit(i, 1)
        break
      }
      i = i - 1
    }
  }
}
class Reducer {
  reduce(interval, counts) {
    total_counts = 0
    for each count in counts {
      total_counts += count
    }
    emit(interval, total_counts)
  }
}
class Combiner {
  combine(interval, occurrence) {
    # every occurrence is a 1 emitted by the mapper, so the size is the local count
    emit(interval, occurrence.size)
  }
}
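A compact, single-process version of the interval counting above is sketched here. The interval bounds match the pseudocode. The sample input and the decision to ignore values below the first bound are assumptions.

```javascript
const intervalStart = [0, 20, 40, 60, 80]; // lower bound of each interval

// Return the index of the last interval whose lower bound is <= number, or -1.
function intervalOf(number) {
  for (let i = intervalStart.length - 1; i >= 0; i--) {
    if (number >= intervalStart[i]) return i;
  }
  return -1;
}

function histogram(numbers) {
  const counts = new Array(intervalStart.length).fill(0);
  for (const n of numbers) {
    const i = intervalOf(n);      // "map": emit(i, 1)
    if (i >= 0) counts[i] += 1;   // "reduce": sum the ones per interval
  }
  return counts;
}

console.log(histogram([5, 19, 20, 45, 99, 80])); // [ 2, 1, 1, 0, 2 ]
```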
Notice that a non-uniform distribution of values across intervals may cause an unbalanced workload among reducers and hence undermine the degree of parallelism. We'll address this in a later part of this post.
class Mapper {
  buffer

  init() {
    buffer = HashMap.new
  }

  map(key, data) {
    elements = process(data)
    for each element {
      ....
      check_and_put(buffer, k2, v2)
    }
  }

  check_and_put(buffer, k2, v2) {
    # Flush and reset the buffer once it is full
    if buffer.full {
      for each k in buffer.keys {
        emit(k, buffer[k])
      }
      buffer.clear()
    }
    # Assumption: values for the same key combine by addition
    buffer[k2] += v2
  }

  close() {
    for each k2 in buffer.keys {
      emit(k2, buffer[k2])
    }
  }
}
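The buffered mapper above can be exercised with a small word-count sketch. CombiningMapper, its capacity parameter and the emit callback are all hypothetical names, and the flush-when-full policy is just one reasonable reading of the pseudocode.

```javascript
// In-mapper combining for a word count: partial counts are held in a Map and
// only emitted when the buffer fills up or the mapper is closed.
class CombiningMapper {
  constructor(emit, capacity) {
    this.emit = emit;          // callback receiving (key, partialCount)
    this.capacity = capacity;  // maximum number of distinct keys to buffer
    this.buffer = new Map();
  }

  map(word) {
    // Flush first if a new key would not fit, then accumulate.
    if (!this.buffer.has(word) && this.buffer.size >= this.capacity) {
      this.flush();
    }
    this.buffer.set(word, (this.buffer.get(word) || 0) + 1);
  }

  flush() {
    for (const [word, count] of this.buffer) this.emit(word, count);
    this.buffer.clear();
  }

  close() {
    this.flush();
  }
}

const emitted = [];
const mapper = new CombiningMapper((k, v) => emitted.push([k, v]), 2);
['a', 'b', 'a', 'c', 'a'].forEach((w) => mapper.map(w));
mapper.close();
console.log(JSON.stringify(emitted)); // [["a",2],["b",1],["c",1],["a",1]]
```

Five input words produce four emitted pairs here. With a larger buffer the saving grows, at the cost of mapper memory.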
class Mapper {
  map(k, rec) {
    select_fields =
      [rec.c1, rec.c2, rec.c3, rec.c4]
    group_fields =
      [rec.c1, rec.c2]
    if (filter_condition == true) {
      emit(group_fields, select_fields)
    }
  }
}
class Reducer {
  reduce(group_fields, list_of_rec) {
    s1 = 0
    s2 = 0
    for each rec in list_of_rec {
      s1 += rec.c3
      s2 += rec.c4
    }
    # Average of c4: divide by the number of records in the group
    s2 = s2 / list_of_rec.size
    if (having_condition == true) {
      emit(group_fields, [s1, s2])
    }
  }
}
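The mapper and reducer above follow the shape of a filtered group-by query with SUM and AVG aggregates. A single-process sketch of the same flow is shown below. The groupAggregate function, its filter and having callbacks, and the sample rows are all invented for illustration.

```javascript
function groupAggregate(records, filter, having) {
  const groups = new Map();
  // Map phase: apply the WHERE-style filter and key each record by its group fields.
  for (const rec of records) {
    if (!filter(rec)) continue;
    const key = JSON.stringify([rec.c1, rec.c2]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(rec);
  }
  // Reduce phase: SUM(c3) and AVG(c4) per group, then the HAVING-style check.
  const result = [];
  for (const [key, recs] of groups) {
    const s1 = recs.reduce((acc, r) => acc + r.c3, 0);
    const s2 = recs.reduce((acc, r) => acc + r.c4, 0) / recs.length;
    if (having(s1, s2)) result.push({ group: JSON.parse(key), sum: s1, avg: s2 });
  }
  return result;
}

const rows = [
  { c1: 'a', c2: 'x', c3: 1, c4: 10 },
  { c1: 'a', c2: 'x', c3: 2, c4: 20 },
  { c1: 'b', c2: 'y', c3: 5, c4: 50 },
  { c1: 'b', c2: 'y', c3: -1, c4: 0 },   // removed by the filter below
];
const out = groupAggregate(rows, (r) => r.c3 > 0, (sum) => sum >= 3);
console.log(JSON.stringify(out));
```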
map(k1, rec) {
  emit(rec.key, [rec.type, rec])
}
reduce(k2, list_of_rec) {
  list_of_typeA = []
  list_of_typeB = []
  for each rec in list_of_rec {
    if (rec.type == 'A') {
      list_of_typeA.append(rec)
    } else {
      list_of_typeB.append(rec)
    }
  }
  # Compute the Cartesian product
  for recA in list_of_typeA {
    for recB in list_of_typeB {
      emit(k2, [recA, recB])
    }
  }
}
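The reduce-side join above can be sketched in a few lines of plain JavaScript. The record shape ({ key, type, value }) and the sample data are assumptions, and the grouping Map stands in for the framework's shuffle.

```javascript
function reduceSideJoin(records) {
  // Shuffle: group all records, from both datasets, by join key.
  const byKey = new Map();
  for (const rec of records) {
    if (!byKey.has(rec.key)) byKey.set(rec.key, { A: [], B: [] });
    byKey.get(rec.key)[rec.type].push(rec);
  }
  // Reduce: emit the Cartesian product of type-A and type-B records per key.
  const joined = [];
  for (const [key, { A, B }] of byKey) {
    for (const recA of A) {
      for (const recB of B) joined.push([key, recA.value, recB.value]);
    }
  }
  return joined;
}

const joined = reduceSideJoin([
  { key: 1, type: 'A', value: 'a1' },
  { key: 1, type: 'B', value: 'b1' },
  { key: 2, type: 'A', value: 'a2' },   // no type-B partner, so it is dropped
  { key: 1, type: 'B', value: 'b2' },
]);
console.log(JSON.stringify(joined)); // [[1,"a1","b1"],[1,"a1","b2"]]
```

Note that both lists for a key are held in memory at once, which is the cost of this approach for keys with many records.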
map(k1, rec) {
  emit([rec.key, rec.type], rec)
}
partition(key_pair) {
  return super.partition(key_pair[0])
}
reduce(k2, list_of_rec) {
  list_of_typeA = []
  for each rec in list_of_rec {
    if (rec.type == 'A') {
      list_of_typeA.append(rec)
    } else { # receive records of typeB
      for recA in list_of_typeA {
        emit(k2, [recA, rec])
      }
    }
  }
}
class Mapper {
  map = Hashtable.new

  init() {
    partition = detect_input_filename()
    map = load("hdfs://dataset2/" + partition)
  }

  map(k1, rec1) {
    rec2 = map[rec1.key]
    if (rec2 != nil) {
      emit(rec1.key, [rec1, rec2])
    }
  }
}
class Mapper {
  rec2_key = nil
  next_rec2 = nil
  list_of_rec2 = []
  file = nil

  init() {
    partition = detect_input_filename()
    file = open("hdfs://dataset2/" + partition, "r")
    next_rec2 = file.read()
    fill_rec2_list()
  }

  # Fill list_of_rec2 with the next run of rec2 that share the same key
  fill_rec2_list() {
    rec2_key = next_rec2.key
    list_of_rec2 = [next_rec2]
    next_rec2 = file.read()
    while (next_rec2 != nil and next_rec2.key == rec2_key) {
      list_of_rec2.append(next_rec2)
      next_rec2 = file.read()
    }
  }

  map(k1, rec1) {
    # Advance dataset2 until its current key catches up with rec1
    while (rec1.key > rec2_key and next_rec2 != nil) {
      fill_rec2_list()
    }
    if (rec1.key == rec2_key) {
      for rec2 in list_of_rec2 {
        emit(rec1.key, [rec1, rec2])
      }
    }
  }
}
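The merge-style mapper above relies on both datasets being sorted by key. The core of that idea, stripped of file handling, is sketched below with two in-memory arrays. mergeJoin and the { key, v } record shape are assumptions for illustration only.

```javascript
// Both inputs must already be sorted by key; duplicates are allowed on either side.
function mergeJoin(left, right) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    if (left[i].key < right[j].key) {
      i++;
    } else if (left[i].key > right[j].key) {
      j++;
    } else {
      const key = left[i].key;
      // Collect the run of right-hand records sharing this key.
      const run = [];
      while (j < right.length && right[j].key === key) run.push(right[j++]);
      // Pair every left-hand record with that key against the whole run.
      while (i < left.length && left[i].key === key) {
        for (const r of run) out.push([key, left[i].v, r.v]);
        i++;
      }
    }
  }
  return out;
}

const left = [{ key: 1, v: 'l1' }, { key: 2, v: 'l2' }, { key: 2, v: 'l3' }, { key: 4, v: 'l4' }];
const right = [{ key: 2, v: 'r1' }, { key: 2, v: 'r2' }, { key: 3, v: 'r3' }, { key: 4, v: 'r4' }];
console.log(JSON.stringify(mergeJoin(left, right)));
```

Only one run of equal-key records from the right-hand side is buffered at a time, which keeps memory use small compared with loading a whole partition into a hash table.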


There is some seriously cool stuff coming up at CouchConf on July 29. One of the things I'm most excited about is that Richard Hipp, creator of SQLite, will join me on stage to talk about our current joint project. Can't tell you what it is right now, but if you feel the Earth shift a little that day, you'll know why...and be sure to watch this space on July 29 to learn the details!
We are doing a special training deal this summer--$395 for two days of training!
The next one is in Portland in just a couple days on June 27 and 28! If you're in Portland for OSBridge, or you are in the area, you should definitely sign up.
http://www.couchbase.com/couchdb-training/portland-june-2011
Also, if you're in NYC this summer and want to learn about Membase Server, we'll be doing a class on July 11 and 12th.
http://www.couchbase.com/membase-training/nyc-july-2011
We only have a limited number of seats so it's important to sign up ASAP.
Sign up by Friday, June 16, for the early bird rate. CouchConf is July 29 in San Francisco.
CouchConf is the only conference dedicated to all things Couch. This one-day event is for any developer who wants to take a deeper dive into Couchbase technology, learn where it's headed and build really cool stuff.
Update, 2011-06-16, 12:15 AM Thanks for the comments.
(I swear I had something like that before and it didn't work!) Here's the solution:
$userId = 61382;
$docId = 'CLD2_62e029fc-1dae-4f20-873e-69facb64a21a';
$body = '{"error":"missing","reason":"problem?"}';
$client = new Zend_Http_Client;
$client->setAdapter(new Zend_Http_Client_Adapter_Test);
$couchdb = $this->getMock(
    'CouchDB',
    array('makeRequest',),
    array($userId, $this->config,)
);
$couchdb->expects($this->once())
    ->method('makeRequest')
    ->will($this->returnValue(new Zend_Http_Response(200, array(), $body)));
$couchdb->setHttpClient($client);
$couchdb->getDocument($docId);
--- Original blog entry ---
I wrote a couple tests for a small CouchDB access wrapper today. But when I wrote the implementation itself, I realized that my class setup depends on an actual CouchDB server being available and here my journey began.
Consider the following example:
My objective is not to be able to test any of the protected methods directly, but to be able to supply a fixture so we don't have to set up CouchDB to run our test suite. My fixture would replace makeRequest() and return a JSON string instead.
by Till Klampaeckel (till@php.net) at June 15, 2011 10:19 PM
Really excited to launch a new OpenGov project in Philadelphia - Phind It For Me.
The service is built on PHLAPI and the point data sets it houses. As such, one could understand why I’d be interested in enhancing the data sets currently in PHLAPI.
I’m really excited about this project - source code available on GitHub - and would love to see if there is an interest in launching in other cities with CouchDB-based geospatial data repositories, like Baltimore.
It’s built on the awesome new SMSified platform from Voxeo (disclaimer, I work there) and uses a Node.js module I built for working with the SMSified API.
As always, dear readers, any comments or feedback are welcome.
Do head on over to the project website and check it out!
Get yo couch on! Sign up here: http://www.meetup.com/Boston-CouchDB/events/17374461/
damien@couchbase.com

We are doing another iteration of our JavaScript for Designers Workshop (in German). Our designer Kristina and I (Alex) will be teaching the basics of programming in the browser for a full day. You don’t need to have any knowledge about programming. You don’t even have to be a designer.
The workshop is scheduled for July 15, early bird tickets are available for €300 until June 10.
For more information visit jstraining.de.
Cloud: everybody wants it, some actually use it. So what's my takeaway from AWS' recent outage?
So first off, we had two pieces of our infrastructure failing (three if we include our Multi-AZ RDS), both of which involve EBS.
One of those pieces in my immediate reach was a MySQL server, which we use to keep sessions. And to say the least about AWS and in their defense, the instance had run for almost 550 days and had never let us down.
In almost two years with AWS I did not magically lose a single instance. I had to reboot two or three instances once because the host had issues; Amazon sent us an email asking us to make sure the instances would survive a reboot, but that's about it.
Recovering the service, or at least launching a replacement in a different region, would have been possible if we had not, by coincidence, hit several limits on our AWS account (instances, IPs and EBS volumes), which apparently take multiple days to lift. We contacted AWS immediately and the autoresponder told us to email back if it was urgent, but I guess they had their hands full and apparently we are not high enough up the chain to express how urgent it really was.
I also tried to reach out to some of the AWS evangelists on Twitter, which didn't work since they went silent almost all the way through this outage.
All in all, it took roughly five hours to get the volume back and another 4-5 to recover the database. As far as I can tell, nothing was lost.
And in our defense: we were well aware of this SPOF and already had plans to move to a more redundant approach. I have another blog post in draft about evaluating alternatives (Membase).
The second critical piece of infrastructure which failed for us is our hosted BigCouch cluster with Cloudant.
We managed to (manually) failover to their cluster in us-west1 later in the day and brought the service back up. We would have done this earlier, but AWS suggested it would be only a few hours which is why we wanted to avoid the hassle of having to sync up clusters later on.
Sidenote: Cloudant is still (day three of the downtime) trying to get all pieces back online. Kudos to everyone from Cloudant for their hard work and patience with us.
Lesson learned for myself: When things fail which are not within your reach, it's pretty hard to do anything and stay calm. A good thing is to keep everyone busy so our team tried to reach out to all customers (we average about 200,000 users per day) via Twitter and Facebook and it looks like we've tackled that well.
Well, I don't have much to say about Amazon RDS (hosted MySQL in the cloud). Except that it didn't live up to our expectations: it costs a lot of money, but we learned that apparently that doesn't buy us anything.
Looking at the CloudWatch statistics associated with our RDS setup (or EBS in general), I'm rather wary and don't even know if they can be trusted. In the end, I can't really say for how long RDS was down or failed to failover, but it must have been back before I got a handle on our own MySQL server.
The rest of our infrastructure seems fine on AWS: most of the servers are stateless (Shared nothing, anyone?) and no EBS is involved.
And with the absence of EBS, there are no issues to begin with. Everything continued to work just as expected.
This is not really a takeaway, but a no-brainer. It's also not limited to AWS or cloud computing in general.
You should design for failure when you build any type of application.
In terms of cloud computing it's really not as bad as ZOMG MY INSTANCE IS GONE!!!11, but it certainly can happen. I've heard people claim that EC2 instances constantly disappear and with my background of almost two years on AWS, I know that's just not true.
Designing for failure should take the following into account:
These services don't have to run in the cloud, they can become unavailable on bare metal too. For example, a service could crash, there could be a network partition or maintenance involved. The bottom line: How does your application deal with it?
For example, higher latency between the services, slower response time from disk — you name it.
Sometimes, everything is green, but your application still chokes.
While testing for the unexpected is of course impossible, validating what comes in is not.
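To make the question "how does your application deal with it?" a bit more concrete, here is a minimal Python sketch of one common answer: retry a failing call a few times, with a growing pause in between, before giving up. The names call_with_retries, attempts and base_delay are made up for this illustration and do not come from our actual setup.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on failure wait and retry, doubling the delay each time.

    Hypothetical helper for illustration only. `sleep` is injectable so the
    behaviour can be exercised without really waiting.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:  # real code should catch only retryable errors
            last_error = error
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

The important part is less the loop itself than the decision it forces: after the last attempt the error still reaches the caller, so the application has to decide what a degraded mode looks like.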
I'm not sure if a fire drill is necessary, but it helps to have a plan on how to troubleshoot issues to be able to recover from an outage.
In our case, we log almost anything and everything to syslog and utilize loggly to centralize logs. loggly's nifty console provides us with great input about the state of our application at any time.
In addition to the centralized logging, we have a lot of monitoring in place using ganglia and munin. Monitoring is an ongoing project for us since it seems like once you start, you just can't stop. ;-)
And last but not least: We can launch a new configured EC2 instance with a couple mouse clicks using Scalarium.
I value all these things equally — without them, troubleshooting and recovery would be impossible or at least a royal PITA.
So to get to the bottom of this (still ongoing) event, I'm not particularly pissed that there was downtime. Of course I can live without it, but what I mean is: Things are bound to fail.
The truth is though, that Amazon's product description is not exactly honest, or at the very least provides everyone with a lot of room for interpretation. You're asking for how much interpretation? I'm sure you could put ten cloud experts into one room and come away with 15 different opinions.
For example the use of attributes to a service such as highly available may cause different expectations for different people.
Let me break it down for you: highly available, in AWS speak, means "it works most of the time".
Highly available is not an SLA with those infamous nine Erlang nines.
On paper a multi-az deployment of Amazon RDS gets pretty close to what generally people expect from highly available: MySQL master-master replication, backups — in multiple datacenters. As of today we all know: even these things can fail.
And speaking of SLAs: It looks like none of the failing services are covered by it, so AWS' track record remains clean. This is because EBS is not explicitly named in it and, by the way, neither is RDS. Amazon's SLA, as far as EC2 is concerned, covers some but not all network outages. Since I was able to access the instance the entire time, none of this applies here.
On Twitter people are quick to suggest that everyone who complains now should have had a setup in multiple availability zones.
Here's what I think about it:
I find it rather amusing that apparently everyone on bare metal runs in multiple datacenters.
When people suggest multi-zone (or multi-az in Amazon-speak), I think they really mean multi-region. Because a zone is one of us-east-1a, us-east-1b, us-east-1c and us-east-1d. Since all datacenters (= availability zones) in the us-east-1 region failed on 2011/04/21, your multi-zone setup would not have covered your butt. Even Amazon's multi-az RDS failed.
Little do people know, but the zone identifiers (e.g. us-east-1a, us-east-1b) are tied to customer accounts. So for example, Cloudant's version of us-east-1a may be my us-east-1c; it may or may not be the same. This is why in many cases AWS never calls out explicit zones in outages. This also makes it somewhat hard to plan ahead in a single region.
AWS sells customers on the idea that an actual multi-az setup is plenty. I don't know too many companies who do multi-region (maybe SimpleGeo?). Not even Netflix does multi-region, but I guess they managed to sail around this disaster because they don't use EBS.
In the end it shouldn't be necessary to do a multi-region setup (and deal with its caveats) since according to AWS the different zones inside a region are different physical locations (let's call them datacenters) to begin with. Correct me if I'm wrong, but the description says different physical location, this is not just another rack in the same building or another port on the core switch.
Which brings me to the most important point of my blog entry.
In a nutshell, when you build for the AWS platform, you're building for a blackbox. There are a couple papers and blog posts where people try to reverse engineer the platform and write about its behavior. The problem with these things is that most people are guessing, though often (of course depending on the person writing) it seems to be a very well educated guess.
Roman Stanek blogged about communication between AWS and its customers, so head on over, I pretty much agree with everything he has to say:
So what exactly is my take away? In terms of technical details and as far as redundancy is concerned: not so much.
Whatever you do to run redundant on AWS applies to setups in your local colocation or POP as well. In theory, though, AWS makes it easier to leverage multiple datacenters (availability zones) and even achieve somewhat of a global footprint by distributing in different regions.
The real question for anyone to ask is, Is AWS fit to host anything which requires permanent storage? (I'm inclined to say no.)
That's all.
by Till Klampaeckel (till@php.net) at April 23, 2011 07:56 PM
kmeans(data) {
  initial_centroids = pick(k, data)
  upload(data)
  writeToS3(initial_centroids)
  old_centroids = initial_centroids
  while (true) {
    map_reduce()
    new_centroids = readFromS3()
    if change(new_centroids, old_centroids) < delta {
      break
    } else {
      old_centroids = new_centroids
    }
  }
  result = readFromS3()
  return result
}

closest_centroid(point, listOfCentroids) {
  bestCentroid = listOfCentroids[0]
  minDistance = INFINITY
  for each centroid in listOfCentroids {
    distance = dist(point, centroid)
    if distance < minDistance {
      minDistance = distance
      bestCentroid = centroid
    }
  }
  return bestCentroid
}



closest_centroid(point, listOfCentroids) {
  bestCentroid = listOfCentroids[0]
  minDistance = INFINITY
  for each centroid in listOfCentroids {
    // skip centroids that do not share a canopy with the point
    if (not point.myCanopy.intersects(centroid.myCanopy)) {
      continue
    }
    distance = dist(point, centroid)
    if distance < minDistance {
      minDistance = distance
      bestCentroid = centroid
    }
  }
  return bestCentroid
}
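For readers who want to try the loop above on one machine, here is a minimal Python sketch. It is an assumption-laden simplification: the map_reduce() step and the S3 reads and writes are replaced by plain in-memory loops, points are tuples of numbers, distance is Euclidean, and the canopy shortcut is left out. Unlike the pseudocode, closest_centroid returns the index of the nearest centroid rather than the centroid itself, and max_iterations is an extra safety limit that the pseudocode does not have.

```python
import math


def closest_centroid(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(data, initial_centroids, delta=1e-6, max_iterations=100):
    """Single-process stand-in for the map/reduce loop in the pseudocode."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # "Map" step: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in data:
            clusters[closest_centroid(point, centroids)].append(point)

        # "Reduce" step: each new centroid is the mean of its assigned points.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # empty cluster: keep the old centroid

        # Stop once no centroid moved by delta or more.
        change = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if change < delta:
            break
    return centroids
```

Calling kmeans([(0, 0), (0, 2), (10, 0), (10, 2)], [(0, 0), (10, 0)]) settles on one centroid per pair of nearby points.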
by Ricky Ho (noreply@blogger.com) at April 22, 2011 11:04 PM
Two weeks ago I had the chance to give a talk about GeoCouch and MapQuery at the FOSSGIS 2011. Most of the people who read this blog are probably aware of GeoCouch, but not so much of MapQuery. For me these two projects are tightly connected and therefore deserve a quick introduction/update.
GeoCouch, a spatial index for CouchDB, gains more and more attention. One of the reasons is that the installation recently got way easier for developers as well as for normal users. You can now install GeoCouch as an extension right next to your already existing CouchDB instance. You may also download a binary of Couchbase-Server, which already includes GeoCouch. And finally there's the brand new Iris Couch hosting as well (previously known as the CouchOne hosting). So getting started with GeoCouch is easier than ever before.
Some people might have wondered about the state/future of GeoCouch, especially after the merger of CouchOne with Membase to Couchbase. I will keep on developing GeoCouch at Couchbase and it is (as it always was) fully open source licensed under the Apache 2.0 License.
The new home for the latest source is the Couchbase Github repository.
The FOSSGIS was also about OpenStreetMap. The idea to put OpenStreetMap data into GeoCouch is very sensible, but wasn't really done (AFAIK) in a big fashion. Luckily Jochen Topf from Geofabrik told me about his Projekt Osmium, which makes it possible to process OSM data with JavaScript. There is already a script to output a Shapefile, so it should be really easy to output GeoJSON, which could be consumed by GeoCouch. So if you (who are currently reading this) have some spare time, please give it a go :)
MapQuery is a web mapping framework that builds on OpenLayers and jQuery. The goal is a framework that is just as easy to use as jQuery, combined with the power of OpenLayers. It's meant for people who just want to get started with web mapping, but also for those who already know OpenLayers and want easy integration into their jQuery application.
I was able to show a quick demo of the MapQuery API at the FOSSGIS. I won't publish it here, as things are about to move fast. After over one year of discussions about MapQuery and only little code contributions, it seems that we are finally getting somewhere. That feels so good :)
The wonderful EduGIS is built on an early version of MapQuery (source code), but will be merged with the most recent version of my fork.
Other big news is that the WhereGroup hired Christian Wygoda, who is a committer of the MapQuery project. This also means that Mapbender 3 will use MapQuery.
And finally I've also met a developer from another company that was building a big mapping application based on OpenLayers and jQuery. I don't want to disclose it here, as the code isn't open source yet, but the developer told me that it should be easily possible. I will keep in touch with them and hope they will contribute their code to MapQuery.
To conclude on MapQuery: if you want to stay in touch with the project, please subscribe to the official mailing list; this is where things are happening (there's also the sparsely attended IRC channel #mapquery on freenode). If you want to be a user of MapQuery, you should be patient and wait a bit. If you plan to contribute, you can start now. The biggest item currently is moving the EduGIS MapQuery code base over to the MapQuery version of my fork. The "documentation" is the demos.
As people started to ask about the slides from my presentation at FOSSGIS, here they are.
FOSSGIS was a really awesome event, where I met a lot of new people, but also a lot of friends I haven't seen in a while. I'm really looking forward to next year's conference, but also hope that I might see many of the people at this year's FOSS4G in Denver.
So the other day, I wanted to quickly check something in BigCouch and thanks to Vagrant, chef(-solo) and a couple cookbooks — courtesy of Cloudant — this was exceptionally easy.
As a matter of fact, I had BigCouch running and set up within literally minutes.
Here's how.
You'll need git, Ruby, gems and Vagrant (along with Virtualbox) installed. If you need help with those items, I suggest you check out my previous blog post called Getting the most out of Chef with Scalarium and vagrant.
As the operating system, I suggest you get an Ubuntu 10.04 box (aka Lucid).
Vagrant (along with Ruby and Virtualbox) is a one time setup which you can use and abuse for all kinds of things, so don't worry about the extra steps.
Clone the cookbooks in $HOME:
$ git clone http://github.com/cloudant/cloudant_cookbooks
Create a vagrant environment:
$ mkdir ~/bigcouch-test
$ cd ~/bigcouch-test
$ vagrant init
Set up ~/bigcouch-test/Vagrantfile:
Vagrant::Config.run do |config|
  config.vm.box = "base"
  config.vm.box_url = "http://files.vagrantup.com/lucid32.box"

  # Forward a port from the guest to the host, which allows for outside
  # computers to access the VM, whereas host only networking does not.
  # config.vm.forward_port "http", 80, 8080

  config.vm.provisioner = :chef_solo
  config.chef.cookbooks_path = "~/cloudant_cookbooks"
  config.chef.add_recipe "bigcouch::default"
end
Start the vm:
$ vagrant up
$ vagrant ssh
$ sudo /etc/init.d/bigcouch start
$ ps aux|grep [b]igcouch
Done. (You should see processes located in /opt/bigcouch.)
That's all. For an added bonus, you could open BigCouch's ports on the VM and use it from your host system; otherwise this is all a matter of localhost. See config.vm.forward_port in your Vagrantfile.
by Till Klampaeckel (till@php.net) at April 04, 2011 03:56 PM
I had a blast teaching the first Couchbase CouchDB Training with training pro Alan McKean last week. 2 intensive days of hands on teaching and talking about Apache CouchDB to enthusiastic and excited people. It was actually a learning experience for me too, there's a lot in CouchDB I haven't had a chance to use yet :)
It's not too late to sign up for the remaining 3 cities on the Couchbase Training World Tour: Austin, London and Berlin.
On cobot we are making extensive use of the AnyTime date picker. When it came to integration testing with Cucumber/Capybara, until recently I got away with using the Rack::Test driver and doing
And I fill in "2010-01-01 12:45:00" for "Date"
Yesterday I finally needed to enter a date in a scenario that was using JavaScript (with the Selenium driver). With JavaScript enabled the above step doesn’t work anymore, because when Selenium tries to type into the text field AnyTime pops up and resets the text field’s contents. After some fiddling around I decided that I would make Selenium click the actual buttons on the date picker. Not only did this seem to be the least hackish way, it would also mean I would be testing the real thing, i.e. clicking on buttons like a user would. This becomes especially important once you start with internationalization and different time formats (e.g. 24h vs. 12h system), where you want to make sure AnyTime generates the proper date string.
Long story short, it took me almost a day to figure out how to do this. I played around with a lot of variations, but in the end this is what you have to do:
When /I select the time "([^"]+)" from "([^"]+)"/ do |time_string, label|
  time = Time.parse(time_string)
  And %Q{I fill in "" for "#{label}"}
  find_field(label).click
  click_on_selectors ".AnyTime-btn:visible:contains(#{time.year})",
    ".AnyTime-mon#{time.month}-btn:visible",
    ".AnyTime-dom-btn:contains(#{time.day}):visible:first",
    ".AnyTime-hr#{time.hour}-btn:visible",
    ".AnyTime-x-btn:visible"
end

def click_on_selectors(*selectors)
  def recurse(*selectors)
    if selectors.any?
      wait_for_css_selector_fn(selectors.first,
        "$('#{escape_javascript selectors.first}').click(); #{recurse(*selectors[1..-1])}")
    else
      'window.__capybara_wait = false;'
    end
  end
  page.evaluate_script "window.__capybara_wait = true"
  page.evaluate_script recurse(*selectors)
  wait_until 10 do
    page.evaluate_script "!window.__capybara_wait"
  end
end

include ActionView::Helpers::JavaScriptHelper

def wait_for_css_selector_fn(selector, after)
  <<-JS
    (function() {
      var time = new Date().getTime();
      var runDelayed = function() {
        if(!$('#{escape_javascript selector}').length) {
          if(time < new Date().getTime() - 5000) {
            throw('waited too long for #{escape_javascript selector}')
          } else {
            window.setTimeout(runDelayed, 100);
          }
        } else {
          #{after};
        };
      }
      window.setTimeout(runDelayed, 100);
    })();
  JS
end
In your Cucumber feature you can then call:
When I select the time "2010-01-01 12:00" from "Date"
What the above code essentially does is open the date picker and click on all the required buttons. Just as important (and that was the tricky part) it asynchronously waits for the necessary events (open/close date picker, date shows up in text field). Enjoy.
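The waiting part is really just polling: check a condition every so often and give up after a deadline. Stripped of Capybara and jQuery, the idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not a port of the step definition; the name wait_until and its parameters are made up for the example, and the clock and sleep arguments exist only so the helper can be tried without real delays.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it is truthy; raise TimeoutError after `timeout` seconds.

    Hypothetical helper that mirrors the "check, wait 100 ms, check again,
    complain after 5 s" loop of the JavaScript snippet.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            raise TimeoutError("waited too long")
        sleep(interval)
```

The condition would be something like "the selector matches an element" or "the text field contains the expected date".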




by Ricky Ho (noreply@blogger.com) at March 20, 2011 03:39 AM
by Ricky Ho (noreply@blogger.com) at March 18, 2011 06:18 AM
An insightful comment on Reddit about Node.js:
And the idea that you fully understand your own code is a bit suspect, too. My code's all nice and fast until somebody passes me in a POST request with a million keys, or decides to upload a 10GB file where I was expecting a 5KB file and I run a hash algorithm over it, or I accidentally use way more memory than I expected and push the system into swap, or any number of other things like that. My life would be a lot easier if my code never did anything I didn't expect.
I keep wondering where Node fits in a production environment and who writes the code that powers it.
Lots of work is being done to finalize the next version of the Open311 API spec (officially referred to as GeoReport V2).
Almost a year ago I launched TweetMy311 - a service that lets people report non-emergency service requests using a smart phone and Twitter. Since then, a lot has changed - not only with the Open311 specification but with the tools available to build powerful Twitter-based applications.

In the last several months, I’ve spent a lot of time learning about and working with Node.js. Some of the things I did in the initial version of TweetMy311 (written in PHP) are so much easier to do in Node.js that I’ve decided to completely rewrite the application to use Node. In addition, since I initially launched TweetMy311, CouchDB (the NoSQL database on which the app is built) has also seen a lot of enhancements.
I’m expecting the overhaul I’m currently working on to make the application code a lot more efficient and easier to understand. Once this overhaul is complete, I intend to release a big chunk of it as open source software, so that anyone who wants to build a powerful Node.js/CouchDB-based civic app can do so.
It’s also exciting to see new cities get on board the Open311 bandwagon. The City of Boston is now supporting Open311 and has started to issue API keys to developers.
As part of my work to overhaul TweetMy311, I’ve developed a neat little Node.js library for interacting with the Open311 API. Since I just started to work with the Boston implementation, I thought it would be helpful to others interested in doing so to walk through a quick example.
If you want to run this example for yourself, you’ll need to have Node.js installed, specifically the latest version - v0.4.2. If you have the Node Package Manager installed, you can simply do:
npm install open311
Once you’ve done this, you should be able to run the following script:
Which will output:
This is just a quick example of how to make the most basic of API calls with the Node.js Open311 module. You can use this module to build fully featured Open311 applications.
I’ll be doing some more blogging in the weeks ahead as the rewrite of TweetMy311 continues, and work on this phase of the GeoReport V2 spec is concluded.
Stay tuned!
CouchDB World Tour Coming! Along with Alan McKean, I and other Couchbase staff will be doing 5 training sessions in 5 different cities starting in March. I'll be teaching the San Francisco one :) We've developed some incredible material that I'm really excited to present. So go ahead and sign up, and bring a friend!
Claire was so inspired she wrote a song about it:
I recently had the extreme pleasure to use node.js and socket.io on a project. Here are some insights.
So the objective of the project was to read data from the _changes feed of our CouchDB cluster (hosted by Cloudant) and publish the data to a widget which we can use to display a constant stream of "what are people doing right now".
The core of the problem we faced was not just taking this stream of data and feeding it on to a page, but since we'll deploy this widget to our homepage we needed to make sure that no matter how many clients see it, the impact on the database cluster is minimal; for example, it would be a single client (or down the road up to three for failover) who actually read data from the cluster.
After shopping around for a technology to use, it became obvious that we needed some sort of abstraction because of how the different technologies (e.g. comet, websockets, ajax longpolling, ...) are implemented in different browsers. We decided to build this project on top of socket.io — pretty much for the same reasons most people go to jQuery, prototype or dojo these days.
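The "one reader, many viewers" idea can be shown without any of the real moving parts. The following Python sketch is an illustration only: it does not talk to CouchDB or socket.io, and Broadcaster and relay are names invented for the example. A single loop consumes the upstream feed, and every connected client simply gets a copy, so the number of viewers never changes the load on the database.

```python
class Broadcaster:
    """Keeps a list of subscriber callbacks and hands each change to all of them."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, change):
        # Copy the list so a subscriber may unsubscribe while we iterate.
        for callback in list(self._subscribers):
            callback(change)


def relay(changes, broadcaster):
    """Read the upstream feed exactly once and fan every change out.

    `changes` is any iterable standing in for the _changes feed.
    Returns the number of changes relayed.
    """
    count = 0
    for change in changes:
        broadcaster.publish(change)
        count += 1
    return count
```

With two subscribers and three changes, the feed is still read only three times, while each subscriber sees all three.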
... more after the jump.
by Till Klampaeckel (till@php.net) at February 15, 2011 10:47 PM
In my last blog entry, I shared some nodejs-code to read CouchDB's _changes feed and publish the data to a website. In order to update the page in a continuous fashion, I used socket.io, which provides a nifty abstraction across server- to client-side transports, for example websockets and ajax longpoll.
When we tested the code for a few days over the weekend, the largest issue we ran into was that the stream moved too fast. In fact it moved so fast, we couldn't read anything and were at risk of getting a seizure when we watched the page for too long.
Certainly awesome from one point of view — people are using the website — but it also led to the next objective: I had to find a way to throttle broadcasting to the client. Here's how!
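The actual solution is behind the jump, so the following is not it. It is just one generic way to throttle a stream, sketched in Python with invented names (Throttle, per_tick, max_buffer): buffer whatever comes in, release only a few messages per timer tick, and let the oldest messages fall away when the buffer is full.

```python
from collections import deque


class Throttle:
    """Buffer incoming messages and release at most `per_tick` of them per tick."""

    def __init__(self, per_tick=1, max_buffer=100):
        # A bounded deque silently drops the oldest entry once it is full.
        self._queue = deque(maxlen=max_buffer)
        self._per_tick = per_tick

    def push(self, message):
        self._queue.append(message)

    def tick(self):
        """Call this from a timer; returns the messages to broadcast now."""
        released = []
        while self._queue and len(released) < self._per_tick:
            released.append(self._queue.popleft())
        return released
```

A timer would call tick() every second or so and broadcast whatever it returns, which caps the rate the page has to render no matter how busy the feed is.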
... more after the jump.
by Till Klampaeckel (till@php.net) at February 15, 2011 10:44 PM
I've got some news I'm extremely excited to finally announce: a merger between CouchOne and Membase!
A little background: I met James Phillips, the co-founder of Membase, for the first time in December. I'd heard a little about Membase up to that point, but I was most impressed with some of their high profile users. For example, Membase is a key part of Zynga, where giving millions of users a fast, low latency experience is critical.
Membase has been targeting large scale mission critical apps, being able to scale out quickly and support millions of users, and getting impressive traction. They'd been going after a very specific pain point, a completely different part of the market than what we were targeting. They've focused on performance and scalability and exploiting all the power and memory available on modern servers. Simple, Fast, Elastic.
At CouchOne we've been focusing on very different problems: mobile, sync and offline use cases. We make it easy to build applications that travel with you, allowing you access to your important data no matter the network conditions. Slow and unreliable connectivity means many businesses can't rely on the cloud for mission critical apps; all their data is gone when their network is down. But with Couch powered apps on your phone or tablet, putting data directly on the machines at the edge of the network, you have your apps and data with you at all times and safely backed up to the cloud.
What James had is the vision to see the great fit between the two companies. While independently we were both doing very well, we both have a lot of growing to do yet. And amazingly, in the direction Membase needed to grow, we were already doing very well. And in the direction we needed to grow, Membase was already doing very well. Not only were the parts of the stack we were focusing on different and complementary, but the way we built out our teams was different and complementary. I'm not sure we could have planned it any better, and we didn't plan it at all!
And so I'm thrilled to announce Couchbase, a merging of both our companies and our technology!
Technologically, we'll be joining the products together to create a high volume, low latency, elastic clustered Couchbase server system. A Couch that's Simple, Fast, Elastic with all the reliability and power of CouchDB. We'll also continue to support the Membase API, for both backwards compatibility and its performance advantages over HTTP. We will be the only solution out there that can scale to Zynga sized workloads and down to phones and tablets and everything in between, supporting millions of users and keeping everything in sync.
For existing CouchDB users, we will fully support CouchDB's HTTP API with all its associated benefits: seamless integration with other HTTP based infrastructure, a universally supported, human-readable protocol and direct-browser access just to name a few.
Together as Couchbase, we'll have the fastest, most scalable (both scale up and scale down) NoSQL solution. We will become the standard storage for mobile devices, and the standard server technology for syncing them all together. Our unified solution will dramatically simplify your technology stack and maintenance for building fast responsive apps that scale to millions of users, and also scaling down to phones so people can work and play even when not connected to the network.
My role at Couchbase will be CTO, overseeing the technical direction of the company. Dustin Sallings will be the chief architect. Bob Wiederhold will be CEO and co-founder James Phillips will continue to be a product-oriented maniac :) CouchOne co-founders Chris Anderson and Jan Lehnardt will take roles to lead our mobile efforts and to work with our developers and community.
What's in it for you?
It's all upside! In the short term we'll be able to provide a much better developer and support experience for both CouchOne and Membase technologies, and move the development speed ahead much faster. The long term benefits are that CouchDB users will acquire the high performance, high scale easy-fast-elastic capabilities of Membase, while Membase users will acquire CouchDB's indexing features (map/reduce views, lucene, R-Tree GeoCouch), replication, reliability, and an easy path to mobile.
This is hot stuff! 2011 is the year of Couchbase!
Chris Anderson here.
The Web has been synonymous with HTML for too long. But its principles go deeper than arguments over W3C standards or backwards compatible CSS hacks. At the core of the web is the simple concept of linked data. I will argue that the Web, properly understood as linked data, the applications that use it, and the users that interface with it, can be much richer than just the presentation through HTML, and that extending the reach of the web beyond the confines of the browser is crucial to its long term success.
The surge of smartphone ‘apps’ has pundits and experts alike forecasting the demise of the web. This is the wrong way to think about it. Instead of concentrating on how apps are killing the web, let’s think about how apps can embrace the principles of the Web. Apps could potentially become vibrant participants of the Web, while still retaining their slick user interfaces and proprietary business models.
The web gave us write once run anywhere, a computing holy grail. But apps are platform-specific by design. Putting data on the web (giving it URLs) makes it “linked data”. The huge advantage to such an approach is interoperability at the data layer. Recently, #newtwitter showed us how you can build your web app on top of the same API as your native clients, building almost no new HTML on the server. This wave is only just beginning.
In the summer of 2010, Wired Magazine pronounced the web dead.

Here is the tagline that ran along with the article:
Two decades after its birth, the World Wide Web is in decline, as simpler, sleeker services — think apps — are less about the searching and more about the getting. Chris Anderson explains how this new paradigm reflects the inevitable course of capitalism. And Michael Wolff explains why the new breed of media titan is forsaking the Web for more promising (and profitable) pastures.
Of course we’ll take Wired Magazine’s pronouncements in the reverse oracular spirit they’ve earned for a previous obituary, 1997’s “Apple is Dead”:

That year turned out to be a turning point for Apple. Steve came back and everything. By my lights, Wired’s “The Web Is Dead” cover is as ringing an endorsement the Web could hope for.
Apps are threatening the web, with their quick path to revenue (even if they rarely become big hits.) The threat exists because the insides of apps often don’t interact with the web via hyperlinks, or if they do, it’s in an ad-hoc and limited way. This isn’t a bug, it’s just the way apps are — slickness matters in a big way, so developers optimize for user experience, not ‘webiness’.
Is this a turning point for the web, where it will continue as the dominant platform, or will it be replaced by apps? I think the reality is that the Web will win by remaining the backbone, the conduit on which the data is shared and exchanged. Even for the slickest native apps. But before I explain my position, let’s look at the strengths of the Web, and how different the app world is.
It is clear that in the possibility space of coding, one goal stood on the horizon for a long time: write once, run anywhere. It’s not always been clear whether it’s attainable, or just a mirage. It’s more of a social question (one of conventions) than a technical one.
Eventually, 21 years ago, the web emerged as the first successful consumer application platform to be independently implemented multiple times, across a wide range of hardware and software environments. This aspect is often overshadowed by the more fundamental changes to our social fabric as the Internet makes us all more connected.
Write once run anywhere has been regarded as a holy-grail since even before Sun’s foray into cross platform application widgets (Java) was unable to take hold as a standard. Similarly Adobe Flash and Microsoft Silverlight have attacked the HTML juggernaut. But in the fullness of time, those efforts look futile when held against the 3D accelerated, mobile, HTML5 web. How does the web do that?
The web is able to subsume the host operating system, as well as invaders (Java and Flash) to become the dominant user interface metaphor of our time, by adhering to a principled stance about openness, one that Tim Berners-Lee can describe better than I can.
The key to Berners-Lee’s argument is that the web enables applications to be brought online, and without any formal coordination, begin to interoperate with other applications on the web, because the constraints of hyperlinking and HTML are defined narrowly enough to allow for consensus. Rather than have a million features, it is better to have one killer feature. For the web, that feature is linking.
The web was comfortable in its dominance, maybe even starting to stagnate as a platform, when its first new competitor came along — the app store. Such a different way of thinking about software! With the web, anyone can put a new site up, whenever they want. In the app store, plan to wait a week or two before you can show your work to anyone.
But the payoff - literally! With the web your best bet was to put up some ads and hope for massive traffic, or else burden your users with a paywall and suffer the reduced engagement and lack of in-links. In the App Store you can make money from day one, even without massive traffic. It’s no surprise the app ecosystem is growing fast.
The new app sensation, Instagram, is quickly subsuming Facebook’s most powerful feature, photo sharing. And they don’t even have a way to browse your own photos via the web. Don’t get me wrong, I think Instagram is doing everything right, it’s just not very webby — because that’s the way apps are.
What matters in an app first and foremost is that people use it, and encourage their friends to use it. Instagram obviously put a great effort into their onboarding, because setting up my account was seamless and quick. This is development effort well spent, not thinking about linkability. Which is sad, because the message of the Web is so powerful: Build your app into the Web, and it can be linked to and extended beyond the confines of any one system.
Of course, real apps do understand this, and Instagram does share photos by a single-photo url, distributed via other social systems. It pays to be on the web, but Instagram at least has found that the user experience of the Web was no longer the place to invest, when growing the user base.
As a pragmatic matter, it makes sense to focus on slick interfaces and viral adoption, over an interest in broad web-style interoperability. But I think there’s another way.
Those of us who were aware of the web in the early 90’s knew the web had won when URLs joined the popular parlance. Once everything has a link, people can start to put the web together — by discussing and linking to pages, we build the meaning among the links. The act of linking is more important than the details of HTML. Of course, getting HTML right gives a baseline of interoperability.
Think about #newtwitter again - an (almost) all Ajax application that takes advantage of the same APIs Twitter provides for desktop and mobile clients. The link between the native and the desktop applications is made up of innumerable URLs shared via Web protocols. In Twitter’s case, the URLs are mostly hosted at sites like twitter.com and t.co. Twitter is centralized by design - we’ve seen this mostly-Ajax pattern applied to several prominent websites.
The major criticism that still applies to these Ajax-heavy apps is that, because they depend on remote resources, they are inherently slow and unreliable compared to native apps that don’t require remote servers.
There is a way to have the best of both worlds, by moving the URLs to localhost and asynchronously sharing changes with best-effort as the network connection allows.
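As a rough illustration of that idea, here is a minimal Python sketch of a local store that accepts writes immediately and pushes them to a remote server only when the network allows. The class name `LocalFirstStore` and the `push` callback are hypothetical names made up for this example; this is not how CouchDB replication is implemented, just the general shape of "write locally, share later".

```python
from typing import Callable, Dict, List


class LocalFirstStore:
    """Hypothetical local store: writes land locally, then sync when possible."""

    def __init__(self, push: Callable[[str, dict], None]) -> None:
        self._docs: Dict[str, dict] = {}   # local copy, always readable
        self._pending: List[str] = []      # ids changed since the last sync
        self._push = push                  # sends one document to the remote

    def put(self, doc_id: str, doc: dict) -> None:
        # The write completes locally and never waits on the network.
        self._docs[doc_id] = doc
        if doc_id not in self._pending:
            self._pending.append(doc_id)

    def get(self, doc_id: str) -> dict:
        return self._docs[doc_id]

    def sync(self) -> int:
        """Best-effort push; returns how many documents were sent."""
        sent = 0
        while self._pending:
            doc_id = self._pending[0]
            try:
                self._push(doc_id, self._docs[doc_id])
            except ConnectionError:
                break  # offline: keep the rest queued for the next attempt
            self._pending.pop(0)
            sent += 1
        return sent

    def pending_count(self) -> int:
        return len(self._pending)
```

Reads and writes always hit the local copy, so the interface stays snappy; `sync()` can be called on a timer or whenever connectivity returns.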
My team at CouchOne is putting apps on the web, by building web addressable data into the core of our App Store compatible platform. Our mission is to ensure you have the data you need, no matter the network conditions. We’re happy that developers can make money in the app store, but we know that users will be happier when their data isn’t locked into silos. Everyone should know by now that snappiness is the most important feature any user experience can have.
We’re confident that the benefits you accrue by moving your web server from a high-latency remote server to the mobile device will mean a real competitive advantage for early adopters, and eventually a wholesale movement of the web towards client-based, eventually consistent applications.
The web has traditionally been known as the HTML platform as deployed mostly in browsers. It really has become the #1 way code and functionality gets in front of users. The web won because of its simplicity. But its simplicity is holding it back in the app world.
By giving URLs to data you can have the best of both apps and the web. Because the data is shared between the two interfaces so intimately, and presumably with a simple REST interface to support the web client, we can properly say these linked-data apps are on the web. The key to why #newtwitter feels more advanced than other sites with heavy APIs is that it supports the web client from the public API. This proves that the app API is really Web stuff; we’ll see a lot more of this in the future.
For instance, you might have a PhoneGap-style cross-platform HTML5 mobile application that suddenly becomes popular among the business crowd, making a native Blackberry UI into a worthwhile investment. The local linked-data layer that CouchOne provides gives distinct implementations a way to communicate through a common data substrate. With real-time replication, the Blackberry users can collaborate with the web users, on the same app at the same time. The app is on the web, even if it is slick and low latency.
The web won because it allowed linking to anyone, so adding new pages to the web doesn’t require coordination beforehand. In the future, the web will allow sharing of data with the people you choose, using something much like the replication built into Apache CouchDB. Our aim isn’t merely to add to the web developer’s toolkit, it’s to fundamentally rebalance the web in favor of the edge.
As a developer, you’ll be able to take the existing data stored by an existing application, and write your own interface for it. Your favorite social network doesn’t have a native UI for your phone’s operating system, but they do offer a Couch feed? Just build your own UI. You’ll still be able to take advantage of real-time interaction with users running against the same data on different platforms.
We strive to offer a relaxing new way to share state changes across computing devices, the web, and mobile phones. If you’re interested in combining the web and app worlds, check out Dale’s CouchDB on Android tutorial (coming soon) and the CouchApp community wiki.
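For readers who have not seen CouchDB replication, it is started with a plain HTTP request: a POST to the server's `/_replicate` endpoint with a JSON body naming a `source` and a `target`. The Python sketch below only builds such a request. The helper name `build_replication_request`, the server URL and the database names are placeholders chosen for illustration.

```python
import json
import urllib.request


def build_replication_request(couch_url: str, source: str, target: str,
                              continuous: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = {"source": source, "target": target, "continuous": continuous}
    return urllib.request.Request(
        url=couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would ask the local CouchDB to start copying changes from `source` to `target`. With `continuous` set to true, it keeps doing so as new changes arrive.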
A lot of my open gov energy of late has been focused on replicating a technique pioneered by Max Ogden (creator of PDXAPI) to convert geographic information in shapefile format into an easy to use format for developers.
Specifically, Max has pioneered a technique for converting shapefiles into documents in an instance of GeoCouch (the geographically enabled version of CouchDB).
I was thrilled recently to come across some data for the City of Baltimore and since I know there are some open government developments in the works there, I decided to put together a quick screencast showing how open data - when provided in an easily used format - can form the basis for some pretty useful civic applications.
The screencast below walks through a quick demonstration of an application I wrote in PHP to run on the Tropo platform - it currently supports SMS, IM and Twitter use.
Just send an address in the City of Baltimore to one of the following user accounts along with a hashtag for the type of location you are looking for:
This demo application interacts with a GeoCouch instance I have running in Amazon EC2 - you can take a look at the data I populated it with by going to baltapi.com and accessing the standard CouchDB user interface. I haven’t really locked this instance down all that tight, but there really isn’t anything in it that I can’t replace.
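To give a feel for what a lookup against GeoCouch can look like, here is a small Python sketch (not the PHP code from the demo) that builds a bounding-box query URL for a GeoCouch spatial view. GeoCouch spatial views are queried with a `bbox=west,south,east,north` parameter. The database name `police`, the design document `geo`, the view name `points` and the box size are all made up for this example and are not taken from the actual Baltimore instance.

```python
def bbox_around(lon: float, lat: float, half_side_deg: float) -> tuple:
    """Return (west, south, east, north) for a square box centered on a point."""
    return (lon - half_side_deg, lat - half_side_deg,
            lon + half_side_deg, lat + half_side_deg)


def spatial_query_url(base_url: str, db: str, design_doc: str,
                      view: str, bbox: tuple) -> str:
    """Build the URL for a GeoCouch spatial view restricted to a bounding box."""
    bbox_param = ",".join(f"{value:.4f}" for value in bbox)
    return (f"{base_url.rstrip('/')}/{db}/_design/{design_doc}"
            f"/_spatial/{view}?bbox={bbox_param}")
```

An application would geocode the incoming address to a longitude and latitude, build a box around it, fetch the URL, and reply with the documents that fall inside the box.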

There are a number of people in Baltimore pushing for an open data program from their city government, and I have heard that there are some really cool things in the pipeline. I can’t wait to see how things develop there, and I want to do anything I can to help.
Hopefully, this simple demo will be useful in illustrating both the ease with which data can be shared with developers and the potential benefit that applications built on top of open data can hold for municipalities.
UPDATE (4/18/2011): I’ve actually replicated all of the Baltimore data from the EC2 instance discussed in this blog post to the new Iris Couch instance. Iris Couch is by far the easiest way to get started using CouchDB, and Couch’s replication feature makes it easy to move data into an Iris Couch instance.
Today, one of our CouchDB hosting customers suffered a security issue. During the build up for a product launch, the customer’s developer posted their open source software to the public web. Unfortunately, the source code contained the database administrator credentials. This was an unfortunate error — in this industry the mantra is to hustle and to deliver at all costs. But then a simple oversight anyone could make leads to calamity — we know the feeling and sympathize.
In any event, despite our service’s “beta” status, we are working with our customer to restore their data and reinstate their service. Fortunately, the restoration is greatly simplified by CouchDB’s features.
We value all of our users’ security and as such we strongly suggest that password information never be revealed in publicly accessible source code to avoid this kind of situation in the future. Considering today’s distributed source code systems, the best policy is never to commit authentication credentials at all. We’d like to stress again, that the security incident is not caused by any vulnerability in Apache CouchDB or our hosting infrastructure.
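One common way to keep credentials out of a repository is to read them from the environment at runtime. The Python sketch below is a generic illustration, not a description of the customer's setup; the variable names `COUCH_USER` and `COUCH_PASSWORD` and the function name are assumptions made for this example.

```python
import os


def load_couch_credentials(env=None):
    """Read admin credentials from the environment instead of source code."""
    env = os.environ if env is None else env
    missing = [key for key in ("COUCH_USER", "COUCH_PASSWORD") if not env.get(key)]
    if missing:
        # Fail loudly rather than falling back to a hard-coded default.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["COUCH_USER"], env["COUCH_PASSWORD"]
```

The values are set on the deployment machine, so nothing secret ever needs to be committed.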
If you have any questions about this or our hosting service in general, do not hesitate to get in touch: hosting@couchone.com
— Jason, VP of Hosting
It’s very exciting to organize a conference, especially if it is the well-known EuRuKo. The preparations for this year’s EuRuKo in Berlin are in full swing. We just published the call for papers and I hope we get a good number of interesting talk proposals from you by the deadline on the 22nd of February. So stretch your fingers and send in a proposal
…[if] you have researched something about Ruby, developed a gem, found a unique usage for Ruby or you had a life changing experience with Ruby.
For details see the official post. We will choose wisely from the submissions so that you’ll never feel the urge to leave the track at this single-track conference.
Back in December I whipped up a series of CouchDB plugins for the Scout monitoring service. The plugins allow you to track all sorts of metrics for CouchDB, including (but not limited to):
In addition, there is a plugin for individual CouchDB databases and individual couchdb-lucene indexes. The database plugin will report:
The couchdb-lucene plugin will report:
The kind folks over at Scout have just released two new, official plugins based on the ones I created. The CouchDB Overall plugin combines some of the more important CouchDB metrics into a single plugin, and the CouchDB Database plugin reports the same set of the stats as the database plugin listed above.
The original plugins can be found at https://github.com/signal/scout-plugins/tree/master/couchdb. More information can be found here. I hope you find them useful.
Thanks to Doug Barth for some help on the plugins, and Derek over at Scout for putting together the official plugins.
Wow, 2010 was quite a hectic year in the Midgard world. Here is a quick summary:
Unfortunately at the same time the Midgard developer community has stayed quite small and insular. This will hopefully improve through easier installation, availability of Midgard libraries in Linux distributions and closer collaboration with the rest of the PHP world as a participant of the Zeta Components ecosystem.
We also still need to solve the project governance question of either running our own association or joining a major organization like the ASF. The relationship between Midgard and the GNOME project, on which we rely heavily, should also be clarified.
See also the Midgard in 2009 post.
by Henri Bergius (henri.bergius@iki.fi) at January 11, 2011 12:19 PM
… we consider their participation a glimpse of the future …
With these words our team (Alex, Frank and me) got accepted with JavaScript into the web development platform comparison “Plat_forms 2011”. Big words.
Plat_forms is a contest in which teams of three programmers compete to implement the same requirements for a web-based system within two days, using different technology platforms. It will be held in Nürnberg from 18th to 19th January.
The purpose of the Plat_forms is to provide new insights into the real pros, cons and emergent properties of each platform by analyzing various aspects like usability, structure, performance, scalability etc.
Sadly we were the only applicant for JavaScript, so our results for the JavaScript platform will be treated noncompetitively in the evaluation. The other participating platforms are our beloved Ruby, PHP, Perl and good old Java.
Although we use Ruby as our main language, we are getting more and more comfortable with JavaScript. Just recently JavaScript got a huge boost with faster VMs, server-side execution, powerful libraries and more possibilities on the client side, commonly referred to as HTML5. So our usage of JavaScript has increased over time and now ranges from rich client interfaces and special server-side tasks to small JavaScript-only apps.
With our participation with JavaScript at Plat_forms 2011 we want to push our skills and boundaries further. We are excited and anticipate the insights that our participation with JavaScript will reveal.