<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>Blurring the lines between hacking and painting</description><title>Fun with Dependencies</title><generator>Tumblr (3.0; @mattinsler)</generator><link>http://www.mattinsler.com/</link><item><title>Announcing longjohn: long stack traces for node.js</title><description>&lt;p&gt;Alright, so I&amp;#8217;m deploying some new code at &lt;a href="http://pagelever.com" title="PageLever" target="_blank"&gt;PageLever&lt;/a&gt; to move our Facebook sync code to node.js. I&amp;#8217;ve just slaved over it for a few weeks, creating a queue/worker system (to be publicly released soon) that keeps track of lots of stats and, more importantly, errors. Seeing our errors is very, very important. Is the problem with us or with Facebook? Have we hit our API limits again? Did I fat-finger yet another FQL statement? You know, the vital stuff.&lt;/p&gt;
&lt;p&gt;So now I&amp;#8217;ve pushed it out there and started up a bunch of workers and I&amp;#8217;m monitoring all of them. I start getting a few errors trickling in. No problem, this happens. Let&amp;#8217;s check out what they say.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;Error: socket hang up
    at Object. (http.js:1124:15)
    at CleartextStream. (http.js:1173:23)
    at CleartextStream.emit (events.js:64:17)
    at Array. (tls.js:792:22)
    at EventEmitter._tickCallback (node.js:190:38)&lt;/pre&gt;
&lt;p&gt;&lt;!-- more --&gt;So that&amp;#8217;s really not very helpful. Not in any way, shape, or form. So what to do, what to do&amp;#8230; After a little googling, I found the awesome &lt;a href="https://github.com/tlrobinson/long-stack-traces" title="long-stack-traces" target="_blank"&gt;long-stack-traces&lt;/a&gt; module by Tom Robinson. It&amp;#8217;s a very cool bit of code that wraps every async callback (even within the core libraries!) with a bit of code that saves the stack trace from the moment the callback was scheduled. I stuck a simple &lt;code&gt;require('long-stack-traces')&lt;/code&gt; into an initializer in my code and voila! Now my stack traces looked more like this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;Error: socket hang up
    at Object. (http.js:1124:15)
    at CleartextStream. (http.js:1173:23)
    at CleartextStream.emit (events.js:64:17)
    at Array. (tls.js:792:22)
    at EventEmitter._tickCallback (node.js:190:38)
---------------------------------------------
    at Facebook._executeRequest (/app/node_modules/facebook.node/node_modules/rest.node/index.js:109:7)
    at Facebook.request (/app/node_modules/facebook.node/node_modules/rest.node/index.js:154:8)
    at Facebook.get (/app/node_modules/facebook.node/node_modules/rest.node/index.js:160:8)
    at Facebook.fql (/app/node_modules/facebook.node/index.js:67:8)
    at PostFetcher.fetch_posts (/app/lib/post_fetcher.coffee:31:21)
    at Object. (/app/lib/register_worker_commands.coffee:58:25)
    at Object. (/app/lib/queue_worker_child.coffee:35:14)
    at EventEmitter. (/app/lib/queue_worker_child.coffee:47:12)
    at QueueWorker.error (/app/lib/queue_worker.coffee:177:15)
    at ChildProcess. (/app/lib/queue_worker.coffee:149:24)
    at ChildProcess. (events.js:156:14)
    at ChildProcess.emit (events.js:88:20)
    at Pipe.onread (child_process.js:102:16)
---------------------------------------------
    at Object. (/app/lib/queue_processor.coffee:62:10)
    at Object. (/app/lib/queue_processor.coffee:73:4)
    at Module._compile (module.js:446:26)
    at Object..coffee (/app/node_modules/caboose/node_modules/coffee-script/lib/coffee-script/coffee-script.js:22:21)
    at Module.load (module.js:353:31)
    at Function._load (module.js:311:12)
    at Module.require (module.js:359:17)
    at Object. (module.js:375:17)
&lt;/pre&gt;
&lt;p&gt;Ooooooooo!!!!  Guess what!  It was an FQL call that hung up on me. No more blaming MongoDB. OK, good.&lt;/p&gt;
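&lt;p&gt;The core trick is easy to sketch. The snippet below is a simplified illustration, not the source of long-stack-traces or longjohn. A hypothetical &lt;code&gt;wrapCallback&lt;/code&gt; helper records the stack at the moment a callback is scheduled. If the callback later throws, it appends that saved stack below a separator line. Only &lt;code&gt;setTimeout&lt;/code&gt; is patched here, and only one async hop is tracked; the real modules hook many more entry points, EventEmitter listeners included.&lt;/p&gt;

```javascript
// Simplified sketch of the long-stack-trace technique (hypothetical helper,
// not the actual longjohn or long-stack-traces source).
const SEPARATOR = '---------------------------------------------';

function wrapCallback(callback) {
  // Capture the stack of the code that is scheduling the callback.
  const scheduledAt = new Error('scheduled here').stack
    .split('\n')
    .slice(1) // drop the "Error: scheduled here" header line
    .join('\n');

  return function wrapped(...args) {
    try {
      return callback.apply(this, args);
    } catch (err) {
      // Append the saved scheduling stack below the normal trace.
      if (err instanceof Error) {
        err.stack = err.stack + '\n' + SEPARATOR + '\n' + scheduledAt;
      }
      throw err;
    }
  };
}

// Patch setTimeout so that every timer callback is wrapped.
const originalSetTimeout = global.setTimeout;
global.setTimeout = function (callback, delay, ...args) {
  return originalSetTimeout(wrapCallback(callback), delay, ...args);
};
```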
&lt;h4&gt;The Rub&lt;/h4&gt;
&lt;p&gt;Now we&amp;#8217;re golden, right?  Well, not exactly. There seems to be an issue with this implementation of long stack traces. My http server wouldn&amp;#8217;t start up. I found &lt;a href="https://github.com/tlrobinson/long-stack-traces/issues/9" title="this issue" target="_blank"&gt;this issue&lt;/a&gt; that echoed my problem, and hacked in the solution. Yay once again!&lt;/p&gt;
&lt;p&gt;Not so fast! Now when I run my workers, all of my EventEmitter callbacks get called multiple times. Two times. Three times. Four? Five? WTF is going on? OK, more research. Turns out, long-stack-traces doesn&amp;#8217;t implement &lt;code&gt;removeListener&lt;/code&gt;. Every listener is registered as a wrapped copy, so removing the original function never matches anything. The old listener stays attached, and each time the code adds it again, one more copy piles up. Ahhhhhh, I see now. Hmm, the project hasn&amp;#8217;t been updated in a year. I could have sent a message to tlrobinson, but you know what? I need this to work today. I&amp;#8217;ll just write it.&lt;/p&gt;
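&lt;p&gt;Here is a minimal illustration of the bug and one way around it. It uses hypothetical names (&lt;code&gt;patchEmitter&lt;/code&gt;, &lt;code&gt;wrapperFor&lt;/code&gt;) rather than longjohn&amp;#8217;s actual code. The patch wraps each listener in &lt;code&gt;on&lt;/code&gt;, as a long-stack-trace module must. It also keeps a &lt;code&gt;WeakMap&lt;/code&gt; from each original listener to its wrapper, so &lt;code&gt;removeListener&lt;/code&gt; can translate the function it is given into the one that was actually registered.&lt;/p&gt;

```javascript
const { EventEmitter } = require('events');

// Hypothetical patch (not longjohn's source): wrap listeners on the way in,
// and remember each wrapper so removeListener can find it again.
function patchEmitter(emitter, wrap) {
  const wrapperFor = new WeakMap(); // original listener -> wrapper
  const originalOn = emitter.on.bind(emitter);
  const originalRemove = emitter.removeListener.bind(emitter);

  emitter.on = function (event, listener) {
    let wrapped = wrapperFor.get(listener);
    if (!wrapped) {
      wrapped = wrap(listener);
      wrapperFor.set(listener, wrapped);
    }
    return originalOn(event, wrapped);
  };
  emitter.addListener = emitter.on;

  emitter.removeListener = function (event, listener) {
    // Without this lookup, the wrapper is never matched and never removed.
    return originalRemove(event, wrapperFor.get(listener) || listener);
  };
  emitter.off = emitter.removeListener;
  return emitter;
}

// Usage: a pass-through wrapper stands in for "save the stack trace".
const emitter = patchEmitter(new EventEmitter(), (fn) => function (...args) {
  return fn.apply(this, args);
});
```

&lt;p&gt;Drop the &lt;code&gt;wrapperFor.get(listener)&lt;/code&gt; lookup and a remove-then-re-add sequence leaves two wrappers registered. That is the duplicate-callback symptom described above.&lt;/p&gt;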
&lt;h4&gt;Introducing longjohn&lt;/h4&gt;
&lt;p&gt;So I wrote it. It worked. A few days later, I made it a bit neater, exposed some options, published it to &lt;a href="http://search.npmjs.org/#/longjohn" title="npm" target="_blank"&gt;npm&lt;/a&gt; and posted it on &lt;a href="https://github.com/mattinsler/longjohn" title="github" target="_blank"&gt;github&lt;/a&gt;. Maybe it&amp;#8217;ll be useful to you. Maybe not.&lt;/p&gt;
&lt;p&gt;Just install longjohn with npm:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;$ npm install longjohn
&lt;/pre&gt;
&lt;p&gt;Then, somewhere in the initialization of your node app (or just at the top of your script), add:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;require('longjohn');
&lt;/pre&gt;
&lt;p&gt;Now your stack traces will be nice and long. You say you want to customize something? Want some options? We&amp;#8217;ve got options.&lt;/p&gt;
&lt;h4&gt;Options&lt;/h4&gt;
&lt;p&gt;The method used to track async callbacks adds some memory overhead if your code looks something like this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;function foo() {
  // do something
  setTimeout(foo, 1000);
}
foo();&lt;/pre&gt;
&lt;p&gt;Every call to &lt;code&gt;setTimeout&lt;/code&gt; here is wrapped, and each wrapper holds on to the trace saved by the call before it. The chain of saved stack traces therefore grows forever, keeping around traces that node would normally forget about. This is BAAAAD! To solve this, longjohn automatically prunes the saved traces to the last 10 async callbacks. To change this limit, set &lt;code&gt;async_trace_limit&lt;/code&gt;:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;var longjohn = require('longjohn');
longjohn.async_trace_limit = 5;  // defaults to 10
longjohn.async_trace_limit = -1; //unlimited
&lt;/pre&gt;
&lt;p&gt;Another option is to customize the output for an async callback boundary. By default, longjohn prints a line of dashes to mark each callback boundary. Change this by setting &lt;code&gt;empty_frame&lt;/code&gt;:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;var longjohn = require('longjohn');
longjohn.empty_frame = 'ASYNC CALLBACK';&lt;/pre&gt;
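&lt;p&gt;To make both options concrete, here is a sketch of how a chain of saved traces might be capped and printed. It is not longjohn&amp;#8217;s implementation, and the function names are hypothetical. It assumes only what is described above. Each async hop adds a trace in front of its parent&amp;#8217;s, and only a limited number of earlier hops are kept (a negative limit means unlimited). Hops are joined with an empty-frame line.&lt;/p&gt;

```javascript
// Hypothetical model (not longjohn's source): a long trace is an array of
// per-hop frame lists, newest hop first. A "hop" is one async boundary.
const DEFAULT_EMPTY_FRAME = '---------------------------------------------';

function extendTrace(parentHops, frames, asyncTraceLimit = 10) {
  const hops = [frames, ...parentHops];
  // Keep the current hop plus at most asyncTraceLimit earlier hops; a
  // negative limit means unlimited. Without the cap, a recursive
  // setTimeout loop would retain a trace for every timer it ever scheduled.
  return asyncTraceLimit >= 0 ? hops.slice(0, asyncTraceLimit + 1) : hops;
}

function formatTrace(hops, emptyFrame = DEFAULT_EMPTY_FRAME) {
  // Print each hop's frames, separated by the empty-frame line.
  return hops.map((frames) => frames.join('\n')).join('\n' + emptyFrame + '\n');
}
```

&lt;p&gt;Running &lt;code&gt;extendTrace&lt;/code&gt; 1,000 times with the default limit, as the &lt;code&gt;foo()&lt;/code&gt; loop would, leaves 11 hops: the current one plus 10 earlier ones. &lt;code&gt;formatTrace(hops, 'ASYNC CALLBACK')&lt;/code&gt; prints them with the custom separator. The sketch copies the short array on each hop instead of linking to the parent, so once a hop is dropped, no newer trace references it.&lt;/p&gt;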
&lt;h4&gt;Nice side-effect&lt;/h4&gt;
&lt;p&gt;Because longjohn uses exactly the same output format as long-stack-traces, felixge&amp;#8217;s &lt;a href="https://github.com/felixge/node-stack-trace" title="node-stack-trace" target="_blank"&gt;node-stack-trace&lt;/a&gt; module can parse the output if you&amp;#8217;d like your stack traces in object form. Here&amp;#8217;s how to do it:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;require('stack-trace').parse(err);&lt;/pre&gt;
&lt;p&gt;Enjoy!&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26396305882</link><guid>http://www.mattinsler.com/post/26396305882</guid><pubDate>Mon, 02 Jul 2012 20:38:00 -0700</pubDate><category>debugging</category><category>node.js</category></item><item><title>Build a Car.  Don't Make a Faster Horse.</title><description>&lt;blockquote&gt;
&lt;p&gt;&amp;#8220;If I had asked people what they wanted, they would have said faster horses.&amp;#8221;&lt;/p&gt;
&lt;p&gt;- Henry Ford&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Henry Ford’s famous line is often used as a guideline for innovation, but I think that the majority of tech start-ups misinterpret the quote. I’m mostly not talking about “cool” products like games or social networks (before they become useful), but about websites and applications that serve a direct business need. When you understand what users want, you can examine why they want it, and then figure out how to give them what they need. Too many start-ups fall into the trap of giving users what they want without really examining why they want it, completely ignoring what they need.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;Many start-ups ask users for input in the form of surveys like, “which feature do you want us to build next?” or push-poll style questions like, “when you look for a deal, do you look in the newspaper or online?” They are building faster horses, not cars. They are asking questions that serve their own purpose, not their users’.&lt;/p&gt;
&lt;p&gt;The earlier versions of Windows phones were faster horses. People wanted to use Windows on their phones, so Microsoft gave them a Start menu and applications. It did so without considering why users wanted those features, or how it could address the underlying need in a way that fit the platform more intuitively. Microsoft overlooked the fact that the way people use phones, and the things they use them for, are wildly different from how they use computers. The iPhone, on the other hand, is a car. It addresses what people need: to have their whole life available in two or three taps. There are no hidden menus to navigate because everything is easily accessible on the screen.&lt;/p&gt;
&lt;p&gt;When people wanted to send each other files that were too big for email, sites like YouSendIt, Sendspace, and drop.io became popular. Sharing files with other people was part of the problem, but not the core issue. These sites made sense and gave people what they wanted, but they failed to understand the root problem: the need to seamlessly manage files across various computers, devices, and locations. I don’t know how many times I emailed myself Sendspace links to download later. When Dropbox came along, it was so easy to use that my mother figured it out on the first try. Dropbox effortlessly won over users because emailing someone a link was no longer necessary. Moving files from place to place was as easy as it could be. The underlying need, not the want it created, was addressed.&lt;/p&gt;
&lt;p&gt;Flight reservation websites are another example of an industry full of faster horses. The vast majority of them provide nothing more than a long list of prices and flights. Yes, I know they find better deals, and while I want that, what I really need is something that makes choosing the best flight from that list easier. Picking a flight takes so long when I need to scroll through pages, read all the prices, and compare times, layovers, and so on. I end up creating a little list of flights and comparing them myself, and that’s a pain. Hipmunk.com gave me what I needed. All the pertinent information is represented visually in a way that makes the right one for me obvious. Hipmunk also has tabs. Tabs exist in all modern browsers, but integrating them into the product itself shows a great understanding of what people need when they’re looking for a flight. Price charts, fancy when-to-buy graphics, and whatever clutter other sites throw on the screen don’t help me do what I need to do. I need to make an easy decision, not accidentally book a car with my flight when I’m trying to pay for it.&lt;/p&gt;
&lt;p&gt;My interpretation of Henry Ford’s iconic quote is that it’s up to us to listen to what our users want and then give them what they need. The next time you’re considering a new product or feature, take some time to think about why your users want it, and whether there is a more intuitive way to give them what they need. And if you’re thinking about your next business or start-up idea, remember that true disruption comes from a deep understanding of what people need, not what they want.&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26411283488</link><guid>http://www.mattinsler.com/post/26411283488</guid><pubDate>Wed, 12 Oct 2011 18:44:00 -0700</pubDate><category>startups</category></item><item><title>MongoNYC 2011: My Talk on Tracking and Analytics at Signpost</title><description>&lt;p&gt;Last week I gave a talk about the way we do analytics and tracking at &lt;a href="http://www.signpost.com" title="signpost.com" target="_blank"&gt;signpost.com&lt;/a&gt;. The content was a follow-on to my post about &lt;a href="http://www.mattinsler.com/signpost-tracking-analytics-mysql-mongodb/" title="how we moved the tracking system from MySQL to MongoDB" target="_blank"&gt;how we moved the tracking system from MySQL to MongoDB&lt;/a&gt;. I hope the talk was informative, and I welcome any questions people may have.&lt;/p&gt;
&lt;p&gt;Here are the slides:&lt;/p&gt;
&lt;div class="video-wrapper"&gt;&lt;iframe frameborder="0" height="417" marginheight="0" marginwidth="0" scrolling="no" src="http://www.slideshare.net/slideshow/embed_code/8300695" width="500"&gt; &lt;/iframe&gt;&lt;/div&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;Video to come&amp;#8230;&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26412005065</link><guid>http://www.mattinsler.com/post/26412005065</guid><pubDate>Tue, 14 Jun 2011 00:30:00 -0700</pubDate><category>analytics</category><category>tracking</category><category>mongodb</category><category>slides</category></item><item><title>Signpost Tracking &amp; Analytics: MySQL -&gt; MongoDB</title><description>&lt;p&gt;Here at &lt;a href="http://www.signpost.com/" title="Signpost" target="_blank"&gt;Signpost&lt;/a&gt;, we believe in tracking everything. We keep every request, every event, every log, and every exception. We use this data for user tracking, analytics and business intelligence, and code health and bug triage.&lt;/p&gt;
&lt;h4&gt;Original System&lt;/h4&gt;
&lt;p&gt;We started out with &lt;a href="http://www.mysql.com/" title="MySQL" target="_blank"&gt;MySQL&lt;/a&gt;. Requests were logged into a request table. Instead of modeling the one-to-many relationship between a request and its parameters, we stored all of the parameters in a single serialized column. Events were tracked in two tables, event_tracking and event_tracking_property, with many properties per event. Logs and exceptions were sent to log files, where they were rotated and hardly ever looked at again. We ran with this setup in production for the first few months of Signpost&amp;#8217;s lifetime, and it worked out just fine.&lt;/p&gt;
&lt;h4&gt;&lt;!-- more --&gt;Here&amp;#8217;s the basic idea:&lt;/h4&gt;
&lt;pre class="prettyprint"&gt;create table request (
  request_tracking_id int not null auto_increment,
  visitor_key varchar(64) not null,
  session_key varchar(36) not null,
  ip varchar(64) null,
  user_agent varchar(128) not null,
  uri varchar(255) not null,
  referer varchar(255) null,
  params text,
  email_user_id int null,
  user_id int null,
  response_code int not null,
  response_time int not null,
  timestamp timestamp not null,
  primary key (request_tracking_id)
);

create table event_tracking (
  event_tracking_id int not null auto_increment,
  visitor_key varchar(64) null,
  session_key varchar(36) null,
  event_name varchar(32) not null,
  timestamp timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  primary key (event_tracking_id)
);

create table event_tracking_property (
  event_tracking_property_id int not null auto_increment,
  event_tracking_id int not null,
  event_key varchar(32) not null,
  event_value text,
  primary key (event_tracking_property_id)
);
&lt;/pre&gt;
&lt;h4&gt;Problems&lt;/h4&gt;
&lt;p&gt;Capturing and tracking everything is great. With some effort, we can find which pages are working and which are not, where users are coming from (we use Google Analytics as well for this), what they are doing on the site, when they run into problems, and when our system fails to operate as expected. The only problem with our approach was the crazy amount of work needed to access and analyze this data, even for simple questions. For example:&lt;/p&gt;
&lt;p class="caption"&gt;How many clicks did deal 212001 get from a session starting with an email click?&lt;/p&gt;
&lt;pre class="prettyprint"&gt;select count(distinct et.event_tracking_id)
  from event_tracking et, event_tracking_property etp1, event_tracking_property etp2
 where et.event_tracking_id = etp1.event_tracking_id
   and etp1.event_tracking_id = etp2.event_tracking_id
   and et.event_name = 'deal-click'
   and etp1.event_key = 'deal_at_location_id'
   and etp1.event_value = '212001'
   and etp2.event_key = 'session_ca'
   and etp2.event_value = 'eml';
&lt;/pre&gt;
&lt;p&gt;In our new system, this would end up as:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;db.event.count({
  event_name: 'deal-click',
  'event_properties.deal_at_location_id': 212001,
  'event_properties.session_ca': 'eml'
});
&lt;/pre&gt;
&lt;p&gt;This is a simple case, but we can already see how much more thought goes into the SQL query than into the MongoDB query. With the correct indexes, both queries run at comparable speed. The request tracking design also made analysis of request parameters difficult, because the parameters of a request were serialized into a query string format:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;raw=false&amp;amp;feedId=dealActivity&amp;amp;user-opts=username,avatar&amp;amp;count=100&amp;amp;no-cache=1306867013431&amp;amp;feedType=comment&amp;amp;userId=&amp;amp;targetUserId=&amp;amp;deal-opts=id,location-name,title,category&amp;amp;activity-opts=id,user,type,rawtime,commentcontent&amp;amp;types=commenter&amp;amp;dealId=243844
&lt;/pre&gt;
&lt;p&gt;This made it painfully slow to find requests with a certain parameter. Often we would just write some code in Java to parse the parameters and do the analysis. Not very efficient.&lt;/p&gt;
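&lt;p&gt;Getting the parameters out of that serialized column is mostly a parsing step. The sketch below assumes a Node.js runtime where &lt;code&gt;URLSearchParams&lt;/code&gt; is available as a global. Its hypothetical helper turns a stored query string into a plain object, like the embedded &lt;code&gt;parameters&lt;/code&gt; document used in the new system. Values stay strings, and if a key repeats, the last value wins.&lt;/p&gt;

```javascript
// Hypothetical helper: convert a serialized query string into a plain
// object suitable for embedding in a request document.
function paramsToObject(queryString) {
  const params = new URLSearchParams(queryString);
  const result = {};
  for (const [key, value] of params) {
    result[key] = value; // later duplicates overwrite earlier ones
  }
  return result;
}
```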
&lt;h4&gt;Desires&lt;/h4&gt;
&lt;p&gt;We wanted to ask more interesting questions in the future. Here are some of our desires:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Development wanted a way to see the session and requests that resulted in an exception&lt;/li&gt;
&lt;li&gt;Community Management wanted a better way to help users when they had issues on the site (seeing their requests and pages they visited)&lt;/li&gt;
&lt;li&gt;Business wanted both real-time and cumulative statistics, like the top 5 deals for today, or the deal with the most clicks in the past month&lt;/li&gt;
&lt;li&gt;Business wanted a way to see if users were gaming the referral bonuses&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;This is just a short list, but it is indicative of the desires of most sites I can think of.&lt;/p&gt;
&lt;h4&gt;Considerations&lt;/h4&gt;
&lt;p&gt;I wanted the system to be fast, but also flexible. I knew that we would have new questions all the time, and I didn&amp;#8217;t want to re-compile some Java code or write long SQL queries every time. I also wanted a fast way to see aggregated data that I could graph over long periods of time. I had worked with &lt;a href="http://www.mongodb.org/" title="MongoDB" target="_blank"&gt;MongoDB&lt;/a&gt; before and had built some Java tooling to help create applications quickly (&lt;a href="http://www.mattinsler.com/tag/guiceymongo/" title="GuiceyMongo" target="_blank"&gt;GuiceyMongo&lt;/a&gt;). I knew that MongoDB was fast and could handle a ton of writes per second (we&amp;#8217;re averaging 4,000+ per second on a single EC2 large instance that also runs a Tomcat server). I also knew that I could execute arbitrary JavaScript as part of my queries, which gives me the flexibility I wanted. Sounds like a plan!&lt;/p&gt;
&lt;h4&gt;New System&lt;/h4&gt;
&lt;p&gt;The system that I came up with contains three collections: request, event, and event.rollup. The simplest is request.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;{
  "_id" : ObjectId("4de53548b09a6541874eb641"),
  "method" : "GET",
  "response_code" : 200,
  "parameters" : {
    "raw" : "false",
    "feedId" : "dealActivity",
    "user-opts" : "username,avatar",
    "count" : "100",
    "no-cache" : "1306867013431",
    "feedType" : "comment",
    "userId" : "",
    "targetUserId" : "",
    "deal-opts" : "id,location-name,title,category",
    "activity-opts" : "id,user,type,rawtime,commentcontent",
    "types" : "commenter",
    "dealId" : "243844"
  },
  "user_agent" : "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.17) Gecko/20110420 Firefox/3.6.17 (.NET CLR 3.5.30729)",
  "referer" : "http://www.signpost.com/deals/Boston-MA/Cafe-Gigu/-5-for-10-Worth-of-Food-at-Cafe-Gigu-/243844",
  "visitor_key" : "6f6d7e229f2a06e6a0dcdac42b09b26d17dbfac2318be53bea72a996204e3731",
  "uri" : "/api/recent-activity",
  "user_id" : null,
  "session_key" : "008F75D5B7515B202DC1A610F9D86539",
  "ip" : "123.28.156.46",
  "response_time" : 4,
  "timestamp" : ISODate("2011-05-31T18:36:56.177Z")
}
&lt;/pre&gt;
&lt;p&gt;This is practically the same as the MySQL table, but with an embedded object for the query parameters. I have indexes on fields that we search often, like visitor_key, session_key, and uri. A common query is to find all of the visitor_key and session_key values for a user, along with the time range during which each was active. This lets me see when and for how long a user was on the site, and then dig deeper into their requests.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://content.screencast.com/users/racobac/folders/Jing/media/12f3d86c-d9f7-4aec-9ce9-72dcbaa2d3e8/00000039.png" target="_blank"&gt; &lt;img alt="Visitor ID breakdown" src="http://content.screencast.com/users/racobac/folders/Jing/media/12f3d86c-d9f7-4aec-9ce9-72dcbaa2d3e8/00000039.png" width="600px"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://content.screencast.com/users/racobac/folders/Jing/media/353d4bbb-a831-42c1-8215-94231ed0dd1c/00000040.png" target="_blank"&gt; &lt;img alt="Session ID breakdown" src="http://content.screencast.com/users/racobac/folders/Jing/media/353d4bbb-a831-42c1-8215-94231ed0dd1c/00000040.png" width="600px"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://content.screencast.com/users/racobac/folders/Jing/media/672dfd1d-1582-4382-a5fb-665c34a32cad/00000041.png" target="_blank"&gt; &lt;img alt="Request breakdown" src="http://content.screencast.com/users/racobac/folders/Jing/media/672dfd1d-1582-4382-a5fb-665c34a32cad/00000041.png" width="600px"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The queries for this data are pretty simple. For example, the first image comes from a map-reduce:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;db.request.mapReduce(function() {
  emit(this.visitor_key, {start: this.timestamp, end: this.timestamp});
}, function(key, values) {
  return {start: values[0].start, end: values[values.length - 1].end};
}, {
  query: {user_id: 12345},
  sort: {timestamp: 1}
});
&lt;/pre&gt;
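&lt;p&gt;One subtlety: MongoDB may call the reduce function several times for the same key, including on its own earlier output. It also does not promise any particular order for &lt;code&gt;values&lt;/code&gt;. A safe reduce for this query therefore takes the earliest start and the latest end rather than relying on the order of the values. The plain-JavaScript sketch below mimics the same grouping over an in-memory array of request documents, so it can be checked without a database. &lt;code&gt;visitRanges&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```javascript
// In-memory imitation of the visitor_key map-reduce (hypothetical helper).
function visitRanges(requests, userId) {
  const ranges = new Map();
  requests
    .filter((r) => r.user_id === userId) // the "query" option
    .forEach((r) => {
      // map: emit(visitor_key, {start, end}); reduce: widen the range.
      const current = ranges.get(r.visitor_key);
      if (!current) {
        ranges.set(r.visitor_key, { start: r.timestamp, end: r.timestamp });
        return;
      }
      if (current.start > r.timestamp) current.start = r.timestamp;
      if (r.timestamp > current.end) current.end = r.timestamp;
    });
  return ranges;
}
```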
&lt;p&gt;Now let&amp;#8217;s get into the events. I have two collections here, the event and event.rollup collections. The event collection is, once again, very similar to the MySQL table that I started with, except that it has embedded objects and time-window information for grouping the events.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;{
  "_id" : ObjectId("4de53c85c7d8327f56284f3a"),
  "visitor_key" : "50f46d60e00efe0a17c42c7fffdc68bda530572d1f83972a83d6c61d684ba3ce",
  "event_name" : "deal-click",
  "event_properties" : {
    "ca" : "dl",
    "deal_at_location_id" : 98303,
    "session_ca" : "dl",
    "ct" : "?",
    "cr" : "?",
    "node" : "web2",
    "bot" : "false",
    "user_id" : 163181,
    "guest" : "true",
    "session_cr" : "?",
    "session_ct" : "?"
  },
  "session_key" : "1C9FA142A59465B745ECE2C1ACB15E62",
  "timestamp" : ISODate("2011-05-31T19:07:48.533Z"),
  "window" : {
    "minute" : ISODate("2011-05-31T19:07:00Z"),
    "hour" : ISODate("2011-05-31T19:00:00Z"),
    "day" : ISODate("2011-05-31T00:00:00Z"),
    "week" : ISODate("2011-05-30T00:00:00Z"),
    "month" : ISODate("2011-05-01T00:00:00Z"),
    "year" : ISODate("2011-01-01T00:00:00Z")
  }
}
&lt;/pre&gt;
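&lt;p&gt;The &lt;code&gt;window&lt;/code&gt; fields can be computed from the event timestamp at write time. Here is one way to do it in UTC, as a sketch with a hypothetical helper name. It assumes weeks start on Monday. That matches the example above, where Tuesday 2011-05-31 falls in the week of Monday 2011-05-30.&lt;/p&gt;

```javascript
// Hypothetical helper: truncate a timestamp to each rollup window (UTC).
function timeWindows(timestamp) {
  const t = new Date(timestamp);
  const y = t.getUTCFullYear();
  const mo = t.getUTCMonth();
  const d = t.getUTCDate();
  const day = Date.UTC(y, mo, d);
  // getUTCDay(): 0 = Sunday ... 6 = Saturday; shift so Monday = 0.
  const daysSinceMonday = (t.getUTCDay() + 6) % 7;
  return {
    minute: new Date(Date.UTC(y, mo, d, t.getUTCHours(), t.getUTCMinutes())),
    hour: new Date(Date.UTC(y, mo, d, t.getUTCHours())),
    day: new Date(day),
    week: new Date(day - daysSinceMonday * 24 * 60 * 60 * 1000),
    month: new Date(Date.UTC(y, mo, 1)),
    year: new Date(Date.UTC(y, 0, 1)),
  };
}
```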
&lt;p&gt;This format is significantly easier to query, and we have pulled a whole lot of ad-hoc data just by querying these events, for example with the query from the beginning of this post:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;db.event.count({
  event_name: 'deal-click',
  'event_properties.deal_at_location_id': 212001,
  'event_properties.session_ca': 'eml'
});
&lt;/pre&gt;
&lt;p&gt;I&amp;#8217;ve written email tracking, exception tracking, email and session clickthrough tracking, and a bunch of other little close-to-realtime reports from this collection that we host on an internal Business Intelligence server. Iterating has been very quick, and I can usually answer a question in a few minutes, if not much less. A quick side note: we also have a report collection that contains the JavaScript code for all of our reports. To add a new report, write some JavaScript and store it in that collection. Our internal BI website knows how to read that collection, present the relevant options to the user, and run the reports, resulting in pretty graphs like this:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://content.screencast.com/users/racobac/folders/Jing/media/5fd0959a-1cb6-430f-ba65-1710b45f925f/00000042.png" target="_blank"&gt; &lt;img alt="Sample BI graph" src="http://content.screencast.com/users/racobac/folders/Jing/media/5fd0959a-1cb6-430f-ba65-1710b45f925f/00000042.png" width="600px"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;So what is this and how is it done? This is where the event.rollup collection comes into play. It gives us the ability to do long-term analysis across a large aggregate of events, very very quickly. The graph above is actually the click traffic of the top 5 deals for May, summed by hour. The query took less than 2 seconds, and the graph took another 3 to render. That&amp;#8217;s pretty fast considering the top line is actually over 10,000 clicks (each an individual event). Here&amp;#8217;s how the rollup events look:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;{
  "_id" : ObjectId("4de54041c7d8327f56285563"),
  "c" : 1,
  "e" : "deal-click",
  "k" : [{
    "k" : "deal_at_location_id",
    "v" : 245931
  }],
  "s" : "month",
  "w" : ISODate("2011-05-01T00:00:00Z")
}
&lt;/pre&gt;
&lt;p&gt;This is a month rollup, starting May 1, 2011, for the event named &amp;#8220;deal-click&amp;#8221;, with a key/value pair of deal_at_location_id = 245931 and a count of 1. There are currently over 12 million of these objects in the database, and they get updated every minute without pushing the CPU usage of mongod past 5%. Pretty neat! We can filter, sort, and group these by any number of key/value pairs, since MongoDB can &lt;a href="http://www.mongodb.org/display/DOCS/Multikeys" title="index on objects within arrays" target="_blank"&gt;index on objects within arrays&lt;/a&gt;. This allows powerful queries like the one behind the graph above. I&amp;#8217;ll show you the basic code behind that query:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;function sum_top_n_by_range(from, to, event, property, count) {
  /* create collection that looks like
  {
    _id: property_value,
    value: count
  }
  */
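  var r = db.event.rollup.mapReduce(function () {
    emit(this.k[0].v, this.c);
  }, function (k, v) {
    // sum the (partial) counts for this property value
    var s = 0;
    v.forEach(function (n) {
      s += n;
    });
    return s;
  }, {
    // hypothetical temporary output collection; r.find() reads it, r.drop() removes it
    out: 'tmp_top_n',
    query: {
      e: event,
      s: 'day',
      w: {$gte: from, $lte: to},
      k: {$size: 1},
      'k.k': property
    }
  });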

  /* take the top count property_values */
  var keys = r.find().sort({value: -1}).limit(count).map(function (o) {return o._id;});
  r.drop();

  var cursor = db.event.rollup.find({
    e: event,
    s: 'hour',
    w: {$gte:from, $lte:to},
    k: {$size: 1},
    'k.k': property,
    'k.v': {$in: keys}
  }).sort({w: 1});

  var result = {};
  cursor.forEach(function(dataPoint) {
    /* organize data points in the result object as needed by your output */
  });
  
  return result;
}
&lt;/pre&gt;
&lt;p&gt;It&amp;#8217;s a lot of code, but not all that crazy, right?&lt;/p&gt;
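&lt;p&gt;Stripped of the database calls, the function does two passes. First it sums the daily rollups per property value to find the top &lt;code&gt;count&lt;/code&gt; values. Then it fetches the hourly rollups for just those values. The sketch below reproduces that logic over an in-memory array of rollup documents shaped like the example (&lt;code&gt;e&lt;/code&gt;, &lt;code&gt;s&lt;/code&gt;, &lt;code&gt;w&lt;/code&gt;, &lt;code&gt;k&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt;). The name &lt;code&gt;topNByHour&lt;/code&gt; is an assumption. So is the output shape, which maps each property value to a list of &lt;code&gt;[hour, count]&lt;/code&gt; pairs; the original leaves the output format open.&lt;/p&gt;

```javascript
// In-memory version of the two-pass top-N query (hypothetical helper).
// rollups: array of documents shaped like {e, s, w, k: [{k, v}], c}.
function topNByHour(rollups, from, to, event, property, count) {
  // Same filter as the MongoDB queries: event, scale, window range,
  // exactly one key/value pair, and the requested property.
  function matches(r, scale) {
    if (r.e !== event) return false;
    if (r.s !== scale) return false;
    if (from > r.w) return false;
    if (r.w > to) return false;
    if (r.k.length !== 1) return false;
    return r.k[0].k === property;
  }

  // Pass 1: total the daily counts per property value; keep the top ones.
  const totals = new Map();
  rollups.filter((r) => matches(r, 'day')).forEach((r) => {
    const v = r.k[0].v;
    totals.set(v, (totals.get(v) || 0) + r.c);
  });
  const keys = [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, count)
    .map(([v]) => v);

  // Pass 2: hourly data points for just those values, oldest first.
  const result = {};
  keys.forEach((v) => {
    result[v] = [];
  });
  rollups
    .filter((r) => matches(r, 'hour'))
    .filter((r) => keys.includes(r.k[0].v))
    .sort((a, b) => a.w - b.w)
    .forEach((r) => {
      result[r.k[0].v].push([r.w, r.c]);
    });
  return result;
}
```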
&lt;h4&gt;Thoughts&lt;/h4&gt;
&lt;p&gt;I&amp;#8217;m very pleased with our decision to move this system to MongoDB. Event and request tracking does not require pre-conceived columns and hard data constraints, which saves a lot of development and planning time. The ability to embed objects inside documents has made querying complex data and conceptualizing the records more straightforward. A large win was the ability to compute rollups in near real time, thanks to the speed of MongoDB upserts. If we choose to track more complex data, I can create far more rollups than events without worrying about bogging down the database with writes. Currently we&amp;#8217;re clocking between 4,000 and 7,000 updates per second on a full rebuild, and around 500 per second during peak website usage. If this becomes a pain point, sharding the rollups should be a straightforward way to increase our write throughput.&lt;/p&gt;
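&lt;p&gt;Here is a rough model of that upsert bookkeeping. In the MongoDB shell of that era, a single rollup update would look something like &lt;code&gt;db.event.rollup.update({e: ..., s: ..., w: ..., k: ...}, {$inc: {c: 1}}, true)&lt;/code&gt;, where the third argument asks for an upsert. The sketch below does the same counting in memory with a &lt;code&gt;Map&lt;/code&gt;. Three things in it are assumptions for illustration, not Signpost&amp;#8217;s code: the helper name &lt;code&gt;applyEvent&lt;/code&gt;, the choice of one rollup per single property and time scale, and the string key format.&lt;/p&gt;

```javascript
// In-memory model of rollup upserts (hypothetical helper, not Signpost's code).
const SCALES = ['minute', 'hour', 'day', 'week', 'month', 'year'];

function applyEvent(rollups, event) {
  Object.entries(event.event_properties).forEach(([key, value]) => {
    SCALES.forEach((scale) => {
      const w = event.window[scale];
      // One counter per (event name, scale, window, key/value pair).
      const id = JSON.stringify([event.event_name, scale, w.toISOString(), key, value]);
      const doc = rollups.get(id);
      if (doc) {
        doc.c += 1; // like {$inc: {c: 1}} on an existing document
      } else {
        // like the insert half of an upsert
        rollups.set(id, { e: event.event_name, s: scale, w, k: [{ k: key, v: value }], c: 1 });
      }
    });
  });
  return rollups;
}
```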
&lt;h4&gt;MongoNYC 2011&lt;/h4&gt;
&lt;p&gt;I&amp;#8217;ll be speaking about this and showing more of the way Signpost uses MongoDB at the MongoNYC 2011 conference next week. If you don&amp;#8217;t have a ticket yet, &lt;a href="http://www.10gen.com/conferences/mongonyc2011" title="why not get one" target="_blank"&gt;why not get one&lt;/a&gt;?&lt;/p&gt;
&lt;h4&gt;Slides &amp;amp; Video&lt;/h4&gt;
&lt;p&gt;Slides: &lt;a href="http://bit.ly/jvwev4" title="http://bit.ly/jvwev4" target="_blank"&gt;http://bit.ly/jvwev4&lt;/a&gt;&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26535917380</link><guid>http://www.mattinsler.com/post/26535917380</guid><pubDate>Tue, 31 May 2011 13:11:00 -0700</pubDate><category>mongodb</category><category>analytics</category><category>tracking</category></item><item><title>MongoDB Blog Contest - I Won!</title><description>&lt;p&gt;I entered my post on &lt;a href="http://www.mattinsler.com/why-and-how-i-replaced-amazon-sqs-with-mongodb/" title="Why (and How) I Replaced Amazon SQS with MongoDB" target="_blank"&gt;Why (and How) I Replaced Amazon SQS with MongoDB&lt;/a&gt; into the MongoDB Blog Contest and &lt;a href="http://blog.mongodb.org/post/677516152/blog-contest-winners" title="won" target="_blank"&gt;won&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;As part of the grand prize, I went out to &lt;a href="http://www.oscon.com/" title="OSCON" target="_blank"&gt;OSCON&lt;/a&gt; in Portland, OR and had a great time! &lt;a href="http://www.kchodorow.com/blog/" title="Kristina Chodorow" target="_blank"&gt;Kristina Chodorow&lt;/a&gt;&amp;#8217;s new book on MongoDB was on display and I spoke to lots of people who are interested in what MongoDB is and what it can do for them. I also went to a lot of talks, the most interesting of which (in my opinion) were about &lt;a href="http://nodejs.org/" title="node.js" target="_blank"&gt;node.js&lt;/a&gt; and the analytics done at Twitter using &lt;a href="http://hadoop.apache.org/" title="Hadoop" target="_blank"&gt;Hadoop&lt;/a&gt; and &lt;a href="http://hadoop.apache.org/pig/" title="Pig" target="_blank"&gt;Pig&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;If you haven&amp;#8217;t checked out node.js yet, you should. It&amp;#8217;s a really cool technology that provides an event-based I/O system on top of Google&amp;#8217;s V8 JavaScript engine. Now you can take all of the JavaScript hacking you&amp;#8217;ve been doing and use it to write shell scripts (sort of like how your C# knowledge helps with PowerShell), or even full-fledged web servers! Node.js is incredibly fast, and it stays fast with large numbers of connections by following an I/O strategy like &lt;a href="http://nginx.org/" title="NGINX" target="_blank"&gt;NGINX&lt;/a&gt; or even a Windows application. Rather than posting work to a thread pool or creating a new thread per connection, it uses a single thread and a message queue of operations (I&amp;#8217;m not sure if there are multiple threads with their own queues, but you get the idea). Also, node.js is designed so that I/O operations are asynchronous and report back through a callback. This can make for some messy-looking nested code, but it pushes you to think about how to parallelize your application as much as possible. It&amp;#8217;s really easy to get started, and it has a great community behind it. My favorite starting point was &lt;a href="http://howtonode.org/express-mongodb" title="Blog rolling with mongoDB, express and Node.js" target="_blank"&gt;Blog rolling with mongoDB, express and Node.js&lt;/a&gt;.&lt;/p&gt;
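&lt;p&gt;The sketch below is a deliberately tiny toy model of the &amp;#8220;single thread plus a queue of operations&amp;#8221; idea. It is not how Node.js is implemented (Node builds on libuv, which also keeps a thread pool for some work), and &lt;code&gt;ToyLoop&lt;/code&gt; is hypothetical. Callbacks go into a queue, and one loop runs them one at a time, so a callback never interrupts code that is already running.&lt;/p&gt;

```javascript
// Toy single-threaded event loop (illustration only; Node uses libuv).
class ToyLoop {
  constructor() {
    this.queue = [];
  }

  // Schedule a callback to run after the current code finishes.
  enqueue(callback) {
    this.queue.push(callback);
  }

  // Run queued callbacks in FIFO order until the queue is empty;
  // callbacks may enqueue more work.
  run() {
    while (this.queue.length > 0) {
      const callback = this.queue.shift();
      callback();
    }
  }
}
```

&lt;p&gt;Suppose the main code queues a &amp;#8220;read done&amp;#8221; callback (which itself queues a &amp;#8220;write done&amp;#8221; callback) and a &amp;#8220;timer fired&amp;#8221; callback, then calls &lt;code&gt;run()&lt;/code&gt;. The order is: main code, read done, timer fired, write done.&lt;/p&gt;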
&lt;p&gt;I haven&amp;#8217;t personally used Pig and I&amp;#8217;m not currently using Hadoop, but if I were to start using Hadoop, Pig would be the first thing I would install. Hadoop lets you store and process large amounts of data in a distributed fashion. To process the data, you write map and reduce methods in Java and then let Hadoop take over. The problem comes when you&amp;#8217;re rewriting large chunks of Java code and recompiling over and over again to figure out how you want to view and process your data. At Twitter, with over 75 billion tweets, even a simple map-reduce can take a long time! Then you need to debug your code, compile, and run. Then again, and again. And if you&amp;#8217;re trying to do a distributed join, the pain really starts. Kevin Weil from Twitter showed the Java code for one: it is over 200 lines and has to be custom-written for each map-reduce you perform. That&amp;#8217;s slow and painful! Enter Pig. Pig provides a scripting language (Pig Latin) that runs on top of Hadoop and lets you quickly perform operations on your data without writing and compiling the Java code yourself. Pig scripts are much terser and more readable, and Pig takes away a lot of the debugging headaches of writing your own joins and other complex operations. It seems like a great addition to the map-reduce world!&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26536651476</link><guid>http://www.mattinsler.com/post/26536651476</guid><pubDate>Thu, 29 Jul 2010 11:04:00 -0700</pubDate><category>mongodb</category><category>node.js</category></item><item><title>Why (and How) I Replaced Amazon SQS with MongoDB</title><description>&lt;h4&gt;What is Amazon SQS?&lt;/h4&gt;
&lt;p&gt;&lt;a href="http://aws.amazon.com/sqs/"&gt;Amazon SQS&lt;/a&gt; (Simple Queue Service) is a reliable message queuing service hosted in the Amazon cloud. It is ideal for sending messages between servers that need to acknowledge when processing has completed. When a message is popped (received) from the queue, it is not deleted. Instead, it is hidden from other clients and held by the client that made the request. That client is then responsible for telling SQS to delete the message from the queue. If the client does not delete the message within a certain time frame (the visibility timeout), it loses ownership of the message, and the message becomes available to other clients again.&lt;/p&gt;
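&lt;p&gt;The receive/delete/timeout cycle is easier to follow in code. The sketch below is a hypothetical in-memory model written only for illustration. It is not the AWS SDK, and the names VisibilityQueue, pop and remove are made up. Time is passed in as plain milliseconds so each step is explicit.&lt;/p&gt;

```javascript
// Hypothetical in-memory model of SQS-style visibility timeouts, for
// illustration only (not the AWS SDK). Times are plain milliseconds.
function VisibilityQueue(timeoutMs) {
  this.timeoutMs = timeoutMs;
  this.messages = []; // entries: {id, body, invisibleUntil, owner}
  this.nextId = 1;
}

VisibilityQueue.prototype.push = function (body) {
  this.messages.push({ id: this.nextId++, body: body, invisibleUntil: 0, owner: null });
};

// Receive the first visible message: hide it for timeoutMs and record the owner.
VisibilityQueue.prototype.pop = function (owner, now) {
  var msg = this.messages.find(function (m) { return now >= m.invisibleUntil; });
  if (!msg) return null;
  msg.invisibleUntil = now + this.timeoutMs;
  msg.owner = owner;
  return msg;
};

// Deletion succeeds only for the current owner, before its timeout runs out.
VisibilityQueue.prototype.remove = function (owner, id, now) {
  var i = this.messages.findIndex(function (m) { return m.id === id; });
  if (i === -1) return false;
  var msg = this.messages[i];
  if (msg.owner !== owner) return false;        // another client holds it now
  if (now >= msg.invisibleUntil) return false;  // timed out: ownership lost
  this.messages.splice(i, 1);
  return true;
};

var q = new VisibilityQueue(10000);                 // 10 second timeout
q.push('email #1');
var a = q.pop('worker-a', 0);                       // worker-a receives it
var hidden = q.pop('worker-b', 5000);               // null: still invisible
var b = q.pop('worker-b', 12000);                   // timeout passed, worker-b gets it
var lateDelete = q.remove('worker-a', a.id, 12500); // false: worker-a lost it
var okDelete = q.remove('worker-b', b.id, 13000);   // true
```

&lt;p&gt;In this model, a slow worker whose timeout expires cannot delete a message that has already been handed to someone else. That is the guarantee the visibility timeout is meant to give.&lt;/p&gt;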
&lt;h4&gt;How am I using it?&lt;/h4&gt;
&lt;p&gt;One of the systems I use SQS for is a distributed email delivery service (using SMTP). Since I don&amp;#8217;t know of an asynchronous SMTP client for Java, I am using JavaMail to deliver messages. Sending messages with JavaMail is pretty slow: it can take several seconds per message, and each message ties up a thread while it is being sent. To send many messages in parallel, I decided to queue up the outgoing messages and spin up many instances of the SMTP application. This approach is dead simple and scales wonderfully, without my having to implement an asynchronous SMTP client of my own.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;h4&gt;So what&amp;#8217;s wrong with Amazon SQS?&lt;/h4&gt;
&lt;p&gt;The main problem with using SQS in the above scenario is that I can&amp;#8217;t push an entire email message onto the SQS queue, since each SQS message is limited to 8 KB of data. To get around this, I store the message in MongoDB and queue its ID on SQS. Each client then pulls an ID from the queue and looks up the email in Mongo. There&amp;#8217;s nothing really wrong with this approach, but it can be done better, faster, easier, and cheaper. Amazon charges me a fraction of a cent for every operation I perform on SQS. That doesn&amp;#8217;t seem like much, but if I have 10 SMTP applications polling SQS 4 times a second, all day, every day, whether or not there are new messages to send, it adds up. Plus, I have diagnostic applications watching the queue size to decide whether to spin up more instances or take some down. Even if this adds up to $10/day, that&amp;#8217;s still $3,650/year just to send out email. That&amp;#8217;s too much for a startup with no financial backing!&lt;/p&gt;
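&lt;p&gt;Here is that arithmetic as a quick JavaScript sketch. The worker count and poll rate come from the paragraph above. The per-request price is an assumption (about $0.01 per 10,000 requests, roughly the 2010 list price), so substitute the current rate.&lt;/p&gt;

```javascript
// Back-of-the-envelope polling cost. pricePer10kRequests is an ASSUMPTION
// (roughly the 2010 SQS list price); check current pricing before relying on it.
var workers = 10;                 // SMTP applications polling the queue
var pollsPerSecond = 4;           // per worker
var secondsPerDay = 24 * 60 * 60;
var pricePer10kRequests = 0.01;   // USD, assumed

var requestsPerDay = workers * pollsPerSecond * secondsPerDay;
var dollarsPerDay = requestsPerDay / 10000 * pricePer10kRequests;

// The post's deliberately pessimistic $10/day, annualized.
var pessimisticPerYear = 10 * 365;

console.log(requestsPerDay);           // 3456000
console.log(dollarsPerDay.toFixed(2)); // 3.46
console.log(pessimisticPerYear);       // 3650
```

&lt;p&gt;Polling alone is roughly 3.5 million requests a day, before counting the diagnostic applications. The bill grows linearly with both the number of workers and the poll rate.&lt;/p&gt;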
&lt;h4&gt;The approach&lt;/h4&gt;
&lt;p&gt;I have been using MongoDB for a while now and am enamored with what it can do. I know that it can store lots of schema-less data in documents of up to 4MB each, and larger files through GridFS. I know that it&amp;#8217;s lightning fast (almost memcached speed) for indexed lookups and can handle thousands of operations per second without even pushing the CPU over 10%. I know that I&amp;#8217;m already paying for CPU and disk space on Amazon EC2, and I thoroughly enjoy minimizing my monthly, weekly, and even daily costs. Blah. Blah. Blah. I want to implement this in Mongo!&lt;/p&gt;
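&lt;p&gt;With the introduction of server-side JavaScript and the findAndModify command, it is easy to use MongoDB as a queue that can be accessed from any client language (and there are a ton!). findAndModify finds a single document and updates it in one atomic operation, so two workers popping at the same moment can&amp;#8217;t both claim the same message. Below is the code that I am using in my own projects.&lt;/p&gt;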
&lt;h4&gt;The Code&lt;/h4&gt;
&lt;p class="caption"&gt;sqs.js&lt;/p&gt;
&lt;pre class="prettyprint"&gt;function sqsQueueExists(name) {
	return db.queue[name].count() != 0;
};

function sqsQueueMessageCount(name) {
	return db.queue[name].count({
		alive: true,
		expires: {$lt: new Date}
	});
};

function sqsDeleteQueue(name) {
	db.queue[name].drop();
};

function sqsListQueues(prefix) {
	var regex;
	if (prefix)
		regex = new RegExp('^[^.]+\\.queue\\.' + prefix + '[^$]*$'); // '\\.' in a string literal yields the regex \.
	else
		regex = /^[^.]+\.queue\.[^$]+$/;

	return db.system.namespaces.find({
		name: regex
	}).map(function (x) {
		return x.name.substring(x.name.indexOf('.') + 7);
	});
};

function sqsPushMessage(queue, message) {
	var _push = function(queue, message) {
		db.queue[queue].save({
			alive: true,
			expires: new Date(0),
			owner: new ObjectId('000000000000000000000000'),
			body: message
		});
	};

	if (message instanceof Array) {
		message.forEach(function(m) {
			_push(queue, m);
		});
	} else {
		_push(queue, message);
	}
};

function sqsPopMessage(queue, owner, count) {
	var now = new Date;
	// 10 second expiration, change this to what you want
	var expires = new Date(now.getTime() + 10000);
	
	if (!count) {
		count = 1;
	}
	var result = [];
	for (var i = 0; i &amp;lt; count; ++i) {
		var item = db.queue[queue].findAndModify({
			query: {
				alive: true,
				expires: {$lt: now}
			},
			update: {
				$set: {
					expires: expires,
					owner: owner
				}
			},
			new: true
		});
		if (!item || friendlyEqual({}, item)) // no visible message left (null or empty result)
			break;
		result.push(item);
	}
	return result;
};

function sqsDeleteMessage(queue, owner, item_ids) {
	if (item_ids instanceof ObjectId)
		item_ids = [item_ids];
	db.queue[queue].update({
		alive: true,
		expires: {$gte: new Date},
		owner: owner,
		_id: {$in: item_ids}
	}, {
		$set: {
			alive: false
		}
	},
	false,
	true);
};
&lt;/pre&gt;
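&lt;p&gt;The namespace filtering in sqsListQueues is the least obvious part of the script. Here it is pulled out into a plain JavaScript function that runs without MongoDB. This is an illustrative sketch: queueNamesFrom and the sample namespace strings are made up. It assumes system.namespaces entries of the form database.queue.name, and that index entries contain a $ character.&lt;/p&gt;

```javascript
// Mirrors the filtering done in sqsListQueues, but on plain strings.
// queueNamesFrom and the sample namespaces are hypothetical.
function queueNamesFrom(namespaces, prefix) {
  // Backslashes are doubled because inside a string literal '\.' is just '.'.
  var regex = prefix
    ? new RegExp('^[^.]+\\.queue\\.' + prefix + '[^$]*$')
    : /^[^.]+\.queue\.[^$]+$/;   // [^$] skips index namespaces such as ...$_id_
  return namespaces
    .filter(function (ns) { return regex.test(ns); })
    .map(function (ns) {
      // Skip the database name and its dot, then the 6 characters of "queue.".
      return ns.substring(ns.indexOf('.') + 7);
    });
}

var sample = [
  'mail.queue.outgoing',
  'mail.queue.outgoing.$_id_',
  'mail.queue.bounces',
  'mail.contacts',
  'mail.system.indexes'
];
var all = queueNamesFrom(sample);        // ['outgoing', 'bounces']
var out = queueNamesFrom(sample, 'out'); // ['outgoing']
```

&lt;p&gt;Note that the prefix is inserted into the regular expression as-is. A queue prefix containing regex metacharacters would need escaping first.&lt;/p&gt;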
&lt;p&gt;Save the code above as sqs.js, then run either of the scripts below (one for bash, one for the MongoDB shell) to store the functions in db.system.js on the database of your choice. Alternatively, you can port the code to the &lt;a href="http://www.mongodb.org/display/DOCS/Drivers"&gt;MongoDB driver&lt;/a&gt; of your choice.&lt;/p&gt;
&lt;p class="caption"&gt;bash&lt;/p&gt;
&lt;pre class="prettyprint"&gt;$ for fn in sqsQueueExists sqsQueueMessageCount sqsDeleteQueue sqsListQueues sqsPushMessage \
	sqsPopMessage sqsDeleteMessage; do
	echo "db.system.js.save({_id: '$fn', value: $fn})" | 
		mongo [db name] --quiet --shell sqs.js
done
&lt;/pre&gt;
&lt;p class="caption"&gt;MongoDB Shell&lt;/p&gt;
&lt;pre class="prettyprint"&gt;&amp;gt; use [db name];
&amp;gt; load('sqs.js');
&amp;gt; [sqsQueueExists, sqsQueueMessageCount, sqsDeleteQueue, sqsListQueues, sqsPushMessage,
	sqsPopMessage, sqsDeleteMessage].forEach(function (x) {
	var name = x.toString().match(/^function\s(\w+)/)[1];
	db.system.js.save({_id: name, value: x});
});
&lt;/pre&gt;
&lt;p&gt;For simplicity, I have left some pieces of the SQS API out of this implementation, mostly those built around options, like changing the visibility timeout of a queue or of an individual message. These would be pretty trivial to add if necessary, but if you are not using these features, leaving them out keeps the code simpler and faster. To add options to a queue, just add another collection called queue that holds one document per queue, with the queue name in the _id field alongside the applicable options. Then query the queue collection when the options are needed (the _id field is automatically indexed, so this will be fast).&lt;/p&gt;
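&lt;p&gt;As a sketch of how a per-queue visibility timeout could plug into sqsPopMessage, here is a small helper. It is hypothetical: sqsVisibilityExpiry and the visibilityTimeout field (in seconds, as in SQS) are made-up names, and the options document is assumed to look like {_id: 'outgoing', visibilityTimeout: 30}.&lt;/p&gt;

```javascript
// Hypothetical helper: compute a popped message's expiration from an options
// document such as {_id: 'outgoing', visibilityTimeout: 30} (seconds).
// The field name is an assumption made for this sketch.
var DEFAULT_VISIBILITY_MS = 10000; // the 10 second default hard-coded in sqsPopMessage

function sqsVisibilityExpiry(options, now) {
  var timeoutMs = DEFAULT_VISIBILITY_MS;
  if (options) {
    if (typeof options.visibilityTimeout === 'number') {
      timeoutMs = options.visibilityTimeout * 1000;
    }
  }
  return new Date(now.getTime() + timeoutMs);
}

var start = new Date(0);
var withDefault = sqsVisibilityExpiry(null, start);
var withOption = sqsVisibilityExpiry({ _id: 'outgoing', visibilityTimeout: 30 }, start);
```

&lt;p&gt;Inside sqsPopMessage, the hard-coded expiration could then become something like var expires = sqsVisibilityExpiry(db.queue.findOne({_id: queue}), now); which is a single indexed lookup per pop.&lt;/p&gt;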
&lt;p&gt;I hope this helps you out!&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26536863600</link><guid>http://www.mattinsler.com/post/26536863600</guid><pubDate>Tue, 25 May 2010 12:09:00 -0700</pubDate><category>mongodb</category><category>queue</category><category>amazon sqs</category></item><item><title>MongoDB Java DAO Generator - GuiceyData</title><description>&lt;p&gt;One of the best things about MongoDB is the lack of an enforced schema for collections. This flexibility gives developers a lot of power in how they work with their data. Embedding records and arrays inside other records allows both a complexity and simplicity of data organization that RDBMSs can only dream of! All of that being said, working with these records in a language like Java and on large diverse teams of people who don&amp;#8217;t want to open the database and inspect the records to see what values and sub-records are available, means that you will always spend time wrapping these records in a strong-typed class. Wrapping up loose data into classes that can both access and create that data sounds just like another project I&amp;#8217;ve used recently. If you haven&amp;#8217;t heard of &lt;a href="http://code.google.com/p/protobuf/"&gt;Google&amp;#8217;s Protocol Buffers&lt;/a&gt;, you might want to acquaint yourself.&lt;/p&gt;
&lt;p&gt;Since I&amp;#8217;ve enjoyed working with Protocol Buffers so much, I thought I could mimic their functionality and ease of use with MongoDB. This would also integrate beautifully with the &lt;a href="http://www.mattinsler.com/tag/guiceymongo/"&gt;GuiceyMongo&lt;/a&gt; project that I released a month or two ago.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;h4&gt;What is the GuiceyData Generator?&lt;/h4&gt;
&lt;p&gt;The GuiceyData Generator is a quick and easy way to specify strongly typed data structures to be stored in a MongoDB database and mapped to wrappers and builders in Java. The resulting classes are strictly data and have (by design) very limited functionality. Their purpose is to make reading and storing data in MongoDB a breeze from Java.&lt;/p&gt;
&lt;h4&gt;How does it work?&lt;/h4&gt;
&lt;p&gt;It&amp;#8217;s pretty straightforward. You create a data definition file and then run the generator. This will create wrappers and builders for all of the types you define. Here&amp;#8217;s a very simple example:&lt;/p&gt;
&lt;p class="caption"&gt;simple.data&lt;/p&gt;
&lt;pre class="prettyprint"&gt;data Person {
  string name;
  set&amp;lt;string&amp;gt; alias;
  blob picture;
}
&lt;/pre&gt;
&lt;p&gt;Then you can run the generator:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;$ java -cp guiceymongogenerator-0.1.0.jar \
com.lowereast.guiceymongo.data.generator.GuiceyDataGenerator -p test.data -s src
&lt;/pre&gt;
&lt;p&gt;This will create the directory structure and files below:&lt;/p&gt;
&lt;pre&gt;+ src
  + test
    + data
      - Person.java
&lt;/pre&gt;
&lt;h4&gt;Wrappers&lt;/h4&gt;
&lt;p&gt;Once you have generated the code, you can do something like this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;Person p = Person.wrap(personCollection.findOne());
if (p.hasName())
  log.trace(p.getName());
if (p.getAliasCount() &amp;gt; 0)
  log.trace(StringUtil.join(p.getAliasSet(), ", "));
if (p.hasPicture()) {
  Image picture = ImageIO.read(p.getPictureInputStream());
  // do something with the picture
}
&lt;/pre&gt;
&lt;p&gt;Please note that in the example above, blobs will only work if you are using GuiceyMongo to configure your databases, collections, and buckets. Using GuiceyMongo, we could also just do this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;@Inject
void loadPerson(@GuiceyMongoCollection("person")
  GuiceyCollection&amp;lt;Person&amp;gt; personCollection) {
  Person p = personCollection.findOne();
  // ...
}
&lt;/pre&gt;
&lt;h4&gt;Builders&lt;/h4&gt;
&lt;p&gt;What would be the use of being able to read data if we couldn&amp;#8217;t create it as well? For this, there are builders:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;Person.Builder p = Person.newBuilder()
                         .setName("Matt Insler")
                         .addAlias("Matt")
                         .addAlias("Guice &amp;amp; Mongo Guru")
                         .setPictureBucket("pictures");
ImageIO.write(picture, format, p.getPictureOutputStream());
personCollection.save(p.build());
&lt;/pre&gt;
&lt;p&gt;And once again with GuiceyMongo:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;@Inject
void savePerson(@GuiceyMongoCollection("person")
  GuiceyCollection&amp;lt;Person&amp;gt; personCollection) {
  Person p = Person.newBuilder() // ...
  personCollection.save(p);
}
&lt;/pre&gt;
&lt;p&gt;It&amp;#8217;s really just that easy!&lt;/p&gt;
&lt;h4&gt;Builder Prototypes&lt;/h4&gt;
&lt;p&gt;You can create builders based on another object. This can be a wrapper or a builder itself. This is useful when you would like to use a prototype object or just copy an object that you&amp;#8217;ve just read.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;Person p = Person.wrap(collection.findOne());
Person.Builder newPerson = Person.newBuilder(p);
&lt;/pre&gt;
&lt;h4&gt;More Complex&lt;/h4&gt;
&lt;p&gt;Here&amp;#8217;s an example of a more complex data definition file that shows you more of what the generator can handle:&lt;/p&gt;
&lt;p class="caption"&gt;Contact.data&lt;/p&gt;
&lt;pre class="prettyprint"&gt;data Contact {
  data Address {
    string street_1;
    string street_2;
    string city;
    string state;
    int zip_code;
  }
  
  [identity]
  string identity;
  
  string first_name;
  string last_name;
  map&amp;lt;string, Address&amp;gt; address;
  map&amp;lt;string, string&amp;gt; phone_number;
  map&amp;lt;string, string&amp;gt; email_address;
  map&amp;lt;string, InstantMessenger&amp;gt; instant_messenger;
  set&amp;lt;string&amp;gt; tag;
  blob picture;
}

data InstantMessenger {
  enum Application {
    AIM,
    ICQ,
    Jabber,
    MSN,
    Yahoo
  }

  string screen_name;
  string alias;
  Application application;
}
&lt;/pre&gt;
&lt;h4&gt;Integration with GuiceyMongo &lt;a href="http://www.mattinsler.com/guiceymongo-server-side-method-proxies/"&gt;Stored Procedure Proxies&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Now that you have generated code to access and build your data, what if you want to return that data from a stored procedure? This is pretty easy!&lt;/p&gt;
&lt;pre class="prettyprint"&gt;public interface ContactQuery {
  Contact findContactByName(String name);
  List&amp;lt;Contact&amp;gt; findContactsByIMAlias(String alias);
}

@Inject
void exercise(ContactQuery query) {
  query.findContactByName("Matt Insler");
}
&lt;/pre&gt;
&lt;p&gt;Not too bad, right?&lt;/p&gt;
&lt;h4&gt;Supported Data Types&lt;/h4&gt;
&lt;p&gt;Basic Java Types&lt;/p&gt;
&lt;pre class="tabular"&gt;Guicey Type                       Java Type

double                      =&amp;gt;    double
float                       =&amp;gt;    float
int32                       =&amp;gt;    int
int64                       =&amp;gt;    long
bool                        =&amp;gt;    boolean
string                      =&amp;gt;    String
date                        =&amp;gt;    java.util.Date
&lt;/pre&gt;
&lt;p&gt;MongoDB Types&lt;/p&gt;
&lt;pre class="tabular"&gt;Guicey Type                       Java Type

object_id                   =&amp;gt;    ObjectId
db_object                   =&amp;gt;    DBObject
db_timestamp                =&amp;gt;    DBTimestamp
&lt;/pre&gt;
&lt;p&gt;Collection Types&lt;/p&gt;
&lt;pre class="tabular"&gt;Guicey Type                       Java Type

list&amp;lt;[type]&amp;gt;                =&amp;gt;    List&amp;lt;[type]&amp;gt;
set&amp;lt;[type]&amp;gt;                 =&amp;gt;    Set&amp;lt;[type]&amp;gt;
map&amp;lt;string, [type]&amp;gt;         =&amp;gt;    Map&amp;lt;String, [type]&amp;gt;
map&amp;lt;[enum type], [type]&amp;gt;    =&amp;gt;    Map&amp;lt;[enum type], [type]&amp;gt;

where type is:
- Basic Type
- MongoDB Type
- User Data Type
- User Enum Type
&lt;/pre&gt;
&lt;h4&gt;User Data Types&lt;/h4&gt;
&lt;pre class="prettyprint"&gt;data [name] {
  [type] [property name];
  ...
  OtherData.Embedded [property name];
  data [embedded type name] {
    [type] [property name];
    ...
  }
}
&lt;/pre&gt;
&lt;h4&gt;User Enum Types&lt;/h4&gt;
&lt;pre class="prettyprint"&gt;enum [name] {
  [value],
  ...
  [value]
}
&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;A note on enum types:&lt;/strong&gt; User-defined enum values are stored in MongoDB as the string returned by [enum value].name(), and the generated code converts them automatically when reading and storing. Be very careful when renaming enum values: any values already stored in the database will need to be changed as well.&lt;/p&gt;
&lt;h4&gt;Options&lt;/h4&gt;
&lt;p&gt;Currently there is only one available option: identity. Options are specified directly above the property they apply to and are enclosed in square brackets.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;data Foo {
  [identity]
  string identity;
}
data Bar {
  [identity]
  object_id identity;
}
&lt;/pre&gt;
&lt;p&gt;The identity option specifies that this property should be read and stored as an ObjectId and used as the _id value. The generated code will perform automatic conversions between ObjectId and String if the property&amp;#8217;s type is string.&lt;/p&gt;
&lt;h4&gt;Why not do what others are doing?&lt;/h4&gt;
&lt;p&gt;In two words: personal preference. While I think that other available libraries have their place, I have certain preferences that have led me to develop GuiceyData the way that I did.&lt;/p&gt;
&lt;h4&gt;GuiceyData vs. &lt;a href="http://code.google.com/p/morphia/"&gt;Morphia&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Morphia is a JPA-style library that lets you annotate POJOs and then convert them to and from DBObjects. It&amp;#8217;s rather nice and supports a bunch of features like strongly typed queries, DAO/Datastore generation, embedded and referenced data objects, and more. It&amp;#8217;s definitely worth checking out and could be the right library for you.&lt;/p&gt;
&lt;p&gt;The main advantage of Morphia is the ability to persist POJOs that you&amp;#8217;ve already created and been using. Just annotate the fields and go! What I don&amp;#8217;t like is dealing with full object conversions. The code isn&amp;#8217;t as fast or efficient as it could be, because it has to use reflection and must process every field of an object, regardless of which fields your code actually uses. Plus, with a schema-less database like Mongo, you can&amp;#8217;t be sure that every value in the POJO will be filled in when you convert the object, so you can&amp;#8217;t use primitive types, since they are not nullable. Another problem I have is that I really, really don&amp;#8217;t like creating my own custom data stores, or the lack of outside control over which databases and collections are used. With GuiceyData I can generate object wrappers that lazily access the data, with code written specifically for the object I&amp;#8217;m dealing with. No reflection plus lazy loading and processing == really fast! GuiceyData also separates the data spec from the Java code. This is very useful when you want to share objects across an organization that uses different languages to access the database, since code for each language can be generated from the same specification. In my opinion (realize that I am the creator of GuiceyData), GuiceyData achieves more than Morphia, does it in a cleaner way, and is faster.&lt;/p&gt;
&lt;h4&gt;GuiceyData vs. &lt;a href="http://java.dzone.com/articles/using-mongodb-sculptor"&gt;Sculptor&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;Sculptor is a code generator that has been adapted to create MongoDB conversion code. It follows the Rails approach of scaffolding lots of methods and classes to help you deal with your data classes. It&amp;#8217;s a bit daunting to understand exactly what&amp;#8217;s going on at first, but you can figure it all out once you play with it a bit. Just like Morphia, Sculptor converts your data objects to and from DBObjects. Unlike Morphia, Sculptor generates this conversion code, which makes it faster since there&amp;#8217;s no reflection. What I really like about Sculptor is the query language it provides. It is a much easier way to query your generated objects than constructing DBObjects by hand, or even using QueryBuilder in the com.mongodb package. This functionality has inspired me to start creating my own query DSL, which I hope to release in the next month. Sculptor is definitely worth a look!&lt;/p&gt;
&lt;p&gt;Now for the criticisms. My main criticism of Sculptor is that it&amp;#8217;s so flexible that it becomes complex. I&amp;#8217;m a big proponent of data objects being just that&amp;#8230; data objects. I don&amp;#8217;t see a very compelling use case for scaffolding classes that inherit from your data classes and stub out extra functionality. If you want that functionality, write it, but leave it out of your data definition. The same goes for repositories: they really aren&amp;#8217;t buying you that much. Why generate the common methods for each and every type of repository? Isn&amp;#8217;t this what generics are for? All of the base repository classes will be identical, except for the methods that you add to them. Scaffolding is nice, but not necessary. I also think that generating the conversions is a bit short-sighted. If you&amp;#8217;re generating code, why not generate lazy loading in your data objects? I understand that people like POJOs, but converting every field up front is slow. We&amp;#8217;re dealing with databases here: your bottleneck should be the queries and data processing, not constructing the objects around the data you&amp;#8217;ve received or are storing. My last gripe is about something I chose not to support in GuiceyData because MongoDB is schema-less: Sculptor supports object inheritance. This is a nice feature, but I&amp;#8217;m not sure that it&amp;#8217;s absolutely necessary. An object in MongoDB can be many different types all rolled into one. I can create an email object and then overlay a task object on top of it. I don&amp;#8217;t want an inheritance chain for this; I want a way to convert one object type to another. Wrapping raw DBObjects with accessors is a very clean and fast way to do this. In GuiceyData all you need to do is &lt;code&gt;Email e = Email.wrap(&amp;#8230;); Task t = Task.convertFrom(e);&lt;/code&gt; The data types can then be stored in the database in the same object, in different objects, or even in different collections.&lt;/p&gt;
&lt;h4&gt;Enterprise Usage&lt;/h4&gt;
&lt;p&gt;Both GuiceyMongo and GuiceyData were created with enterprise usage in mind. GuiceyMongo allows you to create multiple configuration environments, like Test, QA, Prod, etc., where you can map databases, collections, and buckets (GridFS) to any server, database, or collection that you want. Just by changing the configuration, your data can be loaded from or saved to somewhere else. If this configuration is abstracted into a config file or configuration library, it can be shared across the enterprise and overridden locally if necessary. As for data generation, the definition files only talk about data, and that&amp;#8217;s all they ever will talk about. This means code could be generated for many different languages from the same definitions. If you are writing a data access library that will be distributed to other teams, you can use Google Guice to provide an easy way to configure your library, and package the generated data classes with it. Your users don&amp;#8217;t need to generate the code themselves; they can just use your jar (or .so, or .dll).&lt;/p&gt;
&lt;p&gt;The integration of GuiceyMongo and GuiceyData will make it dead easy to get started with MongoDB from Java. In the future I&amp;#8217;ll write a quickstart post to show everyone how quick and easy it can really be!&lt;/p&gt;
&lt;h4&gt;Try it!&lt;/h4&gt;
&lt;p&gt;Just head over to the &lt;a href="http://github.com/mattinsler/com.lowereast.guiceymongo/downloads"&gt;GitHub download page&lt;/a&gt; and grab the 0.1.0 jars. To generate your classes, run the generator as shown above. To use GuiceyMongo, add a reference to it and its dependencies, and look at my &lt;a href="http://www.mattinsler.com/tag/guiceymongo/"&gt;other posts&lt;/a&gt; on the subject.&lt;/p&gt;
&lt;h4&gt;Dependencies&lt;/h4&gt;
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;guiceydatagenerator-[version].jar&lt;/strong&gt; - None&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;guiceymongo-[version].jar&lt;/strong&gt; - aopalliance.jar (included with Google Guice), guice-2.0.jar, mongo-1.4.jar (or later)&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;As always, if you encounter any issues, please &lt;a href="mailto:matt.insler@gmail.com"&gt;contact me&lt;/a&gt;!&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26545570530</link><guid>http://www.mattinsler.com/post/26545570530</guid><pubDate>Thu, 20 May 2010 15:29:00 -0700</pubDate><category>guiceymongo</category><category>mongodb</category><category>java</category><category>guiceydata</category></item><item><title>MongoNYC Conference May 21st</title><description>&lt;p&gt;MongoDB is coming to NYC in May. &lt;a href="http://decav.com/Blogs/Andre/Default.aspx"&gt;Andre deCavaignac&lt;/a&gt; and I will be speaking at the conference about MongoDB in C#/.NET. We have a cool demo to show everyone, and I&amp;#8217;ll be posting some of the source here after the conference.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.10gen.com/event_mongony_10may21"&gt;&lt;img alt="MongoNYC May 21st Badge" src="http://www.mattinsler.com/wp-content/uploads/2010/04/badge-mongonyc-large.png"/&gt;&lt;/a&gt; &lt;a href="http://www.10gen.com/event_mongony_10may21"&gt;Click here to register for the conference&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.mongodb.org/"&gt;MongoDB&lt;/a&gt; is a document-oriented database that I am currently enamored with and use in all of my data-driven projects. It runs on &lt;a href="http://www.mongodb.org/display/DOCS/Downloads"&gt;most platforms&lt;/a&gt; and has &lt;a href="http://www.mongodb.org/display/DOCS/Drivers"&gt;drivers&lt;/a&gt; for most of the programming languages you could want to use. Just imagine all of your data stored as JSON objects, accessible through a fairly powerful (and getting better every month) query language, along with lightning-fast document updates, inserts, and queries.&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26548506703</link><guid>http://www.mattinsler.com/post/26548506703</guid><pubDate>Fri, 30 Apr 2010 11:07:00 -0700</pubDate><category>mongodb</category></item><item><title>Guicey Tips - Google Guice Provider Modules</title><description>&lt;p&gt;Following up on my post about &lt;a href="http://www.mattinsler.com/google-guice-module-de-duplication/"&gt;Google Guice Module De-Duplication&lt;/a&gt;, I want to show a good use of singleton modules. The motivation for the Provider Module pattern came from writing the &lt;a href="http://www.mattinsler.com/tag/guiceymongo"&gt;GuiceyMongo&lt;/a&gt; library. Every time a database or collection is configured, I need to install a provider for that database key or collection key. Since there can be a Production, Test, QA, etc. configuration for each database or collection key, installing the provider once per configuration would throw an exception for binding the same key twice. I needed a way to install each provider only once. This is where Provider Modules come into play.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;Rather than binding the database provider like this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;bind(DB.class).toProvider(new DBProvider("dbName"));
&lt;/pre&gt;
&lt;p&gt;We would bind it like this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;install(new DBProviderModule("dbName"));
&lt;/pre&gt;
&lt;p&gt;The real difference is between DBProvider and DBProviderModule. Here is a simple example of these classes:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;public class DBProvider implements Provider&amp;lt;DB&amp;gt; {
  private final String _database;
  public DBProvider(String database) {
    _database = database;
  }
  public DB get() {
    try {
      return new Mongo().getDB(_database);
    } catch (Exception e) {
      return null;
    }
  }
}
&lt;/pre&gt;
&lt;pre class="prettyprint"&gt;public class DBProviderModule extends SingletonModule&amp;lt;Key&amp;lt;DB&amp;gt;&amp;gt; implements Provider&amp;lt;DB&amp;gt; {
  private final String _database;
  public DBProviderModule(String database) {
    super(Key.get(DB.class, Names.named(database)));
    _database = database;
  }
  public void configure(Binder binder) {
    // Bind the @Named key rather than bare DB.class, so each database gets its own binding.
    binder.skipSources(DBProviderModule.class).bind(key).toProvider(this);
  }
  public DB get() {
    try {
      return new Mongo().getDB(_database);
    } catch (Exception e) {
      return null;
    }
  }
}
&lt;/pre&gt;
&lt;p&gt;In this example, one provider is created for each combination of DB and @Named name. You can create your own meaningful binding annotations rather than using the @Named annotation that comes with Guice. In GuiceyMongo I use @GuiceyMongoDatabase for this purpose.&lt;/p&gt;
&lt;p&gt;The beauty of the Provider Module approach is that the same module can be installed multiple times without causing any binding errors. This allows your modules to be reused without worrying that the providers will conflict with each other. They can be installed multiple times, but will only be configured once.&lt;/p&gt;
&lt;p&gt;I hope this helps!&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26548600649</link><guid>http://www.mattinsler.com/post/26548600649</guid><pubDate>Tue, 13 Apr 2010 05:38:00 -0700</pubDate><category>google guice</category><category>java</category></item><item><title>Guicey Tips - Google Guice Module De-Duplication</title><description>&lt;p&gt;There is no current way in the &lt;a href="http://code.google.com/p/google-guice/"&gt;Google Guice&lt;/a&gt; API to run a Module&amp;#8217;s configure method only once. This might not seem like a big deal until you try writing something like a logging configuration module or any other module that could be added or augmented in multiple places in your code. For instance, in my &lt;a href="http://www.mattinsler.com/tag/guiceymongo/"&gt;GuiceyMongo&lt;/a&gt; library, I wanted to allow users to add configurations from multiple modules. So the CalendarModule can add the calendar collection to the configuration and the StopwatchModule can add the stopwatch collection to the configuration. This is similar to the effect that the &lt;a href="http://code.google.com/p/google-guice/wiki/Multibindings"&gt;Guice Multibindings&lt;/a&gt; extension accomplishes.&lt;/p&gt;
&lt;p&gt;So, how do you do this in your own code? &lt;a href="http://publicobject.com/2008/05/elite-guice-2-binding-de-duplication.html"&gt;Elite Guice 2: Binding de-duplication&lt;/a&gt; over at publicobject tells us how: just override hashCode and equals so that Guice treats every instance of the module as the same module. It would look something like this:&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;pre class="prettyprint"&gt;public class RunOnceModule implements Module {
  public void configure(Binder binder) {
    // ...
  }

  @Override
  public boolean equals(Object obj) {
    // Guard against null before calling getClass().
    return obj != null &amp;amp;&amp;amp; RunOnceModule.class.equals(obj.getClass());
  }

  @Override
  public int hashCode() {
    return RunOnceModule.class.hashCode();
  }
}
&lt;/pre&gt;
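&lt;p&gt;To see why overriding equals and hashCode is enough, here is a minimal sketch that runs without Guice. It assumes a hypothetical &lt;code&gt;ConfigModule&lt;/code&gt; interface in place of Guice&amp;#8217;s &lt;code&gt;Module&lt;/code&gt; and a hypothetical &lt;code&gt;Installer&lt;/code&gt; that, like Guice&amp;#8217;s binder, skips any module equal to one it has already installed. Guice looks installed modules up in a hash-based collection, which is why hashCode has to agree with equals; this simplified installer only compares with equals.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;import java.util.Arrays;

// Hypothetical stand-in for com.google.inject.Module, so the sketch runs without Guice.
interface ConfigModule {
  void configure(StringBuilder log);
}

// Same idea as RunOnceModule: every instance equals every other instance.
final class RunOnceModule implements ConfigModule {
  public void configure(StringBuilder log) {
    log.append("RunOnceModule.configure;");
  }

  @Override
  public boolean equals(Object obj) {
    // The class is final, so instanceof matches exactly RunOnceModule (and rejects null).
    return obj instanceof RunOnceModule;
  }

  @Override
  public int hashCode() {
    return RunOnceModule.class.hashCode();
  }
}

// Hypothetical installer: configures a module only if no equal module was installed before.
class Installer {
  private Object[] installed = new Object[0];
  final StringBuilder log = new StringBuilder();

  void install(ConfigModule module) {
    for (Object existing : installed) {
      if (existing.equals(module)) {
        return; // an equal module was already configured
      }
    }
    installed = Arrays.copyOf(installed, installed.length + 1);
    installed[installed.length - 1] = module;
    module.configure(log);
  }
}

class DedupDemo {
  public static void main(String[] args) {
    Installer installer = new Installer();
    installer.install(new RunOnceModule());
    installer.install(new RunOnceModule()); // skipped: equal to the first one
    System.out.println(installer.log); // prints RunOnceModule.configure;
  }
}
&lt;/pre&gt;
&lt;p&gt;Installing a second RunOnceModule is a no-op, so its configure method runs exactly once.&lt;/p&gt;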
&lt;p&gt;In my GuiceyMongo library I have abstracted this idea into a reusable class that I call &lt;a href="http://github.com/mattinsler/com.lowereast.guiceymongo/blob/master/src/com/lowereast/guiceymongo/guice/internal/SingletonModule.java"&gt;SingletonModule&lt;/a&gt;. It looks something like this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;import com.google.inject.Module;

public abstract class SingletonModule&amp;lt;T&amp;gt; implements Module {
  protected final T key;

  public SingletonModule(T key) {
    this.key = key;
  }

  // Modules with equal keys are equal, so Guice configures only the first one installed
  @Override
  public boolean equals(Object obj) {
    return obj instanceof SingletonModule
      &amp;amp;&amp;amp; ((SingletonModule&amp;lt;?&amp;gt;)obj).key.equals(key);
  }

  @Override
  public int hashCode() {
    return key.hashCode();
  }

  @Override
  public String toString() {
    return getClass().getName() + "(key=" + key.toString() + ")";
  }
}
&lt;/pre&gt;
&lt;p&gt;This allows me to use any type of key object, such as the module&amp;#8217;s Class object or, if necessary, a Guice Key object.&lt;/p&gt;
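&lt;p&gt;A key-based version can be sketched the same way without Guice. In this example, &lt;code&gt;KeyedModule&lt;/code&gt; and &lt;code&gt;CollectionModule&lt;/code&gt; are hypothetical names, and the key is typed as Object rather than a type parameter for simplicity. As in SingletonModule, two modules are duplicates exactly when their keys are equal, so a module keyed on &amp;#8220;calendar&amp;#8221; is configured once no matter how many places install it, while a &amp;#8220;stopwatch&amp;#8221; module is still configured separately.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;import java.util.Objects;

// Hypothetical keyed module in the spirit of SingletonModule.
abstract class KeyedModule {
  protected final Object key;

  protected KeyedModule(Object key) {
    this.key = Objects.requireNonNull(key, "key");
  }

  // As in SingletonModule, equality depends only on the key.
  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof KeyedModule)) {
      return false;
    }
    return key.equals(((KeyedModule) obj).key);
  }

  @Override
  public int hashCode() {
    return key.hashCode();
  }

  @Override
  public String toString() {
    return getClass().getSimpleName() + "(key=" + key + ")";
  }
}

// Hypothetical module that adds one collection name to a shared configuration.
class CollectionModule extends KeyedModule {
  CollectionModule(String collection) {
    super(collection);
  }
}

class KeyedDemo {
  public static void main(String[] args) {
    KeyedModule calendar = new CollectionModule("calendar");
    System.out.println(calendar.equals(new CollectionModule("calendar")));  // true
    System.out.println(calendar.equals(new CollectionModule("stopwatch"))); // false
  }
}
&lt;/pre&gt;
&lt;p&gt;Because equality depends only on the key, keys need to be unique per thing being configured: a module&amp;#8217;s Class object works when each module class should run once, while a string or Key lets one class be installed once per distinct value.&lt;/p&gt;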
&lt;p&gt;This concept becomes very useful in a pattern I like to call Provider Modules, where a singleton module is used to configure a provider of some type only once. I&amp;#8217;ll explain this a bit more in a future post.&lt;/p&gt;
&lt;p&gt;When using this pattern, your module&amp;#8217;s configure method will be called at whichever of these two points comes first:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Binder.install(&amp;#8230;)&lt;/li&gt;
&lt;li&gt;Guice.createInjector(&amp;#8230;)&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Because of this, you cannot assume anything about when your module will be configured, so don&amp;#8217;t hold on to information in the hope of binding it later when configure runs. Instead, bind right away, and use a singleton module to configure shared things, such as providers, exactly once.&lt;/p&gt;
&lt;p&gt;In a future post, I&amp;#8217;ll discuss how Guice Multibindings and GuiceyMongo use singleton modules to achieve a kind of distributed binding augmentation.&lt;/p&gt;
&lt;p&gt;I hope this helps!&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26548709502</link><guid>http://www.mattinsler.com/post/26548709502</guid><pubDate>Mon, 05 Apr 2010 16:01:00 -0700</pubDate><category>google guice</category><category>java</category></item><item><title>Google Guice &amp; MongoDB - GuiceyMongo Configuration</title><description>&lt;p&gt;After using &lt;a href="http://code.google.com/p/google-guice/"&gt;Google Guice&lt;/a&gt; for a while, I have come to love it. I feel the same way about &lt;a href="http://www.mongodb.org/"&gt;MongoDB&lt;/a&gt;, especially with all the new features they&amp;#8217;ve been adding since 1.0. Chasing the shiny object, as I sometimes do, I wanted a good way to combine the two. In my own projects I have done this a few different ways, but kept needing something slightly different. Eventually I settled on a few features I wanted in this kind of library:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;annotation-based selection of the database or collection to inject&lt;/li&gt;
&lt;li&gt;easy to understand configuration (via Guice or other configuration sources)&lt;/li&gt;
&lt;li&gt;ability to map a database key to the actual database name (e.g. always refer to the &amp;#8220;User&amp;#8221; database, but map it to &amp;#8220;prod_user&amp;#8221; or &amp;#8220;test_user&amp;#8221; in the configuration)&lt;/li&gt;
&lt;li&gt;ability to have configurations that map database/collection keys to different names for different environments, like Test, QA, and Production&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Rather than spend more time describing the approach, I&amp;#8217;ll just show some code examples.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;pre class="prettyprint"&gt;Injector injector = Guice.createInjector(
  GuiceyMongo.configure("TEST")
    .mapDatabase("MAIN").to("test")
    .mapCollection("USER").to("user").inDatabase("MAIN"),

  GuiceyMongo.configure("PROD")
    .mapDatabase("MAIN").to("prod")
    .mapCollection("USER").to("user").inDatabase("MAIN"),

  GuiceyMongo.chooseConfiguration("TEST")
);
&lt;/pre&gt;
&lt;p&gt;Here I create two configurations, TEST and PROD, and set up a MAIN database and a USER collection in each. I can map any number of databases and collections within a configuration. Above, the TEST configuration maps database key MAIN to the actual database &amp;#8220;test&amp;#8221;, and the PROD configuration maps the same key to the actual database &amp;#8220;prod&amp;#8221;. Both configurations map the USER collection key to the actual collection &amp;#8220;user&amp;#8221; within their respective databases. Collection mappings must reference a mapped database key.&lt;/p&gt;
&lt;p&gt;The last line is very important and should appear only once in your application. It activates one of the configurations you have set up; only the mappings within the chosen configuration are active. For more flexibility, the configuration name could come from a command-line argument and be passed to chooseConfiguration when creating the injector.&lt;/p&gt;
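&lt;p&gt;To make these mapping rules concrete, here is a plain-Java model of them that does not use GuiceyMongo. &lt;code&gt;MongoConfigurations&lt;/code&gt; and its methods are hypothetical and are not the library&amp;#8217;s API: each named configuration maps database keys to database names, a collection mapping must reference a database key mapped in the same configuration, and only the chosen configuration is consulted when keys are resolved. The demo picks the configuration from a command-line argument, defaulting to TEST.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;import java.util.Properties;

// Hypothetical model of named configurations; not the GuiceyMongo API.
// Entries are stored flat, e.g. "TEST/db/MAIN" = "test" and "TEST/coll/USER" = "MAIN/user",
// so this sketch assumes keys never contain "/".
class MongoConfigurations {
  private final Properties mappings = new Properties();
  private String chosen;

  MongoConfigurations mapDatabase(String config, String dbKey, String dbName) {
    mappings.setProperty(config + "/db/" + dbKey, dbName);
    return this;
  }

  MongoConfigurations mapCollection(String config, String collKey, String collName, String dbKey) {
    // A collection mapping must reference a database key mapped in the same configuration.
    if (mappings.getProperty(config + "/db/" + dbKey) == null) {
      throw new IllegalArgumentException("unmapped database key " + dbKey + " in " + config);
    }
    mappings.setProperty(config + "/coll/" + collKey, dbKey + "/" + collName);
    return this;
  }

  // Only the chosen configuration is consulted when resolving keys.
  void choose(String config) {
    chosen = config;
  }

  String database(String dbKey) {
    return lookup("/db/" + dbKey);
  }

  // Resolves a collection key to "databaseName.collectionName".
  String collection(String collKey) {
    String[] parts = lookup("/coll/" + collKey).split("/", 2);
    return database(parts[0]) + "." + parts[1];
  }

  private String lookup(String suffix) {
    if (chosen == null) {
      throw new IllegalStateException("no configuration chosen");
    }
    String value = mappings.getProperty(chosen + suffix);
    if (value == null) {
      throw new IllegalStateException("no mapping for " + chosen + suffix);
    }
    return value;
  }
}

class ConfigDemo {
  public static void main(String[] args) {
    MongoConfigurations configs = new MongoConfigurations()
        .mapDatabase("TEST", "MAIN", "test")
        .mapCollection("TEST", "USER", "user", "MAIN")
        .mapDatabase("PROD", "MAIN", "prod")
        .mapCollection("PROD", "USER", "user", "MAIN");
    // Choose the configuration from the command line, defaulting to TEST.
    configs.choose(args.length == 0 ? "TEST" : args[0]);
    System.out.println(configs.collection("USER")); // test.user when run without arguments
  }
}
&lt;/pre&gt;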
&lt;p&gt;Now that we have defined our configurations and created an injector, how do we request an injection of a specific database or collection?  Using annotations of course!  The @GuiceyMongoDatabase annotation is used with a database key to inject a DB object, and the @GuiceyMongoCollection annotation is used with a collection key to inject a DBCollection object.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;public class Foo {
  @Inject
  Foo(@GuiceyMongoDatabase("MAIN") DB database,
    @GuiceyMongoCollection("USER") DBCollection collection) {
    ...
  }
}
&lt;/pre&gt;
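&lt;p&gt;Conceptually, an annotation like &lt;code&gt;@GuiceyMongoCollection("USER")&lt;/code&gt; is just a key attached to a constructor parameter. The sketch below uses plain reflection rather than Guice to show how such keys can be declared and read back; &lt;code&gt;CollectionKey&lt;/code&gt;, &lt;code&gt;Repository&lt;/code&gt;, and &lt;code&gt;KeyReader&lt;/code&gt; are hypothetical stand-ins, not GuiceyMongo&amp;#8217;s actual annotations or internals. In Guice, a binding annotation like this would also be meta-annotated with @BindingAnnotation, and the injector performs this lookup for you.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical key annotation; RUNTIME retention makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface CollectionKey {
  String value();
}

// Object parameters stand in for the DBCollection objects an injector would supply.
class Repository {
  final Object users;
  final Object events;

  Repository(@CollectionKey("USER") Object users, @CollectionKey("EVENT") Object events) {
    this.users = users;
    this.events = events;
  }
}

class KeyReader {
  // Returns the CollectionKey value of each parameter, or null where it is absent.
  static String[] collectionKeys(Annotation[][] parameterAnnotations) {
    String[] keys = new String[parameterAnnotations.length];
    int index = 0;
    for (Annotation[] annotations : parameterAnnotations) {
      for (Annotation annotation : annotations) {
        if (annotation instanceof CollectionKey) {
          keys[index] = ((CollectionKey) annotation).value();
        }
      }
      index++;
    }
    return keys;
  }
}

class AnnotationDemo {
  public static void main(String[] args) {
    // Repository declares a single constructor, so index 0 is that constructor.
    String[] keys = KeyReader.collectionKeys(
        Repository.class.getDeclaredConstructors()[0].getParameterAnnotations());
    // An injector would now look up the collection mapped to each key.
    System.out.println(String.join(", ", keys)); // USER, EVENT
  }
}
&lt;/pre&gt;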
&lt;p&gt;When working on a large application, it&amp;#8217;s probably a good idea to give each module a class that holds the keys of the databases or collections it requires, and then create your mappings in a central location that installs all the modules.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;public class UserModule extends AbstractModule {
  public static final class Collections {
    private Collections() {}
    public static final String User = "USER";
  }

  @Override
  protected void configure() {
    ...
  }
}
&lt;/pre&gt;
&lt;pre class="prettyprint"&gt;public class ApplicationModule extends AbstractModule {
  public static final class Configurations {
    private Configurations() {}
    public static final String Production = "PROD";
    public static final String Test = "TEST";
  }
  public static final class Databases {
    private Databases() {}
    public static final String Main = "MAIN";
  }

  private final String _configuration;
  public ApplicationModule(String configuration) {
    _configuration = configuration;
  }

  @Override
  protected void configure() {
    install(new UserModule());
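    install(GuiceyMongo.configure(Configurations.Test)
      .mapDatabase(Databases.Main).to("test")
      .mapCollection(UserModule.Collections.User).to("user").inDatabase(Databases.Main));
    install(GuiceyMongo.configure(Configurations.Production)
      .mapDatabase(Databases.Main).to("prod")
      .mapCollection(UserModule.Collections.User).to("user").inDatabase(Databases.Main));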
    install(GuiceyMongo.chooseConfiguration(_configuration));
  }
}
&lt;/pre&gt;
&lt;p&gt;Hopefully this quick walkthrough piques your interest in &lt;a href="http://bit.ly/guiceymongo"&gt;GuiceyMongo&lt;/a&gt; and gives you an idea of how easy the setup actually is. There are more features implemented and I have more planned. I&amp;#8217;ll be blogging about other features in the future, so check back periodically!&lt;/p&gt;
&lt;p&gt;If you would like to use GuiceyMongo, it is available to the public:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;information: &lt;a href="http://www.mattinsler.com/tag/guiceymongo/"&gt;Related Posts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;source: &lt;a href="http://bit.ly/guiceymongo"&gt;GuiceyMongo source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;jars: &lt;a href="http://bit.ly/guiceymongo-jars"&gt;GuiceyMongo jars&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;I plan on maintaining and enhancing this library regularly and would love to hear your input! If you use my library, please leave me a comment so I know that people are finding it useful. Thanks!&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26548783175</link><guid>http://www.mattinsler.com/post/26548783175</guid><pubDate>Wed, 17 Mar 2010 13:00:00 -0700</pubDate><category>google guice</category><category>guiceymongo</category><category>java</category><category>mongodb</category></item><item><title>Google Guice &amp; MongoDB - GuiceyMongo Stored Procedure Proxies</title><description>&lt;p&gt;Stored procedures in a SQL database were always painful for me: hard to manage, hard to test, and hard to encapsulate cleanly in code. I usually ended up with an interface defining all of the stored procedures and a lengthy implementation class with JDBC templates, or lines of setup and teardown code for each procedure call. If a procedure changed on the server, I had to change both the interface and some code in the implementation. As much as I hated maintaining this, stored procedures are still necessary.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.mongodb.org/"&gt;MongoDB&lt;/a&gt; has its own version of stored procedures: JavaScript functions that you store in your database and call using the eval command. The system is very simple to use, although not completely intuitive to SQL people. Just write your JavaScript function and save it into the system.js collection of the database you want to call it from.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;&lt;/p&gt;
&lt;pre class="prettyprint"&gt;&amp;gt; db.system.js.save({
  _id: 'hello_world',
  value: function() {
    print('hello world!')
  }
})
&lt;/pre&gt;
&lt;p&gt;In the MongoDB shell, these functions are not available to call directly; you must go through eval. So to call our hello_world function, we do this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;&amp;gt; db.eval('hello_world()')
&lt;/pre&gt;
&lt;p&gt;This will print &amp;#8220;null&amp;#8221; in the shell window (since we don&amp;#8217;t return a value) and &amp;#8220;hello world!&amp;#8221; in the server log (or server output if you ran it from a command shell). You can return any value you want except a cursor, so returning the number 5 or {value: 5} is fine.&lt;/p&gt;
&lt;p&gt;Now, onto the Java code. Let&amp;#8217;s assume that you have two functions saved in your test database, defined as follows:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;&amp;gt; db.system.js.save({
  _id: 'getCount',
  value: function() {
    return db.data.count()
  }
})
&amp;gt; db.system.js.save({
  _id: 'getData',
  value: function(count) {
    return db.data.find().limit(count).toArray()
  }
})
&lt;/pre&gt;
&lt;p&gt;Your code might look something like this:&lt;/p&gt;
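&lt;pre class="prettyprint"&gt;import com.mongodb.DB;
import com.mongodb.DBObject;
import com.mongodb.Mongo;
import com.mongodb.MongoException;
import java.net.UnknownHostException;
import java.util.List;
import org.apache.log4j.Logger;

public class Data {
  private static final Logger _logger = Logger.getLogger(Data.class);
  private final DB _database;

  public Data(DB database) {
    _database = database;
  }

  @SuppressWarnings("unchecked")
  void doSomething() {
    // eval hands back JavaScript numbers as a Number (usually a Double),
    // so avoid casting the result straight to Integer
    Number count = (Number)_database.eval("return getCount()");
    if (count == null) {
      _logger.error("Could not execute getCount");
      return;
    }

    List&amp;lt;DBObject&amp;gt; data = (List&amp;lt;DBObject&amp;gt;)_database.eval(
        "function(c){return getData(c)}",
        Math.min(20, count.intValue())
    );
    if (data == null) {
      _logger.error("Could not execute getData");
      return;
    }

    for (DBObject obj : data) {
      System.out.println(obj);
    }
  }

  public static void main(String[] args) {
    Mongo mongo;

    try {
      mongo = new Mongo();
    } catch (MongoException e) {
      // ...
      return;
    } catch (UnknownHostException e) {
      // ...
      return;
    }

    DB database = mongo.getDB("test_db");
    Data data = new Data(database);
    data.doSomething();
  }
}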
&lt;/pre&gt;
&lt;p&gt;With &lt;a href="http://bit.ly/guiceymongo"&gt;GuiceyMongo&lt;/a&gt; your code could look like this:&lt;/p&gt;
&lt;pre class="prettyprint"&gt;public interface DataQuery {
  List&amp;lt;DBObject&amp;gt; getData(int count);
  int getCount();
}

public class Data {
  private static final Logger _logger = Logger.getLogger(Data.class);
  private final DataQuery _query;

  @Inject
  Data(DataQuery query) {
    _query = query;
  }

  void doSomething() {
    try {
      int count = _query.getCount();
      List&amp;lt;DBObject&amp;gt; data = _query.getData(Math.min(20, count));
      if (data != null) {
        for (DBObject obj : data) {
          System.out.println(obj);
        }
      }
    } catch (GuiceyMongoEvalException e) {
      // an error thrown by a server-side function is reported as a
      // GuiceyMongoEvalException rather than a MongoEvalException
      _logger.error(e);
    }
  }

  public static void main(String[] args) {
    Injector injector = Guice.createInjector(
        GuiceyMongo.configure("Test")
          .mapDatabase("Main").to("test_db"),

        // provides a DataQuery implementation that calls the stored
        // functions in the "Main" database
        GuiceyMongo.javascriptProxy(DataQuery.class, "Main"),

        GuiceyMongo.chooseConfiguration("Test")
    );
    Data data = injector.getInstance(Data.class);
    data.doSomething();
  }
}
&lt;/pre&gt;
&lt;p&gt;For more on configuring your databases and collections, check out &lt;a href="http://www.mattinsler.com/guiceymongo-injecting-databases-and-collections/"&gt;GuiceyMongo – Injecting databases and collections&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When a server-side function changes, only the interface definition needs to change. All of the implementation code is taken care of for you.&lt;/p&gt;
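&lt;p&gt;One way such a proxy could be built is with the JDK&amp;#8217;s &lt;code&gt;java.lang.reflect.Proxy&lt;/code&gt;. The sketch below illustrates the general technique under stated assumptions and is not GuiceyMongo&amp;#8217;s implementation: &lt;code&gt;EvalBackend&lt;/code&gt; is a hypothetical stand-in for the &lt;code&gt;DB.eval(String, Object...)&lt;/code&gt; call used above, &lt;code&gt;CountQuery&lt;/code&gt; is a hypothetical query interface, and each method call is forwarded to the stored function with the same name through JavaScript&amp;#8217;s &lt;code&gt;apply&lt;/code&gt;. Numeric results for int methods are narrowed explicitly, because JavaScript numbers may come back as doubles.&lt;/p&gt;
&lt;pre class="prettyprint"&gt;import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for the driver call DB.eval(String, Object...).
interface EvalBackend {
  Object eval(String code, Object... args);
}

// Hypothetical query interface; each method names a function saved in system.js.
interface CountQuery {
  int getCount();
  Object getData(int count);
}

// Turns each interface call into an eval of the stored function with the same name.
class EvalHandler implements InvocationHandler {
  private final EvalBackend backend;

  EvalHandler(EvalBackend backend) {
    this.backend = backend;
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    // toString, hashCode and equals are answered by the handler, not the server.
    if (method.getDeclaringClass() == Object.class) {
      return method.invoke(this, args);
    }
    String code = "function(){return " + method.getName() + ".apply(null, arguments)}";
    Object result = backend.eval(code, args == null ? new Object[0] : args);
    if (method.getReturnType() == int.class) {
      if (result == null) {
        throw new IllegalStateException(method.getName() + " returned no value");
      }
      // JavaScript numbers may arrive as doubles, so narrow explicitly.
      return ((Number) result).intValue();
    }
    return result;
  }
}

final class JavascriptProxies {
  private JavascriptProxies() {}

  // Builds the proxy for CountQuery; a general version would take the interface as a parameter.
  static CountQuery countQuery(EvalBackend backend) {
    return (CountQuery) Proxy.newProxyInstance(
        CountQuery.class.getClassLoader(),
        new Class[] {CountQuery.class},
        new EvalHandler(backend));
  }
}

class ProxyDemo {
  public static void main(String[] args) {
    // Fake backend that pretends getCount() returned the JavaScript number 3.
    EvalBackend fake = new EvalBackend() {
      @Override
      public Object eval(String code, Object... evalArgs) {
        return code.contains("getCount") ? Double.valueOf(3) : null;
      }
    };
    CountQuery query = JavascriptProxies.countQuery(fake);
    System.out.println(query.getCount()); // 3
  }
}
&lt;/pre&gt;
&lt;p&gt;Since the proxy only talks to an EvalBackend, a test can hand it a fake backend instead of a live database.&lt;/p&gt;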
&lt;p&gt;Using a query interface and &lt;a href="http://code.google.com/p/google-guice/"&gt;Google Guice&lt;/a&gt; injection also makes mocking and writing tests much easier. This approach, along with some other tricks I&amp;#8217;ll post about in the future, has saved me a lot of time.&lt;/p&gt;
&lt;p&gt;Enjoy!&lt;/p&gt;
&lt;p&gt;If you would like to use GuiceyMongo, it is available to the public:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;information: &lt;a href="http://www.mattinsler.com/tag/guiceymongo/"&gt;Related Posts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;source: &lt;a href="http://bit.ly/guiceymongo"&gt;GuiceyMongo source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;jars: &lt;a href="http://bit.ly/guiceymongo-jars"&gt;GuiceyMongo jars&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;I plan on maintaining and enhancing this library regularly and would love to hear your input! If you use my library, please leave me a comment so I know that people are finding it useful. Thanks!&lt;/p&gt;</description><link>http://www.mattinsler.com/post/26548988833</link><guid>http://www.mattinsler.com/post/26548988833</guid><pubDate>Tue, 16 Mar 2010 12:30:00 -0700</pubDate><category>google guice</category><category>java</category><category>guiceymongo</category><category>mongodb</category></item></channel></rss>
