
Data, energy, and Jujitsu

January 2013

DJ Patil, formerly of LinkedIn, recently wrote a great article [1] on the development of 'data products', loosely defined as those which 'facilitate an end goal through the use of data'. The main theme is that such products should be approached in the same way that a Jujitsu fighter approaches a fight: by clever manipulation rather than brute force.

'In my experience', says Patil, 'meeting the problem head-on is a recipe for disaster'. Instead, he continues, 'there’s a method to solving data problems that avoids the big, heavyweight solution, and instead, concentrates building something quickly and iterating'.

In parallel with the martial art, Patil refers to his methodology as 'Data Jujitsu'. To paraphrase brutally, a black-belt of the art is one who has demonstrated competence in handling the common foes of: 
  1. specification (what is it?)
  2. implementation (when can I have it?)
Naturally Patil exemplifies Data Jujitsu with references to social media and adjacent industries, but his points generalize easily to the Smart Grid.

Some elements of a good data product

Though by no means a complete definition, Patil provides some very interesting characterizations of a good data product: the thing has to be accurate, it has to be grounded in the real world of things, and it has to refrain from vomiting on its users. 


A data-product is one that translates raw data into insights or automated actions. Evidently, inaccurate insights are useless, and automated actions could be downright dangerous if they are not well informed.

So cross-validation, RMSE, model selection: it's all good stuff.
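For readers who haven't met these terms, here is a minimal sketch of cross-validated RMSE for a straight-line fit. The function names and the interleaved fold scheme are assumptions made for illustration; nothing here comes from Patil's article, and a real project would more likely lean on an established statistics library.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # fall back to a flat line if x never varies
    a = mean_y - b * mean_x
    return a, b


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def cross_validated_rmse(xs, ys, k=3):
    """Average held-out RMSE over k interleaved folds (hypothetical helper)."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [a + b * xs[i] for i in test]
        scores.append(rmse([ys[i] for i in test], predictions))
    return sum(scores) / k
```

The point of holding data out is that the error is measured on readings the model never saw, which is a fairer guide to how it will behave in the wild than the error on the data it was fitted to.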

However, Patil warns that accuracy is not something that the Quants can define alone because it can often be subjective. Even the best algorithms in the world won't be able to account for the full variations of personal preference and so, from the outset, algorithms are doomed to under-perform or outright fail for some people. 

The Californian utility PG&E learnt this lesson the hard way after rolling out its smart meters before an uncommonly hot summer. The smart meters were repeatedly demonstrated to work, but the public remained discontented because their year-on-year bills had increased. And so, inexorably, PG&E found itself mounting a PR campaign based on Thermodynamics 101.

The costs of algorithmic failure in the Smart Grid are perhaps middling. Though there's probably a smaller risk of burning through $440 million per day than the algorithmic failures of quantitative finance, there's probably more at stake than simply recommending the wrong movie. Either way, there should always be a manual mode...

Grounded in the real world

Patil recommends that data-products are grounded in the real world, comparing Amazon's suggestion to 'browse similar items' to what you'd do naturally in a physical shop. The rationale is clear: by drawing parallels with previous experiences, your product becomes far more intuitive for the user.

To this end I predict a backlash against those products which attempt to quantify energy savings in terms of green leaves, smiley faces, or other such vagaries. These things are not real. The only real ways to talk about savings are in kWh (for the professionals) or dollars (for the general public).
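As a rough illustration of reporting savings in real units, the sketch below turns a drop in consumption into kWh and dollars. The function name `savings_report`, its parameters, and the single flat tariff are assumptions for the example only; real tariffs are usually tiered or time-dependent.

```python
def savings_report(baseline_kwh, actual_kwh, price_per_kwh):
    """Return (kWh saved, dollars saved) against a baseline period.

    Assumes one flat price per kWh, which is a simplification.
    """
    saved_kwh = baseline_kwh - actual_kwh
    saved_dollars = saved_kwh * price_per_kwh
    return saved_kwh, saved_dollars


def describe(saved_kwh, saved_dollars, professional=False):
    """Phrase the same saving for the two audiences mentioned above."""
    if professional:
        return f"{saved_kwh:.0f} kWh saved"
    return f"${saved_dollars:.2f} saved"
```

The same number feeds both messages, so the professional and the bill-payer are never told different stories.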

Data vomit

Data vomit makes your head reel, and induces you to consider a simpler life. Though it might include chunks of value, data vomit is rarely of any use at all. If you've ever spent any time with a computational physicist you'll know about data vomit.

It's easy to confuse data with knowledge, but they are not the same...

It is explicitly the role of the data-product to assimilate, process, and filter data in order to generate high quality knowledge for the user. Don't vomit on your users.

Building the darn thing

The best product in the world isn't worth a dime until it's built, and unfortunately these things can be complicated. In the worst-case scenario you end up with several PhDs who demand whole rooms of whiteboards and three months of 'thinking time' before they write a single line of code (ahem).

Patil discusses some concepts that seem like pretty basic product management such as 'minimum-viable product' and opportunism. However, he also provides a couple of suggestions that are more specific to data-products. Both involve using humans.

Using mechanical turks

This is quite simple: instead of building a complex machine learning algorithm, just pay a human to do it. They'll probably do a pretty good job, and workers on Amazon's Mechanical Turk can complete tasks for just cents. Depending on how quickly you scale, this interim solution could well buy you months of development time and, as a bonus, you'll have a much better idea of the what and how by the time you unleash the PhDs.

Along these lines, Twitter uses humans to identify and categorize trending topics to determine, for example, that #bindersfullofwomen refers to politics rather than office accessories.

[Figure: Work vs Time graph]

This principle is also quite clearly manifest in the recent evolution of the demand response and energy efficiency industries. Both cut their teeth in large commercial buildings before expanding into the residential sector. In commercial buildings, where bills are large and accounts few, an element of manual control is easily justified. Conversely, in the residential sector, where individual bills are small and accounts many, complete automation is required (though even here Patil would probably contend that it's best to start with humans).

Using your users

The surest way to manage the complexity of machine-learning algorithms is to simply avoid them altogether. Patil points out that algorithms can sometimes be avoided by simply asking your users for their input. For all the strengths of automation, it's worth considering that users might sometimes just know better and be happy to share.

Over the last few years energy management products have been offered that span the full spectrum of user involvement, ranging from those that simply 'empower' the user to save energy, to those that offer 'lights-out' automated savings. In reality, the sweet spot is probably somewhere in the middle, where users and algorithms complement each other in a symphony of efficiency.

The Orb, the glowing energy meter - woe betide you if it turns red.

On a related note: for investors looking for the next bubble, might I suggest the intersection between algorithms and user-experience (AUX).



I'll allow Patil the concluding remarks:

If your product is successful, you will have plenty of time to play with complex machine learning algorithms, large computing clusters running in the cloud, and whatever you’d like. Data Jujitsu isn't the end of the road; it’s really just the beginning. But it’s the beginning that allows you to get to the next step.


  1. DJ Patil on Data Jujitsu: the art of turning data into product
  2. Greentech Media on Demand Response, Meet Customer Engagement
  3. Greentech Media on PGE and its discontents
  4. Thanks to Joerg Rings for pointing out that Twitter's search stack included humans