Use case version:
"As a warlord of Wayfair finance, I want to build an application that safely turns off tech infrastructure that's not in use, so that we can be as fast to market with new applications as the most overfunded startup in Silicon Valley, but not worry that we're going to blow all our cash on a 5000-node Hadoop cluster somebody left running over the weekend for no good reason."
They do something like this at Netflix, according to Adrian Cockcroft. The joke name for this project, and department, is "the Netflix Finance Thing". But it's called "Variable Cost Engineering" for realzies.
What it does
It monitors CPU (and, in theory, memory usage, network I/O, qps, whatever), identifies lightly used infrastructure, prices it with a rough (super rough for now!) cost model, and suggests high-value targets to turn off.
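To make the idea concrete, here's a minimal sketch of the "find idle stuff, rank it by cost" step. Everything in it is an assumption for illustration: the `Host` fields, the 5% CPU threshold, the host names and the dollar figures are made up, and the real app reads its data from the database rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    avg_cpu_pct: float     # average CPU utilization over the sampling window
    daily_cost_usd: float  # estimate from the (very rough) cost model


def shutdown_candidates(hosts, cpu_threshold_pct=5.0):
    """Return lightly used hosts, most expensive first."""
    idle = [h for h in hosts if h.avg_cpu_pct < cpu_threshold_pct]
    return sorted(idle, key=lambda h: h.daily_cost_usd, reverse=True)


# Hypothetical inventory, for illustration only.
hosts = [
    Host("hadoop-node-01", avg_cpu_pct=1.2, daily_cost_usd=40.0),
    Host("web-01", avg_cpu_pct=63.0, daily_cost_usd=25.0),
    Host("dev-box-07", avg_cpu_pct=0.4, daily_cost_usd=3.5),
]

for h in shutdown_candidates(hosts):
    print(f"{h.name}: {h.avg_cpu_pct:.1f}% CPU, ~${h.daily_cost_usd:.2f}/day")
```

Sorting by cost rather than by idleness puts the expensive forgotten cluster at the top of the list, ahead of the cheap dev box.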
How I built it
We grabbed a lot of cost, usage and tech inventory data, put it all in a SQL Server database, and made a WARP dashboard and an off-button screen that, for now, basically knows how to turn off my dev box, because of course I almost never do any PHP development these days.
Challenges I ran into
A one-day cost model for virtual machines in our data center? Let's just say we're not going to be showing this version to Ernst and Young.
Accomplishments that I'm proud of
The collaboration among Finance Engineering, SRE, Emerging Systems, Distributed Systems and InfraNext.
What I learned
We're ready to hire into this.
What's next for Bootstrap variable cost engineering
Make a list of "chaos-ready" applications, and enable the "turn it off" button for production infrastructure. After SRE says it's OK, of course.
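A rough sketch of how that gate might look, assuming the "chaos-ready" list and the SRE sign-offs are both just sets of application names. The function name, the environment labels and the app names are all hypothetical, and nothing like this exists in the current build.

```python
def can_turn_off(app, environment, chaos_ready, sre_approved):
    """Decide whether the off button should be enabled for an application.

    Assumption: anything outside production is fair game, while production
    needs the app to be on the chaos-ready list AND signed off by SRE.
    """
    if environment != "production":
        return True
    return app in chaos_ready and app in sre_approved


# Hypothetical lists, for illustration only.
chaos_ready = {"search-indexer", "image-resizer"}
sre_approved = {"search-indexer"}

print(can_turn_off("search-indexer", "production", chaos_ready, sre_approved))
print(can_turn_off("image-resizer", "production", chaos_ready, sre_approved))
print(can_turn_off("checkout", "dev", chaos_ready, sre_approved))
```

Keeping the two lists separate means an app team can declare itself chaos-ready without that alone switching on the button in production.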
On the WARP dashboard you need to click "allow unsecured connection", and the admin URL is on Allen Tang's dev