Many 47Lining customers run a fleet of Redshift clusters and Data Pipeline analytics jobs, and they want to minimize cost and maximize utility without great effort or a large staff to manage the infrastructure. Some clusters are in use 24/7, but many, such as dev and test clusters, are used only part of the time. Managing the uptime of these clusters can yield significant cost savings.
What it does
Redshift Overlord (ROVL) manages Redshift clusters by accumulating votes from the team. If someone votes for a cluster to be available and it is down, ROVL starts it. Automated votes bring the cluster up at the start of the work day and shut it down at the end, provided there are no votes to keep it running. ROVL can also receive notifications from SNS topics hosted in the customer's account when CloudWatch events or alarms trigger.
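The voting rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the ROVL implementation: the `Vote` type, the `decide_action` function, the vote expiry field, and the 08:00 to 18:00 weekday window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    """A hypothetical 'keep it running' vote cast by a team member."""
    voter: str
    expires_at: datetime  # the vote stops counting after this moment


# Assumed work-day window; the real schedule is not stated in the writeup.
WORKDAY_START = time(8, 0)
WORKDAY_END = time(18, 0)


def decide_action(now: datetime, cluster_is_up: bool, votes: Iterable[Vote]) -> str:
    """Return 'start', 'stop', or 'none' for one cluster."""
    # Only votes that have not expired count toward keeping the cluster up.
    has_active_vote = any(v.expires_at > now for v in votes)
    # The automated vote: weekdays, inside the assumed work-day window.
    in_workday = now.weekday() < 5 and WORKDAY_START <= now.time() < WORKDAY_END
    wanted = in_workday or has_active_vote
    if wanted and not cluster_is_up:
        return "start"
    if not wanted and cluster_is_up:
        return "stop"
    return "none"
```

Keeping the decision a pure function of the time, the cluster state, and the votes makes it easy to unit test separately from whatever AWS calls actually start or stop the cluster.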
How I built it
We used the Serverless Framework to manage several projects, consisting of many Python files, that implement a handful of AWS Lambda functions constituting the chatbot portion of the Overlord. We also implemented a number of Lambda functions in Node.js that trigger on a schedule to manage the status of the cluster and gather custom metrics from CloudWatch.
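As a rough idea of what one chatbot Lambda function might look like, the sketch below handles a Slack slash command delivered through an API Gateway proxy integration. It is an assumption-laden example rather than the project's code: the `keep <cluster-name>` command syntax and the `handler` name are hypothetical, and the sketch skips Slack request-signature verification and base64-encoded bodies.

```python
import json
from urllib.parse import parse_qs


def handler(event, context):
    """Hypothetical Lambda entry point for a 'keep <cluster-name>' slash command."""
    # Slack posts slash commands as a form-encoded body (user_name, text, ...).
    fields = parse_qs(event.get("body") or "")
    user = fields.get("user_name", ["unknown"])[0]
    text = fields.get("text", [""])[0].strip()

    parts = text.split()
    if len(parts) == 2 and parts[0] == "keep":
        # A real bot would record the vote here; this sketch only acknowledges it.
        message = f"{user} voted to keep cluster {parts[1]} running."
    else:
        message = "Usage: keep <cluster-name>"

    # 'in_channel' makes the reply visible to everyone in the Slack channel.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"response_type": "in_channel", "text": message}),
    }
```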
Challenges I ran into
The Redshift clusters run in the customers' accounts, but the Overlord will be a 47Lining managed service. Creating the proper IAM roles and permissions for cross-account access is difficult.
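One common pattern for this kind of cross-account access is a role in the customer's account whose trust policy lets the service account assume it, guarded by an external ID. The sketch below builds such a trust policy document in Python. It illustrates the general pattern and is not necessarily how ROVL does it: the `build_trust_policy` helper, the account ID, and the external ID value are placeholders.

```python
import json


def build_trust_policy(service_account_id: str, external_id: str) -> dict:
    """Build a trust policy for a role created in the customer's account.

    service_account_id: AWS account ID of the managed service (placeholder).
    external_id: shared secret the service must present when assuming the role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trust principals from the managed-service account.
                "Principal": {"AWS": f"arn:aws:iam::{service_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Require the agreed external ID on every AssumeRole call.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```

The managed-service side would then call STS AssumeRole with the same external ID to obtain temporary credentials for the customer's account. A separate permissions policy, not shown here, would still have to grant that role only the Redshift and CloudWatch actions the service needs.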
Accomplishments that I'm proud of
There are many unit tests as well as integration tests to facilitate ground-up testing of the code.
What I learned
Slack has good integration points providing a nice interface for ChatOps-type functionality.
What's next for Redshift Overlord
Team 47Lining will be pitching ROVL to our customers for further development and continued integration into their business processes.