Redshift Overlord Chatbot

Inspiration

Many 47 Lining customers run a fleet of Redshift clusters and Data Pipeline analytics jobs. They want to minimize cost and maximize utility without great effort or a large staff to manage the infrastructure. Some clusters are utilized 24/7, but many, such as dev and test clusters, are used only part of the time. Managing the uptime of these clusters can yield significant cost savings.

What it does

Redshift Overlord (ROVL) manages Redshift clusters by accumulating votes from the team. If someone votes that a cluster be available and it is down, ROVL starts it. Automated votes also bring the cluster up at the start of the work day and shut it down at the end, provided there are no votes to keep it running. ROVL can also receive notices from SNS topics hosted in the customer's account when CloudWatch events or alarms trigger.
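The voting rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual ROVL code: the `Vote` class, its fields, the vote expiry, and the state names `"available"` and `"stopped"` are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Vote:
    """One vote on a cluster (hypothetical shape, for illustration only)."""
    voter: str            # a team member, or an automated voter such as "scheduler"
    keep_up: bool         # True = keep the cluster available, False = shut it down
    expires_at: datetime  # votes are assumed to lapse after some time


def desired_state(votes, now):
    """Return "available", "stopped", or "unchanged" from the unexpired votes."""
    active = [v for v in votes if v.expires_at > now]
    if any(v.keep_up for v in active):
        return "available"   # a single keep-up vote outweighs shutdown votes
    if active:
        return "stopped"     # only shutdown votes remain
    return "unchanged"       # nobody has an opinion; leave the cluster alone


def next_action(current_state, votes, now):
    """Map the desired state onto an action: "start", "stop", or "none"."""
    target = desired_state(votes, now)
    if target == "available" and current_state == "stopped":
        return "start"
    if target == "stopped" and current_state == "available":
        return "stop"
    return "none"


if __name__ == "__main__":
    now = datetime(2017, 1, 2, 18, 0)
    votes = [
        Vote("scheduler", keep_up=False, expires_at=now + timedelta(hours=1)),
        Vote("alice", keep_up=True, expires_at=now + timedelta(hours=2)),
    ]
    # Alice's vote keeps the running cluster up despite the end-of-day vote.
    print(next_action("available", votes, now))
```

In this sketch a keep-up vote always wins over the automated end-of-day vote, which matches the "no votes to keep it running" condition; once that vote expires, the shutdown vote takes effect.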

How I built it

We used the Serverless Framework to manage several projects. Many Python files implement a handful of AWS Lambda functions that make up the chatbot portion of the Overlord. We also implemented a number of Lambda functions in Node.js that trigger on a schedule to manage the status of the cluster and to gather custom metrics from CloudWatch.

Challenges I ran into

The Redshift clusters run in the customers' accounts, but the Overlord will be a 47Lining managed service. Creating the proper IAM roles and permissions for cross-account access proved difficult.
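For readers unfamiliar with cross-account access, the usual AWS pattern is a role in the customer's account whose trust policy lets the service account assume it. The sketch below builds such a trust policy document in Python. It shows the general pattern rather than the roles we actually created: the function name, the account ID, and the external ID are placeholders assumed for the example.

```python
import json


def build_trust_policy(service_account_id, external_id):
    """Build an IAM trust policy for a role in the customer's account.

    The policy allows principals in the managing account to call
    sts:AssumeRole on the role, but only when they supply the agreed
    external ID. Both arguments are placeholders in this sketch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{service_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # 111111111111 stands in for the managing account; it is not a real ID.
    policy = build_trust_policy("111111111111", "example-external-id")
    print(json.dumps(policy, indent=2))
```

The customer attaches this document to the role as its trust policy, along with a separate permissions policy granting only the Redshift and CloudWatch actions the service needs. The service then calls STS AssumeRole with the role ARN and the same external ID to obtain temporary credentials.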