Meet CLIve

Hello world, meet CLIve, a serverless Slack bot for AWS. CLIve makes managing your AWS EC2 instances a doddle. He understands natural language, so you can just ask "@clive can you stop nginx?" and he'll go ahead and do it. It's that simple.

If one of your instances goes haywire, CLIve can give you a rundown on recent changes to that resource, or even on your entire account.

If you want to get creative, CLIve also supports custom notifications so you can integrate him into pretty much anything!

Go to https://clive.chat and click the 'Add to Slack' button to add CLIve to your team.

Inspiration

The AWS CLI is awesome! I use it countless times a day and can't recommend it enough. The problem? It's a CLI. For newcomers, there are the usual setup and configuration barriers, and then there's a new syntax to learn.

The CLI setup is not straightforward; there's the awkward dance of creating permissions, users and access keys. Then there's configuring the CLI itself, with profiles and regions to navigate. Once that's done, the CLI is ready to run. But how do you start your IIS box? "aws start IIS"? Nope. More like:

instanceId=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=IIS" | jq -r '.Reservations[].Instances[].InstanceId') && aws ec2 start-instances --instance-ids $instanceId && aws ec2 wait instance-running --instance-ids $instanceId; unset instanceId

Yikes!

Whilst the CLI is a necessity for AWS specialists, many development and operations teams only need to start, stop and reboot EC2 instances to get their jobs done, and they're already doing their work in Slack. That's the sweet spot where CLIve can help.

You see, there's no CLI in CLIve. Under the hood he's calling the same AWS APIs, but he takes care of the setup and configuration and removes the need for any special syntax. So you can ask something like "@clive please start IIS", and he'll get to it!

How does CLIve work?

CLIve is serverless: AWS API Gateway and Lambda are his engine room. Because there's no long-running server to hold open a Slack Real Time Messaging (RTM) websocket connection, he uses Slack's Events API, which pushes events to him over HTTP.
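To make the Events API concrete, here's a minimal sketch of an HTTP entry point, in TypeScript since CLIve's Lambdas use the AWS JavaScript SDK. It is not CLIve's actual code: it assumes an API Gateway proxy-style response and a hypothetical `enqueue` hook, and the type names are illustrative. It handles the two payloads that matter most: the one-off `url_verification` challenge Slack sends when you register the endpoint, and ordinary `event_callback` deliveries. Verifying that the request really came from Slack is left out for brevity.

```typescript
// Minimal shapes for the two Slack Events API payloads this sketch handles.
interface UrlVerification {
  type: "url_verification";
  challenge: string;
}

interface EventCallback {
  type: "event_callback";
  event_id: string;
  event: { type: string; text?: string; user?: string; channel?: string };
}

type SlackPayload = UrlVerification | EventCallback;

// Assumed response shape, modelled on an API Gateway Lambda proxy response.
interface HttpResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Handle a request body pushed by Slack. `enqueue` is a hypothetical hook
// standing in for "hand the event to something else for processing".
async function handleSlackRequest(
  rawBody: string,
  enqueue: (payload: EventCallback) => Promise<void>,
): Promise<HttpResponse> {
  const payload = JSON.parse(rawBody) as SlackPayload;

  if (payload.type === "url_verification") {
    // Slack verifies the endpoint by expecting the challenge echoed back.
    return {
      statusCode: 200,
      headers: { "Content-Type": "text/plain" },
      body: payload.challenge,
    };
  }

  if (payload.type === "event_callback") {
    // Hand the event off and acknowledge straight away; real work happens later.
    await enqueue(payload);
    return { statusCode: 200, headers: {}, body: "" };
  }

  // Anything else (e.g. an unexpected payload type) is rejected.
  return { statusCode: 400, headers: {}, body: "unsupported payload" };
}
```

Acknowledging quickly matters: Slack expects a response within three seconds and retries deliveries that don't get one.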

When you message CLIve, he tries to make sense of what you said using Wit.ai's natural language processing. CLIve has been trained to recognise AWS concepts, such as instance state, when they're expressed in everyday vocabulary. For example, you can ask CLIve to start an instance in many different ways: "start", "boot up", "load up", and so on.
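Wit.ai does the heavy lifting, but the bot still has to turn its output into an action. The sketch below assumes a simplified result shape (each entity name mapped to candidate values with confidences, loosely modelled on older Wit.ai responses rather than its documented schema) and hypothetical entity names `instance_state` and `instance_name`. The synonym table and the 0.7 confidence threshold are illustrative assumptions too.

```typescript
// Assumed, simplified NLP result: entity name -> candidate values with
// confidences. Loosely modelled on older Wit.ai responses; not its real schema.
interface EntityValue {
  value: string;
  confidence: number;
}
type Entities = Record<string, EntityValue[] | undefined>;

type Ec2Action = "start" | "stop" | "reboot";

// Hypothetical normalisation table: everyday phrasings -> canonical action.
const ACTION_SYNONYMS: Record<string, Ec2Action> = {
  start: "start",
  "boot up": "start",
  "load up": "start",
  stop: "stop",
  "shut down": "stop",
  reboot: "reboot",
  restart: "reboot",
};

// Pick the most confident value for an entity, ignoring low-confidence guesses.
function bestValue(entities: Entities, name: string, minConfidence = 0.7): string | undefined {
  const candidates = entities[name] ?? [];
  let best: EntityValue | undefined;
  for (const c of candidates) {
    if (c.confidence >= minConfidence && (!best || c.confidence > best.confidence)) {
      best = c;
    }
  }
  return best?.value;
}

// Turn parsed entities into an EC2 request, or undefined when the message
// doesn't look like one (taxi bookings, chat-up lines, emojis...).
function toRequest(entities: Entities): { action: Ec2Action; instanceName: string } | undefined {
  const verb = bestValue(entities, "instance_state");
  const instanceName = bestValue(entities, "instance_name");
  if (!verb || !instanceName) return undefined;
  const action = ACTION_SYNONYMS[verb.toLowerCase()];
  return action ? { action, instanceName } : undefined;
}
```

Returning `undefined` rather than guessing gives the bot a clean place to reply "sorry, I didn't get that".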

Once CLIve has worked out what you want him to do, he starts calling AWS APIs. He does this by assuming a role in your account and invoking AWS JavaScript SDK functions from Lambda. CLIve takes care of orchestrating the various API calls. For example, to start IIS he follows the same steps as the CLI: first he describes the instances, then he issues a start command, and then he waits for the instance to report that it's running.
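Here's roughly what that describe, start, wait orchestration could look like. To keep it self-contained, the EC2 calls sit behind a hypothetical `Ec2Operations` interface rather than the real SDK client; in a Lambda you'd back it with SDK calls made using the assumed role's credentials. The 15-second poll interval and 40 attempts mirror the defaults of the CLI's `aws ec2 wait instance-running`.

```typescript
// Hypothetical interface covering just the EC2 operations described above.
// A real implementation would wrap AWS SDK calls made with the assumed role.
interface Ec2Operations {
  findInstanceIdsByName(name: string): Promise<string[]>;
  startInstances(ids: string[]): Promise<void>;
  getStates(ids: string[]): Promise<Record<string, string>>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// describe -> start -> wait, mirroring the CLI one-liner.
async function startByName(
  ec2: Ec2Operations,
  name: string,
  pollMs = 15000,
  maxPolls = 40,
): Promise<string[]> {
  // 1. Describe: resolve the Name tag to instance IDs.
  const ids = await ec2.findInstanceIdsByName(name);
  if (ids.length === 0) throw new Error(`No instances tagged Name=${name}`);

  // 2. Start them.
  await ec2.startInstances(ids);

  // 3. Wait: poll until every instance reports "running", or give up.
  for (let i = 0; i < maxPolls; i++) {
    const states = await ec2.getStates(ids);
    if (ids.every((id) => states[id] === "running")) return ids;
    await sleep(pollMs);
  }
  throw new Error(`Timed out waiting for ${name} to reach "running"`);
}
```

Injecting the client keeps the orchestration testable without a real account. A full ten-minute wait would also need to fit within the Lambda function's configured timeout.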

But how does CLIve help you get set up? To start, there's no software to install: just add CLIve to your Slack team. For every chat CLIve is added to, he creates a custom CloudFormation template that sets up the permissions and role he needs to do his thing. Run this template in your AWS account and you're 5 clicks away from being set up. The template also contains a custom resource that notifies CLIve when the stack has been created, so you don't have to copy and paste any CloudFormation outputs. He's not done yet, though! Once he has access to your account, CLIve checks each region for running instances and sets your default region automatically. That's it: not 3 pages of CLI setup, just 5 clicks!
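To give a feel for what a per-chat template might contain, here's a sketch that builds one as a plain object, ready to serialise to JSON. Everything CLIve-specific in it is an assumption: the resource and policy names, the narrow EC2-only policy, the `Custom::CliveRegistration` type and its `ChatId`/`RoleArn` properties, and the use of a random per-chat ExternalId on the trust policy. The CloudFormation building blocks themselves (`AWS::IAM::Role`, the `sts:ExternalId` condition, an SNS topic ARN as a custom resource `ServiceToken`, `Fn::GetAtt`) are standard.

```typescript
// Inputs for a per-chat template. All values are hypothetical.
interface TemplateOptions {
  cliveAccountId: string;       // account CLIve's Lambdas run in
  chatId: string;               // identifies the Slack chat
  externalId: string;           // random per-chat secret for the trust policy
  registrationTopicArn: string; // SNS topic that receives the custom resource request
}

function buildSetupTemplate(opts: TemplateOptions): Record<string, unknown> {
  return {
    AWSTemplateFormatVersion: "2010-09-09",
    Description: `CLIve access for chat ${opts.chatId}`,
    Resources: {
      CliveRole: {
        Type: "AWS::IAM::Role",
        Properties: {
          // Only CLIve's account may assume the role, and only with this ExternalId.
          AssumeRolePolicyDocument: {
            Version: "2012-10-17",
            Statement: [{
              Effect: "Allow",
              Principal: { AWS: `arn:aws:iam::${opts.cliveAccountId}:root` },
              Action: "sts:AssumeRole",
              Condition: { StringEquals: { "sts:ExternalId": opts.externalId } },
            }],
          },
          // Illustrative, deliberately narrow permissions.
          Policies: [{
            PolicyName: "clive-ec2",
            PolicyDocument: {
              Version: "2012-10-17",
              Statement: [{
                Effect: "Allow",
                Action: ["ec2:DescribeInstances", "ec2:StartInstances", "ec2:StopInstances", "ec2:RebootInstances"],
                Resource: "*",
              }],
            },
          }],
        },
      },
      // Custom resource: on stack creation CloudFormation publishes a request to
      // the SNS topic, carrying the role ARN so nobody has to copy/paste outputs.
      CliveRegistration: {
        Type: "Custom::CliveRegistration",
        Properties: {
          ServiceToken: opts.registrationTopicArn,
          ChatId: opts.chatId,
          RoleArn: { "Fn::GetAtt": ["CliveRole", "Arn"] },
        },
      },
    },
  };
}
```

Because the template creates an IAM role, CloudFormation asks you to acknowledge IAM capabilities when you launch the stack.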

How I built CLIve

CLIve is a completely serverless bot. He uses the following AWS services: Lambda, S3, CloudFront, DynamoDB, Route 53, CloudWatch, CloudFormation, CloudTrail, Config, IAM, Certificate Manager, Kinesis, SNS and API Gateway.

CLIve is built almost entirely with CloudFormation; the one exception is Wit.ai.

CLIve was initially trained by hand using the Wit.ai web interface. Now that he's integrated with Slack, he learns from real user interactions.

The flow which is triggered when a user chats to CLIve on Slack is roughly as follows:

  1. Slack pushes an event to API Gateway
  2. API Gateway triggers a Lambda function (event put)
  3. The Lambda function (event put) puts a record onto a Kinesis stream
  4. Another Lambda function (event get) reads the record from Kinesis and triggers the main Lambda function (core)
  5. The Lambda function (core) gets operational data from DynamoDB and decrypts it with KMS
  6. The Lambda function (core) sends the message to Wit.ai for processing
  7. Wit.ai extracts any known entities from the message
  8. The Lambda function (core) divines the requested action from the parsed entities and triggers downstream Lambda functions (actions)
  9. When actions are complete the Lambda function (core) sends a message to Slack
  10. Slack pushes the message to the user
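Steps 3 and 4 are where a message can be processed twice, since Lambda's Kinesis integration delivers records at least once. A sketch of the event get function might decode each record and skip Slack events it has already seen, keyed on Slack's `event_id`. The record layout (the raw Slack payload stored as JSON) and the `invokeCore` hook are assumptions, and the in-memory `Set` only dedupes within one warm container; in practice a durable store, such as a DynamoDB conditional write, would be needed.

```typescript
// The parts of a Lambda Kinesis event this sketch uses.
interface KinesisEvent {
  Records: { kinesis: { data: string; sequenceNumber: string } }[];
}

// Assumed record payload: the Slack event written by the "event put" function.
interface SlackEventRecord {
  event_id: string;
  event: { type: string; text?: string; channel?: string };
}

// Kinesis record data arrives base64-encoded; decode it as UTF-8 JSON.
function decodeRecord(data: string): SlackEventRecord {
  const binary = atob(data);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes)) as SlackEventRecord;
}

// In-memory stand-in for a durable dedupe store; only lives as long as
// the warm container does.
const seen = new Set<string>();

// Decode each record, drop duplicates by event_id, and pass the rest to
// `invokeCore` (a hypothetical hook for triggering the core function).
async function handleKinesisBatch(
  event: KinesisEvent,
  invokeCore: (record: SlackEventRecord) => Promise<void>,
): Promise<number> {
  let dispatched = 0;
  for (const r of event.Records) {
    const record = decodeRecord(r.kinesis.data);
    if (seen.has(record.event_id)) continue; // duplicate delivery: skip it
    seen.add(record.event_id);
    await invokeCore(record);
    dispatched++;
  }
  return dispatched;
}
```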

Event flow

The setup flow is similar to the above, though slightly more complex: a Lambda function (authorize) handles OAuth, and additional resources, including SNS, handle per-channel configuration.
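When a custom resource's `ServiceToken` is an SNS topic, CloudFormation publishes its request to that topic, and a subscribed Lambda receives it as a JSON string. Here's a sketch of how such a function might parse the request and build the response CloudFormation waits for. The `ChatId` and `RoleArn` properties and the `register` hook are hypothetical; the request and response fields are the standard custom resource ones. Actually sending the response (an HTTP PUT of the JSON body to the pre-signed `ResponseURL`) is left out.

```typescript
// Subset of the fields CloudFormation sends with a custom resource request.
interface CustomResourceRequest {
  RequestType: "Create" | "Update" | "Delete";
  ResponseURL: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  PhysicalResourceId?: string; // absent on Create
  ResourceProperties: Record<string, string>;
}

// SNS-triggered Lambdas receive the published message as a string.
interface SnsEvent {
  Records: { Sns: { Message: string } }[];
}

interface CustomResourceResponse {
  Status: "SUCCESS" | "FAILED";
  Reason?: string;
  PhysicalResourceId: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
}

// Parse the request and build the body to PUT back to ResponseURL.
// `register` is a hypothetical hook that stores the role ARN for a chat.
function handleRegistration(
  event: SnsEvent,
  register: (chatId: string, roleArn: string) => void,
): { url: string; body: CustomResourceResponse } {
  const req = JSON.parse(event.Records[0].Sns.Message) as CustomResourceRequest;
  const { ChatId: chatId, RoleArn: roleArn } = req.ResourceProperties;
  const base = {
    // Keep the existing ID on Update/Delete so CloudFormation doesn't see a replacement.
    PhysicalResourceId: req.PhysicalResourceId ?? `clive-${chatId}`,
    StackId: req.StackId,
    RequestId: req.RequestId,
    LogicalResourceId: req.LogicalResourceId,
  };
  try {
    if (req.RequestType !== "Delete") register(chatId, roleArn);
    return { url: req.ResponseURL, body: { Status: "SUCCESS", ...base } };
  } catch (err) {
    // Always answer: an unanswered request leaves the stack waiting until it times out.
    return { url: req.ResponseURL, body: { Status: "FAILED", Reason: String(err), ...base } };
  }
}
```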

Challenges I ran into

Natural language processing isn't quite there yet. CLIve does pretty well at handling expected requests. However, humans are mean, and they constantly try to trick him. Out of the box, Wit.ai doesn't know what to do with completely random messages (more on this below), crazy slang or, worse... emojis (which even I, a human, can't translate most of the time).

Of course, there were the usual technical challenges too. Kinesis doesn't appear to like cold starts, and doubles up on some messages. The dreaded API rate limiting caused more than its fair share of problems: one lookup a second, when each lookup only accepts a single lookup attribute (despite the API being called LookupEvents, plural), argh!

From the AWS CloudTrail LookupEvents documentation: "The rate of lookup requests is limited to one per second per account. If this limit is exceeded, a throttling error occurs."
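One way to live with a one-call-per-second limit is to push every LookupEvents call through a small limiter that spaces them out. This sketch is a generic minimum-interval limiter, not CLIve's actual code; the 1,000 ms interval comes from the quoted limit, and the injectable clock and sleep exist only so the spacing can be tested. A real caller would still want to retry with backoff when a throttling error slips through.

```typescript
// Runs tasks one at a time; each starts only after the previous task has
// finished AND at least `intervalMs` after the previous task started.
class MinIntervalLimiter {
  private nextSlot = 0;
  private chain: Promise<unknown> = Promise.resolve();
  private readonly intervalMs: number;
  private readonly now: () => number;
  private readonly sleep: (ms: number) => Promise<void>;

  constructor(
    intervalMs: number,
    now: () => number = () => Date.now(),
    sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  ) {
    this.intervalMs = intervalMs;
    this.now = now;
    this.sleep = sleep;
  }

  schedule<T>(task: () => Promise<T>): Promise<T> {
    const run = async (): Promise<T> => {
      const wait = this.nextSlot - this.now();
      if (wait > 0) await this.sleep(wait);
      this.nextSlot = this.now() + this.intervalMs;
      return task();
    };
    // Queue behind earlier calls; a failed task must not block later ones.
    const result = this.chain.then(run);
    this.chain = result.catch(() => undefined);
    return result;
  }
}
```

Usage would look like `limiter.schedule(() => lookupEventsForResource(id))`, where `lookupEventsForResource` is a hypothetical wrapper around the SDK call; serialising calls this way trades latency for never tripping the limit from a single process.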

API Gateway doesn't (at the time of writing) support Certificate Manager for custom domain names, so I had to spend $$$ on a third-party certificate... In truth, these are minor annoyances; it's pretty amazing how much CLIve can do, given he uses no EC2 at all and costs the equivalent of a cup of coffee to run!

What I learned

People want to date my bot! Sure, the majority of Slackers use CLIve responsibly, but a significant percentage try to chat him up! It goes without saying that CLIve doesn't know how to respond to such advances. Similarly, users often send him gobbledygook, ask him to book a taxi, or make some other crazy request. Whilst natural language technology definitely has some way to go, it appears that humans also have a bit of growing up to do!

What's next for CLIve, a serverless Slack bot for AWS

There's still some tidying up to do! He could use a little error-handling TLC and some refactoring (especially of the core function). There's also a list of feature requests I'm steadily working through. If you'd like to submit a bug report or feature request, please do so at https://clive.zendesk.com/hc/en-us/requests/new.
