Inspiration
For more than a decade, I have worked as an IT professional for the likes of Amazon, HP, and McAfee. I recently left my role as a Cloud Architect at McAfee Enterprise to focus on graduate school. In my various roles, I had to build and configure monitoring solutions of all sorts for application availability, performance monitoring, and compliance reporting. CertMon was originally conceived as a simple Python script when I needed a quick way to monitor expiring website certificates that had been transferred in a corporate divestiture without any owners or a Certificate Authority. That script, along with my interest in remote sensors, became the inspiration to build CertMon from the ground up as a Microsoft Teams Bot that anyone can use.
What CertMon does
CertMon monitors deployed TLS/SSL X.509 certificates for expiration, availability, and compliance. As a Microsoft Teams enterprise assistant Bot, CertMon actively connects to hosts and verifies the server certificate. The Bot is configured using natural language commands, and it sends reminders when certificates are near expiration or become invalid or inaccessible. CertMon also provides a detailed report for compliance and operations teams: a Microsoft Excel worksheet containing connection and certificate details.
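The core check described above can be sketched with Python's standard library alone. This is a minimal illustration, not CertMon's actual code: the function names `days_until_expiry` and `check_host` and the default port and timeout are my assumptions.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". A negative result means
    the certificate has already expired.
    """
    expires_at = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires_at - now).total_seconds() / 86400


def check_host(hostname: str, port: int = 443, timeout: float = 5.0) -> float:
    """Connect to a host, verify its certificate, and return the days left.

    Hypothetical helper: the default context verifies the chain and the
    hostname, so an invalid certificate raises ssl.SSLError and an
    unreachable host raises OSError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return days_until_expiry(cert["notAfter"], datetime.now(timezone.utc))
```

A monitor built on this sketch would presumably catch `ssl.SSLError` and `OSError` around `check_host` to distinguish "invalid" from "inaccessible" before deciding which reminder to send.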
How CertMon works
CertMon works as a Microsoft Teams Bot:
- Use Microsoft Teams to configure CertMon by adding Monitors, Schedules, and optional Self-Hosted Agents.
- Schedule Monitors for public hosts on CertMon Cloud Agents.
- Add Self-Hosted Agents and Schedule Monitors for private hosts.
- Turn on notifications for Microsoft Teams to be alerted by proactive messages from CertMon, with an optional attached Microsoft Excel report.
How I built CertMon
I have had plenty of hands-on experience working with public and private cloud providers, with the exception of Microsoft Azure. For this project I took up the challenge of building and deploying on Azure. I had not used any Azure services previously, but I got to building quickly thanks to extensive Microsoft documentation and online community resources. Microsoft publishes open source examples and code snippets that helped me build CertMon with the latest features without compromising security.
I started the build by designing my application architecture, putting together a Draw.io diagram depicting the control plane and data plane components of CertMon. The control plane consists of a REST API built on Azure API Management and Azure Functions for serverless computing. For the data plane, I made scalability and high availability the design priorities, along with the capability to monitor private hosts. For those reasons, I chose Azure Web PubSub as the core streaming component of CertMon. Web PubSub allows CertMon to quickly dispatch scheduled jobs to available Agents on a secure WebSocket channel. Lastly, I wanted users to configure CertMon as quickly and efficiently as possible with the shortest commands. Azure Cosmos DB made that possible with a simple data model and a native Python client that allows for convenient SQL queries.
I designed and built the Microsoft Teams Bot using Bot Framework Composer, testing with Bot Framework Emulator at least until I had a working skeleton. Having access to the Microsoft 365 Developer Program allowed me to test my app in Teams using an active production tenant. Bot Framework Composer made it simple to start building my Bot using templates available in C# and Node. I chose the C# template to challenge myself, since I had not written any code in the language before and the Node template was still in preview. I found C# simple enough to pick up and was able to write the function that handles proactive messages sent to the Bot.
I built CertMon Agent as the data plane client, which users can deploy in their data centers, in addition to a cloud deployment for customers who do not need to monitor private hosts. I believe this is where Python as a programming language shines the brightest, since it allowed for rapid prototyping and deployment of CertMon components while still being relatively responsive. Docker also made that possible, allowing users to deploy CertMon Agent in their own clouds. To reduce complexity, the Self-Hosted Agent and CertMon Cloud Agents use the same image. The Docker image is built from Google's Distroless container project, which allows for small (~37MB for CertMon Agent) and secure images by including only the necessary application runtime dependencies in the container image. The downside of this deployment stack is that it requires previous experience with Docker. Lastly, I decided to deploy CertMon Cloud Agents in AWS so that they would have an architecture similar to an Agent deployed by a customer in their own cloud.
CertMon architecture:
So if CertMon is monitoring my certificates, what is monitoring CertMon? I am glad you asked. CertMon is monitored using Azure Application Insights and Amazon Route 53 along with CloudWatch. CertMon is mostly a serverless deployment on Azure, and the key benefit of serverless is the lack of an operating system (OS) to manage. However, the downside of serverless is that you lose visibility into your application, which would traditionally be monitored by solutions running alongside the app on the same OS. Azure Functions solves this by integrating with Application Insights for application log and metric ingestion. Each Function emits discrete telemetry metadata, which is consumed within a few minutes and is available for display and querying. Application Insights also allows for log streaming directly from Functions as they run. This was instrumental for remote troubleshooting and debugging code while deployed in the cloud.
One of the other pitfalls of serverless computing is cold starts. A cold start happens when a Function is not used frequently and the cloud provider puts it into a paused or stopped state. Starting from that state adds overhead because the Function has to be pulled, started, and made available. This can take up to several seconds, which is not good for a chat Bot; no one wants to wait for a slow Bot. To solve this issue, I built my Functions to handle remote health check requests sent from Route 53. These health checks not only monitor my APIs remotely but also keep the Functions busy so they do not go into a paused or stopped state. CloudWatch is also used on the AWS side for alarms on custom metrics sent from CertMon Cloud and Route 53 health checkers.
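As a rough sketch of what such a health check handler could look like, the function below runs a set of dependency probes and returns an HTTP status code and JSON body. It is framework-agnostic and purely illustrative: the `health_check` name, the probe dictionary, and the response shape are assumptions, not the actual CertMon Functions code.

```python
import json
from typing import Callable, Dict, Tuple


def health_check(probes: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency probe and build an HTTP status code and JSON body.

    `probes` maps a hypothetical dependency name (for example "database")
    to a zero-argument callable that returns True when the dependency is
    reachable. A probe that raises is reported as failed.
    """
    results = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    healthy = all(results.values())
    status = 200 if healthy else 503
    body = json.dumps(
        {"status": "ok" if healthy else "degraded", "checks": results}
    )
    return status, body
```

Route 53 HTTP health checks treat 2xx and 3xx responses as healthy, so returning 503 when a probe fails lets the same request serve both purposes: keeping the Function warm and flagging a broken dependency.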
CertMon monitoring infrastructure:
Challenges I faced
Authentication/Authorization
Integrating with Microsoft Azure Active Directory (AAD) to secure CertMon REST APIs and protect user data was my biggest challenge. I switched to AAD B2C after my initial proof of concept with AAD was not working. AAD B2C had its own challenges and complexities, which I realized were not required and would take considerable effort to get right. After some careful research and testing, I reverted to AAD. Once I fully understood AAD identity concepts and the Microsoft Graph API, I was able to get the configuration right to protect CertMon APIs in Azure API Management and to automate provisioning new users with their access. Once I had solved user access, I had to secure the CertMon Agent APIs. This requires each Agent to receive a set of access credentials during provisioning. I also had to provide these credentials over a secure channel to the user creating the Agent. Sensitive credentials cannot simply be shared as another message from the Bot, as that would expose them in users' chat history. I secured the delivery by using Azure Blob Storage to store a CSV file, which users can download securely with temporary access credentials. The CSV file must be downloaded within five minutes of creation, and the lifetime of the Agent credentials is set to a maximum of six months.
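The two lifetimes involved here (a short download window and a longer credential lifetime) can be illustrated with a small standard-library sketch. Everything named below is hypothetical: the CSV columns, the `issue_agent_credentials` function, and the 180-day approximation of six months are assumptions, and the upload to Blob Storage with a time-limited link is deliberately left out.

```python
import csv
import io
import secrets
from datetime import datetime, timedelta, timezone

# Assumed values based on the description above.
DOWNLOAD_WINDOW = timedelta(minutes=5)     # how long the CSV can be fetched
CREDENTIAL_LIFETIME = timedelta(days=180)  # roughly six months


def issue_agent_credentials(agent_id: str, now: datetime) -> dict:
    """Generate a secret for an Agent and render it as a one-row CSV.

    Returns the CSV text plus the two deadlines. Storing the CSV and
    handing the user a temporary download link is out of scope here.
    """
    client_secret = secrets.token_urlsafe(32)  # cryptographically strong
    expires_at = now + CREDENTIAL_LIFETIME

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["agent_id", "client_secret", "expires_at"])
    writer.writerow([agent_id, client_secret, expires_at.isoformat()])

    return {
        "csv": buffer.getvalue(),
        "download_expires_at": now + DOWNLOAD_WINDOW,
        "credentials_expire_at": expires_at,
    }
```

Keeping the secret out of the chat transcript and behind a link that expires quickly means that a leaked chat history or a stale link does not expose a working credential.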
Building marketplace landing page
I have scattered experience with web development: some HTML and CSS, but minimal familiarity with the platforms or scripting languages required to build a modern website. I found that I spent nearly half of the project time building this website and the other half on the CertMon Bot. That time was spent learning Angular, figuring out TypeScript, creating illustrations, writing legal and support documents, and integrating with both Azure and CertMon APIs. I have always found web development a bit frustrating because of features that may or may not be supported across all browsers and platforms. Still, I managed to keep my sanity and build a decent site thanks to an Angular starter template from Microsoft Identity and an abundance of Angular and TypeScript resources online.
Linking user ID in Teams to user subscription
I hit a block when I had to figure out how to enforce SaaS licensing in my newly built Teams Bot. Within Teams, the Bot has access to the unique global ID of a user; however, this same ID does not exist in Graph or other sources that I could find. That made it impossible to link the user ID, which I was using to identify each user, to the user's email address during the sign-up flow. I had to think outside of Teams to solve this one. My solution was to rely on the JWT access token instead of the unique Teams user ID sent by the Bot to my APIs. This access token has an email claim, which is used to verify licensing once the token has been validated.
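A minimal sketch of that last step, reading the email claim from a token and checking it against a license list, might look like the following. It assumes the token's signature, issuer, audience, and expiry have already been validated upstream (for example by the API gateway); this code does not validate anything itself. The function names and the in-memory set of licensed emails are hypothetical.

```python
import base64
import json


def email_from_validated_token(token: str) -> str:
    """Return the email claim from a JWT that was ALREADY validated.

    This only decodes the payload segment. It must never be used on a
    token whose signature has not been verified first.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it for decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    email = claims.get("email")
    if not email:
        raise ValueError("token has no email claim")
    return email.lower()


def is_licensed(token: str, licensed_emails: set) -> bool:
    """Check the token's email against a set of lowercase licensed emails."""
    return email_from_validated_token(token) in licensed_emails
```

Normalizing the email to lowercase before the lookup avoids mismatches between the address used at sign-up and the casing that appears in the token.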
Sending notifications to users
The Bot is written in C#, which was another first for me. CertMon simply would not be useful if it could not send notifications to users. This required me to learn enough C# and .NET to write a function that handles incoming messages to my Bot. The starter template provided by Bot Framework Composer was helpful for a quick start, but it did not support this specific feature, which requires custom functionality. Adding the functionality to the project in Bot Framework Composer was straightforward, since I simply had to receive and parse JSON requests and then hand them off to Bot Framework to present to the user.
Accomplishments that I am proud of
I am proud of learning an entirely new cloud provider and several Azure services, getting some experience with C#, and especially building an Angular site. I am sure this project could have benefited from additional engineering help, but I am proud of what I accomplished by myself, from learning new concepts and services to immediately getting busy building and finishing CertMon in such a short time.
What I have learned
I learned several new technologies that I had previously not worked with, getting me closer to my goal of being a full stack engineer. I also learned how to register my first company and become a small business owner, joining Microsoft Partner Network as an Independent Software Vendor.
What's next for CertMon
In the near future, I would like to add planned features to CertMon that I had to de-prioritize in favor of marketplace integration and landing page requirements. The features include:
- mTLS support
- Batch import/export
- Multi-User support
- Aggregated reporting
- Notifications settings
- Native Windows Agent
Long term, I would like to have a version of CertMon available in Microsoft government marketplaces. I believe CertMon should be part of any core monitoring and compliance solution as required by various federal and commercial information security standards, for example FedRAMP and PCI.
Built With
- .net
- amazon-cloudfront-cdn
- amazon-route-53
- amazon-ses
- amazon-web-services
- angular.js
- azure
- azure-active-directory
- azure-api-management
- azure-bot-services
- azure-cosmos-db
- azure-functions
- azure-web-pubsub
- bot-framework
- bot-framework-composer
- c#
- docker
- python