Inspiration

Most of the developer community should be familiar with the cycle: failed deployments → cryptic logs → debugging → tons of wasted potential. It's a painful cycle every developer knows. All this time could be spent building and shipping software! What if we had software that could not only break this cycle, but also make our systems smart enough to heal themselves? This was our source of inspiration. With GitLab's unparalleled customization, developer-first features, and thriving, supportive community, and Google's state-of-the-art AI models and scalable on-demand infrastructure, we had just the right setup to turn this idea into reality.

What it does

Glyph is a suite of intelligent CI/CD components designed to supercharge your development lifecycle. The project has two prongs:

  1. glyph, publicly available here in the CI/CD catalog, is the AI analyst, and
  2. velocity, the interactive dashboard that depends on glyph.

Together, these two prongs make Glyph both proactive (catching bugs before they come up) and reactive (fixing them if they do occur). With the help of Google's Gemini models and Cloud Run functions, it processes all this in a few minutes, allowing you to ship at an incredible rate.

Proactive glyph features:

  1. GitLab CI/CD configuration analysis, and
  2. IaC configuration review

Glyph uses AI to find issues, suggest optimizations, and enforce best practices before problematic configurations ever reach production.

Reactive glyph features:

  1. Incident remediation

Glyph automates incident remediation. Every open incident is read, processed, and given a suggested fix. This makes it possible to close incidents within minutes of their creation.
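
As a rough illustration of the "read all open incidents" step, here is a minimal sketch that only builds the REST request, without sending it. It assumes the GitLab issues endpoint filtered with `issue_type=incident` and `state=opened`, and an access token passed in the `PRIVATE-TOKEN` header. The function name, host, project ID, and token value are made up for the example and are not taken from glyph's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_open_incidents_request(base_url: str, project_id: int, token: str) -> Request:
    """Build (but do not send) a GET request listing a project's open incidents.

    Hypothetical helper for illustration only.
    """
    # Incidents are issues with issue_type=incident; state=opened keeps only open ones.
    query = urlencode({"issue_type": "incident", "state": "opened", "per_page": 100})
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/issues?{query}"
    # Assumption: a project or personal access token, not the CI job token.
    return Request(url, headers={"PRIVATE-TOKEN": token}, method="GET")


request = build_open_incidents_request("https://gitlab.example.com/", 42, "example-token")
print(request.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call; a real job would also need to follow pagination when there are more than 100 open incidents.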

The glyph AI analyst component is designed to feed rich, contextual data directly into the velocity dashboard, transforming raw pipeline events (we all hate logs, don't we?) into actionable history and insights. NOTE: The velocity dashboard is not needed to use glyph. It is there if you want a clean display of logs, insights, and fixes, but it is not compulsory at all.

Introduction

To the uninitiated, a page of glyphs is a complex and unreadable mystery. Modern CI/CD configurations and pipeline logs are today's glyphs: a dense, symbolic language that can be difficult to decipher. So, fittingly, we built Glyph, a suite of tools designed to be your expert translator, decoding the complexities of your development lifecycle into clear, actionable wisdom.

Features of glyph

  • AI-Powered CI/CD Analysis: Sends your CI configuration file to a powerful AI backend to identify potential errors (incorrect syntax, etc.), security vulnerabilities (exposed secrets, etc.), and performance optimizations (caching, etc.). Raises an MR automatically* with fixed code within a few minutes.
  • AI-Powered IaC Reviewer: Sends your Terraform IaC config to a powerful AI backend to identify cost implications (unnecessarily high CPU), potential errors (incorrect syntax, etc.), security vulnerabilities (public bucket, etc.), and overall configuration issues. Raises an MR automatically* with fixed code within a few minutes.
  • Incident Remediation: When the pipeline runs, it automatically scans all open incidents via the REST API. It uses labels to determine the type of each incident and performs predetermined actions to resolve it within minutes.
  • Dashboard-Ready Data Logging: Enriches every scan with comprehensive GitLab metadata (project, pipeline, commit, user info) and logs it to Google's Firestore database, ready for visualization in your velocity dashboard.
  • Automated Incident Creation and Failure Analysis: Includes a built-in after_script that automatically detects if the job fails and creates an incident with a detailed failure report.
  • Effortless Integration: Add the component to any GitLab project with just a few lines of YAML.
  • Flexible Configuration: Easily specify a custom path to your CI/CD configuration file or IaC file if you don't use the standard .gitlab-ci.yml or main.tf.
  • Lightweight and Secure: Runs in a minimal Alpine Linux image and uses protected, masked CI/CD variables for your service URLs. This keeps stage runs fast and secure.
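
The label-driven incident remediation in the list above can be pictured as a simple dispatch table. This is only a sketch under assumptions: the label names (`glyph::rollback`, `glyph::scale-up`), the handler functions, and the shape of the incident dictionary are all hypothetical, and the real component may map labels to actions quite differently.

```python
from typing import Any, Callable, Dict, List

Incident = Dict[str, Any]


def suggest_rollback(incident: Incident) -> str:
    # Hypothetical predetermined action for a bad deployment.
    return f"incident {incident['iid']}: roll back the last deployment"


def suggest_scale_up(incident: Incident) -> str:
    # Hypothetical predetermined action for resource exhaustion.
    return f"incident {incident['iid']}: increase instance resources"


# Hypothetical mapping from incident labels to predetermined actions.
HANDLERS: Dict[str, Callable[[Incident], str]] = {
    "glyph::rollback": suggest_rollback,
    "glyph::scale-up": suggest_scale_up,
}


def remediate(incident: Incident, handlers: Dict[str, Callable[[Incident], str]] = HANDLERS) -> List[str]:
    """Run every handler whose label is present on the incident; ignore unknown labels."""
    labels = incident.get("labels", [])
    return [handlers[label](incident) for label in labels if label in handlers]


print(remediate({"iid": 7, "labels": ["incident", "glyph::rollback"]}))
```

Keeping the mapping in one dictionary means a new incident type only needs a new label and handler, with no change to the scanning loop.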

How we built it

This project is built on a ton of services from Google Cloud, GitLab APIs, and others, including these:

[AI and Data]

  • Google Vertex AI - This is the home of all the AI models on which the IaC, CI, and incident remediation analysis engines are built.
  • Google Firestore - This is the storage we used for the entire project: a fully managed, scalable NoSQL database where we store all logs, analysis reports, encrypted OAuth tokens, etc.

[Backend APIs and Logic]

  • Google Cloud Functions - This is where the backend and its logic are hosted, chosen for its scalability and its cost-efficiency when there is no traffic.
  • Google API Gateway - This ties all the cloud functions into a single API, where rate limits and timeouts are set.
  • GitLab APIs - These APIs are used for a lot of features, like OAuth, creating MRs, reading open incidents and their labels, etc.

[User Facing Services]

  • Google Compute Engine - This is used to deploy our frontend, the velocity dashboard.
  • Google Load Balancer - This is tied to an unmanaged instance group of Compute Engine VMs and configured to distribute load among them. It is also used to provision a Google-managed SSL certificate.

[Bonus]

  • Google Cloud DNS - All DNS records for the domain are configured here.

The architecture diagram connecting all the above services is here: Google Architecture and Workflow

The glyph CI/CD component is written in plain GitLab YAML syntax under the /templates folder. The dashboard, velocity, is built with:

  • Python,
  • Next.js,
  • Tailwind &
  • a NoSQL database.

Challenges we ran into

  • Familiarity with GitLab: Since GitLab was new to both of us, it took some time to figure out what is what. Going through the documentation, tutorials, and API references, it all seemed a bit complicated at first. But thanks to the very easy onboarding steps and the guidance from the hackathon organizers on Discord, we were able to get up to speed eventually.

  • AI Analysis: It's not as easy as sending the text and asking the model to fix it. There are hallucinations, responses in the wrong structure, a lack of grounding, and temperature and a lot of other parameters to tweak to get an optimal response. And once you figure this out, you run into the trade-off between response time and cost. With limited resources and time, striking the right balance between cost and accuracy was a very tough job. We had to do a lot of testing with different models and with configs and incidents of varying complexity to land where we are now.

  • Prompts & models: Like I said earlier, we could throw everything at large models, since they do great analysis, but they take more time, which is not good in a CI/CD project that aims to speed up shipping. So, to get this project working on smaller models with good accuracy, we had to do prompt engineering really, really well. Think of techniques like roles, few-shot, decomposition, self-criticism, ensembling, reasoning write-out, etc.

  • Timeout issue in pipelines: I was puzzled when I first saw a timeout being returned from the analysis API. I was unsure whether it came from the GitLab runner's side or the Google API Gateway side. It took a long time reading logs and docs to figure out that it was due to cold starts of the cloud functions plus the model taking a long time for analysis.

  • Frontend design: The analysis we are doing is a lot, really, so there is a lot of text to display. Figuring out a way to present this text meaningfully without overwhelming the user was hard. And this being our first project using Next.js didn't help. But we pulled through and now have a beautiful frontend.

  • Different types of tokens: Actually, this might just be a skill issue, but it took us a lot of time to figure out that the built-in $CI_JOB_TOKEN cannot be used on the /issues endpoint. That piece of information was buried deep under tons of documentation, which delayed the project more than it should have.
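
To make the "response in wrong structure" challenge above a bit more concrete, here is a minimal sketch of validating a model reply before acting on it. The expected fields (`summary`, `issues`, `fixed_config`) are a hypothetical schema invented for this example, not glyph's real response format.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # Markdown code fence that models often wrap around JSON

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"summary": str, "issues": list, "fixed_config": str}


def parse_analysis(raw: str) -> Dict[str, Any]:
    """Parse a model reply into a dict, raising ValueError if it is malformed."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (for example, a fence followed by "json").
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence and anything after it.
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return data


reply = FENCE + 'json\n{"summary": "ok", "issues": [], "fixed_config": "stages: [build]"}\n' + FENCE
print(parse_analysis(reply)["summary"])
```

A caller can catch `ValueError` and retry the prompt or fall back to reporting the raw text, instead of opening an MR from a malformed reply.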

Accomplishments that we're proud of

  • We built a complete, full-stack platform, not just a script: From the ground up, we designed and deployed a comprehensive application featuring an interactive Next.js frontend on Google Cloud, a scalable serverless backend in Python, and a persistent Firestore database, demonstrating a wide range of technical capabilities.

  • We tackled one of the hackathon's most ambitious prompts: a self-healing system. We designed and completed a working prototype of a visionary AIOps feature that analyzes production incidents from GitLab Issues and uses AI to propose the specific infrastructure-as-code change that would have prevented the failure, creating a true learning loop.

  • End-to-end "One-Click Fix" workflow: We didn't just stop at finding problems. We engineered a system that takes an AI-generated suggestion and uses the GitLab GraphQL API to automatically create a branch, commit the code change, and open a Merge Request, turning an insight into an actionable solution in seconds.
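
The branch, commit, and Merge Request steps of the "One-Click Fix" workflow could be expressed as three GraphQL request payloads. The sketch below only builds those payloads and sends nothing. It assumes the `createBranch`, `commitCreate`, and `mergeRequestCreate` mutations and their input type names as we understand the GitLab GraphQL reference, so verify them against the current schema before relying on this. The function name, branch name, and commit message are hypothetical.

```python
from typing import Any, Dict, List

# Assumed mutation and input type names; check the GitLab GraphQL reference.
CREATE_BRANCH = "mutation($input: CreateBranchInput!) { createBranch(input: $input) { errors } }"
COMMIT_CREATE = "mutation($input: CommitCreateInput!) { commitCreate(input: $input) { errors } }"
MR_CREATE = (
    "mutation($input: MergeRequestCreateInput!) "
    "{ mergeRequestCreate(input: $input) { mergeRequest { webUrl } errors } }"
)


def build_fix_payloads(
    project_path: str,
    file_path: str,
    fixed_content: str,
    target_branch: str = "main",
    fix_branch: str = "glyph/ai-fix",  # hypothetical branch name
) -> List[Dict[str, Any]]:
    """Return the three GraphQL payloads, in order: branch, commit, merge request."""
    branch_input = {"projectPath": project_path, "name": fix_branch, "ref": target_branch}
    commit_input = {
        "projectPath": project_path,
        "branch": fix_branch,
        "message": f"Apply AI-suggested fix to {file_path}",
        "actions": [{"action": "UPDATE", "filePath": file_path, "content": fixed_content}],
    }
    mr_input = {
        "projectPath": project_path,
        "sourceBranch": fix_branch,
        "targetBranch": target_branch,
        "title": f"AI-suggested fix for {file_path}",
    }
    return [
        {"query": CREATE_BRANCH, "variables": {"input": branch_input}},
        {"query": COMMIT_CREATE, "variables": {"input": commit_input}},
        {"query": MR_CREATE, "variables": {"input": mr_input}},
    ]


payloads = build_fix_payloads("group/project", ".gitlab-ci.yml", "stages: [build]\n")
print(len(payloads))
```

Each payload would be POSTed as JSON to the instance's `/api/graphql` endpoint in order, stopping if any response has a non-empty `errors` list.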

What we learned

  • Learned new pieces of technology while still managing to deliver exactly what we had in mind before we started. There were many obstacles in the way, but we managed to deliver an entire end-to-end working product!!
  • Improved knowledge of DevSecOps: Before this hackathon, I had very little idea about DevSecOps. I only knew that it was used to deploy software.
  • AI Is Not an Answer Box; It's a System You Engineer: Finally, our most profound technical takeaway was in the application of AI. We learned that the true power of a large language model is unlocked through sophisticated prompt engineering. While doing this project, I learned really useful prompt engineering techniques that helped me squeeze the last drop of reasoning out of these smaller models.

What's next for Glyph

There is a ton of potential here for glyph, to the point that we are surprised we built this. We will continue to develop glyph even after the hackathon, as community members. If done right, glyph could be useful to so many people out there. It can predict and fix most bugs before they even occur, and resolve them within minutes if they do manage to sneak out. This will enable the world to build and ship software even faster! Future work for glyph:

  • Support for and integration of monitoring tools for automated incident creation
  • Add support for more IaC languages, like AWS SAM
  • Improve the accuracy of config file identification
  • Intelligent AI labeller that saves users the time of assigning labels

Hopefully, one day glyph evolves beyond human-in-the-loop approvals. We aim to create a system so intelligent and trusted that it can autonomously commit verifiable fixes, making GitLab a truly self-healing platform that builds, ships, and improves itself.
