Inspiration

Managing observability in multi-cloud environments is a critical challenge for businesses deploying applications across platforms like Azure, AWS, and GCP. Inspired by the growing demand for streamlined log monitoring and AI-driven insights, we envisioned a solution that simplifies observability, reduces complexity, and delivers actionable insights. Leveraging advanced AI models and APIs from SambaNova Cloud, we aimed to revolutionize how teams monitor, analyze, and optimize their systems, ensuring enhanced reliability and efficiency.

What it does

User Query Input: Users submit natural language queries through the Streamlit UI to gain observability insights about three applications powered by SambaNova Cloud models. The system processes these queries in real-time.

  1. Meddy (Azure)
  2. LegalExpert (AWS)
  3. Flash (GCP)

Autonomous DSL Query Construction: Agentic AI (Meta-Llama-3.1-405B-Instruct) interprets the user’s query and converts it into a tailored Domain-Specific Language (DSL) query. This ensures the query aligns with the user's intent, focusing on areas like application insights, severity, performance, or error trends.

Dynamic Log Retrieval from AWS OpenSearch: The system retrieves relevant logs from AWS OpenSearch using the DSL query, seamlessly identifying the appropriate index based on the application’s hosting environment.

AI-Powered Analysis and Insights: Agentic AI (Meta-Llama-3.2-1B-Instruct) analyzes the logs, providing a high-level summary, identifying anomalies, patterns, extracting actionable insights and making recommendations in response to the query.

Visual Insights: Insights are presented on the Streamlit UI in an intuitive and user-friendly format, enabling real-time decision-making and observability.

How we built it

  • Support for Multi-Cloud Applications: Meddy (Azure), LegalExpert (AWS), and Flash (GCP) are seamlessly integrated into a unified observability system. We modified three applications to send logs to a centralized platform, enabling seamless log aggregation. The system is designed to support any number of additional applications with minimal configuration.
  • Multiple AI Models: We utilized state-of-the-art AI models and APIs provided by SambaNova Cloud to power key features like DSL query generation, log analysis, and insights generation.
    1. Meta-Llama-3.1-405B-Instruct: Handles autonomous DSL query construction.
    2. Meta-Llama-3.2-1B-Instruct: Provides advanced log analysis, anomaly detection, and actionable insights.
  • Log Aggregation: Centralized logs from Azure, AWS, and GCP using AWS OpenSearch.
  • UI: Streamlit provides an intuitive interface for users to input natural language queries and visualize results.
  • Deployment: Deployed on AWS for scalability and multi-cloud integration.
  • Automation: Fully automated workflows streamline query generation, log retrieval, and insight generation.

Alt text

Reusability

This application is highly reusable and can be adapted to monitor multiple applications across various environments, leveraging its AI-driven capabilities to detect bottlenecks effectively.

X – Factors

  • Support for Multi-Cloud Applications: Meddy (Azure), LegalExpert (AWS), and Flash (GCP) are seamlessly integrated into a unified observability system.

  • LLM-Powered Agentic AI Workflow: Leverages multi-models with Meta-Llama-3.1-405B for Autonomous Decision Making and Meta-Llama-3.2-1B for Proactive Insights Generation of summarization, anomaly detection, and optimization recommendations that were not explicitly requested by the user.

  • End-to-End Automation: From translating the query to fetching data and generating insights, the system handles all tasks autonomously, reducing the need for human intervention and optimizing operational efficiency.

  • Adding and integrating new apps/log files is as simple as integrating them into the OpenSearch platform and by default, our app starts to Observe it.

Challenges we ran into

  • Multi-Cloud Connectivity: Ensuring seamless integration across Azure, AWS, and GCP while maintaining data security and efficiency.
  • Optimizing SambaNova AI Models: Leveraging SambaNova Cloud APIs to build accurate and responsive Agentic AI workflows required extensive fine-tuning and optimization.
  • Scalability: Designing a system that scales effortlessly to support additional applications and log files.
  • Real-Time Insights: Achieving rapid analysis without compromising the quality of insights.

Accomplishments that we're proud of

  • Successfully integrated SambaNova Cloud AI models to deliver accurate and actionable insights.
  • Unified observability for three multi-cloud applications with centralized log aggregation.
  • Automated end-to-end workflows, from query interpretation to log analysis and visualization.
  • Designed an intuitive, user-friendly UI that simplifies complex observability tasks.
  • Built a system that seamlessly scales to accommodate new applications and logs with minimal setup.

What we learned

  • SambaNova AI Capabilities: Explored the potential of SambaNova Cloud's AI models and APIs, gaining insights into their application in observability workflows.
  • Multi-Cloud Dynamics: Enhanced our understanding of integrating and managing resources across Azure, AWS, and GCP.
  • User-Centric Design: Recognized the value of accessibility and simplicity in delivering advanced features to users.
  • Automation and Efficiency: Learned how to optimize end-to-end processes for maximum operational efficiency.

What's next for ObserveAI

  • Advanced Recommendations: Extend capabilities to provide AI-driven fixes and optimizations for detected issues.
  • Continuous Monitoring: Implement real-time log monitoring with proactive alerting systems for critical events.
  • Flexible Log Uploads: Enable users to upload logs from any application through Observe AI, leveraging SambaNova Cloud APIs for quick analysis.
  • Self-Healing Systems: Develop an automated resolution system for recurring bottlenecks and performance issues.
  • Agent-Orchestrator Model: Introduce an orchestrator that dynamically allocates tasks to agents for optimized workflows.

Built With

Share this project:

Updates