Project ORION — Agentic AI Architecture with llama-3 NIM on AWS
Overview
Project ORION extends its multi-agent orchestration engine to use NVIDIA NIM microservices running on Amazon EKS or Amazon SageMaker.
This deployment introduces two specialized NIMs:
- Reasoning NIM:
llama-3 1-nemotron-nano-8B-v1— the core reasoning and planning LLM. - Retrieval Embedding NIM:
nv-embed-qa-4Bor equivalent — for contextual memory and semantic search.
Together, they form the cognitive layer of ORION, allowing agents to reason, recall, and act autonomously across AWS infrastructure.
Inspiration
“What if an AI ecosystem could think, recall, and coordinate — not just execute?”
While traditional systems isolate APIs and models, ORION connects them into a co-operative network of intelligent agents.
By combining llama-3 NIM for reasoning and a Retrieval NIM for memory, we enable an environment where agents learn, plan, and execute in sync.
What We Built
Agentic ORION on AWS is an end-to-end reasoning-and-retrieval framework capable of:
- Deploying llama-3 1-nemotron-nano-8B-v1 as a GPU-backed NIM for structured reasoning and workflow planning.
- Running a Retrieval Embedding NIM to store and search context embeddings from agents and domain data.
- Integrating Kernel (Spring Boot) with Redis and MySQL to coordinate reasoning requests and agent responses.
- Visualizing multi-agent flows in real time through an SSE-enabled dashboard hosted on S3 + CloudFront.
- Ensuring secure inter-agent communication via VPC, IAM roles, and Security Groups.
Conceptually:
Reasoning NIM = Brain | Retrieval NIM = Memory | Kernel = Nervous System | Agents = Muscles
Live Demos
| Application | URL | Credentials |
|---|---|---|
| Agentic-AI-UI | http://ec2-13-233-77-128.ap-south-1.compute.amazonaws.com:5173 | viewer / jenkins123 |
| MSME-UI | http://ec2-13-233-77-128.ap-south-1.compute.amazonaws.com:5174 | viewer / jenkins123 |
| Jenkins-UI | http://ec2-13-233-77-128.ap-south-1.compute.amazonaws.com:5175 | viewer / jenkins123 |
These live dashboards visualize real-time orchestration, multi-agent activity, and SSE event logs from the Kernel.
How We Built It
- Backend: Java 21 + Spring Boot Kernel for event routing and asynchronous orchestration.
- Reasoning:
llama-3 1-nemotron-nano-8B-v1 NIMfor planning, delegation, and multi-step reasoning. - Retrieval:
nv-embed-qa-4B NIMfor embedding generation and contextual retrieval via FAISS or Redis Vector. - Agents: Independent microservices (Invoice, Jenkins, Payment, Ledger) registered through Eureka.
- Persistence: MySQL (RDS) for session state and logs; Redis (Elasticache) for async message queues.
- UI: Real-time SSE dashboard served from S3 and CloudFront.
AWS Architecture Diagram

Figure 1: Project ORION — Agentic AI Architecture on AWS with llama-3 Reasoning and Retrieval NIMs
System Architecture
- Frontend (UI): S3 + CloudFront hosting static assets and handling inbound SSE/API calls.
- Core Services: Kernel (Spring Boot) + Eureka for discovery, Redis for state cache, MySQL for persistence.
- AI Orchestration Layer:
- Reasoning NIM (llama-3) — plans and delegates multi-agent tasks.
- Retrieval Embedding NIM — returns context vectors for reasoning grounding.
- Agents — Invoice, Jenkins, Payment microservices executing tools.
- Reasoning NIM (llama-3) — plans and delegates multi-agent tasks.
- AWS Infra: VPC, Security Groups, IAM Roles, CloudWatch, Elastic Load Balancer.
Data Flow
- User sends intent through the UI → ELB → Kernel.
- Kernel queries Retrieval NIM for contextual embeddings.
- Kernel invokes Reasoning NIM (llama-3) to create a JSON plan.
- Kernel delegates steps to Agents via Eureka service registry.
- Agents execute and return status/feedback.
- Kernel stores logs to Redis/MySQL and streams updates to UI through SSE.
- CloudWatch monitors GPU metrics and health status.
Example Reasoning Plan
{
"id": "plan-uuid",
"steps": [
{
"stepId": "s1",
"agent": "InvoiceAgent",
"tool": "Create_Invoice_Tool",
"inputs": { "vendor": "Zenith Retailers", "amount": 45000 },
"awaitHuman": false
},
{
"stepId": "s2",
"agent": "NotificationAgent",
"tool": "Send_Notification_Tool",
"inputs": {
"vendor": "Zenith Retailers",
"message": "Invoice INV-1042 created successfully."
},
"awaitHuman": false
}
],
"status": "planned"
}
Observability and Security
- CloudWatch: Collect logs, metrics, and GPU utilization.
- IAM Roles: Grant Bedrock, S3, and RDS permissions.
- Security Groups & VPC: Enforce intra-cluster isolation and secure communication.
- Elastic Load Balancer (ELB): Route SSE and REST traffic to the Kernel.
- Redis + MySQL: Store asynchronous state and workflow history for reliable orchestration recovery.
Outcome
With ORION on AWS NIM infrastructure, any organization can deploy an agentic ecosystem where reasoning, retrieval, and action occur seamlessly within a secure, scalable cloud environment.
“You don’t just automate workflows anymore — you build self-thinking systems that reason, recall, and adapt.”
Log in or sign up for Devpost to join the conversation.