Inspiration

Cloud AI APIs like OpenAI's include critical extras that local models lack: chat formatting, function calling, embeddings, and tool use. We wanted to bridge that gap and empower local-first AI development.

What it does

LocalAI+ is an OpenAI-compatible API wrapper for local LLMs. It adds chat formatting, function calling, embeddings, secure tool use, and RAG — all locally, with zero cloud dependencies.
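"OpenAI-compatible" means the server returns the same response envelope the OpenAI Chat Completions API does, so existing client code keeps working when pointed at a local endpoint. Here is a minimal stdlib sketch of that envelope (the `make_chat_completion` helper is illustrative, not LocalAI+'s actual code):

```python
import json
import time
import uuid

def make_chat_completion(model: str, content: str) -> dict:
    """Build an OpenAI-style chat completion response so existing
    OpenAI client code can parse replies from a local server.
    Field layout mirrors the public Chat Completions response shape."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": content},
                "finish_reason": "stop",
            }
        ],
    }

resp = make_chat_completion("llama3", "Hello from a local model!")
print(json.dumps(resp, indent=2))
```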

How we built it

We used Python and FastAPI to build the backend, Ollama to serve local LLMs, Qdrant for vector search, and added a plugin system for tools and function calling. All wrapped in an OpenAI-style interface with full API docs.
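The plugin system boils down to a registry that maps each tool name to a callable plus a JSON-schema description the model can be shown for function calling. A minimal sketch of that idea (class and method names here are hypothetical, not the project's actual API):

```python
from typing import Any, Callable, Dict

class ToolRegistry:
    """Minimal plugin registry: each tool registers a callable plus a
    JSON-schema-style spec that can be advertised to the model."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, description: str, parameters: dict):
        def decorator(fn: Callable) -> Callable:
            self._tools[name] = {
                "fn": fn,
                "spec": {
                    "name": name,
                    "description": description,
                    "parameters": parameters,
                },
            }
            return fn
        return decorator

    def specs(self) -> list:
        # What gets sent to the model as the available tools.
        return [t["spec"] for t in self._tools.values()]

    def call(self, name: str, arguments: dict) -> Any:
        return self._tools[name]["fn"](**arguments)

registry = ToolRegistry()

@registry.register(
    "add",
    "Add two numbers",
    {"type": "object",
     "properties": {"a": {"type": "number"}, "b": {"type": "number"}}},
)
def add(a: float, b: float) -> float:
    return a + b

print(registry.call("add", {"a": 2, "b": 3}))  # prints 5
```

New tools plug in with one decorator, which is what makes the architecture "plug-and-play": the server can advertise `registry.specs()` to the model and dispatch its tool calls through `registry.call()`.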

Challenges we ran into

  • Making function calling robust and schema-compliant
  • Handling edge cases in local model output
  • Safely executing code and tools in a sandboxed environment
  • Designing a clean, modular plugin architecture that works out of the box
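The first two challenges interact: local models often wrap a function call's JSON in prose or code fences. One common tactic, sketched below under the assumption of plain-JSON tool calls (the helper name is illustrative), is to extract the first JSON object from the raw completion and reject it unless the required keys are present:

```python
import json
import re
from typing import Optional

def extract_tool_call(raw: str, required: set) -> Optional[dict]:
    """Pull the first JSON object out of a model completion (local
    models often wrap JSON in prose or code fences) and check that
    all required keys are present."""
    # Strip markdown code fences if present.
    raw = re.sub(r"```(?:json)?", "", raw)
    # Grab the outermost {...} span.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not required.issubset(obj):
        return None
    return obj

messy = ('Sure! Here is the call:\n```json\n'
         '{"name": "search", "arguments": {"q": "qdrant"}}\n```')
call = extract_tool_call(messy, {"name", "arguments"})
print(call)
```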

Accomplishments that we're proud of

  • Fully OpenAI-compatible local API
  • Function calling and embedding support with zero cloud
  • Secure sandboxed code execution
  • Plug-and-play architecture for adding new tools
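For a feel of what sandboxed execution involves: the sketch below runs untrusted code in a separate interpreter process with a hard timeout. This is only an illustration of the general idea, not LocalAI+'s mechanism; the project's stack lists Pyodide, which isolates code via WebAssembly instead.

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Run untrusted code in a separate interpreter process with a
    hard timeout. A sketch only: real isolation needs much more.
    -I runs Python in isolated mode (no user site dir, env ignored)."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout

print(run_sandboxed("print(2 + 2)"))
```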

What we learned

  • Local models are powerful but need orchestration to be useful
  • Developers want open, local-first APIs — but simplicity is critical
  • Rebuilding cloud-level infra locally is hard, but incredibly rewarding

What's next for LocalAI+

  • Agent memory and threading
  • API key-based auth
  • Web-based dev playground
  • Prebuilt tool library for plug-and-play LLM apps
  • Model backend switching and load balancing

Built With

  • docker
  • fastapi
  • llama
  • nomic-embed-text
  • ollama
  • openapi
  • pyodide
  • python
  • qdrant