meet noa

Noa is an LLM assistant designed to run entirely on my local machine. It is inspired by the same pop culture assistant every agent is inspired by, Dr John Watson, but for inventors. I wanted something that can handle everything in my life: dealing with people, notifications, ideation, maybe creation in the future, flirting on my behalf, all while I focus on more important tasks, like getting the platinum on Sekiro. Another important point for me was to have it run locally, and I did succeed in doing that, with one exception: an online database. Too late in the project I realized that a local MongoDB server doesn't support vector search, so I'm stuck with MongoDB Atlas for now. Let me bold that for all my quick-glancing folks: the project is not fully local, because I am using MongoDB's Vector Search, which is not supported locally as far as I know. Also, due to the nature of the project, I haven't yet found a good way for the general public to use it, though I have included setup instructions on GitHub. The project couldn't fully meet my vision in the given time. I was going for an ESP-driven device in my pocket, but more on that later.

what is noa's purpose for existing?

It supposedly takes 10,000 hours to master a subject, which is already plenty for one lifetime, considering other human obligations. But mastery alone doesn't equate to success. There are factors like marketing yourself, managing time efficiently, bouncing ideas around, etc. For people with money, a secretary and a team solve that problem, but that luxury is not for everyone, say a broke college student like me. By focusing on planning and networking, inventors carve a big chunk out of their lives that could very well be used to create the next big thing. Maybe someone was meant to find the cure for cancer, only to be bogged down by menial daily tasks. I am building Noa to combat this very issue, for now tailored to my own needs. Since I am not surrounded by like-minded people, I often find myself conversing with LLMs to bounce ideas, and I already use digital tools for scheduling. Another pain point is building something that is truly fine-tuned for me: it should remember, and it should be able to control my IoT gadgets and robots by running on my network.

how noa helps me

Noa is Jarvis in real life. I am well aware that this might be the most-attempted experiment out there, but with recent technology and research into memory and AI personality, along with the MCP boom, it might actually be possible right now. I want to be upfront that the project is not complete yet, and some of the features I wanted are not available. What we do have is a solid foundation for the long-term development of this project.

building blocks of noa

Noa uses the following technologies to come to life:

  • Python as the main programming language
  • Llama 3.2 as the brain's reasoning model
  • Ollama for running said model locally
  • Mistral:instruct for memory generation
  • all-MiniLM-L6-v2 sentence transformer for embeddings (384 dimensions)
  • pymongo for interfacing with MongoDB, which stores memories and handles vector search
  • faster_whisper for transcribing sound in Ears
  • sounddevice for capturing said sound
  • openwakeword for fine-tuning a wake word detection model
  • webrtcvad for VAD support in Ears
  • kokoro for TTS in Mouth
  • the usual ML stack: PyTorch, NumPy, pandas
  • CUDA for making my life easier

how noa works

Noa is composed of the following parts:

  • Wake Word Detection: The program constantly listens to a low-powered audio input for the sound "noa". This is done not by transcribing but with a custom-trained model. When it detects the wake word, it signals the Ears to go into listening mode.
    • I am using openwakeword for that. Using their Google Colab project, I first trained a model that responds to "noah", then run predictions on the audio callback from the stream: audio = np.frombuffer(indata, dtype=np.int16); prediction = self.model.predict(audio)
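A minimal sketch of that detection loop, assuming openwakeword's `Model.predict` on raw 16 kHz int16 frames delivered by a sounddevice callback. The model path, threshold, and block size here are my guesses for illustration, not Noa's actual values:

```python
import array

def bytes_to_int16(raw: bytes) -> list:
    """Convert raw little-endian 16-bit PCM bytes (as delivered by an
    audio callback) into a list of int16 samples."""
    samples = array.array("h")  # signed 16-bit
    samples.frombytes(raw)
    return samples.tolist()

def run_wakeword_loop(model_path="noah.tflite", threshold=0.5):
    """Feed microphone frames to an openwakeword model and react when
    the score crosses the threshold. (model_path and threshold are
    placeholder assumptions.)"""
    import numpy as np
    import sounddevice as sd
    from openwakeword.model import Model

    model = Model(wakeword_models=[model_path])

    def on_audio(indata, frames, time_info, status):
        # openwakeword expects 16 kHz, 16-bit mono samples
        audio = np.frombuffer(indata, dtype=np.int16)
        prediction = model.predict(audio)
        if max(prediction.values()) > threshold:
            print("wake word detected -> switch Ears to listening mode")

    with sd.RawInputStream(samplerate=16000, blocksize=1280,
                           dtype="int16", channels=1, callback=on_audio):
        sd.sleep(10_000)  # listen for 10 seconds in this sketch
```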
  • Ears: Noa balances speed with accuracy, resulting in a speech recognition system that fits conversation well. It can detect and parse clear audio even in relatively noisy places. It can also handle pauses between sentences in a way that doesn't break the flow of the conversation.

    • For speech to text I am using faster_whisper (kokoro handles the speaking side). For sound recording, sounddevice, with webrtcvad for better results.
    • The wake word detector triggers a function here that initializes listening, and Ears goes into listening mode.
    • It then counts consecutive silence frames to:
      • transcribe and wait for more if it's just a pause (silence_threshold = 20)
      • end the line and pass it on if it's the end of a sentence (stop_threshold = 40)
      • end speech if it's end-of-conversation silence (escape_threshold = 80)
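The pause/sentence/conversation decision above can be sketched as a small pure function driven by the three thresholds. The frame counts would come from webrtcvad; the function and action names here are mine, not Noa's actual code:

```python
PAUSE_THRESHOLD = 20    # silent frames: just a pause, transcribe and keep listening
STOP_THRESHOLD = 40     # silent frames: end of sentence, pass the line on
ESCAPE_THRESHOLD = 80   # silent frames: end of the conversation

def classify_silence(silent_frames: int) -> str:
    """Map a run of consecutive silent frames (as reported by the VAD)
    to the action Ears should take."""
    if silent_frames >= ESCAPE_THRESHOLD:
        return "end_conversation"
    if silent_frames >= STOP_THRESHOLD:
        return "end_sentence"
    if silent_frames >= PAUSE_THRESHOLD:
        return "transcribe_and_wait"
    return "keep_listening"
```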
  • Brain: This is the meat of the thinking. Data from the Ears arrives here, relevant memories are fetched, and both are passed to an LLM through prompting. The model is initialized by a system prompt. Four types of memory are used in this model: working, episodic, semantic and procedural.

    • I use a Llama model running locally through Ollama. Initialization takes time, because of course it does, but in my experiments the per-turn thinking was fast enough to hold a conversation.
    • I chose this model because it is free and easily runnable locally, which is one of the main goals of this project. The model is initialized with a system prompt that defines some of the initial functionality. In the future I might give Noa the ability to change this prompt.
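A sketch of that flow using the ollama Python client. The system prompt, the way memories are injected, and the helper names are my assumptions, not the project's actual prompting code:

```python
def build_messages(system_prompt, memories, user_text):
    """Assemble the chat messages: system prompt first, then the
    retrieved memories as extra context, then the transcribed utterance."""
    context = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": context},
        {"role": "user", "content": user_text},
    ]

def think(user_text, memories, model="llama3.2"):
    """Send the assembled prompt to the locally running model via
    Ollama. (Prompt wording and model tag are illustrative.)"""
    import ollama
    response = ollama.chat(
        model=model,
        messages=build_messages("You are Noa.", memories, user_text),
    )
    return response["message"]["content"]
```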
  • Memory Manager: This component provides the interface for learning and retrieval of the various memory types.

    • For memory generation I use mistral:instruct. I tried the Llama model, but it didn't work well with the intended JSON output format. I produce embeddings with the sentence transformer because of its lightweight nature.
    • The memories are stored in a MongoDB database, originally local, then moved online because of the vector search functionality. This is the reason speech is delayed a bit more than I would like.
    • For retrieval I run a vector search over the memories and add the best matches to the prompt.
    • "Cognitive Architectures for Language Agents" inspired this part. https://arxiv.org/abs/2309.02427
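A sketch of what that retrieval could look like as a MongoDB Atlas $vectorSearch aggregation pipeline. The index name and field names are invented for illustration, not Noa's actual schema:

```python
def memory_search_pipeline(query_vector, index_name="memory_index",
                           limit=5, num_candidates=100):
    """Build a $vectorSearch aggregation pipeline that finds the
    memories whose embeddings best match the current utterance."""
    return [
        {"$vectorSearch": {
            "index": index_name,        # Atlas Search index (assumed name)
            "path": "embedding",        # field holding the 384-dim vector
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": limit,
        }},
        # keep only the memory text and the similarity score
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

In use, the query vector would come from the sentence transformer, e.g. `SentenceTransformer("all-MiniLM-L6-v2").encode(text).tolist()`, and the pipeline would be passed to `collection.aggregate(...)`.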
  • Mouth: Noa speaks like a human. Again, a balance is attempted between processing speed and speaking quality so it sounds as natural as possible. Other than that, it's a simple local AI TTS.
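One way to keep that speed/quality balance is to synthesize sentence by sentence, so playback of the first sentence starts while the rest is still being rendered. A sketch following kokoro's published KPipeline interface; the lang_code and voice values are assumptions, and the sentence splitter is mine:

```python
import re

def split_sentences(text: str) -> list:
    """Split a reply into sentences so TTS can start speaking the first
    one while the rest are still being synthesized."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def speak(text: str):
    """Synthesize speech locally with kokoro and play it back.
    (Pipeline arguments follow kokoro's README and may differ from Noa's.)"""
    import sounddevice as sd
    from kokoro import KPipeline

    pipeline = KPipeline(lang_code="a")  # 'a' = American English
    for sentence in split_sentences(text):
        for _, _, audio in pipeline(sentence, voice="af_heart"):
            # audio is a 24 kHz waveform; convert if it's a torch tensor
            wave = audio.numpy() if hasattr(audio, "numpy") else audio
            sd.play(wave, samplerate=24000)
            sd.wait()
```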

what i learned

All the breakdowns this project imparted hit my emotional state like a truck in an isekai anime. However, what doesn't kill you makes you stronger. So here's what I learned during development, in chronological order:

  • Importance of proper setup: the first problem with building on new technology was the constant conflict between packages and environments. Here I learned just how important and lovely a conda environment is. In this project, many packages required conflicting versions of Python, and solving that was quite frustrating.
  • Threads: the next big roadblock I hit was thread management. When dealing with audio input streams, it is important to take the thread status into account, or things might stop working without even a warning signal or an obvious place to look for logs. I learned this the hard way. If you have never worked with sd.InputStream, remember that it runs its callback on a separate thread, and use a function like this to make your life easier:

    import threading

    def log_thread_info(origin: str) -> None:
        current = threading.current_thread()
        print(f"[{origin}] Running in thread: {current.name} (ID: {threading.get_ident()})")
        print(f"  Daemon: {current.daemon} | Alive: {current.is_alive()}")
    
  • Working with Vector Search: this was my first time working with this technology. Although I have used MongoDB in many projects, like a college dating app I launched, this was my first time dealing with vector search. I think I will use what I learned here in every future project.

  • Importance of System Design: throughout the development cycle, I refactored the code a bunch of times because I wanted something I could later expand upon comfortably. Paying mind to system design definitely helped.

Future

  • Make it fully local, meaning a custom local database
  • Give it the capability to program itself
  • Make it MCP compatible
  • Make a small ESP-powered controller device that fits in a pocket
  • Run it on my network

Conclusion

Thanks for this! I don't know if I will win, but this was the start of something fun. It will mean the world to me if you share your opinion/advice.
