Inspiration
I like to use ChatGPT for coding questions because I can ask them in natural language. So I tried using it for questions about companies I invest in, or am deciding whether to invest in. But I found that ChatGPT doesn't have the latest information about companies because of its knowledge cutoff date, which is Oct 2023. I noticed they recently added Internet search to the free plan, but only while you're using GPT-4o, which gets used up fast. Then you're stuck with GPT-4o mini for 3 or more hours, getting old answers from last October that you'd still have to double-check anyway, as they warn.
So I thought: wouldn't it be really helpful to build an AI chatbot that uses the latest financial and corporate data available for all companies that trade on US stock exchanges, which number over 10,000? This chatbot, which I named Finsight, would answer your questions based on current-year information, and that information would be trustworthy and reliable, since the bot would get it from the financial statements that companies are required by law to file with the US Securities and Exchange Commission (SEC).
Of course, you can access these financial statements from the SEC directly, at https://www.sec.gov/search-filings. But you need to find the statements you're interested in, then read them or search their contents by keyword to find the answers you're looking for. Why not use Finsight, where you can ask questions in natural language in a chat interface? You can't do that on the SEC website, and you can't do that on Google. Finsight is unique in answering from official financial statements.
What it does
Finsight is an AI chatbot that gives you a dropdown list of all 10,000 plus companies that trade on US stock exchanges. You can type ahead in the dropdown box or select a company directly from the list. Then simply ask questions about that company and click the "Ask Question" button.
Finsight stores current-year financial statements from the SEC in a vector database. If it finds the statements for the company you chose, it uses RAG (Retrieval Augmented Generation) with OpenAI to generate your answer from those statements (forms 10-K, 10-Q, and 8-K for US firms; 20-F and 6-K for non-US firms). If the company's statements aren't in the database yet, they are retrieved from the SEC in real time, processed for ML, and inserted into the database. This process can take a few minutes, so Finsight will tell you so, and you'll need to ask your question again later (just click "Ask Question" again).
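For the real-time load step, the company's filing list has to be narrowed to the forms named above for the current year. The sketch below assumes the list comes from the SEC's submissions JSON (`https://data.sec.gov/submissions/CIK##########.json`), whose `filings.recent` object holds parallel lists such as `form`, `filingDate`, `accessionNumber`, and `primaryDocument`. The function name and the sample data (including the placeholder accession values) are made up for illustration; this is not Finsight's actual code.

```python
from __future__ import annotations

from datetime import date
from typing import Optional

# Forms Finsight uses: 10-K, 10-Q, 8-K for US filers; 20-F, 6-K for non-US filers.
WANTED_FORMS = {"10-K", "10-Q", "8-K", "20-F", "6-K"}


def select_current_year_filings(submissions: dict,
                                year: Optional[int] = None) -> list[dict]:
    """Return the wanted filings filed in `year` (default: the current year).

    Assumes `submissions` has the shape of the SEC submissions JSON, where
    submissions["filings"]["recent"] maps field names to parallel lists.
    """
    year = year or date.today().year
    recent = submissions["filings"]["recent"]
    selected = []
    for form, filed, accession, doc in zip(recent["form"], recent["filingDate"],
                                           recent["accessionNumber"],
                                           recent["primaryDocument"]):
        # filingDate is an ISO date string such as "2024-08-01".
        if form in WANTED_FORMS and filed.startswith(f"{year}-"):
            selected.append({"form": form, "filingDate": filed,
                             "accessionNumber": accession,
                             "primaryDocument": doc})
    return selected


# Made-up sample in the same shape (a Form 4 and a 2023 10-K are filtered out).
sample = {"filings": {"recent": {
    "form": ["10-Q", "4", "8-K", "10-K"],
    "filingDate": ["2024-08-01", "2024-07-15", "2024-05-02", "2023-11-03"],
    "accessionNumber": ["sample-4", "sample-3", "sample-2", "sample-1"],
    "primaryDocument": ["q.htm", "f4.xml", "ek.htm", "k.htm"],
}}}
print([f["form"] for f in select_current_year_filings(sample, 2024)])  # ['10-Q', '8-K']
```

Because the match on `form` is exact, amended filings such as "10-K/A" are skipped; add them to `WANTED_FORMS` if they should be loaded too.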
This September I loaded the financial statements for all 503 stocks of the S&P 500 (https://en.wikipedia.org/wiki/List_of_S%26P_500_companies) into the vector database; the text chunks and their vectors take up almost 5 GB. But Finsight supports over 10,000 companies, and many interesting companies are not in the S&P 500, so you will likely pick a company whose statements the chatbot still needs to load. Please be patient when that happens: you're saving the next user interested in that company from waiting :)
How I built it
I built Finsight in Python, my current favorite language, using Gradio for the UI, Beautiful Soup for parsing the financial statements retrieved from the SEC's EDGAR database and API, NLTK for chunking the parsed text, OpenAI for embedding the text chunks as vectors, and TiDB Vector for storing the chunks and vectors with metadata. To answer questions, I use RAG with LangChain: it uses the metadata to query the vectors in TiDB, then passes the text chunks whose vectors most closely match the question to OpenAI, which generates the natural-language answer.
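The retrieval half of that RAG flow can be sketched without any external services: filter the stored chunks by metadata (here, a hypothetical `ticker` key), rank the remaining chunks by cosine similarity to the question's embedding, and put the top matches into the prompt sent to the chat model. In Finsight, LangChain and TiDB Vector do this work and the vectors come from OpenAI embeddings; the in-memory list, the tiny 3-dimensional vectors, and the names `Chunk`, `retrieve`, and `build_prompt` below are assumptions for illustration only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]                 # embedding of `text`
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks: list[Chunk], query_vector: list[float],
             ticker: str, k: int = 3) -> list[Chunk]:
    """Keep only the chosen company's chunks, then return the k most similar."""
    candidates = [c for c in chunks if c.metadata.get("ticker") == ticker]
    candidates.sort(key=lambda c: cosine(c.vector, query_vector), reverse=True)
    return candidates[:k]


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Ground the chat model's answer in the retrieved filing excerpts."""
    context = "\n\n".join(c.text for c in chunks)
    return ("Answer the question using only these excerpts from SEC filings.\n\n"
            f"{context}\n\nQuestion: {question}")


store = [
    Chunk("Revenue grew 12% year over year.", [0.9, 0.1, 0.0], {"ticker": "AAA"}),
    Chunk("The board declared a dividend.", [0.1, 0.9, 0.0], {"ticker": "AAA"}),
    Chunk("Revenue fell 5%.", [0.95, 0.05, 0.0], {"ticker": "BBB"}),
]
top = retrieve(store, [1.0, 0.0, 0.0], ticker="AAA", k=1)
print(top[0].text)  # Revenue grew 12% year over year.
```

Filtering on metadata before ranking is what keeps a closer match from another company (BBB in the sample) out of an answer about AAA.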
I also use FastAPI to implement an internal heartbeat endpoint that keeps the app (chatbot) running without timing out while a company's financial statements are being retrieved, processed, and loaded into the database. To avoid excessive memory usage, I load the statements through a queue with a daemon worker thread. The loading code has proven robust: it successfully loaded the 2024 year-to-date statements for all the S&P 500 companies mentioned above, almost 5 GB of data now stored in the vector database for your use.
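A minimal sketch of the queue-plus-daemon-worker pattern, assuming a single worker so that only one company's statements are in memory at a time. `load_company` is a hypothetical placeholder for the retrieve, parse, chunk, embed, and insert steps, and the in-memory `loaded` set stands in for checking the vector database. The `pending` set keeps a company from being queued twice when a user clicks "Ask Question" again while it is still loading.

```python
from __future__ import annotations

import queue
import threading
import time

load_queue: queue.Queue = queue.Queue()
pending: set[str] = set()   # companies queued or currently loading
loaded: set[str] = set()    # stand-in for "statements are in the vector DB"
lock = threading.Lock()


def load_company(ticker: str) -> None:
    """Hypothetical placeholder for retrieve -> parse -> chunk -> embed -> insert."""
    time.sleep(0.1)  # simulate slow work


def worker() -> None:
    # One worker handles one company at a time, which bounds memory use.
    while True:
        ticker = load_queue.get()  # blocks until a job arrives
        try:
            load_company(ticker)
            with lock:
                loaded.add(ticker)
        except Exception as exc:  # keep the worker alive if one company fails
            print(f"failed to load {ticker}: {exc}")
        finally:
            with lock:
                pending.discard(ticker)
            load_queue.task_done()


def request_load(ticker: str) -> str:
    """Called when a user asks about a company; queues it at most once."""
    with lock:
        if ticker in loaded:
            return "ready"
        if ticker not in pending:
            pending.add(ticker)
            load_queue.put(ticker)
    return "loading: please ask again in a few minutes"


# daemon=True means this thread will not keep the process alive on exit.
threading.Thread(target=worker, daemon=True).start()

print(request_load("AAA"))  # loading: please ask again in a few minutes
load_queue.join()           # demo only: wait until the worker finishes
print(request_load("AAA"))  # ready
```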
What's next for Finsight
In v2 I will enhance Finsight to automatically update the financial statements stored in its vector database when new ones are filed with the SEC. Currently, once a company's statements have been added, adding a newly filed statement requires manually deleting that company's statements from the database; the next time that company is selected, the code reloads all its statements for the current year. With the new feature, a scheduled job will check for new statements and add them automatically, and an admin page will let the admin request an update for selected companies (such as all S&P 500 companies) on demand.
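One possible shape for the planned v2 scheduled job, sketched under assumptions: each filing is identified by its SEC accession number, the job compares the accession numbers the SEC lists for a company with those already stored, and only the difference gets loaded, so nothing has to be deleted first. The function names and the placeholder accession values are hypothetical, and a production deployment might use cron or a scheduler library instead of the simple daemon-thread loop shown.

```python
from __future__ import annotations

import threading
from typing import Callable, Iterable


def find_new_filings(sec_accessions: Iterable[str],
                     stored_accessions: Iterable[str]) -> list[str]:
    """Return accession numbers listed by the SEC but not yet stored, in SEC order."""
    stored = set(stored_accessions)
    return [a for a in sec_accessions if a not in stored]


def run_periodically(job: Callable[[], None], interval_s: float,
                     stop: threading.Event) -> threading.Thread:
    """Call `job` every `interval_s` seconds on a daemon thread until `stop` is set."""
    def loop() -> None:
        # Event.wait returns False on timeout (time to run) and True once stop is set.
        while not stop.wait(interval_s):
            try:
                job()
            except Exception as exc:  # one failed run should not end the schedule
                print(f"update job failed: {exc}")

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


# Placeholder accession numbers: "A-3" is new, the other two are already stored.
print(find_new_filings(["A-3", "A-2", "A-1"], ["A-1", "A-2"]))  # ['A-3']
```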
Built With
- beautiful-soup
- edgar-api
- fastapi
- gradio
- langchain
- nltk
- openai
- tidb-vector