Inspiration
Full-text search (FTS) in MongoDB is good enough until you have to search a big collection.
I learned this the hard way when my client's product started to suffer from bad performance on a database with ~1M records. Instead of moving to another product such as Elasticsearch or Sphinx, I decided to take a look at the FTS codebase.
Luckily, the MongoDB hackathon started at about the same time I was looking into the code, so I decided to join it and spread the word about my project.
What it does
My code changes focus on performance optimization for FTS queries. The project is still experimental and far from perfect, and there are still many tasks to do; you can check the roadmap. At the same time, the current version already shows a significant performance boost and a smaller memory footprint.
How I built it
I started with a code review to understand how FTS works in the first place. Once I was confident with the source code, I started to change it. First I changed the TEXT_OR stage. Then I added the TEXT_AND and TEXT_NIN stages, which perform operations on the index even before data gets into TEXT_MATCH.
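To illustrate what "operations on the index" means, here is a minimal Python sketch of the three set operations. It is not the actual MongoDB stage code: the posting-list layout (a dict from document id to term score), the summing of scores, and the function names text_or, text_and and text_nin are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical posting-list layout: doc_id -> score of one term in that doc.
Postings = Dict[int, float]


def text_or(*lists: Postings) -> Postings:
    """Union: a document matches if any term matches; scores are summed."""
    out: Postings = {}
    for postings in lists:
        for doc_id, score in postings.items():
            out[doc_id] = out.get(doc_id, 0.0) + score
    return out


def text_and(*lists: Postings) -> Postings:
    """Intersection: keep only documents that appear in every list."""
    if not lists:
        return {}
    # Iterate over the shortest list so fewer candidates are checked.
    smallest = min(lists, key=len)
    return {
        doc_id: sum(p[doc_id] for p in lists)
        for doc_id in smallest
        if all(doc_id in p for p in lists)
    }


def text_nin(candidates: Postings, *negated: Postings) -> Postings:
    """Exclusion: drop candidates that contain any negated term."""
    return {
        doc_id: score
        for doc_id, score in candidates.items()
        if not any(doc_id in n for n in negated)
    }
```

The point of the sketch is that all three operations work on index entries only, so documents that cannot match are discarded before any document is fetched for matching.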
Challenges I ran into
The biggest challenge I faced was an algorithm to advance (pass to the upper stage) a document record that is guaranteed to have the top score.
This is very important for speeding up limit-based queries: we should not need to scan all index entries and documents to get only the top 10 or 1000 records sorted by score.
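The general idea of stopping early can be sketched in Python as follows. This is a simplified threshold-style algorithm, not the code from my patch: it assumes that each term has a posting list sorted by score (highest first), that scores are non-negative, and that a document's total score is the sum of its term scores. The function name top_k and the data layout are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_k(postings: List[List[Tuple[int, float]]], k: int) -> Tuple[List[int], int]:
    """Return (doc ids of the k best documents, number of entries read).

    Membership in the result is guaranteed; the order follows the scores
    known at the moment the scan stops.
    """
    if k <= 0:
        return [], 0
    pos = [0] * len(postings)               # read position in each list
    seen: Dict[int, Dict[int, float]] = {}  # doc_id -> {list index: score}
    reads = 0
    while True:
        progressed = False
        for i, plist in enumerate(postings):  # read one entry per list
            if pos[i] < len(plist):
                doc_id, score = plist[pos[i]]
                seen.setdefault(doc_id, {})[i] = score
                pos[i] += 1
                reads += 1
                progressed = True
        # Best score still unread in each list (0.0 once exhausted).
        fronts = [
            plist[pos[i]][1] if pos[i] < len(plist) else 0.0
            for i, plist in enumerate(postings)
        ]
        # Lower bound: what we know. Upper bound: plus the best it could still get.
        lower = {d: sum(parts.values()) for d, parts in seen.items()}
        upper = {
            d: lower[d] + sum(f for i, f in enumerate(fronts) if i not in parts)
            for d, parts in seen.items()
        }
        best = heapq.nlargest(k, lower.items(), key=lambda kv: kv[1])
        ids = [d for d, _ in best]
        if not progressed:
            return ids, reads                # everything has been read
        if len(best) == k:
            kth = best[-1][1]
            rivals = [upper[d] for d in seen if d not in ids]
            rivals.append(sum(fronts))       # a document not seen at all yet
            if kth >= max(rivals):
                return ids, reads            # nobody can overtake the top k
```

For example, with two lists of four entries each where one document leads both lists, asking for the single best document stops after reading just two of the eight entries.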
Accomplishments that I'm proud of
I would prefer the word "joy". I felt very happy when my code changes finally started to show performance differences and ran without side effects.
Another happy moment came when I deployed the new version to the client's server and we saw the change in performance right away.
Memory usage dropped by at least 30%, and query time went down from 14 s to less than a second.
It was a true moment of happiness.
What I learned
I became familiar with boost::multi_index and how to use it for very fast in-memory ordering, giving highly efficient sort, find, and update operations.
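For readers who have not used boost::multi_index, the core idea is one set of records that can be reached through several indexes at once. The Python sketch below only mimics that idea with a dict plus a sorted list; the class name ScoreBoard and its methods are hypothetical and are not the Boost API. Note that removing from a Python list is linear in its length, while an ordered index in boost::multi_index is a balanced tree.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ScoreBoard:
    """One set of records with two views: lookup by id and order by score."""

    def __init__(self) -> None:
        self._by_id: Dict[int, float] = {}            # doc_id -> score
        self._by_score: List[Tuple[float, int]] = []  # sorted (-score, doc_id)

    def upsert(self, doc_id: int, delta: float) -> None:
        """Add delta to a document's score, keeping both views in sync."""
        old = self._by_id.get(doc_id)
        if old is not None:
            # Locate and remove the stale entry from the ordered view.
            i = bisect.bisect_left(self._by_score, (-old, doc_id))
            del self._by_score[i]
        new = (old or 0.0) + delta
        self._by_id[doc_id] = new
        bisect.insort(self._by_score, (-new, doc_id))

    def find(self, doc_id: int) -> Optional[float]:
        """Look up the current score by document id."""
        return self._by_id.get(doc_id)

    def top(self, n: int) -> List[Tuple[int, float]]:
        """Return the n best (doc_id, score) pairs, highest score first."""
        return [(doc_id, -neg) for neg, doc_id in self._by_score[:n]]
```

A usage example: after upsert(1, 1.0), upsert(2, 2.0) and upsert(1, 1.5), document 1 has moved ahead of document 2 in the ordered view, and find still answers by id without scanning.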
What's next for mongodb.codes
As I said, I have started to put together a roadmap. MongoDB 3.6 is getting old, so porting the code forward to 4.x is important. Another interesting part is having multiple text indexes per collection. Also, using a score-sorted index for requests that are not score-based is inefficient, so I want an index with a document-id sort mode. It will help to boost FTS queries where phrases and a limit are used together.
My final goal is performance that can compete with Elasticsearch. Let's see if I get enough support to get there!
About testing
You can download the latest release from mongodb.codes and test it on your own database. Because the index format is compatible, replacing the mongod binary is all you need to test on your own collection, as long as you run version 3.6.11.
I am still working on updates, so there will be more releases before the deadline and after.
I set up a custom server to make it easy to test the difference in performance. I also created a repo with test code that connects to this server and runs queries; you will find all instructions in the README file of that repo.
Thank you for your attention. Best Regards, Gor Martsen
