Insight - a custom business intelligence platform that discovers new and existing technologies as they progress through the market. Insight serves as an end-to-end solution for market discovery, quantitative analysis, predictive modeling, and expansion tracking.
What it does
Technology evolves rapidly. While many struggle to keep up, others are looking for high-yield investments in a fast-paced, dynamic market. We built Insight not only to identify promising new technologies, but also to filter and classify those most likely to become commercially successful.
The platform consists of a data ingestion engine, an analytics and prediction layer, and a data visualization layer. For the ingestion engine, we programmatically crawl and retrieve data from sites such as The New York Times, The Guardian, Google News, NASDAQ, Crunchbase, CB Insights, and Twitter, covering news, finance, and social media so that the model draws on a comprehensive picture. We pass this information through several machine learning and mathematical models to extract sentiment from text articles and to learn the structure of the domain space. We then compute a single "commercial relevancy score", along with time-series tracking of the underlying factors, as a heuristic metric indicative of future growth. This data is aggregated and displayed on our dashboard.
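As a rough illustration of how the ingestion and analytics stages described above might fit together, the Python sketch below groups crawled documents by technology and day, then records mention volume and mean sentiment as a time series. The Document class, the word lists, and the function names are all hypothetical, and the lexicon lookup is only a stand-in for whatever sentiment model the platform actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical toy lexicon; a stand-in for a real sentiment model.
POSITIVE = {"breakthrough", "growth", "promising", "adoption"}
NEGATIVE = {"hype", "risk", "decline", "scam"}


@dataclass
class Document:
    technology: str   # topic the document was matched to
    source: str       # e.g. "news", "finance", "social"
    published: date
    text: str


def sentiment(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_series(docs):
    """Group documents by (technology, day); record volume and mean sentiment."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[(doc.technology, doc.published)].append(sentiment(doc.text))
    return {
        key: {"mentions": len(scores), "mean_sentiment": sum(scores) / len(scores)}
        for key, scores in buckets.items()
    }
```

Each (technology, day) entry could then feed both the dashboard and the scoring step, which is one plausible way to get the time-series tracking mentioned above.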
Our goal with Insight was to generate actionable insights at all stages of the technology pipeline, from idea conception to long-term maturity. In its current state, the platform already lets us create a watchlist of early-stage technologies that show future potential and then track them through the early investment stages.
How we built it
To keep track of new and existing technologies, the first step was to gather data about them from various sources. These include news, social media, information about startups, and whether or not large corporations have adopted a given technology. We gathered data in these specific areas for the artificial intelligence industry space in order to build a model that tracks the growth of its technologies.
We used news and social media data to see how much these topics were being talked about, as well as the general public sentiment, to get an idea of market demand for certain technologies. With this in mind, we also looked at startups using these technologies to make sure that a technology was not driven solely by hype and actually showed commercial viability and business value. By taking the union of these two data streams, we were able to build a more robust model that took into account abstract features related to social opinion and public discourse as well as concrete financial data related to investment, adoption, and market share.
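One way to picture the union of the two data streams is as a merge of per-technology feature maps. The sketch below is only an assumption about how that could look: the function name, the "discourse." and "financial." prefixes, and the choice to fill missing features with 0.0 are all hypothetical.

```python
def combine_streams(discourse, financial):
    """Union two per-technology feature maps; missing features default to 0.0."""
    discourse_keys = sorted({k for feats in discourse.values() for k in feats})
    financial_keys = sorted({k for feats in financial.values() for k in feats})
    combined = {}
    # Keep every technology that appears in either stream.
    for tech in sorted(set(discourse) | set(financial)):
        row = {}
        for key in discourse_keys:
            row[f"discourse.{key}"] = discourse.get(tech, {}).get(key, 0.0)
        for key in financial_keys:
            row[f"financial.{key}"] = financial.get(tech, {}).get(key, 0.0)
        combined[tech] = row
    return combined
```

Prefixing the keys keeps a hype-driven feature from being confused with a financial one of the same name, and every technology ends up with the same set of columns regardless of which stream mentioned it.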
We trained our model on the data we gathered in order to assign weights to its features. The model assigns a commercial relevancy score based on these features (gathered from the sources listed above), so that after training we can feed it a new technology and get an estimated commercial relevancy score.
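The writeup does not say which model or training targets were used, so the following is only a minimal sketch under the assumption of a linear model fitted by batch gradient descent on mean squared error. The function names, hyperparameters, and any data passed in are hypothetical.

```python
def fit_weights(rows, targets, lr=0.1, epochs=5000):
    """Fit linear weights and a bias by batch gradient descent on MSE."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, targets):
            # Prediction error for this training example.
            error = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for j, xi in enumerate(x):
                grad_w[j] += 2 * error * xi / n
            grad_b += 2 * error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def relevancy_score(features, weights, bias):
    """Score a new technology as a weighted sum of its features plus the bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias
```

With features scaled to a similar range (for example 0 to 1), the learned weights give a rough indication of how much each feature contributes to the score, and relevancy_score can then be applied to a technology the model has not seen.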
To check that our model behaves sensibly, we tested it against a few technologies that are currently trending and attracting a lot of investment. The two we chose were cryptocurrencies and blockchain. Our system returned a relevancy score of 40.13 for blockchain and 14.0 for cryptocurrencies. Although both had many articles written about them, we believe the score discrepancy was due to the perceived benefits that blockchain could bring to companies, and to the investments from large companies like GS and GOOGL in blockchain technology. This is a good example of the hidden structure in the large set of data we were able to acquire.
Challenges we ran into
Because the problem was very open-ended, it took us a while to settle on a direction and prioritize our development efforts for this hackathon. In the end, we decided to narrow our search scope to widely available news, finance, and social media sources.
Accomplishments that we're proud of
The time constraints severely limited the reach we originally wanted for the project, but in the end we were still able to link together quite a few data sources to generate a fairly robust model.
What's next for Insight
More API Integrations
Dynamic Weight Effectiveness Tuning
Realtime Updating
arXiv as Source (we already have the data)