Inspiration

Purpose & Motivation

The purpose of taking on this challenge is to improve my AI skills in analysing documents through automated document processing. I am really motivated to apply this solution to the Microsoft problem statement. To deliver the solution, I interviewed the CFO of a company, who is also a decision maker, to understand the real-life problems that decision makers face with Form 10-K reports.

Based on the interview, the decision maker looks at the following factors when choosing the best company stock to invest in:

  1. Regulation and market legislation
  2. Key highlights of the 10-K report
  3. Specific items picked from the key highlights (no. 2) for further analysis, such as profit ratios
  4. Rules of thumb for financial analysis, based on these three factors:

     a. Sustainability (cash flow over 12 months, margin over 12 months)
     b. Risk beyond jurisdiction (e.g. considering the market share price)
     c. Future impact (on the whole market and globally)
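The sustainability rule of thumb (cash flow and margin over 12 months) can be computed as trailing-twelve-month (TTM) figures from the last four quarters. This is a minimal sketch, assuming quarterly revenue, operating income, operating cash flow and capital expenditure have already been extracted from the filings; the `Quarter` type and the choice of free cash flow (operating cash flow minus capex) are illustrative assumptions, not part of the app.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    """Hypothetical container for figures extracted from one quarterly report."""
    revenue: float
    operating_income: float
    operating_cash_flow: float
    capital_expenditure: float


def trailing_twelve_months(quarters: list[Quarter]) -> dict[str, float]:
    """Aggregate the four most recent quarters into 12-month sustainability metrics."""
    if len(quarters) < 4:
        raise ValueError("need at least four quarters for a 12-month window")
    last4 = quarters[-4:]  # assumes the list is ordered oldest -> newest
    revenue = sum(q.revenue for q in last4)
    operating_income = sum(q.operating_income for q in last4)
    cash_flow = sum(q.operating_cash_flow for q in last4)
    capex = sum(q.capital_expenditure for q in last4)
    return {
        "ttm_revenue": revenue,
        # Operating margin over 12 months; guard against zero revenue.
        "ttm_operating_margin": operating_income / revenue if revenue else 0.0,
        # Free cash flow over 12 months (one common definition).
        "ttm_free_cash_flow": cash_flow - capex,
    }


if __name__ == "__main__":
    sample = [Quarter(100.0, 25.0, 30.0, 10.0) for _ in range(4)]
    print(trailing_twelve_months(sample))
```

Using a rolling four-quarter window smooths out seasonal swings that a single quarter would exaggerate.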

On another note, I also studied how other AI web service products present the results of automated document analysis. The result is presented as a 2D table in which each criterion is a column and each company is a row. The last column presents a summary for each company. However, this summary is still vague and confusing. Therefore, the future features planned for the Pineapple Financial Hack app are intended to be more objective and reliable for the decision maker.

In the future, automated document processing with AI capabilities is what business decision makers are most looking forward to.

The planned future features of the Pineapple Financial Hack app are:

  1. Recommending and listing the top 5 or top 10 company stocks to invest in.
  2. Automatically categorizing stocks as “Buy”, “Sell” or “Hold”, based on user-defined criteria.
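The two planned features could start as a simple rule-based layer on top of the extracted metrics. This is a hypothetical sketch: the three criteria, the 0..1 `risk_score`, and the rule "all criteria met = Buy, none met = Sell, otherwise Hold" are illustrative assumptions, not the app's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    min_margin: float          # minimum 12-month operating margin, e.g. 0.15
    min_free_cash_flow: float  # minimum 12-month free cash flow, in the report's currency
    max_risk_score: float      # hypothetical 0..1 score derived from the risk-factor summary


@dataclass
class CompanyMetrics:
    name: str
    margin: float
    free_cash_flow: float
    risk_score: float


def criteria_passed(m: CompanyMetrics, c: Criteria) -> int:
    """Count how many of the user's criteria a company satisfies (0 to 3)."""
    checks = (
        m.margin >= c.min_margin,
        m.free_cash_flow >= c.min_free_cash_flow,
        m.risk_score <= c.max_risk_score,
    )
    return sum(checks)


def classify(m: CompanyMetrics, c: Criteria) -> str:
    """Map the number of satisfied criteria to a Buy/Sell/Hold label."""
    passed = criteria_passed(m, c)
    if passed == 3:
        return "Buy"
    if passed == 0:
        return "Sell"
    return "Hold"


def top_n(companies: list[CompanyMetrics], c: Criteria, n: int = 5) -> list[CompanyMetrics]:
    """Rank by criteria satisfied, then by margin as a tie-breaker, and keep the top n."""
    ranked = sorted(companies, key=lambda m: (criteria_passed(m, c), m.margin), reverse=True)
    return ranked[:n]
```

A transparent rule like this addresses the "vague summary" problem noted above: the user can see exactly which criterion pushed a company into each category.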

What it does

Problem Statement

Processing large financial documents is really challenging. A decision maker who needs to make investment decisions has to digest a huge volume of financial reports for every company, such as the Form 10-K and other available filings, including Form 10-Q, proxy statements (DEF 14A), 8-K, 13D and 13F. Letters to Shareholders are also useful documents for investment decisions. Analysing this information can be confusing because the reports contain a lot of text and numbers spread over many pages; a single company's report can run to hundreds of pages.

Form 10-K is a comprehensive report filed annually by public companies about their financial performance. Information in the 10-K includes corporate history, financial statements, earnings per share, and any other relevant data. The 10-K is a useful tool for investors to make important decisions about their investments.

As part of the hackathon, Microsoft Corp. is looking for an AI solution that unlocks document processing for Form 10-K reports in particular, to help solve the problem faced by decision makers. To process these large documents, Microsoft is looking for an AI solution that automates document processing by extracting entities with Custom Named Entity Recognition. It is also looking for a Text Summary capability for each company's risk factors. Automated document processing will help improve productivity and ease investment analysis and decisions for business decision makers.

How we built it

Introduction

Pineapple Financial Hack is introduced to provide these document processing capabilities. It harnesses artificial intelligence by leveraging Azure Cognitive Services, under Language Services, to provide those features.

Pineapple Financial Hack is designed to extract different named entities from the documents using the Language category of Cognitive Services. The chosen AI services are Custom Named Entity Recognition (Custom NER) and Text Summary. Custom NER trains a model to recognize the structure and content of Form 10-K in a custom mode, whereas Text Summary summarizes the risk factors of each company. Bing Custom Search will also be included in the app so that the decision maker can conveniently find related results. Power BI will be used to visualize the results for the app user.
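Custom NER and risk-factor summarization can be requested together in one asynchronous job. This is a minimal sketch, assuming the request-body shape of the Language service's analyze-text jobs REST API (task kinds `CustomEntityRecognition` and `ExtractiveSummarization`); the project name, deployment name and task names are hypothetical placeholders, and the shape should be checked against the current REST reference, since the API version used through Language Studio may differ. The sketch only builds the JSON body and does not send it.

```python
import json


def build_analyze_job(text: str, project_name: str, deployment_name: str,
                      sentence_count: int = 3) -> dict:
    """Build a job body that runs Custom NER and extractive summarization on one document.

    Assumed body shape for the analyze-text jobs API; verify against the REST reference.
    """
    return {
        "displayName": "Form 10-K analysis",
        "analysisInput": {
            "documents": [{"id": "1", "language": "en", "text": text}]
        },
        "tasks": [
            {
                # Uses the trained and deployed Custom NER model (names are placeholders).
                "kind": "CustomEntityRecognition",
                "taskName": "form10k-entities",
                "parameters": {"projectName": project_name, "deploymentName": deployment_name},
            },
            {
                # Summarizes the risk-factor text into a fixed number of sentences.
                "kind": "ExtractiveSummarization",
                "taskName": "risk-factor-summary",
                "parameters": {"sentenceCount": sentence_count},
            },
        ],
    }


if __name__ == "__main__":
    body = build_analyze_job("Placeholder risk-factor text.", "pineapple-10k", "production")
    print(json.dumps(body, indent=2))
```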

In addition, the app is designed and built on Azure Cloud, which provides a secure, scalable and cost-efficient environment.

Security: Azure Active Directory (AAD) in the Azure cloud environment provides secure key authorization for the client. Scalability: in future, other types of financial documents and data can be added and used to train our model. Cost efficiency: the Data Lake Storage Gen2 version of our blob container offers better pricing because less data is moved in and out of storage.

How does the app work

The app uses Text Analytics under Language Services to perform Custom Named Entity Recognition extraction. This process is performed with the Language Studio tool provided by Microsoft Azure AI Cognitive Services, with API version 3.0. The model is then trained and tested on the given dataset, which is based on the Form 10-K reports of Microsoft Corporation and a few other companies. The documents must be in .docx format only. For the training data, I used five Form 10-K reports from different companies; for the test data, I added two more Form 10-K reports from other companies. The results are visualized with Microsoft Power BI.
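The dataset rules above (.docx only, five training reports, two test reports) can be checked before uploading. This is a minimal sketch with hypothetical file names; it assigns files deterministically by sorted name, which is an assumption rather than how the split was actually made.

```python
from pathlib import Path


def split_dataset(paths: list[str], n_train: int = 5,
                  n_test: int = 2) -> tuple[list[str], list[str]]:
    """Validate that every file is .docx, then split into training and test sets."""
    rejected = [p for p in paths if Path(p).suffix.lower() != ".docx"]
    if rejected:
        raise ValueError(f"only .docx files are accepted, got: {rejected}")
    docs = sorted(paths)  # deterministic order so the split is reproducible
    if len(docs) < n_train + n_test:
        raise ValueError(f"need {n_train + n_test} reports, got {len(docs)}")
    return docs[:n_train], docs[n_train:n_train + n_test]


if __name__ == "__main__":
    files = [f"company_{c}_10k.docx" for c in "abcdefg"]  # hypothetical file names
    train, test = split_dataset(files)
    print("train:", train)
    print("test:", test)
```

Keeping the test reports from companies not seen in training, as described above, gives a fairer estimate of how the model handles a new company's 10-K.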

The app is developed using Cognitive Services (Language Services, Text Analytics) for the AI capability. The storage container is created with Data Lake Storage Gen2, because its capabilities on top of Blob storage enhance performance, management and security.

Performance is improved because the user does not need to copy or transform data as a prerequisite for analysis, and the hierarchical namespace improves the performance of directory management operations and of jobs. Management is easier because files are organized into directories and subdirectories. Security is enforceable because POSIX permissions can be defined on directories or individual files.
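POSIX-style access control lists in Data Lake Storage Gen2 are written as comma-separated entries such as `user::rwx,group::r-x,other::---`. This is a minimal sketch that builds such a string, assuming a named-user entry for a managed identity whose object ID is a hypothetical placeholder. The resulting string is the kind of value accepted by the `acl` argument of `set_access_control` on directory and file clients in the `azure-storage-file-datalake` SDK; check the SDK reference before relying on it.

```python
from typing import Dict, Optional


def rwx(bits: int) -> str:
    """Convert a 3-bit permission value (0-7) to POSIX 'rwx' notation."""
    if not 0 <= bits <= 7:
        raise ValueError("permission bits must be between 0 and 7")
    return ("r" if bits & 4 else "-") + ("w" if bits & 2 else "-") + ("x" if bits & 1 else "-")


def build_acl(owner: int, group: int, other: int,
              named_users: Optional[Dict[str, int]] = None) -> str:
    """Build an ACL string for owner, owning group, others and optional named users."""
    entries = [f"user::{rwx(owner)}", f"group::{rwx(group)}", f"other::{rwx(other)}"]
    for object_id, bits in (named_users or {}).items():
        # Named-user entry, e.g. for a managed identity (object ID is a placeholder).
        entries.append(f"user:{object_id}:{rwx(bits)}")
    return ",".join(entries)


if __name__ == "__main__":
    # Owner full access, group read/list, others none, one identity with read/list.
    print(build_acl(7, 5, 0, {"<managed-identity-object-id>": 5}))
```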

Challenges we ran into

Difficulties and challenges faced during the design and development process

The main difficulty during development was setting up my own Azure Cloud environment, which made it hard to perform the AI processing. Azure Active Directory (AAD), Managed Identity and Key Vault had to be configured consistently with the Language Studio application before I could process the files in Language Studio. Understanding and configuring Azure cloud security was fairly complex because of the different API versions and deprecated web services. Another difficulty was that I initially chose the wrong approach: I used a different AI service that did not provide the results I wanted. However, I managed to turn this problem into a manageable solution at the last minute.

After joining this hackathon, I feel I should push myself to join more hackathons in the future to advance my AI skills. This is my first hackathon, so I spent a lot of time watching YouTube videos on how to configure and develop an AI app on Azure AI and Azure Cloud. By learning from YouTube and the Microsoft documentation, I gained AI skills, Azure Cloud skills and app development exposure. I believe future AI hackathon projects will be easier for me to handle. Juggling the project with work also took another level of commitment, as my industry is not purely in AI or cloud practice.

Accomplishments that we're proud of

Another challenge was the emotional breakdown towards the end of this hackathon. I felt like giving up and felt hopeless as the submission deadline approached. It really brought me down, but I kept working on it until the last minute without thinking of the effort as a bad thing. I realized that I gained many skills throughout the project. My biggest achievement is that I NEVER SAID NO until the end! 😊 I could have slept and enjoyed my day rather than completing this project, but I chose to keep working on it as a positive step in my AI journey. In fact, I could not stop working on it, and I kept working round the clock until it was complete. This was totally not me before... For this, I am proud of myself for NEVER GIVING UP!

What we learned

From my first hackathon experience, what I learnt from a technical perspective is:

  1. The main reference is always the product documentation; YouTube resources are the second reference. Always keep up to date with the technical documentation when developing apps.

  2. Solving the technical configuration issues of the system is tougher than solving the AI problem with Azure Cognitive Services. I underestimated this, and it really affected the development time. If the cloud configuration and web services are already in place on your computer or laptop, you will have less hassle. Still, solving these issues gave me real satisfaction and more confidence in handling development. This is the part of app development that I fear and hate to deal with! I hope I will overcome it soon. :)

What's next for PineappleFinancialHack

Go to market (How will the app be available to the public, and is it scalable?) Yes, this app has the potential to grow into an AI market product for automated document processing. It can also scale to other financial reports by adding the remaining reports mentioned in the problem statement, such as Form 10-Q. The app could be made available to the public through a custom Financial Recommender Text Analytics API. :)
