The insurance industry runs on vast reservoirs of data in mixed formats, including both paper and electronic documents. Manually extracting information from these documents and other data sources is time-consuming, costly and error-prone. In the near future, insurers may also face competition from companies such as Amazon, Google, and Facebook, which hold huge amounts of personal data that could be used to offer personalized insurance products. In fact, Amazon is already hiring insurance professionals in a bid to disrupt the insurance market in the UK, Germany, France, Italy, and Spain. According to the GlobalData 2017 General Insurance Survey, 18% of consumers would buy their motor or home insurance from Amazon. Insurers therefore face constant pressure to optimize operational costs, improve accuracy and customer experience, and maximize the return on allocated capital. Intelligent automation can quickly automate key processes to achieve higher efficiency, streamline operational costs, and reduce handling and cycle times in the claims process.

What it does

We have developed a service that automates the verification of claim files, including policyholder and vehicle details, across multiple formats such as scanned documents and photographs taken by claim surveyors.

Motor claims are routinely reviewed through the Closed File Review (CFR) program, which involves assessing a number of physical claim files. When a claim file is tested, key claim attributes have to be verified against multiple information sources (e.g. the surveyor report, government documents and authoritative data). FTEs need to manually test 18 to 20 data points for each claim file and compare them against the authoritative data source.

A claim file typically contains the following documents:

  1. Insurance Policy document
  2. Driving license of the driver
  3. Registration Certificate of the vehicle
  4. Surveyor’s report – an assessment of the extent of damage to the vehicle, on the basis of which the insurance company decides whether the claim is to be paid.
  5. Vehicle photographs taken at the time of survey

The data in these documents is then cross-checked against the claim data stored in the system (the authoritative data source).

We therefore developed a transaction testing solution that automates the manual review. It converts the unstructured data in scanned PDF claim files into structured data and then compares it with the data captured in the policy administration system, the documents collected at the time of the claim survey, and other authoritative data sources.


  1. Upload the scanned images of the claim files (unstructured data) and the corresponding authoritative data into our automated solution.

  2. Our solution performs an overall scan of the claim files (preprocessing + OCR) and selects the data attributes that are subject to testing.

  3. We then convert the unstructured data attributes from the claim files into structured data (JSON). For example, the value associated with the typed field ‘Registration Certificate’ is extracted from the scanned images and saved.

  4. Validate the extracted data using a document-specific algorithm to check its authenticity.

  5. Compare the structured data from the previous step with the corresponding data in the system dump (authoritative data source) using our comparison algorithm, which calculates the percentage match between the two values rather than returning a boolean.

  6. Record exceptions (if any) in a spreadsheet file or a simple report.
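Steps 3, 5 and 6 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual algorithm: the field names, the 90% threshold and the helpers `match_percentage`, `compare_claim` and `write_exceptions_csv` are hypothetical, and the percentage match is computed with the standard library's `difflib.SequenceMatcher.ratio()`.

```python
import csv
import io
from difflib import SequenceMatcher

FIELDS = ["field", "extracted", "expected", "match_pct"]


def match_percentage(extracted: str, expected: str) -> float:
    """Similarity of two values on a 0-100 scale (case and extra whitespace ignored)."""
    a = " ".join(extracted.upper().split())
    b = " ".join(expected.upper().split())
    if not a and not b:
        return 100.0
    return round(SequenceMatcher(None, a, b).ratio() * 100, 1)


def compare_claim(extracted: dict, authoritative: dict, threshold: float = 90.0) -> list:
    """Score every authoritative field against the OCR output; keep those below threshold.

    The 90% default threshold is a hypothetical value chosen for illustration.
    """
    exceptions = []
    for field, expected in authoritative.items():
        found = extracted.get(field, "")  # a missing field scores 0 unless both are empty
        score = match_percentage(found, expected)
        if score < threshold:
            exceptions.append({"field": field, "extracted": found,
                               "expected": expected, "match_pct": score})
    return exceptions


def write_exceptions_csv(exceptions: list, out) -> None:
    """Write the exception report as CSV to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(exceptions)


if __name__ == "__main__":
    # Hypothetical structured OCR output (step 3) and the matching system record
    extracted = {"registration_number": "MH12AB1234", "owner_name": "Ravi  Kumar",
                 "policy_number": "P-00123"}
    system = {"registration_number": "MH12AB1234", "owner_name": "RAVI KUMAR",
              "policy_number": "P-00721"}
    report = io.StringIO()
    write_exceptions_csv(compare_claim(extracted, system), report)
    print(report.getvalue())
```

Scoring each field on a 0 to 100 scale means that formatting noise such as letter case or doubled spaces does not raise an exception, while a genuinely different policy number falls below the threshold and lands in the report.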

How we built it

We used the following technologies for each feature:

  1. Input: a user-friendly upload form that even non-technical users can operate!
  2. Preprocessing: OpenCV
  3. OCR: the Tesseract OCR engine, which runs locally and therefore keeps customer data secure.
  4. Validation: customised validation models that check the format of extracted fields such as vehicle number, driving licence (DL) number, PAN card number, dates and names.
  5. Cross-verification: our comparison algorithm, which calculates the percentage match between values rather than returning a boolean.
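The write-up describes trained validation models; as a much simpler stand-in, format checks of this kind can also be written as rules. The sketch below assumes simplified current formats for PAN, vehicle registration and driving licence numbers, plus a few common day-first date layouts. The patterns, `DATE_FORMATS` and the `is_valid` helper are illustrative assumptions, and older or regional ID variants would need extra patterns.

```python
import re
from datetime import datetime

# Simplified, assumed formats; real documents also use older and regional variants.
ID_PATTERNS = {
    # PAN: 5 letters, 4 digits, 1 letter, e.g. ABCDE1234F
    "pan": re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]"),
    # Vehicle registration: state code, RTO code, optional series letters, 4 digits, e.g. MH12AB1234
    "vehicle_no": re.compile(r"[A-Z]{2}[0-9]{1,2}[A-Z]{0,3}[0-9]{4}"),
    # Driving licence: state code, 2-digit RTO code, 4-digit year, 7-digit number
    "dl": re.compile(r"[A-Z]{2}[0-9]{2}(?:19|20)[0-9]{2}[0-9]{7}"),
}
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z .'-]*")
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y")  # assumed day-first layouts


def is_valid(field: str, value: str) -> bool:
    """Return True if an extracted value has a plausible format for its field type."""
    value = value.strip()
    if field == "date":
        for fmt in DATE_FORMATS:
            try:
                datetime.strptime(value, fmt)  # also rejects impossible dates like 31/02
                return True
            except ValueError:
                continue
        return False
    if field == "name":
        return NAME_PATTERN.fullmatch(value) is not None
    # ID numbers are checked without spaces or hyphens: "MH-12 AB 1234" -> "MH12AB1234"
    compact = re.sub(r"[\s-]", "", value).upper()
    return ID_PATTERNS[field].fullmatch(compact) is not None
```

A rule such as `is_valid("pan", "ABCD51234F")` returning `False` also catches a typical OCR slip (an S read as 5) before the comparison step runs.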

Challenges we ran into

We ran into some challenges but solved them with team effort and determination! Some of them were:

  1. The formats of some government documents and IDs have changed over the years, yet the older versions are still accepted. So we developed a common algorithm that accepts all types of government IDs.
  2. Many customers' documents and IDs were of poor quality, so the preprocessing was tuned to handle varying image-quality conditions.
  3. Incorrect extraction by OCR (e.g. confusing 5 and S). We dealt with it through our percentage-based comparison algorithm: because it scores partial matches instead of returning a boolean, a single misread character lowers the score rather than failing the whole field.
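One way to make such a comparison even more tolerant of OCR slips like 5/S is to fold look-alike characters to a single canonical character on both sides before scoring. The sketch below is a hypothetical extension, not the team's method: only the 5/S pair comes from this write-up; the O/0, I/1 and B/8 pairs, the `OCR_FOLD` table and the helper names are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical confusion table: each look-alike pair is folded to one character.
# Only 5/S comes from the write-up; the other pairs are illustrative assumptions.
OCR_FOLD = str.maketrans({"S": "5", "O": "0", "I": "1", "B": "8"})


def fold(value: str) -> str:
    """Upper-case, drop whitespace and map OCR look-alikes to a canonical character."""
    return "".join(value.upper().split()).translate(OCR_FOLD)


def ocr_tolerant_match(extracted: str, expected: str) -> float:
    """Percentage match (0-100) computed on the folded strings."""
    a, b = fold(extracted), fold(expected)
    if not a and not b:
        return 100.0
    return round(SequenceMatcher(None, a, b).ratio() * 100, 1)


# An OCR reading with B->8 and 5->S slips still scores a full match:
print(ocr_tolerant_match("MH12A8123S", "MH12AB1235"))
```

The trade-off is that folding makes S and 5 indistinguishable, so a genuine S-versus-5 difference would go unnoticed; pairing this with a format check on the extracted ID limits that risk.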

Accomplishments that we are proud of

We are proud that our service achieves accuracy above 95%, and that verifying a single claim file takes less than 20 seconds!

What's next for Motormation

  1. Add government validation using the VAHAN API
  2. Scale from processing a single dataset to thousands of datasets at a time
  3. Reduce surveyors’ manual work of verifying damage by training an AI model on a dataset of damaged cars