Inspiration

Enterprises across domains such as media, e-commerce, finance, and manufacturing struggle with incomplete or inconsistent metadata across their systems. Manual data completion slows operations, introduces quality risks, and delays time-to-value. We wanted to build a solution that removes human bottlenecks and ensures accurate, up-to-date data for better business decision-making.

What it does

The agent integrates into an enterprise's existing data pipeline, automatically detects missing datapoints in databases, fetches accurate information from trusted public sources, and writes the enriched data back along with a confidence score for each field.

Unlike typical chat-based GenAI systems, this is an agentic AI workflow: rather than simply generating text, the agent performs a sequence of actions:

  1. Identifies missing data fields,
  2. Calls external search and enrichment APIs such as SerpAPI and other web connectors,
  3. Evaluates the credibility of sources,
  4. Synthesizes the extracted information, and
  5. Writes validated results back automatically to the DB.
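The five steps above can be sketched as a single enrichment loop. This is a minimal illustration, not the production agent: the helper names, field list, and the stubbed `search_sources` function are assumptions, and in the real workflow the search step goes out to SerpAPI and other web connectors.

```python
# Minimal sketch of the enrichment loop. All helper names and the field
# list are illustrative; search_sources() stands in for SerpAPI calls.

REQUIRED_FIELDS = ["cast", "release_year", "rating"]

def find_missing_fields(record: dict) -> list:
    """Step 1: identify fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

def search_sources(title: str, field: str) -> list:
    """Step 2: stand-in for SerpAPI / web-connector calls.
    Returns candidate values, each tagged with a source-authority score."""
    stub = {"release_year": [{"value": 2019, "authority": 0.9}]}
    return stub.get(field, [])

def enrich(record: dict) -> dict:
    """Steps 3-5: rank candidates by credibility, take the best one,
    attach its confidence, and write the value back to the record."""
    for field in find_missing_fields(record):
        candidates = search_sources(record["title"], field)
        if not candidates:
            continue  # leave the gap rather than guess
        best = max(candidates, key=lambda c: c["authority"])
        record[field] = best["value"]
        record[field + "_confidence"] = best["authority"]
    return record

row = {"title": "Example Film", "cast": "A. Actor",
       "release_year": None, "rating": 4.2}
enriched = enrich(row)
```

In the deployed version, the write-back in the last step targets the enterprise database rather than an in-memory dict.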

For this hackathon, we demonstrate its use in OTT platforms, filling gaps in metadata like cast, release year, and ratings for regional content. The agent works autonomously in the pipeline, removing manual intervention and ensuring consistent, high-quality metadata.

How we built it

  1. AWS Bedrock for LLM-based reasoning and data extraction.
  2. AgentCore to orchestrate the multi-step enrichment workflow.
  3. AWS Lambda for serverless event-driven execution.
  4. Amazon S3 for storing input/output data.
  5. External APIs - SerpAPI and other web connectors for sourcing publicly available information.
  6. Custom Confidence Engine - Combines model confidence, source authority, and evidence recall to generate weighted scores.
  7. Custom frontend demo - Accepts CSV uploads in the defined DB table format for the hackathon, simulating the agent’s integration with real pipelines.
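For the demo frontend, scanning an uploaded CSV for rows with gaps can be done in a few lines. This is a sketch only: the column names below are assumptions standing in for the actual defined table format.

```python
import csv
import io

# Sketch: scan an uploaded CSV for rows with at least one empty field.
# The column names are assumptions, not the actual demo table schema.

CSV_DATA = """title,cast,release_year,rating
Example Film,A. Actor,2019,4.2
Regional Hit,,,
"""

def rows_needing_enrichment(csv_text: str) -> list:
    """Return the rows that the agent should enrich."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if any(v.strip() == "" for v in row.values())]

gaps = rows_needing_enrichment(CSV_DATA)
```

In the demo, each flagged row is then handed to the agent, which fills the empty columns and appends per-field confidence scores before the result is written back.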

Challenges we ran into

  1. Determining trustworthy sources from the web and ranking them for confidence scoring.
  2. Calculating a weighted confidence score combining model certainty, source authority, and evidence recall.
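One way to combine the three signals is a simple weighted sum. The weights below are placeholders for illustration, not the tuned values used by the Confidence Engine.

```python
# Illustrative weighted confidence score over the three signals named
# above. The weights are placeholder assumptions, not the tuned values.

WEIGHTS = {
    "model_certainty": 0.5,
    "source_authority": 0.3,
    "evidence_recall": 0.2,
}

def confidence_score(model_certainty: float,
                     source_authority: float,
                     evidence_recall: float) -> float:
    """Weighted sum of the three signals, each expected in [0, 1]."""
    signals = {
        "model_certainty": model_certainty,
        "source_authority": source_authority,
        "evidence_recall": evidence_recall,
    }
    return round(sum(WEIGHTS[k] * v for k, v in signals.items()), 3)

score = confidence_score(0.9, 0.8, 0.6)
```

A weighted sum keeps each field's score interpretable: a low value can be traced back to whichever signal dragged it down, which is useful when deciding whether an enriched value needs review.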

Accomplishments that we're proud of

  1. Built a fully autonomous data enrichment agent that integrates into existing pipelines without manual intervention.
  2. Developed a confidence scoring system to quantify reliability of every enriched datapoint.
  3. Delivered a working demo in OTT context, showing tangible before-and-after metadata completion.

What's next for Autonomous Data Enrichment Agent

  1. Extend to image metadata enrichment, including poster images and cast photos.
  2. Handle image aspect ratio and formatting corrections automatically.
  3. Expand into other domains like e-commerce, finance, and manufacturing for general-purpose enterprise data enrichment.
  4. Introduce real-time pipeline integration and continuous monitoring for missing or inconsistent data.
