Inspiration

Reading through long legal documents such as the EU GDPR and the AI Act is overwhelming, and LLM-generated content about them is unreliable. Inspired by both problems, I decided to optimise the Gemini Pro model to guide AI system builders on potential legal considerations when deploying their AI solutions in the EU.

What it does

The AI Act advisor takes the details of the solution entered by the user as a prompt and compares them with the specifications described in the Act's guides and articles. The closest matches are used to advise the user on considerations for building the solution and its documentation.

How we built it

The system is essentially a Retrieval-Augmented Generation (RAG) pipeline.

Dataset extraction

There was no publicly available, consumable dataset for the AI Act, so we parsed the AI Act PDF document using the PyPDF library in Python and reconstructed the text using the "gemini-1.5-flash" model. The extracted CSV datasets can be found at the GitHub/HuggingFace link.
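The extraction step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual pipeline: `extract_pages` uses pypdf's `PdfReader`, while `split_articles` and its heading pattern are hypothetical and assume that each article starts on a line such as "Article 5". The regex split stands in for the Gemini-based reconstruction, which is not shown here.

```python
import re

# Assumed heading format: a line containing only "Article <number>".
ARTICLE_HEADING = re.compile(r"^Article[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def extract_pages(pdf_path):
    """Return the raw text of each page of a PDF (requires `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily so the splitter works without pypdf

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def split_articles(text):
    """Split extracted text into one record per article heading."""
    matches = list(ARTICLE_HEADING.finditer(text))
    articles = []
    for i, match in enumerate(matches):
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        articles.append({
            "article_number": int(match.group(1)),
            "chunk_data": text[match.end():end].strip(),
        })
    return articles


# Tiny in-memory sample instead of a real PDF.
sample = "Preamble\nArticle 1\nSubject matter\nText one.\nArticle 2\nScope\nText two.\n"
articles = split_articles(sample)
print([a["article_number"] for a in articles])
```

With a real file, `"\n".join(extract_pages(path))` would be passed to `split_articles`, and the resulting records could then be written to CSV.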

Embedding

Embeddings were created using the "models/text-embedding-004" model for each paragraph in the guide and for each article's text in the Act. The embedded article text contained the chapter title, section title, article title, and article text.
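As an illustration of how the article text could be assembled before embedding, here is a minimal sketch. The function names, the newline separator, and the `retrieval_document` task type are assumptions, and `embed_text` assumes the google-generativeai package with an API key already configured; the project's own code may differ.

```python
def build_article_text(chapter_title, section_title, article_title, article_text):
    """Join the four fields into the single string that gets embedded."""
    parts = [chapter_title, section_title, article_title, article_text]
    # Skip empty fields, e.g. chapters that have no sections.
    return "\n".join(p.strip() for p in parts if p and p.strip())


def embed_text(text, task_type="retrieval_document"):
    """Embed one string (assumes google-generativeai is installed and configured)."""
    import google.generativeai as genai  # lazy import; call genai.configure(api_key=...) first

    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
        task_type=task_type,  # assumed task type for stored documents
    )
    return result["embedding"]


# Placeholder values; no network call is made here.
text = build_article_text(
    "Chapter II", "", "Article 5: Prohibited AI practices", "Example article text."
)
print(text)
```

Keeping the titles in the embedded string gives each chunk some context about where it sits in the Act, which is presumably why they were included.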

Vector storage

We used BigQuery tables as the vector storage for the embedded guide and article texts. The schema for the article table was:

CREATE TABLE `hackhathon-438922.hackhathon_ai_act_rag.ai_act_embedded` (
  article_number INT64 NOT NULL,
  tokens INT64 NOT NULL,
  article_title STRING NOT NULL,
  file_id STRING,
  chunk_data STRING NOT NULL,
  embeddings ARRAY<FLOAT64>
)

The schema for the guide table was similar:

CREATE TABLE `hackhathon-438922.hackhathon_ai_act_rag.ai_act_guide_embedded` (
  id INT64 NOT NULL,
  text STRING NOT NULL,
  embeddings ARRAY<FLOAT64>
)

These tables were created in the BigQuery UI because the SDK threw errors about the embeddings column.
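For reference, the BigQuery Python client expresses an `ARRAY<FLOAT64>` column as a `FLOAT64` field with mode `REPEATED` rather than as a type named `ARRAY`. The sketch below writes the guide table's schema in that form. It is an illustration only: whether this relates to the SDK errors mentioned above is not established, and `GUIDE_SCHEMA` and `create_guide_table` are hypothetical names.

```python
# Field definitions in BigQuery's API representation (name / type / mode).
GUIDE_SCHEMA = [
    {"name": "id", "type": "INT64", "mode": "REQUIRED"},
    {"name": "text", "type": "STRING", "mode": "REQUIRED"},
    # ARRAY<FLOAT64> is written as a repeated FLOAT64 field.
    {"name": "embeddings", "type": "FLOAT64", "mode": "REPEATED"},
]


def create_guide_table(table_id):
    """Create the table with the SDK (requires google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # lazy import so the schema can be inspected offline

    client = bigquery.Client()
    schema = [bigquery.SchemaField.from_api_repr(field) for field in GUIDE_SCHEMA]
    return client.create_table(bigquery.Table(table_id, schema=schema))
```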

Retrieval

Using the BigQuery SDK and the embedding model, we were able to run a similarity search using the DOT_PRODUCT distance. The embedding model returns an embedding vector for the query string using the "QUESTION_ANSWERING" task type. The query to search for similar guides is as follows:

-- {query_embedding} is filled in with the query vector as an array literal.
WITH search_results AS (
  SELECT base.id AS id, base.text AS text, distance
  FROM VECTOR_SEARCH(
    TABLE `hackhathon-438922.hackhathon_ai_act_rag.ai_act_guide_embedded`, 'embeddings',
    (SELECT {query_embedding} AS embeddings, 'query_vector' AS file_id),
    top_k => 10, distance_type => 'DOT_PRODUCT', options => '{{"use_brute_force": true}}'
  )
)
SELECT sr.id, sr.text
FROM search_results sr
ORDER BY sr.distance ASC

The query to search for similar articles follows the same pattern:

WITH search_results AS (
  SELECT base.article_number AS article_number, base.article_title AS article_title, base.chunk_data AS article, distance
  FROM VECTOR_SEARCH(
    TABLE `hackhathon-438922.hackhathon_ai_act_rag.ai_act_embedded`, 'embeddings',
    (SELECT {query_embedding} AS embeddings, 'query_vector' AS file_id),
    top_k => 10, distance_type => 'DOT_PRODUCT', options => '{{"use_brute_force": true}}'
  )
)
SELECT sr.article_number, sr.article_title, sr.article
FROM search_results sr
ORDER BY sr.distance ASC
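The doubled braces in the queries suggest they were filled in with Python string formatting. A minimal sketch of that step for the guide query, assuming `str.format` and the google-cloud-bigquery client, could look like this. `build_guide_query`, `search_guides`, and the table name used in the example are hypothetical.

```python
GUIDE_QUERY = """
WITH search_results AS (
  SELECT base.id AS id, base.text AS text, distance
  FROM VECTOR_SEARCH(
    TABLE `{table}`, 'embeddings',
    (SELECT {query_embedding} AS embeddings, 'query_vector' AS file_id),
    top_k => {top_k}, distance_type => 'DOT_PRODUCT',
    options => '{{"use_brute_force": true}}'
  )
)
SELECT sr.id, sr.text
FROM search_results sr
ORDER BY sr.distance ASC
"""


def build_guide_query(query_embedding, table, top_k=10):
    """Fill the SQL template with the query vector, table name, and result count."""
    # A Python list of floats prints as a BigQuery array literal, e.g. [0.1, 0.2].
    vector = [float(x) for x in query_embedding]
    return GUIDE_QUERY.format(table=table, query_embedding=vector, top_k=int(top_k))


def search_guides(query_embedding, table, top_k=10):
    """Run the search (requires google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # lazy import so the builder works offline

    client = bigquery.Client()
    rows = client.query(build_guide_query(query_embedding, table, top_k)).result()
    return [(row["id"], row["text"]) for row in rows]


# Hypothetical table name and a toy two-dimensional vector.
sql = build_guide_query([0.1, 0.2], "my-project.my_dataset.guides", top_k=3)
print(sql)
```

Inlining the vector keeps the sketch close to the queries above. Passing it as a query parameter would be an alternative that avoids building SQL from strings.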

Generation

We used the Vertex AI Reasoning Engine LangchainAgent to pass the system instructions, tools, and user prompts to the gemini-1.0-pro model. We also used Firestore as chat history storage for context retrieval.
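A minimal sketch of how these pieces might be wired together is shown below. It assumes the `vertexai.preview.reasoning_engines` module and the langchain-google-firestore package. The helper names, the collection name, and the context format are hypothetical, and the project's real tools and system instructions are not reproduced here.

```python
def format_context(rows):
    """Turn retrieved (article_number, article_title, article) rows into prompt context."""
    return "\n\n".join(
        f"Article {number}: {title}\n{text}" for number, title, text in rows
    )


def get_session_history(session_id):
    """Chat history backed by Firestore (requires langchain-google-firestore)."""
    from google.cloud import firestore
    from langchain_google_firestore import FirestoreChatMessageHistory

    return FirestoreChatMessageHistory(
        client=firestore.Client(),
        session_id=session_id,
        collection="chat_history",  # hypothetical collection name
        encode_message=False,
    )


def build_agent(tools, system_instruction):
    """Assemble the agent (requires google-cloud-aiplatform with the langchain extras)."""
    from vertexai.preview import reasoning_engines

    return reasoning_engines.LangchainAgent(
        model="gemini-1.0-pro",
        system_instruction=system_instruction,
        tools=tools,  # plain Python functions, e.g. the retrieval helpers
        chat_history=get_session_history,
    )
```

An agent built this way would be queried with something like `agent.query(input="...", config={"configurable": {"session_id": "demo"}})`, where the session id selects the Firestore history to load.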

Challenges we ran into

Quota limitations

The initial plan was to use the Vertex AI Feature Store; however, we hit quota errors across different steps. We were also unable to use the chain-of-thought flow effectively with the models because of the quota limitations.

Accomplishments that we're proud of

  • EU AI Act dataset creation and publishing
  • RAG system creation using BigQuery as vector storage
  • Reasoning Engine deployment

What we learned

  • Using BigQuery as vector storage
  • Using LangchainAgents
  • Tool use in Gemini, including running tools in parallel and chain of thought

What's next for EU AI Act advisor

  • Ensure the agent always calls its tools.
  • Successfully deploy on Vertex AI Reasoning Engine so that prompts trigger the tool calls without errors.
  • Build a frontend for public use.
