Inspiration Ancient Near Eastern archives consist of thousands of fragmented clay tablets. Traditional decipherment is slow, and we wanted to see if AI can accelerate this archaeological process.
What it does CuneiScript leverages the cutting-edge capabilities of Gemini 3 Flash to bridge the gap between ancient Assyrian cuneiform and modern structured data. It performs morphological normalization and translates complex transliterations into queryable JSON formats.
How we built it (Gemini 3 Integration) Central to our application is the model's large context window and multimodal reasoning, which allows for the simultaneous processing of transliterations from the OARE dataset and cross-referencing them with the eBL (electronic Babylonian Library) Lexicon.
We utilized the System Instructions feature to anchor the model in specific philological rules, ensuring that the AI performs morphological normalization rather than just literal translation. The Gemini 3 Flash variant was specifically chosen for its high-speed inference and efficiency in handling JSON-structured outputs.
Challenges we ran into Handling large-scale archaeological datasets (like the 600MB+ Kaggle archive) and avoiding AI hallucinations in dead languages was a major hurdle.
Accomplishments that we're proud of A key feature we successfully implemented is Lexical Grounding: by providing the model with professional lexicon data, we ensured that entities like personal names (PN) and financial units (shekels, minas) are extracted with historical accuracy. This integration demonstrates how Gemini 3 can transform unstructured ancient text into a queryable database for modern researchers.
Technology & Workflow Model Selection: "I chose Gemini 3 Flash for its exceptional processing speed and its ability to handle extremely large contextual datasets via its long context window. This enabled the model to analyze complex cuneiform transliterations through the lens of a vast historical lexicon without compromising accuracy."
The Technological Workflow:
Input: Raw tablet transliterations (e.g., 10 gín kù-babbar).
Context: Large-scale CSV files (Lexicon and Training data) integrated into Google AI Studio to provide grounding.
Prompt Engineering: Specialized system instructions designed to frame Gemini as an expert Assyriologist, ensuring high-fidelity extraction.
Output: The structured JSON results demonstrated in the video, ready for database integration.
Built With
- gemini
- googleaistudio
- json
- machine-learning
- python
Log in or sign up for Devpost to join the conversation.