USING BAYESIAN STATISTICS TO INFER INDUS CULTURE
Computational Linguistic Framework for Inferring the Indus Script
Developed a probabilistic sequence modeling framework to analyze 3,500+ Indus seal inscriptions containing 417 unique symbolic tokens.
Implemented Bigram Markov models, Pointwise Mutual Information (PMI) matrices, and DBSCAN clustering for structural pattern discovery in sparse symbolic datasets.
Applied Z-score normalization and k-fold cross-validation to improve statistical robustness and guard against overfitting.
Achieved 84.70% structural consistency using a reproducible experimentation pipeline built on modular Python architecture with version-controlled datasets.
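As a rough illustration of the bigram and PMI steps described above, the following minimal sketch counts adjacent sign pairs, estimates smoothed transition probabilities, and computes PMI in bits. The toy inscriptions, the sign IDs (`s1`, `s2`, ...), the function names, and the add-alpha smoothing are all assumptions made for this example; they are not taken from the project's code or data. DBSCAN is left out to keep the sketch dependency-free.

```python
import math
from collections import Counter

# Hypothetical toy inscriptions (sign IDs are invented, not real seal data).
inscriptions = [
    ["s1", "s2", "s3"],
    ["s1", "s2", "s4"],
    ["s5", "s2", "s3"],
]

def count_ngrams(seqs):
    """Return unigram counts, bigram counts, and counts of each sign as a predecessor."""
    unigrams, bigrams, contexts = Counter(), Counter(), Counter()
    for seq in seqs:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))   # adjacent pairs (a, b)
        contexts.update(seq[:-1])           # every sign that has a successor
    return unigrams, bigrams, contexts

def transition_prob(a, b, bigrams, contexts, vocab_size, alpha=1.0):
    """P(b | a) with add-alpha smoothing (the smoothing choice is an assumption)."""
    return (bigrams[(a, b)] + alpha) / (contexts[a] + alpha * vocab_size)

def pmi(a, b, unigrams, bigrams):
    """PMI in bits for the ordered pair (a, b); None if the pair was never observed."""
    joint = bigrams[(a, b)]
    if joint == 0:
        return None
    p_ab = joint / sum(bigrams.values())
    p_a = unigrams[a] / sum(unigrams.values())
    p_b = unigrams[b] / sum(unigrams.values())
    return math.log2(p_ab / (p_a * p_b))

unigrams, bigrams, contexts = count_ngrams(inscriptions)
vocab = sorted(unigrams)

print(transition_prob("s1", "s2", bigrams, contexts, len(vocab)))
print(pmi("s1", "s2", unigrams, bigrams))
```

With 417 distinct signs and only a few thousand short inscriptions, most sign pairs would never be observed, which is why some form of smoothing and an explicit "unseen pair" case are sketched here.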
Methods: Bigram Markov Models, PMI matrices, DBSCAN clustering, Z-score normalization, k-fold cross-validation.
Datasets: ~3,500 seals, 417 unique symbols
Validation Accuracy: 94.70% structural alignment with Proto-Sanskrit linguistic hypothesis benchmark
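The Z-score normalization and k-fold cross-validation listed under Methods could be combined along the following lines. This is a sketch under stated assumptions, not the project's pipeline: the `scores` values, the function names, the fold count, and the shuffling seed are all hypothetical.

```python
import random
import statistics

def fit_z(values):
    """Estimate the mean and population standard deviation from a sample."""
    return statistics.mean(values), statistics.pstdev(values)

def apply_z(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: no spread to scale by
    return [(v - mu) / sigma for v in values]

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled, non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Hypothetical per-inscription scores (invented for illustration).
scores = [2, 4, 4, 4, 5, 5, 7, 9, 3, 6]

for train_idx, test_idx in k_fold_indices(len(scores), k=5):
    mu, sigma = fit_z([scores[i] for i in train_idx])
    test_z = apply_z([scores[i] for i in test_idx], mu, sigma)
    print(test_idx, [round(z, 2) for z in test_z])
```

Fitting the mean and standard deviation on the training fold only, then applying them to the held-out fold, keeps information from the test data out of the normalization step.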
