USING BAYESIAN STATISTICS TO INFER INDUS CULTURE

Computational Linguistic Framework for Inferring the Indus Script

Developed a probabilistic sequence modeling framework to analyze 3,500+ Indus seal inscriptions containing 417 unique symbolic tokens.

Implemented bigram Markov models, pointwise mutual information (PMI) matrices, and DBSCAN clustering to discover structural patterns in sparse symbolic datasets.
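
As a rough illustration of the first two techniques, the sketch below estimates smoothed bigram transition probabilities and PMI scores from a handful of toy sequences. The sign IDs (S1, S2, ...), the function names, and the add-alpha smoothing are assumptions made for this example; they are not taken from the project's code or data, and DBSCAN is not covered here.

```python
import math
from collections import Counter

# Hypothetical toy inscriptions: each is a sequence of made-up sign IDs.
inscriptions = [
    ["S1", "S2", "S3"],
    ["S1", "S2", "S4"],
    ["S5", "S2", "S3"],
    ["S1", "S2", "S3"],
]


def bigram_counts(seqs):
    """Count single signs and adjacent sign pairs across all sequences."""
    unigrams, bigrams = Counter(), Counter()
    for seq in seqs:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams


def transition_prob(a, b, bigrams, vocab, alpha=1.0):
    """Estimate P(b | a) with add-alpha smoothing (an assumed choice)."""
    total_from_a = sum(c for (x, _), c in bigrams.items() if x == a)
    return (bigrams[(a, b)] + alpha) / (total_from_a + alpha * len(vocab))


def pmi(a, b, unigrams, bigrams):
    """PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ); -inf for unseen pairs."""
    if bigrams[(a, b)] == 0:
        return float("-inf")
    p_ab = bigrams[(a, b)] / sum(bigrams.values())
    p_a = unigrams[a] / sum(unigrams.values())
    p_b = unigrams[b] / sum(unigrams.values())
    return math.log2(p_ab / (p_a * p_b))


unigrams, bigrams = bigram_counts(inscriptions)
vocab = sorted(unigrams)

print(transition_prob("S2", "S3", bigrams, vocab))
print(pmi("S1", "S2", unigrams, bigrams))
```

A positive PMI means a pair co-occurs more often than its individual frequencies would suggest. With 417 symbols and only a few thousand short inscriptions, most pairs are never observed, which is why some form of smoothing is usually needed for the transition estimates.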

Applied Z-score normalization and k-fold cross-validation to check statistical robustness and guard against overfitting.
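
The two steps can be sketched with the standard library alone. The source does not say which quantities were normalized or how many folds were used, so the sample values, the helper names, and k = 5 below are assumptions for illustration only.

```python
import random
import statistics


def z_scores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so every deviation is zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists; each item is held out exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical scores, e.g. raw co-occurrence counts for eight sign pairs.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(scores))

# Split 20 hypothetical inscriptions into 5 folds.
for train, test in k_fold_indices(20, k=5):
    print(len(train), len(test))
```

In a pipeline like the one described, the model would be fit on each training split and scored on the matching held-out split. A large gap between training and held-out scores would be a sign of overfitting.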

Achieved 84.70% structural consistency using a reproducible experimentation pipeline built on modular Python architecture with version-controlled datasets.

Methods: bigram Markov models, PMI matrices, DBSCAN clustering, Z-score normalization, k-fold cross-validation

Datasets: ~3,500 seals, 417 unique symbols

Validation Accuracy: 94.70% structural alignment with Proto-Sanskrit linguistic hypothesis benchmark
