Data Dialogue: Small Data Analysis in the Age of Big Data: Estimating the Number of Scribes in Ancient Hebrew Inscriptions
In the era of data proliferation, it is sometimes infeasible, for practical reasons, to collect large amounts of data. In our research, we tackled the problem of estimating the number of different scribes who produced a corpus of ancient Hebrew inscriptions from First Temple period Israel. The inscriptions were unearthed in archaeological excavations at two different sites: the isolated military outpost of Arad in the Judah desert (ca. 600 BCE), and Samaria, the capital of the kingdom of Israel (8th century BCE). Because such finds are rare, very few samples are available, and Big Data techniques cannot give statistically significant answers.
In our study, we reformulate the problem of finding the number of scribes as that of estimating the number of clusters in a limited dataset. Given a set of high-dimensional distributions, we aim to compare them and estimate the number of clusters they form. First, we introduce a method for comparing a pair of high-dimensional distributions through hypothesis testing. Next, applying this test to every pair, we estimate a lower bound on the number of clusters in the examined data. Finally, we suggest a method for calculating the maximum likelihood estimate of the number of classes by leveraging statistics on the accuracy of our algorithm.
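The abstract does not state which two-sample statistic or which bound construction is used, so the following Python sketch is only an illustration under stated assumptions. It assumes each inscription is represented as a NumPy array of feature vectors (one row per sample, for instance per character) and swaps in a generic permutation test on the distance between sample means. Pairs whose test rejects at level `alpha` are treated as "written by different scribes", and the smallest number of scribes consistent with all such rejections (the chromatic number of the resulting "different" graph) serves as the lower bound. The names `permutation_test`, `difference_graph` and `min_scribes`, and all parameters, are hypothetical.

```python
import itertools

import numpy as np


def permutation_test(x, y, n_perm=2000, rng=None):
    """Two-sample permutation test (an assumed, generic choice of test).

    x, y: arrays of shape (n_x, d) and (n_y, d), one row per feature vector.
    Statistic: Euclidean distance between the two sample means.
    Returns a p-value for the null hypothesis that x and y share a distribution.
    """
    rng = np.random.default_rng(rng)
    pooled = np.vstack([x, y])
    n_x = len(x)
    observed = np.linalg.norm(x.mean(axis=0) - y.mean(axis=0))
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))  # random relabeling of all rows
        a, b = pooled[perm[:n_x]], pooled[perm[n_x:]]
        if np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)


def difference_graph(samples, alpha=0.05, n_perm=2000, seed=0):
    """Edges (i, j) for every pair whose test rejects at level alpha.

    No multiple-testing correction is applied (a simplifying assumption).
    """
    rng = np.random.default_rng(seed)
    edges = set()
    for i, j in itertools.combinations(range(len(samples)), 2):
        if permutation_test(samples[i], samples[j], n_perm, rng) < alpha:
            edges.add((i, j))
    return edges


def min_scribes(n, edges):
    """Smallest k such that n inscriptions can be labeled with k scribes
    without giving any 'different' pair the same label, i.e. the chromatic
    number of the graph. Exhaustive backtracking: small n only."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    for k in range(1, n + 1):
        labels = [-1] * n  # -1 means "not labeled yet"

        def assign(v):
            if v == n:
                return True
            for c in range(k):
                if all(labels[u] != c for u in adj[v]):
                    labels[v] = c
                    if assign(v + 1):
                        return True
            labels[v] = -1
            return False

        if assign(0):
            return k
    return n  # only reached when n == 0


if __name__ == "__main__":
    # Synthetic data: 3 hypothetical scribes, 2 inscriptions each,
    # 20 five-dimensional feature vectors per inscription.
    rng = np.random.default_rng(1)
    centers = [np.zeros(5), np.full(5, 10.0), np.full(5, -10.0)]
    samples = [rng.normal(c, 1.0, size=(20, 5)) for c in centers for _ in range(2)]
    edges = difference_graph(samples, alpha=0.05, n_perm=200)
    print("lower bound on the number of scribes:", min_scribes(len(samples), edges))
```

On the synthetic data every cross-scribe pair is rejected, so the bound should come out at 3 (or higher if a same-scribe pair is falsely rejected). A failure to reject does not show that two inscriptions share a scribe, which is why the count is only a lower bound. Using the chromatic number rather than the largest set of mutually "different" inscriptions matters: a 5-cycle of rejections contains no triangle, yet needs 3 scribes.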