Predictive Modelling in Archaeology – Introduction

When people think about archaeology, they often picture Indiana Jones, or people in a desert excavating dinosaurs. I am afraid both images are born from misunderstandings of how archaeology is portrayed in popular media. The first point to address is that archaeology is typically more interested in the context in which an artefact is found than in the artefact itself. That makes Indiana Jones a good rescue archaeologist, but not someone you want on a dig with you. The other major point is that archaeologists are focussed on people, and we leave dinosaurs to the palaeontologists. That's not to say all archaeology directly studies people; rather, much of it studies people indirectly through supporting evidence, such as animal bones, plant seeds, the environment and the landscape.

Predictive modelling, or in this case geospatial predictive modelling, uses location-based data to predict the likelihood of a given event occurring. For a model to succeed at predicting the desired events, two main elements are required: a large pool of data and a strong relationship with location. A simple example would be predicting where to build the driveway for a new house.

In this example, we would need at least three sets of data: house locations, driveway locations and road locations. Most people would agree that a driveway connects a house to a road, so we would expect a strong positive correlation between where driveways are and where both houses and roads are. We can then use 60-90% of the available data to train a model and the remaining 10-40% to test how well it performs. Models can range from statistical linear regression to deep neural networks, and it is increasingly common to stitch multiple models together. A simple model might place the driveway on the side of the house closest to the road. A more complex model might also consider local planning laws and the aspect and slope of the terrain, and cluster nearby houses, giving them a higher weight so the result reflects the local aesthetic.
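To make the driveway example concrete, here is a minimal sketch of the "simple model" and a train/test split, written in plain Python. It assumes, hypothetically, that each house is an axis-aligned rectangle with four sides (N, E, S, W) and that we already know the nearest point on the road to each house. The records, the `predict_side` rule and the 80/20 split are illustrative choices, not a prescribed method.

```python
import random

# Outward-facing unit vector for each side of a (hypothetical) axis-aligned house.
SIDES = {"N": (0.0, 1.0), "E": (1.0, 0.0), "S": (0.0, -1.0), "W": (-1.0, 0.0)}


def predict_side(house, road):
    """Baseline model: place the driveway on the side of the house facing the road.

    `house` and `road` are (x, y) coordinates in metres on a local grid, where
    `road` is assumed to be the nearest point on the road network to the house.
    """
    dx, dy = road[0] - house[0], road[1] - house[1]
    # The side whose outward vector best aligns with the house-to-road direction
    # is the side closest to the road.
    return max(SIDES, key=lambda side: SIDES[side][0] * dx + SIDES[side][1] * dy)


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and hold back a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(records):
    """Fraction of records where the predicted side matches the observed side."""
    hits = sum(
        predict_side((hx, hy), (rx, ry)) == side
        for hx, hy, rx, ry, side in records
    )
    return hits / len(records)


# Hypothetical records: (house_x, house_y, road_x, road_y, observed_driveway_side).
records = [
    (0, 0, 0, 20, "N"),
    (50, 0, 70, 0, "E"),
    (100, 0, 100, -15, "S"),
    (150, 0, 130, 5, "W"),
    (200, 0, 200, 25, "E"),  # an exception: this driveway does not face the road
]

train, test = train_test_split(records, test_fraction=0.4)
# A rule-based baseline has nothing to fit, so `train` is unused here;
# a learned model would be fitted on `train` before being scored on `test`.
print(f"{len(train)} training records, {len(test)} test records")
print(f"Baseline accuracy on the test set: {accuracy(test):.2f}")
```

A baseline like this is cheap to build and gives any more complex model a score it has to beat.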

Without digressing too much further, the main thing to consider when designing a model is to start simple so you can fail fast. If there is no strong relationship between location and the other variables, you are probably wasting your time, and it would be better spent elsewhere. Housing is hopefully an example everyone is familiar with; from here on, we will try to make geospatial predictive modelling in archaeology just as accessible.
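One way to fail fast is to check whether known sites differ from random background locations on some landscape variable before building any model. The article does not prescribe a particular test. The sketch below substitutes a one-sided permutation test on elevation, using hypothetical values, to illustrate the idea. If the difference could easily arise by chance, the relationship with location may be too weak to model.

```python
import random
from statistics import fmean


def permutation_test(site_values, background_values, n_permutations=10_000, seed=0):
    """One-sided permutation test: do sites score higher than background points?

    Returns the observed difference in means and an estimated p-value: the share
    of random relabellings whose difference is at least as large as observed.
    """
    rng = random.Random(seed)
    n_sites = len(site_values)
    observed = fmean(site_values) - fmean(background_values)
    pooled = list(site_values) + list(background_values)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Randomly reassign which values count as "sites".
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_sites]) - fmean(pooled[n_sites:])
        if diff >= observed:
            at_least_as_large += 1
    # Adding one to both counts keeps the estimate from ever being exactly zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical elevations in metres: known site locations versus randomly
# sampled background locations from the same study area.
site_elevations = [182, 175, 190, 168, 201, 185]
background_elevations = [95, 120, 143, 88, 160, 110, 132, 101]

difference, p_value = permutation_test(site_elevations, background_elevations)
print(f"Sites sit {difference:.1f} m higher on average (p ~ {p_value:.4f})")
```

A permutation test makes no assumption about how elevations are distributed, which suits small, messy samples. It only tells you whether a relationship is worth pursuing, not how strong a predictive model will be.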

Archaeology

As mentioned earlier, archaeology is focussed on people, so the fundamental argument for using geospatial predictive modelling in archaeology is that people's behaviour is either constrained by the landscape or visible in it. The further back in time you go, the more you would expect the landscape to constrain decisions such as site selection, because people were less able to modify their surroundings into something more suitable. Monumental archaeology sits neatly at the overlap between modifying the environment and site selection.

There is a fundamental issue with archaeological data that we must discuss before building predictive models on it. The archaeology we observe is only the archaeology that has survived, and the processes governing that survival are the subject of taphonomy. This means any model we create is biased in two important ways. First, we are modelling where archaeology is likely to survive, not every location where it might have existed. Second, most archaeology is found during modern construction and development, so the archaeology we have is biased by modern site selection. Modern site selection is the hardest of these problems to solve, and the simplest response is to acknowledge it. This is not unique to archaeology: all predictive modelling relies on historic data, which is likely to carry some sampling bias.
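The effect of modern site selection can be illustrated with a toy simulation. Its assumptions are illustrative and not drawn from real data: past sites are spread uniformly between 0 m and 300 m elevation, and modern development concentrates on low ground, so the chance of a site being discovered falls linearly with elevation. Under these assumptions, the discovered sites look as if past people preferred low ground, even though they had no preference at all.

```python
import random
from statistics import fmean


def simulate_discovery_bias(n_sites=10_000, max_elevation=300.0, seed=1):
    """Toy model of how modern development can bias the archaeological record.

    Illustrative assumptions: past sites are spread uniformly between 0 m and
    `max_elevation`, and the chance of a site being discovered falls linearly
    from 1 at 0 m to 0 at `max_elevation` (development favours low ground).
    """
    rng = random.Random(seed)
    all_sites = [rng.uniform(0.0, max_elevation) for _ in range(n_sites)]
    # Keep each site with probability 1 - elevation / max_elevation.
    discovered = [e for e in all_sites if rng.random() < 1.0 - e / max_elevation]
    return all_sites, discovered


all_sites, discovered = simulate_discovery_bias()
print(f"Mean elevation of all sites:        {fmean(all_sites):.1f} m")
print(f"Mean elevation of discovered sites: {fmean(discovered):.1f} m")
```

With these assumptions, the expected mean elevation is 150 m for all sites but only 100 m for discovered sites. A model trained on the discovered sites would learn a preference for low ground that comes from the discovery process, not from the people being studied.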

The Neolithic is my preferred period for geospatial predictive modelling because it sits at an intersection. On one side is landscape modification, through monumental archaeology and large-scale farming. On the other are landscape constraints that still played a large role in site selection. Simply put, it is the time when people had the resources to change the landscape where it suited them to do so. We also have access to two useful kinds of data. The first is Neolithic barrows (tumuli), which are visibly identifiable in the terrain and are typically located near the tops of hilltop ridgeways. The second is artefact find spots, where objects have been found out of situ in farmers' fields. This gives us the opportunity to cleanse and categorise the data in ways that let us pose, and hopefully answer, hundreds of different questions.

The South-West of Britain is also my preferred region for predictive modelling, for several reasons. Firstly, I wrote my MSc Archaeology dissertation at Bournemouth University, where my tutor had written the book on Prehistoric Dorset, which provided a rich repository of local domain knowledge. Secondly, I have worked closely with several Historic Environment Record (HER) Officers in the South-West, who have been able to provide both data and local domain knowledge. HER Officers are responsible for maintaining a record of local archaeology across each county in the UK, and they play a role in protecting each county's cultural heritage during planning decisions. Finally, the South-West is famously rich in prehistoric monumental archaeology, giving us much more data to work with, and sample size is important for successful models.

Now that we have a time and place to work with, we can begin to pose questions that let us try different modelling techniques. As with most technical choices, there is rarely a single right model, just the one with the fewest negative trade-offs.

The next steps

With geospatial predictive modelling and some aspects of archaeology briefly introduced, it is time to delve deeper into specific projects. Each project will have its own article, going into much greater detail about the specifics of the archaeology, the steps to prepare the data and the design and implementation of the model.

The first of these articles is a simplified version of my dissertation, written in 2017 as my first attempt at geospatial predictive modelling. Subsequent articles will improve on aspects of the dissertation and focus on areas where we can introduce new techniques. Finally, I am excited to investigate the issue of sampling bias, and I will write this up as another article in the series.