Biostratigraphic data has been essential to energy resource exploitation for over a century. Traditionally, biostratigraphic work has followed an analogue methodology in which a specialist generates and iteratively analyses the data and communicates the results through text and charts.

However, biostratigraphic data seems eminently suitable for treatment as “big data” and as a candidate for assisted interpretation techniques. This does not reduce the value or role of the specialist: the fossils in rocks that allow biostratigraphy to function still need to be correctly identified if they are to provide value, and these (often intuitive) skills have consistently eluded transfer to the digital realm of automation. At the same time, the vast amounts of data accumulated over the previous century and beyond need significant phases of updating and/or re-integration with ever-changing stratigraphic concepts and techniques, a task that is beyond the physical capability of the (dwindling) biostratigraphic workforce using traditional methods. Such potentially valuable data is in danger of lying moribund within companies and institutions.

Automation of Biostratigraphy (image source: Halliburton Exploration Insights)

In the vast majority of data sets, biostratigraphic data consists of a large variety of major class types (i.e. different fossil groups) and an even larger variety of subsets and categories. These can be compiled, interrogated and potentially treated algorithmically to help a geoscientist interpret signals. This can be extended into the realm of “machine learning” by applying a set of “rules” that define how the data should be structured and analysed, and any potential sources of error or uncertainty can then be quantified.

Halliburton Exploration Insights are investigating machine learning techniques for biostratigraphic data, and Mike from GSS Geoscience is assisting in this process. Recently, a postgraduate intern research team comprising a biostratigrapher and a data scientist spent 12 weeks in Halliburton's Abingdon office developing algorithms to run on a data set of three wells from West Africa with a large amount of raw micropaleontological and nannopaleontological data. The process involved "supervised" machine learning, in which systems designed by the data scientist try to replicate interpretations made by a human (the biostratigrapher). Initial results are very encouraging, although the whole approach is in the early stages of investigation and development.
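To make the "supervised" idea concrete, here is a minimal sketch in Python. It is not the method used in the project described above, which is not detailed here. It assumes each sample is a set of fossil counts per taxon and that a biostratigrapher has already assigned each training sample to a zone. The taxon names, zone names and counts are all invented for illustration, and the nearest-centroid classifier is simply one of the simplest supervised techniques available.

```python
from collections import defaultdict


def to_proportions(counts):
    """Convert raw specimen counts into relative abundances."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def train_centroids(labelled_samples):
    """Average the relative abundances of all samples assigned to each zone."""
    sums = defaultdict(lambda: defaultdict(float))
    samples_per_zone = defaultdict(int)
    for counts, zone in labelled_samples:
        samples_per_zone[zone] += 1
        for taxon, proportion in to_proportions(counts).items():
            sums[zone][taxon] += proportion
    return {
        zone: {taxon: s / samples_per_zone[zone] for taxon, s in taxa.items()}
        for zone, taxa in sums.items()
    }


def distance(a, b):
    """Euclidean distance between two assemblages (absent taxa count as 0)."""
    taxa = set(a) | set(b)
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in taxa) ** 0.5


def predict_zone(centroids, counts):
    """Assign a new sample to the zone whose average assemblage is closest."""
    sample = to_proportions(counts)
    return min(centroids, key=lambda zone: distance(centroids[zone], sample))


# Hypothetical training data: (counts per taxon, zone picked by the specialist).
training = [
    ({"taxon_A": 40, "taxon_B": 10}, "Zone 1"),
    ({"taxon_A": 35, "taxon_B": 15}, "Zone 1"),
    ({"taxon_B": 5, "taxon_C": 45}, "Zone 2"),
    ({"taxon_B": 10, "taxon_C": 40}, "Zone 2"),
]

centroids = train_centroids(training)
print(predict_zone(centroids, {"taxon_A": 20, "taxon_B": 5, "taxon_C": 1}))
```

The human interpretation supplies the labels and the algorithm only learns to reproduce them, so the quality of the specialist's identifications and zone picks still sets the ceiling on what the system can do.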

A critical component of the "supervised" approach involves documenting and systematically breaking down the steps by which a biostratigrapher carries out biostratigraphic interpretation. There are numerous sources on the theory of biostratigraphy but very little on how to do biostratigraphy in practice. There is also very little information on how to deal with the many "nuances" encountered during the interpretation of data sets. These need to be comprehensively documented and systematised to refine the basic machine learning techniques.
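As an illustration of what writing down one interpretation step might look like, the Python sketch below picks the "top" (shallowest occurrence) of each marker taxon in a well. It also encodes one possible nuance as an explicit rule: occurrences below a minimum specimen count are ignored, on the assumption that isolated specimens may be unreliable. The `Occurrence` record, the `min_count` threshold, the taxon names and the depths are all hypothetical, and a real workflow would involve many more such rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    depth_m: float  # sample depth in metres
    taxon: str      # name of the marker taxon
    count: int      # number of specimens recorded in the sample


def pick_tops(occurrences, min_count=2):
    """Return the shallowest accepted depth for each taxon.

    Occurrences with fewer than `min_count` specimens are skipped. This is
    an illustrative rule standing in for a specialist's judgement call.
    """
    tops = {}
    for occ in occurrences:
        if occ.count < min_count:
            continue  # treat sparse occurrences as doubtful
        if occ.taxon not in tops or occ.depth_m < tops[occ.taxon]:
            tops[occ.taxon] = occ.depth_m
    return tops


# Hypothetical well data.
well = [
    Occurrence(1000.0, "marker_X", 1),   # isolated specimen, treated as doubtful
    Occurrence(1050.0, "marker_X", 12),
    Occurrence(1100.0, "marker_X", 8),
    Occurrence(1100.0, "marker_Y", 5),
    Occurrence(1150.0, "marker_Y", 9),
]

tops = pick_tops(well)
for taxon, depth in sorted(tops.items(), key=lambda item: item[1]):
    print(f"Top {taxon}: {depth} m")
```

Making the threshold an explicit parameter means the judgement call is visible and can be tested: with `min_count=1`, the top of `marker_X` moves up to 1000.0 m.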