Where Data Meets Computer Science


An image processing approach for quantitative understanding of early-print books

In their research, historians often work with dated manuscripts and early-print books, as these can provide valuable, authentic information that may be impossible to obtain from other sources. Despite their popularity with historians, however, such sources raise a number of problems when used in research. One of the most significant is that they can be hard to read, owing to a combination of difficult-to-read handwriting and noise on the page (such as textures, damage, or ink seepage). This project therefore proposes the development of a software system that will support the efficient digitisation of primary sources and, in turn, ease the burden on the historian. The project will take a skeletonization-based image processing approach, which aims to extract and quantitatively describe the shapes of letters on pages of early-print books despite any noise present on the page. The text of the source can then be presented to the historian in an easy-to-read digital format. To realise this aim, the project will investigate noise reduction techniques in image processing, image segmentation techniques and algorithms, and character recognition techniques.
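The proposal does not fix a particular pipeline, so the following is only a minimal sketch of how the stages named above could fit together for a single letter. It assumes (hypothetically) that a glyph has already been segmented from the page and is stored as a 2-D list of grey levels, with 0 for ink and 255 for paper. It uses a 3×3 median filter for noise reduction, a fixed global threshold of 128 for binarisation, the Zhang-Suen thinning algorithm for skeletonization (one well-known choice, not necessarily the one the project will adopt), and a crossing-number count of stroke end points and junctions as a simple quantitative shape descriptor. All function names and parameter values are illustrative; pure Python is used only to keep the example self-contained.

```python
# Minimal sketch, not the project's implementation.
# Hypothetical assumptions: one glyph has already been segmented from the page
# and is given as a 2-D list of grey levels (0 = ink, 255 = paper).


def median_filter(grey, radius=1):
    """Noise reduction: replace each pixel by the median of its window.

    Windows are clipped at the image border, so edge pixels use fewer values.
    """
    h, w = len(grey), len(grey[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                grey[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
            out[y][x] = window[len(window) // 2]
    return out


def binarise(grey, threshold=128):
    """Global threshold (assumed value): dark ink -> 1, light paper -> 0."""
    return [[1 if v < threshold else 0 for v in row] for row in grey]


def _pad(binary):
    """Surround the image with a one-pixel background border."""
    w = len(binary[0])
    return [[0] * (w + 2)] + [[0, *row, 0] for row in binary] + [[0] * (w + 2)]


def _neighbours(b, y, x):
    """P2..P9: the 8 neighbours clockwise, starting from the pixel above."""
    return [b[y - 1][x], b[y - 1][x + 1], b[y][x + 1], b[y + 1][x + 1],
            b[y + 1][x], b[y + 1][x - 1], b[y][x - 1], b[y - 1][x - 1]]


def _transitions(p):
    """Number of 0 -> 1 changes in the cyclic sequence P2, P3, ..., P9, P2."""
    return sum(1 for k in range(8) if p[k] == 0 and p[(k + 1) % 8] == 1)


def zhang_suen(binary):
    """Skeletonization: thin a 0/1 image to 1-pixel-wide strokes (Zhang-Suen)."""
    h, w = len(binary), len(binary[0])
    b = _pad(binary)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, h + 1):
                for x in range(1, w + 1):
                    if b[y][x] != 1:
                        continue
                    p = _neighbours(b, y, x)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:  # first sub-iteration: south-east boundary
                        side = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:  # second sub-iteration: north-west boundary
                        side = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= sum(p) <= 6 and _transitions(p) == 1 and side:
                        to_clear.append((y, x))
            # delete only after the whole pass, so each sub-iteration is parallel
            for y, x in to_clear:
                b[y][x] = 0
            changed = changed or bool(to_clear)
    return [row[1:-1] for row in b[1:-1]]


def describe(skeleton):
    """Shape descriptor: end points (crossing number 1), junctions (>= 3)."""
    h, w = len(skeleton), len(skeleton[0])
    b = _pad(skeleton)
    ends = junctions = 0
    for y in range(1, h + 1):
        for x in range(1, w + 1):
            if b[y][x] != 1:
                continue
            cn = _transitions(_neighbours(b, y, x))
            if cn == 1:
                ends += 1
            elif cn >= 3:
                junctions += 1
    return {"pixels": sum(map(sum, skeleton)), "end_points": ends, "junctions": junctions}


def letter_descriptor(grey, threshold=128):
    """Full pipeline: denoise -> binarise -> skeletonize -> describe."""
    return describe(zhang_suen(binarise(median_filter(grey), threshold)))
```

For example, `describe` applied to a one-pixel-wide plus sign reports four end points and one junction, the kind of count that could help tell letter shapes apart. In practice a tested library routine such as scikit-image's `skimage.morphology.skeletonize` would likely be preferable to a hand-written thinning loop.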

Coded letter.



  • Dr Federico M. Federici, School of Modern Languages and Cultures, University of Durham, UK