An image processing approach for quantitative understanding of early-print books
In their research, historians often work with dated manuscripts and early-print books, as these can provide valuable, authentic information that may not be obtainable from other sources. However, despite their popularity with historians, manuscripts and early-print books can pose a number of problems in research. One of the most significant is that these sources can be difficult to read, owing to a combination of hard-to-read handwriting and noise on the page (such as textures, damage, or ink seepage). This project therefore proposes the development of a software system that will facilitate the efficient digitisation of primary sources, which in turn will ease the burden on the historian. The project will take a skeletonization-based image processing approach, which aims to extract and quantitatively describe the shapes of letters on the pages of early-print books despite any noise present. This will allow the text of the source to be presented to the historian in an easy-to-read digital format. To realise this aim, the project will investigate noise reduction techniques in image processing, image segmentation techniques and algorithms, and character recognition techniques.
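The proposal does not say which skeletonization algorithm will be used. As an illustration only, the Python sketch below binarizes a greyscale page and then thins the ink strokes to one-pixel-wide skeletons, assuming dark ink on light paper, a hypothetical fixed threshold of 128, and the classic Zhang-Suen thinning algorithm as a stand-in. The function names `binarize` and `zhang_suen_thin` are hypothetical, and the image is represented as a plain list of lists rather than through any particular imaging library.

```python
# Illustrative sketch only: the proposal does not name a thinning algorithm;
# the Zhang-Suen algorithm is assumed here for demonstration.

# Offsets of neighbours P2..P9, clockwise starting from north.
NEIGHBOUR_OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1),
                     (1, 0), (1, -1), (0, -1), (-1, -1)]


def binarize(gray, threshold=128):
    """Map greyscale values (0 = black, 255 = white) to 1 for ink, 0 for paper.

    The fixed threshold is a hypothetical default; noisy pages with uneven
    backgrounds would likely need adaptive thresholding instead.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def _neighbours(img, r, c):
    """Return [P2, ..., P9] for pixel (r, c); pixels outside the image count as paper."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in NEIGHBOUR_OFFSETS]


def zhang_suen_thin(binary):
    """Thin a binary image (1 = ink) to a one-pixel-wide skeleton using Zhang-Suen."""
    img = [row[:] for row in binary]  # work on a copy of the input
    changed = True
    while changed:
        changed = False
        for sub_iteration in (0, 1):
            to_delete = []
            for r in range(len(img)):
                for c in range(len(img[0])):
                    if img[r][c] != 1:
                        continue
                    p = _neighbours(img, r, c)
                    b = sum(p)  # number of ink neighbours
                    # a = number of 0 -> 1 transitions in the cyclic sequence P2..P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if sub_iteration == 0:  # peel south-east boundary, north-west corners
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:  # peel north-west boundary, south-east corners
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_delete.append((r, c))
            # Delete only after the scan, so every test in this pass sees the same image.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


# Usage: a three-pixel-thick horizontal stroke, padded with a border of paper.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
skeleton = zhang_suen_thin(bar)
# skeleton now has ink only on the centre row (row 2), columns 2-5.
```

Thinning is one way of reducing each letter to a stroke-level description whose shape can then be measured independently of ink thickness or seepage; in practice, noise reduction and segmentation into individual characters would precede this step.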
- Dr Federico M. Federici, School of Modern Languages and Cultures, University of Durham, UK