We are excited to present the initial results of our 2023 Patrick Leary Field Development Grant this week at #RSVP2024! After annotating 908 pages, we fine-tuned an object detection model (YOLOv8) that identifies illustrations on scanned pages with high accuracy (mAP50 = 0.964).
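For readers unfamiliar with the metric: mAP50 counts a predicted box as correct when its intersection over union (IoU) with an annotated box is at least 0.5. The sketch below illustrates only that matching criterion, not the full mAP calculation or our evaluation code. The function names and the pixel coordinates are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(predicted, annotated):
    """True when a predicted box counts as a hit under the IoU >= 0.5 rule."""
    return iou(predicted, annotated) >= 0.5


# Hypothetical pixel coordinates for one annotated and one predicted illustration.
annotated = (100, 200, 500, 600)
predicted = (120, 210, 510, 620)
print(is_match_at_50(predicted, annotated))
```

A prediction that is slightly offset from the annotation, as in this example, still counts as a hit, which is why mAP50 is a fairly forgiving measure of localization.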

Using this model, we extracted 140,351 illustrations from the Illustrated London News (1842-1889) and four other well-known British illustrated periodicals: The Graphic, Illustrated Times, Pictorial Times, and Illustrated Weekly News (see the table below).


Using the OpenCLIP model, we computed multimodal embeddings for these images, making them searchable with both textual and visual queries. For instance, users can find illustrations of railway accidents with the prompt ‘an illustration of a railway accident’, or images of the British Parliament by querying with a modern photograph. The embeddings can also be used to cluster all the images in the ILN, for example to group portraits of different types of men or to find all the maps the periodical published (see below).
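To give a sense of how this kind of search works in principle, here is a minimal sketch that ranks images by cosine similarity between a query embedding and the image embeddings. It is not our pipeline: the three-dimensional vectors and the file names are invented stand-ins, and in practice both the query (text or image) and the illustrations would be embedded with OpenCLIP into much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, image_embeddings, top_k=3):
    """Return the top_k (image_id, score) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_embedding, emb), image_id)
        for image_id, emb in image_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]


# Toy embeddings with hypothetical file names; real ones would come from the model.
image_embeddings = {
    "railway_accident.jpg": [0.9, 0.1, 0.0],
    "parliament.jpg": [0.1, 0.9, 0.1],
    "map_of_london.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of the prompt 'an illustration of a railway accident'.
query = [0.8, 0.2, 0.1]

results = search(query, image_embeddings, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

Because text and images share one embedding space, the same ranking function serves both textual and visual queries; only the way the query vector is produced differs.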

Text search

In the coming months, we will expand the dataset to include well-known European and American titles. We will release the code, full dataset, and multimodal embeddings as open access in the fall of 2024.

We hope this dataset will be valuable to scholars of the nineteenth century and its press. Please get in touch if you are interested in working with our data and multimodal embeddings. We plan to organize several online meetings and an in-person workshop in the summer of 2025 to familiarize interested scholars with the data and associated methods.