SmartDoc 2015 — Challenge 1 Dataset


This work is licensed under a Creative Commons Attribution 4.0 International License. Author attribution should be given by citing the following conference paper:

Jean-Christophe Burie, Joseph Chazalon, Mickaël Coustaty, Sébastien Eskenazi, Muhammad Muzzamil Luqman, Maroua Mehri, Nibal Nayef, Jean-Marc Ogier, Sophea Prum and Marçal Rusiñol: “ICDAR2015 Competition on Smartphone Document Capture and OCR (SmartDoc)”, in 13th International Conference on Document Analysis and Recognition (ICDAR), 2015.


The dataset is available for download on Zenodo at the following address:


To build the dataset, we took six document types from public databases and chose five document images per type. The types were chosen to cover different document layout schemes and contents, ranging from completely textual to highly graphical.

Each of these document models was printed on plain A4 paper with a color laser printer and then captured with a Google Nexus 7 tablet. We recorded short video clips of around 10 seconds for each of the 30 documents in four different background scenarios. The videos were recorded at Full HD resolution (1920×1080) with a variable frame rate. Since the tablet was hand-held and moved during capture, the video frames exhibit realistic distortions such as focus and motion blur, perspective distortion, changes of illumination and even partial occlusions of the document pages. In total, the database currently consists of 120 video clips comprising around 24,000 frames.
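The counts quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The variable names are illustrative only. The per-clip frame count and average frame rate are rough estimates derived from the approximate totals; they are not figures reported by the authors.

```python
# Figures taken from the dataset description; the names are illustrative.
DOCUMENT_TYPES = 6
IMAGES_PER_TYPE = 5
BACKGROUNDS = 4
APPROX_TOTAL_FRAMES = 24_000   # "around 24,000 frames"
APPROX_CLIP_SECONDS = 10       # "around 10 seconds" per clip

documents = DOCUMENT_TYPES * IMAGES_PER_TYPE   # printed document models
clips = documents * BACKGROUNDS                # one clip per document and background

# Derived estimates only: the frame rate is variable, so these are averages.
frames_per_clip = APPROX_TOTAL_FRAMES / clips
approx_fps = frames_per_clip / APPROX_CLIP_SECONDS

print(documents, clips, frames_per_clip, approx_fps)
```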


We ground-truthed this collection by annotating, for each frame, the coordinates of the quadrilateral that delimits the document page.
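As a minimal sketch of how one such per-frame annotation could be held in memory, the Python snippet below stores the four corners of the quadrilateral and computes its area with the shoelace formula. The class name `FrameQuad`, its fields, the corner ordering, and the landscape frame orientation are all assumptions made for illustration; the actual file format of the ground truth is not described here.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]

# Full HD frame size from the description; landscape orientation is assumed.
FRAME_WIDTH, FRAME_HEIGHT = 1920, 1080


@dataclass(frozen=True)
class FrameQuad:
    """Hypothetical per-frame annotation: four page corners in pixels."""
    frame_index: int
    corners: Tuple[Point, Point, Point, Point]  # listed in order around the page

    def area(self) -> float:
        """Area in square pixels, computed with the shoelace formula."""
        total = 0.0
        for i in range(4):
            x1, y1 = self.corners[i]
            x2, y2 = self.corners[(i + 1) % 4]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

    def frame_coverage(self) -> float:
        """Fraction of the frame area covered by the quadrilateral."""
        return self.area() / (FRAME_WIDTH * FRAME_HEIGHT)


# Made-up example: an axis-aligned 1000x700 pixel page region in frame 0.
quad = FrameQuad(0, ((100.0, 100.0), (1100.0, 100.0),
                     (1100.0, 800.0), (100.0, 800.0)))
print(quad.area(), quad.frame_coverage())
```

Storing the corners in a consistent order around the page keeps the shoelace computation valid for any convex quadrilateral, including the perspective-distorted shapes that hand-held capture produces.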