To enable a global ranking of submissions, we decided to use a simpler metric than the one we use internally and had initially planned for the competition (detection and recall measures for segmentation and detection). This greatly simplifies the evaluation process and avoids having to optimize rejection rates.
It works as follows.
0/ For each frame of each video of the test set, our ground truth contains the exact coordinates of each corner of the page object to segment. Every frame contains exactly one such page object: your method should return 4 corner coordinates for each frame.
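The submission format itself is defined elsewhere; purely as an illustration, a per-frame result could be represented as four (x, y) corner points (the field names and corner order below are hypothetical, not the official format):

```python
# Hypothetical per-frame result record: one quadrilateral per frame,
# given as four (x, y) corner coordinates in a fixed order.
frame_result = {
    "frame": 42,
    "corners": [(120.5, 88.0),    # top-left
                (530.2, 92.4),    # top-right
                (545.7, 410.1),   # bottom-right
                (110.3, 402.8)],  # bottom-left
}

assert len(frame_result["corners"]) == 4  # exactly 4 corners per frame
```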
1/ Using the object size and its coordinates in each frame, we start by transforming the coordinates of your result S and of the ground truth G to undo the perspective transform, obtaining the corrected quadrilaterals S' and G'. Applying this transform makes all evaluation measures comparable within the document referential.
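This correction step can be sketched with a standard 4-point homography (a minimal NumPy sketch, not the official evaluation code; the 210 x 297 target rectangle is an arbitrary A4-like choice, and the sample coordinates are made up):

```python
import numpy as np

def homography(src, dst):
    """Estimate the 3x3 homography H mapping 4 src points onto 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map a list of (x, y) points through H using homogeneous coordinates."""
    P = np.hstack([np.asarray(pts, float), np.ones((len(pts), 1))])
    Q = P @ H.T
    return Q[:, :2] / Q[:, 2:3]

# Undo the perspective: send the ground-truth quad G onto an upright
# rectangle, then warp the result quad S with the same transform.
G = [(100, 80), (520, 95), (540, 400), (90, 390)]   # made-up frame coords
S = [(105, 85), (515, 98), (535, 395), (95, 385)]
H = homography(G, [(0, 0), (210, 0), (210, 297), (0, 297)])
G_prime = apply_homography(H, G)  # lands exactly on the target rectangle
S_prime = apply_homography(H, S)  # S expressed in the document referential
```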
2/ For each frame f, we compute the Jaccard index (JI) as follows to measure the similarity between the set G' of expected pixels from the ground truth and the set S' of pixels in the segmentation result returned by your method:
JI(f) = area(intersection(G', S')) / area(union(G', S'))
S' will be considered empty if you reject a frame, giving that frame the worst possible score (0). Hence, your method should not reject frames.
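A possible approximation of this per-frame score, by sampling a grid over the two quadrilaterals (a sketch, not the official evaluation code; `step` is an assumed parameter controlling the precision of the approximation):

```python
import numpy as np

def _inside(quad, X, Y):
    """Vectorized even-odd (ray-casting) point-in-polygon test on grids X, Y."""
    quad = np.asarray(quad, float)
    res = np.zeros(X.shape, bool)
    for i in range(len(quad)):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % len(quad)]
        with np.errstate(divide="ignore", invalid="ignore"):
            crosses = ((y1 > Y) != (y2 > Y)) & \
                      (X < (x2 - x1) * (Y - y1) / (y2 - y1) + x1)
        res ^= crosses
    return res

def jaccard_index(g_quad, s_quad, step=0.5):
    """Approximate JI(f) = area(G' inter S') / area(G' union S') by sampling
    cell centers on a regular grid covering both quadrilaterals."""
    if s_quad is None:  # rejected frame: S' is empty, worst score
        return 0.0
    pts = np.array(list(g_quad) + list(s_quad), float)
    xs = np.arange(pts[:, 0].min(), pts[:, 0].max(), step) + step / 2
    ys = np.arange(pts[:, 1].min(), pts[:, 1].max(), step) + step / 2
    X, Y = np.meshgrid(xs, ys)
    in_g, in_s = _inside(g_quad, X, Y), _inside(s_quad, X, Y)
    union = np.count_nonzero(in_g | in_s)
    return np.count_nonzero(in_g & in_s) / union if union else 0.0
```

For example, a result quad shifted by half the page width against its ground truth scores close to 1/3 (intersection half a page, union one and a half pages).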
3/ The overall score for each method is the average of the frame scores over all frames in the dataset.
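The dataset-level aggregation is then a plain mean (a sketch under the assumption that a rejected frame is represented as None and scored 0):

```python
def overall_score(frame_scores):
    """Average the per-frame Jaccard indexes over the whole test set.

    frame_scores: one entry per frame; None marks a rejected frame,
    which counts as the worst possible score (0.0).
    """
    scores = [0.0 if s is None else s for s in frame_scores]
    return sum(scores) / len(scores)
```

For instance, `overall_score([1.0, 0.5, None])` yields 0.5: the rejected frame drags the average down exactly as a zero-overlap result would.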
For further study and comparison, we may additionally use precision and recall metrics, detection rates, and other measures alongside this global evaluation.