SmartDoc 2015 – Challenge 1 Input and Output Formats Specifications

This page specifies the input format of the competition dataset and the output format of the expected results. Complying with these requirements will enable the organizers to evaluate participants’ results quickly and reliably.
Output samples are available for download at the bottom of this page.
Feel free to contact us for further information regarding these formats.


Participants must process each file in the test set and produce an associated result file for each of them.

Input structure

The test set has the following structure:

├── background01
│   ├── datasheet001.avi
│   ├── datasheet002.avi
│   ├── datasheet003.avi
│   ├── datasheet004.avi
│   ├── datasheet005.avi
│   ├── letter001.avi
│   ├── letter002.avi
│   ├── letter003.avi
│   ├── letter004.avi
│   ├── letter005.avi
│   ├── magazine001.avi
│   ├── magazine002.avi
│   ├── magazine003.avi
│   ├── magazine004.avi
│   ├── magazine005.avi
│   ├── paper001.avi
│   ├── paper002.avi
│   ├── paper003.avi
│   ├── paper004.avi
│   ├── paper005.avi
│   ├── patent001.avi
│   ├── patent002.avi
│   ├── patent003.avi
│   ├── patent004.avi
│   ├── patent005.avi
│   ├── tax001.avi
│   ├── tax002.avi
│   ├── tax003.avi
│   ├── tax004.avi
│   └── tax005.avi
├── background02
│   ├── datasheet001.avi
│   ...
│   └── tax005.avi
├── background03
│   ...
├── background04
│   ...
└── background05
    └── tax005.avi

It contains 5 directories and 150 video files (AVI container, XVID codec, no audio).
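As a quick sanity check, the listing can be reproduced programmatically. The sketch below is a hypothetical helper (`expected_input_videos` is not part of any official tooling) and assumes that every `backgroundNN` directory holds the same 30 file names as `background01`, which is consistent with the total of 150 files (5 backgrounds × 6 document types × 5 videos).

```python
from itertools import product

# Hypothetical constants derived from the listing above; every backgroundNN
# directory is assumed to contain the same 30 file names as background01.
BACKGROUNDS = [f"background{i:02d}" for i in range(1, 6)]
DOC_TYPES = ["datasheet", "letter", "magazine", "paper", "patent", "tax"]
SAMPLES_PER_TYPE = 5


def expected_input_videos() -> list[str]:
    """Relative paths of all test videos, e.g. 'background01/datasheet001.avi'."""
    return [
        f"{bg}/{doc}{n:03d}.avi"
        for bg, doc, n in product(BACKGROUNDS, DOC_TYPES, range(1, SAMPLES_PER_TYPE + 1))
    ]
```

Comparing this list with the files actually found on disk is an easy way to detect a missing or misnamed video before processing starts.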

Expected output structure

The output must have the same structure, i.e.:

├── background01
│   ├── datasheet001.segresult.xml
│   ├── datasheet002.segresult.xml
│   ├── datasheet003.segresult.xml
│   ├── datasheet004.segresult.xml
│   ├── datasheet005.segresult.xml
│   ├── letter001.segresult.xml
│   ├── letter002.segresult.xml
│   ├── letter003.segresult.xml
│   ├── letter004.segresult.xml
│   ├── letter005.segresult.xml
│   ├── magazine001.segresult.xml
│   ├── magazine002.segresult.xml
│   ├── magazine003.segresult.xml
│   ├── magazine004.segresult.xml
│   ├── magazine005.segresult.xml
│   ├── paper001.segresult.xml
│   ├── paper002.segresult.xml
│   ├── paper003.segresult.xml
│   ├── paper004.segresult.xml
│   ├── paper005.segresult.xml
│   ├── patent001.segresult.xml
│   ├── patent002.segresult.xml
│   ├── patent003.segresult.xml
│   ├── patent004.segresult.xml
│   ├── patent005.segresult.xml
│   ├── tax001.segresult.xml
│   ├── tax002.segresult.xml
│   ├── tax003.segresult.xml
│   ├── tax004.segresult.xml
│   └── tax005.segresult.xml
├── background02
│   ├── datasheet001.segresult.xml
│   ...
│   └── tax005.segresult.xml
├── background03
│   ...
├── background04
│   ...
└── background05
    └── tax005.segresult.xml
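The naming rule implied by the two trees is: keep the directory, and replace the `.avi` extension with `.segresult.xml`. A minimal Python sketch of that mapping, with `result_path_for` as a hypothetical helper name:

```python
from pathlib import PurePosixPath


def result_path_for(video_path: str) -> str:
    """Map 'backgroundNN/name.avi' to 'backgroundNN/name.segresult.xml'."""
    p = PurePosixPath(video_path)
    # Keep the parent directory; swap the '.avi' extension for '.segresult.xml'.
    return str(p.with_name(p.stem + ".segresult.xml"))
```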


  • ${participantid} is an ASCII string without spaces identifying the participant.
  • ${methodid} is an ASCII string without spaces identifying the method used (multiple methods may be proposed).

Output packaging

These files, and only these files, must be added to a Zip archive and sent to the organizers according to the procedure indicated on the competition website. The content of the Zip archive must have the same structure as the output files.
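One possible way to build such an archive with Python's standard `zipfile` module is sketched below. It assumes the result files were written under a local directory (passed as `output_root`, a hypothetical name) following the structure above; only files ending in `.segresult.xml` are added, each stored under its path relative to that directory.

```python
import zipfile
from pathlib import Path


def package_results(output_root: str, archive_path: str) -> list[str]:
    """Zip every *.segresult.xml under output_root, keeping relative paths.

    Other files in output_root are ignored. Returns the archive member names.
    """
    root = Path(output_root)
    members = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.segresult.xml")):
            # e.g. 'background01/datasheet001.segresult.xml'
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname=arcname)
            members.append(arcname)
    return members
```

For example, `package_results("results", "results.zip")` (hypothetical paths) would store `results/background01/letter001.segresult.xml` as `background01/letter001.segresult.xml` inside the archive.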

XML file format specification

Result files must be in XML format and comply with the following specifications. All tags and attributes are mandatory unless explicitly stated otherwise.

1/ First line (XML header) must be:

<?xml version='1.0' encoding='utf-8'?>

2/ Second line (format tag and generation time) must be:

<seg_result version="0.2" generated="$timestamp">


  • $timestamp is a string representing the date and time at which the file was generated. It should comply with one of the standard ISO 8601 formats. Using Python, you can generate such a string with
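The timestamp in the sample file at the end of this page (`2014-07-24T15:18:01.287068`) has exactly the shape produced by Python's `datetime.isoformat()`. A minimal sketch under that assumption, which also assembles the second line of the file; the helper names are hypothetical, and this is not necessarily the snippet the organizers intended:

```python
from datetime import datetime


def make_timestamp() -> str:
    # isoformat() gives e.g. '2014-07-24T15:18:01.287068' (ISO 8601);
    # the microseconds part is omitted when it happens to be zero.
    return datetime.now().isoformat()


def seg_result_open_tag(timestamp: str) -> str:
    # Second line of a result file: format tag and generation time.
    return f'<seg_result version="0.2" generated="{timestamp}">'
```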

3/ Third line (software description) must be:

  <software_used name="$software_name" version="$software_version"/>


  • $software_name is a UTF-8 string describing the software
  • $software_version is a UTF-8 string denoting the software version

Those attributes must be present and should not be left blank.
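Since `$software_name` and `$software_version` are free-form UTF-8 strings, characters such as `&`, `<` or `"` have to be escaped to keep the file valid XML. A sketch using the standard library's `xml.sax.saxutils.quoteattr`; the helper `software_used_line` and its blank-value check are illustrative assumptions:

```python
from xml.sax.saxutils import quoteattr


def software_used_line(name: str, version: str) -> str:
    """Third line of a result file, with both attribute values XML-escaped."""
    # The specification asks for non-blank values; reject them early.
    if not name.strip() or not version.strip():
        raise ValueError("software name and version must not be blank")
    # quoteattr() escapes &, < and >, picks suitable quotes and adds them.
    return f"  <software_used name={quoteattr(name)} version={quoteattr(version)}/>"
```

With the values from the sample file, `software_used_line("My program (c) CVC/ULR 2014", "0.2")` reproduces its third line.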

4/ Fourth line (source file) must be:


  • $path is the path to the input file. It should be relative to the dataset root.
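A sketch for computing `$path`, assuming the location of each input video on disk and of the dataset root are known; `source_path` is a hypothetical helper. It writes forward slashes so that the value matches the input tree regardless of the operating system; this is an assumption, since the specification only requires a path relative to the dataset root.

```python
import os
from pathlib import PurePath


def source_path(video_file: str, dataset_root: str) -> str:
    """Path of an input video relative to the dataset root, using '/' separators."""
    # os.path.relpath() works on path strings; the files need not exist.
    rel = os.path.relpath(video_file, start=dataset_root)
    # as_posix() turns Windows backslashes into '/', as in the tree above.
    return PurePath(rel).as_posix()
```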

5/ Fifth line (frame results start) must be:


6/ Following lines (frame result blocks):

  • The following lines must consist of blocks giving the coordinates of the object found in each frame, or indicating a rejection if the object cannot be found.
  • A block should be generated for each frame.

If the object cannot be found in the current frame, the block must be:

    <frame index="$frame_index" rejected="true"/>


  • $frame_index is the index of the frame, starting at 1.

If the object is found in the frame, the block must be:

<frame index="$frame_index" rejected="false">
  <point name="bl" x="$blx" y="$bly"/>
  <point name="tl" x="$tlx" y="$tly"/>
  <point name="tr" x="$trx" y="$try"/>
  <point name="br" x="$brx" y="$bry"/>
</frame>


  • $frame_index is the index of the frame, starting at 1.
  • $blx and $bly are the coordinates of the bottom left point of the object
  • $tlx and $tly are the coordinates of the top left point of the object
  • $trx and $try are the coordinates of the top right point of the object
  • $brx and $bry are the coordinates of the bottom right point of the object
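Both block forms can be produced by one small function. In this sketch (hypothetical helper names; indentation copied from the sample file), a missing detection is represented by `None`, and the closing `</frame>` tag is assumed from XML well-formedness, since the opening tag of a found-object block is not self-closing.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
# Corner order used in the specification: bottom-left, top-left, top-right, bottom-right.
CORNER_NAMES = ("bl", "tl", "tr", "br")


def frame_block(frame_index: int, corners: Optional[Sequence[Point]]) -> str:
    """XML block for one frame; `corners` is None when the object was not found."""
    if frame_index < 1:
        raise ValueError("frame indices start at 1")
    if corners is None:
        return f'    <frame index="{frame_index}" rejected="true"/>'
    if len(corners) != 4:
        raise ValueError("exactly four corners (bl, tl, tr, br) are required")
    lines = [f'    <frame index="{frame_index}" rejected="false">']
    for name, (x, y) in zip(CORNER_NAMES, corners):
        # Formatting an int or float always uses '.' as decimal separator.
        lines.append(f'      <point name="{name}" x="{x}" y="{y}"/>')
    lines.append("    </frame>")
    return "\n".join(lines)


def frame_blocks(detections: Sequence[Optional[Sequence[Point]]]) -> str:
    """One block per frame, numbered from 1 in video order."""
    return "\n".join(frame_block(i, d) for i, d in enumerate(detections, start=1))
```

Numbering with `enumerate(..., start=1)` over one entry per decoded frame guarantees that every frame gets exactly one block, with indices starting at 1.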

The coordinates can be floating-point numbers, provided the decimal separator is the dot (‘.’). Coordinates are expressed in the frame (image) coordinate system:

  • origin (0,0) is at the upper left corner of the frame
  • x values are increasing toward the right of the image
  • y values are increasing toward the bottom of the image
  • this is consistent with the image coordinate conventions of OpenCV and NumPy; note that NumPy arrays are indexed row first, i.e. image[y, x].
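When the document is located with OpenCV or NumPy, two details are easy to get wrong: array indexing is row first (`image[y, x]`), and coordinates must be written with a dot as decimal separator. A minimal sketch of both conversions, using hypothetical helper names:

```python
def numpy_index_to_xy(row: int, col: int) -> tuple[int, int]:
    """Convert an array index image[row, col] to frame coordinates (x, y)."""
    # Rows run downward (y) and columns run rightward (x); origin at the top-left.
    return col, row


def format_coordinate(value: float) -> str:
    """Coordinate text with '.' as decimal separator, independent of the locale."""
    # Python's str() of a float never consults the locale (unlike locale.str()).
    return str(float(value))
```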

7/ Second-to-last line (frame results end) must be:


8/ Last line (document end) must be:

</seg_result>

Sample output XML file (abbreviated)

<?xml version='1.0' encoding='utf-8'?>
<seg_result version="0.2" generated="2014-07-24T15:18:01.287068">
  <software_used name="My program (c) CVC/ULR 2014" version="0.2"/>
    <frame index="1" rejected="true"/>
    <frame index="2" rejected="true"/>
    <frame index="3" rejected="true"/>
    <frame index="4" rejected="true"/>
    <frame index="5" rejected="true"/>
    <frame index="6" rejected="true"/>
    <frame index="7" rejected="true"/>
    <frame index="8" rejected="true"/>
    <frame index="9" rejected="true"/>
    <frame index="10" rejected="true"/>
    <frame index="11" rejected="false">
      <point name="bl" x="970" y="770"/>
      <point name="tl" x="910" y="347"/>
      <point name="tr" x="1242" y="333"/>
      <point name="br" x="1367" y="743"/>
    </frame>
    ...
    <frame index="219" rejected="false">
      <point name="bl" x="852" y="744"/>
      <point name="tl" x="880" y="331"/>
      <point name="tr" x="1212" y="352"/>
      <point name="br" x="1258" y="766"/>
    </frame>
    <frame index="220" rejected="false">
      <point name="bl" x="852" y="744"/>
      <point name="tl" x="884" y="328"/>
      <point name="tr" x="1217" y="352"/>
      <point name="br" x="1258" y="766"/>
    </frame>
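Before submitting, a result file can be checked against the frame rules stated above. The sketch below uses `xml.etree.ElementTree` and only inspects `<frame>` and `<point>` elements, whatever element wraps them. The consecutive-index check reflects the assumption that one block is written per frame, in order, starting at 1. `check_frames` is a hypothetical helper, not the organizers' evaluation tool.

```python
import xml.etree.ElementTree as ET

EXPECTED_CORNERS = {"bl", "tl", "tr", "br"}


def check_frames(xml_bytes: bytes) -> list[str]:
    """Return human-readable problems found in the <frame> elements of a result file."""
    problems = []
    root = ET.fromstring(xml_bytes)
    if root.tag != "seg_result":
        problems.append(f"unexpected root element <{root.tag}>")
    # iter() walks all descendants, so the element wrapping the frames does not matter.
    for expected, frame in enumerate(root.iter("frame"), start=1):
        raw = frame.get("index", "")
        if not raw.isdecimal() or int(raw) != expected:
            problems.append(f"frame {raw}: expected index {expected}")
        rejected = frame.get("rejected")
        names = {p.get("name") for p in frame.findall("point")}
        if rejected == "false" and names != EXPECTED_CORNERS:
            problems.append(f"frame {raw}: expected points bl, tl, tr, br")
        elif rejected == "true" and names:
            problems.append(f"frame {raw}: rejected frame should have no points")
        elif rejected not in ("true", "false"):
            problems.append(f"frame {raw}: invalid rejected={rejected!r}")
    return problems
```

The function expects the raw bytes of a file, e.g. as returned by `open(path, "rb").read()`; an empty list means no problem was detected in the frame blocks.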