OK for future references, this is how i add background images to the dataset allowing the model to train on it. Functions used from: datitran/raccoon_dataset
- Generate CSV file ->
xml_to_csv.py - Generate TFRecord from CSV file ->
generate_tfrecord.py
First Step - Creating XML file for it Example of background image XML file <annotation>
<folder>test/</folder>
<filename>XXXXXX.png</filename>
<path>your_path/test/XXXXXX.png</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>640</width>
<height>640</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
</annotation>
Basically you remove the entire <object> (i.e no annotation ) Second Step - Generate CSV file Using the xml_to_csv.py I just add a little change, to consider the XML file that do not have any annotation (the background images) as so: From the original: https://github.com/datitran/raccoon_dataset/blob/93938849301895fb73909842ba04af9b602f677a/xml_to_csv.py#L12-L22 I add: value = None
for member in root.findall('object'):
value = (root.find('filename').text,
int(root.find('size')[0].text),
int(root.find('size')[1].text),
member[0].text,
int(member[4][0].text),
int(member[4][1].text),
int(member[4][2].text),
int(member[4][3].text)
)
xml_list.append(value)
if value is None:
value = (root.find('filename').text,
int(root.find('size')[0].text),
int(root.find('size')[1].text),
'-1',
'-1',
'-1',
'-1',
'-1'
)
xml_list.append(value)
I'm just adding negative values to the coordinates of the bounding box if there is no in the XML file, which is the case for the background images, and it will be usefull when generating the TFRecords. Third and Final Step - Generating the TFRecords Now, when creating the TFRecords, if the the corresponding row/image has negative coordinates, i just add zero values to the record (before, this would not even be possible). So from the original: https://github.com/datitran/raccoon_dataset/blob/93938849301895fb73909842ba04af9b602f677a/generate_tfrecord.py#L60-L66 I add: for index, row in group.object.iterrows():
if int(row['xmin']) > -1:
xmins.append(row['xmin'] / width)
xmaxs.append(row['xmax'] / width)
ymins.append(row['ymin'] / height)
ymaxs.append(row['ymax'] / height)
classes_text.append(row['class'].encode('utf8'))
classes.append(class_text_to_int(row['class']))
else:
xmins.append(0)
xmaxs.append(0)
ymins.append(0)
ymaxs.append(0)
classes_text.append('something'.encode('utf8')) # this doe not matter for the background
classes.append(5000)
To note that in the class_text (of the else statement), since for the background images there are no bounding boxes, you can replace the string with whatever you would like, for the background cases, this will not appear anywhere. And lastly for the classes (of the else statement) you just need to add a number label that does not belong to neither of your own classes. For those who are wondering, I've used this procedure many times, and currently works for my use cases. Hope it helped in some way. |