Motion-based Segmentation and Recognition Dataset

凌风探梅

(this is a draft version of this page)

Please cite:

(1)

Segmentation and Recognition Using Structure from Motion Point Clouds, ECCV 2008 (pdf)
Brostow, Shotton, Fauqueur, Cipolla (bibtex)

(2)

Semantic Object Classes in Video: A High-Definition Ground Truth Database (pdf)
Pattern Recognition Letters (to appear)
Brostow, Fauqueur, Cipolla (bibtex)

Description:

The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes.

The database addresses the need for experimental data to quantitatively evaluate emerging algorithms. While most videos are filmed with fixed-position CCTV-style cameras, our data was captured from the perspective of a driving automobile. The driving scenario increases the number and heterogeneity of the observed object classes.

Over ten minutes of high-quality 30Hz footage is being provided, with corresponding semantically labeled images at 1Hz and, in part, 15Hz. The CamVid Database offers four contributions that are relevant to object analysis researchers. First, the per-pixel semantic segmentation of over 700 images was specified manually, and was then inspected and confirmed by a second person for accuracy. Second, the high-quality and large resolution color video images in the database represent valuable extended duration digitized footage to those interested in driving scenarios or ego-motion. Third, we filmed calibration sequences for the camera color response and intrinsics, and computed a 3D camera pose for each frame in the sequences. Finally, in support of expanding this or other databases, we offer custom-made labeling software for assisting users who wish to paint precise class-labels for other images and videos. We evaluated the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation.
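The relationship between the 30Hz footage and the 1Hz or 15Hz labeled images can be illustrated with a small Python sketch. It assumes labels are taken at a fixed stride starting from frame 0; the function name and the starting offset are hypothetical, since the page does not state which frames were actually picked.

```python
def labeled_frame_indices(num_frames, video_hz, label_hz):
    """Return 0-based indices of the frames that would receive a label.

    Assumption: labels are sampled at a fixed stride starting at frame 0.
    The offset actually used for CamVid is not stated on this page.
    """
    if video_hz % label_hz != 0:
        raise ValueError("label rate must divide the video rate evenly")
    stride = video_hz // label_hz
    return list(range(0, num_frames, stride))


# 1Hz labels on 30Hz footage: every 30th frame.
one_hz = labeled_frame_indices(3030, video_hz=30, label_hz=1)
# 15Hz labels on 30Hz footage: every 2nd frame.
fifteen_hz = labeled_frame_indices(202, video_hz=30, label_hz=15)

print(len(one_hz), one_hz[:3])          # 101 [0, 30, 60]
print(len(fifteen_hz), fifteen_hz[:3])  # 101 [0, 2, 4]
```

With these strides, a 3030-frame sequence yields 101 labeled frames at 1Hz, and a 202-frame clip yields 101 labeled frames at 15Hz.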

Overview Video:

Avi, 30 Mb, xVid compressed (playback tips, or get the free Mac/Windows player)
or
Mpg, 11 Mb, mpeg-1 compressed (more compatible, but lower quality)

CamVid Database
(just samples shown. For all the videos, see below)

Original Video Sequences:

Link to FTP server with video files (very big!)
Link to codecs + utility for extracting frames from those big files
(read the inventory.txt)

Labeled Images
(701 so far)

Link to zip file with painted class labels for stills from the video sequences.
Txt file listing classes and label colors as RGB triples (sorted).
(Note: the corresponding raw input images only, at 1Hz and already extracted from the respective videos, are here temporarily (556 Mb).)
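As a rough illustration of how the class listing and the painted label images fit together, the Python sketch below parses lines of an assumed `R G B ClassName` form and maps each pixel color to a class name. The line format, the function names, and the sample colors are all assumptions made for illustration; the real triples come from the txt file itself.

```python
def parse_label_colors(text):
    """Parse lines of the assumed form 'R G B ClassName' into {(r, g, b): name}."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        r, g, b = (int(p) for p in parts[:3])
        table[(r, g, b)] = " ".join(parts[3:])
    return table


def label_image_to_names(pixels, table, unknown="Unknown"):
    """Map a 2D grid of (r, g, b) tuples to class names."""
    return [[table.get(px, unknown) for px in row] for row in pixels]


# Made-up colors for illustration only, not the real CamVid triples.
sample_txt = """
10 20 30 Sky
40 50 60 Road
"""
table = parse_label_colors(sample_txt)
image = [[(10, 20, 30), (40, 50, 60)],
         [(40, 50, 60), (0, 0, 1)]]
print(label_image_to_names(image, table))
# [['Sky', 'Road'], ['Road', 'Unknown']]
```

Colors that do not appear in the table are reported as `Unknown` rather than raising an error, which makes stray pixels easy to spot.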

Camera extrinsics

Link to files and code (if link breaks someday, go here)
The relevant line for getting the projection matrix of one camera is in MotBoostEvalOneFrame.m (see how LoadBoujou_2Dtrax_3dBans_Misc.m calls it):
curC = Cs( frameNum-offsetForFrameNums, 1:3);

Example camera pose trajectory, stored in Boujou Animation Format:
each line containing "AddDecompCameraKey" has a K and R matrix and t vector,
so that P = K * R * [I | -t]
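The formula above can be sketched in plain Python as follows. This is a minimal illustration, assuming K and R are 3x3 matrices and t is a 3-vector; with P written this way, t plays the role of the camera center in world coordinates. The helper names and the numeric values are hypothetical and are not taken from the CamVid calibration files.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def projection_matrix(K, R, t):
    """Build the 3x4 matrix P = K * R * [I | -t]."""
    i_minus_t = [[1.0, 0.0, 0.0, -t[0]],
                 [0.0, 1.0, 0.0, -t[1]],
                 [0.0, 0.0, 1.0, -t[2]]]
    return matmul(matmul(K, R), i_minus_t)


def project(P, X):
    """Project a 3D point X = (x, y, z) to pixel coordinates (u, v)."""
    xh = [X[0], X[1], X[2], 1.0]  # homogeneous coordinates
    u, v, w = (sum(row[j] * xh[j] for j in range(4)) for row in P)
    return u / w, v / w


# Illustrative values only, not the CamVid calibration.
K = [[1000.0, 0.0, 480.0],
     [0.0, 1000.0, 360.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -5.0]  # camera center 5 units behind the origin along z

P = projection_matrix(K, R, t)
print(project(P, (0.0, 0.0, 0.0)))  # (480.0, 360.0)
print(project(P, (1.0, 0.0, 0.0)))  # (680.0, 360.0)
```

A point on the optical axis lands on the principal point (480, 360), and moving it one unit along x at depth 5 shifts it by 1000 / 5 = 200 pixels.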

seq06R0

Description: 3030 frames at 30Hz == 1:41 min
Sample Frame
Video File in MXF format*

seq16E5

Description: 6120 frames at 30Hz == 3:24 min
Sample Frame
Video Files 1 and 2 in MXF format* (note: these are 2 halves of 1 zip file)

seq16E5_15Hz
(see also CamSeq01)

Description: 202 frames at 30Hz == 0:06 min
Sample Frame
Video Files 1 and 2 in MXF format* (note: same files as above, but use a different script)

seq05VD

Description: 5130 frames at 30Hz == 2:51 min
Sample Frame
Video File in MXF format*

seq01TP

Description: 3720 frames at 30Hz == 2:04 min
Sample Frame
Video File in MXF format*
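The durations quoted for the sequences follow directly from the frame counts at 30Hz. A minimal Python check, with a hypothetical helper name and assuming partial seconds are truncated:

```python
def duration_mmss(num_frames, fps=30):
    """Format a frame count as m:ss, truncating any partial second."""
    total_seconds = num_frames // fps
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


sequences = [("seq06R0", 3030), ("seq16E5", 6120),
             ("seq16E5_15Hz", 202), ("seq05VD", 5130),
             ("seq01TP", 3720)]
for name, frames in sequences:
    print(name, duration_mmss(frames))
# seq06R0 1:41
# seq16E5 3:24
# seq16E5_15Hz 0:06
# seq05VD 2:51
# seq01TP 2:04
```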

Listing of (RGB)-Class assignments (alphabetical)
Listing in color-order used by MSRC (with "XX")

Moving objects
Animal
Pedestrian
Child
Rolling cart/luggage/pram
Bicyclist
Motorcycle/scooter
Car (sedan/wagon)
SUV / pickup truck
Truck / bus
Train
Misc

Road
Road == drivable surface
Shoulder
Lane markings drivable
Non-Drivable

Ceiling
Sky
Tunnel
Archway

Fixed objects
Building
Wall
Tree
Vegetation misc.
Fence
Sidewalk
Parking block
Column/pole
Traffic cone
Bridge
Sign / symbol
Misc text
Traffic light
Other

Hand-Labeled Frames:

seq06R0

Description: 101 frames at 1Hz == 1:41 min
Sample Frame, Preview Video

seq16E5

Description: 204 frames at 1Hz == 3:24 min
Sample Frame, Preview Video

seq16E5_15Hz
(see also CamSeq01)

Description: 101 frames at 15Hz == 0:06 min
Sample Frame, Preview Video

seq05VD

Description: 171 frames at 1Hz == 2:51 min
Sample Frame, Preview Video

seq01TP

Description: 124 frames at 1Hz == 2:04 min
Sample Frame, Preview Video

Paint-Stroke Logs of Manual Labeling:

Example log file, where each of the user's mouse-strokes was recorded to include:
the class label being applied, size and type of brush or pre-segmentation used, location of each click point and drag-path, and duration for each stroke.

InteractLabeler Software:

InteractLabeler.zip for Windows (3.4 Mb)
InteractLabeler Documentation
InteractLabeler instructions, as given to volunteers

*MXF format:

This format is like Avi or Quicktime in that it is a wrapper for multimedia files. In our case, just the video channel has data, and is HD format. To decode, use this utility (link) along with the scripts provided.