TITLE:
An Analysis of OpenSeeD for Video Semantic Labeling
AUTHORS:
Jenny Zhu
KEYWORDS:
Semantic Segmentation, Detection, Labeling, OpenSeeD, Open-Vocabulary, Walking Tours Dataset, Videos
JOURNAL NAME:
Journal of Computer and Communications, Vol.13 No.1, January 30, 2025
ABSTRACT: Semantic segmentation is a core task in computer vision that allows AI models to interact with and understand their surrounding environment. Much as humans subconsciously segment the scenes they perceive, this ability is crucial for scene understanding. However, many semantic segmentation models face a shortage of suitable data: existing video datasets are limited to short, low-resolution clips that are not representative of real-world conditions. Thus, one of our key contributions is a customized semantic segmentation version of the Walking Tours Dataset, which features hour-long, high-resolution, real-world videos from tours of different cities. Additionally, we evaluate the performance of the open-vocabulary segmentation model OpenSeeD on our custom dataset and discuss future implications.