Journal of Software Engineering and Applications

Volume 16, Issue 12 (December 2023)

ISSN Print: 1945-3116   ISSN Online: 1945-3124


Object Detection Meets LLMs: Model Fusion for Safety and Security


ABSTRACT

This paper proposes a novel model fusion approach that enhances the predictive capabilities of vision and language models by strategically integrating object detection with large language models. We name this multimodal integration approach VOLTRON (Vision Object Linguistic Translation for Responsive Observation and Narration). VOLTRON aims to improve the responses of self-driving vehicles in detecting small objects crossing roads and in identifying merged or narrowing lanes. The models are fused through a single layer that provides LLaMA2 (Large Language Model Meta AI) with object detection probabilities from YOLOv8-n (You Only Look Once), translated into sentences. Experiments on specialized datasets showed accuracy improvements of up to 88.16%. We provide a comprehensive exploration of the theoretical principles that inform our model fusion approach and detail the methodology used to merge these two disparate models.
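The fusion pipeline described above (YOLOv8-n detection probabilities translated into sentences and passed to LLaMA2) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (detections_to_sentences, build_prompt), the prompt wording, and the example detections are assumptions, and the paper's learned single-layer fusion is approximated here by plain prompt construction.

```python
# Minimal sketch (assumed, not the authors' code) of translating object
# detection outputs into sentences that a language model such as LLaMA2
# could consume. VOLTRON's actual fusion uses a learned single layer;
# this sketch only illustrates the detection-to-sentence translation step.

from typing import List, Tuple


def detections_to_sentences(detections: List[Tuple[str, float]]) -> str:
    """Render (label, probability) detections as natural-language sentences."""
    if not detections:
        return "No objects were detected in the current frame."
    sentences = []
    for label, prob in sorted(detections, key=lambda d: d[1], reverse=True):
        sentences.append(f"A {label} is detected with probability {prob:.2f}.")
    return " ".join(sentences)


def build_prompt(detections: List[Tuple[str, float]]) -> str:
    """Wrap the detection sentences in a driving-assistance instruction (hypothetical prompt)."""
    context = detections_to_sentences(detections)
    return (
        "You are assisting an autonomous vehicle.\n"
        f"Scene description: {context}\n"
        "Recommend a safe driving action."
    )


if __name__ == "__main__":
    # Hypothetical YOLOv8-n outputs for a frame with a small object crossing the road.
    frame_detections = [("pedestrian", 0.87), ("traffic cone", 0.42)]
    print(build_prompt(frame_detections))
```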

Share and Cite:

Wase, Z., Madisetti, V. and Bahga, A. (2023) Object Detection Meets LLMs: Model Fusion for Safety and Security. Journal of Software Engineering and Applications, 16, 672-684. doi: 10.4236/jsea.2023.1612034.


Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.