Journal of Software Engineering and Applications

Volume 16, Issue 12 (December 2023)

ISSN Print: 1945-3116   ISSN Online: 1945-3124


Object Detection Meets LLMs: Model Fusion for Safety and Security


ABSTRACT

This paper proposes a novel model fusion approach that enhances the predictive capabilities of vision and language models by strategically integrating object detection with large language models. We name this multimodal integration approach VOLTRON (Vision Object Linguistic Translation for Responsive Observation and Narration). VOLTRON aims to improve the responses of self-driving vehicles in detecting small objects crossing roads and in identifying merged or narrowing lanes. The models are fused through a single layer that provides LLaMA2 (Large Language Model Meta AI) with object detection probabilities from YOLOv8-n (You Only Look Once), translated into sentences. Experiments on specialized datasets showed accuracy improvements of up to 88.16%. We provide a comprehensive exploration of the theoretical principles that inform our model fusion approach and detail the methodology used to merge these two disparate models.
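The fusion pipeline described above (YOLOv8-n detection probabilities translated into sentences and passed to LLaMA2) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (detections_to_sentences, build_prompt), the prompt wording, and the example detections are assumptions, and the paper's learned single-layer fusion is approximated here by plain prompt construction.

```python
# Minimal sketch (assumed, not the authors' code) of translating object
# detection outputs into sentences that a language model such as LLaMA2
# could consume. VOLTRON's actual fusion uses a learned single layer;
# this sketch only illustrates the detection-to-sentence translation step.

from typing import List, Tuple


def detections_to_sentences(detections: List[Tuple[str, float]]) -> str:
    """Render (label, probability) detections as natural-language sentences."""
    if not detections:
        return "No objects were detected in the current frame."
    sentences = []
    for label, prob in sorted(detections, key=lambda d: d[1], reverse=True):
        sentences.append(f"A {label} is detected with probability {prob:.2f}.")
    return " ".join(sentences)


def build_prompt(detections: List[Tuple[str, float]]) -> str:
    """Wrap the detection sentences in a driving-assistance instruction (hypothetical prompt)."""
    context = detections_to_sentences(detections)
    return (
        "You are assisting an autonomous vehicle.\n"
        f"Scene description: {context}\n"
        "Recommend a safe driving action."
    )


if __name__ == "__main__":
    # Hypothetical YOLOv8-n outputs for a frame with a small object crossing the road.
    frame_detections = [("pedestrian", 0.87), ("traffic cone", 0.42)]
    print(build_prompt(frame_detections))
```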

Share and Cite:

Wase, Z., Madisetti, V. and Bahga, A. (2023) Object Detection Meets LLMs: Model Fusion for Safety and Security. Journal of Software Engineering and Applications, 16, 672-684. doi: 10.4236/jsea.2023.1612034.


Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.