Journal of Software Engineering and Applications

Volume 17, Issue 1 (January 2024)

ISSN Print: 1945-3116   ISSN Online: 1945-3124

Google-based Impact Factor: 1.22

GUARDIAN: A Multi-Tiered Defense Architecture for Thwarting Prompt Injection Attacks on LLMs

PP. 43-68
DOI: 10.4236/jsea.2024.171003

ABSTRACT

This paper introduces a multi-tiered defense architecture that protects language models from adversarial prompt attacks. We construct adversarial prompts using strategies such as role emulation and manipulative assistance to simulate realistic threats. The proposed framework, GUARDIAN (Guardrails for Upholding Ethics in Language Models), comprises a system prompt filter, a pre-processing filter that leverages a toxicity classifier and an ethical prompt generator, and a pre-display filter that uses the model itself to screen its own output. Extensive testing on Meta’s Llama-2 model demonstrates that the architecture blocks 100% of the attack prompts. The approach also auto-suggests safer prompt alternatives, further bolstering language model security. Quantitatively evaluated defense layers and an ethical substitution mechanism are the key innovations for countering sophisticated attacks. The integrated methodology not only fortifies smaller LLMs against emerging cyber threats but also guides the broader application of LLMs in a secure and ethical manner.
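
To make the layering concrete, the sketch below shows how the three filters described in the abstract might compose around a model call. It is an illustrative Python outline only, written under stated assumptions: the function names, override markers, the 0.5 toxicity threshold, and the toxicity_score, suggest_ethical, and screen_output callables are placeholders, not the implementation evaluated in the paper.

# Minimal sketch of a GUARDIAN-style, three-tier filtering pipeline (assumed design).
# Names, markers, and thresholds are illustrative placeholders, not the authors' code.

from dataclasses import dataclass

@dataclass
class FilterResult:
    allowed: bool
    reason: str = ""
    suggestion: str = ""  # safer alternative prompt, if one was generated

def system_prompt_filter(prompt: str) -> FilterResult:
    """Tier 1: reject prompts that try to override or leak the system prompt."""
    override_markers = ("ignore previous instructions", "you are now",
                        "reveal your system prompt")
    if any(marker in prompt.lower() for marker in override_markers):
        return FilterResult(False, reason="attempted system prompt override")
    return FilterResult(True)

def preprocessing_filter(prompt, toxicity_score, suggest_ethical) -> FilterResult:
    """Tier 2: toxicity classifier plus ethical prompt generator for safer alternatives."""
    if toxicity_score(prompt) > 0.5:  # hypothetical threshold
        return FilterResult(False, reason="toxic or manipulative content",
                            suggestion=suggest_ethical(prompt))
    return FilterResult(True)

def predisplay_filter(response, screen_output) -> FilterResult:
    """Tier 3: ask the model itself whether its own output is safe to display."""
    if not screen_output(response):
        return FilterResult(False, reason="output flagged by self-screening")
    return FilterResult(True)

def guarded_generate(prompt, generate, toxicity_score, suggest_ethical, screen_output):
    """Run a prompt through all three tiers; return the response or a block notice."""
    for check in (system_prompt_filter(prompt),
                  preprocessing_filter(prompt, toxicity_score, suggest_ethical)):
        if not check.allowed:
            hint = check.suggestion or "rephrase the request"
            return f"Blocked ({check.reason}). Suggested alternative: {hint}"
    response = generate(prompt)  # e.g. a call to a Llama-2 endpoint
    if not predisplay_filter(response, screen_output).allowed:
        return "Blocked: generated output failed the pre-display safety check."
    return response

In this layout the two input-side tiers run before any model call is made, so blocked prompts never reach the LLM, while the ethical prompt generator supplies the auto-suggested safer alternative mentioned in the abstract.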

Share and Cite:

Rai, P., Sood, S., Madisetti, V. and Bahga, A. (2024) GUARDIAN: A Multi-Tiered Defense Architecture for Thwarting Prompt Injection Attacks on LLMs. Journal of Software Engineering and Applications, 17, 43-68. doi: 10.4236/jsea.2024.171003.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.