TITLE:
Intelligent ETL for Enterprise Software Applications Using Unstructured Data
AUTHORS:
Manthan Joshi, Vijay K. Madisetti
KEYWORDS:
Structured Data, Relational Model, LLM-Powered Agents, Field-Level Extraction, Knowledge Graph
JOURNAL NAME:
Journal of Software Engineering and Applications, Vol. 18, No. 1, January 30, 2025
ABSTRACT: Enterprise applications rely on relational databases and structured business processes, which requires that inputs and outputs in the form of business documents, such as invoices, purchase orders, and receipts, undergo slow and expensive conversion into known templates and schemas before processing. We propose IntelligentETL, a new LLM-agent-based intelligent data extraction, transformation, and loading pipeline that not only ingests PDFs and detects the inputs within them but also handles the extraction of both structured and unstructured data by developing tools that deal most efficiently and securely with each data type. We study the efficiency of the proposed pipeline and compare it with enterprise solutions that also use LLMs. We establish the superiority of our approach in timely and accurate data extraction and transformation when analyzing data from varied sources subject to nested and/or interlinked input constraints.
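The abstract's central routing idea, sending structured and unstructured content detected in a PDF to different tools, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: the hypothetical `Region` type, the two extractor functions, and the rule that tables count as structured data and free text as unstructured data are not taken from the paper. The naive "key: value" parser merely stands in for the LLM agent the authors describe and is not their implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Region:
    """Hypothetical block of content detected on an ingested PDF page."""
    kind: str        # assumed labels: "table" (structured) or "text" (unstructured)
    content: object  # list of rows for tables, a string for free text


def extract_table(rows):
    # Structured path (assumption): the first row is the header, and each
    # remaining row becomes a record keyed by the header cells.
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def extract_text(text):
    # Unstructured path placeholder: the paper assigns this role to an LLM
    # agent; here a naive "key: value" line parser stands in for it.
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical registry mapping each data type to the tool that handles it.
ROUTES: dict[str, Callable] = {"table": extract_table, "text": extract_text}


def route(regions):
    """Dispatch each region to the extractor registered for its data type."""
    results = {kind: [] for kind in ROUTES}
    for region in regions:
        results[region.kind].append(ROUTES[region.kind](region.content))
    return results


if __name__ == "__main__":
    regions = [
        Region("text", "Invoice No: INV-001\nVendor: Acme"),
        Region("table", [["item", "qty"], ["bolt", "4"]]),
    ]
    print(route(regions))
```

Keeping the dispatch in a registry means a new data type only needs a new entry in `ROUTES`, which mirrors, in a very simplified way, the abstract's notion of dedicated tools per data type.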