SmartFlow: A Lightweight Protocol for Fine-Grained Sharding and Parallel Smart Contract Execution in High-Performance Blockchain Systems
1. Introduction
1.1. Motivation
Blockchain technology has demonstrated transformative potential across various industries [1]. In high-performance computing (HPC) environments, it offers significant advantages, such as distributed consistent caching, data fidelity, and provenance tracking [2]-[5]. Smart contracts further enhance blockchain’s capabilities by automating complex workflows, reducing reliance on intermediaries, and optimizing computational costs [6]. However, achieving high-throughput blockchain execution in HPC settings requires addressing critical limitations inherent in traditional blockchain protocols.
A major bottleneck in existing blockchain systems is their reliance on heavyweight consensus mechanisms like Proof of Work (PoW) [7] and Practical Byzantine Fault Tolerance (PBFT) [8]. These methods impose high computational and energy costs while introducing significant latency, hindering efficient large-scale execution. Additionally, mainstream blockchain architectures are incompatible with the Message Passing Interface (MPI)—a widely adopted framework in the scientific computing community for HPC workloads. Since MPI facilitates distributed memory processing and efficient inter-node communication, the absence of blockchain systems designed to integrate with it further limits their feasibility in HPC environments—particularly for parallel transaction processing, where thousands of smart contracts may need to be executed concurrently.
Another major limitation is that most blockchain implementations lack compatibility with shared storage frameworks, which are fundamental to modern HPC and scientific computing workflows [2]. Traditional blockchain architectures rely on independently maintained ledgers across nodes, making integration with HPC storage solutions—such as parallel file systems or distributed shared memory—challenging. Furthermore, despite the automation advantages of smart contracts, their execution remains predominantly sequential in many blockchain frameworks. This sequential processing creates significant performance bottlenecks when handling high-volume workloads, limiting blockchain’s applicability in large-scale scientific and HPC applications.
To address these limitations, a blockchain for HPC requires: 1) a lightweight consensus mechanism that minimizes overhead while maintaining security and resilience, 2) efficient parallel execution of smart contracts at large scale without contention or synchronization delays, and 3) seamless integration with existing HPC infrastructure, including shared storage and MPI-based communication frameworks.
1.2. Challenges
Integrating a lightweight blockchain framework into HPC systems to support parallel smart contract execution presents several technical challenges that must be addressed to ensure seamless scalability, efficiency, and fault tolerance. The primary challenges include the following.
Lightweight Consensus Protocol: Existing blockchain consensus mechanisms impose significant computational and synchronization overheads, limiting their suitability for high-speed and high-volume transaction environments. A lightweight, parallel-friendly consensus protocol that is compatible with shared storage ecosystems and parallel smart contracts is required to enable efficient block processing, low-latency consistency, and high fault tolerance.
Parallel Execution of Smart Contracts: Scheduling smart contracts across distributed compute nodes introduces data dependencies, conditional logic constraints, and state synchronization challenges [9]. Unlike conventional parallel computing, blockchain execution requires strict ledger consistency, meaning transactions must be validated concurrently without violating causal dependencies. Optimizing energy consumption, concurrency control, and rollback mechanisms is critical to ensuring scalable execution [10].
Integration with MPI and Technical Challenges: Integrating decentralized blockchain networks with scientific computing frameworks such as the Message Passing Interface (MPI) introduces several critical bottlenecks that need to be addressed. Efficient consensus formation across distributed nodes requires precise synchronization strategies to prevent network congestion while maintaining blockchain immutability. Balancing the fast, mutable memory demands of HPC with the immutability of blockchain transactions is a significant challenge. Additionally, optimizing block timing, propagation latency, and transaction finalization is crucial for achieving high throughput. A well-designed consensus mechanism must minimize synchronization overhead during parallel validation, ensuring fault tolerance and security at scale. Moreover, smart contract modularization should enable decentralized execution without introducing bottlenecks in the verification process, ensuring the system operates efficiently in large-scale environments.
1.3. Contributions
This paper introduces SmartFlow, a lightweight mechanism for blockchain that enables parallel execution of smart contracts, addressing the performance bottlenecks associated with traditional sequential execution methods [11]. Unlike conventional resource-intensive protocols [12], the proposed protocol is designed for improved efficiency with minimal resource consumption, enabling simultaneous smart contract processing. This enhances system performance, mitigates the limitations of traditional consensus approaches, and positions it as a significant advancement in scalable smart contract execution within blockchain systems.
In summary, this paper makes the following contributions.
• We design a fine-grained transaction sharding graph mechanism to enhance the parallel processing of transactions, thus improving system efficiency by optimizing resource usage and throughput.
• We develop a parallel smart contract execution framework, guided by a lightweight two-phase quorum commit (2PQC) protocol [2], which enables complex contract logic to be processed in parallel across distributed nodes. This approach improves both efficiency and consistency in smart contract processing.
• We optimize the lightweight two-phase quorum commit (2PQC) protocol, integrated with MPI, utilizing a dual-layer strategy to address communication overhead and throughput limitations, providing a scalable solution for blockchain systems in HPC environments.
• We implement a prototype and evaluate the system using a real-world dataset (e.g., banking transactions) [13] and compare its performance against RapidChain, a mainstream blockchain protocol, and a cutting-edge two-phase concurrency control (2PCC) protocol for parallel smart contract execution [14]. Our preliminary experiments show that the proposed system offers up to 7.5× lower latency and 5.8× higher throughput compared to current state-of-the-art protocols.
2. Related Works
On a blockchain, smart contracts are self-executing units of code that automate contract execution according to predetermined conditions. Designed with programming languages such as Solidity, they function independently, carrying out tasks without the need for a third party. The exploration of serial and parallel execution paradigms has yielded noteworthy advances. Chaincode introduces a scalable framework leveraging serial execution with state-of-the-art consensus protocols [15] and efficient data structures. Čeke et al. [16] focus on optimizing serial execution by reducing gas costs and improving transaction processing times.
Parallel execution in blockchain has become a crucial research direction to improve the performance and scalability of smart contract execution. XuperChain [17] employs parallelization without specifying the consensus protocol, while Liu et al. [18] separate execution from consensus in Ethereum to boost throughput. C. Jin et al. [14] introduce a two-phase concurrency protocol for permissioned blockchains, optimizing validation and execution stages. Chung and Park [19] propose parallel sharded execution using shared logs to address conflicts and bottlenecks in Ethereum’s Order-Execute model. Fang et al. [20] demonstrate gains in inter-node performance with SEFrame, which uses Intel SGX enclaves for safe parallel execution. Tao et al. [21] improve blockchain throughput and performance with a distributed sharding approach.
Other works focus on privacy and accessibility. Li et al. [22] prioritize privacy through contract segmentation, while Tan et al. [23] present LATTE, a visual interface for non-programmatic smart contract creation. Jian et al. [24] explore hardware-based security techniques with TSC-VEE, a TrustZone-based virtual execution environment. Qi et al. [25] introduce DMVCC, a scheduling framework for fine-grained synchronization in parallel execution. Despite these advancements, as shown in Table 1, many studies still rely on resource-intensive consensus protocols such as Practical Byzantine Fault Tolerance (PBFT), generic Byzantine Fault Tolerance (BFT), or Proof of Stake (PoS), which are impractical in high-performance computing (HPC) environments. Furthermore, none of these protocols support shared storage ecosystems or Message Passing Interface (MPI) based parallel computation.
Table 1. Summary of limitations of recent Smart Contract Protocols.
Features | A High Performance Concurrency Protocol [14] | Parallel and Asynchronous Smart Contract Execution [18] | SEFrame [20] | Parallel Execution of Solidity Smart Contract [19] | DMVCC [25] | SmartFlow (This Work)
Lightweight Computation | × | × | × | × | × | √
Energy-efficient Protocol | × | × | × | × | × | √
Cost-effective Communication | × | × | × | × | × | √
Parallel block processing | √ | × | × | × | × | √
MPI-compliant | × | × | × | × | × | √
Shared-storage compatible | × | × | × | × | × | √
Recent developments in MPI have improved its properties and enabled a variety of MPI-based solutions [26] [27]. However, MPI has not been used to develop a consensus protocol for parallel smart contract execution. Integrating MPI with existing research enhances MPI-specific packages. While a recent work [2] has proposed lightweight protocols to enhance blockchain compatibility with HPC ecosystems, it does not provide support for parallel execution of smart contracts.
3. System Design
The architecture of our proposed system, SmartFlow, consists of four main components: the Transaction Sharding Graph, the Smart Contract, the SmartFlow Consensus Protocol, and the Parallel Processing component. As illustrated in Figure 1, the transaction processing flow begins with the Transaction Sharding Graph, which analyzes dependencies and organizes independent transactions into batches. These batches are then transmitted to the blockchain network, which is composed of MPI-connected nodes. Each node participates in a two-phase validation and quorum commit consensus protocol. First, transactions in each batch are validated in parallel by executing their associated smart contracts across the available cores. If validation succeeds, contract state updates are provisionally applied. After parallel execution, the nodes vote to commit or abort the batch based on the validation results. If a quorum is reached, the block is committed to the chain. This validate-then-commit approach aims to reduce transaction latency and increase throughput by maximizing concurrent validation and contract execution across the available HPC resources. We discuss each component in more detail in the following sections.
3.1. Transaction Sharding Graph
The Transaction Sharding Graph (TSG) is a fundamental component of our proposed system architecture, designed to efficiently manage and analyze dependencies between transactions. Structured as a directed acyclic graph (DAG), each node in the TSG represents an individual transaction, while directed edges indicate dependencies between transactions. The direction of an edge enforces dependency constraints, ensuring that the graph remains acyclic and that transactions are executed in a valid sequence.
Figure 1. Overall system architecture of the proposed protocol.
Figure 2 illustrates an example of a Transaction Sharding Graph (TSG), where nodes represent transactions and directed edges capture dependency relationships. The execution order within the TSG ensures that transactions are processed only after their dependencies are resolved. In this example, the execution sequence shown in the figure guarantees correct processing while maximizing parallelism. After ordering transactions based on their execution dependencies, independent transactions are grouped into batches and distributed to separate shards, each managed by a set of nodes.
The process described in Protocol 1 begins by iterating through the list of transactions. Each transaction is checked for independence. If it has no dependencies, a hash is created using the corresponding entity, and the transaction is added to a batch. Dependent transactions are deferred until their required conditions are met (Lines 1 - 9). Once the batch is prepared, it is partitioned into blocks based on the number of available clusters. Each block is then assigned to a corresponding cluster. Finally, the protocol invokes the consensus mechanism to validate each block in parallel across the assigned clusters (Lines 10 - 22). By organizing transactions into shards, the TSG enhances parallelism and computational efficiency. Additionally, the directed acyclic structure of the graph prevents cyclic dependencies, ensuring consistent and reliable transaction execution.
Figure 2. Generating batches of independent transactions through transaction sharding graph.
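The batching and partitioning steps described above can be illustrated with a minimal Python sketch. It is not the paper's Protocol 1: the input format (a `depends_on` dictionary of explicit dependency edges), the function names `build_batches` and `partition`, and the round-robin block assignment are all assumptions made for illustration, and the entity hashing step is omitted.

```python
from collections import defaultdict, deque


def build_batches(transactions, depends_on):
    """Group transactions into batches of mutually independent transactions.

    transactions: list of transaction ids (every id in depends_on must appear here).
    depends_on:   dict mapping a transaction id to the ids it must wait for
                  (hypothetical input format: the edges of the dependency DAG).
    Returns a list of batches; a transaction in batch k depends only on
    transactions placed in earlier batches.
    """
    # Number of unresolved dependencies per transaction.
    remaining = {tx: len(depends_on.get(tx, ())) for tx in transactions}
    # Reverse edges: which transactions are waiting on a given transaction.
    dependents = defaultdict(list)
    for tx, parents in depends_on.items():
        for parent in parents:
            dependents[parent].append(tx)

    ready = deque(tx for tx in transactions if remaining[tx] == 0)
    batches = []
    while ready:
        batch = list(ready)          # everything currently independent
        ready.clear()
        batches.append(batch)
        for tx in batch:             # resolving a batch may free deferred transactions
            for child in dependents[tx]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    ready.append(child)

    if sum(len(b) for b in batches) != len(transactions):
        raise ValueError("dependency graph contains a cycle or an unknown dependency")
    return batches


def partition(batch, num_clusters):
    """Split one batch into at most num_clusters non-empty blocks (round-robin)."""
    blocks = [batch[i::num_clusters] for i in range(num_clusters)]
    return [block for block in blocks if block]
```

Each returned batch could then be partitioned with `partition(batch, num_clusters)` and each block handed to one cluster for consensus. Processing the graph level by level is one simple way to defer dependent transactions until their prerequisites are resolved.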
3.2. Smart Contract
The smart contract module in our system plays a critical role in executing business logic and ensuring transaction validity. Implemented in Python, the smart contract handles key banking functions, such as depositing funds, transferring money, and withdrawing funds. More importantly, the smart contract is designed to be invoked during the parallel validation and execution phase managed by the consensus protocol.
A key technical contribution of our approach is the integration of the Transaction Sharding Graph (TSG), which enables parallel transaction processing by analyzing dependencies among transactions and partitioning them into independent execution batches. Each batch contains transactions that can execute concurrently, reducing bottlenecks and increasing throughput. These batches are then processed in parallel by the smart contract module deployed in the distributed nodes utilizing the Parallel Processing component. By leveraging the TSG, the smart contract efficiently validates dependencies and ensures that updates to contract states remain consistent across the network. Once validated, the smart contract’s changes are temporarily applied, awaiting consensus confirmation. This integration not only synchronizes contract execution with the consensus process but also optimizes distributed transaction processing. As a result, the smart contract contributes to both the SmartFlow Consensus Protocol and the Parallel Processing components, achieving high transaction throughput while preserving consistency and security within the system.
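To make the idea of provisionally applied state concrete, the following is a minimal sketch of a banking contract with staged updates. The class name `BankContract`, its method names, and the in-memory dictionaries are hypothetical; the paper states only that the contract is written in Python and supports deposits, transfers, and withdrawals.

```python
class BankContract:
    """Toy banking contract whose updates are staged until consensus decides."""

    def __init__(self, balances):
        self.balances = dict(balances)  # committed state
        self.staged = {}                # provisional updates awaiting consensus

    def _balance(self, account):
        # Staged values take precedence over committed ones.
        return self.staged.get(account, self.balances.get(account, 0))

    def deposit(self, account, amount):
        if amount <= 0:
            return False
        self.staged[account] = self._balance(account) + amount
        return True

    def withdraw(self, account, amount):
        if amount <= 0 or self._balance(account) < amount:
            return False
        self.staged[account] = self._balance(account) - amount
        return True

    def transfer(self, src, dst, amount):
        # Debit first; credit only if the debit was valid.
        if src == dst or not self.withdraw(src, amount):
            return False
        self.staged[dst] = self._balance(dst) + amount
        return True

    def commit(self):
        """Make staged updates permanent (called after a successful quorum)."""
        self.balances.update(self.staged)
        self.staged.clear()

    def abort(self):
        """Discard staged updates (called when the batch is aborted)."""
        self.staged.clear()
```

In this sketch, validation and execution write only to `staged`, so an aborted batch leaves the committed balances untouched.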
3.3. SmartFlow Consensus Protocol
In the proposed system, a lightweight MPI-based Quorum Commit protocol is utilized to overcome the limitations of traditional protocols (e.g., PBFT, PoW, PoS). This protocol is specifically designed for distributed blockchain networks, where the root node serves as the coordinator. The coordinator manages a sub-cluster of participant nodes and oversees transaction validation and commit operations. The workflow of the proposed Two-Phase Quorum Commit (2PQC) protocol is illustrated in Figure 3. Upon receiving a batch of transactions from the Transaction Sharding Graph, the coordinator processes them through the 2PQC protocol, which operates as follows:
Prepare Phase: The coordinator initiates the process by sending a prepare message along with the set of transactions to all participant nodes. Each participant node independently validates the transactions based on the associated smart contract and responds with a prepare vote, indicating readiness or non-readiness to commit.
Figure 3. Customized SmartFlow Protocol for Enhanced Consensus Mechanisms.
Quorum Check: Following the prepare phase, the coordinator conducts a vital quorum check. If the majority of participant nodes (i.e., 51% or more) provide validation votes, it proceeds to request a commit. If not, the transaction block is aborted. The quorum check in 2PQC ensures that the coordinator only commits transactions when a majority of nodes agree, preventing inconsistent or invalid states.
Commit Phase: After a successful quorum check, the coordinator issues a commit request to all the participant nodes. Each node then commits to the transactions and sends a confirmation message back to the coordinator. Once all confirmation messages are received, the coordinator performs another quorum check.
Final Decision: Based on the results of the quorum check in the commit phase, the coordinator makes the final decision. If it receives commitments from at least 51% of the participant nodes, it declares the transactions committed successfully. Otherwise, it concludes that the commitment has failed.
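The four steps above can be sketched as a single-process simulation. This is an illustrative model only, with no MPI communication: the `Participant` class, its `validate` and `commit` methods, the transaction format, and the return strings are assumptions, while the 51% threshold comes from the text.

```python
def quorum_size(num_participants, percent=51):
    """Smallest number of votes that reaches `percent` of the participants."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-percent * num_participants // 100)


class Participant:
    """Hypothetical participant node; `healthy=False` models a failed node."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.ledger = []

    def validate(self, block):
        # Stand-in for smart contract validation: all amounts must be positive.
        return self.healthy and all(tx["amount"] > 0 for tx in block)

    def commit(self, block):
        if not self.healthy:
            return False
        self.ledger.append(block)
        return True


def two_phase_quorum_commit(block, participants):
    """Run one prepare/commit round as coordinator and return the decision."""
    needed = quorum_size(len(participants))

    # Prepare phase: collect votes, then perform the first quorum check.
    votes = sum(1 for p in participants if p.validate(block))
    if votes < needed:
        return "aborted"

    # Commit phase: request commits from all participants, then check quorum again.
    confirmations = sum(1 for p in participants if p.commit(block))
    return "committed" if confirmations >= needed else "failed"
```

For example, with five participants the quorum is three votes, so a block of valid transactions is committed when three nodes are healthy and aborted when only two are.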
As detailed in Protocol 2, the SmartFlow Consensus Protocol commences by distributing the transaction block to each cluster, where a sub-cluster with n nodes is managed by a coordinator. Each transaction is parsed to extract metadata and security information (Lines 1 - 7). Nodes in the sub-cluster independently validate the block using entity data E (Lines 8 - 11). If at least 51% of the nodes agree, the block is committed; otherwise, it is aborted (Lines 12 - 16).
In our HPC blockchain system, where all nodes are trusted and pre-authorized, Byzantine faults are considered unlikely, making a 51% majority quorum sufficient for commit decisions, in contrast to the 2/3 majority required in typical zero-trust blockchain environments. Transactions are finalized only after quorum consensus is reached across nodes, which ensures consistency even under partial failures or temporary network partitions. Isolated or failed nodes cannot commit invalid states during such failures or partitions because they cannot reach the required quorum. When these nodes reconnect, their states are synchronized through the quorum-based commit process, which restores consistency and network integrity. By leveraging a decentralized verification service, the coordinator’s actions are transparent and verifiable. Each commit to the blockchain is hashed and recorded on an immutable ledger, allowing contributors to independently verify updates. Any discrepancies are flagged by the protocol, preventing unauthorized changes. This structure keeps the coordinator accountable to all participants, maintaining trust.
3.4. Parallel Processing: Dual-Layer Strategy
To optimize parallelism in an MPI-based blockchain framework, we introduce a dual-thread execution strategy that assigns two dedicated threads per node: one for transaction validation and another for consensus and commit handling. This division ensures efficient utilization of computational resources while minimizing inter-thread contention. Within each MPI node, the validation thread is responsible for verifying incoming transaction batches. Transactions are processed in parallel or in micro-batches within this thread, ensuring correctness while maximizing throughput. Meanwhile, the consensus/commit thread asynchronously exchanges validated transactions with other nodes using non-blocking MPI primitives (MPI_Isend, MPI_Irecv), allowing consensus operations to proceed without stalling validation.
This model decouples validation from the consensus process: while one thread verifies transactions, the other finalizes commits and synchronizes state updates across nodes. Because validation continues while previous transactions await consensus, the two stages overlap and the system significantly reduces bottlenecks. This approach also reduces the complexity of thread management while maintaining high scalability. The consensus thread can aggregate multiple validation results before reaching a commit decision, thereby minimizing MPI communication overhead. By combining thread-level and data-level parallelism in a minimal dual-thread model, our approach makes efficient use of computational resources and accelerates transaction validation and contract execution, making it well suited to HPC environments that require high transaction throughput with minimal synchronization delays.
Although HPC environments are often associated with centralized architectures, using HPC infrastructure for blockchain does not inherently compromise decentralization. In our design, decentralization is preserved by ensuring that each MPI node independently validates transactions and participates equally in the consensus process, rather than relying on a single central node for decision-making. Our system leverages parallel storage systems in HPC environments, allowing MPI nodes to efficiently access and synchronize data across the distributed network. This integration of parallel storage with independent transaction validation upholds the principles of decentralization while benefiting from the performance advantages of HPC resources.
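The overlap between validation and commit handling can be sketched with two threads and a hand-off queue. This is a single-process stand-in rather than the actual implementation: `queue.Queue` replaces the non-blocking MPI exchange, and `run_node`, `validate`, and `commit` are hypothetical names supplied by the caller.

```python
import queue
import threading


def run_node(batches, validate, commit):
    """Overlap validation and commit handling on one node using two threads.

    batches:  iterable of transaction batches.
    validate: callable(tx) -> bool, stand-in for smart contract validation.
    commit:   callable(valid_txs) -> result, stand-in for the consensus/commit step
              (in an MPI deployment this is where non-blocking sends/receives would go).
    """
    handoff = queue.Queue()   # validated micro-batches waiting for consensus
    committed = []

    def validation_worker():
        for batch in batches:
            handoff.put([tx for tx in batch if validate(tx)])
        handoff.put(None)     # sentinel: no more batches

    def commit_worker():
        while True:
            valid = handoff.get()
            if valid is None:
                break
            if valid:         # skip batches in which nothing validated
                committed.append(commit(valid))

    threads = [
        threading.Thread(target=validation_worker),
        threading.Thread(target=commit_worker),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return committed
```

While the commit thread is handling one batch, the validation thread is already free to work on the next, which is the overlap the dual-thread strategy relies on. The queue preserves batch order because there is a single producer and a single consumer.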
4. Evaluation and Results
4.1. Experimental Setup
4.1.1. Testbed
The experiments are conducted on an HPC cluster with 18 compute nodes (2× Intel Xeon Silver 4210R, 20 cores, 96 GB RAM) connected via an InfiniBand network to 240 TB of shared storage managed by Lustre [28]. Each node runs Red Hat Enterprise Linux 8, Open MPI, mpi4py, Python 3.12.2, and NumPy 1.19.5. Each compute node can emulate up to 20 virtual nodes using user-level threads. To achieve optimal performance in a time-shared system, we deploy the prototype across cluster sizes ranging from 10 to 100 nodes.
4.1.2. Workload
For our experiments, we use the financial transaction dataset [13] with varying transaction volumes, ranging from small batches of 2000 transactions to larger volumes of 30,000 transactions. We specifically leverage banking transactions because the algorithms used to validate and process these transactions are well-suited for optimizing large-scale simulations of economic systems and parallel data processing in scientific computing (e.g., climate modeling). These workloads involve complex transaction dependencies, requiring runtime analysis and parallel processing to enhance performance.
4.1.3. Systems for Comparison
As our first baseline, we deployed a two-phase concurrency control (2PCC) protocol with an optimized PBFT (Practical Byzantine Fault Tolerance) protocol for parallel smart contract execution [14], which aligns closely with SmartFlow’s strategy. Our second baseline was RapidChain [29], a recognized blockchain protocol that also uses PBFT but with the vanilla smart contract execution method. Both 2PCC and RapidChain focus on Byzantine fault-tolerant consensus and smart contract execution, so this pairing lets us evaluate our protocol against both parallel and conventional execution models under Byzantine settings. All systems were run under the same operational conditions to allow a direct performance comparison. We excluded proof-of-stake (PoS) protocols due to their susceptibility to centralization and other attacks such as the Sybil attack [30] [31]. The evaluation assesses the efficiency and effectiveness of SmartFlow relative to these baselines, focusing on their ability to execute smart contracts at scale within an HPC cluster.
4.2. Latency
We compared the latency of the SmartFlow framework against two state-of-the-art blockchain systems: the 2PCC (Two-Phase Concurrency Control) protocol and RapidChain. Latency refers to the time required to process a given number of transactions. The experiment was conducted on a 50-node blockchain network. Figure 4 presents the transaction processing performance of the three systems as the number of transactions increases from 2000 to 10,000. SmartFlow consistently outperforms both 2PCC and RapidChain. At 2000 transactions, it is approximately 3.9× faster than 2PCC and 5.7× faster than RapidChain. At 10,000 transactions, SmartFlow remains efficient, being 2.1× faster than 2PCC and 3.4× faster than RapidChain, demonstrating its scalability under higher workloads.
Figure 4. Latency comparison of SmartFlow and state-of-the-art blockchain systems on a 50-node cluster.
4.3. Throughput
In this section, we evaluate the throughput of SmartFlow, measured in transactions per second (TPS), in comparison to the 2PCC (Two-Phase Concurrency Control) protocol and RapidChain. The experiment maintains a constant workload of 1000 transactions while varying the number of nodes from 10 to 50. Figure 5 illustrates the TPS performance of SmartFlow alongside the other blockchain systems. SmartFlow consistently surpasses the baseline systems across all node scales, achieving up to 4.4× and 5.8× higher throughput than 2PCC and RapidChain, respectively. Thanks to its optimized 2PQC (Two-Phase Quorum Commit) consensus protocol, SmartFlow’s throughput scales efficiently even at larger network sizes, whereas both 2PCC and RapidChain experience performance degradation as the number of nodes increases.
Figure 5. Throughput comparison of SmartFlow and state-of-the-art blockchain systems on a 50-node cluster.
4.4. Scalability
This section evaluates the scalability of SmartFlow by measuring its performance under increasingly large workloads. We assess how the system’s latency scales with growing transaction volumes in an HPC environment, comparing it to the 2PCC (Two-Phase Concurrency Control) protocol and RapidChain. Figure 6 illustrates the observed latency, measured in seconds, across all systems. Throughout this experiment, we used a network of 100 nodes and varied the number of transactions from 10,000 to 30,000. SmartFlow significantly outperforms the baseline systems at all transaction scales, achieving up to 3× and 7.5× lower latency than the 2PCC protocol and RapidChain, respectively. As the workload increases, SmartFlow sustains its performance advantage over the other blockchain systems. The combination of the dual-layer parallel execution strategy and the lightweight 2PQC consensus protocol effectively manages the growing workload, reducing latency and improving overall performance across the distributed nodes.
Figure 6. Scalability analysis of SmartFlow and state-of-the-art blockchain systems in a 100-node HPC cluster.
5. Conclusion and Future Work
SmartFlow introduces two key innovations to enhance blockchain performance in high-performance computing (HPC) environments: 1) a fine-grained Transaction Sharding Graph (TSG) for maximizing concurrency in transaction processing and 2) a parallel smart contract execution model compatible with MPI, integrated with a lightweight Two-Phase Quorum Commit (2PQC) protocol optimized for managing larger workloads. Our design ensures efficient multi-node execution while leveraging remote shared storage for enhanced scalability. Experimental evaluations on an HPC cluster with up to 100 nodes demonstrate that SmartFlow achieves significantly higher throughput and lower latency than conventional consensus protocols. Future work will focus on 1) extending smart contract functionalities and enabling cross-chain interoperability to support a wider range of decentralized applications and 2) integrating SmartFlow with BAASH [2] to explore its scalability in extreme-scale workloads.