Blockchain Based Redistricting with Public Participation

Mahalingam Ramkumar; Naresh Adhikari

doi:10.4236/jis.2022.133009

Journal of Information Security > Vol.13 No.3, July 2022

Blockchain Based Redistricting with Public Participation

Mahalingam Ramkumar¹, Naresh Adhikari²
¹Mississippi State University, Starkville, MS, USA.
²Slippery Rock University, Slippery Rock, PA, USA.
DOI: 10.4236/jis.2022.133009 PDF HTML XML 136 Downloads 699 Views Citations

Abstract

Redistricting is the process of grouping all census blocks within a region to form larger subdivisions, or districts. The process is typically subject to some hard rules and some (soft) preferences to improve fairness of the solution. Achieving public consensus on the fairness of proposed redistricting plans is highly desirable. Unfortunately, fair redistricting is an NP hard optimization problem. The complexity of the process makes it even more challenging to convince the public of the fairness of the proposed solution. This paper proposes a completely transparent blockchain based strategy to promote public participation in the redistricting process, to increase public confidence in the outcome of the process. The proposed approach is based on the fact that one does not have to worry about how the NP hard problem was solved, as long as it is possible for anyone to compute a “goodness” metric for the proposed plan. In the proposed approach, anyone can submit a plan along with the expected metric. Only the plan with the best claimed metric needs to be evaluated in a blockchain network.

Keywords

Redistricting, Authenticated Data Structures, Blockchain Ledger

Share and Cite:

Ramkumar, M. and Adhikari, N. (2022) Blockchain Based Redistricting with Public Participation. Journal of Information Security, 13, 140-164. doi: 10.4236/jis.2022.133009.

1. Introduction

Redistricting is a process of subdividing a geographical region into districts such that each district meets similarity criteria such as geographic contiguity, equality of population, etc. Some of the basic principles [1] underlying the congressional redistricting process include 1) contiguity of districts, 2) equality of population size, 3) geographic compactness, 4) protection against vote dilution of minorities, 5) preservation of the boundaries of other political subdivisions, 6) promotion of electoral competition, 7) prohibition of partisan considerations, etc.

In the United States, congressional redistricting is performed every 10 years to account for dynamics in population. Specific redistricting principles, however, may differ among states. Congressional redistricting may be performed by a commission of political representatives, or by independent experts. Redistricting by a politically biased committee may lead to what is known as gerrymandering [2]. A gerrymandered redistricting process produces districts that give an advantage [3] [4] to one party over another in elections.

Ideally, redistricting seeks to create equal-population districts that are optimally compact, thereby optimizing some cost. A redistricting plan that maps b blocks to d districts has b^d possible solutions, making it an NP-hard problem [5]. While the notion of using sophisticated computer algorithms to automate redistricting to produce a fair district plan is appealing, the public often lacks trust in the computer platforms and algorithms. The lack of trust is not unjustifiable, especially for complex platforms needed to solve NP-hard problems. Sophisticated computer algorithms have been known to be misused to produce visually satisfying, but with subtle deliberate sub-optimal tweaks to create politically biased districts [5].

1.1. Problem Statement

Let the b blocks in a region R be $B_{1} \dots B_{b}$ . Any block $B_{i}$ is a polygon with $n_{i}$ sides, typically represented by a sequence of $n_{i} + 1$ boundary points (or vertices of a polygon) in the counter clockwise (CCW) order, viz.,

$B_{i} = {(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{n_{i}}, y_{n_{i}}), (x_{1}, y_{1})} .$ (1)

The last point is the same as the first point. Traversing the polygon in the CCW direction leaves the area enclosed by the polygon to the left. Figure 1 depicts a region R with 10 blocks, grouped into 3 districts.

Assigning the b blocks to d districts is a process of creating d groups $D_{1} \dots D_{d}$ among the b blocks, subject to some hard and soft constraints. For our purposes we shall assume that the region is a state, and the districts are congressional districts. Hard constraints dictate that the blocks in a group should be contiguous (a block should share at least one side with another block in the same group), no

Figure 1. A region R with 10 blocks grouped into 3 districts.

overlap between blocks, and no overlap between districts. Some soft constraints include equal-population, compactness, convexity, and proximity of blocks in a district to the “center” of the district. The center, could be a geographic centroid, or the population centroid (which pulls the geographic centroid closer to more populated blocks). Other soft constraints may include consideration of racial mix, annual income, political leanings, etc.

In the United States, census blocks are public data [6], available in shapefile (.shp) format [7]. The shapefiles for states range from 4MB for the District of Colombia, to 760 MB for Texas. The total number of census blocks in the U.S.A is over 6 million. For example, the state of California has $b = 710145$ blocks grouped into $d = 53$ districts.

1.1.1. The Challenge

The computing process of interest to us in this paper is required to take possibly up to a million blocks as input, where each block is represented by possibly hundreds or even thousands of points. Each block may also be associated with non geographic data like population, racial mix, average annual income, political leanings, etc. The input will also include a set of weights $w_{i} \dots w_{n}$ applied to various soft metrics to influence the output of the process, viz., the allocation of blocks to each group.

The utility of any computing process is ultimately limited by the extent of trust in the correctness of output of the process. In general, higher the complexity of the platform, the harder it is to ensure that every component of the platform is trustworthy. Given b blocks and d districts there are b^d possible ways of grouping the blocks ( $b^{d} = 710145^{53}$ for California). Given the enormity of the computational requirement at hand—choosing the best out of b^d options—it is far from easy to justify the trust in the correctness of the output.

One approach to enhance trust in the integrity of platforms is to minimize the trusted computing base (TCB), the minimal amount of hardware/software that needs to be trusted [8] to guarantee platform integrity. In von Neumann (VN) computers—which is almost every computing platform today—each step in the execution of any process (a machine instruction) requires memory access. If data previously stored in some memory location at time t_s is retrieved later at time t_r, it is simply assumed that the memory contents were not modified between times t_s and t_r. This potential time-of-check-time-of-use (TOCTOU) [9] vulnerability is simply ignored in the VN model, by including external memory inside the TCB.

Unfortunately, trust in external memory is unjustifiable in modern computers with a complex hardware/software stack [10] [11] [12] [13]. A significant number of attacks that exploit this weakness, such as RAM scraping [14], buffer overflow [15], file-less malware [16], DMA malware attacks [17], invasive or semi-invasive attacks [18] etc., have repeatedly demonstrated exploits that illegitimately read/write memory.

1.2. Proposed Solution

The proposed solution is motivated by the fact that one does not have to be concerned about how the computation was actually performed, as long as one can verify the results of the computation. For NP optimization problems, while computing a solution may have exponential complexity, verifying the correctness of a proposed solution is typically a polynomial time algorithm. However, even while verification is substantially easier, the scale of data can make it hard to promote confidence in computing platforms used for verification.

A blockchain network is essentially a computing platform where no component needs to be included inside the TCB [19] [20] [21]. In the proposed approach, the verification algorithm is executed in a blockchain network. Specifically, the verification algorithm computes a goodness metric [22] for any proposed solution.

Anyone can submit a solution. In order to deter submission of incorrect solutions, it can be mandated that each submission should explicitly indicate the final metric, and be accompanied by a stake. The entity submitting the solution will lose the stake if the computed metric does not match the claimed metric. If the stake is high enough to serve as a deterrent, only one proposed solution (the one that claims the best metric) will need to be executed in the blockchain network.

1.3. Organization

The rest of the paper is organized as follows. Section 2 begins with a discussion of various measures of optimality of redistricting plans. Section 2.1 provides a overview of an important component of the proposed solution, viz., hash tree based authenticated data structures (ADSes). Section 2.2 is an overview of blockchain networks as a platform for computing. More specifically, executing a process in a blockchain calls for a state-machine model for the process, where permitted state-changes are well-formed transactions.

Section 3 discusses 3 distinct steps involved in the verification of redistricting plans, and outlines the scope of different types of blockchain transactions in each step. Additional discussions and conclusions are offered in Section 4.

2. Overview

Let D be a simple polygon whose n sides are described by Cartesian coordinates $(x_{0}, y_{0}), (x_{1}, y_{1}), \dots, (x_{n - 1}, y_{n - 1}, (x_{0}, y_{0}))$ in the counter clockwise (CCW) direction, along the perimeter of the polygon. The area A [23], the perimeter P, and the coordinates of the centroid $C = (C . x, C . y)$ [24] of the polygon D, can be computed incrementally by considering 2 adjacent points at a time, as

$A = \frac{1}{2} \sum_{i = 0}^{n - 1} (x_{i} + x_{i + 1}) (y_{i + 1} - y_{i})$ (2)

$P = \sum_{i = 0}^{n - 1} \sqrt{{(x_{i} - x_{i + 1})}^{2} + {(y_{i} - y_{i + 1})}^{2}}$ (3)

$C . x = \frac{1}{6 A} \sum_{i = 0}^{n - 1} (x_{i} + x_{i + 1}) (x_{i} y_{i + 1} + x_{i + 1} y_{i})$

$C . y = \frac{1}{6 A} \sum_{i = 0}^{n - 1} (y_{i} + y_{i + 1}) (x_{i} y_{i + 1} + x_{i + 1} y_{i})$ (4)

Geometric compactness [25] [26] [27] is a commonly used metric for evaluating the suitability of a proposed solution. Several measures of compactness exist for a polygon D with area A and perimeter P:

1). Iso-perimetric ratio, computed as $I_{i p} (D) = A / P^{2};$

2) Reock ratio $R = A / A_{C}$ [25] where $A_{C}$ is the area of the smallest circle bounding the polygon D (with area A);

3) The convex hull ratio $H_{D} = A / A_{H_{D}}$ where $A_{H_{D}}$ is the area of the convex hull of D.

For a district D with k blocks, the moment of inertia is computed as $I (D) = \sum_{i = 1}^{k} w_{i} d_{i}^{2}$ , where $d_{i}$ is the distance between the centroid of the district and the centroid of block i. The weight $w_{i}$ for block i can be based on the area of the block, or the population of the block, or more often, a combination of both.

2.1. Authenticated Data Structures

An authenticated data structure (ADS) [28] - [32] is a mechanism for computing a cryptographic commitment s to any number of discrete data items in a collection $S$ . Useful ADSes, typically based on hash trees, possess efficient strategies to incrementally update the commitment $s (t)$ , with incremental changes to $S (t)$ .

Commitments are hashes, computed using a cryptographic hash function h(). Given $y = h (x)$ with pre-image x and digest y, it is impractical from a computational perspective to determine any $x^{'} \neq x$ satisfying $y = h (x^{'})$ . One can thus reasonably conclude that digest y was computed after x was chosen, or that x “existed before” its commitment y.

2.1.1. Merkle Hash Tree

A Merkle hash tree [33] is an example of an ADS where the root of the tree (a single hash) is a commitment to a set of dynamic leaves. Figure 2 depicts a tree with $s (t) = v_{0, 0}$ (root of the inverted tree) and $S (t) = {L_{0} \dots L_{7}}$ as the 8 leaves of the tree. For a balanced tree with $N = 2^{d}$ leaves, there are $2^{i}$ nodes in level $i =0,1, \dots, d$ , where $d = \log_{2} N$ is the maximum depth of the balanced tree.

A node $v_{r, k}$ at depth r, $0 \leq k < 2^{r}$ is computed as

$v_{r, k} = h (v_{r + 1,2 k}, v_{r + 1,2 k + 1}) .$ (5)

A leaf-node is obtained by hashing the contents of the leaf. In Figure 2, leaf-node $v_{3,5} = h (L_{5})$ .

Any node at depth r has r complementary nodes, and r ancestor nodes. The 3 blue nodes in the figure are ancestors of node $v_{3,5}$ . The red nodes—the node’s own “sibling” of $v_{3,4}$ , and the siblings $v_{2,3}$ and $v_{1,0}$ of its ancestors—are

Figure 2. A Merkle hash tree of depth $d = 3$ with $N = 2^{d} = 8$ leaves.

complementary to $v_{3,5}$ . The complementary nodes of $v_{3,5}$ are the verification objects (VOs) for node $v_{3,5}$ .

2.1.2. Prover-Verifier Protocols

For any one with access to only the root $v_{0,0}$ of the tree, knowledge of VOs of $v_{3,5}$ , viz., ${v_{3,4}, v_{2,3}, v_{1,0}}$ , constitutes proof of existence of $v_{3,5}$ in the tree. Note that setting $x = v_{3, 5}$ and following it with a sequence of 3 hash operations using the VOs, viz., $x = h (v_{3, 4}, x)$ , $x = h (x, v_{2, 3})$ , $x = h (v_{1, 0}, x)$ will yield $x = v_{0, 0}$ (the known root of the tree). Due to the one-way property of cryptographic hash functions, that $v_{3, 5} = h (L_{5})$ exists in the tree, is also proof of existence of leaf $L_{5}$ .

If there is a legitimate reason to update $v_{3,5} \to {v^{'}}_{3,5}$ , using the same sequence of hash operations starting with $x = {v^{'}}_{3,5}$ instead, one can compute the new value of the root, using the same VOs ${v_{3,4}, v_{2,3}, v_{1,0}}$ . It is important to note that VOs of $v_{3,5}$ are unaffected by changes to $v_{3,5}$ .

The need for an update to $v_{3,5}$ may be for purposes of changing the contents of leaf $L_{5}$ , or adding a new leaf by increasing depth of $L_{5}$ . For example, a new leaf (say) $L^{'}$ can be inserted alongside $L_{5}$ , making $v_{4, 10} = h (L_{5})$ the new leaf node of $L_{5}$ , and $v_{4,11} = h (L^{'})$ the leaf-node for the new leaf, thus changing $v_{3,5}$ to $v_{35} = h (v_{4, 10}, v_{4, 11})$ . Just as inserting a leaf promotes a leaf node to “a parent of 2 leaf nodes,” deletion of a leaf will result in “a parent of 2 leaf nodes” being demoted to a leaf node.

For a tree with N leaves, $\log_{2} N$ VOs can be used to prove

1) the existence of a specific leaf $L \in S (t)$ , or

2) that the new root is $s (t_{+})$ , corresponding to an update $L \to L^{'}$ to a leaf, or

3) that the new root is $s (t_{+})$ , corresponding to insertion/deletion of a leaf.

In prover-verifier protocols based on ADSes, provers store all leaves and nodes. Verifiers are assumed to know only the root $s (t)$ of the tree. Provers can prove specific properties regarding a data collection $S (t)$ (leaves of the tree) against the commitment $s (t)$ (root of the tree). If the total number of leaves is $| S (t) | = N$ , the proof takes the form of $O (\log_{2} N)$ VOs. Verifiers can verify the proof by computing $O (\log_{2} N)$ cryptographic hash operations using the $O (\log_{2} N)$ VOs, and comparing the end result to the known commitment $s (t)$ . The proofs also permit verifiers to update the commitment $s (t)$ corresponding to legitimate updates to $S (t)$ (changing the contents of leaves, inserting/deleting leaves, etc.).

2.1.3. Geographic Collections

In the proposed approach, the leaves (of a hash tree/ADS) will correspond to one or more collections of geographic elements like

1) a side of a polygon, specified by 2 points (2 sets of $(x, y)$ coordinates), or

2) a rectangular bounding box, also specified by 2 points (lower-left and upper-right corners), or

3) a more general key-value (k-v) collection.

For a geographic point $p = (p . x, p . y)$ , $p . x$ and $p . y$ are typically latitude and longitude coordinates. Collectively, geographic elements in a collection may represent more complex objects like a polygonal region [34].

Leaves in a boundary line collection are of the form $[p_{1}, p_{2}, v]$ , representing a boundary line connecting the 2 points. The value v may provide some information about the line. In such collections, an example of a meaningful update is to split a leaf $[p_{1}, p_{2}, v]$ (representing a boundary line) into 2 leaves as

$[p_{1}, p_{2}, v] \to {[p_{1}, p, v], [p, p_{2}, v]},$ (6)

at a redundant point p on the line. Such an incremental operation replaces a leaf with 2 leaves (or a new leaf is added to the tree).

In a rectangular bounding box collection, leaves are of the form $[c_{1}, c_{2}, v]$ , specifying the lower-left corner point $c_{1}$ and the upper-right corner point $c_{2}$ . A rectangular bounding box can also be split into 2, for example, by a vertical line at X coordinate value $c_{1} . x < x_{s} < c_{2} . x$ as

$[c_{1}, c_{2}, v] \to {\begin{array}{l} [c_{1}, c, v] \\ [c^{'}, c_{2}, v] \end{array} c . x = c^{'} . x = x_{s}, c . y = c_{1} . y, c^{'} . y = c_{2} . y .$ (7)

or across a horizontal line with Y-coordinate value $c_{1} . y < y_{s} < c_{2} . y$ , as

$[c_{1}, c_{2}, v] \to {\begin{array}{l} [c_{1}, c, v] \\ [c^{'}, c_{2}, v] \end{array} c . y = c^{'} . y = y_{s}, c . x = c_{1} . x, c^{'} . x = c_{2} . x .$ (8)

2.1.4. Key-Value Collections

In a key-value (k-v) collection [30] the leaves are of the form $[k, k_{n}, v]$ corresponding to a k-v pair ${k, v}$ with next-key $k_{n}$ .

The first leaf added to an empty tree should be of the form $[k, k, v]$ . A k-v pair ${k, v}$ can be added to the collection only if a leaf $[a, b, u]$ —a k-v pair ${a, u}$ with next-key b—already exists in the collection such that

$(a < k < b) OR (b \leq a < k) OR (k < b \leq a) .$ (9)

If the condition above is satisfied, the existing leaf $[a, b, u]$ is replaced with two leaves

1) $[a, k, u]$ : old k-v pair ${a, u}$ , but with K as next-key, and

2) $[k, b, v]$ : new k-v pair ${k, v}$ with b as next-key.

2.1.5. Hierarchical Hash Trees

ADSes constructed using hash trees can have hierarchical structures. For example, one or more values in a leaf can themselves be a hash tree root.

As an example, assume that the verifier knows only the root $ξ$ of a hierarchical ADS. Two sets of VOs can be used to prove that

1) “boundary line $[p_{1}, p_{2}]$ exists in a tree with root $ξ_{b}$ ” and

2) “ $[b, ξ_{b}]$ exists in a tree with root $ξ$ .”

Such hierarchical collections will be very useful for our purposes. For example, the tree with root $ξ$ may have a leaf for every block in a region. The value of the leaf for a block $B_{i}$ may include a root $ξ_{v}$ for a collection of boundary lines for the block.

2.1.6. State-Change Functions

The practical utility of such ADSes is that any entity with access to merely the root s of the hash tree (with such geometric objects as leaves), can verify proof (using VOs) that “a boundary line $(p_{1}, p_{2})$ exists in a polygon represented by hash tree with root s, and that splitting the line into two lines $(p_{1}, p)$ and $(p, p_{2})$ will result in a new commitment $s^{'}$ .”

The ability to enable verification of useful properties regarding process specific data $S (t)$ , and more importantly, update commitments in step with incremental changes to $S (t)$ , can be leveraged using “some external mechanism” to maintain universal consensus on the dynamic commitment $s (t)$ at all times.

Such a mechanism is a blockchain network, where updates to $S (t)$ (and hence, the commitment $s (t)$ ) are due to well-formed transactions. Specifically, different types of transactions may be permitted for a process (for example, splitting a leaf, splitting a bounding box, etc.), each of which results in an easily calculable change to the commitment $s (t)$ .

More generally, given a transaction T of type k, the current commitment $s (t)$ , and a small set of VOs v, any one can reliably calculate the updated commitment as

$s (t_{+}) = F_{k} (s (t), T, v) .$ (10)

In other words, validating the correctness of a state-change function $F_{k} ()$ (for a transaction type k) will call for $\log_{2} N$ hash operations, using $\log_{2} N$ VOs $v$ . In blockchain networks, the transaction T triggering the state-change, and the updated state $s (t_{+})$ after the transaction, are logged in the blockchain ledger.

2.2. Blockchain Networks

A blockchain network is a mechanism that employs a broadcast channel for maintaining consensus on the contents of a blockchain ledger. Only well-formed transactions (also broadcast over the channel) can be added to the ledger. What makes a transaction well-formed will depend on process specific rules.

2.2.1. Ideal vs Practical Blockchain Networks

In an ideal blockchain network, participation is universal. The broadcast channel is ideal—every participant will hear every broadcast at the same time; every participant can then decide on their own, if the transaction is well-formed, and if so, add the transaction to the ledger. Every ledger entry changes the ledger-hash—a cryptographic commitment to the entire ledger. Specifically, if the current ledger with i transactions has a ledger hash $d_{i}$ , then the ledger hash after adding the next transaction $T_{i + 1}$ is $d_{i + 1} = h (d_{i}, T_{i + 1})$ .

If everyone is honest, everyone will maintain identical ledger copies, with the same ledger-hash. Broadcasting the succinct ledger hash makes it easy for every one to confirm that they do indeed maintain identical ledger copies.

In practical blockchain networks a ledger entry is not a transaction, but a commitment to a block of transactions. More specifically, each transaction is seen as a leaf of a hash tree, and the root of the tree is the commitment to the entire block; the ledger hash is thus a commitment to a “chain of block (hashes).”

In practical networks, only a small fraction of users may participate actively at any time. The broadcast channel, typically realized using a multi-hop peer to peer network, is far from ideal. Broadcasts may be heard at different times by different participants depending on the number of hops. Some participants may never hear some broadcasts. Furthermore, we cannot simply assume all participants to be honest. Even honest participants may sometimes have genuine disagreements regarding the correctness of a transaction. Even differences in time-of-arrival of transactions can cause differences between blocks added to different copies of ledgers, thereby creating forks in the ledger.

In spite of the practical limitations, it is possible to use well designed incentive mechanisms [35] to ensure that a) only well-formed transactions are added to the ledger, and b) close to universal consensus is maintained on the correctness of ledger entries. An incentive mechanism has three main goals a) motivate participation to ensure a reasonably large number of participants at all times; b) serve as a mechanism for penalizing ill-formed ledger entries; and c) minimize the chance of long-term survival of a forked ledgers.

2.2.2. Explicit vs Implicit States

A blockchain ledger is a log of state-changes due to transactions. For example, in crypto-currency applications, a transaction $[T (t) : A \to B, c]$ at time t transfers c coins from wallet A to wallet B, reducing wallet A balance by c and increasing wallet B balance by c. The state of the application at any time t can be seen as the remaining balance is every wallet.

For this application, the transaction is considered as well-formed only if a) the transaction is signed by A, and if b) A had c or more coins prior to the transaction. Ill-formed transactions are simply ignored, and thus do not change the state of the application. As an example that is more relevant to this paper, a transaction to split a boundary line $[p_{1}, p_{2}, v]$ into $[p_{1}, p, v]$ and $[p, p_{2}, v]$ is considered as well-formed only if p lies on the line connecting $p_{1}$ and $p_{2}$ (as splitting the line at a redundant point p does not change the shape or area inside the polygon).

In some networks like Bitcoin [36] the process state after each transaction is not made explicit in the transactions. In Ethereum [37], on the other hand, every transaction is accompanied by a cryptographic commitment to the updated state of the application/process affected by the transaction. In this paper, we restrict ourselves to transactions with explicit states.

2.2.3. Progression of Process States

In general, any process $P$ , with process states $S$ , can be executed in a blockchain network, if the process is modeled as state-machine. In such a model, a process $P$ is defined by m types of transactions (or state-change functions). A transaction $T^{j}$ of type j is executed only if $T^{j}$ is well-formed. Execution if $T^{j}$ results in a change in the process state $S$ . The progression of states of process $P$ can be represented as

$S_{0} \overset{T_{1}^{j_{1}}}{\to} S_{1} \overset{T_{2}^{j_{2}}}{\to} S_{2} \dots S_{n - 1} \overset{T_{n}^{j_{n}}}{\to} S_{n}, \dots$ (11)

The utility of ADSes stems from their ability to capture any number of dynamic process states as leaves of a (possibly hierarchical) hash tree, whose root s is a commitment to all process states. The progression of states changes in Equation (11) can equivalently be represented as

$s_{0} \to_{V O_{1}}^{(T_{1}^{j_{1}})} s_{1} \to_{V O_{2}}^{(T_{2}^{j_{2}})} s_{2} \dots s_{n - 1} \to_{V O_{n}}^{(T_{n}^{j_{n}})} s_{n}, \dots,$ (12)

where the VOS accompanying each transaction serves as “proof of correctness” of the state-change.

3. Redistricting Verification Protocol

Given a set of blocks $B_{1} \dots B_{b}$ of a region R and districts $D_{1} \dots D_{d}$ , any solution to the redistricting problem can be seen as a collection of b key-value pairs ${B_{i}, D_{j}},1 \leq i \leq b,1 \leq j \leq d$ . The conditions that need to be satisfied for the solution to be considered as correct are as follows:

1) No block is left unassigned;

2) No block is assigned to more than 1 district;

3) Blocks assigned to a district are contiguous.

While numerous soft restrictions can be imposed, in this paper we shall limit ourselves to equal-population and compactness metric. We shall use the normalized standard deviation of the population as a measure for deviation of population between districts. For the compactness metric we can use the moment of inertia I, or more specifically

$I_{j} = \sum_{j = 1}^{d} \sum_{i = 1}^{n_{j}} d_{i}^{2} A_{i j}$ (13)

where $n_{j}$ is the number of blocks in district $D_{j}$ , $A_{i j}$ is the area of the i^th block in district $D_{j}$ , and $d_{i}$ is the distance between the centroids of district $D_{j}$ and the i^th block.

3.1. ADSes for Blocks, Districts and State

As the input to the process may involve tens of millions of points, and as data stored on general purpose computers are far from secure, the process involves constructing different types of ADSes such that a single commitment can be used to verify the integrity of any geographic data item, and correctness of any transaction performed to change data. For geographic data involving a billion points (or $N \approx 2^{30}$ ), only of the order of $\log_{2} (N) = 30$ VOS are required for verifying proof of integrity of any data item or proof of correctness of any transaction that makes an incremental change to data.

The verification protocol can be seen as consisting of 3 steps,

1) Constructing a block-ADS with b blocks, for validating the inputs, viz., points corresponding to every block.

2) Constructing a district-ADS with d districts, for evaluating a proposal specified by a key-value collection (with b key-value pairs of the form ${B_{i}, D_{j}}$ , conveying that $B_{i}$ is assigned to district $D_{j}$ ).

3) Constructing a “state-level” ADS.

Processes in the first step ensure that the possibly tens of millions of points representing possibly hundreds of thousands of blocks do indeed constitute a set of valid polygonal regions. For a block with n points, $O (n)$ transactions will be called for; the total number of transactions will be proportional to the total number of points in the entire state. Consequently, the first step is possibly the most expensive one. Fortunately, this step will need to be performed only once. Once a block ADS is available for all blocks in the state, any one can access the data and confirm the validity of the block boundaries. The same block ADS can be used for any number of future redistricting attempts, as block boundaries typically remain stable.

The second step is a process for every district in the state. The complexity of such processes will depend on the number of district boundary points, which can be substantially less than the total number of points in the entire state. The purpose of the second step is to ensure that there are no overlaps between blocks included in any district, and to compute various metrics for each district.

The third step is a process for the entire state. The number of transactions for this process will be proportional to the number of state boundary points. The purpose of this is to ensure that districts do not overlap, and to compute overall metric for the state, and verify that the proposed metric is indeed correct.

The second and third steps will need to be repeated for every proposal. However, as mentioned earlier, if there is sufficient deterrent to incorrect proposals, only the solution with the best claimed metric will need to undergo the last 2 steps.

3.1.1. ADS Structures and Notations

The ADSes used at various steps include

1) the block ADS with root $ξ_{B}$ , which is a collection with key-value pairs of the form ${B_{i}, v_{i}}$ ;

2) the District-ADS with root $ξ_{D}$ , also a key-value collection with key-value pairs of the form ${D_{j}, u_{j}}$ ;

3) the “redistricting proposal,” a key-value collection with root $ξ_{B D}$ , with pairs of the form ${B_{i}, D_{j}}$ .

4) a state ADS $ξ_{S}$ with a single leaf $[s]$ .

Values like $v_{i}$ for block $B_{i}$ ( ${B_{i}, v_{i}}$ in tree with root $ξ_{B}$ ) and $u_{j}$ for district $D_{j}$ ( ${D_{j}, v_{j}}$ in tree with root $ξ_{D}$ ) and value s for the state, include several values like area, centroid, population, multiple hash tree roots (for boundary-line collections, bounding box collections, key-value collections), and multiple point values needed to keep track of the progress of the process. Values associated with each polygonal structure like a block or a district or a state are summarized in Table 1.

For a block $B_{i}$ , $v_{i}$ conveys 10 values (in the top 5 rows of Table 1). The value $u_{j}$ of a district $D_{j}$ conveys 14 values (4 additional values $A_{u}, p_{c u}, I$ and $b_{c t}$ ). The single state polygon is associated with 24 values.

In the rest of the paper we shall use the following notations for such values associated with blocks, districts and the state. For example, $v_{i} . p_{b}$ represents a point $p_{b} = (p_{b} . x, p_{b} . y)$ in value $v_{i}$ of block $B_{i}$ (in the block-ADS $ξ_{B}$ ). Tree

Table 1. Values associated with polygonal blocks, districts and state.

roots in value $v_{i}$ are represented as $v_{i} . ξ_{u}$ , $v_{i} . ξ_{v}$ , etc. Similarly, $u_{j} . p_{b}$ , and $u_{j} . ξ_{u}$ represent a point and a tree root in the value $u_{j}$ of district $D_{j}$ , in the district-ADS $ξ_{D}$ . For values associated with the state we shall use notations like $s . p_{e}, s . n, s . ξ_{v}$ , etc.

All ADSes begin as empty trees (or $ξ_{B} = ξ_{D} = ξ_{B D} = ξ_{S} = 0$ ), and are incrementally constructed by blockchain transactions. In the rest of the paper we shall refer to “a tree with root $ξ$ ” as simply “tree $ξ$ .” We shall also use the notation $L \in ξ$ to imply that “L has been verified to exist in the tree with root $ξ$ (using appropriate VOs).” For hierarchical collections such as the block ADS with root $ξ_{B}$ , $L \in ξ_{B} . v_{i} . ξ_{r}$ implies $L \in v_{i} . ξ_{r}$ , where ${B_{i}, v_{i}} \in ξ_{B}$ .

3.2. Validating Polygons

A high level overview of the process for validating a polygonal region can be understood from Figure 3 with an eight sided polygon. By performing 2 types of simple operations, viz., splitting a rectangular bounding box (splits across red lines in the figure), and splitting boundary lines (at 6 bold points in the figure) it is possible to ensure that every boundary line fits wholly inside a rectangular box.

Some restrictions are imposed on mapping boundary lines to bounding boxes [34] to ensure that lines cannot cross each other inside a box, and make it easy to set the value v of the box to convey the lines mapped to the box. The restrictions are 1) no more than 2 lines may be mapped to a bounding box; 2) every line should start and/or stop at a corner of the BB; 3) in boxes with 2 boundary lines, both lines should be adjacent, and meet at a corner of the bounding box.

For the example in Figure 3, 6 transactions for splitting boundary lines at bold dots, and 20 transactions for splitting a bounding box vertically or horizontally (along red lines) will be needed to create conditions necessary for other transactions that map 1 boundary line, or map 2 adjacent boundary lines, to a

Figure 3. Process for validating a polygonal region.

bounding box. For the example in Figure 3, 8 transactions that map a single line, and 3 transactions that map 2 lines (3 red shaded boxes) are called for.

The boundary lines are initially available in the un-validated collection $ξ_{u}$ . When a line from this collection $ξ_{u}$ is shown to fit in a bounding box in $ξ_{r}$ , the line is moved to a collection $ξ_{v}$ of verified boundary lines (i.e., a leaf is deleted from the tree with root $ξ_{u}$ and added to the tree with root $ξ_{v}$ ). If all lines are mapped successfully, $ξ_{u}$ will become empty, and $ξ_{v}$ will hold all verified lines.

Lines have to be mapped sequentially, in the CCW order of points. A point value $p_{s}$ is used to remember the starting point; point values $p_{b}$ and $p_{e}$ keep track of the beginning and end of the last-mapped line (by a previous transaction). To map a line $[p_{1}, p_{2}, v]$ it is necessary that $p_{1} = p_{e}$ (the beginning of the current line is the end of the last-mapped line). An exception is the first line, for which $p_{e} = p_{b} = 0$ , and causes $p_{s} = p_{1}$ to be set. The last line, with $p_{2} = p_{s}$ , will signify completion of mapping.

A transaction that maps a line $[p_{1}, p_{2}, v]$ incrementally computes the area as in Equation (2), and the centroid as in Equation (4). Specifically, the value A and coordinates $(p_{c} . x . p_{c} . y)$ of the centroid are incremented as

$\begin{array}{l} A + = p_{1} . x p_{2} . y - p_{2} . x p_{1} . y \\ p_{c} . x + = (p_{1} . x p_{2} . x) (p_{1} . x p_{2} . y + p_{2} . x p_{1} . y) \\ p_{c} . y + = (p_{1} . y p_{2} . y) (p_{1} . x p_{2} . y + p_{2} . x p_{1} . y) \end{array}$ (14)

Transactions that map 2 adjacent lines $[p_{1}, p_{2}, v]$ and $[p_{2}, p_{3}, v]$ lines will repeat the same computation for the second line. Unless the unverified boundary lines points were specified in the CCW order, the computed area will be negative.

The constraints on mapping lines to bounding boxes prevent two lines from crossing each other inside a box. However, they do not prevent them from crossing at boundary points. Figure 4 illustrates such a situation with a bad polygon ABCD, in which AD and CB cross each other at point E. As shown in the figure, even with the restrictions, the polygon can still be mapped by splitting bounding boxes along dotted lines, to create a corner at E.

Figure 4. Redundant points in “bad” polygons.

Fortunately, unlike redundant points like $D^{'}$ (added by splitting side DA) and $A^{'}$ (by splitting side AB) in the figure, the redundant point E will occur twice in the polygon, as both DA and BC will need to be split at point E to “illegally” map the lines. Imposing a restriction that redundant points should be unique (as a well-formedness condition for transactions that map lines) will make it impossible to map polygons with intersecting sides.

When a line $[p_{1}, p_{2}, v]$ is mapped, the value $p_{b}$ (of the last-mapped-line) can be used to determine if $p_{1}$ is a redundant point (i.e., if $p_{1} = p_{e}$ lies on the line connecting $p_{b}$ and $p_{2}$ ). If so, the point is added to a key-value collection $ξ_{t}$ as a key. If there are duplicate redundant points, the inability to add a redundant point to the key-value collection will cause the transaction to be deemed as ill-formed.

3.2.1. The “Value” of a Bounding Box

The value v in a bounding box $[c_{1}, c_{2}, v]$ can be set by transactions that map 1 or 2 lines to determine which areas inside the box fall inside the polygon [34]. The value v in every box with 1 or 2 mapped lines indicates i) a type $τ$ ; ii) an offset o; and iii) traversal direction (a binary flag) r. The 3 values unambiguously convey which parts of the bounding box (above/below a diagonal, above/below or to the left/right of a side of the box, or inside/outside the wedge shape formed by 2 lines) fall inside/outside the district.

15 types of bounding boxes can be created by mapping lines; type $τ = 0$ with no lines; 2 types (types 1 and 2), where the line is a diagonal (one with positive slope, one with negative slope); 4 types (types 3 to 6) where a vertical or horizontal boundary line is one of the 4 sides of the box; and 8 types (types 7 to 14) with 2 lines in the box; the 8 different types are due to the 4 possible corners where the 2 lines meet, and if the shorter line is above/below the diagonal.

To uniquely identify the coordinates of the mapped line(s) only the type is needed for types 0-6. For types 7-14 (boxes with two lines) we also need an offset corresponding to the shorter line; the offset is $o = 0$ if the shorter line is a side of the box (the offset o for a block labeled f is shown in Figure 3).

Apart from the type $τ$ and offset o, the direction of traversal inside the block is required to identify which region(s) inside the box are inside the polygon. For example, in Figure 3 while both blocks marked a and b have the same type (positive-slope diagonal) the traversal direction in block a makes the region above the diagonal inside the polygon, while in box b the region below the diagonal is inside the polygon. Similarly, in box f the wedge formed by 2 lines is inside the polygon while in block g the wedge is outside the polygon.

3.3. Transactions for Constructing Block ADS

A transaction InitBlock( $B_{i}, k, k_{n}, ξ_{u}, n$ ) adds a k-v pair ${B_{i}, v_{i}}$ for key $B_{i}$ (block identifier), by splitting an existing k-v pair for key k with next key $k_{n}$ . For the first InitBlock() transaction (to add the first block to an empty tree with $ξ_{B} = 0$ ), $k = k_{n} = B_{i}$ .

The value $v_{i}$ in k-v pair ${B_{i}, v_{i}}$ is initialized as follows:

$\begin{array}{l} v_{i} . ξ_{u} = ξ_{u}, v_{i} . n = n \\ v_{i} . ξ_{v} = v_{i} . ξ_{r} = v_{i} . ξ_{t} = 0 \\ v_{i} . A = v_{i} . p_{c} = v_{i} . p_{s} = v_{i} . p_{b} = v_{i} . p_{e} = 0 \end{array}$ (15)

with all values set to 0—except the population n (specified by the transaction) and the root $ξ_{u}$ of unverified boundary lines.

For a region R with b blocks, b InitBlock() transactions will need to be executed to create a leaf for every block in the block ADS.

3.3.1. Validating Polygon B_i

The types of transactions executed in the blockchain for validating any block is the same for every block. The order in which transactions are executed will be block specific (depending on which specific lines/boxes should be split).

Transactions for validating block $B_{i}$ affect only values in $v_{i}$ (in the k-v pair ${B_{i}, v_{i}} \in ξ_{B}$ ). For simplicity of notation, in this section, we shall simply refer to a root $v_{i} . ξ_{r}$ as simply $ξ_{r}$ .

Leaves in tree $ξ_{r}$ are rectangular bounding boxes of the form $[c_{1}, c_{2},0]$ , with lower left corner at $c_{1} = (x_{1}, y_{1})$ and upper right corner at $c_{2} = (x_{2}, y_{2})$ .

Leaves in tree $ξ_{u} θ$ and $ξ_{v}$ are boundary lines of the form $[p_{1}, p_{2}, v]$ .

The tree $ξ_{t}$ is a key-value collection. A leaf $[p_{1}, p_{2}, v] \in ξ_{t}$ , implies existence of keys $p_{1}$ and $p_{2}$ and absence of keys between $p_{1}$ and $p_{2}$ (the value v has no meaning).

The 5 transaction types used for validating blocks are as follows:

1) SplitLine( $B_{i}, p_{1}, p_{2}, p$ ): Split boundary line $[p_{1}, p_{2}, B_{i}] \in v_{i} . ξ_{u}$ at a point p; this transaction is well-formed only if p lies on the line;

2) SplitBB ( $B_{i}, c_{1}, c_{2}, v, d i r$ ): Split BB $[c_{1}, c_{2},0]$ in $v_{i} . ξ_{r}$ vertically at X-coordinate v or horizontally at v (depending on the value dir; 0 for vertical and 1 for horizontal). If the tree is empty, the value v is ignored to create the first leaf $[c_{1}, c_{2},0]$ ;

3) MapLine ( $B_{i}, p_{1}, p_{2}, c_{1}, c_{2}$ ): Delete leaf $[p_{1}, p_{2}, B_{i}] \in v_{i} . ξ_{u}$ and add it to $v_{i} . ξ_{v}$ ; this transaction is valid only if the line falls entirely inside box $[c_{1}, c_{2},0] \in v_{i} . ξ_{r}$ , and $p_{1}$ and/or $p_{2}$ is a corner of the bounding box; furthermore, if $p_{1}$ is found to be redundant, add k-v pair ${p_{1}, x}$ to $ξ_{t}$ (the value x is irrelevant); update area and centroid;

4) Map2Line ( $B_{i}, p_{1}, p_{2}, p_{3}, c_{1}, c_{2}$ ): Move 2 leaves $[p_{1}, p_{2}, B_{i}]$ and $[p_{2}, p_{3}, B_{i}]$ from tree $v_{i} . ξ_{u} θ$ to tree $v_{i} . ξ_{v}$ , if both lines fall entirely inside box $[c_{1}, c_{2},0] \in v_{i} . ξ_{r}$ , and the common point $p_{2}$ is a corner of the box; if $p_{1}$ is found to be redundant, add k-v pair ${p_{1}, x}$ to $ξ_{t}$ ; if $p_{2}$ is found to be redundant, add k-v pair ${p_{2}, x}$ to $ξ_{t}$ ; update area and centroid;

5) Finalize ( $B_{i}$ ) This transaction is invoked to mark completion of verification of the polygon/block $B_{i}$ .

Only if the points in $ξ_{u}$ were specified in the CCW order, will the computed area be positive. A transaction Finalize ( $B_{i}$ ) is well-formed only if $v_{i} . p_{s} = v_{i} . p_{e}$ (mapping completed), $A > 0$ (mapping was performed in the CCW order) and $v_{i} . ξ_{u} = 0$ (all input lines were correct). The value A (area) is divided by 2 (see Equation (2)); values $p_{c} . x$ and $p_{c} . y$ are then divided 6A (see Equation (4)).

This process is repeated for every block to complete the construction of the ADS $ξ_{B}$ .

3.4. Creating and Verifying Proposals

As a redistricting proposal, the proposer submits a claimed metric G, and the commitment $ξ_{B}$ to the input to the process (the block ADS).

A transaction Proposal ( $ξ_{B}, g, m$ ) is invoked, and results in creation of a commitment $ξ$ to 24 values in state-polygon information in s. This initializes 21 values to 0; only 3 values are set to non zero values, using the input to the transaction:

$\begin{array}{l} s . ξ_{B} = ξ_{B} & Completed Block ADS \\ s . G = g & Unverified Metric \\ s . m_{u} = m & Unverified Population Mean \end{array}$ (16)

Just as the block ADS with root $ξ_{B}$ was created to summarize information regarding every block, the process of validating the redistricting proposal results in the creation of

1) a district ADS with root $ξ_{D},$

2) a key value collection $ξ_{B D}$ with block identity as key and district identity as value, and

3) the state ADS (with a single leaf), with root $ξ_{S}$ .

On completion of the process, the district ADS will contain district wide information like area, centroid, population, number of blocks, moment of inertia, etc., for every district, and the statewide metrics will be available in $ξ_{S}$ .

Successful execution of all block validation protocols simply demonstrates that every block is a valid polygon. Successful execution of processes for updating $ξ_{D}$ and $ξ_{B D}$ for every district will confirm that no overlaps exist in any of the blocks in any specific district. This is due to the fact that the district area is computed in 2 ways, by summing up areas of all blocks included in the district, and by “walking” along the boundary of the district (computed using Equation (2)).

However, it still does not rule out possible overlaps between blocks in different districts (and consequently, overlaps between districts). The final process which directly manipulates values in the lone leaf [s] in $ξ_{S}$ involves creation of boundary lines for the entire state to confirm that no overlap exists between districts. Once again, this is achieved by computing the area in 2 ways.

In other words, to prove that $G = G^{'}$ (the correctness of his/her proposal)

1) The proposer will need to execute different types of transactions that create/update the district ADS $ξ_{D}$ and key value collection $ξ_{B D}$ ; this will involve processes for each district, for computing district-wide totals like population, area, moment of inertia, etc.

2) The proposer will then need to execute transactions to verify the boundary lines of the entire state, and compute state-wide totals (total moment of inertia I, and the normalized variance $v_{n}$ , and the final metric $G^{'}$ ).

Every district that has been considered to create state-wise totals (population, area, etc.) will be added to a key-value collection $s . ξ_{d}$ to ensure that no district is double counted. It is in this final process that the a unverified mean population $s . m_{u}$ is used to compute normalized variance $s . v_{n}$ , total moment of inertia, total number of blocks/districts ( $s . b_{c t}$ and $s . d_{c t}$ ) inside the state, and the computed metric $s . G^{'}$ (which should be equal to $s . G$ ). However, by the end of the process a mean $s . m_{v}$ is computed and verified to be the same as unverified $s . m_{u}$ . Once the proposal has been verified, all boundary lines of the state will be in a tree with root $s . ξ_{v}$ .

3.5. District ADS Construction

The district ADS with root $ξ_{D}$ has key-value pairs of the form ${D_{j}, u_{j}}$ ). A transaction InitDistrict ( $D_{j}, k, k_{n}, p_{c u}$ ) is invoked to add a k-v pair for district $D_{j}$ (for the first leaf $k = k_{n} = D_{j}$ ). Only a single value in $u_{j}$ , viz., $u_{j} . p_{c u}$ , is initialized by the transaction as the (unverified) centroid of the district. All other (13) values are set to 0.

During the validation process, the area of the district will be computed incrementally in 2 different ways in $u_{j} . A$ (by MapLine and Map2Lines transactions) and $u_{j} . A_{u}$ , by adding the areas of all blocks added to $ξ_{B D}$ . The unverified centroid in $u_{j} . p_{c u}$ will be ultimately validated by a computed centroid in $u_{j} . p_{c}$ . The moment of inertia will be computed for each district using the unverified centroid $u_{j} . p_{c u}$ , and the computed values of block centroids and block areas in the block ADS $ξ_{B}$ . The value $u_{j} . b_{c t}$ is the count of unique blocks added to the district.

In the block validation process, a leaf for a block $B_{i}$ was initialized with the root $ξ_{u}$ (collection of un-validated block lines). However, a transaction to initialize a leaf for a district $D_{j}$ sets $ξ_{u} = 0$ .

To populate $u_{j} . ξ_{u}$ with outer district lines for $D_{j}$ , the prover will need to choose appropriate lines from block-ADS $ξ_{B}$ . The prover is allowed to choose any validated boundary line $[p_{1}, p_{2}, B_{i}] \in ξ_{B} . v_{i} . ξ_{v}$ , for any block $B_{i}$ , and

1) add the line to $u_{j} . ξ_{u}$ , and

2) add ${B_{i}, D_{j}}$ to k-v collection $ξ_{B D}$ (only if k-v pair does not already exist).

Any number of lines may be added from the same block $B_{i}$ . However, only for the first line from a specific block $B_{i}$ , the k-v pair ${B_{i}, D_{j}}$ is added to $ξ_{B D}$ . For subsequent lines, the existence of ${B_{i}, D_{j}} \in ξ_{B D}$ is merely confirmed. The fresh addition of a key-pair ${B_{i}, D_{j}} \in ξ_{B D}$ is accompanied by the following steps

1) $u_{j} . A_{u}$ is incremented by the computed area of block $B_{i}$ , $ξ_{B} . v_{i} . A$ ;

2) $u_{j} . I$ (moment of inertia) is incremented by using i) the unverified centroid $c_{1} = u_{j} . p_{c}$ of the district, 2) computed centroid $c_{2} = ξ_{B} . v_{i} . p_{c}$ of block $B_{i}$ and 3) computed area $A = ξ_{B} . v_{i} . A$ of block $B_{i}$ . Specifically, $u_{j} . I$ is incremented by $A r^{2}$ where r is the distance between the district centroid $c_{1}$ and block centroid $c_{2}$ ;

3) $u_{j} . b_{c t}$ (number of blocks in the district) is incremented by 1.

The prover can also merge 2 lines $[p_{1}, p_{2}, v]$ and $[p_{2}, p_{3}, v]$ in the tree $u_{j} . ξ_{u}$ of yet-to-be-verified boundary lines into $[p_{1}, p_{3}, v]$ if $p_{2}$ is found to be a redundant point.

The purpose of such transactions are 2 fold: i) to form the boundary line of the district $D_{j}$ in tree $u_{j} . ξ_{u}$ , and 2) include at least one line from every block in the district. The process for validating the district polygon in $u_{j} . ξ_{u}$ will be very similar to the process for validating a block. Unless a line is chosen from every block inside the district, the values $u_{j} . A$ (computed during district validation) and $u_{j} . A_{u}$ (computed by adding areas of unique blocks to $ξ_{B D}$ ) will not match.

Once all necessary border lines have been copied to the tree $u_{j} . ξ_{u}$ , transactions SplitLine(), SplitBB(), MapLine(), Map2Lines() can be used to ensure that the district lines of $D_{j}$ in $u_{j} . ξ_{u}$ correspond to a simple polygon.

However, after all border lines have been mapped to bounding boxes, some internal lines will still remain in $u_{j} . ξ_{u}$ , as at least one line had to be added from every internal block in the district. To be able to remove such lines from $u_{j} . ξ_{u}$ it is essential to demonstrate that such lines fall inside district $D_{i}$ . To make this possible, the transactions MapLine(), Map2Lines() set the value v of bounding box leaves $[c_{1}, c_{2}, v] \in u_{j} . ξ_{r}$ .

For internal bounding boxes with no lines, the value v will not be set by MapLine()/Map2Lines() transactions. A special transaction has to used to set the box value depending on the value of any adjacent box. For example the value of box c can be determined by examining the value of an adjacent box like d or e. With all boxes marked, in this manner, it is now possible to check if a point p lies inside the polygon (district $D_{j}$ ), by examining a single box in $u_{j} . ξ_{r}$ in which point p falls, and examining the value v of the bounding box. This test can be used to throw away remaining lines in $u_{j} . ξ_{u}$ that were mapped from internal blocks, but were not used to create the district boundary.

The transactions for validating a district are as follows: A transaction ChooseLine ( $D_{j}, p_{1}, p_{2}, B_{i}$ )

1) chooses a line $[p_{1}, p_{2}, B_{i}] \in ξ_{B} . v_{i} . ξ_{v}$ from the input ADS (a k-v pair ${B_{i}, v_{i}} \in ξ_{B}$ indicates the root $v_{i} . ξ_{v}$ of a validated boundary tree, in which this leaf $[p_{1}, p_{2}, B_{i}]$ should exist);

2) adds the line to tree $u_{j} . ξ_{u}$ ;

3) checks if ${B_{i}, D_{j}} \in ξ_{B D}$ , and adds the pair if it does not exist;

4) if ${B_{i}, D_{j}}$ was added by the transaction the following additional steps are performed;

a) increment $u_{j} . A_{u}$ using the area of block $B_{i}$ in the block ADS;

b) increment population $u_{j} . n$ by population of block $B_{i}$ in block ADS;

c) increment moment of inertia using unverified district centroid, and computed centroid and area of block $B_{i}$ in block ADS.

After district lines have been input to $u_{j} . ξ_{u}$ , they are validated using the following transactions

1) SplitLine ( $D_{j}, p_{1}, p_{2}, p$ ), for splitting boundary lines in $u_{j} . ξ_{u} .$

2) SplitBB ( $D_{j}, c_{1}, c_{2}, v, d i r$ ), for splitting bounding boxes in $u_{j} . ξ_{r} .$

3) MapLine ( $D_{j}, p_{1}, p_{2}, c_{1}, c_{2}$ ), for moving a line from $u_{j} . ξ_{u}$ to $u_{j} . ξ_{v}$ (subject to mapping constraints), and set the value v of bounding box in $u_{j} . ξ_{r}$ .

4) Map2Lines ( $D_{j}, p_{1}, p_{2}, p_{3}, c_{1}, c_{2}$ ), for moving 2 adjacent lines from $u_{j} . ξ_{u}$ to $u_{j} . ξ_{v}$ (subject to mapping constraints), and set the value v of bounding box in $u_{j} . ξ_{r}$ .

5) InnerBox ( $c_{1}, c_{2}, {c^{'}}_{1}, {c^{'}}_{2}, v^{'}$ ), to set the value of an interior box $[c_{1}, c_{2}, v = 0]$ using the value $v^{'}$ of an adjacent box $[{c^{'}}_{1}, {c^{'}}_{2}, v^{'}] .$

6) RemLine ( $p_{1}, p_{2}, c_{1}, c_{2}$ ) to remove a line with end points $p_{1}, p_{2}$ from the tree $u_{j} . ξ_{u}$ by demonstrating that $p_{1}$ falls inside bounding box $[c_{1}, c_{2}, v]$ , and examining the value v.

Finally, a transaction Finalize ( $D_{j}$ ) is recognized as well-formed only if $u_{j} . p_{s} = u_{j} . p_{e}$ (mapping completed), $A > 0$ (mapping was performed in the CCW order) and $u_{j} . ξ_{u} = 0$ (all input lines were correct). It resets points $u_{j} . p_{s}, u_{j} . p_{b}, u_{j} . p_{e}$ to zero. The value $u_{j} . A$ (area) is divided by 2; values $u_{j} . p_{c} . x$ and $u_{j} . p_{c} . y$ are then divided by 6A. Another requirement for this transaction to be well-formed is that the computed centroid $u_{j} . p_{c}$ should be the same as the unverified value $u_{j} . p_{c u}$ used for computing the moment of inertia $u_{j} . I$ of the district.

On completion of the Finalize ( $D_{j}$ ) transaction for every district, the metrics like moment of inertia and total population are available for each district. At this stage it is guaranteed that no block can appear in multiple districts as the only blocks in the key-value collection $ξ_{B D}$ can be added to a district, and duplicate keys (block identities) are ruled out in a key-value collection.

As the same block cannot be included in two different districts it follows that districts cannot have overlaps. More specifically, overlaps in districts can happen only if there are overlaps between blocks. However, by confirming that the sum of the areas of all blocks in any district is the same as the area of the district, we can conclude that blocks in the same district do not overlap.

To rule out overlaps between blocks in different districts we have one more step—to confirm that the area of the state is the sum of areas of all districts. Just as boundaries of each district were constructed using block boundary lines in the block ADS, the boundary lines of the state can be constructed using boundary lines in the district ADS $ξ_{D}$ .

3.6. Constructing State Boundary Lines

Unlike processes for initialing leaves for each block in $ξ_{B}$ or each district in $ξ_{D}$ , the “tree” $ξ_{S}$ with a single leaf [s] was already initialized by the transaction Proposal() to create a proposal.

A transaction ChooseLine ( $D_{j}, p_{1}, p_{2}$ ) can be invoked to add any line from any district $D_{j}$ (from the verified boundary line tree $ξ_{D} . u_{j} . ξ_{v}$ ). If no entry exists in the k-v collection $s . ξ_{d}$ for key $D_{j}$ , an entry is added. Every time an entry is added in the collection

1) $s . A_{u}$ is incremented by the area of district $D_{j}$ ( $ξ_{D} . u_{j} . A$ );

2) $s . I$ is incremented by the moment of inertia of $D_{j}$ ( $ξ_{D} . u_{j} . I$ );

3) $s . n$ is incremented by $n = ξ_{D} . u_{j} . n$ , the population of district $D_{j}$ , and $Ξ . v_{n}$ is incremented adding ${(n - m_{u})}^{2} / m_{u}$ ;

4) $s . d_{c t}$ , the number of unique districts added to the k-v collection is incremented by 1;

5) $s . b_{c t}$ , the number of unique blocks is incremented by the block-count of the district $ξ_{D} . u_{j} . b_{c t}$ .

After the required lines are added to $s . ξ_{u}$ for creating the state boundary (along with other lines needed to ensure that at least one line from each district is added to $s . ξ_{u}$ ), the following transactions can be used to map lines in $s . ξ_{u}$ to create a verified collection of boundary lines in $s . ξ_{v}$ .

1) SplitLine ( $p_{1}, p_{2}, p \to$ ), for splitting boundary lines in $s . ξ_{u};$

2) SplitBB ( $c_{1}, c_{2}, v, d i r$ ), for splitting bounding boxes in $s . ξ_{r};$

3) MapLine ( $p_{1}, p_{2}, c_{1}, c_{2} \supset$ ), for moving a line from $s . ξ_{u}$ to $s . ξ_{v}$ (subject to mapping constraints), and set the value v of bounding box in $s . ξ_{r};$

4) Map2Lines ( $p_{1}, p_{2}, p_{3}, c_{1}, c_{2}$ ), for moving 2 adjacent lines from $s . ξ_{u}$ to $s . ξ_{v}$ (subject to mapping constraints), and set the value v of bounding box in $s . ξ_{r}$ ;

5) InnerBox ( $c_{1}, c_{2}, {c^{'}}_{1}, {c^{'}}_{2}, v^{'}$ ), to set the value of an interior box $[c_{1}, c_{2}, v = 0]$ in $s . ξ_{r}$ using the value $v^{'}$ of an adjacent box $[{c^{'}}_{1}, {c^{'}}_{2}, v^{'}]$ ;

6) RemLine ( $p_{1}, p_{2}, c_{1}, c_{2}$ ) to remove a line with end points $p_{1}, p_{2}$ from the tree $s . ξ_{u}$ by demonstrating that $p_{1}$ falls inside a box $[c_{1}, c_{2}, v]$ and examining the value v.

Finally, a transaction ReportMetric() is considered well-formed only if $s . G = s . G^{'}$ , and $m_{v} = m_{u}$ where

$m_{u} = m_{v} = Ξ . n / Ξ . d_{c t}$

$G = G^{'} = w_{1} Ξ . I + w_{2} \sqrt{v_{n}}$ (17)

If ReportMetric() is well-formed the commitment to $ξ_{S}$ (to all values in s) is indicated as the state following the transaction. As s includes commitment to all other ADSes, it is also a commitment to all information regarding and all boundary lines, and all bounding boxes, and all other block/district/state specific information in Table 1.

4. Discussions

The strategy proposed in this paper is a significant improvement over a previous strategy in [38]. The salient differences between the two stem from a new strategy for point-location queries proposed in [34].

In [38] the strategy for verifying that there are no intersections between lines of a polygon, used the traditional algorithmic approach (sweep-line) [39] with $O (n \log n)$ complexity for n points. A state-machine model of this process will also call for $O (n \log n)$ state-changes (or transactions). In this paper, the approach of mapping lines to bounding boxes in [34] was used instead. It calls for linear dependency between number of transactions and number of points. The number of operations will depend on how many redundant points will need to be created, and/or the number of bounding box cuts. Our observations with real-world data from [6] suggest a constant factor between 1.5n to 3.2n.

Secondly, apart from being more efficient compared to [38], the end result of the transactions, viz., creation of “map” like structures $ξ_{r}$ for every polygon, can be used for purposes other than redistricting, for example, determine in which block/district a particular point $(x, y)$ falls [34].

Finally, this paper addresses a minor flaw in [38]. Validating the district/state area merely by “walking” around the district/state boundary (incremental area computation using Equation (2)), and by adding up all block/district areas, still opens up the possibility of being able to swap a block inside the district with an equal area block outside the district. By constructing bounding box maps and checking at that least one point from every included block actually lies inside the district, this flaw in [38] is addressed in this paper.

To keep the discussion simple we have focused on a single compactness metric and a single population metric. Additions necessary to substitute metrics are trivial.

5. Conclusions

This paper presented a strategy for blockchain based redistricting to enhance public trust. Every step in the construction of useful geographic ADSes—a transaction—can be verified by anyone to be correct, using VOs that accompany every transaction.

Apart from improving public trust in the redistricting process, the authenticated data structures produced by the process can have utility in several other applications. For example, ADSes of different states could be combined after every redistricting to create nationwide maps.

In the description for validating blocks in Section 3.3 there was no need to set the values of bounding boxes to facilitate reliable point-location queries (point-location capabilities were only needed at district and state levels to throw out internal boundary lines from the collection of un-validated lines). However, it may be advantageous to do so, even at the block level to support further subdivisions of blocks. For example, blocks could be further subdivided into zones, parcels, etc., for use by local governments.

One practical limitation of the proposed approach is that it does not take into account the fact that some districts/states may need to be discontinuous (for example, islands separated by water). This would call for additional transactions to handle such scenarios. Another practical issue not addressed in this paper is the validation of an important nongeographic input—the population of each block. As the process for constructing block ADS should be performed by a relevant government authority, the specific nature of authentication mechanisms is beyond the scope of this paper. Our current efforts include strategies for addressing some practical limitations of the proposed approach. More generally, the approach of merely verifying solutions to NP hard optimization problems in a blockchain network, can be extended to several other application scenarios. Our ongoing research includes other useful application scenarios.

Acknowledgements

Mahalingam Ramkumar was partially funded by the United States Department of Agriculture, Agricultural Research Service (USDA-ARS): 58-0200-0-002, “Advancing Agricultural Research through High Performance Computing”.

Abbreviations

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Crocker, R. (2012) Congressional Redistricting: An Overview. CRS Report for Congress (R42831).
[2]	Altman, M. (1998) Districting Principles and Democratic Representation. Ph.D. Thesis, California Institute of Technology, Pasadena.
[3]	Saxon, J. (2020) Reviving Legislative Avenues for Gerrymandering Reform with a Flexible, Automated Tool. Political Analysis, 28, 372-394. https://doi.org/10.1017/pan.2019.45
[4]	Cohen-Addad, V., Klein, P.N. and Young, N.E. (2018) Balanced Centroidal Power Diagrams for Redistricting. Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, 6-9 November 2018, 389-396. https://doi.org/10.1145/3274895.3274979
[5]	Altman, M. and McDonald, M.P. (2010) The Promise and Perils of Computers in Redistricting. Duke Journal of Constitutional Law & Public Policy, 5, 69-159.
[6]	United States Census Bureau. Topologically Integrated Geographic Encoding and Referencing (TIGER) Database. https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-geodatabase-file.html
[7]	ESRI (1998) ESRI Shapefile Technical Description: An ESRI White Paper. https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
[8]	Rushby, J.M. (1981) Design and Verification of Secure Systems. 8th ACM Symposium on Operating System Principles, Pacific Grove, 14-16 December 1981, 12-21. https://doi.org/10.1145/800216.806586
[9]	Wei, J. and Pu, C. (2005) TOCTOU Vulnerabilities in UNIX-Style File Systems: An Anatomical Study. 4th USENIX Conference on File and Storage Technologies, San Francisco, 13-16 December 2005, 155-167.
[10]	Bright, P. (2018) Meltdown and Spectre: Here’s What Intel, Apple, Microsoft, Others Are Doing about It. Ars Technica.
[11]	De Lucia, M.J. (2017) A Survey on Security Isolation of Virtualization, Containers, and Unikernels. US Army Research Laboratory, ARL-TR-8029.
[12]	Percival, C. (2005) Cache Missing for Fun and Profit. BSDCan. https://www.bsdcan.org/2015/
[13]	Lipp, M., Gruss, D., Spreitzer, R., et al. (2016) ARMageddon: Cache Attacks on Mobile Devices. 25th USENIX Security Symposium, Austin, 10-12 August 2016, 549-564.
[14]	Saini, H., Rao, Y.S. and Panda, T.C. (2012) Cyber-Crimes and Their Impacts: A Review. International Journal of Engineering Research and Applications, 2, 202-209.
[15]	Larochelle, D. and Evans, D. (2001) Statically Detecting Likely Buffer Overflow Vulnerabilities. 10th USENIX Security Symposium, Washington DC, 13-17 August 2001, 177-190.
[16]	Patten, D. (2017) The Evolution to Fileless Malware. http://www.infosecwriters.com/Papers/DPatten Fileless.pdf
[17]	Stewin, P. and Bystrov, I. (2012) Understanding DMA Malware. International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, c, d, 21-41. https://doi.org/10.1007/978-3-642-37300-8_2
[18]	Samyde, D., Skorobogatov, S., Anderson, R. and Quisquater, J.J. (2002) On a New Way to Read Data from Memory. First International IEEE Proceedings of Security in Storage Workshop, 11 December 2002, 65-69.
[19]	Ramkumar, M. (2018) Scalable Computing in a Blockchain. 2018 IEEE 39th Sarnoff Symposium, Newark, NJ, 24-25 September 2018, 1-6. https://doi.org/10.1109/SARNOF.2018.8720499
[20]	Dotan, M., Pignolet, Y.A., Schmid, S., et al. (2021) Survey on Blockchain Networking: Context, State-of-the-Art, Challenges. ACM Computing Surveys, 54, Article No. 107. https://doi.org/10.1145/3453161
[21]	Ramkumar, M. (2018) Executing Large Scale Processes in a Blockchain. Journal of Capital Market Studies, 2, 106-120. https://doi.org/10.1108/JCMS-05-2018-0020
[22]	Hendrix, E.M.T. and Toth, B.G. (2010) Goodness of Optimization Algorithms. In: Introduction to Nonlinear and Global Optimization, Springer, New York, 67-90. https://doi.org/10.1007/978-0-387-88670-1_4
[23]	Braden, B. (1986) The Surveyor’s Area Formula. The College Mathematics Journal, 17, 326-337. https://doi.org/10.1080/07468342.1986.11972974
[24]	Bourke, P. (1988) Calculating the Area and Centroid of a Polygon. http://paulbourke.net/geometry/polygonmesh/
[25]	Reock Jr., E.C. (1961) A Note: Measuring Compactness as a Requirement of Legislative Apportionment. Midwest Journal of Political Science, 5, 70-74. https://doi.org/10.2307/2109043
[26]	Polsby, D. and Popper, R. (1991) The Third Criterion: Compactness as a Procedural Safeguard against Partisan Gerrymandering. Yale Law & Policy Review, 9, 301-353. https://doi.org/10.2139/ssrn.2936284
[27]	Young, H.P. (1988) Measuring the Compactness of Legislative Districts. Legislative Studies Quarterly, 13, 105-115. https://doi.org/10.2307/439947
[28]	Anagnostopoulos, A., Goodrich, M.T. and Tamassia, R. (2001) Persistent Authenticated Dictionaries and Their Applications. Proceedings of the 4th International Conference on Information Security, Malaga, 1-3 October 2001, 379-393. https://doi.org/10.1007/3-540-45439-X_26
[29]	Martel, C., Nuckolls, G., Devanbu, P., Gertz, M., Kwong, A. and Stubblebine, S. (2001) A General Model for Authentic Data Publication. VC Davis Department of Computer Science Technical Report.
[30]	Ramkumar, M. (2014) Symmetric Cryptographic Protocols. Springer, Berlin. https://doi.org/10.1007/978-3-319-07584-6
[31]	Adhikari, N., Bushra, N. and Ramkumar, M. (2019) Complete Merkle Hash Trees for Large Dynamic Spatial Data. 2019 International Conference on Computational Science and Computational Intelligence (CSCI’19), Las Vegas, 5-7 December 2019, 1318-1323. https://doi.org/10.1109/CSCI49370.2019.00246
[32]	Chelladurai, U. and Pandian, S. (2021) HARE: A New Hash-Based Authenticated Reliable and Efficient Modified Merkle Tree Data Structure to Ensure Integrity of Data in the Healthcare Systems. Journal of Ambient Intelligence and Humanized Computing, 1-15. https://doi.org/10.1007/s12652-021-03085-0
[33]	Merkle, R.C. (1987) A Digital Signature Based on a Conventional Encryption Function. In: Pomerance, C., Ed., Advances in Cryptology—CRYPTO’87. CRYPTO 1987. Lecture Notes in Computer Science, Vol. 293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48184-2_32
[34]	Adhikari, N. (2020) Authoritative and Unbiased Responses to Geographic Queries. Ph.D. Thesis, Mississippi State University, Starkville.
[35]	Xiao, Y., Zhang, N., Lou, W. and Hou, Y.T. (2020) A Survey of Distributed Consensus Protocols for Blockchain Networks. IEEE Communications Surveys & Tutorials, 22, 1432-1465. https://doi.org/10.1109/COMST.2020.2969706
[36]	Nakamoto, S. (2008) A Peer-to-Peer Electronic Cash System. Bitcoin.org.
[37]	Buterin, V. (2015) A Next-Generation Smart Contract and Decentralized Application Platform. Ethereum White-Paper, 36 p.
[38]	Adhikari, N., Bushra, N. and Ramkumar, M. (2019) Redistricting Using Blockchain Network. 2019 First IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), Los Angeles, 12-14 December 2019, 150-159. https://doi.org/10.1109/TPS-ISA48467.2019.00026
[39]	Shamos, M.I. and Hoey, D. (1976) Geometric Intersection Problems. 17th Annual Symposium on Foundations of Computer Science (SFCS 1976), Houston, 25-27 October 1976, 208-215. https://doi.org/10.1109/SFCS.1976.16

Journals Menu

Follow SCIRP

	customer@scirp.org
	+86 18163351462(WhatsApp)
	1655362766

	Paper Publishing WeChat

Journals Menu

Home

About SCIRP

Service

Policies