1. Introduction
Redistricting is a process of subdividing a geographical region into districts such that each district meets similarity criteria such as geographic contiguity, equality of population, etc. Some of the basic principles [1] underlying the congressional redistricting process include 1) contiguity of districts, 2) equality of population size, 3) geographic compactness, 4) protection against vote dilution of minorities, 5) preservation of the boundaries of other political subdivisions, 6) promotion of electoral competition, 7) prohibition of partisan considerations, etc.
In the United States, congressional redistricting is performed every 10 years to account for dynamics in population. Specific redistricting principles, however, may differ among states. Congressional redistricting may be performed by a commission of political representatives, or by independent experts. Redistricting by a politically biased committee may lead to what is known as gerrymandering [2]. A gerrymandered redistricting process produces districts that give an advantage [3] [4] to one party over another in elections.
Ideally, redistricting seeks to create equal-population districts that are optimally compact, thereby optimizing some cost. A redistricting plan that maps b blocks to d districts has bd possible solutions, making it an NP-hard problem [5]. While the notion of using sophisticated computer algorithms to automate redistricting to produce a fair district plan is appealing, the public often lacks trust in the computer platforms and algorithms. The lack of trust is not unjustifiable, especially for complex platforms needed to solve NP-hard problems. Sophisticated computer algorithms have been known to be misused to produce visually satisfying, but with subtle deliberate sub-optimal tweaks to create politically biased districts [5].
1.1. Problem Statement
Let the b blocks in a region R be
. Any block
is a polygon with
sides, typically represented by a sequence of
boundary points (or vertices of a polygon) in the counter clockwise (CCW) order, viz.,
(1)
The last point is the same as the first point. Traversing the polygon in the CCW direction leaves the area enclosed by the polygon to the left. Figure 1 depicts a region R with 10 blocks, grouped into 3 districts.
Assigning the b blocks to d districts is a process of creating d groups
among the b blocks, subject to some hard and soft constraints. For our purposes we shall assume that the region is a state, and the districts are congressional districts. Hard constraints dictate that the blocks in a group should be contiguous (a block should share at least one side with another block in the same group), no
Figure 1. A region R with 10 blocks grouped into 3 districts.
overlap between blocks, and no overlap between districts. Some soft constraints include equal-population, compactness, convexity, and proximity of blocks in a district to the “center” of the district. The center, could be a geographic centroid, or the population centroid (which pulls the geographic centroid closer to more populated blocks). Other soft constraints may include consideration of racial mix, annual income, political leanings, etc.
In the United States, census blocks are public data [6], available in shapefile (.shp) format [7]. The shapefiles for states range from 4MB for the District of Colombia, to 760 MB for Texas. The total number of census blocks in the U.S.A is over 6 million. For example, the state of California has
blocks grouped into
districts.
1.1.1. The Challenge
The computing process of interest to us in this paper is required to take possibly up to a million blocks as input, where each block is represented by possibly hundreds or even thousands of points. Each block may also be associated with non geographic data like population, racial mix, average annual income, political leanings, etc. The input will also include a set of weights
applied to various soft metrics to influence the output of the process, viz., the allocation of blocks to each group.
The utility of any computing process is ultimately limited by the extent of trust in the correctness of output of the process. In general, higher the complexity of the platform, the harder it is to ensure that every component of the platform is trustworthy. Given b blocks and d districts there are bd possible ways of grouping the blocks (
for California). Given the enormity of the computational requirement at hand—choosing the best out of bd options—it is far from easy to justify the trust in the correctness of the output.
One approach to enhance trust in the integrity of platforms is to minimize the trusted computing base (TCB), the minimal amount of hardware/software that needs to be trusted [8] to guarantee platform integrity. In von Neumann (VN) computers—which is almost every computing platform today—each step in the execution of any process (a machine instruction) requires memory access. If data previously stored in some memory location at time ts is retrieved later at time tr, it is simply assumed that the memory contents were not modified between times ts and tr. This potential time-of-check-time-of-use (TOCTOU) [9] vulnerability is simply ignored in the VN model, by including external memory inside the TCB.
Unfortunately, trust in external memory is unjustifiable in modern computers with a complex hardware/software stack [10] [11] [12] [13]. A significant number of attacks that exploit this weakness, such as RAM scraping [14], buffer overflow [15], file-less malware [16], DMA malware attacks [17], invasive or semi-invasive attacks [18] etc., have repeatedly demonstrated exploits that illegitimately read/write memory.
1.2. Proposed Solution
The proposed solution is motivated by the fact that one does not have to be concerned about how the computation was actually performed, as long as one can verify the results of the computation. For NP optimization problems, while computing a solution may have exponential complexity, verifying the correctness of a proposed solution is typically a polynomial time algorithm. However, even while verification is substantially easier, the scale of data can make it hard to promote confidence in computing platforms used for verification.
A blockchain network is essentially a computing platform where no component needs to be included inside the TCB [19] [20] [21]. In the proposed approach, the verification algorithm is executed in a blockchain network. Specifically, the verification algorithm computes a goodness metric [22] for any proposed solution.
Anyone can submit a solution. In order to deter submission of incorrect solutions, it can be mandated that each submission should explicitly indicate the final metric, and be accompanied by a stake. The entity submitting the solution will lose the stake if the computed metric does not match the claimed metric. If the stake is high enough to serve as a deterrent, only one proposed solution (the one that claims the best metric) will need to be executed in the blockchain network.
1.3. Organization
The rest of the paper is organized as follows. Section 2 begins with a discussion of various measures of optimality of redistricting plans. Section 2.1 provides a overview of an important component of the proposed solution, viz., hash tree based authenticated data structures (ADSes). Section 2.2 is an overview of blockchain networks as a platform for computing. More specifically, executing a process in a blockchain calls for a state-machine model for the process, where permitted state-changes are well-formed transactions.
Section 3 discusses 3 distinct steps involved in the verification of redistricting plans, and outlines the scope of different types of blockchain transactions in each step. Additional discussions and conclusions are offered in Section 4.
2. Overview
Let D be a simple polygon whose n sides are described by Cartesian coordinates
in the counter clockwise (CCW) direction, along the perimeter of the polygon. The area A [23], the perimeter P, and the coordinates of the centroid
[24] of the polygon D, can be computed incrementally by considering 2 adjacent points at a time, as
(2)
(3)
(4)
Geometric compactness [25] [26] [27] is a commonly used metric for evaluating the suitability of a proposed solution. Several measures of compactness exist for a polygon D with area A and perimeter P:
1). Iso-perimetric ratio, computed as
2) Reock ratio
[25] where
is the area of the smallest circle bounding the polygon D (with area A);
3) The convex hull ratio
where
is the area of the convex hull of D.
For a district D with k blocks, the moment of inertia is computed as
, where
is the distance between the centroid of the district and the centroid of block i. The weight
for block i can be based on the area of the block, or the population of the block, or more often, a combination of both.
2.1. Authenticated Data Structures
An authenticated data structure (ADS) [28] - [32] is a mechanism for computing a cryptographic commitment s to any number of discrete data items in a collection
. Useful ADSes, typically based on hash trees, possess efficient strategies to incrementally update the commitment
, with incremental changes to
.
Commitments are hashes, computed using a cryptographic hash function h(). Given
with pre-image x and digest y, it is impractical from a computational perspective to determine any
satisfying
. One can thus reasonably conclude that digest y was computed after x was chosen, or that x “existed before” its commitment y.
2.1.1. Merkle Hash Tree
A Merkle hash tree [33] is an example of an ADS where the root of the tree (a single hash) is a commitment to a set of dynamic leaves. Figure 2 depicts a tree with
(root of the inverted tree) and
as the 8 leaves of the tree. For a balanced tree with
leaves, there are
nodes in level
, where
is the maximum depth of the balanced tree.
A node
at depth r,
is computed as
(5)
A leaf-node is obtained by hashing the contents of the leaf. In Figure 2, leaf-node
.
Any node at depth r has r complementary nodes, and r ancestor nodes. The 3 blue nodes in the figure are ancestors of node
. The red nodes—the node’s own “sibling” of
, and the siblings
and
of its ancestors—are
Figure 2. A Merkle hash tree of depth
with
leaves.
complementary to
. The complementary nodes of
are the verification objects (VOs) for node
.
2.1.2. Prover-Verifier Protocols
For any one with access to only the root
of the tree, knowledge of VOs of
, viz.,
, constitutes proof of existence of
in the tree. Note that setting
and following it with a sequence of 3 hash operations using the VOs, viz.,
,
,
will yield
(the known root of the tree). Due to the one-way property of cryptographic hash functions, that
exists in the tree, is also proof of existence of leaf
.
If there is a legitimate reason to update
, using the same sequence of hash operations starting with
instead, one can compute the new value of the root, using the same VOs
. It is important to note that VOs of
are unaffected by changes to
.
The need for an update to
may be for purposes of changing the contents of leaf
, or adding a new leaf by increasing depth of
. For example, a new leaf (say)
can be inserted alongside
, making
the new leaf node of
, and
the leaf-node for the new leaf, thus changing
to
. Just as inserting a leaf promotes a leaf node to “a parent of 2 leaf nodes,” deletion of a leaf will result in “a parent of 2 leaf nodes” being demoted to a leaf node.
For a tree with N leaves,
VOs can be used to prove
1) the existence of a specific leaf
, or
2) that the new root is
, corresponding to an update
to a leaf, or
3) that the new root is
, corresponding to insertion/deletion of a leaf.
In prover-verifier protocols based on ADSes, provers store all leaves and nodes. Verifiers are assumed to know only the root
of the tree. Provers can prove specific properties regarding a data collection
(leaves of the tree) against the commitment
(root of the tree). If the total number of leaves is
, the proof takes the form of
VOs. Verifiers can verify the proof by computing
cryptographic hash operations using the
VOs, and comparing the end result to the known commitment
. The proofs also permit verifiers to update the commitment
corresponding to legitimate updates to
(changing the contents of leaves, inserting/deleting leaves, etc.).
2.1.3. Geographic Collections
In the proposed approach, the leaves (of a hash tree/ADS) will correspond to one or more collections of geographic elements like
1) a side of a polygon, specified by 2 points (2 sets of
coordinates), or
2) a rectangular bounding box, also specified by 2 points (lower-left and upper-right corners), or
3) a more general key-value (k-v) collection.
For a geographic point
,
and
are typically latitude and longitude coordinates. Collectively, geographic elements in a collection may represent more complex objects like a polygonal region [34].
Leaves in a boundary line collection are of the form
, representing a boundary line connecting the 2 points. The value v may provide some information about the line. In such collections, an example of a meaningful update is to split a leaf
(representing a boundary line) into 2 leaves as
(6)
at a redundant point p on the line. Such an incremental operation replaces a leaf with 2 leaves (or a new leaf is added to the tree).
In a rectangular bounding box collection, leaves are of the form
, specifying the lower-left corner point
and the upper-right corner point
. A rectangular bounding box can also be split into 2, for example, by a vertical line at X coordinate value
as
(7)
or across a horizontal line with Y-coordinate value
, as
(8)
2.1.4. Key-Value Collections
In a key-value (k-v) collection [30] the leaves are of the form
corresponding to a k-v pair
with next-key
.
The first leaf added to an empty tree should be of the form
. A k-v pair
can be added to the collection only if a leaf
—a k-v pair
with next-key b—already exists in the collection such that
(9)
If the condition above is satisfied, the existing leaf
is replaced with two leaves
1)
: old k-v pair
, but with K as next-key, and
2)
: new k-v pair
with b as next-key.
2.1.5. Hierarchical Hash Trees
ADSes constructed using hash trees can have hierarchical structures. For example, one or more values in a leaf can themselves be a hash tree root.
As an example, assume that the verifier knows only the root
of a hierarchical ADS. Two sets of VOs can be used to prove that
1) “boundary line
exists in a tree with root
” and
2) “
exists in a tree with root
.”
Such hierarchical collections will be very useful for our purposes. For example, the tree with root
may have a leaf for every block in a region. The value of the leaf for a block
may include a root
for a collection of boundary lines for the block.
2.1.6. State-Change Functions
The practical utility of such ADSes is that any entity with access to merely the root s of the hash tree (with such geometric objects as leaves), can verify proof (using VOs) that “a boundary line
exists in a polygon represented by hash tree with root s, and that splitting the line into two lines
and
will result in a new commitment
.”
The ability to enable verification of useful properties regarding process specific data
, and more importantly, update commitments in step with incremental changes to
, can be leveraged using “some external mechanism” to maintain universal consensus on the dynamic commitment
at all times.
Such a mechanism is a blockchain network, where updates to
(and hence, the commitment
) are due to well-formed transactions. Specifically, different types of transactions may be permitted for a process (for example, splitting a leaf, splitting a bounding box, etc.), each of which results in an easily calculable change to the commitment
.
More generally, given a transaction T of type k, the current commitment
, and a small set of VOs v, any one can reliably calculate the updated commitment as
(10)
In other words, validating the correctness of a state-change function
(for a transaction type k) will call for
hash operations, using
VOs
. In blockchain networks, the transaction T triggering the state-change, and the updated state
after the transaction, are logged in the blockchain ledger.
2.2. Blockchain Networks
A blockchain network is a mechanism that employs a broadcast channel for maintaining consensus on the contents of a blockchain ledger. Only well-formed transactions (also broadcast over the channel) can be added to the ledger. What makes a transaction well-formed will depend on process specific rules.
2.2.1. Ideal vs Practical Blockchain Networks
In an ideal blockchain network, participation is universal. The broadcast channel is ideal—every participant will hear every broadcast at the same time; every participant can then decide on their own, if the transaction is well-formed, and if so, add the transaction to the ledger. Every ledger entry changes the ledger-hash—a cryptographic commitment to the entire ledger. Specifically, if the current ledger with i transactions has a ledger hash
, then the ledger hash after adding the next transaction
is
.
If everyone is honest, everyone will maintain identical ledger copies, with the same ledger-hash. Broadcasting the succinct ledger hash makes it easy for every one to confirm that they do indeed maintain identical ledger copies.
In practical blockchain networks a ledger entry is not a transaction, but a commitment to a block of transactions. More specifically, each transaction is seen as a leaf of a hash tree, and the root of the tree is the commitment to the entire block; the ledger hash is thus a commitment to a “chain of block (hashes).”
In practical networks, only a small fraction of users may participate actively at any time. The broadcast channel, typically realized using a multi-hop peer to peer network, is far from ideal. Broadcasts may be heard at different times by different participants depending on the number of hops. Some participants may never hear some broadcasts. Furthermore, we cannot simply assume all participants to be honest. Even honest participants may sometimes have genuine disagreements regarding the correctness of a transaction. Even differences in time-of-arrival of transactions can cause differences between blocks added to different copies of ledgers, thereby creating forks in the ledger.
In spite of the practical limitations, it is possible to use well designed incentive mechanisms [35] to ensure that a) only well-formed transactions are added to the ledger, and b) close to universal consensus is maintained on the correctness of ledger entries. An incentive mechanism has three main goals a) motivate participation to ensure a reasonably large number of participants at all times; b) serve as a mechanism for penalizing ill-formed ledger entries; and c) minimize the chance of long-term survival of a forked ledgers.
2.2.2. Explicit vs Implicit States
A blockchain ledger is a log of state-changes due to transactions. For example, in crypto-currency applications, a transaction
at time t transfers c coins from wallet A to wallet B, reducing wallet A balance by c and increasing wallet B balance by c. The state of the application at any time t can be seen as the remaining balance is every wallet.
For this application, the transaction is considered as well-formed only if a) the transaction is signed by A, and if b) A had c or more coins prior to the transaction. Ill-formed transactions are simply ignored, and thus do not change the state of the application. As an example that is more relevant to this paper, a transaction to split a boundary line
into
and
is considered as well-formed only if p lies on the line connecting
and
(as splitting the line at a redundant point p does not change the shape or area inside the polygon).
In some networks like Bitcoin [36] the process state after each transaction is not made explicit in the transactions. In Ethereum [37], on the other hand, every transaction is accompanied by a cryptographic commitment to the updated state of the application/process affected by the transaction. In this paper, we restrict ourselves to transactions with explicit states.
2.2.3. Progression of Process States
In general, any process
, with process states
, can be executed in a blockchain network, if the process is modeled as state-machine. In such a model, a process
is defined by m types of transactions (or state-change functions). A transaction
of type j is executed only if
is well-formed. Execution if
results in a change in the process state
. The progression of states of process
can be represented as
(11)
The utility of ADSes stems from their ability to capture any number of dynamic process states as leaves of a (possibly hierarchical) hash tree, whose root s is a commitment to all process states. The progression of states changes in Equation (11) can equivalently be represented as
(12)
where the VOS accompanying each transaction serves as “proof of correctness” of the state-change.
3. Redistricting Verification Protocol
Given a set of blocks
of a region R and districts
, any solution to the redistricting problem can be seen as a collection of b key-value pairs
. The conditions that need to be satisfied for the solution to be considered as correct are as follows:
1) No block is left unassigned;
2) No block is assigned to more than 1 district;
3) Blocks assigned to a district are contiguous.
While numerous soft restrictions can be imposed, in this paper we shall limit ourselves to equal-population and compactness metric. We shall use the normalized standard deviation of the population as a measure for deviation of population between districts. For the compactness metric we can use the moment of inertia I, or more specifically
(13)
where
is the number of blocks in district
,
is the area of the ith block in district
, and
is the distance between the centroids of district
and the ith block.
3.1. ADSes for Blocks, Districts and State
As the input to the process may involve tens of millions of points, and as data stored on general purpose computers are far from secure, the process involves constructing different types of ADSes such that a single commitment can be used to verify the integrity of any geographic data item, and correctness of any transaction performed to change data. For geographic data involving a billion points (or
), only of the order of
VOS are required for verifying proof of integrity of any data item or proof of correctness of any transaction that makes an incremental change to data.
The verification protocol can be seen as consisting of 3 steps,
1) Constructing a block-ADS with b blocks, for validating the inputs, viz., points corresponding to every block.
2) Constructing a district-ADS with d districts, for evaluating a proposal specified by a key-value collection (with b key-value pairs of the form
, conveying that
is assigned to district
).
3) Constructing a “state-level” ADS.
Processes in the first step ensure that the possibly tens of millions of points representing possibly hundreds of thousands of blocks do indeed constitute a set of valid polygonal regions. For a block with n points,
transactions will be called for; the total number of transactions will be proportional to the total number of points in the entire state. Consequently, the first step is possibly the most expensive one. Fortunately, this step will need to be performed only once. Once a block ADS is available for all blocks in the state, any one can access the data and confirm the validity of the block boundaries. The same block ADS can be used for any number of future redistricting attempts, as block boundaries typically remain stable.
The second step is a process for every district in the state. The complexity of such processes will depend on the number of district boundary points, which can be substantially less than the total number of points in the entire state. The purpose of the second step is to ensure that there are no overlaps between blocks included in any district, and to compute various metrics for each district.
The third step is a process for the entire state. The number of transactions for this process will be proportional to the number of state boundary points. The purpose of this is to ensure that districts do not overlap, and to compute overall metric for the state, and verify that the proposed metric is indeed correct.
The second and third steps will need to be repeated for every proposal. However, as mentioned earlier, if there is sufficient deterrent to incorrect proposals, only the solution with the best claimed metric will need to undergo the last 2 steps.
3.1.1. ADS Structures and Notations
The ADSes used at various steps include
1) the block ADS with root
, which is a collection with key-value pairs of the form
;
2) the District-ADS with root
, also a key-value collection with key-value pairs of the form
;
3) the “redistricting proposal,” a key-value collection with root
, with pairs of the form
.
4) a state ADS
with a single leaf
.
Values like
for block
(
in tree with root
) and
for district
(
in tree with root
) and value s for the state, include several values like area, centroid, population, multiple hash tree roots (for boundary-line collections, bounding box collections, key-value collections), and multiple point values needed to keep track of the progress of the process. Values associated with each polygonal structure like a block or a district or a state are summarized in Table 1.
For a block
,
conveys 10 values (in the top 5 rows of Table 1). The value
of a district
conveys 14 values (4 additional values
and
). The single state polygon is associated with 24 values.
In the rest of the paper we shall use the following notations for such values associated with blocks, districts and the state. For example,
represents a point
in value
of block
(in the block-ADS
). Tree
Table 1. Values associated with polygonal blocks, districts and state.
roots in value
are represented as
,
, etc. Similarly,
, and
represent a point and a tree root in the value
of district
, in the district-ADS
. For values associated with the state we shall use notations like
, etc.
All ADSes begin as empty trees (or
), and are incrementally constructed by blockchain transactions. In the rest of the paper we shall refer to “a tree with root
” as simply “tree
.” We shall also use the notation
to imply that “L has been verified to exist in the tree with root
(using appropriate VOs).” For hierarchical collections such as the block ADS with root
,
implies
, where
.
3.2. Validating Polygons
A high level overview of the process for validating a polygonal region can be understood from Figure 3 with an eight sided polygon. By performing 2 types of simple operations, viz., splitting a rectangular bounding box (splits across red lines in the figure), and splitting boundary lines (at 6 bold points in the figure) it is possible to ensure that every boundary line fits wholly inside a rectangular box.
Some restrictions are imposed on mapping boundary lines to bounding boxes [34] to ensure that lines cannot cross each other inside a box, and make it easy to set the value v of the box to convey the lines mapped to the box. The restrictions are 1) no more than 2 lines may be mapped to a bounding box; 2) every line should start and/or stop at a corner of the BB; 3) in boxes with 2 boundary lines, both lines should be adjacent, and meet at a corner of the bounding box.
For the example in Figure 3, 6 transactions for splitting boundary lines at bold dots, and 20 transactions for splitting a bounding box vertically or horizontally (along red lines) will be needed to create conditions necessary for other transactions that map 1 boundary line, or map 2 adjacent boundary lines, to a
Figure 3. Process for validating a polygonal region.
bounding box. For the example in Figure 3, 8 transactions that map a single line, and 3 transactions that map 2 lines (3 red shaded boxes) are called for.
The boundary lines are initially available in the un-validated collection
. When a line from this collection
is shown to fit in a bounding box in
, the line is moved to a collection
of verified boundary lines (i.e., a leaf is deleted from the tree with root
and added to the tree with root
). If all lines are mapped successfully,
will become empty, and
will hold all verified lines.
Lines have to be mapped sequentially, in the CCW order of points. A point value
is used to remember the starting point; point values
and
keep track of the beginning and end of the last-mapped line (by a previous transaction). To map a line
it is necessary that
(the beginning of the current line is the end of the last-mapped line). An exception is the first line, for which
, and causes
to be set. The last line, with
, will signify completion of mapping.
A transaction that maps a line
incrementally computes the area as in Equation (2), and the centroid as in Equation (4). Specifically, the value A and coordinates
of the centroid are incremented as
(14)
Transactions that map 2 adjacent lines
and
lines will repeat the same computation for the second line. Unless the unverified boundary lines points were specified in the CCW order, the computed area will be negative.
The constraints on mapping lines to bounding boxes prevent two lines from crossing each other inside a box. However, they do not prevent them from crossing at boundary points. Figure 4 illustrates such a situation with a bad polygon ABCD, in which AD and CB cross each other at point E. As shown in the figure, even with the restrictions, the polygon can still be mapped by splitting bounding boxes along dotted lines, to create a corner at E.
Figure 4. Redundant points in “bad” polygons.
Fortunately, unlike redundant points like
(added by splitting side DA) and
(by splitting side AB) in the figure, the redundant point E will occur twice in the polygon, as both DA and BC will need to be split at point E to “illegally” map the lines. Imposing a restriction that redundant points should be unique (as a well-formedness condition for transactions that map lines) will make it impossible to map polygons with intersecting sides.
When a line
is mapped, the value
(of the last-mapped-line) can be used to determine if
is a redundant point (i.e., if
lies on the line connecting
and
). If so, the point is added to a key-value collection
as a key. If there are duplicate redundant points, the inability to add a redundant point to the key-value collection will cause the transaction to be deemed as ill-formed.
3.2.1. The “Value” of a Bounding Box
The value v in a bounding box
can be set by transactions that map 1 or 2 lines to determine which areas inside the box fall inside the polygon [34]. The value v in every box with 1 or 2 mapped lines indicates i) a type
; ii) an offset o; and iii) traversal direction (a binary flag) r. The 3 values unambiguously convey which parts of the bounding box (above/below a diagonal, above/below or to the left/right of a side of the box, or inside/outside the wedge shape formed by 2 lines) fall inside/outside the district.
15 types of bounding boxes can be created by mapping lines; type
with no lines; 2 types (types 1 and 2), where the line is a diagonal (one with positive slope, one with negative slope); 4 types (types 3 to 6) where a vertical or horizontal boundary line is one of the 4 sides of the box; and 8 types (types 7 to 14) with 2 lines in the box; the 8 different types are due to the 4 possible corners where the 2 lines meet, and if the shorter line is above/below the diagonal.
To uniquely identify the coordinates of the mapped line(s) only the type is needed for types 0-6. For types 7-14 (boxes with two lines) we also need an offset corresponding to the shorter line; the offset is
if the shorter line is a side of the box (the offset o for a block labeled f is shown in Figure 3).
Apart from the type
and offset o, the direction of traversal inside the block is required to identify which region(s) inside the box are inside the polygon. For example, in Figure 3 while both blocks marked a and b have the same type (positive-slope diagonal) the traversal direction in block a makes the region above the diagonal inside the polygon, while in box b the region below the diagonal is inside the polygon. Similarly, in box f the wedge formed by 2 lines is inside the polygon while in block g the wedge is outside the polygon.
3.3. Transactions for Constructing Block ADS
A transaction InitBlock(
) adds a k-v pair
for key
(block identifier), by splitting an existing k-v pair for key k with next key
. For the first InitBlock() transaction (to add the first block to an empty tree with
),
.
The value
in k-v pair
is initialized as follows:
(15)
with all values set to 0—except the population n (specified by the transaction) and the root
of unverified boundary lines.
For a region R with b blocks, b InitBlock() transactions will need to be executed to create a leaf for every block in the block ADS.
3.3.1. Validating Polygon Bi
The types of transactions executed in the blockchain for validating any block is the same for every block. The order in which transactions are executed will be block specific (depending on which specific lines/boxes should be split).
Transactions for validating block
affect only values in
(in the k-v pair
). For simplicity of notation, in this section, we shall simply refer to a root
as simply
.
Leaves in tree
are rectangular bounding boxes of the form
, with lower left corner at
and upper right corner at
.
Leaves in tree
and
are boundary lines of the form
.
The tree
is a key-value collection. A leaf
, implies existence of keys
and
and absence of keys between
and
(the value v has no meaning).
The 5 transaction types used for validating blocks are as follows:
1) SplitLine(
): Split boundary line
at a point p; this transaction is well-formed only if p lies on the line;
2) SplitBB (
): Split BB
in
vertically at X-coordinate v or horizontally at v (depending on the value dir; 0 for vertical and 1 for horizontal). If the tree is empty, the value v is ignored to create the first leaf
;
3) MapLine (
): Delete leaf
and add it to
; this transaction is valid only if the line falls entirely inside box
, and
and/or
is a corner of the bounding box; furthermore, if
is found to be redundant, add k-v pair
to
(the value x is irrelevant); update area and centroid;
4) Map2Line (
): Move 2 leaves
and
from tree
to tree
, if both lines fall entirely inside box
, and the common point
is a corner of the box; if
is found to be redundant, add k-v pair
to
; if
is found to be redundant, add k-v pair
to
; update area and centroid;
5) Finalize (
) This transaction is invoked to mark completion of verification of the polygon/block
.
Only if the points in
were specified in the CCW order, will the computed area be positive. A transaction Finalize (
) is well-formed only if
(mapping completed),
(mapping was performed in the CCW order) and
(all input lines were correct). The value A (area) is divided by 2 (see Equation (2)); values
and
are then divided 6A (see Equation (4)).
This process is repeated for every block to complete the construction of the ADS
.
3.4. Creating and Verifying Proposals
As a redistricting proposal, the proposer submits a claimed metric G, and the commitment
to the input to the process (the block ADS).
A transaction Proposal (
) is invoked, and results in creation of a commitment
to 24 values in state-polygon information in s. This initializes 21 values to 0; only 3 values are set to non zero values, using the input to the transaction:
(16)
Just as the block ADS with root
was created to summarize information regarding every block, the process of validating the redistricting proposal results in the creation of
1) a district ADS with root
2) a key value collection
with block identity as key and district identity as value, and
3) the state ADS (with a single leaf), with root
.
On completion of the process, the district ADS will contain district wide information like area, centroid, population, number of blocks, moment of inertia, etc., for every district, and the statewide metrics will be available in
.
Successful execution of all block validation protocols simply demonstrates that every block is a valid polygon. Successful execution of processes for updating
and
for every district will confirm that no overlaps exist in any of the blocks in any specific district. This is due to the fact that the district area is computed in 2 ways, by summing up areas of all blocks included in the district, and by “walking” along the boundary of the district (computed using Equation (2)).
However, it still does not rule out possible overlaps between blocks in different districts (and consequently, overlaps between districts). The final process which directly manipulates values in the lone leaf [s] in
involves creation of boundary lines for the entire state to confirm that no overlap exists between districts. Once again, this is achieved by computing the area in 2 ways.
In other words, to prove that
(the correctness of his/her proposal)
1) The proposer will need to execute different types of transactions that create/update the district ADS
and key value collection
; this will involve processes for each district, for computing district-wide totals like population, area, moment of inertia, etc.
2) The proposer will then need to execute transactions to verify the boundary lines of the entire state, and compute state-wide totals (total moment of inertia I, and the normalized variance
, and the final metric
).
Every district that has been considered to create state-wise totals (population, area, etc.) will be added to a key-value collection
to ensure that no district is double counted. It is in this final process that the a unverified mean population
is used to compute normalized variance
, total moment of inertia, total number of blocks/districts (
and
) inside the state, and the computed metric
(which should be equal to
). However, by the end of the process a mean
is computed and verified to be the same as unverified
. Once the proposal has been verified, all boundary lines of the state will be in a tree with root
.
3.5. District ADS Construction
The district ADS with root
has key-value pairs of the form
). A transaction InitDistrict (
) is invoked to add a k-v pair for district
(for the first leaf
). Only a single value in
, viz.,
, is initialized by the transaction as the (unverified) centroid of the district. All other (13) values are set to 0.
During the validation process, the area of the district will be computed incrementally in 2 different ways in
(by MapLine and Map2Lines transactions) and
, by adding the areas of all blocks added to
. The unverified centroid in
will be ultimately validated by a computed centroid in
. The moment of inertia will be computed for each district using the unverified centroid
, and the computed values of block centroids and block areas in the block ADS
. The value
is the count of unique blocks added to the district.
In the block validation process, a leaf for a block
was initialized with the root
(collection of un-validated block lines). However, a transaction to initialize a leaf for a district
sets
.
To populate
with outer district lines for
, the prover will need to choose appropriate lines from block-ADS
. The prover is allowed to choose any validated boundary line
, for any block
, and
1) add the line to
, and
2) add
to k-v collection
(only if k-v pair does not already exist).
Any number of lines may be added from the same block
. However, only for the first line from a specific block
, the k-v pair
is added to
. For subsequent lines, the existence of
is merely confirmed. The fresh addition of a key-pair
is accompanied by the following steps
1)
is incremented by the computed area of block
,
;
2)
(moment of inertia) is incremented by using i) the unverified centroid
of the district, 2) computed centroid
of block
and 3) computed area
of block
. Specifically,
is incremented by
where r is the distance between the district centroid
and block centroid
;
3)
(number of blocks in the district) is incremented by 1.
The prover can also merge 2 lines
and
in the tree
of yet-to-be-verified boundary lines into
if
is found to be a redundant point.
The purpose of such transactions are 2 fold: i) to form the boundary line of the district
in tree
, and 2) include at least one line from every block in the district. The process for validating the district polygon in
will be very similar to the process for validating a block. Unless a line is chosen from every block inside the district, the values
(computed during district validation) and
(computed by adding areas of unique blocks to
) will not match.
Once all necessary border lines have been copied to the tree
, transactions SplitLine(), SplitBB(), MapLine(), Map2Lines() can be used to ensure that the district lines of
in
correspond to a simple polygon.
However, after all border lines have been mapped to bounding boxes, some internal lines will still remain in
, as at least one line had to be added from every internal block in the district. To be able to remove such lines from
it is essential to demonstrate that such lines fall inside district
. To make this possible, the transactions MapLine(), Map2Lines() set the value v of bounding box leaves
.
For internal bounding boxes with no lines, the value v will not be set by MapLine()/Map2Lines() transactions. A special transaction has to used to set the box value depending on the value of any adjacent box. For example the value of box c can be determined by examining the value of an adjacent box like d or e. With all boxes marked, in this manner, it is now possible to check if a point p lies inside the polygon (district
), by examining a single box in
in which point p falls, and examining the value v of the bounding box. This test can be used to throw away remaining lines in
that were mapped from internal blocks, but were not used to create the district boundary.
The transactions for validating a district are as follows: A transaction ChooseLine (
)
1) chooses a line
from the input ADS (a k-v pair
indicates the root
of a validated boundary tree, in which this leaf
should exist);
2) adds the line to tree
;
3) checks if
, and adds the pair if it does not exist;
4) if
was added by the transaction the following additional steps are performed;
a) increment
using the area of block
in the block ADS;
b) increment population
by population of block
in block ADS;
c) increment moment of inertia using unverified district centroid, and computed centroid and area of block
in block ADS.
After district lines have been input to
, they are validated using the following transactions
1) SplitLine (
), for splitting boundary lines in
2) SplitBB (
), for splitting bounding boxes in
3) MapLine (
), for moving a line from
to
(subject to mapping constraints), and set the value v of bounding box in
.
4) Map2Lines (
), for moving 2 adjacent lines from
to
(subject to mapping constraints), and set the value v of bounding box in
.
5) InnerBox (
), to set the value of an interior box
using the value
of an adjacent box
6) RemLine (
) to remove a line with end points
from the tree
by demonstrating that
falls inside bounding box
, and examining the value v.
Finally, a transaction Finalize (
) is recognized as well-formed only if
(mapping completed),
(mapping was performed in the CCW order) and
(all input lines were correct). It resets points
to zero. The value
(area) is divided by 2; values
and
are then divided by 6A. Another requirement for this transaction to be well-formed is that the computed centroid
should be the same as the unverified value
used for computing the moment of inertia
of the district.
On completion of the Finalize (
) transaction for every district, the metrics like moment of inertia and total population are available for each district. At this stage it is guaranteed that no block can appear in multiple districts as the only blocks in the key-value collection
can be added to a district, and duplicate keys (block identities) are ruled out in a key-value collection.
As the same block cannot be included in two different districts it follows that districts cannot have overlaps. More specifically, overlaps in districts can happen only if there are overlaps between blocks. However, by confirming that the sum of the areas of all blocks in any district is the same as the area of the district, we can conclude that blocks in the same district do not overlap.
To rule out overlaps between blocks in different districts we have one more step—to confirm that the area of the state is the sum of areas of all districts. Just as boundaries of each district were constructed using block boundary lines in the block ADS, the boundary lines of the state can be constructed using boundary lines in the district ADS
.
3.6. Constructing State Boundary Lines
Unlike processes for initialing leaves for each block in
or each district in
, the “tree”
with a single leaf [s] was already initialized by the transaction Proposal() to create a proposal.
A transaction ChooseLine (
) can be invoked to add any line from any district
(from the verified boundary line tree
). If no entry exists in the k-v collection
for key
, an entry is added. Every time an entry is added in the collection
1)
is incremented by the area of district
(
);
2)
is incremented by the moment of inertia of
(
);
3)
is incremented by
, the population of district
, and
is incremented adding
;
4)
, the number of unique districts added to the k-v collection is incremented by 1;
5)
, the number of unique blocks is incremented by the block-count of the district
.
After the required lines are added to
for creating the state boundary (along with other lines needed to ensure that at least one line from each district is added to
), the following transactions can be used to map lines in
to create a verified collection of boundary lines in
.
1) SplitLine (
), for splitting boundary lines in
2) SplitBB (
), for splitting bounding boxes in
3) MapLine (
), for moving a line from
to
(subject to mapping constraints), and set the value v of bounding box in
4) Map2Lines (
), for moving 2 adjacent lines from
to
(subject to mapping constraints), and set the value v of bounding box in
;
5) InnerBox (
), to set the value of an interior box
in
using the value
of an adjacent box
;
6) RemLine (
) to remove a line with end points
from the tree
by demonstrating that
falls inside a box
and examining the value v.
Finally, a transaction ReportMetric() is considered well-formed only if
, and
where
(17)
If ReportMetric() is well-formed the commitment to
(to all values in s) is indicated as the state following the transaction. As s includes commitment to all other ADSes, it is also a commitment to all information regarding and all boundary lines, and all bounding boxes, and all other block/district/state specific information in Table 1.
4. Discussions
The strategy proposed in this paper is a significant improvement over a previous strategy in [38]. The salient differences between the two stem from a new strategy for point-location queries proposed in [34].
In [38] the strategy for verifying that there are no intersections between lines of a polygon, used the traditional algorithmic approach (sweep-line) [39] with
complexity for n points. A state-machine model of this process will also call for
state-changes (or transactions). In this paper, the approach of mapping lines to bounding boxes in [34] was used instead. It calls for linear dependency between number of transactions and number of points. The number of operations will depend on how many redundant points will need to be created, and/or the number of bounding box cuts. Our observations with real-world data from [6] suggest a constant factor between 1.5n to 3.2n.
Secondly, apart from being more efficient compared to [38], the end result of the transactions, viz., creation of “map” like structures
for every polygon, can be used for purposes other than redistricting, for example, determine in which block/district a particular point
falls [34].
Finally, this paper addresses a minor flaw in [38]. Validating the district/state area merely by “walking” around the district/state boundary (incremental area computation using Equation (2)), and by adding up all block/district areas, still opens up the possibility of being able to swap a block inside the district with an equal area block outside the district. By constructing bounding box maps and checking at that least one point from every included block actually lies inside the district, this flaw in [38] is addressed in this paper.
To keep the discussion simple we have focused on a single compactness metric and a single population metric. Additions necessary to substitute metrics are trivial.
5. Conclusions
This paper presented a strategy for blockchain based redistricting to enhance public trust. Every step in the construction of useful geographic ADSes—a transaction—can be verified by anyone to be correct, using VOs that accompany every transaction.
Apart from improving public trust in the redistricting process, the authenticated data structures produced by the process can have utility in several other applications. For example, ADSes of different states could be combined after every redistricting to create nationwide maps.
In the description for validating blocks in Section 3.3 there was no need to set the values of bounding boxes to facilitate reliable point-location queries (point-location capabilities were only needed at district and state levels to throw out internal boundary lines from the collection of un-validated lines). However, it may be advantageous to do so, even at the block level to support further subdivisions of blocks. For example, blocks could be further subdivided into zones, parcels, etc., for use by local governments.
One practical limitation of the proposed approach is that it does not take into account the fact that some districts/states may need to be discontinuous (for example, islands separated by water). This would call for additional transactions to handle such scenarios. Another practical issue not addressed in this paper is the validation of an important nongeographic input—the population of each block. As the process for constructing block ADS should be performed by a relevant government authority, the specific nature of authentication mechanisms is beyond the scope of this paper. Our current efforts include strategies for addressing some practical limitations of the proposed approach. More generally, the approach of merely verifying solutions to NP hard optimization problems in a blockchain network, can be extended to several other application scenarios. Our ongoing research includes other useful application scenarios.
Acknowledgements
Mahalingam Ramkumar was partially funded by the United States Department of Agriculture, Agricultural Research Service (USDA-ARS): 58-0200-0-002, “Advancing Agricultural Research through High Performance Computing”.
Abbreviations