Life of a Computer Scientist: Electronic Voting

As Americans patiently wait for the result of the 2020 POTUS election, I decided to take a crack at solving the electronic voting problem, with a timing that is perhaps a bit irrelevant now but may be useful again in 4 years. Now, this is by no means a survey of the state of the art on electronic voting, since the research has been done in so many years by so many people. This exercise is just for me to kill time.

A naive electronic voting may have accountability issues, namely we don't know if a vote is legitimate, or that all votes have been counted correctly. In order to know if a vote is counted, we need to know how everyone voted, which seems to sacrifice anonymity, unless we come up with an electronic analogy of a paper ballot. Traditionally, a ballot is issued in person at the voting station, or mailed to the voter in advance for mail-in ballots. There is little protection against ballot forgery, but it would be difficult to forge ballots physically at a large scale.

For electronic voting, a ballot certificate would be issued by the county government to the voter upon voter registration. The issuer is responsible for checking voter identification before issuance. This ballot certificate gives the voter the right to vote, and should be kept secret before a vote is cast (anyone in possession of a ballot certificate could vote with it). The certificate has a unique serial number and a cryptographic signature signed by the issuer. The serial number ensures that the ballot could only be used to cast at most one vote, and the signature verifies that the ballot is issued legally by the county. Otherwise the ballot is not tied to the voter identity. The county should maintain a separate list of voters who already registered, so a voter could only ever receive one ballot certificate. Large scale fraud can be detected if a county somehow issued more ballot certificates than its voting population.

Votes are entered into the voting databases along with the ballot certificate. Since the voting database contains no personal identifiable information, its entirety can be downloaded by the public when the election is concluded, and anyone could verify the presence of their vote in the database as well as independently count all of the votes. US population is only ~328 million, so the database is probably ever going to be only a few gigabytes which can comfortably fit onto a smart phone. Ballot verification could still take an hour or two (e.g. ed25519 signature verification rate is 71K/s), but counting should take only a few minutes if not seconds.

The challenge is distributing the database while ensuring integrity, since anyone receiving a copy with parts of the database missing or tempered with would not know it, and would arrive at a different count.

To ensure that all the votes are recorded without being tempered, they need to appear as part of a Merkle Tree (like how git works). Each transaction begins with a base commit hash and stores the content of the voting record (ballot, vote), both of which would derive a new commit hash. The complete voting record (base commit hash, ballot, vote, new commit hash) is returned to the voter as a receipt, and the new hash is also used as the base commit hash for the next transaction. Once a vote is cast, the ballot certificate is no longer a secret.

This simplest form of the Merkle Tree is just a singly linked list, but this type of transaction is very inefficient, since it requires all the votes in the universe to be sequentialized (i.e. only one person can vote at any given time). A tree construction allows transactions to take place in parallel (i.e. many people can cast vote at the same time). First, vote commits are built sequentially as before, but when a good number of votes are collected, they are finalized into a block by recording the final commit hash of the block. Multiple blocks can be built concurrently and then merged into a superblock by creating a commit hash out of all the block hashes. Superblocks can be similarly merged, until the election is closed, at which point the whole election is given a final commit hash.

This way, a voter can verify that their vote is counted in the election if they could trace their vote's commit hash all the way to the final election hash. If a vote is tempered with, the commit hash would change. If a block is tempered with, its block hash would change. If any merges are tempered with, the final election hash would change. Anyone could download the whole voting database and know that they have the complete, untempered copy. They can write software to independently verify the Merkle Tree without trusting the software that generated it.

The electronic ballot certificate could be printed on paper as a 2D barcode, which allows the bearer to cast the vote as a paper ballot. When voting in person, the voting machine would scan the barcode, ask for voter input, conduct the transaction, then print out the receipt with the commit hash. If this 2D barcode is a QR code, it could be scanned by a mobile app for possibly voting by phone, but the risk is that a malware app could steal the ballot and cast the vote without the voter's knowledge.

I believe this system is secure and practical, but I haven't spent too much time agonizing over it to tell for sure. I purposefully avoided mentioning the word "blockchain" because many people use that buzzword without knowing what it really means. Even though a blockchain is built using a Merkle Tree as well, it limits the rate of transactions globally to 5 per second (really low) through proof of work. The proof of work was invented for Bitcoin to impose artificial scarcity to the cryptocurrency. In the context of election, scarcity may be able to mitigate large scale fraud, but this is not a good way for ensuring election integrity.

Life of a Computer Scientist

Thursday, November 5, 2020

Electronic Voting

No comments: