Life of a Computer Scientist: 2020

Saturday, November 28, 2020

My notes for setting up a new Yubikey 5

A hardware crypto token such as Yubikey is not meant to be used forever. A new release would address old vulnerabilities and add new crypto support. The firmware is not upgradable (for security reasons), so new features and fixing vulnerabilities always require the key to be replaced. It is already a toil for anyone to rotate their hardware token with all the servers they might have used it with for authentication. If they had used a hardware token to encrypt data for long term archival, the hardware token could not be disposed without re-encrypting all data. Cloud service providers can afford to engineer a solution that uses static derived keys encrypted by a master key that could be more easily rotated, but I'm not aware of off-the-shelf solutions for individuals. As such, I do not recommend using hardware tokens such as Yubikey to encrypt archived data.

Even though I don't use Yubikey for encryption, there is still considerable effort to set it up for authentication which is my primary use for it. In this article, I will explain the feature landscape of different crypto key types, followed by a step by step instruction how I configured my key. Hopefully it helps you decide if this is the right time to start using Yubikey or upgrade your existing key, and if it is the right time, how to do it.

Background

My own motivation is to replace my RSA key with ECC key because ECC 256-bit key is shorter and has roughly the same strength as RSA 3072 bit key (here is a rationale that I endorse). However, the NIST P-curves are not safe. There are some criteria for establishing the safety of various ECC curves, courtesy to the author of curve25519.

First of all, a cautionary tale: don't order Yubikey 5 from Amazon. I made the mistake of ordering one there in April 2020, and I got a Yubikey with an old 5.1.2 firmware (it wasn't discounted). The new firmware 5.2.3 was released in August 2019, a handsome 8 months ago. Amazon also refused to post my warning about the old firmware as a review because they said it pertains only to a specific supplier, not the product.

Starting from firmware 5.2.3, Yubikey implements OpenPGP specification 3.4 which added support for ed25519 keys, among other ECC types. However, NIST P-curves (known as "ecdsa" in SSH) and ed25519 are the only new ones that work with SSH. The new firmware also added OpenPGP attestation which certifies that a key is generated on chip, and whether touch is required to use the key (attestation was first introduced in U2F).

The old 5.1.2 firmware lacked ed25519 support. It was to replace my Yubikey 4 which generated weak RSA keys. Before that, I had a Yubikey NEO-n which lacked touch support for OpenPGP even though U2F always required touch. Yubico is already working on implementing biometric touch for the next generation Yubikey.

Also, the software tools provided by Yubico changed over time. The previous generation tools Yubikey NEO Manager and Yubikey Personalization Tool have been deprecated and replaced with Yubikey Manager. It is worth noting that the GUI version only lets you configure FIDO2 and PIV, not OpenPGP. However, a CLI version is bundled with the GUI version, and the CLI has the OpenPGP configuration options. At one point, it was only possible to configure OpenPGP touch policy with a third-party script yubitouch, but it should not be used anymore.

That said, the entire ecosystem around using OpenPGP hardware key to login over SSH is pretty fragile in my opinion. It involved several parties to be perfectly aligned: Achim Pietig who rectifies the OpenPGP specification, the GnuPG team which is mostly just Werner Koch and sometimes NIIBE Yutaka (who designed the Gniibe Gnuk key), Yubico and other hardware vendors, the OpenSSH developers who decide which new key types to support, and the various distributions that decide which versions of GnuPG and OpenSSH are available to the end user. Since OpenSSH 8.1 added FIDO/U2F support, once this is rolled out to the various OS distributions, OpenPGP should not be used with SSH anymore. OpenSSH 8.4 was only accepted into Debian Buster backports two days ago, and before that Buster only had OpenSSH 7.9. Mac OS X Big Sur finally has OpenSSH 8.1. Both client and server need to support U2F over SSH in order to work.

Step by Step Guide

Without further ado, here is what I did to set up my Yubikey 5 on Mac OS X.

First, make an alias for the ykman command. The slash escape is needed in order for the alias to observe the whitespace in the path.

$ alias ykman='/Applications/YubiKey\ Manager.app/Contents/MacOS/ykman'

Disable Yubico OTP, otherwise accidental touches will cause the passcode to be typed as if someone was typing it on the keyboard. It appears as a string of random characters.

$ ykman config usb --disable OTP

Initialize the card first with gpg --card-edit before configuring OpenPGP touch policy. That's because generating new keys will cause the touch policy to be reset to OFF. Here is a condensed recipe followed by some explanations.

$ gpg --card-edit
> admin
> factory-reset    # see NOTE(1)
> kdf-setup        # see NOTE(2)
> passwd           # see NOTE(3)
> key-attr         # see NOTE(4)
   (2) ECC
   (1) Curve 25519
> generate         # see NOTE(5)
Make off-card backup of encryption key? (Y/n) n
         0 = key does not expire
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y    # see NOTE(6)

# Optionally:
> name
> url
> login

Notes:

factory-reset does not require PIN. It only resets the OpenPGP applet and will leave FIDO U2F intact. However, it does not reset the OpenPGP attestation touch policy. This only needs to be done when I f'd up and need to start over.
kdf-setup prevents plain text PIN from being sent over USB.
first change Admin PIN (3), then change PIN (1). Note that PIN does not have to be numeric. I have my Admin PIN set to my login passphrase. Many key configuring operations require both PIN and Admin PIN, and Admin PIN can be used to override PIN, so both must be set for security reasons.
key-attr will repeatedly prompt for signature, encryption, and authentication keys. It will ask for Admin PIN each time. Again, only Curve 25519 is supported by SSH. Only the authentication key is used by SSH, so signature and encryption keys could be other types.

Since SSH does not support secp256k1, using it will cause "Agent refused operation" error with SSH.
The key types that SSH does not support are hidden unless the session was started in expert mode using gpg --card-edit --expert.

generate will first ask for PIN, then ask for user information, then ask for Admin PIN prior to key generation. Again, generation of new keys will reset the touch policy.

Do not make off-card backup. The key will be generated off-card first and then loaded back to Yubikey, which means OpenPGP attestation won't work.

There seems to be no point in setting a key expiration here. The expiration can be changed using gpg --edit-key (note: this is a different session mode than --card-edit) after the key is generated.

Now we need to configure the OpenPGP touch policy. There are a few oddities to be aware of before proceeding.

Firstly, after running gpg --card-edit, the ykman command seems to hang, but unplugging and replugging the Yubikey makes it work again.
Secondly, ykman seems to have trouble disabling terminal echo on Mac OS X ("Can not control echo on the terminal" with a GetPassWarning), so the PIN and Admin PIN will be echoed to the screen. It is using the Python getpass module, which works when I run it in Python directly. I think ykman wasn't able to import termois because the code signing prohibited it from loading dynamic libraries.
Lastly, file output could not be written, possibly due to Catalina's enhanced security policy. Writing to standard output works, but beware that prompts are also written to stdout. It means that output redirected to a file would be polluted with interactive messages. They should read my article To err is human; to out, pipeline.

Now, use ykman to set touch policy. It will ask for the Admin PIN.

$ ykman openpgp set-touch SIG FIXED
$ ykman openpgp set-touch ENC FIXED
$ ykman openpgp set-touch AUT FIXED
$ ykman openpgp set-touch ATT FIXED

Note: set-touch ATT FIXED only needs to be done once per Yubikey ever. This is not affected by factory-reset. Trying to set it again will result in a harmless error.

Finally, I am generating the OpenPGP attestation certificates but haven't found a use for them. The ones generated by Yubikey contains an X.509 v3 extension also indicates what the touch policy is, so an external trust engine may use it to decide whether to give an attested key more trust. Note that unless the touch policy is FIXED or CACHED-FIXED, it can be changed to OFF later, so it makes no sense to attest any other touch policies.

These will ask for the regular PIN and write the attestation certificate to standard output.

$ ykman openpgp attest SIG -
$ ykman openpgp attest ENC -
$ ykman openpgp attest AUT -

After each command, copy and paste the X.509 certificate to its own file. Do not attest ATT which is the attestation key. I suspect it will destroy the preloaded attestation key but haven't tried it. It will warn you that it will overwrite an existing attestation certificate. Yubikey comes with an attestation key preloaded which certifies that the OpenPGP keys are generated by a Yubico manufactured hardware. If the attestation key is overwritten, it could no longer be recovered even after a factory reset.

I use GPG Tools on Mac to add a photo to the key which has a nicer interface, but gpg --edit-key should work. Although the recommended photo size is 120x150, I find that 240x240 produces the best result. It will be scaled down to 120x120 by GnuPG. After this, the public key is ready to be exported.

Old public keys need to be revoked and exported again. The newly exported key will then contain the revocation certificate.

The new key should show up in ssh-add -L without further action. If not, make sure SSH_AUTH_SOCK points to gpg-agent, not launchd. Usually starting a new shell after running gpg --card-edit will do the trick. The output of ssh-add -L contains the public key (identifiable by the cardno which is the Yubikey's serial number) that needs to be added to $HOME/.ssh/authorized_keys on the SSH server.

Monday, November 23, 2020

pidfd_getfd is harmful!

Everyday, it seems like Linux is gaining more Windows features to allow one process to hijack another process, which enables computer viruses and malware to propagate and steal user data. A recent example is the addition of pidfd_getfd(), a system call that lets one process duplicate a file descriptor belonging to another process. The idea is that process Mallory would inspect procfs and see which file descriptors process Victim has open, then Mallory can duplicate them at will.

Before pidfd_getfd(), regular files may be reopened via procfs. It works even if the file has been unlinked, although the opening is still subject to inode permission. It may not seem like a big deal, but file permission controls access by the owner user and group, so it could not let one process keep secret from another process belonging to the same user, or from root. Fortunately, sockets and pipes cannot be reopened from procfs.

There are some questionably legitimate uses for pidfd_getfd(), but this mechanism is flaky. A commentator on LWN already pointed out the hilarious confusion due to race condition. There is a delay between Mallory enumerating the file descriptors in procfs and when it decides to steal a file descriptor. In the mean time, Victim could have duplicated another file to it (e.g. via dup2()).

Here is the real reason why pidfd_getfd() is harmful: it violates the assumption in POSIX that a file descriptor is a credential testifying that a process has access to some resource. This credential can be passed around using sendmsg() with SCM_RIGHTS, or inherited by a child process through fork(). Both of these are deliberate actions taken by someone who possesses the credential. Now the security model is weakened to allow anyone who can call pidfd_getfd() to gain unauthorized access to these resources. It's unauthorized because it lacks deliberate action by the credential owner.

Before, a process could create a socketpair() and be secure in the knowledge that only the process it shares the file descriptor with can communicate over this secure local channel. Since the socket pair is already connected, it cannot be connected to another socket. But pidfd_getfd() allows Mallory to access the secret channel between Alice and Bob.

Before, Alice could send a file descriptor for a file she owns to a specific process owned by Bob even though Bob normally couldn't open the file. Another process owned by Bob would not be able to reopen that file from procfs. However, pidfd_getfd() bypasses the file permissions check.

An important principle in designing a secure system is to disallow access or an operation by an outsider unless a deliberate action is taken by the owner of the resource. Unfortunately, pidfd_getfd() breaks that principle. It was introduced in Linux kernel 5.6, so versions henceforth are no longer secure by design.

Thursday, November 5, 2020

Electronic Voting

As Americans patiently wait for the result of the 2020 POTUS election, I decided to take a crack at solving the electronic voting problem, with a timing that is perhaps a bit irrelevant now but may be useful again in 4 years. Now, this is by no means a survey of the state of the art on electronic voting, since the research has been done in so many years by so many people. This exercise is just for me to kill time.

A naive electronic voting may have accountability issues, namely we don't know if a vote is legitimate, or that all votes have been counted correctly. In order to know if a vote is counted, we need to know how everyone voted, which seems to sacrifice anonymity, unless we come up with an electronic analogy of a paper ballot. Traditionally, a ballot is issued in person at the voting station, or mailed to the voter in advance for mail-in ballots. There is little protection against ballot forgery, but it would be difficult to forge ballots physically at a large scale.

For electronic voting, a ballot certificate would be issued by the county government to the voter upon voter registration. The issuer is responsible for checking voter identification before issuance. This ballot certificate gives the voter the right to vote, and should be kept secret before a vote is cast (anyone in possession of a ballot certificate could vote with it). The certificate has a unique serial number and a cryptographic signature signed by the issuer. The serial number ensures that the ballot could only be used to cast at most one vote, and the signature verifies that the ballot is issued legally by the county. Otherwise the ballot is not tied to the voter identity. The county should maintain a separate list of voters who already registered, so a voter could only ever receive one ballot certificate. Large scale fraud can be detected if a county somehow issued more ballot certificates than its voting population.

Votes are entered into the voting databases along with the ballot certificate. Since the voting database contains no personal identifiable information, its entirety can be downloaded by the public when the election is concluded, and anyone could verify the presence of their vote in the database as well as independently count all of the votes. US population is only ~328 million, so the database is probably ever going to be only a few gigabytes which can comfortably fit onto a smart phone. Ballot verification could still take an hour or two (e.g. ed25519 signature verification rate is 71K/s), but counting should take only a few minutes if not seconds.

The challenge is distributing the database while ensuring integrity, since anyone receiving a copy with parts of the database missing or tempered with would not know it, and would arrive at a different count.

To ensure that all the votes are recorded without being tempered, they need to appear as part of a Merkle Tree (like how git works). Each transaction begins with a base commit hash and stores the content of the voting record (ballot, vote), both of which would derive a new commit hash. The complete voting record (base commit hash, ballot, vote, new commit hash) is returned to the voter as a receipt, and the new hash is also used as the base commit hash for the next transaction. Once a vote is cast, the ballot certificate is no longer a secret.

This simplest form of the Merkle Tree is just a singly linked list, but this type of transaction is very inefficient, since it requires all the votes in the universe to be sequentialized (i.e. only one person can vote at any given time). A tree construction allows transactions to take place in parallel (i.e. many people can cast vote at the same time). First, vote commits are built sequentially as before, but when a good number of votes are collected, they are finalized into a block by recording the final commit hash of the block. Multiple blocks can be built concurrently and then merged into a superblock by creating a commit hash out of all the block hashes. Superblocks can be similarly merged, until the election is closed, at which point the whole election is given a final commit hash.

This way, a voter can verify that their vote is counted in the election if they could trace their vote's commit hash all the way to the final election hash. If a vote is tempered with, the commit hash would change. If a block is tempered with, its block hash would change. If any merges are tempered with, the final election hash would change. Anyone could download the whole voting database and know that they have the complete, untempered copy. They can write software to independently verify the Merkle Tree without trusting the software that generated it.

The electronic ballot certificate could be printed on paper as a 2D barcode, which allows the bearer to cast the vote as a paper ballot. When voting in person, the voting machine would scan the barcode, ask for voter input, conduct the transaction, then print out the receipt with the commit hash. If this 2D barcode is a QR code, it could be scanned by a mobile app for possibly voting by phone, but the risk is that a malware app could steal the ballot and cast the vote without the voter's knowledge.

I believe this system is secure and practical, but I haven't spent too much time agonizing over it to tell for sure. I purposefully avoided mentioning the word "blockchain" because many people use that buzzword without knowing what it really means. Even though a blockchain is built using a Merkle Tree as well, it limits the rate of transactions globally to 5 per second (really low) through proof of work. The proof of work was invented for Bitcoin to impose artificial scarcity to the cryptocurrency. In the context of election, scarcity may be able to mitigate large scale fraud, but this is not a good way for ensuring election integrity.