Saturday, July 15, 2023

ZFS zpool replace "no such device in pool"

I first built my ZFS pool in 2016 with 4x 1TB SSDs in a Thunderbolt 2 enclosure. When I ran out of space in 2019, I upgraded the SSDs to 4x 2TB. I didn't have an extra enclosure, so I took the fairly risky approach of yanking out each SSD, putting in a new one, and letting ZFS resilver. It only worked because my pool is a raidz1, and during each resilver the pool was in a degraded state with no redundancy.

And now, after 4 years, I am just about to run out of space again. I also lucked out because SSDs are heavily discounted right now due to oversupply. The chip makers are trying to reduce production, so prices might rise again.

This time, I bought 4x 4TB M.2 NVMe (non-affiliate B&H link) and put them in a Thunderbolt 3 enclosure (non-affiliate vendor link). With the old and new disks attached at the same time, I could do a proper replace without putting the pool in a degraded state.

But when I tried to replace the SSDs, I kept getting the "no such device in pool" error. I tried these variations:

$ sudo zpool replace tank disk2 disk13
cannot replace disk2 with disk13: no such device in pool
$ sudo zpool replace tank disk2 /dev/disk13
cannot replace disk2 with /dev/disk13: no such device in pool
$ sudo zpool replace tank disk2 media-728C3B9F-647B-4C40-9DDF-E140E55D8C89
cannot replace disk2 with media-728C3B9F-647B-4C40-9DDF-E140E55D8C89: no such device in pool
$ sudo zpool replace tank disk2 /var/run/disk/by-id/media-728C3B9F-647B-4C40-9DDF-E140E55D8C89
cannot replace disk2 with /var/run/disk/by-id/media-728C3B9F-647B-4C40-9DDF-E140E55D8C89: no such device in pool

None of the above worked, and it was driving me nuts. When I was younger, I would have pulled my hair out. Now that my hair is more sparse, I've learned to take better care of it by showing restraint.

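It turned out that "no such device in pool" actually meant ZFS could not find disk2 in the pool, even though it found disk13 just fine. The old disk has to be named the way the pool knows it, which in my case was its media identifier rather than its /dev name. This worked: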

$ sudo zpool replace tank media-E235653B-0924-2F46-B708-FDEFFAB5CB15 disk13
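One way to avoid guessing is to copy the old disk's name from the NAME column of sudo zpool status tank, which lists the devices the way the pool itself refers to them. The Python sketch below pulls the leaf device names out of that output. It is a hypothetical helper, not part of ZFS; the sample status text is made up for illustration (the second media name is a placeholder), and real output with extra sections such as logs or spares may need a smarter parser.

```python
# Hypothetical helper, not part of ZFS: list leaf vdev names from the
# config table of `zpool status`, so one can be passed as the old device
# to `zpool replace`.


def leaf_vdevs(status_text: str) -> list[str]:
    """Return the names of leaf devices in the config section of zpool status."""
    rows = []            # (indent, name) for each row of the config table
    in_config = False
    for line in status_text.expandtabs(8).splitlines():
        stripped = line.strip()
        if not in_config:
            # The table starts right after the "NAME  STATE  READ ..." header.
            in_config = stripped.startswith("NAME") and "STATE" in stripped
            continue
        if not stripped:
            break        # a blank line ends the config table
        indent = len(line) - len(line.lstrip())
        rows.append((indent, stripped.split()[0]))

    leaves = []
    for i, (indent, name) in enumerate(rows):
        next_indent = rows[i + 1][0] if i + 1 < len(rows) else -1
        # Row 0 is the pool itself; a row without a deeper child is a leaf.
        if i > 0 and next_indent <= indent:
            leaves.append(name)
    return leaves


# Illustrative sample only; the second media name is a placeholder.
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
config:

\tNAME                                            STATE     READ WRITE CKSUM
\ttank                                            ONLINE       0     0     0
\t  raidz1-0                                      ONLINE       0     0     0
\t    media-E235653B-0924-2F46-B708-FDEFFAB5CB15  ONLINE       0     0     0
\t    media-00000000-0000-0000-0000-000000000000  ONLINE       0     0     0

errors: No known data errors
"""

for name in leaf_vdevs(SAMPLE_STATUS):
    # Each printed name can be used as the old device, e.g.
    # sudo zpool replace tank <name> disk13
    print(name)
```

In practice you would feed it the text printed by sudo zpool status tank instead of the sample string.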

The colon in the error message makes it look like "no such device in pool" refers to the new disk, which is confusing, since there is no reason the new disk should already be in the pool. The error is actually about ZFS not being able to identify the existing disk in the pool that needs replacement. In fact, if it cannot find the new disk, the error message is different:

$ sudo zpool replace tank disk2 disk777
cannot open 'disk777': no such device in /dev
must be a full path or shorthand device name

The error message could be improved, for sure. I wonder if the baldness of people in the tech industry could have been prevented if more error messages were helpful.

Another thing I learned is that when upgrading a raidz1 with the old disks still attached, it is far more efficient to replace all the disks at once and let ZFS resilver them in parallel. It takes about the same amount of time to resilver one disk as to resilver all of them. The reason is that resilvering reads across all the disks to reconstruct and check each stripe against its parity, whether it is resilvering one disk or many. In my case, the process was bottlenecked by reads, not writes.
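A back-of-the-envelope Python model of that reasoning, assuming each resilver pass is bound by reading the used data on every disk at a fixed rate. The function names and the numbers (4 disks, 1.8 TB used per disk, 500 MB/s per disk) are hypothetical, not measurements from my pool.

```python
def resilver_hours(used_tb_per_disk: float, read_mb_per_s: float) -> float:
    """One resilver pass, assuming it is bound by reading the used data on
    each disk at read_mb_per_s (simplified, hypothetical model)."""
    seconds = used_tb_per_disk * 1_000_000 / read_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


def upgrade_hours(disks: int, used_tb_per_disk: float, read_mb_per_s: float,
                  all_at_once: bool) -> float:
    """Replacing one disk at a time needs one full pass per disk;
    replacing all disks together shares a single pass."""
    passes = 1 if all_at_once else disks
    return passes * resilver_hours(used_tb_per_disk, read_mb_per_s)


# Hypothetical numbers, for illustration only.
print(f"{upgrade_hours(4, 1.8, 500, all_at_once=False):.1f} h one at a time")  # 4.0 h
print(f"{upgrade_hours(4, 1.8, 500, all_at_once=True):.1f} h all at once")     # 1.0 h
```

The model ignores write speed entirely, which matches the read-bound behavior described above.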

Friday, June 30, 2023

Beware of Passkeys Account Takeover Risks

What are Passkeys?

According to the FIDO Alliance, this is the problem passkeys promise to solve by replacing passwords:

Based on FIDO standards, passkeys are a replacement for passwords that provide faster, easier, and more secure sign-ins to websites and apps across a user’s devices. Unlike passwords, passkeys are always strong and phishing-resistant.

Passkeys simplify account registration for apps and websites, are easy to use, work across most of a user’s devices, and even work on other devices within physical proximity.

There is a lot of fanfare. And if you read the spec, it's easy to see only the pine needles and miss the forest. The passkeys spec is built upon several FIPS standards whose underlying cryptographic mathematics is sound. However, if the cryptography is used poorly, the overall security of the user experience can still be compromised.

The spec focuses on the communication protocol between the authenticator (device), the client (the user's browser), and the website (or app). A passkey consists of a public key and a private key. You give away the public key but keep the private key secret. You can prove possession of the private key without revealing it, and the evidence can be verified by the public key alone. Furthermore, you cannot guess the private key from the public key.
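To make "prove possession without revealing it" concrete, here is a toy Schnorr signature in Python. This is a swapped-in textbook scheme, not what passkeys actually run (WebAuthn authenticators typically sign with ECDSA over P-256 or with Ed25519), and the tiny numbers are for readability only; they offer no security. The website sends a random challenge, the device signs it with the private key, and the website checks the signature with the public key alone.

```python
import hashlib
import secrets

# Toy parameters (assumption, NOT secure): P = 2*Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q. Real passkeys use elliptic
# curves with roughly 256-bit keys.
P, Q, G = 2039, 1019, 4


def keygen() -> tuple[int, int]:
    """Return (private_key, public_key) with public_key = G^private_key mod P."""
    x = secrets.randbelow(Q - 1) + 1      # private key, never leaves the device
    return x, pow(G, x, P)


def _hash(r: int, message: bytes) -> int:
    digest = hashlib.sha256(r.to_bytes(2, "big") + message).digest()
    return int.from_bytes(digest, "big") % Q


def sign(x: int, message: bytes) -> tuple[int, int]:
    """Schnorr signature: proves knowledge of x without revealing it."""
    k = secrets.randbelow(Q - 1) + 1      # fresh one-time nonce per signature
    r = pow(G, k, P)
    return r, (k + x * _hash(r, message)) % Q


def verify(y: int, message: bytes, signature: tuple[int, int]) -> bool:
    """Check G^s == r * y^e (mod P) using only the public key y."""
    r, s = signature
    if not (0 < r < P and 0 <= s < Q):
        return False
    return pow(G, s, P) == r * pow(y, _hash(r, message), P) % P


# Registration: the device keeps x; the website stores only y.
x, y = keygen()
# Login: the website sends a random challenge and the device signs it.
challenge = secrets.token_bytes(32)
print(verify(y, challenge, sign(x, challenge)))  # True
```

Because the website only ever holds y, a leaked copy of its database gives an attacker nothing to sign with.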

Yubico has an intuitive overview of how passkeys work, but here is my paraphrase of their sequence diagram.

What is the Problem?

The device used to create an account on a website stores the passkey so the user can log in from the same device later, but the spec does not prescribe what happens when the user wants to log in from another device. When using a hardware FIDO U2F security key, the private key is stored on the hardware and cannot be copied, but you can plug the key into another computer. Passkeys are generated in software, so how they reach other devices is left to each platform's solution, with varying degrees of security.

The solution provided by Apple seems to suggest that the "device" is an iCloud keychain. This means the user has to trust Apple with safekeeping the private key on its servers. Despite Apple's synchronization security claims that passkeys are end-to-end encrypted, the fact that their implementation is "rate limited to help prevent brute-force attacks" and "recoverable even if the user loses all their devices" should raise an eyebrow. This means an attacker could still "recover" your passkeys without accessing any of your devices. This is permissible because the passkeys spec allows creating roaming passkeys.

The solution provided by Google seems to suggest that when a user wants to log in from a new device, the new device generates a new public key, which is shown as a QR code, and the user has to take affirmative action to copy the new public key to an existing device to be authorized by an existing key. However, according to the FAQ, the private key is still synchronized to the Google account and is recoverable even if you lose your device.

Security is only as strong as its weakest link. Any time a key is recoverable without using an authorized device, that is an opportunity for account takeover. If the recovery is done by sending a verification code via SMS, it can be susceptible to SIM-swapping attacks. If the recovery is done with a special passcode, then the passcode itself can be brute-forced or stolen. These targeted attacks are typically aimed at high-profile public figures.

The one scenario where passkeys do better than passwords is when a website suffers a data breach. Since the website only knows the public keys, a stolen copy of them is useless to hackers.

It Could Be Worse

Even if the platform makes account recovery impossible, there are still ways to design new device login that render the security useless.

Let's say an existing device lets the user copy the private key to another device. That means malware running on the existing device can exfiltrate the private key. If the private key is shown as a QR code, then anyone with a telescope could steal it from a distance. The private key could be encrypted with a passphrase in transit, but then the passphrase could be brute-forced.
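To show why a short passphrase does not save an exported key, here is a toy Python sketch of an offline brute-force attack. The transfer format (PBKDF2-derived keys, XOR encryption, an HMAC tag) and the 4-digit PIN are hypothetical, not any platform's actual scheme, and the iteration count is kept deliberately low so the demo runs quickly. Anyone who captures the encrypted blob can try every PIN without talking to a server, and the authentication tag tells them when a guess is right.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Deliberately low so the demo finishes quickly (assumption). More iterations
# slow down each guess, but a 4-digit PIN still has only 10,000 possibilities.
ITERATIONS = 100


def derive_keys(pin: str, salt: bytes) -> tuple[bytes, bytes]:
    """Stretch the PIN into a 32-byte encryption key and a 32-byte MAC key."""
    material = hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, ITERATIONS, dklen=64)
    return material[:32], material[32:]


def wrap(private_key: bytes, pin: str) -> tuple[bytes, bytes, bytes]:
    """Hypothetical transfer format: XOR-encrypt a 32-byte key, then MAC it."""
    salt = secrets.token_bytes(16)
    enc_key, mac_key = derive_keys(pin, salt)
    ciphertext = bytes(a ^ b for a, b in zip(private_key, enc_key))
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return salt, ciphertext, tag


def brute_force(salt: bytes, ciphertext: bytes, tag: bytes) -> Optional[tuple[str, bytes]]:
    """Try every 4-digit PIN offline; a matching MAC reveals the right guess."""
    for n in range(10_000):
        pin = f"{n:04d}"
        enc_key, mac_key = derive_keys(pin, salt)
        expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
        if hmac.compare_digest(expected, tag):
            return pin, bytes(a ^ b for a, b in zip(ciphertext, enc_key))
    return None


private_key = secrets.token_bytes(32)   # stand-in for a passkey private key
blob = wrap(private_key, "4821")        # what a QR code might carry
pin, recovered = brute_force(*blob)
print(pin, recovered == private_key)    # 4821 True
```

A longer random passphrase raises the cost, but the attack stays offline and unthrottled, which is exactly the weakness described above.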

Alternatively, the website may send a push authentication request to an old device to authorize the login. When an account is under attack, the user would be overwhelmed with many push notifications and might click the wrong button and accidentally accept. They may also be duped into thinking that the new login comes from a device they already enrolled that somehow lost its authorization.

Best Practice

The best way to implement passkeys is to store the private key in a hardware secure element that makes retrieval physically impossible. This keeps the passkey physically bound to the device. To enroll a new device, the user will have to take affirmative action to copy the new public key out to an existing authorized device. After verifying that the user is who they claim to be, the existing device would then create a signature, using an existing key that the website already trusts, to certify that the new key is trustworthy. After the new public key is enrolled by the existing device, the website will allow the new device to log in.
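Here is a toy Python sketch of that enrollment flow, using a textbook Schnorr signature with insecure demo-sized numbers. The Website class, the b"enroll:" message format, and the function names are hypothetical; this is not how WebAuthn or any platform implements enrollment today, only an illustration of the idea above. The new device's private key never leaves it; the old device signs only the new public key, and the website accepts that key only if the signer is already trusted.

```python
import hashlib
import secrets

# Toy Schnorr group (assumption, NOT secure): P = 2*Q + 1, G has prime order Q.
P, Q, G = 2039, 1019, 4


def keygen() -> tuple[int, int]:
    x = secrets.randbelow(Q - 1) + 1          # stays inside the device
    return x, pow(G, x, P)


def _hash(r: int, message: bytes) -> int:
    digest = hashlib.sha256(r.to_bytes(2, "big") + message).digest()
    return int.from_bytes(digest, "big") % Q


def sign(x: int, message: bytes) -> tuple[int, int]:
    k = secrets.randbelow(Q - 1) + 1          # fresh nonce per signature
    r = pow(G, k, P)
    return r, (k + x * _hash(r, message)) % Q


def verify(y: int, message: bytes, sig: tuple[int, int]) -> bool:
    r, s = sig
    return 0 < r < P and 0 <= s < Q and pow(G, s, P) == r * pow(y, _hash(r, message), P) % P


def enroll_message(new_public_key: int) -> bytes:
    """Hypothetical format for 'I vouch for this new public key'."""
    return b"enroll:" + str(new_public_key).encode()


class Website:
    """Hypothetical relying party that stores only public keys."""

    def __init__(self, first_public_key: int) -> None:
        self.trusted = {first_public_key}

    def enroll(self, new_key: int, endorsement: tuple[int, int], endorser: int) -> bool:
        # Accept a new key only if an already-trusted key signed it.
        if endorser not in self.trusted:
            return False
        if not verify(endorser, enroll_message(new_key), endorsement):
            return False
        self.trusted.add(new_key)
        return True

    def login(self, key: int, challenge: bytes, sig: tuple[int, int]) -> bool:
        return key in self.trusted and verify(key, challenge, sig)


old_x, old_y = keygen()                 # existing, already trusted device
site = Website(old_y)
new_x, new_y = keygen()                 # generated on the new device
# After verifying the user, the old device signs the new public key.
endorsement = sign(old_x, enroll_message(new_y))
print(site.enroll(new_y, endorsement, old_y))                  # True
challenge = secrets.token_bytes(32)     # a real site would track issued challenges
print(site.login(new_y, challenge, sign(new_x, challenge)))    # True
```

A key that vouches for itself is rejected, because the website only counts endorsements from keys already in its trusted set.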

This process of creating a new passkey on a new device and having it authorized by an old device can still be automated by physically tethering the two devices. Authorization of new passkeys could also be done wirelessly over a reasonably secure channel, as long as the channel is mutually authenticated between the two devices and guarded against man-in-the-middle attacks. It is important to note that each passkey is still device-specific and never shared with another device. The user can keep a backup device in a bank safe deposit box if needed.


It is fair to say that passkeys do mitigate data breaches on a website, but there are still many ways a platform could introduce account takeover risks if it employs passkeys improperly. Even though the passkeys spec is built on sound cryptography, it does not rule out account takeover under a targeted attack. Since security is only as strong as its weakest link, this is something people should be aware of.