Wednesday, December 16, 2009

ZFS on Amazon EC2 EBS

One advantage of Amazon EC2 for customers like you and me is that you can rent a computer in their data center and pay only for what you use: computing hours, instance size, network transfer. The exception is EBS, where you are charged for the amount of storage you provision for a volume. For example, you still pay for 1TB of storage even if you use only 1GB of it. Since EBS is just a block device, Amazon has no way to tell how much of it you actually use.
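To make the billing point concrete, here is a small arithmetic sketch. The price of 10 cents per GB-month is only an assumed figure for illustration, and `monthly_cost_cents` is a made-up helper name; check Amazon's current price list for real numbers.

```python
def monthly_cost_cents(provisioned_gb, price_cents_per_gb_month):
    """Cost depends on the provisioned size only, not on how much is used."""
    return provisioned_gb * price_cents_per_gb_month


# Assumed price for illustration: 10 cents per GB-month.
PRICE_CENTS = 10

# A 1TB (1024GB) volume holding 1GB of data is billed the same
# as a completely full 1TB volume.
full_volume = monthly_cost_cents(1024, PRICE_CENTS)
right_sized = monthly_cost_cents(1, PRICE_CENTS)
print(full_volume, right_sized)  # 10240 10
```

The used space never appears in the calculation, which is exactly why over-provisioning is expensive.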

What happens if you outgrow the space you initially provisioned for a volume? You can expand it like this: detach the EBS volume, make a snapshot of the volume to S3, use that snapshot to create a new, larger EBS volume, and attach the new volume. You will likely also need an operating system tool to resize the file system to the new volume size.

While this might work well for some people, I'm not entirely happy with this approach. The time it takes to make the snapshot is directly proportional to the amount of data you have stored, and since the volume is detached for the whole procedure, the downtime seems unavoidable.

I've been studying how to manage a ZFS pool, and here are my findings. You can create EBS volumes and add them to a pool. Then, when you want more space, just create a new EBS volume and add it to the pool. The pool grows automatically to reflect the additional space.
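The commands involved can be sketched as follows. The pool name `tank` and the device names `/dev/sdf`, `/dev/sdg` and `/dev/sdh` are placeholders for wherever your EBS volumes are attached, and the helper functions are hypothetical. The script only prints the `zpool create` and `zpool add` commands unless you turn off the dry run.

```python
import subprocess


def zpool_create(pool, devices):
    """Build the argv that creates a pool striped across the given devices."""
    return ["zpool", "create", pool, *devices]


def zpool_add(pool, device):
    """Build the argv that adds one more device to an existing pool."""
    return ["zpool", "add", pool, device]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


# Placeholder names: adjust to your own pool and attached EBS devices.
run(zpool_create("tank", ["/dev/sdf", "/dev/sdg"]))
run(zpool_add("tank", "/dev/sdh"))
```

Keep in mind that a device added this way has no redundancy of its own, so make sure you have named the right device before running it for real.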

The smallest EBS volume you can create is 1GB, but I don't suppose anyone would want to create lots of 1GB volumes. That would be a nightmare to keep track of. Fortunately, ZFS also lets us replace smaller disks with larger ones. You can keep a pool of about 3-4 EBS volumes, and when you want more space, just create a new, larger volume and use it to replace the smallest disk in the pool. This way, only about 1/3 or 1/4 of the data in the pool needs to be transferred. Furthermore, all ZFS pool operations are performed online, so there is no downtime.
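Here is a rough worked example of the "replace the smallest disk" idea. It assumes data is spread across the volumes in proportion to their size, which is only an approximation of how ZFS places data. The pool name, device names and helper functions are all placeholders.

```python
def fraction_to_copy(volumes_gb):
    """Share of the pool's data sitting on the smallest volume.

    Assumes data is spread across volumes in proportion to their size.
    """
    return min(volumes_gb) / sum(volumes_gb)


def replace_smallest(volumes_gb, new_size_gb):
    """Return the volume sizes after swapping the smallest for a larger one."""
    smallest = min(volumes_gb)
    if new_size_gb <= smallest:
        raise ValueError("replacement must be larger than the smallest volume")
    remaining = list(volumes_gb)
    remaining.remove(smallest)  # drops one occurrence only
    return sorted(remaining + [new_size_gb])


def zpool_replace(pool, old_device, new_device):
    """Build the argv for replacing one device with another."""
    return ["zpool", "replace", pool, old_device, new_device]


volumes = [100, 100, 100, 100]          # four 100GB EBS volumes
print(fraction_to_copy(volumes))         # 0.25
new_volumes = replace_smallest(volumes, 200)
print(sum(volumes), "->", sum(new_volumes))  # 400 -> 500
print(" ".join(zpool_replace("tank", "/dev/sdf", "/dev/sdi")))
```

With four equal volumes, a quarter of the data moves to the new disk. Compare that with the snapshot approach, where all of it has to be copied.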

What if you want ZFS mirror or raidz redundancy? Unless you're one of the lucky users of Solaris Express build 117, which provides the autoexpand capability, the disks in a mirror or a raidz are not expanded automatically, even after every disk has been replaced with a larger one. That is the case for zfs-fuse on Linux. However, I found that a zpool export followed by a zpool import updates the size. Rebooting the computer works too. The downtime is then the time it takes to export and import the pool, or at worst to reboot your EC2 instance, rather than the time it takes to make snapshots, which is much better.
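A sketch of the two ways to pick up the new size. The pool name `tank` and the helper functions are placeholders, and whether the `autoexpand` property is available depends on your ZFS version. The commands are only printed here, not executed.

```python
def refresh_pool_size(pool):
    """Export then re-import the pool so it rereads the device sizes."""
    return [["zpool", "export", pool], ["zpool", "import", pool]]


def enable_autoexpand(pool):
    """Turn on automatic expansion, where the ZFS version supports it."""
    return ["zpool", "set", "autoexpand=on", pool]


# Placeholder pool name: adjust to your own.
for argv in refresh_pool_size("tank"):
    print(" ".join(argv))
print(" ".join(enable_autoexpand("tank")))
```

Exporting unmounts the pool's file systems, so anything using them has to be stopped for the few moments between the export and the import.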

The disadvantage, however, is that you can no longer take an EBS snapshot of the whole file system at once, because it now spans multiple EBS volumes.