Sun's ZFS filesystem offers storage administrators a new level of simplicity. Where Linux uses separate software RAID, logical volume management and filesystem layers, ZFS combines all three into a single layer that's easier to administer.
ZFS also has new data-checking and repair techniques that spare administrators time-consuming and risky runs of the fsck filesystem checker. In a message to the linux-kernel mailing list, Linus Torvalds called ZFS one of the "very few bright spots" in Solaris.
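One well-known part of those techniques is that ZFS keeps a checksum for every block separately from the block itself and, on a redundant pool such as a mirror, uses a copy that passes verification to repair one that doesn't. The Python sketch below models only that idea under stated assumptions: the class and method names are hypothetical, SHA-256 stands in for whichever checksum algorithm a pool uses, and plain dictionaries stand in for ZFS's on-disk metadata.

```python
import hashlib


class MirroredStore:
    """Toy mirror with checksum-verified, self-healing reads.

    Hypothetical illustration only: names are invented, and a dict stands in
    for the block pointers in which ZFS actually keeps its checksums.
    """

    def __init__(self, copies: int = 2):
        self.replicas = [{} for _ in range(copies)]  # one dict per "disk"
        self.checksums = {}  # block id -> expected digest, stored apart from the data

    def write(self, block_id: int, data: bytes) -> None:
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for replica in self.replicas:
            replica[block_id] = data

    def read(self, block_id: int) -> bytes:
        expected = self.checksums[block_id]
        for replica in self.replicas:
            data = replica.get(block_id)
            if data is not None and hashlib.sha256(data).digest() == expected:
                # Found a verified copy: rewrite every copy that is missing or differs.
                for other in self.replicas:
                    if other.get(block_id) != data:
                        other[block_id] = data
                return data
        # Every copy failed verification: report an error rather than return bad data.
        raise OSError(f"block {block_id}: no copy matches its checksum")


store = MirroredStore()
store.write(7, b"hello zfs")
store.replicas[0][7] = b"hellp zfs"  # simulate silent corruption on the first disk
print(store.read(7))                  # b'hello zfs', served from the second disk
print(store.replicas[0][7])           # b'hello zfs', the bad copy has been repaired
```

Because the damage is caught and fixed on an ordinary read, there is no need for a separate offline pass over the whole filesystem to find it.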
But there's a catch. ZFS, like the rest of OpenSolaris, has been available under Sun's Common Development and Distribution License (CDDL) for about two years, while Linux has been licensed under Version 2 of the GNU General Public License since 1992. The two licenses are incompatible, so ZFS code can't simply be merged into the Linux kernel.
One programmer, Ricardo Correia, may have an answer. A technology called Filesystem in Userspace (FUSE), introduced in the 2.6.14 kernel released in October 2005, lets Linux use filesystems that run as ordinary user-space processes rather than as part of the kernel. FUSE already hosts NTFS-3g, an implementation of the NTFS filesystem that runs with decent performance.
In a project originally funded under Google's 2006 Summer of Code program, Correia has ported the OpenSolaris ZFS implementation to a Linux daemon (a background server program).
The project already works: several users are running it, and some even boot from a ZFS volume. Correia has not done any performance tuning yet, and one sysadmin, Chris Samuel, has posted benchmarks showing it running at only about half the speed of XFS, a native Linux filesystem.
NTFS-3g, however, performs comparably to native Linux filesystems, so "at least NTFS-3g does show that good performance is quite possible for a FUSE filesystem," Samuel says in an e-mail interview.
Correia says in an e-mail interview that he's working on memory problems in the port. When many ZFS threads try to allocate memory at once, memory becomes badly fragmented and the ZFS daemon ends up using more than 500MB. Many alternatives to the standard malloc function exist, and Correia is evaluating them. "I'm trying to see if I can get tcmalloc [a fast memory allocator developed by Google engineers] to work better," he says.
Running ZFS as a separate daemon has an advantage that microkernel operating system developers have been touting for years: you can kill and restart the filesystem independently of the operating system. "There are still a few cases that ZFS can't handle, like a write failure on a nonreplicated pool. This will cause a panic on Solaris/FreeBSD/Mac OS X. On the other hand, in zfs-fuse, it's just a matter of restarting the zfs-fuse daemon and remounting the filesystems," Correia says.
Samuel says he's using zfs-fuse already for backing up his home directory, because the ZFS snapshot feature is handy for dealing with multiple backups of the same data. "I'm hoping that once its performance gets to an acceptable fraction of XFS's, then I will migrate my home directory to it and see what happens," he says.
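Snapshots suit that job because ZFS is copy-on-write: it never overwrites a block in place, so a snapshot can keep referring to the old blocks while the live filesystem moves on, and data that hasn't changed is stored only once no matter how many snapshots include it. The toy Python model below illustrates that sharing under simplifying assumptions (one block per file, old blocks never freed, invented class and method names); real ZFS works on trees of block pointers rather than a flat table.

```python
class CowVolume:
    """Toy copy-on-write volume with snapshots.

    Hypothetical illustration: names are invented, each file is a single
    block, and unreferenced blocks are never freed.
    """

    def __init__(self):
        self.blocks = {}     # address -> data; each block is written exactly once
        self.next_addr = 0
        self.live = {}       # file name -> block address for the live filesystem
        self.snapshots = {}  # snapshot name -> frozen copy of the name -> address map

    def write(self, name: str, data: bytes) -> None:
        # Copy-on-write: store the new data in a fresh block and repoint the file,
        # leaving the old block intact for any snapshot that still refers to it.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        self.live[name] = addr

    def snapshot(self, snap_name: str) -> None:
        # Only the pointer map is copied; no file data is duplicated.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name: str, snap_name=None) -> bytes:
        table = self.live if snap_name is None else self.snapshots[snap_name]
        return self.blocks[table[name]]


vol = CowVolume()
vol.write("notes.txt", b"draft 1")
vol.write("photo.jpg", b"<large, rarely changed file>")
vol.snapshot("monday")
vol.write("notes.txt", b"draft 2")
vol.snapshot("tuesday")

print(vol.read("notes.txt", "monday"))  # b'draft 1'
print(vol.read("notes.txt"))            # b'draft 2'
print(len(vol.blocks))                  # 3: photo.jpg is stored once, shared by both snapshots
```

Keeping both days' versions of notes.txt costs one extra block, while the unchanged photo.jpg costs nothing extra, which is what makes many overlapping backups cheap.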
Samuel has already found a ZFS bug, and Correia has submitted a fix upstream to OpenSolaris. So the first real users to benefit from this Linux project will be those running Solaris.