fsync, sigh

As probably everyone who keeps up with Linux has heard, ext4 eats data, puppies and babies.  Except, of course, that’s not really true.

The original Slashdot headline was quite breathless, I’m sure, but it eventually settled on a rather milquetoast “Apps That Rely On Ext3’s Commit Interval May Lose Data In Ext4” after someone rose to ext4’s defense.  …erm, or sent Slashdot to a reeducation camp.

Well, welcome to POSIX buffered IO.

I really do feel for the app writers here, I really do.  Thanks to the infamous Firefox fsync “bug”, people have been scared off of using fsync on Linux.  This is unfortunate, because without fsync you cannot say when your data hits the disk.  Yes, on ext3, in the mode it’s most commonly run in, it will likely hit the disk within about 5s.  But all the world is not Linux, and all of Linux is not ext3.
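For the record, here is roughly what “using fsync” means from an application’s point of view.  This is a minimal sketch in Python, not anything lifted from a real app; the name write_durably is made up for illustration.  The point is that there are two buffers in play: the language runtime’s, and the kernel’s page cache.

```python
import os


def write_durably(path, data):
    """Write bytes to path and return only after asking the kernel to push them to disk."""
    with open(path, "wb") as f:
        f.write(data)
        # flush() only empties Python's userspace buffer into the kernel page cache.
        f.flush()
        # fsync() asks the kernel to write the file's data out to the storage device.
        os.fsync(f.fileno())
```

Without the os.fsync() call, the function returns as soon as the data is in the page cache, and when it reaches the disk is up to the filesystem.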

So here we are with fsync being painful on ext3, and fsync being a necessity on ext4 (edit: on any buffered IO filesystem, really).  At least it’s not painful on ext4… as long as you run with delalloc enabled (the default).  But to top it off, even if you wanted to special-case ext3, rely on the 5s commit interval, and skip the fsync to avoid the pain… good luck telling the two apart; ext3 and ext4 share the same magic number, so statfs won’t help you.  Whee!

So now we are faced with some decisions.  Should the filesystem put in hacks that offer more data safety than POSIX guarantees?  Possibly.  Probably.  But there are tradeoffs.  XFS, after giving up on the fsync-education fight long ago (note: fsync is pretty well-behaved on XFS), put in some changes to essentially fsync under the covers on close if a file has been truncated (think file overwrite).  ext4 went a step further and will essentially fsync if a rename clobbers an existing file (as it does in the write tempfile / sync (or not) tempfile / rename tempfile routine).  But now we’ve taken that control away from the apps (did they want it?) and introduced behavior which may slow down some other workloads.  And, perhaps worse, we’ve encouraged sloppy app writing, because the filesystem takes care of pushing stuff to disk when the application forgets (or never knew).

I dunno how to resolve this right now.  I talked to some nice KDE folks on IRC; they basically want atomic writes: post-crash, you get either your old file or your new file.  Tempfile/sync/rename does this, but the fsync hurts on 78% of the Linux filesystems out there.  So their KSaveFile class doesn’t fsync.  So what to do, what to do…
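To make the tempfile/sync/rename routine concrete, here is a sketch in Python.  It is not KSaveFile and makes no claim about how KDE implements it; atomic_save and its sync flag are hypothetical names, and the directory fsync at the end assumes a POSIX system where a directory can be opened read-only.

```python
import os
import tempfile


def atomic_save(path, data, sync=True):
    """Replace path with data so a crash leaves either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()                  # Python buffer -> kernel page cache
            if sync:
                os.fsync(tmp.fileno())   # page cache -> disk, before the rename
        os.replace(tmp_path, path)       # rename over the old file
    except BaseException:
        # Something failed before the rename completed: drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if sync:
        # Assumption: POSIX. fsync the directory so the rename itself is on disk.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Calling atomic_save("settings.conf", b"...", sync=False) is the “(or not)” variant: the rename is still atomic with respect to other processes, but after a crash nothing in POSIX promises that the new file’s data made it to disk before the rename did.  Note also that mkstemp creates the temp file with owner-only permissions, so a real implementation would probably want to copy the old file’s mode across.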

Stewart Smith’s “Eat My Data, Please” is a great talk; it’s concise, funny (for geeks) and informative for anyone interested in data integrity on POSIX filesystems.  It covers just about everything you need to know… except what to do about ext3.
