
HPC @ Uni.lu

High Performance Computing in Luxembourg

XFS & Inode64

For Galactus, the new storage server of Gaia, we have set up an XFS volume of 175 TB, so I considered using inode64. I will describe my experience in this post.

The inode64 mount option allows XFS to use 64-bit inode numbers instead of 32-bit ones. According to the XFS website:

By default, with 32bit inodes, XFS places inodes only in the first 1TB of a disk.
If you have a disk with 100TB, all inodes will be stuck in the first TB.
This can lead to strange things like "disk full" when you still have plenty space free,
but there's no more place in the first TB to create a new inode. Also, performance sucks.
To come around this, use the inode64 mount options for filesystems >1TB.
Inodes will then be placed in the location where their data is, minimizing disk seeks.

TL;DR:

  • advantages: better performance, no risk of running out of inodes
  • disadvantages: software compatibility issues, poor NFS support
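
The boundary in question is 2^32: any inode number of 4294967296 or above no longer fits in 32 bits. A quick shell sketch of that check, using sample inode numbers taken from the listings later in this post:

```shell
# 2^32 = 4294967296: inode numbers at or above this need 64 bits
LIMIT=4294967296
for ino in 16384 25509982 373674360862; do   # sample inode numbers from the listings below
    if [ "$ino" -lt "$LIMIT" ]; then
        echo "$ino fits in 32 bits"
    else
        echo "$ino needs 64 bits"
    fi
done
# 16384 and 25509982 fit in 32 bits; 373674360862 needs 64 bits
```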

Setup (XFS and NFS)

The system is running CentOS 6.5 with the kernel 2.6.32-431.5.1.el6.centos.plus.x86_64. I have already configured LVM, so I only have to create the XFS filesystem.

Terminal

   # mkfs.xfs -f -L nfs_netapp /dev/vg_nfs_netapp/lvm_nfs_netapp
   # echo "/dev/vg_nfs_netapp/lvm_nfs_netapp  /export      xfs     defaults,noauto,inode64,uquota,prjquota 0 0" >> /etc/fstab
   # mount -a
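
To double-check that the option actually took effect, the mount table can be inspected. A small sketch, assuming the /export mount point from the fstab line above (the `|| true` only makes the check harmless on a machine without this mount):

```shell
# show the active mount options for /export; 'inode64' should be listed
grep ' /export ' /proc/mounts || true
# xfs_info /export would additionally print the filesystem geometry
```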

Then, I export two directories from /export/ over NFSv3:

  • /export/users
  • /export/apps

In /etc/exports:

/export/apps    10.228.0.0/16(async,rw,no_root_squash,no_subtree_check)
/export/users   10.228.0.0/16(async,rw,no_root_squash,no_subtree_check)
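
After editing /etc/exports, the export table can be reloaded without restarting the NFS server. A sketch (the `command -v` guard is only there so it is a no-op on a machine without nfs-utils installed):

```shell
# re-read /etc/exports and re-export everything
if command -v exportfs >/dev/null 2>&1; then
    exportfs -ra
    exportfs -v   # list the active exports with their options
fi
```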

First impression: it just works.

Terminal

   $ ls -lia /export/users/workdirs/
   total 1.3M
   373674360862 drwxr-xr-x 219 root root 8.0K Mar  5 00:31 .
          16389 drwxr-xr-x   4 root root   48 Aug 20  2012 ..
   253479845913 drwxr-xr-x   2 595   777   10 Apr 16  2013 dir1
   665807749125 drwxr-xr-x   3 549   777   33 Nov  7 11:08 dir2
   240598892562 drwxr-xr-x   2 516   777   10 Aug 20  2012 dir3
   244909948928 drwxr-xr-x   2 558   777   10 Oct 22  2012 dir4
   227825795262 drwxr-xr-x   2 564   777   10 Jan 24  2013 dir5
   232013021291 drwxr-xr-x   2 547   777   10 Oct  3  2012 dir6
   451108798487 drwxr-xr-x   5 520   777   68 Dec 29 17:52 dir7
   249296175125 drwxr-xr-x   2 528   777   10 Jun  7  2013 dir8
   236296667296 drwxr-xr-x   2 566   777   10 Jan 25  2013 dir9

First test with NFS v3…

On the nodes, I realize that I can mount galactus:/export/users, but not galactus:/export/apps.

In the server logs, I get these messages:

Terminal

   Mar  5 21:41:08 galactus rpc.mountd[18099]: authenticated mount request from 10.60.1.84:778 for /export/apps (/export/apps)
   Mar  5 21:41:18 galactus rpc.mountd[18099]: authenticated mount request from 10.60.1.84:790 for /export/apps (/export/apps)
   Mar  5 21:41:28 galactus rpc.mountd[18099]: authenticated mount request from 10.60.1.84:874 for /export/apps (/export/apps)
   Mar  5 21:41:38 galactus rpc.mountd[18099]: authenticated mount request from 10.60.1.84:975 for /export/apps (/export/apps)
   Mar  5 21:42:39 galactus rpc.mountd[18099]: authenticated mount request from 10.60.1.84:845 for /export/apps (/export/apps)
   Mar  5 21:46:20 galactus rpc.mountd[18099]: authenticated mount request from 10.60.1.84:948 for /export/apps (/export/apps)

On the nodes, I get these messages:

Terminal

   $ mount -vvv /mnt/nfs/apps
   ...
   mount.nfs: mount(2): Stale NFS file handle
   ...

Well, it turns out that the root directory of an NFSv3 export must have an inode number that fits in 32 bits.
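
A quick way to test whether a given export root is affected: GNU stat's `%i` format prints the inode number, which can then be compared against 2^32. A sketch (the final call on `/` is only a placeholder so it runs anywhere; substitute your export root, e.g. /export/apps):

```shell
# check whether a directory's inode number fits in 32 bits
check_root_inode() {
    ino=$(stat -c %i "$1")              # GNU stat: %i = inode number
    if [ "$ino" -lt 4294967296 ]; then  # 2^32
        echo "$1: inode $ino fits in 32 bits, NFSv3 can hand out a file handle"
    else
        echo "$1: inode $ino needs 64 bits, expect 'Stale NFS file handle'"
    fi
}
check_root_inode /   # placeholder path; use your export root here
```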

So, here is a workaround:

Terminal

   $ cd /export
   $ mkdir test
   $ cd test
   $ mkdir `seq 2000`

If you are lucky, some of these directories will get 32-bit inode numbers.

Terminal

   $ ls -li .
   
   total 4.0K
    193619034156 drwxr-xr-x 8 root root   65 Mar  5 22:32 .
           16384 drwxr-xr-x 6 root root 4.0K Mar  5 22:39 ..
        25509982 drwxr-xr-x 2 root root   10 Mar  5 22:27 31
        25509983 drwxr-xr-x 2 root root   10 Mar  5 22:27 206
        65273914 drwxr-xr-x 2 root root   10 Mar  5 22:27 381
        26738688 drwxr-xr-x 2 root root   10 Mar  5 22:27 556
        26738689 drwxr-xr-x 2 root root   10 Mar  5 22:27 731
        26738690 drwxr-xr-x 2 root root   10 Mar  5 22:27 906
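
Rather than eyeballing the ls output, the candidates can be collected programmatically. A small sketch, assuming the /export/test directory created above:

```shell
# list every directory under /export/test whose inode number fits in 32 bits
for d in /export/test/*/; do
    ino=$(stat -c %i "$d" 2>/dev/null) || continue
    if [ "$ino" -lt 4294967296 ]; then   # 2^32
        printf '%s\t%s\n' "$ino" "$d"
    fi
done
```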

Now, use them…

Terminal

   $ mv /export/test/31 /export
   $ mv /export/apps/* /export/31
   $ rmdir /export/apps
   $ mv /export/31 /export/apps

And…

Terminal

   $ ls -1i /export/
   25509982 apps

Now, we can mount /export/apps on the nodes…

It’s a horrible solution, but it works…

But, what about application compatibility?

Greg Banks (engineer at SGI) answered this question on his blog.

He also provides a script that analyzes the binaries in a directory and summarizes which of them depend on the old 32-bit stat system call family.
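
I don't reproduce his script here, but a rough check in the same spirit can be done by hand: grep a binary's dynamic symbol table for the legacy stat family (stat/fstat/lstat without the '64' suffix). This is my own sketch, not his script, and /bin/ls is just an arbitrary example binary:

```shell
# a hit suggests the program may mishandle 64-bit inode numbers
# ('|| echo' keeps the sketch harmless when nothing matches)
objdump -T /bin/ls 2>/dev/null | grep -wE 'stat|fstat|lstat' \
    || echo "no legacy 32-bit stat symbols found"
```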

I tested it on our modules directory, and it is definitely not safe to enable inode64: many binaries may break.

In an HPC center, with legacy code and closed-source applications that cannot easily be recompiled, it is not realistic to debug and fix all the potential issues triggered by this change.

So, how to downgrade?

You can’t :)

If you remove the inode64 option, new files will use 32-bit inode numbers, but all previously created files keep their 64-bit inode numbers (at least, that is the behavior with CentOS 6.5 and the kernel 2.6.32-431.5.1.el6.centos.plus.x86_64).

The only solution is to reformat with mkfs, and start from scratch with a clean filesystem…