Lustre filesystem

General recommendations

  1. Try not to keep more than a few hundred files in a single folder
    • Consider whether a tree structure could split the files across different folders
  2. Avoid writing to the same file from several processes of a parallel application
    • If you must, use a parallel I/O implementation such as MPI-IO or HDF5, both available on the prototype
  3. Is it possible to choose the folder where your application will write? If so, is it mandatory that this folder is shared across the processes?
    • If yes to the first question and no to the second:
      • Try to modify your application to write to the SD card
    • If yes to both questions:
      • Try not to write to the same file from several processes at the same time
  4. Prefer the lfs tool for file-related operations, for example lfs find . instead of *.
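
One possible way to build the tree structure suggested in point 1, as a minimal sketch: the hypothetical helper bucket_for below derives a two-character subfolder name from a checksum of the file name, so that files spread over at most 256 subfolders. The helper name, the bucket count and the file names in the usage comment are assumptions for illustration, not something provided on the prototype.

```shell
# Hypothetical helper (not part of Lustre): print a subfolder name in 00..ff
# for a given file name, based on the POSIX cksum CRC of that name.
bucket_for() {
    # cksum reads stdin and prints "<crc> <byte count>"; keep only the CRC.
    bucket_crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    printf '%02x\n' $(( bucket_crc % 256 ))
}

# Example use for a single (hypothetical) output file:
# dest="results/$(bucket_for run_0042.dat)"
# mkdir -p "$dest" && mv run_0042.dat "$dest/"
```

The same name always maps to the same subfolder, so an application can compute the path again when it needs to read the file back.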

How to deal with folders with a huge number of files

First, note that the Lustre filesystem is composed of several disks that contain data (OSTs) and a single one that contains metadata (handled by the MDS). Furthermore, every file access requires an open/close operation at the MDS.

Consequently, a command such as ls --color -l must contact the MDS several times for each file, which causes overhead. To avoid this, use the ls command without color or any other option when working in folders with more than a couple of hundred files.
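
As a small sketch of this advice, the hypothetical helpers below call ls through the command builtin, so a shell alias or function named ls (for example one adding --color=auto) is not applied and only names are listed. The helper names are assumptions for illustration.

```shell
# List entry names only, one per line, ignoring any ls alias or function.
plain_ls() {
    command ls -1 -- "$@"
}

# Count the entries of a folder (hidden ones included) without -l or colors.
# Note: a name containing a newline would be counted more than once.
count_entries() {
    command ls -1 -A -- "$1" | wc -l
}
```

For example, count_entries . is a cheap way to check whether a folder has grown past a few hundred files before running anything heavier on it.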

If you prefer, you can also use the tuned ls command provided by the Lustre package. The syntax is the following:

#lfs ls --help # For more information
 
lfs ls [OPTION]... [FILE]...

How to deal with small files

Lustre stripe

Lustre does not work well with small files. The reason is that the overhead of contacting different disks, distributed across different machines, just to fetch a few bytes is too large.

The number of Lustre disks across which each file is placed can be chosen on a per-directory basis. This is the stripe (or stripe count) of the folder. Inside a folder with stripe 1, every file is written to a single disk (this also applies recursively to subfolders). With stripe 2, each file uses two disks, and so on. The special value -1 means as many disks as possible.

The question now is which stripe to choose for a folder. There is no easy answer, but here are a few suggestions:

  • Files > 1GB
    • Use stripe -1
  • Files < 10MB
    • If the file will be accessed sequentially most of the time, use stripe 1
    • If the file will be accessed in parallel most of the time, use stripe -1 (see also the stripe size section)
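
The suggestions above can be written down as a small decision helper. This is only a sketch of the rules listed here: suggest_stripe_count is a hypothetical function, and since the list gives no suggestion for files between 10 MB and 1 GB, the helper reports that instead of guessing.

```shell
# Hypothetical helper: print the suggested stripe count for a file.
#   $1 = expected file size in bytes
#   $2 = dominant access pattern: "sequential" or "parallel"
suggest_stripe_count() {
    ssc_size=$1
    ssc_access=$2
    if [ "$ssc_size" -gt $(( 1024 * 1024 * 1024 )) ]; then
        printf '%s\n' -1                      # files > 1GB: use all disks
    elif [ "$ssc_size" -lt $(( 10 * 1024 * 1024 )) ]; then
        case $ssc_access in
            sequential) printf '%s\n' 1 ;;    # small, sequential: one disk
            parallel)   printf '%s\n' -1 ;;   # small, parallel: all disks
            *) echo "unknown access pattern: $ssc_access" >&2; return 2 ;;
        esac
    else
        echo "no suggestion given for this file size" >&2
        return 1
    fi
}
```

The printed value is meant to be passed to the stripe count option of lfs setstripe described further down.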

Stripe size

When using a stripe of -1 or bigger than 1, files are split across the disks that make up the Lustre filesystem. Each file is cut into chunks of N megabytes, where N can be set on a per-folder basis. This chunk size is called the stripe size in Lustre.

This parameter can be very useful when an application accesses a file on the Lustre filesystem in parallel, with each process accessing a different portion of the file but always of the same size.

For example, imagine a 10 MB file accessed by 10 processes: process 0 accesses the first 1 MB of the file, process 1 the second 1 MB, and so on. In this case, setting the stripe size to 1 MB and the stripe count to 10 could improve performance. This way, each process accesses a single, different disk, maximizing Lustre parallelism.
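
The arithmetic behind this example can be sketched as follows, assuming the usual round-robin placement in which consecutive chunks of stripe size bytes go to consecutive disks of the file. The helper name stripe_object_for_offset is hypothetical, and the index it prints is relative to the disks used by the file, not an actual OST number.

```shell
# Hypothetical helper: print which of the file's disks (0-based) holds a
# given byte offset, assuming round-robin striping.
#   $1 = byte offset, $2 = stripe size in bytes, $3 = stripe count
stripe_object_for_offset() {
    echo $(( ($1 / $2) % $3 ))
}

# The 10 MB file from the text: process p starts reading at offset p * 1 MB.
mb=$(( 1024 * 1024 ))
p=0
while [ "$p" -lt 10 ]; do
    echo "process $p -> disk $(stripe_object_for_offset $(( p * mb )) "$mb" 10)"
    p=$(( p + 1 ))
done
```

With a stripe count of 10 each process lands on its own disk. Rerunning the loop with a smaller stripe count shows several processes sharing a disk.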

How to set stripe size and count

The lfs command is used to check and modify the stripe size and count of a directory (and its files), through its getstripe and setstripe subcommands. The syntax is the following:

usage: setstripe -d <directory>   (to delete default striping)
 or
usage: setstripe [--stripe-count|-c <stripe_count>]
                 [--stripe-index|-i <start_ost_idx>]
                 [--stripe-size|-S <stripe_size>]
                 [--pool|-p <pool_name>]
                 [--block|-b] <directory|filename>
                 [--ost-list|-o <ost_indices>]
 
usage: getstripe [--ost|-O <uuid>] [--quiet | -q] [--verbose | -v]
		   [--stripe-count|-c] [--stripe-index|-i]
		   [--pool|-p] [--stripe-size|-S] [--directory|-d]
		   [--mdt-index|-M] [--recursive|-r] [--raw|-R]
		   [--layout|-L]
		   <directory|filename> ...

Examples

Set stripe count and size of a new folder

The following example creates a folder and sets its stripe size to 2 MB and its stripe count to 4, meaning that files in it will be split across 4 disks in chunks of 2 MB each.

# --stripe-size accepts N{k,m,g}, where N is a number and k, m and g stand for KB, MB and GB
# The folder must exist first: lfs setstripe on a non-existent path creates a file, not a folder
druiz@mb-login-12:~/ mkdir testFolder
druiz@mb-login-12:~/ lfs setstripe --stripe-count 4 --stripe-size 2m testFolder

Get stripe count and size of a folder

The following commands show the stripe count and size of a file or a folder.

# For a file
druiz@mb-login-12:~/ lfs getstripe -c -S tests/testFile.c
lmm_stripe_count:   11
lmm_stripe_size:    1048576
 
# For a folder
druiz@mb-login-12:~/ lfs getstripe -c -S tests
stripe_count:   -1 stripe_size:    1048576

Modify the stripe of an existing folder

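The following example shows how to modify the stripe size and count of a folder. Changing the stripe of a folder only affects files created afterwards; existing files keep their layout unless they are copied again.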

druiz@mb-login-12:~/ lfs getstripe -c -S lustreTest
stripe_count:   -1 stripe_size:    1048576
 
druiz@mb-login-12:~/ lfs getstripe -c -S lustreTest/foo
lmm_stripe_count:   11
lmm_stripe_size:    1048576
 
# To also change the stripe size and count of the files already in the folder,
# recreate the folder with the new stripe and copy the files back
druiz@mb-login-12:~/ mv lustreTest lustreTest.bak
druiz@mb-login-12:~/ mkdir lustreTest
druiz@mb-login-12:~/ lfs setstripe -c 4 -S 2m lustreTest
druiz@mb-login-12:~/ cp -a lustreTest.bak/* lustreTest/
druiz@mb-login-12:~/ lfs getstripe -c -S lustreTest
stripe_count:   4 stripe_size:    2097152
druiz@mb-login-12:~/ lfs getstripe -c -S lustreTest/foo
lmm_stripe_count:   4
lmm_stripe_size:    2097152
 
# If you only want new files to use the new stripe size and count, just set the stripe of the folder
druiz@mb-login-12:~/ lfs setstripe -c 4 -S 2m lustreTest
 
druiz@mb-login-12:~/ lfs getstripe -c -S lustreTest
stripe_count:   4 stripe_size:    2097152
druiz@mb-login-12:~/ lfs getstripe -c -S lustreTest/foo
lmm_stripe_count:   11
lmm_stripe_size:    1048576
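
The byte values printed by getstripe (1048576, 2097152) correspond to the sizes given to setstripe with the m suffix. A minimal sketch of that conversion, assuming the k, m and g suffixes are powers of 1024 (consistent with 2m being reported as 2097152 above) and that a bare number is already in bytes; stripe_size_to_bytes is a hypothetical helper, not an lfs subcommand.

```shell
# Hypothetical helper: convert a stripe size such as 64k, 2m or 1g to bytes.
stripe_size_to_bytes() {
    sstb_value=$1
    sstb_number=${sstb_value%[kmg]}    # strip one trailing k, m or g if present
    case $sstb_value in
        *k) echo $(( sstb_number * 1024 )) ;;
        *m) echo $(( sstb_number * 1024 * 1024 )) ;;
        *g) echo $(( sstb_number * 1024 * 1024 * 1024 )) ;;
        *)  echo "$sstb_number" ;;     # no suffix: value is taken as bytes
    esac
}
```

This makes it easy to compare the value requested with setstripe against the lmm_stripe_size reported for a file.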