The fastest way to count files in a directory

Yesterday I needed a way to count the messages in my mailbox. The messages are stored as text files, each containing the message itself plus some additional information. By counting the files, I can tell whether new mail has arrived. The task is simple, and I wanted an equally simple and quick solution.

Using native tools

The first thing that comes to mind is the ls utility. This is the most popular method, but also the slowest of the ones I tested.

It is better to test the methods on directories with a large number of files. We can use the time utility to measure execution speed.
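
If you don't have a large directory at hand, you can build a throwaway one. Below is a minimal sketch, assuming a POSIX shell and mktemp; the function name make_test_dir and the file_N naming scheme are my own illustrative choices, not part of any tool:

```shell
# make_test_dir N: create a temporary directory containing N empty files
# and print its path. The name and the file naming scheme are hypothetical.
make_test_dir() {
    count=$1
    dir=$(mktemp -d) || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        : > "$dir/file_$i"      # create an empty file
        i=$((i + 1))
    done
    printf '%s\n' "$dir"
}
```

For example, dir=$(make_test_dir 3000) gives a directory you can pass to any of the commands in this post and delete afterwards with rm -rf "$dir".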

ls

Here is an example using ls on a directory with ~3000 files:

time ls ~/path/to/directory/ | wc -l
...
2974       <--- the number of files

real	0m0.053s
user	0m0.024s
sys	0m0.008s

The real time in this case is 0.053s. This is the wall-clock time that elapsed from the start of the command to its end.
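
One possible contributor to this time (an assumption on my part, I did not measure it) is that ls sorts the entries before printing them. With GNU ls, the -U option lists entries in directory order without sorting, so a variant worth timing might look like the sketch below. The function name count_ls_unsorted is hypothetical, and -U has a different meaning in some non-GNU ls implementations:

```shell
# count_ls_unsorted DIR: count non-hidden entries in DIR using GNU ls
# without sorting (-U). Assumes GNU coreutils; the function name is illustrative.
count_ls_unsorted() {
    ls -U "$1" | wc -l
}
```

Like plain ls, this skips hidden files, and like any line-counting approach it miscounts file names that contain newlines.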

find

The next method uses the find utility. This is the fastest way I found using standard Linux tools.

Here is an example:

time find ~/path/to/directory/ -mindepth 1 -maxdepth 1 | wc -l
...
2974        <--- the number of files

real	0m0.008s
user	0m0.007s
sys	0m0.004s

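For me, this is more than 6 times faster than using ls.

Here -mindepth 1 keeps the directory itself out of the output, and -maxdepth 1 stops find from descending into subdirectories. Note that, unlike ls, find also counts hidden files.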
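
Counting lines gives a wrong result if a file name contains a newline. A hedged workaround, assuming GNU find (the -printf action is a GNU extension), is to print one character per entry and count characters instead. The function names below are hypothetical:

```shell
# count_entries DIR: count all direct entries of DIR, hidden ones included.
# Prints one dot per entry, so newlines in file names cannot skew the count.
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -printf '.' | wc -c
}

# count_visible_entries DIR: same, but skip names starting with a dot,
# which matches what plain ls would list.
count_visible_entries() {
    find "$1" -mindepth 1 -maxdepth 1 ! -name '.*' -printf '.' | wc -c
}
```

Both functions count subdirectories as entries too, just like the find command above.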

Writing a simple and fast program in C (the fastest tool)

I found another way on Stack Exchange. Thomas Nyman wrote his own program and offered it as a solution. In my tests, his program was the fastest tool for this task.

I suggest you compile it and test it yourself.

First, create a plain text file. Let's call it counter.c:

cd /tmp
nano counter.c

Then paste this code:

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <error.h>   /* error() is a GNU extension (glibc) */
#include <errno.h>

int main(int argc, char *argv[])
{
    int file_count = 0;
    DIR *dirp;
    struct dirent *entry;

    if (argc < 2)
        error(EXIT_FAILURE, 0, "missing argument");

    if (!(dirp = opendir(argv[1])))
        error(EXIT_FAILURE, errno, "could not open '%s'", argv[1]);

    /* Read entries one by one; readdir() returns NULL at the end. */
    while ((entry = readdir(dirp)) != NULL) {
        /* Ignore hidden files; this also skips "." and "..". */
        if (entry->d_name[0] == '.')
            continue;
        file_count++;
    }
    closedir(dirp);

    printf("%d\n", file_count);
    return EXIT_SUCCESS;
}

Now try to compile it with the following optimization options:

gcc -march=native -O2 -o counter counter.c

Note that -march=native tells the compiler to generate code for your own CPU and its features, so the compiled program may not work if you move it to another computer. -O2 sets the optimization level to 2.

You can leave these options out if you prefer; they don't play a significant role here. What matters is that this program performs only one task: it counts the files and does nothing else.

Now run the compiled program:

time ./counter ~/path/to/directory
...
2974        <--- the number of files

real	0m0.004s
user	0m0.000s
sys	0m0.003s

Execution time is 4 milliseconds, 3 of which were spent in the kernel (system calls).
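
A single run this short is easily distorted by caching and scheduling noise, so it can help to time many runs together. Here is a minimal sketch; the helper name run_n_times is hypothetical:

```shell
# run_n_times N CMD [ARGS...]: run CMD N times, discarding its standard output.
# The helper name is illustrative; it only repeats the command and times nothing itself.
run_n_times() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" > /dev/null
        i=$((i + 1))
    done
}
```

In bash, where time is a shell keyword, you can then run time run_n_times 100 ./counter ~/path/to/directory and divide the result by 100.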

Personally, I use the method with the find utility. It is fast enough for my needs.
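
To tie this back to the mailbox task, here is a rough sketch of how the count could be used to detect new mail. It assumes the previous count is kept in a small state file stored outside the mail directory; the function name check_new_mail and both of its arguments are hypothetical:

```shell
# check_new_mail MAILDIR STATEFILE: compare the current number of entries in
# MAILDIR with the number saved in STATEFILE, report the difference, and
# save the new number. Names and the state-file approach are illustrative.
check_new_mail() {
    maildir=$1
    statefile=$2
    # $(( ... )) also strips any padding that wc may add to its output.
    current=$(( $(find "$maildir" -mindepth 1 -maxdepth 1 | wc -l) ))
    previous=0
    if [ -f "$statefile" ]; then
        previous=$(cat "$statefile")
    fi
    printf '%s\n' "$current" > "$statefile"
    if [ "$current" -gt "$previous" ]; then
        echo "new mail: $((current - previous))"
    else
        echo "no new mail"
    fi
}
```

The first call reports every existing message as new because no state file exists yet.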

Updated: July 22, 2019 — 10:33 am