Find how many lines are in the current folder in UNIX

Written by Georgi Stefkoff

Background

Have you ever needed to know how many lines of code (for example) are in a particular folder? Did you know that you can easily find this out with a single line in the terminal? I will show you how and explain every part.

Prerequisites

We need a UNIX-like system (obviously), such as Linux or macOS, and a folder to work with. I've chosen a random node_modules folder from a random project that has a bunch of Node.js modules inside (I really do not know what is inside it). You can choose any folder of your own, or create a folder with a few files to try the commands on.

Final result

You are probably curious to see the final result and test it out. I will show you the full command first and then break it down part by part, so you can understand it. Here it is:

find . -type f | xargs wc -l | egrep 'total$' | awk '{print $1}' | paste -sd+ - | bc

Explanation

Here we have used a series of commands (often called a pipeline) that, when combined in a particular way, produce the desired result. Above, I've used the following commands: find, xargs, wc, egrep, awk, paste and bc. You can read the manual for each command by typing man {command} in the terminal. The following sections give a brief introduction to each command and how it contributes to the result.

find command

find is maybe the most used command among UNIX system administrators. Basically, it searches for entries (files, folders, etc.) in a given directory by name, file type and many other criteria. See man find for more information. In the example above, I've called it as find . -type f, and here is what each part means:

  • . is the starting point, the directory from which the search begins. Here it is the current directory, but it can be any relative or absolute path on the system
  • -type f restricts the results to entries that are regular files

If you run only this command, it prints all files in the specified directory, recursively. This is the first requirement of our task. Now that we have all of the files, let's move on.
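To try this step in isolation, here is a small sketch. The directory layout and file names are made up for illustration, and sort is added only to get a stable order.

```shell
# Create a throwaway directory with one nested file (names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'a\nb\n' > "$demo/one.txt"
printf 'c\n' > "$demo/sub/two.txt"

# List regular files only, recursively, starting from the current directory.
found=$(cd "$demo" && find . -type f | sort)
printf '%s\n' "$found"

rm -rf "$demo"
```

Note that the sub directory itself is not listed, because -type f keeps only regular files.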

xargs command

xargs is another frequently used and powerful command on UNIX systems. It reads items from standard input (STDIN) and runs a given command with those items as arguments. See man xargs for more information. In this example, I'm using xargs to call the wc command with the files that find returns. wc (word count) is a command that reports the line, word, character and byte counts of a file. I've used the -l option to show only the line count for each file coming from find. If you run only find . -type f | xargs wc -l, you will see the number of lines followed by the name of each file, plus a line ending in total whenever wc receives more than one file. So far so good, but there is an issue: the output mixes per-file counts, file names and totals, and I only need numbers that can be added up. See next.
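A minimal sketch of this step, assuming two made-up files with 2 and 1 lines. The exact column padding of wc differs between systems, so only the numbers matter here.

```shell
# Two hypothetical files: one with 2 lines, one with 1 line.
demo=$(mktemp -d)
printf 'a\nb\n' > "$demo/one.txt"
printf 'c\n' > "$demo/two.txt"

# xargs hands both file names to a single wc call,
# so wc prints one line per file and then a "total" line.
counts=$(cd "$demo" && find . -type f | sort | xargs wc -l)
printf '%s\n' "$counts"

rm -rf "$demo"
```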

egrep command

egrep is simply a shortcut for grep -E (extended regular expressions). Recent GNU grep releases mark egrep as obsolescent, so grep -E is the safer spelling. See man grep for more details. Basically, egrep reads the input line by line and prints only the lines that match the given regex pattern. In my example, the pattern is 'total$', which matches lines that end with the word total. We need this because xargs does not run wc once per file: it passes a batch of file names to each wc call. When wc receives more than one file, it prints a count for each file and then a final total line for the whole batch. The batch size depends on the implementation: BSD xargs (macOS) passes at most 5000 arguments per call by default, while GNU xargs is limited by the command-line length. So if the folder holds more files than fit in one batch, wc runs several times and prints several total lines, and those are the lines we keep.
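To see several total lines without creating thousands of files, the sketch below forces tiny batches with xargs -n 2. The -n 2 is only for the demonstration and is not part of the original command. The file names are made up, and grep -E is used as the equivalent of egrep.

```shell
# Four hypothetical one-line files.
demo=$(mktemp -d)
for i in 1 2 3 4; do
  printf 'x\n' > "$demo/f$i.txt"
done

# -n 2 makes xargs call wc with two files at a time,
# so wc runs twice and prints two "total" lines.
totals=$(cd "$demo" && find . -type f | xargs -n 2 wc -l | grep -E 'total$')
printf '%s\n' "$totals"

rm -rf "$demo"
```

One caveat of this pattern: a file whose name happens to end in total would match as well.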

awk command

awk is also a powerful command. By the way, awk is not only a command but an entire language. You can read more about it on the official website. In my example, I'm using one of the most basic parts of AWK, and one of the most used in UNIX pipelines. Basically, awk takes each input line, splits it on the field separator (by default, any run of spaces or tabs) and gives you access to the resulting fields. If you know a programming language like JavaScript or PHP, it is similar to what String.split does in JavaScript or explode() does in PHP. In my example, I'm calling it as awk '{print $1}'. This means: for each line, print the first field. Because wc -l prints the line count followed by the file name (or the word total), and we only want the number, we take the first field, $1. Remember that in awk fields are numbered from 1, not from 0 as in many programming languages ($0 is the whole line). So far so good. We have the total line count for every batch of files in our directory. Now we need to add them up.
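A quick sketch of the field splitting, using two made-up lines shaped like the total lines that wc prints (including the leading spaces):

```shell
# awk ignores the leading spaces and splits each line on whitespace,
# so $1 is the number and $2 would be the word "total".
first_fields=$(printf '   55 total\n  323 total\n' | awk '{print $1}')
printf '%s\n' "$first_fields"
```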

paste command

Do not be confused by the name of the paste command. It is not going to paste anything into the terminal. It merges lines of input together. See man paste for more information. Here we use two options of paste, -d and -s, and the trailing - tells paste to read from standard input:

  • -d sets the delimiter placed between the merged items. If you do not specify it, a tab (\t) is used. If you specify something else (+ for example), the items are joined with a + sign, with no delimiter after the last item
  • -s joins all lines of the input into a single line, using the delimiter from -d
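The two options can be tried on their own with a couple of sample numbers (the values are arbitrary):

```shell
# Join the input lines into one line, separated by "+".
# The final "-" tells paste to read from standard input.
expression=$(printf '55\n323\n' | paste -sd+ -)
printf '%s\n' "$expression"
```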

This is what we need to sum the line counts. Putting the steps so far together, suppose xargs splits the files into two batches (the numbers below are only an illustration):

  • find will print the files one by one:

/path/to/file1.ext
/path/to/file2.ext2
...

  • xargs will execute wc for each batch, and wc will show how many lines are in every file, followed by a total for the batch:

20 /path/to/file1.ext
35 /path/to/file2.ext2
55 total
...
323 total

  • egrep will keep only the total lines:

55 total
323 total

  • awk command will get only the line numbers:

55
323

  • and now paste will concatenate everything with + sign:

55+323

Wow. We have the formula, so now someone has to calculate it. Luckily, we have the bc command for this. See below.

bc command

Basically, bc is an arithmetic language that can evaluate the provided input. Of course, bc is far more powerful than I can show or explain here, but think of it as a calculator for expressions (like the one we built above). See man bc for more information. So far we have the arithmetic expression, and we can pass it to bc to be calculated and printed. In our example this is very easy: just call bc without any arguments or options.

Aaand we have the result

Executing the entire command find . -type f | xargs wc -l | egrep 'total$' | awk '{print $1}' | paste -sd+ - | bc will print just the total number of lines in all files under the current directory.

Conclusion

On Unix systems, piping commands can be very useful and can lead to different results (desired or not). Every sysadmin uses piping in the terminal, because we do not have time to write a new piece of software that combines two that are already built. We can just pipe the output of one command to be the input of another program and get the desired result.
