Find how many lines are in the current folder in UNIX
Written by Georgi Stefkoff
Background
Have you ever needed to know how many lines of code (for example) are in a particular folder? Did you know that you can easily find this out with a single line in the terminal? I will show you how and explain every part.
Prerequisites
We need a UNIX-like system (obviously): Linux or macOS. We also need a folder to work with. I've chosen a random node_modules folder from a random project that has a bunch of Node.js modules inside (I really do not know what is inside it). You can choose any folder of your own, or create a folder with some files to test the commands.
Final result
You are probably curious to see the final result and test it out. I will show you the full command first and then break down all of its parts, so you can understand it. Here it is:
find . -type f | xargs wc -l | egrep 'total$' | awk '{print $1}' | paste -sd+ - | bc
Explanation
Here we have used a series of commands (more often referred to as a pipeline of commands) that, when combined in a particular way, produce the desired result.
Above, I've used the following commands: find, xargs, wc, egrep, awk, paste and bc. You can see the manual for each command by typing man {command} in the terminal to read more about it.
The following sections give a brief introduction to each command and explain how it contributes to the desired result.
find command
The find command is maybe the command most used by UNIX system administrators. Basically, it searches for entries (files, folders, etc.) in a given directory by name, by file type and by many other criteria. See man find for more information.
In the example above, I've called it like this: find . -type f. Here is what all of this means:
. is the directory argument, which specifies where to start the search. I've set it to the current directory, but it can be any relative or absolute directory on the system.
-type f tells find that the type of the entry should be a regular file.
If you execute only this command, it will print all files in the specified directory, recursively. This is one of the main requirements of our task. Now that we have all of the files, let's move on.
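To see this step on its own, here is a minimal sketch that runs find against a throwaway directory. The directory layout and file names are made up for illustration only.

```shell
# Build a small scratch tree (hypothetical file names) to try the command on.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src/lib"
printf 'one\ntwo\n' > "$demo_dir/src/a.txt"
printf 'one\n' > "$demo_dir/src/lib/b.txt"

# -type f keeps regular files only, so the directories src and src/lib
# themselves are not listed. sort just makes the order predictable.
files=$(cd "$demo_dir" && find . -type f | sort)
echo "$files"

# Clean up the scratch tree.
rm -rf "$demo_dir"
```

The two files are printed with their paths relative to the starting directory, including the one nested in a subfolder.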
xargs command
xargs is also a very frequently used and powerful command on UNIX systems. It reads items from STDIN (or from a file redirected into it) and executes a given command with those items as arguments. See man xargs for more information.
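In this example, I'm using xargs to call the wc command on the entries that the find command returns. Note that xargs passes many file names to a single wc call rather than starting wc once per file.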
wc (word count) is a command that can report the word, line, character and byte counts of a file. I've used the -l option to show only the line count for each file returned by the find command.
If you execute only find . -type f | xargs wc -l you will see the number of lines followed by the name of each file and, whenever wc was given more than one file, a final total line.
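A small sketch of this step, again on a scratch directory with made-up file names. The exact column padding of the wc output differs between implementations, so only the numbers matter here.

```shell
# Two hypothetical files: one with 3 lines, one with 2 lines.
demo_dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$demo_dir/three.txt"
printf 'a\nb\n' > "$demo_dir/two.txt"

# xargs hands both names to a single wc call, so wc prints one line per
# file and then appends a "total" line with the sum (5 here).
counts=$(cd "$demo_dir" && find . -type f | sort | xargs wc -l)
echo "$counts"

rm -rf "$demo_dir"
```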
So far so good, but there is an issue: I do not need the file names returned by the wc command. I need to get rid of them. See next.
egrep command
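egrep works like the grep command but uses extended regular expressions; it is just a shortcut that saves you from writing grep -E (newer GNU grep versions print a warning recommending grep -E instead). See man egrep for more details.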
Basically, egrep looks through its input and tries to match each line against the given regex pattern. In my example, I've given total$. This means that it will search for the word total at the end of the line. We need this because once xargs gets its input, it sends a whole batch of items to wc and not just one. When wc receives more than one file, it prints the line count of each file followed by a total line for all of the provided files. How many files go into one batch depends on the xargs implementation: on BSD and macOS the default limit is 5000 arguments, while GNU xargs fills the command line up to a size limit instead. So if the current folder holds more files than fit into one batch, wc runs several times and prints one total line per batch. egrep keeps only those total lines.
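The batching effect is easy to reproduce on a small scale. The sketch below forces a tiny batch size with xargs -n 2, an artificial value chosen only to make several total lines appear, and uses grep -E, the spelled-out form of egrep. File names are hypothetical.

```shell
# Four hypothetical files with 2 lines each.
demo_dir=$(mktemp -d)
for i in 1 2 3 4; do
  printf 'x\ny\n' > "$demo_dir/file$i.txt"
done

# -n 2 makes xargs start a new wc for every two file names, so wc runs
# twice and prints two separate "total" lines (each reporting 4).
# grep -E 'total$' keeps only the lines that end with the word total.
totals=$(cd "$demo_dir" && find . -type f | xargs -n 2 wc -l | grep -E 'total$')
echo "$totals"

rm -rf "$demo_dir"
```

With the default batch size the same thing happens, only with far more files per wc call.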
awk command
awk is also a powerful command. By the way, awk is not only a command but an entire language. You can read more about it on the official website. In my example, I'm using one of the most basic parts of AWK, and one of the most used in UNIX command pipelines for building the desired results.
Basically, what awk does is take each input line, split it on a delimiter (by default any run of spaces or tabs) and give you access to the resulting list of fields. If you know a programming language like JavaScript or PHP, it is similar to what String.split in JavaScript or explode() in PHP does.
In my example, I'm calling it like awk '{print $1}'. This means: take each line and return its first field. Remember that the line is split on the delimiters.
Because wc -l always prints the line count followed by a label (the file name, or the word total on the summary line), we want to keep only the number and get rid of the rest. That is why we want only the first field, or $1. Remember that in awk the fields start at 1 and not at 0, as in many programming languages; $0 holds the whole line.
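As a quick check of the field splitting, the sketch below feeds awk two hand-written lines shaped like wc output, including the leading padding that wc typically adds. The numbers are sample values.

```shell
# Leading spaces are skipped by awk's default splitting, so $1 is the
# number on each line and the label after it is dropped.
fields=$(printf '      55 total\n     323 total\n' | awk '{print $1}')
echo "$fields"
```

This prints 55 and 323, each on its own line.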
So far so good. We have the total line count of every batch of files in our directory. Now we need to add all of this up.
paste command
Do not get confused by the name of the paste command. It is not going to paste anything into the terminal. Basically, it merges lines of input together and prints the result. See man paste for more information.
In our command, paste is called with two options, -d and -s, plus a - argument that tells it to read from STDIN:
-d sets the delimiter. If you do not specify this option, the items are joined with a tab (\t), which is placed between the items but not after the last one. If you specify something other than \t (+ for example), the items are joined with a + sign instead.
-s concatenates all input lines into a single line, using the delimiter given by the -d option.
This is what we need to sum the lines in all of the files. Basically, this is what has happened so far:
find prints the files one by one:
/path/to/file1.ext
/path/to/file2.ext2
xargs executes wc in batches; each batch lists the line count of every file and ends with a total line:
...
55 total
...
323 total
egrep keeps only the total lines, and the awk command gets only the numbers from them:
55
323
and now paste concatenates everything with a + sign:
55+323
Wow. We have the formula, so now someone has to calculate it.
Luckily, we have the bc command for this. See below.
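The joining step can be tried in isolation with two sample numbers. The sketch below also shows the default tab delimiter for comparison.

```shell
# -s joins all input lines into one, -d+ puts a plus sign between them,
# and the trailing - tells paste to read from STDIN.
expr_line=$(printf '55\n323\n' | paste -sd+ -)
echo "$expr_line"   # 55+323

# Without -d the items are separated by a tab instead.
tab_line=$(printf '55\n323\n' | paste -s -)
echo "$tab_line"
```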
bc command
Basically, bc is an arithmetic language that can evaluate the provided input. Of course, bc is far more powerful than I can show or explain here, but think of it as a calculator for expressions (like the one we built above). See man bc for more information.
So far we have the arithmetic expression, and we can pass it to bc to be calculated and printed. In our example this is very easy: we just call bc without any arguments or options.
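A minimal sketch of this last step, assuming bc is installed (some minimal systems do not ship it by default):

```shell
# bc reads the expression from STDIN, evaluates it and prints the result.
sum=$(echo '55+323' | bc)
echo "$sum"   # 378
```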
Aaand we have the result
Executing the entire command find . -type f | xargs wc -l | egrep 'total$' | awk '{print $1}' | paste -sd+ - | bc
will print just the total number of lines in all files under the current directory.
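The pipeline has two weak spots worth knowing about. xargs splits its input on whitespace, so file names containing spaces are broken apart, and wc prints no total line when it is given a single file, so a folder with exactly one file yields no result. The sketch below is one possible variant, not the command explained above: it swaps xargs for find's own -exec ... {} + and lets a single wc count the concatenated content. File names are made up for illustration.

```shell
# Scratch directory with a file name that contains a space.
demo_dir=$(mktemp -d)
printf 'a\nb\n' > "$demo_dir/my notes.txt"
printf 'a\nb\nc\n' > "$demo_dir/plain.txt"

# find passes the names straight to cat as arguments, so spaces are
# harmless; wc -l then counts every line of the combined output.
# tr strips the padding that some wc implementations add.
total=$(cd "$demo_dir" && find . -type f -exec cat {} + | wc -l | tr -d ' ')
echo "$total"   # 5

rm -rf "$demo_dir"
```

The trade-off is that every file's content is streamed through cat, which can be slower on very large trees than letting wc open the files itself.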
Conclusion
On UNIX systems, piping commands can be very useful and can lead to different results (desired or not). Every SysAdmin uses pipes in the terminal, because we do not have time to write a new piece of software that combines two already built ones. We can just pipe the output of one command to be the argument (or, more correctly, the input) of another program and get the desired result.