How I protect my servers from attackers

Written by Georgi Stefkoff

These days, cyber security is one of the most important topics everyone should consider. It applies to everything from the local router up to production environments and cloud infrastructure.

Currently, there are a lot of open-source tools for testing an application for vulnerabilities, and they can even be used to gain access over it. On the other hand, many people run open-source applications like nginx, Apache, etc., so an attacker can discover the exact version of the application and look up how it can be broken, or even how to gain access to some private resources.

There are also a lot of vulnerability databases, like OSV, so an attacker who knows the version of the application can simply search the database for that particular version and then make a mess of your server.

That is why you should always update your software to the latest version: the developers keep shipping security fixes for the application.

My infrastructure

Currently, I have a bunch of Dell servers in my basement running VMware vSphere Hypervisor with a lot of virtual machines that I use for my public and private apps. This blog is served from one of the virtual machines in my basement.

For managing the internet access, I'm using two MikroTik routers and two ISPs. The MikroTiks are responsible for NAT, port forwarding and the firewall.

I'm also managing some servers for my clients, since I work as a DevOps engineer. Those are all in the cloud.

What is the problem

The general problem with all open servers (servers with an open port, like 22 for SSH) is that there are a lot of bots scanning for IP addresses and open ports. Port 22 is the most targeted, because it can grant access to the entire system.

Since I have a GitLab instance on one of my virtual servers, my port 22 is also open. If I check the auth log with tail -f /var/log/auth.log, I can see that almost every minute (even more often) someone is trying to get in through SSH with some random username and password. And this is not only on my server. This is valid for all servers across the world that have some port open.
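
For illustration, those attempts show up in auth.log as lines similar to the following (the hostname, user and address here are made up, but this is the standard sshd message format):

May  9 12:30:01 gitlab sshd[1234]: Failed password for invalid user admin from 203.0.113.45 port 51122 ssh2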

There is another BIG issue that I have - on my routers, I have a VPN configuration, and since the VPN is using IPSEC for authentication, my IPSEC ports are open as well. When I open the logs on the MikroTik, I can see the same situation - a lot of attackers trying to negotiate with my IPSEC policies by guessing usernames and passwords. If one of them succeeds, it will be very bad - the MikroTik will assign the attacker an IP address, and they will become part of my local network with access over all of its assets.

Any solutions?

Yes, there are a lot of solutions for this. One of the best, though it may better be called a "workaround", is to allow access to your applications (ssh, nginx, apache, etc.) only from the local network - e.g. Listen 192.168.0.100 in an Apache configuration file, which will listen on the private address assigned by the router and can be accessed only from the local network (if no port forwarding is in place). This is good, but if you need to access your apps from a remote location, you have to open a VPN connection, and you can end up with some open ports and the IPSEC issue I explained above.

What I'm using:

I'm using a series of tools (applications) that help me block most of the attackers. Here is the list of tools, and later on I will explain what each of them does:

  • Graylog - used for syslog collection and alert notifications
  • fail2ban - used to catch unauthorized access attempts against applications
  • python application - written by me, a simple application that receives an IP address and adds it to the router's blocked list
  • rsyslog - UNIX application that manages syslog and forwards messages to remote syslog servers

With the combination of all of these tools, the following happens:

  • If an attacker tries to connect to IPSEC with invalid credentials, the MikroTik sends a syslog message to the Graylog server. Once the syslog message is received, Graylog processes it through an alert with a specific pattern that checks for IPSEC negotiation failures, and sends an HTTP request to the python application (see the sketch after this list)
  • If fail2ban catches someone trying to log in with invalid credentials, a request with the given IP address is sent to the python application, and an iptables rule is added on the server (the default fail2ban behavior) to block future access from that IP address
  • If the python application receives an IP address, it checks whether it is a local or reserved address and, if not, adds it to the MikroTik's blocked list. This blocked list is used to filter the IP addresses trying to enter the router (and so my entire network).
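
To make the flow more concrete, here is a sketch of the kind of HTTP call that Graylog's alert (or fail2ban's action) ends up making towards the python application. The hostname, port, path and credentials are made-up placeholders, not my real configuration:

curl -u 'USER:PASSWORD' -d 'ip=203.0.113.45' http://blocker.local:5000/block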

In my case, after three months I ended up with more than 4500 addresses blocked in my MikroTik. And the list grows every day.

So here is a breakdown of each of the tools that I'm using and their pseudo configuration.

Rsyslog

Rsyslog is a UNIX tool that can read the syslog (/var/log/syslog) and send the logs to a remote server. You can read more about syslog on Wikipedia. Rsyslog can be configured so that once, say, a critical error is written to the syslog by some application, it is sent to a remote host on UDP port 514 (the default syslog port, or any other configured one). In my case, on all of my servers (local or remote), every syslog notification except INFO is sent to my Graylog server.
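
For illustration, such forwarding can be a single selector line in a file under /etc/rsyslog.d/ (the hostname is a placeholder; a single @ means UDP, a double @@ means TCP). A minimal sketch that ships everything at priority notice and above:

*.notice @graylog.example.local:514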

Graylog server

The Graylog server is configured to receive syslog UDP packets. The configuration is as simple as it gets.

Graylog has a really good alert manager that can be used to analyze the messages and execute actions - sending an email, executing an HTTP request and so on. I use the alerts so that when an authentication error message is received, it is posted to the python application for analysis and blocking of the IP address.

fail2ban

fail2ban is maybe one of the most used open-source applications for analyzing logs and blocking IP addresses. I use it on all of my servers that have public access. Its biggest power is analyzing the auth log and finding which IP address is trying to reach your system without authorization. The default behavior of fail2ban is to add an iptables record that blocks the given IP address's access to some resource or to the entire system (all ports).

fail2ban is very customizable and has a lot of built-in filters and actions that can be used. A filter is basically the functionality that analyzes the log of a specific application, finds whether there is some unauthorized access and does something with the IP or the domain.

In my case, I'm using two actions - one to ban the IP on the server itself (through iptables), and another to send an HTTP request with the given IP address to the python application, so the address can be added to the MikroTik's firewall.
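
As a rough sketch only, a jail combining two actions could look like the snippet below. iptables-multiport is a standard fail2ban action, while mikrotik-report is a hypothetical custom action (it would have to be defined in action.d/ to fire the HTTP request):

[sshd]
enabled  = true
logpath  = /var/log/auth.log
maxretry = 3
bantime  = 3600
action   = iptables-multiport[name=sshd, port=ssh]
           mikrotik-report[name=sshd]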

fail2ban can be configured in much more complex ways, and it can be used for various situations, not only for ssh.

python application

If you are waiting for me to show you some source code - I will not, because this blog post would become unreadable. Here is a breakdown of what this app does:

  1. Using Flask, the application exposes a TCP port to which anyone can send an HTTP request.
  2. There is only one route, protected via static Basic Authentication.
  3. Once a request is received and authorized, the app looks for an IP address in the body parameters.
  4. If an IP address is found, it is compared against the local and reserved address ranges.
  5. If it is not a reserved address and not part of the local network, it is sent to the MikroTik's API to be added to a firewall address list called "Blocked address".

If you need the source code of the application, you can send me an email at georgi@stefkoff.com.

Conclusion

As you can see, the setup is not one of the most complex ones, but in this way I can guarantee that once these attackers try to break into my system, they will not get another chance.

Also, this configuration is implemented on all of the remote servers, so once some hacker tries to break through one of the production servers, their address is added to my local firewall and future attacks from them are prevented.

So, to summarize - if you are working as a developer or DevOps engineer, you have to be sure that you limit the attackers as much as possible. If you do not, you may later end up needing help from cyber security services, and that can cost you a lot of money.

And remember: your system will NEVER be fully secure. There will always be someone who finds a way to break it - even if it is a simple user who does something stupid and causes a DoS by mistake.

If you need more information about my implementation, you can write in the comments below or mail me at georgi@stefkoff.com.

Have a secure day! :)

How I use Vim as a DevOps Engineer

Written by Georgi Stefkoff

Vim for DevOps?

Of course, vim is not one of the requirements for being a good DevOps engineer, but I chose it over nano, emacs and the other terminal editors. Why? Read below to see its power.

Why it is better than others?

It is not objectively better than the others, because in one way or another someone could find nano (for example) to be the best editor for them. In my experience, I started with vim and almost everything I did was with vim.

If you prefer some other terminal editor, probably this blog post is not for you, or you can stay and see how I use it.

Here are some of the key features that got me used to vim and make me think it is the best one:

  • a lot of shortcuts that you can use to speed up the process of writing/editing files
  • very customisable software, with plugins for almost everything. You can even write your own.
  • ability to split your screen into multiple layouts and edit multiple files on one screen, without the need to open and close them every time
  • as in the previous point, it is customisable and can be configured per user, so one user of the system can use one key mapping while another uses the default, without overlapping or even knowing of each other's existence
  • built-in terminal, so you can execute some code while editing the file, without the need to close and re-open it
  • many, many more

I just can't list everything I like in vim, because there is a lot in it. There is even a huge database of plugins that can turn the boring terminal editor into a fully-featured IDE with auto-complete and AI assistance.

In this blog post, I will not cover how to install vim or configure any plugins. Your Linux (or even UNIX) distribution probably has it already.

The following sections describe the features I use the most to speed up the process of writing/editing files.

How to open?

Opening a file for editing with vim is as simple as:

vim /path/to/file.ext

With this command, you will open the file in Normal mode. You can also open an entire folder:

vim /path/to/folder

In this way, vim will list the files inside the folder, so you can hit Enter on the particular file that you want to edit.

Creating a file is the same as editing one:

vim new_file_name.ext

With this command, once you make your modifications, vim will save the file with the name that you provided (new_file_name.ext in my case).

Normal mode?

Vim has four modes: Normal, Insert, Visual and Command mode.

  1. Normal mode - the default one, and the one you return to when you hit the Esc key. In this mode you can navigate through the content, but once you press some key on the keyboard, you will probably trigger some command.
  2. Insert mode - this mode is where you insert your content. You enter it by hitting the i key or Insert. Insert mode has its opposite, Replace mode. The difference between them is that in Insert mode you add a character before the cursor, while in Replace mode you change the character the cursor is over. Note: depending on the shortcuts you are using, you may enter Insert mode in other ways as well.
  3. Visual mode - this mode is used for visual selection of content, so you can copy, cut, replace and do other operations over the visually selected text. You enter it by pressing the v key.
  4. Command mode - this is where the magic happens. Command mode in vim allows you to enter specific commands that do various things (like exiting). Saving also happens in Command mode. You enter it by pressing : (colon). Once there, you write the command and press Enter. You can leave Command mode by pressing the Esc key.

How to exit?

This is probably the first question of a beginner using vim. It is really simple and you have several ways:

  1. Quit from unmodified files: You enter Command mode, write q and hit Enter:

:q<Enter>

In this way, you will exit from vim and close the program. Note that I mentioned "unmodified files". This is intentional, because if the file has been modified after it was opened, vim will refuse to close it without the changes being saved or discarded.

  2. Discard the changes: you can exit and discard all of the changes by writing q! in Command mode:

:q!<Enter>

In this way, vim will discard all changes and will close the program.

Extra: You can close all opened instances by adding a after the q command, like:

:qa!<Enter>

This will discard all modified opened files and exit.

  3. Save the file and exit: You can save your file and exit from vim by writing w before the q command, like:

:wq<Enter>

In this way, vim will save everything and will exit.

Extra: You can use the w command just to save your progress, like :w<Enter>, and keep the file open. After saving, you will return to Normal mode.

Enough boring stuff for now, otherwise I will end up writing an entire vim tutorial. You can read more about all of this in detail by entering the following command: :h<Enter>

Editing multiple files?

Yes, you can edit multiple files on one screen in vim. My favourite command for this is vertical split. With a vertical split, you can open the same file or any other file, and the screen will be split vertically (you will have two editors).

:vs|vsplit [file]<Enter>

You can use either vs or vsplit to split the screen. If you do not provide a file name as an argument, the current file will be shown in both halves. You can also use :sp|split [file]<Enter> to split the screen horizontally.

Once you split the file (or open another one), your cursor will be active in the newly opened window. This means that all of the navigation, commands and so on will work in that window and not in the previous one. To navigate between the opened windows, use the key combination Ctrl+w + [arrow keys] (press Ctrl+w, release, and then press an arrow key to move between the windows).

Built-in terminal

Imagine that you are writing a C program, or in any other language that requires compilation before executing. After each modification you make, you may need to check the compilation and fix the errors (if any). If you have to close and re-open the file again and again, you will be pissed very soon. Adding a second monitor is not the correct way.

Vim has a great feature for executing commands. This is done by entering Command mode, then writing ! followed by the command. Here is an example that lists the content of the current working directory:

:!ls -la<Enter>

After hitting Enter, you will see a completely different screen (all active windows will be hidden) with the output of the entered command. You will be prompted to hit a key in order to return to the editor.

I use this feature maybe every time I open vim - like I said, for compiling or debugging, for finding files that I need to edit, and for many other commands. Note that I use it for one-line commands. If I need a series of commands to do something, most of the time I exit vim and do my job in the normal terminal.
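
For the C scenario above, assuming the file currently being edited is the one being compiled (vim expands % to the current file name), the check can be as simple as:

:!gcc % -o /tmp/prog && /tmp/prog<Enter>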

Going to a specific line

If you need to go to a specific line in the file, it is more than easy: :[line number]<Enter>. An example: :55<Enter> will jump to the 55th line.

Replacing single character

Sometimes I have to open a file just to change a single character (a typo, for example). In vim you do not have to enter Insert mode first, change the character and then exit Insert mode. You can simply move the cursor over the target character, hit the r key and then type the replacement character. In this way, you immediately change a character without moving through the different modes.

Replacing a word

Replacing a word is as simple as replacing a character. You go to the first character of the word and hit cw. The rest of the word, starting from the cursor, will be deleted and you will enter Insert mode. You can then write the correct word and hit the Esc key to exit Insert mode.

Replacing until start or end

If you need to change a piece of text from one point to the end of the line, you can hit c$. Everything from the cursor to the end of the line will be deleted and you will enter Insert mode. If you need to edit from that point to the beginning instead, hit c0. Everything between the cursor and the start of the line will be deleted and you will enter Insert mode.

Navigating through the words

This is as simple as hitting the w key. Once pressed, the cursor will go to the first character of the next word. This is for moving forward word-by-word. If you need to move backward, press the b key. The cursor will then move back to the first character of the previous word.

As seen in the last section, you can also go to the end of the line by hitting $ and to the beginning of the line with 0.

Deleting a word

You can delete a word by hitting dw. Starting from the cursor, everything until the end of the current word will be deleted, and you stay in Normal mode. You can delete multiple words with dNw - for example, d2w will delete the next two words starting from the cursor.

You can also delete everything from the cursor to the start or the end of the line with d0 or d$ accordingly.

Deleting a line

You do not have to use the previous approaches to delete a whole line. You can simply type dd and the line will be removed.

Moving to the beginning and end of the file

You can go to the very first character of the file by hitting gg. To move to the end of the file, press G. Note the uppercase G: you actually have to press Shift+g.

If you need to delete the entire content of the file, you can combine some commands that we have already met:

  1. Go to the beginning of the file: gg
  2. Press d
  3. Go to the end of the file: G

In this way (the whole sequence is ggdG), you will delete everything from the beginning to the end of the file.

Copy-paste

You have copy-paste functionality through commands, and it works across all of the opened windows in vim. Note that this will not copy the content to the system clipboard, but just to vim's internal buffer. Also, every delete command you make works like cut: the deleted content stays in the buffer and you can paste it somewhere else after the deletion.

Copy

You can copy content with the y command. On its own, y waits for a motion, but combined with other keys it can do a lot:

  • copy entire word: yw
  • copy entire line: yy
  • copy entire content: gg+y+G

Pasting

Pasting is done by just pressing the p key. As I said above, once you delete something, you can paste it anywhere (it will stay in the buffer until you delete or copy something else).

Note that the pasting will occur at the cursor position.

Undo

If you need to undo something, hit the u key. You can hit it as many times as you need.

Searching

Searching is one of the most important features of a file editor. You can search in vim by pressing / followed by the search pattern, like:

/search text

This will search for "search text" in the entire file.

If there is more than one search result, you can go to the next occurrence by hitting the n key, or back to the previous one by hitting the N key.

and many, many more features that I can't even remember.

Conclusion

These are just a quick pull from my memory - the things I use most of the time while working with vim. Of course, there are many more, like replacing, complex navigation, mouse behaviour, etc. If you learn this basic usage, you will not need much more in order to do some magic in the cloud.

Managing cron jobs for web applications

Written by Georgi Stefkoff

Background

Almost every developer or DevOps engineer has served web content via some web server like nginx or apache. All of these servers come with the default user www-data. This user exists for security reasons and is different from the normal system users (like root or whatever was pre-configured while installing the system), so the application has limited access over the entire server. Of course, you can change the user the web server runs as, but this is not recommended unless there is a good reason behind it.

If you clone the application source code with some user, let's say root, you have to change the ownership of the source code to the www-data user. This is done with the following command: chown -R www-data:www-data /var/www/html. In this way, only the www-data user will have the rights to operate with the content in /var/www/html.

Depending on the use-case, you would want to give the files the following permissions: 770, or 660 if there are no executable files in the source code. This means that only the owner and the users that are part of the same group (in our case the www-data group) can operate with the files. One useful permission option can be 774, which means that members of the group can read, write and execute the files, while the rest of the users can only read them. But I do not recommend this, so stay with 770 or 660 to limit the access over the source code, as sketched below.
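
As a sketch, assuming the source code lives in /var/www/html and contains no executable files, applying these permissions could look like this (note that directories always need the execute bit so they can be traversed, hence 770 for them):

chown -R www-data:www-data /var/www/html
find /var/www/html -type d -exec chmod 770 {} \;
find /var/www/html -type f -exec chmod 660 {} \;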

Once you set up the web server and adjust the correct file permissions, in most cases you will need to run some background or cron jobs that do some magic behind the scenes. Usually, these jobs fix some broken database records or trigger some background processing.

So, where is the problem?

The problem

If you configure the cron jobs as the root user (or some other user), then depending on the file permissions of the source code, you may end up with a Permission Denied error while trying to execute some PHP script (for example). For example, if your user is ubuntu and the source code permissions are 660, only the www-data user and the users in the www-data group can access these files. You can easily fix this by adding your user to the www-data group: sudo usermod -aG www-data ubuntu. In this way, the ubuntu user will have access over the files under the www-data group. This is not the best approach, because we are giving yet another user access over the source code. The easiest way would be to just run the web server as the ubuntu user and get rid of the file permissions altogether, but this is highly not recommended.

Another big issue is the following: setting up the cron jobs as the root user. Since root is part of the sudoers, it has access over all of the files, and no permissions can stop it. If you execute the cron job as root, probably everything will go fine and most of the time you will not notice any issues (you just have to pray there will be no security issues with this). But imagine the following case: root executes a cron job whose work is to generate some files. After the execution, these files will be owned by root, since root forked the process. The issue comes when the web server tries to access these files. The system will deny the read request, because the web server user is www-data but the files were created by root, and your application will stop working properly.

One workaround for this issue is to change the ownership of the source code back to www-data after every cron job, but this is just a waste of CPU resources.

The solution

The correct solution here is to execute these cron jobs as the www-data user. You can do this by running the following command:

sudo crontab -e -u www-data

Note the sudo at the beginning: changing another user's cron jobs requires root privileges!

In this way, the www-data user will have its own background jobs, they will operate with the source code under the same user, and no additional permission adjustments will be required after the cron job.
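
Inside that crontab, the records are written the usual way. A hypothetical example (the script path is made up, and php is assumed to be on the PATH) that runs a PHP maintenance script every five minutes as www-data:

*/5 * * * * php /var/www/html/cron/cleanup.php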

Extra: fixing in production

One extra piece of information, for a similar scenario: if you need to fix something directly on the production server, or just change some code for debugging, you will want to access the source code as the correct user. If you are using the root user, there will be no issue making changes over the files and saving them.

NOTE: if you apply some changes to a file with a different user that has explicit access over the file, this will not change the owner of the file.

But if you are using a user different from root (or one that is not part of the sudoers), you may end up with a Permission denied error.

Another issue is the same as above - if you need to create a new file, or execute a script that creates files, the web server user may end up with no access over the newly created files, because they will be owned by your user and not by the web server user (www-data in our case).

The correct solution is to first log in as the web server user and then apply the changes. In order to log in as the www-data user, you have to do the following (NOTE: you need root access for this):

If you are not the root user, call this first:

sudo su

(enter password)

In order to enter super-user mode.

Then execute the following command:

su -l -s /bin/bash www-data

And you will be logged in as the www-data user.

Here is what this means:

  • the -l option tells su to execute the login logic for the user - setting the environment variables, etc.
  • -s provides the shell - in our case /bin/bash. You can also use /bin/sh or whatever shell you are using.
  • www-data is the user that you need to impersonate

After this, you will work as the www-data user and will have the same access over your source code as the web server has.
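
As a side note, assuming your user has sudo rights, the two steps can also be combined into a single command:

sudo su -l -s /bin/bash www-data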

Conclusion

Managing the correct access for the web server files is key to avoiding some cyber attacks over your server. Imagine that you do not limit the web server's access over the file system: some piece of code could give an attacker access to files that are not part of the source code, and they could even find some database credentials, access to the server itself or other sensitive information which is not supposed to be served to the end user.

Find how many lines are in the current folder in UNIX

Written by Georgi Stefkoff

Background

Have you ever needed to know how many lines of code (for example) there are in a particular folder? Did you know that you can easily find this out with a single line in the terminal? I will show you how, and I will explain every part of it.

Prerequisites

So, we need a UNIX system (obviously) - Linux or macOS. Also, we need some folder to work with. I've chosen a random node_modules folder from a random project that has a bunch of nodejs modules inside (I really do not know what is inside it). You can choose any folder of yours, or create some folders and files to validate the results.

Final result

Probably you are curious to see the final result and test it out. I will show you the full command and then break down all of its parts, so you can understand it. Here it is:

find . -type f | xargs wc -l | egrep 'total$' | awk '{print $1}' | paste -sd+ - | bc

Explanation

Here, we have used a series of commands (more often referred to as a pipe of commands) that, when combined in a particular way, produce the desired result. Above, I've used the following commands: find, xargs, egrep, awk, paste and bc. You can see the manual for each command by typing man {command} in the terminal and read more about it. The following sections give a brief introduction to each command and how it helps us reach the desired result.

find command

find is maybe the most used command among UNIX system administrators. Basically, it searches for entries (files, folders, etc.) in a given directory, by name, by file type and more. See man find for more information. In the example above, I've called it as find . -type f, and here is what it means:

  • . specifies where the search starts from. Here I've set it to the current directory, but it can be any relative or absolute path on the system
  • -type f tells find that the type of the entry should be a file

If you execute only this command, it will print all files in the specified directory, recursively. This is one of the main requirements of our task. Now we have all of the files, so let's move on.

xargs command

xargs is also one of the most often used and powerful commands on UNIX systems. It allows you to execute a specific command for the input received on STDIN, or to execute commands based on the content of a file. See man xargs for more information. In this example, I'm using xargs to call the wc command for each entry that find returns. wc (word count) is a command that can tell the word, line, character and byte count of a file. I've used the -l option to show only the line count for each file returned by find. If you execute only find . -type f | xargs wc -l, you will see the number of lines followed by the name of each file. So far so good, but there is an issue - I do not need the file names returned by wc. I need to get rid of them. See next.

egrep command

Like the grep command, egrep is just a shortcut that saves you the grep -E notation. See man egrep for more details. Basically, egrep will look through the input and try to match each line against the given regex pattern. In my example, I've given total$. This means it will keep only the lines that end with the word total. We need this, because once xargs gets the input, it sends a whole batch of items to wc at once, not just one. And once wc receives more than one file, it prints a total line with the summed line count after the individual files. xargs limits how many arguments it passes per invocation, so if the files in the current folder exceed that limit, wc will be executed several times and we will get one total line per batch.

awk command

awk is also a powerful command. By the way, awk is not only a command, but an entire language. You can read more about it on the official website. In my example, I'm using one of the most basic parts of AWK, and one of the most used in UNIX command piping for building the desired result. Basically, what awk does is take a string, split it on a delimiter (the default is whitespace) and give you access to all of the fields of the string. If you know a programming language like JavaScript or PHP, it is similar to what String.split does in JavaScript or explode() does in PHP. In my example, I'm calling it like awk '{print $1}'. This means: take the line and print me the first field. Because the lines that survived egrep contain the line count followed by the word total, we want to keep only the count and get rid of the rest. That is why we take only the first field, $1. Remember that in awk the fields start from 1 and not from 0, as in many programming languages. So far so good. We have the line counts, and now we need to sum all of them.

paste command

Do not get confused by the name of the paste command. It is not going to paste anything into your terminal. It basically merges lines together. See man paste for more information. Here we use two options of paste, -d and -s, plus a trailing -, which tells it to read from standard input:

  • -d specifies the delimiter. If you do not specify this option, the items will be joined with a tab (\t). If you specify something different (+ for example), then every item will be followed by a + sign, except for the last one
  • -s concatenates everything into a single line, joined with the -d delimiter

This is what we need in order to sum all of the line counts. Basically, we have the following scenario so far:

  • find will print the files one by one:

/path/to/file1.ext
/path/to/file2.ext2

  • xargs will execute wc, which shows how many lines are in every file and prints a total line after each batch; egrep then keeps only those total lines. Suppose the files were processed in two batches with these subtotals:

55 total
323 total

  • awk will take only the numbers:

55
323

  • and now paste will concatenate everything with a + sign:

55+323

Wow, we have the formula. Now someone has to calculate it. Luckily, we have the bc command for this. See below.

bc command

Basically, bc is an arithmetic language that can calculate the provided input. Of course, bc is way more powerful than I can show or explain here, but think of it as something that calculates expressions (like the one that we have from above). See man bc for more information. So, we now have the arithmetic expression, and we can pass it to bc to be calculated and printed. In our example this is very easy: we just call bc without any arguments or options.
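
You can try this step in isolation: feeding the expression from the example above to bc prints the sum.

echo '55+323' | bc

This outputs 378.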

Aaand we have the result

Executing the entire command find . -type f | xargs wc -l | egrep 'total$' | awk '{print $1}' | paste -sd+ - | bc will print just the total number of lines in all files under the current directory.

Conclusion

In UNIX systems, piping commands can be very useful and can lead to all kinds of results (desired or not). Every SysAdmin uses piping in the terminal, because we do not have time to write a new piece of software that combines two already built ones. We can just pipe the result of one command to be the argument (or input, more correctly) of another program and get the desired result.

Back up MySQL databases and upload them to AWS S3

Written by Georgi Stefkoff

Introduction

When you have a database server (production or not), you should back up the databases regularly, because anything can happen - disk failure, hardware failure, etc.

In this blog post, I will show you how to back up all databases, compress them and upload them to AWS S3 object storage.

Requirements

  • This tutorial is valid for Ubuntu 22.04.
  • You need to have a running MySQL server.
  • You need to have the mysqldump tool installed on the machine where you will be making the backup. It can be installed with sudo apt-get install mysql-client.
  • You also need root access to all the databases that need to be dumped.
  • Make sure that you have enough free disk space, because depending on the size of the data, you may run out of disk space when the dump is generated.
  • You need to have an AWS account and credentials (AWS KEY and AWS SECRET KEY).

Dump all databases to an .sql file

In order to dump all the databases, run the following command (replace USER with a user that has access to all the databases):

mysqldump -u USER -p --all-databases --skip-add-locks > alldb.sql

  • --skip-add-locks will not add table-locking statements around the dumped data

When running this command, you will be prompted to enter the password for your user. If you wish to include the password in the command, you can do the following:

mysqldump -u USER -pPASSWORD --all-databases --skip-add-locks > alldb.sql

You can see that the password (PASSWORD) is appended immediately after the -p flag. Note: if the password contains special characters, you should surround it with ' like this:

mysqldump -u USER -p'PASSWORD' --all-databases --skip-add-locks > alldb.sql

Exposing the password on the command line is not a good practice, because it can be seen by anyone.

After some amount of time, your .sql file will be generated and will contain the SQL for all the databases on the server.

Compressing the .sql file

When we dumped the databases above, they were dumped in a text format, which takes a lot of space. Now we will modify the command in order to compress the .sql file and reduce the size of the dump.

We are going to use the gzip command. Make sure that you have gzip installed on your Ubuntu installation:

sudo apt update
sudo apt install gzip

Now, execute the following command:

mysqldump -u USER -p'PASSWORD' --all-databases --skip-add-locks | gzip > backup.$(date +%F.%H%M%S).sql.gz

Here we are using the same command, but instead of redirecting the output of the dump command straight to a file, we are piping it to gzip and redirecting the output of the compression to a .sql.gz file. Also, we include the current date in the name of the compressed file.

Executing this command will take more time than the last one, but at least the end result will take up less space.

If you list the content of the directory where the backup file is, you will see that the file is actually there:

stefkoff@stefkoff-mysql-server:~$ ls -la | grep backup.
-rw-rw-r--  1 stefkoff stefkoff  8860655596 May  9 12:30 backup.2024-05-09.115631.sql.gz
stefkoff@stefkoff-mysql-server:~$ 

Creating the bucket

Now that we have the file, it is time to create a bucket in AWS. First, log in to your console, navigate to S3 and create a new bucket. Enter a name for your bucket and leave the rest of the options as they are.

Upload the dump to S3

We are going to use the s3cmd package in order to upload the files to the S3 bucket. If you do not have the package installed on your system yet, you can install it with:

sudo apt update
sudo apt install s3cmd

After s3cmd is installed, you need to configure it in order to save your credentials to a local file for further usage. Run the following command:

s3cmd --configure

First of all, you will be prompted for your Access Key, and then for the Secret Key. Next is the default region - it is good to specify the default region of your AWS S3 bucket. Then confirm the endpoint s3.amazonaws.com (or just press Enter). Confirm the rest of the settings and answer Yes to the last two questions. At the end, s3cmd will generate a config file at /home/$USER/.s3cfg. Now we can use s3cmd to upload files to the bucket.

In order to upload the backup file, you can execute the following command (replace BUCKET_NAME with your actual S3 bucket name):

s3cmd put backup.2024-05-09.115631.sql.gz s3://BUCKET_NAME

After a few minutes (depending on the file size), your file will be uploaded to the AWS S3 bucket under the same name.
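
You can verify that the backup landed in the bucket by listing its content (again, replace BUCKET_NAME with your actual bucket name):

s3cmd ls s3://BUCKET_NAME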

Putting all together

We have managed to run all of this, but we want to do it in one go, probably via some cron task, so it happens automatically at a certain time of the day. The things that we will do in a single shell file are the following:

  • Create the compressed back-up file
  • Upload the file to AWS S3

Here is how you can do it in one shell-script:

#!/bin/bash

# fail on every error
set -e

# save the current date to be used for the filenames later
CURRENT_DATE=$(date +%F.%H%M%S)
# variable for the backup filename
BACKUP_FILENAME="backup.$CURRENT_DATE.sql.gz"
# mysql user
MYSQL_USER=root
# mysql password
MYSQL_PASSWORD=password
#extra options that will be passed to mysql dump command
MYSQL_EXTRA_PARAMS="--all-databases --skip-add-locks"
# mysqldump absolute location
MYSQLDUMP_COMMAND=$(which mysqldump)
S3_BUCKET_NAME="stefkoff-db-dumps"

# make sure that mysqldump is installed
if [ -z "$MYSQLDUMP_COMMAND" ]; then
  echo "mysqldump command not found. Exiting"
  exit 1
fi

# s3cmd absolute path
S3CMD_COMMAND=$(which s3cmd)
if [ -z "$S3CMD_COMMAND" ]; then
  echo "s3cmd command not found. Exiting"
  exit 1
fi

#gzip absolute path
GZIP_COMMAND=$(which gzip)

# make sure gzip is installed
if [ -z "$GZIP_COMMAND" ]; then
  echo "gzip command not found. Exiting"
  exit 1
fi

# dump all databases
/bin/bash -c "$MYSQLDUMP_COMMAND -u $MYSQL_USER -p'$MYSQL_PASSWORD' $MYSQL_EXTRA_PARAMS | $GZIP_COMMAND > $BACKUP_FILENAME"

# check if the backup file exists
if [ ! -f "$BACKUP_FILENAME" ]; then
  echo "Something went wrong and cannot upload the file to S3. Exiting"
  exit 1
fi

/bin/bash -c "$S3CMD_COMMAND put $BACKUP_FILENAME s3://$S3_BUCKET_NAME"

# remove the backup file, once it is successfully uploaded to the bucket
rm "$BACKUP_FILENAME"

# (OPTIONAL) remove previously uploaded files
/bin/bash -c "$S3CMD_COMMAND ls s3://$S3_BUCKET_NAME | awk '{print \$4}' | grep -v s3://$S3_BUCKET_NAME/$BACKUP_FILENAME | xargs -I {} $S3CMD_COMMAND del {}"

NOTE the last command: it will delete the previously uploaded backups from the bucket.

Save the script as backup-db.sh somewhere and make it executable: chmod +x backup-db.sh

If you execute the file with ./backup-db.sh, all of the above operations will be done in one go.

Setup cron task

Now we want to execute the script at a certain recurring time, so we can be sure that we will always have a "fresh" backup before any point of failure. You can add the following cron record (replace USER with your user):

0 0 * * 6 /bin/bash -c "cd /home/USER && ./backup-db.sh"

In my case, I will execute the command once a week (Saturday at 00:00). It is even better to have a daily backup, which you can get by replacing it with:

0 0 * * * /bin/bash -c "cd /home/USER && ./backup-db.sh"

Conclusion

Doing regular backups, especially for DB servers, is a MUST and everyone has to do it, not only on production environments but also for local usage, because when we lose data, recovering it will take a huge amount of time, which can be critical when working.