The etcd server is the only stateful component of a Kubernetes cluster: Kubernetes stores all API objects and settings in etcd. An etcd backup is therefore enough to restore the cluster's state completely. Kubernetes disaster recovery plans often combine regular etcd backups with infrastructure as code to recreate cloud servers.
In this blog, we will cover how to back up and restore etcd in Kubernetes. We have a set of hands-on labs that you should perform to learn Docker & Kubernetes and clear the CKA certification exam. The Cluster Architecture, Installation & Configuration domain, which includes etcd backup and restore, carries a total weightage of 25% in the exam.
In this blog post, we are going to cover the following topics:
- What is Etcd?
- Kubernetes and Etcd
- Prerequisites
- Installing and Placing Etcd Binaries
- Find K8s Manifest Location
- How to backup the Etcd & Restore it
- Scenarios & Use Cases
- Conclusion
What is Etcd?
Etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that must be accessible by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.
Etcd is used in a variety of different applications. It is most famous for being the core datastore for Kubernetes, the de facto standard for container orchestration. Cloud-native apps that use etcd can have more continuous uptime and keep operating even when individual servers fail. Applications read and write to etcd, which distributes configuration data and provides redundancy and robustness for node configuration.
Kubernetes and Etcd
Etcd is Kubernetes' primary datastore: it stores and replicates the entire state of the Kubernetes cluster. Because etcd is such a vital component of a Kubernetes cluster, it is critical that it is configured and managed properly.
Operating etcd can be challenging because it is a distributed, consensus-based system. Bootstrapping, maintaining quorum, adjusting cluster membership, taking backups, handling disaster recovery, and monitoring critical events are all time-consuming and difficult operations that require specialist knowledge. Backups and restores are performed against the etcd cluster itself, whether it runs on the control plane nodes or on separate servers in an HA setup.
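To get a feel for what Kubernetes actually keeps in etcd, you can list a few keys under the /registry prefix, where Kubernetes stores its objects. This is a quick sketch, assuming you run it on a kubeadm control plane node with the certificate paths used later in this post:
$ ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only --limit=10 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
You should see keys such as /registry/pods/kube-system/... for each API object in the cluster.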
Prerequisites
Make sure you have a K8s cluster deployed already.
Learn How To Setup A Three Node Kubernetes Cluster For CKA
Installing and Placing Etcd Binaries
Users mostly interact with etcd by putting and getting the value of a key. We do that using etcdctl, a command-line tool for interacting with the etcd server. In this section, we download the etcd release binaries so that we have the etcdctl tool available.
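For illustration only, this is a minimal sketch of the basic put/get interaction, assuming a standalone etcd listening on its default local endpoint without TLS (not the Kubernetes etcd, which requires the certificate flags shown later):
$ etcdctl put foo bar
OK
$ etcdctl get foo
foo
bar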
1) Create a temporary directory & download the ETCD binaries.
$ mkdir -p /tmp/etcd && cd /tmp/etcd
$ curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | grep browser_download_url | grep linux-amd64 | cut -d '"' -f 4 | wget -qi -
2) Unzip the compressed binaries:
$ tar xvf *.tar.gz
3) Move the etcd binaries to /usr/local/bin/ to make them globally accessible on your system, simplifying the process of running etcd commands:
$ cd etcd-*/
$ mv etcd* /usr/local/bin/
$ cd ~
$ rm -rf /tmp/etcd
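You can confirm the binaries are on your PATH by checking the version (the output will vary with the release you downloaded):
$ etcdctl version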
Find K8s Manifest Location
On a cluster node, we can find the default manifest location with the help of the kubelet config file:
# cat /var/lib/kubelet/config.yaml
From this manifest location, you can find the Kubernetes static pod definitions, locate the API server and etcd pod manifests, and then look inside those manifests for the certificate file paths and the etcd data-dir location.
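For example, assuming the standard kubeadm layout (a sketch; your paths may differ), you can pull the static pod path from the kubelet config and then grep the etcd manifest for the values you will need later:
$ grep staticPodPath /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests
$ grep -E -- "--cert-file|--key-file|--trusted-ca-file|--data-dir" /etc/kubernetes/manifests/etcd.yaml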
How to backup the Etcd & Restore it
The etcd server is the only stateful component of the Kubernetes cluster. Kubernetes stores all API objects and settings on the etcd server.
Backing up this storage is enough to restore the Kubernetes cluster’s state completely.
Taking a Snapshot and Verifying It:
1) Check the flags you need to include in the backup command:
$ ETCDCTL_API=3 etcdctl snapshot save -h
2) Take a snapshot of the etcd datastore using etcdctl:
$ ETCDCTL_API=3 etcdctl snapshot save snapshot.db --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
3) Verify that the snapshot was successful:
$ ETCDCTL_API=3 etcdctl snapshot status --write-out=table snapshot.db
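If you do run the status check (but see the note below if you plan to restore from this file), the command prints a summary table similar to the following. The hash, revision, and sizes here are illustrative placeholders, not values you should expect to match:
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| fe01cf57 |    10934 |       1573 |     2.6 MB |
+----------+----------+------------+------------+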
Important Note: If you are backing up in order to restore the cluster, do not run the status command on the backup file afterwards; it might tamper with the backup, and the restore process might then fail.
Always make a backup to avoid data loss:
Make a compressed copy of everything in the etcd certificates directory (/etc/kubernetes/pki/etcd), including all certificate and key files and anything else in that folder:
$ tar -zcvf etcd.tar.gz /etc/kubernetes/pki/etcd
In this step, we compressed all the files stored in the etcd certificates folder and kept the archive as a backup.
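To confirm everything was captured, you can list the contents of the archive:
$ tar -tzf etcd.tar.gz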
Restoring Etcd From the Snapshot and Verifying It:
1) Check the present state of the cluster, which is captured in the snapshot taken in the task above:
$ kubectl get all
2) To verify, we will now create a pod. Since the new pod is not present in the snapshot, it will not be available when we restore the content using the restore command.
$ kubectl run testing-restore --image=nginx
$ kubectl get pods
3) Check the flags you need to include in the restore command:
$ ETCDCTL_API=3 etcdctl snapshot restore -h
4) To restore, we first have to delete the present etcd content. So let's look into the etcd manifest and grab all the details we need for the restore command to execute:
$ cat /etc/kubernetes/manifests/etcd.yaml
5) Delete the present content of etcd and execute the restore command. Replace the --name, IP addresses, and token below with the values from your own etcd.yaml:
$ rm -rf /var/lib/etcd
$ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --name=kubeadm-master --data-dir=/var/lib/etcd --initial-cluster=kubeadm-master=https://10.0.0.4:2380 --initial-cluster-token=etcd-cluster-1 --initial-advertise-peer-urls=https://10.0.0.4:2380
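If you are unsure which values to use, a quick way to pull them out of the manifest you inspected in step 4 is (a sketch, assuming the standard kubeadm etcd.yaml layout):
$ grep -E -- "--name|--initial-cluster|--initial-advertise-peer-urls" /etc/kubernetes/manifests/etcd.yaml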
6) Verify that the cluster is back to the state it was in when we took the snapshot:
$ kubectl get pods
Here, you can verify that the 'testing-restore' pod is not present because it was not saved in snapshot.db. The rest of the data saved in snapshot.db has been successfully restored.
Congratulations! We are now successfully done with the backup & restoration process of our ETCD cluster in Kubernetes.
Scenarios & Use Cases:
- Prevent Data Loss: take regular backups to avoid accidental data deletion or misconfigurations.
- Disaster Recovery: keep off-site backups for recovering from catastrophic etcd cluster failures.
- Cluster Migration: back up before migrating, then restore on the new cluster for a seamless transition.
- Rollback to Stable State: use backups to revert the cluster to a stable state after faulty changes.
- Testing & Development: snapshot before testing changes; restore if issues arise.
Conclusion
For a single-control-plane arrangement with infrequent API object changes, this backup-based approach is a sound alternative to running a highly available etcd cluster. Backing up the etcd cluster on a regular basis will reduce the window of potential data loss.
Frequently Asked Questions
What is etcd?
etcd is a distributed key-value store that is often used as the primary data store for distributed systems, particularly in Kubernetes clusters. It is designed for reliability and fault-tolerance and is a crucial component in many container orchestration platforms.
Why is backing up etcd important?
etcd stores the configuration data of a cluster, including metadata about the state of the cluster and the data needed for distributed coordination. Backing up etcd is essential to ensure data recovery in case of accidental data loss, hardware failures, or other disasters.
How often should I perform etcd backups?
The frequency of etcd backups depends on factors like the rate of data changes and the criticality of the system. In production environments, it's common to schedule regular backups, such as daily or hourly, to minimize data loss in the event of a failure.
Can I automate the etcd backup process?
Yes, the etcd backup process can be automated using scripts and scheduling tools. You can create a script that runs the etcdctl snapshot save command and use a tool like cron (on Linux) or Task Scheduler (on Windows) to schedule regular backups.
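As a minimal sketch of such automation (assuming the kubeadm certificate paths used earlier in this post; the script name, /backup directory, and install path are illustrative), a backup script could look like this:
#!/bin/bash
# etcd-backup.sh -- take a timestamped etcd snapshot (paths are illustrative)
BACKUP_DIR=/backup
ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_DIR/etcd-$(date +%Y%m%d-%H%M%S).db" --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
And an example crontab entry (via crontab -e) that runs it at the top of every hour:
0 * * * * /usr/local/bin/etcd-backup.sh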
What precautions should I take during the etcd restore process?
Before restoring from a backup, ensure that you have a good understanding of the etcd cluster's state. It's important to stop the etcd service, perform the restore, and then restart etcd. Additionally, verify the integrity of your backup files.
Related/References
- Visit our YouTube channel “K21Academy”
- Certified Kubernetes Administrator (CKA) Certification Exam
- (CKA) Certification: Step By Step Activity Guides/Hands-On Lab Exercise & Learning Path
- Certified Kubernetes Application Developer (CKAD) Certification Exam
- (CKAD) Certification: Step By Step Activity Guides/Hands-On Lab Exercise & Learning Path
- Create AKS Cluster: A Complete Step-by-Step Guide
- Container (Docker) vs Virtual Machines (VM): What Is The Difference?
- How To Setup A Three Node Kubernetes Cluster For CKA: Step By Step
- CKA/CKAD Exam Questions & Answers 2022
Join Our Free Class
Discover the Power of Kubernetes, Docker & DevOps – Join Our Free Masterclass. Unlock the secrets of Kubernetes, Docker, and DevOps in our exclusive, no-cost masterclass. Take the first step towards building highly sought-after skills and securing lucrative job opportunities. Register for our FREE Masterclass now!
Niraj says
Is the backup and restore procedure applicable at the node level or the cluster level? I have 3 master nodes. While backing up, do I run the backup command on each of the 3 master nodes or just 1? Similarly, while restoring, do I delete /var/lib/etcd on each node and then run the restore command?
Surbhi Sharma says
Hello Niraj,
The backup and restore procedure for a Kubernetes cluster’s etcd database is typically performed at the cluster level, rather than at the node level. This means that you would typically run the backup and restore commands on one of the master nodes in the cluster, and the changes would be propagated to all of the other master nodes automatically.
When backing up the etcd database, you only need to run the backup command on one of the master nodes in the cluster. This is because the etcd database is replicated across all of the master nodes in the cluster, so backing up the database from one node will capture all of the data in the database.
Similarly, when restoring the etcd database, you should only need to delete the /var/lib/etcd directory on one of the master nodes, and then run the restore command on that node. The changes will be automatically propagated to the other master nodes in the cluster.
It’s important to note that you should only perform backup and restore operations on the etcd database if you are experienced with Kubernetes and have a good understanding of the implications of doing so. Backup and restore operations can be complex and can have significant consequences if not performed correctly. You should always ensure that you have a good backup strategy in place before attempting any backup or restore operations on your Kubernetes cluster.
Here are some reference links to documentation that can help you understand how to back up and restore the etcd database in a Kubernetes cluster:
Kubernetes documentation on backing up etcd: https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster
Kubernetes documentation on restoring etcd from a backup: https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#restoring-an-etcd-cluster
CoreOS documentation on etcd backup and restore: https://coreos.com/etcd/docs/latest/op-guide/recovery.html
Regards,
Team K21