Have you ever moved data from one computer to another with a pen drive or a CD? In that case, the pen drive or CD acts as the data transfer agent. Similarly, AWS DataSync is a service that lets you move data between storage systems and services.
When multiple systems with different architectures and file systems are involved, moving data can be a demanding task. AWS DataSync makes moving data between such platforms easier, faster, and more automated.
We’re going to go over everything you need to know about AWS DataSync in this blog.
- Overview
- Concepts and Terminology
- How DataSync Works
- Features of DataSync
- Benefits of DataSync
- Why use AWS DataSync
- DataSync vs Direct Connect vs Snowball vs Storage Gateway
- Pricing
- FAQs
Overview:
Traditionally, companies have hosted their applications and stored their data on-premises, which means investing in hardware upfront. The cloud's elasticity, scalability, and pay-as-you-go pricing have led many customers to move to it. That raises a question: how do you move data stored on-premises to the cloud, or from one cloud to another? AWS DataSync is designed to answer exactly this kind of data movement problem.
AWS DataSync is a cloud-based service that streamlines the process of transferring data to and from AWS. It also facilitates the movement of data between on-premises storage, edge locations, and other cloud platforms. The service simplifies, automates, and expedites the copying process, making it easy to move data between various storage locations, including between different AWS Storage services.
AWS DataSync can copy data from:
- Network File System (NFS) file servers
- Server Message Block (SMB) file servers
- Hadoop Distributed File System (HDFS)
- Object storage systems
- Amazon Simple Storage Service (Amazon S3) buckets
- Amazon EFS file systems
- Amazon FSx
- AWS Snowcone devices
- Google Cloud Storage buckets
- Azure Files
Concepts and Terminology
- Agent: An agent is a self-hosted virtual machine (VM) that reads data from or writes data to your storage systems. The agent can be deployed on VMware ESXi, Linux Kernel-based Virtual Machine (KVM), Microsoft Hyper-V, or as an Amazon EC2 instance. To set up and activate your agent, you use the DataSync console, AWS CLI, or DataSync API.
- Location: A location specifies where you want to copy data from or to. Each DataSync transfer (also known as a task) has a source location and a destination location.
- Task: A DataSync transfer is described by a task. It specifies a source and destination location, as well as instructions for copying data between those locations.
- Task execution: A task execution is a single run of a DataSync task. A task execution consists of several stages.
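To make the relationship between these terms concrete, here is a minimal sketch of the fields that describe a task. It only builds a dictionary and does not call AWS. The helper name `build_create_task_params`, the task name, and the ARNs are made up for illustration; the keys follow the parameter names of the boto3 DataSync `create_task` call.

```python
from typing import Any, Dict


def build_create_task_params(source_location_arn: str,
                             destination_location_arn: str,
                             name: str) -> Dict[str, Any]:
    """Collect what defines a task: a source location, a destination location, a name."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": name,
    }


# Placeholder ARNs (hypothetical account ID and location IDs).
SOURCE_ARN = "arn:aws:datasync:us-east-1:111122223333:location/loc-source-example"
DEST_ARN = "arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example"

params = build_create_task_params(SOURCE_ARN, DEST_ARN, "nightly-nfs-to-s3")
print(params)

# With boto3 installed and credentials configured, the dictionary could be used like this:
#   import boto3
#   client = boto3.client("datasync")
#   task_arn = client.create_task(**params)["TaskArn"]
#   client.start_task_execution(TaskArn=task_arn)  # each call starts one task execution
```

The task is created once, and every later run of it is a separate task execution.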
How DataSync Works:
When you initiate a transfer, DataSync assesses the storage systems at the source and destination to determine what needs to be synchronized. It detects inconsistencies between the two systems by recursively analyzing the contents and metadata of both systems. This could take a few minutes or several hours, depending on the volume of files or objects.
DataSync then begins transferring your data (including metadata) from the source to the destination based on the transfer settings you selected. For example, DataSync always verifies the integrity of the data during a transfer. After the transfer is complete, DataSync can verify the full dataset between the two locations, or only the data you copied.
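The compare-then-copy idea described above can be illustrated with a small sketch. This is a simplified model and not DataSync's actual algorithm: it assumes each side is already listed as a mapping from path to metadata, and it treats a file as needing transfer when it is missing at the destination or its size or modification time differs. The names `FileMeta` and `plan_transfer` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class FileMeta(NamedTuple):
    size: int      # bytes
    mtime: float   # modification time as a Unix timestamp


def plan_transfer(source: Dict[str, FileMeta],
                  destination: Dict[str, FileMeta]) -> List[str]:
    """Return the source paths that are missing or different at the destination."""
    to_copy = []
    for path, meta in source.items():
        # destination.get() returns None for a missing path, which never equals meta.
        if destination.get(path) != meta:
            to_copy.append(path)
    return sorted(to_copy)


source = {
    "/a.txt": FileMeta(10, 100.0),
    "/b.txt": FileMeta(20, 200.0),
    "/c.txt": FileMeta(30, 300.0),
}
destination = {
    "/a.txt": FileMeta(10, 100.0),   # identical, skipped
    "/b.txt": FileMeta(20, 150.0),   # older copy, re-transferred
}

print(plan_transfer(source, destination))  # ['/b.txt', '/c.txt']
```

Because the plan is built by walking every entry, the preparation time grows with the number of files or objects, which matches the behavior described above.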
Features of DataSync:
- You can schedule tasks to run periodically to detect and copy changes from your source storage system to the destination storage system.
- DataSync supports VPC endpoints (provided by AWS PrivateLink) for moving files securely into your Amazon VPC.
- AWS DataSync can copy files into Amazon EFS, where files that have not been accessed for a set length of time can be moved to Infrequent Access (IA) storage.
- DataSync integrates with Amazon EventBridge, so you can trigger automated workflows when a transfer completes.
- It lets you write data directly into any S3 storage class, so you don't need lifecycle policies or manual moves to place data in the class you want.
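As a rough illustration of the scheduling and S3 storage class features, the sketch below builds the corresponding request fragments as plain dictionaries without calling AWS. The helper names and ARNs are hypothetical. The keys (`ScheduleExpression`, `S3BucketArn`, `S3StorageClass`, `S3Config`, `BucketAccessRoleArn`) follow the boto3 DataSync parameter names, and the cron string uses the six-field AWS schedule expression format.

```python
from typing import Any, Dict


def daily_schedule(hour_utc: int, minute: int = 0) -> Dict[str, str]:
    """Build a schedule that runs a task once a day at the given UTC time."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour_utc must be 0-23 and minute must be 0-59")
    # Fields: minutes hours day-of-month month day-of-week year
    return {"ScheduleExpression": f"cron({minute} {hour_utc} * * ? *)"}


def s3_location_params(bucket_arn: str, role_arn: str,
                       storage_class: str = "STANDARD") -> Dict[str, Any]:
    """Describe an S3 location whose objects land directly in the chosen storage class."""
    return {
        "S3BucketArn": bucket_arn,
        "S3StorageClass": storage_class,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


schedule = daily_schedule(2)  # every day at 02:00 UTC
location = s3_location_params(
    "arn:aws:s3:::example-archive-bucket",                 # placeholder bucket
    "arn:aws:iam::111122223333:role/ExampleDataSyncRole",  # placeholder role
    storage_class="STANDARD_IA",
)
print(schedule)
print(location)
```

The schedule fragment would be passed as the `Schedule` argument when creating a task, and the location dictionary as the arguments to `create_location_s3`.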
Benefits of DataSync:
- DataSync supports a wide range of storage systems, both on-premises and in the cloud.
- Because it is a managed AWS service, you do not need to purchase or provision infrastructure to connect to AWS.
- DataSync supports scheduling tasks for recurring transfers and triggering transfers based on specific events.
- You can customize how data is sent over the network to suit your needs, for example by limiting the bandwidth a task may use.
- DataSync was built with security in mind: data is encrypted both in transit and at rest, and its integrity is checked at the destination.
- AWS DataSync uses a combination of network optimization and parallel file transfers to achieve high-speed data transfer rates. In order to improve transfer performance, the agent divides the data into smaller chunks and sends each one in parallel.
- When a task executes, you can define an exclude filter, an include filter, or both to control which files, folders, or objects are transferred.
Why use AWS DataSync:
- Migrate your data: As stated above, DataSync can move data from many kinds of storage into AWS storage services. It automatically encrypts data in transit and validates its integrity at the destination.
- Backups: If on-premises storage capacity is limited, you can move infrequently used data, or data retained for compliance, into the AWS Cloud and free up on-premises space.
- Data-intensive tasks: Transfer terabytes of data to the AWS Cloud and back to on-premises storage for data-intensive processing, such as batch processing, deep learning in research, graphics rendering in media and entertainment, and data analytics in the financial sector.
DataSync vs Direct Connect vs Snowball vs Storage Gateway:
Service | Description | Use Case |
AWS DataSync | A managed data transfer service that simplifies and automates moving data between on-premises storage and Amazon S3, Amazon FSx for Windows File Server, or Amazon Elastic File System (EFS). | Fast, secure transfers with flexible scheduling for data backup and disaster recovery, migration to the cloud, and data distribution. |
AWS Direct Connect | A network service that provides dedicated network connections from on-premises data centers to AWS. It offers a reliable, high-bandwidth, low-latency connection to AWS that bypasses the public internet. | Cases that require high-speed, low-latency connectivity, such as data transfers between on-premises storage and AWS, and VPN connections to AWS resources. |
The Snowball family | Physical storage appliances that you can use to transfer large amounts of data into and out of AWS. The family includes Snowball Edge, Snowball, and Snowmobile. | Snowball Edge has built-in storage and computing resources, making it ideal for transferring large amounts of data in remote locations where a high-speed internet connection may not be available. |
AWS Storage Gateway | A hybrid storage service that enables you to store data in the AWS Cloud and access it as a network file system. | Hybrid storage scenarios where data needs to be stored both on-premises and in the AWS Cloud. |
Pricing:
Pricing for AWS DataSync is simple and predictable. You pay a flat per-gigabyte fee, which varies by AWS Region, only for the data you move. This fee covers the network acceleration technology, managed cloud infrastructure, data validation, and automation features. DataSync has no upfront costs, no minimum charge, and no resources to manage.
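Since the charge is a flat per-gigabyte fee, a cost estimate is a single multiplication. The sketch below shows the arithmetic. The rate of 0.0125 USD per GB is an assumed example value, not a quoted price, so check the pricing page for your Region before relying on it. `estimate_transfer_cost` is a hypothetical helper.

```python
from decimal import Decimal


def estimate_transfer_cost(gigabytes: int, rate_per_gb: Decimal) -> Decimal:
    """Flat-rate estimate: data moved (GB) multiplied by the per-GB rate."""
    if gigabytes < 0:
        raise ValueError("gigabytes must not be negative")
    return Decimal(gigabytes) * rate_per_gb


ASSUMED_RATE = Decimal("0.0125")  # assumption for illustration only (USD per GB)

cost = estimate_transfer_cost(1000, ASSUMED_RATE)
print(f"Moving 1000 GB at {ASSUMED_RATE} USD/GB costs about {cost} USD")
```

`Decimal` is used instead of floats so that currency amounts do not pick up binary rounding errors.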
FAQs:
What are the resource requirements for the AWS DataSync agent?
The agent virtual machine requires four virtual processors (vCPUs) and 80 GB of disk space for the VM image and system data. The RAM requirement depends on the transfer scenario: 32 GB for transfers of up to 20 million files, and 64 GB for transfers of more than 20 million files.
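The sizing rule above is easy to encode when planning an agent VM. This is only a restatement of the figures in this answer as code, and the function name `required_agent_ram_gb` is made up for illustration.

```python
FILE_COUNT_THRESHOLD = 20_000_000  # 20 million files


def required_agent_ram_gb(file_count: int) -> int:
    """Return the agent RAM (in GB) for a transfer of the given number of files."""
    if file_count < 0:
        raise ValueError("file_count must not be negative")
    return 32 if file_count <= FILE_COUNT_THRESHOLD else 64


for count in (5_000_000, 20_000_000, 50_000_000):
    print(f"{count} files: {required_agent_ram_gb(count)} GB RAM")
```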
How does AWS DataSync access my Amazon S3 bucket?
DataSync utilizes an IAM role that is specified by the user. The actions that the role can perform are determined by the policy attached to it. The role can either be automatically generated by DataSync or configured manually by the user.
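If you configure the role manually, it needs a trust policy that lets the DataSync service assume it. The sketch below builds such a trust policy as a Python dictionary and prints it as JSON. It covers only the trust relationship: the permissions policy that grants access to your specific bucket is not shown and would need to be attached separately.

```python
import json


def datasync_trust_policy() -> dict:
    """Trust policy allowing the DataSync service principal to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "datasync.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


policy_json = json.dumps(datasync_trust_policy(), indent=2)
print(policy_json)
```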
Can I filter the files and folders that AWS DataSync transfers?
You can limit which files, folders, or objects are transferred during a task execution by using an exclude filter, an include filter, or both. An include filter defines the file paths or object keys to include, and it limits the scope that DataSync scans on both the source and the destination.
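As a sketch of what such filters look like in a request, the helper below joins several patterns into one pipe-delimited filter rule. The helper name `build_filter` and the example paths are hypothetical. The `FilterType` and `Value` keys, and the `SIMPLE_PATTERN` type, follow the boto3 DataSync filter rule structure.

```python
from typing import Dict, Iterable, List


def build_filter(patterns: Iterable[str]) -> List[Dict[str, str]]:
    """Join patterns into a single pipe-delimited SIMPLE_PATTERN filter rule."""
    cleaned = [p.strip() for p in patterns if p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty pattern is required")
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(cleaned)}]


includes = build_filter(["/projects/2023", "/projects/2024"])
excludes = build_filter(["*.tmp", "/projects/2024/cache"])
print(includes)
print(excludes)
```

The resulting lists would be passed as the `Includes` and `Excludes` arguments when creating a task or starting a task execution.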
How fast can AWS DataSync copy my file system to AWS?
The speed at which DataSync can transfer a given dataset depends on several factors: the amount of data, the I/O bandwidth achievable from the source and destination storage, the available network bandwidth, and network conditions. For data transfers between on-premises and AWS storage services, a single DataSync task can fully utilize a 10 Gbps network connection.
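For a rough lower bound on transfer time, you can divide the dataset size by the usable link rate. The sketch below does this arithmetic with decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second). It ignores storage I/O limits, protocol overhead, and the preparation phase, so real transfers will take longer. The `utilization` parameter is an assumption you supply, not a measured value.

```python
def estimate_transfer_hours(size_tb: float, link_gbps: float,
                            utilization: float = 1.0) -> float:
    """Best-case hours to move size_tb terabytes over a link_gbps link."""
    if size_tb < 0 or link_gbps <= 0 or not (0 < utilization <= 1):
        raise ValueError("invalid size, link speed, or utilization")
    bits_to_move = size_tb * 1e12 * 8                 # terabytes -> bits
    usable_bits_per_second = link_gbps * 1e9 * utilization
    return bits_to_move / usable_bits_per_second / 3600


full_link = estimate_transfer_hours(10, 10)           # 10 TB over a saturated 10 Gbps link
half_link = estimate_transfer_hours(10, 10, 0.5)      # same data at 50% utilization
print(f"{full_link:.2f} hours at full utilization, {half_link:.2f} hours at 50%")
```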