AWS DataSync features

AWS DataSync is an online data movement and discovery service that simplifies and accelerates data migrations to AWS and helps you move data quickly and securely between on-premises storage, edge locations, other cloud providers, and AWS Storage.

Data Movement

For online data transfers, AWS DataSync simplifies, automates, and accelerates copying large amounts of data between on-premises storage, edge locations, or other cloud providers, and AWS Storage services. DataSync can copy data to and from Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, object storage in other clouds such as Google Cloud Storage and Wasabi Cloud Storage (see the full list of supported clouds), Azure Files, Azure Blob Storage (including Azure Data Lake Storage Gen2), Amazon S3 compatible storage on Snowball Edge, Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, and Amazon FSx for NetApp ONTAP file systems.

Purpose-Built Network Protocol

AWS DataSync employs an AWS-designed transfer protocol—decoupled from the storage protocol—to accelerate data movement. The protocol performs optimizations on how, when, and what data is sent over the network. Network optimizations performed by DataSync include incremental transfers, in-line compression, and sparse file detection, as well as in-line data validation and encryption.
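As a rough illustration of what an incremental transfer means, the Python sketch below selects only the files that are new or changed at the source. It is not the DataSync protocol, whose internals are not described here. The FileInfo record and the files_to_transfer function are hypothetical names, and comparing size and modification time is an assumed change-detection rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileInfo:
    """Size in bytes and modification time for one file (hypothetical record)."""
    size: int
    mtime: float


def files_to_transfer(source: Dict[str, FileInfo],
                      destination: Dict[str, FileInfo]) -> List[str]:
    """Return source paths that are missing from, or differ at, the destination."""
    changed = []
    for path, info in source.items():
        # A path is sent when the destination lacks it or holds different metadata.
        if destination.get(path) != info:
            changed.append(path)
    return sorted(changed)


source = {
    "a.txt": FileInfo(size=10, mtime=100.0),
    "b.txt": FileInfo(size=20, mtime=200.0),
    "c.txt": FileInfo(size=5, mtime=50.0),
}
destination = {
    "a.txt": FileInfo(size=10, mtime=100.0),   # unchanged, skipped
    "b.txt": FileInfo(size=20, mtime=150.0),   # older copy, resent
}

print(files_to_transfer(source, destination))  # ['b.txt', 'c.txt']
```

Only two of the three files would cross the network in this example, which is the saving an incremental transfer aims for.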

Connections between the local DataSync agent and the in-cloud service components are multi-threaded, maximizing performance over your Wide Area Network (WAN). A single DataSync task is capable of fully utilizing a 10 Gbps network link between your on-premises environment and AWS.
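To put the 10 Gbps figure in perspective, the following Python sketch computes a theoretical lower bound on transfer time. It assumes a fully utilized link, decimal units (1 TB = 10^12 bytes), and no protocol overhead or compression gains, so real transfers will differ. The function name min_transfer_hours is hypothetical.

```python
def min_transfer_hours(data_terabytes: float, link_gbps: float = 10.0) -> float:
    """Lower bound on transfer time, in hours, over a fully utilized link."""
    if data_terabytes < 0 or link_gbps <= 0:
        raise ValueError("data size must be >= 0 and link speed must be > 0")
    bits = data_terabytes * 1e12 * 8        # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)      # gigabits per second to bits per second
    return seconds / 3600


# 100 TB over a 10 Gbps link: 8e14 bits / 1e10 bits per second = 80,000 seconds.
print(round(min_transfer_hours(100), 1))  # 22.2
```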

Data Encryption and Validation

All your data is encrypted in transit between the DataSync agent and the DataSync service using Transport Layer Security (TLS). DataSync supports using default at-rest encryption for Amazon S3 buckets. DataSync also supports encryption of data at rest and in transit for Amazon EFS and Amazon FSx.

DataSync ensures that your data arrives intact. For each transfer, the service performs integrity checks both in transit and at rest. These checks ensure that the data written to your destination matches the data read from your source, validating consistency.
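The principle behind such an integrity check can be sketched in Python by comparing checksums of the source and destination copies. This is a conceptual illustration only: the choice of SHA-256 and the function names sha256_of and verify_copy are assumptions, not a description of how DataSync computes or compares its checksums.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, destination: Path) -> bool:
    """Return True when both files have identical contents by checksum."""
    return sha256_of(source) == sha256_of(destination)


with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "source.bin"
    dst = Path(workdir) / "destination.bin"
    src.write_bytes(b"example payload")
    dst.write_bytes(b"example payload")
    print(verify_copy(src, dst))  # True

    dst.write_bytes(b"corrupted payload")
    print(verify_copy(src, dst))  # False
```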

Multicloud Data Movement

AWS DataSync helps you move data between AWS, on-premises file systems, and other cloud storage services. AWS has continued to extend its cloud services to help customers streamline, manage, and govern their hybrid and multicloud infrastructure and applications. For customers who operate in multicloud environments, AWS DataSync can now move data to and from storage on various clouds. In addition to supporting Google Cloud Storage, Azure Files, and Azure Blob Storage, DataSync can move your object data at scale between S3-compatible storage on other clouds and AWS Storage services such as Amazon S3. This includes support for object storage on Wasabi Cloud, Oracle Cloud, Cloudflare, DigitalOcean Spaces, and Backblaze, among others.

Bandwidth Optimization and Control

Transferring hot or cold data should not impede your business. DataSync provides granular controls to optimize bandwidth consumption. Let transfers run at up to 10 Gbps during off hours, and set bandwidth limits when network capacity is needed elsewhere.
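A bandwidth limit of this kind is expressed through the BytesPerSecond field of a task's Options in the DataSync API. The Python sketch below only builds that fragment and makes no AWS call. The helper name bandwidth_options is hypothetical, and the use of -1 to mean "no limit" reflects our understanding of the API and should be confirmed against the API reference.

```python
from typing import Optional


def bandwidth_options(limit_mbps: Optional[float]) -> dict:
    """Build a task Options fragment from a limit in megabits per second.

    None requests no limit, assumed here to be expressed as BytesPerSecond = -1.
    """
    if limit_mbps is None:
        return {"BytesPerSecond": -1}
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive, or None for no limit")
    # Megabits per second to bytes per second: multiply by 10^6, divide by 8.
    return {"BytesPerSecond": int(limit_mbps * 1_000_000 / 8)}


print(bandwidth_options(100))   # {'BytesPerSecond': 12500000}
print(bandwidth_options(None))  # {'BytesPerSecond': -1}
```

The resulting dictionary could be supplied as the Options of a task or of an individual task execution, for example a tighter limit for business hours and no limit overnight.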

File System Integration and Metadata Preservation

The DataSync agent connects to your existing storage systems using the industry-standard NFS and SMB protocols, to your Hadoop cluster as an HDFS client, to your self-managed object storage or Google Cloud Storage using the Amazon S3 application programming interface (API), or to Azure Blob Storage using the Blob API. The agent transfers data rapidly and writes it into your designated Amazon S3 bucket, Amazon EFS file system, Amazon FSx for Windows File Server file system, or Amazon FSx file system.

File permissions and metadata are preserved when copying objects or data between Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for OpenZFS, or Amazon FSx for NetApp ONTAP.

When copying data to Amazon S3, DataSync automatically converts each file to a single S3 object in a 1:1 relationship, and preserves POSIX metadata from NFS shares or HDFS as Amazon S3 object metadata. When you copy objects containing file system metadata back to file formats, the original file metadata (that DataSync copied to S3) is restored.

Data Transfer Scheduling

DataSync comes with a built-in scheduling mechanism, allowing you to periodically run data transfer tasks to detect and copy changes from your source storage system to the destination. You can schedule your tasks using the AWS DataSync Console or AWS Command Line Interface (CLI) without writing scripts to manage repeated transfers. Task scheduling automatically runs tasks on your configured schedule with hourly, daily, or weekly options provided directly in the AWS Console.
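When a schedule is set through the API or CLI rather than the console, it is given as a schedule expression. The Python sketch below builds the Schedule fragment for a rate-based and a cron-based schedule without calling AWS. The helper names are hypothetical, and the exact expression syntax accepted by DataSync should be checked against its documentation.

```python
VALID_DAYS = {"MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"}


def hourly_schedule(every_hours: int) -> dict:
    """Schedule fragment that runs a task every N hours using a rate expression."""
    if every_hours < 1:
        raise ValueError("every_hours must be at least 1")
    unit = "hour" if every_hours == 1 else "hours"
    return {"ScheduleExpression": f"rate({every_hours} {unit})"}


def weekly_schedule(day: str, hour_utc: int) -> dict:
    """Schedule fragment that runs a task once a week, on the hour, in UTC."""
    if day not in VALID_DAYS:
        raise ValueError(f"day must be one of {sorted(VALID_DAYS)}")
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    # Fields: minutes, hours, day-of-month, month, day-of-week, year.
    return {"ScheduleExpression": f"cron(0 {hour_utc} ? * {day} *)"}


print(hourly_schedule(12))          # {'ScheduleExpression': 'rate(12 hours)'}
print(weekly_schedule("SUN", 12))   # {'ScheduleExpression': 'cron(0 12 ? * SUN *)'}
```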

Monitoring and Auditing

DataSync task reports provide JSON-formatted output files that include a summary and detailed reports for all files transferred, skipped, verified, and deleted, enabling you to easily verify and audit the data transfer operations for each task execution. Task reports are generated after the completion of your transfer tasks and they are stored in your Amazon S3 bucket. This allows you to easily use AWS services such as AWS Glue, Amazon Athena, and Amazon QuickSight to automatically catalog, analyze, and visualize task report output to check the progress of your data transfers across all task executions. Task reports simplify tracking and auditing, enabling you to easily understand common task execution trends or failure patterns, and gain critical insights into your data transfer processes.
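Because task reports are JSON, they can also be inspected with a few lines of code before reaching for Athena or QuickSight. The Python sketch below tallies transfer outcomes from a small sample. The sample layout and the field names (Transferred, RelativePath, TransferStatus) are assumptions made for illustration and may not match the actual report schema.

```python
import json
from collections import Counter
from typing import List

# Hypothetical report fragment; field names are assumed for illustration only.
SAMPLE_REPORT = """
{
  "Transferred": [
    {"RelativePath": "/data/a.csv", "TransferStatus": "SUCCESS"},
    {"RelativePath": "/data/b.csv", "TransferStatus": "SUCCESS"},
    {"RelativePath": "/data/c.csv", "TransferStatus": "FAILED"}
  ]
}
"""


def summarize_statuses(report_text: str) -> Counter:
    """Count report entries by their (assumed) TransferStatus field."""
    report = json.loads(report_text)
    return Counter(entry.get("TransferStatus", "UNKNOWN")
                   for entry in report.get("Transferred", []))


def failed_paths(report_text: str) -> List[str]:
    """List the paths of entries whose status is anything other than SUCCESS."""
    report = json.loads(report_text)
    return [entry["RelativePath"]
            for entry in report.get("Transferred", [])
            if entry.get("TransferStatus") != "SUCCESS"]


print(dict(summarize_statuses(SAMPLE_REPORT)))  # {'SUCCESS': 2, 'FAILED': 1}
print(failed_paths(SAMPLE_REPORT))              # ['/data/c.csv']
```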

With Amazon CloudWatch, you can monitor the status of any DataSync transfers currently in progress and check previous data transfer history. With CloudWatch Metrics, you can see the number of files and amount of data copied. Consult CloudWatch Logs for information about individual files transferred at a given time, as well as the results of DataSync integrity verification. This simplifies monitoring, reporting, and troubleshooting, enabling you to provide timely updates to stakeholders. In addition, CloudWatch Events are triggered as your transfer tasks complete, enabling automation of dependent workflows. For audit purposes, you can consult AWS CloudTrail, which logs all actions performed by DataSync.
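To automate a dependent workflow, an event rule can match the event emitted when a task execution finishes. The Python sketch below shows what such an event pattern might look like, together with a deliberately simplified matcher for local experimentation. The source, detail-type, and State values are given as we understand them and should be verified against the DataSync documentation. The matches function is hypothetical and covers only exact-value matching, not the full event-pattern language.

```python
# Assumed pattern for a successfully completed task execution.
TASK_SUCCESS_PATTERN = {
    "source": ["aws.datasync"],
    "detail-type": ["DataSync Task Execution State Change"],
    "detail": {"State": ["SUCCESS"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: lists hold allowed values, dicts describe nested fields."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: the event field must be a dict that matches recursively.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical event, trimmed to the fields the pattern inspects.
sample_event = {
    "source": "aws.datasync",
    "detail-type": "DataSync Task Execution State Change",
    "detail": {"State": "SUCCESS"},
}

print(matches(TASK_SUCCESS_PATTERN, sample_event))  # True
```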

Discovery

AWS DataSync Discovery helps you simplify migration planning and accelerate data migration to AWS by giving you visibility into your on-premises storage performance and utilization, and providing recommendations for migrating your data to AWS Storage services, such as Amazon FSx for NetApp ONTAP, Amazon FSx for Windows File Server, and Amazon Elastic File System (EFS). DataSync Discovery enables you to better understand your on-premises storage performance and capacity usage through automated data collection and analysis, enabling you to quickly identify data to be migrated and use generated recommendations to select AWS Storage services that align to your performance and capacity needs.

Pay-As-You-Go Pricing

With AWS DataSync, you pay only for your usage of the service. No software licenses, contracts, or maintenance fees are required. This provides a lower total cost of ownership (TCO) compared to manually building, operating, and optimizing your own high-performance scripted transfers, as well as lower total cost than buying and running commercial transfer tools.

Using AWS DataSync Discovery, you can run discovery jobs for up to 31 days and receive recommendations free of charge. DataSync Discovery keeps collected data and associated recommendations for 60 days following job completion.

Integration with AWS Infrastructure and Management Services

DataSync works natively with AWS security, monitoring, and audit services to simplify data movement and to provide a consistent management experience for your IT, storage, and DevOps teams. In addition to integrations with Amazon S3, Amazon EFS, and Amazon FSx, DataSync supports Amazon Virtual Private Cloud (Amazon VPC) endpoints (powered by AWS PrivateLink) to move files directly into your Amazon VPC. As with other AWS services, you can use AWS Identity and Access Management (IAM) to securely manage DataSync access. Similarly, you can configure an IAM role to control the services accessing your Amazon S3 bucket.
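An IAM role for bucket access typically combines a trust policy, which lets the DataSync service assume the role, with a permissions policy scoped to the bucket. The Python sketch below builds both documents as plain dictionaries. The function names and the bucket name are hypothetical, and the list of S3 actions is an illustrative subset that should be adjusted to the permissions your transfer actually requires.

```python
import json


def datasync_trust_policy() -> dict:
    """Trust policy allowing the DataSync service principal to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def bucket_access_policy(bucket_name: str) -> dict:
    """Permissions policy scoped to one bucket (illustrative subset of actions)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {   # Object-level actions apply to every key under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# "example-destination-bucket" is a placeholder name.
print(json.dumps(datasync_trust_policy(), indent=2))
print(json.dumps(bucket_access_policy("example-destination-bucket"), indent=2))
```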