AWS Certified Big Data - Specialty (#70)

An organization currently runs a large Hadoop environment in its data center and is creating an alternative Hadoop environment on AWS using Amazon EMR. It generates around 20 TB of data each month. Also on a monthly basis, files need to be grouped and copied to Amazon S3 for use by the Amazon EMR environment. The organization has multiple S3 buckets across AWS accounts to which data needs to be copied. There is a 10 Gbps AWS Direct Connect connection between the data center and AWS, and the network team has agreed to allocate 50% of the AWS Direct Connect bandwidth to data transfer. The data transfer cannot take more than two days. What would be the MOST efficient approach to transfer data to AWS on a monthly basis?

Use an offline copy method, such as an AWS Snowball device, to copy and transfer data to Amazon S3.
Configure multipart uploads to Amazon S3 using the AWS SDK for Java to transfer data over AWS Direct Connect.
Use the Amazon S3 Transfer Acceleration capability to transfer data to Amazon S3 over AWS Direct Connect.
Set up the S3DistCp tool in the on-premises Hadoop environment to transfer data to Amazon S3 over AWS Direct Connect.
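A quick sanity check on the numbers in the question helps when weighing the options. The sketch below estimates the ideal time to move 20 TB over half of a 10 Gbps link. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s) and ignores protocol overhead, retries, and contention, so real transfers will take longer. The function name `transfer_hours` is hypothetical.

```python
# Back-of-the-envelope estimate (assumption: ideal throughput, decimal units,
# no protocol overhead) of how long a monthly 20 TB transfer takes over
# 50% of a 10 Gbps AWS Direct Connect link.

def transfer_hours(data_tb: float, link_gbps: float, share: float) -> float:
    """Return the ideal transfer time in hours (hypothetical helper)."""
    bits = data_tb * 10**12 * 8              # terabytes -> bits
    usable_bps = link_gbps * 10**9 * share   # bandwidth allocated to the transfer
    return bits / usable_bps / 3600          # seconds -> hours

hours = transfer_hours(20, 10, 0.5)
window_hours = 2 * 24                        # the two-day limit from the question
print(f"Ideal transfer time: {hours:.1f} h of a {window_hours} h window")
# → Ideal transfer time: 8.9 h of a 48 h window
```

Even with generous headroom for overhead, the estimate of roughly 9 hours sits well inside the 48-hour window, which suggests the network path alone can satisfy the time constraint.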