AWS Certified Big Data - Specialty (#76)

An organization is developing a mobile social application and needs to collect logs from all devices on which it is installed. The organization is evaluating the Amazon Kinesis Data Streams to push logs and Amazon EMR to process data. They want to store data on HDFS using the default replication factor to replicate data among the cluster, but they are concerned about the durability of the data. Currently, they are producing 300 GB of raw data daily, with additional spikes during special events. They will need to scale out the Amazon EMR cluster to match the increase in streamed data. Which solution prevents data loss and matches compute demand?

Use multiple Amazon EBS volumes on Amazon EMR to store processed data and scale out the Amazon EMR cluster as needed.
Use the EMR File System and Amazon S3 to store processed data and scale out the Amazon EMR cluster as needed.
Use Amazon DynamoDB to store processed data and scale out the Amazon EMR cluster as needed.
use Amazon Kinesis Data Firehose and, instead of using Amazon EMR, stream logs directly into Amazon Elasticsearch Service.