AWS Certified Big Data - Specialty (#31)

An organization uses a custom MapReduce application to build monthly reports based on many small data files in an Amazon S3 bucket. The data is submitted from various business units on a frequent but unpredictable schedule. As the dataset continues to grow, it becomes increasingly difficult to process all of the data in one day. The organization has scaled up its Amazon EMR cluster, but other optimizations could improve performance. The organization needs to improve performance with minimal changes to existing processes and applications. What action should the organization take?

Use Amazon S3 Event Notifications and AWS Lambda to create a quick-search file index in Amazon DynamoDB.
Add Spark to the Amazon EMR cluster and use in-memory Resilient Distributed Datasets (RDDs).
Use Amazon S3 Event Notifications and AWS Lambda to index each file into an Amazon Elasticsearch Service cluster.
Schedule a daily AWS Data Pipeline process that aggregates content into larger files using S3DistCp.
Have business units submit data via Amazon Kinesis Firehose to aggregate data hourly into Amazon S3.