AWS Certified Big Data - Specialty (#33)

A customer has a machine learning workflow that consists of multiple quick read-write-read cycles on Amazon S3. The customer needs to run the workflow on Amazon EMR but is concerned that reads in later cycles will miss new data, written in earlier cycles, that is critical to the machine learning. How should the customer address this concern?

Turn on EMRFS consistent view when configuring the EMR cluster.
Use AWS Data Pipeline to orchestrate the data processing cycles.
Set hadoop.data.consistency = true in the core-site.xml file.
Set hadoop.s3.consistency = true in the core-site.xml file.
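
For reference, the EMRFS consistent view named in the first option is normally switched on through the emrfs-site configuration classification when the cluster is created. The following is a minimal sketch of what such a configuration entry might look like, built as plain Python data. The helper name emrfs_consistent_view_config is hypothetical, and the retry values are illustrative assumptions rather than recommendations.

```python
import json


def emrfs_consistent_view_config(retry_count=5, retry_period_seconds=10):
    """Build an EMR configuration entry that enables EMRFS consistent view.

    Hypothetical helper for illustration only. The retry values are
    assumptions; tune them for the actual workload.
    """
    return [
        {
            # Classification that holds EMRFS settings on an EMR cluster.
            "Classification": "emrfs-site",
            "Properties": {
                # Enables consistent view for EMRFS reads from S3.
                "fs.s3.consistent": "true",
                # How many times EMRFS retries when it detects an inconsistency.
                "fs.s3.consistent.retryCount": str(retry_count),
                # Seconds to wait before the first retry.
                "fs.s3.consistent.retryPeriodSeconds": str(retry_period_seconds),
            },
        }
    ]


if __name__ == "__main__":
    # Print the entry as JSON, the shape used for cluster configurations.
    print(json.dumps(emrfs_consistent_view_config(), indent=2))
```

A list of this shape is what a cluster-creation request would carry in its configurations field. All property values are strings, which is why the numeric retry settings are converted with str().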