AWS Certified Big Data - Specialty (#24)

An organization is setting up a data catalog and metadata management environment for their numerous data stores currently running on AWS. The data catalog will be used to determine the structure and other attributes of data in the data stores. The data stores are composed of Amazon RDS databases, Amazon Redshift, and CSV files residing on Amazon S3. The catalog should be populated on a scheduled basis, and minimal administration is required to manage the catalog. How can this be accomplished?

Set up Amazon DynamoDB as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the DynamoDB table.
Use an Amazon database as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the database.
Use AWS Glue Data Catalog as the data catalog and schedule crawlers that connect to data sources to populate the catalog.
Set up Apache Hive metastore on an Amazon EC2 instance and run a scheduled bash script that connects to data sources to populate the metastore.

Need help?