
Free Questions on AWS Data Engineer Associate

AWS Certified Data Engineer - Associate Exam Dumps

Domain Weightage
Domain 1: Data Ingestion and Transformation (34%)
Domain 2: Data Store Management (26%)
Domain 3: Data Operations and Support (22%)
Domain 4: Data Security and Governance (18%)

1. A company is preparing to use a provisioned Amazon EMR cluster to run Apache Spark jobs for big data analysis, with high reliability as a priority. The big data team wants to follow best practices for managing cost-efficient, long-running workloads on Amazon EMR while maintaining the company's current level of performance.

Which combination of resources will most efficiently meet these requirements? (Select TWO.)

2. A company needs to build a data lake in AWS. The company must provide row-level data access and column-level data access to specific teams. The teams will access the data by using Amazon Athena, Amazon Redshift Spectrum, and Apache Hive from Amazon EMR.
Which solution will meet these requirements with the LEAST operational overhead?

3. A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data engineer has set up the necessary AWS Glue connection details and an associated IAM role. However, when the data engineer attempts to run the job, an error message indicates problems with the Amazon S3 VPC gateway endpoint.
The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket.
Which solution will meet this requirement?

4. A company is storing data in an Amazon S3 bucket. The company is in the process of adopting a new data lifecycle and retention policy. The policy is defined as follows:

  • Any newly created data must be available online and will occasionally need to be analyzed with SQL.
  • Data older than 3 years must be securely stored and made available when needed for compliance evaluation within 12 hours.
  • Data older than 10 years must be securely deleted.

A data engineer must configure a solution that ensures the data is stored cost-effectively according to the lifecycle and retention policy.

Which solution will meet these requirements?
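
For context, the retention policy above maps onto an S3 lifecycle configuration. The boto3 sketch below is illustrative only: the bucket name is hypothetical, and it assumes an archive tier such as S3 Glacier Flexible Retrieval, whose standard retrievals complete within 12 hours.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; the day counts approximate 3 and 10 years.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "retention-policy",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                # After ~3 years, move to an archive tier that can be
                # restored within 12 hours.
                "Transitions": [{"Days": 1095, "StorageClass": "GLACIER"}],
                # After ~10 years, delete the objects.
                "Expiration": {"Days": 3650},
            }
        ]
    },
)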

5. A finance company is storing paid invoices in an Amazon S3 bucket. After the invoices are uploaded, an AWS Lambda function uses Amazon Textract to process the PDF data and persist the data to Amazon DynamoDB. Currently, the Lambda execution role has the following S3 permission:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ExampleStmt",
      "Action": ["s3:*"],
      "Effect": "Allow",
      "Resource": ["*"]
    }
  ]
}

The company wants to correct the role permissions specific to Amazon S3 according to security best practices.

Which solution will meet these requirements?
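
For context, security best practice here is least privilege: scope the statement to only the action and resource the function needs. The boto3 sketch below is illustrative; the role name, policy name, bucket name, and the choice of s3:GetObject as the required action are all assumptions.

import json

import boto3

iam = boto3.client("iam")

# Hypothetical names. The statement grants only read access to the
# invoice bucket instead of s3:* on all resources.
least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ExampleStmt",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::invoice-bucket/*"],
        }
    ],
}

iam.put_role_policy(
    RoleName="invoice-processor-lambda-role",
    PolicyName="InvoiceBucketReadOnly",
    PolicyDocument=json.dumps(least_privilege_policy),
)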

6. A financial institution needs to ingest large volumes of data from various sources, including stock market feeds and customer transactions, for analysis. What type of data ingestion method would be most appropriate in this case?

7. How can you improve the replayability of data ingestion pipelines in AWS?

8. An insurance company is using vehicle insurance data to build a risk analysis machine learning (ML) model. The data contains personally identifiable information (PII). The ML model should not use the PII. Regulations also require the data to be encrypted with an AWS Key Management Service (AWS KMS) key. A data engineer must select the appropriate services to deliver insurance data for use with the ML model.

Which combination of steps will meet these requirements in the MOST cost-effective manner? (Select TWO.)

9. Which of the following scenarios best represents a stateful data transaction in an AWS environment?

10. A studio wants to enhance its media content recommendation system based on user behavior and preferences by integrating insights from third-party datasets into its existing analytics platform. Which option will incorporate the third-party datasets with the LEAST operational overhead?

11. Your company has a diverse set of data sources in different formats stored in Amazon S3, and you want to create a unified catalog capturing metadata and schema details. Which AWS service can automate this process without the need for manual schema definition?
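
As background for this question, an AWS Glue crawler can infer schemas from mixed formats in S3 and register them in the AWS Glue Data Catalog without manual schema definition. A minimal boto3 sketch; the crawler name, role ARN, database, path, and schedule are hypothetical.

import boto3

glue = boto3.client("glue")

# A crawler infers the schema of each format it finds under the S3
# path and registers tables in the Data Catalog.
glue.create_crawler(
    Name="s3-data-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="data_catalog_db",
    Targets={"S3Targets": [{"Path": "s3://example-data-bucket/raw/"}]},
    Schedule="cron(0 2 * * ? *)",  # recrawl nightly to catch schema changes
)
glue.start_crawler(Name="s3-data-crawler")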

12. A company is running an Amazon Redshift cluster. A data engineer must design a solution that gives the company the ability to run analyses in a separate Amazon Redshift test environment that uses the data from the main Redshift cluster. The second cluster is expected to be used for only 2 hours every 2 weeks as part of the new testing process.

Which solution will meet these requirements in the MOST cost-effective manner?

13. An Amazon Kinesis application is trying to read data from a Kinesis data stream. However, the read data call is rejected. The following error message is displayed: ProvisionedThroughputExceededException.

Which combination of steps will resolve the error? (Select TWO.)
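
As background, a common consumer-side remediation for this error is to retry reads with exponential backoff; the usual stream-side fix is to increase the shard count. A Python sketch of the backoff approach, assuming a shard iterator has already been obtained:

import time

import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis")

def get_records_with_backoff(shard_iterator, max_retries=5):
    # Read from a shard, backing off exponentially when the provisioned
    # throughput limit is exceeded.
    for attempt in range(max_retries):
        try:
            return kinesis.get_records(ShardIterator=shard_iterator, Limit=1000)
        except ClientError as err:
            if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
                time.sleep(2 ** attempt)
            else:
                raise
    raise RuntimeError("Stream still throttled after retries")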

14. ABC Corporation, a leading financial institution, wants to modernize its data infrastructure on AWS to strengthen its analytics capabilities and regulatory compliance. The corporation collects large datasets from diverse sources, including market transactions, customer interactions, and regulatory filings. It must establish a framework for configuring data pipelines that accommodates scheduling requirements and interdependencies within the data workflows.
Which combination of AWS services offers the most suitable solution?

15. A financial institution is looking to improve the performance and cost efficiency of its data analytics platform. The institution has a massive amount of historical financial transaction data stored in Avro format and aims to transform this data into Apache Parquet format to optimize query performance and reduce storage costs. The institution's compliance requirements dictate that the data transformation process must be auditable and trackable.
Considering the stringent compliance requirements and the need for efficient data transformation, which combination of AWS services can the financial institution utilize to achieve the desired data format transformation and compliance adherence?

16. A data analyst needs to build an extract, transform, and load (ETL) job. The ETL job will process daily incoming .csv files that users upload to an Amazon S3 bucket. The size of each S3 object is less than 100 MB.
Which solution will meet these requirements MOST cost-effectively?

17. A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options. The company's current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache HBase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS.
Which extract, transform, and load (ETL) service will meet these requirements?

18. A data engineer is designing an application that will add data for transformation to an Amazon Simple Queue Service (Amazon SQS) queue. A microservice will receive messages from the queue. The data engineer wants to ensure message persistence.

Which events can remove messages from an SQS queue? (Select THREE.)
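
As background for this question, receiving a message does not remove it from an SQS queue; the message is only hidden for the visibility timeout and is removed when a consumer explicitly deletes it (or when the retention period expires). A short boto3 sketch; the queue URL and the process function are hypothetical.

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/transform-queue"

def process(body):
    # Hypothetical transformation step.
    print("processing", body)

# Receiving only hides the message for the visibility timeout.
response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)

for message in response.get("Messages", []):
    process(message["Body"])
    # The explicit delete is what actually removes the message.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])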

19. A company operates a frontend ReactJS website that calls REST APIs through Amazon API Gateway. A data engineer is tasked with developing a Python script that will be invoked occasionally through API Gateway and must return its results to API Gateway.

Which solution presents the LEAST operational overhead to meet these specifications?

20. A company is running a cloud-based software application in an Amazon EC2 instance backed by an Amazon RDS for Microsoft SQL Server database. The application collects, processes, and stores confidential information and records in the database. The company wants to eliminate the risk of credential exposure.

Which solution will meet this requirement?
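
As background, the usual pattern for eliminating credential exposure is to fetch database credentials at run time from a secrets store instead of embedding them in code or configuration. A minimal boto3 sketch, assuming a hypothetical secret named prod/app/sqlserver that stores a JSON username and password:

import json

import boto3

secrets = boto3.client("secretsmanager")

# Fetch credentials at run time; nothing sensitive lives in the code.
secret_value = secrets.get_secret_value(SecretId="prod/app/sqlserver")
credentials = json.loads(secret_value["SecretString"])

connection_string = (
    f"Server=example-host;Database=records;"
    f"User Id={credentials['username']};Password={credentials['password']};"
)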

21. A Cloud Data Engineering Team is implementing a system for real-time data ingestion through an API. The architecture needs to include data transformation before storage. The system must handle large files and store them efficiently post-transformation. The team is focused on using a serverless architecture on AWS, with an emphasis on Infrastructure as Code (IaC) for standardized and repeatable deployments across various environments.
Which combination of actions should the Cloud Data Engineering Team take to implement IaC for serverless deployments of data ingestion and transformation pipelines? (Select THREE.)

22. A data engineer maintains custom Python scripts utilized by numerous AWS Lambda functions for data formatting processes. Currently, whenever modifications are made to the Python scripts, the data engineer manually updates each Lambda function, which is time-consuming. The data engineer seeks a more streamlined approach for updating the Lambda functions. Which solution addresses this requirement?

23. A global manufacturing company is modernizing its data architecture and adopting cloud-based solutions for data processing and analytics. As part of this transformation, the company needs to implement intermediate data staging locations to efficiently manage and process large volumes of data from multiple sources before loading it into a data warehouse for analysis. The company's data engineering team is tasked with designing a scalable and fault-tolerant solution using AWS services.
Which AWS services will meet these requirements?

24. A company stores datasets in JSON format and .csv format in an Amazon S3 bucket. The company has Amazon RDS for Microsoft SQL Server databases, Amazon DynamoDB tables that are in provisioned capacity mode, and an Amazon Redshift cluster. A data engineering team must develop a solution that will give data scientists the ability to query all data sources by using syntax similar to SQL.

Which solution will meet these requirements with the LEAST operational overhead?

25. In a data engineering pipeline, a company is using multiple applications and teams to access a shared Amazon S3 bucket. To streamline access and simplify permissions management for these different entities, which S3 feature should the company utilize?

26. You are developing a serverless application on AWS that requires orchestrating a complex workflow involving multiple AWS services. The workflow includes processing data from an S3 bucket, performing data transformations using AWS Lambda functions, and triggering subsequent steps based on the output. Which service provides a fully managed solution for orchestrating this serverless workflow?

27. A healthcare organization needs to perform complex analytical queries on patient records stored in a scalable data warehouse. The solution should offer fast query performance and seamless integration with existing BI tools. Which AWS service should they consider?

28. A multinational corporation with regional offices worldwide utilizes Amazon Redshift for its data warehousing needs. The company needs to ensure that critical tables are not accessed by multiple users simultaneously to prevent data corruption and maintain data consistency. Which locking mechanism in Amazon Redshift would best address this requirement?

29. A consultant company uses a cloud-based time-tracking system to track employee work hours. The company has thousands of employees who are globally distributed. The time-tracking system provides a REST API to obtain the records from the previous day in CSV format. The company has an on-premises cron job that is scheduled to run a Python program each morning at the same time. The program saves the data into an Amazon S3 bucket that serves as a data lake. A data engineer must provide a solution with AWS services that reuses the same Python code and cron configuration.

Which combination of steps will meet these requirements with the LEAST operational overhead? (Select TWO.)

30. An ecommerce company runs several applications on AWS. The company wants to design a centralized streaming log ingestion solution. The solution needs to be able to convert the log files to Apache Parquet format. Then, the solution must store the log files in Amazon S3. The number of log files being created varies throughout the day. A data engineer must configure a solution that ensures the log files are delivered in near real time.

Which solution will meet these requirements with the LEAST operational overhead?

31. An ecommerce company is running an application on AWS. The application sources recent data from tables in Amazon Redshift. Data that is older than 1 year is accessible in Amazon S3. Recently, a new report has been written in SQL. The report needs to compare a few columns from the current year sales table with the same columns from tables with sales data from previous years. The report runs slowly, with poor performance and long wait times to get results.

A data engineer must optimize the back-end storage to accelerate the query.

Which solution will meet these requirements MOST efficiently?

32. A company is collecting data that is generated by its users for analysis by using an Amazon S3 data lake. Some of the data being collected and stored in Amazon S3 includes personally identifiable information (PII).

The company wants a data engineer to design an automated solution to identify new and existing data that contains PII that must be masked before analysis is performed. Additionally, the data engineer must provide an overview of the data that is identified. The task of masking the data will be handled by an application already created in the AWS account. The data engineer needs to design a solution that can invoke this application in real time when PII is found.

Which solution will meet these requirements with the LEAST operational overhead?

33. A financial institution intends to implement a data mesh framework. The framework should facilitate centralized data governance, data analysis, and data access control. The organization has opted to use AWS Glue for managing data catalogs and executing extract, transform, and load (ETL) processes. Which pair of AWS services is suitable for realizing the data mesh framework? (Select TWO.)

34. A data engineer needs to store configuration parameters for different data processing workflows, such as Spark job configurations and database connection details. Which feature of AWS Systems Manager Parameter Store should the engineer use to maintain organization and structure?
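
As background, Parameter Store supports hierarchical parameter names, which keep related configuration grouped by workflow and queryable (and IAM-scopable) by path. A short boto3 sketch with hypothetical paths and values:

import boto3

ssm = boto3.client("ssm")

# One subtree per workflow keeps parameters organized.
ssm.put_parameter(Name="/workflows/spark/executor-memory", Value="4g",
                  Type="String", Overwrite=True)
ssm.put_parameter(Name="/workflows/db/connection-host",
                  Value="db.example.internal",
                  Type="String", Overwrite=True)

# Retrieve everything under one workflow's subtree in a single call.
spark_params = ssm.get_parameters_by_path(Path="/workflows/spark", Recursive=True)
for param in spark_params["Parameters"]:
    print(param["Name"], param["Value"])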

35. A data engineer needs to analyze API call patterns to identify potential optimization opportunities within their AWS data processing infrastructure. Which AWS CloudTrail feature can provide insights into API usage trends and patterns?

36. A company runs its workloads in a production AWS account. The security team has established a separate security AWS account to store and analyze security logs sourced from Amazon CloudWatch Logs in the production account.
The company needs to deliver the security logs to the security AWS account by using Amazon Kinesis Data Streams.
Which solution will meet these requirements?

37. A data analytics company operates several AWS accounts across different countries. The company needs to ensure consistent configuration compliance across all accounts and regions while minimizing administrative overhead. Which approach should the company consider?

38. During a security review, a company identified a vulnerability in an AWS Glue job. The company discovered that credentials to access an Amazon Redshift cluster were hard coded in the job script.
A data engineer must remediate the security vulnerability in the AWS Glue job. The solution must securely store the credentials.
Which combination of steps should the data engineer take to meet these requirements? (Choose two.)

39. A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.
The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.
Which solution will meet these requirements with the LEAST operational overhead?

40. A financial institution requires a message queuing service to decouple various components of its microservices architecture. The system needs to handle variable message volumes with minimal latency. Which AWS service should they use?

41. A finance company has developed a machine learning (ML) model to enhance its investment strategy. The model uses various sources of data about stock, bond, and commodities markets. The model has been approved for production. A data engineer must ensure that the data being used to run ML decisions is accurate, complete, and trustworthy. The data engineer must automate the data preparation for the model's production deployment.

Which solution will meet these requirements?

42. As a Data Engineering Consultant, you are implementing a data processing solution using AWS Glue, which leverages Apache Spark under the hood. You need to explain to your team how AWS Glue, using Apache Spark, manages data processing jobs differently than a standalone Apache Spark environment.
Which of the following points would you emphasize as a key difference in the AWS Glue implementation of Spark?

43. A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column. Which solution will MOST speed up the Athena query performance?
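
As background, converting uncompressed CSV to a compressed columnar format such as Apache Parquet is the standard way to speed up column-selective Athena queries, because Athena then scans only the referenced column. A hedged sketch using a CTAS statement via boto3; the database, table names, and output location are hypothetical.

import boto3

athena = boto3.client("athena")

# Rewrite the CSV-backed table as compressed, columnar Parquet.
ctas = """
CREATE TABLE analytics.events_parquet
WITH (format = 'PARQUET', parquet_compression = 'SNAPPY')
AS SELECT * FROM analytics.events_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)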

44. You are tasked with designing a comprehensive serverless workflow for a real-time analytics platform that processes streaming data from various sources. The platform needs to ingest data, perform near real-time analysis, store aggregated results, and trigger alerts based on predefined conditions. You have chosen AWS services to architect this solution.
To perform real-time analytics on the streaming data, which AWS service can be leveraged to process and analyze data as it arrives?

45. A company is using an Amazon S3 data lake. The company ingests data into the data lake by using Amazon Kinesis Data Streams. The company reads and processes the incoming data from the stream by using AWS Lambda. The data being ingested has highly variable and unpredictable volume. Currently, the IteratorAge metric is high at peak times when a high volume of data is being posted to the stream. A data engineer must design a solution to increase performance when reading Kinesis Data Streams with Lambda.

Which combination of steps will meet these requirements? (Select THREE.)
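
As background, several of the usual remedies for a high IteratorAge are configured on the Lambda event source mapping. An illustrative boto3 sketch with hypothetical ARNs and names; raising ParallelizationFactor allows up to 10 concurrent batches per shard.

import boto3

lambda_client = boto3.client("lambda")

# More parallel batches per shard is one way to drain a backlog
# faster at peak ingest times.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/ingest-stream",
    FunctionName="stream-processor",
    StartingPosition="LATEST",
    BatchSize=500,
    ParallelizationFactor=10,
)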

46. A company is building a real-time monitoring system for analyzing web traffic logs. The logs are continuously generated and stored in an Amazon Kinesis Data Firehose delivery stream. The company wants to perform real-time analysis on the incoming logs and extract insights using custom logic.
Which architecture pattern should the company adopt to achieve this requirement?

47. A data engineer is designing an application that will transform data in containers managed by Amazon Elastic Kubernetes Service (Amazon EKS). The containers run on Amazon EC2 nodes. Each containerized application will transform independent datasets and then store the data in a data lake. Data does not need to be shared to other containers. The data engineer must decide where to store data before transformation is complete.

Which solution will meet these requirements with the LOWEST latency?

48. A company loads transaction data for each day into Amazon Redshift tables at the end of each day. The company wants to have the ability to track which tables have been loaded and which tables still need to be loaded. A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table. The data engineer creates an AWS Lambda function to publish the details of the load statuses to DynamoDB.
How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB table?
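
For context, the write itself is straightforward; the question turns on what should invoke the function. A minimal handler sketch, with a hypothetical table name and event shape:

from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource("dynamodb")
status_table = dynamodb.Table("redshift-load-status")  # hypothetical name

def handler(event, context):
    # Record one table's load status; the event shape is assumed.
    status_table.put_item(
        Item={
            "table_name": event["table_name"],
            "load_date": datetime.now(timezone.utc).isoformat(),
            "status": event["status"],  # e.g. "LOADED" or "PENDING"
        }
    )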

49. A company has data in an on-premises NFS file share. The company plans to migrate to AWS. The company uses the data for data analysis. The company has written AWS Lambda functions to analyze the data. The company wants to continue to use NFS for the file system that Lambda accesses. The data must be shared across all concurrently running Lambda functions.

Which solution should the company use for this data migration?

50. A company is running an Amazon Redshift data warehouse on AWS. The company has recently started using a software as a service (SaaS) sales application that is supported by several AWS services. The company wants to transfer some of the data in the SaaS application to Amazon Redshift for reporting purposes.

A data engineer must configure a solution that can continuously send data from the SaaS application to Amazon Redshift.

Which solution will meet these requirements with the LEAST operational overhead?

51. A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information. The data engineer must identify and remove duplicate information from the legacy application data.
Which solution will meet these requirements with the LEAST operational overhead?

52. A data engineer has created a new account to deploy an AWS Glue extract, transform, and load (ETL) pipeline. The pipeline jobs need to ingest raw data from a source Amazon S3 bucket. Then, the pipeline jobs write the transformed data to a destination S3 bucket in the same account. The data engineer has written an IAM policy with permissions for AWS Glue to access the source S3 bucket and destination S3 bucket. The data engineer needs to grant the permissions in the IAM policy to AWS Glue to run the ETL pipeline.

Which solution will meet these requirements?

53. A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling.
Which solution will meet this requirement?

54. A Data Engineering Team at a financial services company is developing a data API to serve real-time, user-specific transactional data from an Amazon RDS for PostgreSQL database to their mobile banking application.
The data is highly dynamic, with frequent reads and writes. The API must offer low latency and high availability, and be capable of scaling automatically to handle peak loads during business hours.
Given these requirements, which architecture should the team implement?

55. A company ingests data into an Amazon S3 data lake from multiple operational sources. The company then ingests the data into Amazon Redshift for a business analysis team to analyze. The business analysis team requires access to only the last 3 months of customer data.

Additionally, once a year, the company runs a detailed analysis of the past year's data to compare the overall results of the previous 12 months. After the analysis and comparison, the data is no longer accessed. However, the data must be kept after 12 months for compliance reasons.

Which solution will meet these requirements in the MOST cost-effective manner?

56. A company has deployed a data pipeline that uses AWS Glue to process records. The records include a JSON-formatted event and can sometimes include base64-encoded images. The AWS Glue job is configured with 10 data processing units (DPUs). However, the AWS Glue job regularly scales to several hundred DPUs and can take a long time to run.

A data engineer must monitor the data pipeline to determine the appropriate DPU capacity.

Which solution will meet these requirements?

57. A data engineer must deploy a centralized metadata storage solution on AWS. The solution needs to be reliable and scalable. The solution needs to ensure that fine-grained permissions can be controlled at the database, table, column, row, and cell levels.

Which solution will meet these requirements with the LEAST operational overhead?

58. At a healthcare firm, a data engineer wants to schedule a workflow that executes a set of AWS Glue jobs daily, without requiring the jobs to start or finish at precise times. Which solution provides the most cost-effective method for running the Glue jobs?

59. Which of the following scenarios best exemplifies an event-driven architecture on AWS?

60. A company uses AWS Step Functions to orchestrate a data pipeline. The pipeline consists of Amazon EMR jobs that ingest data from data sources and store the data in an Amazon S3 bucket. The pipeline also includes EMR jobs that load the data to Amazon Redshift.
The company's cloud infrastructure team manually built a Step Functions state machine. The cloud infrastructure team launched an EMR cluster into a VPC to support the EMR jobs. However, the deployed Step Functions state machine is not able to run the EMR jobs.

Which combination of steps should the company take to identify the reason the Step Functions state machine is not able to run the EMR jobs? (Choose two.)

61. A company stores data from an application in an Amazon DynamoDB table that operates in provisioned capacity mode. The workloads of the application have predictable throughput load on a regular schedule. Every Monday, there is an immediate increase in activity early in the morning. The application has very low usage during weekends. The company must ensure that the application performs consistently during peak usage times.
Which solution will meet these requirements in the MOST cost-effective way?
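
As background, a predictable weekly pattern like this maps naturally onto scheduled scaling through Application Auto Scaling. A hedged boto3 sketch: the table name and capacity numbers are hypothetical, and it assumes the table is already registered as a scalable target.

import boto3

autoscaling = boto3.client("application-autoscaling")

# Raise provisioned write capacity ahead of the Monday-morning spike;
# a second, mirror-image action could scale down for the weekend.
autoscaling.put_scheduled_action(
    ServiceNamespace="dynamodb",
    ResourceId="table/app-data",
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    ScheduledActionName="monday-morning-scale-up",
    Schedule="cron(0 6 ? * MON *)",
    ScalableTargetAction={"MinCapacity": 500, "MaxCapacity": 1000},
)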

62. A healthcare provider organization has a large amount of patient data stored in an AWS-based data warehouse. The organization wants to make this data available to other systems within the organization and to third-party applications through a modern API interface. Additionally, the organization's compliance requirements mandate that access to patient data must be secure and auditable.
Which combination of AWS services will meet this requirement?

63. A multinational retail corporation is looking to modernize its data processing infrastructure by implementing ETL (Extract, Transform, Load) pipelines on AWS to handle a variety of data sources, including transactional data from its online stores, customer interaction logs from its mobile applications, and inventory data from its physical stores. The company's primary goal is to build a scalable and cost-effective solution that can handle large volumes of data while providing real-time insights to support decision-making processes.

Which AWS services and architecture would best suit the company's requirements?

64. A startup uses Amazon Athena to run one-time queries on data stored in Amazon S3. The startup has multiple use cases and needs to enforce permission controls that segregate query processes and access to query history among users, teams, and applications within the same AWS account. Which solution aligns with these requirements?

65. A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data. Which solution will meet these requirements with the LEAST operational overhead?
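
As background, for a one-time read of a single column from a Parquet object, S3 Select can project the column directly out of the object with no infrastructure to manage. A hedged boto3 sketch; the bucket, key, and column names are hypothetical.

import boto3

s3 = boto3.client("s3")

# Project one column straight out of the Parquet object.
response = s3.select_object_content(
    Bucket="legacy-data-bucket",
    Key="exports/records.parquet",
    ExpressionType="SQL",
    Expression="SELECT s.customer_id FROM s3object s",
    InputSerialization={"Parquet": {}},
    OutputSerialization={"CSV": {}},
)

for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))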
