Ultimate Guide to Prepare Free Amazon MLS-C01 Exam Questions & Answer [Q68-Q89]

Pass Amazon MLS-C01 Tests Engine pdf - All Free Dumps


What are the exam results for AWS Certified Machine Learning - Specialty

The examination is pass or fail. It is scored against a minimum standard established by AWS professionals who follow certification industry best practices and guidelines. Results are reported as a scaled score from 100 to 1,000, with a minimum passing score of 750. Your score shows how you performed on the examination as a whole and whether or not you passed. Scaled scoring models are used to equate scores across multiple exam forms that may have slightly different difficulty levels. Your score report contains a table classifying your performance at each section level, which is intended as general feedback on your examination performance. The examination uses a compensatory scoring model, which means that you do not need to “pass” the individual sections, only the overall examination. Each section of the examination has a specific weighting, so some sections have more questions than others.


There are no formal prerequisites for the Amazon MLS-C01 exam. AWS recommends, but does not require, one to two years of hands-on experience developing and running machine learning workloads on AWS. Candidates should also be familiar with machine learning frameworks such as TensorFlow and PyTorch, as well as programming languages such as Python and R.

 

NEW QUESTION # 68
A Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team has not provided any insight about which features are relevant for churn prediction. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. While training a logistic regression model, the Data Scientist observes that there is a wide gap between the training and validation set accuracy.
Which methods can the Data Scientist use to improve the model performance and satisfy the Marketing team's needs? (Choose two.)

  • A. Add features to the dataset
  • B. Perform recursive feature elimination
  • C. Perform linear discriminant analysis
  • D. Add L1 regularization to the classifier
  • E. Perform t-distributed stochastic neighbor embedding (t-SNE)

Answer: B,D

Explanation:
The Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. However, the Data Scientist observes that there is a wide gap between the training and validation set accuracy, which indicates that the model is overfitting the data and generalizing poorly to new data.
To improve the model performance and satisfy the Marketing team's needs, the Data Scientist can use the following methods:
Add L1 regularization to the classifier: L1 regularization is a technique that adds a penalty term to the loss function of the logistic regression model, proportional to the sum of the absolute values of the coefficients. L1 regularization can help reduce overfitting by shrinking the coefficients of the less important features to zero, effectively performing feature selection. This can simplify the model and make it more interpretable, as well as improve the validation accuracy.
Perform recursive feature elimination: Recursive feature elimination (RFE) is a feature selection technique that trains a model on the full set of features, ranks the features by importance (for logistic regression, typically by the magnitude of the coefficients), removes the least important feature or features, and repeats the process on the remaining features until the desired number is reached. RFE can help improve the model performance by eliminating irrelevant or redundant features that may introduce noise or multicollinearity. It also helps the Marketing team understand the direct impact of the relevant features on the model outcome, because the final model contains only the retained features and their coefficients can be read directly.
References:
Regularization for Logistic Regression
Recursive Feature Elimination
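
As a rough illustration of why an L1 penalty acts as feature selection, the sketch below trains a tiny logistic regression with proximal gradient descent, where the soft-thresholding step sets small coefficients to exactly zero. The function names, the toy data, and the values of `lam`, `lr`, and `epochs` are illustrative assumptions, not part of the exam material.

```python
import math

def soft_threshold(w, t):
    """Shrink w toward zero by t; values within [-t, t] become exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l1_logistic(X, y, lam=0.1, lr=0.5, epochs=200):
    """Proximal gradient descent for L1-penalized logistic regression (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step on the log loss, then the L1 proximal (shrinkage) step
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: feature 0 drives the label, feature 1 is small noise
X = [[2.0, 0.1], [1.5, -0.2], [-1.0, 0.2], [-2.5, -0.1], [1.0, -0.1], [-1.5, 0.1]]
y = [1, 1, 0, 0, 1, 0]
weights = fit_l1_logistic(X, y)
print(weights)
```

On this toy data the weight of the informative feature stays positive while the weight of the noise feature is held at exactly zero, which is the behavior that makes the resulting model easier to interpret.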


NEW QUESTION # 69
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models.
What should the Specialist do to initialize the model to re-train it with the custom data?

  • A. Initialize the model with random weights in all layers including the last fully connected layer
  • B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
  • C. Initialize the model with random weights in all layers and replace the last fully connected layer
  • D. Initialize the model with pre-trained weights in all layers including the last fully connected layer

Answer: B

Explanation:
Transfer learning is a technique that uses a model trained for one task as the starting point for a model for a different task. For image classification, a common practice is to take a model that was pre-trained on a large and general dataset, such as ImageNet, and then customize it for the specific task. One way to customize the model is to replace the last fully connected layer, which is responsible for the final classification, with a new layer whose number of units equals the number of classes in the new task. This way, the model can reuse the features learned by the previous layers, which are generic and useful for many image recognition tasks, and learn to map them to the new classes. The new layer is initialized with random weights, and the rest of the model is initialized with the pre-trained weights. When the pre-trained layers are kept frozen and only the new layer is trained, the approach is known as feature extraction; when the pre-trained layers are also updated on the custom data, it is known as fine-tuning.
References:
Transfer learning and fine-tuning
Deep transfer learning for image classification: a survey
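
The initialization described above can be sketched without any deep learning framework. In the toy example below a "model" is just a dictionary of weight matrices; the layer names, the `init_for_transfer` helper, and the tiny weight values are hypothetical stand-ins, and a real project would use the equivalent mechanism of its framework instead.

```python
import random

def init_for_transfer(pretrained, num_new_classes, head="fc", seed=0):
    """Copy every pre-trained layer, then replace the final layer with a
    randomly initialized one sized for the new label set."""
    rng = random.Random(seed)
    # Deep-copy the pre-trained weights so the original model is left untouched
    model = {name: [row[:] for row in w] for name, w in pretrained.items()}
    in_features = len(pretrained[head][0])  # input width of the old head
    model[head] = [[rng.gauss(0.0, 0.01) for _ in range(in_features)]
                   for _ in range(num_new_classes)]
    return model

# Hypothetical "pre-trained" network: layer name -> weight matrix (rows = output units)
pretrained = {
    "conv1": [[0.2, -0.1], [0.4, 0.3]],
    "conv2": [[0.5, 0.1], [-0.3, 0.7]],
    "fc": [[0.1, 0.9], [0.8, -0.2], [0.3, 0.3]],  # 3 general-object classes
}
model = init_for_transfer(pretrained, num_new_classes=5)
print({name: (len(w), len(w[0])) for name, w in model.items()})
```

Only the head changes shape (from 3 outputs to 5 in this sketch); every other layer starts from the pre-trained values.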


NEW QUESTION # 70
A retail company intends to use machine learning to categorize new products. A labeled dataset of current products was provided to the Data Science team. The dataset includes 1,200 products. The labeled dataset has 15 features for each product, such as title, dimensions, weight, and price. Each product is labeled as belonging to one of six categories, such as books, games, electronics, and movies.
Which model should be used for categorizing new products using the provided dataset for training?

  • A. An XGBoost model where the objective parameter is set to multi:softmax
  • B. A DeepAR forecasting model based on a recurrent neural network (RNN)
  • C. A regression forest where the number of trees is set equal to the number of product categories
  • D. A deep convolutional neural network (CNN) with a softmax activation function for the last layer

Answer: A


NEW QUESTION # 71
Given the following confusion matrix for a movie classification model, what is the true class frequency for Romance and the predicted class frequency for Adventure?

  • A. The true class frequency for Romance is 77.56% and the predicted class frequency for Adventure is 20.85%.
  • B. The true class frequency for Romance is 0.78 and the predicted class frequency for Adventure is (0.47 - 0.32).
  • C. The true class frequency for Romance is 77.56% * 0.78 and the predicted class frequency for Adventure is 20.85% * 0.32.
  • D. The true class frequency for Romance is 57.92% and the predicted class frequency for Adventure is 13.12%.
Answer: D


NEW QUESTION # 72
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models.
What should the Specialist do to initialize the model to re-train it with the custom data?

  • A. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
  • B. Initialize the model with random weights in all layers including the last fully connected layer.
  • C. Initialize the model with pre-trained weights in all layers including the last fully connected layer.
  • D. Initialize the model with random weights in all layers and replace the last fully connected layer.

Answer: A


NEW QUESTION # 73
A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket.
The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance.
Which approach allows the Specialist to use all the data to train the model?

  • A. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to Amazon SageMaker and train using the full dataset
  • B. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to train the full dataset.
  • C. Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
  • D. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.

Answer: D


NEW QUESTION # 74
A real estate company wants to create a machine learning model for predicting housing prices based on a historical dataset. The dataset contains 32 features.
Which model will meet the business requirement?

  • A. Principal component analysis (PCA)
  • B. Logistic regression
  • C. K-means
  • D. Linear regression

Answer: D

Explanation:
The best model for predicting housing prices based on a historical dataset with 32 features is linear regression.
Linear regression is a supervised learning algorithm that fits a linear relationship between a dependent variable (housing price) and one or more independent variables (features). Linear regression can handle multiple features and output a continuous value for the housing price. Linear regression can also return the coefficients of the features, which indicate how each feature affects the housing price. Linear regression is suitable for this problem because the outcome of interest is numerical and continuous, and the model needs to capture the linear relationship between the features and the outcome.
References:
AWS Machine Learning Specialty Exam Guide
AWS Machine Learning Training - Regression vs Classification in Machine Learning
AWS Machine Learning Training - Linear Regression with Amazon SageMaker
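
To make the idea of fitting a linear relationship and reading its coefficients concrete, here is a minimal single-feature least-squares sketch. The floor areas and prices are made-up values chosen so that the price is exactly three times the area; a real housing model would use all 32 features and a library implementation.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: floor area (square meters) -> price (thousands)
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]  # exactly 3 * area
slope, intercept = fit_simple_linear_regression(areas, prices)
predicted = slope * 90.0 + intercept  # continuous prediction for an unseen area
print(slope, intercept, predicted)
```

The slope plays the role of the coefficient mentioned above: it states how much the predicted price changes per unit of the feature.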


NEW QUESTION # 75
A Data Scientist is building a linear regression model and will use resulting p-values to evaluate the statistical significance of each coefficient. Upon inspection of the dataset, the Data Scientist discovers that most of the features are normally distributed. The plot of one feature in the dataset is shown in the graphic.

What transformation should the Data Scientist apply to satisfy the statistical assumptions of the linear regression model?

  • A. Polynomial transformation
  • B. Exponential transformation
  • C. Sinusoidal transformation
  • D. Logarithmic transformation

Answer: D

Explanation:
The plot in the graphic shows a right-skewed distribution, unlike the other features, which are normally distributed. To correct this, the Data Scientist should apply a logarithmic transformation to the feature. The logarithm compresses the long right tail, which makes the distribution more symmetric and closer to normal, so the p-values of the linear regression model can be interpreted more reliably.
References:
Linear Regression
Linear Regression with Amazon Machine Learning
Machine Learning on AWS
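
The effect of the logarithmic transformation on skew can be checked numerically. The sketch below computes the sample skewness of a made-up, strongly right-skewed feature before and after taking the natural logarithm; the values are illustrative and are not taken from the graphic.

```python
import math

def skewness(values):
    """Population skewness: third central moment divided by variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed feature: each value is 10x the previous one
raw = [1.0, 10.0, 100.0, 1000.0, 10000.0]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)     # clearly positive: long right tail
skew_log = skewness(logged)  # approximately zero: evenly spaced after the log
print(skew_raw, skew_log)
```

A positive skewness indicates a long right tail, and a value near zero indicates a symmetric distribution.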


NEW QUESTION # 76
A large mobile network operating company is building a machine learning model to predict customers who are likely to unsubscribe from the service. The company plans to offer an incentive for these customers as the cost of churn is far greater than the cost of the incentive.
The model produces the following confusion matrix after evaluating on a test dataset of 100 customers:
Based on the model evaluation results, why is this a viable model for production?

  • A. The precision of the model is 86%, which is less than the accuracy of the model.
  • B. The model is 86% accurate and the cost incurred by the company as a result of false negatives is less than the false positives.
  • C. The precision of the model is 86%, which is greater than the accuracy of the model.
  • D. The model is 86% accurate and the cost incurred by the company as a result of false positives is less than the false negatives.

Answer: D
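
The arithmetic behind this kind of question can be sketched as follows. The confusion matrix image is not reproduced here, so the four counts below are hypothetical values chosen only to be consistent with 100 customers and 86% accuracy, and the two cost figures are likewise assumptions that reflect the statement that churn costs far more than the incentive.

```python
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Hypothetical counts for a test set of 100 customers (positive = churn)
tp, fp, fn, tn = 10, 10, 4, 76

# Hypothetical unit costs: an unneeded incentive vs. a lost customer
cost_incentive, cost_churn = 10.0, 200.0
cost_false_positives = fp * cost_incentive  # incentives given to non-churners
cost_false_negatives = fn * cost_churn      # churners the model missed

print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
print(cost_false_positives, cost_false_negatives)
```

With these assumed numbers the precision (50%) is lower than the accuracy (86%), and each false positive costs only an incentive while each false negative costs a churned customer.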


NEW QUESTION # 77
A large mobile network operating company is building a machine learning model to predict customers who are likely to unsubscribe from the service. The company plans to offer an incentive for these customers as the cost of churn is far greater than the cost of the incentive.
The model produces the following confusion matrix after evaluating on a test dataset of 100 customers:

Based on the model evaluation results, why is this a viable model for production?

  • A. The precision of the model is 86%, which is greater than the accuracy of the model.
  • B. The model is 86% accurate and the cost incurred by the company as a result of false negatives is less than the false positives.
  • C. The model is 86% accurate and the cost incurred by the company as a result of false positives is less than the false negatives.
  • D. The precision of the model is 86%, which is less than the accuracy of the model.

Answer: C


NEW QUESTION # 78
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided.

Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class?

  • A. Decision tree
  • B. Naive Bayesian classifier
  • C. Single Perceptron with sigmoidal activation function
  • D. Linear support vector machine (SVM)

Answer: A

Explanation:
Based on the figure provided, a decision tree would have the highest recall with respect to the fraudulent class. Recall is a model evaluation metric that measures the proportion of actual positive instances that are correctly classified by the model. Recall is calculated as follows:
Recall = True Positives / (True Positives + False Negatives)
A decision tree performs classification by splitting the data into smaller and purer subsets based on a series of rules or conditions. It can handle both linear and non-linear data, can capture complex patterns and interactions among the features, and is easy to visualize and interpret.
In this case, the data is not linearly separable and has a clear pattern of seasonality. The fraudulent class forms a large circle in the center of the plot, while the normal class is scattered around the edges. A decision tree can split on the transaction month and the age of account, using a few axis-aligned thresholds to box in the central region and separate the fraudulent class from the normal class. It can therefore identify most of the black dots as positive instances and keep the number of false negatives low, which yields a high recall for the fraudulent class. The depth and complexity of the tree can also be adjusted to balance the trade-off between recall and precision.
The other options are less suitable for achieving a high recall for the fraudulent class:
A linear support vector machine (SVM) performs classification by finding a linear hyperplane that maximizes the margin between the classes. It works for linearly separable data but not for non-linear data. A linear SVM cannot capture the circular pattern of the fraudulent class and may misclassify many of the black dots as negative instances, resulting in a low recall.
A naive Bayesian classifier performs classification by applying Bayes' theorem and assuming conditional independence among the features. It can handle both linear and non-linear data and can incorporate prior knowledge and probabilities into the model. However, it may not perform well when the features are correlated or dependent, as in this case. A naive Bayesian classifier may not capture the circular pattern of the fraudulent class and may misclassify many of the black dots as negative instances, resulting in a low recall.
A single perceptron with a sigmoidal activation function performs classification by applying a weighted linear combination of the features followed by a non-linear activation function. Its decision boundary is still a straight line, so it works for linearly separable data but not for non-linear data. It cannot capture the circular pattern of the fraudulent class and may misclassify many of the black dots as negative instances, resulting in a low recall.
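
The recall formula and the contrast between axis-aligned splits and a single straight boundary can be sketched on a handful of points. Everything below is hypothetical: the points are invented to mimic a central fraudulent cluster, `tree_like_rule` hand-codes the four thresholds a shallow decision tree might learn, and `linear_rule` is one arbitrary straight-line boundary rather than a trained SVM or perceptron.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for the positive (fraudulent) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def tree_like_rule(age, month):
    """Four axis-aligned thresholds that box in the central region."""
    return 1 if (3 <= age <= 7 and 4 <= month <= 9) else 0

def linear_rule(age, month):
    """A single straight boundary, which can only cut the plane in two."""
    return 1 if age + month >= 10 else 0

# Hypothetical (account age, transaction month, label) points; 1 = fraudulent
points = [(5, 6, 1), (4, 5, 1), (6, 8, 1), (3, 4, 1), (7, 9, 1),
          (1, 1, 0), (9, 11, 0), (1, 11, 0), (9, 2, 0), (5, 12, 0)]
y_true = [label for _, _, label in points]
tree_recall = recall(y_true, [tree_like_rule(a, m) for a, m, _ in points])
linear_recall = recall(y_true, [linear_rule(a, m) for a, m, _ in points])
print(tree_recall, linear_recall)
```

A straight line could also reach full recall by labelling almost everything as fraudulent, but only at the price of sweeping in the surrounding normal points; the boxed-in rule captures the central cluster without doing so.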


NEW QUESTION # 79
A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant.
Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test?

  • A. Send Amazon CloudWatch Logs that were generated by Amazon SageMaker to Amazon ES and use Kibana to query and visualize the log data.
  • B. Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the data as it is generated by Amazon SageMaker.
  • C. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker.
  • D. Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon QuickSight to visualize logs as they are being produced.

Answer: C

Explanation:
Amazon CloudWatch is a service that can monitor and collect various metrics and logs from AWS resources, such as Amazon SageMaker. Amazon CloudWatch can also generate dashboards to create a single view for the metrics and logs that are of interest. By using Amazon CloudWatch, the Machine Learning Specialist can review the latency, memory utilization, and CPU utilization during the load test, as these are some of the metrics that are outputted by Amazon SageMaker. The Specialist can create a custom dashboard that displays these metrics in different widgets, such as graphs, tables, or text. The dashboard can also be configured to refresh automatically and show the latest data as the load test is running. This approach will allow the Specialist to monitor the performance and resource utilization of the model variant and adjust the Auto Scaling configuration accordingly.
References:
[Monitoring Amazon SageMaker with Amazon CloudWatch - Amazon SageMaker]
[Using Amazon CloudWatch Dashboards - Amazon CloudWatch]
[Create a CloudWatch Dashboard - Amazon CloudWatch]
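
A dashboard of this kind is defined by a JSON body made of widgets. The sketch below only builds that JSON with the standard library; the endpoint name, variant name, region, widget layout, and the `build_dashboard_body` helper are illustrative assumptions. It relies on SageMaker publishing `ModelLatency` under the `AWS/SageMaker` namespace and `CPUUtilization` and `MemoryUtilization` under `/aws/sagemaker/Endpoints`.

```python
import json

def build_dashboard_body(endpoint_name, variant_name, region):
    """Build a CloudWatch dashboard body with one metric widget per signal."""
    dims = ["EndpointName", endpoint_name, "VariantName", variant_name]

    def widget(title, namespace, metric, x):
        return {
            "type": "metric", "x": x, "y": 0, "width": 8, "height": 6,
            "properties": {
                "title": title,
                "metrics": [[namespace, metric] + dims],
                "stat": "Average", "period": 60, "region": region,
            },
        }

    widgets = [
        widget("Model latency", "AWS/SageMaker", "ModelLatency", 0),
        widget("CPU utilization", "/aws/sagemaker/Endpoints", "CPUUtilization", 8),
        widget("Memory utilization", "/aws/sagemaker/Endpoints", "MemoryUtilization", 16),
    ]
    return json.dumps({"widgets": widgets})

# Hypothetical endpoint, variant, and region names
body = build_dashboard_body("my-endpoint", "AllTraffic", "us-east-1")
print(body)
# With boto3 installed and credentials configured, the body could be published with:
#   boto3.client("cloudwatch").put_dashboard(DashboardName="load-test", DashboardBody=body)
```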


NEW QUESTION # 80
A Machine Learning Specialist receives customer data for an online shopping website. The data includes demographics, past visits, and locality information. The Specialist must develop a machine learning approach to identify the customer shopping patterns, preferences, and trends to enhance the website for better service and smart recommendations.
Which solution should the Specialist recommend?

  • A. Random Cut Forest (RCF) over random subsamples to identify patterns in the customer database.
  • B. Collaborative filtering based on user interactions and correlations to identify patterns in the customer database.
  • C. A neural network with a minimum of three layers and random initial weights to identify patterns in the customer database.
  • D. Latent Dirichlet Allocation (LDA) for the given collection of discrete data to identify patterns in the customer database.

Answer: B


NEW QUESTION # 81
A Machine Learning Specialist has created a deep learning neural network model that performs well on the training data but performs poorly on the test data.
Which of the following methods should the Specialist consider using to correct this? (Select THREE.)

  • A. Increase regularization.
  • B. Decrease dropout.
  • C. Increase feature combinations.
  • D. Increase dropout.
  • E. Decrease feature combinations.
  • F. Decrease regularization.

Answer: A,D,E


NEW QUESTION # 82
A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' similarity to other users.
What should the Specialist do to meet this objective?

  • A. Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR.
  • B. Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR.
  • C. Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR.
  • D. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.

Answer: D


NEW QUESTION # 83
A Machine Learning Specialist prepared the following graph displaying the results of k-means for k = [1:10]

Considering the graph, what is a reasonable selection for the optimal choice of k?

  • A. 1
  • B. 4
  • C. 7
  • D. 10

Answer: B

Explanation:
The elbow method is a technique that we use to determine the number of centroids (k) to use in a k-means clustering algorithm. In this method, we plot the within-cluster sum of squares (WCSS) against the number of clusters (k) and look for the point where the curve bends sharply. This point is called the elbow point and it indicates that adding more clusters does not improve the model significantly. The graph in the question shows that the elbow point is at k = 4, which means that 4 is a reasonable choice for the optimal number of clusters.
References:
Elbow Method for optimal value of k in KMeans: A tutorial on how to use the elbow method with Amazon SageMaker.
K-Means Clustering: A video that explains the concept and benefits of k-means clustering.
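
One simple way to locate the elbow programmatically is to look for the k at which the drop in WCSS slows down the most. The sketch below uses that heuristic on invented WCSS values for k = 1 to 10; neither the numbers nor the `elbow_k` helper come from the graph in the question, and other elbow heuristics exist.

```python
def elbow_k(wcss):
    """Return the k (1-based) where the decrease in WCSS slows down the most."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        drop_before = wcss[i - 1] - wcss[i]  # gain from reaching this k
        drop_after = wcss[i] - wcss[i + 1]   # gain from adding one more cluster
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k

# Hypothetical WCSS values for k = 1..10
wcss = [1000, 600, 350, 150, 130, 115, 105, 98, 93, 90]
k = elbow_k(wcss)
print(k)
```

For these assumed values the sharpest bend is at k = 4: WCSS falls by 200 when moving from 3 to 4 clusters but only by 20 when moving from 4 to 5.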


NEW QUESTION # 84
A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII).
The dataset:
* Must be accessible from a VPC only.
* Must not traverse the public internet.
How can these requirements be satisfied?

  • A. Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance.
  • B. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.
  • C. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance
  • D. Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance.

Answer: B

Explanation:
Reference: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies-vpc-endpoint.html
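
A bucket policy that restricts access to a specific VPC endpoint typically denies every request whose `aws:SourceVpce` condition key does not match that endpoint. The sketch below builds such a policy as JSON; the bucket name, the endpoint ID, the statement ID, and the helper function are placeholders for illustration only.

```python
import json

def vpce_only_bucket_policy(bucket_name, vpce_id):
    """Deny all S3 actions on the bucket unless the request arrives via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessFromVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }

# Placeholder bucket name and VPC endpoint ID
policy = vpce_only_bucket_policy("example-training-data", "vpce-1a2b3c4d")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Because the statement is a Deny on everything that does not come through the endpoint, requests from outside the VPC endpoint are rejected regardless of other allow rules.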


NEW QUESTION # 85
A Data Scientist needs to migrate an existing on-premises ETL process to the cloud. The current process runs at regular time intervals and uses PySpark to combine and format multiple large data sources into a single consolidated output for downstream processing.
The Data Scientist has been given the following requirements for the cloud solution:
* Combine multiple data sources.
* Reuse existing PySpark logic.
* Run the solution on the existing schedule.
* Minimize the number of servers that will need to be managed.
Which architecture should the Data Scientist use to build this solution?

  • A. Use Amazon Kinesis Data Analytics to stream the input data and perform real-time SQL queries against the stream to carry out the required transformations within the stream. Deliver the output results to a "processed" location in Amazon S3 that is accessible for downstream use.
  • B. Write the raw data to Amazon S3. Schedule an AWS Lambda function to run on the existing schedule and process the input data from Amazon S3. Write the Lambda logic in Python and implement the existing PySpark logic to perform the ETL process. Have the Lambda function output the results to a "processed" location in Amazon S3 that is accessible for downstream use.
  • C. Write the raw data to Amazon S3. Create an AWS Glue ETL job to perform the ETL processing against the input data. Write the ETL job in PySpark to leverage the existing logic. Create a new AWS Glue trigger to trigger the ETL job based on the existing schedule. Configure the output target of the ETL job to write to a "processed" location in Amazon S3 that is accessible for downstream use.
  • D. Write the raw data to Amazon S3. Schedule an AWS Lambda function to submit a Spark step to a persistent Amazon EMR cluster based on the existing schedule. Use the existing PySpark logic to run the ETL job on the EMR cluster. Output the results to a "processed" location in Amazon S3 that is accessible for downstream use.

Answer: C


NEW QUESTION # 86
A Machine Learning Specialist is configuring Amazon SageMaker so multiple Data Scientists can access notebooks, train models, and deploy endpoints. To ensure the best operational performance, the Specialist needs to be able to track how often the Scientists are deploying models, GPU and CPU utilization on the deployed SageMaker endpoints, and all errors that are generated when an endpoint is invoked.
Which services are integrated with Amazon SageMaker to track this information? (Select TWO.)

  • A. AWS Health
  • B. AWS CloudTrail
  • C. AWS Config
  • D. AWS Trusted Advisor
  • E. Amazon CloudWatch

Answer: B,E


NEW QUESTION # 87
A company needs to deploy a chatbot to answer common questions from customers. The chatbot must base its answers on company documentation.
Which solution will meet these requirements with the LEAST development effort?

  • A. Index company documents by using Amazon Kendra. Integrate the chatbot with Amazon Kendra by using the Amazon Kendra Query API operation to answer customer questions.
  • B. Train a Bidirectional Attention Flow (BiDAF) network based on past customer questions and company documents. Deploy the model as a real-time Amazon SageMaker endpoint. Integrate the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation to answer customer questions.
  • C. Index company documents by using Amazon OpenSearch Service. Integrate the chatbot with OpenSearch Service by using the OpenSearch Service k-nearest neighbors (k-NN) Query API operation to answer customer questions.
  • D. Train an Amazon SageMaker BlazingText model based on past customer questions and company documents. Deploy the model as a real-time SageMaker endpoint. Integrate the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation to answer customer questions.

Answer: A

Explanation:
Solution A meets the requirements with the least development effort because it uses Amazon Kendra, an intelligent search service powered by machine learning. Amazon Kendra can index company documents from various sources and formats, such as PDF, HTML, and Word. The chatbot can call the Amazon Kendra Query API operation, which accepts natural language questions and returns relevant answers from the indexed documents. Amazon Kendra can also return additional information, such as document excerpts, links, and FAQs, to enhance the chatbot experience [1].
The other options are not suitable because:
Option B: Training a Bidirectional Attention Flow (BiDAF) network based on past customer questions and company documents, deploying the model as a real-time Amazon SageMaker endpoint, and integrating the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation will incur more development effort than using Amazon Kendra. The company would have to implement the BiDAF network, which is a complex deep learning model for question answering, and would also have to manage the SageMaker endpoint, the model artifact, and the inference logic [2].
Option C: Indexing company documents by using Amazon OpenSearch Service and integrating the chatbot with OpenSearch Service by using the OpenSearch Service k-nearest neighbors (k-NN) Query API operation will not meet the requirements effectively. Amazon OpenSearch Service is a managed service that provides fast and scalable search and analytics capabilities. However, it is not designed for natural language question answering out of the box, and it may not provide accurate or relevant answers for the chatbot. Moreover, the k-NN Query API operation finds the most similar documents or vectors based on a distance function, so the company would first have to build and maintain its own embedding pipeline for documents and questions [4].
Option D: Training an Amazon SageMaker BlazingText model based on past customer questions and company documents, deploying the model as a real-time SageMaker endpoint, and integrating the model with the chatbot by using the SageMaker Runtime InvokeEndpoint API operation will incur more development effort than using Amazon Kendra. BlazingText is a fast and scalable text classification and word embedding algorithm, not a question-answering model, so the company would still have to build the logic that turns its output into answers, and would also have to manage the SageMaker endpoint, the model artifact, and the inference logic [3].
References:
[1] Amazon Kendra
[2] Bidirectional Attention Flow for Machine Comprehension
[3] Amazon SageMaker BlazingText
[4] Amazon OpenSearch Service
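
The chatbot integration with the Amazon Kendra Query API can be sketched as a small function that takes a client object. The `answer_question` helper, the index ID, and the `FakeKendra` stand-in are hypothetical; in a real application the client would be created with `boto3.client("kendra")`, whose `query` call accepts `IndexId` and `QueryText` and returns a `ResultItems` list.

```python
def answer_question(kendra_client, index_id, question):
    """Return the first document excerpt Kendra finds for the question, or None."""
    response = kendra_client.query(IndexId=index_id, QueryText=question)
    for item in response.get("ResultItems", []):
        excerpt = item.get("DocumentExcerpt", {}).get("Text")
        if excerpt:
            return excerpt
    return None

class FakeKendra:
    """Stand-in client so the sketch runs without AWS access."""
    def query(self, IndexId, QueryText):
        return {"ResultItems": [
            {"Type": "ANSWER",
             "DocumentExcerpt": {"Text": "Returns are accepted within 30 days."}}
        ]}

reply = answer_question(FakeKendra(), "hypothetical-index-id", "What is the return policy?")
print(reply)
```

Passing the client in as a parameter keeps the chatbot logic testable without network access and leaves credential handling to the caller.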


NEW QUESTION # 88
A data scientist is developing a pipeline to ingest streaming web traffic data. The data scientist needs to implement a process to identify unusual web traffic patterns as part of the pipeline. The patterns will be used downstream for alerting and incident response. The data scientist has access to unlabeled historic data to use, if needed.
The solution needs to do the following:
* Calculate an anomaly score for each web traffic entry.
* Adapt unusual event identification to changing web patterns over time.
Which approach should the data scientist implement to meet these requirements?

  • A. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker Random Cut Forest (RCF) built-in model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the RCF model to calculate the anomaly score for each record.
  • B. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker built-in XGBoost model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the XGBoost model to calculate the anomaly score for each record.
  • C. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the k-Nearest Neighbors (kNN) SQL extension to calculate anomaly scores for each record using a tumbling window.
  • D. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the Amazon Random Cut Forest (RCF) SQL extension to calculate anomaly scores for each record using a sliding window.

Answer: D


NEW QUESTION # 89
......
