Databricks-Machine-Learning-Associate Free Practice Questions: "Databricks Certified Machine Learning Associate"
A data scientist has written a feature engineering notebook that utilizes the pandas library. As the size of the data processed by the notebook increases, the notebook's runtime increases drastically.
Which of the following tools can the data scientist use to spend the least amount of time refactoring their notebook to scale with big data?
Correct answer: D
A data scientist has replaced missing values in their feature set with each respective feature variable's median value. A colleague suggests that the data scientist is throwing away valuable information by doing this.
Which of the following approaches can they take to include as much information as possible in the feature set?
Correct answer: E
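One common way to keep the information that a value was originally missing is to add a binary indicator column next to each imputed feature. The following is a minimal sketch in plain Python, not code from the exam; the feature names (`age`, `income`), the sample rows, and the helper `impute_with_indicator` are all hypothetical.

```python
from statistics import median

# Hypothetical feature rows; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 58000},
]


def impute_with_indicator(rows, features):
    """Median-impute each feature and add a <feature>_was_missing flag."""
    # Compute each feature's median from the observed (non-missing) values only.
    medians = {
        f: median(r[f] for r in rows if r[f] is not None) for f in features
    }
    out = []
    for r in rows:
        new_row = {}
        for f in features:
            missing = r[f] is None
            new_row[f] = medians[f] if missing else r[f]
            # The flag preserves the fact that the value was imputed.
            new_row[f + "_was_missing"] = int(missing)
        out.append(new_row)
    return out


imputed = impute_with_indicator(rows, ["age", "income"])
```

With the flag columns in place, a downstream model can learn from the missingness pattern itself rather than only from the filled-in median.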
A machine learning engineer is using the following code block to scale the inference of a single-node model on a Spark DataFrame with one million records:
Assuming the default Spark configuration is in place, which of the following is a benefit of using an Iterator?
Correct answer: C
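The code block the question refers to is not reproduced on this page. Assuming it follows the usual iterator-of-batches pattern (a function that receives an iterator of batches and yields predictions for each), the general shape can be illustrated in plain Python. This is an analogy only, not Spark code; `load_model` and `predict_batches` are hypothetical names, and the "model" is a trivial stand-in.

```python
from typing import Iterator, List

load_count = 0  # counts how often the (hypothetical) expensive load runs


def load_model():
    """Stand-in for an expensive model load; returns a trivial 'model'."""
    global load_count
    load_count += 1
    return lambda x: 2 * x


def predict_batches(batches: Iterator[List[float]]) -> Iterator[List[float]]:
    # Setup placed before the loop runs once for the whole iterator,
    # not once per batch.
    model = load_model()
    for batch in batches:
        yield [model(x) for x in batch]


batches = iter([[1.0, 2.0], [3.0], [4.0, 5.0]])
predictions = list(predict_batches(batches))
```

In this sketch three batches are scored, yet `load_model` is called only once, because the expensive setup sits outside the per-batch loop.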
A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:
* Hyperparameter 1: [2, 5, 10]
* Hyperparameter 2: [50, 100]
Which of the following represents the number of machine learning models that can be trained in parallel during this process?
Correct answer: C
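The arithmetic behind this question can be checked with a short sketch. It enumerates the grid from the question and computes two quantities: the number of distinct hyperparameter combinations, and the total number of model fits once each combination is trained on every cross-validation fold. The dictionary keys are placeholder names, since the question does not name the hyperparameters.

```python
from itertools import product

# Grid and fold count taken from the question; key names are placeholders.
grid = {"hyperparameter_1": [2, 5, 10], "hyperparameter_2": [50, 100]}
num_folds = 3

# Every pairing of one value per hyperparameter.
combinations = list(product(*grid.values()))
num_combinations = len(combinations)

# Each combination is fit once per fold during cross-validation.
total_fits = num_combinations * num_folds
```

Here `num_combinations` is 3 × 2 = 6, and `total_fits` is 6 × 3 = 18.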
A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of their tuning process?
Correct answer: A
A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.
Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?
Correct answer: B