Databricks-Machine-Learning-Associate Free Practice Questions: "Databricks Certified Machine Learning Associate"

Question 1

A data scientist is using Spark SQL to import their data into a machine learning pipeline. Once the data is imported, the data scientist performs machine learning tasks using Spark ML.
Which of the following compute tools is best suited for this use case?

（A）None of these compute tools support this task

（B）SQL Warehouse

（C）Standard cluster

（D）Single Node cluster

Correct answer: C

Question 2

Which of the following machine learning algorithms typically uses bagging?

（A）Linear regression

（B）Gradient boosted trees

（C）Decision tree

（D）Random forest

（E）K-means

Correct answer: D
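
As a rough illustration of what bagging means here: bagging (bootstrap aggregating) trains each base model on a sample drawn with replacement from the training data and then averages the predictions. The sketch below uses only the Python standard library; `fit_stump`, `bagged_predictor`, and the toy data are hypothetical names and values chosen for illustration. A random forest additionally considers a random subset of features at each split, which this one-feature toy omits.

```python
import random
import statistics


def fit_stump(points):
    """Fit a one-split regression stump on (x, y) pairs by minimizing squared error."""
    best = None
    xs = sorted({x for x, _ in points})
    for left_x, right_x in zip(xs, xs[1:]):
        threshold = (left_x + right_x) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # The bootstrap sample contained a single distinct x: predict a constant.
        mean = statistics.fmean([y for _, y in points])
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_predictor(points, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: sample with replacement, same size as the original data.
        sample = rng.choices(points, k=len(points))
        models.append(fit_stump(sample))
    return lambda x: statistics.fmean([model(x) for model in models])


# Toy step-function data (hypothetical): y jumps from 0 to 10 at x = 5.
data = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
predict = bagged_predictor(data)
```

Calling `predict(0)` and `predict(9)` returns the averaged prediction of all 25 stumps for the low and high ends of the toy data.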

Question 3

A data scientist is using MLflow to track their machine learning experiment. As a part of each of their MLflow runs, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values. All parent and child runs are being manually started with mlflow.start_run.
Which of the following approaches can the data scientist use to accomplish this MLflow run organization?

（A）They can turn on Databricks Autologging

（B）They can start each child run with the same experiment ID as the parent run

（C）They can specify nested=True when starting the child run for each unique combination of hyperparameter values

（D）They can start each child run inside the parent run's indented code block using mlflow.start_run()

（E）They can specify nested=True when starting the parent run for the tuning process

Correct answer: C

Question 4

A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model in parallel. They elect to use the Hyperopt library to facilitate this process.
Which of the following Hyperopt tools provides the ability to optimize hyperparameters in parallel?

（A）quniform

（B）search_space

（C）objective_function

（D）SparkTrials

（E）fmin

Correct answer: D

Question 5

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

（A）spark_df.to_pandas()

（B）import pyspark.pandas as ps
df = ps.DataFrame(spark_df)

（C）import pyspark.pandas as ps
df = ps.to_pandas(spark_df)

（D）import pandas as pd
df = pd.DataFrame(spark_df)

Correct answer: B

Question 6

A data scientist wants to tune a set of hyperparameters for a machine learning model. They have wrapped a Spark ML model in the objective function objective_function and they have defined the search space search_space.
As a result, they have the following code block:

Which of the following changes do they need to make to the above code block in order to accomplish the task?

（A）Remove the trials=trials argument

（B）Change fmin() to fmax()

（C）Reduce num_evals to be less than 10

（D）Change SparkTrials() to Trials()

（E）Remove the algo=tpe.suggest argument

Correct answer: D

Question 7

A data scientist has replaced missing values in their feature set with each respective feature variable's median value. A colleague suggests that the data scientist is throwing away valuable information by doing this.
Which of the following approaches can they take to include as much information as possible in the feature set?

（A）Impute the missing values using each respective feature variable's mean value instead of the median value

（B）Create a constant feature variable for each feature that contained missing values indicating the percentage of rows from the feature that was originally missing

（C）Remove all feature variables that originally contained missing values from the feature set

（D）Refrain from imputing the missing values in favor of letting the machine learning algorithm determine how to handle them

（E）Create a binary feature variable for each feature that contained missing values indicating whether each row's value has been imputed

Correct answer: E
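
The idea behind option E can be sketched in a few lines: impute with the median, but keep a binary column recording which rows were imputed, so the model can still learn from the fact that a value was missing. The helper `impute_with_indicator` and the `age` column are hypothetical names used only for this example, and `None` stands in for a missing value.

```python
import statistics


def impute_with_indicator(values):
    """Return (imputed_values, was_missing_flags) for a column using None as missing."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    imputed = [median if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


age = [34, None, 29, 41, None]  # hypothetical feature column
age_imputed, age_was_imputed = impute_with_indicator(age)
# age_imputed     -> [34, 34, 29, 41, 34]
# age_was_imputed -> [0, 1, 0, 0, 1]
```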

Question 8

A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.
Which of the following feature engineering tasks will be the least efficient to distribute?

（A）Creating binary indicator features for missing values

（B）Target encoding categorical features

（C）One-hot encoding categorical features

（D）Imputing missing feature values with the mean

（E）Imputing missing feature values with the true median

Correct answer: E
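
A small standard-library sketch of why the true median is harder to distribute than the mean: a mean can be rebuilt exactly from per-partition sums and counts, whereas per-partition medians cannot in general be combined into the global median, which needs a global ordering of the data. The two partitions below are hypothetical toy data standing in for data held by two workers. This is also why Spark offers approximate quantile methods such as `DataFrame.approxQuantile`.

```python
import statistics

# Hypothetical data split across two workers.
partitions = [[1, 2, 3], [4, 5, 6, 7, 100]]
all_values = [v for part in partitions for v in part]

# Mean: each partition reports (sum, count); the driver combines them exactly.
partials = [(sum(part), len(part)) for part in partitions]
distributed_mean = sum(s for s, _ in partials) / sum(n for _, n in partials)

# Median: combining per-partition medians does not give the true median in general.
median_of_medians = statistics.median([statistics.median(part) for part in partitions])
true_median = statistics.median(all_values)

# distributed_mean  -> 16.0 (matches the mean of all values)
# median_of_medians -> 4.0
# true_median       -> 4.5
```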
