Databricks-Machine-Learning-Associate 無料問題集「Databricks Certified Machine Learning Associate」

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:
prediction DOUBLE
actual DOUBLE
Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

解説: (JPNTest メンバーにのみ表示されます)
A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model. They elect to use the Hyperopt library's fmin operation to facilitate this process. Unfortunately, the final model is not very accurate. The data scientist suspects that there is an issue with the objective_function being passed as an argument to fmin.
They use the following code block to create the objective_function:

Which of the following changes does the data scientist need to make to their objective_function in order to produce a more accurate model?

解説: (JPNTest メンバーにのみ表示されます)
A data scientist is using the following code block to tune hyperparameters for a machine learning model:

Which change can they make the above code block to improve the likelihood of a more accurate model?

解説: (JPNTest メンバーにのみ表示されます)
Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

解説: (JPNTest メンバーにのみ表示されます)
A data scientist is attempting to tune a logistic regression model logistic using scikit-learn. They want to specify a search space for two hyperparameters and let the tuning process randomly select values for each evaluation.
They attempt to run the following code block, but it does not accomplish the desired task:

Which of the following changes can the data scientist make to accomplish the task?

解説: (JPNTest メンバーにのみ表示されます)
The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.
Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?

解説: (JPNTest メンバーにのみ表示されます)
Which of the Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?

解説: (JPNTest メンバーにのみ表示されます)
A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.
Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

解説: (JPNTest メンバーにのみ表示されます)

弊社を連絡する

我々は12時間以内ですべてのお問い合わせを答えます。

オンラインサポート時間:( UTC+9 ) 9:00-24:00
月曜日から土曜日まで

サポート:現在連絡