Databricks-Certified-Professional-Data-Engineer Free Practice Questions: "Databricks Certified Professional Data Engineer"

Question 1

The marketing team is looking to share data in an aggregate table with the sales organization, but the field names used by the teams do not match, and a number of marketing-specific fields have not been approved for the sales org.
Which of the following solutions addresses the situation while emphasizing simplicity?

（A）Use a CTAS statement to create a derivative table from the marketing table; configure a production job to propagate changes.

（B）Add a parallel table write to the current production pipeline, updating a new sales table that varies as required from marketing table.

（C）Create a new table with the required schema and use Delta Lake's DEEP CLONE functionality to sync up changes committed to one table to the corresponding table.

（D）Create a view on the marketing table selecting only those fields approved for the sales team; alias the names of any fields that should be standardized to the sales naming conventions.

Correct answer: D
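
The view-based approach in option D can be sketched as follows. This is a minimal illustration only: it assumes Python's built-in sqlite3 module as a stand-in for a Databricks SQL environment, and the table, view, and column names (marketing_agg, sales_agg_view, mkt_region, and so on) are hypothetical and do not come from the question.

```python
import sqlite3

# Hypothetical stand-in: an in-memory SQLite database instead of a Lakehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE marketing_agg ("
    "mkt_region TEXT, mkt_total_spend REAL, internal_campaign_score REAL)"
)
conn.executemany(
    "INSERT INTO marketing_agg VALUES (?, ?, ?)",
    [("EMEA", 1200.0, 0.7), ("APAC", 800.0, 0.4)],
)

# The view exposes only the approved fields and renames them to the
# (assumed) sales naming convention; internal_campaign_score stays hidden.
conn.execute(
    "CREATE VIEW sales_agg_view AS "
    "SELECT mkt_region AS region, mkt_total_spend AS total_spend "
    "FROM marketing_agg"
)

cursor = conn.execute("SELECT * FROM sales_agg_view ORDER BY region")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

Because a plain view stores no data of its own, changes to the underlying marketing table are visible through it on the next query, with no extra job or table to maintain.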

Question 2

Which statement describes the correct use of pyspark.sql.functions.broadcast?

（A）It marks a column as having low enough cardinality to properly map distinct values to available partitions, allowing a broadcast join.

（B）It marks a column as small enough to store in memory on all executors, allowing a broadcast join.

（C）It marks a DataFrame as small enough to store in memory on all executors, allowing a broadcast join.

（D）It caches a copy of the indicated table on attached storage volumes for all active clusters within a Databricks workspace.

（E）It caches a copy of the indicated table on all nodes in the cluster for use in all future queries during the cluster lifetime.

Correct answer: C
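
To illustrate what marking a DataFrame for broadcast achieves, the following is a plain-Python model of a broadcast hash join. It is not Spark code: it only mimics the idea that the small side is copied whole to every partition of the large side, so each partition can be joined locally. All function names and data are hypothetical. In PySpark, the corresponding hint is written as `large_df.join(broadcast(small_df), "key")`.

```python
from typing import Dict, List, Tuple


def broadcast_hash_join(
    large_partitions: List[List[Tuple[int, str]]],
    small_table: List[Tuple[int, str]],
) -> List[List[Tuple[int, str, str]]]:
    """Join each partition of the large side against a full copy of the small side."""
    joined = []
    for partition in large_partitions:
        # Each simulated executor builds its own complete copy of the small table.
        lookup: Dict[int, str] = dict(small_table)
        joined.append(
            [(key, value, lookup[key]) for key, value in partition if key in lookup]
        )
    return joined


# Hypothetical data: orders split over two partitions, plus a small country lookup.
orders = [[(1, "order-a"), (2, "order-b")], [(1, "order-c"), (3, "order-d")]]
countries = [(1, "JP"), (2, "DE")]

result = broadcast_hash_join(orders, countries)
print(result)
```

The large side never moves between partitions in this model, which is why the small side must fit in memory on every executor.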

Question 3

The data engineering team is configuring environments for development, testing, and production before beginning migration of a new data pipeline. The team requires extensive testing of both the code and the data resulting from code execution, and the team wants to develop and test against data that is as similar to production data as possible.
A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have Admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.
Which statement captures best practices for this situation?

（A）In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.

（B）Because Delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data; as such, it is generally safe to mount production data anywhere.

（C）All development, testing, and production code and data should exist in a single unified workspace; creating separate environments for testing and development further reduces risks.

（D）Because access to production data will always be verified using passthrough credentials, it is safe to mount data to any Databricks development environment.

Correct answer: A

Question 4

The data architect has mandated that all tables in the Lakehouse should be configured as external (also known as "unmanaged") Delta Lake tables.
Which approach will ensure that this requirement is met?

（A）When the workspace is being configured, make sure that external cloud object storage has been mounted.

（B）When a database is being created, make sure that the LOCATION keyword is used.

（C）When configuring an external data warehouse for all table storage, leverage Databricks for all ELT.

（D）When tables are created, make sure that the EXTERNAL keyword is used in the CREATE TABLE statement.

（E）When data is saved to a table, make sure that a full file path is specified alongside the Delta format.

Correct answer: D

Question 5

To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.
The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.
Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?

（A）Create a new table with the required schema and new fields and use Delta Lake's deep clone functionality to sync up changes committed to one table to the corresponding table.

（B）Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.

（C）Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.

（D）Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.

（E）Configure a new table with all the requisite fields and new names and use this as the source for the customer-facing application; create a view that maintains the original data schema and table name by aliasing select fields from the new table.

Correct answer: E

Question 6

Which Python variable contains a list of directories to be searched when trying to locate required modules?

（A）pylib.source

（B）importlib.resource path

（C）pypi.path

（D）sys.path

（E）os.path

Correct answer: D
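
A short sketch of how sys.path behaves, using only the standard library. The directory "/tmp/my_project_libs" is a hypothetical path chosen for illustration and does not need to exist for the example to run.

```python
import sys

# sys.path is an ordinary list of strings, consulted in order on import.
print(type(sys.path).__name__)
for entry in sys.path[:3]:
    print(repr(entry))

# Appending a directory adds it to the locations searched for modules.
# "/tmp/my_project_libs" is a hypothetical path used only for illustration.
extra_dir = "/tmp/my_project_libs"
if extra_dir not in sys.path:
    sys.path.append(extra_dir)
```

Since the list is searched in order, an entry inserted near the front takes precedence over later entries that contain a module with the same name.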

Question 7

A data engineer has created a transactions Delta table on Databricks that should be used by the analytics team.
The analytics team wants to use the table with another tool that requires Apache Iceberg format.
What should the data engineer do?

（A）Create an Iceberg copy of the transactions Delta table which can be used by the analytics team.

（B）Require the analytics team to use a tool that supports Delta tables.

（C）Convert the transactions Delta table to Iceberg and enable UniForm so that the table can be read as a Delta table.

（D）Enable UniForm on the transactions table, set to 'iceberg', so that the table can be read as an Iceberg table.

Correct answer: D
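
UniForm is enabled on a Delta table through table properties, which lets Iceberg clients read the same table without a separate copy. The sketch below only renders the SQL statement as a string; it assumes the property names documented for Databricks UniForm (column mapping, IcebergCompatV2, and the enabled formats list), which should be checked against the runtime version in use. The helper name uniform_iceberg_statement is hypothetical.

```python
def uniform_iceberg_statement(table_name: str) -> str:
    """Render an ALTER TABLE statement that turns on UniForm Iceberg metadata."""
    # Property names assumed from the Databricks UniForm documentation.
    properties = {
        "delta.columnMapping.mode": "name",
        "delta.enableIcebergCompatV2": "true",
        "delta.universalFormat.enabledFormats": "iceberg",
    }
    rendered = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({rendered})"


statement = uniform_iceberg_statement("transactions")
print(statement)
# In a Databricks notebook this string would be passed to spark.sql(statement).
```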

Question 8

The data governance team is reviewing code used for deleting records for compliance with GDPR. They note the following logic is used to delete records from the Delta Lake table named users.

Assuming that user_id is a unique identifying key and that delete_requests contains all users that have requested deletion, which statement describes whether successfully executing the above logic guarantees that the records to be deleted are no longer accessible and why?

（A）No; the Delta Lake delete command only provides ACID guarantees when combined with the merge into command.

（B）No; files containing deleted records may still be accessible with time travel until a vacuum command is used to remove invalidated data files.

（C）Yes; the Delta cache immediately updates to reflect the latest data files recorded to disk.

（D）Yes; Delta Lake ACID guarantees provide assurance that the delete command succeeded fully and permanently purged these records.

（E）No; the Delta cache may return records from previous versions of the table until the cluster is restarted.

Correct answer: B
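
The reasoning behind option B can be modeled with a toy, in-memory table that keeps one snapshot per version. This is not the Delta Lake API: the class and method names are hypothetical, and real Delta Lake tracks data files in a transaction log and applies a retention period, which this model ignores, before a vacuum removes files.

```python
class ToyVersionedTable:
    """Toy model: every write produces a new snapshot; old snapshots linger."""

    def __init__(self, rows):
        self.snapshots = {0: list(rows)}  # version -> rows stored for that version
        self.latest = 0

    def delete(self, user_ids):
        # A delete writes a new version; earlier snapshots are left untouched.
        current = self.snapshots[self.latest]
        kept = [row for row in current if row["user_id"] not in user_ids]
        self.latest += 1
        self.snapshots[self.latest] = kept

    def read(self, version=None):
        # Reading an older version is the toy equivalent of time travel.
        return self.snapshots[self.latest if version is None else version]

    def vacuum(self):
        # Drop every snapshot except the latest one.
        self.snapshots = {self.latest: self.snapshots[self.latest]}


users = ToyVersionedTable([{"user_id": 1}, {"user_id": 2}])
users.delete({2})
current_rows = users.read()
old_rows_before_vacuum = users.read(version=0)
users.vacuum()
old_version_available = 0 in users.snapshots
print(current_rows, old_rows_before_vacuum, old_version_available)
```

In the model, the deleted user disappears from the current version at once but stays readable at version 0 until vacuum() runs, which mirrors why a delete alone does not guarantee the records are inaccessible.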

Question 9

Review the following error traceback:

Which statement describes the error being raised?

（A）There is a syntax error because the heartrate column is not correctly identified as a column.

（B）The code was PySpark but was executed in a Scala notebook.

（C）There is a type error because a DataFrame object cannot be multiplied.

（D）There is no column in the table named heartrateheartrateheartrate.

（E）There is a type error because a column object cannot be multiplied.

Correct answer: D

Question 10

A user new to Databricks is trying to troubleshoot long execution times for some pipeline logic they are working on. Presently, the user is executing code cell-by-cell, using display() calls to confirm code is producing the logically correct results as new transformations are added to an operation. To get a measure of average time to execute, the user is running each cell multiple times interactively.
Which of the following adjustments will get a more accurate measure of how code is likely to perform in production?

（A）Scala is the only language that can be accurately tested using interactive notebooks; because the best performance is achieved by using Scala code compiled to JARs, all PySpark and Spark SQL logic should be refactored.

（B）The Jobs UI should be leveraged to occasionally run the notebook as a job and track execution time during incremental code development because Photon can only be enabled on clusters launched for scheduled jobs.

（C）The only way to meaningfully troubleshoot code execution times in development notebooks is to use production-sized data and production-sized clusters with Run All execution.

（D）Production code development should only be done using an IDE; executing code against a local build of open source Spark and Delta Lake will provide the most accurate benchmarks for how code will perform in production.

（E）Calling display() forces a job to trigger, while many transformations will only add to the logical query plan; because of caching, repeated execution of the same logic does not provide meaningful results.

Correct answer: E
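
The behavior described in option E can be illustrated with a toy model of lazy evaluation. This is not Spark code: the class ToyLazyFrame and its methods are hypothetical, and the caching here is a deliberate simplification of what a real cluster does.

```python
class ToyLazyFrame:
    """Toy model of lazy evaluation: transformations are recorded, actions run them."""

    def __init__(self, data, plan=()):
        self.data = data
        self.plan = plan          # functions recorded but not yet executed
        self.jobs_triggered = 0   # how many times an action actually ran the plan
        self._cached = None

    def transform(self, func):
        # Only extends the plan; no data is processed here.
        return ToyLazyFrame(self.data, self.plan + (func,))

    def display(self):
        # Action: runs the whole plan, unless a cached result already exists.
        if self._cached is None:
            rows = self.data
            for func in self.plan:
                rows = [func(row) for row in rows]
            self._cached = rows
            self.jobs_triggered += 1
        return self._cached


frame = ToyLazyFrame([1, 2, 3]).transform(lambda x: x * 2).transform(lambda x: x + 1)
jobs_before_action = frame.jobs_triggered
first = frame.display()
second = frame.display()  # served from the cache, so timing it is misleading
print(jobs_before_action, first, frame.jobs_triggered)
```

In this model the two transform() calls cost nothing on their own, and only the first display() does real work, so timing repeated interactive runs says little about how the full pipeline would perform.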
