DP-203 Free Practice Questions: Microsoft Data Engineering on Microsoft Azure
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier.
You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must meet the following requirements:
Automatically scale down workers when the cluster is underutilized for three minutes.
Minimize the time it takes to scale to the maximum number of workers.
Minimize costs.
What should you do first?
Correct answer: B
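For background, autoscaling and automatic termination are configured on the cluster definition itself, and the short scale-down delay in the requirements is associated with the optimized autoscaling behavior, which depends on the workspace plan. Below is a minimal sketch (Python against the Databricks Clusters REST API) of an autoscaling all-purpose cluster definition; the workspace URL, token, runtime version, and node type are placeholder assumptions.
# Sketch: create an autoscaling all-purpose cluster via the Databricks Clusters API.
# The workspace URL, token, runtime version, and node type are placeholders.
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

cluster_spec = {
    "cluster_name": "autoscaling-all-purpose",
    "spark_version": "13.3.x-scala2.12",               # placeholder runtime version
    "node_type_id": "Standard_DS3_v2",                 # placeholder VM size
    "autoscale": {"min_workers": 2, "max_workers": 8}, # scale between 2 and 8 workers
    "autotermination_minutes": 30,                     # terminate an idle cluster to reduce cost
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])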
You have an Azure Data Lake Storage Gen2 account named account1 that contains a container named Container1. Container1 contains two folders named FolderA and FolderB.
You need to configure access control lists (ACLs) to meet the following requirements:
* Group1 must be able to list and read the contents and subfolders of FolderA.
* Group2 must be able to list and read the contents of FolderA and FolderB.
* Group2 must be prevented from reading any other folders at the root of Container1.
How should you configure the ACL permissions for each group? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_864_20250121.jpg)
Correct answer:
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_865_20250121.jpg)
Explanation:
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_866_20250121.jpg)
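The selected ACL values are shown in the image above. For background, listing and reading a folder in Data Lake Storage Gen2 requires read and execute (r-x) on that folder plus execute (--x) on every parent up to the container root, while execute alone on the root lets a group traverse to a folder without being able to list or read anything else at the root. The sketch below shows how such folder ACLs could be granted with the azure-storage-file-datalake Python SDK; the account URL, credential, and group object IDs are placeholder assumptions.
# Sketch: grant AAD groups read + execute (r-x) on specific folders by updating
# ACLs with the azure-storage-file-datalake SDK. The account URL, credential,
# and group object IDs are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://account1.dfs.core.windows.net",  # placeholder
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("container1")

GROUP1 = "<group1-object-id>"  # placeholder Azure AD object IDs
GROUP2 = "<group2-object-id>"

# r-x on a folder allows listing (execute) and reading (read) its contents;
# the recursive update also applies the entry to existing subfolders and files.
fs.get_directory_client("FolderA").update_access_control_recursive(
    acl=f"group:{GROUP1}:r-x,group:{GROUP2}:r-x"
)
fs.get_directory_client("FolderB").update_access_control_recursive(
    acl=f"group:{GROUP2}:r-x"
)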
You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South region.
You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory. The solution must meet the following requirements:
Ensure that the data remains in the UK South region at all times.
Minimize administrative effort.
Which type of integration runtime should you use?
Correct answer: C
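For background, an Azure integration runtime is fully managed (no infrastructure to maintain) and can be pinned to a specific Azure region instead of resolving automatically, which keeps the data movement compute in that region. Below is a sketch of such a definition, written as a Python dict that mirrors the Data Factory integration runtime JSON; the runtime name is a placeholder assumption.
# Sketch: an Azure (managed) integration runtime definition pinned to UK South,
# expressed as a Python dict that mirrors the ADF integration runtime JSON.
# The runtime name is a placeholder.
import json

azure_ir_uk_south = {
    "name": "AzureIR-UKSouth",  # placeholder name
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "UK South"  # pin compute to UK South instead of AutoResolve
            }
        },
    },
}

print(json.dumps(azure_ir_uk_south, indent=2))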
You have an Azure Data Factory pipeline that has the activities shown in the following exhibit.
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_294_20250121.jpg)
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
NOTE: Each correct selection is worth one point.
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_295_20250121.jpg)
Correct answer:
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_297_20250121.jpg)
Explanation:
Box 1: succeed
Box 2: failed
Example:
For example, consider a pipeline with three activities, where Activity1 has a success path to Activity2 and a failure path to Activity3. If Activity1 fails and Activity3 succeeds, the pipeline reports failure. The presence of the success path alongside the failure path changes the outcome reported by the pipeline, even though the same activities run with the same results as when only the failure path exists.
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_298_20250121.jpg)
Activity1 fails, Activity2 is skipped, and Activity3 succeeds. The pipeline reports failure.
Reference:
https://datasavvy.me/2021/02/18/azure-data-factory-activity-failures-and-pipeline-outcomes/
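The behavior described above comes from how the activities' dependsOn conditions are declared. A minimal sketch of that wiring, written as a Python dict that mirrors the pipeline JSON (activity names and types are illustrative placeholders):
# Sketch: Activity1 with a success path to Activity2 and a failure path to
# Activity3, expressed as a Python dict mirroring ADF pipeline JSON.
pipeline_activities = [
    {"name": "Activity1", "type": "Copy"},
    {
        "name": "Activity2",
        "type": "Copy",
        "dependsOn": [
            {"activity": "Activity1", "dependencyConditions": ["Succeeded"]}
        ],
    },
    {
        "name": "Activity3",
        "type": "Copy",
        "dependsOn": [
            {"activity": "Activity1", "dependencyConditions": ["Failed"]}
        ],
    },
]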
You have an Azure data factory that connects to a Microsoft Purview account. The data factory is registered in Microsoft Purview.
You update a Data Factory pipeline.
You need to ensure that the updated lineage is available in Microsoft Purview.
What should you do?
You have an Azure subscription that contains an Azure SQL database named DB1 and a storage account named storage1. The storage1 account contains a file named File1.txt. File1.txt contains the names of selected tables in DB1.
You need to use an Azure Synapse pipeline to copy data from the selected tables in DB1 to the files in storage1. The solution must meet the following requirements:
* The Copy activity in the pipeline must be parameterized to use the data in File1.txt to identify the source and destination of the copy.
* Copy activities must occur in parallel as often as possible.
Which two pipeline activities should you include in the pipeline? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
Correct answer: C, D
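For reference, a common way to meet both requirements is a Lookup activity that reads the table list from File1.txt, feeding a ForEach activity whose isSequential property is false so the parameterized Copy activity runs in parallel. The sketch below mirrors the pipeline JSON as a Python dict; the dataset names and source types are illustrative assumptions.
# Sketch: Lookup reads the table names, ForEach iterates over them in parallel
# and runs a parameterized Copy for each. Dataset names and source details are
# placeholders; this mirrors ADF/Synapse pipeline JSON.
pipeline_activities = [
    {
        "name": "LookupTableList",
        "type": "Lookup",
        "typeProperties": {
            "source": {"type": "DelimitedTextSource"},
            "dataset": {"referenceName": "File1TableList", "type": "DatasetReference"},
            "firstRowOnly": False,
        },
    },
    {
        "name": "CopyEachTable",
        "type": "ForEach",
        "dependsOn": [
            {"activity": "LookupTableList", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "isSequential": False,  # allow Copy activities to run in parallel
            "items": {
                "value": "@activity('LookupTableList').output.value",
                "type": "Expression",
            },
            "activities": [
                {"name": "CopyTable", "type": "Copy"}  # parameterized with @item()
            ],
        },
    },
]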
You are planning a streaming data solution that will use Azure Databricks. The solution will stream sales transaction data from an online store. The solution has the following specifications:
* The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.
* Line total sales amount and line total tax amount will be aggregated in Databricks.
* Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.
You need to recommend an output mode for the dataset that will be processed by using Structured Streaming.
The solution must minimize duplicate data.
What should you recommend?
Correct answer: A
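For background, the output mode is set on the streaming writer, and append mode combined with an aggregation requires a watermark so that each finalized window is written exactly once. A minimal PySpark sketch follows; the source table, column names, window, and output paths are placeholder assumptions, not part of the question.
# Sketch: write an aggregated sales stream with an explicit output mode.
# The source table, column names, window, and paths are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already defined in a Databricks notebook

sales = spark.readStream.table("raw_sales")  # placeholder streaming source

line_totals = (
    sales.withWatermark("event_time", "10 minutes")  # required for append mode with aggregation
    .groupBy(F.window("event_time", "5 minutes"), "item_id")
    .agg(
        F.sum("line_total").alias("line_total_sales"),
        F.sum("line_tax").alias("line_total_tax"),
    )
)

query = (
    line_totals.writeStream.outputMode("append")  # each finalized window is emitted once
    .format("delta")
    .option("checkpointLocation", "/checkpoints/line_totals")  # placeholder path
    .start("/tables/line_totals")                              # placeholder path
)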
You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day.
You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times.
What should you include in the solution?
Correct answer: A
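For background, incremental-load patterns in Databricks usually land events in a Delta table, since Delta supports efficient incremental reads (streaming or change data feed) on compact columnar storage. The sketch below is a minimal PySpark illustration; the source path, output paths, checkpoint location, and partition column are placeholder assumptions.
# Sketch: land events in a date-partitioned Delta table and consume it incrementally.
# Source path, output paths, checkpoint location, and partition column are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already defined in a Databricks notebook

events = spark.readStream.format("delta").load("/raw/events")  # placeholder source

bronze_query = (
    events.withColumn("event_date", F.to_date("event_time"))
    .writeStream.format("delta")
    .partitionBy("event_date")                                   # prune by date on incremental loads
    .option("checkpointLocation", "/checkpoints/events_bronze")  # placeholder path
    .start("/tables/events_bronze")                              # placeholder path
)

# Downstream incremental job: stream only the rows added since the last checkpoint.
incremental = spark.readStream.format("delta").load("/tables/events_bronze")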
You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Pool1.
You have the queries shown in the following table.
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_155_20250121.jpg)
You are evaluating whether to enable result set caching for Pool1. Which query results will be cached if result set caching is enabled?
Correct answer: D
You have an Azure Synapse Analytics workspace named WS1.
You have an Azure Data Lake Storage Gen2 container that contains JSON-formatted files in the following format.
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_198_20250121.jpg)
You need to use the serverless SQL pool in WS1 to read the files.
How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_200_20250121.jpg)
Correct answer:
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_202_20250121.jpg)
Explanation:
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_204_20250121.jpg)
Box 1: openrowset
The easiest way to see the content of your CSV file is to provide the file URL to the OPENROWSET function and specify the CSV FORMAT.
Example:
SELECT *
FROM OPENROWSET(
    BULK 'csv/population/population.csv',
    DATA_SOURCE = 'SqlOnDemandDemo',
    FORMAT = 'CSV', PARSER_VERSION = '2.0',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
) AS [r]
Box 2: openjson
You can access your JSON files from the Azure File Storage share by using the mapped drive, as shown in the following example:
SELECT book.* FROM
OPENROWSET(BULK N't:\books\books.json', SINGLE_CLOB) AS json
CROSS APPLY OPENJSON(BulkColumn)
WITH( id nvarchar(100), name nvarchar(100), price float,
pages_i int, author nvarchar(100)) AS book
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-single-csv-file
https://docs.microsoft.com/en-us/sql/relational-databases/json/import-json-documents-into-sql-server
You plan to develop a dataset named Purchases by using Azure Databricks. Purchases will contain the following columns:
* ProductID
* ItemPrice
* lineTotal
* Quantity
* StoreID
* Minute
* Month
* Hour
* Year
* Day
You need to store the data to support hourly incremental load pipelines that will vary for each StoreID. The solution must minimize storage costs. How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_250_20250121.jpg)
Correct answer:
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_251_20250121.jpg)
Explanation:
![](https://www.jpntest.com/uploads/imgs/DP-203 V20.65/img_252_20250121.jpg)
Box 1: partitionBy
We should overwrite at the partition level.
Example:
df.write.partitionBy("y", "m", "d")
  .mode(SaveMode.Append)
  .parquet("/data/hive/warehouse/db_name.db/" + tableName)
Box 2: ("StoreID", "Year", "Month", "Day", "Hour")
Box 3: parquet("/Purchases")
Reference:
https://intellipaat.com/community/11744/how-to-partition-and-write-dataframe-in-spark-without-deleting-partitions-with-no-new-data
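Putting the three boxes together in PySpark (here df stands for the Purchases DataFrame; the save mode is an assumption for illustration):
# Sketch: the same partitioned write in PySpark, using the Purchases columns from
# this question. df is the Purchases DataFrame; the save mode is an assumption.
(
    df.write.partitionBy("StoreID", "Year", "Month", "Day", "Hour")
    .mode("append")
    .parquet("/Purchases")
)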
A company purchases IoT devices to monitor manufacturing machinery. The company uses an IoT appliance to communicate with the IoT devices.
The company must be able to monitor the devices in real-time.
You need to design the solution.
What should you recommend?
Correct answer: A
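The graded recommendation is not reproduced here. As general background, real-time device monitoring solutions typically ingest telemetry through Azure IoT Hub and process it with a streaming service; the sketch below simply reads telemetry from an IoT hub's built-in Event Hubs-compatible endpoint using the azure-eventhub Python SDK. The connection string, event hub name, and consumer group are placeholder assumptions.
# Sketch: read device telemetry in near real time from an IoT hub's built-in
# Event Hubs-compatible endpoint. The connection string and entity name are placeholders.
from azure.eventhub import EventHubConsumerClient

CONNECTION_STR = "<event-hubs-compatible-connection-string>"  # placeholder
EVENTHUB_NAME = "<event-hubs-compatible-name>"                # placeholder

def on_event(partition_context, event):
    # Each event is one telemetry message sent by a device through the IoT appliance.
    print(partition_context.partition_id, event.body_as_str())

client = EventHubConsumerClient.from_connection_string(
    CONNECTION_STR, consumer_group="$Default", eventhub_name=EVENTHUB_NAME
)
with client:
    client.receive(on_event=on_event, starting_position="-1")  # "-1" = read from the start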