Note: A new version of Implementing an Azure Data Solution [DP-200] has been released; refer to DP-203.
This blog covers the step-by-step activity guides (hands-on labs) of the Implementing an Azure Data Solution [DP-200] training program that you must perform to learn this course.
Azure data engineers are responsible for data-related implementation tasks that consist of provisioning data storage services, ingesting streaming and batch data, implementing security requirements, transforming data, implementing data retention policies, identifying performance bottlenecks, and accessing external data sources.
Here’s a quick guide on how to start learning data engineering on Azure and clear the DP-200 exam through hands-on practice.
To know more about the DP-200 exam, you can read our blog on DP-200 Implementing An Azure Data Solution.
DP-200 | Implementing An Azure Data Solution
- Azure for the Data Engineer.
- Working with Data Storage.
- Enabling Team-Based Data Science with Azure Databricks.
- Building a Globally Distributed Database with Cosmos DB.
- Working with Relational Data Stores in the Cloud.
- Performing Real-Time Analytics with Stream Analytics.
- Orchestrating Data Movement with Azure Data Factory.
- Securing Azure Data Platforms.
- Monitoring and Troubleshooting Data Storage and Processing.
Skills Measured In DP-200 Exam
- Implement Data Storage Solutions (40-45%)
- Manage and Develop Data Processing (25-30%)
- Monitor and Optimize Data Solutions (30-35%)
Note: We will apply the knowledge gained in these modules to a scenario that is explained in a case study about AdventureWorks.
Lab 1: Azure For The Data Engineer
Exercise 1: Identify The Evolving World Of Data
- In this exercise, we’ll identify the data requirements from the case study in Implementing an Azure Data Solution and determine whether the data structure for each requirement is structured, semi-structured, or unstructured.
- Non-Relational: Document data, graph data, column family, etc.
- Relational: Data stored in tables, e.g. customer and employee tables.
Exercise 2: Determine The Azure Data Platform Services
- In this exercise, we’ll determine the data platform technology that meets the identified data requirements.
Exercise 3: Identify The Tasks To Be Performed By The Data Engineer
- In this exercise, we’ll select one of the requirements and determine the high-level tasks that we will perform to meet it, e.g.:
- Provisioning data storage services.
- Ingesting streaming and batch data.
- Transforming data.
Exercise 4: Finalize The Data Engineering Deliverables
- In this exercise, we’ll define the data engineering deliverables for AdventureWorks.
Also read: Everything you need to know on Azure SQL Database
Lab 2: Working With Data Storage
Exercise 1: Choose A Data Storage Approach In Azure
- In this exercise, we’ll identify the data storage requirements for the website’s static images and for the predictive analytics solution from the case study in Implementing an Azure Data Solution.
- Each data set has distinct requirements, and it’s our job to figure out which storage solution is best.
Exercise 2: Create An Azure Storage Account
- In this exercise, we’ll create an Azure resource group in the region closest to our location, and a storage account within it.
- Create containers named images, phonecalls, and tweets within the storage account.
- Upload some graphics to the images container of the storage account.
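These steps can also be scripted. Here is a rough sketch using the azure-storage-blob Python SDK; the connection string, container names, and file name below are placeholders, not the lab’s exact values:

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Placeholder connection string; copy the real one from the storage
# account's "Access keys" blade in the Azure portal.
conn_str = ("DefaultEndpointsProtocol=https;AccountName=<account>;"
            "AccountKey=<key>;EndpointSuffix=core.windows.net")

service = BlobServiceClient.from_connection_string(conn_str)

# Create the three containers used in the lab
# (create_container raises ResourceExistsError if one already exists).
for name in ("images", "phonecalls", "tweets"):
    service.create_container(name)

# Upload a sample graphic to the images container.
images = service.get_container_client("images")
with open("logo.png", "rb") as data:
    images.upload_blob(name="logo.png", data=data)
```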
Exercise 3: Explain Azure Data Lake Storage
- Azure Data Lake Storage is a no-limits data lake built for big data analytics workloads.
- In this exercise, we’ll create and configure a storage account as a Data Lake Store Gen2 storage type in the region closest to our location, in the resource group.
Exercise 4: Upload Data Into Azure Data Lake
- In this exercise, we’ll install and start Microsoft Azure Storage Explorer and upload some data files to the containers of the Data Lake Gen2 storage account.
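The lab uses Storage Explorer, but the same upload can be scripted with the azure-storage-file-datalake Python SDK. A minimal sketch, assuming a hypothetical `data` file system and a local `phonecalls.csv` file:

```python
# pip install azure-storage-file-datalake
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder connection string for the Data Lake Gen2 account.
conn_str = ("DefaultEndpointsProtocol=https;AccountName=<dl-account>;"
            "AccountKey=<key>;EndpointSuffix=core.windows.net")

service = DataLakeServiceClient.from_connection_string(conn_str)

# Create (or reuse) a file system, i.e. a container.
fs = service.get_file_system_client("data")
if not fs.exists():
    fs.create_file_system()

# Create a directory and upload the data file into it.
fs.create_directory("raw")
file_client = fs.create_file("raw/phonecalls.csv")
with open("phonecalls.csv", "rb") as f:
    file_client.upload_data(f.read(), overwrite=True)
```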
Also read: How Azure Event Hub & Event Grid Works?
Lab 3: Enabling Team-Based Data Science With Azure Databricks
Exercise 1: Explain Azure Databricks
- Azure Databricks is an easy-to-use data analytics platform based on the Apache Spark “big data” platform.
Exercise 2: Work With Azure Databricks
- In this exercise, we’ll create an Azure Databricks Premium tier instance in a resource group, open Azure Databricks, launch a Databricks workspace, and create a Spark cluster.
Exercise 3: Read Data With Azure Databricks
- In this exercise, we’ll confirm that the Databricks cluster has been created, collect the Azure Data Lake Store Gen2 account name, and enable the Databricks instance to access the Data Lake Gen2 store.
- We’ll create a Databricks notebook, connect to the Data Lake store, and then read data in Azure Databricks, as sketched below.
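A minimal sketch of such a notebook cell, assuming access via a storage account key held in a hypothetical `adls-scope` secret scope; the storage account, container, and file names are placeholders, and `spark`, `dbutils`, and `display` are Databricks notebook builtins:

```python
# Runs in a Databricks notebook cell, where `spark` and `dbutils` are predefined.
# Grant the cluster access to the Data Lake Gen2 account with an account key
# (a service principal would be the production-grade alternative).
spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    dbutils.secrets.get(scope="adls-scope", key="storage-key"))

# Read a CSV file from the lake into a DataFrame.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("abfss://data@<storage-account>.dfs.core.windows.net/raw/phonecalls.csv"))

display(df)
```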
Exercise 4: Perform Basic Transformations With Azure Databricks
- In this exercise, we’ll retrieve specific columns from a dataset, perform a column rename on the dataset, add an annotation, and, if time permits, apply additional transformations.
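Continuing the sketch from the previous exercise, these transformations might look like the following in PySpark (the column names are hypothetical):

```python
from pyspark.sql.functions import lit

# Retrieve specific columns from the DataFrame.
selected = df.select("CallId", "Duration", "Region")

# Rename a column.
renamed = selected.withColumnRenamed("Duration", "CallDurationSeconds")

# Add an annotation column marking where the data came from.
annotated = renamed.withColumn("Source", lit("phone-calls"))

annotated.show(5)
```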
Also read: our blog on Azure Data Lake Overview for Beginners
Lab 4: Building Globally Distributed Databases with Cosmos DB
Exercise 1: Create An Azure Cosmos DB database Built To Scale
- Azure Cosmos DB is Microsoft’s globally distributed, multi-model database service. In this exercise, we’ll create an Azure Cosmos DB instance.
Check Out: Overview of Stream Analytics
Exercise 2: Insert And Query Data In Your Azure Cosmos DB database
- In this exercise, we’ll set up an Azure Cosmos DB database and container and then add data using the portal.
- We’ll run queries in the Azure portal and perform complex operations on our data.
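A minimal sketch of the same insert-and-query flow using the azure-cosmos Python SDK, with placeholder account, database, container, and item values:

```python
# pip install azure-cosmos
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key from the Cosmos DB account's "Keys" blade.
client = CosmosClient("https://<account>.documents.azure.com:443/",
                      credential="<primary-key>")

db = client.create_database_if_not_exists("RetailDemo")
container = db.create_container_if_not_exists(
    id="Products",
    partition_key=PartitionKey(path="/category"),
    offer_throughput=400)

# Insert (or update) an item.
container.upsert_item(
    {"id": "1", "category": "bikes", "name": "Mountain Bike", "price": 299.99})

# Query with a parameterized SQL-like query.
items = container.query_items(
    query="SELECT c.name, c.price FROM c WHERE c.category = @cat",
    parameters=[{"name": "@cat", "value": "bikes"}],
    enable_cross_partition_query=True)
for item in items:
    print(item)
```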
Exercise 3: Distribute Your Data Globally With Azure Cosmos DB
- In this exercise, we’ll replicate data to multiple regions and manage failover.
Lab 5: Working With Relational Data Stores In The Cloud
Exercise 1: Use Azure SQL Database
- Azure SQL Database is a “database as a service” offering from Azure that runs the SQL Server database engine under the hood. It is not 100% compatible with on-premises SQL Server: some SQL Server features are not supported, so slight changes to our code might be required.
- In this exercise, we’ll create and configure a SQL Database instance.
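Once the instance is created and the server firewall allows our client IP, we can connect to it programmatically. A minimal sketch using pyodbc, with placeholder server, database, and credential values:

```python
# pip install pyodbc  (requires the Microsoft ODBC Driver 18 for SQL Server)
import pyodbc

# Placeholder server, database, and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:<server>.database.windows.net,1433;"
    "DATABASE=<database>;UID=<user>;PWD=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;")

cursor = conn.cursor()
cursor.execute("SELECT TOP 5 name FROM sys.tables")
for row in cursor.fetchall():
    print(row.name)
```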
Exercise 2: Describe Azure Synapse Analytics
- Azure Synapse is a limitless analytics service that brings together Big Data analytics and enterprise data warehousing.
- In this exercise, we’ll create and configure an Azure Synapse Analytics instance, configure the server firewall, and then pause the warehouse database.
Exercise 3: Creating An Azure Synapse Analytics Database And Tables
- In this exercise, we’ll install SQL Server Management Studio, connect to a data warehouse instance, create a SQL Data Warehouse database, and create SQL Data Warehouse tables.
Exercise 4: Using PolyBase To Load Data Into Azure Synapse Analytics
- PolyBase allows us to query external data sources such as SQL Server, Oracle, Teradata, MongoDB, and Azure Blob storage.
- In this exercise, we’ll save the Data Lake Storage container and key details and then create a dbo.Dates table using PolyBase from Azure Data Lake Storage, as sketched below.
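A condensed sketch of the T-SQL that a PolyBase load typically involves, executed here from Python with pyodbc against the Synapse SQL pool. The lab provides its own scripts; the external table name, columns, storage paths, and credential values below are placeholder assumptions:

```python
import pyodbc

# Connect to the Synapse SQL pool; see the Lab 5 sketch for the full
# connection-string format. autocommit is needed because several of
# these DDL statements cannot run inside a transaction.
conn = pyodbc.connect("<synapse-connection-string>", autocommit=True)
cursor = conn.cursor()

# A master key must exist before a database scoped credential can be created.
cursor.execute("CREATE MASTER KEY;")

# Credential holding the Data Lake Storage account key (placeholder).
cursor.execute("""
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';
""")

# External data source pointing at the lake, plus a CSV file format.
cursor.execute("""
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://data@<storage-account>.dfs.core.windows.net',
      CREDENTIAL = AzureStorageCredential);
""")
cursor.execute("""
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));
""")

# External table over the files, then CTAS into a physical dbo.Dates table.
cursor.execute("""
CREATE EXTERNAL TABLE dbo.DatesExternal (DateId INT, CalendarDate DATE)
WITH (LOCATION = '/Dates/',
      DATA_SOURCE = AzureDataLakeStore,
      FILE_FORMAT = CsvFormat);
""")
cursor.execute("""
CREATE TABLE dbo.Dates
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM dbo.DatesExternal;
""")
```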
Lab 6: Performing Real-Time Analytics With Stream Analytics
Exercise 1: Explain Data Streams And Event Processing
- Azure Stream Analytics is a complex event-processing and real-time analytics engine that is designed to analyze and handle high volumes of fast streaming data from many sources together.
- In this exercise, we’ll analyze the data stream ingestion technology for AdventureWorks and the high-level tasks that we will conduct as a data engineer to meet the social media analysis requirements from the case study and scenario in Implementing an Azure Data Solution.
Exercise 2: Data Ingestion With Event Hubs
- Event Hubs is a fully managed, real-time data ingestion service that’s simple, trusted, and scalable.
- In this exercise, we’ll create and configure an Event Hubs namespace and an event hub, and configure Event Hub security.
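A minimal sketch of sending events to such an event hub with the azure-eventhub Python SDK, using a placeholder connection string and a hypothetical `phone-calls` hub:

```python
# pip install azure-eventhub
import json
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection string from the namespace's shared access policy.
producer = EventHubProducerClient.from_connection_string(
    "<event-hub-namespace-connection-string>", eventhub_name="phone-calls")

# Send a small batch of telemetry events.
with producer:
    batch = producer.create_batch()
    for call in ({"caller": "555-0100", "duration": 32},
                 {"caller": "555-0101", "duration": 7}):
        batch.add(EventData(json.dumps(call)))
    producer.send_batch(batch)
```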
Exercise 3: Starting The Telecom Event Generator Application
- In this exercise, we’ll update the application connection string and run the application.
Exercise 4: Processing Data With Stream Analytics Jobs
In this exercise, we’ll do the following tasks:
- Provision a Stream Analytics job and Specify the Stream Analytics job input.
- Specify the Stream Analytics job output and Define a Stream Analytics query.
- Start the Stream Analytics job and Validate streaming data is collected.
Lab 7: Orchestrating Data Movement With Azure Data Factory
Exercise 1: Setup Azure Data Factory
- Using Azure Data Factory, we can create and schedule data-driven pipelines that ingest data from disparate data stores. In this exercise, we’ll set up Azure Data Factory.
Exercise 2: Ingest Data Using The Copy Activity
- In this exercise, we’ll add the Copy Activity to the designer, create a new HTTP dataset to use as a source, create a new ADLS Gen2 sink, and test the Copy Activity.
Exercise 3: Transforming Data With Mapping Data Flow
- In this exercise, we’ll prepare the environment, add a data source, transform the data using Mapping Data Flow, write to a data sink, and then run the pipeline.
Exercise 4: Azure Data Factory And Databricks
- In this exercise, we’ll generate a Databricks access token, create a Databricks notebook, create linked services, create a pipeline that uses the Databricks Notebook activity, and then trigger a pipeline run, as sketched below.
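A minimal sketch of triggering and polling a pipeline run from Python with the azure-mgmt-datafactory management SDK, using placeholder subscription, resource group, factory, and pipeline names:

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder subscription ID; DefaultAzureCredential picks up
# environment, managed identity, or Azure CLI credentials.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Trigger a run of an existing pipeline.
run = client.pipelines.create_run(
    "<resource-group>", "<data-factory-name>", "<pipeline-name>")

# Poll the run's status.
status = client.pipeline_runs.get(
    "<resource-group>", "<data-factory-name>", run.run_id)
print(status.status)  # e.g. Queued, InProgress, Succeeded, Failed
```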
Lab 8: Securing Azure Data Platforms
Exercise 1: An Introduction To Security
- In this exercise, we’ll find reliable and up-to-date information about Azure security and treat security as a layered approach.
Exercise 2: Key Security Components
- In this exercise, we’ll determine data and storage security hygiene.
Exercise 3: Securing Storage Accounts And Data Lake Storage
- In this exercise, we’ll determine the appropriate security approach for Azure Blob storage.
Exercise 4: Securing Data Stores
- In this exercise, we’ll enable auditing, query the database, and view the audit log, as sketched below.
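A minimal sketch of viewing the audit log with T-SQL from Python, assuming auditing writes to a blob container named `sqldbauditlogs`; the connection string and storage account are placeholders:

```python
import pyodbc

# Placeholder connection string; see the Lab 5 sketch for the format.
conn = pyodbc.connect("<azure-sql-connection-string>")
cursor = conn.cursor()

# sys.fn_get_audit_file reads the .xel audit files that Azure SQL
# auditing writes to the configured blob storage container.
cursor.execute("""
    SELECT TOP 10 event_time, action_id, statement
    FROM sys.fn_get_audit_file(
        'https://<storage-account>.blob.core.windows.net/sqldbauditlogs/',
        DEFAULT, DEFAULT)
    ORDER BY event_time DESC
""")
for row in cursor.fetchall():
    print(row.event_time, row.action_id, row.statement)
```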
Exercise 5: Securing Streaming Data
- In this exercise, we’ll change Event Hub permissions.
Lab 9: Monitoring And Troubleshooting Data Storage And Processing
Exercise 1: Explain The Monitoring Capabilities That Are Available
- In this exercise, we’ll define a corporate monitoring approach, looking at tools such as:
- Network Performance Monitor.
- Application Gateway Analytics.
Exercise 2: Troubleshoot Common Data Storage Issues
- In this exercise, we’ll identify common issues related to data storage, such as:
- Consistency
- Corruption
Exercise 3: Troubleshoot Common Data Processing Issues
- In this exercise, we’ll determine issues that are related to data processing.
Exercise 4: Manage Disaster Recovery
- In this exercise, we’ll manage disaster recovery.
- In Azure, there are two core services that we’ll take advantage of: Azure Site Recovery (ASR) and Azure Backup. The two complement each other to provide an end-to-end business continuity and disaster recovery solution with unlimited scale.
This is the list of activity guides/hands-on labs required for the preparation of the DP-200 Implementing an Azure Data Solution exam.
Related/References
- Implementing An Azure Data Solution Exam [DP-200]: Everything You Need To Know
- [DP-100] Microsoft Certified Azure Data Scientist Associate: Everything you must know
- Microsoft Certified Azure Data Scientist Associate | DP 100 | Step By Step Activity Guides (Hands-On Labs)
- [AI-900] Microsoft Certified Azure AI Fundamentals Course: Everything you must know
- Microsoft Azure AI Fundamentals [AI-900]: Step By Step Activity Guides (Hands-On Labs)
Next Task For You
In our Azure Data Engineer training program, we cover 28 hands-on labs. If you want to begin your journey towards becoming a Microsoft Certified: Azure Data Engineer Associate, check out our FREE CLASS.