This blog post gives a quick review of the questions discussed in the Day 4 session of our Microsoft Azure Data Fundamentals [DP-900] program. The Azure DP-900 certification is for anyone who wants to start working with, or shift their focus to, Azure cloud data services for data-related tasks.
Last week, in the Module 3 Q/A, we got an overview of Azure Storage Account, Azure Data Lake, and Azure Cosmos DB.
The Microsoft Azure DP-900 certification gives a holistic overview of the most common Azure data services. It also covers some modern data warehousing concepts, data ingestion in Azure, and an overview of Power BI.
We covered the following Modules in the Azure DP-900 Day 4 Session:
- Module 04: Explore Modern Data Warehouse Analytics in Azure.
Here are the questions that we discussed in the Azure DP900 Day 4 Session:
> Module 04: Data warehouse analytics workload on Azure
After looking at the Azure services for working with non-relational data, we moved to Module 04, where we covered data warehousing concepts and their solutions in Azure. Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure Synapse Analytics, and Azure HDInsight are some widely used services for data warehousing in Azure. We also saw how to ingest huge data loads into these warehouses using two tools: PolyBase (file-based) and SSIS (heterogeneous). Finally, we learned about Power BI.
Q2: What is Azure Data Lake Storage?
A: Azure Data Lake Storage is a repository of data for your modern data warehouse.
- Organizes data into directories for improved file access.
- Supports POSIX and RBAC permissions.
- Is compatible with the Hadoop Distributed File System (HDFS).
Source: Microsoft
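As a rough illustration of the first two points, the sketch below models a small directory tree with POSIX-style permission strings in plain Python. It does not call Azure at all: the paths, users, the `Entry` class, and the `can_read` helper are hypothetical, and only the basic owner/group/other classes are modeled. The rule it shows is the usual POSIX one: reading a file needs execute (`x`) on every parent directory and read (`r`) on the file itself.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    owner: str
    group: str
    mode: str  # nine characters, e.g. "rwxr-x---" (owner, group, other)


def allowed(entry: Entry, user: str, groups: set[str], perm: str) -> bool:
    """Check one permission ('r', 'w' or 'x') against the matching class."""
    if user == entry.owner:
        triple = entry.mode[0:3]
    elif entry.group in groups:
        triple = entry.mode[3:6]
    else:
        triple = entry.mode[6:9]
    return perm in triple


# Hypothetical directory tree, keyed by path.
tree = {
    "/": Entry("admin", "staff", "rwxr-xr-x"),
    "/sales": Entry("alice", "analysts", "rwxr-x---"),
    "/sales/2024.csv": Entry("alice", "analysts", "rw-r-----"),
}


def can_read(path: str, user: str, groups: set[str]) -> bool:
    """Reading needs 'x' on every parent directory and 'r' on the file."""
    parts = path.strip("/").split("/")
    parents = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    traversable = all(allowed(tree[p], user, groups, "x") for p in parents)
    return traversable and allowed(tree[path], user, groups, "r")
```

For example, `can_read("/sales/2024.csv", "bob", {"analysts"})` succeeds through the group permissions, while a user outside the `analysts` group is stopped at the `/sales` directory.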
Also Check: Structured vs Unstructured Data, know their major differences!
Q3: What is Azure Data Factory?
A: Azure Data Factory is a data integration service.
- Retrieves data from one or more data sources and converts it into a common format.
- Filters out noise to keep only the interesting data.
- Work is defined as a pipeline of operations, which can run continuously as data is received.
Source: Microsoft
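The three points above can be pictured with a small plain-Python pipeline. This is only a sketch of the idea, not Data Factory code: the two source functions and the record fields are made up for illustration.

```python
from itertools import chain
from typing import Iterable, Iterator


# Two hypothetical sources that deliver records in different shapes.
def read_web_orders() -> Iterator[dict]:
    yield {"id": "1", "amount_usd": "19.99"}
    yield {"id": "2", "amount_usd": ""}  # noisy record: amount is missing


def read_store_orders() -> Iterator[dict]:
    yield {"order": 3, "cents": 500}


def convert(records: Iterable[dict]) -> Iterator[dict]:
    """Convert both record shapes into one common format."""
    for r in records:
        if "cents" in r:
            yield {"id": int(r["order"]), "amount": r["cents"] / 100}
        else:
            amount = float(r["amount_usd"]) if r["amount_usd"] else None
            yield {"id": int(r["id"]), "amount": amount}


def drop_noise(records: Iterable[dict]) -> Iterator[dict]:
    """Keep only records that carry a usable amount."""
    return (r for r in records if r["amount"] is not None)


def pipeline() -> list[dict]:
    """Chain the steps: read from every source, convert, then filter."""
    return list(drop_noise(convert(chain(read_web_orders(), read_store_orders()))))


result = pipeline()
```

Because every step is a generator, records flow through the steps one at a time, which loosely mirrors a pipeline that keeps running as data arrives.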
Q4: What is Azure Databricks?
A: Azure Databricks is a cloud analytics platform built on Apache Spark. It is a fully managed Spark platform and provides a good collaboration environment for data engineers, data scientists, and data analysts. It provides connectors for many storage services and supports multiple languages, such as Python, R, Scala, and Java.
It provides a scalable, secure, and optimized platform for ingesting data and running analytical and machine learning workloads on huge data loads, with Apache Spark clusters at the backend.
Source: Microsoft
Q5: Can you explain Azure HDInsight?
A: Azure HDInsight is a one-stop collection of Apache Hadoop components for Big Data in the cloud. We can access and use popular open-source Big Data frameworks such as Hadoop, Hive, Spark, Kafka, and many more. This helps make your solution fast, cost-effective, and scalable to huge data workloads.
We can use Azure HDInsight for Big Data ETL workloads, batch data processing, data warehousing, machine learning, and IoT data handling to create effective business solutions that are difficult to build on-premises.
Azure HDInsight supports multiple programming languages and development tools, including Java and other Hadoop-specific languages. HDInsight can later be connected to BI tools for further analysis of the processed data.
Source: Microsoft
Q6: What is Azure Synapse Analytics used for?
A: Azure Synapse Analytics is the evolution of Azure SQL Data Warehouse and aims to bridge the gap between data lakes and data warehouses.
We can extract data from where it is stored, load it into an analytical store, and then transform and shape it there for analysis. This approach is known as ELT (Extract, Load, Transform).
Azure Synapse Analytics is particularly suitable for this approach. Using Apache Spark and automated pipelines, Synapse Analytics can run parallel processing tasks across massive data sets and perform big data analytics.
Source: Microsoft
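The ELT order of operations can be sketched with Python's built-in `sqlite3` module. Here an in-memory SQLite database merely stands in for the analytical store; it is not Synapse, and the table names and sample rows are invented for the example. The point is that the raw rows are loaded unchanged first, and the transformation then runs inside the store as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the analytical store

# Extract: rows as they arrive from a hypothetical source system (all text).
raw_rows = [
    ("2024-01-05", "north", "120.50"),
    ("2024-01-05", "south", "80.00"),
    ("2024-01-06", "north", "99.50"),
]

# Load: copy the rows unchanged into a staging table.
conn.execute("CREATE TABLE staging_sales (sale_date TEXT, region TEXT, amount TEXT)")
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)

# Transform: shape the data inside the store, using SQL.
conn.execute(
    """
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(CAST(amount AS REAL)) AS total
    FROM staging_sales
    GROUP BY region
    """
)

totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
```

In an ETL flow, by contrast, the cast and aggregation would happen before the data reached the store.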
Q7: What is PolyBase used for?
A: PolyBase is a data virtualization feature of SQL Server, also available in Azure Synapse Analytics and Parallel Data Warehouse. It allows SQL Server instances to query external data (both relational and non-relational) using T-SQL while the data stays in its original location and format.
PolyBase can query data from various sources, such as Hadoop clusters, MongoDB, and Cosmos DB, and join it with relational tables in the SQL Server instance. All of this is done without separate client connection software or any ETL process. PolyBase connectors let you query external data directly, just like any other table in SQL Server.
One notable feature of PolyBase is that it can push query computations down to Hadoop. This lets data engineers leverage Hadoop's distributed computing environment and results in query optimization. PolyBase scale-out groups transfer data in parallel between SQL Server instances and Hadoop nodes, using additional compute resources to work with the external data.
Source: Microsoft
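PolyBase itself is set up and queried in T-SQL, so the Python below is only a toy illustration of the pushdown idea described above: when the filter is evaluated at the external source, far fewer rows have to travel to the querying engine. The `ExternalSource` class and its row counter are hypothetical and are not part of any PolyBase API.

```python
class ExternalSource:
    """Stands in for an external system that holds the data."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_transferred = 0  # how many rows left the source

    def scan(self, predicate=None):
        """Return rows; if a predicate is given, filter at the source."""
        out = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_transferred += len(out)
        return out


rows = [{"sensor": i % 10, "value": i} for i in range(1000)]


def wanted(row):
    return row["sensor"] == 3


# Without pushdown: move every row, then filter locally.
no_push = ExternalSource(rows)
local_result = [r for r in no_push.scan() if wanted(r)]

# With pushdown: the source applies the filter and sends only the matches.
push = ExternalSource(rows)
pushed_result = push.scan(wanted)
```

Both approaches return the same rows; the difference is in `rows_transferred`, which is what pushdown reduces.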
Also Check: Our blog post on Azure Cosmos DB.
Q9: What are the characteristics of an online transaction processing (OLTP) workload?
A: The characteristics of OLTP are:
- Schema on write.
- Heavy writes and moderate reads.
- Normalized data.
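A minimal sketch of these characteristics, again using Python's built-in `sqlite3` module with an invented two-table schema: customer details are stored once and orders refer to them by key (normalized data), writes arrive as many small transactions, and a row that breaks the schema is rejected at the moment it is written (schema on write).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Normalized schema: customer details live in one table, orders point to them.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL CHECK (amount > 0)
    );
    """
)

# Many small writes, each committed as its own transaction.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
with conn:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 25.0)")

# Schema on write: an invalid row is rejected when it is written.
rejected = False
try:
    with conn:
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, -10.0)")
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the failed insert the transaction is rolled back, so only the valid order remains in the table.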
Q10: What is Power BI?
A: Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. Your data may be an Excel spreadsheet or a collection of cloud-based and on-premises hybrid data warehouses. Power BI lets you easily connect to your data sources, visualize and discover what's important, and share that with anyone or everyone you want.
Q11: What is the difference between a reporting tool and a BI tool?
A: Reporting tool: helps in visualizing the data.
BI tool: ingests, transforms, and loads the data (into a data model), and then visualizes it.
Power BI: a self-service BI tool.
Q12: Power BI Desktop is licensed, so how can we install and use it for practice?
A: If you want to practice, you can use Power BI Desktop for free; you only have to pay for the premium edition.
Q13. Can we migrate or move SSIS packages from on-premises to Azure?
A: Yes, SSIS packages can be migrated using Azure Data Factory. This is the lift-and-shift approach for moving SSIS packages to Azure: Azure Data Factory provides an SSIS integration runtime to run Integration Services packages in Azure.
Q14. A pipeline is used to transform data, so how can it be used to ingest data? Won't the answer be linked services?
A: Yes, it is done through a linked service.
Q15. What is the sink in this?
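A: A data sink is a store that accumulates and holds collected data for an indefinite period. In a Data Factory copy activity, the sink is the destination data store that the data is written to.
Q16: Where did we define the Azure Data Lake Storage when we created this pipeline?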
A: Linked services define the information needed for Data Factory to connect to external resources. For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account.
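To make the chain from pipeline to storage concrete, here is a hedged sketch of how the pieces refer to each other, written as Python dictionaries that mirror the general shape of Data Factory JSON definitions. All names (`ExampleStorageLinkedService`, `ExampleOutputDataset`, `CopyToStorage`) and the connection string are placeholders, and the definitions are abbreviated (for instance, the dataset's file location and the activity's source are left out), so treat it as an outline to check against the official schema rather than a complete definition.

```python
import json

# Linked service: HOW to connect (the connection string is a placeholder).
linked_service = {
    "name": "ExampleStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "<your-storage-connection-string>"
        },
    },
}

# Dataset: WHICH data to use, reached through the linked service.
dataset = {
    "name": "ExampleOutputDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
    },
}

# Copy activity (abbreviated): its output dataset is where the sink writes.
copy_activity = {
    "name": "CopyToStorage",
    "type": "Copy",
    "outputs": [{"referenceName": dataset["name"], "type": "DatasetReference"}],
    "typeProperties": {"sink": {"type": "DelimitedTextSink"}},
}

print(json.dumps(linked_service, indent=2))
```

Read from the bottom up: the activity names a dataset, the dataset names a linked service, and the linked service holds the connection details for the storage account.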
> General Azure DP-900 Certification FAQ
Q1: After the two-day session, can we take the DP-900 exam, or do we need to refer to more materials/questions?
A: Yes, you can book your exam after the two days of live sessions. You should also attempt all the practice sets, which cover questions close to those in the real exam, to boost your score.
Q2: Do I need Azure Fundamentals (AZ-900) as well, or can it be skipped?
A: Azure Fundamentals can be used to prepare for other Azure role-based or specialty certifications, but it is not a prerequisite for any of them.
Quiz Time (Sample Exam Questions)
In our Microsoft Azure Data Fundamentals program, we cover 150+ sample questions to help you prepare for the certification [DP-900].
Next Task For You
In our Azure Data Engineer training program, we cover 40 Hands-On Labs. If you want to begin your journey towards becoming a Microsoft Certified: Azure Data Engineer Associate, check out our FREE CLASS.