This blog post goes through some quick tips, Q&A, and related blog posts on the topics that we covered in the Azure Data Engineer Day 8 Live Session, which will help you gain a better understanding of what was covered.
Last week, in the Day 7 session, we got an overview of orchestrating data movement and transformation in Azure Synapse Pipelines, covering topics like Azure Data Factory and Azure Synapse Pipelines, windowing and HyperLogLog functions, and data loading best practices.
This week, in the Day 8 Live Session of the Training Program, we covered Module 8: End-to-end security with Azure Synapse Analytics, including how to secure the Azure Synapse Analytics supporting infrastructure, secure the Azure Synapse Analytics workspace and managed services, and secure Azure Synapse Analytics workspace data. We also covered the hands-on labs Securing Azure Synapse Analytics Supporting Infrastructure and Securing the Azure Synapse Analytics Workspace and Managed Services, out of our 27 extensive labs.
So, here are some FAQs asked during the Day 8 Live Session from Module 8 of DP-203.
> End-to-end security with Azure Synapse Analytics
Azure Synapse Analytics is a powerful solution that handles security for many of the resources that it creates and manages. In order to run Azure Synapse Analytics, some foundational security measures need to be put in place to ensure the infrastructure that it relies upon is secure.
> Understand Network Security Options for Azure Synapse Analytics
Firewall rules enable you to define the type of traffic that is allowed or denied access to an Azure Synapse workspace using the originating IP address of the client that is trying to access the Azure Synapse Workspace. IP firewall rules configured at the workspace level apply to all public endpoints of the workspace including dedicated SQL pools, serverless SQL pool, and the development endpoint.
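Conceptually, an IP firewall rule is just an allowed range of client IP addresses checked against the caller's originating IP. Here is a minimal Python sketch of that check; the rule names and ranges are invented for illustration, not real workspace settings:

```python
import ipaddress

# Hypothetical workspace-level firewall rules: name -> (start IP, end IP)
FIREWALL_RULES = {
    "office": ("203.0.113.0", "203.0.113.255"),
    "home": ("198.51.100.42", "198.51.100.42"),
}

def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any rule's start-end range."""
    ip = ipaddress.ip_address(client_ip)
    return any(
        ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end)
        for start, end in FIREWALL_RULES.values()
    )

print(is_allowed("203.0.113.17"))  # True: inside the "office" range
print(is_allowed("192.0.2.9"))     # False: matches no rule
```

Because the rules are evaluated at the workspace level, a single allowed range covers the dedicated SQL pools, the serverless SQL pool, and the development endpoint alike.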
Also Read: Our blog post on Apache Spark Architecture.
> Configure authentication
Authentication is the process of validating credentials as you access resources in digital infrastructure. This ensures that you can validate that an individual or a service that wants to access a service in your environment can prove who they are. Azure Synapse Analytics provides several different methods for authentication.
> Types of security
Azure Active Directory
Azure Active Directory is a directory service that allows you to centrally maintain objects that can be secured. The objects can include user accounts and computer accounts. An employee of an organization will typically have a user account that represents them in the organization’s Azure Active Directory tenant, and they then use the user account with a password to authenticate against other resources that are stored within the directory using a process known as single sign-on.
Managed identities
Managed identity for Azure resources is a feature of Azure Active Directory. The feature provides Azure services with an automatically managed identity in Azure AD. You can use the managed identity capability to authenticate to any service that supports Azure Active Directory authentication.
Keys
If you are unable to use a managed identity to access resources such as Azure Data Lake then you can use storage account keys and shared access signatures.
With storage account keys, Azure creates two keys (primary and secondary) for every storage account you create. The keys give access to everything within the account. You'll find the storage account keys in the Azure portal view of the storage account: select Settings, and then click Access keys.
Shared access signatures
If an external third-party application needs access to your data, you'll need to secure its connections without using storage account keys. For untrusted clients, use a shared access signature (SAS). A shared access signature is a string containing a security token that is attached to a URI. Use a shared access signature to delegate access to storage objects and to specify constraints, such as the permissions and the time range of access. You can give a customer a shared access signature token.
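At its core, a SAS token is a set of constraints (resource, permissions, expiry) signed with the account key, so the service can verify the token without storing it. The sketch below illustrates the signing idea generically with an HMAC; the field layout is simplified for illustration and is not the real Azure SAS format:

```python
import base64
import hashlib
import hmac

ACCOUNT_KEY = b"demo-secret-key"  # stands in for the storage account key

def make_sas(resource: str, permissions: str, expiry: str) -> str:
    """Sign the constraints so they cannot be tampered with."""
    payload = f"{resource}|{permissions}|{expiry}"
    sig = base64.urlsafe_b64encode(
        hmac.new(ACCOUNT_KEY, payload.encode(), hashlib.sha256).digest()
    ).decode()
    return f"{payload}|{sig}"

def verify_sas(token: str) -> bool:
    """The service recomputes the signature and compares."""
    payload, _, sig = token.rpartition("|")
    expected = base64.urlsafe_b64encode(
        hmac.new(ACCOUNT_KEY, payload.encode(), hashlib.sha256).digest()
    ).decode()
    return hmac.compare_digest(sig, expected)

token = make_sas("container/blob.csv", "r", "2030-01-01T00:00Z")
print(verify_sas(token))              # True: signature matches
print(verify_sas(token[:-2] + "xx"))  # False: tampered signature
```

Because the client never sees the account key, you can hand out narrowly scoped, time-limited tokens and let them simply expire.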
Column level security in Azure Synapse Analytics
Column-level security simplifies the design and coding of security in your application. It allows you to restrict column access in order to protect sensitive data. For example, you may want to ensure that a specific user, 'Leo', can only access certain columns of a table because he's in a specific department. The logic restricting 'Leo' to the columns specified for his department lives in the database tier, rather than in the application-level data tier.
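In Synapse this is enforced inside the database (for example by granting SELECT only on specific columns), but the idea can be sketched generically: a query fails when it touches a column the user was never granted. The user names and column grants below are hypothetical:

```python
# Hypothetical per-user column grants; in Synapse this lives in the database tier.
COLUMN_GRANTS = {
    "Leo": {"name", "department"},  # Leo cannot see salary or ssn
    "hr_admin": {"name", "department", "salary", "ssn"},
}

def select_columns(user: str, row: dict) -> dict:
    """Return the row if every requested column is granted, else raise."""
    allowed = COLUMN_GRANTS.get(user, set())
    denied = set(row) - allowed
    if denied:
        # Mirrors how SQL denies an ungranted column on SELECT *
        raise PermissionError(f"SELECT permission denied on columns: {sorted(denied)}")
    return row

row = {"name": "Ada", "department": "Eng", "salary": 90000, "ssn": "123-45-6789"}
print(select_columns("hr_admin", row)["salary"])  # 90000
```

Running select_columns("Leo", row) on the full row raises a PermissionError, which is analogous to the SELECT * error described in Q2 below.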
Row-level security in Azure Synapse Analytics
Row-level security (RLS) helps you use group membership or execution context to control access not just to the columns in a database table, but to the rows themselves. Like column-level security, RLS simplifies the design and coding of your application's security.
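The heart of RLS is a filter predicate evaluated per row against the caller's execution context. A minimal sketch of that behaviour, with an invented users-see-only-their-department rule:

```python
ROWS = [
    {"order_id": 1, "department": "Sales", "amount": 100},
    {"order_id": 2, "department": "Eng", "amount": 250},
    {"order_id": 3, "department": "Sales", "amount": 75},
]

# Hypothetical security predicate: users only see their own department's rows.
USER_DEPARTMENT = {"leo": "Sales", "ada": "Eng"}

def rls_filter(user: str, rows: list) -> list:
    """Apply the filter predicate before any read of the table sees the rows."""
    dept = USER_DEPARTMENT.get(user)
    return [r for r in rows if r["department"] == dept]

print([r["order_id"] for r in rls_filter("leo", ROWS)])  # [1, 3]
```

In Synapse the predicate is defined once in the database (as a security policy), so every SELECT, UPDATE, or DELETE is filtered the same way without any application code.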
Manage sensitive data with Dynamic Data Masking
Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics support Dynamic Data Masking. Dynamic Data Masking limits data exposure to non-privileged users, so they cannot see the data that is being masked. It also helps prevent unauthorized access to sensitive data, with minimal impact on the application layer. Dynamic Data Masking is a policy-based security feature: it hides the sensitive data in the result set of a query over designated database fields.
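Masking rewrites values in the result set for non-privileged users while leaving the stored data untouched. A rough Python illustration of a default (full) mask and a partial, credit-card-style mask; the column names are made up:

```python
def mask_full(value: str) -> str:
    """Default mask: hide the whole value."""
    return "XXXX"

def mask_partial(value: str, keep_last: int = 4) -> str:
    """Partial mask: expose only the trailing characters, like card numbers."""
    return "X" * (len(value) - keep_last) + value[-keep_last:]

def apply_masks(row: dict, privileged: bool) -> dict:
    """Privileged users see real data; everyone else gets masked columns."""
    if privileged:
        return row
    return {
        **row,
        "ssn": mask_full(row["ssn"]),
        "card": mask_partial(row["card"]),
    }

row = {"name": "Ada", "ssn": "123-45-6789", "card": "4111111111111111"}
print(apply_masks(row, privileged=False)["card"])  # XXXXXXXXXXXX1111
```

This is why masking is described as policy-based: the masking rule is attached to the column once, and every query result is rewritten on the way out.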
Also Check: Our blog post on Azure Databricks.
> Azure Stream Analytics
Azure Stream Analytics is a real-time analytics and complex event-processing engine designed to analyze and process high volumes of fast streaming data from multiple sources at the same time. Patterns and relationships can be identified in data extracted from a variety of input sources, including devices, sensors, clickstreams, social media feeds, and applications. These patterns can be used to trigger actions and initiate workflows such as creating alerts, feeding data to a reporting tool, or storing transformed data for later use. Stream Analytics is also available on the Azure IoT Edge runtime, enabling processing of data on IoT devices.
Source: Microsoft
An Azure Stream Analytics job consists of an input, a query, and an output. Stream Analytics ingests data from Azure Event Hubs (including Azure Event Hubs from Apache Kafka), Azure IoT Hub, or Azure Blob Storage. The query, which is based on the SQL query language, can be used to easily filter, sort, aggregate, and join streaming data over a period of time. You can also extend this SQL language with JavaScript and C# user-defined functions (UDFs). You can adjust the event ordering options and the duration of time windows when performing aggregation operations through simple language constructs and/or configurations.
Each job has one or more outputs for the transformed data, and you can manage what happens in response to the data you have analyzed. For example, you can:
- Send data to services such as Azure Functions, Service Bus Topics, or Queues to trigger communications or custom workflows downstream.
- Send data to a Power BI dashboard for real-time dashboarding.
- Store data in other Azure storage services (for example, Azure Data Lake, Azure Synapse Analytics, etc.) to train a machine learning model based on historical data or perform batch analytics.
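The query step typically aggregates events over time windows. The sketch below shows a tumbling-window count in plain Python to make the idea concrete; in Stream Analytics itself you would express this with windowing constructs in the SQL-like query language, and the sample events are invented:

```python
from collections import Counter

# Each event: (timestamp in seconds, sensor id). Sample data for illustration.
events = [(1, "a"), (3, "b"), (4, "a"), (11, "a"), (12, "b"), (25, "a")]

def tumbling_window_counts(events, window_size: int):
    """Count events per non-overlapping (tumbling) window of window_size seconds."""
    counts = Counter(ts // window_size for ts, _ in events)
    # Key each count by the window's start time, in order.
    return {w * window_size: n for w, n in sorted(counts.items())}

print(tumbling_window_counts(events, 10))  # {0: 3, 10: 2, 20: 1}
```

Each event lands in exactly one window, which is what makes tumbling windows the simplest choice for per-interval counts and sums.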
Also Check: Our blog post on Data Factory Interview Questions.
> Lab: Securing The Azure Synapse Analytics Workspace and Managed Services
In this lab, you will learn how to manage secrets, assign Azure Key Vault to linked services, and apply it to both Azure Synapse Analytics pipeline runs and dedicated SQL pools.
Q1. What are Private Links?
Ans: Azure Private Link enables you to access Azure PaaS Services (for example, Azure Storage and SQL Database) and Azure-hosted customer-owned/partner services over a Private Endpoint in your virtual network. Traffic between your virtual network and the service traverses over the Microsoft backbone network, eliminating exposure from the public Internet. You can also create your own Private Link Service in your virtual network and deliver it privately to your customers.
Q2. Will the user get an error if they use select * from the table when column security is implemented?
Ans: Yes. If column-level security is implemented, such a query will fail with an error like 'The SELECT permission was denied on the column SSN', because permission on that column was not granted to the caller.
Q3. What is the use of the Revert statement?
Ans: The REVERT statement is used to reset the execution context to the previous caller. The REVERT statement can be executed multiple times, moving up the stack until the execution context is set to the initial caller.
Q4. Is this all available in the current SQL Server too, or is it specific to Synapse?
Ans: Yes, all three (column-level security, row-level security, and dynamic data masking) are available in SQL Server as well.
Q5. Can we grant UNMASK for a specific column? That is, keep masking on one column but allow unmasking of another column for a user?
Ans: Yes, you can grant UNMASK for a specific column as well. For more information click here.
Q6. Is the row-level security defined using a filter predicate applicable only to SELECT, and not to INSERT/UPDATE?
Ans: Filter predicates are applied while reading data from the base table. They affect all get operations: SELECT, DELETE, and UPDATE. Inserts are not blocked; however, if the filter predicate is designed such that the inserted record gets filtered out, you will not see that record back in the result set. For more information click here.
Q7. Is Event Hub a ‘Pull’ type of messaging queue system?
Ans: No, it is not a pull-type messaging queue system; it is a publisher-subscriber style system, where a publisher pushes events onto the Event Hub. So it is a push-type messaging queue system.
Q8. What is an Edge hosting environment?
Ans: Edge computing is a distributed, open IT architecture featuring decentralized processing power, enabling mobile computing and Internet of Things (IoT) technologies. In edge computing, data is processed by the device itself or by a local computer or server, instead of being transmitted to a data center.
Example: These edge devices can include many different things, such as an IoT sensor, an employee’s notebook computer, their latest smartphone, the security camera, or even the internet-connected microwave oven in the office break room.
Q9. How will you integrate the synapse spark pool in the pipeline?
Ans: To integrate a Synapse Spark pool into a pipeline, we use Synapse Integrate pipelines, which replace Azure Data Factory within Synapse. While ADF is backed by the Databricks engine for a portion of its functionality, Synapse Integrate pipelines run on the same Apache Spark engine that supports Synapse Spark pools under the hood. Conceptually they do the same thing; however, Integrate pipelines approach certain tasks in ways you will either not find in ADF or that are implemented differently.
Q10. Why must we provide cluster details again on the linked service if we already have configured it in the DataBricks environment?
Ans: We must provide cluster details on the linked service because the linked service is what tells the pipeline which cluster to run against. An Azure Databricks cluster is a set of computation resources and configurations on which you run data engineering and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning, and the linked service connects the pipeline to those resources.
Q11. Can we sync SQL and Spark tables?
Ans: Yes, we can. Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each database existing in serverless Apache Spark pools. When a table is partitioned in Spark, files in storage are organized by folders.
Q12. What is identity-based authorization?
Ans: Identity-based authorization is a process that provides assurance of an entity’s identity by means of an authentication mechanism that verifies the identity of the entity.
Q13. Can we call Databricks notebook from ADF as well just like Synapse pipeline?
Ans: Yes, we can call Databricks from ADF.
To do so, create a linked service:
On the home page, switch to the Manage tab in the left panel. Select Connections at the bottom of the window, and then select + New. In the New Linked Service window, select Compute > Azure Databricks, and then select Continue. For the Access Token, generate it from the Azure Databricks workspace.
Q14. How will shuffling occur in the larger dataset for the group?
Ans: A Spark shuffle simply moves data around the cluster, so every transformation that requires data not present locally in a partition performs a shuffle. The same thing happens with group-by-key: all rows with the same key need to end up in the same partition, so the shuffle moves them there. The larger the dataset, the more data the shuffle has to move.
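The routing step of a shuffle can be sketched as hash partitioning: each record goes to the partition chosen by its key's hash, so all records with the same key land together. This is only a conceptual illustration, not Spark's actual implementation:

```python
def shuffle_by_key(records, num_partitions: int):
    """Route each (key, value) record to partition hash(key) % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
for part in shuffle_by_key(records, 4):
    print(part)  # every occurrence of a key appears in exactly one partition
```

Once the shuffle has co-located all records sharing a key, the group-by aggregation can run entirely within each partition, which is why the shuffle is the expensive part.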
Q15. How to verify what SQL version we are using?
Ans: Synapse uses ANSI SQL, which is standard SQL, so the specific version generally does not matter.
Q16. Where is the spark table stored?
Ans: Spark tables are stored in the location configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse under the current directory in which the Spark application is started.
Check Out: Our blog post on Azure Data Engineer Interview Questions.
Feedback Received…
We always work on improving and being a better version of ourselves than in the previous session, and hence we constantly ask for feedback from our attendees.
Here’s the feedback that we received from our trainees who had attended the session…
Quiz Time (Sample Exam Questions)
Ques: You are developing a solution that will stream to Azure Stream Analytics. The solution will have both streaming data and reference data.
Which input type should you use for the reference data?
A. Azure IoT Hub
B. Azure Blob storage
C. Azure Cosmos DB
D. Azure Event Hubs
Comment with your answer & we will tell you if you are correct or not!
References
- Microsoft Certified Azure Data Engineer Associate | DP 203 | Step By Step Activity Guides (Hands-On Labs)
- Azure Data Lake For Beginners: All you Need To Know
- Azure Synapse Analytics (Azure SQL Data Warehouse)
- Azure Data Engineer [DP-203] Q/A | Day 1 Live Session Review
- Azure Data Engineer [DP-203] Q/A | Day 2 Live Session Review
- Azure Data Engineer [DP-203] Q/A | Day 3 Live Session Review
- Azure Data Engineer [DP-203] Q/A | Day 4 Live Session Review
- Azure Data Engineer [DP-203] Q/A | Day 5 Live Session Review
- Azure Data Engineer [DP-203] Q/A | Day 6 Live Session Review
- Azure Data Engineer [DP-203] Q/A | Day 7 Live Session Review
Next Task For You
In our Azure Data Engineer training program, we cover 28 Hands-On Labs. If you want to begin your journey towards becoming a Microsoft Certified: Azure Data Engineer Associate, check out our FREE CLASS.