I just got my AI-100 Microsoft Azure AI Engineer Associate certification. I took this exam as part of my Microsoft Cloud Solution Architect certification path, and also because I am fascinated by learning about and developing AI, machine learning, and deep learning solutions and applications.
There are not many online courses available specifically for the AI-100 exam (at least I didn't come across any), so I started documenting the research, resources, and approach I used to prepare for AI-100, which led me to pass the exam on my first attempt.
So it is now time to share my complete preparation guide, hopefully to help anyone who is currently preparing (or thinking of preparing) for AI-100, or anyone who is simply curious about AI on Microsoft Azure.
Here are some resources that you may find useful before actually diving into preparation -
Now that we are familiar with what the exam is about and its prerequisites, let's start understanding the topics that will be covered as part of the exam.
Full details on each of the above topics can be found here.
Tip 1: Always do hands-on exercises to put theory into practice using Microsoft labs or on Azure Portal.
Online learning resources that I’ve used -
Tip 2: Do not rely solely on practice test(s); go through all the relevant Microsoft documentation (I've provided all the relevant links below in this post).
(I'd highly recommend reading every single document below.)
Examples & common scenarios - Azure Logic Apps | Microsoft Docs
Choosing a data storage technology - Azure Architecture Center | Microsoft Docs
Data partitioning guidance - Best practices for cloud applications | Microsoft Docs
Use the Azure portal to configure customer-managed keys - Azure Storage | Microsoft Docs
Ad hoc reporting queries across multiple databases - Azure SQL Database | Microsoft Docs
Choose a real-time and stream processing solution on Azure | Microsoft Docs
What are Apache Hadoop and MapReduce - Azure HDInsight | Microsoft Docs
Audit activity reports in the Azure Active Directory portal | Microsoft Docs
Batch processing - Azure Architecture Center | Microsoft Docs
What is Azure Event Hubs? - a Big Data ingestion service | Microsoft Docs
Create a function that integrates with Azure Logic Apps | Microsoft Docs
Choosing a real-time message ingestion technology - Azure Architecture Center | Microsoft Docs
What is Apache Hive and HiveQL - Azure HDInsight | Microsoft Docs
Hybrid Connections (Preview) | Azure Blog and Updates | Microsoft Azure
Azure VM sizes - GPU - Azure Virtual Machines | Microsoft Docs
Introduction to Azure Data Factory - Azure Data Factory | Microsoft Docs
Azure Stream Analytics output to Azure Cosmos DB | Microsoft Docs
What is Interactive Query in Azure HDInsight? | Microsoft Docs
Real time processing - Azure Architecture Center | Microsoft Docs
An introduction to Apache Kafka on HDInsight - Azure | Microsoft Docs
Validation - SQL Server Master Data Services | Microsoft Docs
Understand outputs from Azure Stream Analytics | Microsoft Docs
New capabilities to enable robust GDPR compliance | Azure Blog and Updates | Microsoft Azure
My notes on some of the Azure services, data storage options, analytics services, and VM types that I found useful. (Correct at the time of writing; I've provided links to the Microsoft documentation for each of the Azure services below so you can refer to the latest.)
Azure Event Hubs is a real-time streaming platform and event ingestion service, capable of receiving and processing millions of events per second. Event Hubs can process and store events, data, or telemetry produced by distributed software and devices. Data sent to an event hub can be transformed and stored by using any real-time analytics provider or batching/storage adapters.
Event Hubs represents the “front door” for an event pipeline, often called an event ingestor in solution architectures. An event ingestor is a component or service that sits between event publishers and event consumers to decouple the production of an event stream from the consumption of those events. Event Hubs provides a unified streaming platform with time retention buffer, decoupling event producers from event consumers.
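The "decoupling" idea above can be illustrated with a conceptual sketch in plain Python (this is not the Azure SDK — `MiniEventHub` and the device name are made up for illustration). Producers append events to partitions chosen by a partition key, and consumers read at their own pace without removing anything:

```python
from collections import defaultdict

class MiniEventHub:
    """Toy event ingestor: producers append to partitions; consumers
    read independently of production (conceptual sketch only)."""
    def __init__(self, partition_count=4):
        self.partitions = defaultdict(list)
        self.partition_count = partition_count

    def send(self, partition_key, event):
        # Events with the same key always land in the same partition,
        # preserving per-key ordering.
        pid = hash(partition_key) % self.partition_count
        self.partitions[pid].append(event)
        return pid

    def receive(self, pid, offset=0):
        # Consumers track their own offset; reading does not remove events,
        # so multiple consumers can replay the same stream.
        return self.partitions[pid][offset:]

hub = MiniEventHub()
pid = hub.send("device-42", {"temp": 21.5})
```

Real Event Hubs adds the parts that matter at scale: time-based retention, consumer groups, and checkpointing, but the producer/partition/consumer shape is the same.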
Azure Event Grid connects data sources and event handlers. For example, use Event Grid to instantly trigger a serverless function to run image analysis each time a new photo is added to a blob storage container. Azure Event Grid Components
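The blob-upload example can be sketched as a toy router in plain Python (again not the Azure SDK; the event type string mirrors the real `Microsoft.Storage.BlobCreated` name, but the rest is hypothetical). Handlers subscribe to an event type and are invoked when a matching event is published:

```python
class MiniEventGrid:
    """Toy event router: handlers subscribe to event types and are
    invoked when a matching event arrives (conceptual sketch only)."""
    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, event_type, handler):
        self.subscriptions.setdefault(event_type, []).append(handler)

    def publish(self, event_type, event):
        # Push-based fan-out: every subscriber to this type is called.
        for handler in self.subscriptions.get(event_type, []):
            handler(event)

analysed = []
grid = MiniEventGrid()
# e.g. kick off image analysis whenever a blob-created event fires
grid.subscribe("Microsoft.Storage.BlobCreated",
               lambda e: analysed.append(e["url"]))
grid.publish("Microsoft.Storage.BlobCreated",
             {"url": "photos/img-001.png"})
```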
Azure Service Bus is a fully managed enterprise integration message broker. Service Bus can decouple applications and services. Service Bus offers a reliable and secure platform for asynchronous transfer of data and state. Data is transferred between different applications and services using messages. A message is in binary format and can contain JSON, XML, or just text. Some common messaging scenarios are:
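The point that a message body is binary but commonly carries JSON, XML, or text can be shown in a couple of lines (a hypothetical helper, not the Service Bus SDK):

```python
import json

def make_message_body(payload: dict) -> bytes:
    # A message body is just bytes on the wire; JSON is a common encoding.
    return json.dumps(payload).encode("utf-8")

body = make_message_body({"orderId": 1001, "status": "created"})
decoded = json.loads(body.decode("utf-8"))
```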
Azure Queue Storage is a service for storing large numbers of messages. You access messages from anywhere in the world via authenticated calls using HTTP or HTTPS. A queue message can be up to 64 KB in size. A queue may contain millions of messages, up to the total capacity limit of a storage account. Queues are commonly used to create a backlog of work to process asynchronously.
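The backlog pattern and the 64 KB per-message limit can be sketched like this (a toy in-memory queue, not the Azure Storage SDK; the "store a blob and queue a pointer" workaround in the error message is the usual pattern for oversized payloads):

```python
import json
from collections import deque

MAX_MESSAGE_BYTES = 64 * 1024  # Queue Storage's 64 KB per-message limit

class MiniQueue:
    """Toy work queue mirroring the asynchronous-backlog pattern."""
    def __init__(self):
        self._q = deque()

    def enqueue(self, payload: dict):
        body = json.dumps(payload).encode("utf-8")
        if len(body) > MAX_MESSAGE_BYTES:
            raise ValueError(
                "message exceeds 64 KB; store the data as a blob "
                "and queue a pointer to it instead")
        self._q.append(body)

    def dequeue(self):
        # Workers drain the backlog at their own pace.
        return json.loads(self._q.popleft().decode("utf-8")) if self._q else None

q = MiniQueue()
q.enqueue({"task": "resize-image", "blob": "photos/img-001.png"})
```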
Azure Notification Hubs provide an easy-to-use and scaled-out push engine that enables you to send notifications to any platform (iOS, Android, Windows, etc.) from any back-end (cloud or on-premises). Here are a few example scenarios:
Azure Stream Analytics is an event-processing engine that can analyze high volumes of data streaming from devices and other data sources. It also supports extracting information from data streams to identify patterns and relationships. These patterns can trigger other downstream actions.
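A core Stream Analytics idea is windowing: grouping an unbounded stream into finite chunks you can aggregate. Here is a plain-Python sketch of a tumbling (fixed, non-overlapping) window average; in a real job you would express this in Stream Analytics Query Language, not Python:

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds=60):
    """Average (timestamp, value) events per fixed, non-overlapping
    window — the idea behind a TumblingWindow aggregate (sketch only)."""
    buckets = defaultdict(list)
    for ts, value in events:
        # Floor each timestamp to the start of its window.
        buckets[ts // window_seconds * window_seconds].append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}

readings = [(0, 20.0), (30, 22.0), (65, 30.0)]
averages = tumbling_window_avg(readings, window_seconds=60)
# windows: [0, 60) and [60, 120)
```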
Azure Data Factory is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
Additionally, you can publish your transformed data to data stores such as Azure SQL Data Warehouse for business intelligence (BI) applications to consume. Ultimately, through Azure Data Factory, raw data can be organized into meaningful data stores and data lakes for better business decisions.
Big data requires a service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
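The orchestration idea behind an ADF pipeline (ordered activities, each feeding the next) can be sketched in plain Python. This is purely conceptual — real pipelines are defined as JSON/visual activities, and `extract`, `transform`, and `run_pipeline` here are made-up stand-ins:

```python
def extract():
    # Stand-in for a copy activity ingesting rows from a source store.
    return [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "5.00"}]

def transform(rows):
    # Stand-in for a data flow or Databricks step: cast types, derive fields.
    return [{**r, "amount": float(r["amount"])} for r in rows]

def run_pipeline(activities, sink):
    """Run activities in order, piping each output into the next,
    then load the final result into the sink (conceptual sketch)."""
    data = None
    for activity in activities:
        data = activity() if data is None else activity(data)
    sink.extend(data)

warehouse = []
run_pipeline([extract, transform], warehouse)
```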
Azure HDInsight is a managed, full-spectrum, open-source analytics service in the cloud for enterprises. You can use open-source frameworks such as Hadoop, Apache Spark, Apache Hive, LLAP, Apache Kafka, Apache Storm, R, and more.
Azure HDInsight Cluster Types.
Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics service. For a big data pipeline, the data (raw or structured) is ingested into Azure through Azure Data Factory in batches, or streamed near real-time using Kafka, Event Hub, or IoT Hub. This data lands in a data lake for long term persisted storage, in Azure Blob Storage or Azure Data Lake Storage. As part of your analytics workflow, use Azure Databricks to read data from multiple data sources such as Azure Blob Storage, Azure Data Lake Storage, Azure Cosmos DB, or Azure SQL Data Warehouse and turn it into breakthrough insights using Spark.
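The Spark programming model Databricks exposes is a chain of lazy transformations ended by an action. A plain-Python stand-in (not PySpark; the sales records are hypothetical) looks like this:

```python
from functools import reduce

# Hypothetical batch records, standing in for data read from Blob
# Storage or a data lake.
records = [
    {"store": "A", "sales": 120.0},
    {"store": "B", "sales": 80.0},
    {"store": "A", "sales": 40.0},
]

# "Transformations": filter and project, lazily chained like an
# RDD/DataFrame — the generator does no work until consumed.
amounts = (r["sales"] for r in records if r["store"] == "A")

# "Action": trigger the computation and aggregate the result.
total_a = reduce(lambda acc, x: acc + x, amounts, 0.0)
```

In real Spark the same shape (`filter`/`select` then `agg`/`collect`) runs distributed across a cluster instead of in one process.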
Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob storage. Data Lake Storage Gen2 is the result of converging the capabilities of our two existing storage services, Azure Blob storage and Azure Data Lake Storage Gen1. Features from Azure Data Lake Storage Gen1, such as file system semantics, directory, and file level security and scale are combined with low-cost, tiered storage, high availability/disaster recovery capabilities from Azure Blob storage.
Azure SQL Data Warehouse or Azure Synapse Analytics is an analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.
Azure IoT Edge moves cloud analytics and custom business logic to devices so that your organization can focus on business insights instead of data management. Scale out your IoT solution by packaging your business logic into standard containers, then you can deploy those containers to any of your devices and monitor it all from the cloud.
Analytics drives business value in IoT solutions, but not all analytics needs to be in the cloud. If you want to respond to emergencies as quickly as possible, you can run anomaly detection workloads at the edge. If you want to reduce bandwidth costs and avoid transferring terabytes of raw data, you can clean and aggregate the data locally then only send the insights to the cloud for analysis.
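The "aggregate locally, send only insights" pattern can be sketched as a function you might deploy in an edge module (conceptual only; the threshold and field names are made up):

```python
def process_at_edge(readings, threshold=80.0):
    """Aggregate raw telemetry locally and flag anomalies immediately,
    so only a small summary travels to the cloud (conceptual sketch)."""
    anomalies = [r for r in readings if r > threshold]
    return {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "max": max(readings),
        "anomalies": len(anomalies),  # respond to these without a round trip
    }

# Kilobytes of insight instead of the full raw telemetry stream:
summary = process_at_edge([71.2, 69.8, 95.4, 70.1])
```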
Azure IoT Edge is made up of three components:
Azure IoT Hub is a managed service, hosted in the cloud, that acts as a central message hub for bi-directional communication between your IoT application and the devices it manages.
IoT Hub supports multiple messaging patterns such as device-to-cloud telemetry, file upload from devices, and request-reply methods to control your devices from the cloud. IoT Hub monitoring helps you maintain the health of your solution by tracking events such as device creation, device failures, and device connections.
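Two of those patterns — device-to-cloud telemetry and cloud-to-device request-reply (direct methods) — can be sketched as a toy hub (not the Azure IoT SDK; `MiniIoTHub` and the thermostat device are invented for illustration):

```python
class MiniIoTHub:
    """Toy two-way message hub: devices push telemetry up, the back
    end invokes direct methods down (conceptual sketch only)."""
    def __init__(self):
        self.telemetry = []
        self.devices = {}

    def register(self, device_id, method_handlers):
        # method_handlers maps method names to callables on the device.
        self.devices[device_id] = method_handlers

    def send_telemetry(self, device_id, payload):
        # Device-to-cloud: fire-and-forget telemetry.
        self.telemetry.append((device_id, payload))

    def invoke_method(self, device_id, method, payload):
        # Request-reply: the hub routes the call and returns the reply.
        return self.devices[device_id][method](payload)

hub = MiniIoTHub()
hub.register("thermostat-1",
             {"setTarget": lambda p: {"status": "ok", "target": p["target"]}})
hub.send_telemetry("thermostat-1", {"temp": 21.5})
reply = hub.invoke_method("thermostat-1", "setTarget", {"target": 22.0})
```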
IoT Hub’s capabilities help you build scalable, full-featured IoT solutions such as managing industrial equipment used in manufacturing, tracking valuable assets in healthcare, and monitoring office building usage.
Azure Virtual Machine Series (as of 20th April 2020)
A-Series - Entry-level economical VMs for dev/test
Bs-Series - Economical burstable VMs
D-Series - General purpose compute. D-series VMs feature fast CPUs and optimal CPU-to-memory configuration, making them suitable for most production workloads.
DC-series - Protect data in use
E-Series - Optimised for in-memory hyper-threaded applications
F-Series - Compute-optimised virtual machines. Example use cases include batch processing, web servers, analytics, and gaming.
G-Series - Memory and storage optimised virtual machines
Ls-Series - Storage optimised virtual machines
M-Series - Memory-optimised virtual machines
Mv2-Series - Largest-memory optimised virtual machines
N-Series - A family of Azure Virtual Machines with GPU capabilities. GPUs are ideal for compute- and graphics-intensive workloads, helping customers fuel innovation through scenarios such as high-end remote visualisation, deep learning, and predictive analytics.
Last but not least, don't forget to spend time on Microsoft Learn and the Azure documentation to find additional information as you prepare for your certification.
Good Luck!
Thanks for reading! 🎊 Hope you found this useful. Don’t hesitate to share, or post a comment or send me a message on LinkedIn 🙏
Please leave a comment below if you want to know more or have any questions. I will be happy to help.
This article was originally published on my personal blog website.