From Blob Storage To SQL Database Using Azure Data Factory

This article was published as a part of the Data Science Blogathon.


Azure Data Factory (ADF) is a cloud-based ETL (Extract, Transform, Load) and data integration service that allows you to create data-driven workflows. A data-driven workflow in ADF orchestrates and automates data movement and data transformation. In this article, I’ll show you how to create a blob storage account, a SQL database, and a data factory in Azure, and then build a pipeline to copy data from Blob Storage to SQL Database using the copy activity.

Create Azure Blob Storage

An Azure storage account provides highly available, massively scalable, and secure storage for a variety of data objects such as blobs, files, queues, and tables in the cloud. Within a storage account, containers are used to store blobs. Azure Blob Storage is used to store massive amounts of unstructured data such as text, images, binary data, log files, etc.

To create Azure blob storage, you first need an Azure account. After signing in to the Azure account, follow the steps below:

Step 8: To create a blob, launch Excel, copy the following text, and save it in a file named Emp.csv on your machine.

FirstName,LastName,Department,Salary
Rahul,Patel,Sales,90000
Chaitanya,Shah,R&D,95000
Ashna,Jain,HR,93000
Mansi,Garg,Sales,81000
Vipul,Gupta,HR,84000
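If you prefer scripting over Excel, the same Emp.csv file can be produced with a few lines of Python (a minimal sketch; the file name and rows come straight from the table above):

```python
import csv

# The employee rows from the table above, header first
rows = [
    ["FirstName", "LastName", "Department", "Salary"],
    ["Rahul", "Patel", "Sales", "90000"],
    ["Chaitanya", "Shah", "R&D", "95000"],
    ["Ashna", "Jain", "HR", "93000"],
    ["Mansi", "Garg", "Sales", "81000"],
    ["Vipul", "Gupta", "HR", "84000"],
]

# Write them out as a comma-separated file ready to upload to the container
with open("Emp.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```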

Step 9: Upload the Emp.csv file to the employee container.

Now, we have successfully uploaded data to blob storage. We will move forward to create Azure SQL database.

Create Azure SQL Database

Azure SQL Database offers three deployment options:

1. Single database: It is the simplest deployment method. In this approach, a single database is deployed to and managed by a SQL Database server. Each database is isolated from the others and has its own guaranteed amount of memory, storage, and compute resources.

2. Elastic pool: Elastic pool is a collection of single databases that share a set of resources. This deployment model is cost-efficient as you can create a new database, or move the existing single databases into a resource pool to maximize the resource usage.

3. Managed instance: Managed Instance is a fully managed database instance. It makes migrating on-premises SQL databases easy.

Follow the below steps to create Azure SQL database:

Step 6: Paste the below SQL query in the query editor to create the table Employee.

CREATE TABLE dbo.Employee ( ID int IDENTITY(1,1) NOT NULL, FirstName varchar(50), LastName varchar(50), Department varchar(50), Salary int ) GO CREATE CLUSTERED INDEX IX_emp_ID ON dbo.Employee (ID);

Note: Ensure that the Allow Azure services and resources to access this server option is turned on in your SQL Server.

Now, we have successfully created the Employee table inside the Azure SQL database. We will move forward to create the Azure data factory.

Create a Data Factory in Azure

Azure data factory (ADF) is a cloud-based ETL (Extract, Transform, Load) tool and data integration service. ADF is a cost-efficient and scalable fully managed serverless cloud data integration tool.

Follow the below steps to create a data factory:

Create Pipeline to Copy Data

Step 2: In the Activities toolbox, search for Copy data activity and drag it to the pipeline designer surface. Rename it to CopyFromBlobToSQL.

Step 3: In the Source tab, select +New to create the source dataset. Search for Azure Blob Storage.

After the linked service is created, it navigates back to the Set properties page. Now, select the Emp.csv path in the File path.

Step 4: In the Sink tab, select +New to create a sink dataset. Search for Azure SQL Database.

Step 7: Verify that CopyPipeline runs successfully by visiting the Monitor section in Azure Data Factory Studio.


The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 



Beginner’s Guide For Data Analysis Using SQL

This article was published as a part of the Data Science Blogathon.

Overview

Important Definitions

The most significant part of the database is its tables, which contain all of the data. Normally, data would be divided among several tables rather than being saved all in one location (so designing the data structure properly is very important). The majority of this script would deal with table manipulation. Aside from tables, there are a few more extremely helpful concepts/features that we will not discuss:

table creation

inserting/updating data in the database

functions – take a value as input and return a manipulated value (for example, a function that removes white spaces)

# Imports
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import sqlite3
import matplotlib.pyplot as plt

# Load data from database.sqlite
database = 'database.sqlite'

We’ll start by connecting to the database and seeing what tables we have.

The query’s basic structure is straightforward: After the SELECT, you specify what you wish to see; * denotes all possible columns. Following the FROM, you select the table. After the WHERE, you add the conditions for the data you wish to use from the table(s).

Note the query’s structure and the order of its content: spaces, new lines, capital letters, and indentation make the code easier to understand.

conn = sqlite3.connect(database)
tables = pd.read_sql("""SELECT *
                        FROM sqlite_master
                        WHERE type='table';""", conn)
tables

List of countries

This is the most basic query. The only mandatory parts of a query are the SELECT and the FROM (assuming you want to pull from a table).

countries = pd.read_sql("""SELECT *
                           FROM Country;""", conn)
countries

List of leagues and their country

When joining tables, you must do the following:

Choose the type of join you want to utilize. The following are the most common:

INNER JOIN – only maintain records in both tables that match the criterion (after the ON), and records from both tables that don’t match won’t appear in the output.

LEFT JOIN – all values from the first (left) table are combined with the matching rows from the right table. NULL values are assigned to the columns from the right table that don’t have a corresponding value in the left table.

Specify the common value that will be used to link the tables together (the id of the country in that case).

Ensure that at least one of the joined values is a key in its table. It’s the Country.id in our case. Because there can be more than one league in the same country, League.country_id is not unique.

leagues = pd.read_sql("""SELECT *
                         FROM League
                         JOIN Country ON Country.id = League.country_id;""", conn)
leagues

List of teams

ORDER BY defines the sorting of the output – ascending or descending (DESC)

LIMIT limits the number of rows in the output – after the sorting

teams = pd.read_sql("""SELECT *
                       FROM Team
                       ORDER BY team_long_name
                       LIMIT 10;""", conn)
teams

We’ll just show the columns that interest us in this example, so instead of *, we’ll use the actual names.

The names of several of the columns are the same (e.g., both Country and League have a name column), so we’ll use AS to rename them.

This query, as you can see, includes a lot more joins. The reason for this is that the database is designed in a star structure, with one table (Match) containing all of the “performance” and metrics, but only keys and IDs, and other tables including all of the descriptive information (Country, League, Team)

It’s important to note that the Team table is joined twice. This is a tricky one because, despite the fact that we are using the same table name, we are bringing two separate copies (and renaming them using AS). The reason for this is that we need to bring data for two separate values (home_team_api_id and away_team_api_id), and joining both to the same copy would imply that they are equal.

It’s also worth noting that the Team tables are linked together using a left join. The reason for this is that I’ve decided to keep the matches in the output, even if one of the teams isn’t on the Team table.

ORDER BY comes after the WHERE (and the GROUP BY, if present) and before the LIMIT, and determines the output order.

detailed_matches = pd.read_sql("""SELECT Match.id,
                                         Country.name AS country_name,
                                         League.name AS league_name,
                                         season, stage, date,
                                         HT.team_long_name AS home_team,
                                         AT.team_long_name AS away_team,
                                         home_team_goal, away_team_goal
                                  FROM Match
                                  JOIN Country ON Country.id = Match.country_id
                                  JOIN League ON League.id = Match.league_id
                                  LEFT JOIN Team AS HT ON HT.team_api_id = Match.home_team_api_id
                                  LEFT JOIN Team AS AT ON AT.team_api_id = Match.away_team_api_id
                                  WHERE country_name = 'Spain'
                                  ORDER BY date
                                  LIMIT 10;""", conn)
detailed_matches

Let’s do some basic analytics

Here we start to look at the data at a more aggregated level. Instead of looking at the raw data, we group it to the different levels we want to examine. In this example, we take the previous query, remove the match and date information, and look at it at the country-league-season level.

The functionality we will use for that is GROUP BY, which comes between the WHERE and the ORDER BY.

Once you choose the level you want to analyze, we can divide the select statement into two parts:

Dimensions are the values we’re describing, and they’re the same ones we’ll group by later.

Metrics must be grouped together using functions. sum(), count(), count(distinct), avg(), min(), and max() are some of the most common functions.

It’s critical to use the same dimensions in both the select and the GROUP BY functions. Otherwise, the output could be incorrect.

HAVING is another feature that can be used after grouping. This adds another layer of data filtering, this time using the table’s output after grouping. It’s frequently used to clean the output.
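To make the HAVING behavior concrete, here is a minimal runnable sketch using an in-memory SQLite table with made-up match data (the table and numbers are illustrative, not the blogathon database):

```python
import sqlite3
import pandas as pd

# Tiny illustrative table (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Matches (league TEXT, home_team_goal INT, away_team_goal INT);
INSERT INTO Matches VALUES
 ('Spain', 3, 1), ('Spain', 2, 2), ('England', 0, 1), ('England', 1, 1);
""")

# HAVING filters the grouped output: keep only leagues whose
# average total goals per match exceed 3
high_scoring = pd.read_sql("""
SELECT league,
       AVG(home_team_goal + away_team_goal) AS avg_goals
FROM Matches
GROUP BY league
HAVING avg_goals > 3;
""", conn)
```

Here Spain averages 4 goals per match and survives the HAVING filter, while England (1.5) is dropped after grouping.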

leages_by_season = pd.read_sql("""SELECT Country.name AS country_name,
                                         League.name AS league_name,
                                         season,
                                         count(distinct stage) AS number_of_stages,
                                         count(distinct HT.team_long_name) AS number_of_teams,
                                         avg(home_team_goal) AS avg_home_team_goals,
                                         avg(away_team_goal) AS avg_away_team_goals,
                                         avg(home_team_goal-away_team_goal) AS avg_goal_dif,
                                         avg(home_team_goal+away_team_goal) AS avg_goals,
                                         sum(home_team_goal+away_team_goal) AS total_goals
                                  FROM Match
                                  JOIN Country ON Country.id = Match.country_id
                                  JOIN League ON League.id = Match.league_id
                                  LEFT JOIN Team AS HT ON HT.team_api_id = Match.home_team_api_id
                                  LEFT JOIN Team AS AT ON AT.team_api_id = Match.away_team_api_id
                                  WHERE country_name in ('Spain', 'Germany', 'France', 'Italy', 'England')
                                  GROUP BY Country.name, League.name, season
                                  ORDER BY Country.name, League.name, season DESC;""", conn)
leages_by_season

df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()),
                  columns=leages_by_season['country_name'].unique())
for country in ['Germany', 'Spain', 'France', 'Italy', 'England']:
    df.loc[:, country] = list(leages_by_season.loc[leages_by_season['country_name'] == country, 'avg_goals'])
df.plot(figsize=(12,5), title='Average Goals per Game Over Time')

df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()),
                  columns=leages_by_season['country_name'].unique())
for country in ['Germany', 'Spain', 'France', 'Italy', 'England']:
    df.loc[:, country] = list(leages_by_season.loc[leages_by_season['country_name'] == country, 'avg_goal_dif'])
df.plot(figsize=(12,5), title='Average Goals Difference Home vs Out')

Query Run Order

Now that we are familiar with most of the functionalities being used in a query, it is very important to understand the order that code runs.

First, the order in which we write it (reminder):

SELECT

FROM + JOIN

WHERE

GROUP BY

HAVING

ORDER BY

LIMIT

And now, the order in which it actually runs:

Define which tables will be used and how they will be connected (FROM + JOIN).

Keep only the rows that satisfy the conditions (WHERE)

Group the information by the required level (if needed) (GROUP BY)

Add extra filtering conditions on the grouped data (HAVING)

Select the data you wish to include in the new table. It can contain only raw data (if there is no grouping), or a combination of dimensions (from the grouping) and metrics (SELECT)

Order the new table’s output (ORDER BY)

Limit the number of rows in the output (LIMIT)
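The run order above can be seen in a small runnable example (an in-memory SQLite table with made-up data): the WHERE filter runs first, the sort next, and the LIMIT cuts the already-sorted rows last.

```python
import sqlite3

# Tiny illustrative table (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Team (name TEXT, goals INT);
INSERT INTO Team VALUES ('A', 5), ('B', 9), ('C', 7), ('D', 1);
""")

rows = conn.execute("""
SELECT name, goals
FROM Team
WHERE goals > 2      -- 1. filter rows first (drops D)
ORDER BY goals DESC  -- 2. then sort what survived
LIMIT 2;             -- 3. finally cut to two rows
""").fetchall()
# rows -> [('B', 9), ('C', 7)]
```

If LIMIT ran before the sort, we could not be sure of getting the two highest-scoring teams; because it runs last, we always do.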

Sub Queries and Functions

The solution is a subquery. The Player_Attributes table needs to be grouped to a different key: the player level only (without season). Of course, we first need to decide how to combine all the attributes into a single row; here we use AVG, but one could also take the maximum, the latest season, etc. Once both tables share the same key, we can join them together (think of the subquery like any other table, only temporary), knowing that we won’t have duplicated rows after the join.

You can also see two examples of how to use functions here:

CASE + WHEN + ELSE – a conditional function is an important tool for data manipulation. While the IF statement is widely used in other languages, SQLite does not support it, hence CASE + WHEN + ELSE is used instead. As you can see, the query returns varied results depending on the data input.

ROUND – straightforward. Every SQL dialect comes with a lot of useful functions by default.

players_height = pd.read_sql("""SELECT CASE WHEN ROUND(height) < 165 THEN 165
                                            ELSE ROUND(height) END AS calc_height,
                                       COUNT(height) AS distribution,
                                       avg(PA_Grouped.avg_overall_rating) AS avg_overall_rating,
                                       avg(PA_Grouped.avg_potential) AS avg_potential,
                                       AVG(weight) AS avg_weight
                                FROM Player
                                LEFT JOIN (SELECT Player_Attributes.player_api_id,
                                                  avg(Player_Attributes.overall_rating) AS avg_overall_rating,
                                                  avg(Player_Attributes.potential) AS avg_potential
                                           FROM Player_Attributes
                                           GROUP BY Player_Attributes.player_api_id) AS PA_Grouped
                                ON Player.player_api_id = PA_Grouped.player_api_id
                                GROUP BY calc_height
                                ORDER BY calc_height;""", conn)
players_height

players_height.plot(figsize=(12,5))

EndNote

About the Author

Connect with me on Github

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Azure Storage Explorer Download Process

Storage Explorer simplifies administering Azure blobs, queues, tables, and files. It is a free program that facilitates access to, control over, and manipulation of information saved in an Azure storage account, making it easy to work with data stored in Azure Storage. Cloud storage in the form of Azure Storage is essential these days, and the ability to safeguard your data and applications is a major factor in its success.

Azure Storage Explorer

Microsoft Azure Storage Explorer is great for browsing and managing your cloud storage account, including adding new containers and downloading or uploading files. The Azure Storage Explorer is a graphical user interface (GUI) tool with many capabilities to facilitate software development. While using Azure, it’s simple to link your storage accounts to any gadget. Managing numerous storage accounts is a breeze with the help of Microsoft’s Azure Storage Explorer. Microsoft Azure storage is growing in popularity due to its adaptability, security, and scalability. It’s a simple cloud storage management service. The Azure Storage Explorer is a useful tool for quickly gaining access to data and modifying it as needed. It’s a free program that works with many different OSes, including Windows, Linux, Mac OS, and more.

Advantages of Azure Storage Explorer

Even though there are many cloud storage solutions out there, most companies favour Microsoft’s Azure platform. Storage on Microsoft’s Azure cloud is in high demand because of the service’s scalability, consistency, and massive capacity. To get started with Azure Storage Explorer, simply download it and set it up, and you can get the following −

Files and folders can be viewed, uploaded, deleted, and downloaded using this.

It’s also a convenient way to set up and control your storage space, containers, and blobs.

Very safe, as it encrypts all data in Azure Storage.

Use this robust utility to take charge of your Azure Storage space.

Cloud management is made simple with Azure’s attention to hardware upkeep, patching, and emergency fixes.

Get instant access to, and control over everything stored in your cloud service of choice.

The files are simply accessible in your cloud storage account for both uploading and downloading.

A cloud storage account allows you to upload and download files from any computer with internet access.

When it comes to managing your Azure storage account, you won’t find a better tool than Storage Explorer.

How to install Microsoft Azure Storage Explorer?

Azure Storage Explorer has a quick and simple installation process −

Visit the Main Azure Storage Explorer Page. First, choose your operating system from the drop-down menu.

Then a pop-up agreement box will appear; select “I accept the agreement” and then “Install” to proceed.

Check the box and hit Finish to launch the freshly installed program.

Steps for accessing Azure Storage Explorer

Follow the steps below to connect your newly installed copy of Azure Storage Explorer to the web-based Azure management console. Working with this is a lot less strenuous and more pleasant.

To do this, use the button labelled “Connect to Azure Storage,” then use the Choose Resource panel to choose “Subscription” in the Azure Storage dialogue that displays.

Now choose the Azure environment you want to sign into.

You can connect to either the global Azure cloud or a regional cloud, or even an Azure Stack instance in your region. Follow this by selecting the Next button.

After you’ve finished using the page, open it again and go to Azure Storage Explorer.


The Azure Storage Explorer is perfect for this purpose. It provides an adaptable means of storing vast quantities of data in the cloud, and it may be used for a wide range of purposes. Together with its amazing capacity for storing data, Azure Storage also serves as a dependable means of sending messages, a cloud-based filing system, a NoSQL database, and a vastly scalable object store for data objects. When it comes to managing your Azure Storage with minimal effort, you can’t do better than Azure Storage Explorer.

Advanced SQL For Data Science

This article was published as a part of the Data Science Blogathon.

The article focuses on techniques to deal with a wide range of data types; mastering these can be useful for the user. The article doesn’t focus on the basics of SQL, including standard syntax, functions, and applications but aims to expand the fundamental knowledge of SQL. The writing also covers the concept of subqueries.

Let’s get started!

SQL Functions

SQL provides built-in functions for performing operations, categorized into two types. The types are:

1. Aggregate Functions

These functions are used to perform operations on values of the column, and a single value is returned. The SQL provides the following aggregate functions:

AVG(): It returns the calculated average value from values of the selected numeric column

Syntax of AVG() function:

Select AVG(column_name) From table_name;

Example of AVG() function:

Select AVG(Salary) AS AverageSalary From Employees;

COUNT(): This function is used to count the number of rows returned in a Select Statement.

Syntax of COUNT() function:

Select COUNT(column_name) From table_name;

Example of COUNT() function:

Select COUNT(*) AS NumEmployees From Employees;

FIRST(): The function returns the first value of the selected column. (Note: FIRST() is supported in MS Access; other databases achieve the same with ORDER BY ... LIMIT 1 or window functions.)

Syntax of FIRST() function:

Select FIRST(column_name) From table_name;

Example of FIRST() function:

Select FIRST(Employee_ID) AS FirstEmployee From Employees;

LAST(): The function returns the last value of the selected column. (Like FIRST(), this is an MS Access function.)

Syntax of LAST() function:

Select LAST(column_name) From table_name;

Example of LAST() function:

Select LAST(Employee_ID) AS LastEmployee From Employees;

MAX(): The function returns the maximum value of the selected column.

Syntax of MAX() function:

Select MAX(column_name) From table_name;

Example of MAX() function:

Select MAX(Salary) AS MaxSalary From Employees;

MIN(): The function returns the minimum value of the selected column.

Syntax of MIN() function:

Select MIN(column_name) From table_name;

Example of MIN() function:

Select MIN(Salary) AS MinSalary From Employees;

SUM(): The function returns the sum of the values of the selected column.

Syntax of SUM() function:

Select SUM(column_name) From table_name;

Example of SUM() function:

Select SUM(Salary) AS TotalSalary From Employees;

2. Scalar Functions

Scalar functions operate on user input and return a single value. Let’s go through the scalar functions:

UCASE(): The function converts the value of a field to uppercase.

Syntax of UCASE() function:

Select UCASE(column_name) From table_name;

Example of UCASE() function:

Select UCASE(Ename) From Employees;

LCASE(): The function converts the value of a field to lowercase.

Syntax of LCASE() function:

Select LCASE(column_name) From table_name;

Example of LCASE() function:

Select LCASE(Ename) From Employees;

MID(): The function extracts a substring from a text field.

Syntax of MID() function:

Select MID(column_name,start,length) FROM table_name;

Specifying the length is not compulsory here, and start represents the start position.

Example of MID() function:

Select MID(Ename, 1, 4) From Employees;

LEN(): The function returns the length of the specified value. (In some databases, such as MySQL and SQLite, the equivalent function is LENGTH(), which the examples below use.)

Syntax of LEN() function:

Select LENGTH(column_name) From table_name;

Example of LEN() function:

Select LENGTH(Ename) From Employees;

ROUND(): The function returns the round numeric value to the specified decimal places. This arithmetic operation is performed considering IEEE 754 standard.

Syntax of ROUND() function:

Select ROUND(column_name, decimals) From table_name;

decimals in the syntax specify the number of decimals to be fetched.

Example of ROUND() function:

Select ROUND(Salary, 0) From Employees;

NOW(): The function returns the current date and time of the system.

Syntax of NOW() function:

Select NOW() From table_name;

Example of NOW() function:

Select Ename, NOW() From Employees;

FORMAT(): The function formats how a field is to be presented.

Syntax of FORMAT() function:

Select FORMAT(column_name, format) From table_name;

Example of FORMAT() function:

Select Ename, FORMAT(NOW(), 'YYYY-MM-DD') AS Date From Employees;

CONCAT(): The function joins the values stored in different columns, or it can be used to join two strings simply.

Syntax of CONCAT() function:

Select CONCAT(string_1, string_2,...., string_n) AS Alias_Name; Select CONCAT(column_name1, column_name2,...., column_name_n) From table_name;

Example of CONCAT() function:

Select CONCAT('Hello', ' Everyone') As Gesture; Select CONCAT(FirstName, LastName) AS EmployeeName From Employee;

REPLACE(): The function replaces the occurrence of a specified value with the new one.

Syntax of REPLACE() function:

Select REPLACE(Original_Value, Value_to_Replace, New_Value) AS Alias_Name; Select REPLACE(Column_Name, Character/string_to_replace, new_String/character ) AS Alias_Name FROM Table_Name;

Example of REPLACE() function:

Select REPLACE('APPSE', 'S', 'L'); Select LastName, REPLACE(LastName, 'r', 'a') AS Replace_r_a From Employees;

POSITION(): The function returns the position of the first occurrence of a specified substring in a string.

Syntax of POSITION() function:

Select POSITION(substring IN string/column_name);

Example of POSITION() function:

Select POSITION("A" IN "APPLE") As Position;
Select POSITION("a" in FirstName) From employees;

SQL Joins

As the name suggests, JOIN means combining something, which refers to combining two or more tables. The JOIN combines the data of two or more tables in a database. The joins are used if we want to access the data of multiple tables simultaneously. The joining of tables is done based on a common field between them.

According to ANSI standards, there are five types of JOIN:

INNER JOIN

LEFT OUTER JOIN

RIGHT OUTER JOIN

FULL OUTER JOIN

CROSS JOIN
Firstly, let’s look at how SQL JOIN works:

Suppose we have two tables:

1. Parent Table

ID Name Age Address Salary

1 Ram 26 Mumbai 20000

2 Jack 28 Delhi 18000

3 John 25 Pune 25000

4 Amy 32 Delhi 22000

2. Student Table

Student_Id Class Class_ID Grades

101 9 1 A

102 8 3 B

103 10 4 A

So, if we use the following JOIN statement:

Select ID, Name, Student_ID, Grades From Parent p, Student s Where p.ID = s.Class_ID;

The result would be:

ID Name Student_ID Grades

1 Ram 101 A

3 John 102 B

4 Amy 103 A

Now, let’s look at different types of joins:


1. SQL OUTER JOIN

In the outer JOIN of SQL, the content of the specified tables is integrated whether their data matches or not.

Outer join is done in two ways:

Left outer join, or a left join, returns all the rows from the left table, combining them with the matching rows of the right table. If there is no matching data, it returns NULL values.

Right outer join, or a right join, returns all the rows from the right table, combining them with the matching rows of the left table. If there is no matching data, it returns NULL values.

Syntax of LEFT JOIN:

Select table1.column1, table2.column2,.... From table1 LEFT JOIN table2 ON table1.column_field = table2.column_field;

Example of LEFT JOIN:

Select ID, Name, Student_Id, Grades From Parent LEFT JOIN Student ON Parent.ID = Student.Class_ID;

Syntax of RIGHT JOIN:

Select table1.column1, table2.column2,.... From table1 RIGHT JOIN table2 ON table1.column_field = table2.column_field;

Example of RIGHT JOIN:

Select ID, Name, Student_Id, Grades From Parent RIGHT JOIN Student ON Parent.ID = Student.Class_ID;

2. SQL FULL JOIN

The full join or full outer join of SQL returns the combination of both right and left outer join, and the resulting table has all the records from both tables. If no matches are found, then the NULL value is returned.


Syntax of FULL OUTER JOIN:

Select * From table1 FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;


Example of FULL OUTER JOIN:

Select * From Parent FULL OUTER JOIN Student ON Parent.ID = Student.Class_ID;

3. SQL CROSS JOIN

The SQL cross join combines the tables by producing the Cartesian product of their sets of rows. When each row of the first table is combined with every single row of the second table, it is called a Cartesian join or cross join.

The resulting number of rows equals the product of the number of rows in the first table and the number of rows in the second table.
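The row-count product is easy to verify with a runnable sketch (an in-memory SQLite setup with made-up tables):

```python
import sqlite3

# Two tiny illustrative tables (hypothetical data): 3 rows x 2 rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Colors (color TEXT);
CREATE TABLE Sizes (size TEXT);
INSERT INTO Colors VALUES ('red'), ('blue'), ('green');
INSERT INTO Sizes VALUES ('S'), ('M');
""")

# CROSS JOIN pairs every color with every size -> 3 * 2 = 6 rows
pairs = conn.execute("SELECT color, size FROM Colors CROSS JOIN Sizes;").fetchall()
```

Note that a cross join needs no ON clause, since every row is matched with every row.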

Syntax of CROSS JOIN:

Select table1.column1, table2.column2,.... From table1 CROSS JOIN table2;

Example of CROSS JOIN:

Select * From Parent CROSS JOIN Student;

Grouping Data

In SQL, the GROUP BY statement is used to organize data into groups based on similarity in the data; the grouped data is then summarized using aggregate functions. In simple words, if different rows of a specified column have the same values, they are placed together in a group. The following criteria are taken into consideration while using the GROUP BY statement:

In SQL, the Select statement is used with the GROUP BY clause.

The Where clause is placed before the GROUP BY clause.

The ORDER BY clause is placed after the GROUP BY clause.

Syntax of GROUP BY clause:

Select column1, function_name(column2) From table_name Where condition GROUP BY column1, column2 ORDER BY column1, column2;

Example of GROUP BY clause:

Select Name, SUM(Salary), Age From Employee GROUP BY Age;

One important point to remember here is that the Where clause places conditions on individual rows to decide which rows enter the result. We cannot use aggregate functions like COUNT(), SUM(), etc. with the Where clause, so we use the Having clause instead.
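The WHERE-versus-HAVING distinction can be demonstrated with a runnable sketch (an in-memory SQLite table with made-up salaries):

```python
import sqlite3

# Tiny illustrative table (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Name TEXT, Age INT, Salary INT);
INSERT INTO Employee VALUES
 ('Ram', 26, 20000), ('Jack', 26, 18000), ('John', 30, 25000);
""")

# WHERE filters individual rows BEFORE grouping;
# HAVING filters whole groups AFTER aggregation
result = conn.execute("""
SELECT Age, SUM(Salary) AS total
FROM Employee
WHERE Salary > 15000        -- row-level filter (all three rows pass)
GROUP BY Age
HAVING SUM(Salary) > 30000; -- group-level filter (keeps Age 26: 38000)
""").fetchall()
```

The Age 30 group (total 25000) is dropped by HAVING, while the Age 26 group (20000 + 18000 = 38000) survives.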

Syntax of Having Clause:

Select column1, function_name(column2) From table_name Where condition GROUP BY column1, column2 Having condition ORDER BY column1, column2;

Example of Having Clause:

Select Name, SUM(Salary), ID From Employee GROUP BY Name HAVING SUM(Salary) > 50000 ORDER BY ID;

The Select statement can use constants, aggregate functions, expressions, and column names in the GROUP BY clause.


SQL CASE Statement

The CASE statement operates like if-then-else logic. When the specified condition is true, the statement returns the specified value. If the condition turns out to be false, it executes the ELSE part. When there is no ELSE specified, it returns a NULL value.

The CASE statement is used in Select, Delete, and Insert statements with Where, ORDER BY, and GROUP BY clauses.

Syntax of CASE statement:

CASE WHEN condition_1 THEN statement_1 WHEN condition_2 THEN statement_2 . . . WHEN condition_N THEN statement_N ELSE result END;

The above query goes through each condition one by one. If the expression matches a condition, it returns the corresponding result and skips all the condition statements afterward. If no condition matches, control goes to the ELSE part and its result is returned. The ELSE part is optional; without it, a NULL value is returned if no condition satisfies the expression.
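That top-to-bottom evaluation is easy to see in a runnable sketch (an in-memory SQLite table with made-up marks; the grade thresholds are assumptions for illustration):

```python
import sqlite3

# Tiny illustrative table (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (Name TEXT, Marks INT);
INSERT INTO Student VALUES ('Asha', 72), ('Ravi', 35), ('Meera', 55);
""")

# CASE walks the WHEN branches top to bottom and stops at the first match;
# ELSE catches everything left over (thresholds 60 and 40 assumed)
grades = conn.execute("""
SELECT Name,
       CASE
           WHEN Marks >= 60 THEN 'DISTINCTION'
           WHEN Marks >= 40 THEN 'PASS'
           ELSE 'FAIL'
       END AS Student_Result
FROM Student;
""").fetchall()
```

Asha (72) matches the first branch and never reaches the second, even though 72 >= 40 is also true.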

Example of CASE Statement:

Select Student_ID, Name, Subject, Marks,
       CASE WHEN Marks >= 40 THEN 'PASS' -- pass mark of 40 assumed for illustration
            ELSE 'FAIL'
       END AS Student_Result
From Student;

SQL View

To hide the complexity of the data and prevent unnecessary access to the database, SQL introduces the concept of the VIEW. It allows the user to pick particular columns rather than the complete table. The view, considered a virtual table, is based on the result-set of a predefined SQL query.

The rows of a view have no physical existence in the database; like SQL tables, views present data in rows and columns, and the Where clause can be used to restrict which rows appear.

Syntax to create View from Single Table

Create VIEW View_name AS Select column_name1, column_name2,....., column_nameN From table_name Where condition;

Syntax to create View from Multiple Tables

Create VIEW View_name AS Select table_name1.column_name1, table_name2.column_name1,..... From table_name1, table_name2,....., table_nameN Where condition;

We can also modify the current view and insert new data, but it can only be done if the following conditions are followed:

Only views based on a single table can be updated, not views formed from multiple tables.

The view fields should not contain any NULL values.

The view must not contain any subquery or the DISTINCT keyword in its definition.

The view cannot be modified if the Select statement used to create a view contains JOIN, HAVING, or GROUP BY clause.

The view cannot be updated if any field contains an aggregate function.

Syntax to Update a View:

CREATE OR REPLACE VIEW View_name AS Select column_name1, column_name2,...., column_nameN From table_name Where condition;

To delete the current view from the database, the DROP statement is used:

DROP VIEW View_name;


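A quick way to experiment with views is Python's built-in sqlite3 module; the table, data, and view name below are invented for illustration (note that SQLite views are read-only, so the update rules listed above do not come into play there):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (ID INT, Name TEXT, Salary INT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)",
                 [(1, "A", 5000), (2, "B", 3000)])

# Create a view exposing only high-paid employees
conn.execute("CREATE VIEW High_Paid AS "
             "SELECT Name, Salary FROM Employees WHERE Salary > 4000")
view_rows = conn.execute("SELECT * FROM High_Paid").fetchall()
print(view_rows)  # [('A', 5000)]

# Drop the view when it is no longer needed
conn.execute("DROP VIEW High_Paid")
```

The view behaves like a table in queries, but the rows are computed from Employees each time it is selected.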
The UNION operator combines the result of two or more Select queries and results in a single output.

Syntax of UNION operator:

Select column_name1, column_name2,...., column_nameN From table_name1 UNION Select column_name1, column_name2,...., column_nameN From table_name2 UNION Select column_name1, column_name2,...., column_nameN From table_name3;

The UNION ALL operator has the same functionality as the UNION operator; the only difference is that UNION ALL keeps duplicate rows in the result, whereas UNION removes them.

Syntax of UNION ALL operator:

Select column_name1, column_name2,...., column_nameN From table_name1 UNION ALL Select column_name1, column_name2,...., column_nameN From table_name2;
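The duplicate-handling difference can be verified with Python's built-in sqlite3 module; the tables t1 and t2 and their rows below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (name TEXT)")
conn.execute("CREATE TABLE t2 (name TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("a",), ("b",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("b",), ("c",)])

# UNION removes the duplicate 'b'; UNION ALL keeps it
union = conn.execute("SELECT name FROM t1 UNION SELECT name FROM t2").fetchall()
union_all = conn.execute("SELECT name FROM t1 UNION ALL SELECT name FROM t2").fetchall()
print(len(union), len(union_all))  # 3 4
```

With 'b' present in both tables, UNION returns three distinct rows while UNION ALL returns all four.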

The EXCEPT operator is used to filter out data. The statement combines the two select statements and returns the records that are present in the first Select query and not in the second Select query. It works in the same way as the minus operator does in mathematics.

Syntax of EXCEPT operator:

Select column_name1, column_name2,...., column_nameN From table_name1 EXCEPT Select column_name1, column_name2,...., column_nameN From table_name2;

SQL Subqueries

Syntax to write an inner query:

Select column_name1, column_name2, ...., column_nameN From table_name Where operator (Select column_name1, ..., column_nameN From table_name);

The subquery is executed before the outer or main query, and the main query uses its result.

There are some conditions required to be followed for writing a subquery:

Subqueries are enclosed within parentheses.

A subquery has only one column in the SELECT clause unless there are multiple columns in the main query for comparing the selected columns of the subquery.

A subquery cannot contain an ORDER BY clause, though a GROUP BY clause can be used to perform a similar function; the main query may use an ORDER BY clause.

Subqueries that return multiple rows can only be used with multiple value operators like IN operator.

The BETWEEN operator cannot be applied directly to a subquery, but it can be used inside the subquery.

Example of a subquery:

Select * From Employees Where ID IN (Select ID From Employees_Another Where Salary > 4500);

The above statement selects the data of employees in the Employees table whose given salary is more than 4500 in the Employees_Another table.

Update Employees SET Salary = Salary * 0.25 Where Age IN (Select Age From Employees_Another Where Age >= 27);

The above query updates the employees’ salary value in the Employees table if their age is greater than or equal to 27 in the Employees_Another table.
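The IN-subquery pattern above can be run end to end with Python's built-in sqlite3 module; the tables, column values, and the salary threshold of 4500 below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (ID INT, Name TEXT)")
conn.execute("CREATE TABLE Employees_Another (ID INT, Salary INT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi"), (3, "Meera")])
conn.executemany("INSERT INTO Employees_Another VALUES (?, ?)",
                 [(1, 5000), (2, 4000), (3, 4800)])

# Inner query runs first, producing the IDs with Salary > 4500;
# the outer query then filters Employees by those IDs.
rows = conn.execute("""
    SELECT * FROM Employees
    WHERE ID IN (SELECT ID FROM Employees_Another WHERE Salary > 4500)
    ORDER BY ID
""").fetchall()
print(rows)  # [(1, 'Asha'), (3, 'Meera')]
```

Only IDs 1 and 3 clear the threshold in the inner query, so only those employees appear in the result.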

In a nutshell, we learned about:

How aggregate functions can be used to compute summaries over the required data

Creating and working on virtual tables

Some additional clauses like CASE, UNION, UNION ALL, and EXCEPT

The concept and working of SQL subqueries

SQL is all about constant practicing. Keep on practicing, and you will master it in no time!

Thank you.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Learn Microservices Using Spring Boot With Mongo Database

In this course, you will learn how to build REST APIs or microservices using the latest version of Spring Boot and the Mongo database.

Spring Boot is basically an extension of the Spring framework which eliminated the boilerplate configurations required for setting up a Spring application.

Spring Boot is an opinionated framework that helps developers build Spring-based applications quickly and easily. The main goal of Spring Boot is to quickly create Spring-based applications without requiring developers to write the same boilerplate configuration again and again.

Microservices are an architectural style for building a system/application as a set of smaller business components that are autonomous, self-contained, and loosely coupled. Micro means small, and a service is a web service. After completing this course, students will be more comfortable developing microservice-based projects.

Using this course, students can learn about API Gateways and Eureka servers.

Students will be able to perform CRUD operations.

Students will understand various integrations.

Students will be able to download the required open-source software and set up the environment for microservices.

Evolution of microservices:

Many organizations such as Netflix, Amazon, and eBay successfully used the divide-and-conquer technique to partition their monolithic applications by functionality into smaller atomic units, each performing a single function. These organizations solved a number of issues they were experiencing with their monolithic applications. Following their success, many other organizations started adopting this as a common pattern to refactor their monolithic applications.

Purpose of microservices:

To enable applications to achieve a high degree of agility, speed of delivery, and scalability.

Principles of Microservice:

1) Single Responsibility per Service

2) Autonomous

Microservices are self-contained, independently deployable, and autonomous services that take full responsibility for their execution.

In the microservices approach, each microservice is built as a fat JAR, embedding all dependencies, and runs as a standalone Java process.

Course Contents

1. Introduction to Microservices

2. Microservices Architecture

3. Environment setup-Java installation

4. Environment setup-Java is not recognized error fix

5. Environment setup-Spring tool suit

6. Environment setup-POST Man

7. Environment setup- Mongo DB

8. Environment setup- MySQL

9. REST based CRUD operations.

10. How to add student details into Mongo DB using micro services.

11. How to delete student details from Mongo DB using micro services

12. How to update student details into Mongo Db using micro services

13. Retrieve student details from Mongo DB using micro services.

14. Spring cloud API gateway with Eureka Server architecture and Integrations

15. Micro service integration with Eureka server

16. Micro services Integration with API gateway

17. Spring cloud OpenFeign integration

18. OpenFeign communication with Eureka server

19. OpenFeign communication with server for Insert

20. OpenFeign communication with server for Retrieve

21. OpenFeign communication with server for Delete

22. OpenFeign communication with server for Update

23. About Swagger

24. Swagger document generation

25. Introduction to Spring cloud load balancing

26. Spring cloud load balancing at client side

What you’ll learn

Learn how to build Microservices or REST API using Spring Boot

Learn how to integrate with Eureka server

Learn how to use CRUD operations.

Learn how to connect Microservice to MongoDB database

Learn how to use Spring cloud API gateway

Learn how to use Swagger

Learn how to use Spring cloud load balancing

Are there any course requirements or prerequisites?

Java basics

Spring basics

Who this course is for:

This course is for Spring Boot beginners who want to get started with building microservices and REST APIs using Spring Boot and the Mongo database.


Students can learn about microservices, their features, and operations. Students can also get hands-on experience with examples.

After completing the course, students can proceed to create microservices using CRUD operations. They will also learn about other functionality in microservices.

Students will be able to create their own projects and apply the gained knowledge in their personal or official project work, fulfilling their own development needs.

Before learning microservices, it is better to have at least minimal knowledge of Java.


Prior Java experience is needed, and a little JSON/Mongo DB knowledge is also needed.

Need to have knowledge of Spring Boot.

Visualizing Netflix Data Using Python!


We can say that data visualization is basically a graphical representation of data and information. It is mainly used for data cleaning, exploratory data analysis, and proper effective communication with business stakeholders. Right now the demand for data scientists is on the rise. Day by day we are shifting towards a data-driven world. It is highly beneficial to be able to make decisions from data and use the skill of visualization to tell stories about what, when, where, and how data might lead us to a fruitful outcome.

Data visualization is going to change the way our analysts work with data. They’re going to be expected to respond to issues more rapidly. And they’ll need to be able to dig for more insights – look at data differently, more imaginatively. Data visualization will promote that creative data exploration. -Simon Samuel

Table of contents

Why do we need Data Visualization?

Types of Data Visualization.

Brief about tools we will be using

Data pre-processing

Data Visualization

Keep in mind


Why do we need good Data Visualizations?

Our eyes are drawn to colours and patterns. We can quickly recognize blue from yellow, circle from a square. Data visualization is a form of visual art that not only grabs our interests but also keeps our eyes on the message. We can literally narrate our entire numerical data to the stakeholders in a form of captivating graphs with the help of data visualization.

Right now we are living in “an age of Big data” trillions of rows of data are being generated every day. Data visualization helps us in curating data into a form that is easily understandable and also helps in highlighting a specific portion. Plain graphs are too boring for anyone to notice and even fail to keep the reader engaged. Hence, today we will be seeing how to create some mind-blowing visualization using matplotlib and seaborn.

Types of Data visualization

In this article we will be creating two types of Data visualization:

1. Bar Plot (Horizontal):

It is a graph that represents a specific category of data with rectangular bars whose lengths are proportional to the values they represent.

Syntax: matplotlib.pyplot.barh(y, width, height)


y: the coordinates of the bars along the Y axis.

width: the width (length) of each bar.

height: the thickness of each bar.
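As a minimal sketch of barh (the department names and salary values are sample data invented for illustration; the Agg backend is used so no display is required):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt

categories = ["Sales", "HR", "R&D"]
values = [90000, 84000, 95000]

fig, ax = plt.subplots()
# One horizontal bar per category, bar length proportional to the value
bars = ax.barh(categories, values, height=0.5)
n_bars = len(bars)
print(n_bars)  # 3
```

Each element of `categories` becomes one horizontal bar, so the container holds three bars.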

2. Timeline (Customized Horizontal line):

Syntax: axhline(y=0, xmin=0, xmax=1, **kwargs), where extra keyword arguments such as c (colour) and zorder are passed through to the underlying line.


Y: Co-ordinates of Y in a horizontal line with a default value of 0.

xmin: This parameter should be between 0 and 1. 0 means the extreme left of the plot and 1 means the extreme right of the plot with 0 being the default value.

xmax: This parameter should be between 0 and 1. 0 means the extreme left of the plot and 1 means the extreme right of the plot with 1 being the default value.
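A tiny sketch of axhline under these parameters (Agg backend assumed so the figure never needs to be shown):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Horizontal line at y=0, spanning 10% to 90% of the axes width
line = ax.axhline(y=0, xmin=0.1, xmax=0.9, c="#000000", zorder=1)

# Both endpoints of the line sit at y=0
ys = list(line.get_ydata())
print(ys)  # [0, 0]
```

The returned Line2D carries the two endpoints, both at the requested y value.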

Before we get started, I want you to know that we won’t be using any python library other than Matplotlib, seaborn and we will be using Netflix’s dataset for the explanation.

By the end of this article, you will be able to create some awesome data visualization using matplotlib and seaborn. So without further ado, let’s get started.

Brief about Data Visualization libraries we will be using

*Feel free to skip this part if you are already aware of these libraries…

Matplotlib: It is a plotting library for the Python programming language and it has numerical mathematics extension Numpy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, QT, WxPython, or GTX. (Source)

Seaborn: It is an amazing visualization library for statistical graphics plotting in python. It provides beautiful default styles and color palettes to make statistical plots more attractive. It is built on top of matplotlib library and is also closely integrated into the data structures from pandas. The aim of seaborn is to make visualization the central part of exploring and understanding data. It also provides dataset-oriented APIs so that we can switch between different visual representations for the same variables for a better understanding of the dataset. (Source)

Numpy: It is a library for python that supports multi-dimensional arrays and matrices with many high-level mathematical operations to perform on these arrays and matrices.

Pandas: It is a powerful, flexible, and easy-to-use data manipulation tool for the python programming language.

Best time to grab a Coffee !!

Data pre-processing

Importing all the necessary libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

df = pd.read_csv('netflix_titles.csv')  # load the Netflix dataset (adjust the file name/path as needed)

Pre-processing the data

Calculating the missing data:

for i in df.columns:
    null_rate = df[i].isna().sum()/len(df) * 100
    print("{} missing percentage: {}%".format(i, round(null_rate, 2)))

director missing percentage: 30.68%
cast missing percentage: 9.22%
country missing percentage: 6.51%
date_added missing percentage: 0.13%
rating missing percentage: 0.09%

Dealing with the missing data:

Here we will replace the missing country with the most frequent country (the mode), and fill cast and director with 'No data'.

df['country'] = df['country'].fillna(df['country'].mode()[0])
df['cast'].replace(np.nan, 'No data', inplace=True)
df['director'].replace(np.nan, 'No data', inplace=True)
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)

Now we are done with missing values, but the dates are still not quite right…

df['date_added'] = pd.to_datetime(df['date_added'])
df['month_added'] = df['date_added'].dt.month
df['month_name_added'] = df['date_added'].dt.month_name()
df['year_added'] = df['date_added'].dt.year

Okay, let’s visualize now!

Netflix’s Brand Palette

Always use a color palette, it is a great way in achieving good integrity and helps us to give a professional look keeping all the readers engaged.

sns.palplot(['#221f1f', '#b20710', '#e50914', '#f5f5f1'])
plt.title("Netflix brand palette", loc='left', fontfamily='serif', fontsize=15, y=1.2)

We will use Netflix brand colors wherever we can…

Let’s visualize the ratio between Netflix’s TV shows and Movies

Awesome !! Isn’t it?


1. Calculating the ratio

x = df.groupby(['type'])['type'].count()
y = len(df)
r = (x/y).round(2)
mf_ratio = pd.DataFrame(r).T

Drawing the figure:

fig, ax = plt.subplots(1, 1, figsize=(6.5, 2.5))
ax.barh(mf_ratio.index, mf_ratio['Movie'], color='#b20710', alpha=0.9, label='Movie')
ax.barh(mf_ratio.index, mf_ratio['TV Show'], left=mf_ratio['Movie'], color='#221f1f', alpha=0.9, label='TV Show')
ax.set_xlim(0, 1)
ax.set_xticks([])
ax.set_yticks([])

2. Annotating the figure:

fig, ax = plt.subplots(1, 1, figsize=(6.5, 2.5))
ax.barh(mf_ratio.index, mf_ratio['Movie'], color='#b20710', alpha=0.9, label='Movie')
ax.barh(mf_ratio.index, mf_ratio['TV Show'], left=mf_ratio['Movie'], color='#221f1f', alpha=0.9, label='TV Show')
ax.set_xlim(0, 1)
ax.set_xticks([])
ax.set_yticks([])

for i in mf_ratio.index:
    ax.annotate(f"{int(mf_ratio['Movie'][i]*100)}%",
                xy=(mf_ratio['Movie'][i]/2, i),
                va='center', ha='center', fontsize=40,
                fontweight='light', fontfamily='serif', color='white')
    ax.annotate("Movie",
                xy=(mf_ratio['Movie'][i]/2, -0.25),
                va='center', ha='center', fontsize=15,
                fontweight='light', fontfamily='serif', color='white')

for i in mf_ratio.index:
    ax.annotate(f"{int(mf_ratio['TV Show'][i]*100)}%",
                xy=(mf_ratio['Movie'][i] + mf_ratio['TV Show'][i]/2, i),
                va='center', ha='center', fontsize=40,
                fontweight='light', fontfamily='serif', color='white')
    ax.annotate("TV Shows",
                xy=(mf_ratio['Movie'][i] + mf_ratio['TV Show'][i]/2, -0.25),
                va='center', ha='center', fontsize=15,
                fontweight='light', fontfamily='serif', color='white')

3. Adding text and removing legend & spines:

fig, ax = plt.subplots(1, 1, figsize=(6.5, 2.5))
ax.barh(mf_ratio.index, mf_ratio['Movie'], color='#b20710', alpha=0.9, label='Movie')
ax.barh(mf_ratio.index, mf_ratio['TV Show'], left=mf_ratio['Movie'], color='#221f1f', alpha=0.9, label='TV Show')
ax.set_xlim(0, 1)
ax.set_xticks([])
ax.set_yticks([])

for i in mf_ratio.index:
    ax.annotate(f"{int(mf_ratio['Movie'][i]*100)}%",
                xy=(mf_ratio['Movie'][i]/2, i),
                va='center', ha='center', fontsize=40,
                fontweight='light', fontfamily='serif', color='white')
    ax.annotate("Movie",
                xy=(mf_ratio['Movie'][i]/2, -0.25),
                va='center', ha='center', fontsize=15,
                fontweight='light', fontfamily='serif', color='white')

for i in mf_ratio.index:
    ax.annotate(f"{int(mf_ratio['TV Show'][i]*100)}%",
                xy=(mf_ratio['Movie'][i] + mf_ratio['TV Show'][i]/2, i),
                va='center', ha='center', fontsize=40,
                fontweight='light', fontfamily='serif', color='white')
    ax.annotate("TV Shows",
                xy=(mf_ratio['Movie'][i] + mf_ratio['TV Show'][i]/2, -0.25),
                va='center', ha='center', fontsize=15,
                fontweight='light', fontfamily='serif', color='white')

fig.text(0.125, 1.0, 'Movie & TV Show distribution', fontfamily='serif', fontsize=15, fontweight='bold')
fig.text(0.125, 0.90, 'We see vastly more movies than TV shows on Netflix.', fontfamily='serif', fontsize=12, fontweight='light')

for s in ['top', 'left', 'right', 'bottom']:
    ax.spines[s].set_visible(False)

ax.legend().set_visible(False)


Now let’s visualize Netflix’s Timeline


1. Initializing the timeline list:

tl_dates = [
    "1997\nFounded",
    "1998\nMail Services",
    "2003\nGoes Public",
    "2007\nStreaming service",
    "2016\nGoes Global",
    "2021\nNetflix & Chill",
]
tl_x = [1, 2, 4, 5.3, 8, 9]

2. Drawing the figure :

fig, ax = plt.subplots(figsize=(15, 4), constrained_layout=True)
ax.set_ylim(-2, 1.5)
ax.set_xlim(0, 10)

ax.axhline(0, xmin=0.1, xmax=0.9, c="#000000", zorder=1)
ax.scatter(tl_x, np.zeros(len(tl_x)), s=120, c="#4a4a4a", zorder=2)
ax.scatter(tl_x, np.zeros(len(tl_x)), s=30, c='#fafafa', zorder=3)

for x, date in zip(tl_x, tl_dates):
    ax.text(x, -0.55, date, ha='center', fontfamily='serif',
            fontweight='bold', color='#4a4a4a', fontsize=12)

for spine in ["left", "top", "right", "bottom"]:
    ax.spines[spine].set_visible(False)

ax.set_xticks([])
ax.set_yticks([])
ax.set_title("Netflix through the years", fontweight="bold", fontfamily='serif', fontsize=16, color='#4a4a4a')


Now let’s visualize a bar chart displaying the top countries

For that, we need to pre-process the data a little bit more:

Firstly, let’s print the country column and see what we get…


As we can see, rows 7782 and 7786 contain multiple countries in a single column, so we will create another column that stores only the first country.

df['first_country'] = df['country'].apply(lambda x: x.split(",")[0])
df['first_country']

Now we will replace some of the country names with their short form.

df['first_country'].replace('United States', 'USA', inplace=True)
df['first_country'].replace('United Kingdom', 'UK', inplace=True)
df['first_country'].replace('South Korea', 'S. Korea', inplace=True)

After that, we calculate the total occurrence of each country.

df['count'] = 1  # helper column
data = df.groupby('first_country')['count'].sum().sort_values(ascending=False)[:10]


Now let’s get started with the visualization:

# Drawing the figure
color_map = ['#f5f5f1' for _ in range(10)]
color_map[0] = color_map[1] = color_map[2] = '#b20710'  # highlight the top three

fig, ax = plt.subplots(1, 1, figsize=(12, 6))
ax.bar(data.index, data, width=0.5, edgecolor='darkgray', linewidth=0.6, color=color_map)

# Annotating the figure
for i in data.index:
    ax.annotate(f"{data[i]}", xy=(i, data[i] + 100), va='center', ha='center',
                fontweight='light', fontfamily='serif')

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

# Adding text
fig.text(0.125, 1, 'Top 10 countries on Netflix', fontsize=15, fontweight='bold', fontfamily='serif')
fig.text(0.125, 0.95, 'The three most frequent countries have been highlighted.', fontsize=10, fontweight='light', fontfamily='serif')
fig.text(1.1, 1.01, 'Insight', fontsize=15, fontweight='bold', fontfamily='serif')
fig.text(1.1, 0.67, '''
The US is the major content producer for Netflix,
followed by India and the UK.
Netflix being a US company, it makes sense
that it is among the major producers.
''', fontsize=12, fontweight='light', fontfamily='serif')

ax.grid(axis='y', linestyle='-', alpha=0.5)

At last, we will create a word cloud:

To create a word cloud you will need a mask (the shape of the word cloud); the mask image used in this example is given below, but feel free to create any shape you want.

Importing necessary libraries:

from wordcloud import WordCloud
import random
from PIL import Image
import matplotlib

Creating a word cloud and displaying it:

cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ['#221f1f', '#b20710'])

text = str(list(df['title'])).replace(',', '').replace('[', '').replace("'", '').replace(']', '').replace('.', '')
mask = np.array(Image.open('mask.png'))

wordcloud = WordCloud(background_color='white', width=500, height=200,
                      colormap=cmap, max_words=150, mask=mask).generate(text)

plt.figure(figsize=(5, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.tight_layout(pad=0)

Keep in Mind

Always make sure that you keep your data visualizations organized and Coherent.

Make sure to use proper colours to represent and differentiate information. Colours can be a key factor in a reader’s decisions.

Use high contrast colours and annotate the elements of your data visualization properly.

Never distort the data, data visualization is said to be great when it tells the story clearly without distortions.

Never use a graphical representation that does not represent the data set accurately(For eg: 3D pie charts).

Your Data visualization should be easy to comprehend at a glance.

Never forget that our agenda of data visualization is to enhance the data with the help of design, not just draw attention to the design itself.


So, we wrap up our first tutorial on Netflix data visualization (Part 1) here. One limitation remains: these plots are not interactive, unlike the ones we could build with Plotly and Cufflinks.

Sometimes data visualization should be captivating and attention-grabbing which I think we have achieved here even if it isn’t precise. So by customizing our visualization like what we did here reader’s eye is drawn exactly where we want.

Connect me on LinkedIn

Email: [email protected]

Thank You !!

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

