Operating On The Pandas Dataframe In Python

This article, Operating On The Pandas Dataframe In Python, was updated in March 2024 on the website Hatcungthantuong.com.

Overview

DataFrame in Python

Performing Data Cleaning Operations on the Pandas DataFrame

Introduction

A DataFrame is arguably the most important structure for storing data in Python, because in practice it holds the data set we use to build our models. It is defined in the Pandas library. In any analysis done with Python, the step right after importing the required libraries is to create a DataFrame, usually by reading the data file into Python. Once the data set is stored in a DataFrame, every operation is performed on that DataFrame, so it pays to learn the operations on its rows and columns that come up in almost every case as part of Data Cleaning, and hence of the Data Preparation process.

Before moving on to the DataFrame, it would be helpful to first understand some of the basic data structures defined in Python like the Series and the built-in data structures. You will get all of this knowledge just by referring to one article:  A Beginners’ Guide to Data Structures in Python.

Table of Contents

Introducing the Dataset

Importing the Python Libraries

Reading data into a DataFrame

Subsetting a DataFrame

Renaming the Variables

Re-ordering the Variables

Creating Calculated Columns

Dropping a Variable

Filtering the Data in a DataFrame

Sorting the Data

Grouping and Binning

Creating Summaries

Introducing the Dataset

For this article, we will be using the Iris dataset which can be downloaded from here. We will use this data set to learn how these operations are actually performed on some actual data.

Importing the Python Libraries

Let’s import the Python libraries we will need for operating on a DataFrame, namely NumPy and Pandas.

import numpy as np
import pandas as pd

Reading Data into a DataFrame

Before going to the operations we first need to create a DataFrame and here we will be reading the data from a CSV (comma-separated values) file into a Pandas DataFrame naming it as df here.

df=pd.read_csv('C:/Users/ACER/Desktop/Iris.csv')

By this our data frame df is created and to have a basic look at the data we can give the command:

df.head()

Subsetting a DataFrame

By subsetting a DataFrame we mean selecting particular columns from the table. It is one of the frequently used operations. There are various ways to subset the data. We will discuss each of them one by one.

Let’s determine the column names first!

df.columns

Let’s start subsetting!

df.SepalLengthCm

Here we use the name of the column and using this method we can get the data out of a single column only.

df['Species']

Using this method we can subset a column by its name. Passing a list of names, as in df[['Species','SepalLengthCm']], returns several columns at once.

df.iloc[:,1]

By this, we get all the rows and the column with the index as 1 i.e. the second column only and hence the column is taken out using the default index. As is clear from the slicers being used here multiple columns can be taken out at the same time.

df.loc[:,['PetalLengthCm','PetalWidthCm']]

Here we get all the rows and two columns, namely PetalLengthCm and PetalWidthCm.

Re-ordering the Variables

While there is no specific way to reorder the variables in the original data frame we have two options to reorder them. Firstly, we can view the columns of a Data Frame in a specific order as per our wish by subsetting the data in that same order. Secondly, we can update the original data frame with the data subsetted in the first option.

To view the data with the column names in a specific order we can do the following:

df.loc[:,['Species','SepalLengthCm', 'PetalWidthCm', 'PetalLengthCm', 'SepalWidthCm','Id']]

However, do remember that it does not lead to any permanent change in df.

To overwrite df simply command:

df=df.loc[:,['Species','SepalLengthCm', 'PetalWidthCm', 'PetalLengthCm', 'SepalWidthCm','Id']]

Creating Calculated Columns

Also known as the derived columns, the calculated columns take their values from existing columns or a combination of them. In this case we can replace an existing column or create a new one both of which will be seen as permanent changes in the table.

Let’s take a scenario!

I want to get the area of the Sepal for some kind of analysis. How can I do so?

df['SepalArea']=df.SepalLengthCm*df.SepalWidthCm
df

A new column SepalArea is created at the end. However, it makes more sense to have the area column beside the parameter columns. That can be done too, using the insert method.

df.insert(5,'PetalArea',df.PetalLengthCm*df.PetalWidthCm)
df.head()

A new column PetalArea is created at the sixth position.

In both of the cases above the derived column was added in df. However to first view the output before making a permanent change in df we can go for the assign method.

df.assign(Ratio=df.PetalArea/df.SepalArea)

The ratio column is displayed in the output only and not added to df.

Renaming the Variables

The rename method comes as a saviour when we get a data set having misspelt column names or sometimes when the variables are not self-explanatory giving us no idea about the data they are storing.

For example, I wish the Species variable to be called NameOfSpecies.

df.rename(columns={'Species':'NameOfSpecies'},inplace=True)
df.tail()

Dropping a Variable

An extremely important step as a part of the Data Cleaning process is to remove the unnecessary variables we have in our data usually which do not affect our analysis in any way and do not relate to the given business problem we are trying to solve.

Let me show you how it is done by dropping the variables we created above!

df.drop(columns=['PetalArea','SepalArea'],inplace=True)
df.head()

Filtering the Data in a DataFrame

Filtering a data set essentially means filtering the rows which in turn refers to selecting particular rows from the data frame. This selection can be done both manually and conditionally. Let’s try filtering our data by both methods one by one!

Manual Filtering

You might have noticed that we have already filtered our data in some of the steps above. Recall: yes, using .head() and .tail().

# display the first 4 rows of df
df.head(4)

# display the last 3 rows of df
df.tail(3)

There are other ways too by which filtering can be done.

Using [ ] we can slice the data. Giving a slicer in the first argument gives us the required rows on the basis of their default index.

df[:4]

Using .iloc[ ] we can extract out the rows on the basis of their default index i.e the default row names. It takes out the rows with index from start to end – 1 if we slice as .iloc[start:end]

df.iloc[:2]

We get the rows with the default index as 0 and 1 i.e the first two rows of df.

Using .loc[ ] we can extract out the rows on the basis of their user-defined index i.e the row names. It takes out the rows with index from start to end if we slice as .loc[start:end]

df.loc[:5]

We get the rows with the User Defined Index in (0,1,2,3,4,5) i.e the first six rows of df.

You must notice that in this case, the UDI is the same as the DI.

Suppose we want to extract only some specific rows, not necessarily consecutive ones. How can that be done? Just list the individual indices and we are done!

df.iloc[[3,0,12,5,9]]

Conditional Filtering

Unlike manual filtering, where we listed the row indices ourselves, in conditional filtering we select the rows by boolean indexing, i.e. by checking conditions on the data. This can be done using [ ] and .loc[ ] on df; .iloc[ ] does not accept a boolean Series directly. Let’s learn indexing by working through some scenarios.

Task 1: Get details for virginica species.

df[df.NameOfSpecies=='Iris-virginica'].head()

Task 2 : Get details for virginica and setosa species.

Although the above method can also be used, let’s try a different approach here where we will be using .isin

names = ['Iris-setosa','Iris-virginica']
df[df.NameOfSpecies.isin(names)]

By this we get all the records where the NameOfSpecies value is Iris-setosa or Iris-virginica.

Task 3 : Get the records for which the petal length is greater than the average petal length.

df.PetalLengthCm.mean() gives the average petal length ~ 3.75.

So we get the records where the petal length is greater than 3.75(approximately).

We can combine task 2 and task 3 to get all those records where the species is virginica or setosa and petal length is more than the overall average petal length.

And we get a long data frame in this case! Let me show you a few rows and columns from it.
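The code for Task 3 and for the combined filter is not shown above. Here is a minimal sketch, assuming a small hand-made frame called demo that reuses the column names of the renamed Iris data. The frame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for df; values are illustrative only.
demo = pd.DataFrame({
    'PetalLengthCm': [1.4, 1.5, 4.7, 4.5, 6.0, 5.1],
    'NameOfSpecies': ['Iris-setosa', 'Iris-setosa',
                      'Iris-versicolor', 'Iris-versicolor',
                      'Iris-virginica', 'Iris-virginica'],
})

# Task 3: rows whose petal length exceeds the column average.
avg = demo.PetalLengthCm.mean()
above_avg = demo[demo.PetalLengthCm > avg]

# Tasks 2 and 3 combined: wrap each condition in parentheses and join with &.
names = ['Iris-setosa', 'Iris-virginica']
combined = demo[(demo.NameOfSpecies.isin(names)) & (demo.PetalLengthCm > avg)]
print(combined)
```

The parentheses matter because & binds more tightly than the comparison operators in Python.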

Sorting the Data

And now comes an interesting operation which is sorting. Our primary purpose of sorting the data in a data frame is to arrange it in order for the better readability of the data. To sort the values inside a particular column we use the sort_values method.

Let’s take a scenario where we want the sepal lengths to be in ascending order!

df.sort_values('SepalLengthCm',inplace=True)
df

What do we see?

The individual records have been sorted according to the sepal length values. (Check the row indices !)

Now, to sort the data by sepal width from highest to lowest value we can simply write the command as:

df.sort_values(by='SepalWidthCm' , ascending=False , inplace=True)
df

The data frame is changed which is evident from the jumbled row indices!

But what if I am not happy with the indices being in this way and rather want them to be ordered starting from 0 while at the same time the records should be sorted by the sepal width from highest to lowest. We can simply give another argument in the above method!

df.sort_values(by='SepalWidthCm' , ascending=False , inplace=True , ignore_index=True)
df

By this, we are just resetting the index to the default index from the user-defined index we obtained on sorting initially.

The next thing I am going to do is combine the above two examples we studied. We can actually sort the sepal length in the ascending order and within that sort the sepal width in the descending order by giving the command!

df.sort_values(by=['SepalLengthCm','SepalWidthCm'],ascending=[True,False],ignore_index=True)

Grouping and Binning

We just learned about derived columns and it’s time to introduce another kind of them. According to our business problem, the values in an existing column can be grouped or binned to make a new column known as a grouped/binned column. Why is it even done? To convert the continuous variables to categorical variables.

Both of these fall in the category of derived columns; however, they differ in one way. While binning is done only on continuous variables, grouping can be performed on categorical variables too. This is because bins are numeric intervals, which only make sense for values on a continuous scale.

But why do we even want these columns? They help us reduce the cardinality of the columns.

Let’s try grouping and binning the variables in our dataset!

Grouping 

To create groups in Python we have three main methods: two are defined in the Pandas library and one comes from the NumPy library.

Method 1 : pd.cut()

This is used to group the values of a single continuous variable only.

Task: Group the Petal length values into groups!

pd.cut(df.PetalLengthCm , [0,2,5,8])

Here we are creating user-defined groups (0,2] , (2,5] , (5,8]. We get the class intervals in ascending order.

Method 2 : pd.qcut()

Just like pd.cut(), it is used to group the values of a single continuous variable only. But it divides the values into groups having equal frequencies, i.e. each group has roughly the same number of values (tied values can make the counts differ slightly).

Task: Group the Petal width into three equal parts!

pd.qcut(df.PetalWidthCm,3)

In this case, the values inside the PetalWidthCm column are first sorted, and then the data is divided into 3 equal parts, which gives us the groups.

Method 3 : np.where()

Unlike the previous 2 methods, it can be used for one or multiple columns for any type of variable.

Task: Create a column ‘grouped’ with a few species in one category and the rest in another.

# np.where returns an array: 'Major' where the condition holds, 'Minor' elsewhere
df['grouped'] = np.where(df.NameOfSpecies.isin(['Iris-virginica']), 'Major', 'Minor')
df

Binning 

To create bins we use the pd.cut() method!

Creating 4 bins of equal class interval

pd.cut(df.SepalLengthCm , 4)

There is yet another way where we do not even need to mention the number of bins!

pd.cut(df.SepalLengthCm,range(0,10,2))
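The difference between pd.cut() (equal class intervals) and pd.qcut() (equal frequencies) is easiest to see on skewed data. Below is a small sketch, assuming a made-up Series named values with one large outlier. It is not part of the Iris data.

```python
import pandas as pd

# Hypothetical values, chosen so the two methods give visibly different groups.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

equal_width = pd.cut(values, 3)    # three intervals of equal width
equal_freq = pd.qcut(values, 3)    # three intervals with roughly equal counts

# Count how many values landed in each interval.
width_counts = equal_width.value_counts().tolist()
freq_counts = equal_freq.value_counts().tolist()
print(width_counts)
print(freq_counts)
```

With pd.cut() the outlier stretches the intervals, so almost every value lands in the first one, while pd.qcut() moves the interval edges to balance the counts.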

Creating Data Summaries

To summarize the data in Python and create tables we have three ways.

Method 1 : Using .groupby()

Task: Determine species wise total Sepal Length.

df.groupby('NameOfSpecies').SepalLengthCm.sum()

df.groupby(['NameOfSpecies','grouped']).SepalLengthCm.sum()

Task: Determine species wise total Sepal Length and average Sepal Length.

df.groupby('NameOfSpecies').SepalLengthCm.agg(['sum','mean'])

df.groupby('NameOfSpecies')[['SepalLengthCm','SepalWidthCm']].agg(['sum','mean'])

So we have grouped the data successfully and created a summary. Now let’s learn a bit about tables. There are three tables we come across: Vertical tables are those having their first row as header, horizontal tables are those having their first column as header and crosstables are those having header in both rows and columns.

To create a cross table on top of the summarized data we use the .pivot() method.

But these are two different steps. Instead, we can use just one method and do all the operations: group the data, aggregate it and create the table on top of the summarized data. This can be done using .pivot_table().

Method 2 : Using .pivot_table()

To create a cross table we can give the following command:

df.pivot_table(index='col1', columns='col2', values='col3', aggfunc='sum')

Method 3 : Using pd.crosstab()

With this method, only the cross tables can be created and it is used to create the frequency tables.

I will mention the syntax here to create a frequency table:

pd.crosstab(index=df.col1, columns=df.col2, values=df.col3, aggfunc='count')
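The two syntax lines above use placeholder column names. As a rough illustration with concrete columns, the sketch below assumes a tiny made-up frame called demo whose column names mirror the ones used in this article. The values are hypothetical.

```python
import pandas as pd

# Hypothetical frame; only the column names come from the article.
demo = pd.DataFrame({
    'NameOfSpecies': ['Iris-setosa', 'Iris-setosa',
                      'Iris-virginica', 'Iris-virginica'],
    'grouped': ['Minor', 'Minor', 'Major', 'Major'],
    'SepalLengthCm': [5.0, 5.5, 6.0, 6.5],
})

# Group, aggregate and tabulate in one step.
totals = demo.pivot_table(index='NameOfSpecies', columns='grouped',
                          values='SepalLengthCm', aggfunc='sum')

# Frequency table: how many rows fall in each (species, group) cell.
freq = pd.crosstab(index=demo.NameOfSpecies, columns=demo.grouped)
print(totals)
print(freq)
```

Cells with no matching rows show up as NaN in the pivot table and as 0 in the frequency table.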

EndNotes

Finally, we have come to the end of this article. In this article we performed various operations on a Pandas DataFrame in Python that are typically done while cleaning the data, manipulating it and preparing it for analysis. However, this is not all. Many more operations can be performed on a data frame, such as detecting and treating duplicates, outliers and missing values. These are really important steps in the EDA part and hence should not be missed.

I strongly recommend you to read this article on Exploratory Data Analysis in Python which will help you understand much more crucial operations performed on a DataFrame.

You can connect with me on LinkedIn.


Lottery Process Scheduling In Operating System

Introduction

Lottery scheduling is a process scheduling algorithm used in operating systems that assigns each process a number of “lottery tickets” based on its priority, which determines its likelihood of execution. In this article, we will talk about the lottery process scheduling algorithm and how tickets can be manipulated in it.

The Lottery Process Scheduling Algorithm

The higher the priority of a process, the more tickets it receives from the lottery process scheduling algorithm. The scheduler chooses a ticket at random from the pool of available tickets, and the process that owns the winning ticket is chosen for execution.

The lottery scheduling algorithm is probabilistic: the likelihood of a process being selected for execution is proportional to the number of tickets it holds. Every process that holds at least one ticket therefore has some chance of being selected, regardless of its priority. This, in turn, allows for a more equitable distribution of resources among processes.

The operating system keeps track of all processes that are currently awaiting execution to enable lottery scheduling. Each process is assigned a certain number of tickets based on its priority. For instance, a process with a higher priority may be assigned 100 tickets. On the other hand, a process with a lower priority may be assigned only 10 tickets.

When it’s time to schedule the next process, the lottery scheduler chooses a ticket at random from the pool of available tickets. The process holding the winning ticket is chosen for execution, and its ticket count is reduced by one. The process is then executed for a specific time slice before being returned to the pool of available processes.
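The draw described above can be sketched in a few lines. This is only an illustration, not an operating system implementation: the function name draw_winner and the process names A and B are hypothetical, the ticket counts 100 and 10 are the example figures from the text, and the sketch leaves out the time slice and the ticket reduction after a win.

```python
import random

def draw_winner(tickets, rng):
    """Pick a process name with probability proportional to its ticket count."""
    total = sum(tickets.values())
    winning_ticket = rng.randrange(total)   # ticket numbers 0 .. total-1
    for name, count in tickets.items():
        if winning_ticket < count:
            return name
        winning_ticket -= count             # skip past this process's tickets

# Hypothetical processes: A has high priority, B has low priority.
tickets = {'A': 100, 'B': 10}
rng = random.Random(0)                      # seeded so runs are repeatable
wins = {'A': 0, 'B': 0}
for _ in range(11_000):
    wins[draw_winner(tickets, rng)] += 1
print(wins)
```

Over many draws the share of wins should approach the share of tickets, which is what makes the scheme proportional rather than strictly priority-ordered.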


Lottery Scheduling algorithm as a probabilistic algorithm

The lottery scheduling algorithm is a probabilistic algorithm. This means that the likelihood of a process being selected for execution is proportional to the number of tickets it holds. Every process that holds a ticket has a chance of being selected regardless of priority, which allows for a more equitable distribution of resources among processes.

When the next process is to be run, the lottery scheduler selects a ticket at random from the pool of available tickets. The winning process is chosen for execution, and its ticket count is reduced by one. The process is then run for a set amount of time before being returned to the pool of available processes.

Manipulating tickets in the Lottery Process Scheduling algorithm

The lottery tickets are typically manipulated based on the priority of each process. Higher-priority processes are assigned more tickets than lower-priority processes, increasing their chances of being selected for execution. However, there are a few different ways to manipulate tickets in lottery scheduling −

Static Distribution − The number of tickets assigned to each process in this method is fixed and does not change over time. For instance, a process with a higher priority may be assigned 100 tickets. On the other hand, a process with a lower priority may be assigned only 10 tickets. This method is simple to implement, but it may not result in the most efficient or equitable resource distribution.

Dynamic Distribution − In this method, the number of tickets allocated to each process may change over time according to the system’s behavior. For example, if a high-priority process is taking up resources and starving other processes, its ticket count may be reduced to give other processes a better chance of being selected. Although this technique has a greater computation overhead, it may result in more effective and more equal resource allocation.

Weighted Distribution − In this method, the number of tickets assigned to each process is determined by factors other than its priority alone. Other factors, such as the amount of processing power it has already consumed, also play a role. Despite having the same priority, a process that has consumed a lot of processing power may be allocated fewer tickets than a process that has used very little CPU time. This approach can be difficult to put in place, but it can help prevent processes from monopolizing resources.
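To make the difference between static and weighted distribution concrete, here is a rough sketch. The function names and the halving rule are assumptions made up for this example; real schedulers may weigh CPU use quite differently. Only the counts 100 and 10 come from the text.

```python
def static_tickets(priority):
    """Static distribution: a fixed ticket count per priority level."""
    return {'high': 100, 'low': 10}[priority]

def weighted_tickets(priority, cpu_time_used):
    """Weighted distribution: start from the static count, shrink with CPU use."""
    base = static_tickets(priority)
    # Hypothetical rule: halve the tickets for every 10 units of CPU time used,
    # but never drop below one ticket so that no process is locked out.
    return max(1, base // (2 ** (cpu_time_used // 10)))

fresh = weighted_tickets('high', 0)     # has used no CPU time yet
busy = weighted_tickets('high', 25)     # same priority, heavy CPU use
print(fresh, busy)
```

Two processes with the same priority end up with different ticket counts, which is the behaviour the weighted method is meant to produce.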

Conclusion

Lottery scheduling is an effective algorithm for process scheduling in operating systems. It is especially effective when a fair distribution of resources is required. In this article, we explored the algorithm in detail, along with its probabilistic nature and the ways in which tickets can be manipulated.

Top Interview Questions On Dictionary In Python

This article was published as a part of the Data Science Blogathon.

Intro

In Python, a dictionary is a collection of data values stored as key: value pairs within curly braces. Since Python 3.7 a dictionary preserves insertion order, but its items are accessed by key rather than by position. The keys in the dictionary are unique (can’t be repeated), whereas values can be duplicated. Questions on dictionaries are often asked in interviews because dictionaries are used so heavily in projects.

Therefore, having a good knowledge of dictionaries is important for every Data Scientist aspirant.

In this article, some critical theoretical as well as practical questions will be discussed, which will help aspirants have a good understanding of the Dictionary.

Interview Questions on Dictionary

Question 1: What is a dictionary?

A dictionary is a set of key: value pairs in which each key is unique. An empty dictionary can be created by using empty braces {}, and key: value pairs can then be added to it.

eg- dictionary1 = {'a': 1, 'b': 2, 'c': 3}

Question 2: Are dictionaries case-sensitive?

Yes, dictionary keys are case-sensitive: the same name in a different case is a different key, so ‘apple’ and ‘APPLE’ will be treated as separate keys.

Question 3: What are different ways of creating a Dictionary?

Four different ways of creating a Dictionary are:

1. Create an empty Dictionary

Dictionary1 = {}
print(Dictionary1)

Output:

{}

key1 = 'a'
value1 = 1
Dictionary1[key1] = value1
print(Dictionary1)

Output:

{'a': 1}

2. Create Dictionary using dict() method

Dictionary1 = dict({1: 'a', 2: 'b'})
print(Dictionary1)

Output:

{1: 'a', 2: 'b'}

3. Create Dictionary with each item as Pair

Dictionary1 = dict([(1,'a'), (2, 'b')])
print(Dictionary1)

Output:

{1: 'a', 2: 'b'}

4. Creating Dictionary directly

Dictionary1 = {1: 'a', 2: 'b'}

Output:

{1: 'a', 2: 'b'}

Question 4: What is a Nested Dictionary? How is it created?

A dictionary inside the dictionary is known as a “Nested Dictionary”. For ex-

dictionary1 = {1: {'roll': '101', 'name': 'sam'},
               2: {'roll': '102', 'name': 'ram'}}
print(dictionary1)

Output

{1: {'roll': '101', 'name': 'sam'}, 2: {'roll': '102', 'name': 'ram'}}

The elements of nested dictionary can be accessed using

print(dictionary1[1]['roll'])

Output:

101

Question 5: How do you add an element in Dictionary?

Elements in a Dictionary can be added in multiple ways:

1. Adding one pair at a time

Dict1 = {}
Dict1[0] = 'a'
Dict1[1] = 'b'
print("Dictionary after adding 2 elements:", Dict1)

Output:

Dictionary after adding 2 elements: {0: 'a', 1: 'b'}

2. Adding more than one value to a single key

Dict1['values'] = 4, 5, 6
print("Dictionary after adding multiple values to a key:", Dict1)

Output:

Dictionary after adding multiple values to a key: {0: 'a', 1: 'b', 'values': (4, 5, 6)}

3. Adding nested key-value pair

Dict1['Nested'] = {1: 'Analytics', 2: 'Life'}
print(Dict1)

Output:

{0: 'a', 1: 'b', 'values': (4, 5, 6), 'Nested': {1: 'Analytics', 2: 'Life'} }

Question 6: Discuss different methods used with Dictionary.

Various methods used with Dictionary are:

1. clear()

It is used to delete all elements from a dictionary i.e., to create empty dictionary.

dict2 = {1: 'Analytics', 2: 'Vidhya'}
dict2.clear()
print(dict2)

Output:

{}

2. get()

It is used to get the value of the specified key.

dict2 = {1: 'Analytics', 2: 'Vidhya'}  # re-create dict2, since clear() emptied it
x = dict2.get(2)
print(x)

Output:

Vidhya

3. copy()

It is used to return copy of a dictionary

dict3 = dict2.copy()
print(dict3)

Output:

{1: 'Analytics', 2: 'Vidhya'}

4. items()

It is used to return a view of tuples, each consisting of a key-value pair.

Dict1 = {1: 'Analytics', 2: 'Vidhya'}
print(Dict1.items())

Output:

dict_items([(1, 'Analytics'), (2, 'Vidhya')])

5. keys() and values()

Returns all keys and values within a dictionary respectively.

print(Dict1.keys())
print(Dict1.values())

Output:

dict_keys([1, 2])
dict_values(['Analytics', 'Vidhya'])

6. update()

This method updates the value of a key in the dictionary; if the key does not exist yet, the pair is added.

Dict1.update({2:"Blogathon"})
print(Dict1)

Output:

{1: 'Analytics', 2: 'Blogathon'}

Question 7: Create a dictionary from a given list. For instance-

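Input : [1, 'a', 2, 'b', 3, 'c']
Output : {1: 'a', 2: 'b', 3: 'c'}

def Convert_list_dict(lst):
    x = iter(lst)
    # zip(x, x) pulls two items from the same iterator at each step,
    # pairing every element with the one that follows it
    res_dct1 = dict(zip(x, x))
    return res_dct1

list1 = [1, 'a', 2, 'b', 3, 'c']
print(Convert_list_dict(list1))

Here, the zip() function takes iterables (there can be more than two) and combines their elements into tuples. Because both arguments are the same iterator, each tuple holds two consecutive elements of the list.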

Output:

{1: 'a', 2: 'b', 3: 'c'}

Question 8: Create a list of tuples from the dictionary

The list of tuples can be created in following way:

dict1 = { 1: 'a', 2: 'b', 3: 'c' }
lst1 = list(dict1.items())
print(lst1)

Output:

[(1, 'a'), (2, 'b'), (3, 'c')]

Question 9: Create a list from the dictionary.

Suppose the given dictionary is:

dict1 = { 1: 'a', 2: 'b', 3: 'c' }

A list can be created using the below code:

x = list(dict1.keys())
y = list(dict1.values())
for i in y:
    x.append(i)
print(x)

Output:

[1, 2, 3, 'a', 'b', 'c']

Question 10: How can you delete key-value pair from Dictionary?

Key-value pair can be deleted by using ‘del’ keyword as shown below:

del dict1[1]
print(dict1)

Output:

{2: 'b', 3: 'c' }

Question 11: Is the dictionary mutable?

The term ‘Mutable’ means we can add, remove or update key-value pairs in a dictionary.

Yes, the dictionary is mutable. For instance,

Dict1 = {1: 'a', 2: 'b', 3: 'c', 4: 'd' }
Dict1[2] = 'h'
print(Dict1)

Output:

{1: 'a', 2: 'h', 3: 'c', 4: 'd' }

Question 12: Given two lists, create a dictionary from them.

Input: [ 1, 2, 3, 4, 5], [‘a’, ‘b’, ‘c’, ‘d’, ‘e’]

Output: {1: ‘a’, 2: ‘b’, 3: ‘c’, 4: ‘d’, 5: ‘e’}

Let’s define these two lists as list1 and list2 as follows:

list1 = [1, 2, 3, 4, 5]
list2 = ['a', 'b', 'c', 'd', 'e']
dict1 = {}
for i, j in zip(list1, list2):
    dict1[i] = j
print(dict1)

Output:

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Another way of achieving the same output:

dict1 = {i: j for i, j in zip(list1, list2)}
print(dict1)

Output:

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Question 13: Write a code to sort dictionaries using a key.

Input: {2: 'Apple', 1: 'Mango', 3: 'Orange', 4: 'Banana'}

Output:

1: Mango
2: Apple
3: Orange
4: Banana

Below is the code to sort a dictionary by its keys:

dict1 = {2: 'Apple', 1: 'Mango', 3: 'Orange', 4: 'Banana'}
print(sorted(dict1.keys()))
for key in sorted(dict1):
    print(f"{key}: {dict1[key]}")

Output:

[1, 2, 3, 4]
1: Mango
2: Apple
3: Orange
4: Banana

Conclusion

In this blog, we studied some of the important and frequently asked interview questions on Dictionary. To sum up, the following are the major contributions of the article:

1. Basic concepts of the Dictionary have been discussed to make the reader familiar with it.

2. We learned how to perform various functions on Dictionary, such as adding key-value pairs and deleting key-value pairs.

3. We discussed various functions that can be used to work and play with Dictionary.

4. Further, we also discussed several programming questions on Dictionary that can be asked in interviews.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Quick Notes On The Basics Of Python And The Numpy Library

This article was published as a part of the Data Science Blogathon.     

      “Champions are brilliant at the basics”

Quick Basics of python

1. What is an interpreter?

2. Difference between the virtual environment and the existing interpreter?

Ans: The difference is that if you use a virtual environment for your project and add/remove packages then it will only affect the virtual environment. If you use an existing interpreter then all changes will affect the system-wide interpreter and these changes will be available in all the projects that use that interpreter.

3. What is a pip?

Ans: Pip is a standard package management system used to install and manage the software packages written in python.

4. What are the various commands of pip?

Ans: Below are the various commands of pip to be run in the command prompt/terminal

5. What are variables?

Ans:  Variables are used to store information to be referenced and manipulated in a program. In python, we don’t need to explicitly mention the datatype during declaration. A string variable cannot be manipulated by mathematical actions.

6. What are the basic operations in python?

Ans: We have ‘+’ (addition); ‘-’ (subtraction); ‘*’ (multiplication); ‘/’ (division); ‘%’ (modulus, meaning the remainder of a division); ‘**’ (power, i.e. a ** b is a to the power of b); ‘//’ (floor division, which gives the quotient of a division without the decimal part).
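A quick way to see each operator in action is to apply all of them to the same pair of numbers. The values 7 and 2 below are arbitrary picks for illustration.

```python
a, b = 7, 2

print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(a / b)    # true division, always returns a float
print(a % b)    # modulus: remainder of the division
print(a ** b)   # power: a raised to b
print(a // b)   # floor division: quotient rounded down

# Collect the same results in one tuple for easy comparison.
results = (a + b, a - b, a * b, a / b, a % b, a ** b, a // b)
```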

7. What are string indexing and slicing?

Ans: Index is the position of each character starting from 0. Slicing is getting the substring i.e. subset out of a string value or a word or sentence.

You slice your butter so that the chunks can be used for various purposes !! Right !! SAME IS APPLICABLE HERE AS WELL !! 

Suppose we have a string variable having ‘stay positive’ stored in it. The below picture shows the indexing of the elements in the string. 

Below is a code snippet having examples of indexing and slicing.
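The original snippet is not reproduced here, so the following is a small stand-in that uses the same ‘stay positive’ string. The variable names are chosen only for this example.

```python
text = 'stay positive'

first = text[0]          # character at index 0
last = text[-1]          # negative indices count from the end
word1 = text[0:4]        # indices 0 to 3; the end index is excluded
word2 = text[5:]         # from index 5 to the end of the string
every_other = text[::2]  # every second character, starting at index 0

print(first, last, word1, word2, every_other)
```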

8. What is mutable and immutable property?

Ans: Mutability means you can change the values of an object after it is created, and the Immutable property of an object means it cannot be changed after it is created.

9. What are the different data structures of python?

Ans: Below are different types of data structures :

9.1) Lists : These are data types that hold elements of the same or different datatypes together in a collection, in a sequential manner. They are enclosed in square brackets [ ]. Lists are mutable and indexable in nature, and also allow duplicates. There are many list methods like list.append(), list.pop(), list.reverse(), list.sort(), list.count(), list.insert(), list.remove() etc. for performing various list operations, a few of which are shown in the code snippet below.

NOTE:  Accessing and indexing elements in the list are the same as the Q7(indexing and slicing) topic explained above.
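As a stand-in for the list snippet, here is a short sketch that exercises the methods named above on a made-up list called nums.

```python
nums = [3, 1, 2, 3]      # duplicates are allowed

nums.append(5)           # add 5 at the end
nums.insert(1, 9)        # put 9 at index 1
nums.remove(9)           # remove the first occurrence of 9
popped = nums.pop()      # remove and return the last item
threes = nums.count(3)   # how many times 3 occurs
nums.sort()              # sort in place, ascending
nums.reverse()           # reverse in place

print(nums, popped, threes)
```

Note that sort() and reverse() change the list itself and return None, which is a common source of bugs when their result is assigned to a variable.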

9.2) Tuples: These are similar to lists with two major differences: first, they are enclosed within round brackets ( ), and second, they are immutable in nature. There are two built-in methods that can be used on tuples, index() and count(). A code snippet for the same is mentioned below:

NOTE: Accessing and indexing elements in the tuples are the same as the Q7(indexing and slicing) explained above.
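A minimal sketch of the two tuple methods and of immutability follows. The tuple point and the other names are invented for this example.

```python
point = (4, 7, 4)

position = point.index(7)   # index of the first 7
fours = point.count(4)      # number of times 4 occurs

try:
    point[0] = 1            # tuples are immutable, so this raises TypeError
    changed = True
except TypeError:
    changed = False

print(position, fours, changed)
```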

9.4) Dictionaries: A dictionary in Python is a data structure that stores values against keys, i.e. as key-value pairs. It is enclosed within curly braces holding key:value pairs, e.g. {key1:val1}

In the above example, ‘mydict’ is a dictionary that stores the number of students present in each class. So classA is key and 30 is its value. 

Dictionaries do not allow duplicate keys and are mutable in nature. Below is the code snippet having dictionary examples:

There are a few more functions, like get(keyname), which returns the value of that key, update(), popitem(), etc. We can also have nested dictionaries, or a list as the value for a key, e.g. key1 : [1,2,3].
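The dictionary examples can be sketched as follows. The key classA and its value 30 come from the text above; classB, classC and their values are made up for illustration.

```python
# classA and 30 come from the text; the other entries are hypothetical.
mydict = {'classA': 30, 'classB': 25}

mydict['classC'] = 28                # add a new key
mydict.update({'classB': 27})        # change an existing value
a_count = mydict.get('classA')       # value stored under classA
missing = mydict.get('classZ', 0)    # default returned when the key is absent
last_item = mydict.popitem()         # remove the most recently inserted pair

print(mydict, a_count, missing, last_item)
```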

What is the use of dictionaries  ???

Well, there might be scenarios wherein you will have to count the number of occurrences of an item in a list, then you can easily compute this using dictionary. Another example is using a dictionary like a lookup file wherein you might have a set of static key-value pairs to refer to. Also, dictionaries are used in backend code while building APIs. Hence with dictionaries in place, many operations like I mentioned above become easier to deal with.
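The counting scenario mentioned above can be written in a few lines. The list of fruit names is an invented example.

```python
items = ['apple', 'pear', 'apple', 'fig', 'apple', 'pear']

counts = {}
for item in items:
    # get() returns 0 the first time an item is seen.
    counts[item] = counts.get(item, 0) + 1

print(counts)
```

The standard library's collections.Counter does the same job in one call, but the loop shows what the dictionary is doing underneath.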

10. What are the various common libraries used in Data Science?

Ans: Common libraries are :

11. Why is NumPy required when we have Python lists, since both do the same work of storing data in array form?

Ans: Both can store data, but NumPy is better since it takes less memory than a list. A NumPy array is also faster than a list.

Now the question is HOW? Please follow the code snippet below, which shows HOW IT TAKES LESS MEMORY AND IS FASTER THAN LISTS.

In the above code, we have compared the memory used by the list and the memory used by the NumPy array. A single integer element in the list takes 28 bytes, whereas an element of the NumPy array takes only 4 bytes. This is because a list stores pointers to separate Python objects, each of which carries its own overhead on top of the value, whereas a NumPy array stores the raw values side by side in one block of memory without per-element pointers. Hence IT TAKES LESS MEMORY.
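A hedged sketch of such a memory comparison. It assumes a 64-bit CPython build, where a small int object is typically 28 bytes, and it fixes the array dtype to int32 because NumPy's default integer size depends on the platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int32)  # dtype fixed to int32 for a 4-byte item

# Each list element is a full Python int object
bytes_per_list_item = sys.getsizeof(py_list[-1])
# Each array element is a raw fixed-width value
bytes_per_array_item = np_array.itemsize

print("list item :", bytes_per_list_item, "bytes")
print("array item:", bytes_per_array_item, "bytes")
print("array data:", np_array.nbytes, "bytes in total")
```

The list itself needs additional memory for one pointer per element, so the real gap is larger than the per-item numbers suggest.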

HOW NUMPY ARRAY  OPERATIONS ARE FASTER THAN LIST ?? 

Let us PROVE IT IN BELOW CODE SNIPPET

In the above code, we have computed the time taken by the addition of two lists each having 1 million records, which took 142.3 seconds, whereas when we performed the same operation on two arrays with the same number of records, the computation took 0.0 seconds !!!!! WOW!!!!

HENCE PROVED! Numpy array is much faster than a list.
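A rough way to reproduce this kind of comparison is sketched below. It uses time.perf_counter for timing, which is an assumption and not necessarily what the original snippet used, and the timings will differ from machine to machine.

```python
import time
import numpy as np

size = 1_000_000  # one million elements, as in the text

list_a = list(range(size))
list_b = list(range(size))
start = time.perf_counter()
list_sum = [x + y for x, y in zip(list_a, list_b)]  # element by element in Python
list_seconds = time.perf_counter() - start

array_a = np.arange(size)
array_b = np.arange(size)
start = time.perf_counter()
array_sum = array_a + array_b  # one vectorised operation in compiled code
array_seconds = time.perf_counter() - start

print(f"list : {list_seconds:.4f} s")
print(f"array: {array_seconds:.4f} s")
```

The array version is faster mainly because the loop over the elements runs inside NumPy's compiled code instead of the Python interpreter.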

In real-world projects, we have a huge amount of data that needs to be processed and analyzed to get useful and strategic information out of it. Hence NumPy arrays are a better choice than lists.

12. Can we create and access the n-D(n-dimension) array using the Numpy library?

Ans: Definitely, this is one more key feature of the Numpy array. We can create an n-dimensional array using the array() method of Numpy by passing a list, tuple, or an array-like object.

In order to know the number of dimensions an array has, we have the “ndim” attribute of Numpy arrays.

We can also explicitly define the dimension for an array by using “ndmin” argument of the Numpy array() method.

There is a “dtype” attribute that returns the data type of the array. We can also define the data type of an array by passing the dtype argument to the array() method.

Below is the code snippet for the same
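A minimal sketch of array(), ndim, ndmin and dtype. The array names match the ones used later in the article; the values themselves are made up.

```python
import numpy as np

oneD_array = np.array([1, 2, 3])
TwoD_array = np.array([[1, 2, 3], [4, 5, 6]])
ThreeD_array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(oneD_array.ndim, TwoD_array.ndim, ThreeD_array.ndim)  # number of dimensions

# ndmin pads the shape with leading axes of length 1
five_d = np.array([1, 2, 3], ndmin=5)
print(five_d.ndim, five_d.shape)

# dtype can be read back, or set when the array is created
as_float = np.array([1, 2, 3], dtype=np.float64)
as_text = np.array((1, 2, 3), dtype=str)  # a tuple works as input too
print(as_float.dtype, as_text.dtype)
```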

13. How to index, access, and perform slicing on an n-D NumPy array?

Ans: Indexes are the positions assigned internally to the elements of the n-D array. Keeping this in mind, an n-D array can be accessed, manipulated, etc.

Please follow the below code snippets to understand how to access and slice the n-dimensional arrays.

The above picture represents how the indexes are represented in an n-dimensional array. Using the indexes, we can access array elements and perform slicing.

SLICING:  The concept of slicing remains the same as mentioned in the above queries. The syntax for slicing is arrayName[startIndex:stopIndex:step(optional)]

Be it 1-D,2-D, or n-D, array slicing works the same.

The array examples used in the code snippet below are the same as in the example above, i.e. oneD_array, TwoD_array and ThreeD_array. Please refer to the array declarations in the above code snippet.
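A self-contained sketch of indexing and slicing. The three arrays are redefined here with made-up values so the block runs on its own.

```python
import numpy as np

oneD_array = np.array([10, 20, 30, 40, 50])
TwoD_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
ThreeD_array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Indexing: one index per dimension, counted from 0
a = oneD_array[1]          # second element
b = TwoD_array[2, 0]       # row 2, column 0
c = ThreeD_array[1, 0, 1]  # block 1, row 0, column 1
d = TwoD_array[-1, -1]     # negative indexes count from the end

# Slicing: arrayName[startIndex:stopIndex:step], stopIndex is excluded
e = oneD_array[1:4]        # elements at positions 1, 2, 3
f = oneD_array[::2]        # every second element
g = TwoD_array[0:2, 1:]    # first two rows, columns 1 onwards
h = ThreeD_array[:, 1, :]  # row 1 of every block

print(a, b, c, d)
print(e, f, g, h, sep="\n")
```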

14. What are various methods and attributes in Numpy?

Ans: NumPy has various attributes and methods that tell you the size of the array, its shape (rows x columns), the size of each element and the data type, and that let you change the shape (reshape), among many others. A few are listed in the code snippet below:
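A brief sketch of these attributes and of reshape(). The array contents are arbitrary, and the dtype is fixed to int64 so that the item size is the same on every platform.

```python
import numpy as np

arr = np.arange(12, dtype=np.int64)  # twelve values 0..11

print(arr.size)      # total number of elements
print(arr.shape)     # length along each dimension
print(arr.itemsize)  # bytes per element
print(arr.dtype)     # data type of the elements

reshaped = arr.reshape(3, 4)  # 3 rows x 4 columns, same data
print(reshaped.shape)
flat = reshaped.flatten()     # back to one dimension, as a copy
```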

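We also have the copy and view methods, both of which give you a new array object from an existing array. But these two methods have a major difference internally: a copy owns its own data, so later changes to the original do not affect it, whereas a view shares the data of the original, so changes made to one are visible in the other. Please find the code snippet below, in which the difference is shown in a practical way: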
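The difference between the two can be sketched with a small made-up array:

```python
import numpy as np

original = np.array([1, 2, 3, 4])

copied = original.copy()  # owns its own data
viewed = original.view()  # shares data with 'original'

original[0] = 99          # change the original afterwards

print(copied)             # the copy is unaffected
print(viewed)             # the view sees the change
print(copied.base is None, viewed.base is original)
```

The base attribute is one way to check which case you have: it is None for an array that owns its data and refers to the source array for a view.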

Numpy also has many more methods and attributes like :

and MANY MORE. You can go through all of them on the Numpy website. The ones that I have listed are the common ones that are frequently used.

ARRAY OPERATIONS : 

Addition, subtraction, multiplication and division between two arrays are very easy and work element by element. Below is the code snippet for adding and subtracting. The others can be done in the same manner, viz. (a*b), (a/b).
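A minimal sketch of element-wise arithmetic on two arrays of the same shape. The values of `a` and `b` are made up.

```python
import numpy as np

a = np.array([10, 20, 30])
b = np.array([1, 2, 5])

print(a + b)  # element-wise addition
print(a - b)  # element-wise subtraction
print(a * b)  # element-wise multiplication
print(a / b)  # element-wise true division, the result is float
```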

15. Numpy array has an AMAZING PROPERTY !!  What is that ??

Ans: Let us assume we have a 2-D array and want to check whether each element is greater than 10, getting True where it is and False where it is not. In return, we get a TRUE/FALSE MATRIX of the same shape. Below is the code snippet:

Now, what if I want the values of the array ‘arr’ that are greater than 10? This can be achieved in just one line, as shown in the code snippet below:

We can also replace these values with a specific flag value like -1 or 0 or anything else.

The code snippet is as follows:

So all the elements greater than 10 are replaced by -1.
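The three steps above can be sketched as follows. The name `arr` follows the text; its values are invented for illustration.

```python
import numpy as np

arr = np.array([[5, 12, 7], [20, 3, 15]])

mask = arr > 10       # True/False matrix with the same shape as arr
selected = arr[mask]  # 1-D array of the values greater than 10

flagged = arr.copy()  # work on a copy so 'arr' itself stays unchanged
flagged[flagged > 10] = -1

print(mask)
print(selected)
print(flagged)
```

Working on a copy is a choice made for this sketch; assigning through the mask on `arr` directly would modify it in place.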

THAT IS ALL FOR THIS ARTICLE !!! 

ENDNOTES: 

I am sure, you, as a beginner must have found this article to be useful. All the necessary basics have been covered and I have tried to cover the concepts in detail with the practicals where most people find difficulty in understanding. Thank you for your time.

The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.


How To Create A Correlation Matrix Using Pandas?

Correlation analysis is a crucial technique in data analysis, helping to identify relationships between variables in a dataset. A correlation matrix is a table showing the correlation coefficients between variables in a dataset. It is a powerful tool that provides valuable insights into the underlying patterns in the data and is widely used in many fields, including finance, economics, social sciences, and engineering.

In this tutorial, we will explore how to create a correlation matrix using Pandas, a popular data manipulation library in Python.

To generate a correlation matrix with pandas, the following steps must be followed −

Acquire the data

Construct a pandas DataFrame

Produce a correlation matrix using pandas

Example

Now let’s work on different examples to understand how we can create correlation matrices using pandas.

This code demonstrates how to use the pandas library in Python to create a correlation matrix from a given dataset. The dataset contains three variables: Sales, Expenses, and Profit for three different time periods. The code creates a pandas DataFrame using the data and then uses the DataFrame to create a correlation matrix.

The correlation coefficients between Sales and Expenses and Sales and Profit are then extracted and displayed along with the correlation matrix. The correlation coefficients indicate the degree of correlation between two variables, with a value of “1” representing perfect positive correlation, “-1” representing perfect negative correlation, and “0” indicating no correlation.

Consider the code shown below.

    # Import the pandas library
    import pandas as pd

    # Create a dictionary containing the data to be used in the correlation analysis
    data = {
        'Sales': [25, 36, 12],     # Values for sales in three different time periods
        'Expenses': [30, 25, 20],  # Values for expenses in the same time periods
        'Profit': [15, 20, 10]     # Values for profit in the same time periods
    }

    # Create a pandas DataFrame using the dictionary
    sales_data = pd.DataFrame(data)

    # Use the DataFrame to create a correlation matrix
    correlation_matrix = sales_data.corr()

    # Display the correlation matrix
    print("Correlation Matrix:")
    print(correlation_matrix)

    # Get the correlation coefficient between Sales and Expenses
    sales_expenses_correlation = correlation_matrix.loc['Sales', 'Expenses']

    # Get the correlation coefficient between Sales and Profit
    sales_profit_correlation = correlation_matrix.loc['Sales', 'Profit']

    # Display the correlation coefficients
    print("Correlation Coefficients:")
    print(f"Sales and Expenses: {sales_expenses_correlation:.2f}")
    print(f"Sales and Profit: {sales_profit_correlation:.2f}")

Output

On execution, you will get the following output −

    Correlation Matrix:
                 Sales  Expenses    Profit
    Sales     1.000000  0.541041  0.998845
    Expenses  0.541041  1.000000  0.500000
    Profit    0.998845  0.500000  1.000000
    Correlation Coefficients:
    Sales and Expenses: 0.54
    Sales and Profit: 1.00

The values on the diagonal represent the correlation between a variable and itself, therefore the diagonal values indicate a correlation of 1.

Example

Let’s explore one more example. Consider the code shown below.

In this example, we create a simple DataFrame with three columns and three rows. We then use the .corr() method on the DataFrame to calculate the correlation matrix, and finally print the correlation matrix to the console.

    # Import the pandas library
    import pandas as pd

    # Create a sample data frame
    data = {
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]
    }
    df = pd.DataFrame(data)

    # Create the correlation matrix
    corr_matrix = df.corr()

    # Display the correlation matrix
    print(corr_matrix)

Output

On execution, you will get the following output −

         A    B    C
    A  1.0  1.0  1.0
    B  1.0  1.0  1.0
    C  1.0  1.0  1.0

Conclusion

In conclusion, creating a correlation matrix using pandas in Python is a straightforward process. First, a pandas DataFrame is created with the desired data, and then the .corr() method is used to calculate the correlation matrix. The resulting correlation matrix provides valuable insights into the relationships between the different variables, with the diagonal values indicating the correlation of each variable with itself.

The correlation coefficients range from -1 to 1, where values closer to -1 or 1 indicate stronger correlation, while values closer to 0 indicate weaker or no correlation. Correlation matrices are useful in a wide range of applications, such as data analysis, finance, and machine learning.

Full Guide On Whatsapp Automation Using Python

This article was published as a part of the Data Science Blogathon.

Overview

Introduction

What is Whatsapp automation using python?

What are its features?

Let’s code!

Errors and exceptions

Handling error

Can we do it another way?

Conclusion

Introduction

Imagine you turn on your PC and a “Good Morning!” message is automatically sent to your WhatsApp contact without you doing anything. This is what we are going to create, along with various other features.

What is Whatsapp Automation Using Python?

It is a utility that will save you time and make you look punctual to other people. It basically automates WhatsApp Web and sends the message.

What are its Features?

So let’s understand it backwards, what are its features and how will it work.

You turn on your PC.

This program runs automatically.

It waits for 2 minutes so that it does not add load to the PC (many programs, such as antivirus software, start at the same time as the PC and slow it down, so this program waits its turn).

It checks for the file (the database of the program).

If the file does not exist,

it creates the file, sends the message and then updates the file.

If the file exists,

it checks whether the last date (the database holds the dates on which messages were sent) is from another year. If so, it deletes all the contents of the file (to free up the space taken by it), sends the message and writes the current date to the file.

If the year of the last date is the same as the current year, it moves straight on.

If the last date in the file is not the current date, it sends the message and updates the database.

If the last date in the file is the current date, it only performs the year check and then closes the program.

If there is a problem in sending the message, it does not update the database and notifies the user that there was a problem in sending the message.

Whatsapp logo

Let’s code!

Now that we know everything our program will do, we can start creating it.

First of all, we need to download the modules/libraries required by the program.

The modules are “time”, “datetime”, “selenium”, “os” and “plyer”.

Out of all these modules, only selenium (version 3.141.0 here) and plyer (version 2.0.0 here) need to be downloaded; the rest come preinstalled with Python 3.

Using this command in the terminal, you can download and install both of them.

pip install selenium plyer

Now we have installed the modules and now we can import them into our program and use it. And below we will start our coding.

    import time
    time.sleep(120)
    import datetime
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.keys import Keys
    import os
    from plyer import notification

Since our program will start automatically when the PC starts, we make it sleep for 2 minutes so that it does not add load during startup (as explained in the what are its features section). The first line imports the time module and the second line makes the program sleep. The following lines import, in order, the datetime module, webdriver from selenium, Options from selenium, Keys from selenium, the os module and, finally, notification from plyer (the same as in our infinite timer using python program).

Below is why we are importing all these modules

time – mainly for making our program sleep

datetime – To work with dates and years (To update the database)

webdriver – It is used to work with the browser and the website

Options – It is used to add arguments to the browser like which extensions to use and which user account to use and maximize the window and much more.

Keys – It is used to work with the keys of the keyboard or hotkeys like Ctrl+A and Ctrl+C or Enter.

os – Exiting the program

notification – For notification (If an error occurs while sending the message, as explained in what are its features section )

Note: YOU CAN ADJUST THE SLEEP TIME ACCORDING TO YOUR CONVENIENCE. IF YOUR PC IS ALWAYS CONNECTED TO A WIFI OR ETHERNET, THEN YOU CAN DECREASE THE SLEEP TIME OR IF YOUR PC IS FAST ENOUGH, THEN ALSO YOU CAN DECREASE THE SLEEPING TIME AS IT IS DEPENDENT ON PC TO PC.

Now the main messenger() function starts here

    def messenger():
        try:
            message_content = "Good morning!"
            options = webdriver.ChromeOptions()
            # executable_path and find_elements_by_* belong to the Selenium 3 API installed above
            driver = webdriver.Chrome(executable_path=path, options=options)
            driver.minimize_window()
            driver.get(url)
            time.sleep(20)
            type_it = driver.find_elements_by_class_name('_13NKt')
            time.sleep(20)
            try:
                type_it[1].send_keys(message_content + Keys.ENTER)
            except IndexError as e:
                time.sleep(20)
                type_it = driver.find_element_by_xpath('/html/body/div[1]/div[1]/div[1]/div[4]/div[1]/footer/div[1]/div/span[2]/div/div[2]/div[1]/div/div[2]')
                type_it.send_keys(message_content + Keys.ENTER)
                print(e)
            time.sleep(10)
            driver.quit()
        except Exception as e:
            notification.notify(
                title = "Whatsapp message not sent",
                message = "Error while sending!",
                app_name = "Whatsapp Message error",
                toast = True,
            )
            print(e)
            os._exit(0)

In the first line of the above code, we define the messenger() function. During automation there may be errors (such as internet problems or other issues), so we use a try-except block to handle them.

Therefore, we use the try statement in the second line. In the next line, we create the variable url, which contains the URL of WhatsApp Web (the URL includes the phone number of the person to whom you want to send the message). In the line after that, we have the variable message_content, which contains the message to be sent (you can change it as you want).

The next line contains the path variable, which holds the path of the chrome driver. (Note: the ‘r’ placed before the string marks it as a raw string, in which backslashes are not treated as escape characters. For example, ‘\n’ normally produces a new line, so printing it shows a line break rather than the characters themselves. To print the characters, we can use either ‘\\n’ or r’\n’; both print ‘\n’. In a Windows path like ‘C:\users’, a ‘\’ may cause problems, so we write the path as a raw string.) More about the path in the handling error section.
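A small sketch of how normal, escaped and raw strings differ. The path below is a hypothetical example and not the author's actual chromedriver location.

```python
normal = "a\nb"    # \n is an escape sequence: one newline character
escaped = "a\\nb"  # doubled backslash: a literal backslash followed by n
raw = r"a\nb"      # raw string: the backslash is kept as typed

print(normal)      # printed on two lines
print(escaped)     # printed as a\nb
print(raw)         # printed as a\nb

# Hypothetical Windows path; without the r prefix, \n here would become a newline
driver_path = r"C:\new_folder\chromedriver.exe"
print(driver_path)
```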

BONUS: You can’t use WhatsApp Web without scanning the QR code. You have to scan the QR code at least once; if you then tick the keep me signed in option, you can visit the site directly without scanning it again. If you use another browser profile in which WhatsApp is not logged in, you have to scan the QR code again. What you can do instead is join the beta mode of WhatsApp, after which you will have no issue sending the message. More on this in the handling error section.

In the next line, the variable ‘options’ holds the ChromeOptions() object, which is used to work with profiles, extensions, cookies, proxies and similar browser settings. In the next line, we add the argument that specifies the profile in which WhatsApp is logged in. In the line after that, the driver variable initializes Chrome with the chrome driver path specified above and the options for the profile as arguments.

In the next line, we minimize the window so that the program works in the background. Next, the driver opens the URL in the browser. We then make the program sleep for 20 seconds to avoid errors when accessing the elements of the site (the website may take time to load, so its elements may not be available immediately). In the next line, the variable type_it holds the list of elements that have the given class name. NOTE: THE CLASSES AND XPATH MAY HAVE CHANGED WHEN YOU ARE READING THIS, SO FIND YOUR CLASS NAME FOR THE TYPING BOX AND THEN USE IT. Again we make the program sleep for 20 seconds. Then we use the second element of the list to type the message, and Keys.ENTER (from the Keys class we imported earlier) to send it. We wrap this in try and except because sometimes the element cannot be accessed and an IndexError is raised. If that happens, we wait, look the element up again using its XPath and send the message that way. After that, we make the program sleep once more, because if the quit method is called on the driver immediately, the message is sometimes not sent.

If any error happens while sending, the program simply sends the user a notification. For that, we use the notify function of the notification object imported from the plyer module. As we did in infinite timer using python, we send a desktop notification to the user. To know more about what is done here, refer to that article. Finally, once the error has been reported, we quit the program, as there is no more work to do. We use the _exit(0) function of the os module, where ‘0’ tells the system that the program ended normally.

Now we have to work only on the database section.

    today = str(datetime.date.today())
    today_2 = f"{today} "
    content = bytes(today_2, 'utf-8')
    year_str = str(datetime.datetime.now().year)
    year_edit = bytes(year_str, 'utf-8').decode('utf-8')
    date_str = str(datetime.datetime.now().day)
    date_edit = bytes(date_str, 'utf-8').decode('utf-8')
    # maps a single-digit day to its zero-padded form
    edit = {"1":"01", "2":"02", "3":"03", "4":"04", "5":"05", "6":"06", "7":"07", "8":"08", "9":"09",}
    print(date_edit)
    try:
        # mode "x" fails if the file already exists
        file = open("database.txt", "x")
        messenger()
        file.write(today_2)
        file.close()
    except Exception as e:
        file = open("database.txt", "a+b")
        try:
            try:
                file.seek(-11, 2)  # a negative seek from the end only works in bytes mode
            except OSError as e:
                # the file is too short to hold a date entry
                print(e)
                messenger()
                file.write(content)
                file.close()
                os._exit(0)
            year = file.read(10).decode('utf-8')
            file.seek(-11, 2)
            date = file.read(10).decode('utf-8')
            if year_edit != year[:4]:
                # last entry is from another year: clear the file and start again
                file.close()
                file = open("database.txt", "wb")
                file.close()
                file = open("database.txt", "a+b")
                messenger()
                file.write(content)
                file.close()
                os._exit(0)  # exit here so the date check below cannot send a second message
            for x in edit.keys():
                if x == date_edit:
                    date_edit = edit.get(x)
                    break
            if date_edit != date[8:14]:
                messenger()
                file.write(content)
                file.close()
            os._exit(0)
        except Exception as e:
            print(e)

Now the main thing is that we use the datetime module to get the date and the year. A date object is not of type ‘str’ (string in Python), so writing it to the file directly would raise an error. We therefore convert it to ‘str’, and also to bytes so that we can read and write the file in bytes mode. One pitfall in our program is the day number: datetime.datetime.now().day returns the day without a leading zero, so on 2023-01-01 it gives ‘1’, whereas the date stored in the database contains ‘01’. If we compared the two to check whether the message was already sent on that date, we would get a wrong answer because they differ. So we fix this with a dictionary that maps each single digit (technically a string) to its zero-padded form (again a string).

NOTE: YOU MUST CLOSE THE FILE AFTER OPENING IT BECAUSE SOMETIMES IT DOESN’T SAVE THE FILE AND YOUR WORK WILL NOT BE DONE.

If there is an error, for example because the database already exists in the directory, the program enters the except block and opens the file in “a+b” mode, i.e. appending and reading in bytes mode.

If the file has contents, the program reads the last date entry and decodes it from bytes using ‘utf-8’ to get the year. It then moves the file pointer back to the same position, reads the date again and decodes it.

First, it checks whether the year in the file differs from the current year. If so, it closes the file opened in appending and reading bytes mode and reopens it in writing bytes mode, which clears all the contents of the file. It then closes the file, opens it again in ‘a+b’ mode, calls the messenger() function, updates the file with the current date and closes it. The program ends at this point.

The fix described above is applied next: a for loop replaces the contents of the date_edit variable with its zero-padded form using the dictionary keys.

If there is an error while the program is running, it simply handles it and prints the problem.
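As a possible simplification, the same once-per-day logic can be sketched with date.isoformat(), which is always zero-padded, and with a text file read through a with block. This is an alternative sketch and not the program above: the helper name `run_once_per_day`, its `send` parameter (a stand-in for messenger()) and the temporary file are all hypothetical.

```python
import datetime
import os
import tempfile

def run_once_per_day(db_path, send, today=None):
    """Call send() at most once per calendar day; return True if it was called."""
    today = today or datetime.date.today()
    stamp = today.isoformat()  # always zero-padded, e.g. 2023-01-05

    dates = []
    if os.path.exists(db_path):
        with open(db_path, encoding="utf-8") as fh:
            dates = fh.read().split()

    if dates and dates[-1] == stamp:
        return False  # already sent today

    send()  # an exception raised here leaves the file untouched
    # Start a fresh file when the last entry is from another year
    new_year = bool(dates) and dates[-1][:4] != stamp[:4]
    with open(db_path, "w" if new_year else "a", encoding="utf-8") as fh:
        fh.write(stamp + "\n")
    return True

# Usage with a stand-in for messenger() and a temporary database file
sent = []
demo_path = os.path.join(tempfile.mkdtemp(), "database.txt")
first = run_once_per_day(demo_path, lambda: sent.append("msg"), datetime.date(2023, 12, 31))
second = run_once_per_day(demo_path, lambda: sent.append("msg"), datetime.date(2023, 12, 31))
third = run_once_per_day(demo_path, lambda: sent.append("msg"), datetime.date(2024, 1, 1))
print(first, second, third, len(sent))
```

Comparing whole ISO date strings avoids the separate year and day checks, and the with blocks close the file even when an error occurs.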

Errors and Exceptions

You will get an error when WhatsApp does not load or asks for the QR code, or when your chromedriver version does not match your Chrome version.

This program may not send messages every time.

Reasons

An unstable internet connection

The webdriver not working

High RAM usage in the background

These may not notify you that the message was not sent, because according to our program all of them are handled by the webdriver and never reach our exception handling. But every time you start your PC, the program will try to send the message if it has not been sent.

Exception Handling

You can join Whatsapp beta to manage the QR error. To join beta mode on WhatsApp, you can read this article

Chromedriver

First of all, check the version of your Chrome browser, then search for the chromedriver of that version and download it, and you are all done.

Tip: YOU CAN’T USE YOUR WHATSAPP WEB PROFILE  WHEN THIS PROGRAM IS RUNNING OR THE PROGRAM WILL CRASH. SO YOU CAN CHANGE THE SLEEP TIME OF THE PROGRAM ACCORDINGLY.

So in case you want to use the browser, you have to use another browser, not the browser that our program will use.

And yes, you can regulate the sleep time of your program according to your pc, ram, internet connection.

Can we do it another way?

Yes, but no! This manually written software can be replaced by various modules. Those modules may contain the same kind of code as ours or something different, but we can use them. It is like buying ready-made noodles and cooking them versus making raw noodles ourselves and then cooking them. In both cases we end up with cooked noodles. Buying them is hassle-free, but you cannot change the flavor of ready-made noodles; making them yourself takes time and effort, but it is worth it because you can do what you want with your own noodles. So there are many alternatives to this program, one of which is pywhatkit. It is a very lightweight, easy-to-use and fantastic module that is also worth practicing. I will definitely try to write an article on it in the future.

Conclusion

Woohoo! You made your own beast WhatsApp automating software with its own database.

Now you must try it yourself to fully enjoy the simplicity of python.

There is no specific output of this program but this is the image of the message sent.

snapshot of the sent message

To make the program run automatically at startup in Windows, the steps are as follows.

Now your program will run automatically at the startup of your computer

About

I am Atulya Khatri, a python geek. I love to learn different programming languages, try different libraries and create different programming kinds of stuff.

My other articles are as follows-

Gui Calculator using python

Beginners guide to password generator using python

Building an infinite timer using python

Youtube Video Downloader using Python

Images on this page.

1st: Photo by Alexander Shatov on Unsplash

2nd: Photo by author

Do share this with all your friends who you think need this.

Happy coding : )
