Quick Notes On The Basics Of Python And The Numpy Library


This article was published as a part of the Data Science Blogathon.     

      “Champions are brilliant at the basics”

Quick Basics of Python

1. What is an interpreter?

Ans: An interpreter is the program that reads and executes Python code line by line, translating each statement into machine-level instructions at runtime.

2. What is the difference between a virtual environment and the existing interpreter?

Ans: The difference is that if you use a virtual environment for your project and add/remove packages then it will only affect the virtual environment. If you use an existing interpreter then all changes will affect the system-wide interpreter and these changes will be available in all the projects that use that interpreter.

3. What is pip?

Ans: Pip is a standard package management system used to install and manage the software packages written in python.

4. What are the various commands of pip?

Ans: Below are the various commands of pip to be run in the command prompt/terminal
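A few commonly used pip commands (a minimal, non-exhaustive list) are:

pip install package_name – installs a package
pip uninstall package_name – uninstalls a package
pip list – lists the installed packages
pip freeze – lists installed packages in requirements format
pip show package_name – shows information about an installed package
pip --version – shows the version of pip itself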

5. What are variables?

Ans: Variables are used to store information to be referenced and manipulated in a program. In Python, we don’t need to explicitly mention the datatype during declaration. A string variable cannot be manipulated by mathematical operations.
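For instance, a minimal sketch (the variable names and values here are only illustrative):

x = 10              # an integer variable; no datatype is declared explicitly
name = 'python'     # a string variable
print(x + 5)        # 15 - mathematical operation on a number
# print(name + 5)   # would raise a TypeError - a string cannot be used in mathematical operations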

6. What are the basic operations in python?

Ans: We have ‘+’ (addition); ‘-’ (subtraction); ‘*’ (multiplication); ‘/’ (division); ‘%’ (modulus, meaning the remainder of division); ‘**’ (power, i.e. a**b, a to the power of b); ‘//’ (floor division, which gives the quotient of division without the decimal).
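A quick illustration (values chosen arbitrarily):

a, b = 7, 2
print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5
print(a % b)    # 1  (remainder)
print(a ** b)   # 49 (7 to the power of 2)
print(a // b)   # 3  (quotient without the decimal)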

7. What are string indexing and slicing?

Ans: Index is the position of each character starting from 0. Slicing is getting the substring i.e. subset out of a string value or a word or sentence.

You slice your butter so that the chunks can be used for various purposes !! Right !! SAME IS APPLICABLE HERE AS WELL !! 

Suppose we have a string variable with ‘stay positive’ stored in it. The indexing of the characters in this string starts at 0 for ‘s’ and increases from left to right.

Below is a code snippet having examples of indexing and slicing.
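A minimal sketch along these lines, using the ‘stay positive’ string:

text = 'stay positive'
print(text[0])     # 's'  - character at index 0
print(text[5])     # 'p'  - character at index 5
print(text[-1])    # 'e'  - negative indexing starts from the end
print(text[0:4])   # 'stay'     - slice from index 0 up to (but not including) 4
print(text[5:])    # 'positive' - slice from index 5 to the end
print(text[::2])   # 'sa oiie'  - every second character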

8. What is mutable and immutable property?

Ans: Mutability means you can change the values of an object after it is created, and the Immutable property of an object means it cannot be changed after it is created.
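A minimal sketch:

nums = [1, 2, 3]
nums[0] = 10        # works - lists are mutable
word = 'stay'
# word[0] = 'S'     # would raise a TypeError - strings are immutable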

9. What are the different data structures of python?

Ans: Below are different types of data structures :

9.1) Lists: These are the data types that hold elements of different/same datatypes together in a collection in a sequential manner. They are enclosed in square brackets [ ]. Lists are mutable and indexable in nature, and also allow duplicates. There are many list methods like list.append(), list.pop(), list.reverse(), list.sort(), list.count(), list.insert(), list.remove(), etc. for performing various list operations, a few of which are shown in the below code snippet.
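A minimal sketch of a few of these list methods (values are illustrative):

mylist = [3, 1, 4, 1, 5]
mylist.append(9)        # [3, 1, 4, 1, 5, 9]
mylist.insert(0, 2)     # [2, 3, 1, 4, 1, 5, 9]
mylist.remove(1)        # removes the first occurrence of 1 -> [2, 3, 4, 1, 5, 9]
print(mylist.count(1))  # 1
mylist.sort()           # [1, 2, 3, 4, 5, 9]
mylist.reverse()        # [9, 5, 4, 3, 2, 1]
print(mylist.pop())     # 1 - removes and returns the last element
print(mylist)           # [9, 5, 4, 3, 2]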

NOTE:  Accessing and indexing elements in the list are the same as the Q7(indexing and slicing) topic explained above.

9.2) Tuples: These are similar to lists with two major differences: firstly, they are enclosed within round brackets ( ), and secondly, they are immutable in nature. There are two inbuilt methods that can be used on tuples, index() and count(). A code snippet for the same is shown below:
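A minimal sketch:

mytuple = (10, 20, 30, 20)
print(mytuple.index(30))  # 2 - position of the first occurrence of 30
print(mytuple.count(20))  # 2 - number of times 20 appears
# mytuple[0] = 99         # would raise a TypeError because tuples are immutable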

NOTE: Accessing and indexing elements in the tuples are the same as the Q7(indexing and slicing) explained above.

9.4) Dictionaries: A dictionary in Python is a data structure that stores values against keys, basically key-value pairs. It is enclosed within curly braces having key:value pairs, i.e. {key1: val1}.

For example, a dictionary ‘mydict’ could store the number of students present in each class. So classA is a key and 30 is its value.

Dictionaries do not allow duplicate keys and are mutable in nature. Below is the code snippet having dictionary examples:
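A minimal sketch along the lines of the example above (class sizes other than classA are illustrative):

mydict = {'classA': 30, 'classB': 45, 'classC': 25}
print(mydict['classA'])      # 30 - access a value by its key
mydict['classD'] = 40        # add a new key-value pair
mydict['classA'] = 35        # update an existing value
print(mydict.keys())         # dict_keys(['classA', 'classB', 'classC', 'classD'])
print(mydict.get('classB'))  # 45
mydict.popitem()             # removes the last inserted key-value pair ('classD' here)
print(mydict)                # {'classA': 35, 'classB': 45, 'classC': 25}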

There are a few more functions like get(keyname), which returns the value for that key, update(), popitem(), etc. We can also have nested dictionaries, or a list as the value for a key, i.e. key1: [1, 2, 3].

What is the use of dictionaries  ???

Well, there might be scenarios wherein you will have to count the number of occurrences of an item in a list, then you can easily compute this using dictionary. Another example is using a dictionary like a lookup file wherein you might have a set of static key-value pairs to refer to. Also, dictionaries are used in backend code while building APIs. Hence with dictionaries in place, many operations like I mentioned above become easier to deal with.
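For example, a minimal sketch of counting occurrences with a dictionary:

items = ['pen', 'book', 'pen', 'pencil', 'book', 'pen']
counts = {}
for item in items:
    counts[item] = counts.get(item, 0) + 1
print(counts)   # {'pen': 3, 'book': 2, 'pencil': 1}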

10. What are the various common libraries used in Data Science?

Ans: Commonly used libraries include NumPy, Pandas, and Matplotlib, among others.

11. Why is Numpy required when we have python Lists? Since both do the same work of storing data in array form?

Ans: Absolutely, but Numpy is better since it takes less memory as compared to lists. Also, a Numpy array is faster than a list.

Now the question is HOW??? Please follow the below code snippet answering the question of HOW IT TAKES LESS MEMORY AND IS FASTER THAN LISTS.
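A minimal sketch of such a comparison (the exact numbers depend on the machine, the Python build, and the array dtype):

import sys
import numpy as np

py_list = list(range(1000))
np_array = np.arange(1000)

# memory taken by a single element
print(sys.getsizeof(py_list[0]))   # size in bytes of one Python int object (typically 28)
print(np_array.itemsize)           # size in bytes of one array element (e.g. 4 or 8)

# approximate total memory taken by the elements
print(sys.getsizeof(py_list[0]) * len(py_list))
print(np_array.itemsize * np_array.size)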

In the above code, we have compared the memory used by the list and the memory used by the Numpy array. A single integer element in the list takes 28 bytes, whereas a Numpy array element takes only 4 bytes. This is because list elements are Python objects, which require memory for pointers as well as values, but a Numpy array does not store pointers to its values. Hence IT TAKES LESS MEMORY.

HOW NUMPY ARRAY  OPERATIONS ARE FASTER THAN LIST ?? 

Let us PROVE IT IN BELOW CODE SNIPPET
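A minimal sketch of such a timing comparison (actual timings vary from machine to machine; the figures quoted below are the author's):

import time
import numpy as np

size = 1000000

list1 = list(range(size))
list2 = list(range(size))
arr1 = np.arange(size)
arr2 = np.arange(size)

start = time.time()
list_sum = [x + y for x, y in zip(list1, list2)]
print("List addition took:", time.time() - start, "seconds")

start = time.time()
array_sum = arr1 + arr2
print("Numpy addition took:", time.time() - start, "seconds")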

In the above code, we have computed the time taken by the addition of two lists, each having 1 million records, which took 142.3 seconds, whereas the same operation on two arrays with the same number of records took 0.0 seconds!!!!! WOW!!!!

HENCE PROVED! Numpy array is much faster than a list.

In real time, we have a huge amount of data that needs to be processed and analyzed so as to get useful and strategic information out of it. Hence Numpy arrays are better than lists.

12. Can we create and access the n-D(n-dimension) array using the Numpy library?

Ans: Definitely, this is one more key feature of the Numpy array. We can create an n-dimensional array using the array() method of Numpy by passing a list, tuple, or an array-like object.

In order to know the number of dimensions an array has, we have the “ndim” attribute of Numpy arrays.

We can also explicitly define the dimension for an array by using “ndmin” argument of the Numpy array() method.

There is a “dtype” property that will return the data type of the array. We can also define the data type of the array by passing a dtype argument to the array() method.

Below is the code snippet for the same
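A minimal sketch covering array creation, ndim, ndmin, and dtype:

import numpy as np

a1 = np.array([1, 2, 3])                              # 1-D array from a list
a2 = np.array([[1, 2, 3], [4, 5, 6]])                 # 2-D array
a3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # 3-D array
print(a1.ndim, a2.ndim, a3.ndim)                      # 1 2 3

a5 = np.array([1, 2, 3], ndmin=5)   # explicitly ask for 5 dimensions
print(a5.ndim)                      # 5
print(a5)                           # [[[[[1 2 3]]]]]

print(a2.dtype)                             # e.g. int64 (int32 on some platforms)
af = np.array([1, 2, 3], dtype='float64')   # explicitly define the data type
print(af.dtype)                             # float64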

13. How to index, access, and  perform slicing on an n-D Numpy Array

Each element of an n-D array has an index along every dimension (for example, a row index and a column index in a 2-D array). Keeping this in mind, an n-D array can be accessed, manipulated, etc.

Please follow the below code snippets to understand how to access and slice the n-dimensional arrays.

Using these indexes, we can access array elements and perform slicing.
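A minimal sketch of accessing elements, using illustrative arrays named as in the text (oneD_array, TwoD_array, ThreeD_array):

import numpy as np

oneD_array = np.array([10, 20, 30, 40, 50])
TwoD_array = np.array([[1, 2, 3], [4, 5, 6]])
ThreeD_array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(oneD_array[2])          # 30 - third element
print(TwoD_array[1, 2])       # 6  - row index 1, column index 2
print(TwoD_array[0, -1])      # 3  - negative indexing works as well
print(ThreeD_array[1, 0, 1])  # 6  - block 1, row 0, column 1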

SLICING:  The concept of slicing remains the same as mentioned in the above queries. The syntax for slicing is arrayName[startIndex:stopIndex:step(optional)]

Be it 1-D,2-D, or n-D, array slicing works the same.

The arrays used in the below code snippet are the same as above, i.e. oneD_array, TwoD_array, ThreeD_array. Please refer to the array declarations in the above code snippet.
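A minimal sketch of slicing on the same illustrative arrays:

print(oneD_array[1:4])         # [20 30 40]
print(oneD_array[::2])         # [10 30 50] - every second element
print(TwoD_array[0:2, 1:3])    # [[2 3]
                               #  [5 6]] - rows 0-1, columns 1-2
print(TwoD_array[:, 0])        # [1 4] - first column of every row
print(ThreeD_array[0, :, :])   # [[1 2]
                               #  [3 4]] - the first 2-D block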

14. What are various methods and attributes in Numpy?

Ans: Numpy has various attributes and methods that can tell you the size of the array, its shape (rows x columns), the size of each element, and its datatype, as well as change its shape (reshape), and many more. A few are listed in the below code snippet:
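A minimal sketch of a few of these attributes and methods:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)          # (2, 3) - rows x columns
print(arr.size)           # 6 - total number of elements
print(arr.ndim)           # 2
print(arr.dtype)          # e.g. int64
print(arr.itemsize)       # size in bytes of each element
print(arr.reshape(3, 2))  # change the shape to 3 rows x 2 columns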

We also have copy and view methods that duplicate an existing array. But these two methods do have a very major difference internally. Please find the below code snippet wherein the difference is shown in a practical way:
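A minimal sketch showing that difference:

import numpy as np

arr = np.array([1, 2, 3, 4])
c = arr.copy()   # the copy owns its own data
v = arr.view()   # the view shares data with the original array

arr[0] = 99
print(c)   # [1 2 3 4]     - the copy is unaffected
print(v)   # [99  2  3  4] - the view reflects the change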

Numpy also has many more methods and attributes, for example arange(), linspace(), zeros(), ones(), hstack(), vstack(), transpose(), and MANY MORE. You can go through all of them on the Numpy website. The ones listed above are the common ones that are frequently used.

ARRAY OPERATIONS : 

Addition, subtraction, multiplication, and division can be done very easily between two arrays. Below is the code snippet for adding and subtracting; the others can be done in the same manner, viz. (a*b), (a/b).
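A minimal sketch of element-wise addition and subtraction:

import numpy as np

a = np.array([10, 20, 30])
b = np.array([1, 2, 3])
print(a + b)   # [11 22 33]
print(a - b)   # [ 9 18 27]
# multiplication and division work the same way: a * b, a / b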

15. Numpy array has an AMAZING PROPERTY !!  What is that ??

Ans: Let us assume we have a 2-D array wherein I want to check whether every array element is greater than the value 10. If yes, replace it with True, otherwise False. So in return, I will get a TRUE/FALSE MATRIX. Below is the code snippet:
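A minimal sketch, using an illustrative 2-D array named arr:

import numpy as np

arr = np.array([[5, 12, 7], [18, 3, 25]])
print(arr > 10)
# [[False  True False]
#  [ True False  True]]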

Now, what if I want the values of the array ‘arr’ which are greater than 10? This can be achieved in just a line, as shown in the below code snippet:
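Continuing with the same illustrative array:

print(arr[arr > 10])   # [12 18 25]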

Now we can also replace these with specific flag values like -1 or 0 or anything.

The code snippet is as follows:
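Again on the same illustrative array:

arr[arr > 10] = -1
print(arr)
# [[ 5 -1  7]
#  [-1  3 -1]]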

So all those elements greater than 10 are replaced by -1.

THAT IS ALL FOR THIS ARTICLE !!! 

ENDNOTES: 

I am sure that you, as a beginner, must have found this article useful. All the necessary basics have been covered, and I have tried to explain in detail, with practical examples, the concepts where most people find difficulty. Thank you for your time.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.



Learn The Working Of The Math Library In Lua

Introduction to Lua math

In Lua, math is a standard library consisting of mathematical functions that Lua programs can use for dealing with mathematical concepts easily, simply by calling this library in the program. It provides functions for solving common mathematical problems, such as logarithmic functions (log), exponential functions (exp), trigonometric functions (sin, cos, tan, etc.), rounding functions (floor, ceil), and many others like max, min, and pi. The Lua math library is an interface to the standard C math library.


Syntax:

math.function_name(argument_list or parameter);

In the above syntax, when we want to use any math function, we use the math library provided by the Lua standard library: write math, followed by a dot, and then the function name, along with the parameters that need to be passed to the function for the calculation.

Working of math library in Lua programming language

A Lua math function can be called directly by its name, with the parameters passed to it supplying the values needed for the calculation the function performs. In Lua, the math library provides various functions for different mathematical calculations, a few of which are listed below.

math.sin(x), math.cos(x), math.tan(x), math.asin(x), math.acos(x), etc. are some of the trigonometric functions: sin, cos, and tan return the sine, cosine, and tangent of the radian value passed as a parameter, while asin and acos return the arc sine and arc cosine in radians for the given values.

In Lua, the math library also provides random functions such as math.random([x [,y]]) to get random numbers.

Example:

In the below example, let us see a demonstration of a few math functions using the math library.

print( "Demonstration of math library function in Lua programming language ") print("n") print("Rounding functions in math library are as follows:") a = 89.50983 print(" The floor value of a is ", math.floor(a)) print(" The ceil value of a is ", math.ceil(a)) print("n") print(" Comparative function are as follows: ") print(" The maximum of the given numbers is : ", math.max(30, 46, 29, 78, 56, 9)) print(" The minimum of the given numbers is : ",math.min(30, 46, 29, 78, 56,9)) print("n") print(" The trigonometric functions provided in the math library is as follows: ") print("n") p = math.rad(math.pi / 2) print("The pi value for calculating trigonometric values is : ", p) print("n") print("The sine value of the given radian which is 90 degree is : ", string.format("%.1f ", math.sin(p))) print("The cosine value of the given radian which is 90 degree is : ", string.format("%.1f ", math.cos(p))) print("The tangent value of the given radian which is 90 degree is : ", string.format("%.1f ", math.tan(p))) print("The sine hyperbolic value of the given radian which is 90 degree is : ", string.format("%.1f ", math.sinh(p))) print("The cosine hyperbolic value of the given radian which is 90 degree is : ", string.format("%.1f ", math.cosh(p))) print("The tangent hyperbolic value of the given radian which is 90 degree is : ", string.format("%.1f ", math.tanh(p)))

Output:

In the above program, we can see some of the math functions defined in the math library. First, we use the rounding functions: math.floor(), which rounds a value down to the previous integer, and math.ceil(), which rounds it up to the next integer. Then we use the comparative functions max and min, which display the maximum and minimum number in the given list of numbers passed as parameters. Then we have defined a few trigonometric functions to display the values for the radian corresponding to 90 degrees, where we have converted the pi value to a radian value using rad(), as in math.rad(math.pi / 2); if we want to display the degree value instead, we can use the deg() function as math.deg(math.pi / 2). In the above program, we have displayed the values of sine, cosine, and tangent along with their hyperbolic values. The output of all these functions can be seen in the above screenshot.

Example:

print( "Demonstration of some other math library function in Lua programming language ") print("n") print("The squareroot of given number a is :", math.sqrt(24)) print("n") print("The 2 to the power of 5 is :", math.pow(2,5)) print("n") print("The absolute value of the given number is :", math.abs(-45)) print("n") print("The exponential value of the given number is :", math.exp(4)) print("n") print("The logarithmic value of the given number is :",math.log(2)) print("n") print("Random number between 70 and 80 is ",math.random(70,80))

Output:

In the above program, we have used some other math functions provided by the math library: math.sqrt() displays the square root of the specified number; math.pow() raises a number to the given power (in the above program, 2 ^ 5 = 32); math.abs() displays the absolute value, which is always positive; math.exp() displays the exponential value; math.log() displays the logarithmic value; and math.random() displays a random number. The output in the above screenshot shows the result of these functions respectively.

Conclusion

In this article, we conclude that Lua provides a standard math library that contains mathematical functions for mathematical calculations in a program. We don’t need to separately import or include the math library; we can use it directly by writing math, a dot, and then the math function name. Therefore, like other programming languages, Lua also has a simple math library for mathematical operations.


Itunes: Getting Familiar With The Interface And Importing The Library

iTunes is not only an audio player but a whole multimedia management centre, available for PCs and Macs. Although its main function is playing and tagging your music collection, you can also find features for playing films, radio stations, and podcasts. There is also a built-in iTunes Store where you can purchase new tracks as well as subscribe to free channels offering interesting podcasts. iTunes also allows you to edit information about your albums, add cover images, and share your music on a local network.

Importing your music collection

When you turn on the program for the first time, it will ask you to import your music collection. To start using the program for playing your favourite songs, you have to show it where you store your library. This can be accomplished in a few ways.

You can also drag whole folders (including their subfolders) to the program’s window by using drag-and-drop method. All the songs you drag to the program’s window will be added to the library.

The interface of the program

After you have added your music collection to iTunes Library, the time has come for you to get a bit more familiar with the rudimentary features of the program. iTunes has a well-designed interface with a touch of typical Apple style. However, if until now you have used simple programs, such as Winamp or foobar2000, you may feel a bit lost; but don’t panic yet.

In the upper part of the window you can find navigation buttons and a volume adjustment slider. There is also a bar displaying information about the currently played track, such as the title, the artist, and the album, along with the progress bar and interactive icons that allow you to repeat a song (on the left) and shuffle the tracks (randomly choose the next song – on the right).

However, one of the most important components of the interface is a horizontal bar that allows you to switch between various tabs. Each of them displays your collection in a different way by focusing on a certain element.

Songs tab

This is definitely the best way of displaying your music collection. iTunes shows the library in the form of a grid with icons made of the albums’ covers.

Artists and Genres tab

Both of these tabs look alike, and you handle them in a similar way. Just as the names suggest, the first one displays your music collection sorted according to the artists, while the other arranges it by genre.

In both tabs you can find a side bar on the left – the list of either artists or genres is displayed here. After choosing one of them iTunes displays a list of matching albums on the right, either all of the CDs of a certain artist or all albums that fit into a chosen genre.

Playlists tab

Here you can find simply displayed lists of tracks, along with a side bar which allows you to create and save your own playlists.

You can create playlists on your own or import those saved in M3U and PLS formats. There are also lists created by default by iTunes. On the right you can see the tracks that comprise a certain playlist. Songs can, just like in the case of the first tab, be sorted by columns.

This is all you should need in the beginning to comfortably embark on your journey with iTunes. In the next articles you can read more about the settings, importing your CD collection (and converting files), editing tags, creating playlists, and using iTunes Store.

How To Flatten A Matrix Using Numpy In Python?

In this article, we will show you how to flatten a matrix using the NumPy library in python.

numpy.ndarray.flatten() function

The numpy module includes a function called numpy.ndarray.flatten() that returns a one-dimensional copy of the array rather than a two-dimensional or multi-dimensional array.

In simple words, we can say that it flattens a matrix to 1-Dimension.

Syntax

ndarray.flatten(order='C')

Parameters

order − ‘C’, ‘F’, ‘A’, ‘K’ (optional)

When we set the order parameter to ‘C,’ the array is flattened in row-major order.

When the ‘F’ is set, the array is flattened in column-major order.

When the order parameter is set to ‘A’, the array is flattened in column-major order only if the array is Fortran contiguous in memory. The final order is ‘K’, which flattens the array in the same order the elements appear in memory. This parameter is set to ‘C’ by default.

Return Value − Returns a flattened 1-D matrix
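For instance, a minimal sketch of the difference between ‘C’ and ‘F’ order:

import numpy as np

m = np.array([[1, 2], [3, 4]])
print(m.flatten())           # [1 2 3 4] - row-major ('C', the default)
print(m.flatten(order='F'))  # [1 3 2 4] - column-major ('F')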

Method 1 − Flattening a 2×2 Numpy Matrix of np.array() type

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

Use the import keyword, to import the numpy module with an alias name(np).

Use the numpy.array() function(returns a ndarray. The ndarray is an array object that satisfies the given requirements), for creating a numpy array by passing the 2-Dimensional array(2rows, 2columns) as an argument to it.

Print the given input 2-Dimensional matrix.

Apply flatten() function (flattens a matrix to 1-Dimension) of the numpy module on the input matrix to flatten the input 2D matrix to a one-dimensional matrix.

Print the resultant flattened matrix of an input matrix.

Example

The following program flattens the given input 2-Dimensional matrix to a 1-Dimensional matrix using the flatten()function and returns it −

import numpy as np

inputMatrix = np.array([[3, 5], [4, 8]])

print("The input numpy matrix:")
print(inputMatrix)

flattenMatrix = inputMatrix.flatten()

print("Resultant flattened matrix:")
print(flattenMatrix)

Output

On executing, the above program will generate the following output −

The input numpy matrix:
[[3 5]
 [4 8]]
Resultant flattened matrix:
[3 5 4 8]

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

Use the numpy.array() function(returns a ndarray. The ndarray is an array object that satisfies the given requirements), for creating a numpy array by passing the 4-Dimensional array(4rows, 4columns) as an argument to it.

Print the given input 4-Dimensional matrix.

Calculate the number of elements of the matrix by multiplying the length of the NumPy array with itself. Here these values give the number of columns required.

Use the reshape() function(reshapes an array without affecting its data) to reshape the array and flatten the input matrix(4D) to a one-dimensional matrix.

Print the resultant flattened matrix of an input matrix.

Example

The following program flattens the given input 4-Dimensional matrix to a 1-Dimensional matrix using reshape() function and returns it −

import numpy as np

inputMatrix = np.array([[1, 2, 3, 97], [4, 5, 6, 98], [7, 8, 9, 99], [10, 11, 12, 100]])

matrixSize = len(inputMatrix) * len(inputMatrix)

print("The input numpy matrix:")
print(inputMatrix)

flattenMatrix = np.reshape(inputMatrix, (1, matrixSize))

print("Resultant flattened matrix:")
print(flattenMatrix)

Output

On executing, the above program will generate the following output −

The input numpy matrix:
[[  1   2   3  97]
 [  4   5   6  98]
 [  7   8   9  99]
 [ 10  11  12 100]]
Resultant flattened matrix:
[[  1   2   3  97   4   5   6  98   7   8   9  99  10  11  12 100]]

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

Use the numpy.matrix() function (returns a matrix from a string of data or an array-like object; the resulting matrix is a specialized 2-D array) for creating a numpy matrix by passing the 4×4 data (4 rows, 4 columns) as an argument to it.

Print the resultant flattened matrix of an input matrix.

Example

The following program flattens the given input 4-Dimensional matrix to a 1-Dimensional matrix using the flatten()function and returns it −

import numpy as np

inputMatrix = np.matrix('[11, 1, 8, 2; 11, 3, 9, 1; 1, 2, 3, 4; 9, 8, 7, 6]')

print("The input numpy matrix:")
print(inputMatrix)

flattenMatrix = inputMatrix.flatten()

print("Resultant flattened matrix:")
print(flattenMatrix)

Output

On executing, the above program will generate the following output −

The input numpy matrix:
[[11  1  8  2]
 [11  3  9  1]
 [ 1  2  3  4]
 [ 9  8  7  6]]
Resultant flattened matrix:
[[11  1  8  2 11  3  9  1  1  2  3  4  9  8  7  6]]

Conclusion

In this article, we learned how to flatten a matrix in Python using three different examples. We learned how to create a matrix in NumPy using two different methods, numpy.array() and numpy.matrix(), and we also learned how to flatten a matrix using the reshape() function.

How To Install Numpy For Python 3.3.5 On Mac OSX 10.9

This article will show you how to install Numpy in Python on MacOS using 3 different methods as below.

Using Homebrew

Using Anaconda

Using pip

What is Numpy

NumPy is gaining popularity and being used in various commercial systems. As a result, it’s critical to understand what this library has to offer. NumPy is a powerful Python library due to its syntax, which is compact, powerful, and expressive all at the same time. It allows users to manage data in vectors, matrices, and higher-dimensional arrays, and it is also used in array computing in the industry.

Method 1: Using Homebrew

Homebrew

Homebrew (brew) is a free and open-source package manager that allows users to install apps and software in macOS based on their preferences. It has been recommended because of its ease of use and effectiveness in saving time and effort. Its catchphrase is “the missing package manager for macOS”

Installation

This option is a little more complicated and may require more time invested upfront, but it can save you time and headaches in the long run because you have more control and freedom with how you want to set up Python and other command-line tools.

The first step is to install Homebrew. Currently, this is done with a single terminal command that will walk you through the installation process.

You will also need to install XCode (free from the App Store) and its associated command line tools. This is why this option takes so long.

After installing Homebrew, you’ll have access to a new command in the terminal called brew. This command will install Python 3 and NumPy.

# Install Python 3, which Homebrew will manage.
brew install python3

# Installing Numpy using the brew install command
brew install numpy --with-python3

Checking whether the numpy is installed or not.
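One simple way to verify this (a minimal sketch) is to import NumPy in a Python session and print its version:

import numpy
print(numpy.__version__)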

Then you’re all set! This option grants you access to powerful tools such as pip and brew. It means that in the future, if you want to install a new Python package, you should be able to do so with pip install <package-name>. Other command-line tools, such as git, can be installed using brew install git. In the end, it will make programming on Mac OS X much easier!

See the Homebrew and Python wiki page for more information.

Method 2: Using Anaconda

Anaconda

Anaconda is a distribution of the Python and R programming languages that aims to simplify package management and deployment. Anaconda’s package management system, conda, manages package versions by analyzing the current environment before executing an installation to avoid disrupting other frameworks and packages.

Once installed, you should be able to run Numpy and Matplotlib-based Python code. You should be able to open a terminal and type pip install <package-name> if you need a new Python package. Some command-line tools and libraries are configured to install with conda install, but conda does not have nearly as many packages as brew. That may not be a big deal, depending on what you end up using in the future!

macOS graphic Installation

Download the graphical macOS installer for the Python version you want.

On the Introduction, Read Me, and License screens, follow the prompts.

Note

If the message “You cannot install Anaconda in this location,” appears, reselect Install for me only.

After a successful installation, the following screen appears −

Command line installation

If you prefer to use a terminal window, use this method.

Download the command-line version of the macOS installer for your system in your browser.

(Recommended) Using SHA-256, check the installer’s data integrity. See Cryptographic hash validation for more information on hash verification.

Start a terminal and type the following −

shasum -a 256 /PATH/FILENAME # Replace /PATH/FILENAME with your installation's path and filename.

Python 3.7 or 2.7 installation −

Enter the following for Python 3.7 −

# Include the bash command regardless of whether or not you are using the Bash shell # Replace the .sh file name with the name of the file you downloaded

For Python 2.7, launch the Terminal.app or iTerm2 terminal application and type

# Include the bash command regardless of whether or not you are using the Bash shell # Replace the .sh file name with the name of the file you downloaded

Enter to read the license agreement. Then, to scroll, press and hold Enter.

To accept the license agreement, enter “yes”.

Note

Anaconda recommends that you use the default installation location. For the Anaconda/Miniconda installation, do not use the /usr path.

The installer prompts you to choose whether to run conda init to initialize Anaconda Distribution. I recommend typing “yes.” If you enter “no,” conda will make no changes to your shell scripts. After the installation is complete, run source PATH-TO-CONDA/bin/activate followed by conda init to initialize conda.

Note

Close and re-open your terminal window to make the installation take effect, or type source ~/.bashrc to refresh the terminal.

You can also choose whether your shell opens with the base environment activated or not.

# The base environment is activated by default
conda config --set auto_activate_base True

# The base environment is not activated by default
conda config --set auto_activate_base False

# The above commands only work if conda init has been run first
# conda init is available in conda versions 4.6.12 and later

Note

If you install multiple versions of Anaconda, the system defaults to the most recent version as long as the default install path is not changed.

Method 3: Using Pip

Pip comes with Python by default but is not installed; however, installing pip is very simple; see How do I install pip on macOS or OS X? Simply run sudo easy_install pip to install pip (this assumes you already have Python installed on your system; if you don’t, install it before running this command). Then you can install numpy with sudo pip install numpy. You can also search for packages by using pip search, which searches through a list of Python packages.

Step 1 − To open Spotlight search, press command(⌘) + Space Bar. Enter Terminal and hit enter.

Step 2 − Use the pip command below in the terminal to install the NumPy package.

pip install numpy

Numpy was Installed successfully.

Conclusion

In this article, we learned how to install NumPy for Python 3.3.5 on Mac OSX 10.9 using three distinct methods. We also learned how these tools can be used for a variety of other tasks.

Operating On The Pandas Dataframe In Python

Overview

DataFrame in Python

Performing Data Cleaning Operations on the Pandas DataFrame

Introduction

Undoubtedly, a DataFrame is the most important structure in Python for storing data, because in practically every case it is what holds the dataset we use to build our models. It is defined in the Pandas library. When analyzing a dataset in Python, the step right after importing the required libraries is usually to create a data frame, most often by reading the data file into Python. Since our dataset is then stored in this structure, all subsequent operations are performed on the data frame, which makes it important to learn the various operations on its constituent rows and columns; these come up in almost every case as part of Data Cleaning and hence Data Preparation.

Before moving on to the DataFrame, it would be helpful to first understand some of the basic data structures defined in Python like the Series and the built-in data structures. You will get all of this knowledge just by referring to one article:  A Beginners’ Guide to Data Structures in Python.

Table of Contents

Introducing the Dataset

Importing the Python Libraries

Reading data into a DataFrame

Subsetting a DataFrame

Renaming the Variables

Re-ordering the Variables

Creating Calculated Columns

Dropping a Variable

Filtering the Data in a DataFrame

Sorting the Data

Grouping and Binning

Creating Summaries

Introducing the Dataset

For this article, we will be using the Iris dataset which can be downloaded from here. We will use this data set to learn how these operations are actually performed on some actual data.

Importing the Python Libraries

Let’s import all the python libraries we will be needed for operating on a DataFrame namely NumPy and Pandas.

import numpy as np
import pandas as pd

Reading Data into a DataFrame

Before going to the operations we first need to create a DataFrame and here we will be reading the data from a CSV (comma-separated values) file into a Pandas DataFrame naming it as df here.

df=pd.read_csv('C:/Users/ACER/Desktop/Iris.csv')

By this our data frame df is created and to have a basic look at the data we can give the command:

df.head()

Subsetting a DataFrame

By subsetting a DataFrame we mean selecting particular columns from the table. It is one of the frequently used operations. There are various ways to subset the data. We will discuss each of them one by one.

Let’s determine the column names first!

df.columns

Let’s start subsetting!

df.SepalLengthCm

Here we use the name of the column and using this method we can get the data out of a single column only.

df['Species']

Using this method we can subset one or more columns on the basis of the column names.

df.iloc[:,1]

By this, we get all the rows and the column with the index as 1 i.e. the second column only and hence the column is taken out using the default index. As is clear from the slicers being used here multiple columns can be taken out at the same time.

df.loc[:,['PetalLengthCm','PetalWidthCm']]

Here we get all the rows and two columns, namely PetalLengthCm and PetalWidthCm.

Re-ordering the Variables

While there is no specific way to reorder the variables in the original data frame we have two options to reorder them. Firstly, we can view the columns of a Data Frame in a specific order as per our wish by subsetting the data in that same order. Secondly, we can update the original data frame with the data subsetted in the first option.

To view the data with the column names in a specific order we can do the following:

df.loc[:,['Species','SepalLengthCm', 'PetalWidthCm', 'PetalLengthCm', 'SepalWidthCm','Id']]

However, do remember that it does not lead to any permanent change in df.

To overwrite df simply command:

df=df.loc[:,['Species','SepalLengthCm', 'PetalWidthCm', 'PetalLengthCm', 'SepalWidthCm','Id']]

Creating Calculated Columns

Also known as the derived columns, the calculated columns take their values from existing columns or a combination of them. In this case we can replace an existing column or create a new one both of which will be seen as permanent changes in the table.

Let’s take a scenario!

I want to get the area of the Sepal for some kind of analysis. How can I do so?

df['SepalArea']=df.SepalLengthCm*df.SepalWidthCm
df

A new column SepalArea is created towards the end. However, it makes more sense to have the area column besides the parameter columns. Well, that can be done too using the insert method.

df.insert(5,'PetalArea',df.PetalLengthCm*df.PetalWidthCm)
df.head()

A new column PetalArea is created at the sixth position.

In both of the cases above the derived column was added in df. However to first view the output before making a permanent change in df we can go for the assign method.

df.assign(Ratio=df.PetalArea/df.SepalArea)

The ratio column is displayed in the output only and not added to df.

Renaming the Variables

The rename method comes as a saviour when we get a data set having misspelt column names or sometimes when the variables are not self-explanatory giving us no idea about the data they are storing.

For example, I wish the Species variable to be called NameOfSpecies.

df.rename(columns={'Species':'NameOfSpecies'},inplace=True)
df.tail()

Dropping a Variable

An extremely important step as a part of the Data Cleaning process is to remove the unnecessary variables we have in our data usually which do not affect our analysis in any way and do not relate to the given business problem we are trying to solve.

Let me show you how it is done by dropping the variables we created above!

df.drop(columns=['PetalArea','SepalArea'],inplace=True)
df.head()

Filtering the Data in a DataFrame

Filtering a data set essentially means filtering the rows which in turn refers to selecting particular rows from the data frame. This selection can be done both manually and conditionally. Let’s try filtering our data by both methods one by one!

Manual Filtering

You might have noticed that we have already filtered our data in some of the steps above! Recall! Yes, using .head() and .tail()!

#display the first 4 rows of df
df.head(4)

#display the last 3 rows of df
df.tail(3)

There are other ways too by which filtering can be done.

Using [ ] we can slice the data. Giving a slicer in the first argument gives us the required rows on the basis of their default index.

df[:4]

Using .iloc[ ] we can extract out the rows on the basis of their default index i.e the default row names. It takes out the rows with index from start to end – 1 if we slice as .iloc[start:end]

df.iloc[:2]

We get the rows with the default index as 0 and 1 i.e the first two rows of df.

Using .loc[ ] we can extract out the rows on the basis of their user-defined index i.e the row names. It takes out the rows with index from start to end if we slice as .loc[start:end]

df.loc[:5]

We get the rows with the User Defined Index in (0,1,2,3,4,5) i.e the first six rows of df.

You must notice that in this case, the UDI is the same as the DI.

Suppose we want to extract only some specific rows, not necessarily consequent ones. How can that be done? Just mention the individual index and we are done!

df.iloc[[3,0,12,5,9]]

Conditional Filtering

Unlike the manual filtering where we mentioned the row indices manually in order to filter the rows, in the case of conditional filtering we filter the rows by indexing i.e checking conditions on the data. This can be done using [ ] and .loc[ ] on df but not with .iloc[ ]. Let’s take a different approach to learn indexing by considering some scenarios.

Task 1: Get details for virginica species.

df[df.NameOfSpecies=='Iris-virginica'].head()

Task 2 : Get details for virginica and setosa species.

Although the above method can also be used, let’s try a different approach here where we will be using .isin

names = ['Iris-setosa','Iris-virginica']
df[df.NameOfSpecies.isin(names)]

By this we get all the records where the NameOfSpecies value is Iris-setosa or Iris-virginica.

Task 3 : Get the records for which the petal length is greater than the average petal length.

df.PetalLengthCm.mean() gives the average petal length ~ 3.75.

So we get the records where the petal length is greater than 3.75(approximately).
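A minimal sketch of this filter:

df[df.PetalLengthCm > df.PetalLengthCm.mean()].head()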

We can combine task 2 and task 3 to get all those records where the species is virginica or setosa and petal length is more than the overall average petal length.

And we get a long data frame in this case! Let me show you a few rows and columns from it.
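A minimal sketch of the combined condition (reusing the names list defined above):

df[df.NameOfSpecies.isin(names) & (df.PetalLengthCm > df.PetalLengthCm.mean())].head()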

Sorting the Data

And now comes an interesting operation which is sorting. Our primary purpose of sorting the data in a data frame is to arrange it in order for the better readability of the data. To sort the values inside a particular column we use the sort_values method.

Let’s take a scenario where we want the sepal lengths to be in ascending order!

df.sort_values('SepalLengthCm',inplace=True)
df

What do we see?

The individual records have been sorted according to the sepal length values. (Check the row indices !)

Now, to sort the data by sepal width from highest to lowest value we can simply write the command as:

df.sort_values(by='SepalWidthCm' , ascending=False , inplace=True)
df

The data frame is changed which is evident from the jumbled row indices!

But what if I am not happy with the indices being in this way and rather want them to be ordered starting from 0 while at the same time the records should be sorted by the sepal width from highest to lowest. We can simply give another argument in the above method!

df.sort_values(by='SepalWidthCm' , ascending=False , inplace=True , ignore_index=True)
df

By this, we are just resetting the index to the default index from the user-defined index we obtained on sorting initially.

The next thing I am going to do is combine the above two examples we studied. We can actually sort the sepal length in the ascending order and within that sort the sepal width in the descending order by giving the command!

df.sort_values(by=['SepalLengthCm','SepalWidthCm'],ascending=[True,False],ignore_index=True)

Grouping and Binning

We just learned about derived columns and it’s time to introduce another kind of them. According to our business problem, the values in an existing column can be grouped or binned to make a new column known as a grouped/binned column. Why is it even done? To convert the continuous variables to categorical variables.

Both of these fall in the category of derived columns; however, they differ in one way: while binning is done only on continuous variables, grouping can be performed on categorical variables too. This is due to the fact that bins are of equal frequencies.

But why do we even want these columns? They help us reduce the cardinality of the columns.

Let’s try grouping and binning the variables in our dataset!

Grouping 

To create groups in Python we have 3 main methods two of which are defined in Pandas library and one comes from the numpy library.

Method 1 : pd.cut()

This is used to group the values of a single continuous variable only.

Task: Group the Petal length values into groups!

pd.cut(df.PetalLengthCm , [0,2,5,8])

Here we are creating user-defined groups (0,2] , (2,5] , (5,8]. We get the class intervals in ascending order.

Method 2 : pd.qcut()

Just like pd.cut() it is used to group the values of a single continuous variable only. But it divides the values into groups having equal frequencies i.e. each group has an equal number of values.

Task: Group the Petal width into three equal parts!

pd.qcut(df.PetalWidthCm,3)

In this case, first the values inside the PetalWidthCm column are sorted, and then the data is divided into 3 equal parts, which gives us the groups.

Method 3 : np.where()

Unlike the previous 2 methods, it can be used for one or multiple columns for any type of variable.

Task: Create a column ‘grouped’ with a few columns in one category and the rest in other.

#np.where(df.NameOfSpecies.isin(['Iris-virginica']),'Major','Minor' )
df['grouped']=pd.Series(np.where(df.NameOfSpecies.isin(['Iris-virginica']),'Major','Minor' ))
df

Binning 

To create bins we used the pd.cut() method!

Creating 4 bins of equal class interval

pd.cut(df.SepalLengthCm , 4)

There is yet another way where we do not even need to mention the number of bins!

pd.cut(df.SepalLengthCm,range(0,10,2))

Creating Data Summaries

To summarize the data in Python and create tables we have three ways.

Method 1 : Using .groupby()

Task: Determine species wise total Sepal Length.

df.groupby('NameOfSpecies').SepalLengthCm.sum()

df.groupby(['NameOfSpecies','grouped']).SepalLengthCm.sum()

Task: Determine species wise total Sepal Length and average Sepal Length.

df.groupby('NameOfSpecies').SepalLengthCm.agg([np.sum,np.mean])

df.groupby('NameOfSpecies')[['SepalLengthCm','SepalWidthCm']].agg([np.sum,np.mean])

So we have grouped the data successfully and created a summary. Now let’s learn a bit about tables. There are three tables we come across: Vertical tables are those having their first row as header, horizontal tables are those having their first column as header and crosstables are those having header in both rows and columns.

To create a cross table on top of the summarized data we use the .pivot() method.

But these are two different steps. Instead, we can use just one method and do all the operations: group the data, aggregate it and create the table on top of the summarized data. This can be done using .pivot_table().

Method 2 : Using .pivot_table()

To create a cross table we can give the following command:

df.pivot_table(index='col1', columns='col2', values='col3', aggfunc='sum')
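For instance, on this dataset, a minimal sketch (the choice of columns is only illustrative) could be:

df.pivot_table(index='NameOfSpecies', columns='grouped', values='SepalLengthCm', aggfunc='sum')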

Method 3 : Using pd.crosstab()

With this method, only the cross tables can be created and it is used to create the frequency tables.

I will mention the syntax here to create a frequency table:

pd.crosstab(index=df.col1, columns=df.col2, values=df.col3, aggfunc='count')
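For instance, a minimal sketch of a frequency table on this dataset (the column choice is illustrative):

pd.crosstab(index=df.NameOfSpecies, columns=df.grouped)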

EndNotes

Finally, we have come to the end of this article. In this article, we performed various operations on a Pandas DataFrame in Python that are typically done while cleaning data, manipulating it, and preparing it for analysis. However, this is not all: many more operations can be performed on a data frame, such as dealing with duplicates, outliers, and missing values, followed by their treatment. These are really important steps in the EDA part and hence should not be missed.

I strongly recommend you to read this article on Exploratory Data Analysis in Python which will help you understand much more crucial operations performed on a DataFrame.

You can connect with me on LinkedIn.

