# Determining The Market Price Of Old Vehicles Using Python


This article was published as a part of the Data Science Blogathon.

Introduction

OLX Group is a Dutch-domiciled online marketplace that over 300 million people use every month for buying, selling, and exchanging products and services ranging from cars, furniture, and electronics to jobs and services listings.

Scenario

We will attempt to determine the market price for a car that we would like to sell. The details of our car are as follows:

Make and Model – Swift Dzire

Year of Purchase – 2009

Km Driven – 80,000

Current Location – Rajouri Garden

Approach

Our approach to addressing the issue would be as follows:

1. Search for all the listings on the OLX platform for the same make and model of our car.

2. Extract all the relevant information and prepare the data.

3. Use the appropriate variables to build a machine learning model that, based on certain inputs, can determine the market price of a car.

4. Input the details of our car to fetch the price that we should put on our listing.

WARNING! Please refer to the robots.txt of the respective website before scraping any data. If the website does not allow scraping of what you want to extract, please send an email to the web administrator before proceeding.

Stage 1 – Search

We will start with importing the necessary libraries

In order to automatically search for the relevant listing and extract the details, we will use Selenium

```python
import selenium
from selenium import webdriver as wb
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
```

For basic data wrangling, format conversion and cleaning we will use pandas, numpy, datetime and time

```python
import pandas as pd
import numpy as np
import datetime
import time
from datetime import date as dt
from datetime import timedelta
```

For building our model, we will use Linear Regression

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
```

We first create a variable called ‘item’, to which we assign the name of the item we want to sell, along with a ‘location’ variable for where we want to list it.

```python
item = 'Swift Dzire'
location = 'Rajouri Garden'
```

Next, we would want to open the OLX website using chrome driver and search for Swift Dzire in the location we are interested in.

Source: Olx.in

```python
driver = wb.Chrome(r"PATH WHERE CHROMEDRIVER IS SAVED\chromedriver.exe")
driver.get('https://www.olx.in')

# Enter the location, then the item to search for.
driver.find_element(By.XPATH, '//*[@id="container"]/header/div/div/div[2]/div/div/div[1]/div/div[1]/input').clear()
driver.find_element(By.XPATH, '//*[@id="container"]/header/div/div/div[2]/div/div/div[1]/div/div[1]/input').send_keys(location)
time.sleep(5)
driver.find_element(By.XPATH, '//*[@id="container"]/header/div/div/div[2]/div/div/div[2]/div/form/fieldset/div/input').send_keys(item)
time.sleep(5)

# Keep clicking "Load more" until it no longer appears, so that all results
# are on the page. The button's locator was lost in extraction, so the XPath
# below is a placeholder to be filled in.
from selenium.common.exceptions import TimeoutException
while True:
    try:
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, 'XPATH OF THE LOAD MORE BUTTON'))
        ).click()
    except TimeoutException:
        break
```

Now that we have loaded all the results, we will extract all the information that we can potentially use to determine the market price. A typical listing looks like this

Source: OLX

Stage 2 – Data Extraction and Preparation

From this we will extract the following and save the information to an empty dataframe called ‘df’:

1. Maker name

2. Year of purchase

3. Km driven

4. Location

5. Verified Seller or not

6. Price

```python
df = pd.DataFrame()
n = 200
for i in range(1, n):
    try:
        make = driver.find_element(By.XPATH, '//*[@id="container"]/main/div/div/section/div/div/div[4]/div[2]/div/div[3]/ul/li[' + str(i) + ']/a/div[1]/div[2]/div[2]').text
        make = pd.Series(make)
        det = driver.find_element(By.XPATH, '//*[@id="container"]/main/div/div/section/div/div/div[4]/div[2]/div/div[3]/ul/li[' + str(i) + ']/a/div[1]/div[2]/div[1]').text
        year = pd.Series(det.split(' - ')[0])
        km = pd.Series(det.split(' - ')[1])
        price = driver.find_element(By.XPATH, '//*[@id="container"]/main/div/div/section/div/div/div[4]/div[2]/div/div[3]/ul/li[' + str(i) + ']/a/div[1]/div[2]/span').text
        price = pd.Series(price)
        det2 = driver.find_element(By.XPATH, '//*[@id="container"]/main/div/div/section/div/div/div[4]/div[2]/div/div[3]/ul/li[' + str(i) + ']/a/div[1]/div[2]/div[3]').text
        location = pd.Series(det2.split('\n')[0])
        date = pd.Series(det2.split('\n')[1])
        try:
            verified = driver.find_element(By.XPATH, '//*[@id="container"]/main/div/div/section/div/div/div[4]/div[2]/div/div[3]/ul/li[' + str(i) + ']/a/div[2]/div/div[1]/div/div/div').text
            verified = pd.Series(verified)
        except:
            verified = 0
    except:
        continue
    df_temp = pd.DataFrame({'Car Model': make, 'Year of Purchase': year, 'Km Driven': km,
                            'Location': location, 'Date Posted': date,
                            'Verified': verified, 'Price': price})
    df = pd.concat([df, df_temp], ignore_index=True)
```

Within the obtained dataframe, we will first have to do some basic data cleaning where we remove the commas from Price and Km Driven and convert them to integers.

```python
df['Price'] = df['Price'].str.replace(",", "").str.extract(r'(\d+)', expand=False)
df['Km Driven'] = df['Km Driven'].str.replace(",", "").str.extract(r'(\d+)', expand=False)
df['Price'] = df['Price'].astype(float).astype(int)
df['Km Driven'] = df['Km Driven'].astype(float).astype(int)
```

As you can see in the image above, listings put up on the same day show ‘Today’ instead of a date, and items listed one day earlier show ‘Yesterday’. For entries like ‘4 days ago’ or ‘7 days ago’, we extract the leading number, convert it to an integer, and subtract that many days from today’s date to recover the actual posting date. We convert all such strings into proper dates because our objective is to derive a variable called ‘Days Since Posting’ from them.

```python
df.loc[df['Date Posted'] == 'Today', 'Date Posted'] = datetime.datetime.now().date()
df.loc[df['Date Posted'] == 'Yesterday', 'Date Posted'] = datetime.datetime.now().date() - timedelta(days=1)

# Handle 'x days ago' row by row, so each listing gets its own offset.
mask = df['Date Posted'].str.contains(' days ago', na=False)
df.loc[mask, 'Date Posted'] = df.loc[mask, 'Date Posted'].apply(
    lambda s: datetime.datetime.now().date() - timedelta(days=int(s.split(' ')[0])))

def date_convert(date_to_convert):
    # Dates like 'Dec 05' carry no year, so the current year is filled in.
    return datetime.datetime.strptime(date_to_convert, '%b %d').strftime('2023-%m-%d')

for i, j in zip(df['Date Posted'], range(0, n)):
    try:
        df['Date Posted'].iloc[j] = date_convert(str(i))
    except:
        continue

df['Days Since Posting'] = (pd.to_datetime(datetime.datetime.now().date()) - pd.to_datetime(df['Date Posted'])).dt.days
```

Once created, we will convert this along with ‘Year of Purchase’ to integers.

```python
df['Year of Purchase'] = df['Year of Purchase'].astype(float).astype(int)
df['Days Since Posting'] = df['Days Since Posting'].astype(float).astype(int)
```

Further, we will encode the verified seller column as a binary indicator (1 for verified sellers, 0 otherwise)

```python
df['Verified'] = np.where(df['Verified'] == 0, 0, 1)
```

Finally, we will get the following dataframe.

The ‘Location‘ variable in its current form cannot be used in our model given that it’s categorical in nature. Thus, to be able to make use of it, we will first have to transform this into dummy variables and then use the relevant variable in our model. We convert this to dummy variables as follows:

```python
df = pd.get_dummies(df, columns=['Location'])
```

Stage 3 – Model Building

As we have got our base data ready, we will now proceed toward building our model. We will use ‘Year of Purchase’, ‘Km Driven’, ‘Verified’, ‘Days Since Posting’ and ‘Location_Rajouri Garden’ as our input variables and ‘Price’ as our target variable.

```python
X = df[['Year of Purchase', 'Km Driven', 'Verified', 'Days Since Posting', 'Location_Rajouri Garden']]
y = df[['Price']]
```

We will use a 25% test dataset size and fit the Linear Regression model on the training set.

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = LinearRegression().fit(X_train, y_train)
```

We check the model’s score (the R² coefficient of determination) on the training and test sets.

```python
print("Training set score", model.score(X_train, y_train))
print("Test set score", model.score(X_test, y_test))
```

Let’s check out the summary of our model
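The summary itself appeared as an image in the original post. A minimal sketch of how to inspect a fitted LinearRegression’s intercept and coefficients follows; the toy listings below stand in for the scraped data, so the names and numbers are illustrative only.

```python
# Print the pieces a model summary shows: the intercept and one
# coefficient per input variable of a fitted LinearRegression.
import pandas as pd
from sklearn.linear_model import LinearRegression

def summarize(model, feature_names):
    """Small coefficient table for a fitted LinearRegression."""
    return pd.DataFrame({
        'Variable': list(feature_names),
        'Coefficient': model.coef_.flatten(),
    })

# Toy listings standing in for the scraped data (values are made up).
X_demo = pd.DataFrame({'Year of Purchase': [2009, 2012, 2015, 2018, 2011],
                       'Km Driven': [80000, 62000, 45000, 15000, 70000]})
y_demo = [150000, 260000, 390000, 540000, 200000]
m = LinearRegression().fit(X_demo, y_demo)

print('Intercept:', m.intercept_)
print(summarize(m, X_demo.columns))
```

With the real data, passing `model` and `X.columns` to `summarize` would reproduce the coefficient table, including the sign of the ‘Verified’ coefficient.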

Stage 4 – Predicting the Market Price

Finally, we will use details of our own car and feed them into the model. Let’s revisit the input variable details we have of our own car

Year of Purchase – 2009

Km Driven – 80,000

Verified – 0

Days Since Posting – 0

Location-Rajouri Garden – 1

As of now we are not a verified seller, so we would have to use 0 for the relevant feature. However, as we saw in the model summary, the coefficient for ‘Verified’ is positive, i.e., being a verified seller should let us list our vehicle at a higher price. Let’s test both cases: first as a non-verified seller, then as a verified one.

```python
print("Market price for my car as a non-verified seller would be Rs.",
      int(round(model.predict([[2009, 80000, 0, 0, 1]]).flatten()[0])))
```


```python
print("Market price for my car as a verified seller would be Rs.",
      int(round(model.predict([[2009, 80000, 1, 0, 1]]).flatten()[0])))
```

Conclusion

Thus, we saw how we could use the various capabilities of Python to determine the market price of items we want to sell on an online marketplace like OLX, Craigslist, or eBay. We extracted information from all similar listings in our area and built a basic machine learning model, which we used to predict the price to set based on the features of our vehicle. We also learned that it would be better to list our vehicle as a verified seller on OLX: being a verified seller would fetch us a 17% higher price compared to being a non-verified seller.

 The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.



The Snails Of Old Ohio

by Photo: John B. Carnett

Growing up on the Maine coast, Stephen Felton remembers, “I spent practically every Sunday at a clambake. And I was always looking at echinoderms, gastropods, and clams.” He’d pick up the shells and wonder: How old were they? How did they get their food? Why did they die? He never lost his curiosity about these creatures, even when he moved to landlocked Cincinnati in 1955 to marry and become a building contractor. And soon, he found something better than seashells: “Fossils! Boy, there were fossils literally everywhere, from down by the Ohio River to the highest hills.”

The Cincinnati area sits on Ordovician-era bedrock laid down beneath a shallow sea, in a climate rich in invertebrate marine life. Back around 1960, Felton often found himself doing brick and stone work on home construction sites. At lunchtime, he’d put down his trowel, grab the homemade chicken and iced tea his wife, Emma, had packed for him, and search the site for fossil starfish or echinoderms, some of which are now in the permanent collection at the University of Cincinnati. Then, in 1965, he came across a brachiopod fossil with a hole in it. He’d found his specialty.

“I thought, this is what the snails do on the coast of Maine when they pick up quahog clams: drill a hole in them and eat them,” he says. “So I picked up the snail fossils surrounding it, and searched other places, and found more of these holes, and decided I needed to know more.” He went to fossil shows and lectures, read like crazy, and continued searching for specimens. He concluded that the ancient snails probably secreted an acid on their prey, much as modern snails do, then used a tooth-like structure to bore a hole into the shell. Scholars doubted snails did so during the Ordovician period, but Felton persevered, found more samples with larger holes, and, finally, presented a paper. “Now more people believe this happened during the Ordovician time, rather than several hundred million years later.”

Such research is one reason why the Paleontology Society presented Felton, now 67, the 2001 Strimple Award for outstanding contributions to scientific knowledge by an amateur. David Meyer, a University of Cincinnati professor of geology, is effusive in his praise. Felton, he says, tears down the barriers between scientists and amateurs. “He willingly donates time, specimens, and information,” says Meyer. “He helps grad students and amateur clubs. He has written and presented papers. He is . . . just like a scientist.”

GIVE IT A TRY

Got a penchant for paleontology? Here are Stephen Felton’s tips on how to get started:

To find a fossil club in your area, see the Mid-America Paleontology Society Web site.

Two excellent books

The Fossil Book, by Pat Vickers Rich and Thomas Hewitt Rich; and Handbook of Paleontology for Beginners and Amateurs, Part I, by Winifred Goldring

Stock Price Analysis With Python

Stock price analysis with Python is crucial for investors to understand the risk of investing in the stock market. A company’s stock prices reflect its evaluation and performance, which influences the demand and supply in the market. Technical analysis of the stock is a vast field, and we will provide an overview of it in this article. By analyzing the stock price with Python, investors can determine when to buy or sell the stock. This article will be a starting point for investors who want to analyze the stock market and understand its volatility. So, let’s dive into the stock price analysis with Python.

Libraries Used in Stock Price Analysis With Python

The following libraries need to be installed beforehand, which can easily be done with pip. A brief description of each library and its application is provided below.

| Library | Application |
| --- | --- |
| Yahoo Finance (yfinance) | To download stock data |
| Pandas | To handle data frames in Python |
| Numpy | Numerical Python |
| Matplotlib | Plotting graphs |

```python
import pandas as pd
import datetime
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

!pip install yfinance
import yfinance as yf
%matplotlib inline
```

Data Description

We have downloaded the daily stock price data using the Yahoo Finance API functionality. It is multi-year data capturing Open, High, Low, Close, and Volume.

Open: The price of the stock when the market opens in the morning

Close: The price of the stock when the market closes in the evening

High: The highest price the stock reached during that day

Low: The lowest price at which the stock traded that day

Volume: The total number of shares traded on that day

Here, we will take the example of three companies, TCS, Infosys, and Wipro, which are among the industry leaders in providing IT services.

```python
start = "2014-01-01"
end = '2023-1-01'

tcs = yf.download('TCS', start, end)
infy = yf.download('INFY', start, end)
wipro = yf.download('WIPRO.NS', start, end)
```

Exploratory Analysis for Stock Price Analysis With Python

Python Code:
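The code snippet itself did not survive extraction. Assuming the `tcs`, `infy`, and `wipro` frames downloaded above, the open-price chart described next can be drawn like this; tiny stand-in frames are used below so the sketch runs on its own.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in frames with the same 'Open' column as the yf.download results.
idx = pd.date_range('2020-01-01', periods=5)
tcs = pd.DataFrame({'Open': [100, 102, 101, 105, 107]}, index=idx)
infy = pd.DataFrame({'Open': [50, 51, 49, 52, 53]}, index=idx)
wipro = pd.DataFrame({'Open': [200, 198, 205, 207, 210]}, index=idx)

# Overlay the opening prices of the three stocks on one line chart.
tcs['Open'].plot(label='TCS', figsize=(15, 7))
infy['Open'].plot(label='Infosys')
wipro['Open'].plot(label='Wipro')
plt.title('Opening price of the stocks')
plt.legend()
plt.savefig('open_prices.png')  # plt.show() in a notebook
```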

The above graph shows the opening stock prices of the three companies as a line chart drawn with matplotlib. It clearly shows that Wipro’s prices are higher than those of the other two companies, but we are not interested in absolute prices; we want to understand how these stocks fluctuate with time.

```python
tcs['Volume'].plot(label='TCS', figsize=(15, 7))
infy['Volume'].plot(label='Infosys')
wipro['Volume'].plot(label='Wipro')
plt.title('Volume of Stock traded')
plt.legend()
```

The graph shows the volume traded for each company; Infosys shares are clearly traded in greater volume than the other two IT stocks.

```python
# Market Capitalisation
tcs['MarktCap'] = tcs['Open'] * tcs['Volume']
infy['MarktCap'] = infy['Open'] * infy['Volume']
wipro['MarktCap'] = wipro['Open'] * wipro['Volume']

tcs['MarktCap'].plot(label='TCS', figsize=(15, 7))
infy['MarktCap'].plot(label='Infosys')
wipro['MarktCap'].plot(label='Wipro')
plt.title('Market Cap')
plt.legend()
```

Volume or stock price alone does not provide a comparison between companies, so here we have plotted Volume * Share price to compare them better. (Strictly speaking this is daily traded value rather than market capitalisation, which would use shares outstanding, but it serves the comparison.) As the graph clearly shows, Wipro seems to be traded on the higher side.

Moving Averages for Stock Price Analysis With Python

As we know, stock prices are highly volatile and change quickly with time. To observe any trend or pattern, we can take the help of 50-day and 200-day moving averages.

```python
tcs['MA50'] = tcs['Open'].rolling(50).mean()
tcs['MA200'] = tcs['Open'].rolling(200).mean()
tcs['Open'].plot(figsize=(15, 7))
tcs['MA50'].plot()
tcs['MA200'].plot()
```

Scatter Plot Matrix

```python
data = pd.concat([tcs['Open'], infy['Open'], wipro['Open']], axis=1)
data.columns = ['TCSOpen', 'InfosysOpen', 'WiproOpen']
scatter_matrix(data, figsize=(8, 8), hist_kwds={'bins': 250})
```

The above graph combines a histogram for each company with scatter plots for each pair of companies’ stocks. From the graph, we can see that Wipro’s stock loosely shows a linear correlation with Infosys’s.

Percentage Increase in Stock Value

The percentage increase in stock value is the change in the stock price relative to the previous day. The bigger that value, whether positive or negative, the more volatile the stock is.

```python
# Volatility
tcs['returns'] = (tcs['Close'] / tcs['Close'].shift(1)) - 1
infy['returns'] = (infy['Close'] / infy['Close'].shift(1)) - 1
wipro['returns'] = (wipro['Close'] / wipro['Close'].shift(1)) - 1

tcs['returns'].hist(bins=100, label='TCS', alpha=0.5, figsize=(15, 7))
infy['returns'].hist(bins=100, label='Infosys', alpha=0.5)
wipro['returns'].hist(bins=100, label='Wipro', alpha=0.5)
plt.legend()
```

It is clear from the graph that the histogram of daily percentage changes for TCS is the widest, which indicates that TCS is the most volatile of the three stocks compared.
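The visual comparison can be backed by a number: the standard deviation of daily returns, computed with the same return formula as above. A self-contained sketch with made-up closing prices:

```python
import pandas as pd

# Volatility proxy: standard deviation of daily returns.
def daily_volatility(close):
    returns = close / close.shift(1) - 1   # same formula as above
    return returns.std()

steady = pd.Series([100.0, 101, 100, 102, 101, 103])  # small daily moves
jumpy = pd.Series([100.0, 110, 95, 120, 90, 125])     # large daily moves

print(daily_volatility(steady), daily_volatility(jumpy))
```

Applying `daily_volatility` to each stock’s `Close` column gives a single comparable number per company instead of eyeballing histogram widths.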

Conclusion

The above analysis can be used to understand a stock’s short-term and long-term behaviour. A decision support system can be built on top of it to pick low-risk/low-gain or high-risk/high-gain stocks from an industry, depending on the risk appetite of the investor.



Bitcoin & Ethereum Price Plunges, While Bitgert Price Skyrocket In This Bear Market

It is during the current bear market that Bitgert marketcap has also skyrocketed.

The frequent bear market conditions have plunged the price of most cryptocurrencies in the second week of March. The largest cryptocurrencies are the most affected. Both Bitcoin and Ethereum have dropped over 10% in the past 7 days, making them among the biggest losers in the market. But not every cryptocurrency has been plunging: the Bitgert price has been skyrocketing during this bear market, and the Bitgert market cap has skyrocketed along with it. But is Bitgert bullish during the current bear market? Read on below.

Bitgert

The Bitgert price has been growing during the bear market because of the huge attraction the cryptocurrency has created after the launch of its own blockchain. The Bitgert BRC20 blockchain is the hottest thing right now in the industry. It is the first gasless blockchain, with a $0.0000000000001 gas fee; this is a near-zero figure and the lowest the industry has ever gone. The Bitgert BRC20 blockchain also overtook Solana to become the fastest chain at 100k TPS. These are the major reasons why Bitgert has been skyrocketing during the bear market. With mass adoption of the Bitgert chain projected to start soon, investors, including whales from the large cryptocurrencies, are buying and accumulating BRISE.

Centcex

The Centcex project has created a lot of attraction around it because of the unlimited number of products the team is developing and the huge income that will come from the staking process. The staking program for the Centcex project has 100% APY going to the staked token. The hundreds of products on the ecosystem will also attract thousands or millions of users, which will increase Centcex adoption. Therefore, the Centcex coin price is expected to skyrocket as more products are launched. Centcex might be the next project to challenge the Bitgert chain in terms of utilities if the team launches products quickly enough.

Bitcoin

The price of the Bitcoin coin has been dropping during the bear market. In fact, Bitcoin is among the cryptocurrencies that recorded over a 10% drop during the past 7 days. The bear market has dropped the BTC price below $40k, a price most crypto investors never thought they would see in March. However, Bitcoin is among the cryptocurrencies expected to make a strong comeback from the plunge, so the drop should not scare Bitcoin holders. That said, Bitcoin is getting tough competition from the likes of Bitgert in terms of chain speed and the cost of gas.

Ethereum

Ethereum has also plunged over 10% in this bear market, making it among the cryptocurrencies hit hard by the crash. The high Ethereum gas fee has something to do with this drop: few investors are buying Ethereum because of the fee, which makes the project less attractive to developers than Bitgert. However, the ongoing upgrade of the Ethereum network might make the cryptocurrency more competitive against Bitgert and many other cryptocurrencies; the upgrade will make the Ethereum chain faster and cheaper by reducing the gas fee. That is when it will be able to compete with Bitgert.

Visualizing Netflix Data Using Python!


We can say that data visualization is basically a graphical representation of data and information. It is mainly used for data cleaning, exploratory data analysis, and proper effective communication with business stakeholders. Right now the demand for data scientists is on the rise. Day by day we are shifting towards a data-driven world. It is highly beneficial to be able to make decisions from data and use the skill of visualization to tell stories about what, when, where, and how data might lead us to a fruitful outcome.

Data visualization is going to change the way our analysts work with data. They’re going to be expected to respond to issues more rapidly. And they’ll need to be able to dig for more insights – look at data differently, more imaginatively. Data visualization will promote that creative data exploration. -Simon Samuel

Table of contents

Why do we need Data Visualization?

Types of Data Visualization.

Brief about tools we will be using

Data pre-processing

Data Visualization

Keep in mind

Conclusion

Why do we need good Data Visualizations?

Our eyes are drawn to colours and patterns. We can quickly recognize blue from yellow, circle from a square. Data visualization is a form of visual art that not only grabs our interests but also keeps our eyes on the message. We can literally narrate our entire numerical data to the stakeholders in a form of captivating graphs with the help of data visualization.

Right now we are living in “an age of Big data” trillions of rows of data are being generated every day. Data visualization helps us in curating data into a form that is easily understandable and also helps in highlighting a specific portion. Plain graphs are too boring for anyone to notice and even fail to keep the reader engaged. Hence, today we will be seeing how to create some mind-blowing visualization using matplotlib and seaborn.

Types of Data visualization

In this article we will be creating two types of Data visualization:

1. Bar Plot (Horizontal):

It is a graph that represents a specific category of data with rectangular bars with length and height proportional to the values they represent.

Syntax: matplotlib.pyplot.barh(y,width,height) 

Parameters: 

Y: Co-ordinates of the Y bar.

Width: Width of the bar.

Height: Height of the bar.
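A minimal, runnable example of the `barh` call described above; the category names and counts are made up for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt

# One horizontal bar per category; bar length is the value it represents.
categories = ['Movies', 'TV Shows']
values = [6131, 2676]

fig, ax = plt.subplots(figsize=(6, 2))
ax.barh(categories, values, height=0.5, color='#b20710')
ax.set_xlabel('Number of titles')
fig.savefig('barh_demo.png')
```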

2. Timeline (Customized Horizontal line):

Syntax: axhline(y=0, xmin=0, xmax=1, c, zorder )

Parameters:

Y: Co-ordinates of Y in a horizontal line with a default value of 0.

xmin: This parameter should be between 0 and 1. 0 means the extreme left of the plot and 1 means the extreme right of the plot with 0 being the default value.

xmax: This parameter should be between 0 and 1. 0 means the extreme left of the plot and 1 means the extreme right of the plot with 1 being the default value.
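And a minimal use of `axhline` with exactly these parameters, drawing a baseline across the middle 80% of the axes:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt

# A horizontal line at y=0 spanning from 10% to 90% of the axes width.
fig, ax = plt.subplots(figsize=(8, 2))
ax.axhline(y=0, xmin=0.1, xmax=0.9, c='#000000', zorder=1)
ax.set_ylim(-1, 1)
ax.set_xticks([])
ax.set_yticks([])
fig.savefig('axhline_demo.png')
```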

Before we get started, I want you to know that we won’t be using any python library other than Matplotlib, seaborn and we will be using Netflix’s dataset for the explanation.

By the end of this article, you will be able to create some awesome data visualization using matplotlib and seaborn. So without further ado, let’s get started.

Brief about Data Visualization libraries we will be using

*Feel free to skip this part if you are already aware of these libraries…

Matplotlib: It is a plotting library for the Python programming language and it has numerical mathematics extension Numpy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, QT, WxPython, or GTX. (Source)

Seaborn: It is an amazing visualization library for statistical graphics plotting in python. It provides beautiful default styles and color palettes to make statistical plots more attractive. It is built on top of matplotlib library and is also closely integrated into the data structures from pandas. The aim of seaborn is to make visualization the central part of exploring and understanding data. It also provides dataset-oriented APIs so that we can switch between different visual representations for the same variables for a better understanding of the dataset. (Source)

Numpy: It is a library for python that supports multi-dimensional arrays and matrices with many high-level mathematical operations to perform on these arrays and matrices.

Pandas: It is a powerful, flexible, and easy-to-use data manipulation tool for the python programming language.

Best time to grab a Coffee !!

Data pre-processing

Importing all the necessary libraries:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
```

Pre-processing the data:

```python
# Load the dataset (the loading step was missing from the extracted post;
# the file name below is assumed).
df = pd.read_csv('netflix_titles.csv')
```



Calculating the missing data:

```python
for i in df.columns:
    null_rate = df[i].isna().sum() / len(df) * 100
    print("{} null rate: {}%".format(i, round(null_rate, 2)))
```

Output (only columns with missing values shown):

director null rate: 30.68%
cast null rate: 9.22%
country null rate: 6.51%
date_added null rate: 0.13%
rating null rate: 0.09%

Dealing with the missing data

Here we will replace the missing country values with the most frequent country (the mode), and fill missing cast and director values with ‘No data’.

```python
df['country'] = df['country'].fillna(df['country'].mode()[0])
df['cast'].replace(np.nan, 'No data', inplace=True)
df['director'].replace(np.nan, 'No data', inplace=True)
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)
```

Now we are done with missing values, but the dates are still not quite right…

```python
df['date_added'] = pd.to_datetime(df['date_added'])
df['month_added'] = df['date_added'].dt.month
df['month_name_added'] = df['date_added'].dt.month_name()
df['year_added'] = df['date_added'].dt.year
```

Okay, let’s visualize now!!!

Netflix’s Brand Palette

Always use a color palette, it is a great way in achieving good integrity and helps us to give a professional look keeping all the readers engaged.

```python
sns.palplot(['#221f1f', '#b20710', '#e50914', '#f5f5f1'])
plt.title("Netflix brand palette", loc='left', fontfamily='serif', fontsize=15, y=1.2)
plt.show()
```

We will use Netflix brand colors wherever we can…

Let’s visualize the ratio between Netflix’s TV shows and Movies

Awesome !! Isn’t it?

Steps:

1. Calculating the ratio

```python
x = df.groupby(['type'])['type'].count()
y = len(df)
r = (x / y).round(2)
mf_ratio = pd.DataFrame(r).T
```

Drawing the figure:

```python
fig, ax = plt.subplots(1, 1, figsize=(6.5, 2.5))
ax.barh(mf_ratio.index, mf_ratio['Movie'], color='#b20710', alpha=0.9, label='Movie')
ax.barh(mf_ratio.index, mf_ratio['TV Show'], left=mf_ratio['Movie'], color='#221f1f', alpha=0.9, label='TV Show')
ax.set_xlim(0, 1)
ax.set_xticks([])
ax.set_yticks([])
plt.show()
```

2. Annotating the figure:

```python
fig, ax = plt.subplots(1, 1, figsize=(6.5, 2.5))
ax.barh(mf_ratio.index, mf_ratio['Movie'], color='#b20710', alpha=0.9, label='Movie')
ax.barh(mf_ratio.index, mf_ratio['TV Show'], left=mf_ratio['Movie'], color='#221f1f', alpha=0.9, label='TV Show')
ax.set_xlim(0, 1)
ax.set_xticks([])
ax.set_yticks([])

# Percentage and category labels inside each bar segment
for i in mf_ratio.index:
    ax.annotate(f"{int(mf_ratio['Movie'][i] * 100)}%",
                xy=(mf_ratio['Movie'][i] / 2, i),
                va='center', ha='center', fontsize=40,
                fontweight='light', fontfamily='serif', color='white')
    ax.annotate("Movie",
                xy=(mf_ratio['Movie'][i] / 2, -0.25),
                va='center', ha='center', fontsize=15,
                fontweight='light', fontfamily='serif', color='white')

for i in mf_ratio.index:
    ax.annotate(f"{int(mf_ratio['TV Show'][i] * 100)}%",
                xy=(mf_ratio['Movie'][i] + mf_ratio['TV Show'][i] / 2, i),
                va='center', ha='center', fontsize=40,
                fontweight='light', fontfamily='serif', color='white')
    ax.annotate("TV Shows",
                xy=(mf_ratio['Movie'][i] + mf_ratio['TV Show'][i] / 2, -0.25),
                va='center', ha='center', fontsize=15,
                fontweight='light', fontfamily='serif', color='white')

plt.show()
```

3. Adding text and removing legend & spines:

```python
fig, ax = plt.subplots(1, 1, figsize=(6.5, 2.5))
ax.barh(mf_ratio.index, mf_ratio['Movie'], color='#b20710', alpha=0.9, label='Movie')
ax.barh(mf_ratio.index, mf_ratio['TV Show'], left=mf_ratio['Movie'], color='#221f1f', alpha=0.9, label='TV Show')
ax.set_xlim(0, 1)
ax.set_xticks([])
ax.set_yticks([])

for i in mf_ratio.index:
    ax.annotate(f"{int(mf_ratio['Movie'][i] * 100)}%",
                xy=(mf_ratio['Movie'][i] / 2, i),
                va='center', ha='center', fontsize=40,
                fontweight='light', fontfamily='serif', color='white')
    ax.annotate("Movie",
                xy=(mf_ratio['Movie'][i] / 2, -0.25),
                va='center', ha='center', fontsize=15,
                fontweight='light', fontfamily='serif', color='white')

for i in mf_ratio.index:
    ax.annotate(f"{int(mf_ratio['TV Show'][i] * 100)}%",
                xy=(mf_ratio['Movie'][i] + mf_ratio['TV Show'][i] / 2, i),
                va='center', ha='center', fontsize=40,
                fontweight='light', fontfamily='serif', color='white')
    ax.annotate("TV Shows",
                xy=(mf_ratio['Movie'][i] + mf_ratio['TV Show'][i] / 2, -0.25),
                va='center', ha='center', fontsize=15,
                fontweight='light', fontfamily='serif', color='white')

# Title, subtitle, and clean-up of spines and legend
fig.text(0.125, 1.0, 'Movie & TV Show distribution', fontfamily='serif', fontsize=15, fontweight='bold')
fig.text(0.125, 0.90, 'We see vastly more movies than TV shows on Netflix.', fontfamily='serif', fontsize=12, fontweight='light')

for s in ['top', 'left', 'right', 'bottom']:
    ax.spines[s].set_visible(False)

ax.legend().set_visible(False)
plt.show()
```

Boom!!

Now let’s visualize Netflix’s Timeline

Steps:

1. Initializing the timeline list:

from datetime import datetime

tl_dates = [
    "1997\nFounded",
    "1998\nMail Services",
    "2003\nGoes Public",
    "2007\nStreaming service",
    "2016\nGoes Global",
    "2021\nNetflix & Chill"
]
tl_x = [1, 2, 4, 5.3, 8, 9]

2. Drawing the figure :

fig, ax = plt.subplots(figsize=(15, 4), constrained_layout=True)
ax.set_ylim(-2, 1.5)
ax.set_xlim(0, 10)

# Baseline and the timeline points
ax.axhline(0, xmin=0.1, xmax=0.9, c="#000000", zorder=1)
ax.scatter(tl_x, np.zeros(len(tl_x)), s=120, c="#4a4a4a", zorder=2)
ax.scatter(tl_x, np.zeros(len(tl_x)), s=30, c='#fafafa', zorder=3)

# Date labels under each point
for x, date in zip(tl_x, tl_dates):
    ax.text(x, -0.55, date, ha='center', fontfamily='serif', fontweight='bold',
            color='#4a4a4a', fontsize=12)

# Remove spines and ticks
for spine in ["left", "top", "right", "bottom"]:
    ax.spines[spine].set_visible(False)
ax.set_xticks([])
ax.set_yticks([])

ax.set_title("Netflix through the years", fontweight="bold", fontfamily='serif',
             fontsize=16, color='#4a4a4a')
plt.show()

Boom!!

Now let’s visualize a bar chart of the top countries

For that, we need to pre-process the data a little bit more:

First, let’s print the country column and see what we get:

df['country']

As we can see, rows 7782 and 7786 contain multiple countries in a single cell, so we will create another column that stores only the first country listed.

df['first_country'] = df['country'].apply(lambda x: x.split(",")[0])
df['first_country']
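One caveat: the country column can contain missing values, and calling `.split` on a NaN raises an AttributeError. A NaN-safe sketch using pandas’ vectorised string methods, shown on a tiny hypothetical frame so it doesn’t touch the real df:

```python
import pandas as pd

# Tiny hypothetical frame standing in for the Netflix data
toy = pd.DataFrame({'country': ['United States, India', 'France', None]})

# The .str accessor propagates NaN instead of raising, unlike a plain lambda
toy['first_country'] = toy['country'].str.split(',').str[0]
print(toy['first_country'].tolist())  # → ['United States', 'France', nan]
```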

Now we will replace some of the country names with their short form.

df['first_country'].replace('United States', 'USA', inplace=True)
df['first_country'].replace('United Kingdom', 'UK', inplace=True)
df['first_country'].replace('South Korea', 'S. Korea', inplace=True)

After that, we calculate the total occurrence of each country.

df['count'] = 1  # helper column
data = df.groupby('first_country')['count'].sum().sort_values(ascending=False)[:10]
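The helper-column groupby works, but pandas’ built-in `value_counts` returns the same descending top-10 counts in a single call; a minimal sketch on hypothetical data:

```python
import pandas as pd

toy = pd.DataFrame({'first_country': ['USA', 'USA', 'India', 'UK', 'USA', 'India']})

# value_counts sorts descending by default, so slicing gives the top 10 directly
data = toy['first_country'].value_counts()[:10]
print(data.to_dict())  # → {'USA': 3, 'India': 2, 'UK': 1}
```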

output:

Now let’s get started with the visualization:

# Drawing the figure
color_map = ['#f5f5f1' for _ in range(10)]
color_map[0] = color_map[1] = color_map[2] = '#b20710'  # highlight the top three

fig, ax = plt.subplots(1, 1, figsize=(12, 6))
ax.bar(data.index, data, width=0.5, edgecolor='darkgray', linewidth=0.6, color=color_map)

# Annotating the figure
for i in data.index:
    ax.annotate(f"{data[i]}", xy=(i, data[i] + 100), va='center', ha='center',
                fontweight='light', fontfamily='serif')

for s in ['top', 'left', 'right']:
    ax.spines[s].set_visible(False)

# Adding text
fig.text(0.125, 1, 'Top 10 countries on Netflix', fontsize=15, fontweight='bold', fontfamily='serif')
fig.text(0.125, 0.95, 'The three most frequent countries have been highlighted.', fontsize=10, fontweight='light', fontfamily='serif')
fig.text(1.1, 1.01, 'Insight', fontsize=15, fontweight='bold', fontfamily='serif')
fig.text(1.1, 0.67, '''
The US is by far the biggest content producer
for Netflix, with India in second place,
followed by the UK. Netflix being a US company,
it makes sense that it is the major producer.
''', fontsize=12, fontweight='light', fontfamily='serif')

ax.grid(axis='y', linestyle='-', alpha=0.5)

Finally, we will create a word cloud:

To create a word cloud you will need a mask (the shape of the word cloud). The mask image used in this example is given below; feel free to create any shape you want.
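If you don’t have a mask image at hand, you can generate a simple circular placeholder with NumPy and save it as `mask.png` (a hypothetical stand-in for the image below; the wordcloud library treats pure-white (255) regions as masked out, so words are drawn only inside the dark circle):

```python
import numpy as np
from PIL import Image

# 200x200 array: white (255) everywhere except a centred dark circle
h, w = 200, 200
y, x = np.ogrid[:h, :w]
inside = (x - w // 2) ** 2 + (y - h // 2) ** 2 <= (min(h, w) // 2 - 5) ** 2

mask = np.full((h, w), 255, dtype=np.uint8)
mask[inside] = 0

Image.fromarray(mask).save('mask.png')  # later loaded with Image.open('mask.png')
```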

Importing necessary libraries:

from wordcloud import WordCloud import random from PIL import Image import matplotlib

Creating a word cloud and displaying it:

cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ['#221f1f', '#b20710'])

# Build one long string from all the titles, stripping punctuation
text = str(list(df['title'])).replace(',', '').replace('[', '').replace("'", '').replace(']', '').replace('.', '')

mask = np.array(Image.open('mask.png'))

wordcloud = WordCloud(background_color='white', width=500, height=200,
                      colormap=cmap, max_words=150, mask=mask).generate(text)

plt.figure(figsize=(5, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()

Keep in Mind

Always make sure that you keep your data visualizations organized and coherent.

Make sure to use appropriate colours to represent and differentiate information. Colour choices can be a key factor in a reader’s decisions.

Use high contrast colours and annotate the elements of your data visualization properly.

Never distort the data; a data visualization is great when it tells the story clearly, without distortion.

Never use a graphical representation that does not represent the data set accurately (for example, 3D pie charts).

Your Data visualization should be easy to comprehend at a glance.

Never forget that the goal of data visualization is to enhance the data with the help of design, not to draw attention to the design itself.

Conclusion

So, we wrap up our first tutorial on Netflix data visualization – Part 1 here. One limitation remains: these charts are not interactive, unlike the ones we can build with Plotly and Cufflinks.

Sometimes a data visualization should be captivating and attention-grabbing, and I think we have achieved that here, even if it sacrifices some precision. By customizing our visualizations as we did here, the reader’s eye is drawn exactly where we want it.

Connect with me on LinkedIn

Email: [email protected]

Thank You !!

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

