How To Layout A Book With Openoffice.org: Part 1
It is all in your head: the plot, the characters, the locations, and even the scenes, but for some reason, staring at the blank page and blinking cursor makes you freeze. You like the idea of writing a book but cannot imagine actually completing it. If that feeling sounds familiar, then this might be the right article for you. Even if you have already written a book and have it ready to go, you may intend to self-publish it, start your own publishing company, send it to an editor, or just lay out your book so you can see how it looks.
1. Start OpenOffice.org Writer with a regular blank document template.
The first thing you will need to do is set the size of your book. In my personal experience, it helped me tremendously with writing to be able to see each page at a normal book size rather than the 8.5″x11″ college essay size. Suddenly, writing one hundred or two hundred pages will not take so long or seem so daunting.
2. Open the page settings (in OpenOffice.org Writer, Format > Page).
3. Enter your custom width and height.
The current format will be “Letter”. Many non-fiction paperbacks will be 6″x9″, while fiction paperbacks are often smaller sizes, such as 5.25″x7.5″. If you are doing this purely for effect, then it is entirely up to you. If you have to meet certain printing press specifications, follow them precisely.
There are lead pages that always precede the actual text of a book. If you are preparing a book for printing, they will be crucial. Bookstores rely on that information for stocking, and libraries rely on it for cataloging.
4. Create a title page.
There is no particular format set in stone, but a title page should include at least the title of the book and statement of responsibility (author’s name). If available, it should also include the publisher and place of publication. This is the place to be creative. You can use whatever font style and size you want. Create a manual page break at the end of the title page.
5. Create the title page verso.
This is the page directly on the other side of the title page and usually contains more detailed publication information, including copyright, ISBN, and CIP (cataloging in publication) data.
7. Insert another page break, and you can enter any number of optional pages such as a dedication.
This can include a half title page with just the title or title and author’s name on the second to last page before the text. The final page before the text will be blank, and the text should begin on the recto or right-hand side.
8. Insert a page break and begin your text.
Most books will begin with a chapter number, chapter name, or both.
In part 2, you will learn how to properly align page numbers and format paragraphs. You are now well on your way to preparing your book for publication. Writing a book can be a very rewarding experience, even if you have no intentions of publishing. With OpenOffice.org, free and open source software, you have all the tools you need to make it happen. Happy writing!
Tavis J. Hampton
Tavis J. Hampton is a freelance writer from Indianapolis. He is an avid user of free and open source software and strongly believes that software and knowledge should be free and accessible to all people. He enjoys reading, writing, teaching, spending time with his family, and playing with gadgets.
A Comprehensive Tutorial On Deep Learning – Part 1
This article was published as a part of the Data Science Blogathon.
This guide is mostly for beginners, and I'll try to define and emphasize the topics as much as I can. Since deep learning is a very big topic, I will divide the whole tutorial into a few parts. Be sure to read the other parts if you find this one useful.
Contents
1) Introduction
What is Deep Learning?
Why Deep Learning?
What amount of Data is Big?
Fields where Deep Learning is used
Difference between Deep Learning and Machine Learning
2) Importing necessary libraries
3) Overview
4) Logistic Regression
Computational graph
Initializing parameters
Forward Propagation
Optimizing with Gradient Descent
5) Logistic regression with Sklearn
6) Endnotes
Introduction
What is Deep Learning?
Deep learning is a subfield of machine learning, inspired by the biological neurons of the brain, that translates this idea into artificial neural networks with representation learning.
Why Deep learning?
When the volume of data increases, machine learning techniques, no matter how optimized, start to become inefficient in terms of performance and accuracy, whereas deep learning performs much better in such cases.
What amount of Data is big?
Well, one cannot quantify a threshold for data to be called big, but intuitively, let's say a million samples might be enough to say "It's big." (This is where Michael Scott would've uttered his famous words: "That's what she said.")
Fields where DL is used
Image Classification, Speech recognition, NLP(Natural language Processing), recommendation systems, etc.
Difference Between Deep Learning and Machine Learning
Deep Learning is a subset of Machine Learning.
In Machine Learning, features are provided manually.
Deep Learning, in contrast, learns features directly from the data.
Image Source: Kaggle
We will use the Sign Language Digits Dataset which is available on Kaggle here. Now let us begin.
Importing Necessary Libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
# Input data files are available in the "../input/" directory.
# import warnings
import warnings
# filter warnings
warnings.filterwarnings('ignore')
from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))
Overview of the Data
There are 2062 sign language Digit Images in this dataset.
Since there are 10 digits from 0-9, there are 10 unique sign images.
In the beginning, we will only use 0 and 1 (To keep it simple for learners)
In the data, the hand sign for 0 is between indices 204 and 408. There are 205 samples for 0.
Also, the hand sign for 1 is between indices 822 and 1027. There are 206 samples.
Thus we shall use 205 samples from each class. (Note: in reality, 205 samples are far too few for a proper deep learning model, but since this is a tutorial, we can ignore that.)
Now we will prepare our arrays X and Y, where X is our Image array(Features) and Y is our label array (0 and 1).
# load data set
x_l = np.load('../input/Sign-language-digits-dataset/X.npy')
Y_l = np.load('../input/Sign-language-digits-dataset/Y.npy')
img_size = 64
plt.subplot(1, 2, 1)
plt.imshow(x_l[260].reshape(img_size, img_size))
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(x_l[900].reshape(img_size, img_size))
plt.axis('off')
# Join a sequence of arrays along a row axis.
# from 0 to 204 is the zero sign and from 205 to 410 is the one sign
X = np.concatenate((x_l[204:409], x_l[822:1027]), axis=0)
z = np.zeros(205)
o = np.ones(205)
Y = np.concatenate((z, o), axis=0).reshape(X.shape[0], 1)
print("X shape: ", X.shape)
print("Y shape: ", Y.shape)
To create our X array, we first slice and concatenate the segments of 0 and 1 hand-sign images from the dataset into the array X. Next we do something similar with Y, but use the labels instead.
1) So we see that the shape of our X array is (410, 64, 64)
The 410 means 205 images of 0, 205 images of 1.
The 64s mean that the size of our images is 64 x 64 pixels.
2) The shape of Y is (410,1) thus 410 1’s and 0’s.
3) Now we split X and Y into train and test sets.
train = 85%, test = 15%
random_state = Uses a particular seed while randomizing, thus if the cell runs multiple times, the random number generated does not change every time. The same test and train distribution are created every time.
# Then lets create x_train, y_train, x_test, y_test arrays
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=42)
number_of_train = X_train.shape[0]
number_of_test = X_test.shape[0]
We have a 3-dimensional input array, so we have to flatten it to 2D to feed into our first Deep Learning model. Since Y is already 2D, we leave it just as it is.
X_train_flatten = X_train.reshape(number_of_train, X_train.shape[1]*X_train.shape[2])
X_test_flatten = X_test.reshape(number_of_test, X_test.shape[1]*X_test.shape[2])
print("X train flatten", X_train_flatten.shape)
print("X test flatten", X_test_flatten.shape)
Now we have a total of 348 images, each with 4096 pixels, in the training array X, and 62 images of the same pixel count (4096) in the test array. Now we transpose the arrays. This is just a personal choice, and you will see in the upcoming code why I did this.
x_train = X_train_flatten.T
x_test = X_test_flatten.T
y_train = Y_train.T
y_test = Y_test.T
print("x train: ", x_train.shape)
print("x test: ", x_test.shape)
print("y train: ", y_train.shape)
print("y test: ", y_test.shape)
So now we are done with preparing our required data. This is how it looks:
Now we will get familiar with one of the basic models of DL, called Logistic Regression.
Logistic Regression
When talking about binary classification, the first model that comes to mind is logistic regression. But one might wonder what the use of logistic regression in deep learning is. The answer is simple: logistic regression is itself a simple neural network, and the terms neural network and deep learning go hand in hand. To understand logistic regression, we first have to learn about computational graphs.
Computational Graph
Computational graphs can be considered as a pictorial way of representing mathematical expressions. Let us understand that with an example. Suppose we have a simple mathematical expression like:
c = (a^2 + b^2)^(1/2)
Its computational graph will be:
Image Source: Author
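Since the figure may not render here, the same expression can be broken into the intermediate nodes of its computational graph in a few lines of Python. This is a small illustrative sketch that is not part of the original article:

import numpy as np

# Each intermediate variable corresponds to one node of the computational graph
a, b = 3.0, 4.0
p = a ** 2        # node 1: a squared
q = b ** 2        # node 2: b squared
s = p + q         # node 3: sum of the squares
c = np.sqrt(s)    # node 4: square root -> final output c
print(c)          # 5.0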
Now let us view a computational graph of Logistic regression:
Image Source: Kaggle Dataset
The weights and bias are called parameters of the model.
The weights depict the coefficients of each pixel.
Bias is the intercept of the curve formed by plotting parameters against labels.
Z = (px1 * wx1) + (px2 * wx2) + … + (px4096 * wx4096) + b
y_head = sigmoid_function(Z)
What the sigmoid function does is essentially scale the value of Z between 0 and 1, so it becomes a probability.
Why use the Sigmoid Function?
It gives us a probabilistic result.
Since it’s a derivative, we can use it in the gradient descent algorithm.
Now we will examine each of the components of the above computational graph in detail.
Initializing Parameters
Image source: Microsoft Docs
Each pixel has its own weight. But the question is what will be their initial weights? There are several techniques to do that which I shall cover in part 2 of this article but for now, we can initialize them using any random value, let’s say 0.01.
The shape of the weights array will be (4096, 1), since there are in total 4096 pixels per image, and let the initial bias be 0.
# lets initialize parameters
# So what we need is dimension 4096, that is the number of pixels, as a parameter for our initialize method (def)
def initialize_weights_and_bias(dimension):
    w = np.full((dimension, 1), 0.01)
    b = 0.0
    return w, b

w, b = initialize_weights_and_bias(4096)
Forward Propagation
All the steps from pixels to the cost function are called forward propagation.
To calculate Z we use the formula Z = (w.T)x + b, where x is the pixel array, w is the weights, and b is the bias. After calculating Z, we feed it into the sigmoid function, which returns y_head (a probability). After that, we calculate the loss (error) function.
The cost function is the summation of all the losses and penalizes the model for the wrong predictions. This is how our model learns the parameters.
# calculation of z
# z = np.dot(w.T, x_train) + b
def sigmoid(z):
    y_head = 1/(1 + np.exp(-z))
    return y_head

y_head = sigmoid(0)
y_head > 0.5
The mathematical expression for the loss function (log loss) is:
loss = -[ y*log(y_head) + (1 - y)*log(1 - y_head) ]
Like I said previously, what the loss function essentially does is penalize wrong predictions. Here is the code for forward propagation:
# Forward propagation steps:
# find z = w.T*x+b
# y_head = sigmoid(z)
# loss(error) = loss(y, y_head)
# cost = sum(loss)
def forward_propagation(w, b, x_train, y_train):
    z = np.dot(w.T, x_train) + b
    y_head = sigmoid(z)  # probabilistic 0-1
    loss = -y_train*np.log(y_head) - (1-y_train)*np.log(1-y_head)
    cost = (np.sum(loss)) / x_train.shape[1]  # x_train.shape[1] is for scaling
    return cost

Optimizing with Gradient Descent
Image Source: Coursera
We aim to find the values of our parameters for which the loss function is at its minimum. The update equation for gradient descent is:
w := w - α * (∂J/∂w)
Here w is the weight, or parameter. The Greek letter alpha (α) is the step size, also called the learning rate; it signifies the size of the steps we take while going down the slope to find a local minimum. The rest is the derivative of the loss function, also known as the gradient. The algorithm for gradient descent is simple:
First, we take a random datapoint in our graph and find its slope.
Then we find the direction in which the value of the loss function decreases.
Update the weights using the above formula. (This method is also called backpropagation)
Select the next point by taking a stepsize of α.
Repeat.
# In backward propagation we will use the y_head that was found in forward propagation
# Therefore instead of writing a separate backward propagation method, lets combine forward propagation and backward propagation
def forward_backward_propagation(w, b, x_train, y_train):
    # forward propagation
    z = np.dot(w.T, x_train) + b
    y_head = sigmoid(z)
    loss = -y_train*np.log(y_head) - (1-y_train)*np.log(1-y_head)
    cost = (np.sum(loss)) / x_train.shape[1]  # x_train.shape[1] is for scaling
    # backward propagation
    derivative_weight = (np.dot(x_train, ((y_head - y_train).T))) / x_train.shape[1]  # x_train.shape[1] is for scaling
    derivative_bias = np.sum(y_head - y_train) / x_train.shape[1]  # x_train.shape[1] is for scaling
    gradients = {"derivative_weight": derivative_weight, "derivative_bias": derivative_bias}
    return cost, gradients

Now we update the learning parameters:
# Updating(learning) parameters
def update(w, b, x_train, y_train, learning_rate, number_of_iterarion):
    cost_list = []
    cost_list2 = []
    index = []
    # updating(learning) parameters number_of_iterarion times
    for i in range(number_of_iterarion):
        # make forward and backward propagation and find cost and gradients
        cost, gradients = forward_backward_propagation(w, b, x_train, y_train)
        cost_list.append(cost)
        # lets update
        w = w - learning_rate * gradients["derivative_weight"]
        b = b - learning_rate * gradients["derivative_bias"]
        if i % 10 == 0:
            cost_list2.append(cost)
            index.append(i)
            print("Cost after iteration %i: %f" % (i, cost))
    # we update(learn) parameters weights and bias
    parameters = {"weight": w, "bias": b}
    plt.plot(index, cost_list2)
    plt.xticks(index, rotation='vertical')
    plt.xlabel("Number of Iterarion")
    plt.ylabel("Cost")
    plt.show()
    return parameters, gradients, cost_list

parameters, gradients, cost_list = update(w, b, x_train, y_train, learning_rate=0.009, number_of_iterarion=200)
Till this point, we learned our parameters. It means we are fitting the data. In the prediction step, we have x_test as input and, using it, we make forward predictions.
# prediction
def predict(w, b, x_test):
    # x_test is an input for forward propagation
    z = sigmoid(np.dot(w.T, x_test) + b)
    Y_prediction = np.zeros((1, x_test.shape[1]))
    # if z is bigger than 0.5, our prediction is sign one (y_head=1)
    # if z is smaller than 0.5, our prediction is sign zero (y_head=0)
    for i in range(z.shape[1]):
        if z[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    return Y_prediction

predict(parameters["weight"], parameters["bias"], x_test)
Now we make our predictions. Let us put it all together:
def logistic_regression(x_train, y_train, x_test, y_test, learning_rate, num_iterations):
    # initialize
    dimension = x_train.shape[0]  # that is 4096
    w, b = initialize_weights_and_bias(dimension)
    # do not change learning rate
    parameters, gradients, cost_list = update(w, b, x_train, y_train, learning_rate, num_iterations)
    y_prediction_test = predict(parameters["weight"], parameters["bias"], x_test)
    y_prediction_train = predict(parameters["weight"], parameters["bias"], x_train)
    # Print train/test errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_train - y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100))

logistic_regression(x_train, y_train, x_test, y_test, learning_rate=0.01, num_iterations=150)
So as you can see, even the most fundamental model of deep learning is quite tough. It is not easy to learn, and beginners sometimes might feel overwhelmed while studying all of this in one go. But the thing is, we haven't even touched deep learning yet; this is just the surface of it. There's so much more, which I'll cover in part 2 of this article.
Since we have learned the logic behind logistic regression, we can use a library called Sklearn, which already has many models and algorithms built into it, so you don't have to start everything from scratch.
Logistic Regression with Sklearn
I am not going to explain much in this section since you know almost all the logic and intuition behind Logistic regression. If you are interested in reading about the Sklearn library, you can read the official documentation here. Here is the code, and I'm sure you will be flabbergasted to see how little effort it takes:
from sklearn import linear_model

logreg = linear_model.LogisticRegression(random_state=42, max_iter=150)
print("test accuracy: {} ".format(logreg.fit(x_train.T, y_train.T).score(x_test.T, y_test.T)))
print("train accuracy: {} ".format(logreg.fit(x_train.T, y_train.T).score(x_train.T, y_train.T)))
Yes! That is all it took, just a few lines of code!
EndnotesWe’ve learned a lot today. But this is just the beginning. Be sure to check out part 2 of this article. You can find it at the below link. If you like what you read, you can read some of the other interesting articles that I’ve written.
I hope you had a good time reading my article. Cheers!!
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
How To Define A Gui Layout Using Xml Files In Android?
Introduction
The GUI (Graphical User Interface) of any Android application lets users interact with the various functionalities of the application. The GUI is the main part of the application that is visible to the user. The GUI of an Android application can be designed in several different ways, such as using XML, Dart, Kotlin, and many more. In this article we will take a look at how to define the GUI layout of an Android application using XML files.
Implementation
We will be creating a simple application in which we will be creating a text view for displaying the heading of our application. Then we will be creating an image view and one more text view for displaying the UI of our application. Now let's move towards Android Studio for creating a new project.
Step 1 : Creating a new project in Android Studio
Inside this screen we have to simply specify the project name. Then the package name will be generated automatically.
Note − Make sure to select the Language as Java.
Once our project has been created, we will see two files open, i.e. activity_main.xml and MainActivity.java.
Step 2 : Working with activity_main.xml

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <TextView
        android:id="@+id/idTVHeading"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_above="@+id/idIVImage"
        android:layout_margin="20dp"
        android:padding="4dp"
        android:text="GUI Layout using XML in Android"
        android:textAlignment="center"
        android:textColor="@color/black"
        android:textSize="20sp" />

    <ImageView
        android:id="@+id/idIVImage"
        android:layout_width="200dp"
        android:layout_height="200dp"
        android:layout_centerInParent="true"
        android:layout_margin="20dp"
        android:src="@mipmap/ic_launcher" />
    <!-- android:src above is a placeholder; the original article's image reference was lost in extraction -->

    <TextView
        android:id="@+id/idTVMessage"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_below="@id/idIVImage"
        android:layout_margin="20dp"
        android:gravity="center"
        android:text="Good Morning."
        android:textAlignment="center"
        android:textAllCaps="false"
        android:textColor="@color/black"
        android:textDirection="ltr"
        android:textSize="20sp" />

</RelativeLayout>
Explanation − In the above code we create a root layout, a RelativeLayout. Inside this layout we create a TextView that is used to display the heading of our application. After that we create an ImageView in which we display an image. Lastly we create one more TextView in which we display a message, and then add the closing tag for the RelativeLayout.
Note − Make sure you are connected to your real device or emulator.
Output
Conclusion
In the above article we have taken a look at how to define the GUI layout of an Android application using XML files.
Guide To Data Visualization With Python : Part 2
This article was published as a part of the Data Science Blogathon
Hey guys, hope you all are doing well.
I will be providing a link to my Kaggle notebook so don’t worry about the coding part.
The article will cover the topics mentioned below.
Table of Contents
2.7 Joint Plot / Marginal Plots
3.7 Geographical Maps
IntroductionLet’s have a quick introduction to data visualization.
Data Visualization: Data visualization is the graphical representation of information that is present inside a dataset with the help of visual elements such as charts, maps, graphs, etc.
In this article we will be using multiple datasets to show exactly how things work. The base dataset will be the iris dataset, which we will import from sklearn. We will create the rest of the datasets ourselves.
Let’s import all the libraries which are required for doing
import math, os
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stat
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

Reading the data (main focused dataset)
iris = pd.read_csv('../input/iris-flower-dataset/IRIS.csv')  # iris dataset
iris_feat = iris.iloc[:, :-1]
iris_species = iris.iloc[:, -1]
Let's start the hunt for ways to visualize data.
Note: The combination of features used for illustration may or may not make sense. They were only used for demo purposes.
Box Plot
This is one of the most used methods by data scientists. A box plot is a way of displaying the distribution of data based on the five-number summary. It basically gives information about the outliers and how spread out the data is from the center. It can tell whether data symmetry is present or not. It also gives information about how tightly grouped or skewed your data is. Box plots are also known as whisker plots.
sepal_length = iris_feat['sepal_length']
petal_length = iris_feat['petal_length']
petal_width = iris_feat['petal_width']
sepal_width = iris_feat['sepal_width']
data = [sepal_length, petal_length, petal_width, sepal_width]
fig1, ax1 = plt.subplots()
ax1.set_title('Basic Plot')
ax1.boxplot(data)
plt.show()
The dots or bubbles outside the 4th boxplot are the outliers. The line inside each box depicts the median of the data points of that variable.
Bubble Plot
Like scatter plots, bubble plots are used to depict the relationship between two variables. However, the addition of a third variable allows you to add another element to the comparison. For example, suppose you have the coordinates of locations in latitude and longitude format and you also have the population size of each location. A scatter plot cannot show all of this data, but with a bubble plot you can use the X and Y axes to represent the location and the population size to set the size of the bubble.
N = 50
# Creating own dataset.
x = np.random.normal(200, 20, N)
y = x + np.random.normal(5, 25, N)
area = (30 * np.random.rand(N))**2
df = pd.DataFrame({'X': x, 'Y': y, "size": area})

# Bubble plot
plt.scatter('X', 'Y', s='size', alpha=0.5, data=df)
plt.title('Title')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()
3. Scale the size of bubbles.
Area Charts
# Making some temporary data
y = [1,2,3,4,5,6,10,4]
x = list(range(len(y)))

# Creating the area chart
plt.fill_between(x, y)

# Show the plot
plt.show()

# Making up some random data
y = [[1,2,3,4,5,6,10,8,4], [2,3,4,5,6,10,11,4,7], [3,4,5,6,10,11,12,8,10]]
x = list(range(len(y[0])))
ax = plt.gca()
ax.stackplot(x, y, labels=['A','B','c'], alpha=0.5)
plt.show()

Pie Plot
A pie plot is a circular representation of data expressed in relative proportions. The pie chart is divided into various parts depending on the relative proportions of the numerical values.
# Creating the dataset
students = ['A','B','C','D','E']
scores = [30,40,50,10,5]

# Creating the pie chart
plt.pie(scores, explode=[0,0.1,0,0,0], labels=students,
        colors=['#EF8FFF','#ff6347','#B0E0E6','#7B68EE','#483D8B'])

# Show the plot
plt.show()
By removing the explode argument from the above function you can create a completely joined pie chart.
Venn Diagram
Venn diagrams, also called set or logic diagrams, show all possible relationships between a finite collection of sets. They are best suited when you have 2 or 3 finite sets and you want to gain insights about the differences and commonalities between them. The intersection, shown in a blended color, depicts the similarity, while the non-overlapping areas depict the differences.
Below is the code for 2 set Venn diagram
from matplotlib_venn import venn2, venn3  # venn3 added here: it is needed for the 3-group diagram below

# Making venn diagram
# Venn Diagram with 2 groups
venn2(subsets=(50, 10, 20), set_labels=('A', 'B'), alpha=0.5)
plt.show()

# Venn Diagram with 3 groups
venn3(subsets=(20, 10, 8, 10, 12, 4, 3), set_labels=('A', 'B', 'C'), alpha=0.5)

Pair Plot
Pair plots are used to plot the pairwise relationships between data points. This method is used for bivariate analysis. It is also possible to show a subset of variables or plot different variables on the rows and columns. The total number of combinations generated is nC2 (n choose 2). Seaborn provides a simple default method for making pair plots that can be customized, so we will be using Seaborn to implement them. They are actually a great way to visualize data with more than 4 columns, as they create only nC2 graphs.
There are various methods of creating a pair-plot. We will be discussing only 2 here. Others can be read here
# type-1
sns.pairplot(iris)

# type-2
sns.pairplot(iris, hue="species")

Joint Plot
This method is used for doing bivariate and univariate analysis at the same time.
Seaborn provides a convenient method to plot these graphs, so we will be using seaborn. There are various ways to plot joint graphs. We will be only looking at 2. You can learn more from here.
# Type -1
sns.jointplot(data=iris_feat, x="sepal_length", y="petal_length")

# Type -2
sns.jointplot(data=iris_feat, x="sepal_length", y="petal_length", kind="reg")
The 'kind' argument in the above function is quite powerful: it adds a regression line to the plot, which may help you get some insights with ease.
The plot above the grid (depicting sepal_length) and the one to its right (depicting petal_length) are for univariate analysis, while the scatter plot at the center is for bivariate analysis.
This concludes Section 2 of our Guide to Data Visualization.
SECTION – 3
Violin Plot
A violin plot is a hybrid of a box plot and a kernel density plot, which shows peaks in the data. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual data points, the violin plot features a kernel density estimation of the underlying distribution.
sns.violinplot(x="species", y="petal_length", data=iris, size=6)

Dendrograms
According to Wikipedia:
A dendrogram is a diagram representing a tree. This diagrammatic representation is frequently used in different contexts: in hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analysis.
So basically, dendrograms are used to show relationships between objects in a hierarchical manner. They are most commonly used to show the output of hierarchical clustering. The best use of this method is to form clusters of objects. The key to reading dendrograms is to focus on the height at which two branches are joined. If we have to derive clusters from a dendrogram, we start by breaking the highest link between them.
import plotly.figure_factory as ff

X = np.random.rand(15, 10)
fig = ff.create_dendrogram(X, color_threshold=1.5)
fig.update_layout(width=800, height=500)
fig.show()

Andrews Curves
Andrews curves are used for visualizing high-dimensional data by mapping each observation onto a function. Scatter plots are good up to 3 dimensions, so we need some method for data with more than 3 dimensions, which was suggested by Andrews. The formula for Andrews curves is given by:
T(n) = x_1/sqrt(2) + x_2 sin(n) + x_3 cos(n) + x_4 sin(2n) + x_5 cos(2n) + …

Andrews curves are most preferred for multivariate analysis. Another usage of Andrews curves is to visualize the structure of the independent variables. A possible usage is as a lightweight check of whether we have enough features to train a classifier or whether we should keep doing feature engineering and cleaning, because there is a mess in the data. (A small plotting sketch follows below.)
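The original article does not show plotting code for this section, so here is a minimal sketch using pandas' built-in andrews_curves helper on the iris DataFrame loaded earlier; the title is an arbitrary choice:

import matplotlib.pyplot as plt
import pandas as pd

# iris is the DataFrame read earlier; 'species' is its class-label column
pd.plotting.andrews_curves(iris, 'species')
plt.title('Andrews Curves for the Iris dataset')
plt.show()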
Tree Maps
A treemap displays each element of a dataset as a rectangle. It helps to display what proportion each element takes up. The size of each rectangle is proportional to its value: the greater the value, the larger the rectangle. It is somewhat similar to a pie chart except for its way of representation.
To plot treemaps we need to use an external package which is squarify.
import squarify

sizes = [50, 40, 15, 30, 20, 5]
squarify.plot(sizes)

# Show the plot
plt.show()

Network Charts
As the name suggests, these charts help in understanding the relationships between different entities by connecting them together. Each entity is represented as a node and the connection between nodes is represented as an edge. Their basic use is to get an insight into how nodes are connected with each other. In our example, nodes in the 'from' list and nodes in the 'to' list get connected with links between them.
import networkx as nx

# Build a dataframe with the connections
df = pd.DataFrame({
    'from': ['A', 'B', 'C', 'A', 'A', 'E'],
    'to':   ['D', 'A', 'E', 'C', 'E', 'B']})

# Build your graph
graph = nx.from_pandas_edgelist(df, 'from', 'to')

# Plot it
nx.draw(graph, with_labels=True)
plt.show()

3-D Plots
This method is used to plot interactive 3-D plots. It is more like a scatter plot in 3 dimensions. These plots follow all the properties that we discussed for the scatter plot. They are quite a powerful tool for visualization.
import plotly.express as px

fig = px.scatter_3d(iris, x='sepal_length', y='sepal_width', z='petal_length', color='species')
fig.show()

Geographical Maps
As the name suggests, these graphs help us plot and show data with different colors for different locations, where each color represents some value. In our example, dark colors represent smaller values and light colors represent greater values. The idea of geographical maps is similar to heatmaps, with the exception that a heatmap forms a grid while here we have a geographical representation of our data.
import plotly.express as px

df = pd.DataFrame({
    'Country': ['India', 'Russia', 'United States', 'China', 'Sri Lanka'],
    'Random_Value': [100, 101, 80, 5, 50]
})
fig = px.choropleth(df, locations="Country", color="Random_Value",
                    locationmode='country names',
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

EndNote
You can find the code here.
In this article, we saw various techniques for univariate, bivariate, and multivariate analysis. We tried to represent different types of data in the most effective way possible. With these data visualization techniques in hand, readers will be able to ace any competition and will also be able to tell their data stories in the most effective way. There are many more techniques used for data visualization, but these are the most commonly used ones, and keeping the length of the article in mind, not all of them can be covered here.
If you think this article contains mistakes, please let me know through the below links.
About Author
My Github Repository for Deep Learning is here.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
Topic Modeling And Latent Dirichlet Allocation(Lda) Using Gensim And Sklearn : Part 1
This article was published as a part of the Data Science Blogathon
IntroductionLet’s say you have a client who has a publishing house. Your client comes to you with two tasks: one he wants to categorize all the books or the research papers he receives weekly on a common theme or a topic and the other task is to encapsulate large documents into smaller bite-sized texts. Is there any technique and tool available that can do both of these two tasks?
Lo and behold! We enter the world of Topic Modeling. I'll break this article into three parts. In the current one, we'll explore the basics of how text data is seen in Natural Language Processing, what topics are, and what topic modeling is.
We shall see what are the applications of topic modeling, where all it is used, what are the methodologies to perform topic modeling, and what are the types of models available.
In the second article, we will dive in-depth into the most popular topic modeling technique called LDA, how it works, and in the third article how we apply it in Python.
Table of Contents
What are Topics?
What is Topic Modeling?
What are the Uses of Topic Modeling?
Topic Modeling Tools and Types of Models
Discriminative Models
Generative Models
But first, let us get clear on what a topic means.
What are Topics?
Topics or themes are a group of statistically significant "tokens" or words in a "corpus".
In case the terminologies corpus and token are new to you, here's a quick refresher:
A corpus is the group of all the text documents whereas a document is a collection of paragraphs.
A paragraph is a collection of sentences and a sentence is a sequence of words (or tokens) in a grammatical construct.
So basically, a book or research paper, which collectively has pages full of sentences, can be broken down into words. In the world of Natural Language Processing (NLP), these words are known as tokens that are a single and the smallest unit of text. The vocabulary is the set of unique tokenized words.
And, the first step to work through any text data is to split the text into tokens. The process of splitting a text into smaller units or words is known as tokenization.
For instance, take the sentence "The stock price of Google is USD2,450." Tokenizing each word, punctuation mark, and symbol in this sentence, we get the tokens.
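As a small illustration that is not part of the original article, one common way to tokenize that sentence is with NLTK's word_tokenize; this sketch assumes NLTK is installed and its 'punkt' tokenizer data has been downloaded:

import nltk
# nltk.download('punkt')  # uncomment on first run to fetch the tokenizer data

sentence = "The stock price of Google is USD2,450."
tokens = nltk.word_tokenize(sentence)
print(tokens)
# output will look roughly like: ['The', 'stock', 'price', 'of', 'Google', 'is', 'USD2,450', '.']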
Following is an illustration of a text data structure:
Now, turning towards what topics are.
As humans, we can easily read through a text, review, or book and, based on the context, tell what topic it is referring to, right? Yes! However, how would a machine tell us what the topic of the book is? How can you tell if a machine can rightly classify a book or text into the correct category? The only way to understand what a machine builds for us is in the language of statistics.
That is why I said above that a topic or a theme is a group of statistically significant words or tokens.
So, the next question that arises is what we mean by statistical significance in the context of text data. Statistically significant words imply that this collection of words is similar to each other, and we see that in the following ways within text data:
The group of words occurs together in the documents
These words have similar TF-IDF scores (term frequency and inverse document frequency); a short sketch of computing these weights follows after this list.
This group of words occurs regularly at frequent intervals
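As referenced in the list above, here is a short sketch, not from the original article, of how such term weights can be computed with scikit-learn's TfidfVectorizer; the tiny corpus is made up purely for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "pizza pasta bread cheese",
    "cricket football chess tennis",
    "pizza bread cricket",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)       # documents x vocabulary matrix of TF-IDF weights
print(vectorizer.get_feature_names_out())      # the vocabulary (use get_feature_names() on older scikit-learn)
print(tfidf.toarray().round(2))                # weight of each token in each document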
Some of the examples of words having common topics are:
In the above table, we have three different topics. Topic 1 is about food, Topic 2 talks about games, and Topic 3 has words related to neuroscience. In each case, the words that are similar to each other come together as a topic.
We will see in the later sections how we get these weights and how the words are grouped.
What is Topic Modeling?
Now that we have understood what topics are, it would be easier to grasp what topic modeling is.
Topic modeling is the process of automatically finding the hidden topics in textual data. It is also referred to as the text or information mining technique that has the aim to find the recurring patterns in the words present in the corpus.
It is an unsupervised learning method as we do not need to supply the labels to the topic modeling algorithm for the identification of the themes or the topics. Topics are automatically identified and classified by the model.
Essentially, topic modeling can be seen as a clustering methodology, wherein the small groups (or clusters) that are formed based on the similarity of words are known as topics. Additionally, topic modeling returns another set of clusters which are the group of documents collated together on the similarity of the topics. It is visually depicted below:
For understanding and illustrative purposes, we have a corpus with the following five documents:
Document 1: I want to watch a movie this weekend.
Document 2: I went shopping yesterday. New Zealand won the World Test Championship by beating India by eight wickets at Southampton.
Document 3: I don’t watch cricket. Netflix and Amazon Prime have very good movies to watch.
Document 4: Movies are a nice way to chill however, this time I would like to paint and read some good books. It’s been so long!
Document 5: This blueberry milkshake is so good! Try reading Dr. Joe Dispenza’s books. His work is such a game-changer! His books helped to learn so much about how our thoughts impact our biology and how we can all rewire our brains.
Here, P implies that the respective topic is present in the current document and 0 indicates the absence of the topic in the document.
And, if the topic is present in the document, then the values (which are random as of now) assigned to it convey how much weightage that topic has in the particular document.
As seen above, a document may be a combination of many topics. Our intention here with topic modeling is to find the main dominant topic or the theme.
We will be working with the same set of documents in the following parts of the article as well.
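To make the document-topic picture concrete before the full walkthrough promised for part 3, here is a brief sketch, not from the original article, that runs scikit-learn's LatentDirichletAllocation on the five example documents; the number of topics and other settings are arbitrary illustrative choices:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "I want to watch a movie this weekend.",
    "I went shopping yesterday. New Zealand won the World Test Championship by beating India by eight wickets at Southampton.",
    "I don't watch cricket. Netflix and Amazon Prime have very good movies to watch.",
    "Movies are a nice way to chill however, this time I would like to paint and read some good books. It's been so long!",
    "This blueberry milkshake is so good! Try reading Dr. Joe Dispenza's books. His work is such a game-changer! His books helped to learn so much about how our thoughts impact our biology and how we can all rewire our brains.",
]

vectorizer = CountVectorizer(stop_words='english')        # bag-of-words counts
dtm = vectorizer.fit_transform(documents)                 # document-term matrix

lda = LatentDirichletAllocation(n_components=3, random_state=42)  # 3 topics, chosen arbitrarily
doc_topic = lda.fit_transform(dtm)                        # rows: documents, columns: topic weights
print(doc_topic.round(2))                                 # each row sums to ~1, like the table described above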
What are the Uses of Topic Modeling?
Topic modeling is a very important Natural Language Processing tool that is extensively used for the following objectives:
Document Categorization: The goal is to categorize or classify a large set of documents into different categories based on the common underlying theme. We saw above to classify the books or the research papers into different categories.
Document Summarization: It is a very handy tool for generating summaries of large documents; say in our case we want to summarize the large stack of research papers.
Intent Analysis: Intent analysis means what each sentence (or tweet or post or complaint) refers to. It tells what is the text trying to explain in a particular document.
Information Retrieval: Information retrieval, concerned with storing, searching, and retrieving information also leverages the utility of topic modeling by deciphering the themes and content of the larger data.
Dimensionality Reduction: In any model, reducing dimensions is a key aspect. It’s a hassle to make a model with a large number of dimensions. Topic modeling helps to decrease the dimensions or features of the text data as the documents which at the start contain words are further converted into documents consisting of the topics. Hence, similar words are clubbed together to form the topics, which reduces the dimensions of the corpus.
Recommendation Engines: Recommendation engines are also built and the engines are used to share the preferred content with the users based on the themes filtered after applying topic modeling.
Topic Modeling Tools and Types of Models
Now, moving on to the techniques for executing topic modeling on a corpus. There are many methods for topic modeling such as:
Latent Dirichlet Allocation (LDA)
Latent Semantic Analysis (LSA)
Non-negative Matrix-Factorization (NNMF)
Of the above techniques, we will dive into LDA as it is a very popular method for extracting topics from textual data.
Now, we’ll take a small detour from topic modeling to the types of models. We will soon see the need for that. There are two types of model available:
Discriminative models, and
Generative models
Discriminative Models
Discriminative models differentiate between classes using the observed data, such as defect or no defect, having the disease or not having the disease. These models are applied in all spheres of artificial intelligence:
Logistic Regression
Decision Tree
Random Forest
Support Vector Machine (SVM)
Traditional Neural Network
Generative Models
On the other hand, generative models use statistics to generate or create new data. These models estimate probabilities using the joint probability distribution P(X, Y). They not only estimate the probabilities but also model the data points and differentiate the classes based on these computed probabilities of the class labels. These types of models are known as statistical or conditional models.
As compared to the discriminative models, the generative models have the capacity of handling more complicated tasks and are empowered with the ability to create more data to build the model on. These are unsupervised learning techniques that are used to discover the hidden patterns within the data.
In NLP, examples of generative models are Naive Bayes and the N-gram language model. Naive Bayes classifiers and Bayesian networks are built on the underlying Bayes theorem, which uses the joint probability.
Examples of other generative models are:
Gaussian Mixture Model (GMM)
Hidden Markov Model (HMM)
Linear Discriminant Analysis (LDA)
Generative Adversarial Networks (GANs)
Autoencoders
Boltzmann Machines
Moving back to our discussion on topic modeling, the reason for the diversion was to understand what generative models are.
The topic modeling technique Latent Dirichlet Allocation (LDA) is also a kind of generative probabilistic model. It generates probabilities to help extract topics from the words and collate documents using similar topics. We will see in part 2 of this blog what LDA is, how it works, and how LDA is similar to PCA, and in the last part we will implement LDA in Python. Stay tuned for more!
About me
Hi there! I am Neha Seth, a technical writer for AnalytixLabs. I hold a Postgraduate Program in Data Science & Engineering from the Great Lakes Institute of Management and a Bachelors in Statistics. I have been featured as Top 10 Most Popular Guest Authors in 2023 on Analytics Vidhya (AV).
My area of interest lies in NLP and Deep Learning. I have also passed the CFA Program. You can reach out to me on LinkedIn and can read my other blogs for ALabs and AV.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
Caring For A Loved One With Dementia? Bu Neurologists’ New Book Offers Guidance
BU neurologist Andrew Budson and BU neuropsychologist Maureen O’Connor have published a new book, Six Steps to Managing Alzheimer’s Disease and Dementia: A Guide for Families. Photo courtesy of Budson
An excerpt from Andrew Budson and Maureen O’Connor’s Six Steps to Managing Alzheimer’s Disease and Dementia: A Guide for Families
As a sequel to their 2023 book Seven Steps to Managing Your Memory: What’s Normal, What’s Not, and What to Do About It, BU neurologist Andrew Budson and BU neuropsychologist Maureen O’Connor have published a new book, Six Steps to Managing Alzheimer’s Disease and Dementia: A Guide for Families (Oxford University Press, 2023). For World Alzheimer’s Day, Tuesday, September 21, BU Today is publishing a brief excerpt from their new book. (It’s estimated that worldwide more than 55 million people are living with the disease and that as many as 139 million people could have it by 2050.)
Caregiving is hard. It’s hard whether you’re caring for your spouse, parent, grandparent, sibling, other family member, or friend. Even if you had an extra 10 hours each day to do it, it’s hard to manage all the problems that come with dementia. And caring for a loved one with dementia can sometimes feel like a long, lonely journey.
Everyone is familiar with the terms “dementia” and “Alzheimer’s disease,” but not everyone knows exactly what they mean and how they are related. In our practices as a neurologist and a neuropsychologist, we have worked with several thousand families who are struggling with dementia, just like you. We give them tips for communication to diffuse tense situations. We explain why their loved ones may have false memories, hallucinate, not recognize them, or think they have been replaced by an imposter. We also help them deal with tremors, falls, wandering, agitation, aggression, and incontinence.
THE 4Rs: REASSURE, RECONSIDER, REDIRECT, and RELAX
Although many specific problems in dementia are best managed by equally specific solutions, there are some general approaches that can be used in a wide range of situations. One helpful approach when you are in the midst of dealing with a difficult situation is the 4Rs.
Reassure
It is important to understand that your loved one with dementia may have difficulty interacting with the world around them. Because of their memory loss, people and things once familiar may become unfamiliar. Noise, crowds, and activity may be difficult to understand, and they may feel easily overwhelmed. They may be worried or scared when they can’t see you—thinking you’ve been gone for hours even if it’s only been a few minutes. In short, there are many reasons why an individual with dementia may feel anxious and afraid, even if they have never had trouble with these emotions before. It can be helpful to remind yourself that if your loved one is yelling or acting agitated, it may be related to their feeling afraid or nervous. Reassure them that everything is alright. Phrases like, “You’re safe,” “Everything is OK,” and “I’m here for you” can provide comfort. You may need to reassure them repeatedly. Reassurance from you can help reduce or stop many problem behaviors.
Reconsider
It is important to consider your loved one’s perspective. Their experience of situations might be very different than you might imagine. For example, perhaps your loved one becomes angry every time the home health aide visits and tries to help him bathe. This behavior may seem mysterious, but reconsidering things from his perspective may help explain it. Because of his memory loss, he may perceive the aide as a complete stranger—even though she has been bathing him for months! He also may not remember that he needs help bathing. So, from his perspective, a stranger is asking him to take his clothes off so she can bathe him, and he may feel outraged, anxious, or confused. Reconsidering the situation from your loved one’s point of view can improve your ability to empathize with them, help you feel calmer, and provide you with clues about what you might be able to do to manage the problem behavior.
Redirect
Simply telling your loved one to stop a problem behavior rarely works. Redirecting them to something they like often does. When you redirect your loved one, you change the focus and direct them from the upsetting or counterproductive event or environment to something else. This change may be accomplished by taking your loved one into a different room, starting a fun conversation or activity, pointing out something interesting, or giving your loved one a novel, interesting, comforting, or well-loved object. Use a nurturing touch and tone of voice to redirect your loved one. Using a loud or harsh tone will generally escalate the behavior, which brings us to our last R—relax.
Relax
With diminishing abilities, your loved one may increasingly rely on you to help them interpret the world around them. Consciously or unconsciously, they may use your emotions as a way to know how they should be feeling and responding. If you are anxious and upset—whether because of their behavior or something else—your loved one may feed off of your feelings and also become anxious and upset. Even if the words you are using are reassuring, if your tone of voice or body language reflects that you’re feeling frustrated or angry, your loved one is likely to pick up on these nonverbal signals. This is why it is so important that you remain calm and relaxed—especially when faced with problem behavior.
Of course, it isn’t always easy to relax your posture, uncross your arms, loosen your hands, and speak calmly and reassuringly. Remaining relaxed in the face of aggressive, agitated, embarrassing, and irritating behaviors is hard for everyone. Practicing good self-care will make it easier for you to remain calm and collected when your loved one is not. Learning deep-breathing and relaxation techniques can help you control your emotions.
Andrew E. Budson is a School of Medicine professor of neurology, associate director of BU’s Alzheimer’s Disease Center, and chief of cognitive and behavioral neurology at the Veterans Affairs Boston Healthcare System. He can be reached at [email protected]. Follow him @abudson.
Maureen K. O’Connor is a MED assistant professor of neurology, director of neuropsychology at the Bedford Veterans Affairs Hospital, and a member at large of the National Academy of Neuropsychology. She can be reached at [email protected].