How Statistical Analysis Is Performed With Advantage?
What is Statistical Analysis?
Statistical analysis is the scientific way to collect, preprocess, and apply statistical methods to data in order to discover insights or underlying patterns. With the increasing availability of cheap storage and bandwidth, we are now sitting on a ton of structured and unstructured data. Along with the need to acquire and maintain this huge volume of data, one main challenge is to deal with the noise and convert the data into a meaningful form. Statistical analysis provides a set of methodologies and tools to address this problem.
How Statistical Analysis is Performed?
Statistical analysis is a vast field in itself. Let us discuss the most common approaches to statistical data analysis:
Searching for Central Tendency
While working with structured data, it is often a preliminary step to get an idea of the central tendency of the data set. Suppose you are analyzing the salary data of an organization. You may then be interested in questions like: what is the average salary of a manager with a given qualification who has worked in the organization for 3 years? The following are used as measures of central tendency.
Mean: The mean is the average of all the data points; in the salary example, it is the total salary divided by the number of employees.
Median: The median is the 50th percentile of the data. When we are seeking information like the average salary, the median is a more robust measure because it is less sensitive to outliers.
Mode: The mode is the most frequent value in a list of numbers. Suppose we are dealing with the list [12, 33, 44, 55, 67, 55, 8, 55]; here the mode will be 55. A short code sketch of all three measures follows.
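To make these measures concrete, here is a minimal sketch using Python's built-in statistics module; the salary figures are hypothetical and purely illustrative.

import statistics

# Hypothetical salaries (in thousands), including one outlier
salaries = [32, 35, 38, 40, 41, 44, 46, 250]
print(statistics.mean(salaries))    # average, pulled up by the outlier
print(statistics.median(salaries))  # 50th percentile, robust to the outlier
print(statistics.mode([12, 33, 44, 55, 67, 55, 8, 55]))  # most frequent value -> 55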
Searching for Dispersion
Standard Deviation: Standard deviation quantifies how much the data points vary from the central tendency (dispersion). The lower the value, the closer the data points are to the central value.
Variance: Variance is the square of the standard deviation. The variance gives us the spread (variability) of the data. While working with high-dimensional data we often encounter situations where we need to reduce the dimensionality or analyze the most important variables of the data set. In such situations, we rotate the axes in such a way that maximum variability is preserved. These new rotated axes are called principal components, and we choose the N most important components (the axes with the highest variance).
Interquartile Range (IQR): The interquartile range is the range of data between the 25th and 75th percentile values of the data set. We use box plots, violin plots, etc. to analyze the IQR graphically. A short sketch of these measures follows this list.
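A minimal sketch of these dispersion measures with NumPy (the numbers are made up); the last lines illustrate the principal-component idea using scikit-learn's explained_variance_ratio_ to see how much variance each rotated axis preserves.

import numpy as np
from sklearn.decomposition import PCA

x = np.array([12, 33, 44, 55, 67, 55, 8, 55])
print(x.std())                        # standard deviation
print(x.var())                        # variance = std ** 2
q1, q3 = np.percentile(x, [25, 75])
print(q3 - q1)                        # interquartile range (IQR)

# Variance-preserving rotation (principal components) on toy 2-D data
data = np.random.rand(100, 2)
pca = PCA(n_components=2).fit(data)
print(pca.explained_variance_ratio_)  # share of variance kept by each component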
Regression Problems
Advantages of Using Statistical Analysis
In the era of Big Data, while implementing any machine learning use case, it is of the utmost importance how we choose the sample from the huge data lake. Statistical analysis helps us determine the proper sampling methodology (e.g., random sampling, sampling without replacement, stratified sampling, etc.) and reduce sampling bias.
For example, suppose we are dealing with a binary classification problem where 80% of the data points belong to class A and only 20% belong to class B. If we want to perform any statistical test with samples from this population, we must ensure the samples also preserve the 80:20 ratio (80% class A : 20% class B), as sketched below.
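As an illustration of stratified sampling, here is a hedged sketch using scikit-learn's train_test_split with the stratify argument; the DataFrame and the 'target' column are hypothetical.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80% class A, 20% class B
df = pd.DataFrame({'feature': range(1000),
                   'target': ['A'] * 800 + ['B'] * 200})

# stratify keeps the 80:20 class ratio in both the sample and the remainder
sample, rest = train_test_split(df, train_size=0.1, stratify=df['target'], random_state=42)
print(sample['target'].value_counts(normalize=True))  # ~0.8 A, ~0.2 B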
Be it sampling or decision making, the basis of statistical analysis is historical data. This makes statistical data analysis more acceptable as an industry standard than ad hoc manual approaches to data analysis.
Why Do We Need Statistical Analysis?
The main goal of statistical analysis is to find valuable insights from data, which may be used to discover industry trends, measure customer attrition for a product or service, make business decisions, etc.
From collecting data to finding its underlying patterns, statistical analysis is the foundation of all data-driven methodologies and classical machine learning.
Scope of Statistical Analysis
The following points explain the scope of statistical analysis:
In today’s world, more and more industries are switching to data-based decision-making systems instead of classical deterministic rule-based approaches.
From an industry point of view, statistical analysis is used extensively to solve business problems across domains such as manufacturing, insurance, banking and finance, automotive, etc.
From a technical perspective, statistical analysis helps solve linear regression, time series forecasting, predictive analysis, and similar problems.
Conclusion
In this article, we discussed various aspects of statistical data analysis, such as its methodologies, the need for it, and the scope of its use cases. Statistical analysis is a very old area of study that lays the foundation for modern machine learning and data-driven business models. The practical implementation of statistical analysis methodologies differs based on the type of use case and industry.
Stock Price Analysis With Python
Stock price analysis with Python is crucial for investors to understand the risk of investing in the stock market. A company’s stock prices reflect its valuation and performance, which influence demand and supply in the market. Technical analysis of stocks is a vast field, and we will provide an overview of it in this article. By analyzing stock prices with Python, investors can determine when to buy or sell a stock. This article is a starting point for investors who want to analyze the stock market and understand its volatility. So, let’s dive into stock price analysis with Python.
Libraries Used in Stock Price Analysis With Python
The following libraries need to be installed beforehand, which can easily be done with pip. A brief description of each library and its application is provided below.
Library – Application
Yahoo Finance (yfinance) – To download stock data
Pandas – To handle data frames in Python
Numpy – Numerical Python
Matplotlib – Plotting graphs
!pip install yfinance
import pandas as pd
import datetime
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
import yfinance as yf
%matplotlib inline

Data Description
We have downloaded daily stock price data using the Yahoo Finance API. It is multi-year daily data capturing Open, High, Low, Close, and Volume.
Open: The price of the stock when the market opens in the morning
Close: The price of the stock when the market closes in the evening
High: The highest price the stock reached during that day
Low: The lowest price at which the stock traded during that day
Volume: The total number of shares traded during that day
Here, we will take the example of three companies, TCS, Infosys, and Wipro, which are industry leaders in providing IT services.
start = "2014-01-01" end = '2023-1-01' tcs = yf.download('TCS',start,end) infy = yf.download('INFY',start,end) wipro = yf.download('WIPRO.NS',start,end) Exploratory Analysis for Stock Price Analysis With PythonPython Code:
The above graph is a representation of the opening stock prices of these three companies as a line graph, created with the matplotlib library in Python. The graph clearly shows that Wipro’s prices are higher than those of the other two companies, but we are not interested in the absolute prices of these companies; we want to understand how these stocks fluctuate with time.
tcs['Volume'].plot(label='TCS', figsize=(15, 7))
infy['Volume'].plot(label='Infosys')
wipro['Volume'].plot(label='Wipro')
plt.title('Volume of Stock traded')
plt.legend()

The graph shows the volume traded by these companies, and it clearly shows that Infosys shares are traded in larger volumes than the other IT stocks.
#Market Capitalisation
tcs['MarktCap'] = tcs['Open'] * tcs['Volume']
infy['MarktCap'] = infy['Open'] * infy['Volume']
wipro['MarktCap'] = wipro['Open'] * wipro['Volume']
tcs['MarktCap'].plot(label='TCS', figsize=(15, 7))
infy['MarktCap'].plot(label='Infosys')
wipro['MarktCap'].plot(label='Wipro')
plt.title('Market Cap')
plt.legend()

Volume or stock price alone does not provide a comparison between companies, so here we have plotted volume * share price to compare the companies better. As we can clearly see from the graph, Wipro appears to be traded on the higher side.
Moving Averages for Stock Price Analysis With Python
As we know, stock prices are highly volatile and change quickly with time. To observe any trend or pattern, we can take the help of 50-day and 200-day moving averages.
tcs['MA50'] = tcs['Open'].rolling(50).mean()
tcs['MA200'] = tcs['Open'].rolling(200).mean()
tcs['Open'].plot(figsize=(15, 7))
tcs['MA50'].plot()
tcs['MA200'].plot()

Scatter Plot Matrix
data = pd.concat([tcs['Open'], infy['Open'], wipro['Open']], axis=1)
data.columns = ['TCSOpen', 'InfosysOpen', 'WiproOpen']
scatter_matrix(data, figsize=(8, 8), hist_kwds={'bins': 250})

The above graph is a combination of histograms for each company and pairwise scatter plots taking two companies’ stocks at a time. From the graph, we can see that Wipro’s stock loosely shows a linear correlation with Infosys.
Percentage Increase in Stock Value
The percentage increase in stock value is the change in the stock price compared to the previous day. The bigger the value, whether positive or negative, the more volatile the stock is.
#Volatility
tcs['returns'] = (tcs['Close'] / tcs['Close'].shift(1)) - 1
infy['returns'] = (infy['Close'] / infy['Close'].shift(1)) - 1
wipro['returns'] = (wipro['Close'] / wipro['Close'].shift(1)) - 1
tcs['returns'].hist(bins=100, label='TCS', alpha=0.5, figsize=(15, 7))
infy['returns'].hist(bins=100, label='Infosys', alpha=0.5)
wipro['returns'].hist(bins=100, label='Wipro', alpha=0.5)
plt.legend()

It is clear from the graph that the histogram of percentage changes for TCS is the widest, which indicates that the stock of TCS is the most volatile among the three companies compared.
Conclusion
The above analysis can be used to understand a stock’s short-term and long-term behaviour. A decision support system can be built to choose which stock to pick from an industry, whether for low-risk, low-gain or high-risk, high-gain strategies, depending on the risk appetite of the investor.
Sentiment Analysis With LSTM And TorchText With Code And Explanation
In this article, we will see every detail that you need to know for sentiment analysis using an LSTM network with the torchtext library. We will see how to use the spacy tokenizer in the torchtext data classes and how to use the tabular and bucket iterators. We will use an embedding matrix, with or without pre-trained Glove embeddings, as input, and we will also see how to process text of different lengths in a batch with pack_padded_sequence. You can then use these techniques in your own problems.
What are Field and LabelField?
In sentiment data, we have text data and labels (sentiments). Torchtext came up with its own text-processing data types for NLP. The text data uses the data type Field, and the data type for the class labels is LabelField. In older versions of torchtext you can import these data types from torchtext.data, but in the newer version you will find them in torchtext.legacy.data. You can find detailed information for Field here.
Some important arguments of these data types that you will use are ‘tokenize’, ‘use_vocab’, ‘batch_first’, ‘include_lengths’, ‘sequential’, and ‘lower’. Let’s first understand the tokenize argument. In simple words, tokenization is the process of splitting a sentence into words or smaller sub-units. You can tokenize in several ways: define your own tokenizer function, define one with torchtext’s get_tokenizer, or use the inbuilt tokenizer of Field. First, we will install spacy, then we will look at the tokenizer function.
pip install spacy
python -m spacy download en_core_web_sm

# Build tokenizer
import spacy
spacy_en = spacy.load('en_core_web_sm')  # spacy_en is implied by the original snippet

def tokenizer(text):
    return [token.text for token in spacy_en.tokenizer(text)]

You can also define it using torchtext's get_tokenizer (another way to define it):
from torchtext.data.utils import get_tokenizer
tokenizer = get_tokenizer('spacy', language='en_core_web_sm')

Let’s see the output of either of the tokenizers we defined above. Both are the same.
print(tokenizer("I can't run whole day")) Output: ['I', 'ca', "n't", 'run', 'whole', 'day']After defining the tokenizer, you can pass it into your Filed. Filed is data-type for your input text. For the article purpose let’s define some sample data in a CSV file.
TEXT = data.Field(tokenize=tokenizer, use_vocab=True, lower=True, batch_first=True, include_lengths=True)
LABEL = data.LabelField(dtype=torch.long, batch_first=True, sequential=False)
fields = [('text', TEXT), ('label', LABEL)]

In the above data set and code: the text input is sequential data, and the sequential argument is True by default, so there is no need to pass it in the first line of code, while we pass it explicitly in the label field. The include_lengths argument will return the length of each sentence in a batch; we will see this in more detail in the BucketIterator section of this article. We can also use a tokenizer within the Field without using any of the tokenizer functions we defined above:
TEXT = data.Field(use_vocab=True, lower=True, tokenize='spacy', tokenizer_language='en_core_web_sm', batch_first=True, include_lengths=True)

TabularDataset for the Project
training_data = data.TabularDataset(
    path='sample.csv',
    format='csv',
    fields=fields,
    skip_header=True,
)
for example in training_data.examples:
    print(example.text, example.label)

Output:
['she', 'good'] 1
['he', 'is', 'sad'] 2
['i', 'am', 'very', 'happy'] 1

We will do what we always do: split the data into train and test sets, as we do with train_test_split from sklearn. Here TabularDataset has a split function itself, and we will use that function to split our data with a random state:
train_data, val_data = training_data.split(split_ratio=0.7, random_state=random.seed(SEED))

Glove Embedding for Sentiment Analysis LSTM TorchText
Up to this point, we have read our data and converted it into a TabularDataset. Now we will see how to use embeddings with this data. I am giving some basic informative notes on embeddings, which will be helpful if you are not familiar with them. Neural nets only deal with numbers. An embedding converts words into integers, and there is a vector corresponding to each integer. Refer to the image below: suppose we have 10k words in our dictionary, and each word is assigned a value between 1 and 10k.
Create a zero vector of dimension 10k. Now suppose you want to represent the word “man”; because its value is 1 in the dictionary (refer to the image below), you put 1 at the first index and keep the others at zero. Such vectors are one-hot encoded vectors, and the problem with these vectors is their dimension: if we have 2B words in our dictionary, we have to make a 2B-dimensional vector. A tiny sketch of this idea follows.
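Here is a minimal, purely illustrative sketch of the one-hot idea, using a toy dictionary of size 10 instead of 10k:

import numpy as np

vocab_size = 10              # stand-in for the 10k-word dictionary
man_index = 1                # "man" has value 1 in the dictionary
one_hot = np.zeros(vocab_size)
one_hot[man_index - 1] = 1   # 1 at the first index, everything else zero
print(one_hot)               # [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]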
To overcome this problem we use a dense vector, and Glove is one such approach that provides a dense vector for each word. Here we will download and use pre-trained Glove embeddings in our problem. You can download the Glove vectors using torchtext, and all the dimensional details can be found at this link.
from torchtext.vocab import Vectors

vectors = Vectors(name='glove.6B.50d.txt')
TEXT.build_vocab(train_data, vectors=vectors, max_size=10000, min_freq=1)
LABEL.build_vocab(train_data)
In the above code, we initialized the vectors and built our training-data vocabulary with them; that is, we get a vector for every known token (word) in the data set. We can also restrict the size of the vocabulary. If you do not have the Glove text file, use the following code to download the vectors. The cache argument stores the downloaded file for future use, so there is no need to download the same file again and again.
import os
from torchtext.vocab import GloVe

cache = '.vector_cache'
if not os.path.exists(cache):
    os.mkdir(cache)
vectors = GloVe(name='6B', dim=50, cache=cache)  # the 6B/50d vectors match the glove.6B.50d.txt file used above

When you have built the vocabulary, you can check out the dictionary. Here I have small data, so I can print all the tokens for demonstration purposes.
print(list(TEXT.vocab.stoi.items()))

output: [('<unk>', 0), ('<pad>', 1), ('am', 2), ('good', 3), ('happy', 4), ('he', 5), ('i', 6), ('is', 7), ('sad', 8), ('she', 9), ('very', 10)]

If you have noticed, we have two extra tokens, <unk> (UNK) and <pad> (PAD), and their corresponding indices are 0 and 1. If you want to see the vector corresponding to the token 'good', you can do so with the code below.
print(TEXT.vocab.vectors[TEXT.vocab.stoi['good']])

Here TEXT.vocab.vectors contains 50-dimensional vectors for 11 different tokens. TEXT.vocab.stoi converts a string to an integer (index). The vectors for UNK and PAD are always zero vectors. I am not printing the values as they would take more space here, but you can play around with them. Now I am getting the device type I have, because it is going to be used in the BucketIterator.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

BucketIterator for Sentiment Analysis LSTM TorchText
Before the code part of BucketIterator, let’s understand the need for it. This iterator rearranges our data so that sequences of similar lengths fall in one batch, in descending order of sequence length (seq_len = number of tokens in a sentence). If we have texts of lengths [4, 6, 8, 5] and we want to split this data into two batches, the BucketIterator will split it into [8, 6] and [5, 4].
Figure 3: BucketIterator for one batch
Arranging data in descending order is required for efficient calculations. Should we replace the question marks with PAD tokens? You will get the answer later in this article. BucketIterator helps keep sentences of similar length in one batch. This reduces the padding-token overhead from a computational point of view. First, see how to code the BucketIterator:
BATCH_SIZE = 2
train_itr, val_itr = BucketIterator.splits(
    (train_data, val_data),
    batch_size=BATCH_SIZE,
    sort_key=lambda x: len(x.text),
    device=device,
    shuffle=True,
    sort_within_batch=True,
    sort=False
)

I hope every argument is self-explanatory here; we passed a batch size of 2. Choose the batch size wisely, as it is a crucial hyper-parameter and its value also depends on how much data you can process in your GPU/CPU memory. We did not sort the entire data set, but we did sort the data samples within a batch (sort_within_batch=True). See how our batches look:
for batch_no, batch in enumerate(train_itr):
    text, batch_len = batch.text
    print(text, batch_len)
    print(batch.label)

output:
(tensor([[ 6, 2, 10, 4],
         [ 5, 7, 8, 1]]), tensor([4, 3]))
tensor([0, 1])

Each batch contains the token ids and labels; here we also got the length of each sentence in the batch, because we passed include_lengths as True in the TEXT Field. If you have more sentences of different lengths, you will see the BucketIterator arrange the data very nicely.
Basics of the LSTM Model
Long short-term memory (LSTM) is a family member of RNNs. An RNN learns sequential relationships, and this is the reason RNNs work well in NLP: the next token carries some information from the previous tokens. LSTM can learn longer sequences compared to a plain RNN or GRU. Example: “I am not going to say sorry, and this is not my fault.”
Here the same person who does not want to say sorry is also confident of not being guilty. To understand such logic, the network has to be capable of learning the relationship between the first word and the last word of a sentence if necessary. For longer sentences, the network has to understand the relevant relationships between all the words and the order of the sequence (which token comes next in the sentence).
The LSTM plays this role very well; it remembers longer dependencies in the sequence thanks to its capability of remembering relevant information and forgetting irrelevant information in a sequence. You can explore this article for more details; you will get all the RNN basics.
Input Shape and Hidden
The input can be given in two ways: 1. (Sequence First: Sequence Length, Batch Size, Input Dimension) 2. (Batch First: Batch Size, Sequence Length, Input Dimension). We will use the second format of the input here. We have already defined the batch size in the BucketIterator, the sequence length is the number of tokens in each sequence of the batch, and the input dimension is the Glove vector dimension, which is 50 in our case.
The hidden shape is (Number of Directions * Number of Layers, Batch Size, Hidden Size). Sentiment information can be extracted using a bidirectional LSTM, so the number of directions is 2; we will use 2 LSTM layers, so the number of layers is also 2 in our case. The batch size we have already discussed, and for the hidden size you can choose a suitable value such as 8, 16, 32, 64, etc.
Figure 4: Input shape for LSTM(RNN)
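Before the full classifier, here is a minimal, hedged sketch (random weights, toy numbers) to confirm these shapes with nn.LSTM:

import torch
import torch.nn as nn

batch_size, seq_len, input_dim, hidden_size, n_layers = 2, 5, 50, 64, 2
lstm = nn.LSTM(input_dim, hidden_size, num_layers=n_layers, bidirectional=True, batch_first=True)

x = torch.randn(batch_size, seq_len, input_dim)  # (Batch Size, Sequence Length, Input Dimension)
out, (hidden, cell) = lstm(x)
print(out.shape)     # torch.Size([2, 5, 128]) -> hidden_size * 2 directions
print(hidden.shape)  # torch.Size([4, 2, 64])  -> n_layers * 2 directions, batch, hidden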
Model
class SentimentClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden, n_label, n_layers):
        super(SentimentClassifier, self).__init__()
        self.hidden = hidden
        self.n_layers = n_layers
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=n_layers, bidirectional=True, batch_first=True)  # dropout=0.2
        self.fc = nn.Linear(hidden * 2, n_label)

    def forward(self, input, actual_batch_len):
        embed_out = self.embed(input)
        hidden = torch.zeros(self.n_layers * 2, input.shape[0], self.hidden)
        cell = torch.zeros(self.n_layers * 2, input.shape[0], self.hidden)
        pack_out = nn.utils.rnn.pack_padded_sequence(embed_out, actual_batch_len, batch_first=True).to(device)
        out_lstm, (hidden, cell) = self.lstm(pack_out, (hidden, cell))  # dropout
        hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
        out = self.fc(hidden)
        return out

VOCAB_SIZE = len(TEXT.vocab)
EMBEDDING_DIM = TEXT.vocab.vectors.shape[1]
HIDDEN = 64
NUM_LABEL = 4  # number of classes
NUM_LAYERS = 2
model = SentimentClassifier(VOCAB_SIZE, EMBEDDING_DIM, HIDDEN, NUM_LABEL, NUM_LAYERS)

This is our model; do not worry, we will break this code down step by step. VOCAB_SIZE is the total number of tokens in the data set, EMBEDDING_DIM is the Glove vector dimension (50 here), HIDDEN is 64, NUM_LABEL is our number of classes, and NUM_LAYERS is 2 (two stacked LSTM layers). First, we defined the embedding layer, which is a mapping of the vocabulary size to a dense vector; this is the reason we map the total vocab size to the vector dimension. See an example of a torch embedding where we have only 2 tokens in the vocab and we want to transform them into 4-dimensional vectors:
emb = nn.Embedding(2, 4)  # size of vocab = 2, vector len = 4
print(emb.weight)

output:
tensor([[ 0.2626, -0.7775, -0.7230, 0.6391],
        [-0.7772, 0.4914, -0.9622, 1.2316]], requires_grad=True)

In the above code, the first and second output rows are 4-dimensional embedding vectors for token 1 (emb(0)) and token 2 (emb(1)) respectively. The second thing we defined in the classifier is the LSTM layer, which maps the vector (embedding dimension) to the hidden size. You can also pass dropout to the LSTM for regularization. At last, we defined a fully connected layer, which outputs our desired number of classes; the input for this linear transformation is two times the hidden size. Why two times the hidden size? Because this is a bidirectional LSTM and we are concatenating the final hidden states from the forward and backward directions of the last LSTM layer.
Time to discuss what we did in the forward method of the SentimentClassifier class. We are passing two arguments: the input (batched data) and the number of tokens in each sequence of the batch. First we pass the input to the embedding layer we created, but wait... this embedding layer is not yet aware of the Glove embedding we just downloaded. If you do not want to use any pretrained embedding, just go ahead (the embedding parameters are learned from scratch); otherwise, run the following code to copy the existing vectors for each token we have.
model.embed.weight.data.copy_(TEXT.vocab.vectors)
print(model.embed.weight)

Output:
tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
        [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
        [-0.2660, 0.4732, 0.3187, ..., -0.1116, -0.2955, -0.2576],
        ...,
        [ 0.1777, 0.1764, 0.0684, ..., 0.1164, -0.0368, 0.1446],
        [ 0.4121, 0.0792, -0.4929, ..., 0.0564, 0.1322, -0.5023],
        [ 0.5183, 0.0194, 0.0089, ..., 0.2638, -0.0442, -0.3650]])

The first two vectors are zero vectors, as they represent the UNK and PAD tokens (as we saw in the Glove embedding section). Copying the pre-trained embedding will help our model converge much faster, as the tokens are already well-positioned in some hyper-dimensional space. So do not forget to copy the existing vectors from the pre-trained embedding.
The hidden and cell states need to be reset for the first token of every new sentence in an LSTM, and this is the reason we initialize them to zero before passing them to the LSTM. If we do not set the hidden and cell states to zero, Torch does it for us, so it is optional here. We used pack_padded_sequence, and the question is why. As you may remember, we saw question marks in figure 3 for empty tokens; just scroll up if you missed them.
pack_padded_sequence
Then we used pack_padded_sequence on the embedding output. The BucketIterator grouped sequences of similar lengths in one batch in descending order of sequence length, and this is essential for pack_padded_sequence. The pack_padded_sequence call returns new batches from the existing batch. I will give you all the basics through code:
Figure 5: Batch creation pack_padded_sequence
data: tensor([[ 6, 2, 10, 4],
              [ 9, 3, 1, 1]])  # 1 is the padding token
len: tensor([4, 2])

Let’s take a batch of two sentences: (1) “I am very happy” and (2) “She good”. The token ids are written above with lengths [4, 2]. pack_padded_sequence converts the data into batches of [2, 2, 1, 1], as shown in figure 5. Let us understand this with a small code example: we pass the embedding output to pack_padded_sequence along with the list of sequence lengths we have, [4, 2].
for batch in train_itr:
    text, text_len = batch.text
    emb = nn.Embedding(VOCAB_SIZE, EMBEDDING_DIM)
    emb.weight.data.copy_(TEXT.vocab.vectors)
    emb_out = emb(text)
    pack_out = nn.utils.rnn.pack_padded_sequence(emb_out, text_len, batch_first=True)
    rnn = nn.RNN(EMBEDDING_DIM, 4, batch_first=True)
    out, hidden = rnn(pack_out)

If we print the hidden state here, we will get:
Hidden Output: [[[ 0.9451, -0.9984, -0.4613, 0.9768], [ 0.9672, -0.9905, -0.1192, 0.9983]]]If we print the complete output we will get:
rnn_output:
[[ 0.9092, -0.9358, -0.8513, 0.9401],
 [ 0.8691, -0.9776, 0.5006, 0.1485],
 [ 0.8109, -0.9987, 0.9487, 0.9641],
 [ 0.9672, -0.9905, -0.1192, 0.9983],
 [ 0.9926, -0.9055, -0.5543, 0.9884],
 [ 0.9451, -0.9984, -0.4613, 0.9768]]

Refer to figure 5 for this explanation (focus on the purple-lined tokens). The hidden state of the last token explains the sentiment of the sentence. The first hidden output corresponds to the last token (“happy”) of the first sequence, and in the rnn_output list it is the last one. The second-to-last (5th) rnn_output is of no use here, but the last hidden output belongs to the last token of the second sequence (“good”), and it is the 4th rnn_output. If our sequence lengths and data set grow, we can save a lot of computation with pack_padded_sequence. You can transform the output back to its original form of sequences by printing the following line; I leave this part for you to analyze.
print(nn.utils.rnn.pad_packed_sequence(out, batch_first=True))Now we have completed all the required things we need to know, we have data in our hands, we have made our model ready and we copied Glove embedding to our model’s embedding. So at last we will define some hyper-parameters then we will start training data.
Calculate Loss
opt = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
model.to(device)

We have defined CrossEntropyLoss (multi-class) as the loss function, as we have 4 output classes, and we used Adam as the optimizer. If you remember, we passed the device to the BucketIterator, so if you have CUDA then call the model.to() method, because the data and the model need to be in the same memory, either CPU or GPU. Now we will define functions to calculate the loss and accuracy of our model.
def accuracy(preds, y):
    _, preds = torch.max(preds, dim=1)
    acc = torch.sum(preds == y) / len(y)
    return acc

def calculateLoss(model, batch, criterion):
    text, text_len = batch.text
    preds = model(text, text_len.to('cpu'))
    loss = criterion(preds, batch.label)
    acc = accuracy(preds, batch.label)
    return loss, len(batch.label), acc

The accuracy function consists of simple Torch operations: matching our predictions against the actual labels. In calculateLoss we pass the input to our model; the only thing to note here is that we move the batch sequence lengths (text_len in the above code) to the CPU first.
Epoch Loop
N_EPOCH = 100
for i in range(N_EPOCH):
    model.train()
    train_len, train_acc, train_loss = 0, [], []
    for batch_no, batch in enumerate(train_itr):
        opt.zero_grad()
        loss, blen, acc = calculateLoss(model, batch, criterion)
        train_loss.append(loss * blen)
        train_acc.append(acc * blen)
        train_len = train_len + blen
        loss.backward()
        opt.step()
    train_epoch_loss = np.sum(train_loss) / train_len
    train_epoch_acc = np.sum(train_acc) / train_len
    model.eval()
    with torch.no_grad():
        val_results = [calculateLoss(model, batch, criterion) for batch in val_itr]
        loss, batch_len, acc = zip(*val_results)
        epoch_loss = np.sum(np.multiply(loss, batch_len)) / np.sum(batch_len)
        epoch_acc = np.sum(np.multiply(acc, batch_len)) / np.sum(batch_len)
    print('epoch:{}/{} epoch_train_loss:{:.4f},epoch_train_acc:{:.4f}'
          ' epoch_val_loss:{:.4f},epoch_val_acc:{:.4f}'.format(i + 1, N_EPOCH,
          train_epoch_loss.item(), train_epoch_acc.item(), epoch_loss.item(), epoch_acc.item()))

If you are new to Torch, we use three important functions here: (1) opt.zero_grad() sets all gradients to zero, (2) loss.backward() computes the gradients, and (3) opt.step() updates the parameters. All three are only for the training data, so we set torch.no_grad() during the evaluation phase.
Conclusion
Wow, we have completed this article, and it’s time for you to get hands-on with your own data set. In my experience, sentiment analysis is used heavily in many real-world industry applications. I hope this article improves your understanding. See you next time with another interesting NLP article.
What Is Data Analysis? Research, Types & Example
What is Data Analysis?
Data analysis is defined as a process of cleaning, transforming, and modeling data to discover useful information for business decision-making. The purpose of data analysis is to extract useful information from data and to take decisions based on that analysis.
A simple example of data analysis: whenever we make a decision in our day-to-day life, we think about what happened last time or what will happen if we choose that particular option. This is nothing but analyzing our past or future and making decisions based on it. For that, we gather memories of our past or dreams of our future. That is nothing but data analysis. The same thing an analyst does for business purposes is called data analysis.
Why Data Analysis?
To grow your business, or even to grow in your life, sometimes all you need to do is analysis!
If your business is not growing, then you have to look back, acknowledge your mistakes, and make a plan again without repeating those mistakes. And even if your business is growing, then you have to look forward to making the business grow more. All you need to do is analyze your business data and business processes.
Data Analysis Tools
Data analysis tools make it easier for users to process and manipulate data, analyze the relationships and correlations between data sets, and identify patterns and trends for interpretation. Here is a complete list of tools used for data analysis in research.
Types of Data Analysis: Techniques and Methods
There are several types of data analysis techniques that exist based on business and technology. However, the major data analysis methods are:
Text Analysis
Statistical Analysis
Diagnostic Analysis
Predictive Analysis
Prescriptive Analysis
Text Analysis
Text analysis is also referred to as data mining. It is one of the methods of data analysis used to discover patterns in large data sets using databases or data mining tools. It is used to transform raw data into business information. Business intelligence tools available in the market are used to take strategic business decisions. Overall, it offers a way to extract and examine data, derive patterns, and finally interpret the data.
Statistical Analysis
Statistical analysis shows “What happened?” by using past data in the form of dashboards. Statistical analysis includes the collection, analysis, interpretation, presentation, and modeling of data. It analyzes a set of data or a sample of data. There are two categories of this type of analysis: Descriptive Analysis and Inferential Analysis.
Descriptive Analysis
Descriptive analysis analyses complete data or a sample of summarized numerical data. It shows the mean and deviation for continuous data, and the percentage and frequency for categorical data. A small pandas sketch of this follows.
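A small sketch of descriptive analysis with pandas; the data and column names are hypothetical:

import pandas as pd

df = pd.DataFrame({'salary': [32, 35, 38, 40, 41, 44],                      # continuous
                   'department': ['HR', 'IT', 'IT', 'HR', 'IT', 'Sales']})  # categorical

print(df['salary'].describe())                        # mean, std, quartiles
print(df['department'].value_counts(normalize=True))  # frequency / percentage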
Inferential Analysis
Inferential analysis analyses a sample of the data; by selecting different samples, you can reach different conclusions about the same underlying data.
Diagnostic Analysis
Diagnostic analysis shows “Why did it happen?” by finding the cause from the insights found in statistical analysis. This analysis is useful to identify behavior patterns in data. If a new problem arises in your business process, you can look into this analysis to find similar patterns of that problem, and it may be possible to use similar prescriptions for the new problem.
Predictive Analysis
Predictive analysis shows “what is likely to happen” by using previous data. The simplest example: if last year I bought two dresses based on my savings, and this year my salary has doubled, then I can buy four dresses. But of course it’s not as easy as that, because you have to think about other circumstances, such as the chance that the price of clothes has increased this year, or that instead of dresses you want to buy a new bike, or you need to buy a house!
So here, this analysis makes predictions about future outcomes based on current or past data. Forecasting is just an estimate; its accuracy depends on how much detailed information you have and how deeply you dig into it.
Prescriptive Analysis
Prescriptive analysis combines the insights from all the previous analyses to determine which action to take on a current problem or decision. Most data-driven companies are utilizing prescriptive analysis because predictive and descriptive analysis alone are not enough to improve data performance. Based on current situations and problems, they analyze the data and make decisions.
Data Analysis Process
The data analysis process is nothing but gathering information by using a proper application or tool that allows you to explore the data and find patterns in it. Based on that information and data, you can make decisions or reach final conclusions.
Data Analysis consists of the following phases:
Data Requirement Gathering
Data Collection
Data Cleaning
Data Analysis
Data Interpretation
Data Visualization
Data Requirement Gathering
First of all, you have to think about why you want to do this data analysis. You need to find out the purpose or aim of doing the analysis, and decide which type of data analysis you want to do. In this phase, you decide what to analyze and how to measure it; you have to understand why you are investigating and what measures you have to use to do this analysis.
Data Collection
After requirement gathering, you will get a clear idea about what things you have to measure and what your findings should be. Now it’s time to collect your data based on the requirements. Once you collect your data, remember that the collected data must be processed or organized for analysis. As you collect data from various sources, you must keep a log with the collection date and the source of the data.
Data Cleaning
Whatever data is collected may not be useful or may be irrelevant to the aim of your analysis, hence it should be cleaned. The collected data may contain duplicate records, white spaces, or errors. The data should be cleaned and made error-free. This phase must be done before analysis, because good data cleaning brings the output of the analysis closer to your expected outcome. A small pandas sketch of these steps follows.
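A hedged pandas sketch of the cleaning steps mentioned above (stray white space, duplicate records, missing values); the DataFrame is hypothetical:

import pandas as pd

df = pd.DataFrame({'name': [' Alice', 'Bob ', 'Bob ', 'Carol'],
                   'city': ['NY', 'LA', 'LA', None]})

df['name'] = df['name'].str.strip()   # remove leading/trailing white space
df = df.drop_duplicates()             # remove duplicate records
df = df.dropna()                      # drop rows with missing values (or fill them instead)
print(df)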
Data Analysis
Once the data is collected, cleaned, and processed, it is ready for analysis. As you manipulate the data, you may find that you have the exact information you need, or you might need to collect more data. During this phase, you can use data analysis tools and software that help you understand, interpret, and derive conclusions based on the requirements.
Data Interpretation
Data Visualization
Data visualization is very common in your day-to-day life; it often appears in the form of charts and graphs. In other words, data is shown graphically so that it will be easier for the human brain to understand and process. Data visualization is often used to discover unknown facts and trends. By observing relationships and comparing data sets, you can find meaningful information.
Summary:
Data analysis means a process of cleaning, transforming and modeling data to discover useful information for business decision-making
Types of Data Analysis are Text, Statistical, Diagnostic, Predictive, Prescriptive Analysis
Data Analysis consists of Data Requirement Gathering, Data Collection, Data Cleaning, Data Analysis, Data Interpretation, Data Visualization
Technical Analysis – A Beginner’s Guide To Technical Analysis
What is Technical Analysis
Technical analysis is a method of analyzing securities such as stocks, commodities, etc. in order to forecast the direction of prices by studying past data such as price and volume. It focuses on how stock prices are moving and how powerful these moves are. Technical analysis is based solely on the data generated by the market and by the actions of people in the market. The data are never revised later, and analysts do not make any guesses on the value of the data. It is based on the premise that people will act in similar ways when faced with similar conditions.
But to be more pragmatic, it is a tool used to make investment decisions. It helps assess risk and reward. And it can assist investors in allocating their resources among stocks, sectors, and asset classes.
The above picture depicts the factors upon which technical analysis depends.
What Technical Analysis is not?
Technical analysis is not a prediction of the future, nor an endorsement or criticism of any company. There is an element of prediction and judgment, as it attempts to find the probability of future action of a company’s stock, but it is the stock, not the company itself, that is under scrutiny.
Also, this analysis does not give an absolute prediction about the price movement but helps the investors and traders anticipate what could likely happen and accordingly take an investment decision.
What is a chart?
The name “technical analysis” is a bit of a misnomer, as it’s not that technical. Though there is complex mathematics tied to it, at its core it is a method of determining whether a stock, or the market as a whole, is going up or going down. What we need to do in order to identify these trends is simply look at a chart. Now let’s understand charts and how they help in technical analysis.
A chart is a tool both investors and traders use to help them determine whether to buy or sell a stock, bond, commodity, or currency. As mentioned, bar charts summarize all the trading for a given time period, such as a day or a week; when all those summaries are plotted together, trends emerge and patterns form, all revealing where a stock is right now and how it got there. After all, knowing a stock is trading at a price of 50 is not of much help, but knowing it was at 45 last month and 40 the month before gives us a good idea that it has been in a bullish trend.
Charts are where perception meets reality. For instance, a stock may look cheap according to an analyst’s calculations based on projected future earnings, but if there were no demand for the stock it is simply not going to go up.
Some analysts look at a chart and simply draw an arrow on the actual data plot; if the arrow is pointing up, they know the trend is up, and vice versa.
On the charts, we look at what is happening right now and how it came to be. From there we make an educated guess about the future, but the goal is not to predict where prices will be in a year. The real goal is to determine what we do about it right now. If we decide to buy based on a chart, we will already know what has to happen to prove us wrong and that helps us limit losses.
Understanding Each Part Of A Chart
Price
Price action tells us what the supply and demand equilibrium is at any given point in time.
Volume
Volume is very useful in determining whether a stock is cooling off in a correction or changing direction. Another important use is in the identification of both the final panic and the initial surges as investor moods change from one extreme to another.
Momentum
Momentum indicators quantify what the naked eye can tell us about the price action. Momentum in the market causes trends to stay in effect until halted by outside factors.
Structure
The way price action is used on charts is twofold: either the stock in question is moving or it is not. The former creates a trend, either higher or lower; the latter creates resting zones, and the shapes of these resting zones give us clues as to whether the next trend will be up or down. It is the structure of these ups, downs, and flats that is analyzed.
Sentiment
Sentiment is the summation of all market expectations. It ranges from fear and hopelessness, to indifference, to greed and complacency. At the bottom of a bear market, the expectations of the participants are almost unanimously for lower prices and more financial losses.
What are Support and Resistance?
Support
At some price level, a falling stock price will stabilize. Enough investors will perceive it to be good value and demand shares, while others will perceive the price to be too low for them to want to sell any more of their holdings.
Resistance
Why use Technical Analysis?
The next logical question that comes to one’s mind is: what does technical analysis do? The answer is that its ability to recognize when a stock has reached a support or resistance level, or when a shift in perception takes place, can help investors take investment decisions, i.e. whether to use:
Buy low, sell high approach or
Buy high, sell higher approach, or
Whether to buy the stock or not
The ability to apply these aspects of the chart reveals to investors when it is or is not safe to buy a stock. Technical analysis is the only investment decision discipline that tells you when you are wrong, so you can minimize losses.
Practice effective risk analysis and management. Design and deliver creative solutions to your clients’ investment needs.
Goals of Technical Analysis
Seeing where the stock is currently trading and figuring out how it got there
This can be done by using charting tools to examine:
Stock trends
Support levels
Resistance levels
We also try to find a pattern or trend to help with this.
Determining the power of a trend
This can be determined by looking at important technical concepts such as trading volume and momentum.
Making comparisons of the stock to the market, peer companies, and itself
For this, we look at relative performance and moving averages (average prices over a defined period of time, usually 50-200 days).
Criteria for Investment Using Technical Analysis Tools
In looking for a stock, the following are the key technical analysis criteria that should be met: not necessarily all, but most.
Trends and trendlines
Trends can be classified in three ways: Up, Down or Range bound. In an uptrend, a stock rallies often with intermediate periods of consolidation or movement against the trend. In a downtrend, a stock declines often with intermediate periods of consolidation or movement against the trend. In range-bound, there is no apparent direction to the price movement on the stock chart and there will be little or no rate of price change.
These trends can be measured using trendlines. All we want is stocks that are in rising trends.
Support and Resistance
This basically tells us what price levels are likely to bring out buyers or sellers. Here we want to see that the current price has either just moved through resistance or is far from the next resistance level.
Moving Averages
They help in determining if the trend is turning, and they also show whether the existing trend is progressing in an orderly manner or not. Here we are looking for prices to be above selected averages, but not too far above them.
Relative Performance
This divides the price of a stock by a market index or an industry group. The theory is that if the ratio is going up, the stock is outperforming the market and is thus a strong candidate for further gains, and vice versa. Here we are looking for stocks whose relative performance is increasing, as in the sketch below.
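A small sketch of the relative performance ratio in the same pandas style as the stock price section earlier; the prices and column names are hypothetical:

import pandas as pd

# Hypothetical closing prices for a stock and a market index
prices = pd.DataFrame({'stock': [100, 102, 105, 103, 108],
                       'market_index': [1000, 1005, 1010, 1008, 1015]})

prices['relative_performance'] = prices['stock'] / prices['market_index']
prices['MA3'] = prices['stock'].rolling(3).mean()   # short moving average for comparison
print(prices)
# A rising relative_performance column suggests the stock is outperforming the market.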
Volume
The number of shares traded, and when (either when prices rise or when they fall). We basically analyze whether buying is spreading to other investors, and whether there is urgency to buy when prices start to rise.
Momentum
We want to know if the days when the stock rises outnumber those when it falls. If losing days become more frequent, then we can say the trend is weakening; a small sketch of this count follows.
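A minimal sketch of the up-day versus down-day count with pandas (hypothetical closing prices):

import pandas as pd

close = pd.Series([100, 101, 103, 102, 104, 103, 105, 104])
daily_change = close.diff().dropna()

up_days = (daily_change > 0).sum()
down_days = (daily_change < 0).sum()
print(up_days, down_days)  # if down days start to outnumber up days, the trend may be weakening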
Sentiment
This is about finding out whether everybody is thinking the same thing, and if so, whether it is time to go the other way. Here we want to know whether everyone is expecting the same outcome.
A Career in Technical Analysis
Any person who enjoys working with numbers and is keen on statistics and capital markets may enjoy being a technical analyst.
Technical Analyst Job Description
Technical analysts study the trends and patterns of stocks to make predictions about their future performance. They use this information to find the best time and price at which to sell stocks. These professionals are often employed by finance and investment agencies, financial institutions, and brokerage houses.
Technical Analyst Prerequisites
A bachelor’s degree in a commerce major, such as economics or finance, is required for a career in technical analysis. Some firms may require employees to have a Master of Business Administration or a master’s degree in finance.
Skills
Critical thinking
Analytical skills
Communication skills
Good at maths and statistics
Knowledge of computer software programs
IB Pitchbook – Liquidity Analysis
Written by
CFI Team
Published January 19, 2023
Updated July 7, 2023
Liquidity Analysis
An analysis of a company’s liquidity is important because it gives us insight into its capacity to pursue an M&A transaction. We need to identify trends in a company’s liquidity position, what its needs are over time, and the implications for the company’s liquidity if a transaction is to be pursued. When performing a liquidity analysis, the main points to consider are the company’s cash flow profile, capital expenditures, debt, and future funding requirements.
Cash Flow Profile
The company’s ability to generate cash from operations is the main focus of liquidity analysis. As an investment banker, you must consider any significant trends or shifts in the company’s variable and fixed costs. How have the company’s margins performed over time? What inventory costing method do they use (LIFO, FIFO, weighted average cost)? What depreciation methods do they use (straight line, declining balance)? How does their financial accounting differ from their tax accounting, and what are the implications for tax deferral? Are there any major gains or impairments that should be considered?
All of these questions contribute to the overall sustainability of the company’s operations and its overall capacity to pursue a transaction. How much of the transaction can be funded internally? How much additional capital must be raised? What type of capital can be raised and what is the strategic rationale for raising one form of capital over another?
Capital Expenditures
The company’s CapEx schedule is very important when pitching a transaction opportunity because it is the main opportunity cost to weigh against a transaction. For example, a company can invest in its own capital in a way that replicates the benefit of a transaction. Furthermore, the amount of capital available to a company may already be committed to specific capital requirements; it is an investment banker’s job to calculate the requirement and frame a strategic recommendation around these existing commitments.
When we consider a company’s capital expenditures, it is important to distinguish between growth and maintenance CapEx. While it is critical for a company to continually invest in maintenance CapEx to replace any depreciation, the amount of growth CapEx could be the amount that a company might forgo to pursue a transaction. If M&A is a regular course of business (e.g., AutoCanada, Premium Brands), the growth CapEx may already factor in transactions on an ongoing basis.
Debt
The company’s leverage is probably the most important element to consider when pitching a transaction. If we think about accretion/dilution, due diligence lets us choose an appropriate range for the stock vs. cash breakdown, in terms of limits on the amount of leverage a company and its creditors may be comfortable with. Furthermore, taking on too much debt to fund a transaction may cause the company to incur interest beyond what it can pay down. Also, if a company faces any major debt maturities in the near future, it may opt to conserve its dry powder in anticipation of the debt coming due.
A company may include items that behave like debt, and we must consider any operating or financial leases by the company, as well as any pension obligations the company is committed to paying out. Additionally, we must think about how much room the company has in short-term credit facilities, and the company’s capital allocation priorities before pitching a transaction.
If the target company is also leveraged, we must take into consideration the fact that the target’s enterprise value includes the value of its debt. Therefore, it is important to consider the change in a company’s leverage ratios pro forma the transaction.
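As a purely illustrative sketch of the pro forma leverage idea, here is a small calculation; all the figures are hypothetical and only show the mechanics:

# Hypothetical figures (in millions)
acquirer_debt, acquirer_ebitda = 400.0, 200.0
target_debt, target_ebitda = 150.0, 60.0
new_debt_to_fund_deal = 250.0

leverage_before = acquirer_debt / acquirer_ebitda
leverage_pro_forma = (acquirer_debt + target_debt + new_debt_to_fund_deal) / (acquirer_ebitda + target_ebitda)
print(round(leverage_before, 2), round(leverage_pro_forma, 2))  # 2.0x before vs. roughly 3.08x pro forma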