CNN Image Classification in TensorFlow with Steps & Examples

Updated in December 2023.

What is Convolutional Neural Network?

A convolutional neural network, also known as a convnet or CNN, is a well-known method in computer vision applications. It is a class of deep neural networks used to analyze visual imagery. This type of architecture is dominant for recognizing objects in a picture or video. It is used in applications such as image and video recognition, natural language processing, etc.


Architecture of a Convolutional Neural Network

Think about Facebook a few years ago: after you uploaded a picture to your profile, you were asked to add a name to the face in the picture manually. Nowadays, Facebook uses a convnet to tag your friends in the picture automatically.

A convolutional neural network for image classification is not very difficult to understand. An input image is processed during the convolution phase and is then assigned a label.

A typical convnet architecture can be summarized in the picture below. First, an image is pushed to the network; this is called the input image. Then, the input image goes through a series of steps; this is the convolutional part of the network. Finally, the neural network predicts the digit in the image.

Architecture of a Convolutional Neural Network (CNN)

An image is composed of an array of pixels with a height and a width. A grayscale image has only one channel, while a color image has three channels (one each for red, green, and blue). The channels are stacked on top of each other. In this tutorial, you will use grayscale images with only one channel. Each pixel has a value from 0 to 255 that reflects its intensity. In the figures of this tutorial, a pixel equal to 0 is shown as white, while a pixel with a value close to 255 is shown as dark.

Let’s have a look at an image stored in the MNIST dataset. The picture below shows how to represent the picture on the left in matrix format. Note that the original matrix has been rescaled to lie between 0 and 1. For darker pixels, the value in the matrix is about 0.9, while white pixels have a value of 0.

Convolutional operation

The most critical component in the model is the convolutional layer. This part aims to reduce the size of the image, which speeds up the computation of the weights and improves generalization.

During the convolutional part, the network keeps the essential features of the image and excludes irrelevant noise. For instance, suppose the model is learning how to recognize an elephant in a picture with a mountain in the background. If you use a traditional neural network, the model will assign a weight to all the pixels, including those from the mountain, which are not essential and can mislead the network.

Instead, a convolutional neural network uses a mathematical technique to extract only the most relevant features. This mathematical operation is called convolution. It allows the network to learn increasingly complex features at each layer. The convolution divides the matrix into small pieces to learn the most essential elements within each piece.

Components of Convolutional Neural Network (ConvNet or CNN)

There are four components of a convnet:

Convolution

Non Linearity (ReLU)

Pooling or Sub Sampling

Classification (Fully Connected Layer)


The purpose of the convolution is to extract the features of the object in the image locally. It means the network will learn specific patterns within the picture and will be able to recognize them anywhere in the picture.

Convolution is built on element-wise multiplication. The concept is easy to understand. The computer scans a part of the image, usually with a dimension of 3×3, multiplies it element-wise by a filter, and sums the products into a single value. The values collected across all positions form a feature map. This step is repeated until the whole image is scanned. Note that, after the convolution, the size of the image is reduced.


There are numerous filters available. Below, we list some of them. You can see that each filter has a specific purpose. Note that, in the picture below, kernel is a synonym for filter.


Arithmetic behind the convolution

The convolutional phase applies the filter to a small array of pixels within the picture. The filter, typically of shape 3×3 or 5×5, moves along the input image. This means the network slides these windows across the entire input image and computes the convolution. The image below shows how the convolution operates. The size of the patch is 3×3, and each entry of the output matrix is the sum of the element-wise products between the image patch and the filter.

Notice that the width and height of the output can differ from the width and height of the input. This happens because of the border effect.

Border effect

Suppose the image is a 5×5 feature map and the filter is 3×3. The filter can only be centered on the inner 3×3 grid of tiles, because anywhere else it would extend past the border. The output feature map therefore shrinks by two tiles along each dimension, to 3×3.
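The sliding-window arithmetic and the border effect can be sketched in plain Python. This is a minimal illustration, not how TensorFlow implements the operation: the function name convolve2d_valid, the 5×5 input values, and the 3×3 filter values are all made up for the example.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    At each position the overlapping values are multiplied element-wise
    and summed into one output value (the filter is not flipped, which
    is the convention used by CNN layers).
    """
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for i in range(n_rows - k_rows + 1):
        row = []
        for j in range(n_cols - k_cols + 1):
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x5 input and 3x3 filter, chosen only for illustration.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

The 5×5 input yields a 3×3 feature map: the filter fits in only three positions along each dimension, which is the border effect described above.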

To get the same output dimension as the input dimension, you need to add padding. Padding consists of adding the right number of rows and columns on each side of the matrix. It allows the convolution to be centered on every input tile. In the image below, the input and output matrices have the same dimension, 5×5.

When you define the network, the convolved features are controlled by three parameters:

Depth: It defines the number of filters to apply during the convolution. In the previous example, you saw a depth of 1, meaning only one filter is used. In most cases, there is more than one filter. The picture below shows the operations done in a situation with three filters.

    Stride: It defines the number of pixels the window jumps between two slices. If the stride is equal to 1, the window moves one pixel at a time. If the stride is equal to 2, the window jumps by 2 pixels. If you increase the stride, you get smaller feature maps.

    Example stride 1

    stride 2

      Zero-padding: Padding is the operation of adding a corresponding number of rows and columns of zeros on each side of the input feature maps. In this case, the output has the same dimension as the input.
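The combined effect of filter size, stride, and zero-padding on the output size can be written as a small helper. This is a sketch under the assumption of symmetric padding (the same number of zeros on each side); the function name conv_output_size is hypothetical and is not a TensorFlow function.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(5, 3))             # 3: the border effect, no padding
print(conv_output_size(5, 3, padding=1))  # 5: one ring of zeros keeps the size
print(conv_output_size(5, 3, stride=2))   # 2: a larger stride shrinks the map
```

The same formula covers pooling: a 2×2 window with a stride of 2 and no padding maps 28 to 14.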

      Non Linearity (ReLU)

      At the end of the convolution operation, the output is passed through an activation function to introduce non-linearity. The usual activation function for convnets is the ReLU. All pixels with a negative value are replaced by zero.
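As a quick illustration, ReLU applied to a small feature map looks like this. The feature-map values below are hypothetical.

```python
def relu(feature_map):
    """Replace every negative value with zero; keep the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical feature map with some negative responses.
feature_map = [[-3, 2, 0], [5, -1, 4]]
activated = relu(feature_map)
print(activated)  # [[0, 2, 0], [5, 0, 4]]
```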

      Pooling Operation

      This step is easy to understand. The purpose of pooling is to reduce the dimensionality of the feature map, which lowers the computational cost of the operation. With a smaller feature map, the network has fewer weights to compute in the following layers, which also helps limit overfitting.

      In this stage, you need to define the size and the stride. A standard way to pool the input is to take the maximum value of the feature map. Look at the picture below. The pooling screens four sub-matrices of the 4×4 feature map and returns the maximum value of each. It takes the maximum value of a 2×2 array and then moves this window by two pixels. For instance, if the first sub-matrix is [3,1,3,2], the pooling returns the maximum, which is 3.

      There are other pooling operations, such as the mean.

      This operation aggressively reduces the size of the feature map.
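Max pooling with a 2×2 window and a stride of 2 can be sketched as follows. The function name max_pool2d is hypothetical, and only the first sub-matrix [3,1,3,2] comes from the text; the other values of the 4×4 feature map are invented for the example.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Return the maximum of each size x size window, moving by `stride`."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, stride):
        pooled_row = []
        for j in range(0, cols - size + 1, stride):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# The top-left 2x2 block holds the values 3, 1, 3, 2 from the text.
feature_map = [
    [3, 1, 2, 4],
    [3, 2, 1, 0],
    [5, 6, 1, 2],
    [7, 8, 3, 4],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[3, 4], [8, 4]]
```

The 4×4 input becomes a 2×2 output, so three quarters of the values are discarded in a single step.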

      Fully Connected Layers

      The last step consists of building a traditional artificial neural network, as you did in the previous tutorial. You connect all neurons from the previous layer to the next layer and use a softmax activation function to classify the number in the input image.
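The softmax function turns the raw scores of the last layer into probabilities that sum to 1. Below is a minimal sketch in plain Python; the three scores are hypothetical, and subtracting the maximum is a standard trick for numerical stability rather than something taken from the tutorial.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The class with the largest score always receives the largest probability, so the predicted class is the index of the maximum.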


      A TensorFlow convolutional neural network stacks several layers before making a prediction. Such a network has:

      A convolutional layer

      Relu Activation function

      Pooling layer

      Densely connected layer

      The convolutional layers apply different filters to subregions of the picture. The ReLU activation function adds non-linearity, and the pooling layers reduce the dimensionality of the feature maps.

      All these layers extract essential information from the images. Finally, the feature maps are fed to a fully connected layer with a softmax function to make a prediction.

      Train CNN with TensorFlow

      Now that you are familiar with the building blocks of a convnet, you are ready to build one with TensorFlow. We will use the MNIST dataset for CNN image classification. The code in this tutorial is written against the TensorFlow 1.x API (for example tf.layers and tf.train.GradientDescentOptimizer).

      The data preparation is the same as in the previous tutorial. You can run the code and jump directly to the architecture of the CNN.

      You will follow the steps below for image classification using CNN:

      Step 1: Upload Dataset

      Step 2: Input layer

      Step 3: Convolutional layer

      Step 4: Pooling layer

      Step 5: Second Convolutional Layer and Pooling Layer

      Step 6: Dense layer

      Step 7: Logit Layer

      Step 1: Upload Dataset

      Create a train/test set

      You need to split the dataset with train_test_split.

      Scale the features

      Finally, you can scale the features with MinMaxScaler, as shown in the TensorFlow CNN image classification example below.

      import numpy as np
      import tensorflow as tf
      # fetch_mldata has been removed from scikit-learn; fetch_openml is used
      # here as a stand-in (an assumption, not the original call).
      from sklearn.datasets import fetch_openml
      from sklearn.model_selection import train_test_split
      from sklearn.preprocessing import MinMaxScaler

      mnist = fetch_openml("mnist_784", version=1, as_frame=False)
      print(mnist.data.shape)
      print(mnist.target.shape)

      # The split arguments were lost in the original listing; the MNIST
      # features and labels are assumed.
      X_train, X_test, y_train, y_test = train_test_split(
          mnist.data, mnist.target, test_size=0.2, random_state=42)
      y_train = y_train.astype(int)
      y_test = y_test.astype(int)
      batch_size = len(X_train)
      print(X_train.shape, y_train.shape, y_test.shape)

      # Rescale pixel values to [0, 1]; fit the scaler on the training set only
      scaler = MinMaxScaler()
      X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
      X_test_scaled = scaler.transform(X_test.astype(np.float64))

      feature_columns = [tf.feature_column.numeric_column('x', shape=X_train_scaled.shape[1:])]
      X_train_scaled.shape[1:]

      Define the CNN

      A CNN applies filters to the raw pixels of an image to learn detailed local patterns, whereas a traditional neural net learns global patterns. To construct a CNN, you need to define:

      A convolutional layer: Apply n filters to the feature map. After the convolution, you need to use a ReLU activation function to add non-linearity to the network.

      Pooling layer: The next step after the convolution is to downsample the feature map. The purpose is to reduce the dimensionality of the feature map to prevent overfitting and improve the computation speed. Max pooling is the conventional technique, which divides the feature maps into subregions (usually with a 2×2 size) and keeps only the maximum values.

      Fully connected layers: All neurons from the previous layers are connected to the next layers. The CNN classifies the label according to the features extracted by the convolutional layers and reduced by the pooling layers.

      CNN architecture

      Convolutional Layer: Applies 14 5×5 filters (extracting 5×5-pixel subregions), with ReLU activation function

      Pooling Layer: Performs max pooling with a 2×2 filter and stride of 2 (which specifies that pooled regions do not overlap)

      Convolutional Layer: Applies 36 5×5 filters, with ReLU activation function

      Pooling Layer #2: Again, performs max pooling with a 2×2 filter and stride of 2

      Dense Layer: 1,764 neurons, with dropout regularization rate of 0.4 (probability of 0.4 that any given element will be dropped during training)

      Dense Layer (Logits Layer): 10 neurons, one for each digit target class (0–9).
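To see where the 1,764 neurons come from, the shape of one image can be traced through the layers listed above. This is a plain-Python sketch that tracks shapes only; the helper names conv_same_shape and max_pool_shape are hypothetical and are not TensorFlow functions.

```python
def conv_same_shape(shape, filters):
    """'Same' padding with stride 1 keeps height and width; depth becomes `filters`."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_shape(shape, pool_size=2, stride=2):
    """Unpadded pooling shrinks height and width; depth is unchanged."""
    height, width, depth = shape
    new_height = (height - pool_size) // stride + 1
    new_width = (width - pool_size) // stride + 1
    return (new_height, new_width, depth)


input_shape = (28, 28, 1)                       # one grayscale MNIST image
conv1_shape = conv_same_shape(input_shape, 14)  # (28, 28, 14)
pool1_shape = max_pool_shape(conv1_shape)       # (14, 14, 14)
conv2_shape = conv_same_shape(pool1_shape, 36)  # (14, 14, 36)
pool2_shape = max_pool_shape(conv2_shape)       # (7, 7, 36)
flat_size = pool2_shape[0] * pool2_shape[1] * pool2_shape[2]

print(conv1_shape, pool1_shape, conv2_shape, pool2_shape, flat_size)
```

Flattening the last feature map gives 7 × 7 × 36 = 1,764 values, which is the size of the dense layer.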

      There are three important functions to use to create a CNN:

      conv2d(). Constructs a two-dimensional convolutional layer with the number of filters, filter kernel size, padding, and activation function as arguments.

      max_pooling2d(). Constructs a two-dimensional pooling layer using the max-pooling algorithm.

      dense(). Constructs a dense layer with the given inputs and number of units.

      You will define a function to build the CNN. Let’s see in detail how to construct each building block before wrapping everything together in the function.

      Step 2: Input layer

      def cnn_model_fn(features, labels, mode):
          input_layer = tf.reshape(tensor=features["x"], shape=[-1, 28, 28, 1])

      You need to define a tensor with the shape of the data. For that, you can use the function tf.reshape. In this function, you declare the tensor to reshape and the target shape. The first argument is the features of the data, which are passed in as an argument of the model function. In the shape, -1 lets TensorFlow infer the batch size, 28×28 is the image height and width, and 1 is the number of channels.

      Step 3: Convolutional layer

      # First convolutional layer
      conv1 = tf.layers.conv2d(
          inputs=input_layer,
          filters=14,
          kernel_size=[5, 5],
          padding="same",
          activation=tf.nn.relu)

      The first convolutional layer has 14 filters with a kernel size of 5×5 and "same" padding. Same padding means the output tensor has the same height and width as the input tensor. TensorFlow adds zeros around the rows and columns to ensure the same size.

      You use the ReLU activation function. The output size will be [batch_size, 28, 28, 14].

      Step 4: Pooling layer

      The next step after the convolution is the pooling computation, which reduces the dimensionality of the data. You can use the function max_pooling2d with a size of 2×2 and a stride of 2. You use the previous layer as input. The output size will be [batch_size, 14, 14, 14].

      # First pooling layer
      pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

      Step 5: Second Convolutional Layer and Pooling Layer

      The second convolutional layer has 36 filters, with an output size of [batch_size, 14, 14, 36]. The pooling layer has the same size as before, and its output shape is [batch_size, 7, 7, 36].

      conv2 = tf.layers.conv2d(
          inputs=pool1,
          filters=36,
          kernel_size=[5, 5],
          padding="same",
          activation=tf.nn.relu)
      pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

      Step 6: Dense layer

      Then, you need to define the fully connected layer. The feature map has to be flattened before it is connected to the dense layer. You can use the function reshape with a size of 7*7*36.

      The dense layer will connect 1764 neurons. You add a ReLU activation function. In addition, you add dropout regularization with a rate of 0.4, meaning that 40 percent of the layer's outputs are randomly set to 0. Note that dropout takes place only during the training phase. The function cnn_model_fn has an argument mode that declares whether the model is being trained or evaluated, as shown in the CNN image classification TensorFlow example below.

      pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 36])
      dense = tf.layers.dense(inputs=pool2_flat, units=7 * 7 * 36, activation=tf.nn.relu)
      dropout = tf.layers.dropout(
          inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

      Step 7: Logit Layer

      Finally, you can define the last layer, which holds the prediction of the model. The output shape is [batch_size, 10], where 10 is the total number of digit classes.

      # Logits layer
      logits = tf.layers.dense(inputs=dropout, units=10)

      You can create a dictionary containing the classes and the probability of each class. The function tf.argmax() returns the index of the highest value in the logits layer, which is the predicted class. The softmax function returns the probability of each class.

      predictions = {
          # Generate predictions
          "classes": tf.argmax(input=logits, axis=1),
          "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
      }

      You only want to return the prediction dictionary when mode is set to prediction. You add this code to return the predictions:

      if mode == tf.estimator.ModeKeys.PREDICT:
          return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

      The next step is to compute the loss of the model. In the last tutorial, you learned that the loss function for a multiclass model is cross entropy. The loss is easily computed with the following code:

      # Calculate loss (for both TRAIN and EVAL modes)
      loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
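To make the loss concrete, here is a plain-Python sketch of softmax cross entropy with integer class labels. It is an illustration of the formula, not TensorFlow's implementation; averaging over the batch is an assumption about the default reduction, and the logits are hypothetical.

```python
import math


def sparse_softmax_cross_entropy(labels, logits):
    """Mean over the batch of -log(softmax(logits)[label]).

    `labels` holds one integer class index per example and `logits`
    holds one list of raw scores per example.
    """
    losses = []
    for label, row in zip(labels, logits):
        peak = max(row)
        # log(sum(exp(row))) computed in a numerically stable way
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(log_sum_exp - row[label])
    return sum(losses) / len(losses)


# Hypothetical batch of two examples with three classes each.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
confident = sparse_softmax_cross_entropy([0, 1], logits)  # correct classes
mistaken = sparse_softmax_cross_entropy([2, 0], logits)   # wrong classes
print(confident, mistaken)
```

The loss is small when the highest score belongs to the true class and grows when the model puts its confidence on the wrong class.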

      The final step of the TensorFlow CNN example is to optimize the model, that is, to find the best values of the weights. For that, you use a gradient descent optimizer with a learning rate of 0.001. The objective is to minimize the loss.

      optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
      train_op = optimizer.minimize(
          loss=loss,
          global_step=tf.train.get_global_step())

      You are done with the CNN. However, you want to display the performance metrics during evaluation mode. The performance metric for a multiclass model is accuracy. TensorFlow provides the function tf.metrics.accuracy, which takes two arguments: the labels and the predicted values.

      eval_metric_ops = {
          "accuracy": tf.metrics.accuracy(labels=labels, predictions=predictions["classes"])}
      return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

      That’s it. You created your first CNN and you are ready to wrap everything into a function in order to use it to train and evaluate the model.

      def cnn_model_fn(features, labels, mode):
          """Model function for CNN."""
          # Input layer
          input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

          # Convolutional layer #1
          conv1 = tf.layers.conv2d(
              inputs=input_layer,
              filters=14,
              kernel_size=[5, 5],
              padding="same",
              activation=tf.nn.relu)

          # Pooling layer #1
          pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

          # Convolutional layer #2 and pooling layer #2
          conv2 = tf.layers.conv2d(
              inputs=pool1,
              filters=36,
              kernel_size=[5, 5],
              padding="same",
              activation=tf.nn.relu)
          pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

          # Dense layer
          pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 36])
          dense = tf.layers.dense(inputs=pool2_flat, units=7 * 7 * 36, activation=tf.nn.relu)
          dropout = tf.layers.dropout(
              inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

          # Logits layer
          logits = tf.layers.dense(inputs=dropout, units=10)

          predictions = {
              # Generate predictions (for PREDICT and EVAL mode)
              "classes": tf.argmax(input=logits, axis=1),
              "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
          }

          if mode == tf.estimator.ModeKeys.PREDICT:
              return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

          # Calculate loss
          loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

          # Configure the training op (for TRAIN mode)
          if mode == tf.estimator.ModeKeys.TRAIN:
              optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
              train_op = optimizer.minimize(
                  loss=loss,
                  global_step=tf.train.get_global_step())
              return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

          # Add evaluation metrics (for EVAL mode)
          eval_metric_ops = {
              "accuracy": tf.metrics.accuracy(
                  labels=labels, predictions=predictions["classes"])}
          return tf.estimator.EstimatorSpec(
              mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

      The steps below are the same as the previous tutorials.

      First of all, you define an estimator with the CNN model for image classification.

      # Create the Estimator
      mnist_classifier = tf.estimator.Estimator(
          model_fn=cnn_model_fn, model_dir="train/mnist_convnet_model")

      A CNN takes a long time to train, so you create a logging hook to record the values of the softmax layer every 50 iterations.

      # Set up logging for predictions
      tensors_to_log = {"probabilities": "softmax_tensor"}
      logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=50)

      You are ready to estimate the model. You set a batch size of 100 and shuffle the data. Note that we set the number of training steps to 16,000, so training can take a long time. Be patient.

      # Train the model
      train_input_fn = tf.estimator.inputs.numpy_input_fn(
          x={"x": X_train_scaled},
          y=y_train,
          batch_size=100,
          num_epochs=None,
          shuffle=True)
      mnist_classifier.train(
          input_fn=train_input_fn,
          steps=16000,
          hooks=[logging_hook])

      Now that the model is trained, you can evaluate it and print the results.

      # Evaluate the model and print results
      eval_input_fn = tf.estimator.inputs.numpy_input_fn(
          x={"x": X_test_scaled},
          y=y_test,
          num_epochs=1,
          shuffle=False)
      eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
      print(eval_results)

      INFO:tensorflow:Calling model_fn.
      INFO:tensorflow:Done calling model_fn.
      INFO:tensorflow:Starting evaluation at 2023-08-05-12:52:41
      INFO:tensorflow:Graph was finalized.
      INFO:tensorflow:Restoring parameters from train/mnist_convnet_model/model.ckpt-15652
      INFO:tensorflow:Running local_init_op.
      INFO:tensorflow:Done running local_init_op.
      INFO:tensorflow:Finished evaluation at 2023-08-05-12:52:56
      INFO:tensorflow:Saving dict for global step 15652: accuracy = 0.9689286, global_step = 15652, loss = 0.13894269
      {'accuracy': 0.9689286, 'loss': 0.13894269, 'global_step': 15652}

      With the current architecture, you get an accuracy of 97%. You can change the architecture, the batch size, and the number of iterations to improve the accuracy. The CNN has performed better than the ANN or logistic regression. In the tutorial on artificial neural networks, you had an accuracy of 96%, which is lower than the CNN's. The performance of the CNN is impressive with a larger image set, both in terms of computation speed and accuracy.


      A convolutional neural network works very well for evaluating pictures. This type of architecture is dominant for recognizing objects in a picture or video.

      To build a TensorFlow CNN, you need to follow seven steps:

      Step 1: Upload Dataset:

      Step 2: Input layer:

      This step reshapes the data. The height and width are equal to the square root of the number of pixels. For instance, if a picture has 676 pixels, then the shape is 26×26. You need to specify whether the picture is in colour or not. If yes, then you add 3 to the shape (3 for RGB); otherwise, you add 1.

      input_layer = tf.reshape(tensor=features["x"], shape=[-1, 28, 28, 1])

      Step 3: Convolutional layer

      Next, you need to create the convolutional layers. You apply different filters to allow the network to learn important features. You specify the size of the kernel and the number of filters.

      conv1 = tf.layers.conv2d(
          inputs=input_layer,
          filters=14,
          kernel_size=[5, 5],
          padding="same",
          activation=tf.nn.relu)

      Step 4: Pooling layer

      In this step, you add a pooling layer. This layer decreases the size of the input. It does so by taking the maximum value of each sub-matrix. For instance, if the sub-matrix is [3,1,3,2], the pooling will return the maximum, which is 3.

      pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

      Step 5: Add Convolutional Layer and Pooling Layer

      In this step, you can add as many conv layers and pooling layers as you want. Google uses architectures with more than 20 conv layers.

      Step 6: Dense layer

      Step 6 flattens the previous layer's output to create a fully connected layer. In this step, you can use a different activation function and add a dropout effect.

      pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 36])
      dense = tf.layers.dense(inputs=pool2_flat, units=7 * 7 * 36, activation=tf.nn.relu)
      dropout = tf.layers.dropout(
          inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
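The dropout effect itself can be sketched in a few lines of plain Python. This illustrates the usual "inverted dropout" scheme, in which surviving values are scaled up so the expected output stays the same; the function name, the rate of 0.5, and the input values are hypothetical, and this is not TensorFlow's implementation.

```python
import random


def dropout(values, rate, training, rng=random):
    """Zero each value with probability `rate` during training and scale
    the survivors by 1 / (1 - rate). Outside training, return the input."""
    if not training or rate == 0:
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False)
print(train_out)
print(eval_out)
```

During training some outputs are zeroed and the rest are doubled; during evaluation the values pass through untouched, which is why the model function needs to know the mode.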

      Step 7: Logit Layer

      The final step is the prediction.

      logits = tf.layers.dense(inputs=dropout, units=10)


      Evaluate Your Model – Metrics For Image Classification And Detection

      This article was published as a part of the Data Science Blogathon

      Deep learning techniques like image classification, segmentation, and object detection are very commonly used. Choosing the right evaluation metrics is crucial for deciding which model to use, how to tune the hyperparameters, whether regularization techniques are needed, and so on. I have included the metrics I have used to date.

      Classification Metrics

      Let’s first consider classification metrics for image classification. Image classification problems can be binary or multi-class. Examples of binary classification include cancer detection, cat/dog classification, etc. Some examples of multi-class classification include MNIST, CIFAR, and so on.

      The first metric that you think of is usually accuracy. It’s a simple metric that calculates the ratio of correct predictions to the total number of predictions. But is it always valid?

      Let’s take a case of an imbalanced dataset of cancer patients. Here, the majority of the data points will belong to the negative class and very few in the positive class. So, just by classifying all the patients as “Negative”, the model would have achieved great accuracy!

      Confusion Matrix

      The next step is usually to plot the confusion matrix. It has 4 categories: true positives, true negatives, false positives, and false negatives. Using this matrix, we can calculate various useful metrics!

      Accuracy =  (TP + TN) / ( TP + TN + FP + FN)

      You can find this using just a few lines of code with the sklearn metrics library.

      from sklearn.metrics import confusion_matrix, accuracy_score

      # labels_list holds the true labels and preds_list the predicted labels;
      # both are assumed to be defined earlier.
      # Threshold used to turn predicted probabilities into class labels;
      # it can be optimized for each problem
      threshold = 0.5
      tn, fp, fn, tp = confusion_matrix(labels_list, preds_list).ravel()
      accuracy = accuracy_score(labels_list, preds_list)
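To show how the threshold and the four categories fit together, here is a plain-Python sketch that thresholds predicted probabilities and counts each category by hand. The function name confusion_counts and the sample data are hypothetical; the return order mirrors the tn, fp, fn, tp unpacking used above.

```python
def confusion_counts(labels, probabilities, threshold=0.5):
    """Count TN, FP, FN, TP for binary labels (0 or 1).

    A sample is predicted positive when its probability is >= threshold.
    """
    tn = fp = fn = tp = 0
    for label, probability in zip(labels, probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical true labels and predicted probabilities.
labels = [1, 0, 1, 1, 0, 0]
probabilities = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]

tn, fp, fn, tp = confusion_counts(labels, probabilities)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp, accuracy)
```

Lowering the threshold to 0.4 turns the missed positive (probability 0.4) into a true positive, which shows why the threshold is worth tuning per problem.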

      You have probably heard terms like recall or sensitivity. They are the same!

      Sensitivity/ True Positive Rate:

      TPR/Sensitivity denotes the percentage/fraction of the positive class that was correctly predicted and classified! It’s also called Recall.

      Sensitivity = True Positives/ (True Positives + False Negatives)

      An example: What percent of actual cancer-infected patients were detected by the model?

      Specificity / True Negative Rate:

      While it’s essential to correctly predict positive class, imagine what would happen if a cancer-negative patient has been told incorrectly that he’s in danger! (False positive)

      Specificity is a metric to calculate what portion of the negative class has been correctly predicted and classified.

      Specificity = True Negatives/ (False Positives + True Negatives)

      This is also called the True Negative Rate (TNR).

      Specificity and Sensitivity are the most commonly used metrics. But, we need to understand FPR also to get ROC.

      False Positive Rate

      This calculates how many negative class samples were incorrectly classified as positive.

      FPR = 1 – Specificity

      For a good classification model, what do we desire?

      A higher TPR and lower FPR!
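The formulas above can be collected into one helper that works directly on the four confusion-matrix counts. The function name and the counts below are hypothetical; the first example reproduces the imbalanced cancer scenario, where every patient is classified as negative.

```python
def classification_rates(tp, tn, fp, fn):
    """Compute the rates discussed above from the confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),          # true positive rate, recall
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


# Hypothetical imbalanced dataset: 10 positives, 990 negatives,
# and a model that predicts "negative" for everyone.
all_negative = classification_rates(tp=0, tn=990, fp=0, fn=10)
print(all_negative)

# Hypothetical model that actually detects most positives.
useful = classification_rates(tp=8, tn=90, fp=10, fn=2)
print(useful)
```

The first model reaches 99% accuracy while detecting no positive case at all (sensitivity of 0), which is exactly why accuracy alone is misleading on imbalanced data.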

      Another useful method is to get the ROC curve and its AUC for your classifier. Let’s look into it!

      AUC ROC Curve

      ROC stands for Receiver Operating Characteristic. AUC just means area under the curve. Here, we plot the True Positive Rate against the False Positive Rate for various thresholds.

      Generally, if a prediction has a value above 0.5, we classify it into the positive class; otherwise, into the negative class. Here, this decision boundary of 0.5 is called the threshold. It’s not always necessary to use 0.5 as the threshold; sometimes other values give the best results. To find out, we plot TPR against FPR for a range of threshold values. Usually, the thresholds are varied over 0.1, 0.2, 0.3, 0.4, and so on up to 1.


      If you want to summarize the whole curve in a single ROC AUC score, sklearn provides a function. It takes the true labels and the predicted scores, and you can use it as shown.

      from sklearn.metrics import roc_auc_score

      roc_auc = roc_auc_score(labels, predictions)

      The top left corner of the graph is where you should look for your optimal threshold!

      If you want to plot the ROC curve, you can use the snippet below.

      from sklearn import metrics
      import matplotlib.pyplot as plt

      # y holds the true labels and scores the predicted scores (assumed defined);
      # pos_label=2 treats the label 2 as the positive class
      fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
      plt.plot(fpr, tpr)

      Here, the fpr and tpr returned by the function are arrays containing the respective values for each threshold in thresholds.
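To see what happens behind these calls, the ROC points and the area under the curve can be computed by hand. This is a plain-Python sketch, not sklearn's implementation: the function names are hypothetical, the labels and scores are made-up sample data, and the area is approximated with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return one (FPR, TPR) pair per candidate threshold.

    A sample is predicted positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Every distinct score is a threshold; infinity makes the curve start at (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for threshold in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the curve through `points`, using the trapezoidal rule."""
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical binary labels (0 or 1) and predicted scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(labels, scores)
roc_auc = auc_trapezoid(points)
print(points)   # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc)  # 0.75
```

Each threshold contributes one point, and the point closest to the top left corner (FPR of 0, TPR of 1) marks the best trade-off.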

      You can also plot sensitivity and specificity against thresholds to get more information.

      Object detection Metrics

      Object detection has many applications, including face detection, lane detection in self-driving systems, and so on. Here, we need a different set of metrics to evaluate the models. The most popular one is IOU. Let’s begin!

      IOU (Intersection over Union)

      In object detection or segmentation problems, the ground truth labels are masks of a region or a bounding box where the object is present. The IOU metric measures the overlap between the predicted bounding box and the ground truth bounding box.

      IOU = Area of Intersection of the two bounding boxes / Area of Union


      IOU will be a value between 0-1. For perfectly overlapping boxes, it will be 1 and 0 for non-overlapping prediction. Generally, IOU should be above 0.5 for a decent object detection model.
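For axis-aligned bounding boxes, the IOU can be computed in a few lines. This is a minimal sketch that assumes boxes are given as (x_min, y_min, x_max, y_max) tuples; the function name and the sample boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 2, 2)
print(iou(ground_truth, (0, 0, 2, 2)))  # 1.0 for a perfect overlap
print(iou(ground_truth, (1, 1, 3, 3)))  # 1/7 for a partial overlap
print(iou(ground_truth, (5, 5, 6, 6)))  # 0.0 for no overlap
```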

      Mean Average Precision (mAP)

      Using the IOU, precision and recall can be calculated.


      You have to set an IOU threshold value. For example, let’s say I set the IOU threshold to 0.5. Then a prediction with an IOU of 0.8 is classified as a true positive. If the IOU is 0.4 (less than 0.5), it is a false positive. Also note that if we change the threshold to 0.4, this prediction would be classified as a true positive. So, varying the threshold can give different metrics.

      Next, Average Precision (AP) is obtained by finding the area under the precision-recall curve. The mAP for object detection is the average of the AP calculated for all the classes to determine the accuracy of a set of object detections from a model when compared to ground-truth object annotations of a dataset.

      The mean Average Precision is calculated by taking the mean of AP over all classes and/or over all IoU thresholds.
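A simplified version of this calculation can be sketched in plain Python. It assumes that the detections of each class are already sorted by descending confidence and already marked as true or false positives using an IOU threshold. It computes a non-interpolated AP, which is only one of several conventions used by benchmarks. The function names, class names, and sample data are hypothetical.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Non-interpolated AP for one class.

    `is_true_positive` lists the detections in order of descending
    confidence; each entry says whether the detection matched a ground
    truth box. Precision is accumulated at every rank where recall rises.
    """
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(is_true_positive, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth


def mean_average_precision(ap_per_class):
    """Average the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical detections for two classes, two ground truth boxes each.
ap_per_class = {
    "car": average_precision([True, False, True], num_ground_truth=2),
    "person": average_precision([True, True], num_ground_truth=2),
}
print(ap_per_class)
print(mean_average_precision(ap_per_class))
```

For the "car" class, precision is 1/1 at the first hit and 2/3 at the second, so AP is (1 + 2/3) / 2; the false positive in between lowers the score.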

      Many object detection algorithms, including Faster R-CNN and MobileNet, use this metric. It provides a single numerical value, which makes it easier to compare a model with other models.


      The media shown in this article on Metrics for Image Classification are not owned by Analytics Vidhya and are used at the Author’s discretion.


      How Debugging Works In Tensorflow?

      Introduction to TensorFlow Debugging

      In this article, we will try to understand the different ways debugging can be done in TensorFlow. Debugging is generally very useful for inspecting the values flowing through the code and finding where exactly the code is breaking. Most programming languages provide built-in functionality for debugging. Similarly, TensorFlow provides different classes and packages with which we can trace the flow of data in the algorithms and optimize their performance.


      How Debugging Works in TensorFlow?

      Now let’s see how the debugging works in TensorFlow.

      The core parts of a program where debugging can be enabled in TensorFlow are:

      graph (through the use of this function we can build a computation graph)

      session (through the use of this function we can execute the graph)

      There are in total 4 ways, shown below, through which we can perform debugging in TensorFlow.

      1. Fetching and Printing Values for a Particular Tensor

      This is the easiest approach: we add breakpoints and print out the values to get the required information.

      It is very easy and quick to implement.

      Information can be fetched from anywhere we want.

      However, printing a value at any point creates a reference to that particular tensor, which is not good practice to keep.

      2. The tf.print function

      This method comes in handy for checking some output at runtime. It simply creates a log entry for the particular line where the method is used.

      This method is handy as it helps us monitor how the values develop at runtime.

      However, since it logs data to the terminal during execution of the algorithm, it can fill up the screen with logs, which is not good practice after all.
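As a rough illustration of this method, the sketch below logs an intermediate value with `tf.print` from inside a `tf.function`. It assumes TensorFlow 2.x (the graph and session description above matches the 1.x API), and the function name `scaled_sum` and the input values are made up for the example.

```python
import tensorflow as tf


@tf.function
def scaled_sum(x):
    total = tf.reduce_sum(x)
    # tf.print runs as part of the graph, so the value is logged every
    # time the function executes, not only while it is being traced.
    tf.print("running total:", total)
    return total * 2.0


result = scaled_sum(tf.constant([1.0, 2.0, 3.0]))
```

By default `tf.print` writes to standard error, so in a long training loop it is worth printing only every few steps to keep the terminal readable.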

      Before moving on, let’s discuss the tool TensorFlow provides called TensorBoard. It is a web UI for TensorFlow visualization, developed by Google, that runs locally on the system. It is generally used to visualize and monitor the performance of a TensorFlow algorithm. The dashboard also comes with a plugin for debugging.

      3. TensorBoard visualization

      With this visualization, we can monitor various things about our model, such as:

      We can summarize the model.

      View the performance.

      Serialize the data in the model.

      Clean the graph and give proper nomenclature.

      This is more or less a monitoring tool for the performance of our model.

      Now moving on to TensorBoard Debugger.

      4. TensorBoard Debugger

      As explained earlier, TensorBoard is a visualization tool, and this plugin lets us debug through that visualization. It provides various debugging features, such as:

      We can select particular nodes in the Tensor and debug them.

      Graphically we can control the execution of the model.

      And finally, we can also visualize the tensors and their values.

      The core TensorFlow packages used for debugging are:

      Here tf_debug is the debugger, which needs to be imported from the tensorflow.python package to run debugging on TensorFlow.

      The two lines below are used to invoke TensorBoard locally from the terminal.

      Advantages of TensorFlow Debugging

      Through debugging, we can inspect the output value at a particular stage while the algorithm is being trained.

      Using the TensorBoard application, we can see the performance of our algorithm in a graphical format.

      We can also step through the execution of our model using the GUI provided in TensorBoard.

      The TensorBoard application is very user friendly and easy to understand.

      With the debugger, or rather TensorBoard, we can identify whether our training data needs more cleaning.


      In this article, we learned about debugging in TensorFlow, the packages available for that purpose, and how to use them. We have also seen the TensorBoard application, which is a useful tool for debugging an algorithm while it is being trained.


      Automated Intent Classification Using Deep Learning In Google Sheets

      We also learned how to automatically populate Google Sheets in Python.

      Wouldn’t it be cool if we could perform our intent classification directly in Google Sheets?

      That is exactly what we will do here!

      Introducing Google Apps Script

      One limitation of the built-in functions in Google Sheets is that they limit you to predefined behavior.

      The good news is that you can define custom functions with new behavior if you can code them yourself in Google Apps Script.

      Google Apps Script is based on JavaScript and adds additional functionality that helps interact with Sheets, Docs and other Google Apps.

      We are going to define a new custom function named fetchPrediction that will take keywords in Google Sheet cells, and run them through a BERT-powered predictive model to get the intention of search users.

      Here is our plan of action:

      Learn to review and update values in Google Sheets from Apps Script.

      Practice fetching results from an API and populate a sheet with the retrieved values.

      Train our BERT-powered predictive model using Uber’s Ludwig.

      Use Ludwig to power an API we can call from Apps Script.

      Learn some new tools and concepts that help us connect both services together.

      Let’s get started!

      Retrieving Keyword Data From Google Sheets

      This is an empty Google sheet with some barcode related keywords we pulled from SEMrush.

      In our first exercise, we will read and print the first 10 keywords from column A.

      This is a built-in IDE (Integrated Development Environment) for Google Sheets.

      We are going to write a simple JavaScript function called logKeywords that will read all the keywords in our sheet and log them to the console.

      Please refer to the official documentation here.

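      function logKeywords() {
        // Get a reference to the active sheet (Sheet1 in this case).
        var sheet = SpreadsheetApp.getActiveSheet();
        var data = sheet.getDataRange().getValues();
        for (var i = 0; i < data.length; i++) {
          console.log('Keyword: ' + data[i][0]);
        }
      }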

      Let’s walk over the function, step by step.

      We first get a reference to the active sheet, in this case, it is Sheet1.

      We didn’t need to authenticate.

      It is a good idea to keep this page open in another tab, as you will refer to it often while you code and want to see if the changes worked.

      Now, we printed more than 100 rows, which took a bit of time. When you are writing and testing your code, it is better to work with smaller lists.

      We can make a simple change in the loop to fix that.

      function logKeywords() {
        var sheet = SpreadsheetApp.getActiveSheet();
        var data = sheet.getDataRange().getValues();
        // Only log the first 10 rows while testing.
        for (var i = 0; i < 10; i++) {
          console.log('Keyword: ' + data[i][0]);
        }
      }

      When you run this, it not only runs faster but checking the log is also a lot faster.

      Add a Column with keyword IDs

      Next, let’s learn to add data to the sheet.

      We are going to write a new function named addIDtoKeywords. It creates a column with one numeric ID per keyword.

      There isn’t a lot of value in doing this, but it should help you test the technique with something super simple.

      Here is the code to do that.

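      function addIDtoKeywords() {
        var sheet = SpreadsheetApp.getActiveSheet();
        var data = sheet.getRange("B1");  // header cell
        var values = [];
        var length = 100;
        // Build one single-cell row per ID: 101 rows to fill B2:B102.
        for (var i = 1; i <= length + 1; i++) {
          values.push([i]);
        }
        console.log(values.length);
        var column = sheet.getRange("B2:B102");
        column.setValues(values);
      }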

      You should get a new column in the sheet with numbers in increasing order.

      We can also add a column header in bold named Keyword ID using the following code.

      data.setValue("Keyword ID"); data.setFontWeight("bold");

      This is what the updated output looks like.

      The code is very similar. Let’s review the changes.

      I added a JavaScript array named values to hold the keyword IDs.

      During the loop, I added a line to add each ID generated within the loop to the array.


      I printed the length of the values array at the end of the loop to make sure the correct number of IDs was generated.

      Finally, I need to get the values to the sheet.

      var column = sheet.getRange("B2:B102");

      This code selects the correct cells to populate and then I can simply set their value using the list I generated.


      It can’t get simpler than this!

      Fetching API Results From Apps Script

      In the next exercise, we will learn to perform API requests from Apps Script.

      We are going to adapt code from step 11 which pulls data from a Books API.

      Instead of fetching books, we will translate keywords using the Google Translate API.

      Now, we are starting to write more useful code!

      Here is a new function named fetchTranslation based on code adapted from step 11.

      function fetchTranslation(TEXT) {
        var API_KEY = "INPUT YOUR API KEY";
        // url: the Google Translate API request URL, built from the encoded TEXT and API_KEY.
        var response = UrlFetchApp.fetch(url, {'muteHttpExceptions': true});
        var json = response.getContentText();
        var translation = JSON.parse(json);
        return translation["data"]["translations"][0]["translatedText"];
      }

      This function takes an input text, encodes it and inserts it into an API URL to call the Google Translate service.

      We need to get an API key and also enable the Translate service. I also recommend restricting the API key to the IP you are using to test during development.

      Once we have the API URL to call, it is as simple as calling this code.

      var response = UrlFetchApp.fetch(url, {'muteHttpExceptions': true});

      The next lines get us the response in JSON format and after a bit of navigation down the JSON tree, we get the translated text.

      As you can see in my code, I like to log almost every step in the code to the console to confirm it is doing what I expect.

      Here is one example of how I figured out the correct JSON path sequence.

      You can see the progression in the logs here, including the final output.

      Translating Keywords

      As we tested the function and it works, we can proceed to create another function to fetch and translate the keywords from the sheet.

      We will build up from what we’ve learned so far.

      We will give this function a super original name: TranslateKeywords!

      function TranslateKeywords() {
        var sheet = SpreadsheetApp.getActiveSheet();
        var header = sheet.getRange("B1");
        header.setValue("Translation");
        header.setFontWeight("bold");
        // Read the first keyword, translate it, and write the result next to it.
        var keyword = sheet.getRange("A2").getValue();
        console.log(keyword);
        var translated_keyword = fetchTranslation(keyword);
        console.log(translated_keyword);
        var data = sheet.getRange("B2");
        data.setValue(translated_keyword);
      }

      The code in this function is very similar to the one we used to set Keyword IDs.

      The main difference is that we pass the keyword to our new fetchTranslation function and update a single cell with the result.

      Here is what it looks like for our example keyword.

      As you can probably see, there is no for loop, so this will only update one single row/keyword. The first one.

      Please complete the for loop to get the translation for all keywords as a homework exercise.

      Building an Intent Classification Model

      Let’s move on to building our intent classification service, which we will call to populate keyword intents.

      In my previous deep learning articles, I’ve covered Ludwig, Uber’s AI toolbox.

      I like it a lot because it allows you to build state-of-the-art deep learning models without writing a single line of code.

      It is also very convenient to run in Google Colab.

      We are going to follow the same steps I described in this article. This will give us a powerful intent prediction model powered by BERT.

      Here is a quick summary of the steps you need to paste into Google Colab (make sure to select the GPU runtime!).

      Please refer to my article for the context:

      %tensorflow_version 1.x
      import tensorflow as tf; print(tf.__version__)

      !pip install ludwig

      #upload Question_Classification_Dataset.csv and 'Question Report_Page 1_Table.csv'
      from google.colab import files
      files.upload()

      import pandas as pd
      df = pd.read_csv("Question_Classification_Dataset.csv", index_col=0)

      !unzip

      # create the ludwig configuration file for BERT-powered classification
      template="""
      input_features:
        - name: Questions
          type: text
          encoder: bert
          config_path: uncased_L-12_H-768_A-12/bert_config.json
          checkpoint_path: uncased_L-12_H-768_A-12/bert_model.ckpt
          preprocessing:
            word_tokenizer: bert
            word_vocab_file: uncased_L-12_H-768_A-12/vocab.txt
            padding_symbol: '[PAD]'
            unknown_symbol: '[UNK]'
      output_features:
        - name: Category0
          type: category
        - name: Category2
          type: category
      text:
        word_sequence_length_limit: 128
      training:
        batch_size: 32
        learning_rate: 0.00002
      """

      with open("model_definition.yaml", "w") as f:
          f.write(template)

      !pip install bert-tensorflow

      !ludwig experiment --data_csv Question_Classification_Dataset.csv --model_definition_file model_definition.yaml

      After completing these steps in Google Colab, we should get a high accuracy predictive model for search intent.

      We can verify the predictions with this code.

      test_df = pd.read_csv("Question Report_Page 1_Table.csv")
      #we rename Query to Questions to match what the model expects
      predictions = model.predict(test_df.rename(columns={'Query': 'Questions'}))
      test_df.join(predictions)[["Query", "Category2_predictions"]]

      We get a data frame like this one.

      The intentions predicted are not the ones you typically expect: navigational, transactional, informational, but they are good enough to illustrate the concept.

      Please check an awesome article by Kristin Tynski that explains how to expand this concept to get true search intents.

      Turning Our Model Into an API Service

      Ludwig has one super cool feature that allows you to serve models directly as an API service.

      The command for this is ludwig serve.

      I was trying to accomplish the same thing following a super complicated path because I didn’t check that something like this already existed. 🤦

      It is not installed by default; we need to install it with this command.

      !pip install ludwig[serve]

      We can check the command-line options with:

      !ludwig serve --help

      Creating an API from our model is as simple as running this command.

      !ludwig serve -m results/experiment_run/model

      INFO: Started server process [5604]
      INFO: Waiting for application startup.
      INFO: Application startup complete.
      INFO: Shutting down
      INFO: Finished server process [5604]

      As we are running this code in the notebook, we need to use a little trick to push this process to the background (a separate thread).

      %%bash --bg

      The magic command %%bash --bg runs the shell code in a separate thread, returning control to the notebook so we can run code that can interact with the service.

      I found this to be a super cool and valuable trick. I’m also introducing more shell tricks that I learned many years ago.

      The nohup command prevents the process from getting killed when the parent dies. It is optional here.

      We can track the progress of the background process using this command.

      !tail debug.log

      After you see this message, you can proceed to the next step.

      Let’s send a test API request using curl to see if the service works.

      You should get this response back.

      {"Category0_predictions":"HUMAN","Category0_probabilities_":0.00021219381596893072,"Category0_probabilities_ENTITY":7.17515722499229e-05,"Category0_probabilities_HUMAN":0.9988889098167419,"Category0_probabilities_DESCRIPTION":0.000423480843892321,"Category0_probabilities_NUMERIC":2.7793401386588812e-05,"Category0_probabilities_LOCATION":0.0003020864969585091,"Category0_probabilities_ABBREVIATION":7.374086999334395e-05,"Category0_probability":0.9988889098167419,"Category2_predictions":"ind","Category2_probabilities_":8.839580550557002e-05,"Category2_probabilities_ind":0.9759176969528198,"Category2_probabilities_other":0.0013697665417566895,"Category2_probabilities_def":3.929347076336853e-05,"Category2_probabilities_count":4.732362140202895e-05,"Category2_probabilities_desc":0.014149238355457783,"Category2_probabilities_manner":7.225596345961094e-05,"Category2_probabilities_date":7.537546480307356e-05,"Category2_probabilities_cremat":0.00012272763706278056,"Category2_probabilities_reason":0.00042629052768461406,"Category2_probabilities_gr":0.0025540771894156933,"Category2_probabilities_country":0.0002626778441481292,"Category2_probabilities_city":0.0004305317997932434,"Category2_probabilities_animal":0.00024954770924523473,"Category2_probabilities_food":8.139225974446163e-05,"Category2_probabilities_dismed":7.852958515286446e-05,"Category2_probabilities_termeq":0.00023714809503871948,"Category2_probabilities_period":4.197505040792748e-05,"Category2_probabilities_money":3.626687248470262e-05,"Category2_probabilities_exp":5.991378566250205e-05,"Category2_probabilities_state":0.00010361814202342297,"Category2_probabilities_sport":8.741072088014334e-05,"Category2_probabilities_event":0.00013374585250858217,"Category2_probabilities_product":5.6306344049517065e-05,"Category2_probabilities_substance":0.00016623239207547158,"Category2_probabilities_color":1.9601659005274996e-05,"Category2_probabilities_techmeth":4.74867774755694e-05,"Category2_probabilities_dist":9.92789282463491e-05,"Category2_probabilities_perc":3.87108520953916e-05,"Category2_probabilities_veh":0.00011915313370991498,"Category2_probabilities_word":0.00016430433606728911,"Category2_probabilities_title":0.0010781479068100452,"Category2_probabilities_mount":0.00024070330255199224,"Category2_probabilities_body":0.0001515906333224848,"Category2_probabilities_abb":8.521509153069928e-05,"Category2_probabilities_lang":0.00022924368386156857,"Category2_probabilities_plant":4.893113509751856e-05,"Category2_probabilities_volsize":0.0001462997024646029,"Category2_probabilities_symbol":9.98345494735986e-05,"Category2_probabilities_weight":8.899033855414018e-05,"Category2_probabilities_instru":2.636547105794307e-05,"Category2_probabilities_letter":3.7610192521242425e-05,"Category2_probabilities_speed":4.142118996242061e-05,"Category2_probabilities_code":5.926147059653886e-05,"Category2_probabilities_temp":3.687662319862284e-05,"Category2_probabilities_ord":6.72415699227713e-05,"Category2_probabilities_religion":0.00012743560364469886,"Category2_probabilities_currency":5.8569487009663135e-05,"Category2_probability":0.9759176969528198}

      Exposing Our Service Using Ngrok

      So, we have a new API that can make intent predictions, but one big problem is that it is only accessible from within our Colab notebook.

      Let me introduce another cool service that I use often, Ngrok.

      Ngrok helps you create publicly accessible URLs that connect to a local service like the one we just created.

      I do not recommend doing this for production use, but it is very handy during development and testing.

      You don’t need to create an account, but I personally do it because I get to set up a custom subdomain that I use very frequently.

      Here are the steps to give our API a public URL to call from Apps Script.

      We first download and uncompress ngrok.

      %%bash --bg

      The code above tells ngrok to connect to the local service in port 8000. That is all we need to do.

      You can confirm it works by repeating the curl call, but calling the public URL. You should get the same result.

      If you don’t want to set up a custom domain, you can use this code instead.

      %%bash --bg

      This will generate a random public URL, which you can retrieve with this code.

      "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

      Now, we get back to our final steps.

      Fetching Intent Predictions

      We are going to adapt the code we used to make Google Translate API requests so we can make intent prediction requests.

      One big difference between the two API services is that we need to make HTTP POST requests instead of simpler HTTP GET requests.

      Let’s see how that changes our code and learn a bit more about HTTP in the process.

      function fetchPrediction(question = "who is the boss?") {
        // Assumes the API expects a form field named after the model's input feature, Questions.
        var TEXT = "Questions=" + question;
        TEXT = encodeURI(TEXT);
        console.log(TEXT);
        var options = {
          "method" : "POST",
          "contentType" : "application/x-www-form-urlencoded",
          "payload" : TEXT,
          'muteHttpExceptions': true
        };
        // url: the public URL of the prediction service exposed through ngrok.
        var response = UrlFetchApp.fetch(url, options);
        var json = response.getContentText();
        var prediction = JSON.parse(json);
        console.log(prediction["Category0_predictions"]);
        return prediction["Category0_predictions"];
      }

      The function fetchPrediction calls the API service we created and returns the predicted intent. It basically reproduces the equivalent of the curl call we made in Colab, but in Apps Script.

      I highlighted some key changes in the code. Let’s review them.

      One key difference between GET and POST requests is that in GET requests the data is passed in the URL as parameters.

      In POST requests, the data is passed inside the body of the request.

      We need to format the data before we pass it in the body and we need to set the correct content type so the server knows how to decode it.

      This line encodes the question we are passing.

      TEXT = encodeURI(TEXT);

      This is an example of what the encoded TEXT looks like.


      The correct content type for this encoding is application/x-www-form-urlencoded. This is the recommended encoding for HTML form data.

      We create an options data structure where we specify these settings and the correct request type and we are set to go.
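If you want to check the same two steps from the Colab side, here is a small Python sketch that builds a form-encoded POST body and reads the predicted intent out of a JSON response. It does not send any request. The helper names are hypothetical, the field name `Questions` is assumed to match the input feature in the Ludwig configuration, and the sample response is a shortened version of the one shown earlier.

```python
import json
from urllib.parse import urlencode


def build_form_body(question):
    """Encode the question as application/x-www-form-urlencoded data."""
    # "Questions" is assumed to match the model's input feature name.
    return urlencode({"Questions": question})


def extract_intent(response_text, field="Category0_predictions"):
    """Pull one prediction field out of the JSON text returned by the service."""
    return json.loads(response_text)[field]


body = build_form_body("who is the boss?")

# Shortened sample of the JSON response shown earlier in this article.
sample_response = '{"Category0_predictions": "HUMAN", "Category2_predictions": "ind"}'
intent = extract_intent(sample_response)
```

Note that `urlencode` writes spaces as `+`, whereas `encodeURI` writes them as `%20`. Form decoders generally read both as a space.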

      You should see the encoded input and predicted intent in the logs.

      How do we get the intentions for all the keywords in the sheet?

      You might be thinking we will create another function that will read the keywords in a loop and populate the intentions. Not at all!

      We can simply call this function by name directly from the sheet! How cool is that?

      Resources to Learn More

      Combining simple Apps Script functions with powerful API backends that you can code in any language opens the doors to infinite productivity hacks.

      Here are some of the resources I read while putting this together.

      Finally, let me highlight a very important and valuable project that JR Oakes started.

      It is an awesome repository for Python and JavaScript projects from the coders in the SEO community. I plan to find time to upload my code snippets, please make sure to contribute yours.


      Image Credits

      All screenshots taken by author, March 2023

      How To Read An Image File In External Storage With Runtime Permission In Android?

         android:layout_width = "match_parent"
         android:layout_height = "match_parent"
         tools:context = ".MainActivity"
         <Button android:id = "@+id/read" android:text = "read" android:layout_width = "wrap_content"
         <ImageView android:id = "@+id/imageView" android:layout_width = "wrap_content"

      import; import; import;
      import android.os.Build;
      import android.os.Bundle;
      import android.os.Environment;
      import; import; import;
      import android.util.Log;
      import android.view.View;
      import android.widget.Button;
      import android.widget.ImageView;
      import android.widget.Toast;
      import;

      public class MainActivity extends AppCompatActivity {
         private static final int PERMISSION_REQUEST_CODE = 100;
         Button read;
         ImageView imageView;
         @Override
         protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
            imageView = findViewById(;
            read = findViewById(;
               @Override
                  String state = Environment.getExternalStorageState();
                  if (Environment.MEDIA_MOUNTED.equals(state)) {
                        if (checkPermission()) {
                           File dir = new File(Environment.getExternalStorageDirectory().getAbsolutePath() + "/images.jpeg");
                           if (dir.exists()) {
                              Log.d("path", dir.toString());
                              BitmapFactory.Options options = new BitmapFactory.Options();
                              options.inPreferredConfig = Bitmap.Config.ARGB_8888;
                              Bitmap bitmap = BitmapFactory.decodeFile(String.valueOf(dir), options);
                              imageView.setImageBitmap(bitmap);
                           }
                        } else {
                           requestPermission();
                        }
                     } else {
                        File dir = new File(Environment.getExternalStorageDirectory().getAbsolutePath() + "/images.jpeg");
                        if (dir.exists()) {
                           Log.d("path", dir.toString());
                           BitmapFactory.Options options = new BitmapFactory.Options();
                           options.inPreferredConfig = Bitmap.Config.ARGB_8888;
                           Bitmap bitmap = BitmapFactory.decodeFile(String.valueOf(dir), options);
                           imageView.setImageBitmap(bitmap);
                        }
                     }
                  }
               }
            });
         }
         private boolean checkPermission() {
            int result = ContextCompat.checkSelfPermission(MainActivity.this, android.Manifest.permission.READ_EXTERNAL_STORAGE);
            if (result == PackageManager.PERMISSION_GRANTED) {
               return true;
            } else {
               return false;
            }
         }
         private void requestPermission() {
            if (ActivityCompat.shouldShowRequestPermissionRationale(MainActivity.this, android.Manifest.permission.READ_EXTERNAL_STORAGE)) {
               Toast.makeText(MainActivity.this, "Write External Storage permission allows us to read  files. Please allow this permission in App Settings.", Toast.LENGTH_LONG).show();
            } else {
               ActivityCompat.requestPermissions(MainActivity.this, new String[] {android.Manifest.permission.READ_EXTERNAL_STORAGE}, PERMISSION_REQUEST_CODE);
            }
         }
         @Override
         public void onRequestPermissionsResult(int requestCode, String permissions[], int[] grantResults) {
            switch (requestCode) {
               case PERMISSION_REQUEST_CODE:
                  Log.e("value", "Permission Granted, Now you can use local drive .");
               } else {
                  Log.e("value", "Permission Denied, You cannot use local drive .");
               }
               break;
            }
         }

         <application
            android:allowBackup = "true"
            android:icon = "@mipmap/ic_launcher"
            android:label = "@string/app_name"
            android:roundIcon = "@mipmap/ic_launcher_round"
            android:supportsRtl = "true"

      Drawing A Cross On An Image With Opencv

      OpenCV is an open-source computer vision library with Python bindings. It provides numerous functions to perform various image and video processing operations. The library uses the NumPy module to represent all video frames and images as ndarray objects, so we need to make sure that NumPy is also installed in our Python interpreter.

      In this article, we will see different ways to draw a cross on an image using OpenCV Python. Let’s observe the input-output scenario to understand how to draw a cross on an image.

      Input Output Scenarios

      Let’s discuss the different ways to draw a cross on an image.

      Using cv2.drawMarker() function

      The function draws a marker at a predefined position on an image, and it supports several marker types. Following is the syntax of this function:

      cv2.drawMarker(img, position, color[, markerType[, markerSize[, thickness[, line_type]]]])

      Parameters

      img: The source image where to draw the marker.

      position: The position where the crosshair is positioned.

      color: It specifies the color of the marker.

      markerType: It specifies the marker type. The available types are:

      cv2.MARKER_CROSS: A crosshair marker shape.

      cv2.MARKER_TILTED_CROSS: A 45-degree tilted crosshair marker shape.

      cv2.MARKER_STAR: A star marker shape, which is combination of cross and tilted cross.

      cv2.MARKER_DIAMOND: A diamond marker shape.

      cv2.MARKER_SQUARE: A square marker shape.

      cv2.MARKER_TRIANGLE_UP: An upwards-pointing triangle marker shape.

      cv2.MARKER_TRIANGLE_DOWN: A downwards-pointing triangle marker shape.

      thickness: It is an optional parameter. It specifies the line thickness of the marker.

      line_type (Optional): It specifies the type of line we want to use. The available 4 line types are:

      cv2.FILLED

      cv2.LINE_4

      cv2.LINE_8

      cv2.LINE_AA

      markerSize: It specifies the length of the marker. By default it is set to 20 pixels.


      In this example, we will draw a black cross on the input image.

      import cv2

      img = cv2.imread('Images/butterfly1.jpg')
      cv2.imshow('Input image', img)

      # Draw a black, tilted cross centred at (250, 160).
      cv2.drawMarker(img, (250, 160), color=[0, 0, 0], thickness=10,
                     markerType=cv2.MARKER_TILTED_CROSS, line_type=cv2.LINE_AA,
                     markerSize=100)
      cv2.imshow('Output image', img)
      cv2.waitKey(0)

      Input Image

      Output Image

      Using cv2.line() function

      The function draws a line between two points, pt1 and pt2, in the image. Following is the syntax of the line() function:

      cv2.line(img, pt1, pt2, color[, thickness[, lineType[, shift]]])

      Parameters

      img: The source image on which to draw the line.

      pt1: A tuple with the x and y coordinates of the image where the line should start.

      pt2: A tuple with the x and y coordinates of the image where the line should end.

      color: It specifies the color of the line.

      thickness: It is an optional parameter. It specifies the thickness of the line.

      lineType (Optional): It specifies the type of line we want to use. The available 4 line types are:

      cv2.FILLED

      cv2.LINE_4

      cv2.LINE_8

      cv2.LINE_AA

      shift: It specifies the number of fractional bits in the point coordinates.


      Let’s take an image and draw a cross using the cv2.line() method.

      import cv2

      img = cv2.imread('Images/flower-black-background.jpg')
      cv2.imshow('Input image', img)

      # One horizontal and one vertical segment that together form the cross.
      coordinates = [[(420, 280), (520, 280)], [(470, 220), (470, 350)]]
      cv2.line(img, coordinates[0][0], coordinates[0][1], color=[0, 0, 250], thickness=20)
      cv2.line(img, coordinates[1][0], coordinates[1][1], color=[0, 0, 250], thickness=20)
      cv2.imshow('Output image', img)
      cv2.waitKey(0)

      Input Image


      In this example, we will draw cross lines by covering the 4 corners of the image. Initially, we will get the dimensions of an image using the shape attribute of the numpy array(image array), and from those values, we can identify the image corners.

      import cv2

      img = cv2.imread('Images/Lenna.png')
      cv2.imshow('Input image', img)

      # image height = shape[0]
      # image width = shape[1]
      shape = img.shape
      # OpenCV points are (x, y), i.e. (width, height), so the bottom-right corner is (shape[1], shape[0]).
      cv2.line(img, (0, 0), (shape[1], shape[0]), color=[0, 0, 250], thickness=20)
      cv2.line(img, (shape[1], 0), (0, shape[0]), color=[0, 0, 250], thickness=20)
      cv2.imshow('Output image', img)
      cv2.waitKey(0)

      Input Image

      Output Image
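The corner arithmetic in this example is easy to get backwards, because a NumPy image shape is ordered (height, width, channels) while OpenCV points are ordered (x, y). The pure-Python sketch below computes the two diagonals from a shape tuple; the helper name `diagonal_endpoints` and the 300 by 500 example shape are assumptions for illustration, and the returned points can be passed to cv2.line() as pt1 and pt2.

```python
def diagonal_endpoints(shape):
    """Return the two corner-to-corner diagonals of an image as (pt1, pt2) pairs.

    `shape` follows the NumPy convention (height, width[, channels]);
    the returned points follow the OpenCV convention (x, y).
    """
    height, width = shape[:2]
    top_left, top_right = (0, 0), (width - 1, 0)
    bottom_left, bottom_right = (0, height - 1), (width - 1, height - 1)
    return (top_left, bottom_right), (top_right, bottom_left)


# Hypothetical colour image that is 300 pixels high and 500 pixels wide.
first_diagonal, second_diagonal = diagonal_endpoints((300, 500, 3))
```

Using `width - 1` and `height - 1` keeps the endpoints on the last valid pixel. OpenCV clips lines at the image border, so passing the full width and height draws the same visible cross.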

      This is how the Python OpenCV functions cv2.line() and cv2.drawMarker() can be used to draw a cross on an image.
