GOVERNMENT ENGINEERING COLLEGE THRISSUR, 2018
M.TECH SEMINAR REPORT ON
MACHINE LEARNING APPROACHES OF LEAF RECOGNITION
NAYANA P B (Reg. No. TCR17ECCP13)
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
GOVERNMENT ENGINEERING COLLEGE, THRISSUR
THRISSUR – 680 009
ABSTRACT
Kerala is famous for its Ayurvedic therapies, whose medicines are prepared
from plant parts. Ayurvedic medicines are often preferred by hepatitis
patients and for the treatment of fractures, among other uses. It is therefore
necessary to identify plants accurately when preparing medicines: if the wrong
plant is selected, the quality of the drug suffers. One method of recognizing
a leaf is visual inspection, but this can be inaccurate. A botanist may be
able to recognize medicinal plants, but it is difficult for common people.
There is therefore a need for automatic recognition of medicinal plants, which
will mainly help common people. The major steps of leaf recognition are
feature extraction and classification. There are different methods of leaf
classification, including deep learning and machine learning. Some of the
machine learning approaches are the Naïve Bayes classifier, support vector
machines, artificial neural networks, random forests and the k-nearest
neighbor algorithm.
ACKNOWLEDGEMENT
I express my indebtedness to the Almighty for, among many other things,
the success of this seminar.
I take this opportunity to place on record my heartfelt gratitude and thanks
to Dr. Thajudin Ahamed V I, Head of the Department of Electronics
and Communication Engineering, Govt. Engineering College, Thrissur, for his
advice, support and guidance throughout the seminar.
I am deeply grateful to Mr. Mohanan K. P., who supported us in numerous
ways, for his invaluable role in coordinating the seminar.
I express my gratitude to all faculty members and supporting staff of the
department for the help and support given to me.
Finally, I would like to acknowledge my deep sense of gratitude to all
well-wishers and friends who helped me directly or indirectly to complete
this work.
Contents
1 INTRODUCTION
2 FEATURES OF LEAF IMAGE
3 MACHINE LEARNING
3.1 Concept
3.2 ALGORITHMS
3.2.1 Naïve Bayes Classifier Algorithm
3.2.2 Support Vector Machine Learning Algorithm
3.2.3 Random Forest Machine Learning Algorithm
3.2.4 Artificial neural networks
3.2.5 Multilayer perceptron
3.2.6 K nearest neighbor
3.2.7 Probabilistic neural networks
4 CONCLUSION
List of Figures
3.1 Architecture of ANN
List of abbreviations
ANN Artificial neural networks
CNN Convolutional neural networks
KNN K Nearest neighbor
MLP Multilayer perceptron
OSH Optimal Separating Hyper Plane
PNN Probabilistic neural network
SVM Support vector machine
INTRODUCTION
Ayurvedic medicines are based on herbs. Correct identification of the source
plant supports the quality assurance of Ayurvedic drugs. Each plant differs
from other plants in its unique properties. Depending on their shape, colour
and texture, plants can be identified uniquely.
FEATURES OF LEAF IMAGE
MACHINE LEARNING
3.1 Concept
Machine learning applications are highly automated and self-modifying: they
continue to improve over time with minimal human intervention as they learn
from more data. To address the complex nature of various real-world data
problems, specialized machine learning algorithms have been developed to
solve these problems effectively. Machine learning algorithms are classified
as:
• Supervised Machine Learning Algorithms: Machine learning algorithms
that make predictions on a given set of samples. A supervised machine
learning algorithm searches for patterns within the value labels assigned to
data points.
• Unsupervised Machine Learning Algorithms: There are no labels associated
with the data points. These machine learning algorithms organize the
data into groups of clusters to describe its structure and make complex
data look simple and organized for analysis.
• Reinforcement Machine Learning Algorithms: These algorithms choose an
action based on each data point and later learn how good the decision
was. Over time, the algorithm changes its strategy to learn better and
achieve the best reward.
3.2 ALGORITHMS
The common machine learning algorithms are the Naïve Bayes classifier
algorithm, the support vector machine algorithm, artificial neural networks,
random forests and k-nearest neighbors.
3.2.1 Naïve Bayes Classifier Algorithm
It would be difficult and practically impossible to manually classify a web
page, a document, an email or any other lengthy text. This is where the
Naïve Bayes classifier machine learning algorithm comes to the rescue. A
M.TECH SEMINAR, 2018
classifier is a function that assigns each element of a population to one of
the available categories. For instance, spam filtering is a popular application
of the Naïve Bayes algorithm. The spam filter here is a classifier that assigns
the label Spam or Not Spam to every email.
The Naïve Bayes classifier is among the most popular learning methods. It
applies Bayes' theorem of probability to build machine learning models,
particularly for disease prediction and document classification. For text, it
is a simple classification of words based on Bayes' theorem, used for
subjective analysis of content.
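As a minimal sketch of how such a spam filter could work, the following Python code trains a Naïve Bayes classifier on word counts and applies Bayes' theorem in log form. The four training emails, the function names and the use of Laplace (add-one) smoothing are illustrative assumptions and are not taken from the report.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """Count classes and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(model, text):
    """Return the label with the highest log posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Start from the log prior, log P(class).
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Add log P(word | class) with add-one smoothing; summing the
            # logs is the "naive" conditional independence assumption.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
TRAINING_EMAILS = [
    ("win money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "not spam"),
    ("project report attached", "not spam"),
]

model = train_naive_bayes(TRAINING_EMAILS)
print(predict_naive_bayes(model, "claim free money"))
print(predict_naive_bayes(model, "monday project meeting"))
```

Working with log probabilities avoids numerical underflow when many small probabilities are multiplied, which is why the scores are summed rather than multiplied.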
When to use the Naïve Bayes Classifier
• If you have a moderate or large training data set.
• If the instances have several attributes.
• If, given the class, the attributes that describe the instances are
conditionally independent.
Applications of Naïve Bayes Classifier
• Sentiment Analysis: It is reportedly used at Facebook to analyse status
updates expressing positive or negative emotions.
• Document Categorization: Google reportedly uses document classification to
index documents and find relevancy scores, i.e. the PageRank. The PageRank
mechanism considers the pages marked as important in the databases
that were parsed and classified using a document classification technique.
• The Naïve Bayes algorithm is also used for classifying news articles about
Technology, Entertainment, Sports, Politics, etc.
• Email Spam Filtering: Google Mail reportedly uses the Naïve Bayes algorithm
to classify emails as Spam or Not Spam.
Advantages of the Naïve Bayes Classifier Machine Learning Algorithm
• The Naïve Bayes classifier algorithm performs well when the input variables
• A Naïve Bayes classifier converges faster, requiring relatively less
training data than discriminative models like logistic regression, when
the Naïve Bayes conditional independence assumption holds.
• With the Naïve Bayes classifier algorithm, it is easier to predict the class
of the test data set. It is a good bet for multi-class predictions as well.
Department of ECE, GEC, Thrissur 4
• Though it requires the conditional independence assumption, the Naïve Bayes
classifier has shown good performance in various application domains.
3.2.2 Support Vector Machine Learning Algorithm
Support Vector Machine (SVM) is a supervised machine learning algorithm for
classification or regression problems, in which the training dataset teaches
the SVM about the classes so that it can classify new data. It works by
finding a line (hyperplane) that separates the training data set into
classes. As there are many such separating hyperplanes, the SVM algorithm
chooses the one that maximizes the distance between the hyperplane and the
nearest training points of each class; this is referred to as margin
maximization. If the hyperplane that maximizes this distance is identified,
the probability of generalizing well to unseen data is increased.
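The idea of margin maximization can be illustrated with a small Python sketch that compares two candidate separating hyperplanes by their geometric margin. The four training points and the two candidate hyperplanes are made-up values, and a real SVM solves an optimization problem over all hyperplanes rather than picking from a fixed list; the sketch only shows what quantity is being maximized.

```python
import math


def geometric_margin(weights, bias, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. The result is negative if a point is misclassified.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        label * (sum(w * x for w, x in zip(weights, point)) + bias) / norm
        for point, label in zip(points, labels)
    )


# Hypothetical 2-D training data with two classes.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two candidate hyperplanes (weights, bias) that both separate the data.
candidates = {
    "A": ((1.0, 0.0), -3.0),  # the vertical line x = 3
    "B": ((1.0, 1.0), -5.5),  # the diagonal line x + y = 5.5
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins, "-> preferred hyperplane:", best)
```

Both candidates classify the training points correctly, but the one with the larger margin leaves more room around the nearest points, which is the property the SVM criterion rewards.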
SVMs are classified into two categories:
• Linear SVMs: In linear SVMs the training data can be separated by a
hyperplane.
• Non-Linear SVMs: In non-linear SVMs it is not possible to separate the
training data using a hyperplane in the original feature space.
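The report does not describe how non-linear SVMs handle such data. A common approach, assumed here, is to map the inputs into a higher-dimensional feature space in which a separating hyperplane does exist, which is the idea behind kernel functions. The following sketch uses hypothetical one-dimensional data and an assumed feature map x -> (x, x^2).

```python
def separates(weights, bias, points, labels):
    """True if the sign of w.x + b matches the label (+1 or -1) for every point."""
    return all(
        label * (sum(w * x for w, x in zip(weights, point)) + bias) > 0
        for point, label in zip(points, labels)
    )


# Hypothetical 1-D data: the outer points are class +1, the inner ones -1.
xs = [-2.0, -1.0, 1.0, 2.0]
labels = [1, -1, -1, 1]

# In one dimension a hyperplane is a threshold t with an orientation s.
# These thresholds cover every way of splitting the four points in two.
points_1d = [(x,) for x in xs]
thresholds = [-2.5, -1.5, 0.0, 1.5, 2.5]
separable_1d = any(
    separates((s,), -s * t, points_1d, labels)
    for t in thresholds
    for s in (1.0, -1.0)
)

# Assumed feature map x -> (x, x^2): the two classes now differ in x^2,
# so the horizontal line x^2 = 2.5 separates them.
points_2d = [(x, x * x) for x in xs]
separable_2d = separates((0.0, 1.0), -2.5, points_2d, labels)

print("separable in 1-D:", separable_1d)
print("separable after mapping:", separable_2d)
```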
Advantages of Using SVM
• SVM offers very good classification performance (accuracy) on the training
data.
• SVM renders more efficiency for correct classification of future data.
• SVM does not make any strong assumptions about the data.
• It is less prone to over-fitting the data.
Applications of Support Vector Machine
SVM is commonly used for stock market forecasting by various financial
institutions. For instance, it can be used to compare the performance of a
stock relative to the performance of other stocks in the same sector.
This relative comparison of stocks helps in making investment decisions
based on the classifications made by the SVM learning algorithm.
3.2.3 Random Forest Machine Learning Algorithm
Random Forest is a go-to machine learning algorithm that uses a bagging
approach to create a collection of decision trees, each built from a random
subset of the data. A model is trained several times on random samples of the
dataset to achieve good prediction performance. In this ensemble learning
method, the outputs of all the decision trees in the random forest are
combined to make the final prediction. The final prediction of the random
forest algorithm is derived by polling the results of each decision tree,
that is, by going with the prediction that appears most often among the
decision trees.
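The bagging and voting steps described above can be sketched in a few lines of Python. To keep the example short, each tree is a one-feature decision stump (a depth-1 tree), and the leaf-width data, labels and function names are hypothetical. A full random forest also selects a random subset of features at each split, which this single-feature sketch omits.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a depth-1 decision tree on (value, label) pairs."""
    labels = [label for _, label in sample]
    majority = Counter(labels).most_common(1)[0][0]
    values = sorted({value for value, _ in sample})
    best = None  # (errors, threshold, left_label, right_label)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [lab for val, lab in sample if val <= threshold]
        right = [lab for val, lab in sample if val > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # the sample contains a single distinct value
        return lambda value: majority
    _, threshold, left_label, right_label = best
    return lambda value: left_label if value <= threshold else right_label


def train_forest(data, n_trees, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]
        forest.append(train_stump(sample))
    return forest


def predict_forest(forest, value):
    """Poll every tree and return the most frequent prediction."""
    votes = Counter(tree(value) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical leaf widths in cm with made-up class labels.
LEAF_DATA = [
    (1.0, "narrow"), (2.0, "narrow"), (3.0, "narrow"),
    (8.0, "broad"), (9.0, "broad"), (10.0, "broad"),
]

forest = train_forest(LEAF_DATA, n_trees=25)
print(predict_forest(forest, 2.0), predict_forest(forest, 9.0))
```

Because each tree sees a different bootstrap sample, individual trees can disagree; the majority vote is what makes the ensemble more stable than any single tree.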
Why use Random Forest Machine Learning Algorithm?
• There are many good open source, free implementations of the algorithm
available in Python and R.
• It maintains accuracy when there is missing data and is also resistant to
• It is simple to use, as the basic random forest algorithm can be implemented
with just a few lines of code.
• Random Forest machine learning algorithms help data scientists save
data preparation time, as they require little input preparation
and are capable of handling numerical, binary and categorical features
without scaling, transformation or modification.
• It provides implicit feature selection, as it gives estimates of which
variables are important in the classification.
Advantages of Using Random Forest Machine Learning Algorithms
• Overfitting is less of an issue with random forests than with single
decision tree algorithms. There is no need to prune the random forest.
• These algorithms are fast, but not in all cases. A random forest algorithm,
when run on an 800 MHz machine with a dataset of 100 variables and
50,000 cases, produced 100 decision trees in 11 minutes.
• Random Forest is one of the most effective and versatile machine learning
algorithms for a wide variety of classification and regression tasks, as it
is more robust to noise.
• It is difficult to build a bad random forest. In the implementation of
Random Forest machine learning algorithms, it is easy to determine
which parameters to use because the algorithm is not very sensitive to the
parameters used to run it. One can easily build a decent model
without much tuning.
• Random Forest machine learning algorithms can be grown in parallel.
• The algorithm runs efficiently on large databases.
• It generally achieves high classification accuracy.
Drawbacks of Using Random Forest Machine Learning Algorithms
• They may be easy to use, but analysing them theoretically is difficult.
A large number of decision trees in the random forest can slow down the
algorithm in making real-time predictions.
• If the data consists of categorical variables with different numbers of
levels, the algorithm becomes biased in favour of those attributes that have
more levels. In such situations, variable importance scores are not
reliable.
• When the Random Forest algorithm is used for regression tasks, it does not
predict beyond the range of the response values in the training data.
Applications of Random Forest Machine Learning Algorithms
• Random Forest algorithms are used by banks to predict whether a loan
applicant is likely to be a high risk.
• They are used in the automobile industry to predict the failure or
breakdown of a mechanical part.
• These algorithms are used in the healthcare industry to predict whether a
patient is likely to develop a chronic disease.
• They can also be used for regression tasks such as predicting the average
number of social media shares and performance scores.
• Recently, the algorithm has also made its way into predicting patterns in
speech recognition software and classifying images and texts.
3.2.4 Artificial neural networks
An ANN is based on a collection of connected units or nodes called artificial
neurons. Each connection between artificial neurons can transmit a signal
from one to another. The artificial neuron that receives the signal can
process it and then signal the artificial neurons connected to it. In common
ANN implementations, the signal at a connection between artificial neurons is
a real number, and the output of each artificial neuron is calculated by a
non-linear function of the sum of its inputs. Artificial neurons and
connections typically have a weight that adjusts as learning proceeds. The
weight increases or decreases the strength of the signal at a connection.
Artificial neurons may have a threshold such that the signal is sent only if
the aggregate signal crosses that threshold. Typically, artificial neurons
are organized in layers. Different layers may perform different kinds of
transformations on their inputs. Signals travel from the first (input) layer
to the last (output) layer, possibly after traversing the layers multiple
times.
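The computation performed by a single artificial neuron, and by a layer of such neurons, can be sketched as follows. The sigmoid is one common choice of non-linear function and is an assumption here, as are the example feature values and weights.

```python
import math


def sigmoid(z):
    """A common non-linear activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a non-linear function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


def layer_output(inputs, layer_weights, layer_biases):
    """Apply every neuron of a layer to the same input vector."""
    return [
        neuron_output(inputs, weights, bias)
        for weights, bias in zip(layer_weights, layer_biases)
    ]


# Hypothetical feature vector and a layer of two neurons.
features = [0.5, -1.0, 2.0]
hidden = layer_output(
    features,
    layer_weights=[[0.4, 0.3, -0.2], [-0.6, 0.1, 0.5]],
    layer_biases=[0.0, 0.1],
)
print(hidden)
```

A larger weight on a connection strengthens the contribution of that input to the neuron's output, which is the quantity that learning adjusts.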
The phases of operation of an ANN are the training phase and the testing
phase. In the training phase the ANN is trained with the input images, and in
the testing phase a test image is assigned to the trained class that it most
closely matches.
Figure 3.1: Architecture of ANN
The feed-forward back-propagation neural network is shown in the figure. A
feedforward neural network is an artificial neural network wherein connections
between the units do not form a cycle. Backpropagation is a method used in
artificial neural networks to calculate the gradient that is needed to
compute the weights to be used in the network. It is commonly used to train
deep neural networks, a term used for neural networks with more than
one hidden layer. O1, O2, ..., Om represent the output vector, which is
basically the plant class. F1, F2, ..., Fn represent the input vector, which
consists of the features of the image. As the number of hidden layers
increases, the accuracy may improve, but the complexity of the system also
increases.
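The role of the gradient in training can be shown on the smallest possible case, a single sigmoid neuron with one weight and one bias. This is a sketch under assumed values (squared-error loss, one training example, a learning rate of 0.5); in a multi-layer network, backpropagation applies the same chain rule layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(weight, bias, x, target):
    """Squared error between the neuron output and the target."""
    return 0.5 * (sigmoid(weight * x + bias) - target) ** 2


def gradients(weight, bias, x, target):
    """Gradient of the loss with respect to the weight and the bias."""
    output = sigmoid(weight * x + bias)
    # Chain rule: dL/dz = (output - target) * sigmoid'(z),
    # where sigmoid'(z) = output * (1 - output).
    delta = (output - target) * output * (1.0 - output)
    return delta * x, delta


# Hypothetical starting weights and a single training example.
weight, bias = 0.2, 0.0
x, target = 1.5, 1.0
learning_rate = 0.5

initial_loss = loss(weight, bias, x, target)
for _ in range(200):
    grad_w, grad_b = gradients(weight, bias, x, target)
    # Move each parameter a small step against its gradient.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b
final_loss = loss(weight, bias, x, target)

print(initial_loss, final_loss)
```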
3.2.5 Multilayer perceptron
A multilayer perceptron (MLP) is a class of feedforward artificial neural
network. An MLP consists of at least three layers of nodes: an input layer,
an output layer and one or more hidden layers. Except for the input
nodes, each node is a neuron that uses a nonlinear activation function. MLP
utilizes a supervised learning technique called backpropagation for training.
Its multiple layers and non-linear activation distinguish MLP from a linear
perceptron: it can distinguish data that is not linearly separable. Its
layers of nonlinearly-activating nodes make it a deep neural
network. Since MLPs are fully connected, each node in one layer connects with
a certain weight to every node in the following layer.
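The XOR function is a standard example of data that is not linearly separable. The sketch below shows the forward pass of a fully connected MLP with two inputs, two hidden neurons and one output neuron that computes XOR. The weights are chosen by hand for illustration rather than learned by backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Fully connected layer: every input feeds every neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


# Hand-picked weights (an assumption, not trained values).
# Hidden neuron 1 behaves like OR, hidden neuron 2 like NAND,
# and the output neuron behaves like AND of the two.
HIDDEN_WEIGHTS = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_BIASES = [-10.0, 30.0]
OUTPUT_WEIGHTS = [[20.0, 20.0]]
OUTPUT_BIASES = [-30.0]


def mlp_xor(x1, x2):
    """Forward pass through the hidden layer and the output layer."""
    hidden = dense_layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES)
    output = dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]
    return round(output)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp_xor(a, b))
```

No single neuron can compute XOR, because no straight line separates the inputs (0,1) and (1,0) from (0,0) and (1,1); the hidden layer is what makes the problem solvable.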
3.2.6 K nearest neighbor
In KNN, objects are classified based on the similarity between the training
images and the testing image. The training stage includes feature extraction,
storing the feature vectors and labelling the training images. The neighbors
of a test image are the training feature vectors closest to its own feature
vector. An unlabelled point is classified according to the labels
of its k nearest neighbors. k-NN is a type of instance-based
learning, or lazy learning, where the function is only approximated locally and
all computation is deferred until classification. The k-NN algorithm is among
the simplest of all machine learning algorithms.
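A minimal k-NN classifier can be sketched as follows. The two-element feature vectors, the plant names used as labels, the Euclidean distance and k = 3 are illustrative assumptions.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training_set, query, k=3):
    """Label the query by a majority vote among its k nearest training vectors."""
    nearest = sorted(
        training_set,
        key=lambda item: euclidean_distance(item[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored feature vectors with their labels.
TRAINING_SET = [
    ((1.2, 0.30), "neem"), ((1.4, 0.35), "neem"), ((1.3, 0.28), "neem"),
    ((3.1, 0.80), "tulsi"), ((2.9, 0.75), "tulsi"), ((3.3, 0.85), "tulsi"),
]

print(knn_classify(TRAINING_SET, (1.25, 0.31)))
print(knn_classify(TRAINING_SET, (3.0, 0.80)))
```

There is no training computation beyond storing the labelled vectors; all distance calculations happen at classification time, which is why k-NN is called lazy learning.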
3.2.7 Probabilistic neural networks
PNN is also a feed-forward neural network. When an input is applied, the
first layer calculates the distance between the input vector of the image and
each training image vector. The second layer sums these contributions for
each class and produces its net output as a vector of probabilities.
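The two layers described above can be sketched as follows. The Gaussian weighting of the distances, the smoothing parameter sigma and the example data are assumptions based on the usual formulation of a PNN and are not given in the report.

```python
import math
from collections import defaultdict


def pnn_class_probabilities(training_set, query, sigma=0.5):
    """Return a dictionary mapping each class to its estimated probability."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for features, label in training_set:
        # First layer: squared distance between the input and a training
        # vector, turned into a similarity by a Gaussian function.
        squared_distance = sum((a - b) ** 2 for a, b in zip(features, query))
        sums[label] += math.exp(-squared_distance / (2.0 * sigma ** 2))
        counts[label] += 1
    # Second layer: average the similarities within each class.
    averages = {label: sums[label] / counts[label] for label in sums}
    total = sum(averages.values())
    # Normalize so that the outputs form a vector of probabilities.
    return {label: value / total for label, value in averages.items()}


# Hypothetical stored feature vectors with their labels.
TRAINING_SET = [
    ((1.2, 0.30), "neem"), ((1.4, 0.35), "neem"), ((1.3, 0.28), "neem"),
    ((3.1, 0.80), "tulsi"), ((2.9, 0.75), "tulsi"), ((3.3, 0.85), "tulsi"),
]

probabilities = pnn_class_probabilities(TRAINING_SET, (1.25, 0.31))
predicted = max(probabilities, key=probabilities.get)
print(probabilities, predicted)
```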
CONCLUSION
Leaf recognition is composed mainly of two steps: feature extraction and
classification. A number of classification algorithms are used for image
recognition, such as SVM, KNN and ANN. KNN is more suitable for
medicinal plant recognition because it is mainly used for multiclass
classification. Morphological parameters are mainly used for
the recognition of a leaf. As the number of features used to recognize the
plant increases, the efficiency of the leaf recognition system also increases.