Summary (Romanian Language)
State of the art
Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to “learn” (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed. The name machine learning was coined in 1959 by Arthur Samuel. The field evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data: instead of strictly following static program instructions, such algorithms make data-driven predictions or decisions by building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is very difficult or infeasible, such as e-mail spam filtering, detection of network intruders or malicious insiders working towards a data breach, and computer vision.
The term Deep Learning was first introduced to the machine learning community by Rina Dechter in 1986, and to artificial neural networks by Igor Aizenberg and colleagues in 2000, in the context of Boolean threshold neurons. The first general, working learning algorithm for supervised, deep, feedforward, multilayer perceptrons was published by Alexey Ivakhnenko and Lapa in 1965. A 1971 paper described a deep neural network with 8 layers trained by the group method of data handling (GMDH) algorithm.
What makes deep learning state of the art? In a word, accuracy. Advanced tools and techniques have dramatically improved deep learning algorithms to the point where they can outperform humans at classifying images, win against the world’s best Go player, or enable a voice-controlled assistant such as Amazon Echo or Google Home to find and download that new song you like.
A simple illustration shows the improvement of deep learning techniques over traditional machine learning techniques.
Fig.1 Machine Learning vs. Deep Learning Error on Images
UCLA researchers built an advanced microscope that yields a high-dimensional data set used to train a deep learning network to identify cancer cells in tissue samples. The research relied on a convolutional neural network, a type of deep learning algorithm that is transforming how biologists analyse images. Scientists are also using the approach to find mutations in genomes and to predict variations in the layout of single cells.
Fig.2 Cancer Cell illustration
Another reason why deep learning is so prominent nowadays is that the amount of data available today is perhaps 1000 times larger, and far more of it is digitized, than in the 1980s. In addition, the performance of the computers that process this data has exceeded expectations, particularly since a powerful GPU has proven far better suited to this workload than a high-end CPU.
Three technology enablers make this degree of accuracy possible:
Easy access to massive sets of labelled data: data sets such as ImageNet and PASCAL VOC are freely available and are useful for training on many different types of objects.
A well-known application that has become widely adopted is image recognition software that can distinguish vehicles and classify them as a car, a truck, or even a bicycle. This is made possible by a convolutional neural network (CNN), which extracts specific features from the images and uses them to classify the objects they contain.
Fig.3 Illustration of a CNN
Increased computing power. High-performance GPUs accelerate training on the massive amounts of data needed for deep learning, reducing training time from weeks to hours. A recent study showed that GPUs are most likely a better choice for deep learning than regular CPUs. CPUs are designed for more general computing workloads. GPUs, in contrast, are less flexible, but they are designed to execute the same instructions in parallel. Deep neural networks (DNNs) are structured in a very uniform manner, such that at each layer of the network thousands of identical artificial neurons perform the same computation. Therefore, the structure of a DNN fits quite well with the kind of computation that a GPU can perform efficiently. GPUs have additional advantages over CPUs, including more computation units and a higher bandwidth for retrieving data from memory. The primary weakness of GPUs compared with CPUs is memory capacity, which is lower on GPUs than on CPUs.
At the time of writing, the largest GPU memory available was 24 GB, in contrast with CPU-based systems that can reach 1 TB of RAM.
Fig.4 Illustration of GPUs connected
There is no doubt that in recent years machine learning, and more specifically deep learning, has gained huge traction in the programming world and beyond. Statisticians, people who work with patterns in the medical field, and people who work daily with big data and IoT are making huge progress with this technology, which offers solutions far beyond what could be obtained by hard coding, since manual programming can be daunting given the high complexity of the problems that emerge nowadays.
Self-driving cars: companies building driver assistance services, as well as full-blown self-driving cars like Google’s, need to teach a computer how to take over key parts (or all) of driving using digital sensor systems instead of human senses. This is similar to how a child learns through constant experience and repetition. These new services could open up unexpected business models for companies.
Deep learning in healthcare: breast or skin cancer diagnostics? Mobile and monitoring apps? Prediction and personalised medicine on the basis of biobank data? AI is reshaping the life sciences, medicine, and healthcare as an industry. Innovations in AI are advancing precision medicine and population health management in significant ways. Computer-aided detection, quantitative imaging, decision support tools, and computer-aided diagnostics will play a big role in the years to come.
Other fields that can benefit from this area of computer science include voice search and voice-activated assistants, automatically adding sounds to silent films, automatic machine translation, image recognition, automatic image caption generation, neural networks for brain cancer detection, neural networks in finance, and energy market price forecasting. This is only a short list: many of these applications already exist, and many more will be implemented in the near future.
The workflow of the proposed thesis can be illustrated as follows:
Fig.1 Workflow of the thesis
In the following part of this chapter I will explain the different functions I used in MATLAB, leveraging MATLAB’s computational power: because MATLAB already provides built-in functions that meet the needs of this workload, the computations can be expressed without implementing the underlying calculus in the code.
1. Type II Chebyshev filters
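Also known as inverse Chebyshev filters, Type II Chebyshev filters are less common because they do not roll off as fast as Type I filters and require more components. They have no ripple in the passband, but do have equiripple in the stopband. The gain is:
$$G_n(\omega) = \frac{1}{\sqrt{1 + \frac{1}{\varepsilon^2 T_n^2(\omega_0/\omega)}}}$$
where $T_n$ is the Chebyshev polynomial of order $n$, $\varepsilon$ is the ripple factor and $\omega_0$ is the cutoff frequency.
In the stopband, the Chebyshev polynomial oscillates between -1 and 1, so that the gain oscillates between zero and
$$\frac{1}{\sqrt{1 + \frac{1}{\varepsilon^2}}}$$
and the smallest frequency at which this maximum is attained is the cutoff frequency $\omega_0$. The parameter $\varepsilon$ is thus related to the stopband attenuation $\gamma$ in decibels by:
$$\varepsilon = \frac{1}{\sqrt{10^{0.1\gamma} - 1}}$$
Gain and group delay of a fifth-order type II Chebyshev filter with ε = 0.1.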
As with most analogue filters, the Chebyshev filter may be converted to a digital (discrete-time) recursive form via the bilinear transform. However, because digital filters have a finite bandwidth, the response shape of the transformed Chebyshev filter is warped. Alternatively, the matched Z-transform method may be used, which does not warp the response.
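To make the formulas above concrete, here is a small Python sketch (standard library only; the thesis itself works in MATLAB) that evaluates the Type II gain of the analog low-pass prototype and computes the ripple factor ε from a required stopband attenuation γ. The function names are illustrative, not MATLAB or toolbox APIs, and the sketch only evaluates the magnitude formula; it does not design or discretize a filter.

```python
import math


def chebyshev_t(n: int, x: float) -> float:
    """Chebyshev polynomial of the first kind T_n(x), via T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x  # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


def epsilon_from_attenuation(gamma_db: float) -> float:
    """Ripple factor epsilon for a stopband attenuation of gamma_db decibels."""
    return 1.0 / math.sqrt(10.0 ** (0.1 * gamma_db) - 1.0)


def cheby2_gain(omega: float, omega0: float, n: int, eps: float) -> float:
    """Magnitude G_n(omega) of an analog Type II Chebyshev low-pass prototype."""
    t = chebyshev_t(n, omega0 / omega)
    if t == 0.0:
        return 0.0  # transmission zero in the stopband
    return 1.0 / math.sqrt(1.0 + 1.0 / (eps ** 2 * t ** 2))


eps = epsilon_from_attenuation(10 * math.log10(101))  # about 20.04 dB of attenuation
print(eps)                              # approximately 0.1
print(cheby2_gain(1.0, 1.0, 5, eps))    # stopband maximum 1/sqrt(101) at omega0
print(cheby2_gain(0.1, 1.0, 5, eps))    # deep in the passband, very close to 1
```

For example, an attenuation of about 20.04 dB corresponds to ε = 0.1, the value used in the figure above.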
Root Mean Square
In statistics and its applications, the root mean square (abbreviated RMS or rms) is defined as the square root of the mean square (the arithmetic mean of the squares of a set of numbers). The RMS is also known as the quadratic mean and is a particular case of the generalized mean with exponent 2. The RMS can also be defined for a continuously varying function in terms of an integral of the squares of the instantaneous values during a cycle.
The root mean square (RMS) value is one of the most important parameters describing the size of a signal. In signal processing, a signal is viewed as a function of time, and the term size of a signal is used to represent the “strength” of the signal. It is very important to know the size of a signal used in a given application. A signal’s size can be measured in many ways, including total energy, root mean square, integral absolute value, average absolute value, and square root of total energy.
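The RMS value of a signal x(t) over an interval of length T is calculated as the square root of the average of the squared value of the signal, mathematically represented as:
$$x_{\mathrm{rms}} = \sqrt{\frac{1}{T}\int_{0}^{T} x(t)^2\,dt}$$
For a signal represented as N discrete sampled values $x_0, x_1, x_2, \ldots, x_{N-1}$, the RMS value is given as:
$$x_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} |x_n|^2}$$
If the signal can be represented in the frequency domain as X(f), then as a result of Parseval’s theorem the RMS value can be calculated as:
$$x_{\mathrm{rms}} = \sqrt{\frac{1}{T}\int_{-\infty}^{\infty} |X(f)|^2\,df}$$
where x(t) is taken to be zero outside the interval [0, T]. For sampled data with discrete Fourier transform coefficients $X_k$, the equivalent form is $x_{\mathrm{rms}} = \frac{1}{N}\sqrt{\sum_{k=0}^{N-1} |X_k|^2}$.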
The RMS value is one of the most important parameters used to describe the strength of an alternating current (AC): the RMS value of an AC voltage or current is equal to the DC voltage or current that produces the same heating effect when applied across an identical resistor. Hence, it is also a measure of the energy content of a given signal. In statistics, for any zero-mean stationary random signal, the RMS value is the same as the standard deviation of the signal. When two uncorrelated (or orthogonal) signals are added together, such as noise from two independent sources, the RMS value of their sum is equal to the square root of the sum of the squares of their individual RMS values.
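A minimal Python sketch of the discrete RMS formula, which also checks two of the properties stated above: a sine wave sampled over whole periods has an RMS of A/√2, and for a zero-mean signal the RMS equals the (population) standard deviation. `rms` is a hypothetical helper name, not a MATLAB function.

```python
import math


def rms(samples):
    """Root mean square of N discrete samples: sqrt((1/N) * sum(x_n^2))."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# A sampled sine of amplitude A over whole periods has RMS A / sqrt(2).
amplitude, n = 2.0, 1000
sine = [amplitude * math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
print(rms(sine))  # close to 2 / sqrt(2) = 1.414...

# For a zero-mean signal, RMS equals the population standard deviation.
zero_mean = [1.0, -1.0, 3.0, -3.0]
mean = sum(zero_mean) / len(zero_mean)
std = math.sqrt(sum((x - mean) ** 2 for x in zero_mean) / len(zero_mean))
print(rms(zero_mean), std)  # both sqrt(5)
```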
The power spectrum $S_{xx}(f)$ of a time series $x(t)$ describes the distribution of power into the frequency components composing that signal. According to Fourier analysis, any physical signal can be decomposed into a number of discrete frequencies, or into a spectrum of frequencies over a continuous range. The statistical average of a certain signal or sort of signal (including noise), as analyzed in terms of its frequency content, is called its spectrum.
Power Spectral Density
When the energy of the signal is concentrated around a finite time interval, especially if its total energy is finite, one may compute the energy spectral density. More commonly used is the power spectral density (or simply power spectrum), which applies to signals existing over all time, or over a time period large enough that it could as well have been an infinite time interval. The PSD then refers to the spectral energy distribution that would be found per unit time, since the total energy of such a signal over all time would generally be infinite.
The average power P of a signal x(t) over all time is therefore given by the following time average:
$$P = \lim_{T\to\infty} \frac{1}{T}\int_{0}^{T} |x(t)|^2\,dt$$
Some properties of the power spectral density: the spectrum of a real-valued process is real and an even function of frequency, $S_{xx}(-\omega) = S_{xx}(\omega)$. If the process is continuous and purely indeterministic, the autocovariance function can be reconstructed using the inverse Fourier transform.
The PSD can be used to compute the variance (net power) of a process by integrating over frequency:
$$\operatorname{Var}(X_n) = \gamma_0 = \frac{1}{\pi}\int_{0}^{\pi} S_{xx}(\omega)\,d\omega$$
Being based on the Fourier transform, the power spectral density is a linear function of the autocovariance function, in the sense that if $\gamma$ is decomposed into two functions
$$\gamma(\tau) = \alpha_1\gamma_1(\tau) + \alpha_2\gamma_2(\tau),$$
then
$$S_{xx} = \alpha_1 S_{xx,1} + \alpha_2 S_{xx,2},$$
where $S_{xx,i}$ is the spectral density corresponding to $\gamma_i$.
The goal of spectral density estimation is to estimate the spectral density of a random signal from a sequence of time samples.
The periodogram is an estimator for the power spectral density (PSD) $\Phi_{xx}(e^{j\Omega})$ of a random signal $x[k]$. In the following we assume a weakly ergodic, real-valued random process.
The PSD is given as the discrete-time Fourier transform (DTFT) of the autocorrelation function (ACF) $\varphi_{xx}[\kappa]$:
$$\Phi_{xx}(e^{j\Omega}) = \sum_{\kappa=-\infty}^{\infty} \varphi_{xx}[\kappa]\, e^{-j\Omega\kappa}$$
Power Spectral Density Estimate
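The periodogram replaces the ACF in this definition with a single finite record of N samples: $\hat{\Phi}_{xx}(e^{j\Omega_k}) = \frac{1}{N}|X[k]|^2$, evaluated at the DFT frequencies $\Omega_k = 2\pi k/N$. The Python sketch below is an illustrative, unwindowed version using a direct O(N²) DFT. Scaling conventions differ between tools (MATLAB's `periodogram`, for instance, can scale by the sampling frequency and return a one-sided estimate), so treat the normalization here as an assumption. The example also checks Parseval's theorem: the mean of the periodogram equals the mean square of the signal.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-j 2 pi k n / N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def periodogram(x):
    """Periodogram |X[k]|^2 / N at the N DFT frequencies Omega_k = 2 pi k / N."""
    n_pts = len(x)
    return [abs(value) ** 2 / n_pts for value in dft(x)]


# Example: a cosine that completes exactly 5 cycles in 64 samples.
N = 64
signal = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]
psd = periodogram(signal)
half = psd[: N // 2]
print(half.index(max(half)))  # 5, the bin of the cosine's frequency
print(sum(psd) / N, sum(v * v for v in signal) / N)  # both 0.5 up to rounding
```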
Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay.
In signal processing, the autocorrelation is often used without normalization, that is, without subtracting the mean and dividing by the variance. When the autocorrelation function is normalized by mean and variance, it is sometimes referred to as the autocorrelation coefficient or autocovariance function.
Given a signal f(t), the continuous autocorrelation $R_{ff}(\tau)$ is most often defined as the continuous cross-correlation integral of f(t) with itself, at lag $\tau$:
$$R_{ff}(\tau) = \left(f * g^{-1}(\overline{f})\right)(\tau) = \int_{-\infty}^{\infty} f(u+\tau)\,\overline{f}(u)\,du = \int_{-\infty}^{\infty} f(u)\,\overline{f}(u-\tau)\,du$$
where $\overline{f}$ represents the complex conjugate, $g^{-1}$ is a function which manipulates the function f and is defined as $g^{-1}(f)(u) = f(-u)$, and $*$ represents convolution.
Implementation
In the last few years there has been increased interest in machine learning, deep learning and artificial intelligence.
In this thesis, we propose to classify signals generated by an accelerometer. The accelerometer is built into a smartphone, and its purpose here is to detect human activity. The following activities are sent as signal feedback from the accelerometer: Walking, Walking Up Stairs, Walking Down Stairs, Sitting, Standing, Laying.
Training a deep learning model can take hours, days, or weeks, depending on the size of the data and the amount of processing power available. Selecting a computational resource is therefore an important step when setting up your workflow.
Currently, there are three main computation options: CPU-based, GPU-based, and cloud-based.
CPU-based computation is the simplest and most readily available option. It is best suited to simple examples that use an already pre-trained network.
GPU-based computation can reduce network training time from weeks to days or from days to hours. In MATLAB, the GPU can be used without any additional programming. An NVIDIA GPU with compute capability 3.0 or higher is recommended. Using multiple GPUs can speed up training further and reduce the time significantly.
Cloud-based GPU computation means that you do not have to buy and set up the hardware yourself. The code you write in MATLAB for a local GPU can be extended to use cloud resources with just a few settings changes.
Figure 1 – Example of connection between code and hardware.
The main purpose of this experiment is to train a network that can learn specific human activities and then predict them using what it learned during training. For training and testing, a specific set of signals was provided to the program. These signals were already labeled with their activity. The signal set was originally recorded in a controlled environment, so that there was no noise or other interference.
Figure 2 – All the signals represented with a specific color.
Database
This example describes an analysis approach for accelerometer signals captured with a smartphone. The smartphone is worn by a subject during 6 different types of physical activity.
The goal of the analysis is to build an algorithm that automatically identifies the activity type given the sensor measurements.
The example uses data from a recorded dataset, courtesy of Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L.
For data division, the algorithm used is Random (dividerand).
For training, we used Scaled Conjugate Gradient (trainscg).
For performance measurement, we used Cross-Entropy (crossentropy).
For the calculations, we used MEX.
dividerand divides targets into three sets using random indices. The syntax for this function is
[trainInd,valInd,testInd] = dividerand(Q,trainRatio,valRatio,testRatio)
which separates targets into three sets: training, validation, and testing. It takes the following inputs:
Q – number of targets to divide up
trainRatio – ratio of vectors for training, default 0.7
valRatio – ratio of vectors for validation, default 0.15
testRatio – ratio of vectors for testing, default 0.15
and returns:
trainInd – training indices, valInd – validation indices, testInd – test indices.
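The behaviour of dividerand can be approximated as: shuffle the target indices and cut the shuffled list according to the three ratios. The Python sketch below is a hypothetical stand-in (`divide_random` is an illustrative name, indices are 0-based, and its rounding rule is its own), not MATLAB's implementation.

```python
import random


def divide_random(q, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=None):
    """Shuffle indices 0..q-1 and split them into training, validation and test sets.

    Hypothetical stand-in for MATLAB's dividerand; exact set sizes may differ from it.
    """
    total = train_ratio + val_ratio + test_ratio  # ratios are normalized to sum to 1
    indices = list(range(q))
    random.Random(seed).shuffle(indices)
    n_train = round(q * train_ratio / total)
    n_val = round(q * val_ratio / total)
    train_ind = sorted(indices[:n_train])
    val_ind = sorted(indices[n_train:n_train + n_val])
    test_ind = sorted(indices[n_train + n_val:])  # the remainder goes to testing
    return train_ind, val_ind, test_ind


train_ind, val_ind, test_ind = divide_random(100, seed=0)
print(len(train_ind), len(val_ind), len(test_ind))  # 70 15 15
```

Sorting each subset is only a convenience; what matters is that the three sets are disjoint and together cover every target.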
trainscg can train any network as long as its weight, net input, and transfer functions have derivative functions. Backpropagation is used to calculate derivatives of performance perf with respect to the weight and bias variables x.
The scaled conjugate gradient algorithm is based on conjugate directions, as in traincgp, traincgf, and traincgb, but this algorithm does not perform a line search at each iteration.
Training stops when any of these conditions occurs:
- The maximum number of epochs (repetitions) is reached.
- The maximum amount of time is exceeded.
- Performance is minimized to the goal.
- The performance gradient falls below min_grad.
- Validation performance has increased more than max_fail times since the last time it decreased (when using validation).
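The stopping rules above can be expressed as a simple check run once per epoch. This Python sketch is hypothetical: `StopCriteria`, `stop_reason` and `count_validation_failures` are illustrative names, the default values are placeholders rather than MATLAB's, and it does not implement the scaled conjugate gradient update itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopCriteria:
    """Illustrative stopping parameters; the defaults are placeholders, not MATLAB's."""
    epochs: int = 1000
    time_s: float = float("inf")
    goal: float = 0.0
    min_grad: float = 1e-6
    max_fail: int = 6


def count_validation_failures(val_perf_history):
    """Consecutive epochs since validation performance last improved (decreased)."""
    best, fails = float("inf"), 0
    for value in val_perf_history:
        if value < best:
            best, fails = value, 0
        else:
            fails += 1
    return fails


def stop_reason(epoch, elapsed_s, perf, grad_norm, val_fail, c: StopCriteria) -> Optional[str]:
    """Return why training should stop, or None to keep training."""
    if epoch >= c.epochs:
        return "maximum epochs reached"
    if elapsed_s > c.time_s:
        return "maximum time exceeded"
    if perf <= c.goal:
        return "performance goal met"
    if grad_norm < c.min_grad:
        return "gradient below min_grad"
    if val_fail >= c.max_fail:
        return "validation stop"
    return None


fails = count_validation_failures([1.0, 0.8, 0.7, 0.75, 0.76, 0.77])
print(fails, stop_reason(6, 1.0, 0.3, 0.01, fails, StopCriteria(max_fail=3)))  # 3 validation stop
```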
crossentropy calculates a network performance given targets and outputs, with optional performance weights and other parameters. The function returns a result that heavily penalizes outputs that are extremely inaccurate (y near 1-t), with very little penalty for fairly correct classifications (y near t). Minimizing cross-entropy leads to good classifiers.
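For one-hot targets, cross-entropy reduces to the negative logarithm of the probability assigned to the correct class, which is why confident wrong answers are penalized much more heavily than confident correct ones. A minimal Python sketch follows; it averages over samples, whereas MATLAB's crossentropy has its own normalization options, so absolute values may differ. `cross_entropy` is a hypothetical name.

```python
import math


def cross_entropy(targets, outputs, eps=1e-12):
    """Mean over samples of -sum(t * log(y)) for one-hot targets t and probabilities y.

    eps guards against log(0) when an output probability is exactly zero.
    """
    total = 0.0
    for t_row, y_row in zip(targets, outputs):
        total -= sum(t * math.log(max(y, eps)) for t, y in zip(t_row, y_row))
    return total / len(targets)


confident_right = cross_entropy([[1, 0, 0]], [[0.98, 0.01, 0.01]])
confident_wrong = cross_entropy([[1, 0, 0]], [[0.01, 0.98, 0.01]])
print(confident_right, confident_wrong)  # about 0.02 versus about 4.6
```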
2.8858451e-001 -2.0294171e-002 -1.3290514e-001 -9.9527860e-001 -9.8311061e-001 -9.1352645e-001 -9.9511208e-001 -9.8318457e-001 -9.2352702e-001 -9.3472378e-001 -5.6737807e-001 -7.4441253e-001 8.5294738e-001 6.8584458e-001 8.1426278e-001 -9.6552279e-001 -9.9994465e-001 -9.9986303e-001 -9.9461218e-001 -9.9423081e-001 -9.8761392e-001 -9.4321999e-001 -4.0774707e-001 -6.7933751e-001 -6.0212187e-001 9.2929351e-001 -8.5301114e-001 3.5990976e-001 -5.8526382e-002 2.5689154e-001 -2.2484763e-001 2.6410572e-001 -9.5245630e-002 2.7885143e-001 -4.6508457e-001 4.9193596e-001 -1.9088356e-001 3.7631389e-001 4.3512919e-001 6.6079033e-001 9.6339614e-001 -1.4083968e-001 1.1537494e-001 -9.8524969e-001 -9.8170843e-001 -8.7762497e-001 -9.8500137e-001 -9.8441622e-001 -8.9467735e-001 8.9205451e-001 -1.6126549e-001 1.2465977e-001 9.7743631e-001 -1.2321341e-001 5.6482734e-002 -3.7542596e-001 8.9946864e-001 -9.7090521e-001 -9.7551037e-001 -9.8432539e-001 -9.8884915e-001 -9.1774264e-001 -1.0000000e+000 -1.0000000e+000 1.1380614e-001
Figure 3 – Example of a training file
Preparing the Data for the script to be run
The data needs to be prepared before running the script.
To prepare the data, we first run DataPreparation.m, which guides us through the process of downloading the data and preparing it for this example.
At the end of the process, the folder .DataPrepared must contain the following four data files:
Before explaining each part of the project, we look at the final result to understand what the process produces. For that, we execute the following command in MATLAB:
runTrainedNetworkOnBufferedData
and the result is the following:
Fig.11 Preview of the Final results
5.4 Building up the code filtering and labelling the signals
Let’s look at the same data, colored based on the activity type.
Given this data, we would like to be able to tell the difference between the different activities, just based on the content of the signal.
Note that this coloring is based on existing knowledge (actid).
Labeled data can be used to “train” a classification algorithm so it can later predict the class of new (unlabeled) data.
We visualise the same signal using a custom plotting function, which also uses the information in actid.
Fig.11 Plotted Acceleration Colored based on Activities
Running a mean measure
We would like to distinguish between the values obtained from the raw data, and a simple way to separate them is to use a mean measure.
The mean is the average of the numbers: a calculated “central” value of a set of numbers. To calculate: Just add up all the numbers, then divide by how many numbers there are.
In the following graph we differentiate the walking and laying signals, which look very similar in the main graph; using the mean, we can clearly observe the difference.
Applying mathematical functions to the signals is part of preparing the data, and choosing the right functions to differentiate them is essential to fully understand the feedback received from the smartphone’s sensor.
We use the mean, RMS, or standard deviation to compare specific pairs of signals, such as “Walking” vs. “Laying” and “Walking” vs. “Standing”.
Fig.12 RMS or standard deviation measure
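As a small illustration of why several statistics are compared, this Python sketch groups samples by activity label and computes the mean, RMS and standard deviation of each group. The data is synthetic (not from the recorded dataset) and chosen so that two activities share the same mean but differ in spread; `stats_by_activity` is a hypothetical helper.

```python
import math
from collections import defaultdict


def stats_by_activity(values, activity_ids):
    """Group samples by activity label and return mean, RMS and standard deviation per label."""
    groups = defaultdict(list)
    for value, act in zip(values, activity_ids):
        groups[act].append(value)
    result = {}
    for act, xs in groups.items():
        n = len(xs)
        mean = sum(xs) / n
        rms = math.sqrt(sum(x * x for x in xs) / n)
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # population std
        result[act] = {"mean": mean, "rms": rms, "std": std}
    return result


# Synthetic acceleration samples on one axis (hypothetical values).
acc = [9.8, 9.8, 9.8, 9.8, 8.0, 11.6, 8.0, 11.6]
act = ["Laying"] * 4 + ["Walking"] * 4
for name, s in stats_by_activity(acc, act).items():
    print(name, round(s["mean"], 3), round(s["std"], 3))  # same mean, different spread
```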
Several graphs need to be differentiated using specific sorting/filtering tools so that we can observe the signals. The signals are well differentiated both by the number of occurrences and by the acceleration values, and both are important for understanding the signals coming from the smartphone.
Amplitude-only methods are often not enough; for example, it would be very hard to distinguish between simple Walking and WalkingUpstairs (very similar statistical moments).
An initial conclusion is that simple statistical analysis is often not sufficient. For signals one should also consider methods that measure signal variations over time.
Fig.14 Walking vs WalkingUpstairs.
Time-Domain analysis – preliminary considerations
There are two main types of causes behind our signals:
- one related to “fast” variations over time, due to body dynamics (physical movements of the subject)
- the other, responsible for “slow” variations over time, due to the position of the body with respect to the vertical gravitational field
As we focus on classifying physical activities, we should focus the time-domain analysis on the effects of body movements. These are responsible for the most “rapid” (or frequent) variations of our signal.
In this specific case, a simple average over a period of time would easily estimate the gravitational component, which could then be subtracted from the relevant samples to obtain the signal due to the physical movements.
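The simple-average approach mentioned above can be sketched as a trailing moving average: the slowly varying average estimates gravity, and subtracting it leaves the body-motion component. This is a Python illustration on a synthetic signal, not the filter-based method the thesis adopts; `moving_average`, `split_gravity` and the window length are assumptions (the window should cover at least one period of the movement).

```python
def moving_average(x, window):
    """Trailing moving average; the first samples average over what is available so far."""
    out, running = [], 0.0
    for i, value in enumerate(x):
        running += value
        if i >= window:
            running -= x[i - window]
        out.append(running / min(i + 1, window))
    return out


def split_gravity(acc, window):
    """Estimate the slow gravity component and subtract it to keep body-motion acceleration."""
    gravity = moving_average(acc, window)
    body = [a - g for a, g in zip(acc, gravity)]
    return gravity, body


# Synthetic signal: constant gravity plus a movement that repeats every 4 samples.
acc = [9.81 + m for m in [0.0, 1.0, 0.0, -1.0] * 5]
gravity, body = split_gravity(acc, window=4)
print([round(g, 3) for g in gravity[3:8]])  # settles at 9.81 once a full period is covered
```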
For the sake of generality, here we introduce an approach based on digital filters, which is more general and can be reused in more challenging situations.
Digital Filtering workflow
To isolate the rapid signal variations from the slower ones using digital filtering, we design a high-pass filter (e.g., using the Filter Design and Analysis Tool, fdatool, in MATLAB) and apply it to the original signal.
In doing so, we also filter out the gravitational acceleration.
Filters can be designed programmatically as well as interactively. In this case the function hpfilter was generated automatically by the Filter Design and Analysis Tool, but it could just as well have been written manually.
Fhp = hpfilter;
The function hpfilter was generated automatically by the tool mentioned above and looks like this:
function hd = hpfilter
%HPFILTER Returns a discrete-time high-pass filter object.
Fs    = 50;          % Sampling frequency (Hz)
Fstop = 0.4;         % Stopband frequency (Hz)
Fpass = 0.8;         % Passband frequency (Hz)
Astop = 60;          % Stopband attenuation (dB)
Apass = 1;           % Passband ripple (dB)
match = 'passband';  % Band to match exactly

% Construct an FDESIGN object and call its CHEBY2 method.
h  = fdesign.highpass(Fstop, Fpass, Astop, Apass, Fs);
hd = design(h, 'cheby2', 'MatchExactly', match);
end
This code could have been written by hand exactly as above, but using the tool that MATLAB provides made the design of this filter much easier.
Fig.15 Filter Designer tool from MATLAB.
The filter design was successful, as the following graph shows, with both the original signal and the filtered one.
Fig.16 Comparison between the High-Pass and Original Signal
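Applying a designed IIR filter amounts to evaluating its difference equation, which is what MATLAB's `filter(b, a, x)` computes for coefficient vectors b and a. The Python sketch below implements that difference equation with zero initial conditions. Because the Chebyshev coefficients produced by the design tool are not listed here, it uses a hypothetical first-order high-pass filter instead, just to show a constant (gravity-like) input being removed.

```python
def lfilter(b, a, x):
    """Direct-form IIR filter: a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k].

    Zero initial conditions; the output is normalized by a[0].
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


# Hypothetical first-order high-pass (not the Chebyshev design):
# y[n] = x[n] - x[n-1] + 0.95 * y[n-1]
b, a = [1.0, -1.0], [1.0, -0.95]
constant = [9.81] * 200  # a pure DC ("gravity") input
y = lfilter(b, a, constant)
print(y[0], y[-1])  # starts at 9.81 and decays towards 0
```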
Next, we focus on a single activity: we select the first portion of the Walking signal using logical indexing.
We select only the samples for which the activity was Walking and the time was less than 250 seconds, and then plot the walking-only signal to note its quasi-periodic behaviour.
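The logical-indexing selection described above can be mirrored in Python with a boolean mask. In MATLAB the equivalent would be an expression along the lines of `acc(actid == walkingId & t < 250)`; both that expression and the names below (`select_walking`, `walkingId`) are illustrative, not the thesis code.

```python
def select_walking(t, acc, activity, max_time=250.0, label="Walking"):
    """Keep samples whose activity equals `label` and whose time stamp is below max_time."""
    mask = [act == label and ti < max_time for ti, act in zip(t, activity)]
    return [a for a, keep in zip(acc, mask) if keep]


t = [0.0, 100.0, 200.0, 300.0]
acc = [1, 2, 3, 4]
activity = ["Walking", "Sitting", "Walking", "Walking"]
print(select_walking(t, acc, activity))  # [1, 3]
```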
The next step in the implementation is to plot the power spectral density using the Welch method with its default options and the known sample frequency. Under these conditions, the power spectral density estimate looks as follows.
Fig.17 Welch Power Spectral Density Estimate
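Welch's method averages windowed periodograms of overlapping segments to reduce the variance of the estimate. The following Python sketch is a simplified illustration (Hamming window, 50% overlap, one-sided density scaling); the segment length is chosen by hand rather than by MATLAB's default rule, so its output is not expected to match `pwelch` exactly. The test signal is a synthetic 2 Hz sine sampled at 50 Hz, the sampling frequency used in hpfilter; `welch_psd` is a hypothetical name.

```python
import cmath
import math


def hamming(n_pts):
    """Symmetric Hamming window of length n_pts."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n_pts - 1)) for k in range(n_pts)]


def welch_psd(x, fs, seg_len, overlap=0.5):
    """Average the windowed periodograms of overlapping segments; return (freqs, one-sided PSD)."""
    w = hamming(seg_len)
    step = max(1, int(seg_len * (1 - overlap)))
    u = sum(v * v for v in w)  # window power, used for density scaling
    n_bins = seg_len // 2 + 1
    acc = [0.0] * n_bins
    n_segs = 0
    for start in range(0, len(x) - seg_len + 1, step):
        seg = [x[start + i] * w[i] for i in range(seg_len)]
        for k in range(n_bins):
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / seg_len) for n in range(seg_len))
            acc[k] += abs(X) ** 2 / (fs * u)
        n_segs += 1
    if n_segs == 0:
        raise ValueError("signal shorter than one segment")
    psd = [v / n_segs for v in acc]
    last = n_bins - 1 if seg_len % 2 == 0 else n_bins  # DC and Nyquist are not doubled
    for k in range(1, last):
        psd[k] *= 2.0
    freqs = [k * fs / seg_len for k in range(n_bins)]
    return freqs, psd


fs = 50.0
x = [math.sin(2 * math.pi * 2.0 * n / fs) for n in range(500)]
freqs, psd = welch_psd(x, fs, seg_len=100)
print(freqs[psd.index(max(psd))])  # 2.0 (Hz)
```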
After observing this graph, we validate the potential of the PSD to differentiate between activities, here Walking vs. WalkingUpstairs.
Fig.18 Power Spectral Density Comparison
Comparing different activities from one subject is an important step in our process, but a result from a single subject is not as reliable as repeating the process for every subject and comparing the power spectral density of the walking signals across all subjects in the dataset. The helper function used for this comparison plots on a linear amplitude scale so that PSD peaks stand out better visually.
Fig.19 PSD across the dataset subjects.
We can further automate peak identification. We used the findpeaks function from the Signal Processing Toolbox to identify the amplitudes and locations of spectral peaks.
We compute the numerical values of the PSD and the frequency vector. The output of the automated findpeaks applied to the PSD looks as follows.
Fig.20 Power Spectral Density with Peak Estimates
Visualising the data makes interpretation much easier, but as we can see, the peak detection needs refinement. Refining the PSD peak estimates means adding more specific requirements, such as finding at most 8 peaks that are at least 0.25 Hz apart, with a given minimum prominence.
fmindist = 0.25;                      % Minimum distance between peaks in Hz
N = 2*(length(f)-1);                  % Number of FFT points
minpkdist = floor(fmindist/(fs/N));   % Minimum distance expressed in frequency bins
After several trials, the result described above is obtained by plotting the PSD and overlaying the peaks.
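The refinement above (at most 8 peaks, a minimum spacing converted from 0.25 Hz into frequency bins, and a minimum prominence) can be sketched in Python as follows. This is a simplified, hypothetical `find_peaks`, not a re-implementation of MATLAB's findpeaks: the prominence computation and tie handling are approximations, and `min_peak_distance_bins` simply mirrors the minpkdist line above.

```python
import math


def min_peak_distance_bins(fmindist_hz, fs, n_fft):
    """Convert a minimum peak spacing in Hz into a number of frequency bins."""
    return math.floor(fmindist_hz / (fs / n_fft))


def prominence(values, i):
    """Approximate prominence: height of peak i above the higher of its two flanking minima.

    Each flank extends until a strictly higher sample or the end of the data is reached.
    """
    left_min, j = values[i], i - 1
    while j >= 0 and values[j] <= values[i]:
        left_min = min(left_min, values[j])
        j -= 1
    right_min, j = values[i], i + 1
    while j < len(values) and values[j] <= values[i]:
        right_min = min(right_min, values[j])
        j += 1
    return values[i] - max(left_min, right_min)


def find_peaks(values, max_peaks=8, min_distance=1, min_prominence=0.0):
    """Indices of up to max_peaks local maxima, tallest first, kept at least min_distance apart."""
    candidates = [i for i in range(1, len(values) - 1)
                  if values[i - 1] < values[i] >= values[i + 1]
                  and prominence(values, i) >= min_prominence]
    candidates.sort(key=lambda i: values[i], reverse=True)
    chosen = []
    for i in candidates:
        if len(chosen) == max_peaks:
            break
        if all(abs(i - j) >= min_distance for j in chosen):
            chosen.append(i)
    return sorted(chosen)


psd = [0, 5, 0, 4, 0, 3, 4.5, 0]  # hypothetical PSD values
print(find_peaks(psd, max_peaks=8, min_distance=3))  # [1, 6]
print(min_peak_distance_bins(0.25, 50, 512))         # 2
```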
Autocorrelation can also be powerful for frequency estimation. It is especially effective for estimating low-pitch fundamental frequencies.
xcorr with only one input computes the autocorrelation:
[c,lags] = xcorr(abw);
We highlight the main peak at t = 0 (overall energy) and a few secondary peaks. The position of the second-highest peak identifies the main period. We then plot the autocorrelation and the three key peaks.
Fig.21 Autocorrelation with Peak Estimates
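A minimal Python sketch of period estimation from the autocorrelation: compute the raw (unnormalized) autocorrelation for non-negative lags and take the highest local maximum after lag 0 as the main period. MATLAB's xcorr also returns negative lags, but for a real signal they mirror the positive ones. `autocorrelation` and `main_period` are hypothetical names, and the input is a synthetic periodic sequence.

```python
def autocorrelation(x):
    """Raw autocorrelation r[k] = sum_n x[n] * x[n + k] for lags k = 0..N-1."""
    n_pts = len(x)
    return [sum(x[n] * x[n + k] for n in range(n_pts - k)) for k in range(n_pts)]


def main_period(x, min_lag=1):
    """Lag of the highest local maximum of the autocorrelation after lag 0."""
    r = autocorrelation(x)
    peaks = [k for k in range(max(min_lag, 1), len(r) - 1) if r[k - 1] < r[k] >= r[k + 1]]
    return max(peaks, key=lambda k: r[k]) if peaks else None


pattern = [0, 1, 2, 1, 0, -1, -2, -1]  # one period of 8 samples
print(main_period(pattern * 6))         # 8
```

The lag-0 value is excluded on purpose: it always holds the overall energy of the signal, so it says nothing about the period.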
To better understand the autocorrelation with peak estimates, having done this for one activity, we compare the results after applying the same steps to another activity.
Thanks to data visualisation, the differences between the two activities are easy to see in the next comparison.
Fig.22 Autocorrelation Comparison between activities.
The graph was generated by comparing different activities from the same subject.
After exploring interactively a few different techniques to extract descriptive features from this type of signals, we can collect all the analysis methods identified into a single function.
The responsibility of this function is to extract a fixed number of features for each signal buffer provided as input.
With so many options and features available to extract, we can now proceed to training the neural network.
To train the network, we:
Re-organise the acceleration signals into shorter buffers of fixed length L, each labeled with a single activity ID.
Extract a vector of features for each Lx3 signal buffer ax, ay, az using the function featuresFromBuffer.
Provide the network with two sets of feature vectors and corresponding activity ID.
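The buffering step can be sketched as follows. How the original code handles buffers that span two activities is not described here, so this Python sketch makes an assumption and simply drops them; `make_buffers` and its arguments are hypothetical names.

```python
def make_buffers(ax, ay, az, activity, buffer_len):
    """Split three acceleration channels into non-overlapping buffers of buffer_len samples.

    Assumption: a buffer is kept only if all of its samples carry the same activity ID,
    so each buffer has exactly one label. Leftover samples at the end are discarded.
    """
    buffers, labels = [], []
    for start in range(0, len(ax) - buffer_len + 1, buffer_len):
        stop = start + buffer_len
        ids = set(activity[start:stop])
        if len(ids) != 1:
            continue  # the buffer straddles an activity change
        buffers.append((ax[start:stop], ay[start:stop], az[start:stop]))
        labels.append(ids.pop())
    return buffers, labels


ax = list(range(10))
acts = [1] * 4 + [2] * 6
bufs, labels = make_buffers(ax, ax[:], ax[:], acts, 3)
print(labels)  # [1, 2]: the buffer covering samples 3..5 mixes activities and is dropped
```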
The buffered data is already available and stored in the file BufferedAccelerations.mat.
Computing the features is a fairly efficient process, but it takes a while in this case because of the large number of signal vectors available.
The pre-computed set of feature vectors for all available signal buffers is available in the file BufferFeatures.mat.
To re-compute all features use the function extractAllFeatures, which will:
Read the buffered signals from BufferedAccelerations.mat
Compute a feature vector for each buffer using featuresFromBuffer
Save all feature vectors into the file BufferFeatures.mat
extractAllFeatures can distribute the computations to a pool of workers if Parallel Computing Toolbox is installed.
Now we clear all variables that are no longer relevant and load the pre-saved variables.
Training the neural network for signal classification is a fairly simple process, as all the data is prepared and ready to be used. In the following, I include the code lines and explain them one by one so that the training process is easy to follow.
Reset random number generators
Initialize a neural network with 18 nodes in the hidden layer (the choice of 18 here is arbitrary).
Organize features and known activity IDs so they can be consumed by the train function.
For real problems, consider partitioning the dataset into training, validation and test subsets. This step has been left out here for simplicity.
net = train(net, X, tgtall);
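The step “organize features and known activity IDs” usually means arranging samples as columns and converting each activity ID into a one-hot target column (in MATLAB, `dummyvar(actid)'` is one way to do this). Whether the thesis code builds X and tgtall exactly this way is an assumption; the Python sketch below, with hypothetical helpers `one_hot` and `transpose`, only shows the data layout.

```python
def one_hot(activity_ids, n_classes):
    """n_classes x n_samples target matrix with a single 1 per column.

    Activity IDs are assumed to be integers 1..n_classes.
    """
    targets = [[0] * len(activity_ids) for _ in range(n_classes)]
    for col, act in enumerate(activity_ids):
        targets[act - 1][col] = 1
    return targets


def transpose(rows):
    """Turn a samples x features list into a features x samples list (one column per sample)."""
    return [list(col) for col in zip(*rows)]


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 hypothetical samples x 2 features
X = transpose(features)                          # 2 features x 3 samples
tgtall = one_hot([1, 3, 2], 3)                   # 3 classes x 3 samples
print(X)
print(tgtall)
```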
We have now completed all the algorithmic steps necessary to implement the classification system presented at the very beginning of this example.
Opening the function runTrainedNetworkOnBufferedData will reveal the same code in this script section.
The process can now be performed as follows:
We get data – one buffer for each acceleration component
We then extract the feature vector
f= featuresFromBuffer(ax, ay, az, fs);
Classify with neural network
Interpret the result using index of the maximum score to retrieve the name of the activity:
[~, maxidix] = max(scores);
Plot the three signals and display the prediction result as the title:
h=plotAccelerationBufferAndPrediction(ax, ay, az, t,actualActivity,estimatdActivity);
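The interpretation step, taking the index of the maximum score and mapping it to an activity name, looks like this in a Python sketch. The order of the class names is an assumption (taken from the order in which the activities are listed at the start of the Implementation chapter); in the real code it must match the order of the network outputs. `predict_activity` is a hypothetical name.

```python
ACTIVITIES = ["Walking", "WalkingUpstairs", "WalkingDownstairs", "Sitting", "Standing", "Laying"]


def predict_activity(scores, names=ACTIVITIES):
    """Name of the class with the highest score (the analogue of [~, idx] = max(scores))."""
    max_idx = max(range(len(scores)), key=scores.__getitem__)
    return names[max_idx]


print(predict_activity([0.05, 0.1, 0.6, 0.1, 0.1, 0.05]))  # WalkingDownstairs
```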
Validating the network more systematically using a confusion matrix
In the previous code cell we validated the predictive behavior of our trained neural network using a visual, qualitative approach.
To quantitatively assess the performance of a classification algorithm, one would normally measure the predictions over a whole test dataset and compare them against the known class values.
The prediction performance can be represented visually in a number of different ways. Below we present the confusion matrix, a square matrix that summarizes the cumulative prediction results for all couplings between actual and predicted classes.
Normally it is good practice to use a test set different from the training set. This ensures that the results are not biased by the particular training dataset used.
Even though we have ignored this principle so far, the neural network toolbox includes dividerand, which automates most of the dataset partition mechanism.
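A confusion matrix is straightforward to compute once actual and predicted labels are available. In this Python sketch rows are actual classes and columns are predicted classes. MATLAB's plotconfusion uses the opposite orientation (output class on rows, target class on columns), so check the convention before comparing figures. The names and data are illustrative.

```python
def confusion_matrix(actual, predicted, classes):
    """Square matrix where entry [i][j] counts samples of actual class i predicted as class j."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m


def accuracy(m):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / sum(map(sum, m))


m = confusion_matrix(["Walk", "Walk", "Sit", "Sit"], ["Walk", "Sit", "Sit", "Sit"], ["Walk", "Sit"])
print(m, accuracy(m))  # [[1, 1], [0, 2]] 0.75
```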
Experimental Results
In this thesis on applying machine learning and deep learning to smartphone sensor signals to classify human activity, we now examine the results and explain each part, from training to validation and testing.
To give a better understanding of the data we have fed to the algorithm, we first explain the final neural network training tool.
Fig. Neural Network Training Tool
The neural network was generated with a tool provided by MATLAB, the Neural Network Toolbox. In the first part of the window we can see the number of inputs, 66, meaning that each sample fed to the network for training, validation and testing is a vector of 66 feature values. The Hidden block shows the number of artificial neurons used in the hidden layer, 18. The Output block refers to the number of target classes the data can be classified into, in our case the 6 different activities we have fed the network with.
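The 66-18-6 structure can be made concrete with a forward pass. The sketch below assumes tanh hidden neurons (MATLAB's tansig is mathematically equal to tanh) and a softmax output layer, which we believe are the patternnet defaults. The weights are random placeholders rather than trained values, and the input preprocessing the toolbox applies is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def softmax(z: Sequence[float]) -> List[float]:
    """Turn raw output values into class scores that sum to 1."""
    m = max(z)                                # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x: Sequence[float], W1: Matrix, b1: Sequence[float],
            W2: Matrix, b2: Sequence[float]) -> List[float]:
    """One forward pass: tanh hidden layer, then softmax output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    outputs = [sum(w * h for w, h in zip(row, hidden)) + b
               for row, b in zip(W2, b2)]
    return softmax(outputs)

def random_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Placeholder weights in [-0.5, 0.5] and zero biases (not trained values)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
W1, b1 = random_layer(18, 66, rng)            # 66 input features -> 18 hidden neurons
W2, b2 = random_layer(6, 18, rng)             # 18 hidden neurons -> 6 activity classes
scores = forward([0.1] * 66, W1, b1, W2, b2)  # six class scores summing to 1
```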
The next part of the training tool shows exactly which algorithms have been used to train the network that classifies the activities.
Fig. Algorithms used to train the network.
All the algorithms in the figure above have been explained in a previous chapter of this thesis.
In the machine learning and deep learning process there are a few quantities from which we can follow the progress of the neural network and judge whether the data is sufficient for good use of the network or whether we have to add more data and refine the learning process further.
Fig. Progress of Learning
As the figure above shows, the results are sufficient for our network to perform in our field of interest: the network achieved good overall performance, and the time needed to classify the data we provided was extremely short, even though the network needed a long time to learn. The results are based strictly on the trained network we implemented, and they describe the new set of data we provided after the training stage had finished.
In neural network training, an epoch is one pass over the full training set, that is, one forward pass and one backward pass for all training samples; an epoch may contain several iterations. The batch size is the number of training samples used in one forward and one backward pass. An iteration is one such pass over a single batch (the forward and backward passes are counted together, not separately), so the number of iterations per epoch equals the number of training samples divided by the batch size, rounded up. The performance is the value of the network's error (performance) function; it is influenced by the choices we make, such as the number of neurons in the hidden layer and the training algorithm, while the computer's hardware (CPU, GPU) mainly affects the training time. The gradient: in machine learning we are trying to reach an optimal solution, and the gradient is simply a vector that gives the direction of the maximum rate of increase of the error. By taking steps in the opposite direction, we hope to reach the optimal solution. Validation checks count the consecutive epochs in which the validation error has failed to improve; when this count reaches the limit we have set, training stops, because the network is no longer improving on unseen data, and the results are output.
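The relation between epochs, batches, iterations and gradient steps can be seen in a toy example. The sketch below is not the training algorithm used in the thesis: it fits a single weight w in y ≈ w * x by mini-batch gradient descent on the mean squared error, using hypothetical data. With 10 samples and a batch size of 4, each epoch takes ceil(10/4) = 3 iterations, so 50 epochs give 150 iterations.

```python
from typing import Sequence, Tuple

def train_linear(xs: Sequence[float], ys: Sequence[float], batch_size: int,
                 epochs: int, lr: float) -> Tuple[float, int]:
    """Fit y ≈ w * x by mini-batch gradient descent on the mean squared error.

    Returns the learned weight and the total number of iterations performed.
    """
    w = 0.0
    iterations = 0
    for _ in range(epochs):                            # one epoch = one pass over all samples
        for start in range(0, len(xs), batch_size):    # one iteration per batch
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # The forward pass gives predictions w * x; the backward pass gives
            # the gradient of the batch MSE with respect to w.
            grad = sum(2.0 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= lr * grad                             # step against the gradient
            iterations += 1
    return w, iterations

xs = [float(i) for i in range(1, 11)]     # hypothetical inputs 1..10
ys = [3.0 * x for x in xs]                # hypothetical targets generated with w = 3
w, iters = train_linear(xs, ys, batch_size=4, epochs=50, lr=0.01)
```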
In the following we explain each plot generated from the data fed to the neural network, the performance obtained, and the moment, measured in epochs, at which it was obtained. We also give the best validation performance and the exact epoch at which it was obtained.
In our case the best validation performance is 0.040472, obtained at epoch 90.
The performance plot shows how the network's mean squared error drops rapidly as it learns. The blue line shows the decreasing error on the training data, and the green line shows the validation error. Training stops when the validation error stops decreasing. The red line shows the error on the test data, indicating how well the network will generalise to new data.
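Early stopping based on validation checks can be sketched as follows. The function assumes that a validation check is counted in every epoch where the validation error does not improve on the best value so far, and that training stops after max_fail consecutive checks; 6 is MATLAB's default limit, and MATLAB's exact comparison rule may differ. With a curve whose minimum is at epoch 90, this rule stops at epoch 96, which agrees with the final epoch reported in the training state plot if the default limit was used (an assumption on our part).

```python
from typing import Sequence, Tuple

def early_stopping_epochs(val_errors: Sequence[float], max_fail: int = 6) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a validation-error curve.

    val_errors[k] is the validation error after epoch k (epoch 0 = before training).
    Training stops once the error has failed to improve for max_fail
    consecutive epochs; the weights from best_epoch are the ones kept.
    """
    best_epoch, best_err = 0, val_errors[0]
    fails = 0
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] < best_err:
            best_epoch, best_err = epoch, val_errors[epoch]
            fails = 0                         # improvement resets the validation checks
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1    # ran out of epochs without stopping early
```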
Fig. Training State plot
The training state figure shows the variation of the gradient with respect to the number of epochs. The final value of the gradient, at epoch 96, is 0.014132, which is close to zero. A gradient close to zero means that the weights are near a minimum of the error function, so further training steps change the error very little. From the figure it can be seen that the gradient keeps decreasing as the number of epochs increases.
Fig. Error Histogram
This graph represents the error histogram, where each error is calculated as the difference between the targets of our neural network and its actual outputs. The next figure shows the confusion matrices for the training, validation and testing steps, together with the overall (all) confusion matrix of the neural network; using these matrices we can calculate the specificity, sensitivity and accuracy of our artificial neural network.
A confusion matrix contains information about actual and predicted classifications done by a classification system. Performance of such systems is commonly evaluated using the data in the matrix.
The green cells indicate the number of inputs assigned correctly to their classes, and the red cells indicate the misclassified inputs. The black and blue cells indicate the overall results.
Fig. Confusion matrix
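From a confusion matrix, sensitivity, specificity and accuracy follow by treating one class as positive and all other classes as negative. The sketch below assumes actual classes on rows and predicted classes on columns; the function names are hypothetical.

```python
from typing import Dict, Sequence

def class_metrics(cm: Sequence[Sequence[int]], k: int) -> Dict[str, float]:
    """One-vs-rest sensitivity, specificity and accuracy for class k.

    cm[i][j] counts samples of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k][j] for j in range(n)) - tp    # class k predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as class k
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / total,
    }

def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of all samples on the diagonal (correctly classified)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)
```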
The following figure shows the ROC (Receiver Operating Characteristic) graphs for the training, testing and validation phases of the classification system, and finally the overall ROC of the system.
An ROC curve shows, for each class, the true positive rate as a function of the false positive rate.
Fig. ROC – Receiver Operating Characteristic.
A perfect test would show points in the upper-left corner, with 100% sensitivity and 100% specificity.
From the graphs we can see that the ROC curve for class 6 reaches 100% sensitivity and 100% specificity almost immediately, whereas the curves for the other classes approach the upper-left corner more gradually; overall, the results are quite good.
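For a single class, an ROC curve can be traced by sweeping a decision threshold over the network's scores for that class (that class against all others). The minimal sketch below makes that assumption and returns (false positive rate, true positive rate) points; a perfectly separating score passes through (0, 1), the upper-left corner.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float],
               is_positive: Sequence[bool]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs for a one-vs-rest ROC curve.

    A sample is called positive when its score >= threshold; thresholds sweep
    over every distinct score, from highest to lowest.
    """
    pos = sum(1 for p in is_positive if p)
    neg = len(is_positive) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]                     # threshold above every score
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, p in zip(scores, is_positive) if p and s >= thr)
        fp = sum(1 for s, p in zip(scores, is_positive) if not p and s >= thr)
        points.append((fp / neg, tp / pos))
    return points
```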
In this diploma thesis we proposed a signal classification system that classifies the activity a person is performing while carrying a smartphone, which collects data in the form of signals from the accelerometer sensor inside the phone. The classification had to distinguish between 6 different activities with the help of feature extraction. The signals chosen for evaluation were pre-recorded in a controlled environment and were specially intended to be used in this sort of project.
Human activity classification based on smartphone sensor signals was implemented with specific tools that made it possible to implement a complex problem simply, in such a way that visualisation could be part of the evaluation process. For this project we implemented the code in MATLAB R2017b. The example describes an analysis approach for accelerometer signals captured with a smartphone, which is worn by a subject during 6 different types of physical activity. Thirty subjects took part in this controlled testing environment.
The original dataset was provided by a group of researchers involved in the IWAAL 2012 workshop (International Workshop of Ambient Assisted Living) in December 2012. The dataset was made available online for users to download and use.
Measuring the code of the application we implemented, we obtain a remarkably small total of 65 lines of code, excluding comments and any other content that is not an actual line of code. This was achieved by creating separate scripts and calling them from the main script, DemoMain.m. The actual code lines were counted with the function "sloc".
The problems tackled in this thesis were solved using machine learning and deep learning because other approaches become impractical when scaled to larger amounts of data. Hard-coding the classification rules can work well when the dataset is small, but when much more data is included this becomes highly impractical, so we chose a classification algorithm, in our case a neural network classifier. As the problem complexity goes up, coming up with manual logic to classify this sort of data becomes impractical, and there is no guarantee that we can take advantage of all the information in the feature vector. As we said earlier, these features are usually used by means of a classification algorithm. There are many types of classification algorithms, including within MATLAB; in this case we used a neural network. This was the best way to approach the problem because, in one simple line of code, we could create a neural network with 18 nodes (neurons) in the hidden layer. The network could also be trained in one line of code using just a portion of our dataset, and the training process adapts the internal parameters of the model, in this case the network, so that it can optimally identify the right activity for the supplied signal segment, taking into consideration that our dataset was composed of recordings of both known activities and the ones that needed to be classified. For every new signal buffer we plotted the 3 components of acceleration, computed our 66 features, and finally used our trained network to predict what the subject was doing.
In total we automated the measurement of 66 high-quality features with only 65 lines of code. We also took good advantage of the language and its built-in visualisation features to establish what worked and what did not. By doing that we were able to reach a solution to this task quite quickly. We leveraged the built-in algorithms that MATLAB includes in the Signal Processing Toolbox, such as cheby2, filter, rms, pwelch, periodogram, xcov and findpeaks.
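The thesis extracts 66 features per buffer with featuresFromBuffer, built on toolbox functions such as rms and xcov. As a much smaller, hypothetical illustration of the idea (not that function), the sketch below computes three features per axis: the mean, the RMS value, and the unscaled autocovariance at lag 1 in the spirit of xcov, giving 9 features in total.

```python
import math
from typing import List, Sequence

def axis_features(x: Sequence[float], lag: int = 1) -> List[float]:
    """Mean, RMS and unscaled autocovariance at `lag` for one acceleration axis."""
    n = len(x)
    mu = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)        # like MATLAB rms: sqrt(mean(x.^2))
    d = [v - mu for v in x]                           # remove the mean before the covariance
    autocov = sum(d[i + lag] * d[i] for i in range(n - lag))
    return [mu, rms, autocov]

def features_from_buffer(ax: Sequence[float], ay: Sequence[float],
                         az: Sequence[float]) -> List[float]:
    """Concatenate per-axis features into one vector (9 values in this sketch)."""
    return axis_features(ax) + axis_features(ay) + axis_features(az)
```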
We also took real advantage of the Neural Network Toolbox, which allowed us to build and train a conventional neural network in just 2 lines of code. This usually takes many lines of code, even for simple neural networks, in any other language or software, especially when done from scratch.
The most important issue to overcome in this thesis was the lack of computational resources, as deep learning requires a massive amount of computational power in order to compute the results quickly and to implement different types of classification with ease.
As future developments, we should consider more signals for each class and more extracted features, in order to make our algorithm more accurate.