2. LITERATURE SURVEY
2.1. Machine learning basics
Definition: A widely cited definition of machine learning is given by Mitchell (1997): “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
2.1.1. Training Set
The act of creating a prediction model from previously known data is called training, and such data is called the training data or training set. After the model is created, it must be applied to a separate dataset to test its effectiveness. The data used for this purpose is called the test data or test set.
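A minimal Python sketch of splitting previously known records into a training set and a test set follows. The function name `train_test_split`, the 30% test fraction and the records (mid-term marks paired with grades) are hypothetical illustrations, not the project's actual data or code.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of `records` and split it into (training set, test set)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)                 # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed makes the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (mid-term mark, final grade); not real student data.
records = [(35, "B"), (42, "A"), (18, "C"), (27, "B"), (45, "A"),
           (12, "C"), (30, "B"), (40, "A"), (20, "C"), (33, "B")]

train_set, test_set = train_test_split(records, test_fraction=0.3)
print(len(train_set), len(test_set))  # 7 3
```

The model would be built only from `train_set` and its effectiveness measured only on `test_set`, so the test records play no part in training.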
2.2. Educational Data Mining
Data mining is the process of sorting data and extracting information from existing databases. With the help of pattern mining and data analysis, hidden information can be obtained from large datasets. Researchers now apply data mining strategies in the field of education and are exploring many dimensions of the education sector; this is known as educational data mining. In the educational sector, data mining is applied to student performance, using students' academic records to determine where they stand. Educational datasets are collected from various sources such as interactive learning systems, computer-supported collaborative learning systems, and the administrative databases of schools, colleges and universities. Well-known universities now use data mining methods to analyze patterns of student performance in their datasets, from which information can be extracted so that decision-making becomes easier for the management of these institutions.
With the growing use of technology everywhere, educational institutions are now working to find hidden trends and patterns in their large datasets. From these sources, data can easily be collected once authorization is obtained. One purpose of an institution extracting information from its own data is to strengthen its prestige among other educational institutions; another is to help build students' careers. Data mining is often used to build predictive or inference models that forecast future trends or behaviors based on the analysis of structured data. In this context, prediction means constructing a model and using it to assess the class of an unlabeled example, or to estimate the value or range of values of an attribute.
We propose a data mining process for the evaluation of school dropout and failure. The experiment was carried out on real data from 200 university students of Mehran University of Engineering and Technology. Data mining should work in much the same way as a human brain: it uses historical information (experience) to learn. However, for data mining technology to get information out of a database, the user must “tell it” what the information looks like (i.e. what problem the user would like to solve). It uses this description to look for similar examples in the database, and uses these pieces of past information to develop a predictive model of what will happen in the future. The essential ingredient in building a successful predictive model is having some information in the database that describes what has happened in the past. Data mining tools are designed to “learn” from these past successes and failures (theoretically, as a human being would), and then to predict what is going to happen next.
However, one of the major advantages of a data mining tool over the human mind is that it can automatically go through a very large database quickly and find even the smallest pattern that may lead to a better prediction.
Our main objectives for this proposed work are:
To understand and analyze the different data mining prediction techniques used in education, and to find the differences between them.
To identify and understand the student attributes that are mainly used for predicting student performance.
To identify and understand the
2.3. Predicting Student Performance
Predicting students’ performance by using data mining techniques to extract information from the academic datasets of universities has become an active area of research in the scientific community. Universities nowadays face challenges in analyzing the performance of their students: participation in class alone is not enough to assess student performance, which is why we are creating a system that aims to help improve it. We focus on students’ profiles and characteristics to make university management aware of students’ performance and overall academic results. Student performance has a further dimension: student retention depends on it. To minimize retention problems in universities, different researchers have proposed different methods to predict students’ performance in a future semester based on their performance in the previous one.
2.3.1 Student Data Attributes
To predict a student’s academic performance in the next semester from his or her previous academic record, we have so far collected data from two Computer System batches (15CS and 16CS) and have considered the following attributes in our project:
Mid-term marks
Based on these parameters, the proposed system can be trained to predict students’ grades accurately in any educational institution. We used the K-Nearest Neighbor (KNN) algorithm to predict student academic performance.
2.3.2 K-Nearest Neighbor
The K-Nearest Neighbor (KNN) algorithm is a classic method for classifying samples based on similarity. It is a non-parametric, instance-based learning algorithm widely used in data mining. It uses a database in which the data points are already separated into several classes to predict the class of a new sample point: the new point is assigned the majority class among its k closest training points.
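The majority-vote rule can be sketched in a few lines of Python. This is an illustrative implementation, not the exact code used in this project; the helper names `euclidean` and `knn_predict`, the Pass/Fail labels and the mid-term marks below are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort the training points by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie in votes, most_common() keeps first-encountered order, so the
    # label whose nearest neighbour is closest wins.
    return votes.most_common(1)[0][0]


# Hypothetical training data: (mid-term mark,) -> Pass/Fail.
train = [((42,), "Pass"), ((38,), "Pass"), ((45,), "Pass"), ((30,), "Pass"),
         ((15,), "Fail"), ((12,), "Fail"), ((20,), "Fail")]

print(knn_predict(train, (33,), k=3))  # Pass
print(knn_predict(train, (18,), k=3))  # Fail
```

Only one attribute (mid-term marks) is used here. If further attributes are added, they should first be scaled to comparable ranges, because otherwise Euclidean distance lets the attribute with the largest numeric range dominate the vote.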
First, the algorithm is applied to a dataset to build prediction models. Then the predictions made by these models are compared using common evaluation criteria such as accuracy, precision, and recall.
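The three evaluation criteria can be computed from a model's predictions as sketched below. The function `evaluate`, the choice of "Fail" as the positive class (the students who need special attention) and the label lists are hypothetical; returning 0.0 when precision or recall is undefined is one common convention, not the only one.

```python
def evaluate(y_true, y_pred, positive):
    """Return (accuracy, precision, recall), treating `positive` as the class of interest."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Precision/recall are undefined when their denominator is zero; report 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical true grades and model predictions for eight test students.
y_true = ["Fail", "Pass", "Fail", "Pass", "Pass", "Fail", "Pass", "Pass"]
y_pred = ["Fail", "Pass", "Pass", "Pass", "Fail", "Fail", "Pass", "Pass"]

metrics = evaluate(y_true, y_pred, positive="Fail")
print([round(m, 3) for m in metrics])  # [0.75, 0.667, 0.667]
```

Here 6 of the 8 predictions are correct (accuracy 0.75), 2 of the 3 students flagged as "Fail" really failed (precision 2/3), and 2 of the 3 students who failed were flagged (recall 2/3).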
Feature selection is also a commonly compared criterion. However, what these studies lack is a more comprehensive comparison between distinct approaches such as method selection and feature engineering. This is where this thesis can introduce a new approach: by comparing the effectiveness of different processes used in machine learning, it can provide insight into more efficient ways to improve predictions of student performance.
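One way to compare feature choices and model settings side by side is to score every combination of feature subset and k with leave-one-out accuracy. The sketch below assumes k-NN as the only method, two hypothetical attributes (`mid_term`, `attendance`) already scaled to [0, 1], and a made-up six-student dataset; it illustrates the comparison procedure, not results of this thesis. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math
from collections import Counter
from itertools import combinations


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def loo_accuracy(data, features, k):
    """Leave-one-out accuracy of k-NN using only the feature indices in `features`."""
    def project(x):
        return tuple(x[i] for i in features)

    correct = 0
    for i, (x, label) in enumerate(data):
        # Hold out student i and train on all the others.
        train = [(project(xj), lj) for j, (xj, lj) in enumerate(data) if j != i]
        correct += knn_predict(train, project(x), k) == label
    return correct / len(data)


# Hypothetical attributes, already scaled to [0, 1], and made-up labels.
FEATURE_NAMES = ("mid_term", "attendance")
data = [
    ((0.9, 0.2), "Pass"), ((0.8, 0.9), "Pass"), ((0.7, 0.4), "Pass"),
    ((0.2, 0.8), "Fail"), ((0.3, 0.3), "Fail"), ((0.1, 0.7), "Fail"),
]

# Score every non-empty feature subset with two candidate values of k.
results = {}
for size in range(1, len(FEATURE_NAMES) + 1):
    for subset in combinations(range(len(FEATURE_NAMES)), size):
        names = tuple(FEATURE_NAMES[i] for i in subset)
        for k in (1, 3):
            results[(names, k)] = loo_accuracy(data, subset, k)

for (names, k), acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"features={names}, k={k}: leave-one-out accuracy={acc:.2f}")
```

Leave-one-out is chosen here only because the toy dataset is too small for a separate test set; with real data, the same loop could be extended with other methods or feature-engineering steps.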
Several factors have the potential to influence academic performance, such as student ability, motivation, the quality of secondary education received, age and gender. Additionally, students’ demographic characteristics, psychological characteristics and prior academic performance, social and institutional factors, as well as outcomes of the learning process, were found to affect the academic performance of students in the Science Education program.
Abeer and Elaraby conducted similar research that mainly focused on generating classification rules and predicting students’ performance in a selected course program based on previously recorded student behavior and activities. They processed and analyzed the data of students previously enrolled in a specific course program over 6 years (2005–10), with multiple attributes collected from the university database. As a result, the study was able to predict, to a certain extent, the students’ final grades in the selected course program, as well as “help the student’s to improve the student’s performance, to identify those students which needed special attention to reduce failing ration and taking appropriate action at right time”.
2.4. Student attrition and retention
Over time, private educational institutions have grown to a remarkable extent. These institutions have become both sources of higher learning and business entities, so maximizing student enrollment is their lifeline. For private institutions to survive, profitability, proper management and alignment are mandatory, and in this respect retaining students until they complete their degrees is essential. That is why institutions are trying to identify the factors that ultimately cause student attrition; once those factors have been analyzed, institutions need to make strategic adjustments to improve student retention. The problem of student attrition and retention is not new for educational institutions. It has been examined by researchers in the fields of data mining and information visualization, and it has now become a very common research problem. Researchers took note of student attrition and retention when attrition reached 50% in the colleges of Ontario. To reduce attrition rates, institutions should focus on student retention.
University students in all degree programs are motivated to enroll by a desire for personal accomplishment and the completion of a previously set goal. Mature-age students are often believed to be highly motivated to return to university to gain promotion in their employment and to improve their professional skills. Kantanis (1999) observed that some mature-age students engage in studies because they want personal advancement and a higher status in their professional positions. Hence, motivation to embark on a career is clearly linked to the expectation that the career will bring the desired rewards and prestige. Science is considered challenging, so students in Science Education feel proud once they achieve their goal of successfully completing the program. More generally, factors motivating students to engage in studies include interest in the subject, perception of its usefulness, a general desire to achieve, self-confidence, self-esteem, patience and persistence. In Science Education, some students are motivated to choose the program by the approval of significant others, while others are motivated by the desire to overcome the perceived challenges of the program as they acquire new knowledge and skills.
2.6. Social factors
Social support is a factor that can affect students’ academic performance both negatively and positively. Social support networks are of great value in enhancing academic performance, as students form friendship groups to exchange information on assignments and find out about tutorial and lecture schedules. Peer support and relationships have been found to enhance student persistence both directly and indirectly. Parker and Johnson (1981) note that student-to-student interaction with peers has been shown to be an extremely effective form of learning.