Ulsan Univ. Repository Thesis Industry Smart IT Convergence Engineering

Data Mining and Analysis of Supervised & Unsupervised Learning Algorithms

Abstract: Data mining is the process of examining large, pre-existing databases to extract new information. It sits at the intersection of computer science and statistics and is used to discover patterns in data. Its main objective is to extract useful information from data and organize it into an understandable, logical structure for further use. Large datasets are sorted into groups to identify patterns and establish relationships, so that problems can be resolved through data analysis. This research work discusses supervised (classification) and unsupervised (clustering) learning techniques.
Supervised learning is a machine learning method in which a model is trained on labeled data so that a specific pattern or function can be extracted from that data. Classification is the learning task of mapping each provided data item to one of several predefined classes.
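As an illustration of mapping data items to predefined classes, the following is a minimal sketch of a K Nearest Neighbor classifier in plain Python. It is not the Weka implementation evaluated in the thesis; the function name `knn_predict`, the choice of Euclidean distance, and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: two well-separated classes "A" and "B".
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.1, 5.0)))
```

KNN has no separate training phase: all of the work happens at prediction time, when distances to the stored examples are computed.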
Unsupervised techniques start from sets of unlabeled data, so they are used directly to discover unknown properties of the data in the form of clusters. Clustering is an unsupervised learning technique used for statistical data analysis in many fields.
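To make the idea of grouping unlabeled data concrete, here is a minimal sketch of the k-means procedure in plain Python. It is an illustration only, not the Weka or RapidMiner implementation used in the thesis; the function name `kmeans`, the fixed initial centroids, and the toy points are assumptions chosen so that the run is deterministic.

```python
import math


def kmeans(points, initial_centroids, max_iterations=100):
    """Cluster `points` around centroids, starting from `initial_centroids`."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


# Hypothetical unlabeled data forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, [points[0], points[3]])
print(assignments)
print(centroids)
```

In practice the initial centroids are usually chosen at random, so results can vary between runs; they are fixed here only to keep the example reproducible.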
This research work analyzes supervised (classification) and unsupervised (clustering) learning techniques on the basis of accuracy and execution time. The classification algorithms K Nearest Neighbor (KNN), Backpropagation (BP), Naïve Bayes, and Support Vector Machine (SVM) are compared on different datasets using the tool Weka 3.8. The clustering algorithms K-means and Expectation-Maximization (EM) are likewise compared on accuracy and time using the RapidMiner and Weka 3.8 tools. The results show that backpropagation achieves better accuracy than the other classification algorithms, while KNN executes faster than the other classification algorithms. Among the clustering algorithms, K-means shows better accuracy and produces better-quality clusters than EM.
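The two comparison criteria, accuracy and time, can be sketched as a small evaluation harness in plain Python. This is a hedged illustration of the measurement idea, not the procedure the thesis ran in Weka 3.8 or RapidMiner; the function `evaluate`, the two stand-in classifiers, and the test data are all hypothetical.

```python
import time


def evaluate(predict, test_points, test_labels):
    """Return (accuracy, elapsed_seconds) for a prediction function."""
    start = time.perf_counter()
    predictions = [predict(p) for p in test_points]
    elapsed = time.perf_counter() - start
    # Accuracy is the fraction of predictions that match the true labels.
    correct = sum(pred == truth for pred, truth in zip(predictions, test_labels))
    return correct / len(test_labels), elapsed


# Two hypothetical stand-in classifiers, used only to exercise the harness.
def threshold_classifier(point):
    return "B" if point[0] > 3.0 else "A"


def always_a(point):
    return "A"


# Hypothetical labeled test data.
test_points = [(1.0, 1.0), (0.5, 2.0), (5.0, 5.0), (6.0, 4.0)]
test_labels = ["A", "A", "B", "B"]

for name, classifier in [("threshold", threshold_classifier),
                         ("always_a", always_a)]:
    accuracy, seconds = evaluate(classifier, test_points, test_labels)
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.6f}s")
```

Timings on data this small are dominated by noise; a meaningful comparison would use realistic datasets and repeated runs.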