Data Mining
Data mining covers all the stages of extracting meaningful information
from a large dataset, formulating it, making sense of it, and putting it
to purposeful use. The data used throughout these stages are stored in
large data warehouses and databases. The data used in this study were
obtained from the databases of Selcuk University Medical School
Hospital.
Preparation before Data Processing
Before the Linear Regression and J48 Decision Tree algorithms could be
applied, the dataset went through a preliminary preparation step.
Identity details such as name, surname and identity number were removed
from the dataset. At the data selection stage, the laboratory parameters
intended for use in the COVID-19 diagnosis process were identified in
the dataset with the help of specialist doctors. Finally, the binary and
nominal values in the dataset were converted into numeric data so that
the Linear Regression algorithm could be run. The dataset contained no
missing values, which simplified the rest of the work.
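The preparation steps described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
procedure actually used in the study (which relied on WEKA): the column
names, the sample values and the helper functions drop_identity and
encode_nominal are hypothetical, and the integer coding shown here may
differ from the coding WEKA produces.

```python
# Hypothetical records: column names and values are illustrative only.
records = [
    {"name": "A", "surname": "B", "identity_number": "1",
     "sex": "F", "crp": 12.5, "covid": "positive"},
    {"name": "C", "surname": "D", "identity_number": "2",
     "sex": "M", "crp": 3.1, "covid": "negative"},
    {"name": "E", "surname": "F", "identity_number": "3",
     "sex": "F", "crp": 48.0, "covid": "positive"},
]

# Assumed names of the identity columns to remove.
IDENTITY_COLUMNS = {"name", "surname", "identity_number"}


def drop_identity(rows, identity_columns):
    """Return copies of the rows without the identity columns."""
    return [
        {key: value for key, value in row.items() if key not in identity_columns}
        for row in rows
    ]


def encode_nominal(rows):
    """Replace every string value with an integer code.

    Codes are assigned per column in order of first appearance.
    Returns the encoded rows and the codebook used for each column.
    """
    codebooks = {}
    encoded = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if isinstance(value, str):
                codes = codebooks.setdefault(key, {})
                # A new value receives the next unused integer.
                new_row[key] = codes.setdefault(value, len(codes))
            else:
                new_row[key] = value
        encoded.append(new_row)
    return encoded, codebooks


def has_missing(rows):
    """Report whether any cell is None."""
    return any(value is None for row in rows for value in row.values())


anonymised = drop_identity(records, IDENTITY_COLUMNS)
encoded, codebooks = encode_nominal(anonymised)
```

Integer codes are harmless for two-valued attributes. For a nominal
attribute with more than two values they impose an artificial order, so
a different coding may be preferable in that case.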
WEKA (Waikato Environment for Knowledge Analysis)
WEKA is open-source data mining software developed at the University of
Waikato. It provides ready-made algorithms for data preparation and
machine learning, as well as tools for visualizing data and results.
This study used WEKA's Nominal to Numeric and Binary to Numeric
algorithms at the dataset preparation stage. The Linear Regression and
J48 Decision Tree algorithms were also run in WEKA.
Supervised Learning
In data mining, various algorithms are used to process large amounts of
data. Designed to make estimations from data, these algorithms fall into
two groups: Supervised and Unsupervised Learning. The Linear Regression
and J48 algorithms used in this study are Supervised Learning
algorithms. The main objective of a Supervised Learning algorithm is to
infer, by training on the dataset, a mapping from inputs to outputs that
can be written mathematically as y = f(x). In this formula, x represents
the input data, f the entire mathematical procedure applied to the data,
and y the result estimated from x.
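As a minimal illustration of inferring f from training data, the sketch
below fits a straight line to a handful of (x, y) pairs by ordinary
least squares and then uses it to estimate y for an unseen x. The data
points and the function name fit_line are hypothetical, and the sketch
handles a single input variable only. It is not the WEKA Linear
Regression implementation used in the study.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)


def f(x):
    """The inferred model: estimates y for a given x."""
    return slope * x + intercept


estimate = f(5.0)
```

Here the training pairs play the role of the dataset, fit_line is the
training step, and the returned f is the model that produces an
estimate y for a new input x.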
Supervised Learning algorithms are used for two kinds of task:
regression and classification. Both aim to build a model that is trained
on the dataset and can then make estimations.
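The difference between the two tasks lies in the output: a regression
model returns a number on a continuous scale, whereas a classifier
returns one of a fixed set of labels. The sketch below illustrates the
classification side with a single-threshold rule (a decision stump). It
is deliberately much simpler than J48 and is not the algorithm used in
the study. The data, the 0/1 labels and the function name fit_stump are
hypothetical.

```python
def fit_stump(xs, labels):
    """Learn a one-threshold classifier for binary labels 0 and 1.

    Tries every midpoint between consecutive distinct input values and
    both label orientations, keeping the rule with the fewest training
    errors. Assumes xs contains at least two distinct values.
    """
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in candidates:
        for low, high in ((0, 1), (1, 0)):
            errors = sum(
                (low if x <= threshold else high) != y
                for x, y in zip(xs, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    _, threshold, low, high = best
    # The returned model maps a numeric input to a discrete label.
    return lambda x: low if x <= threshold else high


# Hypothetical training data: small values labelled 0, large values 1.
xs = [2.0, 4.0, 6.0, 8.0]
labels = [0, 0, 1, 1]

classify = fit_stump(xs, labels)
predictions = [classify(3.0), classify(7.5)]
```

As with regression, the model is built from labelled training data and
then applied to inputs it has not seen. Only the type of the estimate
changes.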