C5.0 Algorithm Implementation On Web-Based Software and Usability Evaluation

Software helps users process data quickly and precisely. Decision makers need alternative software, available at any time, that classifies data with the C5.0 algorithm according to their own criteria. However, existing software generally bundles a number of techniques and cannot be used online. C5.0 is one of the popular classification algorithms in data mining and can provide better results. This study aims to build web-based software that classifies data using the C5.0 algorithm. The software can be used by anyone, especially decision makers. The study also tests the usability of the software before it is put into use. The test results show that the software is acceptable, with a usability value of 76.892%, which falls under the Good predicate. This research is expected to provide alternative software that can solve classification problems using the C5.0 algorithm.


Introduction
Data mining is a way to describe the knowledge contained in large-volume databases [1]. Classification is one of its techniques. A classification model that explains and differentiates data classes makes it possible to predict the class of an object whose class label is unknown [2].
The C5.0 algorithm is a classification algorithm that has helped many users solve problems such as bank customer credit [3] and recommendation systems [4]. In several studies, the C5.0 algorithm was run with application tools such as the R language [5], RapidMiner [6], [7], and SPSS [8]. These applications process the classified data to produce a decision tree. Generally, the

Literature Review
The C5.0 algorithm has been used to classify cases in many fields and with various applications. Research [5] classified 15 factors that influence students' on-time graduation, including gender, regional origin, entry status, number of credits and GPA, and parent's occupation. The criteria were tested on student data from a college using the R programming language. The results show that 6th semester GPA, 6th semester credits (SKS), 4th semester GPA, gender, 2nd semester GPA, high school type, regional origin, and 4th semester GPA are the selected factors that influence on-time graduation.
In research [6], product certification data for SNI mark users on bottled drinking water was tested with RapidMiner. The application generated 7 rules that form the basis of classification. RapidMiner was also used successfully to solve the cleaning service selection problem at PT. ISS Indonesia Medan. The criteria used were education, height, weight, and experience, and RapidMiner gave the highest Gain value to the experience criterion.
The studies above show that the applications used to solve C5.0 cases cannot be used online. The diversity of users and the need for real-time information create different interests, which opens the way for software designed specifically for a particular interest.
As a consequence, the software created must be tested. The purpose of making it web based is to make it easier for interested users in an organization to classify data. Software can be tested through usability testing, which has been applied successfully to a smart academic system based on user experience [10] and to a web-based shortest route search system [11].
This study develops web-based software that classifies data using the C5.0 algorithm and meets good, interactive standards. Such software remains relevant because there are still companies and users who want it as an alternative for the classification process.

Research Method
This research followed a framework designed to facilitate the achievement of its objectives, shown in Figure 1. The first step is studying the C5.0 algorithm and its implementation. The C5.0 algorithm was developed by Ross Quinlan in 1987 [1]. The principle of this algorithm is to produce a decision tree based on the highest information gain, using the following equation [14]:

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) \tag{1}$$

where Info(D), or Entropy, is the information needed to classify the class labels in D, m is the number of classes, and p_i is the non-zero probability that a random tuple in D belongs to class i. To generate the Entropy based on attribute A, equation (2) is used:

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j) \tag{2}$$

where D_j is the j-th of the v partitions of D produced by attribute A. To get the information gain from partitioning on A, equation (3) is used:

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D) \tag{3}$$

where Gain(A) states how much would be gained by branching on A.
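The entropy and information gain calculations described above can be illustrated with a minimal Python sketch. It is not the PHP code of the software built in this study: the function names `info`, `info_a`, and `gain`, and the four toy records, are hypothetical and serve only to show how the three quantities relate.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(D): entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_a(rows, attribute, target):
    """Info_A(D): weighted entropy after partitioning the rows on one attribute."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    return sum(len(part) / total * info(part) for part in partitions.values())


def gain(rows, attribute, target):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info([row[target] for row in rows]) - info_a(rows, attribute, target)


# Hypothetical toy records for illustration only, not the data in Table 1.
rows = [
    {"fever": "yes", "travel": "yes", "status": "PDP"},
    {"fever": "yes", "travel": "no", "status": "PDP"},
    {"fever": "no", "travel": "yes", "status": "ODP"},
    {"fever": "no", "travel": "no", "status": "OTG"},
]

best = max(["fever", "travel"], key=lambda a: gain(rows, a, "status"))
```

On these toy records the entropy of the class labels is 1.5 bits, the gain of `fever` is 1.0, and the gain of `travel` is 0.5, so `fever` would be chosen as the root node.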
The next step is designing and building web-based software. The software was written in PHP (Hypertext Preprocessor) and implements equations (1), (2), and (3) to obtain the desired classification. For software testing, Covid-19 dummy data was used. The data was derived from general explanations given by health experts at a hospital in Pekanbaru and is shown in Table 1. The data is classified into three statuses: monitored person (ODP), monitored patient (PDP), and person without symptoms (OTG).

Table 1. Dummy Data
The last step is usability testing, because software generally provides a number of features or menus to make it easier for users [15]. Studies [10] and [11] describe five components of usability that serve as measures of software success: the system is easy to operate and understand (learnability); the speed of the system helps the user (efficiency); the system is easy to learn, so the user can still operate it easily after a long period without use (memorability); the system has a minimal error rate (error); and the user is satisfied with the system and feels helped by it (satisfaction). In Stage III, the usability value of the software is determined with equation (6). Usability testing was carried out before the software was put online by distributing questionnaires to 30 respondents. Respondents filled out a questionnaire of 14 questions on a five-point Likert scale, as shown in Table 2.

Table 2. Likert Scale
Source: [16]
In Table 2, the scale describes each questionnaire answer: 1 is the lowest value and 5 is the highest.

Results and Analysis
To use the software, users log in by entering a username and password, as in Figure 2. The user is then directed to the main menu, which consists of Dashboard, Training Data, Calculation Results, Decision Tree, and Sign Out. Figure 3 shows the software dashboard. Below the dashboard title, a panel shows the amount of data that has previously been entered. If no data exists yet, or data needs to be imported from outside, the Training Data menu is provided for this purpose. The Training Data menu is shown in Figure 4.

Figure 4. Data Training Display
After the data is prepared according to the needs of the C5.0 algorithm, the user can carry out the classification process through the Decision Tree menu. For the user's convenience, the software provides a Calculation Results menu for confirmation before the classification process is carried out.

Figure 5. Confirmation Display
The confirmation display in Figure 5 assures the user that the selected data will be tested using the C5.0 algorithm. If there is a data error, the user can return to the previous menu.
Next, the classification process with the C5.0 algorithm is started by selecting the Process Data Now button. The calculation results appear in the Decision Tree menu. The classification process is shown in its entirety, as in Figure 6, from the search for the root node to the leaf nodes that form the decision tree, by referring to equations (1), (2), and (3). The decision tree logic is built automatically using arrays. Criteria that can form the next level of leaf nodes, as well as criteria that have only one value (positive or negative), directly form the decision tree, as shown in Figure 7. From the decision tree, rules are derived that express the desired knowledge extracted from the Covid-19 data, as shown in Figure 8. The classification rules generated by the web-based software automatically become knowledge about the characteristics of Covid sufferers with ODP, PDP, and OTG status. There are 13 rules; details can be seen in Figure 9 below.

Figure 9. All Rules
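The path from root node search to rule extraction can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the array-based PHP implementation of the software: it splits on the highest information gain only, as described in this paper, and omits refinements such as pruning and boosting found in full C5.0 implementations. The names `build_tree` and `extract_rules` and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, attribute, target):
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def build_tree(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Leaf: every record has the same class, or no attribute is left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Internal node: split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {"attribute": best, "branches": branches}


def extract_rules(tree, conditions=()):
    # Each root-to-leaf path becomes one IF ... THEN rule.
    if not isinstance(tree, dict):
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        rules += extract_rules(subtree, conditions + ((tree["attribute"], value),))
    return rules


# Hypothetical toy records for illustration only, not the data in Table 1.
rows = [
    {"fever": "yes", "travel": "yes", "status": "PDP"},
    {"fever": "yes", "travel": "no", "status": "PDP"},
    {"fever": "no", "travel": "yes", "status": "ODP"},
    {"fever": "no", "travel": "no", "status": "OTG"},
]

tree = build_tree(rows, ["fever", "travel"], "status")
rules = extract_rules(tree)
```

On these toy records the root node is `fever`, and the tree yields three rules, one per leaf. Nested dictionaries play the role that arrays play in the PHP software.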
Software testing is done by distributing questionnaires, which are then processed according to the usability testing stages. To get the initial value for each usability component, equation (4) is used, with the results shown in Table 3.
Furthermore, the number of respondents who filled out the questionnaire was counted for each Likert scale value. Based on the completed questionnaires, 8 respondents chose the value 5, 11 respondents chose the value 4, 8 respondents chose the value 3, 3 respondents chose the value 2, and no respondent chose the value 1. To get the score, the number of respondents who chose each value is multiplied by that value; the maximum score is the highest value multiplied by the number of respondents. Referring to equation (5), the percentage of each usability component is given in Table 4. The predicate of the assessment is assigned from the percentage of each component according to Table 5, which divides the percentage range into intervals for the five scales used:

Table 5. Percentage of Likert Scale
Based on the provisions in Table 5, the components fall under different predicates: only the Efficiency component is rated Very Good by the respondents, with a percentage of 81.53%, while the other components fall under the Good predicate.
The last step is obtaining the usability value of the software itself. Referring to equation (6), the usability value of the software designed is 76.892%. This means that respondents rated the web-based software that applies the C5.0 algorithm as Good.
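The scoring logic can be illustrated with a short Python sketch. It assumes the common Likert calculation in which the percentage is the obtained score divided by the maximum possible score, and five equal-width predicate intervals; both are assumptions and may differ from equations (4) to (6) and from Table 5. Only the Good and Very Good labels come from this paper; the other labels and the function names are hypothetical.

```python
def likert_percentage(counts, max_scale=5):
    """Obtained score as a percentage of the maximum possible score.

    counts maps each Likert value to the number of respondents who chose it.
    """
    respondents = sum(counts.values())
    score = sum(value * n for value, n in counts.items())
    return score * 100 / (max_scale * respondents)


def predicate(percentage):
    """Map a percentage to a predicate using assumed 20-point intervals."""
    # Only "Very Good" and "Good" appear in the paper; the rest are placeholders.
    bands = [(80, "Very Good"), (60, "Good"), (40, "Fair"), (20, "Poor")]
    for lower, name in bands:
        if percentage > lower:
            return name
    return "Very Poor"


# Respondent counts per Likert value as reported for the 30 respondents.
counts = {5: 8, 4: 11, 3: 8, 2: 3, 1: 0}
percentage = likert_percentage(counts)
```

With the reported counts the score is 114 out of a maximum of 150, that is 76%, which the assumed intervals place in the Good predicate. The same intervals place 81.53% in Very Good and 76.892% in Good, in line with the results reported above.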

Conclusion
Overall, the software produced in this study can assist users in classifying the Covid-19 dummy data using the C5.0 algorithm. This is indicated by the usability result of 76.892%, which falls under the Good predicate. The menus in the software are easy for users to understand and operate, and the software processes the data and produces correct results. In addition, a menu is provided to import data from outside the software, which makes it easier for users to change data and add classification criteria as needed. However, the satisfaction component shows that the software still needs a more attractive appearance and menus that better support the C5.0 algorithm, such as pruning and simplifying the rules of the decision tree. This is why satisfaction received the lowest score, as shown in Table 4. For this reason, the software can still be developed further to address these limitations and produce better and more accurate knowledge or information.