|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Course Lecturers |
Christian Gibson, Tiberiu Lupascu |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
E-mail addresses |
c.h.s.gibson@hva.nl t.d.lupascu@hva.nl |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Lecture 5 |
Study the decision tree Tutorial
This lecture examined Datamining from an AI perspective. In particular we took a look at how to build a decision tree - an AI tool which learns from past experience. Essentially, a tree is a basic data structure. Previously we handled the tree data structures and traversing the tree in the IIE course modules Data Structures M2.06 (see also Deitel & Deitel, Java How to Program 6th edition, chapter 17) and KDD M4.03. In this lecture we showed how you can build a decision tree by choosing the attributes of a dataset as nodes for the tree.
To start of with you need to build up a data set with experience of the subject for which you need to make intelligent predictions. Perhaps you want to make sales forecast for a new car model or maybe you want to predict whether a share value is likely to increase or decrease during the coming year. To demonstrate the possibilities with a decision tree we use the test data set containing 20 basket ball match records from a basket ball team called the Mall rats. The test dataset and the applets which you can use to build the decision trees are provided by the University of Alberta – which explains the choice of basket ball matches as an example.
The decision tree is probably the most frequently used tool for KDD software programs. The effectiveness of the tree in making accurate predictions is based on the choice of the ‘splitting algorithm’ and on the quality of the recorded data. We are assuming that past experience will help us to predict the future. In other words we hold that the way events transpire in the world is not random. The age of the players will probably be one factor which influences their performance. A team of 6-year old children would be unlikely to win a football match when pitted against the number 1 adult team of the national football league. I also wouldn’t bet on a team of ‘seniors’ of 50+ years of age playing against the 1st team of Manchester United or Barcelona.
When we build our dataset, we therefore want to record as much information as possible about attributes of the subject which we expect to be relevant in determining the performance. The next trick is to select these attributes to build a decision tree.
Take the test data set for the basket ball matches:
The dataset consists of 20 records and 7 attributes per record. The last attribute of each record is the target attribute which we want to predict. Each record represents a ‘case in hand’ or an ‘instance’. In a new match the data in the first six attributes is known. What we don’t know is the result of the match which still has to be played. In order to build a decision tree which can predict this result we have to choose an attribute as the root of the tree.In this data set we see that all the matches which started at 7pm resulted in a win for the Mall Rats. Perhaps this is a good indicator for future results? We could choose this attribute as the root node of the tree and then continue, level by level to choose attributes to fit int our tree. Actually, a random selection of attributes is not a good idea. That is, the resulting tree is unlikely to have any real predictive value. How then should we select attributes to function as nodes in our decision tree? To discover how to do this I recommend that you download the document I have made using material from the Alberta University website. The document provides a guide to building a decision tree using various techniques for selecting attributes to function as a node in the tree. Most selection procedures are in one way or another connected with the information gain principle. Information Gain The rules are simple. A tree consists of:
Attributes which have been selected from the dataset as a node in the tree may not be repeated in the same branch. If we chose the time of the match as the root node, then this attribute may not be used again in any branch of the tree since all branches originate from the root node. Also attributes should have a discrete value. If a non discrete value (e.g. temperature or price) is recorded in the dataset then we must classify the values into discrete classes in order to be able to use the data in a decision tree. Entropy - the 2nd law of thermodynamics During the lecture we briefly mention the concept of entropy - a mathematical concept which basically refers to the ammount of 'disorder' present in a universum. The second law of thermodynamics holds the everything in the universe is heading towards a maximum state of 'disorder' (chaos). We can get a feeling of what this means by considering a bathtub full of cold water. If I run the hot water tap the hot water will slowly mix with the cold water until finally the average temperature of the water is the same. Initially however there will be a difference. In some parts of the bath, the water will be warmer than in other parts of the bath - that is, the water molecules show a certain amount of order. There will be concentrations of rapidly moving molecules (warm water) and slowly moving molecules (cold water) which are not perfectly mixed. In the end however the individual concentrations will disappear. The ordering of the molecules in separate concentrations will have completely disappeared.
The idea of information gain is that we should ideally chose nodes which produce the maximum reduction in entropy. The entropy measure of a leaf node is 0. That is, all the instances in the data set which reach this leaf node have the same properties. The principle of Occam's razor dictates that a smaller tree is better than a larger tree. By achieving a maximum reduction in entropy at each level in the tree, we will reach zero entropy (the leaf node) with a minimum number of levels. The entropy formula is:
and the information gain formula (entropy reduction between two levels) is:
The first term on the right hand side of the expression is the entropy of a node at level n and the second term is the average entropy of the nodes in the same branch at level n+1; the bigger the drop in entropy between the two levels, the greater the information gain.
I have included several documents in the files which can be downloaded for this lecture, which explain the relationship between entropy and information gain in more detail. It is not necessary to understand the mathematics in order to apply the gain algorithm into a procedure for building a tree, but for readers who are interested I recommend you try to follow the explanations given. It is less difficult than it looks. Actually with a reasonable command of school mathematics you should be able to follow it. The tutorial link at the start of this summary provides an excellent explanation and is well recommended by a number of sites. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
5 minutes break |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
After the lecture students started on the homework assignment. The task was to:
To carry out this assignment you will probably need to refer to the documents and applets which can be downloaded for this lecture. To run the applets you will have to have the Java runtime engine installed on your computer. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Homework The homework assignment associated with this lecture is an extension of the workshop exercises. In all there are 6 different versions:
For assistance in completing this assignment you can refer to tutorials and calculation applets provided by: http://www.cs.ubc.ca/labs/lci/CIspace/Version4/dTree/index.html
and
http://www.cs.ualberta.ca/~aixplore/learning/DecisionTrees/index.html
1) Create a data set Create a data set for predicting whether a team will win or lose a football match. Ø Each record (instance) in the data set should consist of 5 attributes (input parameters) and 1 label (output parameter). Ø The dataset should consist of 20 records.
2) Build a decision tree for predicting the outcome of new unlabelled instances, using 15 instances from your existing data set Ø Use the information gain algorithm you have used for node splitting Ø Specify the entropy reduction values generated by each node choice.
3) Test the decision tree with the remaining 5 instances
The alternative homework assignments were the same as shown here with the exception of the first part of step 1. The other versions were:
The homework assignment should be submitted at the latest by the last day prior to lecture 6 (hybrid systems). |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||