<h1>Decision Tree Entropy</h1>
<p>
Cloistered Monkey, https://necromuralist.github.io/posts/decision-tree-entropy/ (Contents © 2019 <a href="mailto:necromuralist@protonmail.com">Cloistered Monkey</a>)
</p>
<p>
These are some basic notes on how entropy is used when making decision trees. The examples are taken from <i>Process Mining: Data Science In Action</i>.
</p>
<div id="outline-container-orgcee01a4" class="outline-2">
<h2 id="orgcee01a4">Entropy</h2>
<div class="outline-text-2" id="text-orgcee01a4">
<p>
The equation for entropy is \(E = - \sum\limits_{i=1}^n p_i \log_2 p_i\), where \(p_i\) is the probability of class \(i\) (one of the \(n\) possible outcomes). In other words, \(p_i\) is the count of instances that belong to class \(i\) divided by the total number of instances across all classes. A concrete example might make this clearer.
</p>
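<p>
As a rough sketch of the equation, here is a small Python function that turns class counts into an entropy. The name <code>entropy</code> and the choice to pass in raw counts are my own assumptions for illustration; they are not taken from the book.
</p>

```python
import math
from typing import Sequence


def entropy(counts: Sequence[int]) -> float:
    """Return the base-2 entropy for a collection of class counts.

    `counts` holds one count per class (a hypothetical input format).
    """
    total = sum(counts)
    # p_i = count_i / total; empty classes are skipped because
    # p * log2(p) tends to 0 as p tends to 0
    probabilities = [count / total for count in counts if count > 0]
    return -sum(p * math.log2(p) for p in probabilities)


print(entropy([5, 5]))  # two equally likely classes → 1.0
```

<p>
An even split between two classes gives the largest two-class entropy (1 bit), while a node whose instances all share one class has an entropy of zero.
</p>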
</div>
</div>
<div id="outline-container-org6f6c168" class="outline-2">
<h2 id="org6f6c168">Dying Young</h2>
<div class="outline-text-2" id="text-org6f6c168">
<p>
This example uses a data set that contains various attributes that might predict if someone died 'young' (before age 70) or 'not young' (at 70 or older). There are 860 entries, with 546 dying young and 314 not dying young. We can calculate the entropy for the root node using the proportions of \(young\) (died young) and \(\lnot young\) (didn't die young).
</p>
<p>
\[
\begin{aligned}
E &= -\left(p_{young} \log_2 p_{young} + p_{\lnot young} \log_2 p_{\lnot young}\right)\\
&= -\left(\frac{546}{860} \log_2 \frac{546}{860} + \frac{314}{860} \log_2 \frac{314}{860}\right)\\
&\approx 0.9468
\end{aligned}
\]
</p>
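<p>
To check the arithmetic, the same calculation can be written out term by term in Python. The variable names here are only illustrative; the counts are the ones given above.
</p>

```python
import math

# Counts from the example: 546 died young, 314 did not
young, not_young = 546, 314
total = young + not_young  # 860 entries

p_young = young / total
p_not_young = not_young / total

# Each term is p * log2(p), which is negative for 0 < p < 1
term_young = p_young * math.log2(p_young)
term_not_young = p_not_young * math.log2(p_not_young)

# Negating the sum gives a positive entropy
root_entropy = -(term_young + term_not_young)
print(f"{root_entropy:.4f}")  # → 0.9468
```

<p>
The result is close to 1 because the two outcomes are not too far from an even split.
</p>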
</div>
</div>
<p>
Tags: machineLearning, entropy, decisionTrees. Mon, 06 Mar 2017 22:14:02 GMT
</p>