1. components
usage: classification
- root node
- decision node
2. choose a feature at each node
maximize purity (minimize impurity)
3. stop splitting when (see the sketch after this list):
- a node is 100% one class
- splitting a node would make the tree exceed a maximum depth
- the improvement in purity score is below a threshold
- the number of examples in a node is below a threshold
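A minimal sketch of these stopping checks; the parameter names (`max_depth`, `min_gain`, `min_examples`) and default values are illustrative, not from the notes:

```python
import numpy as np

def should_stop(y, depth, gain, max_depth=5, min_gain=1e-3, min_examples=2):
    """Return True if splitting this node should stop."""
    if len(np.unique(y)) == 1:    # node is 100% one class
        return True
    if depth >= max_depth:        # tree would exceed the maximum depth
        return True
    if gain < min_gain:           # purity improvement below a threshold
        return True
    if len(y) < min_examples:     # too few examples in the node
        return True
    return False
```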
use entropy (H) as a measure of impurity
$H(p) = -p\log_2(p) - (1-p)\log_2(1-p)$, note: $0\log_2 0 = 0$
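A minimal sketch of this binary entropy formula (the function name is illustrative):

```python
import numpy as np

def entropy(p):
    """Binary entropy H(p), with the convention 0*log2(0) = 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

print(entropy(0.5))  # 1.0, maximum impurity
print(entropy(1.0))  # 0.0, pure node
```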
1. definition
$\text{information\_gain} = H(p^{root}) - (w^{left}H(p^{left}) + w^{right}H(p^{right}))$
2. usage
- measures the reduction in entropy
- serves as a signal for stopping splitting
3. continuous features
find the threshold that gives the most information gain (a short sketch follows this list)
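A sketch combining the information gain definition with the continuous-feature case: compute the gain of a split, then scan candidate thresholds (midpoints between consecutive unique feature values, an assumed choice) for the one with the most gain. Function names are illustrative:

```python
import numpy as np

def node_entropy(y):
    # entropy H(p) of a node, where p is the fraction of positive labels
    p = y.mean() if len(y) else 0.0
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(y, left_mask):
    # H(p_root) - (w_left * H(p_left) + w_right * H(p_right))
    w_left = left_mask.mean()
    return (node_entropy(y)
            - w_left * node_entropy(y[left_mask])
            - (1 - w_left) * node_entropy(y[~left_mask]))

def best_threshold(x, y):
    # candidate thresholds: midpoints between consecutive unique feature values
    vals = np.unique(x)
    best_t, best_gain = None, -1.0
    for t in (vals[:-1] + vals[1:]) / 2:
        gain = information_gain(y, x <= t)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# toy usage: one continuous feature, binary labels
x = np.array([1.2, 1.5, 1.8, 2.0, 2.5, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(x, y))  # threshold 1.9 with gain 1.0
```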
- generating a tree sample (sampling with replacement; see the sketch after this list):
  given a training set of size m
  for b = 1 to B:
    use sampling with replacement to create a new training set of size m
    train a decision tree on that training set
- randomizing the feature choice: at each node, when choosing a feature to split on, if n features are available, pick a random subset of k < n features (usually $k = \sqrt{n}$) and allow the algorithm to choose only from that subset
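A minimal sketch of the whole ensemble procedure, assuming scikit-learn's `DecisionTreeClassifier` and binary 0/1 labels; `max_features="sqrt"` gives the random subset of roughly $\sqrt{n}$ features at each split, and names like `B` and `trees` are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, B=100):
    m = X.shape[0]
    trees = []
    for b in range(B):
        # sampling with replacement to create a new training set of size m
        idx = np.random.choice(m, size=m, replace=True)
        # at each split, the tree only considers a random subset of sqrt(n) features
        tree = DecisionTreeClassifier(max_features="sqrt")
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    # majority vote over the B trees (binary labels 0/1 assumed)
    votes = np.mean([t.predict(X) for t in trees], axis=0)
    return (votes >= 0.5).astype(int)
```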