1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
| class DecisionTreeNode: def __init__(self): self.feature = None self.threshold = None self.left = None self.right = None self.value = None
class DecisionTree: def __init__(self, max_depth=10, min_samples_split=2): self.max_depth = max_depth self.min_samples_split = min_samples_split self.root = None
def _best_split(self, X, y): best_gain = -1 best_feature = None best_threshold = None n_features = X.shape[1]
for feature in range(n_features): thresholds = np.unique(X[:, feature]) for threshold in thresholds: left_mask = X[:, feature] <= threshold right_mask = ~left_mask
if len(y[left_mask]) == 0 or len(y[right_mask]) == 0: continue
gain = information_gain(y, y[left_mask], y[right_mask]) if gain > best_gain: best_gain = gain best_feature = feature best_threshold = threshold
return best_feature, best_threshold
def _build_tree(self, X, y, depth=0): node = DecisionTreeNode()
if depth >= self.max_depth or len(y) < self.min_samples_split: node.value = np.argmax(np.bincount(y.astype(int))) return node
feature, threshold = self._best_split(X, y)
if feature is None: node.value = np.argmax(np.bincount(y.astype(int))) return node
left_mask = X[:, feature] <= threshold node.feature = feature node.threshold = threshold node.left = self._build_tree(X[left_mask], y[left_mask], depth + 1) node.right = self._build_tree(X[~left_mask], y[~left_mask], depth + 1)
return node
def fit(self, X, y): self.root = self._build_tree(X, y) return self
def _predict_one(self, x, node): if node.value is not None: return node.value if x[node.feature] <= node.threshold: return self._predict_one(x, node.left) return self._predict_one(x, node.right)
def predict(self, X): return np.array([self._predict_one(x, self.root) for x in X])
|