Take-Home Exam: Exploring the Depths of Tree-Based Ensemble Models

1. Introduction

In this project, we explore tree-based ensemble models using the Iris dataset, a standard benchmark for classification in machine learning. The Iris dataset, available directly from the scikit-learn library, contains 150 samples of iris flowers from three species (setosa, versicolor, and virginica), with 50 samples per species. Each sample is described by four features, all measured in centimeters: sepal length, sepal width, petal length, and petal width.
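As a starting point, the dataset can be loaded and inspected with scikit-learn's `load_iris` function. The short sketch below confirms the sample count, the four features, the three species, and the balanced class distribution described above; the variable names are illustrative only.

```python
# Load the Iris dataset bundled with scikit-learn and inspect its structure.
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # X: float features, y: integer labels 0, 1, 2

print(X.shape)               # (150, 4)
print(iris.feature_names)    # the four sepal/petal measurements
print(iris.target_names)     # the three species names
print(np.bincount(y))        # [50 50 50]
```

The integer labels in `y` index into `iris.target_names`, so label 0 is setosa, 1 is versicolor, and 2 is virginica.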

Your task is to build an ensemble model that predicts the species of an iris flower from these four features. Along the way, you will deepen your understanding of tree-based models and how they are combined into ensembles, and sharpen your skills in data manipulation, feature engineering, and critical evaluation of model performance.

2. Objective

You are tasked with building a gradient boosting model from scratch to solve this classification problem. Your model should include the key components of a gradient boosting algorithm: decision trees as weak learners, a differentiable loss function to be minimized, and an additive procedure that fits each new tree to the negative gradient of the loss (the "residual" errors) of the current ensemble, so that the overall prediction error decreases with each added tree.
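The three components listed above can be sketched as follows. This is a minimal illustration under several assumptions, not a reference solution: it uses softmax cross-entropy as the loss, scikit-learn's `DecisionTreeRegressor` as the weak learner (a fully from-scratch solution would also implement the tree), and a hypothetical class name `SimpleGradientBoostingClassifier` with illustrative hyperparameters. In each boosting round it computes the negative gradient of the loss with respect to the class scores (one-hot labels minus predicted probabilities), fits one shallow regression tree per class to that gradient, and adds a shrunken copy of each tree's prediction to the scores.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def softmax(scores):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


class SimpleGradientBoostingClassifier:
    """Hypothetical multiclass gradient boosting sketch (softmax cross-entropy)."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=2):
        self.n_estimators = n_estimators    # number of boosting rounds
        self.learning_rate = learning_rate  # shrinkage applied to each tree
        self.max_depth = max_depth          # keeps each tree a weak learner

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n_classes = len(self.classes_)
        # One-hot labels: Y[i, k] = 1 if sample i belongs to class k.
        Y = (y[:, None] == self.classes_[None, :]).astype(float)
        # Constant initial model: log of the class priors.
        self.init_scores_ = np.log(Y.mean(axis=0))
        F = np.tile(self.init_scores_, (X.shape[0], 1))
        self.trees_ = []
        for _ in range(self.n_estimators):
            # Negative gradient of cross-entropy w.r.t. the scores F is Y - P.
            residuals = Y - softmax(F)
            round_trees = []
            for k in range(n_classes):
                tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=0)
                tree.fit(X, residuals[:, k])
                # Step the class-k scores along the fitted negative gradient.
                F[:, k] += self.learning_rate * tree.predict(X)
                round_trees.append(tree)
            self.trees_.append(round_trees)
        return self

    def decision_function(self, X):
        # Rebuild the additive scores: initial constant plus all shrunken trees.
        F = np.tile(self.init_scores_, (X.shape[0], 1))
        for round_trees in self.trees_:
            for k, tree in enumerate(round_trees):
                F[:, k] += self.learning_rate * tree.predict(X)
        return F

    def predict_proba(self, X):
        return softmax(self.decision_function(X))

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
model = SimpleGradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=2)
model.fit(X_train, y_train)
accuracy = np.mean(model.predict(X_test) == y_test)
print(f"test accuracy: {accuracy:.3f}")
```

For simplicity, this sketch keeps the least-squares leaf values that each regression tree produces. Friedman's original multiclass algorithm, like scikit-learn's `GradientBoostingClassifier`, instead recomputes each leaf value with a one-step Newton approximation, which usually converges in fewer rounds.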

Specific Requirements:

3. Evaluation

Your project will be evaluated based on the following criteria, totaling 100 points:

Extra Mission - Expert Question (40 Points):

Note: The extra mission points are additive, raising the maximum possible score to 140 points. This challenge is designed to push your understanding and capabilities to the expert level, focusing on one of the key aspects that make tree-based models particularly valuable for practical machine learning tasks.