Model Evaluation Visualizations
Feature Importance Summary

This bar plot summarizes the importance of each feature in the model's predictions. The engineered feature `ph*Hardness` has the greatest impact.
Mutual Information Scores

Mutual information measures the dependency between each feature and the target variable. This plot shows the mutual information scores, which indicate how relevant each feature is to potability. The `ph*Hardness` feature ranks among the most informative.
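As a rough illustration of what a mutual information score measures, the sketch below computes it for two discrete sequences using only the standard library. It is not the code behind the plot: the data are hypothetical, and continuous features such as `ph*Hardness` would first need binning or a dedicated estimator, which this sketch assumes has already happened.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: a binned feature and a binary potability label.
feature_bins = ["low"] * 4 + ["high"] * 4
potable = [0] * 4 + [1] * 4
unrelated = ["a", "a", "b", "b", "a", "a", "b", "b"]

mi_dependent = mutual_information(feature_bins, potable)
mi_independent = mutual_information(unrelated, potable)
```

A feature that fully determines the label reaches the label's entropy, while a feature that is independent of the label scores zero.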
Feature Interaction: ph*Hardness

This scatter plot shows how `ph*Hardness` affects predictions, with higher values favoring potability.
Confusion Matrices

These heatmaps display confusion matrices for all models. The Stacking model has the fewest misclassifications.
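For readers unfamiliar with the cells of a confusion matrix, here is a minimal sketch of how the four counts are tallied for a binary problem. The labels are hypothetical and the convention that 1 means potable is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, assuming 1 = potable."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical true labels and predictions.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
misclassified = fp + fn  # the off-diagonal cells of the matrix
```

Comparing models by "fewest misclassifications" amounts to comparing the sum of the off-diagonal cells.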
ROC Curves

This plot shows ROC curves for all models. The Stacking model achieves the highest AUC (e.g., 0.93).
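The AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition on hypothetical scores; it is a quadratic-time illustration, not the routine used to produce the plot.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
```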
Precision-Recall Curves

This plot displays Precision-Recall curves for the positive class (potable water). The Stacking model shows the strongest precision.
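A Precision-Recall curve is traced by sweeping a decision threshold over the predicted scores. The following sketch, on hypothetical data and assuming label 1 is the potable class, lists the precision and recall obtained at each distinct threshold.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("recall is undefined without positive samples")
    points = []
    for thr in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((thr, tp / (tp + fp), tp / total_pos))
    return points


# Hypothetical labels and predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = precision_recall_points(y_true, scores)
```

Lowering the threshold can only raise recall, while precision may move in either direction, which is why the curve is not necessarily monotonic.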
Calibration Curves

This plot shows calibration curves, which assess how reliable the predicted probabilities are. The Stacking model is well-calibrated.
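A calibration curve groups predictions into probability bins and compares the mean predicted probability in each bin with the fraction of samples that were actually positive. The sketch below shows that computation with equal-width bins; the data and the bin count are hypothetical choices.

```python
def calibration_bins(y_true, probs, n_bins=5):
    """Return (mean predicted prob, observed positive fraction, count) per non-empty bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for t, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[idx] += p
        hits[idx] += t
        counts[idx] += 1
    return [
        (sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Hypothetical predictions from a perfectly calibrated model.
probs = [0.25] * 4 + [0.75] * 4
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
bins = calibration_bins(y_true, probs, n_bins=2)
```

For a well-calibrated model the first two values of each tuple are close, so the points lie near the diagonal of the plot.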
Model Performance Bar Plot
Note: This is an interactive Plotly chart saved as `model_performance_bar.html`. It compares Accuracy, F1-Score, and ROC-AUC across all models.
5-Fold Cross-Validation Scores

This box plot shows the distribution of F1-scores from 5-fold cross-validation. The Stacking model has the highest median F1-score.
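To make the procedure behind the box plot concrete, here is a minimal sketch of 5-fold cross-validation that yields one F1-score per fold. The threshold rule stands in for a real model and is purely hypothetical, as are the data. The folds here are contiguous and unshuffled, whereas a real evaluation would usually shuffle or stratify.

```python
import statistics


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def fit_threshold(xs, ys):
    """Hypothetical stand-in model: midpoint between the two class means."""
    mean0 = statistics.mean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = statistics.mean([x for x, y in zip(xs, ys) if y == 1])
    return (mean0 + mean1) / 2


# Hypothetical single-feature dataset with alternating labels.
x = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0, 6.0, 4.0, 2.0, 9.0]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = []
for train_idx, test_idx in k_fold_indices(len(x), k=5):
    thr = fit_threshold([x[i] for i in train_idx], [y[i] for i in train_idx])
    preds = [1 if x[i] >= thr else 0 for i in test_idx]
    scores.append(f1_score([y[i] for i in test_idx], preds))

median_f1 = statistics.median(scores)
```

The five values in `scores` are what a box plot would summarize, with `median_f1` as the center line.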
Classification Report

This heatmap provides a classification report for the Stacking model, showing precision, recall, F1-score, and support.
Additional Data Insights
pH Distribution (KDE)

This KDE plot shows the distribution of pH values in potable and non-potable water samples. Potable water is closer to neutral pH.
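A KDE curve is the average of small bell-shaped kernels centered on each observation. The sketch below builds a Gaussian KDE by hand. The pH readings and the bandwidth of 0.3 are hypothetical, and plotting libraries normally choose the bandwidth automatically.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density at x with Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical pH readings for each class.
ph_potable = [6.8, 7.0, 7.1, 7.3, 6.9]
ph_non_potable = [5.2, 8.9, 6.0, 9.1, 4.8]

density_potable = gaussian_kde(ph_potable, bandwidth=0.3)
density_non_potable = gaussian_kde(ph_non_potable, bandwidth=0.3)

# Density of each class at neutral pH.
at_neutral = (density_potable(7.0), density_non_potable(7.0))
```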
Hardness Distribution

This QQ plot compares the distribution of Hardness in potable and non-potable water samples; potable samples show moderate hardness levels.
Note: An interactive version is available at `hardness_distribution.html`.
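Assuming the QQ plot is a two-sample comparison (the text does not say whether each class is instead compared against a theoretical distribution), its points are pairs of matching quantiles from the two groups. A minimal sketch with hypothetical hardness values:

```python
import statistics


def qq_pairs(sample_a, sample_b, n_quantiles=10):
    """Pair up matching quantiles of two samples (n_quantiles - 1 cut points)."""
    qa = statistics.quantiles(sample_a, n=n_quantiles, method="inclusive")
    qb = statistics.quantiles(sample_b, n=n_quantiles, method="inclusive")
    return list(zip(qa, qb))


# Hypothetical hardness values; the second group is the first shifted by 10.
hardness_a = [150.0 + 5.0 * i for i in range(21)]
hardness_b = [v + 10.0 for v in hardness_a]

pairs = qq_pairs(hardness_a, hardness_b)
```

Points that fall on a line parallel to the diagonal, as they do here, indicate two distributions with the same shape but a different location.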
Chloramines Distribution (KDE)

This KDE plot shows the distribution of Chloramines; potable samples typically fall below 4 mg/L.
Correlation Heatmap

This heatmap shows correlations between features, with `ph*Hardness` moderately correlated with potability.
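Each cell of such a heatmap is a pairwise correlation coefficient. The sketch below computes a Pearson correlation table for a few hypothetical rows. It assumes, based only on the feature's name, that `ph*Hardness` is the elementwise product of `ph` and `Hardness`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy rows; column names mirror the features mentioned in the text.
columns = {
    "ph": [6.5, 7.0, 7.5, 8.0, 6.0],
    "Hardness": [180.0, 200.0, 190.0, 210.0, 170.0],
    "Potability": [0, 1, 0, 1, 0],
}
# Assumption: the engineered feature is the product of ph and Hardness.
columns["ph*Hardness"] = [
    p * h for p, h in zip(columns["ph"], columns["Hardness"])
]

names = list(columns)
corr = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

The resulting table is symmetric with ones on the diagonal, and the row for `Potability` corresponds to the heatmap cells that relate each feature to the target.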
Feature Importance (Stacking Model)

This bar plot shows feature importance from the Stacking model, with `ph*Hardness` as a top contributor.
Class Distribution

This plot shows the distribution of potable and non-potable classes in the dataset.