Model Card
Model Card Authors
Mathew
Model Description
This is a KMeans clustering model trained on the UCI Wine dataset. The model groups wines into clusters based on 13 chemical analysis features such as alcohol, flavanoids, color intensity, and proline. The dataset has three ground truth classes (wine cultivars; simply called classes in the dataset), which were used to evaluate clustering performance but were not used during training. The number of clusters was set to K=3 to match the three known classes.
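The setup described above can be sketched roughly as follows. This is a minimal illustration, not the author's training script: it assumes scikit-learn's bundled copy of the Wine data (`load_wine`), and the `StandardScaler` step, `n_init=10`, and `random_state=0` are assumptions that this card does not state.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# scikit-learn ships a copy of the UCI Wine data: 178 samples, 13 features.
wine = load_wine()
X = wine.data

# Assumption: features are standardized before clustering. The card does not
# say which preprocessing was used, but KMeans is sensitive to feature scale
# and proline is orders of magnitude larger than most other features.
X_scaled = StandardScaler().fit_transform(X)

# K=3 as stated in the card; n_init and random_state are illustrative choices.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_scaled)

print("data shape:", X.shape)
print("cluster ids:", sorted(set(cluster_labels.tolist())))
```

The class labels in `wine.target` are deliberately left out of the fit, which mirrors the statement that they were only used for evaluation.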
Intended Uses & Limitations
This clustering model is for educational purposes only. It is not suitable for production use because the dataset is relatively small (178 samples) and well-structured, which makes clustering easier than in more complex, real-world datasets. Results should not be generalized beyond this dataset.
Training Data
Data source: UCI Wine dataset (https://archive.ics.uci.edu/dataset/109/wine). The dataset contains 178 wines described by 13 continuous chemical features. Ground truth labels (three classes) were used only for evaluation.
Evaluation Metrics
- Silhouette Score: 0.264
- Adjusted Rand Index (ARI): 0.849
- Normalized Mutual Information (NMI): 0.82
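As a rough sketch of how the three metrics above can be computed with scikit-learn: the silhouette score needs only the features and the cluster assignments, while ARI and NMI compare the assignments against the ground truth classes. The preprocessing and KMeans settings below are assumptions, and the helper name `evaluate_clustering` is hypothetical, so the printed values may differ from the figures reported in this card.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import (
    adjusted_rand_score,
    normalized_mutual_info_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def evaluate_clustering(features, true_labels, cluster_labels):
    """Return the three metrics listed in this card (hypothetical helper)."""
    return {
        # Internal metric: uses only the features and the cluster assignments.
        "silhouette": silhouette_score(features, cluster_labels),
        # External metrics: compare assignments with the ground truth classes.
        "ari": adjusted_rand_score(true_labels, cluster_labels),
        "nmi": normalized_mutual_info_score(true_labels, cluster_labels),
    }


wine = load_wine()
# Assumption: standardized features; the card does not state the preprocessing.
X_scaled = StandardScaler().fit_transform(wine.data)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

scores = evaluate_clustering(X_scaled, wine.target, cluster_labels)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

ARI and NMI are invariant to how cluster ids are numbered, so no matching between cluster ids and class ids is needed before scoring.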
Ethical Considerations
While clustering can reveal patterns in data, its influence on decision-making should be treated with caution. The model may find a 'cluster', but placing two items in the same group does not by itself mean that the grouping is meaningful. In this context the true labels are available, so we can verify how well the model performed; in real-world applications this is generally not the case. It is also worth noting that this dataset is clean and small, which makes it more useful for educational purposes than for real-world applications.
Audit Questions
- How stable are the clusters across different random seeds or initialization methods?
- How do ARI and NMI compare to supervised classification accuracy?
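One possible way to probe the first audit question is to refit KMeans with several seeds and both initialization methods, then measure how much the resulting partitions agree with each other using pairwise ARI. This is only a sketch: the seed list, the standardization step, and the choice of `n_init=1` (so that each run reflects a single initialization) are assumptions made for illustration.

```python
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

wine = load_wine()
# Assumption: standardized features, as the card does not state preprocessing.
X_scaled = StandardScaler().fit_transform(wine.data)

seeds = [0, 1, 2, 3, 4]  # illustrative seeds
stability = {}

for init in ("k-means++", "random"):
    # n_init=1 keeps each fit to one initialization, so differences between
    # seeds are not averaged away by scikit-learn's internal restarts.
    labelings = [
        KMeans(n_clusters=3, init=init, n_init=1, random_state=seed).fit_predict(X_scaled)
        for seed in seeds
    ]
    # Pairwise ARI between runs: values near 1 mean the partitions agree.
    pairwise = [adjusted_rand_score(a, b) for a, b in combinations(labelings, 2)]
    stability[init] = pairwise
    print(f"{init}: min ARI = {min(pairwise):.3f}, "
          f"mean ARI = {sum(pairwise) / len(pairwise):.3f}")
```

A low minimum pairwise ARI for either initialization method would suggest that the reported metrics depend on a favorable seed.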
Plots
Clusters Plot
