Loom Video - https://www.loom.com/share/d952bab219c4444589ddaef174c0e34d
Buenos Aires Airbnb Price Tier Classifier
Project Overview
This project predicts the price tier (Budget, Standard, or Luxury) of Airbnb listings in Buenos Aires. Rather than regressing on price directly, we treat the task as classification: we engineered features with unsupervised learning (K-Means clustering) to capture neighborhood characteristics and listing types, then fed them into a Random Forest Classifier.
Dataset & Features
The dataset contains Airbnb listings from Buenos Aires.
Key Features Used:
- latitude, longitude: Spatial coordinates.
- minimum_nights: Rental policy.
- availability_365: Professionalism indicator.
- number_of_reviews: Popularity.
Engineered Features:
- cluster_id: Generated via K-Means to group similar listings.
- dist_to_centroid: Distance of listing from its cluster center.
Methodology
1. Unsupervised Learning (Clustering)
Before classification, we applied K-Means Clustering to identify hidden market segments.
- Algorithm: K-Means
- Features for Clustering: Location, Price, Availability.
- Insight: We identified distinct groups such as "Budget & High Traffic", "Luxury/Professional", and "Long-term Residential".
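The clustering step and the two engineered features can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the column values, the standardization step, and the choice of three clusters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the listings table (values are illustrative only).
rng = np.random.default_rng(0)
n = 300
listings = pd.DataFrame({
    "latitude": rng.normal(-34.60, 0.03, n),
    "longitude": rng.normal(-58.42, 0.03, n),
    "price": rng.lognormal(mean=4.0, sigma=0.6, size=n),
    "availability_365": rng.integers(0, 366, n),
})

cluster_cols = ["latitude", "longitude", "price", "availability_365"]

# Standardize so that no single feature dominates the Euclidean distance
# (assumption: the original project may have scaled differently or not at all).
X_scaled = StandardScaler().fit_transform(listings[cluster_cols])

# n_clusters=3 is an assumption, chosen to match the three example segments.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)
listings["cluster_id"] = labels

# transform() returns the distance from each row to every centroid;
# pick the column that belongs to the row's assigned cluster.
all_dists = kmeans.transform(X_scaled)
listings["dist_to_centroid"] = all_dists[np.arange(n), labels]

print(listings[["cluster_id", "dist_to_centroid"]].describe())
```

Note that `dist_to_centroid` here is measured in the standardized feature space, so its scale depends on the preprocessing chosen.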
2. Classification Task
We transformed the continuous price target into 3 balanced classes using Quantile Binning:
- Class 0: Low / Budget (Bottom 33%)
- Class 1: Medium / Standard (Middle 33%)
- Class 2: High / Luxury (Top 33%)
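Quantile binning of this kind can be done with `pandas.qcut`. The sketch below uses synthetic prices, so the cut points it produces are illustrative and not the thresholds used for this model.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed prices standing in for the real price column.
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=4.0, sigma=0.6, size=900), name="price")

# q=3 cuts at the 1/3 and 2/3 quantiles; labels=False yields integer codes 0, 1, 2.
price_tier = pd.qcut(prices, q=3, labels=False)

# Readable names for the three classes, following the list above.
tier_names = {0: "Budget", 1: "Standard", 2: "Luxury"}
print(price_tier.map(tier_names).value_counts())
```

Because the bins are defined by quantiles rather than fixed price thresholds, each class ends up with roughly the same number of listings.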
3. Model Selection
We trained and evaluated three models:
- Logistic Regression (Baseline)
- Gradient Boosting
- Random Forest (Winner)
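A comparison of the three model families might look like the sketch below. It runs on synthetic data from `make_classification`, and the hyperparameters and train/test split are assumptions, so the scores it prints say nothing about the results on the Airbnb data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data with 7 features, standing in for the real feature table.
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters below are illustrative defaults, not the project's settings.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit each model on the same split and score it with weighted F1.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")
    print(f"{name}: weighted F1 = {scores[name]:.3f}")
```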
Model Performance
The Random Forest Classifier was selected as the best model. Of the three candidates, it handled non-linear relationships (especially in the location data) most effectively and achieved the best balance between precision and recall.
- Selected Model: Random Forest Classifier
- Metric: Weighted F1-Score
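For reference, the weighted F1-score is the mean of the per-class F1 scores, weighted by the number of true samples in each class. The short check below uses made-up labels to show that a manual computation agrees with scikit-learn's `average="weighted"` option.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up labels for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0, 2])

# Per-class F1, then a support-weighted mean.
per_class = f1_score(y_true, y_pred, average=None)
support = np.bincount(y_true)
manual = np.sum(per_class * support) / support.sum()

# scikit-learn's built-in weighted average.
weighted = f1_score(y_true, y_pred, average="weighted")

print(f"manual = {manual:.4f}, sklearn = {weighted:.4f}")
```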
How to Use the Model
You can download the model using the huggingface_hub library and use it in Python:
import pickle

import pandas as pd
from huggingface_hub import hf_hub_download

# 1. Download the model
model_path = hf_hub_download(
    repo_id="Orib24/Buenos_Aires_Airbnb_Data",
    filename="airbnb_price_classifier.pkl",
)

# 2. Load the model
with open(model_path, "rb") as f:
    model = pickle.load(f)

# 3. Prepare Data (Example)
# Ensure you have the same features: [minimum_nights, number_of_reviews, availability_365, latitude, longitude, cluster_id, dist_to_centroid]
sample_data = pd.DataFrame(
    [[2, 50, 360, -34.58, -58.42, 1, 0.5]],
    columns=['minimum_nights', 'number_of_reviews', 'availability_365',
             'latitude', 'longitude', 'cluster_id', 'dist_to_centroid'],
)

# 4. Predict
prediction = model.predict(sample_data)
print(f"Predicted Class: {prediction[0]}")  # Output: 0, 1, or 2
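To turn the numeric prediction into a readable tier, a small lookup along these lines can help. The helper name `tier_name` is hypothetical, and the mapping follows the class definitions given in the Classification Task section.

```python
# Class index -> tier label, as defined in the Classification Task section.
TIER_NAMES = {0: "Budget", 1: "Standard", 2: "Luxury"}


def tier_name(class_id):
    """Translate a predicted class index (0, 1, or 2) into its tier label."""
    return TIER_NAMES[int(class_id)]


print(tier_name(2))  # Luxury
```

With the model loaded as above, `tier_name(prediction[0])` would give the label for the sample listing.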








