🏠 Buenos Aires Airbnb Price Tier Classifier

📌 Project Overview

This project predicts the price tier (Budget, Standard, or Luxury) of Airbnb listings in Buenos Aires. Rather than predicting the exact price, we framed the task as classification and engineered additional features with unsupervised learning (K-Means clustering) to capture neighborhood characteristics and listing types. These features were then fed into a Random Forest Classifier.

📊 Dataset & Features

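The dataset contains Airbnb listings from Buenos Aires.

Key features used:
- latitude, longitude: Spatial coordinates of the listing.
- minimum_nights: Minimum number of nights required per booking (rental policy).
- availability_365: Number of days in the next year the listing is available to book (an indicator of professional hosting).
- number_of_reviews: Total number of reviews (a proxy for popularity).

Engineered features:
- cluster_id: Cluster label assigned by K-Means to group similar listings.
- dist_to_centroid: Distance from the listing to its cluster center.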

🧠 Methodology

1. Unsupervised Learning (Clustering)

Before classification, we applied K-Means Clustering to identify hidden market segments.

- Algorithm: K-Means
- Features used for clustering: location, price, and availability.
- Insight: We identified distinct groups such as "Budget & High Traffic", "Luxury/Professional", and "Long-term Residential".
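The clustering step above can be sketched with scikit-learn. This is a minimal sketch, not the project's code: the toy listings, the standardization with `StandardScaler`, and `n_clusters=3` are assumptions, and `dist_to_centroid` is taken here as the Euclidean distance to the assigned cluster center in the scaled feature space.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy listings; the real project used the Buenos Aires Airbnb data.
listings = pd.DataFrame({
    "latitude": [-34.58, -34.60, -34.61, -34.59, -34.62, -34.57],
    "longitude": [-58.42, -58.38, -58.37, -58.43, -58.36, -58.41],
    "price": [25000, 80000, 90000, 30000, 150000, 28000],
    "availability_365": [360, 120, 200, 350, 300, 10],
})
cluster_cols = ["latitude", "longitude", "price", "availability_365"]

# Standardize so that price (large values) does not dominate the Euclidean distance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(listings[cluster_cols])

# n_clusters=3 is an assumption for this sketch, not the project's chosen value.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)
listings["cluster_id"] = labels

# Distance from each listing to the center of the cluster it was assigned to.
listings["dist_to_centroid"] = np.linalg.norm(
    X_scaled - kmeans.cluster_centers_[labels], axis=1
)

print(listings[["cluster_id", "dist_to_centroid"]])
```

At prediction time, the same fitted scaler and K-Means model would have to be reused (`scaler.transform`, then `kmeans.predict`) so that new listings get cluster IDs and distances consistent with training.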

2. Classification Task

We converted the continuous price target into three balanced classes using quantile binning (tertiles):

- Class 0: Low / Budget (bottom 33%)
- Class 1: Medium / Standard (middle 33%)
- Class 2: High / Luxury (top 33%)
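Quantile binning of this kind can be done with `pandas.qcut`. The sketch below uses hypothetical prices; `q=3` cuts at the 1/3 and 2/3 quantiles, so each tier receives roughly the same number of listings.

```python
import pandas as pd

# Hypothetical nightly prices standing in for the real price column.
prices = pd.Series([12000, 18000, 25000, 31000, 40000, 55000, 70000, 95000, 150000])

# Three equal-frequency bins labeled 0 (Budget), 1 (Standard), 2 (Luxury).
price_tier = pd.qcut(prices, q=3, labels=[0, 1, 2])

print(price_tier.value_counts().sort_index())  # three listings per tier
```

Because the cut points come from the training prices, they should be stored and reused if new listings with known prices ever need to be labeled the same way.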

3. Model Selection

We trained and evaluated three models:

- Logistic Regression (Baseline)
- Gradient Boosting
- Random Forest (Winner) 🏆
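A comparison like this can be sketched as a loop that fits each model and scores it with weighted F1. This is an illustration only: it uses synthetic data from `make_classification` (7 features, 3 classes) in place of the listings, default hyperparameters instead of any tuned values, and its scores say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 7 listing features and the 3 price tiers.
X, y = make_classification(n_samples=600, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Default hyperparameters are placeholders, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")
    print(f"{name}: weighted F1 = {scores[name]:.3f}")

# Pick the model with the highest weighted F1 on the held-out split.
best = max(scores, key=scores.get)
print("Best model:", best)
```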

🏆 Model Performance

The Random Forest Classifier was selected as the best model. Compared with the other two models, it better captured non-linear relationships (especially in the location features) and achieved the best balance between precision and recall.

- Selected model: Random Forest Classifier
- Evaluation metric: Weighted F1-score
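Weighted F1 computes an F1 score for each class and averages them, weighting each class by its support (the number of true instances). The small example below uses hypothetical labels and checks a manual computation against scikit-learn's `average="weighted"`.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical true and predicted tiers (0 = Budget, 1 = Standard, 2 = Luxury).
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])

# Per-class F1, then a support-weighted average of those scores.
per_class_f1 = f1_score(y_true, y_pred, average=None)
support = np.bincount(y_true)
manual_weighted = np.sum(per_class_f1 * support) / support.sum()

sk_weighted = f1_score(y_true, y_pred, average="weighted")
print(per_class_f1, manual_weighted, sk_weighted)  # both averages equal 0.7
```

Here the per-class F1 scores are 2/3, 2/3 and 0.75, so the weighted average is (3 · 2/3 + 3 · 2/3 + 4 · 0.75) / 10 = 0.7. With balanced tertile classes, weighted and macro F1 end up close to each other.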

🚀 How to Use the Model

You can download the model using the huggingface_hub library and use it in Python:

```python
import pickle

import pandas as pd
from huggingface_hub import hf_hub_download

# 1. Download the model file from the Hugging Face Hub
model_path = hf_hub_download(repo_id="Orib24/Buenos_Aires_Airbnb_Data", filename="airbnb_price_classifier.pkl")

# 2. Load the model
with open(model_path, "rb") as f:
    model = pickle.load(f)

# 3. Prepare data (example)
# The columns must match the training features, in this order.
# cluster_id and dist_to_centroid must come from the same K-Means clustering used during training.
feature_columns = ['minimum_nights', 'number_of_reviews', 'availability_365',
                   'latitude', 'longitude', 'cluster_id', 'dist_to_centroid']
sample_data = pd.DataFrame([[2, 50, 360, -34.58, -58.42, 1, 0.5]], columns=feature_columns)

# 4. Predict
prediction = model.predict(sample_data)
print(f"Predicted Class: {prediction[0]}")  # Output: 0 (Budget), 1 (Standard), or 2 (Luxury)
```
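The model returns integer class IDs. A small helper like the one below can turn them into the tier names defined in the classification task; `to_tier_names` and `TIER_NAMES` are hypothetical names for this sketch and are not part of the published model.

```python
import numpy as np

# Tier names follow the class definitions: 0 = Budget, 1 = Standard, 2 = Luxury.
TIER_NAMES = {0: "Budget", 1: "Standard", 2: "Luxury"}


def to_tier_names(predictions):
    """Convert integer class predictions into human-readable tier labels."""
    return [TIER_NAMES[int(p)] for p in np.asarray(predictions)]


# Example with a hypothetical array like the one model.predict(...) returns.
print(to_tier_names(np.array([0, 2, 1])))  # ['Budget', 'Luxury', 'Standard']
```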
