Loom Video - https://www.loom.com/share/d952bab219c4444589ddaef174c0e34d

🏠 Buenos Aires Airbnb Price Tier Classifier

Python Library Task

📌 Project Overview

This project predicts the price tier (Budget, Standard, or Luxury) of Airbnb listings in Buenos Aires. Rather than predicting the exact price, we frame the problem as three-class classification. We used unsupervised learning (K-Means clustering) to engineer two features, cluster_id and dist_to_centroid, that capture neighborhood characteristics and listing types, and fed them together with the raw listing attributes into a Random Forest classifier.

📊 Dataset & Features

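The dataset contains Airbnb listings from Buenos Aires. Key features used:

  • latitude, longitude: Spatial coordinates of the listing.
  • minimum_nights: Minimum stay length, reflecting the host's rental policy.
  • availability_365: Days per year the listing is available; used as a proxy for professionally managed listings.
  • number_of_reviews: Review count, used as a proxy for popularity.
  • Engineered features:
    • cluster_id: ID of the K-Means cluster the listing is assigned to, grouping similar listings.
    • dist_to_centroid: Distance from the listing to the center of its cluster.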

🧠 Methodology

1. Unsupervised Learning (Clustering)

Before classification, we applied K-Means Clustering to identify hidden market segments.

  • Algorithm: K-Means
  • Features for Clustering: Location, Price, Availability.
  • Insight: We identified distinct groups such as "Budget & High Traffic", "Luxury/Professional", and "Long-term Residential".
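
The clustering step and the two engineered features can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the column names, the choice of three clusters, and the standardization step are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the listings table (all values are made up).
rng = np.random.default_rng(0)
n = 200
listings = pd.DataFrame({
    "latitude": rng.uniform(-34.70, -34.53, size=n),
    "longitude": rng.uniform(-58.53, -58.35, size=n),
    "price": rng.lognormal(mean=4.0, sigma=0.6, size=n),
    "availability_365": rng.integers(0, 366, size=n),
})

# Assumed clustering inputs: location, price, availability.
cluster_cols = ["latitude", "longitude", "price", "availability_365"]

# Assumption: standardize so large-valued columns do not dominate the distances.
X = StandardScaler().fit_transform(listings[cluster_cols])

# Assumption: three clusters; the real number of clusters is not stated.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
listings["cluster_id"] = kmeans.fit_predict(X)

# transform() gives the distance from each row to every centroid;
# pick the column that belongs to the row's own cluster.
distances = kmeans.transform(X)
row_idx = np.arange(len(listings))
listings["dist_to_centroid"] = distances[row_idx, listings["cluster_id"].to_numpy()]

print(listings[["cluster_id", "dist_to_centroid"]].head())
```

In this sketch the distances are measured in the standardized feature space, so they are unitless rather than expressed in degrees or kilometers.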


2. Classification Task

We transformed the continuous price target into 3 balanced classes using Quantile Binning:

  • Class 0: Low / Budget (Bottom 33%)
  • Class 1: Medium / Standard (Middle 33%)
  • Class 2: High / Luxury (Top 33%)
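
As an illustration of quantile binning, the sketch below splits a synthetic price column into three equally sized classes with pandas.qcut. The prices are randomly generated and the variable names are hypothetical; only the binning technique itself reflects the step described above.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed prices (made up for illustration).
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=4.0, sigma=0.6, size=300), name="price")

# q=3 cuts at the 1/3 and 2/3 quantiles; labels=False returns the codes 0, 1, 2.
price_tier = pd.qcut(prices, q=3, labels=False)

# Each tier should hold roughly one third of the listings.
print(price_tier.value_counts().sort_index())
```

Because the cut points are quantiles of the data, the classes come out balanced regardless of how skewed the price distribution is.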

3. Model Selection

We trained and evaluated three models:

  1. Logistic Regression (Baseline)
  2. Gradient Boosting
  3. Random Forest (Winner) 🏆
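
A comparison of these three model families could look something like the sketch below. It uses a synthetic three-class dataset from scikit-learn and default-like hyperparameters, both of which are assumptions for the example; the scores it prints say nothing about the results on the Airbnb data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class problem with 7 features (stand-in for the real feature table).
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Hyperparameters here are illustrative, not the ones used in the project.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validated weighted F1 for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1_weighted").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: weighted F1 = {score:.3f}")
```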

πŸ† Model Performance

The Random Forest Classifier was selected as the best model. It handled non-linear relationships, especially in the location data, better than the other two models and achieved the best balance between precision and recall.

  • Selected Model: Random Forest Classifier
  • Metric: Weighted F1-Score
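
For readers unfamiliar with the metric, the weighted F1-score is the average of the per-class F1-scores, weighted by how many true samples each class has. The toy labels below are invented purely to show the calculation with scikit-learn.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy labels for the three tiers (0 = Budget, 1 = Standard, 2 = Luxury).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0])

# Per-class F1 and the number of true samples (support) in each class.
per_class = f1_score(y_true, y_pred, average=None)
support = np.bincount(y_true)

# Weighted F1 computed by hand, then by scikit-learn directly.
manual = np.sum(per_class * support) / support.sum()
weighted = f1_score(y_true, y_pred, average="weighted")

print(f"manual: {manual:.4f}, sklearn: {weighted:.4f}")
```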

🚀 How to Use the Model

You can download the model using the huggingface_hub library and use it in Python:

import pickle
from huggingface_hub import hf_hub_download
import pandas as pd

# 1. Download the model
model_path = hf_hub_download(repo_id="Orib24/Buenos_Aires_Airbnb_Data", filename="airbnb_price_classifier.pkl")

# 2. Load the model
with open(model_path, "rb") as f:
    model = pickle.load(f)

# 3. Prepare Data (Example)
# Ensure you have the same features: [minimum_nights, number_of_reviews, availability_365, latitude, longitude, cluster_id, dist_to_centroid]
sample_data = pd.DataFrame([[2, 50, 360, -34.58, -58.42, 1, 0.5]], 
                           columns=['minimum_nights', 'number_of_reviews', 'availability_365', 'latitude', 'longitude', 'cluster_id', 'dist_to_centroid'])

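# 4. Predict
prediction = model.predict(sample_data)
# Map the numeric class to the tier names defined in the Classification Task section
tier_names = {0: "Budget", 1: "Standard", 2: "Luxury"}
print(f"Predicted Class: {prediction[0]} ({tier_names[int(prediction[0])]})") # Class is 0, 1, or 2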