SkyPrice: Predicting Apartment Prices in Mexico City

Estimate the price of apartments in Mexico City using regression models trained on current data from the real estate market.


Introduction

SkyPrice is a web application designed to estimate the prices of apartments in Mexico City using advanced regression models. This project, part of my BSc thesis at ESCOM, IPN, leverages current data from the real estate market to provide accurate price predictions, helping potential buyers and investors make informed decisions.

SkyPrice Homepage

Visit the SkyPrice website: https://skyprice.xyz to explore the application and learn more about the project.

Motivation

The motivation behind creating SkyPrice was to address the challenges faced in real estate valuation, where accuracy and efficiency are crucial [1]. The complexity of the real estate market often makes it difficult to determine appropriate prices, and existing tools do not fully meet the needs of the Mexican market [2].

Technologies Used

SkyPrice integrates various technologies, including:

Data

The data for SkyPrice was sourced from various reliable platforms such as datos.gob.mx and the Mexico City data portal. The datasets incorporated include:

KeplerGL Data Visualization

Importance of Each Dataset

Each dataset plays a crucial role in the prediction models:

Data Challenges and Solutions

We faced several challenges in data collection and cleaning:

Methodology

The methodology of SkyPrice involves several key steps:

  1. Data Collection and Cleaning: Gathering data from multiple online platforms and cleaning it to ensure accuracy and consistency.
  2. Model Selection and Training: Training three regression models (SVM, Random Forest, and a neural network) and fine-tuning each one over a grid of hyperparameters.
  3. Model Evaluation: Comparing the performance of different models using metrics like R², Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE).
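The hyperparameter grid mentioned in step 2 can be pictured as an exhaustive search: every combination of candidate values is scored on held-out data and the best one is kept. The sketch below is not the thesis code. To stay dependency-free, a toy one-dimensional k-nearest-neighbour regressor stands in for the SVM, Random Forest, and neural network, and the data as well as the `k` and `weighted` parameters are invented for illustration.

```python
from itertools import product
from statistics import mean


def knn_predict(train, x, k, weighted):
    """Predict y at x from the k nearest (x, y) training pairs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    if not weighted:
        return mean(y for _, y in nearest)
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def validation_mae(params, train, valid):
    """Mean absolute error of the toy model on the validation pairs."""
    return mean(abs(knn_predict(train, x, **params) - y) for x, y in valid)


def grid_search(param_grid, train, valid):
    """Score every combination in the grid and return (best_params, best_score)."""
    names = list(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_mae(params, train, valid)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Invented toy data: price grows linearly with size (50,000 per square metre).
data = [(size, 50_000 * size) for size in range(40, 200, 10)]
train, valid = data[::2], data[1::2]

best_params, best_mae = grid_search(
    {"k": [1, 2, 4], "weighted": [False, True]}, train, valid
)
print(best_params, round(best_mae, 2))
```

The same loop applies to any model: only `validation_mae` needs to change to fit and score the real estimator.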

The following diagram illustrates the methodology of SkyPrice (in Spanish):

Model Training Pipeline

Detailed Process

Model Performance

We evaluated every hyperparameter configuration using R², Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), and kept the best-performing configuration for each algorithm.
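For readers less familiar with these three metrics, here is a minimal sketch of how they are computed from actual and predicted values. The function name and the four sample values are made up for illustration and are not taken from the SkyPrice dataset.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for two equally long sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {"mae": mae, "rmse": sqrt(mse), "r2": 1 - ss_res / ss_tot}


# Invented values (think of them as prices in millions of MXN).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAE and RMSE are expressed in the same unit as the price, which makes them easy to read as "typical error in pesos", while R² is unitless and measures the share of price variance the model explains.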

The following plots show the predictions vs. actual prices for each model:

Model Performance Comparison

Public API

SkyPrice offers a public API built with FastAPI, allowing users to integrate apartment price predictions into their own applications. The API endpoints provide functionalities such as:

The API is deployed on AWS and is accessible at https://api.skyprice.xyz.

The public API is offered on a fair-use basis: it is free for users who comply with the terms of service, which include rate limiting and proper attribution.

API Documentation

In this blog post, we provide a brief overview of the API endpoints and their functionality. The API documentation is available through Swagger UI and Redoc, providing detailed information on the available endpoints and their usage.

You can access each of the following API documentation interfaces:

In the following image you can see both the Swagger UI and Redoc interfaces:

API Documentation

Predict Endpoint

The /predict endpoint allows users to input property details and get price predictions. The input JSON should include the following fields:

The output JSON includes the predicted price from each model, so users can compare the predictions.

Example API Request

{
  "Size_Terrain": 140,
  "Size_Construction": 130,
  "Bedrooms": 3,
  "Bathrooms": 2,
  "Parking": 2,
  "Age": 7,
  "Lat": 19.4326,
  "Lon": -99.1332,
  "Municipality": "Álvaro Obregón"
}
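A request like the one above could be prepared from Python with the standard library alone. This is a sketch under two assumptions that the post does not state: that `/predict` accepts a JSON body via `POST`, and that all nine fields in the example are required. The helper name `build_predict_request` is hypothetical.

```python
import json
from urllib.request import Request

API_BASE = "https://api.skyprice.xyz"  # base URL given in this post

# Assumption: the nine fields shown in the example request are all required.
REQUIRED_FIELDS = (
    "Size_Terrain", "Size_Construction", "Bedrooms", "Bathrooms",
    "Parking", "Age", "Lat", "Lon", "Municipality",
)


def build_predict_request(payload, base_url=API_BASE):
    """Validate the payload and return an unsent urllib Request for /predict."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the HTTP method is not stated in the post
    )


payload = {
    "Size_Terrain": 140, "Size_Construction": 130, "Bedrooms": 3,
    "Bathrooms": 2, "Parking": 2, "Age": 7,
    "Lat": 19.4326, "Lon": -99.1332, "Municipality": "Álvaro Obregón",
}
request = build_predict_request(payload)
print(request.get_method(), request.full_url)
```

To actually send it, pass `request` to `urllib.request.urlopen` and decode the JSON body of the response. Check the Swagger UI documentation for the authoritative method and schema before relying on this.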

Example API Response

{
  "Random_Forest": 6420750.0,
  "SVM": 6164968.58,
  "Neural_Network": 5167572.5
}
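Because the response carries one estimate per model, a client may want to summarize how far the models disagree. The sketch below parses the example response shown above and reports the lowest and highest estimate, the mean, and the spread. The function name and the choice of summary statistics are illustrative, not part of the API.

```python
import json
from statistics import mean

# The example response from this post, as it would arrive over the wire.
response_text = (
    '{"Random_Forest": 6420750.0, "SVM": 6164968.58, "Neural_Network": 5167572.5}'
)


def summarize_predictions(predictions):
    """Summarize a {model_name: price} mapping returned by /predict."""
    lowest = min(predictions, key=predictions.get)
    highest = max(predictions, key=predictions.get)
    average = mean(predictions.values())
    spread = predictions[highest] - predictions[lowest]
    return {
        "lowest": lowest,
        "highest": highest,
        "mean": average,
        "spread": spread,
        "spread_pct": 100 * spread / average,  # disagreement relative to the mean
    }


summary = summarize_predictions(json.loads(response_text))
print(summary)
```

A large spread relative to the mean is a hint that the models disagree on this property and that a single estimate should be read with caution.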

Metadata Endpoint

The /models endpoint provides metadata about the models used in the API, including information about the features, hyperparameters, and evaluation metrics. It also includes information about the data used for training the models.

Example Metadata Response

{
  "dataset": {
    "original": [32520, 18],
    "training": {
      "X": [11169, 9],
      "y": [11169, 1]
    },
    "testing": {
      "X": [3724, 9],
      "y": [3724, 1]
    }
  },
  "models": {
    "random_forest": {
      "mse": 2767108007816.376,
      "rmse": 1663462.6559728885,
      "ci": [-3288575.5823341985, 3232135.171812599],
      "mae": 704079.571484675,
      "r2": 0.9038900065218292,
      "feature_importances": [
        0.0011281899895630631, 0.0017738994702383077, 0.0019866359182250486,
        0.0007018637582167237, 0.0014665387673904304, 0.0023498577692012985,
        0.00004089436452670091, 0.000018750183784930174,
        0.000025032684545761422, 0.0001471126291344862, 0.0022393854044217604,
        0, 0.0000018591757405458183, 0.00033878248531502486,
        0.000024947352434527034, 0.000004803359328795989, 0.10737325614938063,
        0.6581596171662714, 0.018088970068768237, 0.0078064922484012405,
        0.011072231256017567, 0.06306614405016867, 0.05947059290138905,
        0.06271414284753582
      ],
      "max_features": 1,
      "max_depth": 30,
      "n_estimators": 1000,
      "oob_score": true
    },
    "svm": {
      "mse": 8718507842976.787,
      "rmse": 2952711.947172766,
      "ci": [-6072782.238924998, 5435880.829712623],
      "mae": 1843353.6959805356,
      "r2": 0.697179969281674,
      "kernel": "linear",
      "C": 1000,
      "epsilon": 0.01
    },
    "neural_network": {
      "mse": 6774725145058.313,
      "rmse": 2602830.2182544125,
      "ci": [-5522911.848758079, 4471074.026658187],
      "mae": 1597042.1520542428,
      "r2": 0.637984037399292,
      "learning_rate": 0.10000000149011612,
      "beta_1": 0.9,
      "beta_2": 0.999,
      "epsilon": 1e-7
    }
  }
}

Web Application

The web application for SkyPrice is developed using React and Next.js with Material-UI for a responsive and user-friendly interface. Key features of the application include:

User Interface Design

The web application emphasizes usability and accessibility, with features like:

Responsive design for price prediction tool

Price Prediction Tool

The price prediction tool allows users to input property details and get instant price estimates based on the trained models. Users can compare the predictions from different models and make informed decisions.

In the following image, you can see the price prediction tool form:

Price Prediction Tool Form

Now let’s see an example of the results obtained from the price prediction tool:

Price Prediction Tool Results

Currency Conversion

The prices are displayed in Mexican Pesos (MXN), but the application also provides a currency conversion feature that allows users to view the prices in other currencies, such as USD, EUR, or CAD.

Currency Conversion

Interactive Map

The interactive map provides a visual representation of apartment listings and nearby amenities, such as hospitals, schools, and supermarkets. Users can explore the map to understand the distribution of properties and their proximity to essential services.

By integrating KeplerGL, we were able to create an interactive map that provides valuable insights into the real estate market in Mexico City while also allowing users to explore the data in a user-friendly manner.

Interactive Map

If you want to explore the interactive map, you can visit the SkyPrice website: https://skyprice.xyz.

Model Information

The web application also includes detailed information about the models used for price prediction, the data sources, and the methodology followed. This section provides transparency and helps users understand the process behind the price predictions.

In the following image, you can see the model information section, particularly the charts showing the models used and their performance:

Model Information

Telegram Chatbot

SkyPrice also includes a Telegram chatbot, “SkyPriceChatbot,” which uses GPT-3.5 to interpret natural-language messages. The chatbot connects to the SkyPrice API and returns price predictions directly in the chat. While the primary language for instructions is Spanish, the chatbot can interpret multiple languages.

You can access the chatbot in Telegram by searching for @SkyPriceChatbot or using the following URL: https://t.me/SkyPriceChatbot.

Setup and Features

SkyPriceChatbot Screenshot

Example Interaction

The chatbot tries to extract the necessary information from the user’s input and provides a detailed response with price predictions. Here’s an example interaction:

User: “Quiero saber el precio de un departamento en Álvaro Obregón con 3 recámaras, 2 baños y 2 estacionamientos, de 140m² de terreno y 130m² de construcción, con 7 años de antigüedad.” [3]

SkyPriceChatbot:

🏢 Detalles del departamento extraídos:

📏 Tamaño del terreno: 140m²
🏗️ Tamaño de la construcción: 130m²
🛏️ Número de habitaciones: 3
🚽 Número de baños: 2.5
🚗 Número de estacionamientos: 2
🕰️ Antigüedad: 7 años
🌍 Alcaldía: Álvaro Obregón

💰 Precios estimados:

🌳 Random Forest: $6,420,750.00 MXN
📈 SVM: $6,164,968.58 MXN
🧠 Neural Network: $5,167,572.50 MXN

🔍 Puedes encontrar más detalles en https://skyprice.xyz 🏡

[4]

Methodology and Implementation

The chatbot’s functionality is divided into three main components:

  1. Natural Language Processing: The chatbot uses OpenAI’s GPT-3.5 for natural language processing to understand user queries and extract the required information for the SkyPrice API.
  2. API Integration: The chatbot connects to the SkyPrice API to retrieve price predictions based on the user’s input.
  3. Response Generation: The chatbot generates a detailed response with the extracted property details and price predictions.
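The three components above can be wired together as a small pipeline. The sketch below is an illustration rather than the bot's actual code: the GPT-3.5 extraction and the API call are passed in as plain callables (stubbed here with canned values from the example interaction), and only the response-generation step is implemented. All function names are hypothetical.

```python
MODEL_LABELS = {
    "Random_Forest": "🌳 Random Forest",
    "SVM": "📈 SVM",
    "Neural_Network": "🧠 Neural Network",
}


def format_price(value):
    """Format a price the way the example reply does, e.g. $6,420,750.00 MXN."""
    return f"${value:,.2f} MXN"


def format_reply(predictions):
    """Step 3: turn the API's {model: price} mapping into the chat message."""
    lines = ["💰 Precios estimados:", ""]
    for key, label in MODEL_LABELS.items():
        if key in predictions:
            lines.append(f"{label}: {format_price(predictions[key])}")
    return "\n".join(lines)


def handle_message(text, extract_details, fetch_predictions):
    """Run extraction (step 1), prediction (step 2) and formatting (step 3)."""
    details = extract_details(text)            # stands in for the GPT-3.5 call
    predictions = fetch_predictions(details)   # stands in for the SkyPrice API call
    return format_reply(predictions)


# Stubs with canned values so the sketch runs offline.
def fake_extract(text):
    return {"Bedrooms": 3, "Parking": 2, "Age": 7, "Municipality": "Álvaro Obregón"}


def fake_fetch(details):
    return {"Random_Forest": 6420750.0, "SVM": 6164968.58, "Neural_Network": 5167572.5}


reply = handle_message("Quiero saber el precio...", fake_extract, fake_fetch)
print(reply)
```

Keeping the extraction and API steps behind callables makes each one replaceable and lets the formatting be tested without network access.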

Conclusion

SkyPrice represents a comprehensive approach to predicting apartment prices in Mexico City, combining data science, machine learning, and web development. This project not only serves as a valuable tool for users but also showcases the integration of multiple technologies to solve real-world problems.

Impact and Lessons Learned

SkyPrice has demonstrated the potential of machine learning in real estate valuation. Throughout the project, we learned about the importance of data quality, feature engineering, and model evaluation.

Future Enhancements

SkyPrice has great potential for future enhancements, such as:

Footnotes

  1. E. Pagourtzi, V. Assimakopoulos, and T. Hatzichristos, “Real estate appraisal: a review of valuation methods,” J. Property Invest. Finance, vol. 21, no. 4, pp. 383–401, 2003.

  2. Z. Nedović-Budić, A. K. Pinto, and L. D. Budic, “GIS database development and exchange: Interaction mechanisms and motivations,” J. Urban Plan. Dev., vol. 126, no. 2, pp. 51–73, 2000.

  3. “I want to know the price of an apartment in Álvaro Obregón with 3 bedrooms, 2 bathrooms, and 2 parking spaces, 140m² of land and 130m² of construction, 7 years old.”

  4. Translation of the chatbot reply:

    • 🏢 Apartment details extracted:
    • 📏 Land size: 140m²
    • 🏗️ Construction size: 130m²
    • 🛏️ Number of bedrooms: 3
    • 🚽 Number of bathrooms: 2.5
    • 🚗 Number of parking spaces: 2
    • 🕰️ Age: 7 years
    • 🌍 Borough: Álvaro Obregón
    • 💰 Estimated prices:
    • 🌳 Random Forest: $6,420,750.00 MXN
    • 📈 SVM: $6,164,968.58 MXN
    • 🧠 Neural Network: $5,167,572.50 MXN
    • 🔍 You can find more details at https://skyprice.xyz 🏡