SkyPrice: Predicting Apartment Prices in Mexico City
Estimate the price of apartments in Mexico City using regression models trained on current data from the real estate market.
Introduction
SkyPrice is a web application designed to estimate the prices of apartments in Mexico City using advanced regression models. This project, part of my BSc thesis at ESCOM, IPN, leverages current data from the real estate market to provide accurate price predictions, helping potential buyers and investors make informed decisions.
Visit the SkyPrice website at https://skyprice.xyz to explore the application and learn more about the project.
Motivation
SkyPrice was created to address the challenges of real estate valuation, where accuracy and efficiency are crucial [1]. The complexity of the real estate market often makes it difficult to determine appropriate prices, and existing tools do not fully meet the needs of the Mexican market [2].
Technologies Used
SkyPrice integrates various technologies, including:
- Machine Learning (ML) for the regression models trained on real estate data.
- FastAPI for backend API development.
- React and Next.js with Material-UI for the frontend.
- KeplerGL for data visualization.
- Telegram API for the chatbot.
- OpenAI for natural language processing.
- Docker for containerization.
- AWS for deployment.
Data
The data for SkyPrice was sourced from various reliable platforms such as datos.gob.mx and the Mexico City data portal. The datasets incorporated include:
- Real estate listings with price and property details, extracted from online platforms.
- Locations of hospitals, schools, universities, malls, supermarkets.
- Historical earthquake data.
- Postal codes.
- Demographic information.
- Acoustic and water contamination data.
Importance of Each Dataset
Each dataset plays a crucial role in the prediction models:
- Real estate listings: Provide the core data for price estimation, including property details and prices. This data is used for training and testing the models.
- Hospitals, schools, etc.: Influence the desirability and value of the location.
- Earthquake data: Important for assessing risk factors.
- Demographic information: Helps with geospatial analysis and segmentation.
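The list above says that amenity locations influence value, but it does not say how they enter the models. One common option, shown here purely as an assumption and not as the project's actual feature pipeline, is the great-circle distance from a property to its nearest amenity. The function names and the idea of a single "nearest amenity" feature are hypothetical.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_amenity_km(lat: float, lon: float,
                       amenities: Iterable[Tuple[float, float]]) -> float:
    """Distance to the closest amenity; a hypothetical proximity feature."""
    return min(haversine_km(lat, lon, a_lat, a_lon) for a_lat, a_lon in amenities)
```

A feature like this could be computed once per listing and per amenity type (hospitals, schools, and so on) before training.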
Data Challenges and Solutions
We faced several challenges in data collection and cleaning:
- Inconsistencies in data formats: Addressed by standardizing formats during preprocessing.
- Missing values: Handled through imputation techniques.
- Data integration: Managed by using KeplerGL for seamless visualization and analysis.
- Natural language processing: Utilized OpenAI for processing unstructured data in the chatbot.
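The post does not say which standardization or imputation techniques were used. As a minimal sketch under that caveat, the snippet below normalizes a price string into a number and fills missing values with the column median. Both `parse_price` and `impute_median` are hypothetical helpers, and median imputation is only one of several reasonable choices.

```python
import re
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def parse_price(text: str) -> float:
    """Turn a price string such as '$6,420,750.00 MXN' into a float."""
    digits = re.sub(r"[^\d.]", "", text)  # keep only digits and the decimal point
    if not digits:
        raise ValueError(f"no numeric price in {text!r}")
    return float(digits)


def impute_median(rows: List[Row], column: str) -> List[Row]:
    """Return copies of the rows with missing `column` values set to the median."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill = median(observed)
    return [{**r, column: fill if r.get(column) is None else r[column]} for r in rows]
```

The input rows are left untouched, which makes it easy to compare the data before and after imputation.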
Methodology
The methodology of SkyPrice involves several key steps:
- Data Collection and Cleaning: Gathering data from multiple online platforms and cleaning it to ensure accuracy and consistency.
- Model Selection and Training: Using regression models such as SVM, Random Forest, and Neural Network. A grid of hyperparameters was used to fine-tune the models.
- Model Evaluation: Comparing the performance of different models using metrics like R², Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE).
The following diagram illustrates the methodology of SkyPrice (in Spanish):
Detailed Process
- Data Preprocessing: Included handling missing values, normalizing data, and encoding categorical variables.
- Feature Selection: Identified key features that impact apartment prices, such as location, size, and proximity to amenities.
- Model Training: Used a combination of SVM, Random Forest, and Neural Network with hyperparameter tuning.
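To make the idea of tuning over a grid of hyperparameters concrete, here is a library-free sketch of exhaustive grid search. The grid values are assumptions chosen for illustration; the post does not list the grids that were actually searched. The `score` callable stands in for "train a model with these parameters and return its validation error".

```python
from itertools import product
from typing import Any, Callable, Dict, Iterator, List, Tuple

Params = Dict[str, Any]


def iter_grid(grid: Dict[str, List[Any]]) -> Iterator[Params]:
    """Yield every combination of the values listed for each hyperparameter."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(grid: Dict[str, List[Any]],
                score: Callable[[Params], float]) -> Tuple[Params, float]:
    """Return the combination with the lowest score (for example, validation RMSE)."""
    best_params, best_score = None, float("inf")
    for params in iter_grid(grid):
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    if best_params is None:
        raise ValueError("the grid produced no combinations")
    return best_params, best_score


# Hypothetical grid for the SVM regressor; not the grid used in the project.
SVM_GRID = {"kernel": ["linear", "rbf"], "C": [1, 10, 100, 1000], "epsilon": [0.01, 0.1]}
```

In practice a library routine with cross-validation would usually replace the hand-written loop; the sketch only shows what "a grid" means.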
Model Performance
We evaluated the models using R², Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) and, for each algorithm, selected the best-performing configuration based on these metrics.
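For readers unfamiliar with the three metrics, the following sketch computes them from their standard definitions in plain Python. It is meant as a reference for what the numbers measure, not as the evaluation code used in the project.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average size of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot
```

MAE and RMSE are expressed in the same unit as the target (here, MXN), while R² is unitless, which is why the three are usually reported together.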
The following plots show the predictions vs. actual prices for each model:
Public API
SkyPrice offers a public API built with FastAPI, allowing users to integrate apartment price predictions into their own applications. The API endpoints provide functionalities such as:
- Predicting apartment prices based on input features.
- Accessing metadata about the model and data.
- Retrieving adjustment plots for the models (as shown above).
The API is deployed on AWS and is accessible at https://api.skyprice.xyz.
The public API is intended to be “fair use” and is available for free to users who comply with the terms of service, which include rate limiting and proper attribution.
API Documentation
In this blog post, we provide a brief overview of the API endpoints and their functionality. The API documentation is available through Swagger UI and Redoc, providing detailed information on the available endpoints and their usage.
You can access each of the following API documentation interfaces:
In the following image you can see both the Swagger UI and Redoc interfaces:
Predict Endpoint
The /predict endpoint allows users to input property details and get price predictions. The input JSON should include the following fields:
- Size_Terrain: Size of the terrain in square meters.
- Size_Construction: Size of the construction in square meters.
- Bedrooms: Number of bedrooms.
- Bathrooms: Number of bathrooms.
- Parking: Number of parking spaces.
- Age: Age of the property in years.
- Lat: Latitude of the property.
- Lon: Longitude of the property.
- Municipality: Municipality where the property is located.
And the output JSON will include the predicted price for each model, allowing users to compare the predictions.
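A client may want to check its input before calling the endpoint. The sketch below models the nine fields listed above as a standard-library dataclass with a few sanity checks. The field types and the validation rules are assumptions, since the post does not specify them, and `ApartmentFeatures` is a hypothetical name rather than part of the SkyPrice API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ApartmentFeatures:
    """Client-side view of the /predict input; types are assumed."""
    Size_Terrain: float
    Size_Construction: float
    Bedrooms: int
    Bathrooms: float
    Parking: int
    Age: float
    Lat: float
    Lon: float
    Municipality: str

    def __post_init__(self) -> None:
        # Assumed rules: positive sizes, non-negative counts, valid coordinates.
        if self.Size_Terrain <= 0 or self.Size_Construction <= 0:
            raise ValueError("sizes must be positive")
        if min(self.Bedrooms, self.Bathrooms, self.Parking, self.Age) < 0:
            raise ValueError("counts and age must be non-negative")
        if not (-90 <= self.Lat <= 90 and -180 <= self.Lon <= 180):
            raise ValueError("coordinates out of range")
        if not self.Municipality.strip():
            raise ValueError("Municipality must not be empty")

    def to_payload(self) -> dict:
        """Return the fields as a dict ready to be serialized to JSON."""
        return asdict(self)
```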
Example API Request
```json
{
  "Size_Terrain": 140,
  "Size_Construction": 130,
  "Bedrooms": 3,
  "Bathrooms": 2,
  "Parking": 2,
  "Age": 7,
  "Lat": 19.4326,
  "Lon": -99.1332,
  "Municipality": "Álvaro Obregón"
}
```
Example API Response
```json
{
  "Random_Forest": 6420750.0,
  "SVM": 6164968.58,
  "Neural_Network": 5167572.5
}
```
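The request and response above can be exercised from Python with only the standard library. This is a sketch under one explicit assumption: that /predict accepts a POST with a JSON body. The post does not state the HTTP method, so check the API documentation before relying on it. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict

API_BASE = "https://api.skyprice.xyz"  # base URL given in the post


def build_predict_request(features: Dict[str, object],
                          base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) the request for the /predict endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the post does not state the method
    )


def predict(features: Dict[str, object], timeout: float = 10.0) -> Dict[str, float]:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_predict_request(features), timeout=timeout) as resp:
        return json.load(resp)


def summarize(predictions: Dict[str, float]) -> Dict[str, object]:
    """Report which model is lowest/highest and how far apart they are."""
    low = min(predictions, key=predictions.get)
    high = max(predictions, key=predictions.get)
    return {"lowest": low, "highest": high, "spread": predictions[high] - predictions[low]}
```

Because the API is offered under fair-use terms with rate limiting, a client like this should avoid calling `predict` in tight loops.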
Metadata Endpoint
The /models endpoint provides metadata about the models used in the API, including information about the features, hyperparameters, and evaluation metrics. It also includes information about the data used for training the models.
Example Metadata Response
```json
{
  "dataset": {
    "original": [32520, 18],
    "training": {
      "X": [11169, 9],
      "y": [11169, 1]
    },
    "testing": {
      "X": [3724, 9],
      "y": [3724, 1]
    }
  },
  "models": {
    "random_forest": {
      "mse": 2767108007816.376,
      "rmse": 1663462.6559728885,
      "ci": [-3288575.5823341985, 3232135.171812599],
      "mae": 704079.571484675,
      "r2": 0.9038900065218292,
      "feature_importances": [
        0.0011281899895630631, 0.0017738994702383077, 0.0019866359182250486,
        0.0007018637582167237, 0.0014665387673904304, 0.0023498577692012985,
        0.00004089436452670091, 0.000018750183784930174, 0.000025032684545761422,
        0.0001471126291344862, 0.0022393854044217604, 0,
        0.0000018591757405458183, 0.00033878248531502486, 0.000024947352434527034,
        0.000004803359328795989, 0.10737325614938063, 0.6581596171662714,
        0.018088970068768237, 0.0078064922484012405, 0.011072231256017567,
        0.06306614405016867, 0.05947059290138905, 0.06271414284753582
      ],
      "max_features": 1,
      "max_depth": 30,
      "n_estimators": 1000,
      "oob_score": true
    },
    "svm": {
      "mse": 8718507842976.787,
      "rmse": 2952711.947172766,
      "ci": [-6072782.238924998, 5435880.829712623],
      "mae": 1843353.6959805356,
      "r2": 0.697179969281674,
      "kernel": "linear",
      "C": 1000,
      "epsilon": 0.01
    },
    "neural_network": {
      "mse": 6774725145058.313,
      "rmse": 2602830.2182544125,
      "ci": [-5522911.848758079, 4471074.026658187],
      "mae": 1597042.1520542428,
      "r2": 0.637984037399292,
      "learning_rate": 0.10000000149011612,
      "beta_1": 0.9,
      "beta_2": 0.999,
      "epsilon": 1e-7
    }
  }
}
```
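A metadata response like this one lends itself to a few quick checks. The sketch below ranks the models by a chosen metric, confirms that each reported RMSE is the square root of the reported MSE, and derives the train/test proportion from the dataset shapes. The helper names are hypothetical; only the JSON keys come from the example response.

```python
import math
from typing import Dict, List


def rank_models(models: Dict[str, Dict[str, float]], metric: str = "rmse",
                higher_is_better: bool = False) -> List[str]:
    """Return model names ordered from best to worst on `metric`."""
    return sorted(models, key=lambda name: models[name][metric], reverse=higher_is_better)


def rmse_matches_mse(entry: Dict[str, float], rel_tol: float = 1e-6) -> bool:
    """Check that the reported RMSE equals sqrt(MSE) up to rounding."""
    return math.isclose(math.sqrt(entry["mse"]), entry["rmse"], rel_tol=rel_tol)


def train_share(dataset: Dict[str, object]) -> float:
    """Fraction of the modeled rows that went into the training split."""
    n_train = dataset["training"]["X"][0]
    n_test = dataset["testing"]["X"][0]
    return n_train / (n_train + n_test)
```

With the values shown above, the split works out to roughly 75% training and 25% testing (11,169 and 3,724 rows), and Random Forest ranks first on both RMSE and R².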
Web Application
The web application for SkyPrice is developed using React and Next.js with Material-UI for a responsive and user-friendly interface. Key features of the application include:
- Price Prediction Tool: Users can input property details and get instant price predictions.
- Interactive Map: Integrated with KeplerGL to visualize apartment listings and nearby amenities, hospitals, schools, etc.
- Model and project information: Provides details about the models used, data sources, and methodology.
User Interface Design
The web application emphasizes usability and accessibility, with features like:
- Responsive Design: Ensures usability across devices.
- User-friendly Forms: Simplifies the input process for users.
- Visual Data Representation: Helps users easily interpret the data and predictions.
- Translatable Content: Supports multiple languages for a wider audience (Spanish, French, Portuguese, and English).
Price Prediction Tool
The price prediction tool allows users to input property details and get instant price estimates based on the trained models. Users can compare the predictions from different models and make informed decisions.
In the following image, you can see the price prediction tool form:
Now let’s see an example of the results obtained from the price prediction tool:
Currency Conversion
The prices are displayed in Mexican Pesos (MXN), but the application also provides a currency conversion feature that allows users to view the prices in other currencies, such as USD, EUR, or CAD.
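The conversion itself is a single division by the exchange rate. The sketch below assumes the rate is expressed as MXN per unit of the target currency; the post does not say where the application gets its rates, and the function name is hypothetical.

```python
def convert_from_mxn(amount_mxn: float, mxn_per_unit: float) -> float:
    """Convert an MXN amount using a rate given as MXN per 1 unit of the target currency."""
    if mxn_per_unit <= 0:
        raise ValueError("exchange rate must be positive")
    return round(amount_mxn / mxn_per_unit, 2)
```

For example, at a placeholder rate of 20 MXN per USD (not a real quote), `convert_from_mxn(6420750.0, 20.0)` gives 321037.5.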
Interactive Map
The interactive map provides a visual representation of apartment listings and nearby amenities, such as hospitals, schools, and supermarkets. Users can explore the map to understand the distribution of properties and their proximity to essential services.
By integrating KeplerGL, we were able to create an interactive map that provides valuable insights into the real estate market in Mexico City while also allowing users to explore the data in a user-friendly manner.
If you want to explore the interactive map, you can visit the SkyPrice website: https://skyprice.xyz.
Model Information
The web application also includes detailed information about the models used for price prediction, the data sources, and the methodology followed. This section provides transparency and helps users understand the process behind the price predictions.
In the following image, you can see the model information section, particularly the charts showing the models used and their performance:
Telegram Chatbot
SkyPrice includes a Telegram chatbot, “SkyPriceChatbot,” which uses GPT-3.5 for natural language processing. The chatbot connects to the SkyPrice API and provides users with instant price predictions. While the primary language for instructions is Spanish, the chatbot can interpret multiple languages.
You can access the chatbot in Telegram by searching for @SkyPriceChatbot or using the following URL: https://t.me/SkyPriceChatbot.
Setup and Features
- Easy Setup: Start using the chatbot by adding it on Telegram.
- Multilingual Support: Though instructions are in Spanish, the chatbot can understand other languages.
- Quick Predictions: Get instant price estimates by providing property details.
- Link to Web Application: Users can access the web application for more detailed information and analysis.
Example Interaction
The chatbot tries to extract the necessary information from the user’s input and provides a detailed response with price predictions. Here’s an example interaction:
User: “Quiero saber el precio de un departamento en Álvaro Obregón con 3 recámaras, 2 baños y 2 estacionamientos, de 140m² de terreno y 130m² de construcción, con 7 años de antigüedad.” [3]
SkyPriceChatbot: [4]
```text
🏢 Detalles del departamento extraídos:

📏 Tamaño del terreno: 140m²
🏗️ Tamaño de la construcción: 130m²
🛏️ Número de habitaciones: 3
🚽 Número de baños: 2.5
🚗 Número de estacionamientos: 2
🕰️ Antigüedad: 7 años
🌍 Alcaldía: Álvaro Obregón

💰 Precios estimados:

🌳 Random Forest: $6,420,750.00 MXN
📈 SVM: $6,164,968.58 MXN
🧠 Neural Network: $5,167,572.50 MXN

🔍 Puedes encontrar más detalles en https://skyprice.xyz 🏡
```
Methodology and Implementation
The chatbot’s functionality is divided into three main components:
- Natural Language Processing: The chatbot uses OpenAI’s GPT-3.5 for natural language processing to understand user queries and extract the required information for the SkyPrice API.
- API Integration: The chatbot connects to the SkyPrice API to retrieve price predictions based on the user’s input.
- Response Generation: The chatbot generates a detailed response with the extracted property details and price predictions.
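The three components above can be pictured as a small pipeline. In the sketch below the extraction and prediction steps are passed in as plain callables, so the GPT-3.5 call and the SkyPrice API call are deliberately left out. All names are hypothetical and this is not the chatbot's actual code; it only illustrates how the stages hand data to each other.

```python
from typing import Callable, Dict

Details = Dict[str, object]
Prices = Dict[str, float]


def format_reply(details: Details, prices: Prices) -> str:
    """Response generation: list the extracted details, then one price per model."""
    lines = ["Extracted apartment details:"]
    lines += [f"- {name}: {value}" for name, value in details.items()]
    lines.append("Estimated prices:")
    lines += [f"- {model.replace('_', ' ')}: ${price:,.2f} MXN"
              for model, price in prices.items()]
    return "\n".join(lines)


def handle_message(text: str,
                   extract: Callable[[str], Details],
                   predict: Callable[[Details], Prices]) -> str:
    """Run the three stages: NLP extraction, API prediction, response generation."""
    details = extract(text)    # 1. in the post this step is done by GPT-3.5
    prices = predict(details)  # 2. in the post this step calls the SkyPrice API
    return format_reply(details, prices)  # 3. build the user-facing message
```

Injecting the two callables keeps the formatting logic testable without network access or an OpenAI key.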
Conclusion
SkyPrice represents a comprehensive approach to predicting apartment prices in Mexico City, combining data science, machine learning, and web development. This project not only serves as a valuable tool for users but also showcases the integration of multiple technologies to solve real-world problems.
Impact and Lessons Learned
SkyPrice has demonstrated the potential of machine learning in real estate valuation. Throughout the project, we learned about the importance of data quality, feature engineering, and model evaluation.
Future Enhancements
SkyPrice has great potential for future enhancements, such as:
- Incorporating more sophisticated machine learning models: Exploring deep learning techniques for improved accuracy.
- Expanding to other cities: Adapting the model for use in other metropolitan areas.
- Improving user interaction: Enhancing the chatbot with more functionality, extending it to other messaging platforms, and adding better market-analysis tools to the web application.
- Integrating additional data sources: Incorporating more data for better predictions.
- Predicting other property types: Extending the model to predict the prices of houses, commercial properties, etc.
[1] E. Pagourtzi, V. Assimakopoulos, and T. Hatzichristos, “Real estate appraisal: a review of valuation methods,” J. Property Invest. Finance, vol. 21, no. 4, pp. 383–401, 2003.
[2] Z. Nedović-Budić, A. K. Pinto, and L. D. Budic, “GIS database development and exchange: Interaction mechanisms and motivations,” J. Urban Plan. Dev., vol. 126, no. 2, pp. 51–73, 2000.
[3] “I want to know the price of an apartment in Álvaro Obregón with 3 bedrooms, 2 bathrooms, and 2 parking spaces, 140m² of land and 130m² of construction, 7 years old.”
[4] English translation of the chatbot reply:
- 🏢 Apartment details extracted:
- 📏 Land size: 140m²
- 🏗️ Construction size: 130m²
- 🛏️ Number of bedrooms: 3
- 🚽 Number of bathrooms: 2.5
- 🚗 Number of parking spaces: 2
- 🕰️ Age: 7 years
- 🌍 Borough: Álvaro Obregón
- 💰 Estimated prices:
- 🌳 Random Forest: $6,420,750.00 MXN
- 📈 SVM: $6,164,968.58 MXN
- 🧠 Neural Network: $5,167,572.50 MXN
- 🔍 You can find more details at https://skyprice.xyz 🏡