---
title: Air Quality Test
emoji: π
colorFrom: gray
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---
Air Quality Forecast
Air pollution is a significant environmental concern, especially in urban areas, where high levels of nitrogen dioxide (NO2) and ozone (O3) can harm human health, ecosystems, and overall quality of life. Given these risks, monitoring and forecasting air pollution levels is important, as it allows timely action to reduce their harmful effects.
In the Netherlands, cities such as Utrecht face air-quality challenges caused by urbanization, transportation, and industrial activity. A system that provides accurate, robust real-time air-quality monitoring and reliable forecasts of future pollution levels would allow authorities and residents to take preventive measures and plan their activities around expected air quality. This project focuses on time-series forecasting of air pollution levels, specifically NO2 and O3 concentrations, for the next three days. The task is framed as a regression problem: the goal is to predict continuous concentration values from historical environmental data. The project also provides infrastructure for real-time prediction based on recent measurements.
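One way to picture the regression framing is to turn a daily series of (NO2, O3) readings into supervised pairs: each input row holds the previous few days, and each target row holds the next three days. The sketch below is a minimal illustration under assumptions this README does not state: daily resolution, a lag window of three days, and a hypothetical helper `make_supervised`. The features actually built by data_pipeline.py may differ.

```python
from typing import List, Sequence, Tuple

# One day of readings: (NO2, O3), in whatever units the source data uses.
Row = Tuple[float, float]


def make_supervised(
    daily: Sequence[Row], n_lags: int = 3, horizon: int = 3
) -> Tuple[List[List[float]], List[List[float]]]:
    """Turn a daily (NO2, O3) series into (X, y) regression pairs.

    Hypothetical helper: each X row flattens the previous `n_lags` days and
    each y row flattens the next `horizon` days, so a single multi-output
    regressor predicts 2 * horizon continuous targets.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    X: List[List[float]] = []
    y: List[List[float]] = []
    # t is the index of the first day to be forecast
    for t in range(n_lags, len(daily) - horizon + 1):
        X.append([v for day in daily[t - n_lags:t] for v in day])
        y.append([v for day in daily[t:t + horizon] for v in day])
    return X, y
```

With eight days of data, a three-day lag window and a three-day horizon, this yields three training pairs; every extra day of history adds one more pair.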
How To Run This Code
At present, the repository has completed the model development stage.
To run the data pipeline, run data_pipeline.py under air-quality-forecast, the folder that contains the project's source code. The processed and split datasets are saved under data/processed as x_train, x_val, x_test, y_train, y_val, and y_test.
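To inspect the six splits outside the project scripts, something like the following could be used. This is a minimal sketch that assumes each split is stored as `<name>.csv` with one header row and only numeric columns; the file format and the `load_splits` helper are assumptions, not part of the project.

```python
import csv
from pathlib import Path
from typing import Dict, List

SPLIT_NAMES = ["x_train", "x_val", "x_test", "y_train", "y_val", "y_test"]


def load_splits(processed_dir: str = "data/processed") -> Dict[str, List[List[float]]]:
    """Load the six split files (assumed: CSV, one header row, numeric cells)."""
    splits: Dict[str, List[List[float]]] = {}
    for name in SPLIT_NAMES:
        path = Path(processed_dir) / f"{name}.csv"  # ".csv" is an assumption
        with path.open(newline="") as fh:
            reader = csv.reader(fh)
            next(reader)  # skip the header row
            splits[name] = [[float(cell) for cell in row] for row in reader if row]
    # Sanity check: features and targets of each split must align row by row.
    for part in ("train", "val", "test"):
        if len(splits[f"x_{part}"]) != len(splits[f"y_{part}"]):
            raise ValueError(f"x_{part} and y_{part} have different row counts")
    return splits
```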
To open the MLflow dashboard used to track experiments, run model_development.py. It automatically starts a server on localhost port 5000. If this does not work, run
mlflow ui --port 5000
in your console. You might need to grant admin permissions to this process. The MLflow dashboard contains all information about the experiments run, including the hyperparameters selected for each model. The selected models can be found under the Models menu.
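Before falling back to `mlflow ui --port 5000`, it can help to check whether anything is already listening on that port. The hypothetical helper below uses only the standard library; note that it only confirms that some process accepts TCP connections on the port, not that the process is MLflow.

```python
import socket


def mlflow_ui_reachable(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # A successful handshake means a server is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    if mlflow_ui_reachable():
        print("A server is listening; try http://localhost:5000")
    else:
        print("Nothing on port 5000; try: mlflow ui --port 5000")
```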
To make predictions, run main.py. It displays the MSE and RMSE on the train and test data for all three models.
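For reference, the mean squared error is MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², and RMSE = √MSE, which is expressed in the same units as the pollutant concentrations. A minimal pure-Python sketch follows; how main.py aggregates errors across NO2, O3, and the three forecast days is not described here, so treating all targets as one flat list is an assumption.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))
```

For example, with true values `[1, 2, 3]` and predictions `[1, 2, 5]`, only the last residual is non-zero, so MSE is 4/3 and RMSE is about 1.155.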
DISCLAIMER
The notebooks in this project were used as scratch work for analysis and data merging and do not reflect our full methodology (the source code is under air-quality-forecast). Some extra scripts used to generate the plots in our report can be found under extra_scripts.
Project Organization
├── LICENSE <- Open-source license if one is chosen
├── Makefile <- Makefile with convenience commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── processed <- The final, canonical data sets for modeling. Contains the train/validation/test split.
│ └── raw <- The original, immutable data dump.
│
├── .github <- Contains automated workflows for reproducibility and flake8 checks.
│
├── docs <- TODO: A default mkdocs project; see www.mkdocs.org for details
│
├── mlruns <- Contains all the experiments run using MLflow.
│
├── mlartifacts <- Contains the artifacts generated by MLflow experiments.
│
├── notebooks <- Jupyter notebooks (not to be evaluated; the source code is in air-quality-forecast)
│
├── pyproject.toml <- Project configuration file with package metadata for
│ air-quality-forecast and configuration for tools like black
│
├── references <- TODO: Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- TODO: Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- TODO: Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.cfg <- Configuration file for flake8
│
├── configs <- Configuration folder for the hyperparameter search space (for now)
│
├── saved_models <- Folder with the saved models in `.pkl` and `.xgb` formats.
│
├── extra_scripts <- Some extra scripts in R and .tex to generate figures
│
└── air-quality-forecast <- Source code for use in this project.
    │
    ├── __init__.py <- Makes air-quality-forecast a Python module
    │
    ├── data_pipeline.py <- Loads, extracts, and preprocesses the data. Final result is the train/validation/test split under data/processed
    │
    ├── model_development.py <- Trains the three models using k-fold CV and Bayesian hyperparameter tuning; displays the MLflow dashboard if executed
    │
    ├── prediction.py <- Loads the models and makes an example prediction
    │
    ├── utils.py <- Utility functions, e.g. validation
    │
    └── main.py <- Entry point for executing the project. Currently used to make predictions.
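The tree above states that model_development.py uses k-fold cross-validation. As a rough sketch of what such a split looks like, the hypothetical helper `kfold_indices` below produces k contiguous, unshuffled folds, sized the same way as scikit-learn's `KFold(shuffle=False)` (the first `n_samples % k` folds get one extra sample). Whether the project shuffles, uses scikit-learn directly, or uses a time-ordered variant is not stated in this README.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds absorb the remainder, one sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop
```

For example, `kfold_indices(10, 3)` yields the validation folds `[0, 1, 2, 3]`, `[4, 5, 6]`, and `[7, 8, 9]`. In a typical setup, each candidate hyperparameter configuration is fit on the training indices of every fold and scored on the held-out fold, and the tuner compares configurations by their average validation score.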
--------
# air-quality-forecast
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference