
metadata
title: Air Quality Test
emoji: 📊
colorFrom: gray
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false

Air Quality Forecast

Air pollution is a significant environmental concern, especially in urban areas, where high levels of nitrogen dioxide (NO2) and ozone (O3) can harm human health, ecosystems, and overall quality of life. Given these risks, monitoring and forecasting air pollution levels is an important task, because it allows timely action to reduce their harmful effects.

In the Netherlands, cities such as Utrecht face air quality challenges due to urbanization, transportation, and industrial activity. A system that provides accurate, robust real-time air quality monitoring and reliable forecasts of future pollution levels would allow authorities and residents to take preventive measures and plan their activities around the expected air quality. This project focuses on time-series forecasting of air pollution levels, specifically NO2 and O3 concentrations, for the next three days. The task is framed as a regression problem: the goal is to predict continuous concentration values from historical environmental data. The project also provides infrastructure for real-time prediction based on recent measurements.
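The regression framing can be illustrated by turning a time-ordered series into supervised (input, target) pairs: each input row holds the most recent observations and each target row holds the next three values. This is a minimal sketch, assuming a single univariate daily series and a hypothetical helper `make_supervised`; the actual features and window length used in this project are not described in this README.

```python
from typing import List, Sequence, Tuple


def make_supervised(
    series: Sequence[float], n_lags: int, horizon: int = 3
) -> Tuple[List[List[float]], List[List[float]]]:
    """Turn a time-ordered series into (X, y) pairs for multi-step regression.

    Each row of X holds the previous `n_lags` values; the matching row of y
    holds the next `horizon` values (horizon=3 means the next three days).
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    X: List[List[float]] = []
    y: List[List[float]] = []
    # Slide a window of n_lags inputs followed by horizon targets over the series.
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(list(series[t - n_lags:t]))
        y.append(list(series[t:t + horizon]))
    return X, y


# Synthetic values for illustration only (not real measurements).
series = [float(v) for v in range(7)]
X, y = make_supervised(series, n_lags=2, horizon=3)
print(X[0], y[0])  # [0.0, 1.0] [2.0, 3.0, 4.0]
```

Each target row is a full three-day vector, so a model trained on these pairs predicts all three days at once; other strategies (one model per day, or recursive one-step forecasting) are equally possible.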

How To Run This Code

Currently, this repository has completed the model development stage.

To run the data pipeline, run data_pipeline.py in air-quality-forecast, the folder that contains the source code of this project. The processed and split datasets (x_train, x_val, x_test, y_train, y_val, y_test) are written to data/processed.
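Because the targets are future pollution levels, data like this is usually split chronologically rather than shuffled, so that validation and test rows come after the training rows and no future information leaks into training. The sketch below assumes a 70/15/15 ratio and a hypothetical helper `chronological_split`; this README does not state how data_pipeline.py actually splits the data.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    rows: Sequence[T], val_frac: float = 0.15, test_frac: float = 0.15
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered rows into train/val/test without shuffling."""
    if not (0 <= val_frac < 1 and 0 <= test_frac < 1 and val_frac + test_frac < 1):
        raise ValueError("fractions must be in [0, 1) and sum to less than 1")
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    train_end = n - n_val - n_test
    # The oldest rows train the model; the most recent rows are held out.
    train = list(rows[:train_end])
    val = list(rows[train_end:train_end + n_val])
    test = list(rows[train_end + n_val:])
    return train, val, test


# Row indices stand in for time-ordered samples.
train, val, test = chronological_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```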

To view the MLflow dashboard used to track experiments, run model_development.py. It automatically starts a server on localhost port 5000. If this does not work, run mlflow ui --port 5000 in your console; you might need to grant this process admin permissions. The MLflow dashboard contains all information about the experiments that were run, including the hyperparameters selected for each model. The selected models are listed under the Models menu.
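If you are unsure whether the dashboard server is already running, a small standard-library check can tell you whether anything is listening on port 5000 before you fall back to mlflow ui --port 5000. This is a sketch with a hypothetical helper `server_is_up`; it only tests the TCP connection and does not confirm that the listener is MLflow.

```python
import socket


def server_is_up(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection completes the TCP handshake or raises OSError
        # (for example connection refused or timeout).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if server_is_up():
        print("Something is listening; try http://127.0.0.1:5000 in a browser.")
    else:
        print("Nothing on port 5000; try: mlflow ui --port 5000")
```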

To run the prediction, run main.py. It displays the MSE and RMSE on the train and test data for all three models.
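For reference, MSE is the mean of the squared differences between true and predicted values, and RMSE is its square root, which puts the error back into the units of the target (the concentration units of NO2 or O3). A minimal sketch with hypothetical helper names `mse` and `rmse`; how main.py aggregates errors across the two pollutants and three forecast days is not stated in this README, so treating all targets as one flat list is an assumption.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, expressed in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


# Toy values: only the last prediction is off, by 4.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred))  # 4.0 2.0
```

Squaring penalizes a single large miss more than several small ones, which is why one error of 4 here yields an MSE of 4.0 rather than 1.0.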

DISCLAIMER

The notebooks in this project were used as scratch space for analysis and data merging, and do not reflect our full methodology (the source code is in air-quality-forecast). Additional scripts used to generate the plots in our report are in extra_scripts.

Project Organization

├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── processed      <- The final, canonical data sets for modeling. Contains the train-test split.
│   └── raw            <- The original, immutable data dump.
│
├── .github            <- Contains automated workflows for reproducibility and flake8 checks.
│
├── docs               <- TODO: A default mkdocs project; see www.mkdocs.org for details
│
├── mlruns             <- Contains all the experiments run using mlflow.
│
├── mlartifacts        <- Contains the artifacts generated by mlflow experiments.
│
├── notebooks          <- Jupyter notebooks (not to be evaluated, source code is in air-quality-forecast)
│
├── pyproject.toml     <- Project configuration file with package metadata for
│                         air-quality-forecast and configuration for tools like black
│
├── references         <- TODO: Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- TODO: Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- TODO: Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
├── configs            <- Configuration folder for the hyperparameter search space (for now)
│
├── saved_models       <- Folder with the saved models in `.pkl` and `.xgb`.
│
├── extra_scripts      <- Some extra scripts in R and .tex to generate figures
│
└── air-quality-forecast   <- Source code for use in this project.
    │
    ├── __init__.py             <- Makes air-quality-forecast a Python package
    │
    ├── data_pipeline.py        <- Loads, extracts, and preprocesses the data. Final result is the train-test split under data/processed
    │
    ├── model_development.py    <- Trains the three models using k-fold CV and Bayesian hyperparameter tuning; displays the MLflow
    │                              dashboard if executed
    │
    ├── prediction.py           <- Loads the models and makes an example prediction
    │
    ├── utils.py                <- Utility functions, e.g. validation
    │
    └── main.py                 <- Entry point of the project. Currently used to make predictions.

--------

# air-quality-forecast


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference