samueltckong/Predict-Customer-Churn-with-Clean-Code

Predict Customer Churn

This project is part of the ML DevOps Engineer Nanodegree by Udacity. It aims to predict customer churn using machine learning while demonstrating best practices in DevOps, including logging, testing, and reproducibility.


Table of Contents

  1. Project Description
  2. Dependencies
  3. Files and Data Description
  4. Running Files
  5. Testing and Logging
  6. Expected Outputs
  7. Additional Notes

Project Description

Customer churn is a critical problem for businesses in various industries. This project analyzes customer data and builds predictive models to identify potential churners. The workflow includes:

  • Data Preprocessing: Importing and preparing the dataset.
  • Exploratory Data Analysis (EDA): Visualizing trends and relationships in the data.
  • Feature Engineering: Transforming data to optimize model performance.
  • Model Training and Evaluation: Using logistic regression and random forest classifiers to predict churn.
  • DevOps Practices: Incorporating logging and testing to ensure maintainable and reproducible code.

Files and Data Description

The repository contains the following key files:

  • churn_library.py: Core library for data processing, EDA, feature engineering, and model training.
  • churn_script_logging_and_tests.py: Script to test the functions in churn_library.py and log the results.
  • requirements.txt: File specifying required libraries.
  • README.md: Project documentation.
  • Data Folder: Contains the input dataset (bank_data.csv).
  • Logs Folder: Stores log files generated during testing (churn_library.log).
  • Images Folder: Stores output images generated by EDA and model evaluation.
  • Models Folder: Stores trained models in pickle format.
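
As a minimal sketch of what "pickle format" means for the Models folder, the snippet below saves and reloads an object with Python's standard pickle module. The helper names save_model and load_model, the file name example_model.pkl, and the dictionary standing in for a trained classifier are all assumptions for illustration; churn_library.py may use different names or a different serializer such as joblib.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize `model` to `path`, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as file_handle:
        pickle.dump(model, file_handle)
    return path


def load_model(path):
    """Load a previously pickled object from `path`."""
    with Path(path).open("rb") as file_handle:
        return pickle.load(file_handle)


# Hypothetical stand-in for a trained classifier; any picklable object works.
placeholder_model = {"name": "placeholder", "coefficients": [0.1, -0.2]}

# Write to a temporary directory so the example does not touch ./models.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_model(
        placeholder_model, Path(tmp_dir) / "models" / "example_model.pkl"
    )
    restored_model = load_model(saved_path)
```

Only unpickle files you created yourself or otherwise trust, because loading a pickle can execute arbitrary code.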

Testing and Logging

The tests verify that the functions in churn_library.py work as expected, and the log records each result. To run the tests and generate the log:

  • Run the churn_script_logging_and_tests.py script from the command line:
    ipython churn_script_logging_and_tests.py

  • This command executes all test functions in the script. Each result, whether success or error, is logged to churn_library.log in the logs/ folder.

  • Open the churn_library.log file to review the log messages:
    cat logs/churn_library.log

The log file will include details about:

  • Successful execution of test cases.
  • Any errors encountered during testing.
  • Verification of file outputs, such as EDA images and model files.
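
The pattern described above (run a function, log success or failure, then surface the error) can be sketched with the standard logging module. This is an illustration under stated assumptions, not the contents of churn_script_logging_and_tests.py: import_data here is a hypothetical stand-in that only reads lines from a CSV file, and the logger name, message wording, and helper names build_logger and run_import_check are invented for the example.

```python
import logging
import tempfile
from pathlib import Path


def build_logger(log_path):
    """Return a logger (and its handler) that writes INFO and above to `log_path`."""
    logger = logging.getLogger("churn_tests_example")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(
        logging.Formatter("%(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger, handler


def import_data(path):
    """Hypothetical stand-in for a library function: read the lines of a CSV file."""
    with open(path, encoding="utf-8") as file_handle:
        return file_handle.read().splitlines()


def run_import_check(logger, path):
    """Log success or failure of import_data, then re-raise any failure."""
    try:
        rows = import_data(path)
        logger.info("Testing import_data: SUCCESS")
    except FileNotFoundError:
        logger.error("Testing import_data: file not found at %s", path)
        raise
    if len(rows) < 2:
        logger.error("Testing import_data: file has no data rows")
        raise AssertionError("expected a header and at least one data row")
    return rows


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "sample.csv"
    csv_path.write_text("id,churn\n1,0\n2,1\n", encoding="utf-8")
    log_path = Path(tmp_dir) / "example.log"

    logger, handler = build_logger(log_path)
    run_import_check(logger, csv_path)  # logs a success line
    try:
        run_import_check(logger, Path(tmp_dir) / "missing.csv")  # logs an error
    except FileNotFoundError:
        pass

    # Close the handler before reading so all messages are flushed to disk.
    handler.close()
    logger.removeHandler(handler)
    log_text = log_path.read_text(encoding="utf-8")
```

Logging the error before re-raising keeps a record in the log file while still letting a test runner see the failure.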

Expected Outputs

After running the testing script, you should observe the following outputs:

  • Log File: Located at logs/churn_library.log. Contains information about the test results for each function.

  • EDA Images: Located in the images/ folder. Includes visualizations such as distributions, box plots, and count plots.

  • Model Files: Located in the models/ folder. Includes trained models (logistic_model.pkl and rfc_model.pkl).

  • Evaluation Reports: Located in the images/ folder. Includes classification reports, ROC curves, and feature importance plots.
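
One way to confirm the outputs listed above is a small check script. The sketch below is an assumption-laden example, not part of the repository: the function name find_missing_outputs is invented, and because the individual image file names are not listed here, it only checks that the images folder contains at least one file. The demonstration runs in a temporary directory with empty placeholder files (including a hypothetical example_plot.png).

```python
import tempfile
from pathlib import Path

# File names taken from the list above; paths are relative to the project root.
EXPECTED_FILES = [
    "logs/churn_library.log",
    "models/logistic_model.pkl",
    "models/rfc_model.pkl",
]


def find_missing_outputs(project_root, expected_files, image_dir="images"):
    """Return a list of missing outputs; an empty list means every check passed."""
    root = Path(project_root)
    problems = [name for name in expected_files if not (root / name).is_file()]
    images = root / image_dir
    if not images.is_dir() or not any(images.iterdir()):
        problems.append(f"{image_dir}/ (no files found)")
    return problems


# Demonstration in a throwaway directory: check before and after creating files.
with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    missing_before = find_missing_outputs(root, EXPECTED_FILES)

    for relative in EXPECTED_FILES + ["images/example_plot.png"]:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()  # empty placeholder, not a real log, model, or plot

    missing_after = find_missing_outputs(root, EXPECTED_FILES)
```

After running the testing script, calling find_missing_outputs(".", EXPECTED_FILES) from the repository root should return an empty list if all outputs were produced.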


Dependencies

To run this project, ensure the following are installed:

  • Python: Version 3.7 or higher.
  • Libraries: Install all dependencies from the requirements file with pip:
    pip install -r requirements_py3.10.txt
