Height and Weight - Alanmg298/Kaggle-API-to-Clean-Data-in-Python GitHub Wiki
I realized that the data types of the Height and Weight columns are incorrect. Note that the values are a combination of numbers and characters, so the columns cannot be used as numbers yet.
My first approach was to delete the "cm" and "kg" text and convert those columns to numeric values, but that is not enough. The issue is that some height values are in feet/inches and some weight values are in lbs instead of cm and kg.
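To make the problem concrete, here is a minimal sketch of that first approach. The function name `naive_to_number` and the sample strings are made up for illustration and are not taken from the dataset.

```python
def naive_to_number(value):
    """First approach: strip the "cm"/"kg" text and parse what is left."""
    text = value.replace("cm", "").replace("kg", "").strip()
    try:
        return float(text)
    except ValueError:
        # Anything that is not a plain number after stripping ends up here
        return None


# Hypothetical sample values covering the formats described above
samples = ["180cm", "75kg", "6'2\"", "165lbs"]
results = {s: naive_to_number(s) for s in samples}

for raw, parsed in results.items():
    print(f"{raw!r} -> {parsed}")
```

The cm and kg values parse, while the feet/inches and lbs values come back as `None`, which is why each format needs its own conversion.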
This is my attempt to clean those columns. I added this piece of code.
What it does:
The Height conversion:
- Detects "cm" format → Directly converts to number
- Detects "in" format → Converts inches to cm (×2.54)
- Handles feet/inches (like "6'2") → Uses regex to extract numbers, converts feet to cm (×30.48) and inches to cm (×2.54)
- Rounds to whole numbers since height in cm doesn't need decimals
The Weight conversion:
- Detects "kg" format → Directly converts to number
- Detects "lbs" format → Converts pounds to kg (×0.453592)
- Rounds to one decimal for appropriate precision
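The rules above can be sketched in plain Python as follows. This is a reconstruction under assumptions, not necessarily the exact code used in the project: the function names `convert_height` and `convert_weight`, the regex pattern, and the choice to return `None` for unrecognised formats are all mine.

```python
import re


def convert_height(value):
    """Return height in whole centimetres, or None if the format is unknown."""
    text = str(value).strip().lower()
    if text.endswith("cm"):
        # Already in cm: drop the unit and round to a whole number
        return round(float(text[:-2]))
    if text.endswith("in"):
        # Inches to cm
        return round(float(text[:-2]) * 2.54)
    # Feet/inches such as 6'2 or 6'2": feet x 30.48 plus inches x 2.54
    match = re.fullmatch(r"(\d+)'\s*(\d+)\"?", text)
    if match:
        feet, inches = int(match.group(1)), int(match.group(2))
        return round(feet * 30.48 + inches * 2.54)
    return None


def convert_weight(value):
    """Return weight in kg rounded to one decimal, or None if the format is unknown."""
    text = str(value).strip().lower()
    if text.endswith("kg"):
        return round(float(text[:-2]), 1)
    if text.endswith("lbs"):
        # Pounds to kg
        return round(float(text[:-3]) * 0.453592, 1)
    return None


print(convert_height("6'2\""), convert_weight("165lbs"))
```

Assuming the data lives in a pandas DataFrame called `df`, the functions could be applied with `df["Height"].apply(convert_height)` and `df["Weight"].apply(convert_weight)`. A value with a unit but a malformed number would raise `ValueError` in this sketch, which at least makes bad rows visible.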
And now we can manipulate Height and Weight as numbers.
In this case, I use the mean to check that the cleaning worked: if the columns are numeric, the mean can be computed and should land in a plausible range.
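A minimal sketch of that kind of sanity check, using made-up sample values rather than the real columns. The helper `mean_ignoring_missing` and the plausibility bounds are assumptions chosen for illustration.

```python
from statistics import fmean

# Hypothetical cleaned values; None marks a row that could not be converted
heights_cm = [180, 178, 188, None, 170]
weights_kg = [75.0, 74.8, None, 68.5]


def mean_ignoring_missing(values):
    """Average the values that were converted successfully."""
    present = [v for v in values if v is not None]
    return fmean(present)


height_mean = mean_ignoring_missing(heights_cm)
weight_mean = mean_ignoring_missing(weights_kg)

# Rough plausibility bounds (an assumption): a mean far outside them
# would suggest that some units were not converted
height_ok = 100 < height_mean < 250
weight_ok = 30 < weight_mean < 200

print(height_mean, round(weight_mean, 1), height_ok, weight_ok)
```

With a pandas DataFrame the same check would typically be `df["Height"].mean()` and `df["Weight"].mean()`, which skip missing values by default.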