Data Preprocessing Pipeline (NumPy, Pandas)
Python data preprocessing pipeline using NumPy and Pandas that:
- Handles missing values
- Normalizes numerical data
- Encodes categorical data



What this pipeline does:
- Missing Values
- Age → replaced with mean
- Salary → replaced with median
- Name → replaced with
"Unknown"
- Normalization
- Scales numerical features (
Age
,Salary
) to a 0–1 range
- Scales numerical features (
- Encoding
- Converts
Department
into numeric labels
- Converts
Next Step: I can extend this into a function-based pipeline where you just pass a dataset, and it returns the cleaned version (like a mini scikit-learn
pipeline).