Taza Mind

A fusion of fresh thoughts and AI intelligence

Data Preprocessing Pipeline (NumPy, Pandas)

This post describes a Python data preprocessing pipeline, built with NumPy and Pandas, that:

  1. Handles missing values
  2. Normalizes numerical data
  3. Encodes categorical data

What this pipeline does:

  • Missing Values
    • Age → replaced with mean
    • Salary → replaced with median
    • Name → replaced with "Unknown"
  • Normalization
    • Scales numerical features (Age, Salary) to a 0–1 range using min-max scaling: (x − min) / (max − min)
  • Encoding
    • Converts Department into numeric labels
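
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the original code from the post: the column names (Name, Age, Salary, Department) follow the list, but the sample values are made up, and label encoding via the Pandas category dtype is one possible choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names follow the post, values are invented.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "Dana"],
    "Age": [25.0, np.nan, 35.0, 45.0],
    "Salary": [50000.0, 60000.0, np.nan, 80000.0],
    "Department": ["HR", "IT", "IT", "Finance"],
})

# 1. Missing values: mean for Age, median for Salary, placeholder for Name.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Salary"] = df["Salary"].fillna(df["Salary"].median())
df["Name"] = df["Name"].fillna("Unknown")

# 2. Min-max normalization: rescale each numeric column to the 0-1 range.
#    Assumes each column has at least two distinct values (max != min).
for col in ["Age", "Salary"]:
    col_min, col_max = df[col].min(), df[col].max()
    df[col] = (df[col] - col_min) / (col_max - col_min)

# 3. Label encoding: map each Department to an integer code.
#    Codes follow the sorted category order (Finance=0, HR=1, IT=2).
df["Department"] = df["Department"].astype("category").cat.codes

print(df)
```

Note that the integer codes imply an ordering between departments that does not really exist. That is usually fine for tree-based models; for linear models, one-hot encoding (for example with pd.get_dummies) may be a better fit.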

Next step: this can be extended into a function-based pipeline that takes a dataset and returns the cleaned version (like a mini scikit-learn pipeline).
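
One way such a function-based version might look is sketched below. The function name preprocess and its parameters (mean_cols, median_cols, text_cols, encode_cols) are hypothetical names chosen for this illustration, and the sample data is invented.

```python
import numpy as np
import pandas as pd


def preprocess(df, mean_cols=(), median_cols=(), text_cols=(), encode_cols=()):
    """Return a cleaned copy of df; the input frame is left unchanged."""
    out = df.copy()

    # Fill missing values according to each column's strategy.
    for col in mean_cols:
        out[col] = out[col].fillna(out[col].mean())
    for col in median_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in text_cols:
        out[col] = out[col].fillna("Unknown")

    # Min-max scale every numeric column that was imputed above.
    for col in (*mean_cols, *median_cols):
        span = out[col].max() - out[col].min()
        # A constant column has zero range; map it to 0.0 instead of dividing by zero.
        out[col] = (out[col] - out[col].min()) / span if span != 0 else 0.0

    # Label-encode categorical columns (a missing category becomes -1).
    for col in encode_cols:
        out[col] = out[col].astype("category").cat.codes

    return out


# Hypothetical usage with made-up values.
raw = pd.DataFrame({
    "Name": ["Alice", None, "Carol"],
    "Age": [20.0, np.nan, 40.0],
    "Salary": [40000.0, 50000.0, np.nan],
    "Department": ["HR", "IT", "HR"],
})
clean = preprocess(
    raw,
    mean_cols=["Age"],
    median_cols=["Salary"],
    text_cols=["Name"],
    encode_cols=["Department"],
)
print(clean)
```

Working on a copy keeps the raw data available for comparison. Unlike a scikit-learn pipeline, this sketch recomputes the mean, median, minimum, and maximum on whatever frame it receives, so it should not be applied separately to training and test data if the two need to share the same statistics.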
