Using scikit-learn, train a baseline classification model (e.g., Logistic Regression or Random Forest) on a sample dataset. Split the data into training and testing sets, fit the model, and print accuracy results.
Import Libraries
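- Import the scikit-learn components you need: a dataset loader from sklearn.datasets, a model class (e.g., LogisticRegression), train_test_split from sklearn.model_selection, and accuracy_score from sklearn.metrics.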
Load Dataset
- Use a built-in scikit-learn dataset (e.g., Iris, Breast Cancer, Digits) or your own dataset.
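As an illustration of the loading step, the sketch below assumes the Iris dataset; any of the other built-in loaders (or your own data) could be substituted.

```python
from sklearn.datasets import load_iris

# Assumption: Iris is used here purely as an example dataset.
iris = load_iris()

# The returned object bundles the feature matrix, labels, and metadata.
X, y = iris.data, iris.target
print(X.shape, y.shape)        # (150, 4) (150,)
print(iris.feature_names)      # names of the four measurement columns
print(iris.target_names)       # names of the three classes
```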
Split Data
- Divide the dataset into features (X) and target (y).
- Use train_test_split to split into training and testing sets (e.g., 80% train, 20% test).
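The split can be sketched as follows. The Iris dataset, the random_state value, and the use of stratify are assumptions made for this example, not requirements of the task.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# return_X_y=True gives the features (X) and target (y) directly.
X, y = load_iris(return_X_y=True)

# test_size=0.2 keeps 80% for training and 20% for testing.
# random_state makes the split reproducible; stratify=y keeps the
# class proportions the same in both subsets (both are optional).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```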
Choose a Model
- Select a baseline classifier (e.g., Logistic Regression, Random Forest, Decision Tree).
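One possible way to keep the choice of baseline flexible is to collect the candidate classifiers in a dictionary. The dictionary name, the keys, and the hyperparameter values below are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Hypothetical lookup table of baseline candidates.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

# Pick one as the baseline; swapping the key swaps the model.
model = candidates["logistic_regression"]
print(type(model).__name__)  # LogisticRegression
```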
Train the Model
- Fit the model using the training data (X_train, y_train).
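A minimal sketch of the fitting step, assuming Logistic Regression on Iris with an 80/20 split; max_iter=1000 is an assumed setting chosen to give the solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# fit() sees only the training portion; the test set stays untouched.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# After fitting, learned attributes end with an underscore.
print(model.classes_)     # class labels seen during training
print(model.coef_.shape)  # one row of coefficients per class
```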
Make Predictions
- Use the trained model to predict outcomes on the test set (X_test).
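Prediction on the held-out data might look like the sketch below, again assuming Logistic Regression on Iris. The call to predict_proba is an optional extra that shows per-class probabilities.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict() returns one class label per test row.
y_pred = model.predict(X_test)

# predict_proba() returns one probability per class for each row.
y_proba = model.predict_proba(X_test)

print(y_pred[:5])
print(y_proba[:2].round(3))
```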
Evaluate Performance
- Compare predictions with actual labels (y_test).
- Print metrics like accuracy score (and optionally precision, recall, F1).
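The evaluation step can be sketched as follows, under the same assumptions (Logistic Regression, Iris, 80/20 split). classification_report is used here as one convenient way to print precision, recall, and F1 per class.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy: fraction of test samples whose predicted label is correct.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")

# Optional: per-class precision, recall, and F1 in one text table.
report = classification_report(y_test, y_pred)
print(report)
```

Note the argument order: both functions take the true labels first and the predictions second.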
(Optional) Improve Model
- Try different classifiers, tune hyperparameters, or apply feature scaling to improve performance.
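As one possible take on the optional improvement step, the sketch below combines feature scaling and hyperparameter tuning in a single pipeline. The grid of C values and the 5-fold cross-validation are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Putting the scaler inside the pipeline means it is fitted on the
# training folds only, so no test information leaks into scaling.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Tuning against cross-validation folds of the training data, and touching the test set only once at the end, keeps the final accuracy an honest estimate.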