Experiment Tracking: MLflow
```python
import mlflow
import mlflow.sklearn
import matplotlib.pyplot as plt
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score, precision_score

mlflow.set_experiment("fraud-detection-v2")

with mlflow.start_run(run_name="xgboost-baseline"):
    # Log hyperparameters
    mlflow.log_params({"n_estimators": 200, "max_depth": 6, "lr": 0.1})

    model = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
    model.fit(X_train, y_train)

    # Log metrics
    mlflow.log_metrics({
        "auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
        "precision": precision_score(y_test, model.predict(X_test)),
    })

    # Log the model artifact and register it in the Model Registry
    mlflow.sklearn.log_model(model, "model", registered_model_name="FraudDetector")

    # Log a feature importance plot
    fig, ax = plt.subplots()
    ax.bar(range(len(model.feature_importances_)), model.feature_importances_)
    ax.set_title("Feature importance")
    mlflow.log_figure(fig, "feature_importance.png")
```
MLflow vs W&B: MLflow is open source and can be self-hosted, which suits enterprises with data-residency requirements. Weights & Biases (W&B) is cloud-hosted with polished dashboards and is popular with research teams. Both support the same core workflow: log params, metrics, and artifacts for each run, then compare runs to pick a winner.
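Whichever tracker you use, the payoff is comparing runs programmatically rather than eyeballing dashboards. A rough sketch in pure Python, with hypothetical run dicts standing in for the records a search API (such as `mlflow.search_runs`) would return:

```python
def best_run(runs, metric="auc"):
    """Return the run dict with the highest value for `metric`.

    Each run carries a params dict and a metrics dict, mimicking
    what an experiment tracker's search API returns.
    """
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no runs logged metric {metric!r}")
    return max(scored, key=lambda r: r["metrics"][metric])

runs = [
    {"run_id": "a1", "params": {"max_depth": 6}, "metrics": {"auc": 0.91}},
    {"run_id": "b2", "params": {"max_depth": 8}, "metrics": {"auc": 0.88}},
]
print(best_run(runs)["run_id"])  # → a1
```

The same selection logic is what a registry promotion step runs before tagging a model version for production.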
Model Serving: FastAPI

```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="Fraud Detection API", version="1.0")

# Load the model once at startup, not per request
model = joblib.load("model.pkl")

class PredictRequest(BaseModel):
    amount: float
    merchant_category: str
    hour_of_day: int
    is_international: bool

class PredictResponse(BaseModel):
    fraud_probability: float
    is_fraud: bool

@app.post("/predict", response_model=PredictResponse)
async def predict(req: PredictRequest):
    # NOTE: merchant_category would need the same encoding used at
    # training time before it can be fed to the model; omitted here.
    features = np.array([[req.amount, req.hour_of_day, int(req.is_international)]])
    prob = model.predict_proba(features)[0, 1]
    return PredictResponse(fraud_probability=float(prob), is_fraud=bool(prob > 0.5))

@app.get("/health")
async def health():
    return {"status": "ok"}

# Run: uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```
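The hard-coded 0.5 cutoff in `/predict` is rarely optimal for imbalanced fraud data, where false positives block legitimate customers. A minimal sketch (pure Python, hypothetical helper) of sweeping thresholds over held-out (probability, label) pairs to find the lowest cutoff that meets a target precision:

```python
def pick_threshold(scored, target_precision=0.9):
    """Return the lowest threshold whose precision on `scored`
    (a list of (probability, true_label) pairs) meets the target.

    Falls back to 0.5 if no threshold reaches the target.
    """
    thresholds = sorted({p for p, _ in scored})
    for t in thresholds:
        flagged = [(p, y) for p, y in scored if p >= t]
        if not flagged:
            break
        precision = sum(y for _, y in flagged) / len(flagged)
        if precision >= target_precision:
            return t
    return 0.5

# Held-out predictions: two true frauds, three legitimate transactions
holdout = [(0.95, 1), (0.80, 1), (0.70, 0), (0.40, 0), (0.20, 0)]
print(pick_threshold(holdout, target_precision=0.9))  # → 0.8
```

The chosen threshold would then be loaded by the service alongside `model.pkl` instead of being hard-coded.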
CI/CD for ML with GitHub Actions
```yaml
# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    branches: [main]

jobs:
  test-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - name: Install deps
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/ -v --cov=src
      - name: Train model (if data changed)
        run: python train.py --output models/model.pkl
      - name: Evaluate model
        run: python evaluate.py --min-auc 0.85  # gate on performance!
      - name: Build Docker image
        run: docker build -t fraud-api:${{ github.sha }} .
      - name: Deploy to AWS ECS
        run: |
          aws ecs update-service --cluster prod --service fraud-api \
            --force-new-deployment
```
Model Gate: Always add a performance threshold check in CI (e.g., AUC ≥ 0.85). This prevents deploying a worse model after a code change accidentally breaks feature engineering.
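The gate script itself can be small. A sketch of what a hypothetical `evaluate.py` might compute, using a rank-based AUC so the example needs no third-party dependencies (a real pipeline would use `sklearn.metrics.roc_auc_score` on the actual held-out set and pass the result to `sys.exit`):

```python
def auc(labels, scores):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gate(labels, scores, min_auc=0.85):
    """Return a process exit code: 0 passes the gate, 1 fails the CI step."""
    score = auc(labels, scores)
    print(f"AUC = {score:.3f} (threshold {min_auc})")
    return 0 if score >= min_auc else 1

# Stand-in predictions; a real evaluate.py would load the model and test set,
# then call sys.exit(gate(...)) so a regression fails the build.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
exit_code = gate(y_true, y_score, min_auc=0.85)
```

Because the step's exit code is nonzero on failure, GitHub Actions stops the job before the build and deploy steps run.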