Show HN: SecureML – Privacy and Compliance Toolkit for ML
Hi HN!
I'm a second-year law student with a deep fascination with AI governance and data protection. Over the past few months, I've been learning how machine learning and data privacy intersect, and I decided to build a tool that sits right at that intersection.
[GitHub: scimorph/secureml](https://github.com/scimorph/secureml) · [Docs](https://secureml.readthedocs.io/) · `pip install secureml`
---
### What is SecureML?
*SecureML* is an open-source Python library that integrates with PyTorch and TensorFlow to help developers *build privacy-preserving and regulation-aware AI systems*. It provides practical tools to comply with data protection laws like GDPR, CCPA, HIPAA, and Brazil’s LGPD.
It’s designed for both developers and researchers who want to make AI privacy-compliant without reinventing the wheel.
---
### Core Features
- *Data Anonymization*
  - k-anonymity with adaptive generalization
  - Format-preserving pseudonymization
  - Automatic sensitive-data detection
  - Taxonomy-based data generalization
- *Privacy-Preserving ML*
  - Differential privacy (via Opacus and TensorFlow Privacy)
  - Federated learning (via Flower) with secure aggregation
- *Compliance Checkers*
  - Analyze your datasets and ML pipelines for privacy risks
  - Built-in presets for GDPR, CCPA, HIPAA, and LGPD
- *Synthetic Data Generation*
  - High-fidelity synthetic datasets (statistical methods, GANs, copulas)
  - SDV integration with support for mixed data types and correlation preservation
- *Audit Trails & Reporting*
  - Logs and visual dashboards for traceability
  - Auto-generated HTML/PDF reports for compliance audits (sketch below)
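To give a feel for the audit side, here is a simplified sketch of attaching an audit trail to a pipeline step. The decorator name and arguments below are illustrative for this post rather than the exact API, so please check the docs for the real signatures:

```python
# Illustrative sketch only: audit_function and its arguments are
# simplified for this post; see the docs for the exact audit API.
from secureml import audit_function  # hypothetical name for illustration

@audit_function(operation="preprocess_patients", regulation="HIPAA")
def preprocess(df):
    # Each call is logged with inputs, timestamps, and outcome, so the
    # run can be reconstructed later in an HTML/PDF compliance report.
    return df.dropna(subset=["patient_id"])
```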
---
### Example Use Cases
- Check whether your model setup is compliant with GDPR:

  ```python
  from secureml import check_compliance

  # df is a pandas DataFrame; config describes the model/training setup
  report = check_compliance(data=df, model_config=config, regulation="GDPR")
  ```

- Anonymize a dataset before training:

  ```python
  from secureml import anonymize

  # k-anonymity: generalize records so each one is indistinguishable
  # from at least k-1 others on the identifying columns
  anon_df = anonymize(df, method="k-anonymity", k=5, sensitive_columns=["email", "ssn"])
  ```

- Train with differential privacy in PyTorch:

  ```python
  from secureml import differentially_private_train

  # epsilon is the privacy budget: lower values mean stronger privacy
  # guarantees at the cost of some model utility
  private_model = differentially_private_train(model=model, data=df, epsilon=1.0)
  ```

- Generate synthetic datasets for safe sharing:

  ```python
  from secureml import generate_synthetic_data

  # Fit an SDV copula model to df and sample 1,000 synthetic rows that
  # preserve column distributions and correlations
  synth = generate_synthetic_data(template=df, num_samples=1000, method="sdv-copula")
  ```
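Federated learning is the one core feature without a snippet above, so here is a rough sketch of its shape. The `federated_train` name and parameters are illustrative for this post, not the exact API (Flower handles the orchestration underneath):

```python
# Rough sketch: function name and parameters are illustrative, not the
# exact API. Training is orchestrated by Flower (flwr) under the hood,
# and secure aggregation masks individual client updates.
from secureml import federated_train  # hypothetical name for illustration

global_model = federated_train(
    model=model,                 # a PyTorch or TensorFlow model
    client_datasets=[df1, df2],  # one DataFrame per simulated client
    rounds=5,                    # federated averaging rounds
    secure_aggregation=True,
)
```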
---
### Why I built this
While studying law, I kept wondering how we could bridge the gap between legal theory and real-world ML practice. SecureML is my attempt to bring compliance tooling closer to ML workflows—making it easier for engineers to build responsibly and for companies to stay out of trouble.
This is my *first open-source project*, and I’d love your feedback, bug reports, and ideas!
---
### Looking for contributors
I’d love help expanding regulation support beyond GDPR, CCPA, HIPAA, and LGPD—especially for APPI, PIPEDA, and others. If you're into privacy, ML, or just want to hack on something useful, feel free to jump in.
---
Thanks for reading!
– @EnzoFanAccount (law student & privacy geek)