14 Principles To Secure Your Data Pipelines

Part 3 of the 7 Layers of MLOps Security Guide

A simple ETL flow using Airflow in AWS with S3, Redshift, and QuickSight

Series Links

What are data pipelines?

What is an orchestrator?

A choir conductor helps designate the flow of the songs
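To make the conductor analogy concrete, here is a minimal sketch of what an orchestrator does at its core: it takes tasks and their dependencies and runs them in a valid order. It uses only Python's standard library (`graphlib`, Python 3.9+) rather than Airflow's API, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after all of the tasks it depends on.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of task names
    that must finish before it starts.
    Returns the task names in the order they were executed.
    """
    order = []
    # static_order() yields every node after its predecessors and raises
    # graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical three-step ETL job; each step just records that it ran.
log = []
tasks = {
    "extract": lambda: log.append("read raw files"),
    "transform": lambda: log.append("clean and aggregate"),
    "load": lambda: log.append("write to warehouse"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

executed = run_pipeline(tasks, dependencies)
```

A real orchestrator such as Airflow adds scheduling, retries, and a UI on top of this ordering step, which is why access to it needs protecting.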

An example data pipeline

Simple Apache Airflow architecture: an ETL flow using Airflow in AWS

Protecting a data pipeline — 7 steps and 14 principles

Who would be involved in this flow?

What actions does each user take?

Our user actions
Don’t encourage your employees to sneak! | Looney Tunes

Understanding the platform: Airflow

Airflow GUI with different jobs (DAGs) | Apache
Airflow Architecture | Apache
Airflow component access

Protecting a managed version of Airflow

AWS Managed Airflow (MWAA) components | AWS
MWAA IAM policy | AWS
Role Assignments Airflow
MWAAFullConsoleAccess Snippet | AWS
Amazon MWAA Architecture | AWS
Monitoring a complex job with many operators in the Airflow UI | Airflow
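class UserAccount {
  id: string
  username: string
  passwordHash: string
  firstName: string
  lastName: string

  // ...

  // Only the id appears in the string form of the object.
  public toString() {
    // Backticks are required for ${...} interpolation in TypeScript.
    return `UserAccount(${this.id})`;
  }
}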
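One reading of the class above is that its string form exposes only the id, so an accidental log line cannot leak the password hash or the user's name. A rough Python equivalent, as a sketch under that assumption: the field names mirror the snippet in snake_case, and the sample values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class UserAccount:
    id: str
    # repr=False keeps a field out of the generated __repr__, which is
    # what print(), f-strings and %s log formatting fall back to.
    username: str = field(repr=False)
    password_hash: str = field(repr=False)
    first_name: str = field(repr=False)
    last_name: str = field(repr=False)


# Made-up sample values for illustration only.
account = UserAccount(
    id="42",
    username="jdoe",
    password_hash="not-a-real-hash",
    first_name="Jane",
    last_name="Doe",
)
rendered = str(account)  # "UserAccount(id='42')"
```

This only guards the default string form; code that logs `account.password_hash` directly would still leak it.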
Baby Yoda Touching Everything | Disney

Protecting a deployment — Airflow

GCP Composer Public Facing Architecture | GCP
postgresql:
  enabled: false
externalDatabase:
  type: postgres
  host: postgres.example.org
  port: 5432
  database: airflow_cluster1
  user: airflow_cluster1
  passwordSecret: "airflow-cluster1-postgres-password"
  passwordSecretKey: "postgresql-password"
  # use this for any extra connection-string settings, e.g. ?sslmode=disable
  properties: ""

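The values above point the deployment at an external Postgres instance and take the password from a Kubernetes secret instead of the values file. The same idea in application code, as a hedged sketch: the `build_postgres_uri` helper and the `AIRFLOW_DB_PASSWORD` variable name are hypothetical, and `sslmode=require` (one of the standard libpq SSL modes) is chosen here in place of the `disable` example in the comment.

```python
import os
from urllib.parse import quote, urlencode


def build_postgres_uri(host, port, database, user, password_env, sslmode="require"):
    """Build a Postgres connection URI, reading the password from the environment."""
    password = os.environ.get(password_env)
    if not password:
        raise RuntimeError(f"environment variable {password_env} is not set")
    # Percent-encode the credentials so characters such as '@' or '/'
    # cannot change how the URI is parsed.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{credentials}@{host}:{port}/{database}?{query}"


# Demo only: a real deployment would have the secret injected by the platform.
os.environ["AIRFLOW_DB_PASSWORD"] = "p@ss/word"
uri = build_postgres_uri(
    host="postgres.example.org",
    port=5432,
    database="airflow_cluster1",
    user="airflow_cluster1",
    password_env="AIRFLOW_DB_PASSWORD",
)
```

Keeping the password out of source and config files means it never lands in version control, and encoding it avoids a malformed URI when the secret contains reserved characters.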
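kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: intra-namespace
  namespace: airflow
spec:
  # An empty selector applies the policy to every pod in the namespace.
  podSelector: {}
  ingress:
  - from:
    # Only allow traffic from namespaces labelled name=airflow.
    - namespaceSelector:
        matchLabels:
          name: airflow
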
spec:
  serviceAccountName: Kubernetes_service-account

An end to end example

DataOps flow orchestrated by Airflow | MadeWithML
ML Model Training and Deployment Flow | MadeWithML
ML Model Update Flow | MadeWithML

Using other orchestrators

Conclusion

Next Time — Part 4: ML Model Security


ML Lead @ Voiceflow
