Full Time
2,500
40
Mar 17, 2026
About This Role
We're seeking a Data Engineer to help us architect and implement a modern, automation-first data platform that will transform how our organization manages and leverages data at enterprise scale. This is a hands-on engineering role focused on building self-operating, self-healing, and self-governing data systems.
The Challenge
- Scale: Hundreds of databases, TBs of data across multiple environments
- Complexity: Information systems, compliance, advanced analytics, geospatial analysis, personnel data
- Security: PII handling requiring automated classification and protection
- Goal: Reduce manual operations by 80-90% through intelligent automation
What You'll Build
Core Architecture Components
- Medallion Architecture (Bronze/Silver/Gold) with automated quality gates
- Governance Control Plane using Microsoft Purview + Databricks Unity Catalog
- Event-driven pipelines that trigger automatically on data arrival
- Self-healing systems with runbook automation and intelligent remediation
- Zero-touch security with policy inheritance and auto-classification
Key Technologies
- Microsoft Purview - Auto-discovery and PII classification across hundreds of databases
- Databricks Unity Catalog - Fine-grained access control and data lineage
- Azure Data Factory - Parallel pipeline orchestration for massive scale
- Databricks Auto Loader - Schema evolution and incremental processing
- Delta Live Tables - Declarative data quality with automated expectations
- Infrastructure as Code - Terraform/Bicep for complete automation
What We're Looking For (Must-Have Technical Skills)
- Databricks expertise: Delta Live Tables, Unity Catalog, Auto Loader, Workflows
- Microsoft Purview: Auto-discovery, classification, policy management
- Azure ecosystem: ADF, ADLS Gen2, Key Vault, Monitor, Policy
- DataOps practices: GitOps, CI/CD, infrastructure as code, automated testing
- Data quality frameworks: Building validation layers, reference data, constraint checking
- Event-driven architectures: Real-time data processing and automated triggers
- Security by design: PII classification, row/column-level security, encryption at rest and in transit
Automation Mindset - Critical
You believe that:
- Every manual process should be eliminated through intelligent automation
- Quality gates should be built-in, not bolted-on after the fact
- Security policies should inherit and cascade without human intervention
- Self-service should be the default for business users
- Infrastructure should be immutable and deployed through code
Required Experience
- 5+ years building enterprise data platforms at 100TB+ scale
- Proven track record implementing medallion architectures (Bronze/Silver/Gold)
- Hands-on experience with Databricks Unity Catalog and Microsoft Purview integration
- Expertise in data governance with automated classification and lineage tracking
- Experience with PII-heavy environments requiring regulatory compliance
- Strong background in event-driven data architectures
Preferred Qualifications
- Hybrid cloud experience (Azure primary, with on-premises integration)
- Machine Learning operations (MLflow, AutoML, Feature Store)
- Advanced SQL and Python for complex data transformations
- Natural language query interfaces and self-service analytics tools
- Cost optimization strategies for large-scale data processing
- Disaster recovery and backup automation for enterprise systems
What You'll Do Day-to-Day
Architecture & Design (40%)
- Design and implement medallion data architecture with automated quality gates
- Build a governance control plane using a hybrid Purview + Unity Catalog model
- Create comprehensive data quality frameworks with constraint validation and statistical profiling
- Implement Silver-layer validation logic to keep inaccurate data from reaching the Gold tier
- Build automated accuracy checks using reference data and business rule engines
- Create self-healing data pipelines with intelligent error handling and recovery
- Establish event-driven workflows that eliminate manual intervention
- Design self-service data access patterns for non-technical users
Hands-On Development (40%)
- Build Delta Live Tables with declarative quality expectations and constraint validation
- Implement automated data quality checks, including range validation and statistical outlier detection
- Create quality scoring systems that provide confidence metrics for AI and analytics consumption
- Develop reference data management systems for cross-validation and accuracy checking
- Build anomaly detection pipelines that automatically flag data quality issues
- Implement Auto Loader for schema evolution across enterprise data sources
- Create Infrastructure as Code templates for zero-touch deployments
- Develop automated data classification and PII masking workflows
- Build real-time monitoring and alerting with automated remediation
Platform Enablement (20%)
- Train the development team on DataOps best practices and automation patterns
- Create reusable quality components and validation templates for rapid development
- Establish data quality standards and automated testing frameworks
- Build quality dashboards and monitoring for business stakeholders
- Document self-service capabilities and governance procedures
- Mentor the team on "automation-first" architectural principles
Key Success Metrics
- Data onboarding time reduced from weeks or days to hours
- 100% automated data validation and quality checking
- Security policies applied in under 5 minutes for new data sources
- Self-service data access for business users
- 80-90% reduction in manual data operations
- Real-time compliance monitoring across all data assets
- Automated discovery and classification of new data sources
- Zero-touch pipeline deployments through GitOps workflows
Why This Role Matters
You'll be building the data infrastructure of the future - a platform that:
- Self-discovers new data sources and automatically applies governance
- Self-heals when issues occur, minimizing downtime and manual intervention
- Self-serves business users with secure, governed access to the data they need
- Self-scales based on demand without manual capacity planning
This isn't just about building pipelines - it's about creating an intelligent data ecosystem that operates with minimal human intervention while maintaining the highest standards of security, quality, and compliance.
Ready to Build the Future of Data?
If you're passionate about automation, governance, and self-service data platforms, and you have the technical skills to implement enterprise-scale solutions, we want to hear from you.
To be considered for this role, please submit your GitHub profile or a portfolio with samples of your previous projects. We will not review applications that do not include sample work.