Sonatype is the software supply chain security company. We provide the world’s best end-to-end software supply chain security solution, combining the only proactive protection against malicious open source, the only enterprise-grade SBOM management, and the leading open-source dependency management platform. This empowers enterprises to create and maintain secure, high-quality, innovative software at scale.
As founders of Nexus Repository and stewards of Maven Central, the world’s largest repository of Java open-source software, we are software pioneers, and our open-source expertise is unmatched. We empower innovation with an unparalleled commitment to building faster, safer software, harnessing AI and data intelligence to mitigate risk, maximize efficiency, and drive powerful software development.
More than 2,000 organizations (including 70% of the Fortune 100) and 15 million software developers rely on Sonatype to optimize their software supply chains.
The Opportunity
- We’re looking for a Senior Data Engineer to join our growing Data Platform team. You’ll play a key role in designing and scaling the infrastructure and pipelines that power analytics, machine learning, and business intelligence across Sonatype. You’ll work closely with stakeholders across product, engineering, and business teams to ensure data is reliable, accessible, and actionable. This role is ideal for someone who thrives on solving complex data challenges at scale and enjoys building high-quality, maintainable systems.
- As a Senior Data Engineer, you will architect, build, and maintain the data infrastructure that powers our analytics and data science efforts, partnering closely with Data Analysts, Data Scientists, and product teams. You’ll design scalable, reliable data pipelines, manage our Databricks and AWS environments, and ensure best practices for security, governance, and deployment.
Key Responsibilities
- Data Pipeline Development: Build and optimize ETL/ELT workflows using Spark (Scala and PySpark) on Databricks, ensuring high throughput and low latency for both batch and streaming data.
- Infrastructure as Code & Deployment: Define and manage infrastructure (Databricks workspaces, AWS resources, Unity Catalog configurations, S3 buckets, IAM roles, and service principals) using Terraform. Automate deployments and version control via GitHub, implementing CI/CD pipelines for data jobs.
- AWS Integration & SDK Usage: Use Python (boto3) to interact with AWS services, managing S3 data lakes, Secrets Manager for credential rotation, and SQS for event-driven processing.
- Security & Governance: Configure Unity Catalog and Databricks access controls, ensuring data lineage, fine-grained permissions, and auditability across environments.
- Collaboration & Support: Work hand-in-hand with Data Analysts to understand use cases (e.g., gold table creation, dashboard data requirements) and to troubleshoot pipeline or performance issues. Mentor junior engineers on Spark performance tuning, Python coding standards, and infrastructure management.
- Monitoring & Reliability: Implement robust monitoring, alerting, and logging for data workflows; proactively identify and resolve bottlenecks and failures.
- Performance Optimization: Profile and optimize Spark jobs (Scala and PySpark), tuning cluster configurations and query plans to reduce costs and improve processing times.
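As a flavor of the "gold table creation" mentioned above, here is a minimal sketch of a typical silver-to-gold aggregation step. It uses pandas for brevity (a production pipeline would run this as Spark on Databricks reading from Delta tables), and the column and table names are hypothetical, not Sonatype's actual schema:

```python
import pandas as pd

# Hypothetical "silver" scan events; in production this would be a Spark
# DataFrame read from an S3-backed Delta table, not an in-memory frame.
raw = pd.DataFrame({
    "org_id": ["a", "a", "b", "b", "b"],
    "component": ["log4j", "jackson", "log4j", "guava", "guava"],
    "vulnerable": [True, False, True, False, True],
})

# Gold-table step: filter to vulnerable rows, then count distinct
# vulnerable components per organization for dashboard consumption.
gold = (
    raw[raw["vulnerable"]]
    .groupby("org_id", as_index=False)
    .agg(vulnerable_components=("component", "nunique"))
)
# org "a" has 1 distinct vulnerable component, org "b" has 2.
```

The same filter/group/aggregate shape translates directly to the PySpark DataFrame API (`filter`, `groupBy`, `agg`) when the job moves onto a cluster.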
Qualifications & Skills
- 5+ years of hands-on experience in data engineering or a related role.
- Expert proficiency in Apache Spark—both Scala and PySpark—for large-scale data processing.
- Strong Python skills (including pandas) and practical experience with the AWS boto3 SDK.
- Deep familiarity with Databricks concepts: workspaces, clusters, jobs, Unity Catalog, and service principals.
- Solid understanding of AWS data services: S3, Secrets Manager, SQS, IAM, and network/security configurations.
- Experience with Terraform for defining and managing cloud infrastructure is a plus.
- Proficient with GitHub for version control, code reviews, and CI/CD.
- Excellent debugging and performance-tuning capabilities for distributed systems.
- Strong communication and collaboration skills; able to translate technical solutions into business value.
Preferred Qualifications
- Prior experience building and scaling data platforms in a mid-to-large enterprise environment.
- Familiarity with containerization (Docker) and orchestration tools (Airflow, Databricks Workflows).
- Knowledge of data governance frameworks and best practices (e.g., GDPR, CCPA compliance).
Why You’ll Love Working Here
- Data with purpose: Work on problems that directly impact how the world builds secure software
- Modern tooling: Leverage the best of open-source and cloud-native technologies
- Collaborative culture: Join a passionate team that values learning, autonomy, and impact
At Sonatype, we value diversity and inclusivity. We offer perks such as parental leave, diversity and inclusion working groups, and flexible working practices to allow our employees to show up as their whole selves. We are an equal-opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. If you have a disability or special need that requires accommodation, please do not hesitate to let us know.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.