Business Area:
Engineering
Seniority Level:
Mid-Senior level
Job Description:
At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.
We are seeking an experienced Staff Software Engineer to join our Replication Manager team at Cloudera. In this role, you will be responsible for designing, developing, and maintaining enterprise-grade data replication solutions that enable seamless data movement across hybrid and multi-cloud environments. You'll work on critical infrastructure that helps Fortune 500 companies manage their data lifecycle and migration strategies.
As a Staff Software Engineer, you will:
Design and implement scalable data replication services for HDFS, Hive, HBase, Apache Iceberg, and other big data technologies.
Develop robust APIs and microservices for the Replication Manager platform.
Lead complex technical initiatives involving data migration between on-premises clusters and cloud environments (AWS S3, Azure ADLS Gen2).
Architect fault-tolerant, distributed systems for high-volume, petabyte-scale data operations that deliver minimal downtime and strong data consistency guarantees.
Optimize replication performance and implement advanced features like bandwidth throttling, scheduling, and policy management.
Mentor junior engineers and provide technical guidance on best practices.
Implement monitoring, alerting, and observability features for replication jobs.
Ensure security and governance policies are maintained using tools like Apache Atlas.
Partner with Cloudera Data Platform (CDP) teams and collaborate with field engineering to address customer requirements and escalations.
Participate in code reviews and contribute to technical documentation.
We’re excited about you if you have:
8+ years of software engineering experience with distributed systems.
Strong proficiency in Java, Scala, or Python for backend development.
Deep understanding of Apache Hadoop ecosystem (HDFS, Hive, HBase, YARN).
Experience with modern data formats including Apache Iceberg, Delta Lake, and Hive ACID tables.
Experience with cloud platforms (AWS, Azure, GCP) and their storage services.
Proven experience designing and implementing large-scale distributed systems, understanding CAP theorem and data consistency models.
Knowledge of API design, microservices architecture, security protocols, and data governance frameworks.
Strong background in SDLC, agile methodologies, CI/CD pipelines, and automated testing frameworks.
You may also have:
Experience with Apache Ranger/Atlas or with the specifics of Apache Iceberg replication.
Knowledge of enterprise backup/disaster recovery, or a background in data migration/ETL.
Open-source contributions or experience in customer-facing support roles.
What you can expect from us:
Generous PTO Policy
Support for work-life balance with Unplugged Days
Flexible WFH Policy
Mental & Physical Wellness programs
Phone and Internet Reimbursement program
Access to Continued Career Development
Comprehensive Benefits and Competitive Packages
Employee Resource Groups
EEO/VEVRAA
#LI-Hybrid
#LI-AB1