Job Description

We are seeking an experienced Staff DevOps Engineer who is a self-starter with an entrepreneurial spirit to lead the design, implementation, and maintenance of our next-generation infrastructure and developer platforms. This role will focus on leveraging Kubernetes, Pulumi, Istio, AWS, and related technologies to deliver robust and scalable systems in a microservices environment. You will also play a critical part in shaping our CI/CD pipelines, IoT integrations, and developer tooling strategies to ensure efficient, reliable, and secure delivery of software services. If you thrive in a fast-paced setting, enjoy taking initiative, and are passionate about building innovative platform solutions, we'd love to speak with you.

- Automate and Refine: You'll spend most of your day automating and improving how we run things in AWS and Kubernetes, making sure our microservices (and IoT systems) run smoothly, securely, and cost-effectively.
- Support and Collaborate: You'll jump in with the dev teams, troubleshoot issues, and share best practices so everyone's more productive. Think of it as making the lives of your teammates easier through better tools and guidance.
- Strategize and Innovate: You'll also have a seat at the table for bigger decisions: suggesting new tech, tweaking our pipelines, or finding ways to streamline processes, so we keep pushing the envelope on performance and reliability.

Responsibilities

Infrastructure as Code (IaC) and Automation
- Develop and maintain infrastructure-as-code (IaC) solutions using Pulumi to automate provisioning, configuration, and lifecycle management of cloud resources.
- Collaborate with cross-functional teams to identify and automate manual processes, streamlining operations and CI/CD pipelines.

Microservices and CI/CD Pipeline
- Design, build, and maintain continuous integration and continuous deployment (CI/CD) pipelines for microservices.
- Ensure best practices for testing, observability, and security are built into the pipeline workflows.
- Identify and implement optimizations that reduce build, test, and deployment times while increasing reliability and quality.

Service Mesh and Network Management
- Lead the deployment and maintenance of Istio to enable efficient traffic management, observability, and security across microservices.
- Oversee the lifecycle management of Istio and related service mesh components, ensuring minimal downtime and seamless upgrades.

Cloud and IoT Integrations
- Architect and manage AWS services (e.g., EC2, EKS, Lambda, S3, IoT Core) for scalable and cost-effective solutions.
- Collaborate with IoT-focused teams to design and implement secure and robust data ingestion, storage, and processing pipelines.

Developer Tooling and Enablement
- Own the developer experience by providing and maintaining internal tools, environments, and frameworks that boost productivity and quality.
- Work closely with software engineers to integrate best practices, guidelines, and tools that streamline development workflows.

Monitoring and Reliability
- Implement observability solutions (logging, monitoring, alerting) for critical platform components using tools like Prometheus, Grafana, ELK, or equivalent.
- Define SLOs/SLAs and ensure the platform meets or exceeds reliability standards.

Qualifications

We are seeking a seasoned infrastructure engineer with 8+ years demonstrating knowledge and experience in:

Education and Experience
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent hands-on experience).
- 5+ years of experience in software engineering, DevOps, or platform engineering roles, with proven leadership in large-scale system design and operations.

Technical Skills
- Kubernetes: Deep experience in architecture, scaling, and operating Kubernetes clusters in production environments.
- Python: Tooling and IaC development use cases.
- CI/CD: Proficiency with tools like Jenkins, GitLab CI, GitHub Actions, or similar, specifically geared toward microservices deployment.
- Pulumi / IaC: Hands-on expertise in Pulumi or similar IaC tools (Terraform, AWS CloudFormation) for cloud resource provisioning.
- Istio / Service Mesh: Proven track record in deploying and managing Istio or similar service mesh solutions.
- AWS: Strong knowledge of core AWS services (EC2, ECS/EKS, Lambda, S3, RDS, IoT Core) and best practices (networking, security, cost optimization).
- IoT: Familiarity with IoT technologies, protocols, and data ingestion strategies.
- Developer Tooling: Experience with build tools, code quality metrics, package management, and integrated developer experiences.

Soft Skills and Attributes
- Self-Starter: Demonstrated initiative in identifying problems, owning solutions, and driving innovation.
- Leadership: Ability to influence and guide teams through complex technical decisions and transformations.
- Collaboration: Effective communication and interpersonal skills to work across diverse teams (software, QA, product, security).
- Problem-Solving: Proven ability to diagnose, troubleshoot, and resolve complex issues in distributed systems.
- Adaptability: Comfortable working in a fast-paced, rapidly changing environment with evolving priorities.