On-site
Full Time
Key Responsibilities
• Lead and evolve our cloud infrastructure and reliability engineering practices.
• Build and improve internal tooling and automation using Golang or Python.
• Manage and support Kafka clusters used for event-driven architectures and data pipelines.
• Champion infrastructure-as-code using tools like Terraform, Helm, or Pulumi.
• Own observability: implement and maintain logging, monitoring, and alerting systems.
• Drive incident response processes, perform root cause analyses, and lead retrospectives.
• Collaborate with cross-functional teams to align infrastructure with product and data engineering needs.
• Mentor junior team members and drive a culture of reliability, automation, and performance.

Requirements
Must-Have:
• 6+ years of experience in Infrastructure, SRE, or DevOps roles.
• Proficiency in Python or Golang for automation and tool development.
• Experience with Kafka in a production setting (operational knowledge, not expert level).
• Hands-on experience with cloud platforms (AWS, GCP, or Azure).
• Experience with CI/CD pipelines and deployment automation (GitHub Actions, Jenkins, etc.).
• Strong background in Linux systems, containers, and orchestration with Kubernetes.
• Familiarity with observability tools (Prometheus, Grafana, ELK, Datadog, etc.).
HyreSnap
India
Salary: Not disclosed