Job Description
Senior Software Engineer, Site Reliability Engineer (SRE)
Location: Bangalore

About Us
Founded in 2014, Circles is a global technology company reimagining the telco industry with its innovative SaaS platform, empowering telco operators worldwide to effortlessly launch innovative digital brands or refresh existing ones, accelerating their transformation into techcos. Today, Circles partners with leading telco operators across 14 countries and 6 continents, including KDDI Corporation, Etisalat Group (e&), AT&T, and Telkomsel, creating blueprints for future telco and digital experiences enjoyed by millions of consumers globally.

Besides its SaaS business, Circles operates two distinct businesses:

Circles.Life: A wholly-owned digital lifestyle telco brand based in Singapore, Circles.Life is powered by Circles’ SaaS platform and pioneering go-to-market strategies. It is the digital market leader in Singapore and has won numerous awards for marketing, customer service, and innovative product offerings beyond connectivity.

Jetpac: Specializing in travel tech solutions, Jetpac provides seamless eSIM roaming for over 200 destinations and innovative travel lifestyle products, redefining connectivity for digital travelers. Jetpac was awarded Travel eSIM of the Year.

Circles is backed by renowned global investors, including Peak XV Partners (formerly Sequoia), Warburg Pincus, Founders Fund, and EDBI (the investment arm of the Singapore Economic Development Board), with a track record of backing industry challengers.

About This Role
This role reports to a leader in Platform Engineering & Site Reliability Engineering.
The engineer in this role builds resilient cloud systems, automates operations, and ensures high reliability through robust monitoring and performance optimization.

What You'll Do
Implement highly available, fault-tolerant applications and infrastructure, leveraging cloud-native technologies.
Establish robust monitoring, logging, and alerting systems for proactive issue detection and resolution.
Automate infrastructure management, deployment, and provisioning to enhance efficiency and consistency.
Manage incident response, conduct post-incident reviews, and implement preventative measures.
Mentor engineers and drive the evolution of teams towards a modern, collaborative SRE culture.
Use advanced tools (VPA, HPA) for smart resource optimization ("rightsizing").
Create reusable Kubernetes components (Helm charts, stacks, operators) to speed up deployments and management of Kubernetes.
Automate resource adjustments to maximize performance and minimize costs.
Drive efficient, scalable Kubernetes infrastructure through automation and best practices.

What We Are Looking For
8+ years of experience, including at least 5 years in Kubernetes, site reliability engineering, and end-to-end observability.
Efficiently allocate resources, reduce costs, and enhance performance.
Streamline deployments and processes using Terraform, Ansible, Helm, and Kubernetes.
Implement observability solutions to track performance, detect issues, and ensure reliability.
Apply best practices for security, governance, and regulatory compliance.
Stay ahead with cutting-edge Kubernetes tools and best practices.
Work closely with development, security, and operations teams while mentoring junior engineers.
Conduct performance tuning, capacity planning, and load testing for efficiency.
Optimize cloud-native and hybrid architectures across on-premises, AWS, OCI, and GCP.