At NiCE, we don’t limit our challenges. We challenge our limits. Always. We’re ambitious. We’re game changers. And we play to win. We set the highest standards and execute beyond them. And if you’re like us, we can offer you the ultimate career opportunity that will light a fire within you.
So, what’s the role all about?
We are looking for a highly skilled and motivated Site Reliability Engineering (SRE) Manager to lead a team of SREs in designing, building, and maintaining scalable, reliable, and secure infrastructure and services. You will work closely with engineering, product, and security teams to improve system performance, availability, and developer productivity through automation and best practices.
How will you make an impact?
- Build server-side software using Java.
- Lead and mentor a team of SREs; support their career growth and ensure strong team performance.
- Drive initiatives to improve availability, reliability, observability, and performance of applications and infrastructure.
- Establish SLOs/SLAs and implement monitoring systems, dashboards, and alerting to measure and uphold system health.
- Develop strategies for incident management, root cause analysis, and postmortem reporting.
- Build scalable automation solutions for infrastructure provisioning, deployments, and system maintenance.
- Collaborate with cross-functional teams to design fault-tolerant and cost-effective architectures.
- Promote a culture of continuous improvement and reliability-first engineering.
- Participate in capacity planning and infrastructure scaling.
- Manage on-call rotations and ensure incident response processes are effective and well-documented.
- Work in a fast-paced, fluid landscape while managing and prioritizing multiple responsibilities.
Have you got what it takes?
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 10+ years of overall experience in SRE/DevOps roles, with at least 2 years managing technical teams.
- Proficiency in at least one programming language (e.g., Python, Go, Java, C#) and experience with scripting languages (e.g., Bash, PowerShell).
- Deep understanding of cloud computing platforms (e.g., AWS), including the workings and reliability constraints of prominent services (e.g., EC2, ECS, Lambda, DynamoDB).
- Experience with infrastructure-as-code tools such as CloudFormation or Terraform.
- Deep understanding of CI/CD concepts and experience with CI/CD tools such as Jenkins, GitLab CI/CD, or CircleCI.
- Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK).
- Working experience with the Grafana observability suite (Loki, Mimir, Tempo).
- Experience implementing the OpenTelemetry protocol in a microservices environment.
- Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
- Experience with incident management and blameless postmortems, including driving incident response, resolution, and communication across cross-functional teams during outages and other critical incidents.
Good to have skills:
- Hands-on experience working with large Kubernetes clusters. Certification is a plus.
- Administration and/or development experience with standard monitoring and automation tools such as Splunk, Datadog, PagerDuty, and Rundeck.
- Familiarity with configuration management tools like Ansible, Puppet, or Chef.
- Certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or equivalent.
About NiCE
NICE Ltd. (NASDAQ: NICE) software products are used by 25,000+ global businesses, including 85 of the Fortune 100 corporations, to deliver extraordinary customer experiences, fight financial crime and ensure public safety. Every day, NiCE software manages more than 120 million customer interactions and monitors 3+ billion financial transactions.
Known as an innovation powerhouse that excels in AI, cloud and digital, NiCE is consistently recognized as the market leader in its domains, with over 8,500 employees across 30+ countries.
NiCE is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, age, sex, marital status, ancestry, neurotype, physical or mental disability, veteran status, gender identity, sexual orientation or any other category protected by law.