So, what's the role all about?
NICE is looking for a Site Reliability Engineer. Candidates will support large, complex enterprise software clients, including applications, servers, SQL, and networks, and must have excellent problem-solving skills. As we expand our customer deployments, we are seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
How will you make an impact?
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
Have you got what it takes?
- 2+ years of programming/scripting experience with any of the following: Go, Python, .NET (C#), Node.js
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 2-3 years of working experience in a similar role, with a focus on systems engineering, automation, and reliability.
- Proficiency in at least one programming language (e.g., Python, Go, Java, C#) and experience with scripting languages (e.g., Bash, PowerShell).
- Deep understanding of cloud computing platforms (e.g., AWS), including how prominent services (e.g., EC2, ECS, Lambda, DynamoDB) work and their reliability constraints.
- Experience with infrastructure-as-code tools such as CloudFormation or Terraform.
- Deep understanding of CI/CD concepts and experience with CI/CD tools such as Jenkins, GitLab CI/CD, or CircleCI.
- Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, CloudWatch).
- Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
- Experience with incident management and blameless postmortems, including driving incident response during outages and other critical incidents, resolution, and communication in a cross-functional team setup.
- Readiness to work the graveyard shift.
You will have an advantage if you also have:
- Kubernetes (with certification), Grafana
Enjoy NICE-FLEX!
Requisition ID: 9566
Reporting into: Tech Manager
Role Type: Individual Contributor