Job Description
Site Reliability Engineering (SRE) blends development and operations expertise to improve organizational efficiency. Whether you come from a development background and want to go deeper into operations, or you are a DevOps/Systems Engineer keen on building internal tools, your skills can benefit Cvent SRE. We are looking for people with a passion for continuous learning and technology. A Bachelor's or Master's degree in Computer Science or a related technical field is required for this role.

As part of our team, you will play a key role in ensuring the stability and robustness of our platform. We remove barriers by promoting developer accountability and enabling developer autonomy, and we support developers by devising innovative, durable solutions to operational challenges. As generalists, we work closely with product development teams, from the initial design phase through to identifying and fixing production issues.

Our approach involves establishing and upholding standards while fostering an agile, knowledge-sharing culture. By embracing SRE principles such as blameless postmortems and operational load caps, we continually improve both our skills and our quality of work life. Our team is passionate about automation, continuous learning, and dynamic day-to-day operations.
**Must Have:**
- 7-9 years of relevant experience
- Proficiency in SDLC methodologies, especially Agile software development
- Strong background in software development, with solid knowledge of Java/Python/Ruby and Object-Oriented Programming concepts
- Hands-on experience managing AWS services and operating applications within AWS
- Proficiency in configuration management tools such as Chef, Puppet, Ansible, or equivalent
- Sound Windows and Linux administration skills
- Familiarity with APM, monitoring, and logging tools such as New Relic, DataDog, Splunk
- Expertise in managing 3-tier application stacks and incident response
- Experience with build tools such as Jenkins, CircleCI, Harness, etc.
- Exposure to containerization and orchestration technologies such as Docker, ECS, EKS, Kubernetes
- Working knowledge of NoSQL and relational databases such as MongoDB, Couchbase, Postgres, etc.
- Self-motivation and the ability to work independently

**Good to Have:**
- Understanding of F5 load balancing concepts
- Basic knowledge of observability, SLIs/SLOs
- Familiarity with message queues such as RabbitMQ
- Knowledge of basic networking concepts
- Experience with artifact repository managers such as Nexus, Artifactory, or equivalent
- Strong communication skills
- Previous experience in people management