Client Server Software Specialist

3 - 7 years

0 Lacs

Posted: 1 day ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview:
As an Incident Manager, your primary responsibility will be to monitor, identify, and triage production incidents in cloud-based software systems. You will work closely with developers to diagnose defects, optimize system performance, and implement corrective actions. You will also take part in incident escalation, automated monitoring, and post-incident reviews so that incidents are handled quickly and efficiently.

Key Responsibilities:
- Monitor, identify, and triage production incidents to assess impact, root cause, and potential resolution paths.
- Troubleshoot cloud-based software systems in detail to diagnose complex defects and implement corrective actions.
- Manage incident escalation processes, ensuring timely communication and coordination with relevant teams.
- Collaborate with developers to resolve bugs, optimize system performance, and deploy hotfixes as needed.
- Analyze logs, error reports, and monitoring data to identify patterns and proactively mitigate potential issues.
- Implement automated monitoring and alerting solutions to detect anomalies and streamline incident response.
- Document incident response processes, including root cause analysis and preventive measures.
- Participate in an on-call rotation to provide 24/7 support for critical incidents.
- Develop and maintain knowledge base articles, playbooks, and incident runbooks for common issues.
- Contribute to post-incident reviews, identifying areas for improvement in monitoring, response, and resolution processes.

Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).
- 3+ years of experience in software engineering, with a focus on incident management and resolution in cloud environments.
- Strong proficiency in Node.js, including debugging, error handling, and performance optimization.
- Experience with cloud platforms (AWS, Azure, or GCP), including monitoring and troubleshooting cloud-native applications.
- Proficiency with logging frameworks (e.g., Winston, Bunyan) and monitoring tools (e.g., Datadog, ELK Stack, CloudWatch).
- Strong problem-solving skills and the ability to perform well in high-pressure, time-sensitive situations.
- Experience with CI/CD pipelines and automated deployments (e.g., Jenkins, GitLab CI, AWS CodePipeline).
- Excellent communication and documentation skills, with a focus on clear incident reporting and knowledge transfer.
- Ability to work effectively in a cross-functional team, collaborating with developers, DevOps, and product owners.
- Written and spoken proficiency in English.

Preferred Skills:
- Experience with containerization (Docker, Kubernetes).
- Knowledge of REST APIs, WebSockets, and microservices architecture.
- Familiarity with incident management frameworks and practices (e.g., ITIL, SRE).
- Understanding of security best practices in cloud-based systems.