Incident Response Engineer

2 - 5 years

2 - 5 Lacs

Posted: 14 hours ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem, comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets, to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.

Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.

About Job

  • Role Title: Incident Response Engineer
  • Role Overview: As an Incident Response Engineer at Centific, you will be responsible for handling and mitigating critical system incidents, ensuring minimal downtime and rapid recovery of services. This role involves working with cross-functional teams to detect, analyze, and resolve incidents efficiently. You will be required to improve incident handling processes, develop automated response strategies, and maintain documentation of all incidents and resolutions. Your expertise in managing real-time operational incidents and post-incident analysis will play a critical role in maintaining system stability and business continuity.
  • This is a hands-on role that requires deep knowledge of incident response frameworks, system troubleshooting, security monitoring, and automated remediation.

Key Responsibilities:

Incident Detection & Monitoring:

  • Implement real-time incident detection using tools like PagerDuty/Opsgenie/VictorOps for on-call alerting and escalations.
  • Monitor system health, logs, and telemetry using Splunk/Elastic Stack (ELK)/Sentry/Grafana Loki to identify early warning signs of system failures.
  • Configure and fine-tune SIEM solutions (Splunk/Graylog/Wazuh) for log-based security and operational threat detection.

Why Join Centific

  • High-Impact Role: Be at the forefront of mitigating critical system incidents and ensuring business continuity.
  • Cutting-Edge Technology: Work with modern automation, monitoring, and security tools.
  • Global Exposure: Collaborate with teams supporting enterprise-scale infrastructure worldwide.
  • Career Growth: Access to security certifications, SRE training, and industry-leading upskilling programs.
  • Work-Life Balance: Hybrid work model, shift flexibility, and wellness programs.

Skills:

  • Ability to remain calm under pressure and manage incidents in high-stress environments.
  • Ownership and accountability in resolving incidents from detection to closure, including post-mortem analysis.
  • Strong coordination skills to communicate incident status clearly with engineers, leadership, and external teams.
  • Process-oriented thinking to follow structured incident response playbooks and continuously improve workflows.
  • Ability to make rapid decisions in time-sensitive scenarios to minimize downtime and mitigate risks.

Good-to-have Qualifications:

  • Certifications: GIAC Certified Incident Handler (GCIH), AWS Certified Security - Specialty, or Certified Information Systems Security Professional (CISSP).
  • Threat Hunting & Detection: Knowledge of MITRE ATT&CK framework and threat intelligence integration.
  • Chaos Engineering: Hands-on experience with Gremlin/LitmusChaos for incident testing and resilience validation.
  • Network Troubleshooting: Understanding of packet analysis, firewall logs, and intrusion detection systems (IDS/IPS).
  • Disaster Recovery Planning: Experience in business continuity planning and disaster recovery (BCP/DR) testing.

Must-Have Qualifications:

  • Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • Experience: 3+ years of hands-on experience in incident response, system monitoring, and operational troubleshooting.
  • Monitoring & Alerting Expertise: Proficiency with PagerDuty/Opsgenie/VictorOps/Splunk/Elastic Stack (ELK)/Sentry/Grafana Loki.
  • Incident Response & RCA: Experience conducting root cause analysis (RCA) and post-mortem reviews.
  • Automation & Scripting: Hands-on experience with Python/Bash/Ansible to develop automation scripts for incident resolution.
  • Security Incident Handling: Familiarity with SIEM tools (Splunk/Graylog/Wazuh) and forensic analysis tools (TheHive/Velociraptor).
  • CI/CD & Incident Remediation: Understanding of automated rollback strategies, self-healing systems, and deployment remediation.

Collaboration & Training:

  • Coordinate incident response drills and tabletop exercises to improve team readiness.
  • Train operational teams in incident detection, escalation, and response best practices.
  • Work closely with SRE, DevOps, and Observability Engineers to optimize response workflows and improve system observability.

Security Incident Response & Compliance:

  • Work with security teams to investigate and mitigate security-related incidents.
  • Conduct forensic analysis on compromised systems and logs using TheHive/Velociraptor/Splunk SOAR.
  • Ensure compliance with standards such as GDPR, HIPAA, and ISO 27001 in incident handling and logging.
  • Implement threat intelligence feeds to stay ahead of emerging security threats.

Automated Remediation & Incident Prevention:

  • Develop self-healing automation using Ansible/Python/Bash to proactively remediate common failures.
  • Implement automated rollback and recovery mechanisms within CI/CD pipelines to reduce impact during deployments.
  • Integrate AI-driven anomaly detection to proactively detect and prevent potential failures before they escalate.
  • Test and deploy chaos engineering tools (Gremlin/LitmusChaos) to validate system resilience under stress conditions.

Root Cause Analysis & Post-Incident Review:

  • Conduct post-mortem analysis and root cause analysis (RCA) for all major incidents.
  • Work closely with SRE and security teams to identify persistent failure patterns and recommend long-term fixes.
  • Document all incident reports, mitigation steps, and RCA findings to enhance organizational learning and incident prevention.
  • Improve incident classification to differentiate between operational failures, security breaches, and performance degradation.

Incident Response & Mitigation:

  • Respond to critical incidents in a 24/7 shift rotation, ensuring minimal downtime and quick service recovery.
  • Follow standardized Incident Response Playbooks to handle various system failures, security incidents, and infrastructure outages.
  • Develop and maintain incident triage and escalation processes, ensuring clear handoffs between teams.
  • Implement runbook automation to execute predefined mitigation steps for common incidents.

Centific

IT Services and IT Consulting

Redmond, Washington

Bengaluru, Karnataka, India