Production Support Engineers

3 years

0 Lacs

Posted: 5 days ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

EverAI Labs


We are seeking an experienced SRE/Production Support Engineer to join our dynamic team and ensure the seamless operation of our EverAI Suite products. In this role, you will provide 24/7 production support, troubleshoot issues, monitor system performance, and collaborate with development teams to maintain high availability and reliability. This position is ideal for problem-solvers who thrive in fast-paced environments and are passionate about AI technologies.


Key Responsibilities

  • Monitor production environments for EverAI Suite products (EverAI Simulator, EverAI Recruiter, and EverAI Knowledgeminer) using tools like Splunk, Prometheus, Grafana, ELK Stack, or similar monitoring systems.
  • Respond to incidents, alerts, and user-reported issues in a timely manner, performing root cause analysis and implementing fixes or workarounds.
  • Collaborate with cross-functional teams (development, QA, and operations) to resolve complex production problems and prevent recurrence.
  • Maintain and update documentation for support processes, troubleshooting guides, and knowledge bases.
  • Perform routine maintenance tasks, such as patching, scaling resources, and optimizing performance in cloud-based infrastructures (e.g., AWS, Azure, or GCP).
  • Participate in on-call rotations to provide after-hours support and ensure SLAs are met.
  • Analyze logs, metrics, and traces to identify trends, potential bottlenecks, and areas for improvement.
  • Assist in deployment activities, including CI/CD pipeline support and rollback procedures.
  • Contribute to continuous improvement initiatives, such as automating support tasks and enhancing monitoring capabilities.


Required Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
  • 3+ years of experience in production support, DevOps, or site reliability engineering (SRE) roles.
  • Strong troubleshooting skills with experience in debugging distributed systems, APIs, and microservices architectures.
  • Proficiency in scripting languages such as Python, Bash, or PowerShell for automation.
  • Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization tools (Docker, Kubernetes).
  • Familiarity with monitoring and logging tools (e.g., Splunk, Datadog, New Relic).
  • Knowledge of databases (SQL/NoSQL) and networking concepts.
  • Excellent communication skills, with the ability to explain technical issues to non-technical stakeholders.
  • Ability to work in a shift-based or on-call environment.


Preferred Qualifications

  • Experience supporting AI/ML-based products or SaaS platforms.
  • Certifications such as AWS Certified DevOps Engineer, Google Cloud Professional SRE, or equivalent.
  • Familiarity with incident management frameworks (e.g., ITIL) and tools like PagerDuty or Jira.
  • Strong problem-solving mindset with a proactive approach to preventing issues.

If you’ve got the skills to succeed and the motivation to make it happen, we look forward to hearing from you.
