Software Engineer

5 - 8 years

2 Lacs

Posted: 2 weeks ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

SUMMARY
Key Responsibilities:
  • Debug and resolve time-sensitive issues across AWS, Azure, or GCP by identifying points of failure and collaborating with internal teams on resolution, ensuring minimal service disruption and adherence to ITIL best practices.
  • Develop and maintain scalable applications using Java, .NET, or C#, with automation support via Python scripting.
  • Apply ITIL best practices across incident, change, and problem management processes to ensure consistent, efficient, and compliant service delivery.
  • Demonstrate a strong understanding of system and cloud architecture, and proactively recommend best practices for scalability, reliability, and maintainability across applications and infrastructure.
  • Collaborate closely with solution architects and engineering teams, applying system architecture and design principles to identify flaws in underlying designs and recommend scalable, reliable, and maintainable solutions.
  • Write, optimize, and troubleshoot SQL queries and stored procedures, and ensure database performance.
  • Own the setup, configuration, and optimization of Datadog for full-stack observability, and actively leverage its AIOps capabilities, including anomaly detection, event correlation, and automated root cause analysis, to enhance incident response and system reliability.
  • Champion a mindset of continuous improvement in support operations by proactively identifying inefficiencies, streamlining workflows, and implementing automation or process enhancements to eliminate repetitive effort and improve overall service quality.
  • Design and implement automation workflows using Python to streamline operational tasks and reduce manual effort.
  • Perform API testing and debugging using tools like Postman, ensuring robust integrations and data flow.
  • Handle and manipulate JSON data structures for application and API interactions.
  • Utilize GitHub Copilot and other AI tools to accelerate development and troubleshooting tasks.
  • Analyse reports and logs to drill down into issues, identify technical, functional, knowledge, and operational debt, and drive resolution strategies.
  • Recommend and implement scaling and redundancy strategies in cloud infrastructure to ensure high availability.
  • Manage and troubleshoot containerized applications using Docker and Kubernetes in production environments.
  • Mentor junior engineers, providing guidance on technical best practices and career development.
  • Ensure alignment with organizational standards and cloud governance policies (e.g., cloud gates), actively working towards compliance in all deployments, configurations, and operational practices across cloud environments.

 

Incident Management:

  • Own the incident management lifecycle: detection, response, resolution, and post-mortem analysis.
  • Conduct root cause analysis and implement preventive measures.
  • Ensure change requests are properly assessed, documented, and executed with minimal impact.

 

Change Management:

  • Manage the change management process, ensuring controlled and efficient implementation of changes.
  • Assess the impact of proposed changes and mitigate potential risks.
  • Ensure compliance with change management policies and procedures.

 

 

Metrics and Reporting:

  • Maintain dashboards for real-time visibility into operational health.
  • Use data-driven insights to identify recurring issues and recommend process improvements.

Transformation and Automation:

  • Identify opportunities for process automation and implement solutions to improve efficiency.
  • Evaluate and implement new monitoring tools.

Key Requirements:

  • Programming Languages: Minimum of 6-8 years of experience in Java, .NET, or C#, plus Python
  • Cloud Platforms: Minimum of 4-6 years of experience in AWS, Azure, or GCP (including debugging and scaling strategies)
  • Database Management: Minimum of 2 years of experience with SQL, stored procedures, and performance tuning
  • API Testing & Debugging: Postman, RESTful APIs
  • Data Handling: JSON structures, data parsing
  • Monitoring & Observability: Datadog (including AIOps features like anomaly detection and event correlation)
  • Containerization: Docker, Kubernetes
  • Automation: Python scripting, workflow automation
  • Reporting & Analysis: Log analysis, issue drill-down, technical debt identification
  • AI Tools: GitHub Copilot, GenAI familiarity
  • ITIL Fundamentals: Incident, change, and problem management
  • System & Cloud Architecture: Design principles, scalability, redundancy
  • Collaboration: Working with architects and engineering teams
  • Continuous Improvement: Process optimization, effort elimination

  • Experience with AIOps platforms such as:
    • Moogsoft for event correlation and noise reduction
    • Datadog for full-stack observability and AI-driven root cause analysis
    • Splunk ITSI for predictive analytics and service intelligence
    • ServiceNow ITOM for workflow automation and anomaly detection
  • Ability to interpret and act on AI-driven insights for proactive incident resolution.

  • Experience with tools like Docker and Kubernetes for managing containerized applications.
  • Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack (Elasticsearch, Logstash, Kibana).
  • Expertise in creating Datadog dashboards, monitors, and log pipelines.

 
Must have Skills:
  • Excellent analytical and troubleshooting skills to diagnose and resolve complex issues.
  • Effective communication skills to collaborate with cross-functional teams and convey technical information clearly.
  • Ability to thrive in a fast-paced environment, managing multiple tasks and projects simultaneously.
  • Previous experience in a similar role or relevant industry experience is highly preferred.
  • Knowledge of cloud platforms like AWS, Azure, or Google Cloud.

Algoleap Technologies

Information Technology

San Francisco