Software Engineer

6 - 8 years

0 Lacs

Posted: 3 weeks ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities

  • Debug and resolve time-sensitive issues across AWS, Azure, or GCP by identifying points of failure and collaborating with internal teams on resolution, ensuring minimal service disruption and adherence to ITIL best practices.
  • Develop and maintain scalable applications using Java, .NET, or C#, with automation support via Python scripting.
  • Apply ITIL best practices across incident, change, and problem management processes to ensure consistent, efficient, and compliant service delivery.
  • Demonstrate a strong understanding of system and cloud architecture, and proactively recommend best practices for scalability, reliability, and maintainability across applications and infrastructure.
  • Collaborate closely with solution architects and engineering teams to apply ITIL best practices across incident, change, and problem management, while leveraging a strong understanding of system architecture and design principles to identify flaws in underlying designs and recommend scalable, reliable, and maintainable solutions.
  • Write, optimize, and troubleshoot SQL queries and stored procedures, and ensure database performance.
  • Own the setup, configuration, and optimization of Datadog for full-stack observability, and actively leverage its AIOps capabilities—including anomaly detection, event correlation, and automated root cause analysis—to enhance incident response and system reliability.
  • Champion a mindset of continuous improvement in support operations by proactively identifying inefficiencies, streamlining workflows, and implementing automation or process enhancements to eliminate repetitive effort and improve overall service quality.
  • Design and implement automation workflows using Python to streamline operational tasks and reduce manual effort.
  • Perform API testing and debugging using tools like Postman, ensuring robust integrations and data flow.
  • Handle and manipulate JSON data structures for application and API interactions.
  • Utilize GitHub Copilot and other AI tools to accelerate development and troubleshooting tasks.
  • Analyse reports and logs to drill down into issues, identify technical, functional, knowledge, and operational debt, and drive resolution strategies.
  • Recommend and implement scaling and redundancy strategies in cloud infrastructure to ensure high availability.
  • Manage and troubleshoot containerized applications using Docker and Kubernetes in production environments.
  • Mentor junior engineers, providing guidance on technical best practices and career development.
  • Ensure alignment with organizational standards and cloud governance policies (e.g., cloud gates), actively working towards compliance in all deployments, configurations, and operational practices across cloud environments.

Incident Management

  • Own the incident management lifecycle: detection, response, resolution, and post-mortem analysis.
  • Conduct root cause analysis and implement preventive measures.
  • Ensure change requests are properly assessed, documented, and executed with minimal impact.

Change Management

  • Manage the change management process, ensuring controlled and efficient implementation of changes.
  • Assess the impact of proposed changes and mitigate potential risks.
  • Ensure compliance with change management policies and procedures.

Metrics And Reporting

  • Maintain dashboards for real-time visibility into operational health.
  • Use data-driven insights to identify recurring issues and recommend process improvements.

Transformation And Automation

  • Identify opportunities for process automation and implement solutions to improve efficiency.
  • Evaluate and implement new monitoring tools.

Key Requirements

  • Programming Languages: Minimum 6-8 years of experience in Java or .NET or C# & Python
  • Cloud Platforms: Minimum of 4-6 years of experience in AWS, Azure, GCP (including debugging and scaling strategies)
  • Database Management: Minimum of 2 years of SQL, stored procedures, performance tuning
  • API Testing & Debugging: Postman, RESTful APIs
  • Data Handling: JSON structures, data parsing
  • Monitoring & Observability: Datadog (including AIOps features like anomaly detection, event correlation)
  • Containerization: Docker, Kubernetes
  • Automation: Python scripting, workflow automation
  • Reporting & Analysis: Log analysis, issue drill-down, technical debt identification
  • AI Tools: GitHub Copilot, GenAI familiarity
  • ITIL Fundamentals: Incident, change, and problem management
  • System & Cloud Architecture: Design principles, scalability, redundancy
  • Collaboration: Working with architects and engineering teams
  • Continuous Improvement: Process optimization, effort elimination
  • Experience in tools like Docker and Kubernetes for managing containerized applications.
  • Experience with AIOps platforms such as:
      • Moogsoft: for event correlation and noise reduction
      • Datadog: for full-stack observability and AI-driven root cause analysis
      • Splunk ITSI: for predictive analytics and service intelligence
      • ServiceNow ITOM: for workflow automation and anomaly detection
  • Ability to interpret and act on AI-driven insights for proactive incident resolution.
