Senior Software Engineer

4 years

Posted: 10 hours ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Career Area

Technology, Digital and Data

Your Work Shapes the World at Caterpillar Inc.

When you join Caterpillar, you're joining a global team who cares not just about the work we do, but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities. We don't just talk about progress and innovation here; we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.

Software Engineer (SRE)

Description

Reliability in highly complex, integrated systems typically spans multiple programming languages, third-party services, and integrations, as well as both software and hardware, so an SRE needs to be multi-talented. As an SRE, you will be a process-, technology-, and results-oriented member of the Operations team, delivering top-notch service, quality, and metrics for the Cat Digital Data Platform. You will fit this role if you can:
  • Think about systems: edge cases, failure modes, behaviours, and specific implementations.
  • Debug production issues across services and levels of the stack.
  • Make monitoring and alerting fire on symptoms, rather than only once an outage has occurred.
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
  • Have an urge to collaborate and communicate asynchronously.
  • Have an urge to deliver quickly and iterate fast.

Basic Qualifications

  • Bachelor’s degree, preferably in Computer Science, Software Engineering, or any other Engineering field.
  • 4+ years of SRE experience.

Technical Experience

  • Knowledge of, and prior experience with, CI/CD solutions on any platform is a must.
  • Expertise in designing, coding, testing, and delivering software in at least one technology stack.
  • 4+ years of prior experience in SRE.
  • Proficiency in scripting and programming languages like Python or Java
  • Deep experience with AWS (EC2, S3, Lambda, CloudWatch, etc.), Azure, and GCP is critical
  • Expertise in tools like Prometheus, Grafana, AppDynamics, CloudWatch, and ThousandEyes for system health monitoring and alerting
  • Experience with Docker, Kubernetes, Ansible, Puppet, or Chef for managing deployments
  • Skills in diagnosing and resolving production issues, conducting root cause analysis, and writing postmortems
  • Familiarity with ITSM tools like ServiceNow and incident response protocols
  • Soft Skills & Operational Mindset and Problem Solving & Resilience
  • Ability to think critically under pressure and resolve complex issues calmly
  • Strong interpersonal skills to work across engineering, product, and operations teams
  • Precision in monitoring, alerting, and writing reliable code, and a commitment to continuous improvement
  • A mindset geared toward reducing toil, automating repetitive tasks, and improving system reliability
  • Knowledge of Azure Cloud is an added advantage.
  • Expertise in the ELK stack for open-source IT, network, server, and application monitoring is an added advantage.
  • Understanding of RESTful APIs and Apigee or any other API gateway is a plus.
  • 4+ years of experience with Docker and at least one container orchestration platform (ECS or Kubernetes).
  • Understanding of configuration management tools such as Ansible, Puppet, Chef, PowerShell, or Terraform.
  • Understanding of Git, Bitbucket, Jira, Jenkins, Sonar, Splunk, Maven, AIM, and/or continuous delivery tools.
  • Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks).
  • Excellent problem-solving skills and a strong attention to detail.
  • Background in ITIL and/or ITSM processes.
  • Strong communication skills and ability to collaborate effectively with cross-functional teams.

Responsibilities

  • Monitor and troubleshoot production systems to identify and resolve performance, scalability, and reliability issues proactively.
  • Work closely with developers to identify and fix bugs and performance bottlenecks in the application code
  • Continuously evaluate systems and processes to identify areas for improvement and implement changes as needed
  • Collaborate with cross-functional teams to define and document operational processes, best practices, and procedures.
  • Meet the SLOs, SLAs, and SLIs defined in the operations model.
  • Prioritize tasks and troubleshoot incidents through to closure.
  • Participate in the on-call rotation.
  • Improve service observability.
  • Proactively test the flexibility and resilience of the system.
  • Drive adoption of continuous integration/inspection/deployment
If you have a passion for delivering reliable, high-performance services and thrive in a fast-paced environment, we'd love to hear from you. Apply now to join our team as a Site Reliability Engineer.

Posting Dates

August 11, 2025 - August 17, 2025

Caterpillar is an Equal Opportunity Employer. Qualified applicants of any age are encouraged to apply.

Not ready to apply? Join our Talent Community.