About the Role:
We are seeking a skilled and reliable DevOps Engineer to support and enhance our infrastructure, applications, and deployment pipelines. This role is central to keeping our platforms reliable and stable and to optimizing our cloud-native environments. You will collaborate closely with cross-functional teams to monitor, automate, and improve the efficiency and performance of our systems.
Key Responsibilities:
Monitoring & Incident Management
- Monitor infrastructure, cloud services, CI/CD pipelines, and application performance using tools like Prometheus, Grafana, ELK Stack, CloudWatch, or Datadog.
- Respond to incidents and alerts promptly to minimize downtime and maintain business continuity.
- Perform detailed root cause analysis (RCA) and maintain clear incident documentation.
- Ensure SLA adherence and timely escalation to L2/L3 teams when required.
- Track and report on system health, performance metrics, and incident trends.
Automation & Reliability Engineering
- Develop and maintain automation scripts for deployment, monitoring, and maintenance using Bash, Python, or Ansible.
- Implement Infrastructure as Code (IaC) practices using Terraform, CloudFormation, or Ansible for consistent deployments.
- Continuously improve system reliability, scalability, and observability through proactive optimization.
- Implement automated remediation for common issues to reduce manual intervention.
Cloud & Infrastructure Management
- Manage and support production systems on AWS, Azure, or GCP cloud platforms.
- Handle routine operational tasks including resource scaling, patching, backup management, and log analysis.
- Troubleshoot complex infrastructure, networking, and application-level issues across distributed environments.
- Support and maintain Kubernetes clusters and Docker-based containerized environments.
- Manage networking components including load balancers, DNS configurations, and SSL/TLS certificates.
CI/CD & Application Deployment Support
- Maintain, troubleshoot, and optimize CI/CD pipelines using Jenkins, GitLab CI, ArgoCD, or GitHub Actions.
- Support seamless deployments across development, staging, and production environments.
- Collaborate with development teams to ensure smooth delivery cycles and rapid feedback loops.
- Implement deployment best practices and maintain deployment documentation.
Required Qualifications:
- 3-7 years of hands-on experience in DevOps, Site Reliability Engineering (SRE), or Cloud Operations roles.
- Strong proficiency in Linux/Unix system administration and command-line operations.
- Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP).
- Solid understanding of CI/CD principles, Git version control, and automation frameworks.
- Experience with Docker and Kubernetes for containerization and container orchestration.
- Proficiency with monitoring and observability tools (Prometheus, Grafana, ELK Stack, CloudWatch, Datadog, etc.).
- Knowledge of networking fundamentals including load balancers, DNS, firewalls, and SSL/TLS.
- Strong scripting abilities in Python, Bash (or other Unix shells), or PowerShell.
- Experience with incident response procedures and problem management.
- Excellent communication and collaboration skills.
- Strong analytical and problem-solving mindset with attention to detail.
Preferred Qualifications:
- Professional certifications such as AWS Certified SysOps Administrator, Certified Kubernetes Administrator (CKA), Microsoft Certified: DevOps Engineer Expert, or equivalent credentials.
- Experience operating production systems such as Kafka, Redis, PostgreSQL, MongoDB, NGINX, or Apache HTTP Server.
- Familiarity with ITIL processes and incident management frameworks.
- Exposure to regulated or industrial environments (e.g., manufacturing, IoT, finance, healthcare, or enterprise systems).
- Experience with security tools and practices including vulnerability scanning, secrets management, and compliance monitoring.
- Knowledge of GitOps practices and declarative infrastructure management.
What We Offer:
- Opportunity to work with cutting-edge cloud technologies and modern DevOps practices.
- Exposure to large-scale, mission-critical infrastructure.
- Health insurance coverage.
- Performance-based bonuses.
- Professional development opportunities and certification support.
- Collaborative and supportive team environment.
Work Model:
- Work Hours: Day shift
- Location: Baner, Pune, Maharashtra.
- Work Mode: On-site; candidates must be able to commute reliably or relocate to Pune before joining.
- Preference: Immediate joiners will be given priority.
Key Performance Indicators:
- System uptime and SLA compliance.
- Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR).
- Percentage of incidents resolved without escalation.
- Deployment success rate and pipeline stability.
- Automation coverage and operational efficiency improvements.
Job Details:
- Job Type: Full-time, Permanent
- Location: Baner, Pune, Maharashtra
Benefits:
- Health insurance
- Paid sick time
- Paid time off
Ability to commute/relocate:
- Baner, Pune, Maharashtra: Must commute reliably or plan to relocate before starting work (Required)
Experience:
- DevOps: 3 years (Required)
Work Location: In person