Site Reliability Engineer - Autosys/Google Cloud Platform (4-8 yrs)

4 - 8 years

17 - 22 Lacs

Posted: 3 weeks ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Overview: We are seeking a Site Reliability Engineer (SRE) with expertise in Autosys and Google Cloud Platform (GCP) to join our dynamic team. The ideal candidate will have strong hands-on experience in job scheduling and automation using Autosys, as well as a deep understanding of cloud infrastructure and operations on Google Cloud. You will be responsible for ensuring the reliability, scalability, and performance of cloud-based applications and infrastructure, while managing complex workflows and automating critical operations. This is a great opportunity for a highly motivated individual to work in a collaborative environment where you'll apply your skills to build and maintain highly reliable cloud infrastructure solutions.

Key Responsibilities:

- Work with Google Cloud Platform (GCP) to design, deploy, and maintain cloud infrastructure. Manage GCP services such as Compute Engine, Cloud Functions, Kubernetes Engine (GKE), and Cloud Storage.
- Manage and automate job scheduling using Autosys to ensure that critical workflows run smoothly, are optimized for performance, and have minimal downtime. Troubleshoot, monitor, and resolve issues related to Autosys jobs.
- Implement best practices for monitoring, alerting, and incident management to maintain high system uptime and service reliability. Develop automated solutions for routine tasks to ensure consistency and prevent downtime.
- Collaborate with development teams to integrate and maintain CI/CD pipelines for continuous delivery of applications, ensuring seamless and efficient deployments across GCP environments.
- Ensure the security, integrity, and compliance of all cloud-based systems within the GCP environment. Work with security teams to implement security best practices such as identity and access management (IAM), firewalls, and data encryption.
- Set up and maintain monitoring solutions (e.g., Prometheus, Grafana, Stackdriver for GCP) to track system health and performance. Respond promptly to incidents, troubleshoot issues, and ensure effective resolution.
- Analyze system performance and provide recommendations for improvements. Optimize resources to ensure applications are running cost-effectively, with good resource utilization in GCP.
- Work closely with development, QA, and operations teams to ensure the smooth deployment and operation of applications. Participate in on-call rotations and incident management processes to maintain application uptime.
- Document processes, troubleshooting guides, architecture diagrams, and standard operating procedures for system reliability. Conduct knowledge sharing sessions and help build a knowledge base within the team.

Requirements:

- 4 to 8 years of experience in Site Reliability Engineering (SRE) or Operations Engineering with hands-on experience in cloud environments (specifically GCP).
- Strong experience with Autosys, including job scheduling, monitoring, and automation of workflows. Familiarity with Autosys configuration, job dependencies, and troubleshooting is essential.
- Experience with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Functions, GKE, Cloud Storage, and Cloud SQL.
- Strong experience with Linux/Unix systems and system administration.
- Proficiency in scripting languages such as Python, Bash, or Shell scripting to automate workflows, manage cloud resources, and handle repetitive tasks.
- Familiarity with tools like Prometheus, Grafana, and Google Stackdriver for cloud monitoring, logging, and alerting.
- Hands-on experience with Docker and Kubernetes, especially Google Kubernetes Engine (GKE) for container orchestration and deployment.
- Knowledge of continuous integration and deployment tools (e.g., Jenkins, GitLab CI, Terraform) for automated deployments and infrastructure management.
- Experience with Git for version control, code reviews, and managing automation scripts.
- Strong troubleshooting, debugging, and analytical skills, with the ability to identify and resolve system failures or performance issues.
- Familiarity with IAM roles, security best practices, and compliance standards within the GCP ecosystem.
