Site Reliability Engineer

Posted: 21 hours ago | Platform: Glassdoor

Work Mode: On-site

Job Type: Part Time

Job Description

We are Progress (Nasdaq: PRGS) - a trusted provider of software that enables our customers to develop, deploy and manage responsible, AI-powered applications and experiences with agility and ease.
We’re proud to have a diverse, global team where we value the individual and enrich our culture by considering varied perspectives because we believe people power progress. Join us as a Site Reliability Engineer in our Product Operations division in Hyderabad and help us do what we do best: propel business forward.
In this role, you will work on:
  • Data Security and Compliance:
    • Protect systems from data breaches, prioritizing data security.
    • Ensure compliance with PCI-DSS, HIPAA, SOC2, and other applicable policies, standards, and procedures.
    • Participate in the quarterly, bi-yearly, and yearly audit compliance activities.
  • Infrastructure and Security Services:
    • Build and maintain reliable infrastructure and security services for highly available and scalable services by using native Azure/AWS/GCP infrastructure services and other industry-leading tools.
  • System Administration and Automation:
    • Perform basic system administration tasks such as configuring servers, setting up HA/DR, automating routine tasks, and running backup/restore procedures.
    • Implement automation to minimize manual work and achieve security and compliance objectives.
  • Automation and Tooling:
    • Develop and maintain automation frameworks, tools, and processes to streamline operations and improve efficiency.
    • Champion the adoption of infrastructure as code (IaC) principles for configuration management and deployment automation.
  • Performance Optimization:
    • Analyze system performance and identify opportunities for optimization and efficiency improvements.
    • Implement performance tuning strategies to enhance system reliability and scalability.
  • Monitoring and Observability:
    • Design and implement comprehensive monitoring and observability solutions to proactively identify and address system issues.
    • Utilize advanced monitoring tools and techniques to gain insights into system behavior and performance.
  • Incident Management and Postmortems:
    • Participate in incident management processes, ensuring timely resolution of incidents and minimizing impact on users.
    • Conduct postmortem reviews to identify root causes and implement preventive measures to mitigate future incidents.
  • Capacity Planning and Forecasting:
    • Perform capacity planning and forecasting to anticipate resource requirements and ensure adequate scalability.
    • Develop strategies for optimizing resource utilization and cost-effectiveness.
  • On-call Support and Troubleshooting:
    • Serve on the on-call team, acting as an escalation contact for service incidents.
    • Troubleshoot and resolve issues related to application development, deployment, and operations.
    • Work with Technical Support to troubleshoot customer issues.
  • Collaboration and Agile Support:
    • Work collaboratively with agile software development teams, providing support to developers, QA, and technical support.
    • Collaborate with other team members during scheduled maintenance windows.
  • Customer Account Provisioning:
    • Provision new customer accounts, including handling complex orders in coordination with Progress Sales/Professional Services.
    • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • High-Availability Deployments:
    • Implement automated high-availability deployments, ensuring system reliability and uptime.
  • End-to-End Solution Understanding:
    • Develop a thorough understanding of how each software component, system design, and configuration fit together to form an end-to-end solution.
Your background:
  • Experience:
    • Proven experience as a Site Reliability Engineer (or similar position) in a production capacity.
    • You understand what it means to operate infrastructure as code and have experience developing services and automation to do so. Chef knowledge would be a plus.
    • You have a strong ability to debug and optimize code and to automate routine tasks to eliminate toil.
    • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership, initiative, grit, and drive.
    • You have designed and implemented applications and systems that scale, are resilient to failure, and are observable.
  • Technical Expertise:
    • Strong understanding of Windows, Linux, automation tools (Terraform, Ansible, Chef, or Puppet), Azure/AWS services (ECS, EKS, S3, and more), and scripting languages (Shell, Python, PowerShell, or others).
    • Knowledge of databases (Azure SQL, Postgres/RDS, graph databases), service meshes (Linkerd or Envoy), API gateways, authentication services, third-party integrations, and more.
    • Proficient in managing containerized environments using Kubernetes, Docker, and Rancher, along with other related tools and technologies.
  • Security Knowledge:
    • Familiarity with security concepts, including cloud authentication, authorization, web attacks, and environment security.
    • Experience with network concepts, including TCP/IP, HTTP, and TLS.
  • Cloud Experience:
    • Experience with cloud-hosted apps/services (Azure/AWS preferred) and translating business requirements into securely implemented capabilities in the cloud.
  • Education:
    • Bachelor’s degree in Computer Science, Information Systems, or a related field.
  • Compliance and Communication:
    • Proven ability to adhere to policies, standards, and procedures related to change control and operational best practices.
    • Strong written and verbal communication skills for both technical and non-technical audiences.
  • Flexible and Proactive:
    • Willingness to be flexible in responding to customer issues and ability to identify product/deployment improvements for future mitigation.
    • You are interested in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Regulatory Compliance:
    • Experience with PCI, HIPAA, and SOC2 compliance.
Must be willing to work US time-zone hours (4:30 PM to 1:30 AM IST).
If this sounds like you and fits your experience and career goals, we’d be happy to chat. What we offer in return is the opportunity to experience a great company culture with wonderful colleagues to learn from and collaborate with, and also to enjoy:
Compensation
  • Competitive remuneration package
  • Employee Stock Purchase Plan Enrolment
Vacation, Family, and Health
  • 30 days of earned leave
  • An extra day off for your birthday
  • Other leave types, such as marriage, casual, maternity, and paternity leave
  • Premium Group Medical Insurance for employees and five dependents, personal accident insurance coverage, and life insurance coverage
  • Professional development reimbursement
  • Interest subsidy on vehicle or personal loans
Apply now!