We are Progress (Nasdaq: PRGS), a trusted provider of software that enables our customers to develop, deploy, and manage responsible, AI-powered applications and experiences with agility and ease.
We’re proud to have a diverse, global team where we value the individual and enrich our culture by considering varied perspectives because we believe people power progress. Join us as a Site Reliability Engineer in our Product Operations division in Hyderabad and help us do what we do best: propelling business forward.
In this role, you will work on:
- Data Security and Compliance:
  - Protect systems against data breaches, treating data security as a top priority.
  - Ensure compliance with PCI-DSS, HIPAA, SOC2, and other compliance policies, standards, and procedures.
  - Participate in quarterly, semiannual, and annual compliance audit activities.
- Infrastructure and Security Services:
  - Build and maintain reliable infrastructure and security services for highly available, scalable services, using native Azure/AWS/GCP infrastructure services and other industry-leading tools.
- System Administration and Automation:
  - Perform core system administration tasks such as configuring servers, setting up HA/DR, automating routine tasks, and managing backup/restore procedures.
  - Implement automation to minimize manual work and achieve security and compliance objectives.
- Automation and Tooling:
  - Develop and maintain automation frameworks, tools, and processes to streamline operations and improve efficiency.
  - Champion the adoption of infrastructure as code (IaC) principles for configuration management and deployment automation.
- Performance Optimization:
  - Analyze system performance and identify opportunities for optimization and efficiency improvements.
  - Implement performance tuning strategies to enhance system reliability and scalability.
- Monitoring and Observability:
  - Design and implement comprehensive monitoring and observability solutions to proactively identify and address system issues.
  - Use advanced monitoring tools and techniques to gain insight into system behavior and performance.
- Incident Management and Postmortems:
  - Participate in incident management processes, ensuring timely resolution of incidents and minimizing impact on users.
  - Conduct postmortem reviews to identify root causes and implement preventive measures against future incidents.
- Capacity Planning and Forecasting:
  - Perform capacity planning and forecasting to anticipate resource requirements and ensure adequate scalability.
  - Develop strategies for optimizing resource utilization and cost-effectiveness.
- On-call Support and Troubleshooting:
  - Serve on the on-call team, acting as an escalation contact for service incidents.
  - Troubleshoot and resolve issues related to application development, deployment, and operations.
  - Work with Technical Support to troubleshoot customer issues.
- Collaboration and Agile Support:
  - Work collaboratively with agile software development teams, providing support to developers, QA, and Technical Support.
  - Collaborate with other team members during scheduled maintenance windows.
- Customer Account Provisioning:
  - Provision new customer accounts, including handling complex orders in coordination with Progress Sales/Professional Services.
  - Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
- High-Availability Deployments:
  - Implement automated high-availability deployments, ensuring system reliability and uptime.
- End-to-End Solution Understanding:
  - Become proficient in how the software components, system design, and configuration fit together to form an end-to-end solution.
Your background:
- Experience:
  - Proven experience as a Site Reliability Engineer (or in a similar position) in a production capacity.
  - You understand what it means to operate infrastructure as code and have experience developing services and automation to do so. Chef knowledge would be a plus.
  - You have a strong ability to debug and optimize code, and to automate routine tasks to eliminate toil.
  - You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership, initiative, grit, and drive.
  - You have designed and implemented applications and systems that scale, are resilient to failure, and are observable.
- Technical Expertise:
  - Strong understanding of Windows, Linux, automation tools (Terraform, Ansible, Chef, or Puppet), Azure/AWS services (such as AWS ECS, EKS, and S3), and scripting languages (Shell, Python, PowerShell, or others).
  - Knowledge of databases (Azure SQL, Postgres/RDS, graph databases), service meshes (Linkerd or Envoy), API gateways, authentication services, third-party integrations, and more.
  - Proficient in managing containerized environments using Kubernetes, Docker, and Rancher, along with other related tools and technologies.
- Security Knowledge:
  - Familiarity with security concepts, including cloud authentication, authorization, web attacks, and environment security.
  - Experience with network concepts, including TCP/IP, HTTP, and TLS.
- Cloud Experience:
  - Experience with cloud-hosted apps/services (Azure/AWS preferred) and translating business requirements into securely implemented capabilities in the cloud.
- Education:
  - Bachelor’s degree in Computer Science, Information Systems, or a related field.
- Compliance and Communication:
  - Proven ability to adhere to policies, standards, and procedures related to change control and operational best practices.
  - Strong written and verbal communication skills for both technical and non-technical audiences.
- Flexible and Proactive:
  - Willingness to be flexible in responding to customer issues, and the ability to identify product/deployment improvements that prevent recurrence.
  - You are interested in designing, analyzing, and troubleshooting large-scale distributed systems.
- Regulatory Compliance:
  - Experience with PCI, HIPAA, and SOC2 compliance.
Must be willing to work US time-zone hours (4:30 PM to 1:30 AM IST).
If this sounds like you and fits your experience and career goals, we’d be happy to chat. What we offer in return is the opportunity to experience a great company culture with wonderful colleagues to learn from and collaborate with, and also to enjoy:
Compensation
- Competitive remuneration package
- Employee Stock Purchase Plan enrolment
Vacation, Family, and Health
- 30 days of earned leave
- An extra day off for your birthday
- Other leave types, such as marriage, casual, maternity, and paternity leave
- Premium Group Medical Insurance for employees and five dependents, personal accident insurance coverage, and life insurance coverage
- Professional development reimbursement
- Interest subsidy on vehicle or personal loans
Apply now!
#LI-Hybrid