Principal SRE

10 to 15 years

0.0 Lacs P.A.

Pune, Maharashtra, India

Posted: 2 weeks ago | Platform: LinkedIn

Skills Required

engineering, service, reliability, monitoring, software, latency, efficiency, management, drive, design, instrumentation, support, automate, collaborative, technology, onboarding, code, deployment, DevOps, Splunk, Datadog, strategy, Azure, programming, automation, Python, Ansible, Terraform, Kubernetes, Docker, architecture, configuration, Chef, AWS, pipeline, GitLab, Jenkins, troubleshooting

Work Mode

On-site

Job Type

Full Time

Job Description

SystemsPlus is hiring a Principal SRE. Experience: 10 to 15 years. Location: Pune (hybrid).

The client's Direct-to-Consumer Engineering team is responsible for creating, maintaining, and providing customer service for its branded eCommerce websites. We seek talented individuals who fit into our team-oriented atmosphere, and we are proud to have an environment that offers the comfort of a true work/life balance.

The Principal Site Reliability Engineer will play a lead role in the production environment by monitoring availability and taking a holistic view of system health. They will build software and systems to manage platform infrastructure and applications; improve the reliability, quality, and time-to-market of our suite of software solutions; and measure and optimize system performance, all with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.

Responsibilities

· Ensure availability, latency, performance, and efficiency of our global eCommerce sites.
· Drive change management and incident management.
· Promote best practices and innovative observability to guide product delivery teams in achieving operational excellence for new product deliveries.
· Drive operational excellence and evangelize best practices in observability.
· Develop unified observability dashboards and implement end-to-end (E2E) observability requirements.
· Design innovative observability solutions for internal and external stakeholders.
· Contribute to observability instrumentation standards and create repeatable patterns for engineering teams.
· Define and implement E2E observability requirements and lead teams to support E2E best practices.
· Collaborate with cross-functional teams to achieve objectives and drive high reliability into systems.
· Build proprietary tools to mitigate weaknesses in incident management or software delivery.
· Implement SRE best practices to increase system reliability and performance.
· Automate processes for improved collaborative response and prepare teams for incidents.
· Maintain error budgets, meet SLOs, and support uptime and availability of critical platform components.
· Automate technology stacks to improve operating costs while responding to traffic spikes.

Location: Pune, client office. In-person attendance is mandatory on Tuesday, Wednesday, and Thursday each week.

Work timings: EST hours for the first 3 months for onboarding and ramp-up, then IST work timings for 8 hours with a possible 1-hour overlap in the evening with the US team in EST (10am to 7pm).

Required Skills and Experience:

· Bachelor's degree in Computer Science, Information Science, Engineering, or a related field.
· 10+ years of experience in code management, deployment processes, procedures, and tools in a DevOps or SRE role.
· Experience with monitoring tools (preferred: Dynatrace, Splunk, Datadog, Grafana, and New Relic).
· Proficiency in state-of-the-art observability trends, tools, products, and technologies.
· Ability to identify organization-wide gaps in the SRE practice and implement solutions that contribute to organizational transformation.
· Experience driving cross-organization adoption of new technologies or initiatives.
· Ability to influence senior management in selecting the right strategy, processes, and structures to transform the organization into a modern SRE team.
· Proactive in identifying performance bottlenecks and anomalous system behavior, and in addressing root causes of service issues.
· Passionate about technology, with a strong sense of curiosity and a desire to improve processes, automate everything, and continuously learn.
· Successful experience supporting a cloud production environment (strong preference for Azure).
· Competency in one or more programming languages for automation (Python strongly preferred).
· Knowledge of cloud deployment tools and methodologies (ideally Ansible, but Terraform, Azure DevOps, etc. are also considered).
· Deep understanding of Kubernetes and Docker architecture and associated tools.
· Experience with at least one configuration management solution (e.g., Chef, Ansible, AWS CodeDeploy).
· Proficiency with repository and pipeline-related tools (e.g., GitLab, Jenkins, Bamboo, Travis, CircleCI).
· Experience with implementing and using various application and infrastructure monitoring tools.
· Strong troubleshooting skills.
· Ability to take ownership and deliver solutions autonomously.

Interested candidates can send their CV to vandana.jha@systems-plus.com.
