11 - 15 years
15 - 20 Lacs
Mumbai
Work from Office
Overview

Accountabilities:
As a Platform Reliability Engineer, you will be responsible for the evaluation, selection, and deployment of monitoring and observability technologies. You will manage and maintain the monitoring infrastructure, ensuring it aligns with industry best practices. You will collaborate with DevOps, CriticalOps, and IT leadership teams to understand system requirements and design effective monitoring strategies. You will also develop and implement monitoring solutions for infrastructure, applications, and services.

Essential Skills/Experience:
Degree-level education in computer science, information technology, or a related field.
Proven experience as a monitoring and observability engineer or in a similar role.
Proficiency in developing monitoring capabilities and configuring integration with tools such as Prometheus, Grafana, Splunk, SumoLogic, Datadog, Dynatrace, etc.
Strong scripting skills (e.g., Python) for automation in data environments.
Familiarity with logging, tracing, and APM (Application Performance Monitoring) solutions.
Ability to translate technical information into business language.
Working knowledge of Agile software development techniques and methodologies.
Familiarity with CI/CD pipelines and continuous deployment practices as part of an Agile team.
Proficiency in all aspects of Agile and SAFe (can lead, teach, and run).
Excellent problem-solving skills.
Customer engagement experience.
Knowledge of data processing frameworks (e.g. Apache Spark) and data storage solutions (e.g. data lakes, warehouses).
Experience with data orchestration tools (e.g. Apache Airflow).
Understanding of data lineage and metadata management.
Good commercial awareness and understanding of the external market.
Demonstrated initiative, strong customer orientation, and cross-cultural working.
Excellent communication and interpersonal skills.
Posted 3 months ago
18 - 22 years
50 - 60 Lacs
Bengaluru
Work from Office
We are looking for a leader for our Site Reliability Engineering (SRE) and Observability team. As the leader of SRE/Observability, you will create compelling offerings in SRE, Observability, and Resiliency for customers and contribute to business growth.

Responsibilities:
Deliver solutions to our customers to the highest standards, and build and develop the Observability and SRE team and offerings for Virtusa.
Be a strong thought leader in Site Reliability Engineering, Observability, operational excellence, and DevOps principles.
Strong technical acumen in cloud architecture, Observability, performance benchmarking, capacity planning, and reliability tools.
Experience in Observability platforms, application monitoring tools, and performance analysis techniques.
Experience managing and growing technical leaders and teams.
Be responsible for building and mentoring a new team of SRE and Observability specialists.

KEY QUALIFICATIONS & EXPERIENCE:
15+ years of IT experience, with a minimum of 5 years of experience in SRE/Observability/monitoring tools.
Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.
Expert-level experience in monitoring and logging technologies, both open source and closed source (e.g. AppDynamics, New Relic, Datadog, Prometheus, Grafana, LogicMonitor, SumoLogic, ELK).
Experience in implementing metrics, logs, and tracing for end-to-end observability.
Working knowledge of systems and tooling such as Terraform, Ansible, Chef, Puppet, and Jenkins, including designing and implementing CI/CD pipelines and infrastructure provisioning and management.
Ability to communicate and coordinate with cross-functional engineering teams across multiple geographic regions.
Experience with AIOps and machine learning is highly desirable.
Experience with other monitoring tools like Prometheus, Grafana, etc.
Experience with Observability solutions like Dynatrace, Datadog, Instana, etc. is highly desirable.
Excellent problem-solving and analytical skills.
Strong communication and collaboration skills.
Ability to work independently and manage multiple projects simultaneously.
Knowledge of IT operations concepts and processes, such as monitoring, incident management, root cause analysis, and remediation.
Posted 3 months ago