5.0 - 9.0 years
0 Lacs
karnataka
On-site
At Arctic Wolf, we are redefining the cybersecurity landscape with a global team committed to setting new industry standards. Our achievements include recognition on the Forbes Cloud 100, CNBC Disruptor 50, Fortune Future 50, and Fortune Cyber 60 lists, being named a Leader in the IDC MarketScape for Worldwide Managed Detection and Response Services, and earning a Customers' Choice distinction from Gartner Peer Insights. Our Aurora Platform won the 2024 CRN Products of the Year award. Join us in shaping the future of security operations.

Our mission is to End Cyber Risk, and we are looking for a Senior Developer to contribute to this goal. In this role, you will be part of our expanding Infrastructure teams and work closely with the Observability team. Your responsibilities will include designing, developing, and maintaining solutions to monitor the behavior and performance of R&D teams' workloads, reduce incidents, and troubleshoot issues effectively. We are seeking candidates with operations backgrounds (DevOps/SysOps/TechOps) who have experience supporting infrastructure at scale. If you believe in Infrastructure as Code and continuous deployment/delivery practices, and enjoy helping teams understand their services in real-world scenarios, this role might be a great fit for you.

**Technical Responsibilities:**
- Design, configure, integrate, deploy, and operate Observability systems and tools to collect metrics, logs, and events from backend services
- Collaborate with engineering teams to support services from development to production
- Ensure the Observability platform meets availability, capacity, efficiency, scalability, and performance goals
- Build next-generation observability integrating with Istio
- Develop libraries and APIs that give developers a unified interface to monitoring, logging, and event processing systems
- Enhance alerting capabilities with tools like Slack, Jira, and PagerDuty (a minimal illustrative alert rule is sketched after this listing)
- Contribute to building a continuous deployment system driven by metrics and data
- Implement anomaly detection in the observability stack
- Participate in a 24x7 on-call rotation after at least 6 months of employment

**What You Know:**
- Minimum of five years of experience
- Proficiency in Python or Go
- Strong understanding of AWS services such as Lambda, CloudWatch, IAM, EC2, ECS, and S3
- Solid knowledge of Kubernetes
- Experience with tools like Prometheus, Grafana, Thanos, and AlertManager
- Familiarity with monitoring protocols/frameworks such as the Prometheus/Influx line format, SNMP, and JMX
- Exposure to the Elastic stack, syslog, and CloudWatch Logs
- Comfortable with git, GitHub, and CI/CD approaches
- Experience with IaC tools like CloudFormation or Terraform

**How You Do Things:**
- Provide expertise and guidance on the right way forward
- Collaborate effectively with SRE, platform, and development teams
- Work independently and seek support when needed
- Advocate for automation and code-driven practices

Expertise in distributed tracing tools, Java, open Observability initiatives, Kafka, monitoring in GCP and Azure, AWS certifications, or SQL is a plus. At Arctic Wolf, we offer a collaborative and inclusive work environment that values diversity and inclusion. Our commitment to growth and customer satisfaction is unmatched, making us the most trusted name in the industry.
Join us on our mission to End Cyber Risk and engage with a community that values unique perspectives and corporate responsibility.
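As an illustration of the alerting work this listing describes, here is a minimal sketch of a Prometheus alert rule routed to PagerDuty and Slack via Alertmanager, assuming a recent Alertmanager release; the metric name, threshold, receivers, and keys are hypothetical placeholders, not details from the posting:

```yaml
# prometheus-rules.yaml -- hypothetical alert on a sustained 5xx error rate
groups:
  - name: service-availability
    rules:
      - alert: HighErrorRate
        # fraction of requests returning 5xx over the last 5 minutes, per service
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
            / sum(rate(http_requests_total[5m])) by (service) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.service }} error rate above 5% for 10 minutes"

# alertmanager.yaml -- route paging alerts to PagerDuty, everything else to Slack
route:
  receiver: slack-default
  routes:
    - matchers: ['severity="page"']
      receiver: pagerduty-oncall
receivers:
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: <pagerduty-integration-key>   # placeholder
  - name: slack-default
    slack_configs:
      - api_url: <slack-webhook-url>               # placeholder
        channel: '#observability-alerts'
```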
Posted 2 days ago
8.0 - 10.0 years
0 Lacs
bengaluru, karnataka, india
On-site
Teamwork makes the stream work.

Roku is changing how the world watches TV. Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers. From your first day at Roku, you'll make a valuable - and valued - contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.

About the role
Do you want to help build Roku's next-generation unified cloud-agnostic hosting platform? Are you experienced with Terraform, Kubernetes, and Istio? Can you write applications and automation in Golang, Python, or Shell? Are you interested in being part of a multinational team to design and create the platform? If so, this role is for you!

About the team
The central Infrastructure Engineering team is looking for highly skilled infrastructure and software engineers to help develop and drive Roku's service mesh hosting architecture. Our team is responsible for building and scaling the platform (Kubernetes, Istio, Envoy, operators, and more) to effect Roku's transition towards a single, unified, cloud-agnostic system where all teams speak the same infrastructure language. We are engaging with Roku's engineering teams to migrate hundreds of workloads to our common platform, including helping augment and automate CI/CD flows. We are looking for engineers who love working collaboratively across teams to achieve results that impact the entire company.

What you'll be doing:
- Help architect, design, build, and deploy Roku's next-generation service mesh and cloud infrastructure.
- Contribute to evolving our deployments by building solutions using Docker, Kubernetes, Istio/Envoy, and Terraform (a minimal Istio routing sketch follows this listing).
- Join in efforts to investigate new technology and tools to be adopted by Roku.
- Help build and integrate security as part of the infrastructure.
- Collaborate on internal customer engagements as we migrate workloads to Kubernetes + Istio + open-source observability tools and technologies.
- Work closely with the Observability team to integrate and scale existing and new observability tools as part of a holistic solution.
- Work closely with the SRE team to maintain availability of our services and improve onboarding workflows.
- Mentor other team members to define and adopt new, or improve existing, processes and procedures.

We're excited if you have:
- Strong hands-on experience in cloud technologies. AWS, ECS, and Kubernetes (EKS, GKE, AKS, or other) preferred. Knowledge of another cloud platform like GCP or Azure is a plus but not required.
- Demonstrated understanding of overall infrastructure design and developing tools to enable and automate the infrastructure.
- Experience with a high-level scripting language (such as Python) and a systems programming language (such as Go).
- Strong experience with Kubernetes.
- Production experience in testing and deploying applications via modern CI/CD tools and concepts.
- Familiarity with observability tools like Prometheus, Thanos, Loki, Grafana, etc.
- The drive and self-motivation to understand the intricate details of a complex infrastructure environment.
- Ability to work independently.
- Demonstrated ability to communicate clearly with both technical and non-technical project stakeholders.
- Experience with integrating AI tools for improving processes and reducing toil is a plus.
- Master's degree or equivalent experience (8+ years).
- You have either tried Gen AI in your previous work or outside of work, or are curious about Gen AI and have explored it.

Benefits
Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and voluntary benefits which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension). Our employees can take time off work for vacation and other personal reasons to balance their evolving work and life needs. It's important to note that not every benefit is available in all locations or for every role. For details specific to your location, please consult with your recruiter.

The Roku Culture
Roku is a great place for people who want to work in a fast-paced environment where everyone is focused on the company's success rather than their own. We try to surround ourselves with people who are great at their jobs, who are easy to work with, and who keep their egos in check. We appreciate a sense of humor. We believe a smaller number of very talented folks can do more for less cost than a larger number of less talented teams. We're independent thinkers with big ideas who act boldly, move fast, and accomplish extraordinary things through collaboration and trust. In short, at Roku you'll be part of a company that's changing how the world watches TV.

We have a unique culture that we are proud of. We think of ourselves primarily as problem-solvers, which itself is a two-part idea: we come up with the solution, but the solution isn't real until it is built and delivered to the customer. That penchant for action gives us a pragmatic approach to innovation, one that has served us well since 2002.

To learn more about Roku, our global footprint, and how we've grown, visit https://www.weareroku.com/factsheet. By providing your information, you acknowledge that you have read our Applicant Privacy Notice and authorize Roku to process your data subject to those terms.
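To ground the service-mesh work mentioned above, here is a minimal, illustrative Istio routing sketch for a canary rollout; the service name, namespace, and traffic split are assumptions for illustration only:

```yaml
# virtual-service.yaml -- hypothetical 90/10 canary split for a "checkout" service
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout
  namespace: shop
spec:
  hosts:
    - checkout.shop.svc.cluster.local
  http:
    - route:
        - destination:
            host: checkout.shop.svc.cluster.local
            subset: stable
          weight: 90
        - destination:
            host: checkout.shop.svc.cluster.local
            subset: canary
          weight: 10
---
# destination-rule.yaml -- subsets keyed on the pod "version" label
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: checkout
  namespace: shop
spec:
  host: checkout.shop.svc.cluster.local
  subsets:
    - name: stable
      labels:
        version: v1
    - name: canary
      labels:
        version: v2
```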
Posted 1 week ago
8.0 - 12.0 years
0 Lacs
chennai, tamil nadu
On-site
You should have 8 to 11 years of experience in the field. The job is based in Chennai or Bangalore. Your technical skills should include:
- Expertise in Prometheus setup, scaling, and federation, with knowledge of Thanos, Cortex, or VictoriaMetrics for long-term storage. Hands-on experience with PromQL for writing complex queries is also required (see the example below).
- Proficiency in Grafana for creating dashboards and integrating with multiple data sources.
- In-depth experience with ELK, Splunk, Loki, or similar logging tools, including both their query languages and dashboarding.
- Hands-on experience managing observability infrastructure in Kubernetes, Docker, or other container technologies.
- Proficiency in scripting and automation using Python, Bash, or similar scripting languages. Experience with Infrastructure as Code tools like Terraform or Ansible is preferred.
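As an example of the kind of PromQL this listing refers to, here is a short recording-rule sketch that precomputes a per-service p99 latency from histogram buckets; the metric and label names are assumptions for illustration:

```yaml
# recording-rules.yaml -- hypothetical p99 latency precomputation
groups:
  - name: latency-aggregations
    interval: 1m
    rules:
      - record: service:http_request_duration_seconds:p99_5m
        # 99th-percentile latency per service over the last 5 minutes,
        # aggregated across all instances from histogram buckets
        expr: |
          histogram_quantile(
            0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (service, le)
          )
```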
Posted 1 week ago
6.0 - 10.0 years
0 Lacs
chennai, tamil nadu
On-site
You are a skilled and motivated Cloud DevOps Engineer with over 6 years of experience, seeking to join a dynamic team in Chennai, Kochi, or Pune. Your primary responsibility will be the commissioning, development, and maintenance of secure and scalable cloud environments on Microsoft Azure. With an emphasis on automation, security, and modern DevOps practices, you will play a crucial role in designing cloud architectures, managing CI/CD pipelines, and ensuring system reliability and performance.

Your key responsibilities include designing and implementing scalable cloud environments, managing CI/CD pipelines using GitHub Actions (a minimal workflow is sketched below), ensuring security and compliance in cloud infrastructure, developing tailored cloud strategies, automating deployment processes, monitoring system health, and collaborating with cross-functional teams to build cloud-native applications.

In terms of technical requirements, you must have proven experience with Microsoft Azure services such as AKS, Key Vault, Storage Account, Event Hub, and Service Bus. Hands-on expertise with Docker and Kubernetes, familiarity with ArgoCD for continuous deployment, and experience with GitHub Actions for CI/CD workflows are essential. You should also be adept at using monitoring and observability tools such as Prometheus, Grafana, Alertmanager, and Thanos; logging and tracing tools like Grafana Loki, Grafana Alloy, Promtail, and the OpenTelemetry Collector; and Infrastructure as Code (IaC) tools like Terraform and Ansible. A solid understanding of cloud architectures and deployment strategies, along with relevant Azure certifications, would be beneficial.

To excel in this role, you will need to demonstrate experience working in agile and DevOps-focused environments, strong problem-solving skills, and the ability to troubleshoot complex systems effectively. Excellent communication and documentation abilities are also essential for success in this position.
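For context on the CI/CD responsibilities above, here is a minimal, hypothetical GitHub Actions workflow that builds a container image in Azure Container Registry and rolls it out to AKS; the registry, resource group, cluster, and deployment names are placeholders, not details from the posting:

```yaml
# .github/workflows/deploy.yml -- hypothetical build-and-deploy to AKS
name: build-and-deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Log in to Azure
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}   # service principal JSON (assumed secret)

      - name: Build and push image
        run: |
          az acr build --registry myregistry \
            --image myapp:${{ github.sha }} .

      - name: Get AKS credentials
        run: az aks get-credentials --resource-group my-rg --name my-aks

      - name: Deploy new image
        run: |
          kubectl set image deployment/myapp \
            myapp=myregistry.azurecr.io/myapp:${{ github.sha }}
```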
Posted 1 week ago
3.0 - 5.0 years
60 - 65 Lacs
mumbai, delhi / ncr, bengaluru
Work from Office
We are seeking a talented and passionate Engineer to design, develop, and enhance our SaaS platform. As a key member of the team, you will work to create the best developer tools, collaborate with designers and engineers, and ensure our platform scales as it grows. The ideal candidate will have strong expertise in backend development, cloud infrastructure, and a commitment to delivering reliable systems. Location: Remote, Delhi NCR, Bangalore, Chennai, Pune, Kolkata, Ahmedabad, Mumbai, Hyderabad
Posted 1 week ago
10.0 - 14.0 years
0 Lacs
pune, maharashtra
On-site
As a Technical Product Manager for the internal Observability & Insights Platform, you will play a crucial role in defining the product strategy, overseeing discovery and delivery, and ensuring that engineers and stakeholders across 350+ services can effectively build, debug, and operate with confidence. Your responsibilities will include owning and evolving a platform encompassing logging (ELK stack), metrics (Prometheus, Grafana, Thanos), tracing (Jaeger), structured audit logs, and SIEM integrations, while competing with high-cost solutions like Datadog and Honeycomb. Your impact will be both technical and strategic, with a focus on enhancing developer experience, reducing operational noise, and driving platform efficiency and cost visibility.

Key Deliverables:
- Successfully manage and deliver initiatives from the Observability Roadmap / Job Jar, tracked via RAG status and Jira epics.
- Conduct structured discoveries for upcoming capabilities such as the SIEM exporter, SDK adoption, and trace sampling.
- Design and implement scorecards in Port to measure observability maturity across teams.
- Ensure feature parity and stakeholder migration in cost-saving initiatives such as the move from Datadog to Prometheus.
- Track and report platform usage, reliability, and cost metrics aligned with business outcomes.
- Drive feature documentation, adoption plans, and enablement sessions across engineering.

Jobs To Be Done:
- Define and evolve the observability product roadmap covering Logs, Metrics, Traces, SDK, Dashboards, and SIEM.
- Lead dual-track agile product discovery for upcoming initiatives, gathering context, defining problems, and validating feasibility.
- Collaborate with engineering managers to break down initiatives into quarterly deliverables, epics, and sprint-level execution.
- Maintain the Observability Job Jar and present RAG status every 2 weeks with confidence supported by Jira hygiene.
- Define and track metrics to measure the success of each platform capability, including SLOs, cost savings, and adoption percentage.
- Collaborate closely with FinOps, Security, and Platform teams to ensure observability aligns with cost, compliance, and operational goals.
- Promote the adoption of SDKs, scorecards, and dashboards through enablement, documentation, and evangelism.

Ways Of Working:
- Operate in dual-track agile mode, discovering next quarter's priorities while delivering the current quarter's committed outcomes.
- Maintain a GPS PRD (Product Requirements Doc) for each major initiative, defining the problem, rationale, and value measurement.
- Collaborate deeply with engineers in backlog grooming, planning, demos, and retrospectives.
- Follow RAG-based reporting with stakeholders, escalating risks early and presenting mitigation paths clearly.
- Operate with full visibility in Jira, driving delivery rhythm across sprints.
- Use quarterly Job Jar reviews to recalibrate product priorities, staffing needs, and stakeholder alignment.

Requirements:
- 10+ years of product management experience, preferably in platform/infrastructure products.
- Demonstrated success in managing internal developer platforms or observability tooling.
- Experience in launching or migrating enterprise-scale telemetry stacks like Datadog, Prometheus/Grafana, Honeycomb, or Jaeger.
- Ability to translate complex engineering requirements into structured product plans with measurable outcomes.
- Strong technical background in cloud-native environments such as EKS, Kafka, and Elasticsearch.
- Excellent documentation and storytelling skills, especially to influence engineers and non-technical stakeholders.

Success Metrics:
- Reduction in Datadog/Honeycomb usage and cost post migration.
- Uptime and latency of observability pipelines (Jaeger, ELK, Prometheus).
- Scorecard improvement across teams (Bronze, Silver, Gold).
- Number of issues detected/resolved using the new observability stack.
- Time to incident triage with new tracing/logging capabilities.
Posted 3 weeks ago
6.0 - 11.0 years
2 - 4 Lacs
Chennai, Tamil Nadu, India
On-site
- A deep understanding of Observability, preferably Dynatrace (or other tools, if well versed in them).
- Provisioning and setup of metrics, alerts, and silences in any observability tool: Dynatrace, Prometheus, Thanos, or Grafana.
- Development work (not just support and running scripts, but actual development) done on Chef (basic syntax, recipes, cookbooks), Ansible (basic syntax, tasks, playbooks), or Terraform (basic syntax), plus GitLab CI/CD configuration, pipelines, and jobs (a minimal pipeline is sketched after this listing).
- Proficiency in scripting (Python, PowerShell, Bash, etc.); this becomes the enabler for automation.
- Proposes ideas and solutions within the Infrastructure Department to reduce the workload through automation.
- Cloud resource provisioning and configuration through CLI/API, especially Azure and GCP; AWS experience is also acceptable.
- Troubleshooting with an SRE approach and mindset.
- Provides emergency response, either by being on-call or by reacting to symptoms according to monitoring, and escalates when needed.
- Improves documentation all around, whether application documentation or runbooks, explaining the why and not stopping at the what.
- Root cause analysis and corrective actions.
- Strong concepts around scale and redundancy for design, troubleshooting, and implementation.

Mid Term:
- Kubernetes basic understanding, CLI, service re-provisioning.
- Operating system (Linux) configuration, package management, startup, and troubleshooting.
- System Architecture & Design: plan, design, and execute solutions to reach specific goals agreed within the team.

Long Term:
- Block and object storage configuration.
- Networking: VPCs, proxies, and CDNs.

At DXC Technology, we believe strong connections and community are key to our success. Our work model prioritizes in-person collaboration while offering flexibility to support wellbeing, productivity, individual work styles, and life circumstances. We're committed to fostering an inclusive environment where everyone can thrive.
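To illustrate the GitLab CI/CD configuration work mentioned above, here is a minimal, hypothetical pipeline that validates and applies Terraform, gated to the default branch; the stage layout and image tag are assumptions, not details from the posting:

```yaml
# .gitlab-ci.yml -- hypothetical Terraform validate/plan/apply pipeline
stages:
  - validate
  - plan
  - apply

default:
  image: hashicorp/terraform:1.7   # pinned Terraform image (assumed tag)

validate:
  stage: validate
  script:
    - terraform init -backend=false
    - terraform fmt -check
    - terraform validate

plan:
  stage: plan
  script:
    - terraform init
    - terraform plan -out=plan.tfplan
  artifacts:
    paths:
      - plan.tfplan

apply:
  stage: apply
  script:
    - terraform init
    - terraform apply -auto-approve plan.tfplan
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
      when: manual   # require a human to trigger the apply
```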
Posted 1 month ago
8.0 - 12.0 years
0 Lacs
karnataka
On-site
As a Site Reliability Engineering (SRE) Technical Leader on the Network Assurance Data Platform (NADP) team at Cisco ThousandEyes, you will be responsible for ensuring the reliability, scalability, and security of the cloud and big data platforms. Your role will involve representing the NADP SRE team, contributing to the technical roadmap, and collaborating with cross-functional teams to design, build, and maintain SaaS systems operating at multi-region scale. Your efforts will be crucial in supporting machine learning (ML) and AI initiatives by ensuring the platform infrastructure is robust, efficient, and aligned with operational excellence.

You will be tasked with designing, building, and optimizing cloud and data infrastructure to guarantee high availability, reliability, and scalability of big-data and ML/AI systems. This will involve implementing SRE principles such as monitoring, alerting, error budgets, and fault analysis. Additionally, you will collaborate with various teams to create secure and scalable solutions, troubleshoot technical problems, lead the architectural vision, and shape the technical strategy and roadmap.

Your role will also encompass mentoring and guiding teams, fostering a culture of engineering and operational excellence, engaging with customers and stakeholders to understand use cases and feedback, and utilizing your strong programming skills to integrate software and systems engineering. Furthermore, you will develop strategic roadmaps, processes, plans, and infrastructure to efficiently deploy new software components at an enterprise scale while enforcing engineering best practices.

To be successful in this role, you should have relevant experience (8-12 years) and a bachelor's engineering degree in computer science or its equivalent. You should possess the ability to design and implement scalable solutions, hands-on experience in Cloud (preferably AWS), Infrastructure as Code skills, experience with observability tools, proficiency in programming languages such as Python or Go, and a good understanding of Unix/Linux systems and client-server protocols. Experience in building Cloud, Big Data, and/or ML/AI infrastructure is essential, along with a sense of ownership and accountability in architecting software and infrastructure at scale.

Additional qualifications that would be advantageous include experience with the Hadoop ecosystem, certifications in cloud and security domains, and experience in building or managing a cloud-based data platform. Cisco encourages individuals from diverse backgrounds to apply, as the company values perspectives and skills that emerge from employees with varied experiences. Cisco believes in unlocking potential and creating diverse teams that are better equipped to solve problems, innovate, and make a positive impact.
Posted 1 month ago
3.0 - 8.0 years
6 - 12 Lacs
Gurugram
Work from Office
Location: NCR
Team Type: Platform Operations
Shift Model: 24x7 Rotational Coverage / On-call Support (L2/L3)

Team Overview
The OpenShift Container Platform (OCP) Operations Team is responsible for the continuous availability, health, and performance of OpenShift clusters that support mission-critical workloads. The team operates under a tiered structure (L2, L3) to manage day-to-day operations, incident management, automation, and lifecycle management of the container platform. This team is central to supporting stakeholders by ensuring the container orchestration layer is secure, resilient, scalable, and optimized.

L2 – OCP Support & Platform Engineering (Platform Analyst)
Role Focus: Advanced Troubleshooting, Change Management, Automation
Experience: 3–6 years
Resources: 5
Key Responsibilities:
- Analyze and resolve platform issues related to workloads, PVCs, ingress, services, and image registries.
- Implement configuration changes via YAML/Helm/Kustomize.
- Maintain Operators, upgrade OpenShift clusters, and validate post-patching health.
- Work with CI/CD pipelines and DevOps teams for build and deploy troubleshooting.
- Manage and automate namespace provisioning, RBAC, and NetworkPolicies (a minimal manifest sketch follows this listing).
- Maintain logs, monitoring, and alerting tools (Prometheus, EFK, Grafana).
- Participate in CR and patch planning cycles.

L3 – OCP Platform Architect & Automation Lead (Platform SME)
Role Focus: Architecture, Lifecycle Management, Platform Governance
Experience: 6+ years
Resources: 2
Key Responsibilities:
- Own lifecycle management: upgrades, patching, cluster DR, and backup strategy.
- Automate platform operations via GitOps, Ansible, and Terraform.
- Lead SEV1 issue resolution, post-mortems, and RCA reviews.
- Define compliance standards: RBAC, SCCs, network segmentation, CIS hardening.
- Integrate OCP with IDPs (ArgoCD, Vault, Harbor, GitLab).
- Drive platform observability and performance tuning initiatives.
- Mentor L1/L2 team members and lead operational best practices.

Core Tools & Technology Stack
- Container Platform: OpenShift, Kubernetes
- CLI Tools: oc, kubectl, Helm, Kustomize
- Monitoring: Prometheus, Grafana, Thanos
- Logging: Fluentd, EFK Stack, Loki
- CI/CD: Jenkins, GitLab CI, ArgoCD, Tekton
- Automation: Ansible, Terraform
- Security: Vault, SCCs, RBAC, NetworkPolicies
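As an illustration of the namespace provisioning work described above, here is a minimal, hypothetical manifest pairing a namespace-scoped RBAC binding with a default-deny ingress NetworkPolicy; the namespace and group names are placeholders, not details from the posting:

```yaml
# team-namespace.yaml -- hypothetical per-team RBAC and default-deny ingress policy
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-edit
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers          # placeholder identity-provider group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                         # built-in aggregated edit role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}                    # selects every pod in the namespace
  policyTypes:
    - Ingress                        # no ingress rules listed, so all ingress is denied
```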
Posted 1 month ago
8.0 - 12.0 years
0 Lacs
karnataka
On-site
As a Site Reliability Engineering (SRE) Technical Leader on the Network Assurance Data Platform (NADP) team at ThousandEyes, you will be responsible for ensuring the reliability, scalability, and security of cloud and big data platforms. Your role will involve representing the NADP SRE team, working in a dynamic environment, and providing technical leadership in defining and executing the team's technical roadmap. Collaborating with cross-functional teams, including software development, product management, customers, and security teams, is essential. Your contributions will directly impact the success of machine learning (ML) and AI initiatives by ensuring a robust and efficient platform infrastructure aligned with operational excellence.

In this role, you will design, build, and optimize cloud and data infrastructure to ensure high availability, reliability, and scalability of big-data and ML/AI systems. Collaboration with cross-functional teams will be crucial in creating secure, scalable solutions that support ML/AI workloads and enhance operational efficiency through automation. Troubleshooting complex technical problems, conducting root cause analyses, and contributing to continuous improvement efforts are key responsibilities.

You will lead the architectural vision, shape the team's technical strategy and roadmap, and act as a mentor and technical leader to foster a culture of engineering and operational excellence. Engaging with customers and stakeholders to understand use cases and feedback, translating them into actionable insights, and effectively influencing stakeholders at all levels are essential aspects of the role. Utilizing strong programming skills to integrate software and systems engineering, building core data platform capabilities and automation to meet enterprise customer needs, is a crucial requirement. Developing strategic roadmaps, processes, plans, and infrastructure to efficiently deploy new software components at an enterprise scale while enforcing engineering best practices is also part of the role.

Qualifications for this position include 8-12 years of relevant experience and a bachelor's engineering degree in computer science or its equivalent. Candidates should have the ability to design and implement scalable solutions with a focus on streamlining operations. Strong hands-on experience in Cloud, preferably AWS, is required, along with Infrastructure as Code skills, ideally with Terraform and EKS or Kubernetes. Proficiency in observability tools like Prometheus, Grafana, Thanos, CloudWatch, OpenTelemetry, and the ELK stack is necessary. The ability to write high-quality code in Python, Go, or equivalent programming languages is essential, as is a good understanding of Unix/Linux systems, system libraries, file systems, and client-server protocols.

Experience in building Cloud, Big Data, and/or ML/AI infrastructure, architecting software and infrastructure at scale, and certifications in cloud and security domains are beneficial qualifications for this role. Cisco emphasizes diversity and encourages candidates to apply even if they do not meet every single qualification. Diverse perspectives and skills are valued, and Cisco believes that diverse teams are better equipped to solve problems, innovate, and create a positive impact.
Posted 1 month ago
3.0 - 5.0 years
60 - 65 Lacs
Mumbai, Delhi / NCR, Bengaluru
Work from Office
We are seeking a talented and passionate Engineer to design, develop, and enhance our SaaS platform. As a key member of the team, you will work to create the best developer tools, collaborate with designers and engineers, and ensure our platform scales as it grows. The ideal candidate will have strong expertise in backend development, cloud infrastructure, and a commitment to delivering reliable systems. Location: Remote, Delhi NCR, Bangalore, Chennai, Pune, Kolkata, Ahmedabad, Mumbai, Hyderabad
Posted 3 months ago