1633 Grafana Jobs - Page 31

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

5.0 - 10.0 years

9 - 13 Lacs

Pune

Work from Office

The Frontline Product Support team provides L1 and L2 support for multiple critical applications. This role involves addressing issues reported or escalated by users or the Level 1 support team, monitoring applications for potential problems, and proactively resolving them. You will manage high-severity incidents, either independently or in collaboration with other teams, to ensure swift and effective resolution. Operating in a 24x7 environment, the team offers continuous support across all time zones, ensuring the reliability and stability of essential applications. Key Responsibilities: Diagnose, troubleshoot, and resolve complex issues across systems and applications. Manage the daily workload so that users receive the best possible service, staying aware of SLAs and issues impacting live services. Deliver L1 and L2 application support services to client users in line with agreed Service Level Agreements. Manage high-severity incidents, minimizing downtime and coordinating with key stakeholders. Demonstrate strong problem-solving skills to diagnose and fix complex issues across various systems and applications. Perform deep dives into logs, databases, and system metrics to determine the underlying cause of issues. Perform proactive monitoring and address alerts before escalation. Utilize monitoring tools to predict and prevent potential issues. Perform in-depth analysis to identify the root cause of recurring issues and provide recommendations for permanent fixes. Collaborate effectively with other teams, such as development, operations, and L3 support, to resolve complex issues or deploy fixes. Engage with customers for in-depth technical discussions, particularly in resolving complex issues. Participate in post-mortem reviews to help improve future incident response. Maintain and update runbooks and troubleshooting documentation. Explain technical issues and resolutions clearly to non-technical stakeholders.
Handle multiple tickets and incidents concurrently, especially during critical situations. Required Skills & Qualifications: Strong understanding of retail media support services and workflows. Excellent troubleshooting and analytical skills for diagnosing complex issues. Experience in ITIL-based support environments with strict SLA/OLA adherence. Experience in delivering exceptional, customer focused and service driven support delivery. Proficiency in ticketing systems like JIRA, ServiceNow, and ZohoDesk. Advanced SQL skills and experience with database tools (Oracle, PostgreSQL, SQL Developer, pgAdmin). Basic knowledge of IIS, Linux, and Windows server environments. Familiarity with cloud platforms (Azure, Google Cloud). Strong communication skills to explain technical details to non-technical audiences. Ability to work in 24x7 shifts, including night shifts and on-call rotations. Hands-on experience with monitoring tools such as Grafana, New Relic, and App Dynamics. Self-motivated, autonomous, detail oriented, passionate about delivering high quality services. Good general understanding of Retail Media platforms and products. Qualifications Bachelors in Computer Science Job Location

Posted 3 weeks ago

7.0 - 12.0 years

15 - 25 Lacs

Bengaluru

Work from Office

Preferred candidate profile: DevOps & Cloud Infrastructure Engineer to lead the design, implementation, and optimization of scalable, secure, and cost-effective infrastructure in Microsoft Azure. The ideal candidate will have deep expertise in Kubernetes, Docker, Terraform, CI/CD, and monitoring tools like Datadog, Grafana, and Prometheus, along with experience in SonarQube setup, Azure AD integration, and multi-tenancy architecture. Key Responsibilities: Design and implement scalable, secure, and cost-efficient infrastructure on Azure. Set up and manage Kubernetes clusters and containerized applications using Docker. Automate infrastructure provisioning using Terraform. Build and maintain robust CI/CD pipelines for continuous integration and deployment. Design and implement multi-tenancy architecture to support multiple clients or business units securely and efficiently.

Posted 3 weeks ago

10.0 - 15.0 years

30 - 42 Lacs

Hyderabad

Work from Office

Responsibilities:
* Design, develop, test & maintain software solutions using React.js, Node.js, PostgreSQL & Grafana.
* Collaborate with cross-functional teams on IoT projects utilizing MQTT protocol.

Posted 3 weeks ago

15.0 - 20.0 years

5 - 10 Lacs

Hyderabad

Work from Office

Project Role: DevOps Engineer
Project Role Description: Responsible for building and setting up new development tools and infrastructure utilizing knowledge in continuous integration, delivery, and deployment (CI/CD), Cloud technologies, Container Orchestration and Security. Build and test end-to-end CI/CD pipelines, ensuring that systems are safe against security threats.
Must have skills: DevSecOps
Good to have skills: Google Cloud Platform Architecture, Microsoft Azure Infrastructure as Code (IaC)
Minimum 7.5 year(s) of experience is required.
Educational Qualification: 15 years full time education
Summary: As a DevSecOps Engineer, you will be responsible for building and setting up new development tools and infrastructure. A typical day involves utilizing your knowledge in continuous integration, delivery, and deployment, as well as cloud technologies and container orchestration. You will also focus on ensuring that systems are secure against potential threats while collaborating with various teams to enhance the development process and improve overall efficiency.
Roles & Responsibilities:- Expected to be an SME.- Collaborate and manage the team to perform.- Responsible for team decisions.- Engage with multiple teams and contribute on key decisions.- Provide solutions to problems for their immediate team and across multiple teams.- Facilitate knowledge sharing sessions to enhance team capabilities.- Monitor and optimize CI/CD pipelines for performance and security.- Oversee the development, maintenance, and testing of Hashicorp Terraform modules for infrastructure as code (IaC)- Ensure the design, implementation, and management of Sentinel policies as code to enforce security and compliance standards- Collaborate with cross-functional teams to integrate security practices into the CI/CD pipeline- Drive the automation of infrastructure provisioning, configuration management, and application deployment processes- Monitor and troubleshoot infrastructure and application issues, ensuring high availability and performance- Conduct regular security assessments and audits to identify vulnerabilities and implement remediation measures- Stay up to date with the latest industry trends, tools, and best practices in DevSecOps, Terraform, and Sentinel- Foster a culture of continuous improvement, innovation, and collaboration within the team- Develop and implement strategies to enhance the team's efficiency, productivity, and overall performance- Report on team progress, challenges, and achievements to senior management Professional & Technical Skills: - Must To Have Skills: Proficiency in DevSecOps.- Good To Have Skills: Experience with Google Cloud Platform Architecture, Microsoft Azure Infrastructure as Code (IaC).- Strong understanding of continuous integration and continuous deployment methodologies.- Experience with container orchestration tools such as Kubernetes or Docker Swarm.- Familiarity with security best practices in software development and deployment.- Proven experience in a leadership role within a DevSecOps or similar 
environment- Strong expertise in Hashicorp Terraform and infrastructure as code (IaC) principles- Proficiency in developing and managing Sentinel policies as code- Experience with CI/CD tools such as GitHub, GitHub Actions, Jenkins, and JFrog Platform- Solid understanding of cloud platforms, specifically Google Cloud Platform (GCP) and Microsoft Azure- Knowledge of containerization technologies (Docker, Kubernetes) and orchestration.- Familiarity with security frameworks and compliance standards (e.g., NIST, ISO 27001).- Certifications in Terraform, GCP, or Azure (e.g., HashiCorp Certified:Terraform Associate, Google Cloud Professional Cloud Architect, Microsoft Certified:Azure Solutions Architect Expert).- Experience with scripting languages (Python, Bash, PowerShell).- Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack). Additional Information:- The candidate should have minimum 7.5 years of experience in DevSecOps.- This position is based at our Hyderabad office.- A 15 years full time education is required. Qualification 15 years full time education

Posted 3 weeks ago

7.0 - 12.0 years

10 - 20 Lacs

Hyderabad, Chennai, Bengaluru

Hybrid

Dear candidate, Greetings from Wipro! We are hiring a DevOps SRE with Python scripting for Bangalore/Hyderabad/Chennai. Exp: 7 to 15 years. Job location: Bangalore, Hyderabad and Chennai. Note: please share profiles only of candidates who can join in 0 to 15 days. JD: SRE. Very good in Unix, Jenkins and Python scripting. Should be proficient in creating workflows in Jenkins and Ansible playbooks. Should have an understanding of monitoring tools like Grafana, Splunk, Epic and Nginx. Should have an understanding of databases like MySQL/Oracle/Cassandra. Very good in DevOps processes and troubleshooting issues. Experience in production deployment and on-call support. Good to have knowledge of Spinnaker. Excellent analytical, troubleshooting and problem-solving skills. Experience in solving problems and working with a team to resolve large-scale production environment issues. Able to drive the team during production maintenance, outages and load test activities. Please share your profile to kasturi.mettin@wipro.com with the details below. Total exp: Rel: CTC: ECTC: NP: Current Location: Pref Location: Interview Time: Thanks, Kasturi Mettin kasturi.mettin@wipro.com

Posted 3 weeks ago

2.0 - 5.0 years

0 - 1 Lacs

Pune

Remote

Dynatrace (On-prem & SaaS); Python coding; SLI/SLO/SLA: setup, tracking, reporting; OpenTelemetry & instrumentation: knowledge of logging, tracing, metrics collection; AWS services: CloudWatch, X-Ray, Lambda; Red Hat OpenShift on AWS; Grafana

Posted 3 weeks ago

3.0 - 6.0 years

10 - 18 Lacs

Gurugram

Hybrid

Interested candidates can directly apply via below link: https://jobs.amdocs.com/careers/job/563431001831462 Required Technical Competencies: Working knowledge of Microsoft tools like Office, Word, Excel. Working knowledge of incident management tool like Jira and monitoring and logs analysis tools like Splunk, Argos , Grafana , SOAP UI will be an advantage. ITIL/ITSM knowledge and certification would be an added advantage. Having exposure to telecom domain. Excellent Communication Skills. Willingness to learn drive issues towards resolution. Infrastructure Background: Experience in managing server deployments, ensuring server health, and monitoring certificate validity. Proficiency in configuration management and troubleshooting infrastructure-related issues. Strong understanding of log analysis using tools like Splunk, Argos, Grafana or similar logging solutions. Ability to perform advanced triaging by analyzing logs to identify root causes of infrastructure issues. Proficiency in manual testing for rapid issue verification and basic sanity flow checks, utilizing tools like Postman/curl for API testing. Experience in working in ambiguous situations, working under pressure, and flexible work hours (across multiple time zones) Required Behavioral Competencies : Effective Communication & Stakeholder Management: Ability to independently lead war-room discussions with multiple stakeholders and provide rapid, clear responses to customer queries. Adaptability & Resilience: Ability to work effectively in ambiguous situations, under pressure, and with flexible work hours. Sense of Urgency & Ownership: Production-oriented with a strong sense of urgency and sensitivity to production requirements. Analytical Thinking: Good analytical skills , coupled with the ability to systematically approach and resolve complex problems. Collaboration & Teamwork: Ability to work effectively within a team environment, fostering cooperation and knowledge sharing. 
Incident management often requires coordinated efforts across multiple teams. Proactive Learning & Continuous Improvement: Demonstrated commitment to learning from incidents, identifying areas for improvement, and implementing changes to prevent recurrence. Decision-Making & Judgment: Ability to make sound decisions under pressure, often with limited information. This includes prioritizing tasks and determining the best course of action.

Posted 4 weeks ago

2.0 - 6.0 years

3 - 8 Lacs

Bengaluru

Work from Office

2+ years of hands-on experience in DevOps, with strong expertise in infrastructure automation and cloud-native technologies. Proficient in Terraform for infrastructure provisioning and Argo CD for GitOps-based continuous deployment. Solid understanding of cloud platforms including GCP, AWS, and Azure. Azure experience is a strong plus. Must have experience in setting up and managing monitoring and alerting using tools like Prometheus and Grafana. Responsible for ensuring high system uptime, continuous monitoring, and timely detection and notification of system anomalies. Collaborate with product managers to define and execute the DevOps roadmap for Salesken's services. Drive end-to-end execution of DevOps projects and report on progress and system health at an executive level. Design, implement, and enhance CI/CD pipelines to support reliable and frequent deployments. Perform root cause analysis of operational issues and work closely with development teams to implement fixes and improvements. Manage capacity planning and lead infrastructure enhancement projects, including design, budgeting, and execution. Build and maintain platforms for log processing, metrics collection, and data visualization to support observability and performance tracking. Cloud certifications are a plus.

Posted 4 weeks ago

10.0 - 20.0 years

20 - 27 Lacs

Pune

Work from Office

Manage and optimize performance of critical systems, networks, applications & business processes. Monitor, analyze, and troubleshoot to ensure high availability & operational efficiency. Industry-standard tools like Dynatrace, Datadog, New Relic, Grafana. Required Candidate profile: Cloud platforms (AWS, Azure), scripting languages (Python, Bash). Performance Monitoring, Troubleshooting, Data Analysis, Cloud Platforms, System Optimization, Reporting, Automation, Cybersecurity

Posted 4 weeks ago

5.0 - 10.0 years

11 - 16 Lacs

Pune

Work from Office

What You'll Do We are looking for experienced Machine Learning Engineers with a background in software development and a deep enthusiasm for solving complex problems. You will lead a dynamic team dedicated to designing and implementing a large language model framework to power diverse applications across Avalara. Your responsibilities will span the entire development lifecycle, including conceptualization, prototyping and delivery of the LLM platform features. You will have a blend of technical skills in the fields of AI & Machine Learning especially with LLMs and a deep-seated understanding of software development practices where you'll work with a team to ensure our systems are scalable, performant and accurate. You will be reporting to Senior Manager, AI/ML. What Your Responsibilities Will Be We are looking for engineers who can think quick and have a background in implementation. Your responsibilities will include: Build on top of the foundational framework for supporting Large Language Model Applications at Avalara Experience with LLMs - like GPT, Claude, LLama and other Bedrock models Leverage best practices in software development, including Continuous Integration/Continuous Deployment (CI/CD) along with appropriate functional and unit testing in place. Inspire creativity by researching and applying the latest technologies and methodologies in machine learning and software development. Write, review, and maintain high-quality code that meets industry standards. Lead code review sessions, ensuring good code quality and documentation. Mentor junior engineers, encouraging a culture of collaboration. Proficiency in developing and debugging software with a preference for Python, though familiarity with additional programming languages is valued and encouraged. 
What You'll Need to be Successful Bachelor's/Master's degree in computer science with 5+ years of industry experience in software development, along with experience building Machine Learning models and deploying them in production environments. Proficiency working in cloud computing environments (AWS, Azure, GCP), Machine Learning frameworks, and software development best practices. Work with technological innovations in AI & ML(esp. GenAI) Experience with design patterns and data structures. Good analytical, design and debugging skills. Technologies you will work with: Python, LLMs, MLFlow, Docker, Kubernetes, Terraform, AWS, GitLab, Postgres, Prometheus, Grafana

Posted 4 weeks ago

3.0 - 8.0 years

5 - 6 Lacs

Gurugram

Work from Office

1. Develop, maintain and update IIoT applications for manufacturing plants.
2. Create user-friendly web interfaces using PHP, HTML, CSS, JavaScript, and Bootstrap.
3. Experience in MySQL.
4. Experience in PLC systems.
5. Experience in RBAC (Role-Based Access Control).
Required Candidate profile: Knowledge of Python. Troubleshoot, debug and improve application performance. Basic knowledge of Python, especially in the context of IIoT and automation. Front-end and back-end development with integration on machines.

Posted 4 weeks ago

8.0 - 12.0 years

27 - 31 Lacs

Pune

Work from Office

Position Summary: We are looking for a skilled Solutions Architect to support sales teams in designing and delivering cloud-native solutions using Kubernetes, Cilium, and eBPF. You'll lead technical discovery, PoCs, and customer enablement, while collaborating with teams across Gruve, Cisco, and Isovalent to drive adoption and provide field insights. Key Responsibilities: Sales Engagement & Technical Discovery Partner with Sales teams to qualify opportunities and conduct deep technical discovery sessions with enterprise and service provider customers. Understand customer pain points in networking, observability, and security, particularly in Kubernetes-based environments. Solution Architecture & PoCs Design and deliver tailored solutions based on Cilium, eBPF, Kubernetes, and related cloud-native technologies. Lead hands-on demonstrations and proof-of-concepts (PoCs), articulating the technical advantages and business outcomes. Assist in crafting architecture documents, bill of materials (BoMs), and migration plans. Technical Enablement & Support Present complex technical concepts to a variety of audiences including engineers, architects, and executives. Work closely with Gruve, Cisco, and Isovalent teams to enable partner and customer success. Support RFP/RFI responses and participate in technical workshops and events. Collaboration & Feedback Loop Serve as a trusted technical advisor to both Cisco and Isovalent during the sales cycle. Provide field insights and customer feedback to Gruve’s internal team to shape enablement, documentation, and solution accelerators. Basic Qualifications: B.E. / B.Tech Degree / Master's Degree required 6-10 years of experience in pre-sales engineering , solutions architecture , or technical consulting roles. Strong understanding of Kubernetes , cloud-native networking , service mesh , and cloud security . Hands-on familiarity with Cilium , eBPF , and cloud-native observability tooling (e.g., Prometheus, Grafana). 
Experience supporting or partnering with Cisco , especially in the AppDynamics , ThousandEyes , or Intersight K8s ecosystem, is a strong advantage. Excellent communication and presentation skills. Able to work independently across a distributed team and manage stakeholders in a matrixed environment. Willingness to travel across the APAC region for key customer engagements and PoCs. Preferred Qualifications: Certifications such as CKA , CKAD , or CCNP/CCIE (Data Center or Security) . Experience in service provider environments or with large-scale enterprise customers. Prior contributions to open-source networking or Kubernetes-related projects.

Posted 4 weeks ago

3.0 - 6.0 years

15 - 25 Lacs

Bengaluru

Work from Office

The Opportunity: Are you a self-starter with a strong background in UI development, automation, and cloud technologies, who thrives in a collaborative environment? If so, you'll find an exciting opportunity on our team, where you'll engage in innovative projects, deliver impactful demos, and work closely with diverse experts to drive real-world customer outcome solutions. This team strives to promote continuous learning and growth in a flexible and supportive culture. About the Team: The team for this role is part of the Solutions & Performance Engineering organization within R&D at Nutanix, a global organization which operates out of various geographic locations. The team is known for its collaborative culture, where innovation and continuous learning are highly valued. The mission of the Solutions & Performance Engineering team is to engage customers on their technological and business challenges, leverage advanced technologies to develop impactful solutions, and provide efficient, seamless automation processes for clients worldwide. Your Role: We are seeking a highly skilled Front-End Engineer to design, build, and optimize user interfaces, with a focus on scalability and efficiency, that empower our engineering teams with deep insights into system performance. This role is ideal for someone with strong React.js expertise, a passion for building high-performing UIs, and a problem-solving mindset. You'll work closely with backend engineers and infrastructure teams to develop dashboards, integrate with APIs, and automate the visualization of complex data. Your work will help drive decisions, detect performance regressions, and streamline infrastructure automation workflows. 1. UI/UX Design & Front-End Development: Build scalable and responsive front-end applications using React.js. Optimize UI/UX by managing cookies, caching, and performance tuning for large-scale apps (1,000+ pages).
Revamp and modernize legacy front-end codebases for better maintainability and performance. Integrate with microservices-based backend architectures to ensure seamless data flow. Collaborate with design teams to create intuitive and visually appealing user interfaces. 2. Data Visualization & Insights Generation Develop interactive dashboards to visualize system performance trends and analytics. Work with APIs and performance benchmarks to translate backend data into actionable visual insights. Collaborate with backend engineers to define and optimize API contracts for UI needs. Utilize tools like Figma for UI design and translate wireframes into high-quality front-end components. What You Will Bring Required Skills & Experience: Proficiency in React.js , JavaScript, and front-end architecture. Strong experience with UI/UX design principles and tools such as Figma . Familiarity with REST APIs and microservices integration. Version control with Git ; experience in CI/CD pipelines , Docker , and Kubernetes . Experience building UIs that scale and perform efficiently under large data loads. Soft Skills & Qualities: Problem Solver: Can troubleshoot complex issues and design innovative, scalable solutions. Effective Communicator: Comfortable explaining technical concepts to both engineers and non-technical stakeholders. Team Player: Works well across teams and contributes to a collaborative, solution-oriented environment. Self-Starter: Independent learner who adapts quickly to new technologies and challenges. Detail-Oriented: Produces high-quality, efficient, and reliable code. Accountable: Takes ownership of tasks and delivers end-to-end solutions. Organized: Strong time management and prioritization skills in fast-paced environments. Preferred / Bonus Skills: Experience with distributed systems and cloud-native architectures . Familiarity with observability tools (e.g., Prometheus, Grafana, Loki, Jaeger, ELK stack). 
Background in cloud infrastructure automation using AWS, Azure, GCP, or OpenStack. Hands-on with infrastructure as code and workload orchestration tools like Terraform , Ansible , or Kubernetes

Posted 4 weeks ago

3.0 - 8.0 years

6 - 16 Lacs

Bengaluru

Hybrid

Dear candidate, Greetings of the day from Innova Solutions. We have an opening for a Java Fullstack developer for the Bangalore location (Hybrid). Number of openings: 20. Profile: Java Fullstack. Skills: Java + Angular + AWS / Azure (Must) + Docker + Kubernetes + Logging & Monitoring tools. Experience: 2-8 Years. Location: Bangalore (Hybrid). Budget: Open (case to case). Interview Mode: Face to face (7th July '25 or 8th July '25, Monday - Tuesday). Number of rounds: 1 (Technical) + HR. Interview Location: Bellandur. Need local candidates (Bangalore based). If you are interested, please share your updated CV at reena.gupta@innovasolutions.com. Thanks

Posted 4 weeks ago

10.0 - 17.0 years

30 - 40 Lacs

Bengaluru

Hybrid

Staff Engineer/Tech Lead AI/ML [Natural Language Processing, Transformers, Gen AI, LLM, Neural Networks] The Opportunity: As a Staff Engineer (MTS-6), you will own the architecture and AI/ML systems that power both log and metrics analysis, enabling automated diagnostics and reducing triage time for QA failures, regression runs, and customer issues. You'll also help define and drive the central AI charter at Nutanix, building reusable components, model infrastructure, and scalable ML services. About the Team: The Panacea team has a passionate set of engineers across the India and US offices. We move fast, collaborate closely, and care deeply about quality and ownership. Our mission is to deliver AI/ML-powered developer productivity tools that solve real engineering and support pain points at scale. Why Join Us: Build AI-first observability tools that redefine how engineers triage and troubleshoot. Own systems that reduce hours of manual work in engineering and SRE workflows. Collaborate with a tight-knit team of high-ownership engineers who are passionate about impact and innovation. Hybrid work model that supports flexibility and deep focus. Help shape the central AI charter at Nutanix and influence future AI products across the company. Your Role: AI-Powered Observability Platform: Own the vision, architecture, and delivery of Panacea's ML-based log and metrics analyzer that reduces triage time and improves engineering efficiency. AI/ML-powered Log Analyzer Tool: Use deep learning (e.g., ModernBERT) to represent log messages, detect anomalies, group patterns, and surface actionable insights to users. Metrics Anomaly Detection Engine: Build robust ML models to detect anomalies in time-series metrics like CPU, memory, disk I/O, network traffic, service health, and more, automatically identifying performance degradation or system regressions across distributed environments.
Auto-RCA Engine : Combine log and metrics signals with graph-based correlation and LLM-powered summarization to automatically diagnose the root cause of system failures. Feedback Loop & Continuous Learning : Build infrastructure for incorporating user feedback to continuously retrain and improve anomaly detection systems. LLM Integration : Integrate LLMs for user queries, problem summarization, anomaly explanation, and contextual recommendations. Central AI Charter : Contribute to Nutanixs foundational AI platform by defining shared tooling, datasets, governance, and reusable ML components across products. Responsibilities Architect and scale ML pipelines for real-time and batch-based anomaly detection in both logs and time-series metrics. Build and fine-tune ModernBERT and other transformer-based models for log understanding, anomaly classification, and summarization. Develop unsupervised and semi-supervised ML models for detecting anomalies in system metrics (CPU, memory, network throughput, latency, etc.). Implement correlation models to connect anomalies across logs and metrics to form a cohesive RCA narrative. Own the entire ML lifecycle: data ingestion, feature extraction, model training, evaluation, deployment, and monitoring. Build explainable AI systems that increase adoption and trust within engineering, QA, and support teams. Collaborate with cross-functional stakeholders (SRE, QA, Dev) to deeply understand pain points and translate them into intelligent tooling. Drive technical excellence through code and design reviews, mentoring, and setting engineering best practices. What You Will Bring Educational Background : B.Tech/M.Tech in Computer Science, Machine Learning, AI, or related fields. Experience : 12+ years of engineering experience , including designing , developing and deploying AI/ML systems at scale. ML Expertise : Strong in time-series anomaly detection, statistical modeling, supervised/unsupervised learning. 
Experience building ML models for metrics data (CPU, memory, IOPS, network, etc.) using models like Isolation Forest, Prophet, LSTM, or deep autoencoders. Expertise in NLP using ModernBERT, BERT, or log classification, clustering, and summarization. Experience with LLMs for downstream tasks like summarization, root cause reasoning, or intelligent Q&A. Engineering Skills : Strong Python background, hands-on with ML libraries (PyTorch, TensorFlow, Scikit-learn), time-series frameworks, and MLOps tools. Familiar with data pipelines and serving models. Observability Knowledge : Hands-on with logs, metrics, traces, and popular monitoring tools (e.g., Prometheus, Grafana, ELK). Leadership : Ability to independently drive projects from requirements to delivery, mentor junior engineers, and deliver business impact. Work Arrangement Hybrid: This role operates in a hybrid capacity, blending the benefits of remote work with the advantages of in-person collaboration. For most roles, that will mean coming into an office a minimum of 2 - 3 days per week, however certain roles and/or teams may require more frequent in-office presence. Additional team-specific guidance and norms will be provided by your manager.

Posted 4 weeks ago

10.0 - 20.0 years

10 - 20 Lacs

Hyderabad, Chennai, Bengaluru

Work from Office

Job Description: Cloud Infrastructure & Deployment: Design and implement secure, scalable, and highly available cloud infrastructure on GCP. Provision and manage compute, storage, network, and database services. Automate infrastructure using Infrastructure as Code (IaC) tools such as Terraform or Deployment Manager. Architecture & Design: Translate business requirements into scalable cloud solutions. Recommend GCP services aligned with application needs and cost optimization. Participate in high-level architecture and solution design discussions. DevOps & Automation: Build and maintain CI/CD pipelines (e.g., using Cloud Build, Jenkins, GitLab CI). Integrate monitoring, logging, and alerting (e.g., Stackdriver / Cloud Operations Suite). Enable autoscaling, load balancing, and zero-downtime deployments. Security & Compliance: Ensure compliance with security standards and best practices. Migration & Optimization: Support cloud migration projects from on-premise or other cloud providers to GCP. Optimize performance, reliability, and cost of GCP workloads. Documentation & Support: Maintain technical documentation and architecture diagrams. Provide L2/L3 support for GCP-based services and incidents. Required Skills and Qualifications: Google Cloud Certification: Associate Cloud Engineer or Professional Cloud Architect/Engineer. Hands-on experience with GCP services (Compute Engine, GKE, Cloud SQL, BigQuery, etc.). Strong command of Linux, shell scripting, and networking fundamentals. Proficiency in Terraform, Cloud Build, Cloud Functions, or other GCP-native tools. Experience with containers and orchestration: Docker, Kubernetes (GKE). Familiarity with monitoring/logging: Cloud Monitoring, Prometheus, Grafana. Understanding of IAM, VPCs, firewall rules, service accounts, and Cloud Identity.

Posted 4 weeks ago

5.0 - 10.0 years

20 - 35 Lacs

Bengaluru

Remote

Role: Site Reliability Engineer (SRE). Location: Remote. Work Hours: US working hours (weekends on a rotation basis). Upsmart Solutions: At Upsmart Solutions, we're focused on delivering high-performing digital solutions backed by strong engineering teams. We're looking for a skilled and proactive Site Reliability Engineer (SRE) to support and enhance the performance of systems that impact thousands of users on both buyer and seller sides. This role is ideal for someone with prior experience in high-traffic e-commerce and/or video platforms like Twitch, Whatnot, etc. You will collaborate with cross-functional teams to troubleshoot issues, build reliable systems, and maintain high availability. A strong background in Java and NodeJS is essential, along with excellent communication skills and a customer-first mindset. Objectives of this role: Ensure high availability, reliability, and performance of production systems. Handle escalated technical issues impacting users and vendors, driving quick and lasting resolutions. Collaborate with Engineering teams to improve observability, alerting, and system robustness. Own incident management, postmortems, and RCA documentation. Continuously improve automation for monitoring, deployment, and infrastructure. Key Responsibilities: Monitor system performance and troubleshoot production issues. Manage infrastructure reliability for platforms built on Java and NodeJS. Collaborate with development teams to optimize applications for scale and performance. Build internal tools for improved operational efficiency. Provide on-call support during US hours and on a rotational weekend basis. Maintain detailed records of incidents, fixes, and preventive measures. Required Skills and Qualifications: Minimum 5 years of experience in SRE or DevOps roles. Hands-on expertise in Java and NodeJS (mandatory). Prior experience supporting e-commerce or video streaming platforms.
Proven troubleshooting experience across frontend, backend, and infrastructure layers. Strong grasp of system design, scalability, and observability. Excellent verbal and written communication skills. Preferred Skills and Qualifications: Experience with cloud platforms (AWS, GCP, or Azure). Familiarity with CI/CD pipelines, Docker, Kubernetes, and monitoring tools (Grafana, Prometheus, etc.). Incident response and RCA reporting experience.

Posted 4 weeks ago

10.0 - 15.0 years

12 - 22 Lacs

Hyderabad, Pune, Bengaluru

Hybrid

Job Description: Cloud Infrastructure & Deployment: Design and implement secure, scalable, and highly available cloud infrastructure on GCP. Provision and manage compute, storage, network, and database services. Automate infrastructure using Infrastructure as Code (IaC) tools such as Terraform or Deployment Manager. Architecture & Design: Translate business requirements into scalable cloud solutions. Recommend GCP services aligned with application needs and cost optimization. Participate in high-level architecture and solution design discussions. DevOps & Automation: Build and maintain CI/CD pipelines (e.g., using Cloud Build, Jenkins, GitLab CI). Integrate monitoring, logging, and alerting (e.g., Stackdriver / Cloud Operations Suite). Enable autoscaling, load balancing, and zero-downtime deployments. Security & Compliance: Ensure compliance with security standards and best practices. Migration & Optimization: Support cloud migration projects from on-premise or other cloud providers to GCP. Optimize performance, reliability, and cost of GCP workloads. Documentation & Support: Maintain technical documentation and architecture diagrams. Provide L2/L3 support for GCP-based services and incidents. Required Skills and Qualifications: Google Cloud Certification: Associate Cloud Engineer or Professional Cloud Architect/Engineer. Hands-on experience with GCP services (Compute Engine, GKE, Cloud SQL, BigQuery, etc.). Strong command of Linux, shell scripting, and networking fundamentals. Proficiency in Terraform, Cloud Build, Cloud Functions, or other GCP-native tools. Experience with containers and orchestration – Docker, Kubernetes (GKE). Familiarity with monitoring/logging – Cloud Monitoring, Prometheus, Grafana. Understanding of IAM, VPCs, firewall rules, service accounts, and Cloud Identity. Excellent written and verbal communication skills.

Posted 4 weeks ago

5.0 - 8.0 years

15 - 30 Lacs

Gurugram

Work from Office

We are looking for a talented Software Engineer with hands-on experience in Quarkus and Red Hat Fuse to design, develop, and maintain integration solutions. The ideal candidate will have strong proficiency in Java, experience with Kafka-based event streaming, RESTful APIs, relational databases, and CI/CD pipelines deployed on OpenShift Container Platform (OCP). This role requires a developer who is passionate about building robust microservices and integration systems in a cloud-native environment. Key Responsibilities: Design and develop scalable microservices using the Quarkus framework. Build and maintain integration flows and APIs leveraging Red Hat Fuse (Apache Camel) for enterprise integration patterns. Develop and consume RESTful web services and APIs. Design, implement, and optimize Kafka producers and consumers for real-time data streaming and event-driven architecture. Write efficient, well-documented, and testable Java code adhering to best practices. Work with relational databases (e.g., PostgreSQL, MySQL, Oracle), including schema design, queries, and performance tuning. Collaborate with DevOps teams to build and maintain CI/CD pipelines for automated build, test, and deployment workflows. Deploy and manage applications on OpenShift Container Platform (OCP), including containerization best practices (Docker). Participate in code reviews, design discussions, and agile ceremonies. Troubleshoot and resolve production issues with a focus on stability and performance. Keep up-to-date with emerging technologies and recommend improvements. Required Skills & Experience: Strong experience with Java (Java 8 or above) and the Quarkus framework. Expertise in Red Hat Fuse (or Apache Camel) for integration development. Proficient in designing and consuming REST APIs. Experience with Kafka for event-driven and streaming solutions. Solid understanding of relational databases and SQL.
Experience in building and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI) and automated deployment. Hands-on experience deploying applications to OpenShift Container Platform (OCP). Working knowledge of containerization tools like Docker. Familiarity with microservices architecture, cloud-native development, and agile methodologies. Strong problem-solving skills and ability to work independently as well as in a team environment. Good communication and documentation skills.

Posted 4 weeks ago

6.0 - 9.0 years

10 - 15 Lacs

Ahmedabad

Remote

Design, build, and maintain scalable CI/CD pipelines using tools like Jenkins, GitLab CI, CircleCI, or GitHub Actions. Automate infrastructure provisioning using tools such as Terraform, Ansible, or CloudFormation. Manage platforms like Kubernetes or ECS.

Posted 1 month ago

7.0 - 12.0 years

30 - 35 Lacs

Gurugram

Remote

Role: Site Reliability Engineer. Shift: Fixed shift, 5 working days (7 pm to 4 am IST). Location: Remote, work from home. Rackspace is building up its Professional Services Center of Excellence on Application Performance Monitoring Suites. If you enjoy solving complex business problems and can contribute to building the next generation of modern applications for our customers, helping them understand the connections between application performance, user experience, and business outcomes and creating amazing customer experiences with modern interpretations of SRE and Observability using Datadog, New Relic, AppDynamics, or Dynatrace, working with their suite of products and integrations, then join us! Rackspace enables businesses to accelerate digital transformation through our innovative data and integration solutions and tools that help you fix problems quickly, maintain complex systems, and improve code. We believe Datadog, AppDynamics, or New Relic will be a large contributor to what we do, and we want talented, creative, and thoughtful individuals to join our team to shape Observability Engineering for our customers. Key Responsibilities: Work with customers and implement Observability solutions. Build and maintain scalable systems and robust automation that supports engineering goals. Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance. Proactively gather and analyze both metric and log data from systems and applications to perform anomaly detection, performance tuning, capacity planning, and fault isolation.
Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability, security, and performance standards. Collaborate with team members to document and share solutions. Maintain a deep understanding of the customer's business as well as their technical environment. Identify performance bottlenecks and anomalous system behavior, and resolve the root cause of service issues. Required: Bachelor's degree in engineering/computer science or equivalent. Experience Range: 7 to 14 yrs. Senior-level experience with Site Reliability Engineering, DevOps, code-level application support and troubleshooting, AWS infrastructure design, implementation, and optimization, and automation for deployment, scaling, and reliability. Experience with observability tools like Splunk, Datadog, SignalFx, etc. Experience deploying, maintaining, and supporting software applications/services in the AWS ecosystem. Proactive approach to identifying problems and solutions. Experience writing code with one or more interpreted languages such as Python, PHP, Perl, Ruby, or Linux shell. Experience with Terraform or CloudFormation scripting. Experience with configuration management tools like Ansible, Chef, or Puppet. Experience with standard software development best practices and tools such as code repositories (Git preferred). Experience executing in an agile software development environment. Good understanding of pricing/cost models across AWS services, especially compute, storage, and database offerings. A clear understanding of network and system management solutions. Excellent organizational and project management skills. Excellent communication, critical thinking, and analytical skills.

Posted 1 month ago

8.0 - 13.0 years

16 - 25 Lacs

Hyderabad, Ahmedabad

Hybrid

Job Description: We are looking for a highly skilled Performance Test Engineer with over 8 years of experience to join our team. The ideal candidate will have extensive experience in performance testing and a strong ability to write and execute scripts using JMeter. Key Responsibilities: Design, develop, and execute comprehensive performance test plans and scripts. Write and maintain performance test scripts in JMeter, ensuring accuracy and efficiency. Conduct load, stress, and endurance testing to identify performance bottlenecks. Monitor application performance and analyse results using Grafana, Datadog, and other APM tools. Collaborate with development and QA teams to troubleshoot performance issues and provide recommendations for improvements. Generate detailed reports on performance testing results, including findings and actionable insights. Continuously refine and improve testing processes and methodologies. Required Qualifications: 8 to 13+ years of experience in performance testing and analysis. Proven expertise in JMeter, with hands-on experience writing JMeter scripts in Java, Groovy, or BeanShell. Strong experience with Grafana, Datadog, or any other APM tool for performance monitoring and data visualization. Solid understanding of web applications and REST APIs and their performance characteristics. Experience in client-side performance testing, preferably with tools such as Lighthouse, mPulse, etc. Experience in component-level testing. Excellent analytical and troubleshooting skills. Strong communication skills and ability to work effectively within a team. Preferred Qualifications: Familiarity with cloud environments and performance testing in distributed systems. Nice to have: Experience in performance testing for the e-commerce sector.

Posted 1 month ago

2.0 - 5.0 years

3 - 7 Lacs

Chennai

Work from Office

Design, develop, and maintain automated test scripts using Playwright with TypeScript/JavaScript, as well as Selenium with Java, to ensure comprehensive test coverage across applications. Enhance the existing Playwright framework by implementing modular test design and optimizing performance, while also utilizing Cucumber for Behavior-Driven Development (BDD) scenarios. Execute functional, regression, integration, performance, and security testing of web applications, APIs, and microservices. Collaborate in an Agile environment, participating in daily stand-ups, sprint planning, and retrospectives to ensure alignment on testing strategies and workflows. Troubleshoot and analyze test failures and defects using debugging tools and techniques, including logging and tracing within Playwright, Selenium, Postman, Grafana, etc. Document and report test results, defects, and issues using Jira and Confluence, ensuring clarity and traceability for all test activities. Implement page object models and reusable test components in both Playwright and Selenium to promote code reusability and maintainability. Integrate automated tests into CI/CD pipelines using Jenkins and GitHub Actions, ensuring seamless deployment and testing processes. Collaborate on Git for version control, managing branches and pull requests to maintain code quality and facilitate teamwork. Mentor and coach junior QA engineers on best practices for test automation, Playwright and Selenium usage, and CI/CD workflows. Research and evaluate new tools and technologies to enhance testing processes and coverage. WHAT DO YOU NEED TO SHINE IN THIS ROLE? Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience. At least 5 years of experience in software testing, with at least 3 years of experience in test automation.
Ability to write functional tests, test plans, and test strategies. Ability to configure test environments and test data using automation tools. Experience creating an automated regression/CI test suite using Cucumber with Playwright (preferred) or Selenium and REST APIs. Proficient in one or more programming languages: Java, JavaScript, or TypeScript. Experience in testing web applications, APIs, and microservices using various tools and frameworks such as Selenium, Cucumber, etc. Experience in testing with SAST/DAST tools (preferred). Experience in working with cloud platforms such as AWS, Azure, GCP, etc. Experience in working with CI/CD tools such as Jenkins, GitLab, GitHub, etc. Experience in writing queries and working with databases such as MySQL, MongoDB, Neo4j, Cassandra, etc. Experience in working with tools such as Postman, JMeter, Grafana, etc. Exposure to security standards and compliance. Experience in working with Agile methodologies such as Scrum, Kanban, etc. Ability to work independently and as part of a team. Ability to learn new technologies and tools quickly and adapt to changing requirements. Highly analytical mindset and a logical approach to finding solutions and performing root cause analysis. Able to prioritize between critical and non-critical path items. Excellent communication skills, with the ability to communicate test results to stakeholders in terms of the functional aspects of the system and their impact. WHAT YOU'LL GET: Highly competitive compensation, benefits, and vacation package. Ability to work for one of the fastest growing companies with some of the most talented people in the industry. Team outings. Fun, hardworking, and casual environment. Endless growth opportunities.

Posted 1 month ago

5.0 - 10.0 years

7 - 12 Lacs

Bengaluru

Work from Office

Project description: Institutional Banking Data Platform (IDP) is a state-of-the-art cloud platform engineered to streamline data ingestion, transformation, and data distribution workflows that underpin Regulatory Reporting, Market Risk, Credit Risk, Quants, and Trader Surveillance. In your role as Software Engineer, you will be responsible for ensuring the stability of the platform, performing maintenance and support activities, and driving innovative process improvements that add significant business value. Responsibilities: Problem solving: advanced analytical and problem-solving skills to analyse complex information for key insights and present it as meaningful information to senior management. Communication: excellent verbal and written communication skills, with the ability to lead discussions with varied stakeholders across levels. Risk mindset: you are expected to proactively identify and understand, openly discuss, and act on current and future risks. Skills. Must have: Bachelor's degree in Computer Science, Engineering, or a related field/experience. 5+ years of proven experience as a Software Engineer or similar role, with a strong track record of successfully maintaining and supporting complex applications. Strong hands-on experience with Ab Initio GDE, including Express>It, Control Centre, Continuous>flow. Should have handled and worked with XML, JSON, and Web API. Strong hands-on experience in SQL. Hands-on experience in any shell scripting language. Experience with batch and streaming-based integrations. Nice to have: Knowledge of CI/CD tools such as TeamCity, Artifactory, Octopus, Jenkins, SonarQube, etc. Knowledge of AWS services including EC2, S3, CloudFormation, CloudWatch, RDS, and others. Knowledge of Snowflake and Apache Kafka is highly desirable. Experience with configuration management and infrastructure-as-code tools such as Ansible, Packer, and Terraform. Experience with monitoring and observability tools like Prometheus/Grafana.

Posted 1 month ago

8.0 - 13.0 years

10 - 15 Lacs

Bengaluru

Work from Office

Project description: We are seeking a highly skilled and motivated DevOps Engineer with 8+ years of experience to join our engineering team. You will work in a collaborative environment, automating and streamlining processes related to infrastructure, development, and deployment. As a DevOps Specialist, you will help implement and manage CI/CD pipelines, configure on-prem Windows OS infrastructure, and ensure the reliability and scalability of our systems. The system is on Windows with Microsoft SQL. Responsibilities: CI/CD Pipeline Management: Design from scratch, implement, and manage automated build, test, and deployment pipelines to ensure smooth code integration and delivery. Infrastructure as Code (IaC): Develop and maintain infrastructure using tools for automated provisioning and management. System Monitoring & Maintenance: Set up monitoring systems for production and staging environments, analyze system performance, and provide solutions to increase efficiency. Deploy and manage configuration using fit-for-purpose tools and scripts with version control, CI, etc. Collaboration: Work closely with software developers, QA teams, and IT staff to define, develop, and improve DevOps processes and solutions. Automation & Scripting: Create and maintain custom scripts to automate manual processes for deployment, scaling, and monitoring. Security: Implement security practices and ensure compliance with industry standards and regulations related to cloud infrastructure. Troubleshooting & Issue Resolution: Diagnose and resolve issues related to system performance, deployments, and infrastructure. Bring DevOps thought leadership and delivery experience to the offshore client delivery team. Implement DevOps best practices based on developed patterns.
Skills. Must have: Total 9 to 12 years of experience as a DevOps Engineer. 3+ years of experience in AWS. Excellent knowledge of DevOps toolchains like GitHub Actions / GitHub Copilot. Self-starter, capable of driving solutions from 0 to 1 and able to deliver projects from scratch. Familiarity with containerization and orchestration tools (Docker, Kubernetes). Working understanding of platform security constructs. Good exposure to monitoring tools/dashboards like Grafana, Obstack, or similar monitoring solutions. Experience working with Jira and Agile SDLC practices. Expert knowledge of CI/CD. Excellent written and verbal communication skills, strong collaboration, and teamwork skills. Proficient in scripting languages like Python and PowerShell, and database knowledge of MS SQL. Experience with Windows or IIS, including installation, configuration, and maintenance. Strong troubleshooting skills, with the ability to think critically, work under pressure, and resolve complex issues. Excellent communication skills with the ability to work cross-functionally with development, operations, and IT teams. Security Best Practices: Knowledge of security protocols, network security, and compliance standards. Adaptability to new learning and strong attention to detail, with a proactive approach to identifying issues before they arise. Nice to have: Cloud Certifications: AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or equivalent. IaC pipelines and best practices. Snyk, sysdiag knowledge. Experience with Windows OS, SRE, and monitoring with Prometheus.

Posted 1 month ago

Featured Companies