
1162 Prometheus Jobs - Page 17

JobPe aggregates listings for easy access, but applications are submitted directly on the employer's job portal.

5.0 - 10.0 years

7 - 17 Lacs

Bengaluru

Work from Office

EMS and Observability Consultant

Overview: We are seeking a skilled IT Operations Consultant specializing in Monitoring and Observability to design, implement, and optimize monitoring solutions for our customers. The ideal candidate will have a minimum of 7 years of relevant experience, a strong background in monitoring, observability, and IT service management, and will be responsible for ensuring system reliability, performance, and availability by creating robust observability architectures and leveraging modern monitoring tools.

Primary Responsibilities:
- Design end-to-end monitoring and observability solutions that provide comprehensive visibility into infrastructure, applications, and networks.
- Implement monitoring tools and frameworks (e.g., Prometheus, Grafana, OpsRamp, Dynatrace, New Relic) to track key performance indicators and system health metrics.
- Integrate monitoring and observability solutions with IT Service Management tools.
- Develop and deploy dashboards, alerts, and reports to proactively identify and address system performance issues.
- Architect scalable observability solutions to support hybrid and multi-cloud environments.
- Collaborate with infrastructure, development, and DevOps teams to ensure seamless integration of monitoring systems into CI/CD pipelines.
- Continuously optimize monitoring configurations and thresholds to minimize noise and improve incident detection accuracy.
- Automate alerting, remediation, and reporting processes to enhance operational efficiency.
- Utilize AIOps and machine learning capabilities for intelligent incident management and predictive analytics.
- Work closely with business stakeholders to define monitoring requirements and success metrics.
- Document monitoring architectures, configurations, and operational procedures.
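The threshold-tuning responsibility above is often implemented with hysteresis: an alert fires above one threshold but only clears below a lower one, so a metric hovering near the limit does not flap. A minimal sketch (function and threshold names are illustrative, not from Prometheus or any specific tool):

```python
def evaluate_alert(samples, fire_at, clear_at):
    """Return the alert state after each metric sample, with hysteresis:
    fire when the value reaches `fire_at`, clear only once it drops to
    `clear_at`. The gap between the two thresholds suppresses flapping."""
    firing = False
    states = []
    for value in samples:
        if not firing and value >= fire_at:
            firing = True
        elif firing and value <= clear_at:
            firing = False
        states.append(firing)
    return states
```

With `fire_at=90` and `clear_at=75`, a dip to 85 keeps the alert firing instead of clearing and immediately re-firing.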

Posted 2 weeks ago

Apply

6.0 - 11.0 years

8 - 12 Lacs

Hyderabad

Work from Office

The Platform Engineer is responsible for designing, implementing, and maintaining scalable, secure, and highly available Linux-based systems and DevOps pipelines. This role requires close collaboration with cross-functional teams to align infrastructure capabilities with business goals.

What you'll do:
- Engineer and manage scalable Linux-based infrastructure within a DevOps framework, including core services such as web servers (Nginx/Apache), FTP, DNS, and SSH.
- Automate infrastructure provisioning and configuration using tools like Ansible, Terraform, or Puppet.
- Build and maintain CI/CD pipelines using tools such as Jenkins, GitLab CI, or GitHub Actions.
- Monitor system performance and analyze performance and availability metrics using tools like Nagios, SolarWinds, Prometheus, and Grafana.
- Actively participate in incident management processes, quickly identifying and resolving issues; conduct root cause analysis to prevent future incidents.
- Collaborate with development teams to ensure seamless integration and deployment of applications.
- Apply security best practices to harden systems and manage vulnerabilities.
- Create and maintain documentation for systems, processes, and procedures to ensure knowledge sharing across teams.
- Participate in on-call rotations.
- Stay updated on industry trends and emerging technologies.

What you'll bring:
- 6+ years of experience leveraging automation to manage Linux systems and infrastructure, specifically Red Hat.
- In-depth knowledge of cloud platforms such as AWS and Azure.
- Proficiency with infrastructure as code (IaC) tools such as Terraform and CloudFormation.
- Strong technical experience implementing, managing, and supporting Linux systems infrastructure.
- Proficiency in one or more programming languages (Python, PowerShell, etc.).
- Ability to deliver software which meets consistent standards of quality, security, and operability.
- Able to work flexible hours as required by business priorities; available on a 24x7x365 basis when needed for production-impacting incidents or key customer events.

Stay up to date on everything Blackbaud; follow us on LinkedIn, X, Instagram, Facebook, and YouTube. Blackbaud is proud to be an equal opportunity employer and is committed to maintaining an inclusive work environment. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, physical or mental disability, age, or veteran status or any other basis protected by federal, state, or local law.
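Availability tracking of the kind this role monitors ultimately reduces to measuring downtime against a reporting window. A small sketch (all names are illustrative) that merges overlapping incident windows so concurrent incidents are not double-counted:

```python
def merged_downtime(incidents):
    """Total downtime from (start, end) minute offsets, merging
    overlapping incident windows so they are counted once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(incidents):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

def availability_pct(window_minutes, incidents):
    """Availability percentage over a window (a 30-day month = 43200 min)."""
    return 100.0 * (window_minutes - merged_downtime(incidents)) / window_minutes
```

A single 216-minute outage in a 30-day month yields 99.5% availability.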

Posted 2 weeks ago

Apply

3.0 - 7.0 years

6 - 11 Lacs

Bengaluru

Work from Office

Proficiency in problem solving and troubleshooting technical issues, with willingness to take ownership and strive for the best solutions.
- Experience using performance analysis tools such as Android Profiler, Traceview, Perfetto, and Systrace.
- Strong understanding of Android architecture, memory management, and threading.
- Strong understanding of Android HALs, Car Framework, the Android graphics pipeline, DRM, and codecs.
- Good knowledge of hardware abstraction layers in Android and/or Linux.
- Good understanding of Git and CI/CD workflows.
- Experience in Agile-based projects.
- Experience with Linux as a development platform and target.
- Extensive experience with Jenkins and GitLab CI systems.
- Hands-on experience with GitLab, Jenkins, Artifactory, Grafana, Prometheus, and/or Elasticsearch.
- Experience with different testing frameworks and their implementation in a CI system.
- Programming using C/C++ and Java/Kotlin on Linux.
- Yocto and its use in CI environments.
- Familiarity with ASPICE.

Works in the area of Software Engineering, which encompasses the development, maintenance, and optimization of software solutions/applications.
1. Applies scientific methods to analyse and solve software engineering problems.
2. Is responsible for the development and application of software engineering practice and knowledge, in research, design, development, and maintenance.
3. Their work requires the exercise of original thought and judgement and the ability to supervise the technical and administrative work of other software engineers.
4. The software engineer builds skills and expertise of their software engineering discipline to reach the standard software engineer skills expectations for the applicable role, as defined in Professional Communities.
5. The software engineer collaborates and acts as a team player with other software engineers and stakeholders.

Grade Specific: Is highly respected, experienced, and trusted. Masters all phases of the software development lifecycle and applies innovation and industrialization. Shows a clear dedication and commitment to business objectives and responsibilities and to the group as a whole. Operates with no supervision in highly complex environments and takes responsibility for a substantial aspect of Capgemini's activity. Is able to manage difficult and complex situations calmly and professionally. Considers the bigger picture when making decisions and demonstrates a clear understanding of commercial and negotiating principles in less-easy situations. Focuses on developing long-term partnerships with clients. Demonstrates leadership that balances business, technical, and people objectives. Plays a significant part in the recruitment and development of people.

Skills (competencies): Verbal Communication

Posted 2 weeks ago

Apply

2.0 - 5.0 years

3 - 7 Lacs

Hyderabad

Work from Office

Your Role:
- Design, implement, and maintain end-to-end ML pipelines for model training, evaluation, and deployment.
- Collaborate with data scientists and software engineers to operationalize ML models.
- Develop and maintain CI/CD pipelines for ML workflows.
- Implement monitoring and logging solutions for ML models; experience with ML model serving frameworks (TensorFlow Serving, TorchServe) and MLOps tools.
- Optimize ML infrastructure for performance, scalability, and cost-efficiency.

Your Profile:
- Strong programming skills in Python, with experience in ML frameworks and an understanding of ML-specific testing and validation techniques.
- Expertise in containerization technologies (Docker) and orchestration platforms (Kubernetes); knowledge of data versioning and model versioning techniques.
- Proficiency in cloud platforms (AWS) and their ML-specific services.
- Strong understanding of DevOps practices and tools (GitLab, Artifactory, Gitflow, etc.).
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack) and knowledge of distributed training techniques.

What you'll love about working here: We recognise the significance of flexible work arrangements and provide support in hybrid mode; you will get an environment that maintains a healthy work-life balance. Our focus will be your career growth and professional development, supporting you in exploring a world of opportunities and equipping yourself with valuable certifications and training programmes in the latest technologies such as MLOps and Machine Learning.
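The model-versioning techniques mentioned above often come down to deriving a deterministic version id from an artifact's inputs, so retraining on unchanged inputs is detectable. A minimal content-hash sketch (function name and fields are illustrative, not from any particular MLOps tool):

```python
import hashlib
import json

def artifact_version(params, data_bytes):
    """Derive a deterministic version id for a trained-model artifact
    from its hyperparameters and training-data bytes. Sorting the keys
    makes the id independent of dict ordering."""
    payload = json.dumps(params, sort_keys=True).encode() + data_bytes
    return hashlib.sha256(payload).hexdigest()[:12]
```

Two runs with identical parameters and data map to the same id; changing either input changes it.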

Posted 2 weeks ago

Apply

6.0 - 11.0 years

18 - 22 Lacs

Hyderabad

Work from Office

Overview: We are seeking a highly skilled and analytically strong Site Reliability Engineer (SRE) with Scrum experience and 6+ years of experience. The ideal candidate will have a proven track record in managing SRE responsibilities across multiple teams, with deep expertise in Active Directory (AD) groups, Databricks, architecture design, and enterprise tools like Clarity and ServiceNow. Strong Scrum delivery experience and cross-functional collaboration are essential.

Key Responsibilities:
- Lead SRE operations across distributed teams, ensuring system reliability, scalability, and performance.
- Design and implement robust monitoring, alerting, and observability frameworks.
- Lead Scrum ceremonies.
- Manage and optimize Active Directory (AD) group structures and access controls.
- Collaborate with data engineering teams to support Databricks environments.
- Contribute to architectural discussions and decisions for high-availability systems.
- Drive incident response, root cause analysis, and continuous improvement initiatives.
- Integrate and manage workflows using Clarity PPM and ServiceNow for change, incident, and problem management.
- Actively participate in Scrum ceremonies (daily stand-ups, sprint planning, reviews, retrospectives).
- Collaborate with Product Owners and Scrum Masters to ensure timely and quality delivery.

Qualifications:
- Education: Bachelor's or Master's degree in Computer Science, Information Systems, Business Analytics, or a related field.
- 6+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
- Strong analytical thinking and troubleshooting skills.
- Hands-on experience with Active Directory (AD) group policy management and access provisioning.
- Databricks cluster management, job orchestration, and performance tuning.
- Architecting scalable, fault-tolerant systems.
- Clarity PPM project tracking and resource planning.
- ServiceNow incident/change/problem management workflows.
- Proficiency in monitoring tools (e.g., Prometheus, Grafana, Datadog).
- Experience with CI/CD pipelines and infrastructure as code (Terraform, Ansible).
- Familiarity with cloud platforms (Azure, AWS, or GCP).
- Strong scripting skills (Python, Bash, PowerShell).
- Solid understanding of Agile/Scrum methodologies and tools like Jira or Azure DevOps.

Preferred Qualifications:
- Certified Scrum Master or equivalent Agile certification.
- Experience working in a global delivery model.
- Exposure to digital product and reporting services is a plus.
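The reliability responsibilities above are usually framed in terms of SLOs and error budgets: the allowed downtime implied by a target, and how much of it has been spent. A small sketch of the arithmetic (names and the 30-day window are illustrative):

```python
def error_budget_minutes(slo_pct, window_minutes=43200):
    """Allowed downtime for an SLO over a window (default: 30-day month)."""
    return window_minutes * (100.0 - slo_pct) / 100.0

def budget_remaining_pct(slo_pct, downtime_minutes, window_minutes=43200):
    """Fraction of the error budget still unspent, as a percentage."""
    budget = error_budget_minutes(slo_pct, window_minutes)
    return 100.0 * (budget - downtime_minutes) / budget
```

A 99.9% monthly SLO allows about 43.2 minutes of downtime; 21.6 minutes of incidents leaves half the budget.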

Posted 2 weeks ago

Apply

5.0 - 10.0 years

12 - 17 Lacs

Pune

Work from Office

Sarvaha would like to welcome a Lead/Senior Java Developer with a minimum of 5 years, ideally 10+ years, of experience in designing and developing scalable microservices and cloud-native applications using Java, Spring Boot, and reactive programming paradigms. Sarvaha is a niche software development company that works with some of the best-funded startups and established companies across the globe. Please visit our website at https://www.sarvaha.com to know more about us.

What You'll Do:
- Design and develop scalable microservices using Java 17+ and Spring Boot.
- Build reactive applications with Spring WebFlux and Project Reactor.
- Implement event-driven architectures using Kafka and Azure Event Hub.
- Develop secure, high-throughput REST APIs.
- Work with AWS and Azure cloud environments to deploy and monitor services.
- Collaborate with DevOps to ensure reliability, tracing, and observability of systems.
- Participate in code reviews, mentor team members, and promote engineering best practices.
- Troubleshoot and resolve production issues in distributed systems.

You Bring:
- BE/BTech/MTech (CS/IT or MCA) with strong software engineering fundamentals.
- Hands-on experience with Java, Spring Boot, and the broader Spring ecosystem.
- Strong knowledge of Spring WebFlux, Project Reactor, and non-blocking I/O.
- Solid understanding of Kafka (Producers, Consumers, Streams) and message-driven design.
- Experience with AWS (EC2, S3, Lambda, SNS/SQS) or Azure SDKs and Event Hub.
- Expertise in designing and developing high-performance, resilient, and observable systems.
- Exposure to Docker, CI/CD pipelines, and Kubernetes (preferred).
- Familiarity with microservices testing strategies like contract testing, mocking, and test containers.
- Strong problem-solving abilities and system design thinking (caching, partitioning, load balancing).
- Clear communication, love for documentation, and mentoring of programmers on the team.

What Sets You Apart:
- Monitoring experience with Grafana, Prometheus, ELK, or Datadog.
- Excellent collaboration with cross-functional teams (developers, DevOps, QA).
- Knowledge of both AWS and Azure is a strong plus.
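The partitioning and message-driven design mentioned above rest on keyed partitioning: records with the same key always land on the same partition, which is what preserves per-key ordering in Kafka. Kafka's default partitioner uses a murmur2 hash of the key; the sketch below illustrates the idea with CRC32 instead, so it is not byte-compatible with Kafka's assignment:

```python
import zlib

def partition_for(key, num_partitions):
    """Map a message key to a partition: hash the key, take it modulo
    the partition count. Equal keys always map to the same partition."""
    return zlib.crc32(key.encode()) % num_partitions
```

Because the mapping is deterministic, all events for one entity (say, one order id) are consumed in the order they were produced.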

Posted 2 weeks ago

Apply

7.0 - 10.0 years

11 - 12 Lacs

Hyderabad

Work from Office

We are seeking a highly skilled DevOps Engineer to join our dynamic development team. In this role, you will be responsible for designing, developing, and maintaining both frontend and backend components of our applications using DevOps and associated technologies. You will collaborate with cross-functional teams to deliver robust, scalable, and high-performing software solutions that meet our business needs. The ideal candidate will have a strong background in DevOps, experience with modern frontend frameworks, and a passion for full-stack development.

Requirements: Bachelor's degree in Computer Science, Engineering, or a related field; 7 to 10+ years of experience in full-stack development, with a strong focus on DevOps.

DevOps with AWS Data Engineer - Roles & Responsibilities:
- Use AWS services like EC2, VPC, S3, IAM, RDS, and Route 53.
- Automate infrastructure using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation.
- Build and maintain CI/CD pipelines using tools such as AWS CodePipeline, Jenkins, and GitLab CI/CD.
- Automate build, test, and deployment processes for Java applications.
- Use Ansible, Chef, or AWS Systems Manager for managing configurations across environments.
- Containerize Java apps using Docker; deploy and manage containers using Amazon ECS, EKS (Kubernetes), or Fargate.
- Monitoring and logging using Amazon CloudWatch, Prometheus + Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), and AWS X-Ray for distributed tracing.
- Manage access with IAM roles/policies; use AWS Secrets Manager / Parameter Store for managing credentials.
- Enforce security best practices, encryption, and audits.
- Automate backups for databases and services using AWS Backup, RDS Snapshots, and S3 lifecycle rules; implement Disaster Recovery (DR) strategies.
- Work closely with development teams to integrate DevOps practices.
- Document pipelines, architecture, and troubleshooting runbooks.
- Monitor and optimize AWS resource usage; use AWS Cost Explorer, Budgets, and Savings Plans.

Must-Have Skills:
- Experience working on Linux-based infrastructure.
- Excellent understanding of Ruby, Python, Perl, and Java.
- Configuring and managing databases such as MySQL and MongoDB.
- Excellent troubleshooting skills.
- Selecting and deploying appropriate CI/CD tools.
- Working knowledge of various tools, open-source technologies, and cloud services.
- Awareness of critical concepts in DevOps and Agile principles.
- Managing stakeholders and external interfaces.
- Setting up tools and required infrastructure.
- Defining and setting development, testing, release, update, and support processes for DevOps operation.
- The technical skills to review, verify, and validate the software code developed in the project.
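The backup-automation responsibility above typically encodes a retention policy such as "keep recent dailies plus one snapshot per week". A sketch of that selection logic (function and parameter names are illustrative; this is not an AWS Backup API):

```python
from datetime import date, timedelta

def backups_to_keep(snapshot_dates, daily=7, weekly=4):
    """Retention sketch: keep the last `daily` daily snapshots plus the
    most recent snapshot from each of the last `weekly` ISO weeks.
    Returns the set of dates to retain; anything else may be expired
    by a lifecycle rule."""
    snapshot_dates = sorted(snapshot_dates, reverse=True)
    keep = set(snapshot_dates[:daily])
    weeks_seen = []
    for d in snapshot_dates:
        iso = d.isocalendar()
        week = (iso[0], iso[1])            # (ISO year, ISO week number)
        if week not in weeks_seen:
            weeks_seen.append(week)
            if len(weeks_seen) <= weekly:
                keep.add(d)
    return keep
```

For 30 consecutive daily snapshots this keeps the last 7 days plus two older week-end snapshots, 9 dates in total.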

Posted 2 weeks ago

Apply

5.0 - 10.0 years

11 - 21 Lacs

Pune

Work from Office

Experience: 5-7 years (P3) | 21 LPA | NP – Immediate – 15 Days

Must-have skills: DevOps + AWS

Primary responsibilities:
- Support infrastructure architecture, system performance, and the overall infrastructure operating environment for on-premise infrastructure, cloud computing platforms, or both.
- Support new application system initiatives and drive development of infrastructure architecture, system integration, acceptance, performance management practice, and performance testing.
- Run infrastructure services in on-premise, hybrid, or public cloud environments to support application systems, working closely with Applications Service counterparts.
- Understand the systems operations environment and drive the Architecture Review and Governance Process to ensure smooth and sustained operations (including resiliency requirements).

Qualifications:
- Degree in Computer Science, Computer or Electronics Engineering, Information Technology, or equivalent.
- Minimum 5 years of relevant working experience, with validated records of having utilized architecture design capabilities in infrastructure management for both on-premise and cloud workloads.
- Certification in a cloud technology platform (Architecture, DevOps, or System Administration/SysOps track) preferred.
- Demonstrated deep or broad (or both) technical expertise in Infrastructure Services, with appreciation of one or more areas of Infrastructure Management, Cloud Computing, and DevOps Engineering concepts.
- Proactive and dedicated individual with good leadership and multi-tasking capabilities.
- Good interpersonal, oral, and written skills, with the ability to present ideas and influence partners at different levels.

Tech stack: AWS (EC2, S3, VPC, EKS, Lambda, CloudWatch, Transit Gateway, Network Firewall, IAM, Transfer Family), IaC (Terraform), Kubernetes, Helm, Grafana & Prometheus.
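Designing the VPC layout in a stack like the one above starts with carving the address space into subnets. A small sketch using Python's standard ipaddress module (the CIDR values are illustrative):

```python
import ipaddress

def plan_subnets(vpc_cidr, new_prefix):
    """Split a VPC CIDR into equal subnets of the given prefix length,
    e.g. carving /24 subnets out of a /22 for per-AZ placement."""
    network = ipaddress.ip_network(vpc_cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]
```

A /22 VPC yields four /24 subnets, one per availability zone with a spare.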

Posted 2 weeks ago

Apply

5.0 - 9.0 years

0 Lacs

chennai, tamil nadu

On-site

As a Lead Engineer, DevOps at Toyota Connected India, you will be part of a dynamic team that is dedicated to creating innovative infotainment solutions on embedded and cloud platforms. You will play a crucial role in shaping the future of mobility by leveraging your expertise in cloud platforms, containerization, infrastructure automation, scripting languages, monitoring solutions, networking, security best practices, and CI/CD tools. Your responsibilities will include:

- Demonstrating hands-on experience with cloud platforms such as AWS or Google Cloud Platform.
- Utilizing strong expertise in containerization (e.g., Docker) and Kubernetes for container orchestration.
- Implementing infrastructure automation and configuration management tools like Terraform, CloudFormation, Ansible, or similar.
- Proficiency in scripting languages such as Python, Bash, or Go for efficient workflow.
- Experience with monitoring and logging solutions such as Prometheus, Grafana, ELK Stack, or Datadog to ensure system reliability.
- Knowledge of networking concepts, security best practices, and infrastructure monitoring to maintain a secure and stable environment.
- Strong experience with CI/CD tools such as Jenkins, GitLab CI, CircleCI, Travis CI, or similar for continuous integration and delivery.

At Toyota Connected, you will enjoy top-of-the-line compensation, autonomy in managing your time and workload, yearly gym membership reimbursement, free catered lunches, and a casual dress code. You will have the opportunity to work on products that enhance the safety and convenience of millions of customers, all within a collaborative, innovative, and empathetic work culture that values customer-centric decision-making, passion for excellence, creativity, and teamwork. Join us at Toyota Connected India and be part of a team that is redefining the automotive industry and making a positive global impact!

Posted 2 weeks ago

Apply

4.0 - 8.0 years

0 Lacs

coimbatore, tamil nadu

On-site

As a Site Reliability Engineer (SRE) at our Coimbatore location, you will be responsible for ensuring the reliability and performance of our cloud and on-prem environments. Your key responsibilities will include driving root cause analysis to prevent incident recurrence, managing capacity planning and performance tuning, and participating in the on-call rotation for timely support and issue resolution.

You will also be involved in designing, implementing, and maintaining CI/CD pipelines using tools such as Jenkins and GitHub, automating infrastructure deployment and monitoring following Infrastructure as Code (IaC) principles, and enhancing automation for operational tasks and incident response. In addition, you will implement and manage enterprise monitoring solutions like Splunk, Dynatrace, Prometheus, and Grafana, build real-time dashboards and alerts to proactively identify system anomalies, and continuously improve observability, logging, and tracing across all environments.

Furthermore, you will work with AWS, Azure, and PCF environments, managing cloud-native services and infrastructure, designing and optimizing cloud architecture for reliability and cost-efficiency, and collaborating with cloud security and networking teams to ensure secure and compliant infrastructure. Your collaboration with product and development teams will ensure alignment with business objectives.
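Proactive anomaly alerting of the kind described above often starts with a rolling z-score: flag any sample that deviates too far from its recent history. A minimal sketch (the window and threshold values are illustrative defaults, not from any monitoring product):

```python
import statistics

def anomalies(series, window=5, threshold=3.0):
    """Return the indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard
    deviations; a simple rule behind many dashboard anomaly bands."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev and abs(series[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged
```

A sudden spike in an otherwise flat series is flagged, while the noisy window that follows it is not, which is one reason real systems also smooth or cap the history.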

Posted 2 weeks ago

Apply

10.0 - 14.0 years

0 Lacs

delhi

On-site

You are a highly experienced DevOps Architect and Level 4 DevOps Subject Matter Expert (SME) with over 10 years of relevant experience in the field of DevOps. Your expertise lies in building scalable, secure, and fully automated infrastructure environments, with a focus on delivering robust DevOps solutions, establishing architecture best practices, and driving automation across development and operations teams.

Your role involves deep hands-on expertise in continuous integration and continuous delivery (CI/CD) tools like Jenkins, Azure DevOps, Helm, GitOps, and ArgoCD to implement reliable, automated software delivery pipelines. You possess advanced Infrastructure as Code (IaC) experience using tools such as Terraform, Ansible, SaltStack, ARM Templates, and Google Cloud Deployment Manager for scalable and consistent infrastructure provisioning.

You are an expert in container platforms, particularly Kubernetes and Docker, for orchestrated, secure, and highly available deployments. Your proficiency extends to Kubernetes operations, including production-grade cluster management, autoscaling, Helm chart development, RBAC configuration, ingress controllers, and network policy enforcement. Furthermore, you have extensive cloud experience across AWS, Azure, and GCP, with deep knowledge of core services, networking, storage, identity, and security implementations. Your scripting and automation capabilities using Bash, Python, or Go enable you to develop robust automation tools and system-level integrations.

In addition, you have comprehensive monitoring and observability expertise with Prometheus, Grafana, and the ELK stack for end-to-end visibility, alerting, and performance analysis. You are skilled in designing and implementing secure, scalable, and resilient DevOps architectures aligned with industry best practices for both cloud-native and hybrid environments. Your experience also includes artifact management using JFrog Artifactory or Nexus, performing root cause analysis, developing self-healing scripts, and ensuring high system availability and minimal disruption. You are familiar with DevSecOps and compliance frameworks, mentoring engineering teams in DevOps adoption, tooling, automation strategies, and architectural decision-making.

As a recognized DevOps expert and L4 SME, you continuously evaluate and recommend emerging tools, frameworks, and practices to enhance deployment speed, pipeline efficiency, and platform reliability. Your strong communication skills allow you to present and explain architectural strategies and system design decisions to both technical and non-technical stakeholders with clarity and confidence.

Posted 2 weeks ago

Apply

4.0 - 8.0 years

0 Lacs

delhi

On-site

As a DevOps Engineer specializing in App Infrastructure & Scaling, you will be a crucial member of our technology team. Your primary responsibility will involve designing, implementing, and maintaining scalable and secure cloud infrastructure that supports our mobile and web applications. Your contributions will be essential in ensuring system reliability, performance optimization, and cost efficiency across different environments. Your key responsibilities will include designing and managing cloud infrastructure on Google Cloud Platform (GCP), implementing horizontal scaling, load balancers, auto-scaling groups, and performance monitoring systems. You will also be responsible for developing and managing CI/CD pipelines using tools like GitHub Actions, Jenkins, or GitLab CI. Setting up real-time monitoring, crash alerting, logging systems, and health dashboards using industry-leading tools will be part of your daily tasks. You will collaborate closely with Flutter and PHP (Laravel) teams to address performance bottlenecks and reduce system loads. Additionally, you will conduct infrastructure security audits, recommend best practices to prevent downtime and security breaches, and monitor and optimize cloud usage and billing for a cost-effective and scalable architecture. To be successful in this role, you should have at least 3-5 years of hands-on experience in a DevOps or Cloud Infrastructure role, preferably with GCP. Strong proficiency in Docker, Kubernetes, NGINX, and load balancing strategies is essential. Experience with CI/CD pipelines and tools like GitHub Actions, Jenkins, or GitLab CI, as well as familiarity with monitoring tools like Grafana, Prometheus, NewRelic, or Datadog, is required. Deep understanding of API architecture, PHP/Laravel backends, Firebase, and modern mobile app infrastructure is also necessary. 
Preferred qualifications include Google Cloud Professional certification or equivalent and experience in optimizing systems for high-concurrency, low-latency environments. Familiarity with Infrastructure as Code (IaC) tools such as Terraform or Ansible is a plus.

In summary, as a DevOps Engineer specializing in App Infrastructure & Scaling, you will play a critical role in ensuring the scalability, reliability, and security of the cloud infrastructure that powers our applications. Your expertise will contribute to the overall performance and cost efficiency of our systems.
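The auto-scaling responsibility described above follows the same proportional rule that Kubernetes' Horizontal Pod Autoscaler documents: scale the replica count by the ratio of observed to target load, rounding up. A sketch:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """Horizontal-autoscaling rule of thumb (the formula the Kubernetes
    HPA uses): desired = ceil(current * observed / target)."""
    return math.ceil(current_replicas * current_metric / target_metric)
```

Four replicas at 90% CPU against a 60% target scale out to six; at 30% they scale in to two. Real autoscalers add tolerances and cooldowns on top of this formula to avoid thrashing.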

Posted 2 weeks ago

Apply

3.0 - 7.0 years

0 Lacs

haryana

On-site

You will be responsible for implementing and managing CI/CD pipelines, container orchestration, and cloud services to enhance our software development lifecycle, collaborating with development and operations teams to streamline processes and improve deployment efficiency.

Responsibilities:
- Implement and manage CI/CD tools such as GitLab CI, Jenkins, or CircleCI.
- Utilize Docker and Kubernetes (k8s) for containerization and orchestration of applications.
- Write and maintain scripts in at least one scripting language (e.g., Python, Bash) to automate tasks.
- Manage and deploy applications using cloud services (e.g., AWS, Azure, GCP) and their respective management tools.
- Understand and apply network protocols, IP networking, load balancing, and firewalling concepts.
- Implement infrastructure as code (IaC) practices to automate infrastructure provisioning and management.
- Utilize logging and monitoring tools (e.g., ELK stack, OpenSearch, Prometheus, Grafana) to ensure system reliability and performance.
- Apply GitOps practices using tools like Flux or ArgoCD for continuous delivery.
- Work with Helm and Flyte for managing Kubernetes applications and workflows.

Qualifications:
- Bachelor's or Master's degree in computer science or a related field.
- Proven experience in a DevOps engineering role.
- Strong background in software development and system administration.
- Experience with CI/CD tools and practices.
- Proficiency in Docker and Kubernetes.
- Familiarity with cloud services and their management tools.
- Understanding of networking concepts and protocols.
- Experience with infrastructure as code (IaC) practices.
- Familiarity with logging and monitoring tools.
- Knowledge of GitOps practices and tools.
- Experience with Helm and Flyte is a plus.

Preferred Qualifications:
- Experience with cloud-native architectures and microservices.
- Knowledge of security best practices in DevOps and cloud environments.
- Understanding of database management and optimization (e.g., SQL, NoSQL).
- Familiarity with Agile methodologies and practices.
- Experience with performance tuning and optimization of applications.
- Knowledge of backup and disaster recovery strategies.
- Familiarity with emerging DevOps tools and technologies.
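The GitOps practice mentioned above is at heart a reconcile loop: compare the desired state stored in Git with the actual cluster state and compute the operations needed to converge. A toy sketch over plain dictionaries (real agents such as Flux or ArgoCD diff full Kubernetes objects, but the shape of the loop is the same):

```python
def reconcile(desired, actual):
    """Diff desired state (from Git) against actual state and return
    the operations a GitOps agent would apply: create missing objects,
    update drifted ones, prune extras not declared in Git."""
    ops = []
    for name, spec in desired.items():
        if name not in actual:
            ops.append(("create", name))
        elif actual[name] != spec:
            ops.append(("update", name))
    for name in actual:
        if name not in desired:
            ops.append(("delete", name))
    return sorted(ops)
```

Manual drift (someone editing the cluster directly) shows up as an "update" and is rolled back toward the Git-declared state on the next loop.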

Posted 2 weeks ago

Apply

5.0 - 9.0 years

0 Lacs

maharashtra

On-site

Invent the future with Ampere, a semiconductor design company leading the future of computing with a focus on high performance, energy efficiency, and sustainable cloud computing. Ampere, recognized by Fast Company's 2023 100 Best Workplaces for Innovators List, collaborates with leading cloud suppliers and partners to deliver cutting-edge cloud instances, servers, and embedded/edge products. Join the passionate and growing team at Ampere to make a difference!

As an HPC Systems Engineer at Ampere, you will be responsible for scaling the company's high-performance compute needs to support innovative engineering projects. In this role, you will:

- Explore emerging technologies to address expanding compute, networking, and memory requirements.
- Implement industry-standard orchestration tools like Terraform for automation.
- Collaborate with internal teams and vendors to implement best practices and optimize efficiency.
- Drive capacity planning, negotiations, and the purchase of engineering compute resources.
- Design and deploy highly scalable compute environments to facilitate growth.
- Modify and implement job flows to balance resources in both cloud and on-premises environments.
- Identify and implement monitoring solutions to optimize resource utilization.
- Create data-driven forecasting to track development cycles and projects.

To excel in this role, you should have:

- A BS degree in Computer Science, Computer Engineering, or a related technical field with 5+ years of experience.
- Proficiency in scripting languages such as Perl, Python, or Bash for automating infrastructure needs.
- Experience with configuration management tools like CFEngine, Chef, Puppet, or Ansible.
- Expertise in setting up Prometheus environments and developing monitoring solutions.
- Linux systems administration experience with debugging skills.
- Familiarity with Git tools; ASIC design flow experience would be a plus.
- Ability to drive technical leadership and manage large-scale HPC system projects.
- Strong communication skills and the ability to collaborate effectively with multiple teams.

Ampere offers a range of benefits, including premium medical, dental, and vision insurance, parental benefits, retirement plans, generous paid time off, and catered meals in the office. Join Ampere's inclusive culture and contribute to industry-leading cloud-native designs for a sustainable future. Explore career opportunities with Ampere through the interview process.
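The data-driven forecasting responsibility above can start as simply as fitting a least-squares trend line to historical utilisation and extrapolating it forward; a sketch (the sample data and names are illustrative):

```python
def linear_forecast(usage, periods_ahead):
    """Fit a least-squares trend line to equally spaced utilisation
    samples and extrapolate `periods_ahead` steps past the last sample:
    a minimal form of capacity forecasting."""
    n = len(usage)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(usage) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)
```

A cluster growing 10 units per period from 10 to 40 is forecast at 60 two periods out; real capacity models would add seasonality and confidence bounds on top.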

Posted 2 weeks ago

Apply

5.0 - 9.0 years

0 Lacs

Ahmedabad, Gujarat

On-site

As a DevOps Engineer, you will be responsible for defining and implementing DevOps strategies aligned with business objectives. You will lead cross-functional teams to enhance collaboration among development, QA, and operations departments. Your role will involve designing, implementing, and managing Continuous Integration/Continuous Deployment (CI/CD) pipelines to automate build, test, and deployment processes, thus expediting release cycles.

Furthermore, you will be tasked with implementing and overseeing Infrastructure as Code using tools such as Terraform, CloudFormation, and Ansible. Managing cloud platforms like AWS, Azure, or Google Cloud will also be part of your responsibilities. It will be crucial for you to monitor and address security risks within CI/CD pipelines and infrastructure. Setting up observability tools like Prometheus, Grafana, Splunk, and Datadog, and implementing proactive alerting and incident response processes will be essential.

In this role, you will take the lead in incident response and root cause analysis (RCA) activities. You will also play a key role in documenting DevOps processes, best practices, and system architectures. Additionally, you will be involved in evaluating and incorporating new DevOps tools and technologies. A significant aspect of your role will involve fostering a culture of continuous learning and knowledge sharing among team members.
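Proactive alerting with low noise, as described above, usually relies on a "for" duration in the style of Prometheus alert rules: the alert fires only after the breach has held across several consecutive evaluations, so one-off spikes are ignored. A toy sketch of that idea (the class and counts are illustrative, not any vendor's API):

```python
class ForDurationAlert:
    """Fire only after the condition has held for `for_count` consecutive checks."""

    def __init__(self, for_count):
        self.for_count = for_count
        self.streak = 0  # consecutive breached evaluations so far

    def evaluate(self, breached):
        # A single healthy evaluation resets the streak, suppressing flapping alerts.
        self.streak = self.streak + 1 if breached else 0
        return self.streak >= self.for_count
```

With `for_count=3`, two spikes separated by a healthy sample never page anyone; a sustained breach does.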

Posted 2 weeks ago

Apply

5.0 - 9.0 years

0 Lacs

Thiruvananthapuram, Kerala

On-site

Techvantage.ai is a next-generation technology and product engineering company at the forefront of innovation in Generative AI, Agentic AI, and autonomous intelligent systems. We build intelligent, scalable, and future-ready digital platforms that drive the next wave of AI-powered transformation.

We are seeking a highly skilled and experienced Senior Node.js Developer with 5+ years of hands-on experience in backend development. As part of our engineering team, you will be responsible for architecting and building scalable APIs, services, and infrastructure that power high-performance AI-driven applications. You'll collaborate with front-end developers, DevOps, and data teams to ensure fast, secure, and efficient back-end functionality that meets the needs of modern AI-first products.

What we are looking for in an ideal candidate:

- Design, build, and maintain scalable server-side applications and APIs using Node.js and related frameworks.
- Implement RESTful and GraphQL APIs for data-driven and real-time applications.
- Collaborate with front-end, DevOps, and data teams to build seamless end-to-end solutions.
- Optimize application performance, scalability, and security.
- Write clean, maintainable, and well-documented code.
- Integrate with third-party services and internal microservices.
- Apply best practices in code quality, testing (unit/integration), and continuous integration/deployment.
- Troubleshoot production issues and implement monitoring and alerting solutions.

Requirements:

- 5+ years of professional experience in backend development using Node.js.
- Proficiency in JavaScript (ES6+) and strong experience with Express.js, NestJS, or similar frameworks.
- Experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB).
- Strong understanding of API security, authentication (OAuth2, JWT), and rate limiting.
- Experience building scalable microservices and working with message queues (e.g., RabbitMQ, Kafka).
- Familiarity with containerized applications using Docker and orchestration via Kubernetes.
- Proficiency with Git, CI/CD pipelines, and version control best practices.
- Solid understanding of performance tuning, caching, and system design.

Preferred Qualifications:

- Experience with cloud platforms like AWS, GCP, or Azure.
- Exposure to building backends for AI/ML platforms, data pipelines, or analytics dashboards.
- Familiarity with GraphQL, WebSockets, or real-time communication.
- Knowledge of infrastructure-as-code tools like Terraform is a plus.
- Experience with monitoring tools like Prometheus, Grafana, or New Relic.

What We Offer:

- The chance to work on cutting-edge products leveraging AI and intelligent automation.
- A high-growth, innovation-driven environment with global exposure.
- Access to modern development tools and cloud-native technologies.
- Attractive compensation with no constraints for the right candidate.
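The JWT requirement above boils down to a simple mechanism: an HS256 token is HMAC-SHA256 over the base64url-encoded header and payload. A stdlib-only sketch of signing and verification, shown here to illustrate the scheme (it is not a substitute for a vetted JWT library and skips claim/expiry validation):

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


def verify_jwt(token: str, secret: bytes) -> bool:
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(b64url(expected), sig)
```

A production service would additionally check `exp`, `iss`, and audience claims before trusting the payload.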

Posted 2 weeks ago

Apply

10.0 - 14.0 years

0 Lacs

Karnataka

On-site

As a Senior Software DevOps Engineer, you will lead the design, implementation, and evolution of telemetry pipelines and DevOps automation to enable next-generation observability for distributed systems. Your main focus will be on leveraging a deep understanding of OpenTelemetry architecture along with strong DevOps practices to construct a reliable, high-performance, and self-service observability platform that spans hybrid cloud environments such as AWS and Azure. Your primary goal will be to provide engineering teams with actionable insights through rich metrics, logs, and traces while promoting automation and innovation at all levels.

In your role, you will be involved in the following key activities:

Observability Strategy & Implementation:
- Design and manage scalable observability solutions using OpenTelemetry (OTel), including deploying OTel Collectors for ingesting and exporting telemetry data, guiding teams on instrumentation best practices, building telemetry pipelines for data routing, and utilizing processors and extensions for advanced enrichment and routing.

DevOps Automation & Platform Reliability:
- Take ownership of the CI/CD experience using GitLab Pipelines, integrate infrastructure automation with Terraform, Docker, and scripting in Bash and Python, and develop resilient and reusable infrastructure-as-code modules across AWS and Azure ecosystems.

Cloud-Native Enablement:
- Create observability blueprints for cloud-native applications on AWS and Azure, optimize cost and performance of telemetry pipelines, and ensure SLA/SLO adherence for observability services.

Monitoring, Dashboards, and Alerting:
- Build and maintain role-based dashboards in tools like Grafana and New Relic for real-time visibility into service health and business KPIs, implement alerting best practices, and integrate with incident management systems.

Innovation & Technical Leadership:
- Drive cross-team observability initiatives to reduce MTTR and enhance engineering velocity, lead innovation projects such as self-service observability onboarding and AI-assisted root cause detection, and mentor engineering teams on telemetry standards and operational excellence.

Qualifications and Skills:
- 10+ years of experience in DevOps, Site Reliability Engineering, or observability roles
- Deep expertise with OpenTelemetry, GitLab CI/CD, Terraform, Docker, and scripting languages (Python, Bash, Go)
- Hands-on experience with AWS and Azure services, cloud automation, and cost optimization
- Proficiency with observability backends such as Grafana, New Relic, Prometheus, and Loki
- Strong passion for building automated, resilient, and scalable telemetry pipelines
- Excellent documentation and communication skills to drive adoption and influence engineering culture

Nice to Have:
- Certifications in AWS, Azure, or Terraform
- Experience with OpenTelemetry SDKs in Go, Java, or Node.js
- Familiarity with SLO management, error budgets, and observability-as-code approaches
- Exposure to event streaming technologies (Kafka, RabbitMQ), Elasticsearch, Vault, and Consul
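The SLO and error-budget vocabulary in listings like this one rests on simple arithmetic: an availability target implies a fixed amount of tolerable downtime per rolling window, and incidents burn that budget down. A small illustrative sketch:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime allowed by an availability SLO over a rolling window, in minutes."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction, e.g. 0.999 for 99.9%")
    return window_days * 24 * 60 * (1 - slo)


def budget_consumed(downtime_minutes: float, slo: float, window_days: int = 30) -> float:
    """Fraction of the error budget already burned by the given downtime."""
    return downtime_minutes / error_budget_minutes(slo, window_days)

# A 99.9% availability SLO over 30 days allows roughly 43.2 minutes of downtime.
```

Teams then gate risky changes on the remaining budget: half the budget burned mid-window is a common signal to slow releases.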

Posted 2 weeks ago

Apply

2.0 - 6.0 years

0 Lacs

Pune, Maharashtra

On-site

You will be joining as a talented SDE1 - DevOps Engineer with the exciting opportunity to help build a top-notch DevOps infrastructure that can scale to accommodate the next 100M users. As an ideal candidate, you will be expected to tackle a variety of challenges with enthusiasm and take full ownership of your responsibilities.

Your main responsibilities will include running a highly available cloud-based software product on AWS, designing and implementing new systems in close collaboration with the Software Development team, setting up and maintaining CI/CD systems, and automating software deployment. You will also be tasked with continuously enhancing the security posture and operational efficiency of the Amber platform, as well as optimizing operational costs.

To excel in this role, you should have 2-3 years of experience in a DevOps/SRE role. You must have hands-on experience with AWS services such as ECS, EKS, RDS, ElastiCache, and CloudFront, as well as familiarity with Google Cloud Platform. Proficiency in Infrastructure as Code tools like Terraform, CI/CD tools like Jenkins and GitHub Actions, and scripting languages such as Python and Bash is essential. Additionally, you should have a strong grasp of SCM in GitHub, networking concepts, and experience with observability and monitoring tools like Grafana, Loki, Prometheus, and ELK. Prior exposure to an on-call rotation and mentoring junior DevOps engineers would be advantageous.

While not mandatory, knowledge of NodeJS and Ruby, including their platforms and workflows, would be considered a plus for this role.
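Prometheus-style monitoring, which this and several other listings name, models most signals as monotonically increasing counters; dashboards then show per-second rates, and a counter that restarts (for example after a redeploy) must not produce a negative rate. A simplified version of that rate logic:

```python
def counter_rate(v0, t0, v1, t1):
    """Per-second rate between two counter samples; a decrease is treated as a counter reset."""
    if t1 <= t0:
        raise ValueError("samples must be time-ordered")
    # After a reset the counter restarted from 0, so the new value alone
    # is a lower bound on the true increase over the interval.
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / (t1 - t0)
```

This mirrors the behaviour of PromQL's `rate()` in spirit, though the real function also extrapolates across the sample window.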

Posted 2 weeks ago

Apply

2.0 - 6.0 years

0 Lacs

Chennai, Tamil Nadu

On-site

The role at Hitachi Energy India Development Centre (IDC) in Chennai offers you the opportunity to be part of a dedicated team of over 500 R&D engineers, specialists, and experts focused on creating innovative digital solutions, new products, and cutting-edge technology. As part of the IDC team, you will collaborate with R&D and research centers across more than 15 locations globally, contributing to the advancement of the world's energy system towards sustainability, flexibility, and security.

Your primary responsibilities in this role include staying on track to meet project milestones and deadlines, actively suggesting and implementing process improvements, collaborating with a diverse team across different time zones, and enhancing processes related to continuous integration, deployment, testing, and release management. You will play a crucial role in developing, maintaining, and supporting Azure infrastructure and system software components, providing guidance on Azure tech components, ensuring application performance, uptime, and scalability, and leading CI/CD process design and implementation.

To excel in this position, you should possess at least 3 years of experience in Azure DevOps, CI/CD, configuration management, and test automation, along with expertise in Azure PaaS, Azure Active Directory, Kubernetes, and Application Insights. Additionally, you should have hands-on experience with infrastructure-as-code automation, database management, system monitoring, security practices, containerization, and Linux system administration. Proficiency in at least one programming language, strong communication skills in English, and a commitment to Hitachi Energy's core values of safety and integrity are essential for success in this role.

If you are a qualified individual with a disability and require accommodations during the job application process, you can request reasonable accommodations through our website. Please provide specific details about your needs to receive the necessary support. This opportunity is tailored for individuals seeking accessibility assistance, and inquiries for other purposes may not receive a response.

Posted 2 weeks ago

Apply

10.0 - 14.0 years

0 Lacs

Andhra Pradesh

On-site

We are seeking a highly skilled Technical Architect with expertise in Java Spring Boot, React.js, and IoT system architecture, and a strong foundation in DevOps practices. As the ideal candidate, you will play a pivotal role in designing scalable, secure, and high-performance IoT solutions, leading full-stack teams, and collaborating across product, infrastructure, and data teams.

Your key responsibilities will include designing and implementing scalable and secure IoT platform architecture, defining data flow and event processing pipelines, architecting microservices-based solutions, and integrating them with React-based front-ends. You will also be responsible for defining CI/CD pipelines, managing containerization and orchestration, driving infrastructure automation, ensuring platform monitoring and observability, and enabling auto-scaling and zero-downtime deployments.

In addition, you will need to collaborate with product managers and business stakeholders to translate requirements into technical specs, mentor and lead a team of developers and engineers, conduct code and architecture reviews, set goals and targets, and provide coaching and professional development to team members. Your role will also involve conducting unit testing, identifying risks, using coding standards and best practices to ensure quality, and maintaining a long-term outlook on the product roadmap and its enabling technologies.

To be successful in this role, you must have hands-on IoT project experience, experience in designing and deploying multi-tenant SaaS platforms, strong knowledge of security best practices in IoT and cloud, and excellent problem-solving, communication, and team leadership skills. It would be beneficial if you have experience with edge computing frameworks, AI/ML model integration into IoT pipelines, exposure to industrial protocols, experience with digital twin concepts, and certifications in relevant technologies.

Ideally, you should have a Bachelor's or Master's degree in Computer Science, Engineering, or a related field. By joining us, you will have the opportunity to lead architecture for cutting-edge industrial IoT platforms, work with a passionate team in a fast-paced and innovative environment, and gain exposure to cross-disciplinary challenges in IoT, AI, and cloud-native technologies.
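The event-processing pipelines mentioned for this IoT role are normally backed by a broker (Kafka, MQTT, or similar), but the routing idea itself can be sketched in-process. The topic names and handlers below are invented for illustration:

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process pub/sub; a stand-in for the broker in an IoT event pipeline."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, event):
        # Fan out each event to every handler registered for its topic.
        for handler in self._subs[topic]:
            handler(event)


bus = EventBus()
readings = []
bus.subscribe("telemetry/temperature", readings.append)
bus.publish("telemetry/temperature", {"device": "sensor-7", "celsius": 21.5})
```

A real pipeline adds durability, partitioning, and backpressure on top of exactly this subscribe/publish shape.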

Posted 2 weeks ago

Apply

6.0 - 10.0 years

0 Lacs

Ahmedabad, Gujarat

On-site

You will be responsible for leading a team of DevOps engineers in Ahmedabad. Your main duties will include managing and mentoring the team and overseeing the deployment and maintenance of various applications such as Odoo, Magento, and Node.js. You will also be in charge of designing and managing CI/CD pipelines using tools like Jenkins and GitLab CI, handling environment-specific configurations, and containerizing applications using Docker.

In addition, you will need to implement and maintain Infrastructure as Code using tools like Terraform and Ansible, monitor application health and infrastructure, and ensure systems are secure, resilient, and compliant with industry standards. Collaboration with development, QA, and IT support teams is essential for seamless delivery, and troubleshooting performance, deployment, or scaling issues across tech stacks will also be part of your responsibilities.

To be successful in this role, you should have at least 6 years of experience in DevOps/Cloud/System Engineering roles, with a minimum of 2 years managing or leading DevOps teams. Hands-on experience with Odoo, Magento, Node.js, and AWS/Azure/GCP infrastructure is required. Strong scripting skills in Bash, Python, PHP, or Node CLI, as well as a deep understanding of Linux system administration and networking fundamentals, are essential. Experience with Git, SSH, reverse proxies, and load balancers is also necessary, along with good communication skills and client management exposure.

Preferred certifications that would be highly valued for this role include AWS Certified DevOps Engineer Professional, Azure DevOps Engineer Expert, and Google Cloud Professional DevOps Engineer. Bonus skills that are nice to have include experience with multi-region failover, HA clusters, MySQL/PostgreSQL optimization, GitOps, ArgoCD, Helm, VAPT 2.0, WCAG compliance, and infrastructure security best practices.

Posted 2 weeks ago

Apply

5.0 - 9.0 years

0 Lacs

Thane, Maharashtra

On-site

As a Senior Software Engineer / Technical Architect for the B2C Platform at AjnaLens in Thane, Maharashtra, India, you will be responsible for architecting and building scalable backend services using Java (Spring Boot) and Python (FastAPI/Django). With at least 5 years of experience in backend development, you will possess a strong foundation in scalable system design and cloud-native infrastructure. Your role will involve leading key decisions on infrastructure, real-time data pipelines, LLM-integrated systems, and platform observability. You will collaborate closely with product, design, and infrastructure teams to ship user-first features at scale. Additionally, you will demonstrate a product mindset, technical depth, and the ability to deliver fast in a collaborative, purpose-driven environment.

The ideal candidate will have deep hands-on experience with Java, Python, and frameworks like Spring Boot, Django, or FastAPI. You should also showcase proven expertise in building scalable microservices, streaming pipelines, and event-driven systems. Proficiency in tools like Redis, PostgreSQL, PGVector, and cloud platforms such as AWS/GCP is essential. Exposure to LLM features like RAG, vector search, GPT workflows, or chatbot architectures will be advantageous.

In this role, you will be expected to possess solid system design capabilities, lead architecture reviews, and mentor junior developers. Full-stack awareness, including familiarity with front-end tools like ReactJS for internal dashboards and modern DevOps practices, will be beneficial. Experience with containerization and observability tools like Prometheus, Grafana, and OTEL is desired.

Bonus points will be awarded for experience with agentic systems, real-time infrastructure, or AI-driven product features. Prior involvement in building high-performance consumer products from scratch and a passion for conscious innovation and impactful technology will be highly valued.

As part of the team at AjnaLens, you will create a robust, scalable backend for the next-gen B2C platform. You will develop real-time, intelligent user experiences that combine performance with personalization, in alignment with AjnaLens' vision of building mindfully at scale. Join us in co-creating the future of conscious technology with Lenskart as our strategic investor.
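On the PGVector and vector-search skills this listing references: the core operation is k-nearest-neighbour ranking of embeddings by cosine similarity, which pgvector performs server-side with its distance operators. A pure-Python sketch of the same ranking, with embeddings and document ids invented for illustration:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def top_k(query, docs, k=2):
    """Return the ids of the k documents whose embeddings are most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


corpus = [
    ("doc-a", (1.0, 0.0)),
    ("doc-b", (0.0, 1.0)),
    ("doc-c", (0.9, 0.1)),
]
# For query (1, 0), doc-a and doc-c rank highest.
```

At production scale the exact sort is replaced by an approximate index (e.g. HNSW or IVFFlat in pgvector), trading a little recall for large speedups.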

Posted 2 weeks ago

Apply

11.0 - 15.0 years

0 Lacs

Hyderabad, Telangana

On-site

As an AI Azure Architect, your primary responsibility will be to develop the technical vision for AI systems that cater to existing and future business requirements. This involves architecting end-to-end AI applications and ensuring seamless integration with legacy systems, enterprise data platforms, and microservices. Collaborating closely with business analysts and domain experts, you will translate business objectives into technical requirements and AI-driven solutions. Additionally, you will partner with product management to design agile project roadmaps, aligning technical strategies with market needs. Coordinating with data engineering teams is essential to ensure smooth data flows, quality, and governance across different data sources.

Your role will also involve leading the design of reference architectures, roadmaps, and best practices for AI applications. Evaluating emerging technologies and methodologies to recommend suitable innovations for integration into the organizational strategy is a crucial aspect of your responsibilities. You will be required to identify and define system components such as data ingestion pipelines, model training environments, CI/CD frameworks, and monitoring systems. Leveraging containerization (Docker, Kubernetes) and cloud services will streamline the deployment and scaling of AI systems. Implementing robust versioning, rollback, and monitoring mechanisms to ensure system stability, reliability, and performance will be part of your duties.

Moreover, you will oversee the planning, execution, and delivery of AI and ML applications, ensuring they are completed within budget and timeline constraints. Managing project goals, allocating resources, and mitigating risks will fall under your project management responsibilities. You will be responsible for overseeing the complete lifecycle of AI application development, from conceptualization and design to development, testing, deployment, and post-production optimization. Emphasizing security best practices during each development phase, focusing on data privacy, user security, and risk mitigation, is crucial.

In addition to technical skills, the ideal candidate for this role should possess key behavioral attributes such as the ability to mentor junior developers, take ownership of project deliverables, and contribute towards risk mitigation. Understanding business objectives and functions to support data needs is also essential.

Mandatory technical skills for this position include a strong background in working with agents using LangGraph, AutoGen, and CrewAI. Proficiency in Python, along with knowledge of machine learning libraries like TensorFlow, PyTorch, and Keras, is required. Experience with cloud computing platforms (AWS, Azure, Google Cloud Platform), containerization tools (Docker), orchestration frameworks (Kubernetes), and DevOps tools (Jenkins, GitLab CI/CD) is essential. Proficiency in SQL and NoSQL databases, and in designing distributed systems, RESTful APIs, GraphQL integrations, and event-driven architectures, is also necessary.

Preferred technical skills include experience with monitoring and logging tools, cutting-edge libraries like Hugging Face Transformers, and large-scale deployment of ML projects. Training and fine-tuning of Large Language Models (LLMs) is an added advantage.

Educational qualifications for this role include a Bachelor's/Master's degree in Computer Science, along with certifications in cloud technologies (AWS, Azure, GCP) and TOGAF certification. The ideal candidate should have 11 to 14 years of relevant work experience in this field.

Posted 2 weeks ago

Apply

3.0 - 7.0 years

0 Lacs

Hyderabad, Telangana

On-site

You should have 3-6 years of experience in server management and administration, specifically in Linux environments. Your responsibilities will include server installation, maintenance, and decommissioning tasks for both physical and virtual machines. Proficient knowledge of Linux OS operations and experience with hypervisor operations (Xen and VMware) is required. Additionally, you should have expertise in LVM, DNS, LDAP, and iptables, along with good troubleshooting skills for server downtime situations.

It is beneficial to have associate-level knowledge of at least one public cloud platform such as AWS, Azure, or GCP. Basic familiarity with automation tools (e.g., Chef) and scripting languages (e.g., Bash, Perl) is desirable. Knowledge of or experience with GitHub and Jira would be advantageous. Experience with monitoring tools like Prometheus in a Docker setup, or the ELK stack, is preferred but not mandatory. Understanding of central infrastructure services such as security groups, RPM, rsync, mail, Active Directory, Chef automation, Nagios, and repository management is recommended. You will be responsible for providing knowledge transfer to 2nd-level technical support on specific technology areas.

In terms of responsibilities, you will deliver 3rd-level technical support in accordance with customer SLAs. Daily compute operations, including events, incidents, service requests, change requests, and root cause analysis, must follow ITIL processes. You will handle server operations, monitoring, decommissioning, special configurations, storage operations (based on NetApp NFS), package/repository management, and OS image maintenance. Patch management using BladeLogic Automation, server-related monitoring/alert configuration, performance testing, and analysis are also part of your role. You will be involved in root cause analysis for customer incidents, service/task documentation, testing, quality management, and knowledge transfer. Additionally, you will support 2nd-level technical support in complex tasks and be willing to participate in an on-call setup, which may include weekends or public holidays as needed by the project.

If you have any questions regarding the job description, you can reach out to the recruiter, Santhosh Koyada, at santhosh.koyada@bs.nttdata.com. NTT DATA Business Solutions is a rapidly growing international IT company and a leading SAP partner, offering a full range of services from business consulting to SAP solutions implementation, hosting services, and support.

Posted 2 weeks ago

Apply

3.0 - 7.0 years

0 Lacs

Chennai, Tamil Nadu

On-site

We are seeking a hands-on backend expert to elevate our FastAPI-based platform to the next level by developing production-grade model-inference services, agentic AI workflows, and seamless integration with third-party LLMs and NLP tooling. In this role, you will be responsible for the following key areas:

1. Core Backend Enhancements:
- Building APIs
- Strengthening security with OAuth2/JWT, rate limiting, and SecretManager, and enhancing observability through structured logging and tracing
- Adding CI/CD, test automation, health checks, and SLO dashboards

2. Awesome UI Interfaces:
- Developing UI interfaces using React.js/Next.js, Redux/Context, and CSS frameworks such as Tailwind, MUI, custom CSS, and Shadcn

3. LLM & Agentic Services:
- Designing micro/mini-services to host and route to platforms such as OpenAI, Anthropic, and local HF models, plus embeddings and RAG pipelines
- Implementing autonomous/recursive agents that orchestrate multi-step chains for tools, memory, and planning

4. Model-Inference Infrastructure:
- Setting up GPU/CPU inference servers behind an API gateway
- Optimizing throughput with techniques like batching, streaming, quantization, and caching using tools like Redis and pgvector

5. NLP & Data Services:
- Managing the NLP stack with Transformers for classification, extraction, and embedding generation
- Building data pipelines that combine aggregated business metrics with model telemetry for analytics

You will be working with a tech stack that includes Python, FastAPI, Starlette, Pydantic, async SQLAlchemy, Postgres, Docker, Kubernetes, AWS/GCP, Redis, RabbitMQ, Celery, Prometheus, Grafana, OpenTelemetry, and more. Experience in building production Python REST APIs, SQL schema design in Postgres, async patterns and concurrency, UI application development, RAG and LLM/embedding workflows, cloud container orchestration, and CI/CD pipelines is essential for this role.

Additionally, experience with streaming protocols, NGINX Ingress, SaaS security hardening, data privacy, event-sourced data models, and other related technologies would be advantageous. This role offers the opportunity to work on evolving products, tackle real challenges, and lead the scaling of AI services while working closely with the founder to shape the future of the platform. If you are looking for meaningful ownership and the chance to solve forward-looking problems, this role could be the right fit for you.
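The rate-limiting requirement in a stack like this is commonly implemented as a token bucket, for example behind FastAPI middleware with the bucket state held in Redis per client. A minimal in-memory sketch of the algorithm; the clock is injected so the behaviour is deterministic:

```python
class TokenBucket:
    """Token-bucket limiter of the kind used for per-client API rate limits."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full so a new client can burst
        self.last = 0.0           # timestamp of the previous check

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=1, capacity=2`, a client can burst two requests immediately, then sustain one request per second; anything faster is rejected until tokens refill.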

Posted 2 weeks ago

Apply