
1633 Grafana Jobs - Page 11

Set up a Job Alert
JobPe aggregates job listings for easy access, but applications are submitted directly on the original job portal.

7.0 - 10.0 years

18 - 20 Lacs

Hyderabad

Remote

Java + AI/ML role requiring 6+ years of industry experience with Java, Spring Boot, and Spring Data, plus at least 2 years of AI/ML project or professional experience.
- Strong experience building and consuming REST APIs and asynchronous messaging (Kafka/RabbitMQ).
- Working experience integrating AI/ML models into Java services or calling external ML endpoints (REST/gRPC).
- Understanding of the ML lifecycle: training, validation, inference, monitoring, and retraining.
- Familiarity with tools such as TensorFlow, PyTorch, Scikit-Learn, or ONNX.
- Prior experience with domain-specific ML implementations (e.g., fraud detection, recommendation systems, NLP chatbots).
- Experience working with data formats such as JSON, Parquet, Avro, and CSV.
- Solid understanding of both SQL (PostgreSQL, MySQL) and NoSQL (Redis) database systems.
- Integrate machine learning models (batch and real-time) into backend systems and APIs (see the sketch below).
- Optimize and automate AI/ML workflows using MLOps best practices.
- Monitor and manage model performance, versioning, and rollbacks.
- Collaborate with cross-functional teams (DevOps, SRE, Product Engineering) to ensure seamless deployment.
- Exposure to MLOps tools such as MLflow, Kubeflow, or Seldon.
- Experience with at least one cloud platform, preferably AWS, and knowledge of observability tools and their metrics, events, logs, and traces (e.g., Prometheus, Grafana, OpenTelemetry, Splunk, Datadog, AppDynamics).
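For context on the "calling external ML endpoints (REST)" bullet, here is a minimal Python sketch of that call pattern (the role itself is Java/Spring Boot, but the shape of the request is the same). The endpoint URL, payload, and response schema are assumptions for illustration, not part of the posting.

```python
# Minimal sketch: calling an external ML inference endpoint over REST.
# The endpoint URL, payload shape, and response fields are hypothetical --
# adapt them to whatever model-serving API the team actually exposes.
import requests

INFERENCE_URL = "https://ml.example.internal/v1/models/fraud-detector:predict"  # hypothetical

def score_transaction(transaction: dict, timeout_s: float = 2.0) -> float:
    """Send one transaction to the model server and return its fraud score."""
    payload = {"instances": [transaction]}
    resp = requests.post(INFERENCE_URL, json=payload, timeout=timeout_s)
    resp.raise_for_status()                      # surface 4xx/5xx as exceptions
    body = resp.json()
    return float(body["predictions"][0])         # assumed response schema

if __name__ == "__main__":
    txn = {"amount": 129.99, "merchant_id": "M-1044", "country": "IN"}
    try:
        print("fraud score:", score_transaction(txn))
    except requests.RequestException as exc:
        # In a real service this is where retries, fallbacks, and metrics go.
        print("inference call failed:", exc)
```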

Posted 1 week ago

Apply

3.0 - 8.0 years

10 - 20 Lacs

Hyderabad, Ahmedabad, Bengaluru

Work from Office

SUMMARY
Sr. Site Reliability Engineer - Keep Planet-Scale Systems Reliable, Secure, and Fast (On-site only)

At Ajmera Infotech, we build planet-scale platforms for NYSE-listed clients, from HIPAA-compliant health systems to FDA-regulated software that simply cannot fail. Our 120+ elite engineers design, deploy, and safeguard mission-critical infrastructure trusted by millions.

Why You'll Love It
- Dev-first SRE culture: automation, CI/CD, zero-toil mindset
- TDD, monitoring, and observability baked in, not bolted on
- Code-first reliability: script, ship, and scale with real ownership
- Mentorship-driven growth with exposure to regulated industries (HIPAA, FDA, SOC2)
- End-to-end impact: own infrastructure across Dev and Ops

Key Responsibilities
- Architect and manage scalable, secure Kubernetes clusters (k8s/k3s) in production
- Develop scripts in Python, PowerShell, and Bash to automate infrastructure operations (see the sketch below)
- Optimize performance, availability, and cost across cloud environments
- Design and enforce CI/CD pipelines using Jenkins, Bamboo, and GitHub Actions
- Implement log monitoring and proactive alerting systems
- Integrate and tune observability tools such as Prometheus and Grafana
- Support both development and operations pipelines for continuous delivery
- Manage infrastructure components including Artifactory, Nginx, Apache, and IIS
- Drive compliance readiness across HIPAA, FDA, ISO, and SOC2

Must-Have Skills
- 3-8 years in SRE or infrastructure engineering roles
- Kubernetes (k8s/k3s) production experience
- Scripting: Python, PowerShell, Bash
- CI/CD tools: Jenkins, Bamboo, GitHub Actions
- Experience with log monitoring, alerting, and observability stacks
- Cross-functional pipeline support (Dev + Ops)
- Tooling: Artifactory, Nginx, Apache, IIS
- Performance, availability, and cost-efficiency tuning

Nice-to-Have Skills
- Background in regulated environments (HIPAA, FDA, ISO, SOC2)
- Multi-OS platform experience
- Integration of Prometheus, Grafana, or similar observability platforms

What We Offer
- Competitive salary package with performance-based bonuses
- Comprehensive health insurance for you and your family
- Flexible working hours and generous paid leave
- High-end workstations and access to our in-house device lab
- Sponsored learning: certifications, workshops, and tech conferences
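As a flavor of the "automate infrastructure operations" responsibility, below is a minimal Python sketch that lists unhealthy pods using the official Kubernetes Python client. It assumes the `kubernetes` package is installed and a valid kubeconfig is present; nothing here is specific to Ajmera's environment.

```python
# Minimal sketch: flag pods that are not Running/Succeeded across all namespaces.
# Assumes the official `kubernetes` Python client and a reachable cluster.
from kubernetes import client, config

def unhealthy_pods():
    config.load_kube_config()          # or config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    bad = []
    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        phase = pod.status.phase
        if phase not in ("Running", "Succeeded"):
            bad.append((pod.metadata.namespace, pod.metadata.name, phase))
    return bad

if __name__ == "__main__":
    for ns, name, phase in unhealthy_pods():
        print(f"{ns}/{name}: {phase}")
```

A script like this is typically wrapped in a cron job or alerting pipeline rather than run by hand.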

Posted 1 week ago

Apply

4.0 - 5.0 years

3 - 6 Lacs

Hyderabad

Work from Office

Job Summary: We are looking for an experienced and detail-oriented PostgreSQL Database Administrator (DBA) to manage and maintain our database systems. The ideal candidate will have a strong background in PostgreSQL administration, performance tuning, backup and recovery, and high-availability solutions.

Key Responsibilities:
- Install, configure, upgrade, and maintain PostgreSQL database servers.
- Monitor database performance, implement changes, and apply new patches and versions when required.
- Ensure high availability, backup, and disaster recovery strategies are in place and tested.
- Perform regular database maintenance tasks, including reindexing, vacuuming, and tuning.
- Manage database access, roles, and permissions securely.
- Write and maintain scripts to automate routine database tasks.
- Work closely with developers to optimize queries and schema design.
- Troubleshoot and resolve database-related issues promptly.
- Implement and monitor replication strategies (logical and physical replication); see the monitoring sketch below.
- Perform regular security assessments and apply best practices to secure data.
- Participate in the on-call rotation and provide production support as needed.

Required Skills:
- Minimum 4 years of hands-on experience with PostgreSQL administration.
- Strong experience in performance tuning and query optimization.
- Experience with database backup, restore, and disaster recovery planning.
- Good understanding of PostgreSQL internals.
- Familiarity with tools like pgAdmin, pgBouncer, pgBackRest, or Patroni.
- Knowledge of Linux/Unix systems for managing PostgreSQL on those platforms.
- Experience with shell scripting and automation tools.
- Basic understanding of cloud platforms like AWS/GCP/Azure (RDS, Aurora, etc.) is a plus.
- Knowledge of monitoring tools like Prometheus, Grafana, or similar.
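As an illustration of the replication-monitoring duty, here is a minimal Python sketch that checks replication lag on a PostgreSQL primary. The connection string is a placeholder; `pg_stat_replication` and `pg_wal_lsn_diff()` are standard on PostgreSQL 10 and later.

```python
# Minimal sketch: report per-standby replication lag from the primary.
import psycopg2

DSN = "host=primary.example dbname=postgres user=monitor password=***"  # placeholder

QUERY = """
SELECT client_addr,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
"""

def check_replication(max_lag_bytes: int = 64 * 1024 * 1024):
    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        for client_addr, state, lag in cur.fetchall():
            status = "OK" if (lag or 0) <= max_lag_bytes else "LAGGING"
            print(f"{client_addr} state={state} lag={lag} bytes -> {status}")

if __name__ == "__main__":
    check_replication()
```

The 64 MB threshold is illustrative; real alert thresholds depend on workload and RPO targets.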

Posted 1 week ago

Apply

4.0 - 7.0 years

13 - 17 Lacs

Bengaluru

Work from Office

Overview
This is an opportunity to join a growing team in a dynamic new business opportunity within Zebra. Zebra Technologies, located in Bangalore, revolutionizes the way work is done in large warehouses and distribution centers with innovative robotic solutions. Our team members include robotics and software experts with multidisciplinary backgrounds. We are seeking software engineers to join the Zebra team! As the key backend developer, you will interpret the business requirements provided by the product owner and translate them into a design, an application, and a supporting DB schema. Careful design with solid implementation will be key to success in this role.

Position Description:
Zebra is looking for passionate, talented software engineers to join the tight-knit team that designs and builds our cloud robotics platform, which powers fleets of robots deployed at our customer sites. This is a remarkable greenfield opportunity. You will partner closely with cloud software, infrastructure, and robotics teams.

Responsibilities
- Design, develop, and deploy core components of our next-generation cloud robotics platform.
- Work closely with the product and other engineering teams to envision and drive the technical roadmap for the cloud robotics platform.
- Influence the larger team's technical and collaborative culture and processes by growing and mentoring other engineers.
- Work with business analysts to understand our customers' business processing needs.
- Work with the DevOps teams to ensure smooth, automated test and deployment for the application.

Qualifications
- Preferred education: Bachelor's degree
- 3-6 years of relevant work experience in server-side and/or data engineering development
- Outstanding abstraction skills to decouple complex problems into simple concepts
- Software development using programming languages Go and Python
- Strong knowledge of at least one cloud technology (AWS, Google Cloud, or Azure)
- Experience with automated testing, unit testing, deployment pipelines, and cloud-based infrastructure
- Experience with Docker and Kubernetes
- Strong understanding of databases, NoSQL data stores, storage, and distributed persistence technologies
- Experience with cloud-based architectures: SaaS, microservices
- Experience with design principles, patterns, and system design
- Proficient understanding of code versioning tools such as Git
- Strong knowledge of data structures, algorithms, and problem solving
- Passionate about enabling next-generation experiences
- Good analytical, problem-solving, and debugging skills
- Experience in greenfield architecture and well-crafted, elegant systems
- Demonstrated ability to find the best solution for the problem at hand
- Highly collaborative and open working style

Desired:
- Good understanding of REST, gRPC, and GraphQL
- Experience with robotics and embedded systems development
- Experience with third-party data platforms (e.g., Snowflake, Spark, Databricks)
- Experience integrating with third-party monitoring tools (e.g., Prometheus, Kibana, Grafana)
- Experience working at the deployment and infrastructure level

What We Offer:
- Competitive salary
- Zebra Incentive Program (annual performance bonus)
- Zebra's GEM appreciation/recognition program
- Flexible time off - work hard and play hard
- Awesome company culture and the ability to work with robots! Zebra's culture is encouraging and collaborative, where our employees are encouraged to learn and grow together.

As we celebrate five decades of success, this is a phenomenal time to join us and make your mark on the sixth. We are excited to hear from you!

Posted 1 week ago

Apply

3.0 - 5.0 years

5 - 7 Lacs

Visakhapatnam, Onsite

Work from Office

Reports To: Senior Engineer/Team Lead

Job Overview
We are looking for dedicated back-end engineers to join our team and contribute to our server-side development processes. You will be responsible for designing and maintaining scalable web services, managing databases, and collaborating with stakeholders to ensure seamless integration between front end and back end.

Key Responsibilities
1. Develop and maintain server-side applications.
2. Build scalable and secure web services using backend languages such as .NET, Python, Java, and Node.js; manage databases and data storage.
3. Design and optimize databases on PostgreSQL, MySQL, MongoDB, or SQL Server while ensuring secure and reliable data management.
4. Collaborate with team members: work closely with front-end developers, designers, and project managers to ensure alignment between server-side functionality and user interfaces.
5. Implement APIs and frameworks: design and implement RESTful APIs to facilitate communication between server-side applications and end-user systems (see the sketch below).
6. Conduct troubleshooting and debugging: identify and resolve performance bottlenecks, security vulnerabilities, and server-side errors to maintain system stability.
7. Optimize scalability and workflow: develop reusable code and scalable solutions to accommodate future growth.
8. Integrate core backend systems with multiple external parties.
9. Perform test-driven development.
10. Develop systems with logging and observability as core tenets.

Key Technical Requirements
1. Programming languages: proficient in at least one server-side language (Python, Java, Node.js, Go, or .NET Core); writing clean, modular, and scalable code.
2. Frameworks and libraries: experience with backend frameworks such as Flask, FastAPI, or Django (Python); Spring Boot (Java); Express.js or NestJS (Node.js).
3. API development: strong expertise in designing and implementing RESTful and GraphQL APIs; understanding of API authentication (API keys); familiarity with API documentation tools (Swagger/OpenAPI).
4. Database management: experience with RDBMS (PostgreSQL, MySQL, MS SQL) and NoSQL databases (MongoDB); writing optimized queries and knowledge of schema design and indexing.
5. Microservices architecture: understanding and experience in building scalable microservices; knowledge of message brokers like Kafka and RabbitMQ.
6. Security best practices: knowledge of securing APIs (rate limiting, CORS, input sanitization).
7. Cloud and containerization: experience with cloud platforms like AWS, Azure, or GCP; containerization using Docker and orchestration with Kubernetes; CI/CD pipelines using tools like Jenkins, GitHub Actions, or GitLab CI (bonus skill set).
8. Version control: proficient in Git and GitHub/GitLab/Bitbucket workflows.
9. Testing and debugging: writing unit, integration, and performance tests using frameworks like PyTest, JUnit, Mocha, or Postman; proficient with debugging and profiling tools.
10. Monitoring and logging: familiarity with logging and monitoring stacks (ELK Stack, Prometheus, Grafana); error monitoring with tools like Sentry, Datadog, or New Relic.
11. Agile development: comfortable working in Agile/Scrum teams.

Soft Skills
1. Strong communication and stakeholder management.
2. Ability to work as an individual contributor and team member.
3. Problem solving.
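To make the "RESTful API with API-key authentication" requirement concrete, here is a minimal sketch in FastAPI (one of the frameworks the posting names). The route, key handling, and response body are illustrative only; a real service would load the key from a secret store and back the handler with a database.

```python
# Minimal sketch: a REST endpoint protected by an API key header.
from fastapi import FastAPI, Header, HTTPException

app = FastAPI(title="Orders API (sketch)")
API_KEY = "change-me"  # placeholder; never hard-code secrets in practice

@app.get("/orders/{order_id}")
def get_order(order_id: int, x_api_key: str = Header(default="")):
    # Reject requests that do not present the expected API key.
    if x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="invalid API key")
    # A real handler would fetch from PostgreSQL/MongoDB; this returns a stub.
    return {"order_id": order_id, "status": "PACKED"}

# Run with:  uvicorn orders_api:app --reload
# FastAPI serves the Swagger/OpenAPI docs mentioned above at /docs.
```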

Posted 1 week ago

Apply

8.0 - 13.0 years

25 - 30 Lacs

Hyderabad

Work from Office

Job Description:
Mandatory Skills: AWS, CI/CD, Jenkins, Chef, Terraform
Good-to-Have Skills: Scripting
Experience: 8+ years only

Posted 1 week ago

Apply

5.0 - 8.0 years

9 - 13 Lacs

Bengaluru

Work from Office

About The Role
- Proficiency in problem solving and troubleshooting technical issues.
- Willingness to take ownership and strive for the best solutions.
- Experience using performance analysis tools such as Android Profiler, Traceview, Perfetto, and Systrace.
- Strong understanding of Android architecture, memory management, and threading.
- Strong understanding of Android HALs, the Car Framework, the Android graphics pipeline, DRM, and codecs.
- Good knowledge of hardware abstraction layers in Android and/or Linux.
- Good understanding of Git and CI/CD workflows.
- Experience in Agile-based projects.
- Experience with Linux as a development platform and target.
- Extensive experience with Jenkins and GitLab CI systems.
- Hands-on experience with GitLab, Jenkins, Artifactory, Grafana, Prometheus, and/or Elasticsearch.
- Experience with different testing frameworks and their implementation in a CI system.
- Programming using C/C++ and Java/Kotlin on Linux.
- Yocto and its use in CI environments.
- Familiarity with ASPICE.

Works in the area of Software Engineering, which encompasses the development, maintenance, and optimization of software solutions/applications:
1. Applies scientific methods to analyse and solve software engineering problems.
2. Is responsible for the development and application of software engineering practice and knowledge in research, design, development, and maintenance.
3. Exercises original thought and judgement and is able to supervise the technical and administrative work of other software engineers.
4. Builds the skills and expertise of their software engineering discipline to reach the standard skills expectations for the applicable role, as defined in Professional Communities.
5. Collaborates and acts as a team player with other software engineers and stakeholders.

Grade Specific
Is highly respected, experienced, and trusted. Masters all phases of the software development lifecycle and applies innovation and industrialization. Shows a clear dedication and commitment to business objectives and responsibilities and to the group as a whole. Operates with no supervision in highly complex environments and takes responsibility for a substantial aspect of Capgemini's activity. Is able to manage difficult and complex situations calmly and professionally. Considers the bigger picture when making decisions and demonstrates a clear understanding of commercial and negotiating principles in less-easy situations. Focuses on developing long-term partnerships with clients. Demonstrates leadership that balances business, technical, and people objectives. Plays a significant part in the recruitment and development of people.

Skills (competencies): Verbal Communication

Posted 1 week ago

Apply

4.0 - 7.0 years

6 - 10 Lacs

Bengaluru

Work from Office

About The Role
- Proficiency in problem solving and troubleshooting technical issues.
- Willingness to take ownership and strive for the best solutions.
- Experience using performance analysis tools such as Android Profiler, Traceview, Perfetto, and Systrace.
- Strong understanding of Android architecture, memory management, and threading.
- Strong understanding of Android HALs, the Car Framework, the Android graphics pipeline, DRM, and codecs.
- Good knowledge of hardware abstraction layers in Android and/or Linux.
- Good understanding of Git and CI/CD workflows.
- Experience in Agile-based projects.
- Experience with Linux as a development platform and target.
- Extensive experience with Jenkins and GitLab CI systems.
- Hands-on experience with GitLab, Jenkins, Artifactory, Grafana, Prometheus, and/or Elasticsearch.
- Experience with different testing frameworks and their implementation in a CI system.
- Programming using C/C++ and Java/Kotlin on Linux.
- Yocto and its use in CI environments.
- Familiarity with ASPICE.

Works in the area of Software Engineering, which encompasses the development, maintenance, and optimization of software solutions/applications:
1. Applies scientific methods to analyse and solve software engineering problems.
2. Is responsible for the development and application of software engineering practice and knowledge in research, design, development, and maintenance.
3. Exercises original thought and judgement and is able to supervise the technical and administrative work of other software engineers.
4. Builds the skills and expertise of their software engineering discipline to reach the standard skills expectations for the applicable role, as defined in Professional Communities.
5. Collaborates and acts as a team player with other software engineers and stakeholders.

Grade Specific
Is fully competent in its own area and has a deep understanding of related programming concepts, software design, and software development principles. Works autonomously with minimal supervision. Able to act as a key contributor in a complex environment and lead the activities of a team for software design and development. Acts proactively to understand internal/external client needs and offers advice even when not asked. Able to assess and adapt to project issues, formulate innovative solutions, work under pressure, and drive the team to succeed against its technical and commercial goals. Aware of profitability needs and may manage costs for specific projects/work areas. Explains difficult concepts to a variety of audiences to ensure meaning is understood. Motivates other team members and creates informal networks with key contacts outside own area.

Skills (competencies): Verbal Communication

Posted 1 week ago

Apply

3.0 - 5.0 years

7 - 11 Lacs

Bengaluru

Work from Office

About The Role
Build, manage, and optimize scalable and reliable infrastructure platforms. Collaborate with cross-functional teams to automate operations, ensure system resilience, and enable seamless deployment of applications. You will get the opportunity to work on a platform that scales to 10 billion minutes of meetings per month.

Technologies: Linux (CentOS and AlmaLinux internals), serviceability, containerization, Docker, and CI/CD processes that support fast-paced development and delivery cycles.

Required Qualifications:
- 3-5 years of experience; familiarity with container orchestration platforms like Docker and Kubernetes is good to have.
- Linux administration experience, including a deep understanding of Linux operating systems, serviceability, the boot-up process, bootloader installation and configuration (GRUB), shell scripting, and system-level troubleshooting.
- Version control: experience working with GitHub repositories, managing pull requests, branching strategies, GitHub Enterprise, and automation using GitHub APIs or the GitHub CLI (see the sketch below).
- Experience in shell and Python scripting.
- Understanding of SQL databases such as Informix and MySQL.
- Working knowledge of IP addressing, routing, DNS, and load balancing concepts.
- 2+ years of experience designing and managing Continuous Integration/Continuous Delivery (CI/CD) pipelines using industry-standard tools.
- Experience using monitoring/logging solutions (Grafana, Kibana) and the ability to identify and debug system issues.
- Demonstrated ability to analyze issues, identify root causes, and implement effective solutions efficiently.

Desired Skills:
- Strong collaboration skills to work closely with developers, DevOps, and operations teams.
- Experience with cloud platforms (AWS) and managing hybrid cloud environments.
- Exposure to security best practices and Docker requirements in platform engineering.
- Strong coding skills in languages such as shell script and Python.

Primary Skills: Kubernetes, Docker, CI/CD pipelines, IaC, Linux boot-up process, bootloader installation and configuration (GRUB), shell, Python, Linux administration, networking, network monitoring
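For the GitHub-automation requirement, here is a minimal Python sketch that lists open pull requests through the public GitHub REST API. The owner/repo values are placeholders and the token is read from the environment; it shows the pattern, not any particular team's tooling.

```python
# Minimal sketch: list open pull requests for a repository via the GitHub REST API.
import os
import requests

OWNER, REPO = "example-org", "example-repo"   # placeholders
TOKEN = os.environ.get("GITHUB_TOKEN", "")

def open_pull_requests():
    url = f"https://api.github.com/repos/{OWNER}/{REPO}/pulls"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {TOKEN}",
    }
    resp = requests.get(url, headers=headers, params={"state": "open"}, timeout=10)
    resp.raise_for_status()
    return [(pr["number"], pr["title"]) for pr in resp.json()]

if __name__ == "__main__":
    for number, title in open_pull_requests():
        print(f"#{number}: {title}")
```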

Posted 1 week ago

Apply

1.0 - 3.0 years

4 - 5 Lacs

Pune

Work from Office

We are urgently looking for an experienced Linux System Administrator with 2-3 years of experience in the field. You should have good knowledge of Linux, Windows servers, and firewalls. Immediate joiners are preferred.

Job Responsibilities:
- Ubuntu/Windows end-user PC installation, configuration, maintenance, software deployment, security updates, and patches.
- Installation, configuration, and maintenance of Ubuntu and Windows VMs (Citrix) and servers.
- Installation, configuration, and maintenance of database servers such as MySQL, MongoDB, and PostgreSQL, with replication setup.
- Installation, configuration, and maintenance of LDAP and FTP/SFTP servers on the Linux platform.
- Ensuring the availability, performance, scalability, and security of server infrastructure.
- Hands-on experience in shell scripting.
- Hands-on experience working with Git repositories (branching, merging, versioning).
- Incident analysis, RCAs, troubleshooting, and identification of permanent resolutions.
- Monitoring servers and infrastructure using tools like Nagios and Grafana (see the sketch below).
- Troubleshooting problems reported by users, and analysing and isolating issues.
- Researching and identifying new technologies that can improve our service offerings.

Required Qualification: B.Tech / B.E. / MCA / MCM / Graduate / Post Graduate in any specialization.
Experience: Candidates with 2-3 years of relevant experience.
Job Location: Pune (work from office)
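As a small illustration of the monitoring duties above, here is a hedged Python sketch of a host health check of the kind that might feed Nagios or a Grafana alert. The service names and the 90% disk threshold are assumptions.

```python
# Minimal sketch: check root-disk usage and whether key services are active.
import shutil
import subprocess

SERVICES = ["mysql", "postgresql", "slapd"]   # assumed service names
DISK_THRESHOLD = 0.90                         # alert above 90% usage

def disk_ok(path: str = "/") -> bool:
    usage = shutil.disk_usage(path)
    return (usage.used / usage.total) < DISK_THRESHOLD

def service_active(name: str) -> bool:
    # `systemctl is-active` exits 0 only when the unit is active.
    result = subprocess.run(["systemctl", "is-active", "--quiet", name])
    return result.returncode == 0

if __name__ == "__main__":
    print("disk /:", "OK" if disk_ok() else "FULL")
    for svc in SERVICES:
        print(f"{svc}:", "active" if service_active(svc) else "DOWN")
```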

Posted 1 week ago

Apply

5.0 - 7.0 years

25 - 40 Lacs

Bengaluru

Work from Office

Roles and Responsibilities:
- Ensure the ongoing stability, scalability, and performance of PhonePe's Hadoop ecosystem and associated services.
- Exhibit a high level of ownership and accountability to ensure the reliability of the distributed clusters.
- Manage and administer Hadoop infrastructure including Apache Hadoop, HDFS, HBase, Hive, Pig, Airflow, YARN, Ranger, Kafka, Pinot, Ozone, and Druid.
- Automate BAU operations through scripting and tool development (see the sketch below).
- Perform capacity planning, system tuning, and performance optimization.
- Set up, configure, and manage Nginx in high-traffic environments.
- Administer and troubleshoot Linux and big data systems, including networking (IP, iptables, IPsec).
- Handle on-call responsibilities, investigate incidents, perform root cause analysis, and implement mitigation strategies.
- Collaborate with infrastructure, network, database, and BI teams to ensure data availability and quality.
- Apply system updates and patches, and manage version upgrades in coordination with security teams.
- Build tools and services to improve observability, debuggability, and supportability.
- Enable cluster security using Kerberos and LDAP.
- Capacity planning and performance tuning of Hadoop clusters.
- Work with configuration management and deployment tools like Puppet, Chef, Salt, or Ansible.

Preferred candidate profile:
- Minimum 1 year of Linux/Unix system administration experience.
- Over 4 years of hands-on experience in Apache Hadoop administration.
- Minimum 1 year of experience managing infrastructure on public cloud platforms like AWS, Azure, or GCP (optional).
- Strong understanding of networking, open-source tools, and IT operations.
- Proficient in scripting and programming (Perl, Golang, or Python).
- Hands-on experience maintaining and managing Hadoop ecosystem components such as HDFS, YARN, HBase, and Kafka.
- Strong operational knowledge of systems (CPU, memory, storage, OS-level troubleshooting).
- Experience administering and tuning relational and NoSQL databases.
- Experience configuring and managing Nginx in production environments.
- Excellent communication and collaboration skills.

Good to have:
- Experience designing and maintaining Airflow DAGs to automate scalable and efficient workflows.
- Experience in ELK stack administration.
- Familiarity with monitoring tools like Grafana, Loki, Prometheus, and OpenTSDB.
- Exposure to security protocols and tools (Kerberos, LDAP).
- Familiarity with distributed systems like Elasticsearch or similar high-scale environments.
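For the "automate BAU operations" bullet, here is a minimal Python sketch that checks how much space an HDFS directory tree consumes via the WebHDFS REST API. The namenode host, port (9870 is the Hadoop 3.x default), path, and threshold are assumptions.

```python
# Minimal sketch: report space consumed under an HDFS path via WebHDFS.
import requests

NAMENODE = "http://namenode.example:9870"   # placeholder
PATH = "/data/events"                       # placeholder
THRESHOLD_TB = 50                           # illustrative alert threshold

def space_consumed_tb(path: str) -> float:
    url = f"{NAMENODE}/webhdfs/v1{path}"
    resp = requests.get(url, params={"op": "GETCONTENTSUMMARY"}, timeout=15)
    resp.raise_for_status()
    summary = resp.json()["ContentSummary"]
    # spaceConsumed already accounts for HDFS replication.
    return summary["spaceConsumed"] / 1024 ** 4

if __name__ == "__main__":
    used = space_consumed_tb(PATH)
    print(f"{PATH}: {used:.2f} TB consumed",
          "(over threshold!)" if used > THRESHOLD_TB else "")
```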

Posted 1 week ago

Apply

7.0 - 11.0 years

35 - 50 Lacs

Bengaluru

Work from Office

About the Role:
This role is responsible for managing and maintaining complex, distributed big data ecosystems. It ensures the reliability, scalability, and security of large-scale production infrastructure. Key responsibilities include automating processes, optimizing workflows, troubleshooting production issues, and driving system improvements across multiple business verticals.

Roles and Responsibilities:
- Manage, maintain, and support incremental changes to Linux/Unix environments.
- Lead on-call rotations and incident responses, conducting root cause analysis and driving postmortem processes.
- Design and implement automation systems for managing big data infrastructure, including provisioning, scaling, upgrading, and patching clusters.
- Troubleshoot and resolve complex production issues while identifying root causes and implementing mitigating strategies.
- Design and review scalable and reliable system architectures.
- Collaborate with teams to optimize overall system/cluster performance.
- Enforce security standards across systems and infrastructure.
- Set technical direction, drive standardization, and operate independently.
- Ensure availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning (a monitoring sketch follows below).
- Resolve, analyze, and respond to system outages and disruptions, and implement measures to prevent similar incidents from recurring.
- Develop tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
- Monitor and optimize system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning.
- Collaborate with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle.
- Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities.
- Develop and enforce SRE best practices and principles.
- Align across functional teams on priorities and deliverables.
- Drive automation to enhance operational efficiency.
- Adopt new technologies as the need arises and define architectural recommendations for new tech stacks.

Preferred candidate profile:
- Over 6 years of experience managing and maintaining distributed big data ecosystems.
- Strong expertise in Linux, including IP, iptables, and IPsec.
- Proficiency in scripting/programming with languages like Perl, Golang, or Python.
- Hands-on experience with the Hadoop stack (HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot).
- Familiarity with open-source configuration management and deployment tools such as Puppet, Salt, Chef, or Ansible.
- Solid understanding of networking, open-source technologies, and related tools.
- Excellent communication and collaboration skills.
- DevOps tools: SaltStack, Ansible, Docker, Git.
- SRE logging and monitoring tools: ELK stack, Grafana, Prometheus, OpenTSDB, OpenTelemetry.

Good to have:
- Experience managing infrastructure on public cloud platforms (AWS, Azure, GCP).
- Experience designing and reviewing system architectures for scalability and reliability.
- Experience with observability tools to visualize and alert on system performance.
- Experience with massive petabyte-scale data migrations and upgrades.
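As a flavor of proactive monitoring with the listed tooling, here is a minimal Python sketch that runs an instant query against the Prometheus HTTP API and prints hosts with high load. The Prometheus URL and the PromQL expression are illustrative and assume node_exporter metrics are present.

```python
# Minimal sketch: query Prometheus for hosts whose 5-minute load exceeds a threshold.
import requests

PROMETHEUS = "http://prometheus.example:9090"   # placeholder
QUERY = "node_load5 > 8"                        # assumes node_exporter metrics

def firing_series(expr: str):
    resp = requests.get(f"{PROMETHEUS}/api/v1/query",
                        params={"query": expr}, timeout=10)
    resp.raise_for_status()
    body = resp.json()
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]

if __name__ == "__main__":
    for series in firing_series(QUERY):
        instance = series["metric"].get("instance", "unknown")
        _, value = series["value"]          # [timestamp, value-as-string]
        print(f"{instance}: load5={value}")
```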

Posted 1 week ago

Apply

3.0 - 8.0 years

6 - 12 Lacs

Pune

Work from Office

Greetings from Peoplefy!

We are hiring for one of our MNC clients based in Pune (Yerawada location). Immediate joiners only.

Requirements:
- .NET or Java
- Expertise in MS SQL Server
- ITIL process
- Monitoring tools
- Application or production support experience

Interested candidates for the above position, kindly share your CV at gayatri.pat @peoplefy.com with the details below:
- Experience:
- CTC:
- Expected CTC:
- Notice Period:
- Location:

Posted 1 week ago

Apply

5.0 - 10.0 years

0 - 0 Lacs

Bengaluru

Work from Office

DevOps Lead - 5 to 10 years
1. Lead the team, mentor, and execute the pipeline strategy.
2. Strong proficiency in Python or Groovy.
3. CI/CD tools: extensive experience with GitLab CI/CD or Jenkins (see the sketch below).
4. DevOps/cloud experience.
5. Containerization: proficient in Docker.
6. Shared libraries: proven ability to write and manage shared libraries for pipeline automation.
7. Experience with Grafana is an added advantage.
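To illustrate the kind of pipeline automation described above, here is a minimal Python sketch that triggers a parameterized Jenkins build over its REST API. The Jenkins URL, job name, and credentials are placeholders; authentication uses a user API token read from the environment.

```python
# Minimal sketch: trigger a parameterized Jenkins build via the REST API.
import os
import requests

JENKINS = "https://jenkins.example.internal"   # placeholder
JOB = "deploy-service"                         # placeholder
AUTH = (os.environ.get("JENKINS_USER", ""), os.environ.get("JENKINS_TOKEN", ""))

def trigger_build(params: dict) -> str:
    url = f"{JENKINS}/job/{JOB}/buildWithParameters"
    resp = requests.post(url, auth=AUTH, params=params, timeout=10)
    resp.raise_for_status()
    # Jenkins answers 201 and points at the queue item in the Location header.
    return resp.headers.get("Location", "")

if __name__ == "__main__":
    queue_url = trigger_build({"ENVIRONMENT": "staging", "VERSION": "1.4.2"})
    print("queued:", queue_url)
```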

Posted 1 week ago

Apply

2.0 - 4.0 years

4 - 6 Lacs

Chennai

Work from Office

Job Description/Preferred Qualifications
We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and software developers to build and maintain robust and efficient systems that support our machine learning workflows. This position offers an exciting opportunity to work on cutting-edge technologies and make a significant impact on our organization's success.

Responsibilities:
- Design, implement, and maintain scalable and reliable machine learning infrastructure.
- Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production.
- Develop and maintain CI/CD pipelines for machine learning workflows.
- Monitor and optimize the performance of machine learning systems and infrastructure (see the sketch below).
- Implement and manage automated testing and validation processes for machine learning models.
- Ensure the security and compliance of machine learning systems and data.
- Troubleshoot and resolve issues related to machine learning infrastructure and workflows.
- Document processes, procedures, and best practices for machine learning operations.
- Stay up to date with the latest developments in MLOps and related technologies.

Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Site Reliability Engineer (SRE) or in a similar role.
- Strong knowledge of machine learning concepts and workflows.
- Proficiency in programming languages such as Python, Java, or Go.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Familiarity with containerization technologies like Docker and Kubernetes.
- Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
- Strong problem-solving skills and the ability to troubleshoot complex issues.
- Excellent communication and collaboration skills.

Preferred Qualifications:
- Master's degree in Computer Science, Engineering, or a related field.
- Experience with machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Knowledge of data engineering and data pipeline tools such as Apache Spark, Apache Kafka, or Airflow.
- Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
- Familiarity with infrastructure-as-code (IaC) tools like Terraform or Ansible.
- Experience with automated testing frameworks for machine learning models.
- Knowledge of security best practices for machine learning systems and data.

Minimum Qualifications: Master's-level degree, or Bachelor's-level degree with 2 years of related work experience.
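For the "monitor and optimize ML systems" responsibility, here is a minimal Python sketch that exposes prediction count and latency with prometheus_client, which a Prometheus/Grafana stack (listed above) can scrape. The metric names and the stand-in predict() function are illustrative.

```python
# Minimal sketch: instrument an inference path and expose /metrics for Prometheus.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("model_predictions_total", "Total prediction requests")
LATENCY = Histogram("model_prediction_seconds", "Prediction latency in seconds")

@LATENCY.time()                 # records how long each call takes
def predict(features):
    time.sleep(random.uniform(0.01, 0.05))   # stand-in for real model inference
    PREDICTIONS.inc()
    return sum(features) > 1.0

if __name__ == "__main__":
    start_http_server(8000)     # metrics served at http://localhost:8000/metrics
    while True:
        predict([random.random(), random.random()])
```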

Posted 1 week ago

Apply

8.0 - 12.0 years

0 Lacs

karnataka

On-site

At EG, we are dedicated to developing software solutions that enable our customers to focus on their profession while we handle the intricacies of technology. Our industry-specific software is crafted by professionals who understand the sector intricately, supported by the stability, innovation, and security provided by EG. We are on a mission to drive industries forward by addressing significant challenges such as resource optimization, efficiency enhancement, and sustainability promotion. With a thriving global workforce exceeding 3000 employees, including a 700+ strong team located in Mangaluru, India, we foster a people-first culture that encourages innovation, collaboration, and continuous learning. We invite individuals to join us in the journey of creating software that serves people rather than making people work for it.

EG Healthcare, a division of EG, is dedicated to building intelligent solutions that enhance healthcare services across the Nordics and Europe. Our goal is to simplify complexities, empower care providers, and improve patient outcomes through technological innovation. Our core values revolve around collaboration, curiosity, and purpose-driven progress.

As a Senior Software Developer at EG Healthcare, you will leverage your passion for software engineering, backed by over 8 years of experience, to develop impactful solutions using cutting-edge technologies. Your role will involve designing and implementing robust, scalable software solutions utilizing Java and associated technologies. You will be responsible for creating and maintaining RESTful APIs for seamless system integration, collaborating with diverse teams to deliver high-impact features, ensuring code quality through best practices and testing, and contributing to architectural decisions and technical enhancements.

Key Responsibilities:
- Design and develop robust, scalable software solutions using Java and related technologies.
- Build and maintain RESTful APIs for seamless integration across systems.
- Collaborate with cross-functional teams to deliver high-impact features.
- Ensure code quality through best practices, automated testing, and peer reviews.
- Utilize Docker, Kubernetes, and CI/CD pipelines for modern DevOps workflows.
- Troubleshoot issues efficiently and provide timely solutions.
- Contribute to architectural decisions and technical improvements.

Must-Have Skills:
- Proficiency in Java (8+ years of professional experience).
- Experience with Spring Boot for backend service development.
- Strong understanding of REST API design and implementation.
- Familiarity with Docker and Kubernetes.
- Exposure to Event Sourcing / CQRS patterns.
- Hands-on experience with front-end technology, preferably React.js.
- Proficient in relational databases, Git, and testing practices.

Good-to-Have Skills:
- Knowledge of ElasticSearch for advanced search and indexing.
- Experience with Axon Framework for distributed systems in Java.
- Familiarity with tools like Grafana for observability.
- Exposure to ArgoCD for GitOps-based deployments.
- Full-stack mindset or experience collaborating with front-end teams.

Who You Are:
- Analytical and structured thinker.
- Reliable, self-driven, and team-oriented.
- Strong communication skills.
- Eager to contribute to a meaningful mission in healthcare.

What We Offer:
- Competitive salary and benefits.
- Opportunities for professional growth.
- Collaborative, innovative work culture.
Join us at EG Healthcare and become part of a team dedicated to building smarter healthcare solutions for the future.

Posted 1 week ago

Apply

5.0 - 9.0 years

0 Lacs

pune, maharashtra

On-site

As a DataOps Engineer, you will be responsible for designing and maintaining scalable ML model deployment infrastructure using Kubernetes and Docker. Your role will involve implementing CI/CD pipelines for ML workflows, ensuring security best practices are followed, and setting up monitoring tools to track system health, model performance, and data pipeline issues. You will collaborate with cross-functional teams to streamline the end-to-end lifecycle of data products and identify performance bottlenecks and data reliability issues in the ML infrastructure. To excel in this role, you should have strong experience with Kubernetes and Docker for containerization and orchestration, hands-on experience in ML model deployment in production environments, and proficiency with orchestration tools like Airflow or Luigi. Familiarity with monitoring tools such as Prometheus, Grafana, or ELK Stack, knowledge of security protocols, CI/CD pipelines, and DevOps practices in a data/ML environment are essential. Exposure to cloud platforms like AWS, GCP, or Azure is preferred. Additionally, experience with MLflow, Seldon, or Kubeflow, knowledge of data governance, lineage, and compliance standards, and understanding of data pipelines and streaming frameworks would be advantageous in this role. Your expertise in data pipelines, Docker, Grafana, Airflow, CI/CD pipelines, orchestration tools, cloud platforms, compliance standards, data governance, ELK Stack, Kubernetes, lineage, ML, streaming frameworks, ML model deployment, and DevOps practices will be key to your success in this position.,
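Since the role above mentions orchestration tools like Airflow, here is a minimal, hedged Python sketch of an Airflow DAG (2.4+ style) that chains a retraining task and a validation task. The task bodies are placeholders; only the dependency structure is the point.

```python
# Minimal sketch: an Airflow DAG that retrains a model, then validates it.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def retrain_model():
    print("pulling features, retraining model ...")               # placeholder

def validate_model():
    print("checking metrics against the current champion ...")    # placeholder

with DAG(
    dag_id="ml_retrain_pipeline",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    retrain = PythonOperator(task_id="retrain", python_callable=retrain_model)
    validate = PythonOperator(task_id="validate", python_callable=validate_model)

    retrain >> validate   # validate only runs after retraining succeeds
```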

Posted 1 week ago

Apply

5.0 - 9.0 years

0 Lacs

karnataka

On-site

You will be joining our client's team as a Site Reliability Engineer, where your main responsibility will be to ensure the reliability and uptime of critical services. Your focus will include Kubernetes administration, CentOS servers, Java application support, incident management, and change management. The ideal candidate for this role will have strong experience with ArgoCD for Kubernetes management, Linux skills, basic scripting knowledge, and familiarity with modern monitoring, alerting, and automation tools. We are looking for a self-motivated individual with excellent communication skills, both oral and written, who can work effectively both independently and collaboratively.

Your responsibilities will include monitoring, maintaining, and managing applications on CentOS servers to ensure high availability and performance. You will conduct routine system and application maintenance and follow SOPs to correct or prevent issues. Responding to and managing running incidents, including post-mortem meetings, root cause analysis, and timely resolution, will also be part of your responsibilities. Additionally, you will monitor production systems, applications, and overall performance, using tools to detect abnormal behaviors in the software and collecting information to help developers understand the issues. Security checks, running meetings with business partners, writing and maintaining policy and procedure documents, writing scripts or code as necessary, and learning from post-mortems to prevent new incidents are also key aspects of the role.

Technical skills required for this position include:
- 5+ years of experience in a SaaS and cloud environment
- Administration of Kubernetes clusters, including management of applications using ArgoCD
- Linux scripting to automate routine tasks and improve operational efficiency
- Experience with database systems like MySQL and DB2
- Experience as a Linux (CentOS/RHEL) administrator
- Understanding of change management procedures and enforcement of safe, compliant changes to production environments
- Knowledge of on-call responsibilities and maintaining on-call management tools
- Experience managing deployments using Jenkins
- Prior experience with monitoring tools like New Relic, Splunk, and Nagios
- Experience with log aggregation tools such as Splunk, Loki, or Grafana
- Strong scripting knowledge in one of Python, Ruby, Bash, Java, or GoLang
- Experience with API programming and integrating tools like Jira, Slack, xMatters, or PagerDuty (see the sketch below)

If you are a dedicated professional who thrives in a high-pressure environment and enjoys working on critical services, this opportunity could be a great fit for you.
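To make the tool-integration skill concrete, here is a minimal Python sketch that posts an incident notification to a Slack incoming webhook. The webhook URL is a placeholder read from the environment, and the message format is illustrative.

```python
# Minimal sketch: send an incident notification to a Slack incoming webhook.
import os
import requests

WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL", "")   # placeholder

def notify_incident(service: str, severity: str, summary: str) -> None:
    text = f":rotating_light: [{severity}] {service} - {summary}"
    resp = requests.post(WEBHOOK_URL, json={"text": text}, timeout=5)
    resp.raise_for_status()   # Slack returns HTTP 200 on success

if __name__ == "__main__":
    notify_incident("checkout-api", "SEV2", "error rate above 5% for 10 minutes")
```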

Posted 1 week ago

Apply

10.0 - 14.0 years

0 Lacs

karnataka

On-site

As a Senior Full Stack Developer (Java + React) in the Fintech / Insurance domain, you will be responsible for leveraging your 10+ years of experience in full-stack development to deliver high-quality solutions. Your technical strengths should include proficiency in Java 11+, Spring Boot, and REST APIs. Additionally, you must possess a strong expertise in React, TypeScript, and modern frontend frameworks. In this role, experience with microservices and micro frontend architecture is crucial, along with cloud deployment experience, preferably on Azure. You should also have knowledge of Kafka, distributed systems, and API gateways. Familiarity with observability tools such as Grafana, ELK, Prometheus, and Splunk is highly desirable. Experience with Strapi CMS and OpenFeature for feature management would be an added advantage. Apart from your technical skills, strong leadership and communication abilities are essential. You should have experience leading Agile development teams and be capable of managing risks, dependencies, and 3rd-party integrations. Confidence in working with cross-functional and remote teams is a key requirement for this role. This is a contractual/temporary position with a contract length of 6 months. The work location is remote, and you must align with Singapore hours. Your role as a Senior Full Stack Developer will involve collaborating with a dynamic team to deliver innovative solutions in the Fintech / Insurance domain.,

Posted 1 week ago

Apply

3.0 - 7.0 years

0 Lacs

karnataka

On-site

As a Site Reliability Engineer III at JPMorgan Chase within Corporate Technology, you will play a crucial role in driving innovation and modernizing complex and mission-critical systems. Your primary responsibility will be to solve intricate business problems by providing simple and effective solutions through code and cloud infrastructure. You will configure, maintain, monitor, and optimize applications and their associated infrastructure while continuously improving existing solutions. Your expertise in end-to-end operations, availability, reliability, and scalability will make you a valuable asset to the team. You will guide and support others in designing appropriate solutions and collaborate with software engineers to implement deployment strategies using automated continuous integration and continuous delivery pipelines. Your role will also involve designing, developing, testing, and implementing availability, reliability, and scalability solutions for applications. Additionally, you will be responsible for implementing infrastructure, configuration, and network as code for the applications and platforms under your purview. Collaboration with technical experts, stakeholders, and team members will be essential in resolving complex issues. You will utilize service level indicators and objectives to proactively address issues before they impact customers. Furthermore, you will support the adoption of site reliability engineering best practices within your team to ensure operational excellence. To qualify for this role, you should have formal training or certification in software engineering concepts along with at least 3 years of applied experience. Proficiency in site reliability principles and experience in implementing site reliability within applications or platforms is required. You should be adept in at least one programming language like Python, Java/Spring Boot, or .Net. Knowledge of software applications and technical processes in disciplines such as Cloud, AI, or Android is also essential. Experience in observability, continuous integration, continuous delivery tools, container technologies, networking troubleshooting, and collaboration within large teams is highly valued. Your proactive approach to problem-solving, eagerness to learn new technologies, and ability to identify innovative solutions will be crucial in this role. Preferred qualifications include experience in the banking or financial domain.
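The mention of service level indicators and objectives is easy to make concrete with a worked example. The sketch below, with made-up numbers, computes an availability SLI and how much of a 99.9% error budget has been consumed.

```python
# Worked example: availability SLI and error-budget consumption (made-up numbers).
TOTAL_REQUESTS = 10_000_000
FAILED_REQUESTS = 7_200
SLO = 0.999                                   # 99.9% availability target

availability = 1 - FAILED_REQUESTS / TOTAL_REQUESTS          # the SLI
error_budget = (1 - SLO) * TOTAL_REQUESTS                    # allowed failures
budget_used = FAILED_REQUESTS / error_budget                 # fraction consumed

print(f"SLI (availability): {availability:.4%}")             # 99.9280%
print(f"Error budget: {error_budget:.0f} failed requests allowed")
print(f"Budget consumed: {budget_used:.0%}")                 # 72%
```

When budget consumption approaches 100%, SRE practice typically shifts effort from feature work to reliability work.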

Posted 1 week ago

Apply

6.0 - 10.0 years

0 Lacs

karnataka

On-site

As a DevOps Engineer at Wabtec Corporation, you will play a crucial role in performing CI/CD and automation design/validation activities. Reporting to the Technical Project Manager and working closely with the software architect, you will be responsible for adhering to internal processes, including coding rules, and documenting implementations accurately. Your focus will be on meeting the Quality, Cost, and Time objectives set by the Technical Project Manager.

To qualify for this role, you should hold a Bachelor's or Master's degree in Engineering in Computer Science (web option), IT, or a related field. You should have 6 to 10 years of hands-on experience as a DevOps Engineer and possess the following abilities:
- A good understanding of Linux systems and networking
- Proficiency in CI/CD tools like GitLab
- Knowledge of containerization technologies such as Docker
- Experience with scripting languages like Bash and Python
- Hands-on experience in setting up CI/CD pipelines and configuring virtual machines
- Familiarity with C/C++ build tools like CMake and Conan
- Expertise in setting up pipelines in GitLab for build, unit testing, and static analysis
- Experience with infrastructure-as-code tools like Terraform or Ansible
- Proficiency in monitoring and logging tools such as the ELK Stack or Prometheus/Grafana
- Strong problem-solving skills and the ability to troubleshoot production issues
- A passion for continuous learning and staying up to date with modern technologies and trends in the DevOps field
- Familiarity with project management and workflow tools like Jira, SPIRA, Teams Planner, and Polarion

In addition to technical skills, soft skills are also crucial for this role. You should have a good level of English proficiency, be autonomous, possess good interpersonal and communication skills, have strong synthesis skills, be a solid team player, and be able to handle multiple tasks efficiently.

At Wabtec, we are committed to embracing diversity and inclusion. We value the variety of experiences, expertise, and backgrounds that our employees bring and aim to create an inclusive environment where everyone belongs. By fostering a culture of leadership, diversity, and inclusion, we believe that we can harness the brightest minds to drive innovation and create limitless opportunities. If you are ready to join a global company that is revolutionizing the transportation industry and are passionate about driving exceptional results through continuous improvement, then we invite you to apply for the role of Lead/Engineer DevOps at Wabtec Corporation.

Posted 1 week ago

Apply

1.0 - 9.0 years

0 Lacs

hyderabad, telangana

On-site

As an Associate Manager - Data IntegrationOps, you will play a crucial role in supporting and managing data integration and operations programs within our data organization. Your responsibilities will involve maintaining and optimizing data integration workflows, ensuring data reliability, and supporting operational excellence. To succeed in this position, you will need a solid understanding of enterprise data integration, ETL/ELT automation, cloud-based platforms, and operational support. Your primary duties will include assisting in the management of Data IntegrationOps programs, aligning them with business objectives, data governance standards, and enterprise data strategies. You will also be involved in monitoring and enhancing data integration platforms through real-time monitoring, automated alerting, and self-healing capabilities to improve uptime and system performance. Additionally, you will help develop and enforce data integration governance models, operational frameworks, and execution roadmaps to ensure smooth data delivery across the organization. Collaboration with cross-functional teams will be essential to optimize data movement across cloud and on-premises platforms, ensuring data availability, accuracy, and security. You will also contribute to promoting a data-first culture by aligning with PepsiCo's Data & Analytics program and supporting global data engineering efforts across sectors. Continuous improvement initiatives will be part of your responsibilities to enhance the reliability, scalability, and efficiency of data integration processes. Furthermore, you will be involved in supporting data pipelines using ETL/ELT tools such as Informatica IICS, PowerCenter, DDH, SAP BW, and Azure Data Factory under the guidance of senior team members. Developing API-driven data integration solutions using REST APIs and Kafka, deploying and managing cloud-based data platforms like Azure Data Services, AWS Redshift, and Snowflake, and participating in implementing DevOps practices using tools like Terraform, GitOps, Kubernetes, and Jenkins will also be part of your role. Your qualifications should include at least 9 years of technology work experience in a large-scale, global organization, preferably in the CPG (Consumer Packaged Goods) industry. You should also have 4+ years of experience in Data Integration, Data Operations, and Analytics, as well as experience working in cross-functional IT organizations. Leadership/management experience supporting technical teams and hands-on experience in monitoring and supporting SAP BW processes are also required qualifications for this role. In summary, as an Associate Manager - Data IntegrationOps, you will be responsible for supporting and managing data integration and operations programs, collaborating with cross-functional teams, and ensuring the efficiency and reliability of data integration processes. Your expertise in enterprise data integration, ETL/ELT automation, cloud-based platforms, and operational support will be key to your success in this role.,

Posted 1 week ago

Apply

3.0 - 7.0 years

0 Lacs

pune, maharashtra

On-site

You are a skilled DevOps Specialist with over 3 years of experience, seeking to join a global automotive team with locations in Kochi, Pune, and Chennai. Your primary role will involve managing operations, system monitoring, troubleshooting, and supporting automation workflows to ensure the operational stability and excellence of enterprise IT projects. You will play a crucial part in overseeing critical application environments for leading companies in the automotive industry. Your responsibilities will include performing daily maintenance tasks to ensure application availability and system performance through proactive incident tracking, log analysis, and resource monitoring. Additionally, you will be expected to monitor and respond to tickets raised by the DevOps team or end-users, support users with troubleshooting, maintain detailed incident logs, track SLAs, and prepare root cause analysis reports. You will also assist in scheduled changes, releases, and maintenance activities while identifying and tracking recurring issues. Furthermore, you will be responsible for maintaining process documentation, runbooks, and knowledge base articles, providing regular updates to stakeholders on incidents and resolutions. You will also manage and troubleshoot CI/CD tools such as Jenkins, GitLab, container platforms like Docker and Kubernetes, and cloud services including AWS and Azure. To excel in this role, you should have proficiency in logfile analysis and troubleshooting (ELK Stack), Linux administration, and monitoring tools such as AppDynamics, Checkmk, Prometheus, and Grafana. Experience with security tools like Black Duck, SonarQube, Dependabot, and OWASP is essential. Hands-on experience with Docker, familiarity with DevOps principles, and ticketing tools like ServiceNow are also required. Experience in handling confidential data and safety-sensitive systems, along with strong analytical, communication, and organizational skills, will be beneficial. Additionally, you should possess the ability to work effectively in a team environment. Optional qualifications include experience in the automotive or manufacturing industry, particularly with production management systems, and familiarity with IT process frameworks like SCRUM and ITIL. In summary, as a DevOps Specialist, you will play a vital role in ensuring the operational stability and excellence of enterprise IT projects for leading companies in the automotive industry by managing operations, system monitoring, troubleshooting, and supporting automation workflows. Your expertise in tools and technologies such as ELK Stack, Docker, Jenkins, AWS, and Azure, along with your strong analytical and communication skills, will be instrumental in your success in this role.,

Posted 1 week ago

Apply

8.0 - 12.0 years

0 Lacs

karnataka

On-site

As a Site Reliability Engineering (SRE) Technical Leader on the Network Assurance Data Platform (NADP) team at Cisco ThousandEyes, you will be responsible for ensuring the reliability, scalability, and security of the cloud and big data platforms. Your role will involve representing the NADP SRE team, contributing to the technical roadmap, and collaborating with cross-functional teams to design, build, and maintain SaaS systems operating at multi-region scale. Your efforts will be crucial in supporting machine learning (ML) and AI initiatives by ensuring the platform infrastructure is robust, efficient, and aligned with operational excellence. You will be tasked with designing, building, and optimizing cloud and data infrastructure to guarantee high availability, reliability, and scalability of big-data and ML/AI systems. This will involve implementing SRE principles such as monitoring, alerting, error budgets, and fault analysis. Additionally, you will collaborate with various teams to create secure and scalable solutions, troubleshoot technical problems, lead the architectural vision, and shape the technical strategy and roadmap. Your role will also encompass mentoring and guiding teams, fostering a culture of engineering and operational excellence, engaging with customers and stakeholders to understand use cases and feedback, and utilizing your strong programming skills to integrate software and systems engineering. Furthermore, you will develop strategic roadmaps, processes, plans, and infrastructure to efficiently deploy new software components at an enterprise scale while enforcing engineering best practices. To be successful in this role, you should have relevant experience (8-12 yrs) and a bachelor's engineering degree in computer science or its equivalent. You should possess the ability to design and implement scalable solutions, hands-on experience in Cloud (preferably AWS), Infrastructure as Code skills, experience with observability tools, proficiency in programming languages such as Python or Go, and a good understanding of Unix/Linux systems and client-server protocols. Experience in building Cloud, Big data, and/or ML/AI infrastructure is essential, along with a sense of ownership and accountability in architecting software and infrastructure at scale. Additional qualifications that would be advantageous include experience with the Hadoop Ecosystem, certifications in cloud and security domains, and experience in building/managing a cloud-based data platform. Cisco encourages individuals from diverse backgrounds to apply, as the company values perspectives and skills that emerge from employees with varied experiences. Cisco believes in unlocking potential and creating diverse teams that are better equipped to solve problems, innovate, and make a positive impact.,

Posted 1 week ago

Apply

2.0 - 6.0 years

0 Lacs

haryana

On-site

The Customer Experience team is a vital component in the successful implementation and continued efficiency of GreyOrange's warehouse automation solutions. As a Customer Experience Manager, your role is pivotal in serving as the main connection point between clients and internal departments. You are responsible for driving solution optimization, managing operational escalations, ensuring adherence to service level agreements (SLAs), and providing a seamless post-implementation experience. Your primary focus is on enhancing customer satisfaction, maximizing system uptime, and delivering business value through proactive support, data-driven insights, and consistent engagement. This role is based in Gurgaon. Your key responsibilities include managing daily customer interactions with a focus on clear and consistent communication at all touchpoints. You will handle escalations promptly and professionally to ensure swift issue resolution and overall customer satisfaction. Conducting regular business reviews such as Monthly Business Reviews (MBRs), Quarterly Business Reviews (QBRs), and Executive Business Reviews (EBRs) to maintain stakeholder alignment is also a crucial aspect of your role. Additionally, you will be tasked with understanding solution configurations and optimizing them for operational and technical performance. Supporting peak period planning, including volume forecasting, resource allocation, and system preparedness, is essential. Monitoring and reporting on system performance metrics such as hardware and software uptime, throughput, and order accuracy are key components of your responsibilities. Utilizing basic SQL and tools like Grafana and Power BI, you will analyze performance data and generate valuable insights. It is imperative to monitor dashboards for real-time system visibility and performance tracking while owning operational reporting and performance metric updates for both internal and external stakeholders. Driving customer success metrics like Customer Satisfaction (CSAT), Net Promoter Score (NPS), and Annual Maintenance Contract (AMC) renewals is a critical part of your role. You will identify and implement initiatives aimed at enhancing customer onboarding, adoption, and overall satisfaction. Documenting lessons learned and best practices and creating enablement content for continuous improvement is essential. Collaboration with Product, Solution Architecture, Support, and Program Management teams is crucial to deliver software enhancements and new features successfully. You will plan and coordinate system upgrades, patch deployments, and feature rollouts while aligning with customers on change requests, scope, and delivery timelines. Your performance will be measured against various Key Performance Indicators (KPIs) such as ensuring software and hardware uptime exceeding 90%, achieving higher Mean Time Between Failures (MTBF) and lower Mean Time To Repair (MTTR), maintaining operational KPIs like order fulfillment speed and accuracy above 99%, and meeting Inventory Management standards. Additionally, adherence to SLA-based issue resolution and escalation turnaround times, customer satisfaction and NPS scores, AMC renewal rates, and accuracy and timeliness of reporting and dashboard updates are crucial metrics. To qualify for this role, you should have a minimum of 2-4 years of experience in Customer Experience, Project Coordination, or Technical Account Management.
Strong communication and stakeholder management skills are essential, along with familiarity with SQL and analytics platforms like Grafana and Power BI. Understanding customer success metrics such as CSAT, NPS, and renewal rates is required. Experience or interest in warehouse automation, robotics, or supply chain operations is preferred, and the ability to excel in a fast-paced, cross-functional environment is crucial. If you are passionate about revolutionizing global customer experiences in the logistics and supply chain industry, possess the specified skills, and are enthusiastic about the potential of robotics and automation, we encourage you to apply for this role.,

Posted 1 week ago

Apply

Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.

Job Application AI Bot

Apply to 20+ Portals in one click

Download Now

Download the Mobile App

Instantly access job listings, apply easily, and track applications.
