Get alerts for new jobs matching your selected skills, preferred locations, and experience range.
5.0 - 8.0 years
20 - 30 Lacs
Hyderabad
Work from Office
About the Role
We are looking for a highly skilled Site Reliability Engineer (SRE) to lead the implementation and management of our observability stack across Azure-hosted infrastructure and .NET Core applications. This role will focus on configuring and managing OpenTelemetry, Prometheus, Loki, and Tempo, along with setting up robust alerting systems across all services, including Azure infrastructure and MSSQL databases. You will work closely with developers, DevOps, and infrastructure teams to ensure the performance, reliability, and visibility of our .NET Core applications and cloud services.
Key Responsibilities
• Observability Platform Implementation: Design and maintain distributed tracing, metrics, and logging using OpenTelemetry, Prometheus, Loki, and Tempo. Ensure complete instrumentation of .NET Core applications for end-to-end visibility. Implement telemetry pipelines for application logs, performance metrics, and traces.
• Monitoring & Alerting: Develop and manage SLIs, SLOs, and error budgets. Create actionable, noise-free alerts using Prometheus Alertmanager and Azure Monitor. Monitor key infrastructure components, applications, and databases with a focus on reliability and performance.
• Azure & Infrastructure Integration: Integrate Azure services (App Services, VMs, Storage, etc.) with the observability stack. Configure monitoring for MSSQL databases, including performance tuning metrics and health indicators. Use Azure Monitor, Log Analytics, and custom exporters where necessary.
• Automation & DevOps: Automate observability configurations using Terraform, PowerShell, or other IaC tools. Integrate telemetry validation and health checks into CI/CD pipelines. Maintain observability as code for repeatable deployments and easy scaling.
• Resilience & Reliability Engineering: Conduct capacity planning to anticipate scaling needs based on usage patterns and growth.
Define and implement disaster recovery strategies for critical Azure-hosted services and databases. Perform load and stress testing to identify performance bottlenecks and validate infrastructure limits. Support release engineering by integrating observability checks and rollback strategies into CI/CD pipelines. Apply chaos engineering practices in lower environments to proactively uncover reliability risks.
• Collaboration & Documentation: Partner with engineering teams to promote observability best practices in .NET Core development. Create dashboards (Grafana preferred) and runbooks for system insights and incident response. Document monitoring standards, troubleshooting guides, and onboarding materials.
Required Skills and Experience
• 4+ years of experience in SRE, DevOps, or infrastructure-focused roles.
• Deep experience with .NET Core application observability using OpenTelemetry.
• Proficiency with Prometheus, Loki, Tempo, and related observability tools.
• Strong background in Azure infrastructure monitoring, including App Services and VMs.
• Hands-on experience monitoring MSSQL databases (deadlocks, query performance, etc.).
• Familiarity with Infrastructure as Code (Terraform, Bicep) and scripting (PowerShell, Bash).
• Experience building and tuning alerts, dashboards, and metrics for production systems.
Preferred Qualifications
• Azure certifications (e.g., AZ-104, AZ-400).
• Experience with Grafana, Azure Monitor, and Log Analytics integration.
• Familiarity with distributed systems and microservice architectures.
• Prior experience in high-availability, regulated, or customer-facing environments.
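The SLI/SLO/error-budget responsibility above can be made concrete with a small sketch. This is a minimal illustration in Python with hypothetical names, not part of the stack the posting describes:

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget left for an availability SLO.

    slo_target=0.999 means at most 0.1% of requests may fail.
    """
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 1.0 if failed_requests == 0 else 0.0
    return max(1 - failed_requests / allowed_failures, 0.0)

# A 99.9% SLO over 1,000,000 requests allows ~1,000 failures;
# 250 failures consume about a quarter of the budget.
print(error_budget_remaining(0.999, 1_000_000, 250))  # ~0.75
```

In practice an alert would fire on the *rate* at which this number shrinks (burn rate), not on its absolute value.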
Posted 1 week ago
3.0 - 8.0 years
0 - 1 Lacs
Bangalore Rural, Bengaluru
Hybrid
Description - External
You Lead the Way. We've Got Your Back. With the right backing, people and businesses have the power to progress in incredible ways. When you join Team Amex, you become part of a global and diverse community of colleagues with an unwavering commitment to back our customers, communities and each other. Here, you'll learn and grow as we help you create a career journey that's unique and meaningful to you with benefits, programs, and flexibility that support you personally and professionally. At American Express, you'll be recognized for your contributions, leadership, and impact; every colleague has the opportunity to share in the company's success. Together, we'll win as a team, striving to uphold our company values and powerful backing promise to provide the world's best customer experience every day. And we'll do it with the utmost integrity, and in an environment where everyone is seen, heard and feels like they belong. Join #TeamAmex and let's lead the way together.
Key Responsibilities:
SRE Strategy and Leadership: Develop and implement a comprehensive SRE strategy aligned with the company's goals and objectives. Lead a team of SRE professionals to drive the reliability, performance, and scalability of GRC technology solutions.
Observability and Monitoring: Establish observability practices to ensure real-time insights into system performance, availability, and customer experience. Implement monitoring tools, metrics, and dashboards to proactively identify and address potential issues.
Production Support Optimization: Lead all aspects of the end-to-end production support process, including incident management, problem resolution, and service-level agreement (SLA) compliance. Drive continuous improvement initiatives to enhance operational effectiveness and reduce mean time to resolution (MTTR).
GRC Customer Journeys: Collaborate with multi-functional teams to enhance customer journeys through seamless and reliable technology experiences.
Reliability Engineering Best Practices: Promote and implement standard methodologies, including error budgeting, chaos engineering, and disaster recovery planning. Cultivate a culture of resilience and reliability within technology.
Automation and Efficiency: Champion automation initiatives to streamline operational workflows, deployment processes, and incident response tasks. Leverage automation tools and orchestration to improve reliability and reduce manual intervention.
Qualifications:
3-12 years of experience and a degree or equivalent experience in Computer Science, Information Technology, or a related field. Advanced certifications in SRE or related areas are a plus. Deep understanding of observability tools and methodologies, including experience with logging, monitoring, tracing, and performance analysis platforms. Strong leadership and people management skills, with the ability to inspire and empower successful SRE teams.
Preferred Skills:
Hands-on coding and system design of highly available distributed systems (Java/Golang/JavaScript, Kubernetes, Docker). Knowledge of the modern observability stack: Splunk, Elasticsearch, Prometheus, Grafana. Knowledge of cloud-based SRE practices and experience with public cloud platforms such as AWS, Azure, or Google Cloud. Familiarity with containerization technologies (e.g., Kubernetes, Docker) and microservices architecture. Demonstrated expertise in driving culture change, DevOps practices, and continuous improvement in SRE and production support functions.
Join our innovative team and be at the forefront of advancing Site Reliability Engineering and production support in the Global Risk and Compliance Technology space. If you are passionate about driving reliability, observability, and excellence in customer experiences, we invite you to apply and join our mission to redefine the future of risk and compliance technology.
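As a rough illustration of the MTTR metric this role is asked to reduce, here is a toy calculation over hypothetical incident records (not Amex tooling):

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """MTTR = average of (resolved_at - opened_at) over closed incidents."""
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)

incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 45)),   # 45 min
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 15)), # 75 min
]
print(mean_time_to_resolution(incidents))  # 1:00:00
```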
Apply now and join us in shaping the reliability and performance of GRC solutions for a secure and compliant world.
We back our colleagues and their loved ones with benefits and programs that support their holistic well-being. That means we prioritize their physical, financial, and mental health through each stage of life. Benefits include:
• Competitive base salaries
• Bonus incentives
• Support for financial well-being and retirement
• Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)
• Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need
• Generous paid parental leave policies (depending on your location)
• Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
• Free and confidential counseling support through our Healthy Minds program
• Career development and training opportunities
Posted 1 week ago
5.0 - 8.0 years
8 - 12 Lacs
Bengaluru
Work from Office
• Expertise in container technologies like Docker and Kubernetes
• Strong knowledge of architecture, design and implementation of microservices in container environments like Kubernetes
• Experience managing Azure Kubernetes Service (AKS) clusters
• Experience building and maintaining the container ecosystem
• Proficient in Ansible/Terraform
• Experience with Azure DevOps/GitHub Actions/Jenkins
• Experience with system monitoring tools such as Lens, Prometheus, Azure Monitor, etc.
• Experience in Helm
• Experience building CI/CD pipelines for web application backend services
• Exceptional skills in debugging, performance tuning, optimization and troubleshooting of large software systems, and the ability to guide development teams and deliver in fast-paced environments
• Proficient understanding of code versioning tools like GitHub/TFS
• Familiarity with DevSecOps practices and tools
• BE/B.Tech/MCA or any relevant degree; CKA, CKAD, or DevOps certification desirable
Key Responsibilities
• Design and Implement: Architect, design, and implement cloud-native applications and services using AKS.
• Containerization: Develop and deploy containerized applications utilizing Docker and Kubernetes.
• Migration: Migrate applications from IaaS to AKS.
• Automation: Automate the deployment, scaling, and management of containerized applications.
• CI/CD Pipeline: Build and maintain robust CI/CD pipelines to ensure smooth and efficient delivery of software.
• Monitoring and Optimization: Monitor application performance, troubleshoot issues, and optimize resource usage.
• Collaboration: Work closely with development, operations, and security teams to ensure seamless integration and alignment with business goals.
• Documentation: Create and maintain comprehensive documentation for architecture, processes, and procedures.
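For context on the rolling-update behavior an AKS engineer tunes, here is a small sketch of how Kubernetes-style maxSurge/maxUnavailable percentages bound pod counts during a Deployment rollout (illustrative only; the real logic lives in the Deployment controller):

```python
import math

def rolling_update_bounds(replicas, max_surge_pct, max_unavailable_pct):
    """Pod-count bounds during a Kubernetes-style rolling update.

    Percentage maxSurge rounds up and maxUnavailable rounds down,
    matching the Deployment controller's documented behavior.
    Returns (max_total_pods, min_available_pods).
    """
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return replicas + surge, replicas - unavailable

# 10 replicas with the default 25%/25% settings:
print(rolling_update_bounds(10, 25, 25))  # (13, 8)
```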
Posted 1 week ago
4.0 - 8.0 years
12 - 30 Lacs
Hyderabad
Work from Office
• Strong Linux and AWS experience
• Strong Active Directory experience
• Manage Hadoop clusters on Linux with Active Directory integration
• Collaborate with the data science team on project delivery using Splunk & Spark
• Experience managing Big Data clusters in production
Posted 1 week ago
10.0 - 13.0 years
20 - 25 Lacs
Pune
Work from Office
Company Overview
With 80,000 customers across 150 countries, UKG is the largest U.S.-based private software company in the world. And we're only getting started. Ready to bring your bold ideas and collaborative mindset to an organization that still has so much more to build and achieve? Read on.
At UKG, you get more than just a job. You get to work with purpose. Our team of U Krewers are on a mission to inspire every organization to become a great place to work through our award-winning HR technology built for all. Here, we know that you're more than your work. That's why our benefits help you thrive personally and professionally, from wellness programs and tuition reimbursement to U Choose, a customizable expense reimbursement program that can be used for more than 200 needs that best suit you and your family, from student loan repayment, to childcare, to pet insurance. Our inclusive culture, active and engaged employee resource groups, and caring leaders value every voice and support you in doing the best work of your career. If you're passionate about our purpose, people, then we can't wait to support whatever gives you purpose. We're united by purpose, inspired by you.
Site Reliability Engineers at UKG are team members that have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering and auto-remediation. Site Reliability Engineers must have a passion for learning and evolving with current technology trends. They strive to innovate and are relentless in their pursuit of a flawless customer experience. They have an automate-everything mindset, helping us bring value to our customers by deploying services with incredible speed, consistency and availability.
Primary/Essential Duties and Key Responsibilities:
• Proficient in Splunk/ELK and Datadog.
• Experience with observability tools such as Prometheus/InfluxDB and Grafana.
• Strong knowledge of at least one scripting language such as Python, Bash, PowerShell or any other relevant language.
• Design, develop, and maintain observability tools and infrastructure.
• Collaborate with other teams to ensure observability best practices are followed.
• Develop and maintain dashboards and alerts for monitoring system health.
• Troubleshoot and resolve issues related to observability tools and infrastructure.
• Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning.
• Define and implement standards and best practices related to system architecture, service delivery, metrics and the automation of operational tasks.
• Support services, product & engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response.
• Improve system performance, application delivery and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis.
• Collaborate closely with engineering professionals within the organization to deliver reliable services.
• Identify and eliminate operational toil by treating operational challenges as a software engineering problem.
• Actively participate in incident response, including on-call responsibilities.
• Partner with stakeholders to influence and help drive the best possible technical and business outcomes.
• Guide junior team members and serve as a champion for Site Reliability Engineering.
• Engineering degree, or a related technical discipline, and 10+ years of experience in SRE.
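As a flavor of the metric-collection work described above, here is a dependency-free sketch of a nearest-rank percentile, the kind of statistic behind a p95 latency dashboard or alert (sample data is hypothetical; monitoring backends typically interpolate):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of samples.

    Monitoring backends often interpolate between ranks; nearest-rank
    keeps this sketch simple and dependency-free.
    """
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 480, 13, 14, 16, 12, 11, 13]
print(percentile(latencies_ms, 95))  # 480 (the outlier a p95 alert catches)
```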
Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java). Knowledge of cloud-based applications & containerization technologies. Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing. Ability to analyze current technology utilized and engineering practices within the company and develop steps and processes to improve and expand upon them. Working experience with industry standards like Terraform and Ansible.
Qualifications (Experience, Education, Certification, License and Training): Must have hands-on experience working within Engineering or Cloud. Experience with public cloud platforms (e.g. GCP, AWS, Azure). Experience in configuration and maintenance of applications & systems infrastructure. Experience with distributed system design and architecture. Experience building and managing CI/CD pipelines.
Where we're going
UKG is on the cusp of something truly special. Worldwide, we already hold the #1 market share position for workforce management and the #2 position for human capital management. Tens of millions of frontline workers start and end their days with our software, with billions of shifts managed annually through UKG solutions today. Yet it's our AI-powered product portfolio designed to support customers of all sizes, industries, and geographies that will propel us into an even brighter tomorrow!
UKG is proud to be an equal opportunity employer and is committed to promoting diversity and inclusion in the workplace, including the recruitment process.
Disability Accommodation
For individuals with disabilities that need additional assistance at any point in the application and interview process, please email UKGCareers@ukg.com
Posted 1 week ago
3.0 - 7.0 years
15 - 20 Lacs
Noida, Pune
Work from Office
The duties of a Site Reliability Engineer will be to support and maintain various Cloud Infrastructure Technology Tools in our hosted production/DR environments. He/she will be the subject matter expert for specific tools or monitoring solutions, and will be responsible for testing, verifying and implementing upgrades, patches and implementations. He/she will also partner with other service functions to investigate and/or improve monitoring solutions. May mentor one or more tools team members or provide training to other cross-functional teams as required. May motivate, develop, and manage performance of individuals and teams while on shift. May be assigned to produce regular and ad hoc management reports in a timely manner.
• Proficient in Splunk/ELK and Datadog.
• Experience with observability tools such as Prometheus/InfluxDB and Grafana.
• Strong knowledge of at least one scripting language such as Python, Bash, PowerShell or any other relevant language.
• Design, develop, and maintain observability tools and infrastructure.
• Collaborate with other teams to ensure observability best practices are followed.
• Develop and maintain dashboards and alerts for monitoring system health.
• Troubleshoot and resolve issues related to observability tools and infrastructure.
• Bachelor's degree in Information Systems, Computer Science or a related discipline, with relevant experience of 5-8 years.
• Experience with enterprise software implementations for large-scale organizations.
• Extensive knowledge of new technology trends prevalent in the market, such as SaaS, Cloud, Hosting Services and Application Management Services.
• Monitoring tools such as Grafana, Prometheus, and Datadog.
• Experience in deployment of application & infrastructure clusters within a public cloud environment utilizing a Cloud Management Platform.
• Professional and positive with outstanding customer-facing practices.
• Can-do attitude, willing to go the extra mile.
• Consistently follows up and follows through on delegated tasks and actions.
Posted 1 week ago
12.0 - 18.0 years
16 - 20 Lacs
Pune
Work from Office
We are seeking a seasoned DevOps Architect to lead the design, implementation, and maintenance of cloud-based infrastructure and the DevOps team, collaborating closely with development, operations, and security teams to ensure the seamless delivery of high-quality software solutions.
Qualifications:
• 18+ years of IT experience, with 8+ years dedicated to DevOps roles
• Deep knowledge of cloud platforms (AWS, Azure, GCP)
• Expertise in infrastructure automation tools (Terraform, Ansible, Puppet, Chef)
• Proficiency in containerization and orchestration (Docker, Kubernetes)
• Experience with CI/CD pipelines and tools (Jenkins, GitLab CI/CD, Azure DevOps)
• Strong knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack)
• Advanced scripting abilities (Bash, Python, Ruby)
• Solid understanding of security best practices and related tools
• Ability to work effectively both independently and within a team
Posted 1 week ago
2.0 - 7.0 years
3 - 7 Lacs
Ahmedabad
Work from Office
To help us build functional systems that improve customer experience, we are now looking for an experienced DevOps Engineer. They will be responsible for deploying product updates, identifying production issues and implementing integrations that meet our customers' needs. If you have a solid background in software engineering and are familiar with Ruby or Python, we'd love to speak with you.
Responsibilities
• Work with development teams to ideate software solutions
• Build and set up new development tools and infrastructure
• Work on ways to automate and improve development and release processes
• Ensure that systems are safe and secure against cybersecurity threats
• Deploy updates and fixes
• Perform root cause analysis for production errors
• Develop scripts to automate infrastructure provisioning
• Work with software developers and software engineers to ensure that development follows established processes and works as intended
Technologies we use
• GitOps: GitHub, GitLab, BitBucket
• CI/CD: Jenkins, Circle CI, Travis CI, TeamCity, Azure DevOps
• Containerization: Docker, Swarm, Kubernetes
• Provisioning: Terraform
• CloudOps: Azure, AWS, GCP
• Observability: Prometheus, Grafana, GrayLog, ELK
Qualifications
• Graduate/Postgraduate in the technology sector
• Proven experience as a DevOps Engineer or similar role
• Effective communication and teamwork skills
Posted 1 week ago
6.0 - 10.0 years
11 - 12 Lacs
Hyderabad
Work from Office
We are seeking a highly skilled DevOps Engineer to join our dynamic development team. In this role, you will be responsible for designing, developing, and maintaining both frontend and backend components of our applications using DevOps and associated technologies. You will collaborate with cross-functional teams to deliver robust, scalable, and high-performing software solutions that meet our business needs. The ideal candidate will have a strong background in DevOps, experience with modern frontend frameworks, and a passion for full-stack development.
Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 6 to 10+ years of experience in full-stack development, with a strong focus on DevOps.
DevOps with AWS Data Engineer - Roles & Responsibilities:
• Use AWS services like EC2, VPC, S3, IAM, RDS, and Route 53.
• Automate infrastructure using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation.
• Build and maintain CI/CD pipelines using tools like AWS CodePipeline, Jenkins, and GitLab CI/CD.
• Automate build, test, and deployment processes for Java applications.
• Use Ansible, Chef, or AWS Systems Manager for managing configurations across environments.
• Containerize Java apps using Docker. Deploy and manage containers using Amazon ECS, EKS (Kubernetes), or Fargate.
• Monitoring & logging using Amazon CloudWatch, Prometheus + Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), and AWS X-Ray for distributed tracing.
• Manage access with IAM roles/policies. Use AWS Secrets Manager / Parameter Store for managing credentials. Enforce security best practices, encryption, and audits.
• Automate backups for databases and services using AWS Backup, RDS Snapshots, and S3 lifecycle rules. Implement Disaster Recovery (DR) strategies.
• Work closely with development teams to integrate DevOps practices (cross-functional collaboration).
• Document pipelines, architecture, and troubleshooting runbooks.
• Monitor and optimize AWS resource usage.
• Use AWS Cost Explorer, Budgets, and Savings Plans.
Must-Have Skills:
• Experience working on Linux-based infrastructure.
• Excellent understanding of Ruby, Python, Perl, and Java.
• Configuring and managing databases such as MySQL and MongoDB.
• Excellent troubleshooting skills.
• Selecting and deploying appropriate CI/CD tools.
• Working knowledge of various tools, open-source technologies, and cloud services.
• Awareness of critical concepts in DevOps and Agile principles.
• Managing stakeholders and external interfaces.
• Setting up tools and required infrastructure.
• Defining and setting development, testing, release, update, and support processes for DevOps operation.
• The technical skills to review, verify, and validate the software code developed in the project.
Interview Mode: F2F for candidates residing in Hyderabad / Zoom for other states
Location: 43/A, MLA Colony, Road No. 12, Banjara Hills, 500034
Time: 2-4 pm
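To illustrate the backup-retention automation mentioned above, here is a toy sketch of the pruning decision an S3 lifecycle expiration rule or snapshot cleanup job makes (names are illustrative, not an AWS API):

```python
from datetime import date, timedelta

def expired_snapshots(snapshot_dates, today, keep_days=30):
    """Return snapshots older than the retention window, oldest first.

    Mirrors the decision an S3 lifecycle expiration rule or an
    RDS-snapshot cleanup job would make; names are illustrative.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in snapshot_dates if d < cutoff)

snaps = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]
# With a 30-day window on 2024-03-05, the January and February
# snapshots fall outside the cutoff of 2024-02-04.
print(expired_snapshots(snaps, today=date(2024, 3, 5), keep_days=30))
```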
Posted 1 week ago
3.0 - 8.0 years
1 - 4 Lacs
Chandigarh
Work from Office
Opportunity: We are seeking a highly skilled and experienced AI Infrastructure Engineer (or MLOps Engineer) to design, build, and maintain the robust and scalable AI/ML platforms that power our cutting-edge asset allocation strategies. In this critical role, you will be instrumental in enabling our AI Researchers and Quantitative Developers to efficiently develop, deploy, and monitor machine learning models in a high-performance, secure, and regulated financial environment. You will bridge the gap between research and production, ensuring our AI initiatives run smoothly and effectively. Responsibilities: Platform Design & Development: Architect, implement, and maintain the end-to-end AI/ML infrastructure, including data pipelines, feature stores, model training environments, inference serving platforms, and monitoring systems. Environment Setup & Management: Configure and optimize AI/ML development and production environments, ensuring access to necessary compute resources (CPUs, GPUs), software libraries, and data. MLOps Best Practices: Implement and advocate for MLOps best practices, including version control for models and data, automated testing, continuous integration/continuous deployment (CI/CD) pipelines for ML models, and robust model monitoring. Resource Optimization: Manage and optimize cloud computing resources (AWS, Azure, GCP, or on-premise) for cost-efficiency and performance, specifically for AI/ML workloads. Data Management: Collaborate with data engineers to ensure seamless ingestion, storage, and accessibility of high-quality financial and alternative datasets for AI/ML research and production. Tooling & Automation: Select, implement, and integrate various MLOps tools and platforms (e.g., Kubeflow, MLflow, Sagemaker, DataRobot, Vertex AI, Airflow, Jenkins, GitLab CI/CD) to streamline the ML lifecycle. 
Security & Compliance: Ensure that all AI/ML infrastructure and processes adhere to strict financial industry security standards, regulatory compliance, and data governance policies. Troubleshooting & Support: Provide expert support and troubleshooting for AI/ML infrastructure issues, resolving bottlenecks and ensuring system stability. Collaboration: Work closely with AI Researchers, Data Scientists, Software Engineers, and DevOps teams to translate research prototypes into scalable production systems. Documentation: Create and maintain comprehensive documentation for all AI/ML infrastructure components, processes, and best practices. Qualifications: Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field. Experience: 3+ years of experience in a dedicated MLOps, AI Infrastructure, DevOps, or Site Reliability Engineering role, preferably in the financial services industry. Proven experience in designing, building, and maintaining scalable data and AI/ML pipelines and platforms. Strong proficiency in cloud platforms (AWS, Azure, GCP) including services relevant to AI/ML (e.g., EC2, S3, Sagemaker, Lambda, Azure ML, Google AI Platform). Expertise in containerization technologies (Docker) and orchestration platforms (Kubernetes). Solid understanding of CI/CD principles and tools (Jenkins, GitLab CI/CD, CircleCI, Azure DevOps). Proficiency in scripting languages like Python (preferred), Bash, or similar. Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation, Ansible). Familiarity with distributed computing frameworks (e.g., Spark, Dask) is a plus. Understanding of machine learning concepts and lifecycle, even if not directly developing models. Technical Skills: Deep knowledge of Linux/Unix operating systems. Strong understanding of networking, security, and database concepts. Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). 
Familiarity with data warehousing and data lake concepts.
Preferred candidate profile: Exceptional problem-solving and debugging skills. Proactive and self-driven with a strong sense of ownership. Excellent communication and interpersonal skills, able to collaborate effectively with diverse teams. Ability to prioritize and manage multiple tasks in a fast-paced environment. A keen interest in applying technology to solve complex financial problems.
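As a taste of the model-monitoring side of MLOps described above, here is a toy drift check: a stand-in for the production detectors (PSI, KS tests, etc.) such a platform would run, with hypothetical data:

```python
from statistics import mean, pstdev

def drifted(baseline, live, z_threshold=3.0):
    """Flag drift when the live mean moves more than z_threshold
    baseline standard deviations from the baseline mean.

    A deliberately simple stand-in for production drift detectors
    such as PSI or Kolmogorov-Smirnov tests.
    """
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        return bool(live) and mean(live) != mu
    return abs(mean(live) - mu) / sigma > z_threshold

baseline = [10, 11, 9, 10, 10]
print(drifted(baseline, [14, 15, 14]))  # True
print(drifted(baseline, [10, 11, 10]))  # False
```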
Posted 1 week ago
5.0 - 10.0 years
7 - 11 Lacs
Mumbai
Work from Office
We are looking for an experienced Senior Java Developer with a strong background in observability and telemetry to join our talented team. In this role, you will be responsible for designing, implementing, and maintaining robust and scalable solutions that enable us to gain deep insights into the performance, reliability, and health of our systems and applications.
WHAT'S IN IT FOR YOU:
- You will get a pivotal role in the project and associated incentives based on your contribution towards the project's success.
- Work on optimizing performance of a platform handling data volume in the range of 5-8 petabytes.
- An opportunity to collaborate and work with engineers from Google, AWS, ELK.
- You will be enabled to take up a leadership role in the future to set up your team as you grow with the customer during the project engagement.
- Opportunity for advancement within the company, with clear paths for career progression based on performance and demonstrated capabilities.
- Be part of a company that values innovation and encourages experimentation, where your ideas are heard and your contributions are recognized and rewarded.
- Work in a zero micro-management culture where you get to enjoy accountability and ownership of your tasks.
RESPONSIBILITIES:
- Design, develop, and maintain Java-based microservices and applications with a focus on observability and telemetry.
- Implement best practices for instrumenting, collecting, analyzing, and visualizing telemetry data (metrics, logs, traces) to monitor and troubleshoot system behavior and performance.
- Collaborate with cross-functional teams to integrate observability solutions into the software development lifecycle, including CI/CD pipelines and automated testing frameworks.
- Drive improvements in system reliability, scalability, and performance through data-driven insights and continuous feedback loops.
- Stay up-to-date with emerging technologies and industry trends in observability, telemetry, and distributed systems to ensure our systems remain at the forefront of innovation.
- Mentor junior developers and provide technical guidance and expertise in observability and telemetry practices.
REQUIREMENTS:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of professional experience in software development with a strong focus on Java programming.
- Expertise in observability and telemetry tools and practices, including but not limited to Prometheus, Grafana, Jaeger, the ELK stack (Elasticsearch, Logstash, Kibana), and distributed tracing.
- Solid understanding of microservices architecture, containerization (Docker, Kubernetes), and cloud-native technologies (AWS, Azure, GCP).
- Proficiency in designing and implementing scalable, high-performance, and fault-tolerant systems.
- Strong analytical and problem-solving skills with a passion for troubleshooting complex issues.
- Excellent communication and collaboration skills with the ability to work effectively in a fast-paced, agile environment.
- Experience with Agile methodologies and DevOps practices is a plus.
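To give a flavor of the telemetry analysis this role involves, here is a minimal sketch that aggregates span durations per service, a simplified stand-in for querying OpenTelemetry/Jaeger trace data (data is hypothetical, and the sketch is in Python for brevity rather than the Java stack the posting names):

```python
def slowest_service(spans):
    """Sum span durations (ms) per service and return the slowest one.

    Spans are (service, duration_ms) tuples, a simplified stand-in
    for real distributed-trace data.
    """
    totals = {}
    for service, duration in spans:
        totals[service] = totals.get(service, 0) + duration
    return max(totals, key=totals.get)

trace = [("gateway", 12), ("orders", 85), ("payments", 40), ("orders", 30)]
print(slowest_service(trace))  # orders
```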
Posted 1 week ago
5.0 - 10.0 years
7 - 11 Lacs
Ahmedabad
Work from Office
We are looking for an experienced Senior Java Developer with a strong background in observability and telemetry to join our talented team. In this role, you will be responsible for designing, implementing, and maintaining robust and scalable solutions that enable us to gain deep insights into the performance, reliability, and health of our systems and applications.
WHAT'S IN IT FOR YOU:
- You will get a pivotal role in the project and associated incentives based on your contribution towards the project's success.
- Work on optimizing performance of a platform handling data volume in the range of 5-8 petabytes.
- An opportunity to collaborate and work with engineers from Google, AWS, ELK.
- You will be enabled to take up a leadership role in the future to set up your team as you grow with the customer during the project engagement.
- Opportunity for advancement within the company, with clear paths for career progression based on performance and demonstrated capabilities.
- Be part of a company that values innovation and encourages experimentation, where your ideas are heard and your contributions are recognized and rewarded.
- Work in a zero micro-management culture where you get to enjoy accountability and ownership of your tasks.
RESPONSIBILITIES:
- Design, develop, and maintain Java-based microservices and applications with a focus on observability and telemetry.
- Implement best practices for instrumenting, collecting, analyzing, and visualizing telemetry data (metrics, logs, traces) to monitor and troubleshoot system behavior and performance.
- Collaborate with cross-functional teams to integrate observability solutions into the software development lifecycle, including CI/CD pipelines and automated testing frameworks.
- Drive improvements in system reliability, scalability, and performance through data-driven insights and continuous feedback loops.
- Stay up to date with emerging technologies and industry trends in observability, telemetry, and distributed systems to keep our systems at the forefront of innovation.
- Mentor junior developers and provide technical guidance and expertise in observability and telemetry practices.
REQUIREMENTS:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of professional experience in software development with a strong focus on Java programming.
- Expertise in observability and telemetry tools and practices, including but not limited to Prometheus, Grafana, Jaeger, the ELK stack (Elasticsearch, Logstash, Kibana), and distributed tracing.
- Solid understanding of microservices architecture, containerization (Docker, Kubernetes), and cloud-native technologies (AWS, Azure, GCP).
- Proficiency in designing and implementing scalable, high-performance, and fault-tolerant systems.
- Strong analytical and problem-solving skills with a passion for troubleshooting complex issues.
- Excellent communication and collaboration skills, with the ability to work effectively in a fast-paced, agile environment.
- Experience with Agile methodologies and DevOps practices is a plus.
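The distributed-tracing expertise asked for above boils down to propagating trace context between services. As a minimal, language-agnostic sketch (shown in Python rather than Java for brevity), this parses and forwards a W3C `traceparent` header, the mechanism tools like Jaeger and Tempo use to stitch spans from different services into one trace:

```python
import re
import secrets

# A W3C trace-context "traceparent" header has the layout
# version-traceid-spanid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> dict:
    """Split a traceparent header into its fields, or raise ValueError."""
    m = TRACEPARENT_RE.match(header)
    if not m:
        raise ValueError(f"malformed traceparent: {header!r}")
    return m.groupdict()

def child_traceparent(parent: str) -> str:
    """Header an outgoing call would carry: same trace_id, fresh
    span_id, parent's sampling flags preserved."""
    fields = parse_traceparent(parent)
    new_span = secrets.token_hex(8)  # 16 hex chars
    return f"{fields['version']}-{fields['trace_id']}-{new_span}-{fields['flags']}"
```

In practice an instrumentation library (e.g. OpenTelemetry) does this for you; the sketch only shows what the header carries and why the trace ID must survive across service hops.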
Posted 1 week ago
4.0 - 9.0 years
6 - 11 Lacs
Hyderabad
Work from Office
ABOUT AMGEN
Amgen harnesses the best of biology and technology to fight the world's toughest diseases and make people's lives easier, fuller, and longer. We discover, develop, manufacture, and deliver innovative medicines to help millions of patients. Amgen helped establish the biotechnology industry more than 45 years ago and remains on the cutting edge of innovation, using technology and human genetic data to push beyond what's known today.
ABOUT THE ROLE
Role Description
We are seeking a detail-oriented and highly skilled Data Engineering Test Automation Engineer with deep expertise in the life-sciences R&D domain to ensure the quality, reliability, and performance of our data pipelines and platforms. The ideal candidate will have a strong background in data testing, ETL validation, and test automation frameworks. You will work closely with data engineers, analysts, and DevOps teams to build robust test suites for large-scale data solutions. This role combines deep technical execution with a solid foundation in QA best practices, including test planning, defect tracking, and test lifecycle management. You will be responsible for designing and executing manual and automated test strategies for complex real-time and batch data pipelines, contributing to the design of automation frameworks, and ensuring high-quality data delivery across our AWS- and Databricks-based analytics platforms. The role is highly technical and hands-on, with a strong focus on automation; data accuracy, completeness, and consistency; and ensuring data governance practices are seamlessly integrated into development pipelines.
Roles & Responsibilities
- Design, develop, and maintain automated test scripts for data pipelines, ETL jobs, and data integrations.
- Validate data accuracy, completeness, transformations, and integrity across multiple systems.
- Collaborate with data engineers to define test cases and establish data quality metrics.
- Develop reusable test automation frameworks and CI/CD integrations (e.g., Jenkins, GitHub Actions).
- Perform performance and load testing for data systems.
- Maintain test data management and data mocking strategies.
- Identify and track data quality issues, ensuring timely resolution.
- Perform root cause analysis and drive corrective actions.
- Contribute to QA ceremonies (standups, planning, retrospectives) and drive continuous improvement in QA processes and culture.
Must-Have Skills
- Experience in QA roles, with strong exposure to data pipeline validation and ETL testing.
- Domain knowledge of the life-sciences R&D space.
- Ability to validate data accuracy, transformations, schema compliance, and completeness across systems using PySpark and SQL.
- Strong hands-on experience with Python, and optionally PySpark, for developing automated data validation scripts.
- Proven experience in validating ETL workflows, with a solid understanding of data transformation logic, schema comparison, and source-to-target mapping.
- Experience working with data integration and processing platforms such as Databricks/Snowflake, AWS EMR, Redshift, etc.
- Experience in manual and automated testing of both batch and real-time data pipeline executions.
- Ability to perform performance testing of large-scale, complex data engineering pipelines.
- Ability to troubleshoot data issues independently and collaborate with engineering teams on root cause analysis.
- Strong understanding of QA methodologies, test planning, test case design, and defect lifecycle management.
- Hands-on experience with API testing using Postman, pytest, or custom automation scripts.
- Experience integrating automated tests into CI/CD pipelines using tools like Jenkins, GitHub Actions, or similar.
- Knowledge of cloud platforms such as AWS, Azure, and GCP.
Good-to-Have Skills
- Certifications in Databricks, AWS, Azure, or data QA (e.g., ISTQB).
- Understanding of data privacy, compliance, and governance frameworks.
- Knowledge of UI automated testing frameworks like Selenium, JUnit, and TestNG.
- Familiarity with monitoring/observability tools such as Datadog, Prometheus, or CloudWatch.
Education and Professional Certifications
- Master's degree and 3 to 7 years of Computer Science, IT, or related field experience; OR
- Bachelor's degree and 4 to 9 years of Computer Science, IT, or related field experience.
Soft Skills
- Excellent analytical and troubleshooting skills.
- Strong verbal and written communication skills.
- Ability to work effectively with global, virtual teams.
- High degree of initiative and self-motivation.
- Ability to manage multiple priorities successfully.
- Team-oriented, with a focus on achieving team goals.
- Strong presentation and public speaking skills.
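The source-to-target validation this role describes can be sketched in a few lines of plain Python (in a real pipeline the rows would come from PySpark or SQL queries; the column names here are invented for illustration):

```python
def validate_load(source_rows, target_rows, key, checked_cols):
    """Compare a source extract against the loaded target:
    row counts, missing/extra keys, and per-column value mismatches."""
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    report = {
        "source_count": len(src),
        "target_count": len(tgt),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatches": [],  # (key, column, source_value, target_value)
    }
    for k in sorted(src.keys() & tgt.keys()):
        for col in checked_cols:
            if src[k][col] != tgt[k][col]:
                report["mismatches"].append((k, col, src[k][col], tgt[k][col]))
    report["passed"] = not (
        report["missing_in_target"]
        or report["unexpected_in_target"]
        or report["mismatches"]
    )
    return report
```

The same shape of check (counts, key reconciliation, column diffs) translates directly to PySpark joins or SQL `EXCEPT` queries at scale.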
Posted 1 week ago
3.0 - 6.0 years
4 - 8 Lacs
Bengaluru
Work from Office
We are looking for a Kibana Subject Matter Expert (SME) to support our Network Operations Center (NOC) by designing, developing, and maintaining real-time dashboards and alerting mechanisms. The ideal candidate will have strong experience working with Elasticsearch and Kibana to visualize key performance indicators (KPIs), system health, and alerts related to NOC-managed infrastructure.
Key Responsibilities:
- Design and develop dynamic and interactive Kibana dashboards tailored for NOC monitoring.
- Integrate various NOC elements such as network devices, servers, applications, and services into Elasticsearch/Kibana.
- Create real-time visualizations and trend reports for system health, uptime, traffic, errors, and performance metrics.
- Configure alerts and anomaly detection mechanisms for critical infrastructure issues using Kibana or related tools (e.g., ElastAlert, Watcher).
- Collaborate with NOC engineers, infrastructure teams, and DevOps to understand monitoring requirements and deliver customized dashboards.
- Optimize Elasticsearch queries and index mappings for performance and data integrity.
- Provide expert guidance on best practices for log ingestion, parsing, and data retention strategies.
- Support troubleshooting and incident response efforts by providing actionable insights through Kibana visualizations.
Primary Skills
- Proven experience as a Kibana SME or in a similar role with a focus on dashboards and alerting.
- Strong hands-on experience with Elasticsearch and Kibana (7.x or higher).
- Experience working with log ingestion tools (e.g., Logstash, Beats, Fluentd).
- Solid understanding of NOC operations and common infrastructure elements (routers, switches, firewalls, servers, etc.).
- Proficiency in JSON, Elasticsearch Query DSL, and Kibana scripting for advanced visualizations.
- Familiarity with alerting frameworks such as ElastAlert, Kibana Alerting, or Watcher.
- Good understanding of Linux-based systems and networking fundamentals.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
Preferred Qualifications:
- Experience working within telecom, ISP, or large-scale IT operations environments.
- Exposure to Grafana, Prometheus, or other monitoring and visualization tools.
- Knowledge of scripting languages such as Python or Shell for automation.
- Familiarity with SIEM or security monitoring solutions.
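The Elasticsearch Query DSL proficiency this posting asks for often comes down to aggregation bodies like the one behind an error-rate panel. A minimal sketch, built as a Python dict so it can be serialized and sent to `_search` (the `http.status` field name and one-hour window are assumptions, not a schema from the posting):

```python
import json

def noc_error_rate_query(status_field: str = "http.status",
                         interval: str = "5m") -> dict:
    """Query DSL body: per-interval document totals plus a 5xx-only
    sub-bucket, from which a dashboard derives an error-rate series."""
    return {
        "size": 0,  # aggregations only, no raw hits
        "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": interval,
                },
                "aggs": {
                    # count of docs in each bucket with status >= 500
                    "errors": {"filter": {"range": {status_field: {"gte": 500}}}}
                },
            }
        },
    }

body = json.dumps(noc_error_rate_query())
```

Each returned bucket then carries `doc_count` and `errors.doc_count`, and the visualization divides one by the other; the same body can back an ElastAlert or Kibana Alerting rule.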
Posted 1 week ago
4.0 - 9.0 years
9 - 14 Lacs
Bengaluru
Work from Office
Primary Skills
- Strong hands-on experience with observability tools like AppDynamics, Dynatrace, Prometheus, Grafana, and the ELK Stack.
- Proficient in AppDynamics setup, including installation, configuration, monitor creation, and integration with ServiceNow, email, and Teams.
- Ability to design and implement monitoring solutions for logs, traces, telemetry, and KPIs.
- Skilled in creating dashboards and alerts for application and infrastructure monitoring.
- Experience with AppDynamics features such as NPM, RUM, and synthetic monitoring.
- Familiarity with AWS and Kubernetes, especially in the context of observability.
- Scripting knowledge in Python or Bash for automation and tool integration.
- Understanding of ITIL processes and APM support activities.
- Good grasp of non-functional requirements like performance, capacity, and security.
Secondary Skills
- AppDynamics Performance Analyst or Implementation Professional certification.
- Experience with other APM tools like New Relic, Datadog, or Splunk.
- Exposure to CI/CD pipelines and integration of monitoring into DevOps workflows.
- Familiarity with infrastructure-as-code tools like Terraform or Ansible.
- Understanding of network protocols and troubleshooting techniques.
- Experience in performance tuning and capacity planning.
- Knowledge of compliance and audit requirements related to monitoring and logging.
- Ability to work in Agile/Scrum environments and contribute to sprint planning from an observability perspective.
Posted 1 week ago
5.0 - 10.0 years
7 - 12 Lacs
Bengaluru
Hybrid
Position Overview: We are seeking a Senior Software Engineer to help drive our build, release, and testing infrastructure to the next level. You will focus on scaling and optimizing our systems for large-scale, high-performance deployments, reducing build times from days to mere minutes while maintaining high-quality releases. As part of our collaborative, fast-paced engineering team, you will play a pivotal role in delivering tools and processes that support continuous delivery, test-driven development, and agile methodologies.
Key Responsibilities:
- Automation & Tooling Development: Build, maintain, and improve our automated build, release, and testing infrastructure. Your focus will be on developing tools and scripts that automate our deployment pipeline, enabling a seamless and efficient continuous delivery process.
- Cross-functional Collaboration: Collaborate closely with development, QA, and SRE teams to ensure our build infrastructure meets the needs of all teams. Work with teams across the organization to create new tools, processes, and technologies that will streamline and enhance our delivery pipeline.
- Innovative Technology Integration: Stay on top of the latest advancements in cloud technology, automation, and infrastructure tools. You'll have the opportunity to experiment with and recommend new technologies, including AWS services, to enhance our CI/CD system.
- Scaling Infrastructure: Work on scaling our infrastructure to meet the demands of running thousands of automated tests for every commit. Help us reduce compute time from days to minutes, addressing scalability and performance challenges as we grow.
- Continuous Improvement & Feedback Loops: Be a champion for continuous improvement by collecting feedback from internal customers, monitoring the adoption of new tools, and fine-tuning processes to maximize efficiency, stability, and overall satisfaction.
- Process & Project Ownership: Lead the rollout of new tools and processes, from initial development through to full implementation. You'll be responsible for ensuring smooth adoption and delivering value to internal teams.
Required Qualifications:
- 5+ years of experience in software development with strong proficiency in at least one of the following languages: Python, Go, Java, or JavaScript.
- Deep understanding of application development, microservices architecture, and the elements that drive a successful multi-service ecosystem. Familiarity with building and deploying scalable services is essential.
- Strong automation skills: experience scripting and building tools for automation in the context of continuous integration and deployment pipelines.
- Cloud infrastructure expertise: hands-on experience with AWS services (e.g., EC2, S3, Lambda, RDS) and Kubernetes or containerized environments.
- Familiarity with containerization: strong understanding of Docker and container orchestration, with a particular focus on cloud-native technologies.
- Problem-solving mindset: ability to identify, troubleshoot, and resolve technical challenges, particularly in large-scale systems.
- Agile experience: familiarity with Agile methodologies and the ability to collaborate effectively within cross-functional teams to deliver on time and with high quality.
- Collaboration skills: ability to communicate complex technical concepts to both technical and non-technical stakeholders. Strong team-oriented mindset with a focus on delivering value through collaboration.
- Bachelor's degree in Computer Science or a related field, or equivalent professional experience.
Preferred Qualifications:
- Experience with Kubernetes (K8s): in-depth knowledge of Kubernetes architecture and operational experience managing Kubernetes clusters at scale.
- CI/CD expertise: solid experience working with CI/CD pipelines and tools (e.g., Terraform, Ansible, Spinnaker).
- Infrastructure-as-code experience: familiarity with Terraform, CloudFormation, or similar tools for automating cloud infrastructure deployments.
- Container orchestration & scaling: experience with Karpenter or other auto-scaling tools for Kubernetes.
- Monitoring & logging: familiarity with tools such as Prometheus, Grafana, and CloudWatch for tracking infrastructure performance and debugging production issues.
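A core trick behind cutting test compute from days to minutes, as the posting above describes, is sharding the suite across workers. One common approach is greedy longest-first bin packing over recorded test durations; a minimal sketch (test names and timings are invented):

```python
def assign_shards(test_durations: dict, num_shards: int) -> list:
    """Greedy longest-first packing: place each test on the currently
    lightest shard, so wall-clock time approaches max shard load."""
    shards = [{"tests": [], "seconds": 0.0} for _ in range(num_shards)]
    # Scheduling the longest tests first keeps the final loads balanced.
    for name, secs in sorted(test_durations.items(), key=lambda kv: -kv[1]):
        lightest = min(shards, key=lambda s: s["seconds"])
        lightest["tests"].append(name)
        lightest["seconds"] += secs
    return shards
```

A CI pipeline would feed this from a timing report of the previous run and launch one job per shard; the theoretical floor is total duration divided by shard count, and greedy packing typically lands close to it.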
Posted 1 week ago
5.0 - 10.0 years
7 - 12 Lacs
Bengaluru
Work from Office
What You'll Be Doing:
- Lead and implement secure, scalable Kubernetes clusters across on-prem, hybrid, and cloud environments.
- Integrate security throughout the cluster lifecycle (design to production) with network policies, RBAC, Pod Security Policies, and encryption.
- Work with development teams to enforce secure containerization practices and integrate security tools into CI/CD pipelines.
- Implement secure networking and service meshes (Istio, Linkerd), including mutual TLS for secure service-to-service communication.
- Secure CI/CD pipelines with automated security checks (code scanning, vulnerability assessments, configuration checks).
- Automate Kubernetes infrastructure provisioning with IaC tools (Terraform, CloudFormation, Ansible), embedding security best practices.
- Enhance automation workflows for patching, vulnerability assessments, and incident response.
- Implement observability strategies with Prometheus, Grafana, the ELK Stack, and Loki for monitoring health, logging, performance, and security.
- Ensure security events are logged, monitored, and proactively mitigated.
- Participate in incident response, on-call rotations, root cause analysis, and post-incident reviews to refine security protocols.
- Define, document, and enforce Kubernetes security best practices and policies.
What You'll Bring to the Role:
- Strong knowledge of Kubernetes, ECS, and migrating applications to cloud-native environments, ensuring security at every stage.
- Experience designing secure identity management and access control solutions for Kubernetes, ECS, and cloud platforms.
- Experience migrating legacy applications to Kubernetes and ECS, optimizing for security and scalability.
- Skilled in managing and securing cloud identities and roles, and implementing RBAC in Kubernetes and ECS.
- Extensive experience securing and automating CI/CD pipelines with tools like Jenkins, GitLab CI, ArgoCD, and Spinnaker.
- Hands-on experience with container security using tools like Aqua Security and Twistlock, and with runtime protection practices.
- In-depth understanding of service meshes like Istio and Linkerd, and securing communications with mutual TLS encryption.
- Expertise in using IaC tools like Terraform, CloudFormation, and Ansible for secure infrastructure automation.
- Skilled in using Prometheus, Grafana, and the ELK Stack for real-time monitoring and proactive incident detection.
- Experience in managing incidents, troubleshooting, root cause analysis, and improving security protocols.
- Strong ability to collaborate with cross-functional teams and mentor junior engineers, promoting a security-first culture.
- Knowledge of secrets management in Kubernetes using Vault, Secrets Manager, or Kubernetes Secrets.
Experience & Qualifications:
- 5+ years of experience managing large-scale, secure Kubernetes clusters, including architecture, security, and scalability.
- 5+ years of hands-on experience with ECS (Elastic Container Service) and migrating legacy monolithic applications to cloud-native environments (Kubernetes/ECS).
- 3+ years of experience in cloud security, including IAM (Identity and Access Management), role-based access control (RBAC), and secure identity management for cloud platforms and Kubernetes.
- 3+ years of experience automating CI/CD pipelines using tools such as Spinnaker, Jenkins, or ArgoCD, with an emphasis on integrating security throughout the process.
- Strong knowledge of service mesh technologies (Istio, Linkerd) and secure networking practices in Kubernetes environments, including mutual TLS encryption.
- Experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible, and the ability to automate infrastructure provisioning with a security-first approach.
- Proven experience implementing monitoring and observability solutions with Prometheus, Grafana, Loki, or similar tools to enhance security and detect incidents in real time.
- Strong problem-solving skills with hands-on experience in incident management, troubleshooting, and conducting post-incident analysis.
- Excellent collaboration skills, with experience working cross-functionally with security engineers, developers, and DevOps teams to enforce security best practices and policies.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
- Certifications (preferred): CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), AWS Certified DevOps Engineer, or equivalent certifications in cloud and security domains.
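The network-policy work described in this posting usually starts from a namespace-wide default-deny baseline. Since kubectl accepts JSON as well as YAML, the manifest can be sketched as a Python dict (the namespace name is a placeholder):

```python
import json

def default_deny_policy(namespace: str) -> dict:
    """A namespace-scoped default-deny NetworkPolicy: selecting every
    pod while listing no ingress/egress rules blocks all traffic until
    explicit allow policies are added."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector = every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed -> deny both
        },
    }

manifest = json.dumps(default_deny_policy("payments"), indent=2)
# Applied with: kubectl apply -f <file containing `manifest`>
```

Teams then layer narrow allow policies (DNS egress, specific service-to-service paths) on top, which pairs naturally with the mutual-TLS enforcement a service mesh adds.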
Posted 1 week ago
9.0 - 10.0 years
11 - 12 Lacs
Hyderabad
Work from Office
We are seeking a highly skilled DevOps Engineer to join our dynamic development team. In this role, you will be responsible for designing, developing, and maintaining both frontend and backend components of our applications using DevOps and associated technologies. You will collaborate with cross-functional teams to deliver robust, scalable, and high-performing software solutions that meet our business needs. The ideal candidate will have a strong background in DevOps, experience with modern frontend frameworks, and a passion for full-stack development.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 9 to 10+ years of experience in full-stack development, with a strong focus on DevOps.
DevOps with AWS Data Engineer - Roles & Responsibilities:
- Use AWS services like EC2, VPC, S3, IAM, RDS, and Route 53.
- Automate infrastructure using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation.
- Build and maintain CI/CD pipelines using tools like AWS CodePipeline, Jenkins, or GitLab CI/CD.
- Automate build, test, and deployment processes for Java applications, collaborating cross-functionally.
- Use Ansible, Chef, or AWS Systems Manager for managing configurations across environments.
- Containerize Java apps using Docker; deploy and manage containers using Amazon ECS, EKS (Kubernetes), or Fargate.
- Monitoring and logging using Amazon CloudWatch, Prometheus + Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), and AWS X-Ray for distributed tracing.
- Manage access with IAM roles/policies; use AWS Secrets Manager / Parameter Store for managing credentials; enforce security best practices, encryption, and audits.
- Automate backups for databases and services using AWS Backup, RDS snapshots, and S3 lifecycle rules; implement Disaster Recovery (DR) strategies.
- Work closely with development teams to integrate DevOps practices; document pipelines, architecture, and troubleshooting runbooks.
- Monitor and optimize AWS resource usage.
- Use AWS Cost Explorer, Budgets, and Savings Plans.
Must-Have Skills:
- Experience working on Linux-based infrastructure.
- Excellent understanding of Ruby, Python, Perl, and Java.
- Configuring and managing databases such as MySQL and MongoDB.
- Excellent troubleshooting skills.
- Selecting and deploying appropriate CI/CD tools.
- Working knowledge of various tools, open-source technologies, and cloud services.
- Awareness of critical concepts in DevOps and Agile principles.
- Managing stakeholders and external interfaces.
- Setting up tools and required infrastructure.
- Defining and setting development, testing, release, update, and support processes for DevOps operation.
- Technical skills to review, verify, and validate the software code developed in the project.
Interview Mode: F2F for candidates residing in Hyderabad / Zoom for other states
Location: 43/A, MLA Colony, Road No. 12, Banjara Hills, 500034
Time: 2 - 4 PM
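The S3 lifecycle rules mentioned above for backup automation take a specific shape. As an illustrative sketch, this builds the configuration dict that boto3's `put_bucket_lifecycle_configuration` expects (the `backups/` prefix and retention periods are assumptions):

```python
def backup_lifecycle_rules(archive_after_days: int = 30,
                           expire_after_days: int = 365) -> dict:
    """Lifecycle configuration: transition aging backups to Glacier,
    then expire them entirely once past the retention window."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},  # only objects under backups/
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }

# With boto3 this would be applied as (not executed here):
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-backup-bucket",
#     LifecycleConfiguration=backup_lifecycle_rules())
```

The same rule set can equally be expressed in Terraform's `aws_s3_bucket_lifecycle_configuration` resource when the bucket is managed as code.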
Posted 1 week ago
6.0 - 8.0 years
13 - 18 Lacs
Gurugram
Work from Office
Responsibilities:
- Define and enforce SLOs, SLIs, and error budgets across microservices
- Architect an observability stack (metrics, logs, traces) and drive operational insights
- Automate toil and manual ops with robust tooling and runbooks
- Own the incident response lifecycle: detection, triage, RCA, and postmortems
- Collaborate with product teams to build fault-tolerant systems
- Champion performance tuning, capacity planning, and scalability testing
- Optimise costs while maintaining the reliability of cloud infrastructure
Must-have Skills:
- 6+ years in SRE/Infrastructure/Backend-related roles using cloud-native technologies
- 2+ years in an SRE-specific capacity
- Strong experience with monitoring/observability tools (Datadog, Prometheus, Grafana, ELK, etc.)
- Experience with infrastructure-as-code (Terraform/Ansible)
- Proficiency in Kubernetes, service mesh (Istio/Linkerd), and container orchestration
- Deep understanding of distributed systems, networking, and failure domains
- Expertise in automation with Python, Bash, or Go
- Proficient in incident management, SLAs/SLOs, and system tuning
- Hands-on experience with GCP (preferred)/AWS/Azure and cloud cost optimisation
- Participation in on-call rotations and running large-scale production systems
Nice-to-have skills:
- Familiarity with chaos engineering practices and tools (Gremlin, Litmus)
- Background in performance testing and load simulation (Gatling, Locust, k6, JMeter)
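The error-budget responsibility listed above reduces to simple arithmetic: an availability SLO over a window implies a fixed budget of "bad minutes". A generic illustration (not a tool from the posting):

```python
def error_budget(slo: float, window_days: int,
                 observed_bad_minutes: float) -> dict:
    """Translate an availability SLO into a downtime budget for the
    window and report how much of that budget has been burned."""
    total_minutes = window_days * 24 * 60
    budget_minutes = total_minutes * (1 - slo)  # allowed downtime
    return {
        "budget_minutes": budget_minutes,
        "remaining_minutes": budget_minutes - observed_bad_minutes,
        "burn_fraction": observed_bad_minutes / budget_minutes,
    }
```

For example, a 99.9% SLO over a 30-day window allows about 43.2 minutes of downtime; alerting on the burn rate of this budget (rather than on raw downtime) is what keeps alerts actionable rather than noisy.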
Posted 1 week ago
6.0 - 10.0 years
15 - 25 Lacs
Gurugram, Bengaluru
Hybrid
What you will be doing
The Site Reliability Engineer (SRE) operates and maintains production systems in the cloud. The primary goal is to make sure the systems are up and running and provide the expected performance. This involves daily operations tasks of monitoring, deployment, and incident management, as well as strategic tasks like capacity planning, provisioning, and continuous improvement of processes. A major part of the role is also the design for reliability, scalability, and efficiency, and the automation of everyday system operations tasks. SREs work closely with technical support teams, application developers, and DevOps engineers, both on incident resolution and on the long-term evolution of systems. You will primarily work on creating Terraform, Shell, and Ansible scripts and will be part of application deployments using Azure Kubernetes Service, working with a cybersecurity client/company.
- Monitor production systems' health, usage, and performance using dashboards and monitoring tools.
- Track provisioned resources, infrastructure, and their configuration.
- Perform regular maintenance activities on databases, services, and infrastructure.
- Respond to alerts and incidents: investigate, resolve, or dispatch according to SLAs.
- Respond to emergencies: recover systems and restore services with minimal downtime.
- Coordinate with customer success and engineering teams on incident resolution.
- Perform postmortems after major incidents.
- Change management: perform rollouts, rollbacks, patching, and configuration changes.
- Drive demand forecasting and capacity planning with engineering and customer success teams, considering projected growth and demand spikes.
- Provision production resources according to capacity demands.
- Work with the engineering teams on the design and testing for reliability, scalability, performance, efficiency, and security.
- Track resource utilization and cost-efficiency of production services.
What we're looking for
- BSc/MSc or B.Tech degree in STEM; 6+ years of relevant industry experience.
- Technical skills: Terraform, Docker Swarm/K8s, Python, Unix/Linux shell scripting, DevOps, GitHub Actions, Azure Active Directory, Azure Monitor & Log Analytics.
- Experience integrating Grafana with Prometheus is an added advantage.
- Strong verbal and written communication skills.
- Ability to perform on-call duties.
Posted 1 week ago
0.0 - 3.0 years
3 - 6 Lacs
Hyderabad
Work from Office
The ideal candidate will have a deep understanding of automation, configuration management, and infrastructure-as-code principles, with a strong focus on Ansible. You will work closely with developers, system administrators, and other collaborators to automate infrastructure-related processes, improve deployment pipelines, and ensure consistent configurations across multiple environments. The Infrastructure Automation Engineer will be responsible for developing innovative self-service solutions for our global workforce and further enhancing our self-service automation built using Ansible. As part of a scaled Agile product delivery team, the Developer works closely with product feature owners, project collaborators, operational support teams, and peer developers and testers to develop solutions that enhance self-service capabilities and solve business problems by identifying requirements and conducting feasibility analyses, proofs of concept, and design sessions. The Developer serves as a subject matter expert on the design, integration, and operability of solutions to support innovation initiatives with business partners and shared-services technology teams. Please note, this is an onsite role based in Hyderabad.
Key Responsibilities:
- Automating repetitive IT tasks: collaborate with multi-functional teams to gather requirements and build automation solutions for infrastructure provisioning, configuration management, and software deployment.
- Configuration management: design, implement, and maintain code, including Ansible playbooks, roles, and inventories, for automating system configurations and deployments and ensuring consistency.
- Ensure the scalability, reliability, and security of automated solutions.
- Troubleshoot and resolve issues related to automation scripts, infrastructure, and deployments.
- Perform infrastructure automation assessments and implementations, providing solutions to increase efficiency, repeatability, and consistency.
- DevOps: facilitate continuous integration and deployment (CI/CD).
- Orchestration: coordinate multiple automated tasks across systems.
- Develop and maintain clear, reusable, and version-controlled playbooks and scripts.
- Manage and optimize cloud infrastructure using Ansible and Terraform automation (AWS, Azure, GCP, etc.).
- Continuously improve automation workflows and practices to enhance speed, quality, and reliability.
- Ensure that infrastructure automation adheres to best practices, security standards, and regulatory requirements.
- Document and maintain processes, configurations, and changes in the automation infrastructure.
- Participate in design reviews, client requirements sessions, and development teams to deliver features and capabilities supporting automation initiatives.
- Collaborate with product owners, collaborators, testers, and other developers to understand, estimate, prioritize, and implement solutions.
- Design, code, debug, document, deploy, and maintain solutions in a highly efficient and effective manner.
- Participate in problem analysis, code review, and system design.
- Remain current on new technology and apply innovation to improve functionality.
- Collaborate closely with collaborators and team members to configure, improve, and maintain current applications.
- Work directly with users to resolve support issues within product team responsibilities.
- Monitor the health, performance, and usage of developed solutions.
What we expect of you
We are all different, yet we all use our unique contributions to serve patients.
Basic Qualifications:
- Bachelor's degree and 0 to 3 years of computer science, IT, or related field experience; OR Diploma and 4 to 7 years of computer science, IT, or related field experience.
- Deep hands-on experience with Ansible, including playbooks, roles, and modules.
- Proven experience as an Ansible engineer or in a similar automation role.
- Scripting skills in Python, Bash, or other programming languages.
- Proficiency in Terraform and CloudFormation for AWS infrastructure automation.
- Experience with other configuration management tools (e.g., Puppet, Chef).
- Experience with Linux administration, scripting (Python, Bash), and CI/CD tools (GitHub Actions, CodePipeline, etc.).
- Familiarity with monitoring tools (e.g., Dynatrace, Prometheus, Nagios).
- Experience working in an Agile (SAFe, Scrum, Kanban) environment.
Preferred Qualifications:
- Red Hat Certified Specialist in Developing with Ansible Automation Platform
- Red Hat Certified Specialist in Managing Automation with Ansible Automation Platform
- Red Hat Certified System Administrator
- AWS Certified Solutions Architect Associate or Professional
- AWS Certified DevOps Engineer Professional
- Terraform Associate Certification
Good-to-Have Skills:
- Experience with Kubernetes (EKS) and service mesh architectures.
- Knowledge of AWS Lambda and event-driven architectures.
- Familiarity with AWS CDK, Ansible, or Packer for cloud automation.
- Exposure to multi-cloud environments (Azure, GCP).
- Experience operating within a validated-systems environment (FDA, European Agency for the Evaluation of Medicinal Products, Ministry of Health, etc.).
Soft Skills:
- Strong analytical and problem-solving skills.
- Effective communication and collaboration with multi-functional teams.
- Ability to work in a fast-paced, cloud-first environment.
Shift Information: This position is an onsite role and may require working during later hours to align with business hours.
Candidates must be willing and able to work outside of standard hours as required to meet business needs.
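One concrete place Python and Ansible meet, as in the role above, is a dynamic inventory script: Ansible invokes it with `--list` and reads a JSON inventory from stdout. A minimal sketch (the group names, hostnames, and variables are invented placeholders; a real script would query a CMDB or cloud API):

```python
#!/usr/bin/env python3
"""Minimal Ansible dynamic-inventory script."""
import json
import sys

def build_inventory() -> dict:
    # In a real setup these hosts would come from a CMDB or cloud API.
    return {
        "webservers": {
            "hosts": ["web01.example.com", "web02.example.com"],
            "vars": {"http_port": 8080},
        },
        "dbservers": {"hosts": ["db01.example.com"]},
        # _meta/hostvars lets Ansible skip per-host --host calls.
        "_meta": {"hostvars": {"db01.example.com": {"backup_window": "02:00"}}},
    }

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--list":
        print(json.dumps(build_inventory()))
    else:
        print(json.dumps({}))
```

Pointed at with `ansible-playbook -i inventory.py site.yml`, this keeps the inventory version-controlled as code while the host list itself stays live.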
Posted 1 week ago
6.0 - 10.0 years
15 - 25 Lacs
Gurugram, Bengaluru
Hybrid
What you will be doing: The Site Reliability Engineer (SRE) operates and maintains production systems in the cloud. The primary goal is to keep systems up and running and delivering the expected performance. This involves daily operational tasks of monitoring, deployment, and incident management, as well as strategic tasks like capacity planning, provisioning, and continuous improvement of processes. A major part of the role is designing for reliability, scalability, and efficiency, and automating everyday system operations tasks. SREs work closely with technical support teams, application developers, and DevOps engineers, both on incident resolution and on the long-term evolution of systems. You will primarily work on creating Terraform, Shell, and Ansible scripts and will take part in application deployments using Azure Kubernetes Service. You will work with a cybersecurity client/company.

Responsibilities: Monitor production systems' health, usage, and performance using dashboards and monitoring tools. Track provisioned resources, infrastructure, and their configuration. Perform regular maintenance activities on databases, services, and infrastructure. Respond to alerts and incidents: investigate, resolve, or dispatch according to SLAs. Respond to emergencies: recover systems and restore services with minimal downtime. Coordinate with customer success and engineering teams on incident resolution. Perform postmortems after major incidents. Change management: perform rollouts, rollbacks, patching, and configuration changes. Drive demand forecasting and capacity planning with engineering and customer success teams, considering projected growth and demand spikes. Provision production resources according to capacity demands. Work with the engineering teams on the design and testing for reliability, scalability, performance, efficiency, and security. Track resource utilization and cost-efficiency of production services.

What we're looking for: BSc/MSc or B.Tech degree in STEM, with 3+ years of relevant industry experience. Technical skills: Terraform, Docker Swarm/Kubernetes, Python, Unix/Linux shell scripting, DevOps, GitHub Actions, Azure Active Directory, Azure Monitor & Log Analytics. Experience integrating Grafana with Prometheus is an added advantage. Strong verbal and written communication skills. Ability to perform on-call duties.

Regards, Kajal Khatri (Kajal@beanhr.com)
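The SLA-driven incident work described above is usually framed in terms of an availability target and the downtime it permits. As a minimal illustrative sketch (not taken from the posting; the function name and 30-day window are assumptions), the error budget for an availability SLO can be computed as:

```python
from datetime import timedelta

def error_budget(slo_target: float, window_days: int = 30) -> timedelta:
    """Return the allowed downtime ("error budget") for an availability
    SLO over a rolling window, e.g. 99.9% over 30 days."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction, e.g. 0.999")
    # The budget is simply the fraction of the window the SLO leaves uncovered.
    return timedelta(days=window_days) * (1 - slo_target)

# A 99.9% monthly target leaves roughly 43 minutes of downtime.
budget = error_budget(0.999, window_days=30)
print(budget)  # → 0:43:12
```

In practice the same arithmetic drives alert thresholds: burning the budget faster than the window elapses is what should page someone.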
Posted 1 week ago
3.0 - 6.0 years
6 - 11 Lacs
Pune
Work from Office
Job ID: 200078. Required Travel: Minimal. Managerial: No. Location: India - Pune (Amdocs Site).

Who are we? Amdocs helps those who build the future to make it amazing. With our market-leading portfolio of software products and services, we unlock our customers' innovative potential, empowering them to provide next-generation communication and media experiences for both individual end users and enterprise customers. Our employees around the globe are here to accelerate service providers' migration to the cloud, enable them to differentiate in the 5G era, and digitalize and automate their operations. Listed on the NASDAQ Global Select Market, Amdocs had revenue of $5.00 billion in fiscal 2024. For more information, visit www.amdocs.com.

In one sentence: Immerse yourself in the design, development, modification, debugging, and maintenance of our client's software systems! Engage with specific modules, applications, or technologies, and look after sophisticated assignments during the software development process.

What will your job look like? Key responsibilities: Design, implement, and maintain CI/CD pipelines to automate the software development lifecycle. Collaborate with development, QA, and operations teams to ensure smooth deployment and operation of applications. Monitor system performance, troubleshoot issues, and optimize infrastructure for scalability and reliability. Implement and manage infrastructure as code (IaC) using tools like Terraform, Ansible, or CloudFormation. Ensure security best practices are followed in all aspects of the development and deployment process. Manage cloud infrastructure (AWS, Azure, GCP) and on-premises servers. Develop and maintain scripts for automation of routine tasks. Participate in on-call rotations to provide 24/7 support for critical systems.

All you need is... Proven experience in a DevOps Engineer role. Strong knowledge of CI/CD tools (Jenkins, Bitbucket, GitLab CI, CircleCI, etc.).
Experience with containerization and orchestration tools (Docker, Kubernetes). Proficiency in scripting languages (Python, Bash, etc.). Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK stack). Excellent problem-solving skills and attention to detail. Strong communication and collaboration skills.

Good to have: Experience with microservices architecture. Knowledge of configuration management tools (Chef, Puppet). Certification in cloud platforms (AWS Certified DevOps Engineer, Azure DevOps Engineer Expert).

Behavioral skills: Eagerness and hunger to learn. Good problem-solving and decision-making skills. Good communication skills within the team, across sites, and with the customer. Ability to stretch working hours, when necessary, to support business needs. Ability to work independently and drive issues to closure. Consult with relevant parties when necessary and raise risks in a timely manner.

Why you will love this job: You will be responsible for the integration between a major product infrastructure system and the Amdocs infrastructure system, driving automation that helps teams work smarter and faster. Be a key member of an international, highly skilled, and encouraging team with various possibilities for personal and professional development! You will have the opportunity to work in a multinational environment for the global market leader in its field. We are a dynamic, multi-cultural organization that constantly innovates and empowers our employees to grow. Our people are passionate, daring, and phenomenal teammates who stand by each other with a dedication to creating a diverse, inclusive workplace! We offer a wide range of stellar benefits including health, dental, vision, and life insurance, as well as paid time off, sick time, and parental leave! Amdocs is an equal opportunity employer. We welcome applicants from all backgrounds and are committed to fostering a diverse and inclusive workforce.
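One concrete example of the "scripts for automation of routine tasks" a CI/CD pipeline might include is a post-deploy gate that polls a service health endpoint with exponential backoff before promoting a release. A hedged sketch (the URL, function names, and retry counts are hypothetical, not from the posting):

```python
import time
import urllib.request

def backoff_delays(base: float, attempts: int):
    """Yield exponential backoff delays: base, 2*base, 4*base, ..."""
    for attempt in range(attempts):
        yield base * 2 ** attempt

def wait_for_healthy(url: str, attempts: int = 5, base: float = 1.0) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or attempts run out."""
    for delay in backoff_delays(base, attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not reachable yet: connection refused, DNS failure, timeout
        time.sleep(delay)
    return False

# Example pipeline usage (hypothetical endpoint):
# if not wait_for_healthy("https://service.internal/healthz"):
#     raise SystemExit("deployment failed health check, rolling back")
```

Backoff keeps the gate from hammering a service that is still starting up, while the bounded attempt count ensures the pipeline fails fast instead of hanging.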
Posted 1 week ago
3.0 - 5.0 years
10 - 15 Lacs
Bengaluru
Work from Office
Job Title: Project & Change Execution Manager, VP. Location: Bangalore, India.

Role Description: Vice President, Core Engineering (Technical Leadership Role). We are seeking a highly skilled and experienced Vice President of Engineering to lead the design, development, and maintenance of our core software systems and infrastructure. This is a purely technical leadership role, ideal for someone who thrives on solving complex engineering challenges, stays ahead of modern technology trends, and is passionate about software craftsmanship. You will play a pivotal role in shaping our architecture, contributing directly to the codebase, and mentoring engineers across the organization. This role does not involve people management responsibilities, but requires strong collaboration and technical influence.

Deutsche Bank's Corporate Bank division is a leading provider of cash management, trade finance, and securities finance. We complete green-field projects that deliver the best Corporate Bank - Securities Services products in the world. Our team is diverse, international, and driven by a shared focus on clean code and valued delivery. At every level, agile minds are rewarded with competitive pay, support, and opportunities to excel. You will work as part of a cross-functional agile delivery team. You will bring an innovative approach to software development, focusing on using the latest technologies and practices, as part of a relentless focus on business value. You will be someone who sees engineering as a team activity, with a predisposition to open code, open discussion, and creating a supportive, collaborative environment. You will be ready to contribute to all stages of software delivery, from initial analysis right through to production support.
What we'll offer you: As part of our flexible scheme, here are just some of the benefits that you'll enjoy: best-in-class leave policy; gender-neutral parental leave; 100% reimbursement under the childcare assistance benefit (gender neutral); sponsorship for industry-relevant certifications and education; an Employee Assistance Program for you and your family members; comprehensive hospitalization insurance for you and your dependents; accident and term life insurance; complimentary health screening for those aged 35 and above.

Your key responsibilities: System Design & Development: Architect, develop, and maintain high-performance, scalable software systems using Java. Code Contribution: Actively contribute to the codebase, ensuring high standards of quality, performance, and reliability. Database Engineering: Design and optimize data-intensive applications using MongoDB, including indexing and query optimization. Microservices & Cloud: Implement microservices architecture following established guidelines, deployed on Google Kubernetes Engine (GKE). Security & Compliance: Ensure systems comply with security regulations and internal policies. Infrastructure Oversight: Review and update policies related to internal systems and equipment. Mentorship: Guide and mentor engineers, setting a high bar for technical excellence and best practices. Cross-functional Collaboration: Work closely with product managers, architects, and other stakeholders to translate business requirements into scalable technical solutions, including HLD and LLD documentation. Process Improvement: Drive best practices in software development, deployment, and operations.

Your skills and experience: Deep expertise in software architecture, cloud infrastructure, and modern development practices. Strong coding skills and a passion for hands-on development. Excellent communication and leadership abilities. 10+ years of professional software development experience, with deep expertise in Java.
Strong experience with MongoDB and building data-intensive applications. Proficiency in Kubernetes and deploying systems at scale in cloud environments, preferably Google Cloud Platform (GCP). Hands-on experience with CI/CD pipelines, monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK). Solid understanding of reactive or event-driven architectures. Familiarity with Infrastructure as Code (IaC) tools such as Terraform. Experience with modern software engineering practices, including TDD, CI/CD, and Agile methodologies. Front-end knowledge is a plus.

How we'll support you: Training and development to help you excel in your career. Coaching and support from experts in your team. A culture of continuous learning to aid progression. A range of flexible benefits that you can tailor to suit your needs.
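The indexing and query optimization work this role emphasizes boils down to replacing linear collection scans with logarithmic index lookups. A language-agnostic sketch of that trade-off in plain Python (illustrative only; this is not MongoDB driver code, and the field names are invented):

```python
import bisect

# An unindexed "collection": finding a document means scanning every record.
docs = [{"_id": i, "account": f"ACC{i:05d}"} for i in range(10_000)]

def scan(account: str):
    """Full collection scan: O(n) comparisons."""
    return next((d for d in docs if d["account"] == account), None)

# An index is essentially a sorted (key -> position) structure that
# enables binary search, analogous to a B-tree index on "account".
index = sorted((d["account"], pos) for pos, d in enumerate(docs))
keys = [k for k, _ in index]

def lookup(account: str):
    """Indexed lookup: O(log n) comparisons via binary search."""
    i = bisect.bisect_left(keys, account)
    if i < len(keys) and keys[i] == account:
        return docs[index[i][1]]
    return None

assert scan("ACC00042") == lookup("ACC00042")
```

The same reasoning motivates inspecting query plans in a real database: a query that cannot use an index degenerates to the `scan` path, which is what "query optimization" is meant to catch.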
Posted 1 week ago
3.0 - 5.0 years
35 - 40 Lacs
Pune
Work from Office
Job Title: Lead Engineer, VP. Location: Pune, India.

Role Description: Vice President, Core Engineering (Technical Leadership Role). We are seeking a highly skilled and experienced Vice President of Engineering to lead the design, development, and maintenance of our core software systems and infrastructure. This is a purely technical leadership role, ideal for someone who thrives on solving complex engineering challenges, stays ahead of modern technology trends, and is passionate about software craftsmanship. You will play a pivotal role in shaping our architecture, contributing directly to the codebase, and mentoring engineers across the organization. This role does not involve people management responsibilities, but requires strong collaboration and technical influence.

Deutsche Bank's Corporate Bank division is a leading provider of cash management, trade finance, and securities finance. We complete green-field projects that deliver the best Corporate Bank - Securities Services products in the world. Our team is diverse, international, and driven by a shared focus on clean code and valued delivery. At every level, agile minds are rewarded with competitive pay, support, and opportunities to excel. You will work as part of a cross-functional agile delivery team. You will bring an innovative approach to software development, focusing on using the latest technologies and practices, as part of a relentless focus on business value. You will be someone who sees engineering as a team activity, with a predisposition to open code, open discussion, and creating a supportive, collaborative environment. You will be ready to contribute to all stages of software delivery, from initial analysis right through to production support.
What we'll offer you: As part of our flexible scheme, here are just some of the benefits that you'll enjoy: best-in-class leave policy; gender-neutral parental leave; 100% reimbursement under the childcare assistance benefit (gender neutral); sponsorship for industry-relevant certifications and education; an Employee Assistance Program for you and your family members; comprehensive hospitalization insurance for you and your dependents; accident and term life insurance; complimentary health screening for those aged 35 and above.

Your key responsibilities: System Design & Development: Architect, develop, and maintain high-performance, scalable software systems using Java. Code Contribution: Actively contribute to the codebase, ensuring high standards of quality, performance, and reliability. Database Engineering: Design and optimize data-intensive applications using MongoDB, including indexing and query optimization. Microservices & Cloud: Implement microservices architecture following established guidelines, deployed on Google Kubernetes Engine (GKE). Security & Compliance: Ensure systems comply with security regulations and internal policies. Infrastructure Oversight: Review and update policies related to internal systems and equipment. Mentorship: Guide and mentor engineers, setting a high bar for technical excellence and best practices. Cross-functional Collaboration: Work closely with product managers, architects, and other stakeholders to translate business requirements into scalable technical solutions, including HLD and LLD documentation. Process Improvement: Drive best practices in software development, deployment, and operations.

Your skills and experience: Deep expertise in software architecture, cloud infrastructure, and modern development practices. Strong coding skills and a passion for hands-on development. Excellent communication and leadership abilities. 10+ years of professional software development experience, with deep expertise in Java.
Strong experience with MongoDB and building data-intensive applications. Proficiency in Kubernetes and deploying systems at scale in cloud environments, preferably Google Cloud Platform (GCP). Hands-on experience with CI/CD pipelines, monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK). Solid understanding of reactive or event-driven architectures. Familiarity with Infrastructure as Code (IaC) tools such as Terraform. Experience with modern software engineering practices, including TDD, CI/CD, and Agile methodologies. Front-end knowledge is a plus.

How we'll support you: Training and development to help you excel in your career. Coaching and support from experts in your team. A culture of continuous learning to aid progression. A range of flexible benefits that you can tailor to suit your needs.

About us and our teams: Please visit our company website for further information: https://www.db.com/company/company.htm. We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative, and working collaboratively. Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group. We welcome applications from all people and promote a positive, fair, and inclusive work environment.
Posted 1 week ago