3.0 - 7.0 years
15 - 20 Lacs
Noida, Pune
Work from Office
The Site Reliability Engineer will support and maintain various cloud infrastructure technology tools in our hosted production/DR environments, acting as the subject matter expert for specific tools or monitoring solutions. Responsibilities:
- Test, verify, and implement upgrades, patches, and new implementations.
- Partner with other service functions to investigate and improve monitoring solutions.
- Design, develop, and maintain observability tools and infrastructure.
- Collaborate with other teams to ensure observability best practices are followed.
- Develop and maintain dashboards and alerts for monitoring system health (see the instrumentation sketch below).
- Troubleshoot and resolve issues related to observability tools and infrastructure.
- Mentor tools team members or train other cross-functional teams as required; may motivate, develop, and manage the performance of individuals and teams while on shift.
- Produce regular and ad hoc management reports in a timely manner.
Requirements:
- Bachelor's degree in Information Systems, Computer Science, or a related discipline, with 5-8 years of relevant experience.
- Proficient in Splunk/ELK and Datadog.
- Experience with observability tools such as Prometheus/InfluxDB and Grafana.
- Strong knowledge of at least one scripting language such as Python, Bash, PowerShell, or another relevant language.
- Experience with enterprise software implementations for large-scale organizations.
- Extensive knowledge of current technology trends such as SaaS, cloud, hosting services, and application management services.
- Experience deploying application and infrastructure clusters within a public cloud environment using a cloud management platform.
- Professional and positive, with outstanding customer-facing practices and a can-do attitude, willing to go the extra mile.
- Consistently follows up and follows through on delegated tasks and actions.
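Since the posting pairs Prometheus/Grafana with Python scripting, here is a minimal, hedged sketch of how a service might expose custom metrics for Prometheus to scrape, using the prometheus_client library; metric names, the port, and the simulated workload are illustrative assumptions, not from the posting.

```python
# Minimal sketch: expose custom service metrics on /metrics for Prometheus.
# Metric names and port 8000 are illustrative assumptions.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["status"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

@LATENCY.time()  # records each call's duration into the histogram
def handle_request():
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    REQUESTS.labels(status="200").inc()

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        handle_request()
```

A Grafana dashboard would then graph `rate(app_requests_total[5m])` and the latency histogram quantiles scraped from this endpoint.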
Posted 1 week ago
10.0 - 15.0 years
15 - 20 Lacs
Pune
Work from Office
Responsibilities:
- Develop and implement DevOps practices across cloud and on-premises environments.
- Lead and mentor the DevOps team; collaborate with cross-functional teams.
- Oversee infrastructure as code (IaC), manage hybrid infrastructure, and design and implement CI/CD pipelines.
- Set up monitoring, troubleshoot issues, and ensure security compliance (a small audit sketch follows this posting).
- Act as a liaison between development and operations.
Requirements:
- Bachelor's degree in Computer Science/Engineering and 10+ years of DevOps experience.
- Cloud (AWS, Azure), IaC (Ansible, CloudFormation), CI/CD (Jenkins, GitLab), containers (Docker, Kubernetes), scripting (Python, Bash), monitoring (Dynatrace, Grafana).
- Strong problem-solving, leadership, and communication skills.
- Security compliance (SOC 2, ISO 27001) and cloud certifications.
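Compliance work of the SOC 2 / ISO 27001 kind often starts with small audit scripts. A hedged sketch, assuming boto3 and AWS credentials are available, that flags S3 buckets lacking a default server-side-encryption configuration (note that newly created buckets get SSE-S3 by default, so this mainly matters for older accounts):

```python
# Hypothetical compliance check: flag S3 buckets without default encryption.
# Requires boto3 and AWS credentials in the environment.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        s3.get_bucket_encryption(Bucket=name)
        print(f"OK:   {name} has default encryption")
    except ClientError as err:
        if err.response["Error"]["Code"] == "ServerSideEncryptionConfigurationNotFoundError":
            print(f"FLAG: {name} has no default encryption")
        else:
            raise  # unrelated error (e.g., access denied): surface it
```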
Posted 1 week ago
7.0 - 12.0 years
15 - 27 Lacs
Gurugram
Work from Office
Role & responsibilities - the mandatory skills we require include:
- Application-Specific Troubleshooting: demonstrated experience in effectively diagnosing and resolving application-related issues.
- Coding Proficiency: strong coding skills in at least one programming language, such as Python, Java, Groovy, or shell scripting.
- Virtual Machine and Network Troubleshooting: good exposure to working with virtual machines and troubleshooting network-related issues.
- Database Experience: hands-on experience with databases, particularly PostgreSQL and Redis, including the ability to write and optimize SQL queries.
- Monitoring Tools: familiarity with monitoring solutions, specifically Prometheus, Grafana, and Loki for observability and alerting.
- Kafka Experience: practical experience with Apache Kafka, including its configuration and troubleshooting (see the consumer sketch below).
- CI/CD Pipeline Optimization: expertise in optimizing CI/CD pipelines using Jenkins.
- Containerization and Orchestration: proficiency in configuring and troubleshooting Kubernetes and Docker, including managing pods, handling restarts, and overseeing deployments.
- Testing and Load Simulation: experience in testing and simulating load to assess service performance.
- Monitoring Service Interactions and Resiliency Patterns: ability to monitor service interactions and implement resiliency patterns effectively.
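For the Kafka troubleshooting requirement, a hedged sketch using the kafka-python library to tail a topic and print partition/offset information; the topic name, broker address, and JSON payload format are assumptions.

```python
# Illustrative kafka-python consumer for troubleshooting a topic: prints
# partition, offset, and decoded payload for each message.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",                              # assumed topic name
    bootstrap_servers=["localhost:9092"],  # assumed broker address
    group_id="debug-group",
    auto_offset_reset="earliest",          # replay from the start if no offset
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    print(f"partition={message.partition} offset={message.offset} value={message.value}")
```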
Posted 1 week ago
1.0 - 5.0 years
9 - 13 Lacs
Mumbai, Gurugram, Bengaluru
Work from Office
We are seeking a talented and motivated Data Scientist with 1-3 years of experience to join our Data Science team. If you have a strong passion for data science, expertise in machine learning, and experience working with large-scale datasets, we want to hear from you. As a Data Scientist at RevX, you will play a crucial role in developing and implementing machine learning models to drive business impact. You will work closely with teams across data science, engineering, product, and campaign management to build predictive models, optimize algorithms, and deliver actionable insights. Your work will directly influence business strategy, product development, and campaign optimization.
Major Responsibilities:
- Develop and implement machine learning models, particularly neural networks, decision trees, random forests, and XGBoost, to solve complex business problems.
- Work on deep learning models and other advanced techniques to enhance predictive accuracy and model performance.
- Analyze and interpret large, complex datasets using Python, SQL, and big data technologies to derive meaningful insights.
- Collaborate with cross-functional teams to design, build, and deploy end-to-end data science solutions, including data pipelines and model deployment frameworks.
- Utilize advanced statistical techniques and machine learning methodologies to optimize business strategies and outcomes.
- Evaluate and improve model performance, calibration, and deployment strategies for real-time applications.
- Perform clustering, segmentation, and other unsupervised learning techniques to discover patterns in large datasets.
- Conduct A/B testing and other experimental designs to validate model performance and business strategies (a worked z-test example follows this posting).
- Create and maintain data visualizations and dashboards using tools such as matplotlib, seaborn, Grafana, and Looker to communicate findings.
- Provide technical expertise in handling big data, data warehousing, and cloud-based platforms like Google Cloud Platform (GCP).
Required Experience/Skills:
- Bachelor's or Master's degree in Data Science, Computer Science, Statistics, Mathematics, or a related field.
- 1-3 years of experience in data science or machine learning roles.
- Strong proficiency in Python for machine learning, data analysis, and deep learning applications.
- Experience in developing, deploying, and monitoring machine learning models, particularly neural networks and other advanced algorithms.
- Expertise in handling big data technologies, with experience in tools such as BigQuery and cloud platforms (GCP preferred).
- Advanced SQL skills for data querying and manipulation from large datasets.
- Experience with data visualization tools like matplotlib, seaborn, Grafana, and Looker.
- Strong understanding of A/B testing, statistical tests, experimental design, and methodologies.
- Experience in clustering, segmentation, and other unsupervised learning techniques.
- Strong problem-solving skills and the ability to work with complex datasets and machine learning pipelines.
- Excellent communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
Preferred Skills:
- Experience with deep learning frameworks such as TensorFlow or PyTorch.
- Familiarity with data warehousing concepts and big data tools.
- Knowledge of MLOps practices, including model deployment, monitoring, and management.
- Experience with business intelligence tools and creating data-driven dashboards.
- Understanding of reinforcement learning, natural language processing (NLP), or other advanced AI techniques.
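As a worked example of the A/B testing responsibility above: a two-proportion z-test with statsmodels. The conversion counts are invented for illustration.

```python
# Sketch of an A/B test evaluation: two-proportion z-test on conversion counts.
from statsmodels.stats.proportion import proportions_ztest

conversions = [310, 355]      # variant A, variant B (made-up numbers)
impressions = [10000, 10000]  # users exposed to each variant

z_stat, p_value = proportions_ztest(count=conversions, nobs=impressions)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
# If p falls below the chosen significance level (e.g. 0.05), the difference
# in conversion rate is unlikely to be due to chance alone.
```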
Education: Bachelor of Engineering or similar degree from any reputed University.
Posted 1 week ago
5.0 - 9.0 years
12 - 16 Lacs
Mumbai, Gurugram, Bengaluru
Work from Office
Research and Problem-Solving: Identify and frame business problems, conduct exploratory data analysis, and propose innovative data science solutions tailored to business needs.
Leadership & Communication: Serve as a technical referent for the research team, driving high-impact, high-visibility initiatives. Effectively communicate complex scientific concepts to senior stakeholders, ensuring insights are actionable for both technical and non-technical audiences. Mentor and develop scientists within the team, fostering growth and technical excellence.
Algorithm Development: Design, optimize, and implement advanced machine learning algorithms, including neural networks, ensemble models (XGBoost, random forests), and clustering techniques (a minimal segmentation sketch follows this posting).
End-to-End Project Ownership: Lead the development, deployment, and monitoring of machine learning models and data pipelines for large-scale applications.
Model Optimization and Scalability: Focus on optimizing algorithms for performance and scalability, ensuring robust, well-calibrated models suitable for real-time environments.
A/B Testing and Validation: Design and execute experiments, including A/B testing, to validate model effectiveness and business impact.
Big Data Handling: Leverage tools like BigQuery, advanced SQL, and cloud platforms (e.g., GCP) to process and analyze large datasets.
Collaboration and Mentorship: Work closely with engineering, product, and campaign management teams, while mentoring junior data scientists in best practices and advanced techniques.
Data Visualization: Create impactful visualizations using tools like matplotlib, seaborn, Looker, and Grafana to communicate insights effectively to stakeholders.
Required Experience/Skills:
- 5–8 years of hands-on experience in data science or machine learning roles.
- 2+ years leading data science projects in AdTech.
- Strong hands-on skills in advanced statistics, machine learning, and deep learning.
- Demonstrated ability to implement and optimize neural networks and other advanced ML models.
- Proficiency in Python for developing machine learning models, with a strong grasp of TensorFlow or PyTorch.
- Expertise handling large datasets using advanced SQL and big data tools like BigQuery.
- In-depth knowledge of MLOps pipelines, from data preprocessing to deployment and monitoring.
- Strong background in A/B testing, statistical analysis, and experimental design.
- Proven capability in clustering, segmentation, and unsupervised learning methods.
- Strong problem-solving and analytical skills with a focus on delivering business value.
Education: A Master's in Data Science, Computer Science, Mathematics, Statistics, or a related field is preferred. A Bachelor's degree with exceptional experience will also be considered.
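A minimal sketch of the clustering and segmentation work described above, using scikit-learn on synthetic data; the feature set and cluster count are assumptions.

```python
# Segmentation sketch: standardize features, fit k-means, inspect cluster sizes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))  # stand-in for user-level features

X_scaled = StandardScaler().fit_transform(X)  # scale so no feature dominates
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X_scaled)

labels, counts = np.unique(kmeans.labels_, return_counts=True)
for label, count in zip(labels, counts):
    print(f"segment {label}: {count} users")
```

In practice the cluster count would be chosen with silhouette scores or business constraints rather than fixed up front.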
Posted 1 week ago
2.0 - 7.0 years
3 - 7 Lacs
Ahmedabad
Work from Office
To help us build functional systems that improve customer experience, we are now looking for an experienced DevOps Engineer. They will be responsible for deploying product updates, identifying production issues, and implementing integrations that meet our customers' needs. If you have a solid background in software engineering and are familiar with Ruby or Python, we'd love to speak with you.
Responsibilities:
- Work with development teams to ideate software solutions.
- Build and set up new development tools and infrastructure.
- Work on ways to automate and improve development and release processes.
- Ensure that systems are safe and secure against cybersecurity threats.
- Deploy updates and fixes (a post-deploy health-gate sketch follows this posting).
- Perform root cause analysis for production errors.
- Develop scripts to automate infrastructure provisioning.
- Work with software developers and software engineers to ensure that development follows established processes and works as intended.
Technologies we use:
- GitOps: GitHub, GitLab, BitBucket
- CI/CD: Jenkins, CircleCI, Travis CI, TeamCity, Azure DevOps
- Containerization: Docker, Swarm, Kubernetes
- Provisioning: Terraform
- CloudOps: Azure, AWS, GCP
- Observability: Prometheus, Grafana, GrayLog, ELK
Qualifications:
- Graduate/postgraduate in a technology discipline.
- Proven experience as a DevOps Engineer or in a similar role.
- Effective communication and teamwork skills.
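One recurring script in the deploy-and-fix loop above is a post-deployment health gate. A hedged Python sketch; the endpoint URL, retry budget, and delay are assumptions.

```python
# Post-deploy verification: poll a health endpoint and fail the CI job
# (non-zero exit) if the service never reports healthy.
import sys
import time

import requests

HEALTH_URL = "https://example.internal/healthz"  # hypothetical endpoint

def wait_for_healthy(retries: int = 30, delay: float = 5.0) -> bool:
    for _ in range(retries):
        try:
            if requests.get(HEALTH_URL, timeout=3).status_code == 200:
                return True
        except requests.RequestException:
            pass  # service may still be starting; retry after the delay
        time.sleep(delay)
    return False

if __name__ == "__main__":
    sys.exit(0 if wait_for_healthy() else 1)
```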
Posted 1 week ago
6.0 - 10.0 years
11 - 12 Lacs
Hyderabad
Work from Office
We are seeking a highly skilled DevOps Engineer to join our dynamic development team. In this role, you will be responsible for designing, developing, and maintaining both frontend and backend components of our applications using DevOps practices and associated technologies. You will collaborate with cross-functional teams to deliver robust, scalable, and high-performing software solutions that meet our business needs. The ideal candidate will have a strong background in DevOps, experience with modern frontend frameworks, and a passion for full-stack development.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 6 to 10+ years of experience in full-stack development, with a strong focus on DevOps.
DevOps with AWS Data Engineer - Roles & Responsibilities:
- Use AWS services like EC2, VPC, S3, IAM, RDS, and Route 53.
- Automate infrastructure using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation.
- Build and maintain CI/CD pipelines using tools such as AWS CodePipeline, Jenkins, and GitLab CI/CD.
- Automate build, test, and deployment processes for Java applications.
- Use Ansible, Chef, or AWS Systems Manager for managing configurations across environments.
- Containerize Java apps using Docker; deploy and manage containers using Amazon ECS, EKS (Kubernetes), or Fargate.
- Monitoring and logging using Amazon CloudWatch, Prometheus + Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), and AWS X-Ray for distributed tracing.
- Manage access with IAM roles/policies; use AWS Secrets Manager / Parameter Store for managing credentials.
- Enforce security best practices, encryption, and audits.
- Automate backups for databases and services using AWS Backup, RDS snapshots, and S3 lifecycle rules (see the snapshot sketch below); implement Disaster Recovery (DR) strategies.
- Work closely with development teams to integrate DevOps practices (cross-functional collaboration).
- Document pipelines, architecture, and troubleshooting runbooks.
- Monitor and optimize AWS resource usage using AWS Cost Explorer, Budgets, and Savings Plans.
Must-Have Skills:
- Experience working on Linux-based infrastructure.
- Excellent understanding of Ruby, Python, Perl, and Java.
- Configuring and managing databases such as MySQL and MongoDB.
- Excellent troubleshooting skills.
- Selecting and deploying appropriate CI/CD tools.
- Working knowledge of various tools, open-source technologies, and cloud services.
- Awareness of critical concepts in DevOps and Agile principles.
- Managing stakeholders and external interfaces.
- Setting up tools and required infrastructure.
- Defining and setting development, testing, release, update, and support processes for DevOps operation.
- Technical skills to review, verify, and validate the software code developed in the project.
Interview mode: face-to-face for candidates residing in Hyderabad; Zoom for other states.
Location: 43/A, MLA Colony, Road No. 12, Banjara Hills, 500034. Time: 2-4 pm.
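As a small illustration of the backup automation mentioned above, a hedged boto3 sketch that takes a timestamped manual RDS snapshot; the instance identifier is a placeholder.

```python
# Illustrative backup step: take a manual RDS snapshot with a timestamped name.
# Requires boto3 and AWS credentials; the instance identifier is hypothetical.
import datetime

import boto3

rds = boto3.client("rds")
instance_id = "prod-db"  # hypothetical DB instance identifier
stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d-%H%M%S")

response = rds.create_db_snapshot(
    DBInstanceIdentifier=instance_id,
    DBSnapshotIdentifier=f"{instance_id}-manual-{stamp}",
)
print(response["DBSnapshot"]["Status"])  # typically 'creating'
```

In production this would usually run on a schedule (EventBridge, cron) alongside AWS Backup policies rather than ad hoc.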
Posted 1 week ago
1.0 - 6.0 years
4 - 9 Lacs
Chennai
Work from Office
• Proficient in using DataDog, Grafana, and Nagios for monitoring and analysis
• Experience in incident management and resolution
• Knowledge of AWS and Azure services and architecture
• Generate and analyse monitoring reports to provide insights
Required Candidate profile
• Understanding of DevOps principles & practices
• Scripting languages
• Develop & maintain automation scripts to streamline monitoring processes
• Detect, analyse, & resolve Cloud/Infrastructure issues
Posted 1 week ago
3.0 - 8.0 years
1 - 4 Lacs
Chandigarh
Work from Office
Opportunity: We are seeking a highly skilled and experienced AI Infrastructure Engineer (or MLOps Engineer) to design, build, and maintain the robust and scalable AI/ML platforms that power our cutting-edge asset allocation strategies. In this critical role, you will be instrumental in enabling our AI Researchers and Quantitative Developers to efficiently develop, deploy, and monitor machine learning models in a high-performance, secure, and regulated financial environment. You will bridge the gap between research and production, ensuring our AI initiatives run smoothly and effectively.
Responsibilities:
- Platform Design & Development: Architect, implement, and maintain the end-to-end AI/ML infrastructure, including data pipelines, feature stores, model training environments, inference serving platforms, and monitoring systems.
- Environment Setup & Management: Configure and optimize AI/ML development and production environments, ensuring access to necessary compute resources (CPUs, GPUs), software libraries, and data.
- MLOps Best Practices: Implement and advocate for MLOps best practices, including version control for models and data, automated testing, continuous integration/continuous deployment (CI/CD) pipelines for ML models, and robust model monitoring.
- Resource Optimization: Manage and optimize cloud computing resources (AWS, Azure, GCP, or on-premise) for cost-efficiency and performance, specifically for AI/ML workloads.
- Data Management: Collaborate with data engineers to ensure seamless ingestion, storage, and accessibility of high-quality financial and alternative datasets for AI/ML research and production.
- Tooling & Automation: Select, implement, and integrate MLOps tools and platforms (e.g., Kubeflow, MLflow, SageMaker, DataRobot, Vertex AI, Airflow, Jenkins, GitLab CI/CD) to streamline the ML lifecycle (a minimal tracking sketch follows this posting).
- Security & Compliance: Ensure that all AI/ML infrastructure and processes adhere to strict financial industry security standards, regulatory compliance, and data governance policies.
- Troubleshooting & Support: Provide expert support and troubleshooting for AI/ML infrastructure issues, resolving bottlenecks and ensuring system stability.
- Collaboration: Work closely with AI Researchers, Data Scientists, Software Engineers, and DevOps teams to translate research prototypes into scalable production systems.
- Documentation: Create and maintain comprehensive documentation for all AI/ML infrastructure components, processes, and best practices.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field.
- 3+ years of experience in a dedicated MLOps, AI Infrastructure, DevOps, or Site Reliability Engineering role, preferably in the financial services industry.
- Proven experience in designing, building, and maintaining scalable data and AI/ML pipelines and platforms.
- Strong proficiency in cloud platforms (AWS, Azure, GCP), including services relevant to AI/ML (e.g., EC2, S3, SageMaker, Lambda, Azure ML, Google AI Platform).
- Expertise in containerization technologies (Docker) and orchestration platforms (Kubernetes).
- Solid understanding of CI/CD principles and tools (Jenkins, GitLab CI/CD, CircleCI, Azure DevOps).
- Proficiency in scripting languages like Python (preferred), Bash, or similar.
- Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation, Ansible).
- Familiarity with distributed computing frameworks (e.g., Spark, Dask) is a plus.
- Understanding of machine learning concepts and the ML lifecycle, even if not directly developing models.
Technical Skills:
- Deep knowledge of Linux/Unix operating systems.
- Strong understanding of networking, security, and database concepts.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with data warehousing and data lake concepts.
Preferred candidate profile:
- Exceptional problem-solving and debugging skills.
- Proactive and self-driven with a strong sense of ownership.
- Excellent communication and interpersonal skills; able to collaborate effectively with diverse teams.
- Ability to prioritize and manage multiple tasks in a fast-paced environment.
- A keen interest in applying technology to solve complex financial problems.
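A minimal MLflow tracking sketch in the spirit of the MLOps duties above; the tracking server URI, experiment name, and the toy model are assumptions for illustration.

```python
# MLflow tracking sketch: log parameters, a metric, and a model artifact
# for one training run. Server URI and experiment name are placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://localhost:5000")  # hypothetical MLflow server
mlflow.set_experiment("allocation-models")        # hypothetical experiment

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data

with mlflow.start_run():
    model = LogisticRegression(C=0.5, max_iter=200).fit(X, y)
    mlflow.log_param("C", 0.5)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")  # versioned artifact for deployment
```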
Posted 1 week ago
5.0 - 10.0 years
7 - 11 Lacs
Mumbai
Work from Office
We are looking for an experienced Senior Java Developer with a strong background in observability and telemetry to join our talented team. In this role, you will be responsible for designing, implementing, and maintaining robust and scalable solutions that enable us to gain deep insights into the performance, reliability, and health of our systems and applications.
WHAT'S IN IT FOR YOU:
- You will get a pivotal role in the project and associated incentives based on your contribution towards the project's success.
- Work on optimizing the performance of a platform handling data volumes in the range of 5-8 petabytes.
- An opportunity to collaborate and work with engineers from Google, AWS, and ELK.
- You will be enabled to take up a leadership role in the future and build your own team as you grow with the customer during the project engagement.
- Opportunity for advancement within the company, with clear paths for career progression based on performance and demonstrated capabilities.
- Be part of a company that values innovation and encourages experimentation, where your ideas are heard and your contributions are recognized and rewarded.
- Work in a zero micro-management culture where you get to enjoy accountability and ownership of your tasks.
RESPONSIBILITIES:
- Design, develop, and maintain Java-based microservices and applications with a focus on observability and telemetry.
- Implement best practices for instrumenting, collecting, analyzing, and visualizing telemetry data (metrics, logs, traces) to monitor and troubleshoot system behavior and performance.
- Collaborate with cross-functional teams to integrate observability solutions into the software development lifecycle, including CI/CD pipelines and automated testing frameworks.
- Drive improvements in system reliability, scalability, and performance through data-driven insights and continuous feedback loops.
- Stay up to date with emerging technologies and industry trends in observability, telemetry, and distributed systems to ensure our systems remain at the forefront of innovation.
- Mentor junior developers and provide technical guidance and expertise in observability and telemetry practices.
REQUIREMENTS:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of professional experience in software development with a strong focus on Java programming.
- Expertise in observability and telemetry tools and practices, including but not limited to Prometheus, Grafana, Jaeger, the ELK stack (Elasticsearch, Logstash, Kibana), and distributed tracing.
- Solid understanding of microservices architecture, containerization (Docker, Kubernetes), and cloud-native technologies (AWS, Azure, GCP).
- Proficiency in designing and implementing scalable, high-performance, and fault-tolerant systems.
- Strong analytical and problem-solving skills with a passion for troubleshooting complex issues.
- Excellent communication and collaboration skills with the ability to work effectively in a fast-paced, agile environment.
- Experience with Agile methodologies and DevOps practices is a plus.
Posted 1 week ago
5.0 - 10.0 years
7 - 11 Lacs
Ahmedabad
Work from Office
We are looking for an experienced Senior Java Developer with a strong background in observability and telemetry to join our talented team. In this role, you will be responsible for designing, implementing, and maintaining robust and scalable solutions that enable us to gain deep insights into the performance, reliability, and health of our systems and applications.
WHAT'S IN IT FOR YOU:
- You will get a pivotal role in the project and associated incentives based on your contribution towards the project's success.
- Work on optimizing the performance of a platform handling data volumes in the range of 5-8 petabytes.
- An opportunity to collaborate and work with engineers from Google, AWS, and ELK.
- You will be enabled to take up a leadership role in the future and build your own team as you grow with the customer during the project engagement.
- Opportunity for advancement within the company, with clear paths for career progression based on performance and demonstrated capabilities.
- Be part of a company that values innovation and encourages experimentation, where your ideas are heard and your contributions are recognized and rewarded.
- Work in a zero micro-management culture where you get to enjoy accountability and ownership of your tasks.
RESPONSIBILITIES:
- Design, develop, and maintain Java-based microservices and applications with a focus on observability and telemetry.
- Implement best practices for instrumenting, collecting, analyzing, and visualizing telemetry data (metrics, logs, traces) to monitor and troubleshoot system behavior and performance.
- Collaborate with cross-functional teams to integrate observability solutions into the software development lifecycle, including CI/CD pipelines and automated testing frameworks.
- Drive improvements in system reliability, scalability, and performance through data-driven insights and continuous feedback loops.
- Stay up to date with emerging technologies and industry trends in observability, telemetry, and distributed systems to ensure our systems remain at the forefront of innovation.
- Mentor junior developers and provide technical guidance and expertise in observability and telemetry practices.
REQUIREMENTS:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of professional experience in software development with a strong focus on Java programming.
- Expertise in observability and telemetry tools and practices, including but not limited to Prometheus, Grafana, Jaeger, the ELK stack (Elasticsearch, Logstash, Kibana), and distributed tracing.
- Solid understanding of microservices architecture, containerization (Docker, Kubernetes), and cloud-native technologies (AWS, Azure, GCP).
- Proficiency in designing and implementing scalable, high-performance, and fault-tolerant systems.
- Strong analytical and problem-solving skills with a passion for troubleshooting complex issues.
- Excellent communication and collaboration skills with the ability to work effectively in a fast-paced, agile environment.
- Experience with Agile methodologies and DevOps practices is a plus.
Posted 1 week ago
3.0 - 6.0 years
4 - 8 Lacs
Bengaluru
Work from Office
We are looking for a Kibana Subject Matter Expert (SME) to support our Network Operations Center (NOC) by designing, developing, and maintaining real-time dashboards and alerting mechanisms. The ideal candidate will have strong experience working with Elasticsearch and Kibana to visualize key performance indicators (KPIs), system health, and alerts related to NOC-managed infrastructure.
Key Responsibilities:
- Design and develop dynamic and interactive Kibana dashboards tailored for NOC monitoring.
- Integrate various NOC elements such as network devices, servers, applications, and services into Elasticsearch/Kibana.
- Create real-time visualizations and trend reports for system health, uptime, traffic, errors, and performance metrics.
- Configure alerts and anomaly detection mechanisms for critical infrastructure issues using Kibana or related tools (e.g., ElastAlert, Watcher).
- Collaborate with NOC engineers, infrastructure teams, and DevOps to understand monitoring requirements and deliver customized dashboards.
- Optimize Elasticsearch queries and index mappings for performance and data integrity.
- Provide expert guidance on best practices for log ingestion, parsing, and data retention strategies.
- Support troubleshooting and incident response efforts by providing actionable insights through Kibana visualizations.
Primary Skills:
- Proven experience as a Kibana SME or in a similar role with a focus on dashboards and alerting.
- Strong hands-on experience with Elasticsearch and Kibana (7.x or higher).
- Experience with log ingestion tools (e.g., Logstash, Beats, Fluentd).
- Solid understanding of NOC operations and common infrastructure elements (routers, switches, firewalls, servers, etc.).
- Proficiency in JSON, Elasticsearch Query DSL, and Kibana scripting for advanced visualizations (see the query sketch below).
- Familiarity with alerting frameworks such as ElastAlert, Kibana Alerting, or Watcher.
- Good understanding of Linux-based systems and networking fundamentals.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
Preferred Qualifications:
- Experience working within telecom, ISP, or large-scale IT operations environments.
- Exposure to Grafana, Prometheus, or other monitoring and visualization tools.
- Knowledge of scripting languages such as Python or Shell for automation.
- Familiarity with SIEM or security monitoring solutions.
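To make the Query DSL requirement concrete, a hedged sketch with the official Python Elasticsearch client that counts recent error logs per host; the index pattern and field names assume an ECS-style log schema and will differ per deployment.

```python
# Query DSL sketch: count error-level log lines per host over the last
# 15 minutes. Index pattern and field names are schema assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

resp = es.search(
    index="logs-*",
    size=0,  # aggregations only, no hits
    query={
        "bool": {
            "filter": [
                {"term": {"level": "error"}},
                {"range": {"@timestamp": {"gte": "now-15m"}}},
            ]
        }
    },
    aggs={"by_host": {"terms": {"field": "host.name", "size": 10}}},
)
for bucket in resp["aggregations"]["by_host"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```

The same aggregation, pasted into a Kibana Lens or TSVB panel, becomes a per-host error-rate dashboard widget.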
Posted 1 week ago
5.0 - 10.0 years
7 - 14 Lacs
Hyderabad
Work from Office
Responsibilities
As our Senior Quality Assurance Engineer, you embrace the following responsibilities:
- Take ownership and responsibility for the design and development of all aspects of testing.
- Work on acceptance criteria and test scenarios with the Product Owner and development team.
- Design, execute, and maintain test scenarios and automation capabilities for all test levels and types (e.g., automated, regression, exploratory).
- Create and optimize test frameworks and integrate them into deployment pipelines.
- Participate in the code review process for both production and test code to ensure all critical cases are covered.
- Monitor test runs, application errors, and performance.
- Keep information flowing and the team informed; be a stakeholder in releases and defect tracking.
- Promote and coach the team towards a quality-focused mindset.
- Influence and lead the team towards continuous improvement and best testing practices.
- Be the reference for the QA Center of Practice, promoting its practices, influencing its strategy, and bringing your team experience into its plan.
Technical and Professional Requirements:
As a Senior Quality Assurance Engineer, you must be able to provide among these:
- Ability to work in an autonomous, self-responsible, and self-organized way.
- 6+ years of experience in software testing, manual and automated.
- Strong experience working with modern test automation frameworks and tools (Cypress, Playwright, Jest, React testing libraries).
- Strong experience in different testing practices (from unit to load to endurance to cross-platform), specifically integrated within CI/CD.
- Experience in continuous testing practices in production by leveraging bots and virtual users.
- Experience working with CI/CD pipelines and monitoring tools (e.g., Jenkins, TeamCity, Kibana, Grafana).
- Knowledge of API testing, the REST protocol, and microservice architecture concepts; Postman, AWS.
- Able to communicate effectively in English.
- Comfortable developing test automation frameworks from scratch and maintaining existing frameworks.
- Knowledge of software testing theory.
Preferred Skills: Technology->Automated Testing->Automated Testing - ALL->API Testing
Additional Responsibilities:
These are some of the technologies/frameworks/practices we use:
- NodeJS with TypeScript
- React and NextJS
- Contentful CMS
- Optimizely experimentation platform
- Micro-services, event streams, and file exchange
- CI/CD with Jenkins pipelines
- AWS and Terraform
- InfluxDB, Grafana, Sensu, ELK stack
- Infrastructure as code, one-click deployment
- Docker, Kubernetes
- Amazon Web Services and cloud deployments (S3, SNS, SQS, RDS, DynamoDB, etc.), using tools such as Terraform or AWS CLI
- Git, Scrum, Pair Programming, Peer Reviewing
Educational Requirements: Bachelor of Engineering
Service Line: Infosys Quality Engineering
* Location of posting is subject to business requirements
Posted 1 week ago
4.0 - 9.0 years
9 - 14 Lacs
Bengaluru
Work from Office
Primary Skills
- Strong hands-on experience with observability tools like AppDynamics, Dynatrace, Prometheus, Grafana, and the ELK Stack.
- Proficient in AppDynamics setup, including installation, configuration, monitor creation, and integration with ServiceNow, email, and Teams (see the webhook sketch below).
- Ability to design and implement monitoring solutions for logs, traces, telemetry, and KPIs.
- Skilled in creating dashboards and alerts for application and infrastructure monitoring.
- Experience with AppDynamics features such as NPM, RUM, and synthetic monitoring.
- Familiarity with AWS and Kubernetes, especially in the context of observability.
- Scripting knowledge in Python or Bash for automation and tool integration.
- Understanding of ITIL processes and APM support activities.
- Good grasp of non-functional requirements like performance, capacity, and security.
Secondary Skills
- AppDynamics Performance Analyst or Implementation Professional certification.
- Experience with other APM tools like New Relic, Datadog, or Splunk.
- Exposure to CI/CD pipelines and integration of monitoring into DevOps workflows.
- Familiarity with infrastructure-as-code tools like Terraform or Ansible.
- Understanding of network protocols and troubleshooting techniques.
- Experience in performance tuning and capacity planning.
- Knowledge of compliance and audit requirements related to monitoring and logging.
- Ability to work in Agile/Scrum environments and contribute to sprint planning from an observability perspective.
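One concrete form the alert-integration work can take is a small webhook bridge. A hedged sketch, assuming Flask and an Alertmanager-style JSON payload; the Teams incoming-webhook URL is a placeholder, and real AppDynamics integrations would use its own HTTP action templates instead.

```python
# Tiny webhook bridge: accept Alertmanager-style POSTs and forward a one-line
# summary to a Microsoft Teams incoming webhook. URL is a placeholder.
import requests
from flask import Flask, request

app = Flask(__name__)
TEAMS_WEBHOOK = "https://example.webhook.office.com/placeholder"  # hypothetical

@app.route("/alerts", methods=["POST"])
def alerts():
    payload = request.get_json(force=True)
    for alert in payload.get("alerts", []):
        summary = alert.get("annotations", {}).get("summary", "alert fired")
        status = alert.get("status", "unknown")
        requests.post(TEAMS_WEBHOOK, json={"text": f"[{status}] {summary}"}, timeout=5)
    return "", 204  # acknowledge receipt with no body

if __name__ == "__main__":
    app.run(port=9000)
```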
Posted 1 week ago
2.0 - 7.0 years
3 - 7 Lacs
Bengaluru
Work from Office
A career in IBM Software means you'll be part of a team that transforms our customers' challenges into solutions. Seeking new possibilities and always staying curious, we are a team dedicated to creating the world's leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career. We are seeking a skilled back-end developer to join our IBM Software team. As part of our team, you will be responsible for developing and maintaining high-quality software products, working with a variety of technologies and programming languages. IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
Your role and responsibilities
At IBM, we are driven to shift our technology to an as-a-service model and to help our clients transform themselves to take full advantage of the cloud. With industry leadership in AI, analytics, security, commerce, and quantum computing, and with unmatched hardware and software design and industrial research capabilities, no other company is as well positioned to address the full opportunity of enterprise cloud computing. We are looking for a backend developer to join our IBM Cloud VPC Observability team. This team is part of the IBM Cloud VPC Service, dedicated to ensuring that the IBM Cloud is at the forefront of reliable enterprise cloud technology. We are building observability platforms to deliver performance, reliability, and predictability for our customers' most demanding workloads, at global scale and with leadership efficiency, resiliency, and security. In this role, you will be responsible for producing and enhancing features that collect, transform, and surface data on the various components of our cloud. The ability to take in requirements on an agile basis and work autonomously with a high-level perspective is a must. You understand cloud-native concepts and have experience with highly tunable and scalable Kubernetes-based cloud deployments. You will participate in the design of the service, writing tools and automation, building containers, developing tests, determining monitoring best practices, and handling complex escalations. If you are the kind of person who is collaborative, able to handle responsibility, and enjoys not only sharing a vision but getting your hands dirty to be sure that the vision is made a reality in a fast-paced, challenging environment, then we want to talk to you!
Required education
Bachelor's Degree
Required technical and professional expertise
- Bachelor's in Engineering, Computer Science, or relevant experience.
- 2+ years of experience and expertise in programming in at least one language: Python/Go/Node.js.
- 1+ years of experience in developing and deploying applications on Kubernetes and containerization technologies like Docker.
- 2+ years of familiarity with working in a CI/CD environment.
- 2+ years of experience with developing and operating highly available, distributed applications in production environments on Kubernetes.
- Experience with building automated tests and handling customer escalations.
- 1+ years of experience with managing service dependencies via Terraform or Ansible.
- At least 2 years of experience coding and troubleshooting applications written in Go, Python, Node.js, or Express.js.
- 1+ years of experience operating with secure principles.
- At least 3 years of experience with microservice development.
- At least 1 year of experience with NoSQL database systems such as MongoDB.
- At least 1 year of experience operating, configuring, and developing with caching systems like Redis (see the read-through cache sketch below).
- Proven understanding of REST principles and architecture.
- Familiarity with working with cloud services (IBM Cloud, GCP, AWS, Azure).
Preferred technical and professional experience
- Advanced experience with Kubernetes.
- Experience with development on PostgreSQL, Kafka, Elastic, MySQL, Redis, or MongoDB.
- 2+ years of experience managing Linux machines using configuration management (e.g., Chef, Puppet, Ansible); Debian experience is preferred.
- 2+ years of experience automating with scripting languages like Python or Shell.
- Experience with troubleshooting, using, and configuring Linux systems.
- 2+ years of experience with infrastructure automation.
- 2+ years of experience using monitoring tooling like Grafana and Prometheus.
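A minimal read-through cache sketch with redis-py, in line with the caching experience the role asks for; the connection details, key layout, and TTL are assumptions.

```python
# Read-through cache: serve from Redis on a hit, otherwise fetch and cache.
# Host/port, key format, and the 5-minute TTL are illustrative assumptions.
import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def get_user(user_id: str) -> dict:
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit
    user = {"id": user_id, "name": "example"}  # stand-in for a database lookup
    r.setex(cache_key, 300, json.dumps(user))  # cache miss: store for 5 minutes
    return user

print(get_user("42"))
```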
Posted 1 week ago
6.0 - 9.0 years
14 - 24 Lacs
Bengaluru
Hybrid
Role & responsibilities: Site Reliability Engineering
Preferred candidate profile:
* GitLab setup & administration
* Implement best practices to improve pipeline performance
* AWS with Terraform coding
* Linux administration & troubleshooting
* Strong coding skills in any language (preferably Python)
* Familiar with container technologies (Docker / Kubernetes)
* Good knowledge of infrastructure and application monitoring (Prometheus / Grafana / CloudWatch)
Posted 1 week ago
5.0 - 8.0 years
10 - 20 Lacs
Chennai
Hybrid
Experience - 5+ Years
Location - Chennai
Responsibilities
Direct Responsibilities
- Installing, upgrading, and maintaining all related software products and applications (Icinga, Grafana, Jenkins, InfluxDB, etc.).
- Maintaining the usage of Dynatrace, its interfaces, and related automations.
- Maintaining the overall monitoring configuration based on ServiceNow tickets.
- Maintaining the current installation in view of obsolescence, hotfixes, updates, and patches.
- Enhancing the current installation in view of future or business needs.
And/or:
- Installing, upgrading, and maintaining our Ansible-based application automation environment, integrated into our Rundeck solution.
- Implementing new roles based on needs and requirements from ourselves or from other IT or business teams.
- Troubleshooting errors arising from the regular patching and lifecycle process.
- Focusing on security-related topics such as product hardening, patching cycles, and vulnerability avoidance.
- Developing additionally needed Ansible plugins for our infrastructure, based on Python (see the module skeleton below).
- Integrating and maintaining Ansible-based roles for other kinds of infrastructure, such as F5 load balancers and monitoring.
Contributing Responsibilities
- Providing support for the teams that work with the monitoring infrastructure, i.e., ITO Pilotage (L1/L2) for monitoring issues, alerts, and notifications; Application Engineering for developing new plugins or enhancing existing ones; and application development teams needing background information on application monitoring or help with their pipelines and workflows.
- Liaising with the group leads responsible for the cloud monitoring infrastructure to align on prerequisites, and with the team(s) responsible for providing and running the application automation infrastructure and its components.
Technical & Behavioral Competencies
- Good IT knowledge in general; Linux knowledge in particular is essential.
- Advanced Linux operating systems (esp. RedHat Linux) administration knowledge.
- Good experience in scripting/automation based on Bash, Python, Perl, Ansible, Rundeck, and Jenkins.
- Good troubleshooting skills and a logical way of thinking.
- Good knowledge of and experience with different types of monitoring applications/environments, such as Xymon, Icinga, Grafana, InfluxDB, and Dynatrace.
- Good knowledge of and experience in designing and developing Ansible roles and corresponding Python modules.
- Good knowledge of other infrastructure components, such as F5 load balancers, firewalls, and networks.
- Base knowledge of cloud architecture and principles (in order to support the move-to-cloud project).
Specific Qualifications (if required)
- Successfully completed studies in computer science or business informatics, or comparable knowledge and experience acquired through practical and IT training.
- Good knowledge of and experience in Jenkins automation.
- Good knowledge of and experience in Ansible, Rundeck, and GitLab automation.
- Several years of professional experience in IT application management or in the support of complex IT applications.
- Advanced skills in demand management and software engineering.
- Working with GitLab as a source code versioning system.
- Advanced linguistic proficiency in English.
- Expert knowledge in cloud architecture (private, public, and hybrid cloud).
- Analytical skills.
- Fundamental database technology knowledge (Oracle, PostgreSQL, etc.).
- Very good German and English language skills, spoken and written.
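The skeleton of a custom Ansible module in Python, of the kind the role mentions developing; the module name, arguments, and URL check are illustrative. Dropped into a role's `library/` directory, it could be called from a playbook task like any built-in module.

```python
# Hypothetical custom Ansible module (e.g., library/ping_url.py): check that a
# URL responds, reporting the status code back to the playbook.
from urllib.request import urlopen

from ansible.module_utils.basic import AnsibleModule

def main():
    module = AnsibleModule(
        argument_spec=dict(
            url=dict(type="str", required=True),
            timeout=dict(type="int", default=5),
        ),
        supports_check_mode=True,  # read-only check, safe in check mode
    )
    try:
        code = urlopen(module.params["url"], timeout=module.params["timeout"]).getcode()
    except Exception as exc:  # report any failure back to Ansible
        module.fail_json(msg=f"request failed: {exc}")
    module.exit_json(changed=False, status_code=code)

if __name__ == "__main__":
    main()
```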
Posted 1 week ago
5.0 - 10.0 years
7 - 12 Lacs
Bengaluru
Hybrid
Position Overview: We are seeking a Senior Software Engineer to help drive our build, release, and testing infrastructure to the next level. You will focus on scaling and optimizing our systems for large-scale, high-performance deployments, reducing build times from days to mere minutes while maintaining high-quality releases. As part of our collaborative, fast-paced engineering team, you will play a pivotal role in delivering tools and processes that support continuous delivery, test-driven development, and agile methodologies.
Key Responsibilities:
- Automation & Tooling Development: Build, maintain, and improve our automated build, release, and testing infrastructure. Your focus will be on developing tools and scripts that automate our deployment pipeline, enabling a seamless and efficient continuous delivery process.
- Cross-functional Collaboration: Collaborate closely with development, QA, and SRE teams to ensure our build infrastructure meets the needs of all teams. Work with teams across the organization to create new tools, processes, and technologies that will streamline and enhance our delivery pipeline.
- Innovative Technology Integration: Stay on top of the latest advancements in cloud technology, automation, and infrastructure tools. You'll have the opportunity to experiment with and recommend new technologies, including AWS services, to enhance our CI/CD system.
- Scaling Infrastructure: Work on scaling our infrastructure to meet the demands of running thousands of automated tests for every commit. Help us reduce compute time from days to minutes, addressing scalability and performance challenges as we grow.
- Continuous Improvement & Feedback Loops: Be a champion for continuous improvement by collecting feedback from internal customers, monitoring the adoption of new tools, and fine-tuning processes to maximize efficiency, stability, and overall satisfaction.
- Process & Project Ownership: Lead the rollout of new tools and processes, from initial development through to full implementation. You'll be responsible for ensuring smooth adoption and delivering value to internal teams.
Required Qualifications:
- 5+ years of experience in software development with strong proficiency in at least one of the following languages: Python, Go, Java, or JavaScript.
- Deep understanding of application development, microservices architecture, and the elements that drive a successful multi-service ecosystem. Familiarity with building and deploying scalable services is essential.
- Strong automation skills: experience scripting and building tools for automation in the context of continuous integration and deployment pipelines.
- Cloud infrastructure expertise: hands-on experience with AWS services (e.g., EC2, S3, Lambda, RDS) and Kubernetes or containerized environments.
- Familiarity with containerization: strong understanding of Docker and container orchestration, with a particular focus on cloud-native technologies.
- Problem-solving mindset: ability to identify, troubleshoot, and resolve technical challenges, particularly in large-scale systems.
- Agile experience: familiarity with Agile methodologies and the ability to collaborate effectively within cross-functional teams to deliver on time and with high quality.
- Collaboration skills: ability to communicate complex technical concepts to both technical and non-technical stakeholders. Strong team-oriented mindset with a focus on delivering value through collaboration.
- Bachelor's degree in Computer Science or a related field, or equivalent professional experience.
Preferred Qualifications:
- Experience with Kubernetes (K8s): in-depth knowledge of Kubernetes architecture and operational experience managing Kubernetes clusters at scale.
- CI/CD expertise: solid experience working with CI/CD pipelines and tools (e.g., Terraform, Ansible, Spinnaker).
- Infrastructure-as-code experience: familiarity with Terraform, CloudFormation, or similar tools for automating cloud infrastructure deployments.
- Container orchestration & scaling: experience with Karpenter or other auto-scaling tools for Kubernetes.
- Monitoring & Logging: familiarity with tools such as Prometheus, Grafana, and CloudWatch for tracking infrastructure performance and debugging production issues.
Posted 1 week ago
10.0 - 15.0 years
12 - 17 Lacs
Bengaluru
Work from Office
We are looking for an experienced Staff BT Site Reliability Engineer to join our Business Technology team to build, improve, and maintain our cloud platform services. The Site Reliability Engineering team builds foundational back-end infrastructure services and tooling for Okta's corporate teams. We enable teams to build infrastructure at scale and automate their software reliably and predictably. SREs are team players and innovators who build and operate technology using best practices and an agile mindset. We are looking for a smart, innovative, and passionate engineer for this role, someone who is interested in designing and implementing complex cloud-based infrastructure. This is a lean and agile team, and the ideal candidate welcomes the challenge of building in a dynamic and ever-changing environment. They enjoy seeing their designs run at scale with automation, testing, and an excellent operational mindset. If you exemplify the ethic of "If you have to do something more than once, automate it," we want to hear from you!
Responsibilities
- Build and run development tools, pipelines, and infrastructure with a security-first mindset.
- Actively participate in Agile ceremonies; write stories and support team members through demos, knowledge sharing, and architecture sessions.
- Promote and apply best practices for building secure, scalable, and reliable cloud infrastructure.
- Develop and maintain technical documentation, network diagrams, runbooks, and procedures.
- Design, build, run, and monitor Okta's IT infrastructure and cloud services.
- Drive initiatives to evolve our current cloud platforms, increasing efficiency and keeping them in line with current security standards and best practices.
- Recommend, develop, implement, and manage appropriate policy, standards, processes, and procedural updates.
- Work with software engineers to ensure that development follows established processes and works as intended.
- Create and maintain centralized technical processes, including container and image management.
- Provide excellent customer service to our internal users and be an advocate for SRE services and DevOps practices.
Qualifications
- 10+ years of experience as an SRE, DevOps engineer, Systems Engineer, or equivalent.
- Demonstrated ability to develop complex applications for cloud infrastructure at scale and deliver projects on schedule and within budget.
- Proficient in managing AWS multi-account environments and AWS authentication and governance using the org management suite, including but not limited to AWS Organizations, AWS IAM, AWS Identity Center, and StackSets.
- Proficient with automating systems and infrastructure via Terraform.
- Proficient in developing applications running on AWS or other cloud infrastructure resources, including compute, storage, networking, and virtualization.
- Proficient with Git and building deployment pipelines using commercial tools, especially GitHub Actions.
- Proficient with developing tooling and automation using Python (see the multi-account sketch below).
- Proficient with AWS container-based workloads and concepts, especially EKS, ECS, and ECR.
- Experience with monitoring tools, especially Splunk, CloudWatch, and Grafana.
- Experience with reliability engineering concepts and security best practices on public cloud platforms.
- Experience with image creation and management, especially for container and EC2-based workloads.
- Experience with GitHub Actions Runner Controller self-hosted runners.
- Knowledgeable in Linux system administration.
- Knowledgeable of configuration management tools, such as Ansible and SSM.
- Good communication skills, with the ability to influence others and communicate complex technical concepts to different audiences.
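An illustrative multi-account housekeeping script in the spirit of the AWS Organizations work above: list all active accounts in the org via a paginator. Assumes boto3 and credentials with `organizations:ListAccounts` permission in the management account.

```python
# List active accounts in an AWS Organization, paginating through results.
import boto3

org = boto3.client("organizations")
paginator = org.get_paginator("list_accounts")

for page in paginator.paginate():
    for account in page["Accounts"]:
        if account["Status"] == "ACTIVE":
            print(account["Id"], account["Name"])
```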
Posted 1 week ago
5.0 - 10.0 years
7 - 12 Lacs
Bengaluru
Work from Office
What You'll Be Doing:
- Lead and implement secure, scalable Kubernetes clusters across on-prem, hybrid, and cloud environments.
- Integrate security throughout the cluster lifecycle (design to production) with network policies, RBAC, Pod Security Policies, and encryption.
- Work with development teams to enforce secure containerization practices and integrate security tools into CI/CD pipelines.
- Implement secure networking and service meshes (Istio, Linkerd), including mutual TLS for secure service-to-service communication.
- Secure CI/CD pipelines with automated security checks (code scanning, vulnerability assessments, configuration checks).
- Automate Kubernetes infrastructure provisioning with IaC tools (Terraform, CloudFormation, Ansible), embedding security best practices.
- Enhance automation workflows for patching, vulnerability assessments, and incident response.
- Implement observability strategies with Prometheus, Grafana, the ELK Stack, and Loki for monitoring health, logging, performance, and security.
- Ensure security events are logged, monitored, and proactively mitigated.
- Participate in incident response, on-call rotations, root cause analysis, and post-incident reviews to refine security protocols.
- Define, document, and enforce Kubernetes security best practices and policies (a small audit sketch follows this posting).
What You'll Bring to the Role:
- Strong knowledge of Kubernetes, ECS, and migrating applications to cloud-native environments, ensuring security at every stage.
- Experience designing secure identity management and access control solutions for Kubernetes, ECS, and cloud platforms.
- Experience migrating legacy applications to Kubernetes and ECS, optimizing for security and scalability.
- Skilled in managing and securing cloud identities and roles, and implementing RBAC in Kubernetes and ECS.
- Extensive experience securing and automating CI/CD pipelines with tools like Jenkins, GitLab CI, ArgoCD, and Spinnaker.
- Hands-on experience with container security using tools like Aqua Security and Twistlock, and runtime protection practices.
- In-depth understanding of service meshes like Istio and Linkerd, and securing communications with mutual TLS encryption.
- Expertise in using IaC tools like Terraform, CloudFormation, and Ansible for secure infrastructure automation.
- Skilled in using Prometheus, Grafana, and the ELK Stack for real-time monitoring and proactive incident detection.
- Experience managing incidents, troubleshooting, performing root cause analysis, and improving security protocols.
- Strong ability to collaborate with cross-functional teams and mentor junior engineers, promoting a security-first culture.
- Knowledge of managing secrets in Kubernetes using Vault, Secrets Manager, or Kubernetes Secrets.
Experience & Qualifications:
- 5+ years of experience managing large-scale, secure Kubernetes clusters, including architecture, security, and scalability.
- 5+ years of hands-on experience with ECS (Elastic Container Service) and migrating legacy monolithic applications to cloud-native environments (Kubernetes/ECS).
- 3+ years of experience in cloud security, including IAM (Identity and Access Management), role-based access control (RBAC), and secure identity management for cloud platforms and Kubernetes.
- 3+ years of experience automating CI/CD pipelines using tools such as Spinnaker, Jenkins, or ArgoCD, with an emphasis on integrating security throughout the process.
- Strong knowledge of service mesh technologies (Istio, Linkerd) and secure networking practices in Kubernetes environments, including mutual TLS encryption.
- Experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible, and the ability to automate infrastructure provisioning with a security-first approach.
- Proven experience implementing monitoring and observability solutions with Prometheus, Grafana, Loki, or similar tools to enhance security and detect incidents in real time.
- Strong problem-solving skills with hands-on experience in incident management, troubleshooting, and post-incident analysis.
- Excellent collaboration skills, with experience working cross-functionally with security engineers, developers, and DevOps teams to enforce security best practices and policies.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
- Certifications (preferred): CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), AWS Certified DevOps Engineer, or equivalent certifications in cloud and security domains.
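A hedged audit sketch for the Kubernetes security-policy duties above: list pods running containers flagged as privileged. Assumes the official `kubernetes` Python client and a working kubeconfig; a real policy engine (e.g., admission control) would block these rather than merely report them.

```python
# Audit sketch: flag pods whose containers run with privileged: true.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

for pod in v1.list_pod_for_all_namespaces().items:
    for container in pod.spec.containers:
        sec = container.security_context
        if sec is not None and sec.privileged:
            print(f"{pod.metadata.namespace}/{pod.metadata.name}: "
                  f"container '{container.name}' runs privileged")
```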
Posted 1 week ago
9.0 - 10.0 years
11 - 12 Lacs
Hyderabad
Work from Office
We are seeking a highly skilled DevOps Engineer to join our dynamic development team. In this role, you will be responsible for designing, developing, and maintaining both frontend and backend components of our applications using DevOps practices and associated technologies. You will collaborate with cross-functional teams to deliver robust, scalable, and high-performing software solutions that meet our business needs. The ideal candidate will have a strong background in DevOps, experience with modern frontend frameworks, and a passion for full-stack development.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 9 to 10+ years of experience in full-stack development, with a strong focus on DevOps.
DevOps with AWS Data Engineer - Roles & Responsibilities:
- Use AWS services like EC2, VPC, S3, IAM, RDS, and Route 53.
- Automate infrastructure using Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation.
- Build and maintain CI/CD pipelines using tools such as AWS CodePipeline, Jenkins, and GitLab CI/CD.
- Automate build, test, and deployment processes for Java applications.
- Use Ansible, Chef, or AWS Systems Manager for managing configurations across environments.
- Containerize Java apps using Docker; deploy and manage containers using Amazon ECS, EKS (Kubernetes), or Fargate.
- Monitoring and logging using Amazon CloudWatch, Prometheus + Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), and AWS X-Ray for distributed tracing.
- Manage access with IAM roles/policies; use AWS Secrets Manager / Parameter Store for managing credentials.
- Enforce security best practices, encryption, and audits.
- Automate backups for databases and services using AWS Backup, RDS snapshots, and S3 lifecycle rules; implement Disaster Recovery (DR) strategies.
- Work closely with development teams to integrate DevOps practices (cross-functional collaboration).
- Document pipelines, architecture, and troubleshooting runbooks.
- Monitor and optimize AWS resource usage using AWS Cost Explorer, Budgets, and Savings Plans.
Must-Have Skills:
- Experience working on Linux-based infrastructure.
- Excellent understanding of Ruby, Python, Perl, and Java.
- Configuring and managing databases such as MySQL and MongoDB.
- Excellent troubleshooting skills.
- Selecting and deploying appropriate CI/CD tools.
- Working knowledge of various tools, open-source technologies, and cloud services.
- Awareness of critical concepts in DevOps and Agile principles.
- Managing stakeholders and external interfaces.
- Setting up tools and required infrastructure.
- Defining and setting development, testing, release, update, and support processes for DevOps operation.
- Technical skills to review, verify, and validate the software code developed in the project.
Interview mode: face-to-face for candidates residing in Hyderabad; Zoom for other states.
Location: 43/A, MLA Colony, Road No. 12, Banjara Hills, 500034. Time: 2-4 pm.
Posted 1 week ago
6.0 - 8.0 years
13 - 18 Lacs
Gurugram
Work from Office
Responsibilities:
- Define and enforce SLOs, SLIs, and error budgets across microservices (see the worked example below)
- Architect an observability stack (metrics, logs, traces) and drive operational insights
- Automate toil and manual ops with robust tooling and runbooks
- Own the incident response lifecycle: detection, triage, RCA, and postmortems
- Collaborate with product teams to build fault-tolerant systems
- Champion performance tuning, capacity planning, and scalability testing
- Optimise costs while maintaining the reliability of cloud infrastructure
Must-have Skills:
- 6+ years in SRE/infrastructure/backend roles using cloud-native technologies
- 2+ years in an SRE-specific capacity
- Strong experience with monitoring/observability tools (Datadog, Prometheus, Grafana, ELK, etc.)
- Experience with infrastructure-as-code (Terraform/Ansible)
- Proficiency in Kubernetes, service mesh (Istio/Linkerd), and container orchestration
- Deep understanding of distributed systems, networking, and failure domains
- Expertise in automation with Python, Bash, or Go
- Proficient in incident management, SLAs/SLOs, and system tuning
- Hands-on experience with GCP (preferred)/AWS/Azure and cloud cost optimisation
- Participation in on-call rotations and running large-scale production systems
Nice-to-have Skills:
- Familiarity with chaos engineering practices and tools (Gremlin, Litmus)
- Background in performance testing and load simulation (Gatling, Locust, k6, JMeter)
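The first responsibility above has a simple arithmetic core; here is a worked example in Python translating an availability SLO into an error budget. The SLO value and measured downtime are illustrative numbers, not figures from the listing.

# Worked example: translate an SLO into an error budget over a 30-day window.
# The SLO and downtime figures below are illustrative.
SLO = 0.999            # 99.9% availability target
WINDOW_DAYS = 30

total_minutes = WINDOW_DAYS * 24 * 60
budget_minutes = total_minutes * (1 - SLO)  # unavailability the SLO tolerates
print(f"Error budget over {WINDOW_DAYS} days: {budget_minutes:.1f} minutes")
# -> about 43.2 minutes of downtime before the budget is exhausted

downtime_so_far = 12.0  # minutes of measured downtime, hypothetical
remaining = budget_minutes - downtime_so_far
print(f"Remaining budget: {remaining:.1f} min ({remaining / budget_minutes:.0%})")

Error budgets like this are what make "enforce" actionable: when the remaining budget approaches zero, release velocity slows until reliability is restored.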
Posted 1 week ago
6.0 - 10.0 years
15 - 25 Lacs
Gurugram, Bengaluru
Hybrid
What you will be doing:
The Site Reliability Engineer (SRE) operates and maintains production systems in the cloud. Their primary goal is to make sure the systems are up and running and provide the expected performance. This involves daily operations tasks of monitoring, deployment, and incident management, as well as strategic tasks like capacity planning, provisioning, and continuous improvement of processes. A major part of the role is also the design for reliability, scalability, and efficiency, and the automation of everyday system operations tasks. SREs work closely with technical support teams, application developers, and DevOps engineers, both on incident resolution and on the long-term evolution of systems. Employees will primarily work on creating Terraform, Shell, and Ansible scripts and will be part of application deployments using Azure Kubernetes Service. Employees will work with a cybersecurity client/company.
Monitor production systems' health, usage, and performance using dashboards and monitoring tools (see the sketch below).
Track provisioned resources, infrastructure, and their configuration.
Perform regular maintenance activities on databases, services, and infrastructure.
Respond to alerts and incidents: investigate, resolve, or dispatch according to SLAs.
Respond to emergencies: recover systems and restore services with minimal downtime.
Coordinate with customer success and engineering teams on incident resolution.
Perform postmortems after major incidents.
Change management: perform rollouts, rollbacks, patching, and configuration changes.
Drive demand forecasting and capacity planning with engineering and customer success teams, considering projected growth and demand spikes.
Provision production resources according to capacity demands.
Work with the engineering teams on the design and testing for reliability, scalability, performance, efficiency, and security.
Track resource utilization and cost-efficiency of production services.
What we're looking for:
BSc/MSc or B.Tech degree in STEM, 6+ years of relevant industry experience.
Technical skills: Terraform, Docker Swarm/K8s, Python, Unix/Linux shell scripting, DevOps, GitHub Actions, Azure Active Directory, Azure Monitor & Log Analytics.
Experience in integrating Grafana with Prometheus will be an added advantage.
Strong verbal and written communication skills.
Ability to perform on-call duties.
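As a small illustration of the everyday monitoring and automation described above, here is a sketch in Python that probes service health endpoints and reports failures. The service names and URLs are placeholders invented for the example.

# Minimal sketch: probe service health endpoints and report failures.
# Service names and URLs are hypothetical placeholders.
import requests

SERVICES = {
    "api-gateway": "https://example.internal/healthz",
    "auth-service": "https://example.internal/auth/healthz",
}

def probe(name: str, url: str, retries: int = 3) -> bool:
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, timeout=5)
            if resp.status_code == 200:
                return True
            print(f"{name}: HTTP {resp.status_code} (attempt {attempt}/{retries})")
        except requests.RequestException as exc:
            print(f"{name}: {exc} (attempt {attempt}/{retries})")
    return False

if __name__ == "__main__":
    failed = [name for name, url in SERVICES.items() if not probe(name, url)]
    if failed:
        print("UNHEALTHY:", ", ".join(failed))  # hand off to alerting/on-call
    else:
        print("all services healthy")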
Posted 1 week ago
8.0 - 12.0 years
35 - 50 Lacs
Bengaluru
Work from Office
Job Summary:
We are seeking a highly skilled Principal Infra Developer with 8 to 12 years of experience to join our team. The ideal candidate will have expertise in Splunk administration, SRE, Grafana, ELK, and Dynatrace AppMon. This hybrid role requires a proactive individual who can contribute to our infrastructure development projects and ensure the reliability and performance of our systems. The position does not require travel and operates during day shifts.
Responsibilities (Systems Engineer - Splunk or Elasticsearch Admin):
Build, deploy, and manage the enterprise Lucene-based logging systems (Splunk, Elastic) to ensure that the legacy physical/virtual systems and container infrastructure for business-critical services receive high-quality logging with high availability (see the health-audit sketch below).
Support periodic observability and infrastructure monitoring tool releases and upgrades, environment creation, and performance tuning of large-scale Prometheus systems.
Serve as DevOps SRE for the internal observability systems in Visa's various data centers across the globe, including cloud environments.
Lead the evaluation, selection, design, deployment, and advancement of the portfolio of tools used to provide infrastructure and service monitoring. Ensure the tools utilized can provide critical visibility into modern architectures leveraging technologies such as cloud, containers, etc.
Maintain, upgrade, and troubleshoot issues with Splunk clusters.
Monitor and audit configurations and participate in the Change Management process to ensure that unauthorized changes do not occur.
Manage patching and updates of Splunk hosts and/or Splunk application software.
Design, develop, recommend, and implement Splunk dashboards and alerts in support of the Incident Response team.
Ensure the monitoring team increases its use of automation and adopts a DevOps/SRE mentality.
Qualifications:
6+ years of experience with enterprise system logging and monitoring tools, with a desired 5+ years in relevant critical infrastructure of enterprise Splunk and Elasticsearch.
5+ years of working experience as a Splunk Administrator, covering cluster building, data ingestion management, user role management, and search configuration and optimization.
Strong knowledge of open-source logging and monitoring tools.
Experience with container logging and monitoring solutions.
Experience with Linux operating system management and administration.
Familiarity with LAN/WAN technologies and a clear understanding of basic network concepts and services.
Strong understanding of multi-tier application architectures and application runtime environments.
Monitoring the health and performance of the Splunk environment and troubleshooting any issues that arise.
Experience working in a 24x7 on-call environment.
Knowledge of Python and other scripting languages, and of infrastructure automation technologies such as Ansible, is desired.
Splunk Admin certification is a plus.
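To ground the cluster-administration duties above, here is a minimal sketch in Python using the official elasticsearch-py client to audit cluster health. The cluster URL is a placeholder, and the script assumes network access to the cluster.

# Minimal sketch: routine health audit of an Elasticsearch cluster using the
# official elasticsearch-py client. The URL is a placeholder.
from elasticsearch import Elasticsearch

def audit_cluster(url: str = "http://localhost:9200") -> None:
    es = Elasticsearch(url)
    health = es.cluster.health()
    status = health["status"]  # "green", "yellow", or "red"
    print(f"status: {status}, nodes: {health['number_of_nodes']}, "
          f"unassigned shards: {health['unassigned_shards']}")
    if status != "green":
        print("WARNING: cluster is degraded; check shard allocation and nodes")

if __name__ == "__main__":
    audit_cluster()

An equivalent check against Splunk would go through its REST management interface rather than a client library, but the audit pattern (poll, compare against a healthy baseline, alert on drift) is the same.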
Posted 1 week ago
5.0 - 10.0 years
7 - 12 Lacs
Hyderabad
Work from Office
Position Summary:
The F5 NGINX Business Unit is seeking a DevOps Software Engineer III based in India. As a DevOps engineer, you will be an integral part of a development team delivering high-quality features for exciting next-generation NGINX SaaS products. In this position, you will play a key role in building automation, standardization, operations support, and tools to implement and support world-class products; you will design, build, and maintain infrastructure, services, and tools used by our developers, testers, and CI/CD pipelines. You will champion efforts to improve reliability and efficiency in these environments and explore and lead efforts towards new strategies and architectures for pipeline services, infrastructure, and tooling. When necessary, you are comfortable wearing a developer hat to build a solution. You are passionate about automation and tools. You'll be expected to handle most development tasks independently, with minimal direct supervision.
Primary Responsibilities:
Collaborate with a globally distributed team to design, build, and maintain tools, services, and infrastructure that support product development, testing, and CI/CD pipelines for SaaS applications hosted on public cloud platforms.
Ensure DevOps infrastructure and services maintain the required level of availability, reliability, scalability, and performance.
Diagnose and resolve complex operational challenges involving network, security, and web technologies. This includes troubleshooting problems with HTTP load balancers, API gateways (e.g., NGINX proxies), and related systems (see the sketch below).
Take part in product support, bug triaging, and bug-fixing activities on a rotating schedule to ensure the SaaS service meets its SLA commitments.
Consistently apply forward-thinking concepts relating to automation and CI/CD processes.
Skills:
Experience with deploying infrastructure and services in one or more cloud environments such as AWS, Azure, or Google Cloud.
Experience with configuration management and deployment automation tools, such as Terraform, Ansible, and Packer.
Experience with observability platforms like Grafana, Elastic Stack, etc.
Experience with source control and CI/CD tools like git, GitLab CI, GitHub Actions, AWS CodePipeline, etc.
Proficiency in scripting languages such as Python and Bash.
Solid understanding of Unix OS.
Familiarity or experience with container orchestration technologies such as Docker and Kubernetes.
Good understanding of computer networking (e.g., DNS, DHCP, TCP, IPv4/v6).
Experience with network service technologies (e.g., HTTP, gRPC, TLS, REST APIs, OpenTelemetry).
Qualifications:
Bachelor's or advanced degree, and/or equivalent work experience. 5+ years of experience in relevant roles.
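As an illustration of the proxy-troubleshooting work mentioned above, here is a minimal sketch in Python that computes the status-code breakdown from an NGINX access log in the default "combined" format. The log path is a placeholder; a real triage would also segment by upstream and route.

# Minimal sketch: status-code breakdown from an NGINX access log in the
# default "combined" format. The log path is a placeholder.
import re
from collections import Counter

# combined format: remote - user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(r'"\s*(?P<method>\S+)[^"]*"\s+(?P<status>\d{3})\s')

def status_breakdown(path: str = "/var/log/nginx/access.log") -> None:
    counts = Counter()
    with open(path) as log:
        for line in log:
            match = LINE_RE.search(line)
            if match:
                counts[match.group("status")[0] + "xx"] += 1
    total = sum(counts.values())
    if total:
        print(dict(counts))
        print(f"5xx rate: {counts.get('5xx', 0) / total:.2%} of {total} requests")

if __name__ == "__main__":
    status_breakdown()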
Posted 1 week ago