6.0 years
0 Lacs
hyderabad, telangana, india
On-site
Founded in 2017, CoffeeBeans specializes in high-end consulting services in technology, product, and processes. We help our clients achieve significant improvements in delivery quality through impactful product launches and process simplification, and we build competencies that drive business outcomes across industries. The company uses new-age technologies to help its clients build superior products and realize better customer value. We also offer data-driven solutions and AI-based products for businesses operating in a wide range of product categories and service domains.

🚀 Job Overview
We are seeking an experienced Lead DevOps Engineer with deep expertise in Kubernetes infrastructure design and implementation. This role requires an individual who can architect, build, and manage enterprise-grade Kubernetes clusters from the ground up. The position offers an exciting opportunity to lead infrastructure modernization initiatives and work with cutting-edge cloud-native technologies.
🎯 Key Responsibilities

Infrastructure Design & Implementation
- Design and architect enterprise-grade Kubernetes clusters across multi-cloud environments (AWS/Azure/GCP)
- Build production-ready Kubernetes infrastructure with high availability, scalability, and security best practices
- Implement Infrastructure as Code using Terraform, Helm charts, and GitOps methodologies
- Set up monitoring, logging, and observability solutions for Kubernetes workloads
- Design disaster recovery and backup strategies for containerized applications

Leadership & Team Management
- Lead a team of 3-4 DevOps engineers and provide technical mentorship
- Drive best practices for containerization, orchestration, and cloud-native development
- Collaborate with development teams to optimize application deployment strategies
- Conduct technical reviews and ensure code quality standards across infrastructure components
- Facilitate knowledge transfer and create comprehensive documentation

Operational Excellence
- Manage CI/CD pipelines integrated with Kubernetes deployments
- Implement security policies including RBAC, network policies, and container security scanning
- Optimize cluster performance and resource utilization
- Automate routine operations and reduce manual intervention
- Ensure 99.9% uptime for production Kubernetes workloads

Strategic Planning
- Define an infrastructure roadmap aligned with business objectives
- Evaluate and recommend new tools and technologies for container orchestration
- Plan capacity and optimize costs for cloud infrastructure
- Assess and mitigate risks for production environments

🛠 Must-Have Technical Skills

Core Kubernetes Expertise
- 6+ years of hands-on experience with Kubernetes in production environments
- Deep understanding of Kubernetes architecture and components (etcd, API server, scheduler, kubelet)
- Expertise in Kubernetes networking (CNI, Ingress controllers, service mesh)
- Advanced knowledge of Kubernetes storage (CSI, Persistent Volumes, StorageClasses)
- Experience with Kubernetes operators and custom resource definitions (CRDs)

Infrastructure as Code
- Terraform: advanced proficiency for infrastructure provisioning
- Helm: creating and managing complex Helm charts
- Ansible/Chef/Puppet: configuration management experience
- GitOps workflows: ArgoCD, Flux, or similar tools

Cloud Platforms
Multi-cloud experience with at least 2 major cloud providers:
- AWS: EKS, EC2, VPC, IAM, CloudFormation
- Azure: AKS, Virtual Networks, Azure Resource Manager
- GCP: GKE, Compute Engine, VPC, Deployment Manager

CI/CD & DevOps Tools
- Jenkins, GitLab CI, Azure DevOps, or GitHub Actions
- Docker: advanced containerization and optimization techniques
- Container registries: Docker Hub, ECR, ACR, GCR management
- Version control: Git workflows and branching strategies

Monitoring & Observability
- Prometheus & Grafana: metrics collection and visualization
- ELK Stack / EFK: centralized logging solutions
- Jaeger/Zipkin: distributed tracing implementation
- Alertmanager: intelligent alerting and incident management

💡 Good-to-Have Skills

Advanced Technologies
- Service mesh experience (Istio, Linkerd, Consul Connect)
- Serverless platforms (Knative, OpenFaaS, AWS Lambda)
- Database operations in Kubernetes (PostgreSQL, MongoDB operators)
- Machine learning pipelines on Kubernetes (Kubeflow, MLflow)

Security & Compliance
- Container security tools (Twistlock, Aqua Security, Falco)
- Policy management (Open Policy Agent, Gatekeeper)
- Compliance frameworks (SOC 2, PCI-DSS, GDPR)
- Certificate management (cert-manager, Let's Encrypt)

Programming & Scripting
- Python/Go for automation and tooling development
- Shell scripting (Bash/PowerShell) for advanced automation
- YAML/JSON configuration management expertise

🎓 Required Qualifications

Education
- Bachelor's degree in Computer Science, Engineering, or a related technical field
- Relevant certifications preferred: Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), cloud provider certifications (AWS/Azure/GCP)

Experience
- 6-7 years of DevOps/infrastructure engineering experience
- 4+ years of hands-on Kubernetes experience in production
- 2+ years in a lead/senior role managing infrastructure teams
- Experience with large-scale distributed systems and microservices architecture
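The 99.9% uptime target mentioned in this posting translates into a concrete downtime "error budget". A minimal stdlib-Python sketch of that arithmetic (the 30-day month is an assumption for illustration):

```python
# Convert an availability SLO into an allowed-downtime error budget.
# Assumes a 30-day month; figures are illustrative, not from the posting.
def downtime_budget_minutes(slo: float, days: int = 30) -> float:
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo)

for slo in (0.999, 0.9995, 0.9999):
    print(f"{slo:.2%} SLO -> {downtime_budget_minutes(slo):.1f} min/month")
```

At 99.9% ("three nines") the team has roughly 43 minutes of downtime to spend per month; each extra nine shrinks the budget tenfold, which is why SLO targets drive how much automation and redundancy a cluster design needs.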
Posted 2 hours ago
5.0 years
0 Lacs
bengaluru, karnataka, india
On-site
Job Title: Technical Lead — Backend (Spring Boot / Distributed Systems)
Location: Bangalore | Experience: 5+ years | Type: Full-time | Department: Engineering / Technology

About the Role:
We are seeking a hands-on Technical Lead (Backend) with deep expertise in Java and Spring Boot to lead the design, development, and operation of scalable, secure backend services. You will own system architecture, mentor engineers, and drive delivery quality across microservices, data stores, observability, and DevOps collaboration. This is a leadership-and-coding role: expect to contribute to design reviews and critical code paths while guiding the team.

Key Responsibilities:
- Own the architecture and development of backend services and APIs using Java/Spring Boot; drive microservices patterns, data modeling, and service boundaries.
- Ensure performance, reliability, and security: conduct capacity planning, profiling, and hardening (rate limiting, input validation, secrets hygiene, OWASP).
- Define integration patterns for synchronous (REST/gRPC) and asynchronous (Kafka/RabbitMQ) communication; champion idempotency and exactly-once semantics where needed.
- Design persistence layers with relational databases (PostgreSQL/MySQL) and caching (Redis); guide schema evolution and migration strategies.
- Set up and enforce testing strategy (unit, contract, integration, load), CI/CD, and release readiness across environments.
- Partner with DevOps on containerization (Docker), orchestration (Kubernetes/EKS), infrastructure as code, observability (logs/metrics/traces), and incident response.
- Establish authentication/authorization (OAuth2, OIDC, JWT) and multi-tenant best practices (isolation, quotas, rate limits).
- Mentor the team through code reviews, pairing, and design sessions; uphold coding standards and documentation quality.
- Collaborate with Product, Design, and cross-functional stakeholders to plan sprints and deliver business outcomes on time.
- Continuously reduce technical debt; lead root-cause analyses and preventative engineering after incidents.

Skills & Experience Required:
- Java/Spring Boot: REST, Spring Data/JPA & Hibernate, Spring Security, Actuator/health checks, configuration management.
- Distributed systems: microservices patterns, resiliency (circuit breaker, retries, backoff), idempotency, eventual consistency, saga/outbox patterns.
- Messaging: Kafka or RabbitMQ (topics, partitions, consumer groups), stream-processing basics.
- Data: PostgreSQL/MySQL (query optimization), Redis (caching patterns); exposure to NoSQL/search (MongoDB/Elasticsearch) is a plus.
- Cloud & DevOps: AWS preferred (EKS, RDS, S3, SQS/SNS), Docker/Kubernetes, CI/CD (GitHub Actions/Jenkins), infrastructure as code.
- Observability: structured logging, metrics, distributed tracing (OpenTelemetry/Jaeger/Prometheus/Grafana); SLOs and alerting.
- Security & compliance: OAuth2/OIDC/JWT, secrets management, least privilege, audit logging; familiarity with data-privacy practices.
- Quality: JUnit/Testcontainers/mock frameworks, contract testing, load testing (Gatling/JMeter).
- Leadership: team mentorship, backlog planning, clear stakeholder communication; ability to simplify complex trade-offs.

Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of backend development experience, including 2+ years in a technical leadership capacity.

Bonus Points:
- Experience building multi-tenant SaaS at scale, including region pinning and per-tenant observability.
- GraphQL exposure; real-time updates (WebSockets) where appropriate.
- Performance profiling and capacity planning; cost-awareness in design (e.g., caching, batch vs. streaming).
- Contributions to open source, tech talks, or technical writing.
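The resiliency patterns this role calls for (circuit breaker, retries, backoff) follow a small state machine: fail fast once a dependency looks broken, then probe again after a cooldown. A minimal sketch, shown in stdlib Python for brevity even though the role's stack is Java/Spring (where Resilience4j provides this pattern); the thresholds are illustrative assumptions:

```python
import time

# Minimal circuit-breaker sketch. After `failure_threshold` consecutive
# failures the circuit "opens" and calls fail fast for `reset_timeout`
# seconds, protecting the struggling downstream dependency.
class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

Wrapped around a flaky downstream call, the breaker stops hammering the dependency after three consecutive failures and retries only after the cooldown, which is the behavior interviewers usually mean by "resiliency patterns".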
Posted 1 day ago
10.0 years
0 Lacs
pune, maharashtra, india
On-site
Metro Global Solution Center (MGSC) is the internal solution partner for METRO, a €31 billion international wholesaler with operations in more than 30 countries. The store network comprises a total of 623 stores in 21 countries, of which 522 offer out-of-store delivery (OOS), plus 94 dedicated depots. In 12 countries, METRO runs only the delivery business through its delivery companies (Food Service Distribution, FSD). HoReCa and Traders are METRO's core customer groups. The HoReCa segment includes hotels, restaurants, and catering companies as well as bars, cafés, and canteen operators. The Traders segment includes small grocery stores and kiosks. The majority of customers across all groups are small and medium-sized enterprises and sole traders. METRO helps them manage their business challenges more effectively. MGSC has locations in Pune (India), Düsseldorf (Germany), and Szczecin (Poland). We provide HR, Finance, IT & Business operations support to 31 countries, speak 24+ languages, and process over 18,000 transactions a day. We are setting tomorrow's standards for customer focus, digital solutions, and sustainable business models. For over 10 years, we have been providing services and solutions from our two locations in Pune and Szczecin. This has given us extensive experience in serving our internal customers with high quality and passion. We believe that we can add value, drive efficiency, and satisfy our customers.
Job Description

We are looking for…
- An experienced SRE & DevOps Engineer with deep expertise in cloud infrastructure, automation, and observability
- A hands-on engineer who ensures the reliability, performance, and scalability of systems
- A proactive problem solver with a strong focus on operational excellence and continuous improvement
- A collaborator who bridges development and operations through modern DevOps and SRE practices
- An effective communicator who thrives in cross-functional teams and drives best practices

This role matters to us…
The Senior SRE & DevOps Engineer plays a vital role in ensuring the resilience, scalability, and reliability of our systems. By applying modern SRE principles, automation, and incident management practices, you will enable faster, more reliable delivery of business value while safeguarding system stability and customer trust.

Key Responsibilities
- Design, implement, and maintain scalable, secure, and cloud-native infrastructure
- Set up and maintain observability solutions, including monitoring, alerting, logging, and tracing (e.g., Prometheus, Grafana, ELK, Datadog)
- Continuously improve CI/CD pipelines and automate deployment workflows to increase delivery efficiency
- Lead structured incident response and root cause analysis, and drive a culture of post-mortem learning
- Collaborate closely with developers, QA, and architects to ensure seamless integration and performance optimization
- Apply SRE principles (SLIs, SLOs, SLAs, error budgets) to guide operational decisions and system reliability
- Champion Infrastructure-as-Code practices using Terraform, Helm, or Ansible
- Ensure security, compliance, and reliability are embedded into operations
- Mentor team members and foster a culture of operational excellence and continuous improvement

Qualifications

Education
- Bachelor's or Master's degree in Computer Science, Engineering, or equivalent practical experience

Work Experience
- 6 to 8 years of proven experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles
- Hands-on expertise with Kubernetes (preferably GKE), Docker, and service mesh technologies like Istio
- Strong background in CI/CD practices and tools (GitHub Actions, Jenkins X, ArgoCD, or similar)
- Experience with observability solutions (Prometheus, Grafana, ELK, Jaeger, Datadog, GCP dashboards)
- Proficiency with at least one major cloud platform (GCP, AWS, Azure)
- Scripting or programming experience (Python, Go, Bash, or similar)
- Practical knowledge of Infrastructure-as-Code tools like Terraform, Helm, or Ansible
- Hands-on experience managing incidents, troubleshooting, and performing root cause analysis
- Familiarity with SRE practices (SLIs, SLOs, SLAs, error budgets)

Other Requirements
- Strong communication and collaboration skills across cross-functional teams
- Ability to balance short-term operational needs with long-term scalability and system health
- Analytical and proactive mindset with a focus on continuous improvement
- Fluency in English (written and spoken)

Nice-to-Have
- Experience with security best practices in distributed systems (OAuth2, mTLS, RBAC)
- Knowledge of cost optimization and cloud governance practices
- Familiarity with Camunda/CIB7 environments
- Contributions to open-source DevOps/SRE communities
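The SRE principles listed above (SLIs, SLOs, error budgets) reduce to simple arithmetic over request counts. A stdlib-Python sketch with invented numbers to make the terms concrete:

```python
# Request-based SLI vs. SLO: how much error budget has been consumed?
# All counts here are illustrative assumptions, not figures from the posting.
def error_budget_report(total: int, failed: int, slo: float) -> dict:
    sli = (total - failed) / total            # measured availability (the SLI)
    budget = total * (1 - slo)                # failures the SLO permits
    return {
        "sli": sli,
        "budget_allowed": budget,
        "budget_consumed": failed / budget,   # > 1.0 means the SLO is blown
    }

report = error_budget_report(total=1_000_000, failed=450, slo=0.999)
print(f"SLI={report['sli']:.4%}, budget consumed={report['budget_consumed']:.0%}")
```

With a 99.9% SLO over a million requests, 1,000 failures are "allowed"; 450 failures means 45% of the budget is spent, a signal to slow risky releases before the window closes.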
Posted 1 day ago
15.0 - 17.0 years
0 Lacs
noida, uttar pradesh, india
Remote
Req ID: 340251
NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking an Industry Consulting Manager to join our team in Noida, Uttar Pradesh (IN-UP), India (IN).

Job Description: Technical Architect, Observability & SRE Frameworks
Position Title: Technical Architect, Observability & Site Reliability Engineering (SRE)
Location: Noida, India
Experience: 15+ years (including 5+ years in observability/SRE architecture)
Employment Type: Full-time

Role Overview
We are looking for a highly experienced Technical Architect to lead the design, strategy, and implementation of observability and SRE frameworks for enterprise-scale, microservices-based applications. The ideal candidate will bring deep technical knowledge of both the Splunk Observability stack and open-source tools (such as OpenTelemetry, Prometheus, Grafana, Jaeger), and be capable of defining and executing architecture strategies for complex distributed systems. This role requires the hands-on ability to create architecture blueprints, lead technical teams, and work directly with stakeholders and platform owners to embed observability and reliability practices across the SDLC.

Key Responsibilities

Architecture & Blueprinting
- Design and deliver end-to-end observability architecture (metrics, logs, traces, events) for cloud-native and hybrid environments.
- Create technical architecture diagrams, data flow maps, and integration blueprints using tools like Lucidchart, Draw.io, or Visio.
- Lead the definition of SLIs, SLOs, and error budgets aligned with business KPIs and DORA metrics.

Toolchain Strategy & Implementation
- Architect telemetry pipelines using the OpenTelemetry Collector and Splunk Observability Cloud (SignalFx, APM, RUM, Log Observer).
- Define tool adoption strategy and an integration roadmap for OSS tools (Prometheus, Loki, Grafana, Jaeger) and Splunk-based stacks.
- Guide teams on instrumentation approaches (auto/manual) across languages like Java, Go, Python, .NET, etc.

Reliability Engineering Enablement
- Lead adoption of SRE principles including incident management frameworks, resiliency testing, and runbook automation.
- Collaborate with DevOps to integrate observability into CI/CD pipelines (e.g., Jenkins, ArgoCD, GitHub Actions).
- Define health checks, golden signals, and SPoG (Single Pane of Glass) dashboards.
- Bring exposure to AIOps, ML-based anomaly detection, or business observability.

Stakeholder Management & Governance
- Serve as a technical liaison between client leadership, SREs, developers, and infrastructure teams.
- Run workshops and assessments, and evangelize an observability-first culture across teams.
- Provide guidance on data retention, access control, cost optimization, and compliance (especially with Splunk ingestion policies).

Performance & Optimization
- Continuously monitor and fine-tune observability data flows to prevent alert fatigue and ensure actionability.
- Implement root cause analysis practices using telemetry correlation across metrics, logs, and traces.
- Lead efforts to build self-healing systems using automated playbooks and AIOps integrations (where applicable).

Required Skills & Qualifications
- 15+ years in IT, with 5+ years in observability/SRE architecture roles
- Proven experience designing architecture for microservices, containers (Docker, Kubernetes), and distributed systems
- Strong hands-on expertise with: Splunk Observability Cloud (SignalFx, Log Observer, APM); OpenTelemetry (SDKs + Collector); Prometheus + Grafana; Jaeger/Zipkin for distributed tracing; CI/CD tools (Jenkins, GitHub Actions, ArgoCD)
- Ability to build and present clear architecture diagrams and solution roadmaps
- Working knowledge of cloud environments (AWS, Azure, GCP) and container orchestration (K8s/OpenShift)
- Familiarity with SRE and DevOps best practices (error budgets, release engineering, chaos testing)

Nice to Have
- Splunk certifications: Core Consultant, Observability Specialist, Admin
- Knowledge of ITIL and modern incident management tooling (PagerDuty, Opsgenie)
- Experience in banking or regulated enterprise environments

Soft Skills
- Strong leadership and cross-functional collaboration
- Ability to work in ambiguous, fast-paced environments
- Excellent documentation and communication skills
- Passion for mentoring teams and building best practices at scale

Why This Role Matters
The client is on a journey to mature its observability and SRE ecosystem, and this role will be critical in:
- Unifying legacy and modern telemetry stacks
- Driving a reliability-first mindset and tooling
- Establishing a scalable blueprint for production excellence

About NTT DATA
NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies.
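Among the "golden signals" this role defines, latency is normally reported as percentiles rather than averages, because a few slow requests dominate user experience. A stdlib-Python sketch with invented sample data:

```python
import statistics

# Latency percentiles, one of the golden signals: averages hide tail pain,
# so SLOs are usually set on p95/p99. The sample values below are invented.
samples_ms = [12, 15, 11, 14, 13, 12, 16, 210, 13, 14, 12, 15, 190, 13, 12,
              14, 11, 13, 15, 12]

# statistics.quantiles with n=100 returns the 99 percentile cut points;
# index 94 is p95 (method="inclusive" interpolates between observations).
cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
p50, p95 = cuts[49], cuts[94]

print(f"mean={statistics.mean(samples_ms):.1f} ms  p50={p50:.1f}  p95={p95:.1f}")
```

Here the two outliers drag the mean to roughly 31.9 ms while the median stays at 13 ms; dashboards built only on averages would miss the tail entirely.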
Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is a part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at us.nttdata.com. Whenever possible, we hire locally to NTT DATA offices or client sites. This ensures we can provide timely and effective support tailored to each client's needs. While many positions offer remote or hybrid work options, these arrangements are subject to change based on client requirements. For employees near an NTT DATA office or client site, in-office attendance may be required for meetings or events, depending on business needs. At NTT DATA, we are committed to staying flexible and meeting the evolving needs of both our clients and employees. NTT DATA recruiters will never ask for payment or banking information and will only use @nttdata.com and @talent.nttdataservices.com email addresses. If you are requested to provide payment or disclose banking information, please submit a contact us form, https://us.nttdata.com/en/contact-us . NTT DATA endeavors to make https://us.nttdata.com accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us . This contact information is for accommodation requests only and cannot be used to inquire about the status of applications. NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.
For our EEO Policy Statement, please click here. If you'd like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.
Posted 1 day ago
4.0 years
0 Lacs
bengaluru, karnataka, india
On-site
We're transforming the software industry. We're Flexera. With more than 50,000 customers across the world, we're achieving that goal. But we know we can't do any of that without our team. Ready to help us re-imagine the industry during a time of substantial growth and ambitious plans? Come and see why we're consistently recognized by Gartner, Forrester and IDC as a category leader in the marketplace. Flexera delivers Technology Value Optimization solutions that enable some of the largest companies in the world to inform their IT so they can transform their IT. From on-prem to the cloud, companies can get the IT asset data needed to rightsize, reallocate spend, reduce risk and maximize ROI.

Site Reliability Engineer - Cloud Cost Optimization Engineer

About Us
We're a fast-growing, category-leading organization with ambitious objectives and a positive, inclusive culture. We're looking for passionate professionals who want to grow their talents and achieve great things. If that sounds like you, we want to talk to you about joining our team. The Cloud Enablement team is responsible for accelerating the delivery and improving the operation of our cloud-based software by providing and supporting tools and patterns which reduce the cognitive load on our development teams. We free up our developers to focus on solving problems for our customers rather than spending time on extraneous tasks. Drawing on the shared experience and expertise of our organization and industry, we create, support and evolve the paved path for teams to build, deploy and run secure and reliable software.

What will you do?
- Design, build, advocate for and support the common tools and delivery platform used by Flexera developers.
- Improve developer experience and operational excellence.
- Foster collaboration and knowledge sharing across Flexera.
- Select and roll out supported defaults and standards for CI/CD tooling, observability, security and runtime environment.
- Work with teams across several continents; build relationships with our engineers by listening to and understanding their needs and balancing them with the needs of our business.
- Research new tools and patterns and continuously measure and evolve our ways of doing things.
- Apply cloud cost optimization: a combination of strategies, techniques, best practices and tools to help manage and reduce cloud costs.

You have
- Developer/DevOps/SRE/Platform experience and a strong interest in software delivery and ongoing operation.
- Worked on rolling out automation, tools, technologies, patterns and guardrails across an organization or teams.
- Experience working in a globally distributed team.
- Extensive public cloud (preferably AWS) knowledge and experience.
- Deep knowledge of containers (Docker) and orchestration (Kubernetes).
- Knowledge of tools and patterns around CI/CD (familiar with Travis CI, CircleCI, Buildkite or similar).
- Observability knowledge (logs, tracing, metrics) with experience in a few of Elastic Stack, X-Ray, Jaeger, Zipkin, Prometheus, Honeycomb or LightStep, plus enterprise observability tools such as New Relic, Datadog etc.
- Cloud cost optimization: using automation to keep cloud cost under control and within budget, and enabling individual engineering teams with cloud cost optimization.
- Knowledge of operations, including incident management, immutable infrastructure as code (esp. Terraform or CloudFormation), and problem-solving.
- Produced robust, well-tested code, preferably in Golang; however, we will also consider Python, JavaScript, Ruby, Java or C# if you are happy to learn Go.
- Excellent communication skills, including experience in writing good documentation and running workshops.

Critical Skills / Competencies
- Agile software delivery methodologies
- Experience managing cloud-based services (e.g. AWS, Azure) at scale
- Experience with DevOps
- Experience with Docker containers, Kubernetes, EKS, ECS
- Infrastructure as code, e.g. Terraform, CloudFormation
- CI/CD pipelines using Jenkins, Travis CI, TeamCity; pipeline as code
- Automation / configuration management at scale, e.g. Puppet, Chef, Ansible, Salt, Packer etc.
- Service mesh such as Istio, Consul or similar
- Expertise in one or more of the following languages: Python / Go / Java / C# / C / C++
- Experience with IaaS and serverless services from a cloud provider
- A strong understanding of TCP/IP and DNS, and experience designing networks
- Linux and Windows system administration experience
- Experience implementing fault detection and automating fixes
- Experience designing scalable services
- Experience designing distributed, fault-tolerant systems
- A good understanding of SQL and NoSQL databases
- A solid understanding of data structures and algorithms
- A positive attitude and willingness to learn
- Strong conflict resolution competence
- Excellent written and verbal communication skills
- Detail oriented: the ideal candidate naturally digs as deep as they need to understand the why

Minimum Qualifications
- Bachelor's or higher degree in Computer Science, Information Technology, or a related field.
- At least 4 years of hands-on experience managing services in a public cloud.
- At least 1 year of experience working as a member of a centralized cloud enablement / platform or similar team.

Bonus Skills
The following are not prerequisites for the role, but give an idea of what you may come across in your SRE role at Flexera:
- Python / Golang / Java / C# / C / C++ / Bash experience
- Big data, machine learning and AI platforms (Databricks, Snowflake etc.)
- Experience with monitoring systems such as New Relic, ELK, Prometheus, Datadog, X-Ray etc.
- Security background
- SQL, NoSQL and graph databases
- Relevant certification, e.g. AWS, GCP, Azure
- Experience of Disciplined Agile Delivery (DAD)

Flexera is proud to be an equal opportunity employer.
Qualified applicants will be considered for open roles regardless of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by local/national laws, policies and/or regulations. Flexera understands the value that results from employing a diverse, equitable, and inclusive workforce. We recognize that equity necessitates acknowledging past exclusion and that inclusion requires intentional effort. Our DEI (Diversity, Equity, and Inclusion) council is the driving force behind our commitment to championing policies and practices that foster a welcoming environment for all. We encourage candidates requiring accommodations to please let us know by emailing careers@flexera.com.
Posted 2 days ago
7.0 years
0 Lacs
india
On-site
About Latinum:
Latinum is hiring for multiple backend engineering roles. You must demonstrate strong capabilities in either Core Java backend engineering or Microservices and Cloud architecture, with working knowledge in the other. Candidates with strengths in both areas will be considered for senior roles. You will be part of a high-performance engineering team solving complex business problems through robust, scalable, and high-throughput systems.

Experience: A minimum of 7+ years of hands-on experience is mandatory.

Java & Backend Engineering
- Java 8+ (streams, lambdas, functional interfaces, Optionals)
- Spring Core, Spring Boot, object-oriented principles, exception handling, immutability
- Multithreading (Executor framework, locks, concurrency utilities)
- Collections, data structures, algorithms, time/space complexity
- Kafka (producer/consumer, schema, error handling, observability)
- JPA, RDBMS/NoSQL, joins, indexing, data modelling, sharding, CDC
- JVM tuning, GC configuration, profiling, dump analysis
- Design patterns (GoF: creational, structural, behavioral)

Microservices, Cloud & Distributed Systems
- REST APIs, OpenAPI/Swagger, request/response handling, API design best practices
- Spring Boot, Spring Cloud, Spring Reactive
- Kafka Streams, CQRS, materialized views, event-driven patterns
- GraphQL (Apollo/Spring Boot), schema federation, resolvers, caching
- Cloud-native apps on AWS (Lambda, IAM, S3, containers)
- API security (OAuth 2.0, JWT, Keycloak, API Gateway configuration)
- CI/CD pipelines, Docker, Kubernetes, Terraform
- Observability with ELK, Prometheus, Grafana, Jaeger, Kiali

Additional Skills (Nice to Have)
- Node.js, React, Angular, Golang, Python, GenAI
- Web platforms: AEM, Sitecore
- Production support, rollbacks, canary deployments
- TDD, mocking, Postman, security/performance test automation
- Architecture artifacts: logical/sequence views, layering, solution detailing

Key Responsibilities:
- Design and develop scalable backend systems using Java and Spring Boot
- Build event-driven microservices and cloud-native APIs
- Implement secure, observable, and high-performance solutions
- Collaborate with teams to define architecture, patterns, and standards
- Contribute to solution design, code reviews, and production readiness
- Troubleshoot, optimize, and monitor distributed systems in production
- Mentor junior engineers (for senior roles)
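The event-driven patterns listed above depend on idempotent consumers, because brokers such as Kafka deliver messages at least once and redeliveries must not repeat side effects. A minimal sketch in stdlib Python (the role's stack is Java; the in-memory ID set is an illustrative assumption, where production systems would persist processed IDs in a database unique key or Redis):

```python
# Idempotent event consumer: at-least-once delivery means the same event
# can arrive twice, so the handler deduplicates by event ID before
# applying the side effect.
class IdempotentConsumer:
    def __init__(self):
        self.processed_ids = set()
        self.balance = 0  # example side effect: an account balance

    def handle(self, event: dict) -> bool:
        """Apply the event exactly once; return False for duplicates."""
        if event["id"] in self.processed_ids:
            return False  # redelivery: side effect was already applied
        self.balance += event["amount"]
        self.processed_ids.add(event["id"])
        return True

consumer = IdempotentConsumer()
consumer.handle({"id": "evt-1", "amount": 100})
consumer.handle({"id": "evt-1", "amount": 100})  # broker redelivery, ignored
consumer.handle({"id": "evt-2", "amount": -30})
print(consumer.balance)  # 70, not 170
```

The same dedup-by-key idea underlies the CQRS and outbox patterns named in the skills list: every event carries a stable identifier, and every consumer treats reprocessing as a no-op.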
Posted 2 days ago
5.0 - 10.0 years
0 Lacs
bengaluru, karnataka, india
On-site
Job Requirements
Role: Java Backend Developer
Function/Department: Information Technology

Job Purpose
We are looking for enthusiastic developers who would like to be part of the cutting-edge technology team of IDFC FIRST Bank.

Roles and Responsibilities
- Good understanding and working knowledge of Spring Boot and the Java technology platform.
- Able to code a backend endpoint end to end (router, service layer and repository).
- Create APIs and integrate with existing APIs.
- Understand the rationale behind a process; be a practitioner of unit testing frameworks and test-driven development.
- Work with Jira, Grafana/Dynatrace, Jaeger, and Oracle/SQL Server.
- Continuous integration: understand CI/CD and build pipelines.
- Working understanding of Docker and Kubernetes to build and deploy apps.
- Write SQL queries to store, fetch and process data using an Oracle database.

Secondary Roles and Responsibilities
- Deliver outcomes within targets and deliver projects on time.
- Understand business requirements and follow up to close items that can facilitate a good design.
- Develop detailed designs based on stories or requirements.
- Support the lead in high-level design and design review.
- Help troubleshoot production issues and provide Level 3 support; assist in environment setup and maintenance.
- Maintain configuration management of source code.
- Participate in and contribute to scrum meetings.
- Collaborate in the testing and deployment process.
- Strong communication skills; self-driven, motivated, proactive and detail-oriented, yet comfortable working in a dynamic environment.
- Ability to guide, mentor and manage a talent pool of Business Analysts.

Qualifications
Graduation: Bachelor's in Engineering / Technology / Mathematics / Commerce / Arts / Science / Biology / Business / Computers / Management.
Experience: 5 to 10 years.
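The "end to end" endpoint layering this posting describes (router, service layer, repository) can be sketched minimally. The role uses Spring Boot and Oracle; here sqlite3 and all table/function names are stand-ins chosen so the sketch runs with only the Python standard library:

```python
import sqlite3

# Router -> service -> repository layering sketch. Each layer has one job:
# the repository talks SQL, the service holds business rules, and the
# "router" shapes the HTTP-style request/response.
class AccountRepository:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, name TEXT)")

    def save(self, name):
        cur = self.conn.execute("INSERT INTO accounts (name) VALUES (?)", (name,))
        return cur.lastrowid

    def find(self, account_id):
        row = self.conn.execute(
            "SELECT id, name FROM accounts WHERE id = ?", (account_id,)).fetchone()
        return {"id": row[0], "name": row[1]} if row else None

class AccountService:  # business rules live here, not in the endpoint
    def __init__(self, repo):
        self.repo = repo

    def open_account(self, name):
        if not name.strip():
            raise ValueError("name must not be blank")
        return self.repo.save(name.strip())

def create_account_endpoint(service, payload):  # the "router" layer
    return {"status": 201, "id": service.open_account(payload["name"])}

conn = sqlite3.connect(":memory:")
service = AccountService(AccountRepository(conn))
response = create_account_endpoint(service, {"name": "Asha"})
print(response)  # {'status': 201, 'id': 1}
```

Keeping validation in the service and SQL in the repository is what makes each layer unit-testable in isolation, which is the point of the testing-framework and TDD requirements above.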
Posted 3 days ago
7.0 years
2 - 9 Lacs
noida
Remote
Req ID: 340254. NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking an Industry Consulting Senior Consultant to join our team in Noida, Uttar Pradesh (IN-UP), India (IN).
Job Title: Telemetry Engineer
Location: Noida, India
Employment Type: Full-Time
Experience Level: Senior (7+ years preferred)
Role Overview: We are seeking a highly skilled Telemetry Engineer to lead the design and implementation of telemetry pipelines across diverse environments, including microservices, VM-based applications, cloud-native platforms, and on-premise systems. The ideal candidate will have deep expertise in OpenTelemetry architecture and implementation, and a strong background in observability, distributed systems, and performance monitoring.
Key Responsibilities:
- Architect and implement end-to-end telemetry pipelines for applications deployed across cloud, on-prem, and hybrid environments.
- Lead the installation, configuration, and optimization of OpenTelemetry components, including SDKs, the Collector, and exporters.
- Collaborate with application, infrastructure, and DevOps teams to define telemetry standards and integrate observability into CI/CD workflows.
- Design scalable and resilient data-collection strategies for metrics, logs, and traces.
- Develop and maintain instrumentation libraries for microservices and legacy applications.
- Ensure telemetry data is efficiently routed to observability platforms (e.g., Splunk, Prometheus, Grafana, Datadog).
- Conduct performance tuning and troubleshooting of telemetry pipelines.
- Provide architectural guidance and best practices for telemetry adoption across teams.
- Stay current with OpenTelemetry releases and contribute to internal tooling and automation.
Required Skills & Qualifications:
- Proven experience setting up telemetry pipelines from scratch across multiple environments.
- Strong hands-on expertise with OpenTelemetry (Collector, SDKs, OTLP protocol).
- Deep understanding of distributed tracing, metrics collection, and log aggregation.
- Experience with observability platforms such as Splunk, Prometheus, Grafana, Jaeger, Zipkin, and Datadog.
- Proficiency in one or more programming languages (e.g., Python, Go, Java, Node.js) for instrumentation.
- Familiarity with cloud platforms (AWS, Azure, GCP) and VM/on-prem infrastructure.
- Knowledge of container orchestration (Kubernetes), service meshes (Istio), and CI/CD pipelines.
- Excellent communication and documentation skills.
Preferred Qualifications:
- Experience contributing to or working with the OpenTelemetry community.
- Certifications in cloud technologies or observability tools.
- Experience in regulated or enterprise-scale environments.
About NTT DATA
NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is a part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at us.nttdata.com
Whenever possible, we hire locally to NTT DATA offices or client sites. This ensures we can provide timely and effective support tailored to each client's needs. While many positions offer remote or hybrid work options, these arrangements are subject to change based on client requirements.
For employees near an NTT DATA office or client site, in-office attendance may be required for meetings or events, depending on business needs. At NTT DATA, we are committed to staying flexible and meeting the evolving needs of both our clients and employees. NTT DATA recruiters will never ask for payment or banking information and will only use @nttdata.com and @talent.nttdataservices.com email addresses. If you are requested to provide payment or disclose banking information, please submit a contact us form, https://us.nttdata.com/en/contact-us. NTT DATA endeavors to make https://us.nttdata.com accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us. This contact information is for accommodation requests only and cannot be used to inquire about the status of applications. NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. For our EEO Policy Statement, please click here. If you'd like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.
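The SDK → processor → exporter flow that OpenTelemetry-style pipelines follow (the real SDK's BatchSpanProcessor plays the middle role) can be sketched with stdlib Python. All class names here are illustrative, not the OpenTelemetry API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    """Minimal stand-in for a finished trace span."""
    name: str
    start: float
    end: float
    attributes: dict = field(default_factory=dict)

class InMemoryExporter:
    """Terminal stage: a real exporter would ship batches over OTLP."""
    def __init__(self):
        self.exported = []

    def export(self, batch):
        self.exported.extend(batch)

class BatchProcessor:
    """Buffers finished spans and flushes to the exporter when full,
    trading a little latency for far fewer export calls."""
    def __init__(self, exporter, max_batch=2):
        self._exporter = exporter
        self._buffer = []
        self._max = max_batch

    def on_end(self, span):
        self._buffer.append(span)
        if len(self._buffer) >= self._max:
            self.flush()

    def flush(self):
        if self._buffer:
            self._exporter.export(self._buffer)
            self._buffer = []

exporter = InMemoryExporter()
pipeline = BatchProcessor(exporter)
t = time.time()
pipeline.on_end(Span("db.query", t, t + 0.01))
pipeline.on_end(Span("http.request", t, t + 0.2))
print(len(exporter.exported))  # 2
```

The batching stage is where most pipeline tuning (queue size, flush interval, backpressure) happens in practice.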
Posted 5 days ago
15.0 years
2 - 6 Lacs
noida
Remote
Req ID: 340251. NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking an Industry Consulting Manager to join our team in Noida, Uttar Pradesh (IN-UP), India (IN).
Job Description: Technical Architect - Observability & SRE Frameworks
Position Title: Technical Architect - Observability & Site Reliability Engineering (SRE)
Location: Noida, India
Experience: 15+ years (including 5+ years in observability/SRE architecture)
Employment Type: Full-time
Role Overview: We are looking for a highly experienced Technical Architect to lead the design, strategy, and implementation of observability and SRE frameworks for enterprise-scale, microservices-based applications. The ideal candidate will bring deep technical knowledge of both the Splunk Observability stack and open-source tools (such as OpenTelemetry, Prometheus, Grafana, and Jaeger), and be capable of defining and executing architecture strategies for complex distributed systems. This role requires hands-on ability to create architecture blueprints, lead technical teams, and work directly with stakeholders and platform owners to embed observability and reliability practices across the SDLC.
Key Responsibilities:
Architecture & Blueprinting
- Design and deliver end-to-end observability architecture (metrics, logs, traces, events) for cloud-native and hybrid environments.
- Create technical architecture diagrams, data-flow maps, and integration blueprints using tools like Lucidchart, Draw.io, or Visio.
- Lead the definition of SLIs, SLOs, and error budgets aligned with business KPIs and DORA metrics.
Toolchain Strategy & Implementation
- Architect telemetry pipelines using the OpenTelemetry Collector and Splunk Observability Cloud (SignalFx, APM, RUM, Log Observer).
- Define tool-adoption strategy and integration roadmap for OSS tools (Prometheus, Loki, Grafana, Jaeger) and Splunk-based stacks.
- Guide teams on instrumentation approaches (auto/manual) across languages such as Java, Go, Python, and .NET.
Reliability Engineering Enablement
- Lead adoption of SRE principles, including incident management frameworks, resiliency testing, and runbook automation.
- Collaborate with DevOps to integrate observability into CI/CD pipelines (e.g., Jenkins, ArgoCD, GitHub Actions).
- Define health checks, golden signals, and single-pane-of-glass (SPoG) dashboards.
- Exposure to AIOps, ML-based anomaly detection, or business observability.
Stakeholder Management & Governance
- Serve as a technical liaison between client leadership, SREs, developers, and infrastructure teams.
- Run workshops and assessments, and evangelize an observability-first culture across teams.
- Provide guidance on data retention, access control, cost optimization, and compliance (especially with Splunk ingestion policies).
Performance & Optimization
- Continuously monitor and fine-tune observability data flows to prevent alert fatigue and ensure actionability.
- Implement root-cause-analysis practices using telemetry correlation across metrics, logs, and traces.
- Lead efforts to build self-healing systems using automated playbooks and AIOps integrations (where applicable).
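As a concrete anchor for the SLO and error-budget definition work above, the standard arithmetic is small. A sketch assuming an availability SLO over a 30-day window (function names are illustrative):

```python
def error_budget_minutes(slo, window_days=30):
    """Allowed downtime for a given availability SLO over the window.
    E.g. a 99.9% SLO leaves 0.1% of the window as budget."""
    return (1 - slo) * window_days * 24 * 60

def budget_consumed(bad_events, total_events, slo):
    """Fraction of the error budget spent so far: observed failure
    rate divided by the failure rate the SLO permits."""
    allowed_failure_rate = 1 - slo
    observed_failure_rate = bad_events / total_events
    return observed_failure_rate / allowed_failure_rate

print(error_budget_minutes(0.999))          # ~43.2 minutes per 30 days
print(budget_consumed(50, 100_000, 0.999))  # ~0.5 (half the budget gone)
```

Expressing budget consumption as a ratio is what lets error budgets gate release decisions: above 1.0, the SLO is already breached for the window.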
Required Skills & Qualifications:
- 15+ years in IT, with 5+ years in observability/SRE architecture roles.
- Proven experience designing architecture for microservices, containers (Docker, Kubernetes), and distributed systems.
- Strong hands-on expertise with: Splunk Observability Cloud (SignalFx, Log Observer, APM); OpenTelemetry (SDKs + Collector); Prometheus + Grafana; Jaeger/Zipkin for distributed tracing.
- CI/CD tools: Jenkins, GitHub Actions, ArgoCD.
- Ability to build and present clear architecture diagrams and solution roadmaps.
- Working knowledge of cloud environments (AWS, Azure, GCP) and container orchestration (K8s/OpenShift).
- Familiarity with SRE and DevOps best practices (error budgets, release engineering, chaos testing).
Nice to Have:
- Splunk certifications: Core Consultant, Observability Specialist, Admin.
- Knowledge of ITIL and modern incident management frameworks (PagerDuty, OpsGenie).
- Experience in banking or regulated enterprise environments.
Soft Skills:
- Strong leadership and cross-functional collaboration.
- Ability to work in ambiguous, fast-paced environments.
- Excellent documentation and communication skills.
- Passion for mentoring teams and building best practices at scale.
Why This Role Matters: The client is on a journey to mature its observability and SRE ecosystem, and this role will be critical in unifying legacy and modern telemetry stacks, driving a reliability-first mindset and tooling, and establishing a scalable blueprint for production excellence.
Posted 5 days ago
0 years
0 Lacs
india
On-site
Job Description – Kong Service Mesh Engineer
Position Overview: We are seeking a skilled Kong Service Mesh Engineer/Architect to design, implement, and support service mesh solutions leveraging Kong Service Mesh (Kuma + Envoy). The ideal candidate will have expertise in microservices networking, API management, Kubernetes, and hybrid cloud deployments. This role will collaborate with DevOps, platform, and application teams to ensure secure, reliable, and observable service-to-service communication across distributed environments.
Key Responsibilities:
- Design, deploy, and manage Kong Service Mesh for Kubernetes and hybrid environments (VMs, bare metal, multi-cloud).
- Implement mTLS, service discovery, traffic routing, retries, circuit breaking, and observability policies within the mesh.
- Collaborate with the API gateway team to integrate Kong API Gateway with Kong Service Mesh for north-south and east-west traffic management.
- Define and enforce zero-trust networking policies across microservices.
- Troubleshoot and optimize service-to-service communication using Envoy sidecar proxies.
- Implement monitoring, logging, and tracing solutions (Prometheus, Grafana, Jaeger, ELK, etc.).
- Drive adoption of best practices for microservices networking, security, and resilience.
- Support migration of applications from monoliths or non-mesh architectures to Kong Service Mesh.
- Provide technical guidance, documentation, and training to DevOps and development teams.
Required Skills & Qualifications:
- Strong experience with Kong Service Mesh (Kuma) and the Envoy proxy.
- Hands-on experience with Kubernetes (K8s), Helm, Docker, and CI/CD pipelines.
- Knowledge of service mesh concepts (mTLS, traffic routing, observability, zero trust, multi-zone).
- Familiarity with Kong API Gateway and its integration with the service mesh.
- Expertise in networking, security policies, and distributed systems.
- Experience with monitoring and observability tools (Prometheus, Grafana, Jaeger, OpenTelemetry).
- Programming/scripting experience (Go, Python, Bash, or similar).
- Good understanding of cloud platforms (AWS, Azure, GCP) and hybrid deployments.
- Strong troubleshooting skills for service-to-service latency, resiliency, and security issues.
Preferred Qualifications:
- Experience with other service meshes (Istio, Linkerd, Consul Connect).
- Knowledge of API security (OAuth2, JWT, OPA, RBAC/ABAC).
- Experience in DevSecOps and SRE practices.
- Certifications in Kubernetes (CKA/CKAD/CKS) or cloud certifications.
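Circuit breaking, one of the mesh policies listed above, is a small state machine (closed → open → half-open) that Envoy enforces at the sidecar. A toy sketch of that state machine with an injectable clock; all names and thresholds are illustrative, not Kuma's policy schema:

```python
import time

class CircuitBreaker:
    """Toy breaker: opens after `threshold` consecutive failures and
    half-opens (allows a probe) after `reset_after` seconds."""
    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"
        return "open"

    def call(self, fn):
        if self.state == "open":
            # Fast-fail: protect the struggling upstream from more load.
            raise RuntimeError("circuit open; fast-failing")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result

# Trip the breaker with a fake clock so the transitions are deterministic
now = [0.0]
cb = CircuitBreaker(threshold=2, reset_after=10.0, clock=lambda: now[0])
for _ in range(2):
    try:
        cb.call(lambda: 1 / 0)
    except ZeroDivisionError:
        pass
tripped_state = cb.state   # open: requests fast-fail
now[0] = 11.0
probe_state = cb.state     # half-open: one trial request may pass
print(tripped_state, probe_state)
```

In a mesh, the value is that this logic lives in the sidecar proxy, so every service gets it without application code changes.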
Posted 5 days ago
7.0 years
0 Lacs
pune, maharashtra, india
On-site
Position Overview: The Performance Engineer will play a critical role in analyzing, optimizing, and scaling ArcOne's data and AI systems, with a focus on revenue management. This role involves deep performance profiling across application, middleware, runtime, and infrastructure layers, developing advanced observability tools, and collaborating with cross-functional teams to meet stringent latency, throughput, and scalability goals.
Qualifications:
- Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Experience: 7+ years of software engineering experience, with a strong focus on performance or reliability engineering for high-scale distributed systems.
- Proven expertise in optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Python/Golang internals, GPU utilization).
- Hands-on experience with real-time and batch processing frameworks (e.g., Apache Kafka, Spark, Flink).
- Demonstrated success in building observability, benchmarking, or performance-focused infrastructure at scale.
- Experience in revenue management systems or similar domains (e.g., pricing, forecasting) is a plus.
Technical Skills:
- Deep proficiency with performance profiling tools (e.g., perf, eBPF, VTune) and tracing systems (e.g., Jaeger, OpenTelemetry).
- Strong understanding of OS internals, including scheduling, memory management, and I/O patterns.
- Expertise in programming languages such as Python, Go, or Java, with a focus on runtime optimization.
Key Responsibilities:
Performance Analysis & Optimization:
- Analyze and optimize performance across the full stack, including application, middleware, runtime (e.g., the Python runtime, GPU utilization), and infrastructure layers (e.g., networking, storage).
- Perform deep performance profiling, tuning, and optimization for databases, data pipelines, AI model inference, and distributed systems.
- Optimize critical components such as garbage collection (GC), memory management, I/O patterns, and scheduling to ensure high efficiency.
Observability & Tooling:
- Develop and maintain tooling and metrics that provide deep observability into system performance, enabling proactive identification of bottlenecks and inefficiencies.
- Implement and enhance performance monitoring systems (e.g., tracing, logging, dashboards) to track latency, throughput, and resource utilization in real time.
- Contribute to benchmarking frameworks and performance-focused infrastructure to support continuous improvement.
Cross-Functional Collaboration:
- Partner with infrastructure, platform, training, and product teams to define and achieve key performance goals for revenue management systems.
- Influence architecture and design decisions to prioritize latency, throughput, and scalability in large-scale data and AI systems.
- Align stakeholders around performance objectives, navigating ambiguity to deliver measurable improvements.
Performance Testing & SLAs:
- Lead the development and execution of performance testing strategies, including load, stress, and scalability tests, for real-time and batch processing workloads.
- Define and monitor service level agreements (SLAs) and service level objectives (SLOs) around latency, throughput, and system reliability.
- Drive investigations into high-impact performance regressions or scalability issues in production, ensuring rapid resolution and root cause analysis.
System Design & Scalability:
- Collaborate on the design of robust data architectures and AI systems, ensuring scalability and performance for revenue management use cases.
- Optimize real-time streaming (e.g., Apache Kafka, Flink) and batch processing (e.g., Spark, Hadoop) workloads for high-scale environments.
- Advocate for simplicity and rigor in system design to address complex performance challenges.
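Latency SLOs like those above are normally stated in percentiles, since means hide tail behavior. A nearest-rank percentile sketch over a sample of latencies (illustrative; production systems usually use streaming estimators like histograms instead of sorting raw samples):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (p in (0, 100]) over latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# A sample with a heavy tail: the mean (~125 ms) says little,
# but p50 vs p95 exposes the outliers that page on-call.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 900]
print(percentile(latencies_ms, 50))  # 14
print(percentile(latencies_ms, 95))  # 900
```

The p50/p95 gap here is exactly the kind of signal a performance regression investigation starts from.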
Posted 5 days ago
6.0 years
0 Lacs
kochi, kerala, india
On-site
Job Title: Senior Software Engineer (Kotlin/Java)
Job Code: CUB/2025/DWE/016
Experience: 6 years
Location: Infopark, Kochi
Why Join Us?
- Innovative environment: join a forward-thinking company that encourages creativity and problem-solving.
- Career growth: opportunities for professional development and career advancement.
- Collaborative culture: work in a team-oriented environment where your contributions are valued.
- Competitive compensation: attractive salary package and performance-based incentives.
Job Requirements:
- Minimum 6 years of hands-on software development experience.
- At least 1 year of practical experience working with Kotlin.
- Strong knowledge of OOP concepts, design patterns, and unit testing.
- Experience with Java (Spring), JavaScript (Node.js), and OpenTracing (Jaeger or similar).
- Exposure to Spring WebFlux (preferred).
- Experience deploying applications on AWS/Kubernetes.
- Proficiency with Git and familiarity with Agile methodologies.
- Ability to design lightweight, scalable, and highly available solutions for global use.
Key Responsibilities:
- Design, build, test, and improve high-quality, high-performance applications.
- Collaborate with the Head of Engineering and technical leads to develop innovative SaaS solutions for the airline industry.
- Work hands-on with the team to improve and scale existing systems.
- Ensure solutions meet modern standards for scalability, security, and usability.
- Contribute to the technical strategy of the company by experimenting and driving innovation.
- Operate effectively in a fast-paced, early-stage startup environment with high autonomy.
Required Skills:
- Strong programming experience in Kotlin and Java (Spring).
- Proficiency in Node.js and backend service development.
- Hands-on experience with cloud-native deployments (AWS, Kubernetes).
- Good understanding of distributed tracing (Jaeger or equivalent).
- Strong debugging, problem-solving, and performance optimization skills.
Preferred Skills:
- Experience with Spring WebFlux.
- Familiarity with airline or e-commerce retailing systems.
- Exposure to conversational commerce, recommendation systems, or mobile-first solutions.
Soft Skills:
- Strong verbal and written communication skills.
- Ability to work independently and in a collaborative, cross-cultural team environment.
- Open-minded, resilient, and adaptable to challenges.
- Startup mindset: self-starter, proactive, and comfortable with ambiguity.
- Passion for solving complex problems and delivering impactful solutions.
Qualifications: Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent professional experience); 6+ years of software development experience with at least 1 year in Kotlin.
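Distributed tracing with Jaeger-style tooling depends on context propagation between services: each hop forwards identifiers so spans join into one trace. A sketch of parsing the W3C `traceparent` header (the format is from the Trace Context spec; the regex approach is illustrative, since real tracing SDKs handle this for you):

```python
import re

# version-trace_id-parent_span_id-flags, all lowercase hex
TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header):
    """Split a W3C traceparent header into its four fields."""
    m = TRACEPARENT.match(header)
    if not m:
        raise ValueError(f"malformed traceparent: {header!r}")
    return m.groupdict()

ctx = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
print(ctx["trace_id"])  # 4bf92f3577b34da6a3ce929d0e0e4736
print(ctx["flags"])     # 01 (sampled)
```

A downstream service reuses the trace_id and makes the incoming span_id its parent, which is what lets Jaeger stitch the request path back together.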
Posted 5 days ago
8.0 years
0 Lacs
noida, uttar pradesh, india
On-site
Job Description
Role: Manager - DevOps
We at Pine Labs are looking for those who share our core belief - Every Day is Game Day. We bring our best selves to work each day to realize our mission of enriching the world through the power of digital commerce and financial services.
Role Purpose: We are seeking a Manager, DevOps who will lead and manage the organization's DevOps infrastructure, the observability stack for applications, CI/CD pipelines, and support services. This role involves managing a team of DevOps engineers, architecting scalable infrastructure, and ensuring high availability and performance of our messaging and API management systems. This individual will oversee a team of IT professionals, ensure the seamless delivery of IT services, and implement strategies to align technology solutions with business objectives. The ideal candidate is a strategic thinker with strong technical expertise and proven leadership.
What we entrust you with:
- Lead and mentor a team of DevOps leads/engineers in designing and maintaining scalable infrastructure.
- Architect and manage Kafka clusters for high-throughput, low-latency data streaming.
- Deploy, configure, and manage Kong API Gateway for secure and scalable API traffic.
- Design and implement CI/CD pipelines for microservices and infrastructure.
- Automate infrastructure provisioning using tools like Terraform or Ansible.
- Monitor system performance and ensure high availability and disaster recovery.
- Collaborate with development, QA, and security teams to streamline deployments and enforce best practices.
- Ensure compliance with security standards and implement DevSecOps practices.
- Maintain documentation and provide training on Kafka and Kong usage and best practices.
- Strong understanding of the observability pillars: metrics, logs, traces, and events.
- Hands-on experience with Prometheus for metrics collection and Grafana for dashboarding and visualization.
- Proficiency in centralized logging solutions like the ELK Stack (Elasticsearch, Logstash, Kibana), Fluentd, or Splunk.
- Experience with distributed tracing tools such as Jaeger, Zipkin, or OpenTelemetry.
- Ability to implement instrumentation in applications for custom metrics and traceability.
- Skilled in setting up alerting and incident response workflows using tools like Alertmanager, PagerDuty, or Opsgenie.
- Familiarity with SLO, SLI, and SLA definitions and monitoring for service reliability.
- Experience with anomaly detection and root cause analysis (RCA) using observability data.
- Knowledge of cloud-native monitoring tools (e.g., AWS CloudWatch, Azure Monitor, GCP Operations Suite).
- Ability to build actionable dashboards and reports for technical and business stakeholders.
- Understanding of security and compliance monitoring within observability frameworks.
- Collaborative mindset to work with SREs, developers, and QA teams to define meaningful observability goals.
- Prepare and manage the IT budget, ensuring alignment with organizational priorities.
- Monitor expenditures and identify opportunities for cost savings without compromising quality.
- Well-spoken with good communication skills, as a lot of stakeholder management is needed.
What matters in this role (work experience):
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in DevOps or related roles, with at least 5 years in a leadership position.
- Strong hands-on experience with Apache Kafka (setup, tuning, monitoring, security).
- Proven experience with Kong API Gateway (plugins, routing, authentication, rate limiting).
- Proficiency in cloud platforms (AWS, Azure, or GCP).
- Kafka certification or Kong Gateway certification.
- Experience with service mesh technologies (e.g., Istio, Linkerd).
- Knowledge of event-driven architecture and microservices patterns.
- Experience with GitOps and Infrastructure as Code (IaC).
- Experience with containerization and orchestration (Docker, Kubernetes).
- Strong scripting skills (Bash, Python, etc.).
- Hands-on with monitoring tools (Prometheus, Grafana, Mimir, ELK).
You should be comfortable with:
- Working from office: 5 days a week (Sector 62, Noida).
Pushing the Boundaries: Have a big idea? See something that you feel we should do but haven't done? We will hustle hard to make it happen. We encourage out-of-the-box thinking, and if you bring that with you, we will make sure you get a bag that fits all the energy you bring along.
What We Value in Our People:
- You take the shot: you decide fast and you deliver right.
- You are the CEO of what you do: you show ownership and make things happen.
- You own tomorrow: by building solutions for the merchants and doing the right thing.
- You sign your work like an artist: you seek to learn and take pride in the work you do.
(ref:hirist.tech)
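Burn-rate alerting is the common way to turn the SLO/SLI definitions mentioned above into Alertmanager-style pages. The arithmetic is small; this sketch uses the widely cited multiwindow convention (a 14.4x burn exhausts a 30-day budget in about two days), with function names and thresholds that are illustrative, not any Pine Labs standard:

```python
def burn_rate(error_rate, slo):
    """How many times faster than 'exactly on budget' errors arrive.
    1.0 means the budget lasts the full window; 2.0 means half."""
    return error_rate / (1 - slo)

def fast_burn_alert(error_rate, slo=0.999, threshold=14.4):
    """Fast-burn page: at 14.4x, a 30-day budget is gone in ~2 days,
    so this severity warrants waking someone up."""
    return burn_rate(error_rate, slo) > threshold

print(round(burn_rate(0.0144, 0.999), 1))  # 14.4
print(fast_burn_alert(0.02))               # True
```

Pairing a fast-burn page with a slower, lower-threshold ticket alert is what keeps alerting both sensitive to outages and quiet during slow budget drift.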
Posted 5 days ago
7.0 years
0 Lacs
noida, uttar pradesh
Remote
Req ID: 340254

NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking an Industry Consulting Snr. Consultant to join our team in Noida, Uttar Pradesh (IN-UP), India (IN).

Job Title: Telemetry Engineer
Location: Noida, India
Employment Type: Full-Time
Experience Level: Senior (7+ years preferred)

Role Overview:
We are seeking a highly skilled Telemetry Engineer to lead the design and implementation of telemetry pipelines across diverse environments, including microservices, VM-based applications, cloud-native platforms, and on-premise systems. The ideal candidate will have deep expertise in OpenTelemetry architecture and implementation, and a strong background in observability, distributed systems, and performance monitoring.

Key Responsibilities:
Architect and implement end-to-end telemetry pipelines for applications deployed across cloud, on-prem, and hybrid environments.
Lead the installation, configuration, and optimization of OpenTelemetry components, including SDKs, the Collector, and exporters.
Collaborate with application, infrastructure, and DevOps teams to define telemetry standards and integrate observability into CI/CD workflows.
Design scalable and resilient data collection strategies for metrics, logs, and traces.
Develop and maintain instrumentation libraries for microservices and legacy applications.
Ensure telemetry data is efficiently routed to observability platforms (e.g., Splunk, Prometheus, Grafana, Datadog).
Conduct performance tuning and troubleshooting of telemetry pipelines.
Provide architectural guidance and best practices for telemetry adoption across teams.
Stay current with OpenTelemetry releases and contribute to internal tooling and automation.

Required Skills & Qualifications:
Proven experience in setting up telemetry pipelines from scratch across multiple environments.
Strong hands-on expertise with OpenTelemetry (Collector, SDKs, OTLP protocol).
Deep understanding of distributed tracing, metrics collection, and log aggregation.
Experience with observability platforms such as Splunk, Prometheus, Grafana, Jaeger, Zipkin, and Datadog.
Proficiency in one or more programming languages (e.g., Python, Go, Java, Node.js) for instrumentation.
Familiarity with cloud platforms (AWS, Azure, GCP) and VM/on-prem infrastructure.
Knowledge of container orchestration (Kubernetes), service meshes (Istio), and CI/CD pipelines.
Excellent communication and documentation skills.

Preferred Qualifications:
Experience contributing to or working with the OpenTelemetry community.
Certifications in cloud technologies or observability tools.
Experience in regulated or enterprise-scale environments.

About NTT DATA
NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at us.nttdata.com

Whenever possible, we hire locally to NTT DATA offices or client sites. This ensures we can provide timely and effective support tailored to each client's needs. While many positions offer remote or hybrid work options, these arrangements are subject to change based on client requirements.
For employees near an NTT DATA office or client site, in-office attendance may be required for meetings or events, depending on business needs. At NTT DATA, we are committed to staying flexible and meeting the evolving needs of both our clients and employees. NTT DATA recruiters will never ask for payment or banking information and will only use @nttdata.com and @talent.nttdataservices.com email addresses. If you are requested to provide payment or disclose banking information, please submit a contact us form, https://us.nttdata.com/en/contact-us. NTT DATA endeavors to make https://us.nttdata.com accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us. This contact information is for accommodation requests only and cannot be used to inquire about the status of applications. NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. For our EEO Policy Statement, please click here. If you'd like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.
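The distributed-tracing expertise this role calls for centers on trace-context propagation. A minimal, stdlib-only Python sketch of the W3C traceparent header format is shown below; the helper names are ours, not from any library, and in practice the OpenTelemetry SDK generates and propagates these headers automatically:

```python
import secrets

def make_traceparent(trace_id=None, parent_id=None):
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)    # 32 hex chars
    parent_id = parent_id or secrets.token_hex(8)   # 16 hex chars
    return f"00-{trace_id}-{parent_id}-01"          # flags 01 = sampled

def parse_traceparent(header):
    """Split a traceparent header into its four fields."""
    version, trace_id, parent_id, flags = header.split("-")
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError("malformed traceparent")
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "sampled": flags == "01"}

def child_context(header):
    """Propagate context downstream: keep the trace id, mint a new span id."""
    ctx = parse_traceparent(header)
    return make_traceparent(ctx["trace_id"], secrets.token_hex(8))
```

The sketch only shows the wire format that ties spans emitted by different services into a single trace; a real pipeline hands this off to the Collector over OTLP.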
Posted 6 days ago
0.0 - 5.0 years
0 Lacs
noida, uttar pradesh
Remote
Req ID: 340251

NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking an Industry Consulting Manager to join our team in Noida, Uttar Pradesh (IN-UP), India (IN).

Job Description: Technical Architect - Observability & SRE Frameworks

Position Title: Technical Architect - Observability & Site Reliability Engineering (SRE)
Location: Noida, India
Experience: 15+ years (including 5+ years in observability/SRE architecture)
Employment Type: Full-time

Role Overview
We are looking for a highly experienced Technical Architect to lead the design, strategy, and implementation of Observability and SRE frameworks for enterprise-scale, microservices-based applications. The ideal candidate will bring deep technical knowledge of both the Splunk Observability stack and open-source tools (such as OpenTelemetry, Prometheus, Grafana, and Jaeger), and be capable of defining and executing architecture strategies for complex distributed systems. This role requires the hands-on ability to create architecture blueprints, lead technical teams, and work directly with stakeholders and platform owners to embed observability and reliability practices across the SDLC.

Key Responsibilities

Architecture & Blueprinting
Design and deliver end-to-end observability architecture (metrics, logs, traces, events) for cloud-native and hybrid environments.
Create technical architecture diagrams, data flow maps, and integration blueprints using tools like Lucidchart, Draw.io, or Visio.
Lead the definition of SLIs, SLOs, and error budgets aligned with business KPIs and DORA metrics.

Toolchain Strategy & Implementation
Architect telemetry pipelines using the OpenTelemetry Collector and Splunk Observability Cloud (SignalFx, APM, RUM, Log Observer).
Define tool adoption strategy and an integration roadmap for OSS tools (Prometheus, Loki, Grafana, Jaeger) and Splunk-based stacks.
Guide teams on instrumentation approaches (auto/manual) across languages such as Java, Go, Python, and .NET.

Reliability Engineering Enablement
Lead adoption of SRE principles, including incident management frameworks, resiliency testing, and runbook automation.
Collaborate with DevOps to integrate observability into CI/CD pipelines (e.g., Jenkins, ArgoCD, GitHub Actions).
Define health checks, golden signals, and SPoG (Single Pane of Glass) dashboards.
Exposure to AIOps, ML-based anomaly detection, or business observability.

Stakeholder Management & Governance
Serve as a technical liaison between client leadership, SREs, developers, and infrastructure teams.
Run workshops and assessments, and evangelize an observability-first culture across teams.
Provide guidance on data retention, access control, cost optimization, and compliance (especially with Splunk ingestion policies).

Performance & Optimization
Continuously monitor and fine-tune observability data flows to prevent alert fatigue and ensure actionability.
Implement root cause analysis practices using telemetry correlation across metrics, logs, and traces.
Lead efforts to build self-healing systems using automated playbooks and AIOps integrations (where applicable).
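The SLI/SLO/error-budget work described above ultimately rests on simple arithmetic. A minimal sketch with illustrative numbers (nothing here comes from any client environment):

```python
def error_budget_minutes(slo_target, window_minutes):
    """Minutes of allowed unavailability in a window for a given SLO target."""
    return (1.0 - slo_target) * window_minutes

def burn_rate(observed_error_rate, slo_target):
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    return observed_error_rate / (1.0 - slo_target)

# A 99.9% SLO over a 30-day window allows roughly 43.2 minutes of downtime.
budget = error_budget_minutes(0.999, 30 * 24 * 60)

# A sustained 0.5% error rate against that SLO burns budget about 5x too fast,
# the kind of signal a burn-rate alert would page on.
rate = burn_rate(0.005, 0.999)
```

Multi-window burn-rate alerting (paging only when both a long and a short window burn faster than a chosen rate) is the usual way to turn these numbers into actionable alerts without alert fatigue.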
Required Skills & Qualifications
15+ years in IT, with 5+ years in Observability/SRE architecture roles
Proven experience designing architecture for microservices, containers (Docker, Kubernetes), and distributed systems
Strong hands-on expertise with:
Splunk Observability Cloud (SignalFx, Log Observer, APM)
OpenTelemetry (SDKs + Collector)
Prometheus + Grafana
Jaeger / Zipkin for distributed tracing
CI/CD tools: Jenkins, GitHub Actions, ArgoCD
Ability to build and present clear architecture diagrams and solution roadmaps
Working knowledge of cloud environments (AWS, Azure, GCP) and container orchestration (K8s/OpenShift)
Familiarity with SRE and DevOps best practices (error budgets, release engineering, chaos testing)

Nice to Have
Splunk certifications: Core Consultant, Observability Specialist, Admin
Knowledge of ITIL and modern incident management frameworks (PagerDuty, OpsGenie)
Experience in banking or regulated enterprise environments

Soft Skills
Strong leadership and cross-functional collaboration
Ability to work in ambiguous, fast-paced environments
Excellent documentation and communication skills
Passion for mentoring teams and building best practices at scale

Why This Role Matters
The client is on a journey to mature its Observability and SRE ecosystem, and this role will be critical in:
Unifying legacy and modern telemetry stacks
Driving a reliability-first mindset and tooling
Establishing a scalable blueprint for production excellence

For our EEO Policy Statement, please click here. If you'd like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.
Posted 6 days ago
8.0 years
0 Lacs
noida, uttar pradesh, india
On-site
Job Description – Manager - DevOps

We at Pine Labs are looking for those who share our core belief: "Every Day is Game Day". We bring our best selves to work each day to realize our mission of enriching the world through the power of digital commerce and financial services.

Role Purpose
We are seeking a Manager, DevOps who will lead and manage the organization's DevOps infrastructure, the observability stack for applications, CI/CD pipelines, and support services. This role involves managing a team of DevOps engineers, architecting scalable infrastructure, and ensuring high availability and performance of our messaging and API management systems. This individual will oversee a team of IT professionals, ensure the seamless delivery of IT services, and implement strategies to align technology solutions with business objectives. The ideal candidate is a strategic thinker with strong technical expertise and proven leadership experience.

Responsibilities We Entrust You With
Lead and mentor a team of DevOps leads/engineers in designing and maintaining scalable infrastructure.
Architect and manage Kafka clusters for high-throughput, low-latency data streaming.
Deploy, configure, and manage Kong API Gateway for secure and scalable API traffic management.
Design and implement CI/CD pipelines for microservices and infrastructure.
Automate infrastructure provisioning using tools like Terraform or Ansible.
Monitor system performance and ensure high availability and disaster recovery.
Collaborate with development, QA, and security teams to streamline deployments and enforce best practices.
Ensure compliance with security standards and implement DevSecOps practices.
Maintain documentation and provide training on Kafka and Kong usage and best practices.
Strong understanding of observability pillars: metrics, logs, traces, and events.
Hands-on experience with Prometheus for metrics collection and Grafana for dashboarding and visualization.
Proficiency in centralized logging solutions like the ELK Stack (Elasticsearch, Logstash, Kibana), Fluentd, or Splunk.
Experience with distributed tracing tools such as Jaeger, Zipkin, or OpenTelemetry.
Ability to implement instrumentation in applications for custom metrics and traceability.
Skilled in setting up alerting and incident response workflows using tools like Alertmanager, PagerDuty, or Opsgenie.
Familiarity with SLO, SLI, and SLA definitions and monitoring for service reliability.
Experience with anomaly detection and root cause analysis (RCA) using observability data.
Knowledge of cloud-native monitoring tools (e.g., AWS CloudWatch, Azure Monitor, GCP Operations Suite).
Ability to build actionable dashboards and reports for technical and business stakeholders.
Understanding of security and compliance monitoring within observability frameworks.
Collaborative mindset to work with SREs, developers, and QA teams to define meaningful observability goals.
Prepare and manage the IT budget, ensuring alignment with organizational priorities.
Monitor expenditures and identify opportunities for cost savings without compromising quality.
Well-spoken, with good communication skills, as a lot of stakeholder management is needed.

What Matters In This Role
Relevant work experience:
Bachelor's or master's degree in computer science, engineering, or a related field.
8+ years of experience in DevOps or related roles, with at least 5 years in a leadership position.
Strong hands-on experience with Apache Kafka (setup, tuning, monitoring, security).
Proven experience with Kong API Gateway (plugins, routing, authentication, rate limiting).
Proficiency in cloud platforms (AWS, Azure, or GCP).
Kafka certification or Kong Gateway certification.
Experience with service mesh technologies (e.g., Istio, Linkerd).
Knowledge of event-driven architecture and microservices patterns.
Experience with GitOps and Infrastructure as Code (IaC).
Experience with containerization and orchestration (Docker, Kubernetes).
Strong scripting skills (Bash, Python, etc.).
Hands-on experience with monitoring tools (Prometheus, Grafana, Mimir, ELK Stack).

Things You Should Be Comfortable With
Working from the office: 5 days a week (Sector 62, Noida).
Pushing the boundaries: Have a big idea? See something that you feel we should do but haven't done? We will hustle hard to make it happen. We encourage out-of-the-box thinking, and if you bring that with you, we will make sure you get a bag that fits all the energy you bring along.

What We Value In Our People
You take the shot: you decide fast and you deliver right.
You are the CEO of what you do: you show ownership and make things happen.
You own tomorrow: by building solutions for the merchants and doing the right thing.
You sign your work like an artist: you seek to learn and take pride in the work you do.
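One of the observability skills this posting asks for, anomaly detection over metrics, can be illustrated with a minimal z-score check. This is a deliberately simplified sketch with made-up sample data; production systems use rolling windows and seasonality-aware models:

```python
import statistics

def detect_anomalies(samples, threshold=3.0):
    """Return indices of points more than `threshold` standard deviations
    from the mean, a crude but common first-pass anomaly check."""
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    if stdev == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mean) / stdev > threshold]

# Hypothetical per-minute p99 latencies (ms) with one obvious spike
latencies_ms = [120, 118, 125, 122, 119, 121, 123, 120, 124, 117,
                119, 122, 121, 118, 950]
spikes = detect_anomalies(latencies_ms)  # flags only the 950 ms sample
```

In an alerting workflow, the flagged indices would feed an Alertmanager-style rule rather than page directly, so that transient single-point spikes can be damped before an incident is opened.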
Posted 6 days ago
3.0 - 5.0 years
0 Lacs
bengaluru, karnataka, india
On-site
The Role:
We are seeking a Site Reliability Engineer (SRE) to ensure our multi-cloud networking platform meets and exceeds the stringent reliability, performance, and availability targets our enterprise customers demand. This is not a traditional operations role: you will apply a software engineering mindset to solve complex infrastructure challenges and automate solutions at scale. You will be the guardian of our production environment, responsible for the uptime of our services and the architect of the systems that allow us to scale with confidence. Your work is critical to building and maintaining the trust of our customers.

Responsibilities:
Define and Manage Reliability: Establish and own the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) that define the reliability of our platform. Participate in a blameless post-incident analysis culture and an on-call rotation to manage and resolve production incidents.
Build and Own the Observability Stack: Design, implement, and manage our complete observability stack, leveraging tools like Prometheus for metrics, Grafana for visualization, Elasticsearch for logging, and Jaeger/OpenTelemetry for distributed tracing to provide end-to-end visibility into our distributed system.
Automate Everything: Write robust automation and tooling in Python or Go to eliminate manual operational tasks, from incident response to infrastructure provisioning.
Infrastructure as Code (IaC): Use Terraform and Ansible to manage our multi-cloud infrastructure as code, ensuring our environments are consistent, repeatable, and auditable.
Kubernetes and Cloud Operations: Manage, troubleshoot, and scale our Kubernetes clusters across our multi-cloud footprint (AWS, Azure, GCP). You will be the expert on running our application reliably in a containerized environment.
CI/CD and Release Engineering: Collaborate with development teams to enhance our CI/CD pipelines, ensuring that every release is safe, reliable, and can be deployed with high velocity.

Required Qualifications:
3-5+ years of experience in a Site Reliability Engineering (SRE), DevOps, or similar infrastructure-focused software engineering role.
Strong programming and automation skills in Python or Go.
Deep, hands-on expertise with a modern observability stack, including Prometheus, Grafana, and the ELK Stack (Elasticsearch, Logstash/Fluentd, Kibana).
Proven experience with Infrastructure as Code (Terraform) and configuration management (Ansible).
In-depth knowledge of running, managing, and troubleshooting applications on Kubernetes in a production, multi-cloud environment.
A rigorous, data-driven approach to reliability and a deep understanding of distributed systems, their failure modes, and how to make them resilient.

Preferred Qualifications:
Experience with distributed tracing using Jaeger or OpenTelemetry.
A strong understanding of cloud networking concepts (VPCs, subnets, routing, security groups).
Experience defining and tracking SLOs and error budgets.
Experience in a fast-paced startup environment.
Relevant certifications such as Certified Kubernetes Administrator (CKA) or cloud provider certifications (AWS, Azure, GCP).
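The SLO ownership described in this role starts with measuring SLIs from raw request data. A minimal sketch with hypothetical numbers (the function names and the sample window are ours, purely for illustration):

```python
def availability_sli(outcomes):
    """Fraction of requests that succeeded (1 = success, 0 = failure)."""
    return sum(outcomes) / len(outcomes)

def latency_sli(latencies_ms, threshold_ms):
    """Fraction of requests served under the latency threshold."""
    return sum(1 for l in latencies_ms if l < threshold_ms) / len(latencies_ms)

# A hypothetical window of ten requests
outcomes = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
latencies = [80, 90, 85, 110, 95, 70, 60, 300, 88, 92]

avail = availability_sli(outcomes)   # 0.9
fast = latency_sli(latencies, 200)   # 0.9

# Comparing each SLI against its SLO target tells you whether the
# error budget for the window is intact or being spent.
```

Real SLIs would be computed from Prometheus counters or log-derived events over much larger windows; the ratio-of-good-events structure stays the same.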
Posted 1 week ago
6.0 - 8.0 years
3 - 12 Lacs
coimbatore, tamil nadu, india
On-site
RESPONSIBILITIES:
Develop performance test strategies and plans based on business requirements and system designs.
Design and execute performance test scripts.
Analyze test results to identify bottlenecks and performance issues.
Provide detailed performance reports with recommendations for improvement.
Collaborate with cross-functional teams to understand system requirements and ensure performance goals are achieved.
Contribute to the continuous development of performance testing methodologies and stay current with industry tools and best practices.
Work in an agile environment with geographically distributed teams.
Plan, execute, and interpret performance and stress tests to identify system limitations and guide engineering improvements.
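Capacity questions behind a load-test plan often come down to Little's Law (L = lambda x W): the number of in-flight requests equals throughput times response time. A small sketch, with illustrative numbers:

```python
def required_concurrency(target_rps, avg_latency_s):
    """Little's Law: virtual users needed to sustain a target throughput
    at a given average response time."""
    return target_rps * avg_latency_s

def achievable_rps(virtual_users, avg_latency_s):
    """The same law rearranged: throughput a fixed user pool can generate."""
    return virtual_users / avg_latency_s

# To hold 200 req/s at a 250 ms average response time,
# a test script needs about 50 concurrent virtual users.
vusers = required_concurrency(200, 0.25)
```

If a stress test cannot reach the target throughput even as virtual users are added, latency is rising with load, which is exactly the bottleneck signal the analysis step above looks for.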
Posted 1 week ago
0 years
0 Lacs
hyderabad, telangana, india
On-site
About the Role
We are looking for an experienced DevOps Engineer to join our engineering team. This role involves setting up, managing, and scaling development, staging, and production environments both on AWS cloud and on-premise (open-source stack). You will be responsible for CI/CD pipelines, infrastructure automation, monitoring, container orchestration, and model deployment workflows for our enterprise applications and AI platform.

Key Responsibilities

Infrastructure Setup & Management
Design and implement cloud-native architectures on AWS, and manage on-premise open-source environments when required.
Automate infrastructure provisioning using tools like Terraform or CloudFormation.
Maintain scalable environments for dev, staging, and production.

CI/CD & Release Management
Build and maintain CI/CD pipelines for backend, frontend, and AI workloads.
Enable automated testing, security scanning, and artifact deployments.
Manage configuration and secret management across environments.

Containerization & Orchestration
Manage Docker-based containerization and Kubernetes clusters (EKS, self-managed K8s).
Implement service mesh, auto-scaling, and rolling updates.

Monitoring, Security, and Reliability
Implement observability (logging, metrics, tracing) using open-source or cloud tools.
Ensure security best practices across infrastructure, pipelines, and deployed services.
Troubleshoot incidents, manage disaster recovery, and support high availability.

Model DevOps / MLOps
Set up pipelines for AI/ML model deployment and monitoring (LLMOps).
Support data pipelines, vector databases, and model hosting for AI applications.

Required Skills and Qualifications

Cloud & Infra
Strong expertise in AWS services: EC2, ECS/EKS, S3, IAM, RDS, Lambda, API Gateway, etc.
Ability to set up and manage on-premise or hybrid environments using open-source tools.

DevOps & Automation
Hands-on experience with Terraform / CloudFormation.
Strong skills in CI/CD tools such as GitHub Actions, Jenkins, GitLab CI/CD, or ArgoCD.

Containerization & Orchestration
Expertise with Docker and Kubernetes (EKS or self-hosted).
Familiarity with Helm charts and service meshes (Istio/Linkerd).

Monitoring / Observability Tools
Experience with Prometheus, Grafana, ELK/EFK stack, CloudWatch.
Knowledge of distributed tracing tools like Jaeger or OpenTelemetry.

Security & Compliance
Understanding of cloud security best practices.
Familiarity with tools like Vault and AWS Secrets Manager.

Model DevOps / MLOps Tools (Preferred)
Experience with MLflow, Kubeflow, BentoML, Weights & Biases (W&B).
Exposure to vector databases (pgvector, Pinecone) and AI pipeline automation.

Preferred Qualifications
Knowledge of cost optimization for cloud and hybrid infrastructures.
Exposure to Infrastructure as Code (IaC) best practices and GitOps workflows.
Familiarity with serverless and event-driven architectures.

Education
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).

What We Offer
Opportunity to work on modern cloud-native systems and AI-powered platforms.
Exposure to hybrid environments (AWS and open source on-prem).
Competitive salary, benefits, and a growth-oriented culture.
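Reliability mechanics like health-check retries and deploy verification loops, mentioned above, usually rely on exponential backoff with jitter. A minimal stdlib sketch; the parameter defaults are illustrative, not a prescription:

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0, jitter=True):
    """Exponential backoff schedule in seconds, capped, with optional
    full jitter to avoid thundering-herd retries after an outage."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delays.append(random.uniform(0, delay) if jitter else delay)
    return delays

# Deterministic schedule: 0.5, 1, 2, 4, 8, 16 seconds, then capped at 30.
schedule = backoff_delays(8, jitter=False)
```

Full jitter (picking uniformly between zero and the capped delay) spreads out retries from many clients at once, which matters when a health check fails fleet-wide during a rolling update.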
Posted 1 week ago
5.0 - 10.0 years
7 - 12 Lacs
pune
Work from Office
BMC's SaaS Ops team is looking for a DevOps Engineer to join us and design, develop, and implement complex applications using the latest technologies.

Here is how, through this exciting role, YOU will contribute to BMC's and your own success:
Participate in all aspects of SaaS product development, from requirements analysis to product release and sustaining.
Drive the adoption of the DevOps process and tools across the organization.
Learn and implement cutting-edge technologies and tools to build best-in-class enterprise SaaS solutions.
Deliver high-quality enterprise SaaS offerings on schedule.
Develop the Continuous Delivery pipeline.

To ensure you're set up for success, you will bring the following skillset & experience:
You can embrace, live and breathe our BMC values every day!
You have at least 5 years in a DevOps/SRE role.
You have implemented CI/CD pipelines with best practices.
You have experience in Kubernetes.
You have knowledge of AWS/Azure cloud implementation.
You have worked with Git repositories and JIRA.
You are passionate about quality and demonstrate creativity and innovation in enhancing the product.
You are a problem-solver with good analytical skills.
You are a team player with effective communication skills.

Whilst these are nice to have, our team can help you develop the following skills:
SRE practices
GitHub, Spinnaker, Jenkins, Maven, JIRA, etc.
Automation playbooks (Ansible)
Infrastructure as Code (IaC) using Terraform / CloudFormation templates / ARM templates
Scripting in Bash/Python/Go
Microservices, database, and API implementation
Monitoring tools such as Prometheus, Jaeger, Grafana, AppDynamics, Datadog, Nagios, etc.
Agile/Scrum process
Posted 1 week ago
8.0 - 10.0 years
6 - 10 Lacs
bengaluru
Work from Office
Our SaaS Ops department focuses on delivering SaaS excellence and a great SaaS experience for our customers. We continuously grow by adding and implementing the most cutting-edge technologies and investing in innovation! Our team is a global and versatile group of professionals, so if you're looking for a place where your ideas will be heard, this is the place for you!

BMC's SaaS Ops team is looking for a Senior DevOps Engineer to join us and design, develop, and implement complex applications using the latest technologies.

Here is how, through this exciting role, YOU will contribute to BMC's and your own success:
Participate in all aspects of SaaS product development, from requirements analysis to product release and sustaining.
Drive the adoption of the DevOps process and tools across the organization.
Learn and implement cutting-edge technologies and tools to build best-in-class enterprise SaaS solutions.
Deliver high-quality enterprise SaaS offerings on schedule.
Develop the Continuous Delivery pipeline.

To ensure you're set up for success, you will bring the following skillset & experience:
You can embrace, live and breathe our BMC values every day!
You have at least 8-10 years in a DevOps/SRE role.
You have implemented CI/CD pipelines with best practices.
You have experience in Kubernetes.
You have knowledge of AWS/Azure cloud implementation.
You have worked with Git repositories and JIRA.
You are passionate about quality and demonstrate creativity and innovation in enhancing the product.
You are a problem-solver with good analytical skills.
You are a team player with effective communication skills.

Whilst these are nice to have, our team can help you develop the following skills:
SRE practices
GitHub, Spinnaker, Jenkins, Maven, JIRA, etc.
Automation playbooks (Ansible)
Infrastructure as Code (IaC) using Terraform / CloudFormation templates / ARM templates
Scripting in Bash/Python/Go
Microservices, database, and API implementation
Monitoring tools such as Prometheus, Jaeger, Grafana, AppDynamics, Datadog, Nagios, etc.
Agile/Scrum process
Posted 1 week ago
10.0 years
0 Lacs
hyderabad, telangana, india
On-site
Skill: DevOps Observability
Experience: 10+ years
Location: Hyderabad

Must-have skills:
10+ years of experience in IT infrastructure, with at least 8+ years in Observability, Monitoring, or SRE roles.
Strong expertise in Kubernetes and containerized environments.
Hands-on experience with monitoring tools (Prometheus, Grafana, Datadog, Dynatrace).
Experience with distributed tracing tools (Jaeger, OpenTelemetry).
Proficiency in Python or Go for automation and scripting.
Experience with logging tools (Splunk, ELK Stack, Fluentd, Loki).
Strong understanding of metrics, logging, and tracing concepts.
Knowledge of cloud platforms (AWS, Azure, or GCP) and experience integrating observability solutions in cloud-native environments.
Familiarity with databases (MySQL, PostgreSQL).
Experience with Infrastructure as Code (IaC) tools like Terraform or Helm.
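A small example of the metrics concepts this role demands: computing the increase of a monotonic counter across scrapes while tolerating process restarts, roughly how PromQL's increase() treats counter resets. This is a simplified sketch, not the actual Prometheus algorithm, and the sample values are made up:

```python
def counter_increase(samples):
    """Total increase of a monotonic counter over successive scrapes,
    treating any decrease as a counter reset back to zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total

def per_second_rate(samples, scrape_interval_s):
    """Average per-second rate over the sampled window."""
    window = scrape_interval_s * (len(samples) - 1)
    return counter_increase(samples) / window

# Hypothetical http_requests_total scrapes, with a restart after the third:
scrapes = [100, 150, 210, 5, 60]
increase = counter_increase(scrapes)   # 170.0: the dip to 5 is a reset
rate = per_second_rate(scrapes, 15)
```

Handling resets this way is why dashboards built on rate()-style functions stay correct across pod restarts, a detail that matters constantly in Kubernetes environments.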
Posted 1 week ago
3.0 years
0 Lacs
mumbai, maharashtra, india
On-site
About Us
Jefferies Financial Group Inc. ("Jefferies," "we," "us" or "our") is a U.S.-headquartered global full-service, integrated investment banking and securities firm. Our largest subsidiary, Jefferies LLC, a U.S. broker-dealer, was founded in the U.S. in 1962, and our first international operating subsidiary, Jefferies International Limited, a U.K. broker-dealer, was established in the U.K. in 1986. Our strategy focuses on continuing to build out our investment banking effort, enhancing our capital markets businesses and further developing our Leucadia Asset Management alternative asset management platform. We offer deep sector expertise across a full range of products and services in investment banking, equities, fixed income, and asset and wealth management in the Americas, Europe and the Middle East, and Asia.

Overview
We are seeking a hands-on, technically skilled professional to join our global team as an Associate Platform Reliability Engineer. This role is critical to ensuring the stability, reliability, and resilience of Jefferies' front-to-back technology infrastructure, with a focus on post-trade processing, operations, and regulatory support.

Key Responsibilities
Collaborate with a high-performing global team to maintain plant stability across middle-office and operations applications.
Lead incident triage, root cause analysis, and communication, with a strong focus on problem management.
Partner with regional teams to drive technical and functional initiatives.
Identify and eliminate manual support tasks through automation; develop tools for deployment, management, and service visibility.
Design and implement robust monitoring and alerting systems using platforms like AppDynamics and OpenTelemetry.
Work closely with engineering teams to support system architecture, schema design, and performance tuning.
Troubleshoot issues across the full technology stack.

Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field.
Minimum 3 years of experience in programming with Python, Go, C/C++, or C#. Strong foundation in computing fundamentals—data structures, algorithms, and software design. Experience in application design, maintenance, and support. Solid understanding of SRE principles and practices. Proficiency in Linux/Unix and Windows Server environments. Hands-on experience with scripting, databases, and troubleshooting application/data access issues. Self-driven with a strong sense of ownership and commitment to quality. Excellent communication skills, able to engage both technical and business stakeholders. Familiarity with open-source platforms such as Redis, MongoDB, Kafka, and Elasticsearch. Experience configuring observability stacks (Grafana, Prometheus, Jaeger, Loki). Exposure to DevOps tools and technologies including Git, Jenkins, Ansible. Jefferies is an equal employment opportunity employer, and takes affirmative action to ensure that all qualified applicants will receive consideration for employment without regard to race, creed, color, national origin, ancestry, religion, gender, pregnancy, age, physical or mental disability, marital status, sexual orientation, gender identity or expression, veteran or military status, genetic information, reproductive health decisions, or any other factor protected by applicable law. We are committed to hiring the most qualified applicants and complying with all federal, state, and local equal employment opportunity laws. As part of this commitment, Jefferies will extend reasonable accommodations to individuals with disabilities, as required by applicable law. We have been made aware of bad actors falsely claiming to be associated with Jefferies Group soliciting individuals to attend virtual job interviews, complete online tests or courses and sending fictitious employment offer letters. Please note that any email contact with Jefferies personnel will come from an “@jefferies.com” email address. 
Further, Jefferies will not notify shortlisted candidates through social media platforms (e.g. WhatsApp or Telegram) or ask candidates to make payment to participate in the hiring process.
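The monitoring-and-alerting responsibility in this posting typically comes down to debouncing noisy metrics before paging anyone. The sketch below is illustrative only, not part of the role description: the function name, threshold, and streak rule are hypothetical stand-ins for the kind of rule an alerting platform evaluates.

```python
def evaluate_alert(samples, threshold, min_consecutive):
    """Return True once `min_consecutive` successive samples exceed
    `threshold` -- the debouncing rule alerting systems commonly apply
    so a single noisy data point does not trigger a page."""
    streak = 0
    for value in samples:
        # Extend the breach streak, or reset it on a healthy sample.
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            return True
    return False
```

For example, with a threshold of 4 and a requirement of 3 consecutive breaches, the series `[1, 5, 5, 5]` fires an alert, while the flapping series `[5, 1, 5, 1, 5]` never does.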
Posted 1 week ago
0 years
4 - 18 Lacs
mohali district, india
On-site
Shift: Night Shift

Role Overview
Join a team building resilient, scalable automated test systems for distributed, hybrid-cloud environments. This role suits QA professionals who enjoy designing practical test frameworks and scaling automation to improve reliability and deployment confidence. You'll work closely with engineers to create repeatable tests, simulate real-world failures, and ensure services behave under load and during recovery. Clear communication and pragmatic problem solving are key.

Key Responsibilities

Architecting Test Systems
Design test frameworks to validate microservices and infrastructure across multi-cluster environments.
Create production-like workload simulations, resource scaling tests, failure injection, and recovery scenarios.

Automation & Scalability
Lead CI/CD-integrated test automation (Jenkins, GitHub Actions) and embed tests into release pipelines.
Use Kubernetes APIs, Helm, and service mesh tools to automate health checks, failover, and network resilience.
Apply infrastructure-as-code to make test environments repeatable, extensible, and easy to manage.

Technical Expertise
Familiarity with Kubernetes internals, Helm, and service meshes (Istio, Linkerd).
Strong scripting skills: Python, Pytest, Bash; comfortable writing reliable test tooling.
Experience with observability tools (Prometheus, Grafana, Jaeger) to analyze failures and performance.
Knowledge of Kubernetes security (RBAC, secrets) and performance testing tools (K6).
Working experience with cloud platforms (AWS, Azure, GCP) and containerized CI/CD.
Comfortable with Linux system administration, networking basics, and container runtimes (Docker/containerd).
Hands-on with kubectl, Helm with OCI registries, and GitOps tooling (Flux).

We value practical experience and a focus on improving reliability. If you have a strong testing mindset and scripting skills, we encourage you to apply even if you don't meet every item above.
Skills: pki management,flux,linkerd,automation,kubectl,helm,ci/cd,istio,qa engineering,jaeger,gitops,kubernetes,rbac,python,pytest,linux,grafana,k6,firewalling,bash,github actions,docker,prometheus,cd,azure,iac,jenkins,oci registries,bash scripting,aws,ci,qa engineer,networking,gcp
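The failure-injection and recovery testing this role describes usually rests on one primitive: inject a fault, then poll a health check until the service recovers within an agreed budget. A minimal, stdlib-only Python sketch of that polling pattern (the names `wait_for_healthy` and `FlakyService` are hypothetical examples, not tooling named in the posting):

```python
import time


def wait_for_healthy(check, timeout=5.0, interval=0.1):
    """Poll `check()` until it returns True or `timeout` seconds elapse.

    This is the core loop of a recovery test: after injecting a
    failure, assert the service reports healthy again within budget.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


class FlakyService:
    """Toy stand-in for a pod that recovers after a few failed probes."""

    def __init__(self, failures_before_healthy):
        self.remaining = failures_before_healthy

    def ready(self):
        # Fail the first N readiness probes, then report healthy.
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True
```

In a real suite the `check` callable would wrap a readiness probe (for instance, a kubectl or Kubernetes API call), and a Pytest test would assert that `wait_for_healthy` returns True within the recovery SLO.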
Posted 1 week ago