Home
Jobs

257 OpenTelemetry Jobs - Page 8

JobPe aggregates listings for easy access, but you apply directly on the original job portal.

6.0 years

0 Lacs

Hyderabad, Telangana, India

On-site


Our Company
We’re Hitachi Vantara, the data foundation trusted by the world’s innovators. Our resilient, high-performance data infrastructure means that customers – from banks to theme parks – can focus on achieving the incredible with data. If you’ve seen the Las Vegas Sphere, you’ve seen just one example of how we empower businesses to automate, optimize, innovate – and wow their customers. Right now, we’re laying the foundation for our next wave of growth. We’re looking for people who love being part of a diverse, global team – and who get excited about making a real-world impact with data.

The Role
This role is pivotal in the development of the VSP 360 platform’s on-premises solution, ensuring strict adherence to delivery objectives. The VSP 360 platform stands as the cornerstone of our organization’s management solution strategy.

What You’ll Bring
- Bachelor’s degree in computer science or a related field.
- 6+ years of experience in DevOps or a related field.
- Strong experience with cloud-based services.
- Strong experience running Kubernetes as a service (GKE, EKS, AKS).
- Strong experience managing Kubernetes clusters.
- Strong experience with infrastructure automation and deployment tools such as Terraform, Ansible, Docker, Jenkins, GitHub, GitHub Actions, or similar tools.
- Strong experience with monitoring tools such as Grafana, Nagios, ELK, OpenTelemetry, Prometheus, or similar tools.
- Desirable: experience with Anthos/Istio Service Mesh or similar tools.
- Desirable: experience with Cloud Native Computing Foundation (CNCF) projects, Kubernetes Operators, and Keycloak.
- Strong knowledge of Linux systems administration.

Good To Have
- Python, Django
- AWS solution design and deployment skills
- Experience with cloud-based storage (S3, Blob, Google Storage)
- Experience with storage area networks (SANs)

About Us
We’re a global team of innovators. Together, we harness engineering excellence and passion for insight to co-create meaningful solutions to complex challenges. We turn organizations into data-driven leaders that can make a positive impact on their industries and society. If you believe that innovation can inspire the future, this is the place to fulfil your purpose and achieve your potential.

Championing Diversity, Equity, and Inclusion
Diversity, equity, and inclusion (DEI) are integral to our culture and identity. Diverse thinking, a commitment to allyship, and a culture of empowerment help us achieve powerful results. We want you to be you, with all the ideas, lived experience, and fresh perspective that brings. We support your uniqueness and encourage people from all backgrounds to apply and realize their full potential as part of our team.

How We Look After You
We help take care of your today and tomorrow with industry-leading benefits, support, and services that look after your holistic health and wellbeing. We’re also champions of life balance and offer flexible arrangements that work for you (role and location dependent). We’re always looking for new ways of working that bring out our best, which leads to unexpected ideas. So here, you’ll experience a sense of belonging, and discover autonomy, freedom, and ownership as you work alongside talented people you enjoy sharing knowledge with. We’re proud to say we’re an equal opportunity employer and welcome all applicants for employment without attention to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran, age, disability status or any other protected characteristic. Should you need reasonable accommodations during the recruitment process, please let us know so that we can do our best to set you up for success.

Posted 2 weeks ago

Apply

0 years

0 Lacs

Mumbai Metropolitan Region

On-site


We’re looking for a hands-on, self-directed Senior DevOps Engineer to join our fast-paced startup. You’ll be the first line of defense for production issues, architect robust observability systems, and improve deployment and testing practices. If you thrive in startup environments, enjoy taking ownership, and are comfortable in modern JS/TS stacks, we’d love to meet you.

Top Outcomes – First 3 Months
- Implement a reliable observability stack: leverage Grafana, CloudWatch, and OpenTelemetry within our Node.js and TypeScript codebase.
- Be on top of alerts and issues: monitor, triage, and fix or escalate production issues with traceability and follow-up.
- Reduce system noise: begin reducing the frequency and volume of unexpected errors.

Top Outcomes – First 12 Months
- Improve test coverage: ensure better code quality and proactively catch regressions.
- Own DevOps workflows: deploy, debug, and maintain infrastructure health autonomously.
- Become a core team member: handle incidents independently and support the evolution of our infra/dev culture.

Key Performance Indicators (KPIs)
Leading indicators:
- Number of alerts and incidents triaged
- Trace IDs investigated and logged
- Bugs found early and resolved
- Tickets opened/closed efficiently
- Reduced volume of unhandled or duplicate errors
Lagging indicators:
- Production uptime and stability
- % of fixes resolved without handoff
- Number of tests added
- Reduction in recurring or duplicate issues

Core Responsibilities
Observability & Alerting
- Maintain and enhance Grafana dashboards
- Integrate and manage CloudWatch alarms and OpenTelemetry traces
- Ensure traceability across all systems (CRM, APIs, webhooks, workflows)
Issue Response & Triage
- Act as first responder for production issues during working hours
- Troubleshoot, escalate with full context, and coordinate incident response
Infrastructure Maintenance
- Improve deployment workflows and monitor resource usage
- Maintain the health of critical subsystems (queues, sync jobs, memory/CPU)
Testing & QA
- Add and improve test coverage once baseline reliability is achieved
- Build confidence in deployments through automated testing and regression checks

Candidate Profile
- Strong experience with Node.js, TypeScript, and React
- Deep knowledge of AWS, Grafana, OpenTelemetry, and CloudWatch
- Prior startup experience preferred
- Clear, proactive communicator with a bias toward ownership
- Available 1:30 AM to 10:30 PM IST, 5 days/week, for on-call responsibilities
- Bonus: experience reviewing pull requests and deploying code regularly

Immediate Tasks
- Review and phase-implement an internal RFC for observability
- Refine and own Grafana dashboards; implement meaningful alerts
- Ensure consistent trace ID usage throughout the codebase
- Improve logging and tracing to increase debuggability
- Monitor and respond to production errors daily
- Investigate, fix, or escalate recurring system issues
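The "consistent trace ID usage" responsibility above is the core idea behind log/trace correlation. In production the OpenTelemetry SDK propagates trace context automatically; a minimal hand-rolled sketch of the principle, with hypothetical helper names, looks like this:

```python
import contextvars
import uuid

# Context variable carrying the current trace ID across function calls;
# OpenTelemetry's context propagation works on the same principle.
_trace_id = contextvars.ContextVar("trace_id", default=None)

def start_trace():
    """Begin a new trace and return its ID (hypothetical helper)."""
    tid = uuid.uuid4().hex
    _trace_id.set(tid)
    return tid

def log(message):
    """Attach the current trace ID to a log line so every log entry can be
    correlated back to the request that produced it."""
    return f"trace_id={_trace_id.get()} msg={message}"

tid = start_trace()
line = log("payment webhook received")
assert f"trace_id={tid}" in line  # the log line is tied to the active trace
```

Because the ID lives in a `contextvars.ContextVar`, it follows each request through async code without being passed explicitly, which is why consistent trace IDs make cross-service debugging tractable.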

Posted 2 weeks ago

Apply

7.0 years

0 Lacs

Bengaluru, Karnataka, India

On-site


About The Role
We are seeking a Lead Product Manager to drive the product roadmap and strategy for our enterprise Observability Platform, with a strong focus on engaging and aligning priorities with our customers, technology partners, field, UX, and engineering teams. This role is critical in shaping the next-generation observability experience, enabling comprehensive, actionable insights and monitoring capabilities across cloud, on-premises, and hybrid Citrix infrastructure configurations. As the product lead for our Observability Platform, you will define the vision and lead cross-functional collaboration with engineering, UX, customers, technology partners, and field teams to deliver high-impact observability solutions that align with enterprise requirements and future trends in cloud and workspace delivery.

Job Description/Responsibilities:
- Lead the product roadmap and strategy for our Observability Platform, enabling consistent monitoring, logging, and tracing capabilities across cloud, on-premises, and hybrid deployments.
- Work closely with engineering, design, and quality teams to define requirements, manage prioritization, and ensure timely execution of observability features.
- Collaborate with customer success, sales engineering, and marketing to drive adoption, evangelize capabilities, and capture feedback on observability needs.
- Track market developments, conduct competitive analysis, and incorporate insights into strategic planning for observability tools and practices.
- Influence observability adoption and innovation across enterprise segments by aligning with modern monitoring, AIOps, and distributed tracing trends.
- Define and track product success metrics; leverage telemetry, customer feedback, and incident data for continuous improvement of the observability platform.

Qualifications & Experience:
- 7+ years of experience in Product Management (or equivalent) in enterprise software, cloud platforms, or observability solutions.
- A university degree is mandatory (BE or B.Tech preferred) with a minimum of 8 years of prior relevant experience; or a Master’s degree (MBA) with 6 years; or a PhD with 3 years of experience.
- Experience working with observability platforms, monitoring solutions, or distributed tracing systems is highly desirable.
- Strong understanding of modern observability practices and monitoring technologies across cloud, on-premises, and hybrid environments.
- Proven ability to lead cross-functional teams and manage complex stakeholder relationships.
- Strong analytical mindset with experience in data-driven decision-making.
- Excellent communication and stakeholder management skills.

Preferred Skills:
- Prior experience working with observability vendors like Datadog, New Relic, Dynatrace, or similar technology partners.
- Familiarity with OpenTelemetry, distributed tracing, log aggregation, and metrics collection pipelines.
- Deep customer empathy and the ability to translate technical monitoring needs into strategic product decisions.
- Experience launching observability solutions at scale in B2B enterprise environments.

About Us:
Cloud Software Group is one of the world’s largest cloud solution providers, serving more than 100 million users around the globe. When you join Cloud Software Group, you are making a difference for real people, each of whom counts on our suite of cloud-based products to get work done – from anywhere. Members of our team will tell you that we value passion for technology and the courage to take risks. Everyone is empowered to learn, dream, and build the future of work. We are on the brink of another Cambrian leap – a moment of immense evolution and growth. And we need your expertise and experience to do it. Now is the perfect time to move your skills to the cloud.

Cloud Software Group is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all federal, state and local laws that prohibit employment discrimination. All qualified applicants will receive consideration for employment without regard to age, race, color, creed, sex or gender, sexual orientation, gender identity, gender expression, ethnicity, national origin, ancestry, citizenship, religion, genetic carrier status, disability, pregnancy, childbirth or related medical conditions (including lactation status), marital status, military service, protected veteran status, political activity or affiliation, taking or requesting statutorily protected leave and other protected classifications. If you need a reasonable accommodation due to a disability during any part of the application process, please email us at AskHR@cloud.com for assistance.

Posted 2 weeks ago

Apply

6.0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site


Job Position: Azure DevOps Specialist
Location: Chennai / Kochi / Pune
Experience: 6+ Years
Notice Period: 0-15 Days (Immediate Joiners Only)

We are seeking a skilled and motivated Cloud DevOps Engineer to join our dynamic team. The ideal candidate will be responsible for the commissioning, development, and maintenance of cloud environments with a strong emphasis on automation, security, and modern DevOps practices. You will play a key role in designing scalable cloud architectures, managing CI/CD pipelines, and ensuring system reliability and performance.

Key Responsibilities:
- Design, implement, and maintain scalable and secure cloud environments on Microsoft Azure
- Manage and optimize CI/CD pipelines using GitHub Actions
- Ensure security and compliance in cloud infrastructure and services
- Develop and implement cloud strategies and concepts tailored to project requirements
- Automate deployment and infrastructure processes across the project landscape
- Monitor and maintain system health using observability tools such as Prometheus, Grafana, and Alertmanager
- Collaborate with cross-functional teams to design, build, and maintain cloud-native applications

Technical Requirements:
- Proven experience with Microsoft Azure services such as AKS, KeyVault, Storage Account, EventHub, Service Bus
- Strong hands-on expertise with Docker and Kubernetes (concepts and best practices)
- Familiarity with ArgoCD (GitOps) for continuous deployment
- Experience with GitHub Actions for CI/CD workflows
- Monitoring and observability tools: Prometheus, Grafana, Alertmanager, Thanos
- Logging and tracing tools: Grafana Loki, Grafana Alloy, Promtail, OpenTelemetry Collector
- Infrastructure as Code (IaC) tools: Terraform and Ansible
- Solid understanding of cloud architectures, deployment strategies, and when to apply them

Preferred Qualifications:
- Certification in Azure (e.g., AZ-104, AZ-303, AZ-400)
- Experience working in agile and DevOps-focused environments
- Strong problem-solving skills and the ability to troubleshoot complex systems
- Excellent communication and documentation abilities
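The Prometheus/Alertmanager stack named in the requirements scrapes metrics in a simple line-oriented text exposition format. A minimal, stdlib-only sketch of producing that format (the metric and label names here are illustrative, not from the posting):

```python
# Minimal sketch of Prometheus' text exposition format, the plain-text page
# a Prometheus server scrapes over HTTP. Metric/label names are illustrative.
def render_counter(name, help_text, samples):
    """samples: dict mapping a label string like 'method="get"' to a value."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        lines.append(f"{name}{{{labels}}} {value}")
    return "\n".join(lines)

page = render_counter(
    "http_requests_total",
    "Total HTTP requests served.",
    {'method="get",code="200"': 1027, 'method="post",code="500"': 3},
)
print(page)
```

In practice a client library (e.g. `prometheus_client`) renders this page for you and serves it on a `/metrics` endpoint; Grafana then queries Prometheus, and Alertmanager fires on rules evaluated over these series.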

Posted 2 weeks ago

Apply

0 years

0 Lacs

Gurgaon, Haryana, India

On-site


HyperTest seeks a Senior Java Developer with profound expertise in Java’s core mechanics and a passion for solving complex problems. This role is central to the development of our Java SDK, crucial for expanding HyperTest’s functionalities. Ideal candidates will have a strong background in Java, including experience with library patching, bytecode manipulation, and observability frameworks like OpenTelemetry, New Relic, Datadog, etc.

Responsibilities
- Develop the HyperTest Java SDK, employing advanced Java techniques for runtime library manipulation and data mocking.
- Extend OpenTelemetry for observability and monitoring in distributed systems, ensuring our SDK integrates seamlessly with modern development ecosystems.
- Create solutions for simulated testing environments that operate in various modes without modifying the original application code.
- Serve as a Java subject matter expert, guiding the team in best practices and innovative software development approaches.

Requirements
- Java expertise: extensive experience in Java, including familiarity with its internals, memory model, concurrency, and performance optimization – not just experience with high-level frameworks, but a solid understanding of underlying principles and the ability to manipulate Java’s core functionalities.
- Software architecture: strong grasp of software design patterns, architectural principles, and the ability to solve complex problems with efficient, scalable solutions.
- Analytical skills: exceptional problem-solving abilities, capable of addressing complex challenges and driving innovative solutions.
- Specialized knowledge: experience with bytecode manipulation, library patching (e.g., Byte Buddy), and a clear understanding of Java’s compilation and execution process.

Ideal Candidate Profile
- Not just another Java developer: we’re looking for someone who has moved beyond just building applications with Spring Boot or similar frameworks. You should have experience that demonstrates a deep understanding of Java, including direct manipulation of bytecode, custom library creation, and performance optimization.
- A true Java enthusiast: you find excitement in exploring Java beyond the surface level, delving into its internals, and leveraging this knowledge to build innovative solutions.

This job was posted by Karan Raina from HyperTest.
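The "runtime library manipulation and data mocking" described above means intercepting library calls without touching the application's source. HyperTest does this at the JVM level with bytecode tools like Byte Buddy; the same idea can be sketched far more simply in Python via attribute patching (the recorder and wrapper here are a hypothetical illustration, not HyperTest's implementation):

```python
import functools
import json

calls = []  # recorded invocations; a record/replay SDK would later mock from these

def record_and_patch(module, attr):
    """Replace module.attr with a wrapper that records arguments and results,
    without modifying any caller code - the core idea behind runtime library
    patching (illustrative sketch; real Java SDKs rewrite bytecode instead)."""
    original = getattr(module, attr)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        calls.append((attr, args, kwargs, result))
        return result

    setattr(module, attr, wrapper)

record_and_patch(json, "dumps")
json.dumps({"a": 1})          # callers are unaware they hit the wrapper
assert calls[0][0] == "dumps"
assert calls[0][3] == '{"a": 1}'
```

A replay mode would simply return the recorded result instead of calling `original`, which is how simulated test environments can run without the real dependency present.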

Posted 2 weeks ago

Apply

3.0 years

9 - 9 Lacs

Chennai

On-site

Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA focuses more than average on innovation, and we invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Group/Division
The KLA Services team, headquartered in Milpitas, CA, is our service organization, consisting of Service Sales and Marketing, Spares Supply Chain Management, Field Operations, Engineering, Product Training, and Technical Support. Known internally as the Global Service Support Organization (GSS), it partners with our field teams and customers in all business sectors to maintain the high performance and productivity of our products through a flexible portfolio of services. Our comprehensive services include: proactive management of tools to identify and improve performance; expertise in optics, image processing and motion control with worldwide service engineers, 24/7 technical support teams and knowledge management systems; and an extensive parts network to ensure worldwide availability of parts.

Job Description/Preferred Qualifications
The GSS organization’s Engineering group develops data systems for improved diagnostics and predictive maintenance. These data systems monitor the performance of fleets of KLA test and measurement equipment in semiconductor fabrication plant environments. They collect, transform, and store data, and provide visualizations, recommendations based on analytics, and alerts. This exciting position will enable you to interact with a wide variety of engineers and KLA system data.

Are you driven by curiosity and motivated to generate valuable, business-performance-enhancing insights from data generated by KLA tools? As a Software Engineer for HQ-SW & Analytics, you will work closely with cross-functional teams, data and algorithm engineers, divisional SMEs, and business stakeholders to develop new algorithms and tools and help define the product roadmap for our next-generation data analysis platform for KLA tools management. Your responsibilities will include collaborating with various stakeholders to collect data, pre-process it, and apply statistical analysis and models to understand complex patterns in the data and devise algorithms that generate actionable insights. In addition, you will be heavily involved in designing software applications in a distributed microservice environment, using containers and orchestration technologies like Kubernetes, to develop software components and UIs that help our engineers more effectively monitor, diagnose and optimize KLA semiconductor inspection and metrology equipment.

Basic Requirements:
- Strong computer science fundamentals (data structures and algorithms)
- Excellent technical expertise in C#, Java, or Python
- SQL skills for data extraction and manipulation, and experience in database scaling and optimization
- Ability to interpret data and identify patterns/trends, with attention to detail
- Proven problem solver with the ability to distill requirements and design solutions for business problems
- Motivated to learn new skills independently and experiment
- Excellent communication and storytelling skills

Desired Experience:
- Data engineering, data pipelines, batch/stream processing
- Machine learning frameworks (e.g., TensorFlow, PyTorch, Scikit-Learn)
- Developing cloud-native applications using Docker, Kubernetes, message buses
- Agile software development processes
- Monitoring, logging and tracing tools (e.g., Grafana, ELK stack, Prometheus, OpenTelemetry)

We offer a competitive, family-friendly total rewards package. We design our programs to reflect our commitment to an inclusive environment, while ensuring we provide benefits that meet the diverse needs of our employees. KLA is proud to be an equal opportunity employer.

Minimum Qualifications
Master’s-level degree and 3 years of related work experience; or Bachelor’s-level degree and 5 years of related work experience.

Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees, either directly or on behalf of KLA. Please ensure that you have searched KLA’s Careers website for legitimate job postings. KLA follows a recruiting process that involves multiple interviews, in person or by video conference, with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or an employee is not legitimate, please send an email to talent.acquisition@kla.com to confirm that the person you are communicating with is an employee. We take your privacy very seriously and handle your information confidentially.
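The "collect, transform, alert" pipeline this role describes can be reduced to a toy sketch: flag tool readings that drift beyond a threshold from a rolling mean of recent values. The window size, threshold, and data below are made up for illustration; real fleet-monitoring systems use far richer statistical models:

```python
from collections import deque

def drift_alerts(readings, window=5, threshold=2.0):
    """Yield (index, value) for readings deviating from the rolling mean of
    the previous `window` values by more than `threshold`. Window and
    threshold are illustrative, not from the posting."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mean = sum(history) / window
            if abs(value - mean) > threshold:
                yield i, value
        history.append(value)

# Hypothetical sensor trace: steady around 10.0 with one excursion at index 6.
data = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 15.0, 10.0]
alerts = list(drift_alerts(data))  # flags the 15.0 excursion
```

In a production system each alert would feed a dashboard (Grafana) or an alerting pipeline (Prometheus/Alertmanager) rather than a Python list, but the detect-against-recent-baseline shape is the same.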

Posted 2 weeks ago

Apply

5.0 years

0 Lacs

Pune, Maharashtra, India

Remote


The mission of Arkose Labs is to create an online environment where all consumers are protected from online spam and abuse. Recognized by G2 as the 2025 Leader in Bot Detection and Mitigation, with the highest score in customer satisfaction and largest market presence four quarters running, Arkose Labs offers the world’s first $1M warranties for credential stuffing and SMS toll fraud. With 20% of our customers being Fortune 500 companies, our AI-powered platform combines powerful risk assessments with dynamic threat response to undermine the strategy of attack, all while improving good user throughput. Headquartered in San Mateo, CA, with employees in London, Costa Rica, Australia, India, and Argentina, Arkose Labs protects enterprises from cybercrime and abuse.

We are looking for a Platform Engineer to grow our team in India. This team is responsible for managing the availability and reliability of Arkose Labs production systems. As a Platform Engineer you will be responsible for improving the observability, scalability, reliability and latency of Arkose’s platform, working closely with application development teams to achieve these goals.

What You Will Be Doing
- Improving our OpenTelemetry-based observability stack to include more signals that can be acted upon
- Improving our reliability by identifying and addressing gaps in our architecture, services and tooling that have the potential to cause outages
- Working on tooling, automation and circuit breakers that help Arkose adapt to changing attack patterns

Must Have
- 5+ years of experience as a Platform Engineer or back-end engineer
- Strong experience with at least one of: AWS, Azure
- Strong experience with high-volume and high-uptime services
- Experience with at least one of: Golang, Python
- Experience with Kubernetes and IaC frameworks like Terraform
- Experience with modern SaaS-based software stacks

Nice To Have
- Experience with ScyllaDB
- Experience with OpenTelemetry
- Experience with CDN providers

Why Arkose Labs?
At Arkose Labs, our technology-driven approach enables us to make a substantial impact in the industry, supported by a robust customer base consisting of global enterprise giants such as Microsoft, Roblox, OpenAI, and more. We’re not just a company; we’re a collaborative ecosystem where you will actively partner with these influential brands, tackling the most demanding technical challenges to safeguard hundreds of millions of users across the globe.

Why do top tech professionals choose Arkose Labs?
- Cutting-edge technology: our high-efficacy solutions, backed by solid warranties, attract leading global enterprise clients.
- Innovation and excellence: we foster a culture that emphasizes technological innovation and the pursuit of excellence, ensuring a balanced and thriving work environment.
- Experienced leadership: guided by seasoned executives with deep tech expertise and a history of successful growth and equity events.
- Ideal size: we’re structured to be agile and adaptable, large enough to provide stability, yet small enough to value your voice and ideas.

Join us in shaping the future of technology. At Arkose Labs, you’re not just an employee; you’re part of a visionary team driving global change. The most recognizable brands in the world select Arkose Labs, including OpenAI, Roblox, Microsoft, Adobe, Expedia, Snapchat, Zilch, and ZipAir. We value your unique contributions, perspectives, and experiences. Be part of a diverse and high-performing environment that prioritizes collaboration, excellence, and inclusion. We hire the best, focus on their professional development, and offer support for continuing education.

We value:
- People: first and foremost, they are our most valuable resource. Our people are independent thinkers who make data-driven decisions and take ownership and accountability in all the things they do.
- Teamwork: we demonstrate respect, trust, and integrity, communicate openly with a positive can-do attitude, and constructively challenge one another.
- Customer focus: we empathize with our customers and obsess about solving their problems.
- Execution: with precision, professionalism and urgency.
- Security: it’s the lens through which we implement our processes, procedures, and programs.

Benefits
- Competitive salary + equity
- Beautiful office space with many perks
- Robust benefits package
- Provident Fund
- Accident insurance
- Flexible working hours and work-from-home days to support personal well-being and mental health

Arkose Labs is an Equal Opportunity Employer that makes employment decisions without regard to race, color, religious creed, national origin, ancestry, sex, pregnancy, sexual orientation, gender, gender identity, gender expression, age, mental or physical disability, medical condition, military or veteran status, citizenship, marital status, genetic information, or any other characteristic protected by applicable law. In addition, Arkose Labs will provide reasonable accommodations for qualified individuals with disabilities.
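The "circuit breakers" mentioned in this role's responsibilities follow a well-known reliability pattern: stop calling a failing dependency after a run of consecutive errors, then allow a trial call after a cool-down. A minimal sketch of the pattern (class name, thresholds, and the injectable clock are illustrative choices, not Arkose internals):

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; allows a trial call
    after `reset_after` seconds. Thresholds are illustrative."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of hammering a struggling dependency.
                raise RuntimeError("circuit open: request rejected")
            self.opened_at = None   # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0           # success closes the circuit again
        return result
```

Under attack-driven load spikes, failing fast like this sheds traffic from an overwhelmed service and gives it time to recover, which is exactly why breakers pair well with the adaptive tooling the posting describes.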

Posted 2 weeks ago

Apply

12.0 years

0 Lacs

Hyderabad, Telangana, India

On-site


About Us: mavQ is an innovative AI company that provides intelligent business automation solutions, empowering organizations with AI-driven tools to streamline operations, enhance efficiency, and accelerate digital transformation. Headquartered in the U.S., with offices in India, mavQ simplifies complex workflows, automates document processing, and delivers actionable insights. Scalable and customizable, mavQ enables organizations to optimize processes, reduce manual effort, and achieve their business goals with ease. Role Overview: We are seeking an experienced and dynamic Technology Leader to lead our talented engineering team. The ideal candidate will have a strong background in building B2B SaaS applications using Java, Spring, and Angular or React for the frontend. This role requires extensive experience across the full stack engineering spectrum, a deep understanding of cloud platforms such as AWS and GCP, and proficiency in system and network architecture. Additionally, the candidate should have hands-on experience with cloud infrastructure for application deployment and an understanding of integrating and hosting machine learning models. Job Title: VP - Product Development Location: Hyderabad, India Key Responsibilities: Strategic Leadership: Develop and execute the engineering strategy that aligns with the company’s vision, goals, and business objectives. Collaborate with executive leadership to shape the product roadmap and ensure that engineering efforts are in sync with business priorities. Drive innovation within the engineering team, identifying emerging technologies and trends that can create competitive advantages. Customer Trust & Success: Champion customer-centric development practices, ensuring that all engineering efforts are focused on delivering value and building trust with customers. 
Collaborate with customer success, product, and sales teams to understand customer needs and feedback, and translate them into actionable engineering strategies. Ensure that engineering teams are equipped to deliver reliable, secure, and scalable products that instill confidence in our customers. Technical Leadership & Operations: Cloud & Infrastructure Management: Design and implement robust system and network architectures utilizing AWS and GCP to build scalable, reliable cloud solutions. Deploy and manage applications on Kubernetes, ensuring optimal performance and scalability. Handle traffic routing with Ingress Controllers (Nginx), oversee Certificate Management using Cert Manager, and manage secrets with Sealed Secrets and Vault. Enhance application performance with Caching solutions like Redis and Memcache, and implement comprehensive logging and tracing systems using Loki, Promtail, Tempo, and OpenTelemetry (Otel). Establish and maintain monitoring and alerting systems with Grafana, Prometheus, and BlackBoxExporter. Manage Infrastructure as Code using Terraform, oversee Manifest Management with Gitlab, and lead Release Management workflows using Gitlab and ArgoCD. Application & Data Management: Manage Authentication and Authorization services using Keycloak and implement Event Streaming solutions with Kafka and Pulsar. Oversee database management and optimization utilizing tools such as Pg Bouncer, Mulvis, OpenSearch, and ClickHouse. Implement and manage distributed and real-time systems with Temporal. Leverage advanced data processing tools like Trino, Apache Superset, Livy, and Hive to meet specialized data specific requirements. Machine Learning Integration: Collaborate with data scientists to integrate and host machine learning models within applications, implementing MLOps practices to streamline the deployment, monitoring, and management of ML models in production. 
• Utilize tools such as TensorFlow Extended (TFX), Kubeflow, MLflow, or SageMaker for comprehensive ML lifecycle management, ensuring robust model versioning, experimentation, and reproducibility, and optimizing ML pipelines for performance, scalability, and efficiency.

Project Management:
• Oversee project timelines, deliverables, and resource allocation.
• Coordinate with cross-functional teams to align on project goals and deliverables.
• Ensure timely and high-quality delivery of software products.

Qualifications:

Education & Experience:
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• Proven experience (12+ years) in software engineering, with a strong focus on B2B SaaS applications.
• At least 5 years of experience in a senior leadership role, preferably at the VP level.

Strategic & Technical Skills:
• Demonstrated ability to develop and execute engineering strategies that align with business goals.
• Expertise in full-stack development, cloud platforms (AWS, GCP), and Kubernetes.
• Strong experience with infrastructure management, MLOps, and integrating machine learning models.
• Ability to translate customer needs into technical requirements and ensure the delivery of high-quality products.

Leadership & Soft Skills:
• Visionary leadership with the ability to inspire and guide large engineering teams.
• Strong business acumen with the ability to align technical efforts with business objectives.
• Excellent communication and interpersonal skills, with a focus on building strong cross-functional relationships.
• Proven track record of fostering customer trust and delivering products that drive customer success.

Why Join Us:
• Leadership: Be a key player in shaping the future of our company and driving its success.
• Innovation: Lead the charge in adopting cutting-edge technologies and practices.
• Customer Impact: Play a pivotal role in ensuring our customers’ success and satisfaction.
• Growth: Opportunities for professional development and career advancement.
• Culture: A supportive and collaborative work environment where your contributions are valued.
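The Redis-style caching named in this listing's responsibilities reduces to a simple expiring key-value pattern. A minimal, in-process sketch for illustration only; `TTLCache` is a hypothetical stand-in for a real cache client, not part of any library named above:

```python
import time

class TTLCache:
    """Minimal in-process stand-in for a Redis/Memcached-style TTL cache."""
    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy eviction on read
            return default
        return value

cache = TTLCache()
cache.set("session:42", {"user": "alice"}, ttl_seconds=0.05)
assert cache.get("session:42") == {"user": "alice"}
time.sleep(0.06)
assert cache.get("session:42") is None  # expired, evicted on read
```

A real deployment would add size bounds and an eviction policy (LRU) on top of TTLs, which is exactly what Redis and Memcached provide out of process.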

Posted 2 weeks ago


6.0 years

0 Lacs

Pune, Maharashtra, India

Remote


Job Description

ABOUT THIS ROLE
As an SRE, your primary responsibility is to ensure the reliability, scalability, and availability of the systems that power Kibo’s products and services. You will work closely with cross-functional teams to build and maintain these systems, and you will be responsible for monitoring them to proactively identify and address production issues.

ABOUT KIBO
KIBO is a composable digital commerce platform for B2C, D2C, and B2B organizations who want to simplify the complexity in their businesses and deliver modern customer experiences. KIBO is the only modular, modern commerce platform that supports experiences spanning B2B and B2C Commerce, Order Management, and Subscriptions. Companies like Ace Hardware, Zwilling, Jelly Belly, Nivel, and Honey Birdette trust Kibo to bring simplicity and sophistication to commerce operations and deliver experiences that drive value. KIBO's cutting-edge solution is MACH Alliance Certified and has been recognized by Forrester, Gartner, IDC, Internet Retailer, and TrustRadius. KIBO has been named a leader in The Forrester Wave™: Order Management Systems, Q1 2025 and in the IDC MarketScape report “Worldwide Enterprise Headless Digital Commerce Applications 2024 Vendor Assessment”.

By joining KIBO, you will be part of a team of Kibonauts all over the world in a remote-friendly environment. Whether your job is to build, sell, or support KIBO’s commerce solutions, we tackle challenges together with the approach of trust, growth mindset, and customer obsession. If you’re seeking a unique challenge with amazing growth potential, then come work with us!

WHAT YOU’LL DO
• Design, implement, and maintain cloud infrastructure and tooling to support software development, deployment, and operations.
• Develop and enhance monitoring and alerting systems to proactively detect and resolve issues, ensuring system reliability.
• Automate deployments, configurations, and testing to streamline administration and minimize operational risks.
• Troubleshoot and resolve performance, availability, and security issues across distributed systems.
• Lead post-mortems and root cause analyses to drive continuous improvement and prevent recurring incidents.
• Ensure high availability and system reliability while participating in a 24x7x365 on-call rotation to address critical incidents.
• Collaborate with engineering teams to build scalable, resilient, and secure infrastructure that meets customer needs.

Requirements

WHAT YOU’LL NEED
• 6+ years of experience in an SRE, DevOps, or cloud engineering role.
• Strong fundamentals in Linux, networking, distributed systems, and cloud architecture.
• Experience with cloud platforms (AWS and/or GCP preferred; Azure is a plus).
• Proficiency with Kubernetes and related tools (Flux, Helm, Argo CD, Keel).
• Expertise in Infrastructure as Code (Terraform preferred) and configuration management (Ansible preferred).
• Experience with monitoring and observability tools such as Elasticsearch, Prometheus, Grafana, and OpenTelemetry.
• Scripting skills in Python, Bash, or Go (or similar languages).
• Deep understanding of security best practices and ability to implement them across cloud infrastructure.
• Experience operating in a SOC 2, PCI-DSS, and/or ISO 27001 compliant environment is a plus.
• Strong problem-solving mindset with a proactive approach to reliability engineering.
• Excellent communication and collaboration skills in a remote team environment.
• Willingness to participate in a 24x7 on-call rotation to ensure uptime and rapid incident response.
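The "proactively detect and resolve issues" monitoring work in this listing usually reduces to threshold checks over rolling windows, the same logic a Prometheus alerting rule encodes. A stdlib-only sketch; the window size and threshold are illustrative assumptions, not Kibo's actual values:

```python
from collections import deque

class ErrorRateAlert:
    """Fire when the error ratio over the last N requests exceeds a threshold."""
    def __init__(self, window=100, threshold=0.05):
        self.window = deque(maxlen=window)  # rolling record of request outcomes
        self.threshold = threshold

    def record(self, ok: bool):
        self.window.append(ok)

    def firing(self) -> bool:
        if not self.window:
            return False
        errors = sum(1 for ok in self.window if not ok)
        return errors / len(self.window) > self.threshold

alert = ErrorRateAlert(window=10, threshold=0.2)
for _ in range(8):
    alert.record(True)
alert.record(False)
assert not alert.firing()   # 1 error in 9 requests is under the 20% threshold
alert.record(False)
alert.record(False)
assert alert.firing()       # 3 errors in the last 10 requests exceeds it
```

Production systems express the same idea declaratively (e.g. a PromQL rate over a time window) rather than counting in process, but the decision logic is identical.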
KIBO PERKS
• Flexible schedule and hybrid work setting
• Paid company holidays and global volunteer holiday
• Generous health, wellness, benefits, and time away programs
• Commitment to individual growth and development and opportunity for internal mobility
• Passionate, high-achieving teammates excited to help you succeed and learn
• Company-sponsored events and other activities

At Kibo we celebrate and support all differences. Kibo is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital, disability, and veteran status.

Posted 2 weeks ago


5.0 years

0 Lacs

Gurugram, Haryana

On-site


Job Information
Date Opened: 05/30/2025
Job Type: Full time
Industry: Financial Services
Work Experience: 5+ years
City: Gurgaon
State/Province: Haryana
Country: India
Zip/Postal Code: 122002

About Us
indiagold has built a product & technology platform that enables regulated entities to launch or grow their asset-backed products across geographies, without investing in operations, technology, or people, or taking any valuation, storage, or transit risks. Our use of deep tech is changing how asset-backed loans have traditionally been done. Some examples of our innovation are lending against digital gold, a 100% paperless/digital loan onboarding process, computer vision to test gold purity as opposed to manual testing, auto-scheduling of feet-on-street, customer self-onboarding, a gold locker model to expand TAM & launch zero-touch gold loans, a zero-network business app & many more. We are a rapidly growing organisation, passionate about solving massive challenges around financial well-being, with empowered opportunities across Sales, Business Development, Partnerships, Sales Operations, Credit, Pricing, Customer Service, Business Product, Design, Product, Engineering, People & Finance across several cities. We value the right aptitude & attitude more than past experience in a related role, so feel free to reach out if you believe we can be good for each other.

Job Description

About the Role
We are seeking a Staff Software Engineer to lead and mentor engineering teams while driving the architecture and development of robust, scalable backend systems and cloud infrastructure. This is a senior hands-on role with a strong focus on technical leadership, system design, and cross-functional collaboration across development, DevOps, and platform teams.

Key Responsibilities
• Mentor engineering teams to uphold high coding standards and best practices in backend and full-stack development using Java, Spring Boot, Node.js, Python, and React.
• Guide architectural decisions to ensure performance, scalability, and reliability of systems.
• Architect and optimize relational data models and queries using MySQL.
• Define and evolve cloud infrastructure using Infrastructure as Code (Terraform) across AWS or GCP.
• Lead DevOps teams in building and managing CI/CD pipelines, Kubernetes clusters, and related cloud-native tooling.
• Drive best practices in observability using tools like Grafana, Prometheus, OpenTelemetry, and centralized logging frameworks (e.g., ELK, CloudWatch, Stackdriver).
• Provide architectural leadership for microservices-based systems deployed via Kubernetes, including tools like ArgoCD for GitOps-based deployment strategies.
• Design and implement event-driven systems that are reliable, scalable, and easy to maintain.
• Own security and compliance responsibilities in cloud-native environments, ensuring alignment with frameworks such as ISO 27001, CISA, and CICRA.
• Ensure robust design and troubleshooting of container and Kubernetes networking, including service discovery, ingress, and inter-service communication.
• Collaborate with product and platform teams to define long-term technical strategies and implementation plans.
• Perform code reviews, lead technical design discussions, and contribute to engineering-wide initiatives.

Requirements

Required Qualifications
• 7+ years of software engineering experience with a focus on backend development and system architecture.
• Deep expertise in Java and Spring Boot, with strong working knowledge of Node.js, Python, and React.js.
• Proficiency in MySQL and experience designing complex relational databases.
• Hands-on experience with Terraform and managing infrastructure across AWS or GCP.
• Strong understanding of containerization, Kubernetes, and CI/CD pipelines.
• Solid grasp of container and Kubernetes networking principles and troubleshooting techniques.
• Experience with GitOps tools such as ArgoCD and other Kubernetes ecosystem components.
• Deep knowledge of observability practices, including metrics, logging, and distributed tracing.
• Experience designing and implementing event-driven architectures using modern tooling (e.g., Kafka, Pub/Sub).
• Demonstrated experience in owning and implementing security and compliance measures, with practical exposure to standards like ISO 27001, CISA, and CICRA.
• Excellent communication skills and a proven ability to lead cross-functional technical efforts.

Preferred (Optional) Qualifications
• Contributions to open-source projects or technical blogs.
• Experience leading or supporting compliance audits such as ISO 27001, SOC 2, or similar.
• Exposure to service mesh technologies (e.g., Istio, Linkerd).
• Experience with policy enforcement in Kubernetes (e.g., OPA/Gatekeeper, Kyverno).

Benefits

Why Join Us?
• Lead impactful engineering initiatives and mentor talented developers.
• Work with a modern, cloud-native stack across AWS, GCP, Kubernetes, and Terraform.
• Contribute to architectural evolution and long-term technical strategy.
• Competitive compensation, benefits, and flexible work options.
• Inclusive and collaborative engineering culture.
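The distributed-tracing knowledge this listing asks for rests on one mechanism: propagating a trace ID through the call stack without threading it through every signature. A minimal stdlib sketch using `contextvars`; a real system would use the OpenTelemetry SDK, and all names here are illustrative:

```python
import asyncio
import contextvars
import uuid

# A ContextVar carries the trace id across awaits within one request's task,
# without passing it as an explicit argument everywhere.
trace_id = contextvars.ContextVar("trace_id", default=None)

async def db_call():
    # Deep in the call stack, the trace id is still visible for log correlation.
    return f"span(db_call, trace={trace_id.get()})"

async def handle_request():
    trace_id.set(str(uuid.uuid4()))  # new trace per request
    return await db_call()

async def main():
    # Each gathered coroutine runs in its own Task with its own context copy,
    # so concurrent requests do not clobber each other's trace ids.
    return await asyncio.gather(handle_request(), handle_request())

spans = asyncio.run(main())
assert all(s.startswith("span(db_call, trace=") for s in spans)
assert spans[0] != spans[1]  # each request carried its own trace id
```

This is the same isolation guarantee OpenTelemetry's context API provides; exporters and span processors are layered on top of it.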

Posted 2 weeks ago


0 years

0 Lacs

Hyderabad, Telangana, India

Remote


About the Company:
Transnational AI Private Limited is a next-generation AI-first company committed to building scalable, intelligent systems for the digital marketplace, insurance, employment, and healthcare sectors. We drive innovation through AI engineering, data science, and seamless platform integration powered by event-driven architectures.

Role Summary:
We are looking for a highly motivated AI Engineer with strong experience in Python, FastAPI, and event-driven microservice architecture. You will be instrumental in building intelligent, real-time systems that power scalable AI workflows across our platforms. This role combines deep technical engineering skills with a product-oriented mindset.

Key Responsibilities:
• Architect and develop AI microservices using Python and FastAPI within an event-driven ecosystem.
• Implement and maintain asynchronous communication between services using message brokers like Kafka, RabbitMQ, or NATS.
• Convert AI/ML models into production-grade, containerized services integrated with streaming and event-processing pipelines.
• Design and document async REST APIs and event-based endpoints with comprehensive OpenAPI/Swagger documentation.
• Collaborate with AI researchers, product managers, and DevOps engineers to deploy scalable and secure services.
• Develop reusable libraries, automation scripts, and shared components for AI/ML pipelines.
• Maintain high standards for code quality, testability, and observability using unit tests, logging, and monitoring tools.
• Work within Agile teams to ship features iteratively with a focus on scalability, resilience, and fault tolerance.

Required Skills and Experience:
• Proficiency in Python 3.x with a solid understanding of asynchronous programming (async/await).
• Hands-on experience with FastAPI; knowledge of Flask or Django is a plus.
• Experience building and integrating event-driven systems using Kafka, RabbitMQ, Redis Streams, or similar technologies.
• Strong knowledge of event-driven microservices, pub/sub models, and real-time data streaming architectures.
• Exposure to deploying AI/ML models using PyTorch, TensorFlow, or scikit-learn.
• Familiarity with containerization (Docker), orchestration (Kubernetes), and cloud platforms (AWS, GCP, Azure).
• Experience with unit testing frameworks such as PyTest, and observability tools like Prometheus, Grafana, or OpenTelemetry.
• Understanding of security principles including JWT, OAuth2, and API security best practices.

Nice to Have:
• Experience with MLOps pipelines and tools like MLflow, DVC, or Kubeflow.
• Familiarity with Protobuf, gRPC, and async I/O with WebSockets.
• Prior work in real-time analytics, recommendation systems, or workflow orchestration (e.g., Prefect, Airflow).
• Contributions to open-source projects or an active GitHub/portfolio.

Educational Background:
Bachelor’s or Master’s degree in Computer Science, Software Engineering, Artificial Intelligence, or a related technical discipline.

Why Join Transnational AI:
• Build production-grade AI infrastructure powering real-world applications.
• Collaborate with domain experts and top engineers across marketplace, insurance, and workforce platforms.
• Flexible, remote-friendly environment with a focus on innovation and ownership.
• Competitive compensation, bonuses, and continuous learning support.
• Work on high-impact projects that influence how people discover jobs, get insured, and access personalized digital services.
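The event-driven microservice pattern this role centers on, async producers and consumers decoupled by a broker, can be sketched with a stdlib `asyncio.Queue` standing in for a Kafka topic or RabbitMQ queue. Event names and handler logic here are illustrative assumptions:

```python
import asyncio

async def producer(queue: asyncio.Queue, events):
    for event in events:
        await queue.put(event)   # "publish" to the topic
    await queue.put(None)        # sentinel: stream is done

async def consumer(queue: asyncio.Queue, handled: list):
    while True:
        event = await queue.get()
        if event is None:
            break
        # In a real service this step would invoke an ML model or business logic.
        handled.append({"type": event["type"], "status": "processed"})

async def main():
    queue = asyncio.Queue()      # in-process stand-in for Kafka/RabbitMQ/NATS
    handled = []
    await asyncio.gather(
        producer(queue, [{"type": "user.created"}, {"type": "doc.uploaded"}]),
        consumer(queue, handled),
    )
    return handled

handled = asyncio.run(main())
assert [h["type"] for h in handled] == ["user.created", "doc.uploaded"]
```

A real broker adds durability, partitioning, and consumer groups on top of this shape; the producer/consumer decoupling and backpressure (a bounded queue blocks the producer) are the same.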

Posted 2 weeks ago


13.0 years

0 Lacs

Pune, Maharashtra, India

On-site


The SRE Observability Lead Engineer is a hands-on leader responsible for shaping and delivering the future of Observability across Services Technology. This role reports into the Head of SRE Services and sits within a small central enablement team. You will define the long-term vision, build and scale modern observability capabilities across business lines, and lead a small team of SREs delivering reusable observability services.

This is a blended leadership and engineering role: the ideal candidate pairs strategic vision with the technical depth to resolve real-world telemetry challenges across on-prem, cloud, and container-based environments (ECS, Kubernetes, etc.). You’ll work closely with architecture and other engineering functions not only to resolve common challenges affecting SREs aligned to LoBs, but also to ensure observability is embedded as a non-functional requirement (NFR) for all new services going live. You will collaborate with platform and infrastructure teams to ensure enterprise-scale, not siloed, solutions. You will also be responsible for managing a small, high-impact team of SREs based in your region.

This role requires a comprehensive understanding of observability challenges across Services (Payments, Securities Services, Trade, Digital & Data) and the ability to influence outcomes at the enterprise level. Strong commercial awareness, technical credibility, and excellent communication skills are essential to negotiate internally, influence peers, and drive change. Some external communication may be necessary.

Responsibilities:
• Define and own the strategic vision and multi-year roadmap for Observability across Services Technology, aligned with enterprise reliability and production goals.
• Translate strategy into an actionable delivery plan in partnership with the Services Architecture & Engineering function, delivering incremental, high-value milestones toward a unified, scalable observability architecture.
• Lead and mentor SREs across Services, fostering a technical growth and SRE mindset.
• Build and offer a suite of central observability services across LoBs, including standardized telemetry libraries, onboarding templates, dashboard packs, and alerting standards.
• Drive reusability and efficiency by creating common patterns and golden paths for observability adoption across critical client flows and platforms.
• Partner with infrastructure, CTO, and other SMBF tooling teams to ensure observability tooling is scalable, resilient, and avoids duplication (“cottage industries”).
• Work hands-on to troubleshoot telemetry and instrumentation issues across on-prem, cloud (AWS, GCP, etc.), and ECS/Kubernetes-based environments.
• Collaborate closely with the architecture function to support implementation of observability NFRs in the SDLC, ensuring new apps go live with sufficient coverage and insight.
• Support SRE Communities of Practice (CoP) and foster strong relationships with SREs, developers, and platform leads across Services and beyond to accelerate adoption and promote SRE best practices like SLO adoption and capacity planning.
• Use Jira/Agile workflows to track and report on observability maturity across Services LoBs: coverage, adoption, and contribution to improved client experience.
• Remove inefficiencies and provide solutions to enable unified views of consolidated SLOs for critical end-to-end client journeys for Payments and other Services critical user journeys.
• Influence and align senior stakeholders across functions (applications, infrastructure, controls, and audit) to drive observability investment for critical client flows across Services.
• Represent Services in working groups to influence enterprise observability standards, ensuring feedback from Services is reflected.
• Lead people management responsibilities for your direct team, including management of headcount, goal setting, performance evaluation, compensation, and hiring.
• Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behaviour, conduct and business practices, and escalating, managing and reporting control issues with transparency, as well as effectively supervising the activity of others and creating accountability with those who fail to maintain these standards.

Qualifications:
• 13+ years of experience in Observability, SRE, Infrastructure Engineering, or Platform Architecture, including several years in senior leadership roles.
• Deep expertise in observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms.
• Strong hands-on experience across hybrid infrastructure, including on-prem, cloud (AWS, GCP, Azure), and container platforms (ECS, Kubernetes).
• Proven ability to design scalable telemetry and instrumentation strategies, resolve production observability gaps, and integrate them into large-scale systems.
• Experience leading teams and managing people across geographically distributed locations.
• Strong ability to influence platform, cloud, and engineering leaders to ensure observability tooling is built for reuse and scale.
• Deep understanding of SRE fundamentals, including SLIs, SLOs, error budgets, and telemetry-driven operations.
• Strong collaboration skills and experience working across federated teams, building consensus and delivering change.
• Ability to stay up to date with industry trends and apply them to improve internal tooling and design decisions.
• Excellent written and verbal communication skills; able to influence and articulate complex concepts to technical and non-technical audiences.

Education:
Bachelor’s or Master’s degree in Computer Science, Engineering, Information Systems, or a related technical field.
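The SLO and error-budget fundamentals named in the qualifications come down to simple arithmetic. A generic illustration (the 99.9% target and 30-day window are examples, not any Citi standard):

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime for a given availability SLO over a window."""
    return (1 - slo) * days * 24 * 60

def budget_consumed(downtime_minutes: float, slo: float, days: int = 30) -> float:
    """Fraction of the error budget spent so far."""
    return downtime_minutes / error_budget_minutes(slo, days)

# A 99.9% availability SLO over 30 days allows ~43.2 minutes of downtime.
assert round(error_budget_minutes(0.999), 1) == 43.2
# A single 20-minute outage burns ~46% of that month's budget.
assert round(budget_consumed(20, 0.999), 2) == 0.46
```

This is why teams gate risky changes on remaining budget: once consumption approaches 100%, reliability work takes priority over feature releases until the window rolls over.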
Job Family Group: Technology
Job Family: Applications Support
Time Type: Full time

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi’s EEO Policy Statement and the Know Your Rights poster.

Posted 2 weeks ago


10.0 years

0 Lacs

Pune, Maharashtra, India

On-site


The SRE Observability Specialist is a hands-on expert delivering the future of Observability across Services Technology. This role is part of a central SRE enablement team within Services Production, working closely with SREs, developers, and platform teams to embed telemetry, implement SLOs, and build meaningful visualizations for key production flows, particularly in the critical Payments business. The ideal candidate will have deep technical knowledge, a collaborative mindset, and the ability to translate strategy into scalable engineering outcomes. You will also act as a bridge between Services Technology teams and central infrastructure/CTO teams, prioritising observability needs from line-of-business teams and driving improvements. A strong understanding of observability tooling, evolving AI/ML capabilities, and enterprise tooling ecosystems will be essential.

Key Responsibilities:
• Deliver against the observability roadmap for Services Technology by building scalable, reusable telemetry solutions.
• Create and maintain dashboards and visualizations for critical client journeys, including real-time flows across Payments.
• Guide line-of-business teams in implementing SLIs/SLOs, golden signals, and effective alerting to support operational excellence.
• Support integration and adoption of observability tooling across on-prem, public cloud (AWS/GCP), and containerized environments (ECS, Kubernetes).
• Customize shared dashboards and observability components in partnership with CTI and other central engineering functions, ensuring usability and flexibility.
• Provide technical support and implementation guidance to SREs and developers facing integration or tooling challenges.
• Effectively manage the observability book of work for Services Technology and drive initiatives to reduce MTTD and improve recovery outcomes.
• Serve as a key connection point between line-of-business SREs and central infrastructure functions by gathering tooling feedback, surfacing systemic issues, and influencing platform enhancements via the Services Observability Forum.
• Stay current with observability trends, including AI/ML-driven insights, anomaly detection, and emerging OSS practices, and assess their applicability.
• Maintain strong knowledge of observability platform features and vendor offerings to advise teams and maximize the value of tooling investments.

Qualifications:
• 10+ years of experience in SRE, Observability Engineering, or platform infrastructure roles focused on operational telemetry.
• Hands-on experience with observability tools and stacks such as Grafana, Prometheus, OpenTelemetry, ELK, Splunk, and similar platforms.
• Deep understanding of SLIs, SLOs, error budgets, and telemetry best practices in high-availability environments.
• Proven ability to troubleshoot integration issues and support observability across hybrid platforms (on-prem, cloud, containers).
• Experience building dashboards aligned to business outcomes and incident workflows, especially in critical flows like payments.
• Familiarity with modern observability tooling ecosystems, including AI/ML capabilities, trace correlation, baselining, and alert tuning.
• Strong interpersonal and collaboration skills; able to operate across federated engineering teams and central infrastructure groups.
• Experience in enablement or platform teams with a track record of scaling best practices across diverse business units.

Education:
Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
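Guiding teams on "SLIs/SLOs and effective alerting", as this role requires, starts from two quantities: a ratio-style SLI and the rate at which it burns the error budget. A hedged stdlib sketch; function names and the 99.9% objective are illustrative, not a Citi standard:

```python
def availability_sli(good: int, total: int) -> float:
    """Ratio-style SLI: fraction of requests that met the objective."""
    return good / total if total else 1.0

def burn_rate(sli: float, slo: float) -> float:
    """How fast the error budget is burning: 1.0 means exactly on budget."""
    allowed = 1 - slo            # error budget as a fraction of requests
    actual = 1 - sli             # observed error fraction
    return actual / allowed if allowed else float("inf")

# 9,990 good out of 10,000 requests against a 99.9% SLO:
sli = availability_sli(9990, 10000)
assert sli == 0.999
assert burn_rate(sli, 0.999) == 1.0   # burning exactly at budget
```

Multi-window burn-rate alerts (e.g. page when both the 1-hour and 5-minute rates exceed a high multiple) are the standard refinement: they catch fast outages quickly while staying quiet on slow, tolerable burn.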
Job Family Group: Technology
Job Family: Applications Support
Time Type: Full time

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. View Citi’s EEO Policy Statement and the Know Your Rights poster.

Posted 2 weeks ago


2.0 years

0 Lacs

India

Remote


This isn't your typical DevOps role. This is your chance to engineer the backbone of a next-gen AI-powered SaaS platform, where modular agents drive dynamic UI experiences, all running on a serverless AWS infrastructure with a Salesforce and SaaS-native backend. We're not building features; we're building an intelligent agentic ecosystem. If you've led complex multi-cloud builds, automated CI/CD pipelines with Terraform, and debugged AI systems in production, this is your arena.

About Us
We're a forward-thinking organization on a mission to reshape how businesses leverage cloud technologies and AI. Our approach is centered around delivering high-impact solutions that unify platforms across AWS, enterprise SaaS, and Salesforce. We don't just deliver software; we craft robust product ecosystems that redefine user interactions, streamline processes, and accelerate growth for our clients.

The Role
We are seeking a hands-on Agentic AI Ops Engineer who thrives at the intersection of cloud infrastructure, AI agent systems, and DevOps automation. In this role, you will build and maintain the CI/CD infrastructure for Agentic AI solutions using Terraform on AWS, while also developing, deploying, and debugging intelligent agents and their associated tools. This position is critical to ensuring scalable, traceable, and cost-effective delivery of agentic systems in production environments.

The Responsibilities

CI/CD Infrastructure for Agentic AI
• Design, implement, and maintain CI/CD pipelines for Agentic AI applications using Terraform, AWS CodePipeline, CodeBuild, and related tools.
• Automate deployment of multi-agent systems and associated tooling, ensuring version control, rollback strategies, and consistent environment parity across dev/test/prod.

Agent Development & Debugging
• Collaborate with ML/NLP engineers to develop and deploy modular, tool-integrated AI agents in production.
• Lead the effort to create debuggable agent architectures, with structured logging, standardized agent behaviors, and feedback integration loops.
• Build agent lifecycle management tools that support quick iteration, rollback, and debugging of faulty behaviors.

Monitoring, Tracing & Reliability
• Implement end-to-end observability for agents and tools, including runtime performance metrics, tool invocation traces, and latency/accuracy tracking.
• Design dashboards and alerting mechanisms to capture agent failures, degraded performance, and tool bottlenecks in real time.
• Build lightweight tracing systems that help visualize agent workflows and simplify root cause analysis.

Cost Optimization & Usage Analysis
• Monitor and manage cost metrics associated with agentic operations, including API call usage, toolchain overhead, and model inference costs.
• Set up proactive alerts for usage anomalies, implement cost dashboards, and propose strategies for reducing operational expenses without compromising performance.

Collaboration & Continuous Improvement
• Work closely with product, backend, and AI teams to evolve the agentic infrastructure design and tool orchestration workflows.
• Drive the adoption of best practices for Agentic AI DevOps, including retraining automation, secure deployments, and compliance in cloud-hosted environments.
• Participate in design reviews, postmortems, and architectural roadmap planning to continuously improve reliability and scalability.

Requirements
• 2+ years of experience in DevOps, MLOps, or Cloud Infrastructure with exposure to AI/ML systems.
• Deep expertise in AWS serverless architecture, including hands-on experience with:
  • AWS Lambda - function design, performance tuning, cold-start optimization.
  • Amazon API Gateway - managing REST/HTTP APIs and integrating with Lambda securely.
  • Step Functions - orchestrating agentic workflows and managing execution states.
  • S3, DynamoDB, EventBridge, SQS - event-driven and storage patterns for scalable AI systems.
• Strong proficiency in Terraform to build and manage serverless AWS environments using reusable, modular templates.
• Experience deploying and managing CI/CD pipelines for serverless and agent-based applications using AWS CodePipeline, CodeBuild, CodeDeploy, or GitHub Actions.
• Hands-on experience with agent and tool development in Python, including debugging and performance tuning in production.
• Solid understanding of IAM roles and policies, VPC configuration, and least-privilege access control for securing AI systems.
• Deep understanding of monitoring, alerting, and distributed tracing systems (e.g., CloudWatch, Grafana, OpenTelemetry).
• Ability to manage environment parity across dev, staging, and production using automated infrastructure pipelines.
• Excellent debugging, documentation, and cross-team communication skills.

Benefits
• Health insurance, PTO, and leave time
• Ongoing paid professional training and certifications
• Fully remote work opportunity
• Strong onboarding & training program
• Work timings: 1 pm - 10 pm IST

Next Steps
We're looking for someone who already embodies the spirit of a boundary-breaking AI Technologist, someone who's ready to own ambitious projects and push the boundaries of what LLMs can do.
• Apply Now: Send us your resume and answer a few key questions about your experience and vision.
• Show Us Your Ingenuity: Be prepared to talk shop on your boldest AI solutions and how you overcame the toughest technical hurdles.
• Collaborate & Ideate: If selected, you'll workshop a real-world scenario with our team, so we can see firsthand how your mind works.

This is your chance to leave a mark on the future of AI, one LLM agent at a time. We're excited to hear from you!

Our Belief
We believe extraordinary things happen when technology and human creativity unite. By empowering teams with generative AI, we free them to focus on meaningful relationships, innovative solutions, and real impact.
It's more than just code—it's about sparking a revolution in how people interact with information, solve problems, and propel businesses forward. If this resonates with you—if you're driven, daring, and ready to build the next wave of AI innovation—then let's do this. Apply now and help us shape the future. About Expedite Commerce At Expedite Commerce, we believe that people achieve their best when technology enables them to build relationships and explore new ideas. So we build systems that free you up to focus on your customers and drive innovations. We have a great commerce platform that changes the way you do business! See more about us at expeditecommerce.com. You can also read about us on https://www.g2.com/products/expedite-commerce/reviews, and on Salesforce Appexchange/ExpediteCommerce. EEO Statement All qualified applicants to Expedite Commerce are considered for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran's status or any other protected characteristic.
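The tool-invocation tracing and latency/accuracy tracking responsibilities in the listing above can be illustrated with a few lines of framework-free Python. This is a minimal sketch: the `traced_tool` decorator, the `search` tool, and the in-memory trace log are all illustrative names, not part of any real agent stack.

```python
import functools
import time

# Minimal sketch of lightweight tool-invocation tracing for an agent:
# each call is recorded with its latency and outcome, so dashboards and
# root-cause analysis can be built on top. All names are illustrative.
TRACE_LOG: list[dict] = []

def traced_tool(name):
    """Decorator that records latency and success/failure per invocation."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                status = "ok"
                return result
            except Exception:
                status = "error"
                raise
            finally:
                TRACE_LOG.append({
                    "tool": name,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "status": status,
                })
        return inner
    return wrap

@traced_tool("search")
def search(query):
    return f"results for {query}"

search("observability")
print(TRACE_LOG[0]["tool"], TRACE_LOG[0]["status"])  # search ok
```

In a production agent stack, the same records would typically be emitted as OpenTelemetry spans rather than appended to a list.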

Posted 2 weeks ago


10.0 years

0 Lacs

Mohali district, India

On-site


𝗔𝗯𝗼𝘂𝘁 𝘁𝗵𝗲 𝗥𝗼𝗹𝗲: We are looking for a highly experienced and innovative Senior DevSecOps & Solution Architect to lead the design, implementation, and security of modern, scalable solutions across cloud platforms. The ideal candidate will bring a unique blend of DevSecOps practices, solution architecture, observability frameworks, and AI/ML expertise — with hands-on experience in data and workload migration from on-premises to cloud or cloud-to-cloud. You will play a pivotal role in transforming and securing our enterprise-grade infrastructure, automating deployments, designing intelligent systems, and implementing monitoring strategies for mission-critical applications. 𝗗𝗲𝘃𝗦𝗲𝗰𝗢𝗽𝘀 𝗟𝗲𝗮𝗱𝗲𝗿𝘀𝗵𝗶𝗽: • Own CI/CD strategy, automation pipelines, IaC (Terraform, Ansible), and container orchestration (Docker, Kubernetes, Helm). • Champion DevSecOps best practices – embedding security into every stage of the SDLC. • Manage secrets, credentials, and secure service-to-service communication using Vault, AWS Secrets Manager, or Azure Key Vault. • Conduct infrastructure hardening, automated compliance checks (CIS, SOC 2, ISO 27001), and vulnerability management. Solution Architecture: • Architect scalable, fault-tolerant, cloud-native solutions (AWS, Azure, or GCP). • Design end-to-end data flows, microservices, and serverless components. • Lead migration strategies for on-premises to cloud or cloud-to-cloud transitions, ensuring minimal downtime and security continuity. • Create technical architecture documents, solution blueprints, BOMs, and migration playbooks. Observability & Monitoring: • Implement modern observability stacks: OpenTelemetry, ELK, Prometheus/Grafana, DataDog, or New Relic. • Define golden signals (latency, errors, saturation, traffic) and enable APM, RUM, and log aggregation. • Design SLOs/SLIs and establish proactive alerting for high-availability environments.
𝗔𝗜/𝗠𝗟 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 & 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗶𝗼𝗻: • Integrate AI/ML into existing systems for intelligent automation, data insights, and anomaly detection. • Collaborate with data scientists to operationalize models using MLflow, SageMaker, Azure ML, or custom pipelines. • Work with LLMs and foundational models (OpenAI, Hugging Face, Bedrock) for POCs or production-ready features. Migration & Transformation: • Lead complex data migration projects across heterogeneous environments — legacy systems to cloud, or inter-cloud (e.g., AWS to Azure). • Ensure data integrity, encryption, schema mapping, and downtime minimization throughout migration efforts. • Use tools such as AWS DMS, Azure Data Factory, GCP Transfer Services, or custom scripts for lift-and-shift and re-architecture. 𝗥𝗲𝗾𝘂𝗶𝗿𝗲𝗱 𝗦𝗸𝗶𝗹𝗹𝘀 & 𝗤𝘂𝗮𝗹𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻𝘀: • 10+ years in DevOps, cloud architecture, or platform engineering roles. • Expert in AWS and/or Azure – including IAM, VPC, EC2, Lambda/Functions, S3/Blob, API Gateway, and container services (EKS/AKS). • Proficient in infrastructure as code: Terraform, CloudFormation, Ansible. • Hands-on with Kubernetes (k8s), Helm, GitOps workflows. • Strong programming/scripting skills in Python, Shell, or PowerShell. • Practical knowledge of AI/ML tools, libraries (TensorFlow, PyTorch, scikit-learn), and model lifecycle management. • Demonstrated success in large-scale migrations and hybrid architecture. • Solid understanding of application security, identity federation, and compliance. • Familiar with agile practices, project estimation, and stakeholder communication. 𝗡𝗶𝗰𝗲 𝘁𝗼 𝗛𝗮𝘃𝗲: • Certifications: AWS Solutions Architect, Azure Architect, Certified Kubernetes Admin, or similar. • Experience with Kafka, RabbitMQ, event-driven architecture. • Exposure to n8n, OpenFaaS, or AI agents.
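The "golden signals" bullet in the listing above (latency, errors, saturation, traffic) can be made concrete with a small sketch that computes three of the four from a window of request samples; saturation usually requires resource-level metrics, so it is omitted here. The sample data and field names are illustrative.

```python
import statistics

# Illustrative request samples: (latency in ms, HTTP status code).
samples = [(120, 200), (95, 200), (340, 500), (80, 200), (210, 200),
           (150, 200), (990, 504), (60, 200), (130, 200), (170, 200)]

def golden_signals(window):
    """Compute traffic, error rate, and latency quantiles for one window."""
    latencies = sorted(l for l, _ in window)
    errors = sum(1 for _, status in window if status >= 500)
    return {
        "traffic": len(window),                 # requests in the window
        "error_rate": errors / len(window),     # fraction of 5xx responses
        "p50_ms": statistics.quantiles(latencies, n=100)[49],
        "p95_ms": statistics.quantiles(latencies, n=100)[94],
    }

sig = golden_signals(samples)
print(sig["traffic"], round(sig["error_rate"], 2))  # 10 0.2
```

In practice these aggregates would be computed by the observability stack itself (Prometheus histograms, OpenTelemetry metrics); the sketch only shows the arithmetic behind the dashboards.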

Posted 2 weeks ago


6.0 years

0 Lacs

Gurgaon, Haryana, India

On-site


You Lead the Way. We’ve Got Your Back. With the right backing, people and businesses have the power to progress in incredible ways. When you join Team Amex, you become part of a global and diverse community of colleagues with an unwavering commitment to back our customers, communities and each other. Here, you’ll learn and grow as we help you create a career journey that’s unique and meaningful to you with benefits, programs, and flexibility that support you personally and professionally. At American Express, you’ll be recognized for your contributions, leadership, and impact—every colleague has the opportunity to share in the company’s success. Together, we’ll win as a team, striving to uphold our company values and powerful backing promise to provide the world’s best customer experience every day. And we’ll do it with the utmost integrity, and in an environment where everyone is seen, heard and feels like they belong. Join Team Amex and let's lead the way together. About Enterprise Architecture: Enterprise Architecture is an organization within the Chief Technology Office at American Express and it is a key enabler of the company’s technology strategy. The four pillars of Enterprise Architecture include: 1. Architecture as Code: this pillar owns and operates foundational technologies that are leveraged by engineering teams across the enterprise. 2. Architecture as Design: this pillar includes the solution and technical design for transformation programs and business critical projects which need architectural guidance and support. 3. Governance: this pillar is responsible for defining technical standards, and developing innovative tools that automate controls to ensure compliance. 4. Colleague Enablement: this pillar is focused on colleague development, recognition, training, and enterprise outreach. What you will be working on: We are looking for a Senior Engineer to join our Enterprise Architecture team.
In this role you will be designing and implementing highly scalable real-time systems following best practices and using cutting-edge technology. This role is best suited for experienced engineers with a broad skillset who are open, curious and willing to learn. Qualifications: What you will Bring: Bachelor's degree in computer science, computer engineering or a related field, or equivalent experience 6+ years of progressive experience demonstrating strong architecture, programming and engineering skills. Firm grasp of data structures, algorithms with fluency in programming languages like Java, Kotlin, Go Demonstrated ability to lead, partner, and collaborate cross functionally across many engineering organizations Experience in building real-time large scale, high volume, distributed data pipelines on top of data buses (Kafka). Hands on experience with large scale distributed NoSQL databases like Elasticsearch Knowledge and/or experience with containerized environments, Kubernetes, Docker. Experience in implementing and maintaining highly scalable microservices in REST, gRPC Appetite for trying new things and building rapid POCs Preferred Qualifications: Knowledge of Observability concepts like Tracing, Metrics, Monitoring, Logging Knowledge of Prometheus Experience with large scale installations of Elasticsearch Knowledge of OpenTelemetry / OpenTracing Knowledge of observability tools like Jaeger, Kibana, Grafana etc. Open-source community involvement Knowledge of contact center, assisted servicing domain We back our colleagues and their loved ones with benefits and programs that support their holistic well-being. That means we prioritize their physical, financial, and mental health through each stage of life.
Benefits include: Competitive base salaries Bonus incentives Support for financial-well-being and retirement Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location) Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need Generous paid parental leave policies (depending on your location) Free access to global on-site wellness centers staffed with nurses and doctors (depending on location) Free and confidential counseling support through our Healthy Minds program Career development and training opportunities American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law. Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.
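One concrete mechanism behind the OpenTelemetry/OpenTracing knowledge asked for in the listing above is W3C Trace Context propagation: a `traceparent` header carries the trace ID and parent span ID between microservices so distributed traces stitch together. A minimal sketch under that spec's header format (the helper names are illustrative, not an OpenTelemetry API):

```python
import secrets

# Sketch of W3C Trace Context propagation: a traceparent header has the
# form "00-<32-hex trace-id>-<16-hex span-id>-<flags>". Downstream calls
# keep the trace-id and mint a fresh span-id.
def new_traceparent() -> str:
    trace_id = secrets.token_hex(16)   # 32 hex chars
    span_id = secrets.token_hex(8)     # 16 hex chars
    return f"00-{trace_id}-{span_id}-01"

def child_traceparent(parent: str) -> str:
    """Keep the trace-id, mint a new span-id for the downstream call."""
    version, trace_id, _, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

root = new_traceparent()
child = child_traceparent(root)
assert root.split("-")[1] == child.split("-")[1]  # same trace across services
print(len(root.split("-")[1]))  # 32
```

Real services would let the OpenTelemetry SDK inject and extract this header automatically; the sketch only shows what is being propagated.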

Posted 2 weeks ago


0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site


Reporting to: Sr Manager, Availability Management. Office Location: Chennai, India Flexible Working: Hybrid (Part Office/Part Home) Cloud Site Reliability Engineer Responsibilities On-board internal customers to our 24x7 Applications Support and Enterprise Status Page services Be involved with creating an SRE culture globally by defining monitoring strategies and best practices at the organization. Monitor application performance and provide recommendations on increasing the observability of applications and platforms. Play an important role in the Continual Service Improvement process, identifying and driving improvement. Be instrumental in developing standards and guides to assist the business in maximizing their use of common tools. Participate in code peer reviews and enforce quality gates to ensure best practices are followed. Apply automation to tasks which would benefit from it; automating repetitive tasks and deploying monitors via code are core examples. Document knowledge gained from engagements in the form of runbooks and other information critical to incident response. Exploring and applying Artificial Intelligence to enhance operational processes/procedures Should-Haves - Skills & Experience Strong skills with modern monitoring tools and demonstrable knowledge of APM, RUM and/or synthetic testing. Experience working with observability tools such as Datadog, New Relic, Splunk, CloudWatch, Azure Monitor Experience with the OpenTelemetry (OTEL) Standard Working knowledge of at least one programming language, such as Python, JavaScript (NodeJS, etc), Golang or others. Strong experience with IaC tools, such as Terraform and CloudFormation. Experience with cloud environments, especially AWS and/or Azure. Good customer interaction skills, with the ability to understand customer needs and expectations.
Strength in conviction: able to encourage adoption across a wide audience but comfortable with mandating where necessary. Experience with code quality tools, such as SonarQube. Knowledge of code-linting tools for various programming languages. Experience with CI/CD tools such as Bamboo, Jenkins, Azure DevOps, GitHub Actions. ITIL experience with a basic understanding of incident management, problem management and change management. Nice-to-Haves - Skills & Experience Any cloud certification ITIL certifications Experience with ITSM tools Experience using On-Call Management Tooling No travel required
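The "deploying monitors via code" responsibility in the listing above usually means monitor definitions kept as data in version control, validated before deployment, and evaluated against incoming metrics. A small monitor-as-code sketch; the schema, metric names, and thresholds are illustrative, not any specific vendor's API:

```python
# Monitor-as-code sketch: definitions live as data (e.g. in version control),
# are validated before deployment, and evaluated against metric samples.
# Schema, metric names, and thresholds are illustrative only.
MONITORS = [
    {"name": "checkout-latency", "metric": "p95_latency_ms", "op": ">", "threshold": 800},
    {"name": "checkout-errors", "metric": "error_rate", "op": ">", "threshold": 0.01},
]

def validate(monitor):
    """Reject malformed definitions before they are 'deployed'."""
    required = {"name", "metric", "op", "threshold"}
    missing = required - monitor.keys()
    if missing:
        raise ValueError(f"monitor {monitor.get('name')}: missing {missing}")
    return monitor

def evaluate(monitor, sample):
    """Return True when the monitor should fire for this metric sample."""
    value = sample[monitor["metric"]]
    return value > monitor["threshold"] if monitor["op"] == ">" else value < monitor["threshold"]

sample = {"p95_latency_ms": 950, "error_rate": 0.002}
firing = [m["name"] for m in map(validate, MONITORS) if evaluate(m, sample)]
print(firing)  # ['checkout-latency']
```

With a tool like Datadog or Terraform, the `MONITORS` list would instead be rendered into the provider's own monitor resources; the pattern of data plus validation plus a pipeline is the same.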

Posted 2 weeks ago


0 years

0 Lacs

Pune/Pimpri-Chinchwad Area

On-site


Job Description We are seeking a highly skilled Senior Reliability Engineer with strong backend software engineering skills to join our team. As a Senior Reliability Engineer, you will be responsible for designing, implementing, and maintaining our cloud infrastructure, ensuring the smooth operation of our applications and services. In addition, you will contribute to the development of our backend software systems, working closely with our engineering team to design, develop, and deploy scalable and reliable software solutions. This role will report to the Senior Engineering Manager, Finance Engineering, in Pune, India. What you’ll do: Collaborate with your peers to envision, design, and develop solutions in your respective area with a bias toward reusability, toil reduction, and resiliency Surface opportunities across the broader organization for solving systemic issues Use a collaborative approach to make technical decisions that align with Procore’s architectural vision Partner with internal customers, peers, and leadership in planning, prioritization, and roadmap development Develop teammates by conducting code reviews, providing mentorship, pairing, and training opportunities Serve as a subject matter expert on tools, processes, and procedures and help guide others to create and maintain a healthy codebase Facilitate an “open source” mindset and culture both across teams internally and outside of Procore through active participation in and contributions to the greater community Design, develop, and deploy scalable and reliable backend software systems using languages such as Java, Python, or Go Work with engineering teams to design and implement microservices architecture Develop and maintain APIs using RESTful APIs, GraphQL, or gRPC Ensure high-quality code through code reviews, testing, and continuous integration Serve as a subject matter expert in a domain, including processes and software design that help guide others to create and maintain a healthy codebase
What we’re looking for: Container orchestration with Kubernetes (K8s), preferably EKS. ArgoCD Terraform or similar IaC o11y (OpenTelemetry ideal) Public cloud (AWS, GCP, Azure) Cloud automation tooling (e.g., CloudFormation, Terraform, Ansible) Kafka and Kafka connectors Linux Systems Ensure compliance with security and regulatory requirements, such as HIPAA, SOX, FedRAMP Experience with the following is preferred: Continuous Integration Tooling (e.g., Circle CI, Jenkins, Travis, etc.) Continuous Deployment Tooling (e.g., ArgoCD, Spinnaker) Service Mesh / Discovery Tooling (e.g., Consul, Envoy, Istio, Linkerd) Networking (WAF, Cloudflare) Event-driven architecture (Event Sourcing, CQRS) Flink or other streaming processing technologies RDBMS and NoSQL databases Experience working with and developing APIs through REST, gRPC, or GraphQL Professional experience in Java, GoLang, Python preferred Additional Information Perks & Benefits At Procore, we invest in our employees and provide a full range of benefits and perks to help you grow and thrive. From generous paid time off and healthcare coverage to career enrichment and development programs, learn more details about what we offer and how we empower you to be your best. About Us Procore Technologies is building the software that builds the world. We provide cloud-based construction management software that helps clients more efficiently build skyscrapers, hospitals, retail centers, airports, housing complexes, and more. At Procore, we have worked hard to create and maintain a culture where you can own your work and are encouraged and given resources to try new ideas. Check us out on Glassdoor to see what others are saying about working at Procore. We are an equal-opportunity employer and welcome builders of all backgrounds. We thrive in a diverse, dynamic, and inclusive environment.
We do not tolerate discrimination against candidates or employees on the basis of gender, sex, national origin, civil status, family status, sexual orientation, religion, age, disability, race, traveler community, status as a protected veteran or any other classification protected by law. If you'd like to stay in touch and be the first to hear about new roles at Procore, join our Talent Community. Alternative methods of applying for employment are available to individuals unable to submit an application through this site because of a disability. Contact our benefits team here to discuss reasonable accommodations.
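A recurring building block behind the event-driven items in the listing above (Kafka, Event Sourcing, CQRS) is the idempotent consumer: brokers typically deliver at-least-once, so handlers deduplicate on an event ID before applying effects. A framework-free sketch with illustrative event shapes:

```python
# Framework-free sketch of an idempotent event consumer for event-driven
# systems: events carry a unique ID, and redeliveries (common under Kafka's
# at-least-once semantics) are skipped. Event shapes are illustrative.
processed_ids: set[str] = set()
balance = 0

def handle(event: dict) -> bool:
    """Apply the event once; return False if it was a duplicate delivery."""
    global balance
    if event["id"] in processed_ids:
        return False          # duplicate: already applied
    processed_ids.add(event["id"])
    balance += event["amount"]
    return True

events = [
    {"id": "evt-1", "amount": 100},
    {"id": "evt-2", "amount": -30},
    {"id": "evt-1", "amount": 100},  # redelivered by the broker
]
applied = sum(handle(e) for e in events)
print(applied, balance)  # 2 70
```

In production the processed-ID set would live in a durable store (or be replaced by a transactional outbox), since an in-memory set does not survive a consumer restart.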

Posted 2 weeks ago


5.0 years

0 Lacs

Pune, Maharashtra, India

On-site


Job Requisition ID # 25WD86258 Position Overview Autodesk is looking for Cloud Infrastructure Engineers to join the Platform Infrastructure team of the Autodesk Data Platform (ADP). This team is at the heart of Autodesk’s efforts to radically improve how we create value for customers and make decisions through data. As a Cloud Infrastructure Engineer, you will help create a robust and scalable Big Data Platform for teams across the company to leverage. You will tackle hard problems to improve the platform’s reliability, resiliency, and scalability. Ideally, you will be a self-starter, detail-oriented, quality-driven, and excited about the prospects of having a big impact with data at Autodesk. Our tech stack includes Spark, Presto, Hive, Kubernetes, Airflow, Jenkins, Python, Spinnaker, Terraform, Snowflake, Datadog, and various AWS services. Responsibilities Build and scale data infrastructure that powers batch and real-time data processing of billions of records daily Automate cloud infrastructure, services, and observability Help drive observability into the health of our data infrastructure and understanding of system behaviour Develop scripts to manage Cloud Infrastructure using Python or other frameworks for Cloud-native development Drive initiatives to enable best practices across infrastructure, deployments, automation, and accessibility Develop and implement security best practices at the data, application, infrastructure, and network layers Interface with data engineers, data scientists, product managers, and all data stakeholders to understand their needs and promote best practices Minimum Qualifications 5-8 years of relevant industry experience in a large-scale infrastructure environment 3+ years of Automation/DevOps Developer experience Strong experience in AWS Cloud Automation (EMR, EKS, EC2, ECS, S3, IAM Policies, etc.) 
Strong overall programming skills, able to write modular, maintainable code, preferably in Python Preferred Qualifications Participate in all phases of the product lifecycle, including design, development, and deployment Automation of testing framework management and migration End-to-end monitoring and dashboard tools (Grafana, OpenTelemetry, Datadog) Experience in Big Data infrastructure such as Spark, Hive, Presto, etc Learn More About Autodesk Welcome to Autodesk! Amazing things are created every day with our software – from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made. We take great pride in our culture here at Autodesk – our Culture Code is at the core of everything we do. Our values and ways of working help our people thrive and realize their potential, which leads to even better outcomes for our customers. When you’re an Autodesker, you can be your whole, authentic self and do meaningful work that helps build a better future for all. Ready to shape the world and your future? Join us! Salary transparency Salary is one part of Autodesk’s competitive compensation package. Offers are based on the candidate’s experience and geographic location. In addition to base salaries, we also have a significant emphasis on discretionary annual cash bonuses, commissions for sales roles, stock or long-term incentive cash grants, and a comprehensive benefits package. Diversity & Belonging We take pride in cultivating a culture of belonging and an equitable workplace where everyone can thrive. Learn more here: https://www.autodesk.com/company/diversity-and-belonging Are you an existing contractor or consultant with Autodesk? Please search for open jobs and apply internally (not on this external site).
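The Python-based cloud automation the listing above describes often starts with small policy checks, for example verifying that resources carry required tags before they reach production. A minimal sketch: the tag policy and resource records are illustrative, and in practice the records would come from an AWS SDK call (e.g., EC2 DescribeInstances via boto3) rather than a literal list.

```python
# Sketch of a tag-compliance check, a common Python cloud-automation task.
# The required tags and resource records are illustrative; real records
# would be fetched via an AWS SDK call such as EC2 DescribeInstances.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}

def missing_tags(resource: dict) -> set:
    """Return the set of required tag keys absent from a resource."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))

resources = [
    {"id": "i-0abc", "tags": {"owner": "data-eng", "cost-center": "42", "environment": "prod"}},
    {"id": "i-0def", "tags": {"owner": "data-eng"}},
]
violations = {r["id"]: sorted(missing_tags(r)) for r in resources if missing_tags(r)}
print(violations)  # {'i-0def': ['cost-center', 'environment']}
```

The same check wired into a CI pipeline (or run on a schedule) is what "automated compliance" usually means in practice: flag or quarantine the non-compliant resources rather than fix them silently.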

Posted 2 weeks ago


4.0 - 6.0 years

3 - 5 Lacs

Mumbai, Kurla

Work from Office


Required: Expertise in AWS, including basic services like networking, data and workload management. AWS Networking: VPC, VPC Peering, Transit Gateway, Route Tables, Security Groups, etc. Data: RDS, DynamoDB, Elasticsearch. Workload: EC2, EKS, Lambda, etc. Required Skills: Experience in any one of the CI/CD tools (GitLab/GitHub/Jenkins), including runner setup, templating and configuration. Kubernetes or Ansible experience (EKS/AKS/GKE), with basics like pods, deployments, networking, and service mesh. Use of a package manager like Helm. Scripting experience (Python), automation in pipelines when required, system services. Infrastructure automation (Terraform/Pulumi/CloudFormation): writing modules, setting up pipelines and versioning the code. Optional: Experience in any programming language is not required but is appreciated. Good experience in Git, SVN or any other code management tool is required. DevSecOps tools (Qualys/SonarQube/Black Duck) for security scanning of artefacts, infrastructure and code. Observability tools (open source: Prometheus, Elasticsearch, OpenTelemetry; paid: Datadog, 24/7, etc.)

Posted 2 weeks ago


18.0 years

0 Lacs

Noida, Uttar Pradesh, India

On-site


Our Company Changing the world through digital experiences is what Adobe’s all about. We give everyone—from emerging artists to global brands—everything they need to design and deliver exceptional digital experiences! We’re passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen. We’re on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours! Opportunity Adobe is looking for a strategic and results-driven Director of Site Reliability Engineering (SRE) . This role provides a unique opportunity to drive innovation , work alongside senior leaders, and influence business-critical initiatives at scale. The ideal candidate is an experienced engineering leader who will guide a high-performing, globally distributed SRE team. You will be responsible for defining the technical strategy, reliability vision, and operational excellence roadmap, ensuring the availability and performance of Adobe’s multi-tenant, web-scale digital products. Role Summary As Director of Site Reliability Engineering, you will lead multiple SRE teams across Noida and Bangalore, managing multi-tiered leaders reporting to you. You will play a pivotal role in: Driving system reliability, scalability, and performance for Adobe’s solutions. Owning the technical direction, automation, monitoring, and infrastructure provisioning. Collaborating with engineering, product, and operations teams to drive innovation and reliability at scale. 
What You’ll Do Leadership & Strategy: Develop and execute the SRE roadmap to ensure high availability (99.99%+ uptime), scalability, and reliability of Adobe’s products. Operational Excellence: Define and implement best practices for observability, monitoring, and incident response, leveraging advanced AI/ML-powered analytics. Automation & Infrastructure: Drive automation initiatives for CI/CD, infrastructure provisioning, and self-healing capabilities to reduce toil and increase efficiency. Incident Response & Performance Optimization: Establish proactive incident management processes, conduct blameless postmortems, and continuously improve system resilience. Cloud & Big Data Technologies: Optimize Adobe’s cloud-native architectures (AWS, Azure, GCP) and integrate big data technologies such as Hadoop, Spark, Kafka, and Cassandra. Cross-functional Collaboration: Work closely with product management, marketing, customer success, and global consulting teams to align business goals with engineering efforts. Customer Engagement: Partner with enterprise clients on pre-sales and post-sales engagements, providing technical guidance and reliability best practices. Team Development & Mentorship: Build and mentor a world-class SRE team, fostering a culture of innovation, ownership, and operational excellence. What You Need To Succeed 18+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering, with at least 8 years in leadership roles. Proven track record of leading large-scale, high-impact engineering projects in a global enterprise. Experience managing multiple teams (4+ years as a second-level manager). Prior experience working with US-based leadership; previous work experience in the US is a plus. Strong expertise in distributed systems, microservices, cloud platforms (AWS/Azure/GCP), and container orchestration (Kubernetes, Docker, ECS). Hands-on experience with monitoring & observability tools (Datadog, Prometheus, ELK, OpenTelemetry).
Deep understanding of SLOs, SLIs, SLAs, and error budgets to drive service reliability. Excellent stakeholder management skills, with the ability to collaborate across engineering, business, and customer-facing teams. A strategic thinker with intellectual curiosity about products, market trends, and business growth. Strong communication, analytical, and problem-solving skills with the ability to influence C-suite executives. B.Tech / M.Tech in Computer Science from a premier institute. Adobe is proud to be an Equal Employment Opportunity employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other applicable characteristics protected by law. Learn more. Adobe aims to make Adobe.com accessible to any and all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email accommodations@adobe.com or call (408) 536-3015.
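The SLO and error-budget requirement above has simple arithmetic behind it: the budget is 1 minus the SLO, converted into allowable downtime per window. For the 99.99%+ uptime target this listing cites, that is only a few minutes per month; a quick sketch:

```python
# Error-budget arithmetic behind availability SLOs: the budget is 1 - SLO,
# converted here into allowed downtime minutes over a rolling window.
def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted per window at a given availability SLO."""
    return (1 - slo) * window_days * 24 * 60

for slo in (0.999, 0.9999):
    print(f"{slo:.2%} -> {allowed_downtime_minutes(slo):.1f} min / 30 days")
```

So moving from three nines to four nines shrinks the budget roughly tenfold, from about 43 minutes to about 4.3 minutes per 30-day window, which is why error budgets, burn-rate alerting, and blameless postmortems travel together.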

Posted 3 weeks ago


6.0 years

0 Lacs

Trivandrum, Kerala, India

On-site


Role Description Role Proficiency: Act creatively to develop applications and select appropriate technical options, optimizing application development, maintenance and performance by employing design patterns and reusing proven solutions; account for others' developmental activities. Outcomes Interpret the application/feature/component design to develop the same in accordance with specifications. Code, debug, test, document and communicate product/component/feature development stages. Validate results with user representatives; integrate and commission the overall solution. Select appropriate technical options for development, such as reusing, improving or reconfiguring existing components or creating own solutions. Optimise efficiency, cost and quality. Influence and improve customer satisfaction. Set FAST goals for self/team; provide feedback on FAST goals of team members. Measures Of Outcomes Adherence to engineering process and standards (coding standards) Adherence to project schedule/timelines Number of technical issues uncovered during the execution of the project Number of defects in the code Number of defects post delivery Number of non-compliance issues On-time completion of mandatory compliance trainings Code Outputs Expected: Code as per design Follow coding standards, templates and checklists Review code – for team and peers Documentation Create/review templates, checklists, guidelines and standards for design/process/development Create/review deliverable documents: design documentation, requirements, test cases/results Configure Define and govern configuration management plan Ensure compliance from the team Test Review and create unit test cases, scenarios and execution Review test plan created by testing team Provide clarifications to the testing team Domain Relevance Advise Software Developers on design and development of features and components with a deep understanding of the business problem being addressed for the client.
Learn more about the customer domain, identifying opportunities to provide valuable additions to customers Complete relevant domain certifications Manage Project Manage delivery of modules and/or manage user stories Manage Defects Perform defect RCA and mitigation Identify defect trends and take proactive measures to improve quality Estimate Create and provide input for effort estimation for projects Manage Knowledge Consume and contribute to project-related documents, SharePoint libraries and client universities Review the reusable documents created by the team Release Execute and monitor release process Design Contribute to creation of design (HLD, LLD, SAD)/architecture for Applications/Features/Business Components/Data Models Interface With Customer Clarify requirements and provide guidance to development team Present design options to customers Conduct product demos Manage Team Set FAST goals and provide feedback Understand aspirations of team members and provide guidance, opportunities, etc. Ensure team is engaged in project Certifications Take relevant domain/technology certification Skill Examples Explain and communicate the design/development to the customer Perform and evaluate test results against product specifications Break down complex problems into logical components Develop user interfaces, business software components Use data models Estimate time and effort required for developing/debugging features/components Perform and evaluate tests in the customer or target environment Make quick decisions on technical/project-related challenges Manage a team, mentor, and handle people-related issues in the team. Maintain high motivation levels and positive dynamics in the team. Interface with other teams, designers and other parallel practices Set goals for self and team.
Provide feedback to team members Create and articulate impactful technical presentations Follow high level of business etiquette in emails and other business communication Drive conference calls with customers, addressing customer questions Proactively ask for and offer help Ability to work under pressure, determine dependencies and risks, and facilitate planning; handle multiple tasks. Build confidence with customers by meeting the deliverables on time with quality. Estimate time, effort and resources required for developing/debugging features/components Make appropriate utilization of software/hardware. Strong analytical and problem-solving abilities Knowledge Examples Appropriate software programs/modules Functional and technical designing Programming languages – proficient in multiple skill clusters DBMS Operating Systems and software platforms Software Development Life Cycle Agile – Scrum or Kanban Methods Integrated development environment (IDE) Rapid application development (RAD) Modelling technology and languages Interface definition languages (IDL) Knowledge of customer domain and deep understanding of sub-domain where problem is solved Additional Comments Senior Java Backend Microservices Software Engineer Musts: Strong understanding of object-oriented and functional programming principles Experience with RESTful APIs Knowledge of microservices architecture and cloud platforms Familiarity with CI/CD pipelines, Docker, and Kubernetes Strong problem-solving skills and ability to work in an Agile environment Excellent communication and teamwork skills Nices: 6+ years of experience, with at least 3+ in Kotlin Experience with backend development using Kotlin (Ktor, Spring Boot, or Micronaut) Proficiency in working with databases such as PostgreSQL, MySQL, or MongoDB Experience with GraphQL and WebSockets Additional Musts: Experience with backend development in the Java ecosystem (either Java or Kotlin will do) Additional Nices: Experience with TypeScript and NodeJS Experience with Kafka Experience with frontend development (e.g. React) Experience with Gradle Experience with GitLab CI Experience with OpenTelemetry Skills: RESTful APIs, Java, Microservices, AWS

Posted 3 weeks ago

Apply

5.0 years

0 Lacs

India

On-site


Experience: 5+ years in high-volume ESP integrations & deliverability optimization
Tech Stack: Laravel 10/11 • Node.js 18+ (Bun) • SendGrid (Mail & Marketing APIs) • Redis • MySQL 8 • Docker • GitHub Actions • OSS queues (Bee-Queue, BullMQ-OSS, Taskless, etc.)

About the Role

We are developing a Bulk Email Marketing module that must land emails in the inbox—not the spam folder. You will design and operate queue-driven batch sends and—most critically—engineer deliverability safeguards to keep spam rates below 0.1% across millions of sends. You'll also migrate our current Node/BullMQ service to an open-source queue and integrate everything seamlessly into our Laravel-based CRM. All front-end work is handled by a separate team; your focus is pure back-end infrastructure.

Key Responsibilities

Architect & build the bulk-send workflow: throttling, retries, parallel batch pipelines, and dedicated IP management.
Implement robust deliverability controls:
  • Automated SPF, DKIM, DMARC, BIMI & ARC checks on every sender domain.
  • List-hygiene pruning, bounce/complaint feedback loops, and reputation scoring.
  • Pre-send spam-filter diagnostics (SpamAssassin rules, seed-list placement tests).
Migrate our existing Node micro-service from BullMQ's paid batch feature to an OSS queue without regressions.
Expose clean REST APIs for the front-end team to consume (campaign creation, scheduling, analytics).
Handle bounce reports, unsubscribe management, and analytics integration.
Ensure proper authentication, template rendering, scheduling, and delivery tracking.
Ensure module security, scalability, and performance.
Write tests/docs, perform code reviews, and mentor teammates on email infrastructure best practices.

Required Skills & Experience

Expert-level SendGrid integration (Marketing & Transactional) with a proven record of raising inbox placement.
Proven knowledge of other ESP platforms (SES, Postmark, etc.).
Deep knowledge of deliverability levers: SPF, DKIM, DMARC, BIMI, IP warm-up, feedback loops, spam-trap avoidance, content quality scoring.
Production experience with Node.js/Bun workers and Redis-backed queues at 100k+ emails/hour (Bee-Queue, BullMQ-OSS, Taskless, or Redis streams).
Strong Laravel background (queues/Horizon, events, policies) to integrate micro-services with the core CRM.
Proficient with Docker-based deployments and CI/CD pipelines using GitHub Actions.
Ability to write clear documentation and conduct rigorous code reviews.

Nice to Have

Implemented seed-list/inbox-placement monitoring tools (GlockApps, Mail-Tester, Google Postmaster).
Experience migrating from paid BullMQ features to Bee-Queue, Taskless, or custom Redis streams.
Familiarity with other ESPs (AWS SES, Postmark) for future multi-ESP abstraction.
Observability with OpenTelemetry traces across micro-services.
Knowledge of Prometheus/Grafana dashboards.
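The throttling requirement above (Redis-backed queues pushing 100k+ emails/hour) ultimately reduces to a rate limiter in front of the send loop. The production stack here is Node/Bun, but the idea is language-agnostic; below is a minimal sliding-window sketch in Python with purely illustrative names, not part of any real queue library:

```python
import time
from collections import deque

class SendThrottle:
    """Sliding-window rate limiter: at most `limit` sends per `window` seconds.
    A sketch of the throttling step in a queue-driven batch pipeline."""

    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        self.sent = deque()  # timestamps of recent admitted sends

    def try_acquire(self, now=None):
        """Return True and record the send if under the limit, else False."""
        now = time.monotonic() if now is None else now
        # Evict timestamps that fell out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False

# 100k emails/hour is roughly 28 sends/second
throttle = SendThrottle(limit=28, window=1.0)
granted = sum(throttle.try_acquire(now=0.0) for _ in range(100))
print(granted)  # 28: the rest of the burst must wait for the window to roll
```

A worker would call `try_acquire()` before each SendGrid API call and re-queue (or sleep) on `False`; in a multi-worker deployment the same window bookkeeping would live in Redis rather than process memory.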

Posted 3 weeks ago

Apply

2.0 years

0 Lacs

India

Remote


At Rethem, we're revolutionizing the sales landscape by putting buyer outcomes at the forefront. We understand that customers buy outcomes, and our AI-driven platform empowers your sales reps to deliver those outcomes, helping them crush their quotas.

What Sets Us Apart

Deep AI Integration: Our platform leverages advanced AI that acts as a personal coach for your reps, adapting to your business processes to automate complex tasks and provide real-time guidance.
Outcome-Driven Approach: By focusing on delivering measurable outcomes, we enable your sales team to build trust and foster long-term customer relationships.
Market Leadership: Positioned at the cutting edge of buyer-centric sales transformation, we're leading the shift towards more meaningful and effective sales interactions.
Proven Expertise: Our leadership and team consist of industry veterans with a track record of driving substantial growth and innovation in sales.

Our Mission

To redefine the sales process by aligning it with buyer needs, leveraging AI to empower sales teams to deliver outcomes that drive mutual success.

Transform Your Sales Strategy with AI

Rethem turns your sales playbook into an intelligent, always-on guide that adapts in real time. By harnessing the power of AI, we provide your team with:
Real-Time Coaching: Enhance performance with actionable insights during every buyer interaction.
Enhanced Efficiency: Automate key processes so your reps can focus on building relationships and delivering value.
Outcome Alignment: Ensure your offerings are perfectly aligned with customer objectives, leading to higher satisfaction and loyalty.
Accelerate Growth: Drive higher win rates and larger deals through a buyer-focused approach.

Vision for the Future

We envision a future where AI and human expertise collaborate seamlessly to create unparalleled sales experiences. By continuously innovating, we aim to stay at the forefront of buyer-centric sales transformation.

Join the Sales Revolution

Emerging from stealth mode, Rethem invites a select group of visionary organizations to pilot our groundbreaking platform. If you're ready to elevate your sales team, deliver exceptional customer outcomes, and empower your reps to crush their quotas, visit our website to learn more and apply.

Be Part of Our Journey

We're assembling a team of innovators passionate about reshaping the sales industry. Explore career opportunities with Rethem and help shape the future of outcome-driven, AI-powered sales.

The Role

We are seeking a hands-on Agentic AI Ops Engineer who thrives at the intersection of cloud infrastructure, AI agent systems, and DevOps automation. In this role, you will build and maintain the CI/CD infrastructure for Agentic AI solutions using Terraform on AWS, while also developing, deploying, and debugging intelligent agents and their associated tools. This position is critical to ensuring scalable, traceable, and cost-effective delivery of agentic systems in production environments.

Responsibilities

CI/CD Infrastructure for Agentic AI
  • Design, implement, and maintain CI/CD pipelines for Agentic AI applications using Terraform, AWS CodePipeline, CodeBuild, and related tools.
  • Automate deployment of multi-agent systems and associated tooling, ensuring version control, rollback strategies, and consistent environment parity across dev/test/prod.

Agent Development & Debugging
  • Collaborate with ML/NLP engineers to develop and deploy modular, tool-integrated AI agents in production.
  • Lead the effort to create debuggable agent architectures, with structured logging, standardized agent behaviors, and feedback integration loops.
  • Build agent lifecycle management tools that support quick iteration, rollback, and debugging of faulty behaviors.

Monitoring, Tracing & Reliability
  • Implement end-to-end observability for agents and tools, including runtime performance metrics, tool invocation traces, and latency/accuracy tracking.
  • Design dashboards and alerting mechanisms to capture agent failures, degraded performance, and tool bottlenecks in real time.
  • Build lightweight tracing systems that help visualize agent workflows and simplify root-cause analysis.

Cost Optimization & Usage Analysis
  • Monitor and manage cost metrics associated with agentic operations, including API call usage, toolchain overhead, and model inference costs.
  • Set up proactive alerts for usage anomalies, implement cost dashboards, and propose strategies for reducing operational expenses without compromising performance.

Collaboration & Continuous Improvement
  • Work closely with product, backend, and AI teams to evolve the agentic infrastructure design and tool orchestration workflows.
  • Drive the adoption of best practices for Agentic AI DevOps, including retraining automation, secure deployments, and compliance in cloud-hosted environments.
  • Participate in design reviews, postmortems, and architectural roadmap planning to continuously improve reliability and scalability.

Requirements

2+ years of experience in DevOps, MLOps, or cloud infrastructure with exposure to AI/ML systems.
Deep expertise in AWS serverless architecture, including hands-on experience with:
  • AWS Lambda - function design, performance tuning, cold-start optimization.
  • Amazon API Gateway - managing REST/HTTP APIs and integrating with Lambda securely.
  • Step Functions - orchestrating agentic workflows and managing execution states.
  • S3, DynamoDB, EventBridge, SQS - event-driven and storage patterns for scalable AI systems.
Strong proficiency in Terraform to build and manage serverless AWS environments using reusable, modular templates.
Experience deploying and managing CI/CD pipelines for serverless and agent-based applications using AWS CodePipeline, CodeBuild, CodeDeploy, or GitHub Actions.
Hands-on experience with agent and tool development in Python, including debugging and performance tuning in production.
Solid understanding of IAM roles and policies, VPC configuration, and least-privilege access control for securing AI systems.
Deep understanding of monitoring, alerting, and distributed tracing systems (e.g., CloudWatch, Grafana, OpenTelemetry).
Ability to manage environment parity across dev, staging, and production using automated infrastructure pipelines.
Excellent debugging, documentation, and cross-team communication skills.

Benefits

Health insurance, PTO, and leave time
Ongoing paid professional training and certifications
Fully remote work opportunity
Strong onboarding & training programs

Are you ready to join the revolution? If you're ready to take on this exciting challenge and believe you meet our requirements, we encourage you to apply. Let's shape the future of AI-driven sales together! See more about us at https://www.rethem.ai/

EEO Statement

All qualified applicants to Expedite Commerce are considered for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran's status or any other protected characteristic.
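The tool-invocation tracing described under Monitoring, Tracing & Reliability can start as simply as a decorator around each agent tool. Below is a minimal in-memory sketch in Python; all names are illustrative, and a real system would export these records via OpenTelemetry or CloudWatch rather than keep them in a list:

```python
import functools
import time

# Illustrative in-memory sink; a production agent would export these records.
TRACE_LOG = []

def traced_tool(fn):
    """Record one trace entry per tool invocation: name, latency, outcome."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "error"  # assume failure until the call returns
        try:
            result = fn(*args, **kwargs)
            status = "ok"
            return result
        finally:
            TRACE_LOG.append({
                "tool": fn.__name__,
                "latency_s": time.perf_counter() - start,
                "status": status,
            })
    return wrapper

@traced_tool
def lookup_account(account_id):
    # Hypothetical agent tool standing in for a real API call.
    return {"id": account_id, "tier": "gold"}

lookup_account("acct-42")
print(TRACE_LOG[0]["tool"], TRACE_LOG[0]["status"])  # lookup_account ok
```

Because the `finally` block runs on both success and exception, failed tool calls still produce a trace entry, which is exactly what the dashboards and alerting described above would consume.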

Posted 3 weeks ago

Apply

4.0 years

0 Lacs

Bengaluru East, Karnataka, India

On-site


Overview

As a Software Engineer in the Artificial Intelligence group, you will contribute to developing and optimizing the backend infrastructure that supports AI-driven solutions. You will work closely with machine learning engineers and cross-functional teams to build scalable backend services, automate deployments, and improve system performance. Your role will focus on Python-based backend development, Kubernetes operations, and DevOps best practices to ensure reliable and efficient AI model deployments.

Responsibilities

Develop and maintain backend services and APIs that support AI models and intelligent assistants.
Improve scalability and performance of AI model serving and API interactions.
Ensure system reliability by implementing logging, monitoring, and alerting solutions.
Assist in deploying AI models using Kubernetes and Docker, ensuring smooth model integration into production.
Contribute to CI/CD pipelines for AI applications, automating model testing and deployments.
Work on data pipelines and optimize storage and retrieval for AI workloads.
Work on infrastructure automation using Terraform, CloudFormation, or other Infrastructure as Code (IaC) tools.
Support cloud-based deployments on AWS, GCP, or Azure, optimizing resource usage.
Work closely with AI/ML engineers to understand infrastructure requirements for AI solutions.
Participate in code reviews, architecture discussions, and knowledge-sharing sessions.
Continuously learn and improve skills in backend development, cloud technologies, and DevOps.

Requirements

4+ years of experience in backend development using Python (preferred) or Java.
Experience with RESTful API development, microservices, and cloud-based architectures.
Familiarity with Kubernetes, Docker, and containerised deployments.
Hands-on experience with CI/CD tools (e.g., Jenkins, GitHub Actions, ArgoCD).
Basic understanding of cloud platforms (AWS, GCP, or Azure) and their services.
Strong problem-solving skills and a willingness to learn new technologies.

Preferred Experience

Exposure to AI/ML pipelines, model serving, or data engineering workflows.
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry).

Splunk, a Cisco company, is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.

Posted 3 weeks ago

Apply

Exploring OpenTelemetry Jobs in India

The job market for OpenTelemetry professionals in India is growing rapidly, as companies adopt the standard to improve their observability and monitoring capabilities. If you are a job seeker interested in OpenTelemetry roles, there are plenty of opportunities across India.

Top Hiring Locations in India

  1. Bangalore
  2. Hyderabad
  3. Pune
  4. Chennai
  5. Mumbai

Average Salary Range

The average salary range for OpenTelemetry professionals in India varies by experience level:

  • Entry-level: INR 5-8 lakhs per annum
  • Mid-level: INR 10-15 lakhs per annum
  • Experienced: INR 18-25 lakhs per annum

Career Path

A typical career path in OpenTelemetry may progress as follows:

  1. Junior Developer
  2. Developer
  3. Senior Developer
  4. Tech Lead

Related Skills

In addition to proficiency in OpenTelemetry, employers often look for candidates with the following skills:

  • Proficiency in cloud platforms like AWS, GCP, or Azure
  • Knowledge of monitoring and observability tools
  • Strong programming skills in languages like Java, Python, or Go

Interview Questions

  • What is OpenTelemetry and how does it differ from other monitoring tools? (basic)
  • How would you set up OpenTelemetry in a microservices architecture? (medium)
  • Can you explain the benefits of distributed tracing in OpenTelemetry? (medium)
  • Describe how sampling works in OpenTelemetry. (medium)
  • How would you troubleshoot performance issues using OpenTelemetry data? (advanced)
  • Explain the role of exporters in OpenTelemetry. (basic)
  • What are the key components of an OpenTelemetry instrumentation library? (medium)
  • How does OpenTelemetry handle context propagation between services? (medium)
  • Can you explain the concept of spans and traces in OpenTelemetry? (basic)
  • How would you integrate OpenTelemetry with a logging framework? (medium)
  • Describe the process of creating custom metrics in OpenTelemetry. (advanced)
  • What are the common challenges faced when implementing OpenTelemetry in a large-scale system? (advanced)
  • How does OpenTelemetry handle data collection in a multi-tenant environment? (advanced)
  • What are the best practices for securing OpenTelemetry data transmissions? (advanced)
  • Can you explain the role of the OpenTelemetry Collector in data processing? (medium)
  • How would you monitor the performance of OpenTelemetry itself? (advanced)
  • Describe a scenario where OpenTelemetry helped improve the performance of a system. (advanced)
  • How does OpenTelemetry handle sampling in a distributed system? (medium)
  • What are the key differences between OpenTelemetry and other APM tools? (medium)
  • How can OpenTelemetry be integrated with containerized applications? (medium)
  • Explain the concept of baggage in OpenTelemetry context propagation. (medium)
  • How would you handle log correlation with OpenTelemetry traces? (advanced)
  • Can you share your experience with migrating from a different monitoring tool to OpenTelemetry? (advanced)
  • What are the key considerations for scaling OpenTelemetry in a growing infrastructure? (advanced)
  • How would you contribute to the OpenTelemetry open-source project? (advanced)
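Several of the questions above (spans, context propagation, sampling) come down to the W3C Trace Context format that OpenTelemetry uses by default: a `traceparent` header carries the trace id, the parent span id, and a sampled flag between services. Here is a stdlib-only sketch, not the actual SDK, that builds and parses such a header and makes a deterministic sampling decision from the trace id, in the spirit of the `TraceIdRatioBased` sampler:

```python
import re
import secrets

def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 8 bytes  -> 16 hex chars
    flags = "01" if sampled else "00"             # bit 0 = sampled
    return f"00-{trace_id}-{span_id}-{flags}"

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled) or None if malformed."""
    m = _TRACEPARENT.match(header)
    if not m:
        return None
    trace_id, span_id, flags = m.groups()
    return trace_id, span_id, bool(int(flags, 16) & 0x01)

def should_sample(trace_id, ratio):
    """Deterministic head sampling from the trace id alone, so every service
    reaches the same decision (illustrative, not the exact SDK algorithm)."""
    return int(trace_id[-16:], 16) < ratio * 2**64

# A downstream service reuses the trace id but starts a new span:
header = make_traceparent(trace_id="a" * 32, span_id="b" * 16)
trace_id, parent_span, sampled = parse_traceparent(header)
child_header = make_traceparent(trace_id=trace_id, sampled=sampled)
```

The key property interviewers tend to probe: the trace id stays constant across the whole request, only the span id changes per hop, and deciding sampling from the trace id keeps the decision consistent across services without coordination.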

Conclusion

As you prepare for OpenTelemetry job interviews in India, brush up on your technical knowledge, practice coding exercises, and familiarize yourself with common interview questions. With the right skills and preparation, you can confidently pursue a rewarding career in this field. Good luck!


Start Your Job Search Today

Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.
