
479 OpenTelemetry Jobs - Page 6

JobPe aggregates results for easy application access, but you apply directly on the job portal.

50.0 years

0 Lacs

Noida, Uttar Pradesh, India

On-site

Who We Are
For over 50 years, we have worked closely with investment and asset managers to become the world’s leading provider of integrated investment management solutions. We are 3,000+ colleagues with a broad range of nationalities, educations, professional experiences, ages, and backgrounds. SimCorp is an independent subsidiary of the Deutsche Börse Group. Following the recent merger with Axioma, we leverage the combined strength of our brands to provide an industry-leading, full, front-to-back offering for our clients. SimCorp is an equal-opportunity employer. We are committed to building a culture where diverse perspectives and expertise are integrated into our everyday work. We believe in the continual growth and development of our employees, so that we can provide best-in-class solutions to our clients.

WHY THIS ROLE IS IMPORTANT TO US
We are building the architectural foundation for our next-generation SaaS platform—scalable, observable, and operable by design. A key part of this journey is defining a Control Plane: a unified set of capabilities to manage infrastructure, deployments, service discovery, and operational insights across a complex landscape. The System Architect will play a central role in designing how observability, operational control, and deployment models work together across the stack. While the emphasis is on platform and infrastructure architecture, the role also interfaces closely with software architecture—ensuring frameworks and libraries support consistency in telemetry, service integration, and lifecycle management.

What You Will Be Responsible For
- Design Control Plane Architecture: Help define and evolve the architecture that enables lifecycle management of services, infrastructure, and operational data across cloud and on-prem environments.
- Shape the Observability Stack: Lead architectural direction for our observability approach using OpenTelemetry and related tools, including centralized vs. decentralized data collection strategies.
- Integrate with Platform & Runtime: Align observability and control functions with infrastructure as code, policy management, service configuration, and runtime orchestration (e.g., Kubernetes, Azure).
- Support Consistent Service Telemetry: Ensure system-level observability is aligned with the frameworks and libraries used by development teams—supporting consistency and developer productivity.
- Collaborate Across Architecture Domains: Work alongside software, infrastructure, identity, and security architects to ensure alignment across boundaries, while maintaining focus on operational and delivery needs.
- Drive Deployment View and Roadmaps: Own the definition of “desired state” architecture and operational views; work with engineering teams to translate vision into roadmap and implementation plans.
- Contribute to Governance and Standards: Participate in architecture forums, technical design reviews, and shared documentation practices that drive convergence and quality across teams.

What We Value
Most importantly, you can see yourself contributing and thriving in the position described above. How you gained the skills needed for doing that is less important. We expect you to be good at several of the following and be able to - and interested in - learning the rest.
- Understand Both Infra and App Layers: You’re grounded in platform and infrastructure concepts, but understand enough about modern application architecture to collaborate effectively on end-to-end designs.
- Design for Operability: You care deeply about how systems behave in production—and help shape patterns that support reliability, scalability, and insight.
- Communicate and Align: You can engage peers across technical domains and influence without needing direct authority, using structure, clarity, and empathy.
- Balance Vision with Delivery: You work iteratively, breaking down ambitious ideas into steps that move us forward with measurable impact.
- Thrive in Complexity: You are comfortable in large-scale environments with many moving parts, and you bring coherence where others see fragmentation.

Benefits
An attractive salary, bonus scheme, and pension are essential for any work agreement. However, at SimCorp we believe we can offer more. In addition to the traditional benefit scheme, we provide a good work-life balance (flexible working hours and a hybrid model) and opportunities for professional development: there is never just one route - we offer an individual approach to professional development to support the direction you want to take.

NEXT STEP
Please send us your application in English via our career site as soon as possible; we process incoming applications continually. Please note that only applications sent through our system will be processed. At SimCorp, we recognize that bias can unintentionally occur in the recruitment process. To uphold fairness and equal opportunities for all applicants, we kindly ask you to exclude personal data such as photo, age, or any non-professional information from your application. Thank you for aiding us in our endeavor to mitigate biases in our recruitment process. If you are interested in being a part of SimCorp but are not sure this role is suitable, submit your CV anyway. SimCorp is on an exciting growth journey, and our Talent Acquisition Team is ready to assist you in discovering the right role for you. The approximate time to consider your CV is three weeks. We are eager to continually improve our talent acquisition process and make everyone’s experience positive and valuable. Therefore, during the process we will ask you to provide your feedback, which is highly appreciated.

Posted 3 weeks ago


2.0 years

0 Lacs

Sahibzada Ajit Singh Nagar, Punjab, India

On-site

Role: AI Developer - Agentic AI
Experience: 2-3 years
Work Mode: 12-10 pm, Onsite (Mohali, Punjab)

Job Role & Responsibilities
- Design, develop, and deploy Agentic AI systems capable of autonomous task execution by integrating reasoning, memory, and tool use to enable intelligent behavior across complex, multi-step workflows.
- Architect intelligent agents that can dynamically interact with APIs, data sources, and third-party tools to accomplish diverse objectives with minimal human intervention.
- Optimize performance of agentic frameworks by enhancing model accuracy, minimizing response latency, and ensuring scalability and reliability in real-world applications.
- Develop reusable, testable, and production-grade code, adhering to best practices in software engineering and modern AI development workflows.
- Collaborate with cross-functional teams, including product managers, designers, and backend engineers, to convert business requirements into modular agent behaviors.
- Integrate Retrieval-Augmented Generation (RAG), advanced NLP techniques, and knowledge graph structures to improve decision-making and contextual awareness of agents.
- Conduct rigorous profiling, debugging, and performance testing of agent workflows to identify bottlenecks and improve runtime efficiency.
- Write and maintain comprehensive unit, integration, and regression tests to validate agent functionality and ensure robust system performance.
- Continuously enhance codebases, refactor existing modules, and adopt new design patterns to accommodate evolving agentic capabilities and improve maintainability.
- Implement secure, fault-tolerant, and privacy-compliant designs to ensure that deployed agentic systems meet enterprise-grade reliability and data protection standards.

Qualification Required
- Bachelor's degree in computer science or a related field. Specialization or certification in AI or ML is a plus.

Technical Expertise
- 2+ years of hands-on experience in AI/ML/DL projects, with a strong emphasis on Natural Language Processing (NLP), Named Entity Recognition (NER), and Text Analytics.
- Proven ability to design and deploy Agentic AI systems - autonomous, goal-oriented agents that exhibit reasoning, memory retention, tool use, and execution of multi-step tasks.
- Practical expertise in agent architecture, task decomposition, and seamless integration with external APIs, databases, and tools to enhance agent capabilities.
- Skilled in agent prompting strategies, including dynamic prompt chaining and context management, to guide language models through intelligent decision-making workflows.
- Experience with Retrieval-Augmented Generation (RAG) pipelines and generative AI, with a strong focus on optimizing NLP models for low-latency, high-accuracy production use.
- Solid foundation in deep learning methods, recommendation engines, and AI applications within HR or similar domains.
- Exposure to Reinforcement Learning (RL) frameworks, and relevant certifications or specializations in Artificial Intelligence, showcasing continuous learning and depth in the field.

Minimum Skills We Look For: Skills & Expertise (with Agentic AI focus)
- Proven experience in building Agentic AI systems, including autonomous agents capable of multi-step reasoning, memory management, and tool use.
- Expertise in agent design patterns, task decomposition, dynamic planning, and decision-making logic using LLMs.
- Skilled in integrating multi-agent coordination, goal-setting, and feedback loops to create adaptive, evolving agent behavior.
- Strong command of prompt engineering, contextual memory structuring, and tool-calling mechanisms within LLM-powered agent workflows.
- Proficiency in managing agent memory (short-term, long-term, episodic) using vector databases and custom memory stores.
- Ability to build autonomous task execution pipelines with minimal human input, combining language models, APIs, and third-party tools.
- Experience with frameworks and orchestration for agent behavior tracing, logging, and failure recovery.

Tools & Technologies - Agentic AI
- Agentic Frameworks: LangChain, CrewAI, AutoGen, AutoGPT, BabyAGI - for building, managing, and orchestrating intelligent agents.
- LLM APIs: OpenAI (GPT-4/3.5), Anthropic (Claude), Cohere, Hugging Face Transformers.
- Memory & Vector Databases: FAISS, Weaviate, Pinecone, Chroma - for embedding-based agent memory and contextual retrieval.
- Prompt Management Tools: PromptLayer, LangSmith - for testing, evaluating, and refining agent prompts and traces.
- RAG & Context Enrichment: LangChain RAG pipelines, Haystack, Milvus.
- Autonomy Infrastructure: Docker, FastAPI, Redis, Celery - for building scalable agent runtimes.
- Observability: OpenTelemetry, Langfuse (or similar) for tracing agent decisions, failures, and success metrics.
- Testing Agentic Behavior: Integration with PyTest + mock APIs/tools to validate autonomous decision logic and fallback strategies.
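The tool-use and task-decomposition responsibilities above can be illustrated with a minimal, stdlib-only sketch. The `plan` function below is a hard-coded stand-in for the LLM planner that frameworks such as LangChain or AutoGen provide, and the calculator tool and goal string are hypothetical:

```python
# Stdlib-only sketch of an agent tool-use loop. plan() stands in for
# an LLM planner; real frameworks replace it with a model call.

def calculator(expr: str) -> str:
    """A 'tool' the agent can invoke. Toy only: not safe for untrusted input."""
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def plan(goal: str) -> list[tuple[str, str]]:
    """Decompose a goal into (tool, input) steps; hard-coded for demonstration."""
    return [("calculator", "2 + 3"), ("calculator", "5 * 4")]

def run_agent(goal: str) -> list[str]:
    """Execute each planned step and record results in short-term memory."""
    memory = []
    for tool_name, tool_input in plan(goal):
        result = TOOLS[tool_name](tool_input)
        memory.append(f"{tool_name}({tool_input}) -> {result}")
    return memory

print(run_agent("do some arithmetic"))
```

In a real agent, each tool result would be fed back into the planner so it can revise subsequent steps; the fixed plan here only shows the execute-and-remember loop.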

Posted 3 weeks ago


7.0 years

4 - 8 Lacs

Gurgaon

On-site

Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

We are looking for a highly skilled Lead Site Reliability Engineer (SRE) to join our newly established team in India. As a Lead SRE, you will be responsible for ensuring the reliability, performance, and scalability of our systems. You will lead and contribute to key projects, including performance testing, CI/CD tooling, and facilitating infrastructure and application migrations, while working closely with both the India team and the existing SRE team in the United States.

Primary Responsibilities:
- System Reliability: Ensure the availability, performance, and scalability of critical systems by implementing best practices in site reliability engineering
- Observability & Telemetry: Drive the design and evolution of observability systems by building scalable, extensible solutions using OpenTelemetry (OTEL) and other modern observability tools. Champion innovation in monitoring, distributed tracing, and logging strategies to provide deep visibility into system behavior. Continuously evaluate and integrate emerging technologies to improve observability maturity and reduce mean time to detect (MTTD) and mean time to resolve (MTTR)
- Project Leadership: Lead and contribute to projects such as performance testing, CI/CD tooling, and infrastructure/application migrations, with a focus on migrating from on-prem to cloud solutions
- Incident Response: Actively participate in incident response, troubleshooting, and post-mortem analysis to identify root causes and prevent future occurrences
- Automation and Tooling: Develop and maintain automation tools to reduce manual effort, streamline processes, and enhance system reliability
- Collaboration: Work closely with other SREs, engineers, and stakeholders across time zones to align on goals and strategies and ensure smooth project execution
- Continuous Improvement: Identify opportunities to improve system reliability, performance, and operational efficiency, and implement changes as needed
- Mentorship: Provide guidance and mentorship to junior engineers on the team, fostering a culture of learning and growth
- AI-Driven Operations: Leverage AI-powered tools and platforms to enhance observability, incident response, and operational efficiency.

Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so.

Required Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
- 7+ years of experience in Site Reliability Engineering, DevOps, or a similar role
- Proven experience architecting and implementing observability platforms using OpenTelemetry, Datadog, Splunk, Grafana, or similar tools. Demonstrated ability to innovate in this space, whether by building custom telemetry pipelines, integrating AI/ML for anomaly detection, or developing new approaches to visualize and interpret system health
- CI/CD: Experience with CI/CD tools like Jenkins, GitHub Actions, and related automation pipelines
- Containers & Orchestration: Experience with Docker and Kubernetes
- Cloud Platforms: Solid knowledge of public cloud platforms, preferably Azure, and expertise in on-prem to cloud migrations
- Technical Expertise: Deep understanding of systems architecture, cloud infrastructure, networking, and automation tools
- Automation Skills: Solid scripting/programming skills (Python, Go, PowerShell, Bash, etc.) and experience with infrastructure-as-code tools like Terraform and Ansible
- Problem-Solving: Excellent problem-solving skills, with experience in incident management, troubleshooting, and root cause analysis
- Collaboration: Excellent communication and collaboration skills, with the ability to work effectively in a distributed team across time zones
- AI Tools: Proven exposure to AI tools and their application in SRE workflows for faster delivery and smarter operations

Preferred Qualifications:
- Experience working in a global or distributed team environment
- Industry experience in Payments, Fintech, or Healthcare
- Knowledge of security best practices in cloud and distributed systems

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone–of every race, gender, sexuality, age, location and income–deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes — an enterprise priority reflected in our mission.
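To illustrate the span concept at the heart of the distributed-tracing requirements in this listing, here is a simplified, stdlib-only stand-in. This is not the real OpenTelemetry API: the actual SDK (opentelemetry-sdk) adds context propagation, exporters, and sampling on top of this idea.

```python
# Stdlib-only stand-in for a tracing span: a named, timed,
# nestable unit of work, collected for later inspection.
import time
from contextlib import contextmanager

SPANS = []  # finished spans as (name, duration_seconds), innermost first

@contextmanager
def span(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append((name, time.perf_counter() - start))

with span("handle_request"):
    with span("db_query"):
        time.sleep(0.01)  # simulate a slow database call

for name, duration in SPANS:
    print(f"{name}: {duration * 1000:.1f} ms")
```

Because spans nest, the gap between the outer and inner durations immediately shows where time is spent; that per-operation visibility is what drives down MTTD and MTTR in practice.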

Posted 3 weeks ago


3.0 - 7.0 years

0 Lacs

Karnataka

On-site

As a DevOps Engineer at Facility & Energy Management, you will be an integral part of a dedicated team of innovators and problem solvers working towards sustainable impact. You will play a crucial role in crafting cutting-edge software solutions that help facilities optimize energy consumption, reduce costs, and enhance operational efficiency. If you have a passion for technology and delivering high-quality software solutions, this opportunity is for you!

In this role, you will be responsible for designing and implementing comprehensive infrastructure plans and strategies. Your collaboration with cross-functional teams, including product owners, designers, and architects, will ensure alignment and innovation in software development. Additionally, you will have the opportunity to mentor junior DevOps engineers, fostering their growth and ensuring adherence to best practices and DevOps standards.

You will work in a dynamic environment with more than 30 colleagues, striving to create market-leading software solutions for Facility & Energy Management. Your work will involve cutting-edge technologies and processes such as Kubernetes, Docker, AI, Pair Programming, Mob Programming, Continuous Integration, Continuous Learning, and Microservices.

Key Responsibilities:
- Develop and maintain infrastructure as code (IaC) using tools like Terraform & Azure Bicep
- Implement and maintain Continuous Integration and Continuous Delivery (CI/CD) pipelines (GitHub CI)
- Perform system monitoring using Prometheus and Grafana
- Manage log management and distributed tracing with Tempo, Azure Application Insights, OpenTelemetry, and Grafana
- Manage cloud environments, primarily Azure with exposure to AWS
- Utilize Docker and Kubernetes for containerization and orchestration

Ideal Candidate Requirements:
- Bachelor's degree or higher in computer science/information science or equivalent
- Experience with Docker and Kubernetes
- Proficiency in SQL Server, Azure Database, and PostgreSQL
- Solid understanding of infrastructure and automation programming

Nice-to-Have Skills:
- Experience with C# development on the .NET tech stack
- Familiarity with microservices architecture
- Experience with Kafka
- Experience in setting up and monitoring Azure App Services, Function Apps & Logic Apps

If you are ready to dive into the world of cutting-edge technology, drive innovation at lightning speed, and contribute to a sustainable and efficient future, we would love to connect with you for this exciting opportunity. Apply now and be part of a team that is shaping the future of Facility & Energy Management software solutions.
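As an illustration of the Prometheus-based monitoring mentioned in the responsibilities, here is a minimal sketch of the text exposition format that a `/metrics` endpoint serves. Real services would use the official `prometheus_client` library rather than hand-rolling this, and the metric name below is illustrative:

```python
# Hand-rolled sketch of the Prometheus text exposition format:
# each metric gets a "# TYPE" comment line and a "name value" sample line.

def render_metrics(counters: dict[str, int]) -> str:
    """Render counters in Prometheus text format."""
    lines = []
    for name, value in counters.items():
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

# Counter metric names conventionally end in _total.
print(render_metrics({"http_requests_total": 42}), end="")
```

Prometheus scrapes this plain-text output on an interval, which is why the format is deliberately simple enough to generate from any language.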

Posted 3 weeks ago


10.0 years

0 Lacs

Gurgaon, Haryana, India

On-site

JD - Director of DevOps and Cloud Operations

About Us
Infra360 is an emerging global leader in cloud consulting that specializes in innovative cloud-native solutions and exceptional customer service. We partner with clients to modernize and optimize their cloud, ensuring resilience, scalability, cost efficiency, and innovation. Our core services include Cloud Strategy, Site Reliability Engineering (SRE), DevOps, Cloud Security Posture Management (CSPM), and related Managed Services. We specialize in driving operational excellence across multi-cloud environments, helping businesses achieve their goals with agility and reliability. We thrive on ownership, collaboration, problem-solving, and excellence, fostering an environment where innovation and continuous learning are at the forefront. Join us as we expand and redefine what’s possible in cloud technology and infrastructure.

Role Summary
The Director of DevOps and Cloud Operations will lead and scale Infra360’s technology team, driving growth, operational excellence, and client success. The role involves strategic leadership, project management, and delivering innovative solutions in cloud, DevOps, SRE, and security. The ideal candidate will foster a culture of collaboration and innovation while ensuring high-quality service delivery and identifying opportunities to expand client engagements.

Key Responsibilities

Leadership & People Management:
- Lead, mentor, and grow a team of engineers, scaling the team from 10 to 50.
- Foster a culture of innovation, collaboration, ownership, and excellence.
- Oversee talent acquisition, retention, and professional development within the team.
- Time Management: Prioritize tasks effectively to balance strategic initiatives, team management, and client interactions.
- Accountability: Take ownership of deliverables and decisions, ensuring alignment with company goals and values.
- Pressure Handling: Maintain composure under pressure and manage competing priorities effectively.

Technology Operations

Requirement Gathering & Statement of Work (SOW) Creation:
- Client Needs Analysis: As and when required, conduct detailed requirement-gathering sessions with clients to understand their objectives, pain points, and technical needs.
- Audit Facilitation: Coordinate with the tech team to perform cloud audits, identifying areas for cost optimization, security improvements, and enhanced reliability.
- SOW Creation: As and when required, draft and finalize comprehensive Statements of Work (SOW) that clearly outline deliverables, timelines, and expectations.
- Actively participate in client discovery calls.

Client & Resource Onboarding:
- SOW Understanding: Thoroughly review and understand the SOW, including scope, deliverables, timelines, milestones, and SLAs, to own the whole process.
- Resource Allocation & Onboarding: Identify and onboard the right resources for the project, ensuring team members are briefed on client requirements, project scope, and deliverables.
- Stakeholder Alignment: Ensure alignment with clients and internal teams on all aspects of the SOW to avoid scope creep and ensure clear expectations.
- Onboarding Process: Develop and execute a structured client onboarding process, ensuring a smooth transition and setup of services.
- Access & Tools Setup: Facilitate timely access to client environments, tools, and necessary documentation for the team.
- Documentation: Provide regular documentation on service usage, reporting, and escalation processes.

Project & Operations Management:
- Project Monitoring: Weekly sprint planning with clients and daily stand-up calls with project teams to ensure timely delivery, quality, and efficiency of team members.
- Work Review & Oversight: Regularly review team members’ work and technical approaches to ensure alignment with best practices.
- Quality Assurance: Implement processes to maintain high-quality standards across all deliverables.
- Delivery Excellence: Ensure timely and successful delivery of projects, meeting client expectations and SLAs, ensuring progress according to the SOW and achieving milestones.

Client Engagement & Stakeholder Management:
- Present monthly SOW progress and achievements to obtain sign-off, integrating client feedback.
- Regular Client Meetings: Schedule and conduct weekly/bi-weekly meetings with clients to discuss project progress, address concerns, and gather feedback.
- Client Rapport Building: Establish and maintain strong relationships with clients through proactive engagement and communication.
- Act as a subject matter expert to clients, helping them achieve their cloud and infrastructure goals.

Technical Content & Marketing Support:
- Case Study Development: Provide technical insights and content for creating impactful case studies that highlight successful client engagements and solutions.
- Architecture Diagrams: Design and deliver detailed architecture diagrams to visually represent technical solutions for marketing and sales materials.
- Collaboration with Marketing: As and when required, work with the marketing team to ensure technical accuracy and relevance in promotional content, showcasing the company’s expertise.

Strategic Planning & Upselling:
- Account Growth Strategy: Develop and execute strategies to expand service offerings within existing client accounts.
- Client Needs Assessment: Regularly engage with clients to identify evolving needs and opportunities for additional services in cloud, DevOps, SRE, and security.
- Service Expansion: Identify and introduce premium services, add-ons, or long-term engagements that enhance client outcomes.
- Cross-Selling Opportunities: Collaborate with internal teams to bundle services and present holistic solutions.

Process Optimization & Innovation:
- Process Standardization: Identify areas for improvement and implement standardized processes across projects to enhance efficiency and consistency.
- Automation: Leverage automation tools and frameworks to streamline repetitive tasks and improve operational workflows.
- Continuous Improvement: Foster a culture of continuous improvement by encouraging feedback, conducting regular process reviews, and implementing best practices.
- Innovation Initiatives: Drive innovation by introducing new tools, technologies, and methodologies that align with business goals and client needs.
- Metrics & KPIs: Define and track key performance indicators (KPIs) to measure process effectiveness and drive data-driven decisions.

Requirements

Technical Skills of the Ideal Candidate:
- Technical Expertise: Deep knowledge of Infrastructure, Cloud, DevOps, SRE, Database Management, Observability, and Cybersecurity services.
- 10+ years of experience in SRE and DevOps, with a proven track record of handling large-scale production environments.
- Strong experience with databases (PostgreSQL, MongoDB, ElasticSearch, Kafka).
- Hands-on experience with ELK or other logging and observability tools.
- Hands-on experience with Prometheus, Grafana & Alertmanager, and on-call tools and processes such as PagerDuty.
- Strong skills in K8s, Terraform, Helm, ArgoCD, AWS/GCP/Azure, etc.
- Good with Python/Go scripting and automation.
- Strong fundamentals in DNS, networking, and Linux.
- Experience with APM tools like New Relic, Datadog, and OpenTelemetry.
- Good experience with incident response, incident management, and writing detailed RCAs.
- Experience with Git and coding best practices.
- Solutioning & Architecture: Proven ability to design, implement, and optimize end-to-end cloud solutions, following well-architected frameworks and best practices.
- Leadership & Team Management: Demonstrated success in scaling teams, fostering a collaborative and innovative work culture, and mentoring talent to achieve excellence.
- Problem-Solving & Innovation: Strong analytical skills to understand complex client needs and deliver creative, scalable, and impactful solutions.
- Project & Stakeholder Management: Expertise in project planning, execution, and stakeholder management, ensuring alignment with business objectives and client expectations.
- Effective Communication: Exceptional verbal and written communication skills to engage with clients, teams, and stakeholders effectively.
- Documentation & Organization: Ability to maintain well-organized, structured documentation and adhere to standardized folder structures.
- Attention to Detail & Follow-Through: Consistently capture key points, action items, and follow-ups during meetings and ensure timely execution.
- Time Management & Prioritization: Strong time management skills, with the ability to balance multiple priorities, meet deadlines, and optimize productivity.
- Task Tracking & Accountability: Maintain a personal task tracker to manage work priorities, monitor progress, and ensure accountability.
- Results-Driven & Growth Mindset: A proactive, results-oriented approach with a focus on continuous learning and improvement.

Qualifications:
- Experience: 12+ years in technology operations, with at least 5 years in a leadership role, managing teams and delivering complex solutions.
- Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

Posted 3 weeks ago


6.0 years

0 Lacs

Greater Kolkata Area

On-site

Line of Service: Advisory
Industry/Sector: Not Applicable
Specialism: Microsoft
Management Level: Senior Associate

Job Description & Summary
At PwC, our people in software and product innovation focus on developing cutting-edge software solutions and driving product innovation to meet the evolving needs of clients. These individuals combine technical experience with creative thinking to deliver innovative software products and solutions. Those in software engineering at PwC will focus on developing innovative software solutions to drive digital transformation and enhance business performance. In this field, you will use your knowledge to design, code, and test cutting-edge applications that revolutionise industries and deliver exceptional user experiences.

Why PwC
At PwC, you will be part of a vibrant community of solvers that leads with trust and creates distinctive outcomes for our clients and communities. This purpose-led and values-driven work, powered by technology in an environment that drives innovation, will enable you to make a tangible impact in the real world. We reward your contributions, support your wellbeing, and offer inclusive benefits, flexibility programmes and mentorship that will help you thrive in work and life. Together, we grow, learn, care, collaborate, and create a future of infinite experiences for each other. At PwC, we believe in providing equal employment opportunities, without any discrimination on the grounds of gender, ethnic background, age, disability, marital status, sexual orientation, pregnancy, gender identity or expression, religion or other beliefs, perceived differences and status protected by law. We strive to create an environment where each one of our people can bring their true selves and contribute to their personal growth and the firm’s growth. To enable this, we have zero tolerance for any discrimination and harassment based on the above considerations.
Responsibilities:
• Frontend Development: Build dynamic, responsive interfaces using React.js, Next.js, and Tailwind CSS.
• Backend Development: Design and implement scalable APIs using FastAPI, Node.js, and Express.js.
• LLM Integration: Integrate APIs from GPT-4o (Azure), Claude, LLaMA, or open-source models using Ollama or Hugging Face.
• Agent Workflow Design: Develop multi-agent logic using LangGraph, Semantic Kernel, or AutoGen for intelligent task execution.
• RAG Implementation: Build Retrieval-Augmented Generation pipelines with Qdrant, ChromaDB, or Azure AI Search.
• Embedding & Vector Management: Generate and manage embeddings from unstructured data (text, PDFs, charts, etc.).
• Fine-Tuning: Set up LoRA/QLoRA pipelines to fine-tune LLMs on proprietary data using Azure ML or Hugging Face Transformers.
• Prompt Engineering: Design, test, and optimize prompts with tools like Promptfoo, OpenEval, or DeepEval.
• Testing Automation: Implement test suites using PyTest, Postman, Playwright, and prompt validation tools.
• Deployment & Scaling: Deploy AI microservices using Docker, Azure App Services, or AKS; ensure system scalability.
• CI/CD & Infra-as-Code: Build CI/CD pipelines via GitHub Actions or Azure DevOps; manage infra using Terraform.
• Monitoring & Observability: Ensure uptime and performance through Grafana, OpenTelemetry, and production monitoring best practices.
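The RAG implementation responsibility above boils down to two steps: retrieve the most relevant context, then ground the prompt in it. A minimal sketch follows, assuming toy bag-of-words vectors in place of real embeddings and an in-memory list in place of a vector store such as Qdrant or ChromaDB; the documents and question are made up for illustration:

```python
# Minimal sketch of the retrieval step in a RAG pipeline.
import math
import re
from collections import Counter

DOCS = [
    "OpenTelemetry collects traces and metrics",
    "Terraform manages cloud infrastructure as code",
    "Qdrant stores vector embeddings for retrieval",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Ground the prompt in the retrieved context before calling an LLM.
question = "which tool stores embeddings"
context = retrieve(question)[0]
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
print(context)
```

A production pipeline swaps `embed` for a real embedding model and `retrieve` for a vector-store query, but the shape of the flow stays the same.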
Skill sets required:
• Strong understanding of the .NET Framework, .NET Core; proficiency in C#
• Familiarity with Web API development and RESTful services
• Experience with Entity Framework or ADO.NET for data access
• Strong skills in SQL; ability to design and optimize queries and work with databases like SQL Server
• 5–6 years of full-stack development experience (React.js + Node.js/FastAPI)
• Strong understanding of GenAI tools, LLM APIs, and model pipelines
• Proficiency in deploying and monitoring AI applications in Azure cloud
• Experience with RAG systems, vector DBs (Qdrant, Chroma), and embeddings
• Solid foundation in prompt engineering, testing, and evaluation techniques
• Hands-on experience with CI/CD, Docker, and observability stacks
• Understanding of common design patterns and best practices in software architecture
• Exposure to Agile methodology
Certifications/Credentials:
• Microsoft Certified: Azure AI Engineer Associate (AI-102)
• AZ-204: Azure Developer Associate
Mandatory skill sets: NodeJS/React and LLM
Preferred skill sets: Hands-on experience with CI/CD, Docker, and observability stacks; understanding of common design patterns and best practices in software architecture
Years of experience required: 3–6 years
Education qualification: B.E./B.Tech/M.Tech
Education (if blank, degree and/or field of study not specified)
Degrees/Field of Study required: Bachelor of Engineering, Master of Engineering
Degrees/Field of Study preferred:
Certifications (if blank, certifications not specified)
Required Skills: Node.js
Optional Skills: Acceptance Test Driven Development (ATDD), Accepting Feedback, Active Listening, Analytical Thinking, Android, API Management, Appian (Platform), Application Development, Application Frameworks, Application Lifecycle Management, Application Software, Business Process Improvement, Business Process Management (BPM), Business Requirements Analysis, C#.NET, C++ Programming Language,
Client Management, Code Review, Coding Standards, Communication, Computer Engineering, Computer Science, Continuous Integration/Continuous Delivery (CI/CD), Creativity {+ 46 more}
Desired Languages (if blank, desired languages not specified)
Travel Requirements
Available for Work Visa Sponsorship?
Government Clearance Required?
Job Posting End Date

Posted 3 weeks ago

Apply

5.0 years

0 Lacs

Hyderābād

Remote

Your opportunity: Organizations around the world depend on New Relic to help them build better software using data, not opinions. One of the fundamental pillars of New Relic’s observability platform is Logs. The first step in understanding your system is sending logs and other telemetry to New Relic. Thousands of New Relic customers rely on their logs to detect and resolve issues by analyzing many petabytes of log data that is ingested, processed, transformed, analyzed, and enriched by New Relic’s Log Streaming services. If you are an experienced engineer interested in contributing to a team responsible for owning and improving the ingestion and processing experiences for New Relic Logs, please read on!
What you’ll do: Lead the design, development, and enhancement of core features for the Log Streaming Platform, addressing both functional and non-functional requirements. Continuously improve the reliability, cost-efficiency, and quality of the Log Platform. Collaborate cross-functionally to design and implement robust, scalable, and efficient systems meeting evolving customer needs. Partner with product managers, designers, and stakeholders to translate requirements into technical solutions, championing best practices and collaboration. Master the architecture and components of the Log Platform. Lead the end-to-end development of key platform features, ensuring their success. Make significant contributions to improving platform reliability. Actively contribute to and review code across platform components. Thoroughly understand and apply organizational security and quality standards. Achieve autonomy in on-call and support responsibilities.
This role requires: a Bachelor’s degree in software development, engineering, or a related technical field; 5+ years of software engineering experience; proficiency in Java and/or Go; strong CS fundamentals; and adaptability to new languages.
Proven experience with public clouds (AWS, Azure, GCP) and cloud-native tech (e.g., Kubernetes, Docker, Helm, Kafka, OpenTelemetry, serverless), plus a passion for new technologies. Experience designing, building, and maintaining large-scale software systems. Strong understanding of scalable distributed systems and microservices architecture. Exposure to AI/ML, especially applying technologies like LLMs for data analysis or feature integration. Bonus points if you have: Experience with real-time data streaming and processing services. Experience with high-throughput data pipelines and distributed systems. Experience with Observability products, particularly in the SaaS space. Fostering a diverse, welcoming and inclusive environment is important to us. We work hard to make everyone feel comfortable bringing their best, most authentic selves to work every day. We celebrate our talented Relics’ different backgrounds and abilities, and recognize the different paths they took to reach us – including nontraditional ones. Their experiences and perspectives inspire us to make our products and company the best they can be. We’re looking for people who feel connected to our mission and values, not just candidates who check off all the boxes. If you require a reasonable accommodation to complete any part of the application or recruiting process, please reach out to resume@newrelic.com. We believe in empowering all Relics to achieve professional and business success through a flexible workforce model. This model allows us to work in a variety of workplaces that best support our success, including fully office-based, fully remote, or hybrid. Our hiring process In compliance with applicable law, all persons hired will be required to verify identity and eligibility to work and to complete employment eligibility verification. Note: Our stewardship of the data of thousands of customers means that a criminal background check is required to join New Relic.
We will consider qualified applicants with arrest and conviction records based on individual circumstances and in accordance with applicable law including, but not limited to, the San Francisco Fair Chance Ordinance. Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. New Relic does not accept unsolicited headhunter and agency resumes, and will not pay fees to any third-party agency or company that does not have a signed agreement with New Relic. Candidates are evaluated based on qualifications, regardless of race, religion, ethnicity, national origin, sex, sexual orientation, gender expression or identity, age, disability, neurodiversity, veteran or marital status, political viewpoint, or other legally protected characteristics. Review our Applicant Privacy Notice at https://newrelic.com/termsandconditions/applicant-privacy-policy
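The ingest-process-transform-enrich flow that the posting above describes for Log Streaming can be pictured as a small pipeline stage. This is an illustrative sketch only, not New Relic's implementation; the field names and severity mapping are invented for the example:

```python
import json

def enrich(raw_line, extra_attrs):
    """One illustrative log-pipeline stage: parse a JSON log line,
    normalize the severity field, and attach enrichment attributes."""
    event = json.loads(raw_line)
    # Normalize a few common severity spellings to one canonical form.
    level = str(event.get("level", "INFO")).upper()
    event["level"] = {"WARNING": "WARN", "ERR": "ERROR"}.get(level, level)
    event.update(extra_attrs)  # e.g. account id, region, ingest timestamp
    return event

line = '{"message": "disk full", "level": "err"}'
print(enrich(line, {"account_id": 42}))
```

Real log-streaming services chain many such stages (parsing, filtering, routing, enrichment) over petabyte-scale streams, but each stage reduces to a transformation like this applied per event.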

Posted 3 weeks ago

Apply

5.0 years

10 - 20 Lacs

India

On-site

As the Senior DevOps Engineer focused on Observability, you will set observability standards, lead automation efforts, and mentor engineers, ensuring all monitoring and Datadog configuration changes are implemented as Infrastructure-as-Code (IaC). You will lead the design and management of a code-driven Datadog observability platform, providing end-to-end visibility into Java applications, Kubernetes workloads, and containerized infrastructure. This role emphasizes cost-effective observability at scale, requiring deep expertise in Datadog monitoring, logging, tracing, and optimization techniques. You'll collaborate closely with SRE, DevOps, and Software Engineering teams to standardize monitoring and logging practices and deliver scalable, reliable, and cost-efficient observability solutions. This is a hands-on engineering role focused on observability-as-code. All monitoring, logging, alerting, and Datadog configurations are defined and managed through Terraform, APIs, and CI/CD workflows, not manual configuration in the Datadog UI.
PRIMARY RESPONSIBILITIES:
Own and define observability standards for Java applications, Kubernetes workloads and cloud infrastructure
Configure and manage the Datadog platform using Terraform and Infrastructure-as-Code (IaC) best practices
Drive adoption of structured JSON logging, distributed tracing and custom metrics across Java and Python services
Optimize Datadog usage through cost governance, log filtering, sampling strategies and automated reporting
Collaborate closely with Java developers and platform engineers to standardize instrumentation and alerting
Troubleshoot and resolve issues with missing or misconfigured logs, metrics and traces, working with developers to ensure proper instrumentation and data flow into Datadog
Contribute to incident response efforts using Datadog insights for actionable alerting, root cause analysis (RCA) and reliability improvements
Serve as the primary point of contact for Datadog-related requests, supporting internal teams with onboarding, integration and usage questions
Continuously audit and tune monitors for alert quality, reducing false positives and improving actionable signal detection
Maintain clear internal documentation on Datadog usage, standards, integrations and IaC workflows
Evaluate and propose improvements to the observability stack, including new Datadog features, OpenTelemetry adoption and future architecture changes
Mentor engineers and develop internal training programs on Datadog, observability-as-code and modern log pipeline architecture
QUALIFICATIONS:
Bachelor’s degree in Computer Science, Engineering, Mathematics, Physics or a related technical field
5+ years of experience in DevOps, Site Reliability Engineering, or related roles with a strong focus on observability and infrastructure as code
Hands-on experience managing and scaling Datadog programmatically using code-based workflows (e.g. Terraform, APIs, CI/CD)
Deep expertise in Datadog including APM, logs, metrics, tracing, dashboards and audit trails
Proven experience integrating Datadog observability into CI/CD pipelines (e.g. GitLab CI, AWS CodePipeline, GitHub Actions)
Solid understanding of AWS services and best practices for monitoring services on Kubernetes infrastructure
Strong background in Java application development is preferred
Job Types: Full-time, Permanent, Contractual / Temporary
Contract length: 12 months
Pay: ₹1,000,000.00 - ₹2,000,000.00 per year
Benefits: Paid sick time
Schedule: Monday to Friday, Night shift, US shift
Ability to commute/relocate: Musheerabad, Hyderabad, Telangana: Reliably commute or planning to relocate before starting work (Preferred)
Education: Bachelor's (Preferred)
Experience: DevOps: 5 years (Required)
Language: English (Required)
Location: Musheerabad, Hyderabad, Telangana (Preferred)
Shift availability: Night Shift (Required)
Work Location: In person
Expected Start Date: 21/07/2025
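Observability-as-code of the sort this posting describes means monitor definitions live in version-controlled code (Terraform, API payloads) rather than being clicked together in the Datadog UI. A stdlib-only Python sketch of the idea: render a metric-monitor payload from parameters and validate thresholds before anything would be sent to an API. The field names loosely echo Datadog's monitor API but should be treated as illustrative, not authoritative:

```python
def render_monitor(service, metric, critical, warning=None):
    """Render a Datadog-style metric monitor definition from code.
    The payload shape is illustrative, loosely modeled on Datadog's
    monitor API; a real setup would manage this via Terraform or the API."""
    if warning is not None and warning >= critical:
        raise ValueError("warning threshold must be below critical")
    query = f"avg(last_5m):avg:{metric}{{service:{service}}} > {critical}"
    thresholds = {"critical": critical}
    if warning is not None:
        thresholds["warning"] = warning
    return {
        "name": f"[{service}] {metric} too high",
        "type": "metric alert",
        "query": query,
        "options": {"thresholds": thresholds},
    }

monitor = render_monitor("checkout", "jvm.heap_memory", critical=0.9, warning=0.8)
print(monitor["query"])
```

Because monitors are generated and validated in code, threshold mistakes fail in CI review rather than in production, which is the core payoff of the observability-as-code approach described above.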

Posted 3 weeks ago

Apply

0 years

0 Lacs

India

On-site

Role Overview
We at viamagus are looking for an experienced and visionary Head of Engineering to lead, scale, and inspire our growing technology team. In this pivotal role, you will define the company’s technical strategy and engineering roadmap, working closely with founders and senior stakeholders to build a world-class engineering culture that delivers high-impact products and services. This is a hands-on leadership position where you will ensure engineering excellence, drive fast iteration, and build scalable systems in a cloud-first environment. If you are passionate about architecting robust solutions, mentoring teams, and balancing product innovation with service delivery, we want to hear from you.
Key Responsibilities
Technical Strategy & Execution: Define and drive the overall technology vision and roadmap. Establish engineering processes that enable rapid, high-quality release cycles and continuous improvement. Champion modern best practices and foster a culture of innovation and experimentation.
Architecture & Scalability: Lead architectural design reviews to ensure systems are scalable, secure, and maintainable. Provide hands-on technical guidance on key design decisions, code reviews, and infrastructure choices. Oversee cloud deployment strategies, performance tuning, and cost optimization.
Team Leadership & Development: Hire, mentor, and retain top engineering talent, building a high-performance culture of ownership and collaboration. Develop engineers and future leaders through coaching, feedback, and clear growth paths. Cultivate an inclusive environment that values continuous learning and knowledge sharing.
Cross-Functional Collaboration: Partner with Product, Design, QA, and Delivery teams to translate business requirements into robust technical solutions. Align engineering initiatives with broader company goals and client commitments. Act as a technical representative in stakeholder meetings, customer pitches, and due-diligence calls.
Delivery & Operational Excellence: Own end-to-end execution of engineering projects, ensuring on-time and on-budget delivery. Define and uphold standards for code quality, automated testing, CI/CD, and documentation. Lead incident management and root-cause analysis for critical issues, driving continuous reliability improvements.
Product Lifecycle Ownership: Guide the team through the full product lifecycle, from concept and design to deployment and production support. Ensure clear technical documentation, timely progress updates, and smooth handovers across all stages. Measure and improve post-launch success through observability, user feedback, and iterative enhancements.
Must-Have Qualifications
5–10 years of software engineering experience, including several years in senior leadership (Engineering Manager, Head of Engineering, or similar). Proven track record of building multiple applications from scratch to production, including post-launch scaling and hardening. Deep expertise in at least 2-3 of our core technologies (Node.js, React/React Native, Python, MySQL) and solid competence in the rest. Hands-on experience designing and deploying cloud-based, scalable architectures (AWS or Azure) with strong knowledge of networking, security, CI/CD, container runtimes, and cost optimization. Strong command of microservices, domain-driven design, event sourcing, caching layers, and data modeling. Proficiency with observability and APM tools (Datadog, New Relic, Grafana, OpenTelemetry) and the ability to turn metrics into actionable improvements. Exceptional troubleshooting skills, comfortable diving into performance profiles, memory leaks, and distributed-system edge cases. Excellent written and verbal communication; able to align executives, engineers, and clients around a shared technical vision. Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred).
Good-to-Have / Bonus Skills
Prior experience building agentic or AI-driven products (LLM orchestration, vector databases, RAG pipelines). Hands-on knowledge of mobile CI/CD (Fastlane, EAS) and app-store delivery workflows. Familiarity with infrastructure-as-code (Terraform, Pulumi) and policy-as-code (OPA, Sentinel). Exposure to data-streaming tech (Kafka, Pulsar) and real-time protocols (WebSockets, MQTT). Experience guiding organizations through security/compliance frameworks (SOC 2, ISO 27001). Contributions to open-source projects, tech conferences, or developer communities.
Technical Stack & Tools
Backend: Node.js (JavaScript/TypeScript) and Python for APIs and services (Express, NestJS, Django, Flask).
Frontend: React.js for web and React Native for mobile; state management via Redux or Context API.
Database: RDS databases (MySQL, MSSQL, PostgreSQL); familiarity with NoSQL stores (e.g., MongoDB) and ORM/ODM frameworks is a plus.
Cloud & DevOps: AWS/Azure/GCP infrastructure, Docker, Kubernetes, and CI/CD pipelines (Jenkins, GitHub Actions). Monitoring with Datadog, CloudWatch, or similar.

Posted 3 weeks ago

Apply

10.0 years

0 Lacs

India

On-site

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, Government, academia, research and manufacturing. "DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC “The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management. Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage. As a Lead/Sr. Lead Software - L4 Engineer, you’ll be the final escalation point for the most complex and critical issues affecting enterprise and hyperscale environments. This hands-on role is ideal for a deep technical expert who thrives under pressure and has a passion for solving distributed system challenges at scale.
You’ll collaborate with other engineers, Product Management, and Field teams to drive root cause resolutions, define architectural best practices, and continuously improve product resiliency. Leveraging AI tools and automation, you’ll reduce time-to-resolution, streamline diagnostics, and elevate the support experience for strategic customers.
Key Responsibilities
Technical Expertise & Escalation Leadership: Own critical customer case escalations end-to-end, including deep root cause analysis and mitigation strategies. Act as the technical expert for Infinia, helping our customers get the most value out of it. Build tooling to improve TTR. Write code fixes and enhancements to improve Infinia. Utilize AI-powered debugging, log analysis, and system pattern recognition tools to accelerate resolution.
Product Knowledge & Value Creation: Be the subject-matter expert on Infinia internals: metadata handling, storage fabric interfaces, performance tuning, AI integration, etc. Reproduce complex customer issues and propose product improvements and workarounds. Author and maintain detailed runbooks, performance tuning guides, and RCA documentation. Feed real-world problem insights back into the development cycle to improve reliability and diagnostics.
Customer Engagement & Business Enablement: Partner with Field CTOs, Solutions Architects, and Sales Engineers to ensure customer success. Translate technical issues into executive-ready summaries and business impact statements. Participate in post-mortems and executive briefings for strategic accounts. Drive adoption of observability, automation, and self-healing support mechanisms using AI/ML tools.
Required Qualifications: 10+ years in enterprise storage, distributed systems, or cloud infrastructure support/engineering. Deep understanding of file systems (POSIX, NFS, S3), storage performance, and Linux kernel internals. Proven debugging skills at system/protocol/app levels (e.g., strace, tcpdump, perf).
Hands-on experience with AI/ML data pipelines, container orchestration (Kubernetes), and GPU-based architectures. Exposure to RDMA, NVMe-oF, or high-performance networking stacks. Exceptional communication and executive reporting skills. Experience using AI tools (e.g., log pattern analysis, LLM-based summarization, automated RCA tooling) to accelerate diagnostics and reduce MTTR. Preferred Qualifications Experience with DDN, VAST, Weka, or similar scale-out file systems and/or S3 storage Strong scripting/coding ability in Python, Bash, or Go. Familiarity with observability platforms: Prometheus, Grafana, ELK, OpenTelemetry. Knowledge of replication, consistency models, and data integrity mechanisms. Exposure to Sovereign AI, LLM model training environments, or autonomous system data architectures. This position requires participation in an on-call rotation to provide after-hours support as needed.
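The "log pattern analysis" mentioned in this posting typically starts with template extraction: mask the variable parts of each log line so that lines with the same shape collapse into one signature, then count and rank the signatures. A minimal, stdlib-only sketch (the regexes and sample log lines are invented for illustration and are not DDN tooling):

```python
import re
from collections import Counter

def template(line):
    # Mask hex ids and numbers so lines with the same shape collapse together.
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line

logs = [
    "nvme0: I/O error on block 4096",
    "nvme1: I/O error on block 8192",
    "kernel: oom-killer invoked by pid 1234",
]
# The most frequent template is often the first lead in a root-cause hunt.
counts = Counter(template(l) for l in logs)
print(counts.most_common(1))
```

LLM-based summarization and automated RCA tools build on the same foundation: once millions of lines collapse into a few dozen templates, the anomalous pattern stands out and can be summarized or correlated automatically.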

Posted 3 weeks ago

Apply

5.0 years

0 Lacs

Delhi Cantonment, Delhi, India

Remote

About Smart Working
At Smart Working, we believe your job should not only look right on paper but also feel right every day. This isn’t just another remote opportunity; it’s about finding where you truly belong, no matter where you are. From day one, you’re welcomed into a genuine community that values your growth and well-being. Our mission is simple: to break down geographic barriers and connect skilled professionals with outstanding global teams and products for full-time, long-term roles. We help you discover meaningful work with teams that invest in your success, where you’re empowered to grow personally and professionally. Join one of the highest-rated workplaces on Glassdoor and experience what it means to thrive in a truly remote-first world.

About the Role
We are seeking an experienced DevOps Engineer to join our dynamic technical team and play a pivotal role in designing, maintaining, and evolving infrastructure that supports highly available, scalable applications. You will collaborate closely with engineering and security teams to automate processes, optimize deployment pipelines, and implement robust observability practices. We value engineers who take proactive ownership, communicate clearly across teams, and maintain a strong security-first mindset. While this role does not require managing people, a demonstrated ability to lead projects and take ownership of outcomes will be highly valued. If you’re driven by innovation and want to help us build resilient, secure, and automated infrastructure, this is the perfect opportunity for you to make a meaningful impact.

What You’ll Be Doing
Design, maintain, and improve infrastructure leveraging AWS and Terraform, with a focus on migrating workloads to Kubernetes for enhanced scalability and resilience. Build and optimize CI/CD pipelines using GitHub Actions to enable faster, more reliable deployments supporting continuous integration and delivery. Implement and maintain observability frameworks (monitoring, logging, and alerting) using OpenTelemetry and New Relic to ensure system health and security. Partner with cross-functional teams to embed security best practices, including zero-trust policies, and support compliance efforts such as SOC2.

Must-Have Skills
AWS: 5+ years of hands-on experience designing and managing cloud infrastructure. Kubernetes: 1+ year of practical experience with container orchestration and management. Terraform: 2+ years of expertise in infrastructure as code to provision and maintain cloud resources. CI/CD: 3+ years of experience building and managing continuous integration and delivery pipelines, preferably with GitHub Actions.

Nice-to-Have Skills
Familiarity with SOC2 compliance and associated tooling. Strong understanding of zero-trust security policies and their impact on infrastructure and access control. Experience with OpenTelemetry and modern observability tools. Proficiency in New Relic or similar dashboarding and alerting platforms. Strong scripting skills for automation and operational improvements. Experience implementing advanced continuous delivery practices.

Key Deliverables (First 90 Days)
Lead the migration of existing infrastructure to Kubernetes, ensuring stability and scalability. Build or enhance CI/CD pipelines to improve deployment speed and reliability. Set up and configure observability frameworks and dashboards to monitor application and infrastructure health effectively.

At Smart Working, you’ll never be just another remote hire.
Be a Smart Worker - valued, empowered, and part of a culture that celebrates integrity, excellence, and ambition. If that sounds like your kind of place, we’d love to hear your story.
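The monitoring-and-alerting work this role describes ultimately reduces to evaluating threshold rules against metric streams. A toy sketch of one such rule with a hold period, loosely modeled on the Prometheus-style `for:` clause (all names and thresholds here are illustrative, not from any listing):

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    name: str
    threshold: float   # alert when the observed value exceeds this
    for_samples: int   # consecutive breaches required before firing

def evaluate(rule: AlertRule, samples: list) -> bool:
    """Fire only if the last `for_samples` observations all breach the
    threshold, which suppresses alerts on transient spikes."""
    if len(samples) < rule.for_samples:
        return False
    return all(v > rule.threshold for v in samples[-rule.for_samples:])

rule = AlertRule(name="high_error_rate", threshold=0.05, for_samples=3)
print(evaluate(rule, [0.01, 0.06, 0.07, 0.09]))  # True: last 3 samples breach
print(evaluate(rule, [0.06, 0.07, 0.01]))        # False: latest sample is healthy
```

The hold period is the main lever for trading alert latency against false positives, the same trade-off tuned in New Relic alert conditions.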

Posted 3 weeks ago

Apply

The same Smart Working DevOps Engineer listing (5.0 years, 0 Lacs, Remote, posted 3 weeks ago) also appears for Chennai (Tamil Nadu), Hyderabad (Telangana), India-wide, Mumbai (Maharashtra), Pune (Maharashtra), and Ahmedabad (Gujarat).

0.0 - 5.0 years

10 - 20 Lacs

Musheerabad, Hyderabad, Telangana

On-site

As the Senior DevOps Engineer focused on Observability, you will set observability standards, lead automation efforts, and mentor engineers, ensuring all monitoring and Datadog configuration changes are implemented as Infrastructure-as-Code (IaC). You will lead the design and management of a code-driven Datadog observability platform, providing end-to-end visibility into Java applications, Kubernetes workloads, and containerized infrastructure. This role emphasizes cost-effective observability at scale, requiring deep expertise in Datadog monitoring, logging, tracing, and optimization techniques. You'll collaborate closely with SRE, DevOps, and Software Engineering teams to standardize monitoring and logging practices and deliver scalable, reliable, and cost-efficient observability solutions. This is a hands-on engineering role focused on observability-as-code: all monitoring, logging, alerting, and Datadog configurations are defined and managed through Terraform, APIs, and CI/CD workflows rather than manual configuration in the Datadog UI.

PRIMARY RESPONSIBILITIES:
Own and define observability standards for Java applications, Kubernetes workloads, and cloud infrastructure. Configure and manage the Datadog platform using Terraform and Infrastructure-as-Code (IaC) best practices. Drive adoption of structured JSON logging, distributed tracing, and custom metrics across Java and Python services. Optimize Datadog usage through cost governance, log filtering, sampling strategies, and automated reporting. Collaborate closely with Java developers and platform engineers to standardize instrumentation and alerting. Troubleshoot and resolve issues with missing or misconfigured logs, metrics, and traces, working with developers to ensure proper instrumentation and data flow into Datadog. Participate in incident response, using Datadog insights for actionable alerting, root cause analysis (RCA), and reliability improvements. Serve as the primary point of contact for Datadog-related requests, supporting internal teams with onboarding, integration, and usage questions. Continuously audit and tune monitors for alert quality, reducing false positives and improving actionable signal detection. Maintain clear internal documentation on Datadog usage, standards, integrations, and IaC workflows. Evaluate and propose improvements to the observability stack, including new Datadog features, OpenTelemetry adoption, and future architecture changes. Mentor engineers and develop internal training programs on Datadog, observability-as-code, and modern log pipeline architecture.

QUALIFICATIONS:
Bachelor’s degree in Computer Science, Engineering, Mathematics, Physics, or a related technical field. 5+ years of experience in DevOps, Site Reliability Engineering, or related roles with a strong focus on observability and infrastructure as code. Hands-on experience managing and scaling Datadog programmatically using code-based workflows (e.g., Terraform, APIs, CI/CD). Deep expertise in Datadog, including APM, logs, metrics, tracing, dashboards, and audit trails. Proven experience integrating Datadog observability into CI/CD pipelines (e.g., GitLab CI, AWS CodePipeline, GitHub Actions). Solid understanding of AWS services and best practices for monitoring services on Kubernetes infrastructure. Strong background in Java application development is preferred.

Job Types: Full-time, Permanent, Contractual / Temporary. Contract length: 12 months. Pay: ₹1,000,000.00 - ₹2,000,000.00 per year. Benefits: Paid sick time. Schedule: Monday to Friday, night shift (US shift). Ability to commute/relocate: Musheerabad, Hyderabad, Telangana: reliably commute or plan to relocate before starting work (Preferred). Education: Bachelor's (Preferred). Experience: DevOps: 5 years (Required). Language: English (Required). Location: Musheerabad, Hyderabad, Telangana (Preferred). Shift availability: Night Shift (Required). Work Location: In person. Expected Start Date: 21/07/2025
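The structured JSON logging this role pushes for means emitting one JSON object per log line, so a pipeline like Datadog can parse fields without grok-style regexes. A minimal stdlib sketch in Python (field names such as `ctx` are an illustrative convention, not a Datadog requirement):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Attach any structured context passed via the `extra=` kwarg.
        if hasattr(record, "ctx"):
            payload.update(record.ctx)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("order placed", extra={"ctx": {"order_id": 42, "region": "ap-south-1"}})
```

Because every field is already key-value structured, downstream log filtering and sampling (the cost-governance levers named above) can match on attributes instead of raw text.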

Posted 3 weeks ago

Apply

3.0 years

0 Lacs

Delhi, India

Remote

Position Overview: We are looking for a passionate and skilled AI/ML Engineer with strong MLOps expertise to join our product engineering team. You will be responsible for developing and deploying scalable machine learning solutions that power our content-commerce-collaboration platform used by creators, brands, and consumers.

Responsibilities:
- Design, train, and optimize ML/DL models for personalization, content understanding, search, recommendations, and fraud detection.
- Develop multimodal pipelines handling video, image, audio, and text inputs.
- Create embedding workflows and integrate with vector databases like Pinecone for real-time inference.
- Architect scalable, asynchronous inference systems using Docker, ECS Fargate, S3, Step Functions, SQS, and Lambda.
- Build CI/CD pipelines using GitHub Actions and AWS CodePipeline for ML lifecycle automation.
- Monitor model performance using Prometheus, Grafana, OpenTelemetry, and CloudWatch.
- Develop reusable infrastructure templates for logging, versioning, and evaluation.
- Secure and manage data using AWS services, including S3, EFS, ElastiCache, and RDS.
- Troubleshoot and resolve complex system issues.

Requirements:
- 3+ years of experience in ML engineering with proven MLOps exposure.
- Proficiency in Python with frameworks like TensorFlow, PyTorch, and Scikit-learn.
- Experience with Docker, AWS (ECS, Fargate, Lambda, S3), and CI/CD pipelines.
- Familiarity with gRPC microservices, REST APIs, and async job processing.
- Hands-on experience with vector databases such as Pinecone or FAISS.
- Strong problem-solving and debugging skills.
- Proactive communicator with the ability to work both independently and collaboratively.

Nice to have: Experience with NestJS or Node.js, streaming media embedding, and observability tools (OpenTelemetry, X-Ray, ELK stack).

What We Offer:
- Opportunity to work on cutting-edge tech across our media, commerce, and social stack.
- Flat hierarchy and a fast-paced product innovation cycle.
- Wellness support, flexible hours, and a remote-first policy.

About Creator Bridge: Creato is a next-generation social commerce platform integrating content, collaboration, and e-commerce. Our mission is to empower creators, brands, and consumers by providing a seamless ecosystem where content meets commerce.
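The embedding-and-retrieval workflow this posting describes can be sketched with a toy in-memory index. This is only an illustration: the cosine-similarity lookup below stands in for a managed vector database such as Pinecone, and the item ids and vectors are invented.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class InMemoryIndex:
    """Toy stand-in for a vector database such as Pinecone."""
    def __init__(self):
        self._items = {}  # id -> embedding

    def upsert(self, item_id, embedding):
        self._items[item_id] = embedding

    def query(self, embedding, top_k=3):
        # Return the top_k closest ids by cosine similarity.
        scored = [(cosine(embedding, v), k) for k, v in self._items.items()]
        scored.sort(reverse=True)
        return [k for _, k in scored[:top_k]]

# Hypothetical content embeddings (a real pipeline would produce
# these from video/image/audio/text models).
index = InMemoryIndex()
index.upsert("video-1", [1.0, 0.0, 0.0])
index.upsert("video-2", [0.0, 1.0, 0.0])
index.upsert("video-3", [0.9, 0.1, 0.0])

print(index.query([1.0, 0.05, 0.0], top_k=2))  # → ['video-1', 'video-3']
```

In a production system the upsert/query calls would go to the vector database's client library, with the same embed-then-rank shape shown here.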

Posted 3 weeks ago

Apply

5.0 years

0 Lacs

India

Remote

Job Title: Data Engineer – Observability & Insights Platform
Location: Remote
Employment Type: Contract
Experience: 5+ years

Key Responsibilities:
1. Observability Signal Correlation: Integrate and analyze logs, metrics, and traces using Grafana and Prometheus; enrich observability data with business context for deeper insights.
2. Data Enrichment & Pipeline Development: Build and maintain data pipelines that enhance technical signals with business metadata; leverage OpenTelemetry (OTel) for observability instrumentation.
3. Machine Learning Integration: Design, build, and deploy ML models for anomaly detection, forecasting, and incident noise reduction; continuously improve ML models for greater accuracy and business value.
4. Disruption Prediction & Risk Mitigation: Identify trends and patterns to predict business disruptions and support preemptive actions.
5. Action Enablement: Deliver actionable observability insights through dashboards and tools; support both automated and manual decision-making.
6. Cross-Functional Collaboration: Work closely with IT, DevOps, and Business teams to align technical implementations with business goals.
7. Continuous Improvement: Monitor and optimize data pipelines for accuracy, reliability, and performance.

Required Skills & Qualifications:
- Proven experience as a Data Engineer or similar role focused on observability and analytics
- Strong proficiency in SQL and Python
- Hands-on experience with Google Cloud Platform (GCP)
- Expertise in BigQuery, Grafana, Prometheus, and Splunk
- Familiarity with OpenTelemetry (OTel)
- Experience with big data tools such as Apache Spark, Kafka, and Airflow

Machine Learning & Analytical Expertise:
- Applied machine learning techniques for anomaly detection and forecasting
- Ability to reduce alert noise and provide high-value insights
- Strong analytical skills for interpreting complex datasets

Soft Skills:
- Excellent communication and collaboration skills
- Strong problem-solving mindset
- Passion for using data to solve real business problems
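The anomaly-detection and alert-noise-reduction duties in this posting can be illustrated with about the simplest possible approach, a z-score filter over a metric series. The threshold and sample latency values below are invented for the example; production systems would use far more robust models.

```python
import statistics

def detect_anomalies(values, threshold=3.0):
    """Return indices of points whose z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # a flat series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

# Hypothetical request latencies (ms) with one obvious spike.
latencies_ms = [120, 118, 125, 122, 119, 121, 980, 123, 120]
print(detect_anomalies(latencies_ms, threshold=2.0))  # → [6]
```

Only the flagged indices would be surfaced as alerts, which is the basic mechanism behind reducing alert noise: score every point, page only on the extremes.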

Posted 3 weeks ago

Apply

3.0 years

0 Lacs

Telangana, India

On-site

Ignite the Future of Language with AI at Teradata!

What You'll Do: Shape the Way the World Understands Data
At Teradata, we're not just managing data; we're unleashing its full potential. Our ClearScape Analytics™ platform and pioneering Enterprise Vector Store are empowering the world's largest enterprises to derive unprecedented value from their most complex data. We're rapidly pushing the boundaries of what's possible with Artificial Intelligence, especially in the exciting realm of autonomous and agentic systems. We're building intelligent systems that go far beyond automation — they observe, reason, adapt, and drive complex decision-making across large-scale enterprise environments. As a member of our AI engineering team, you'll play a critical role in designing and deploying advanced AI agents that integrate deeply with business operations, turning data into insight, action, and measurable outcomes. You'll work alongside a high-caliber team of AI researchers, engineers, and data scientists tackling some of the hardest problems in AI and enterprise software — from scalable multi-agent coordination and fine-tuned LLM applications to real-time monitoring, drift detection, and closed-loop retraining systems. If you're passionate about building intelligent systems that are not only powerful but observable, resilient, and production-ready, this role offers the opportunity to shape the future of enterprise AI from the ground up.

Who You'll Work With: Join Forces with the Best
Imagine collaborating daily with some of the brightest minds in the company – individuals who champion diversity, equity, and inclusion as fundamental to our success. You'll be part of a cohesive force, laser-focused on delivering high-quality, critical, and highly visible AI/ML functionality within the Teradata Vantage platform. Your insights will directly shape the future of our intelligent data solutions. You'll report directly to the inspiring Sr. Manager, Software Engineering, who will champion your growth and empower your contributions.

What Makes You a Qualified Candidate: Skills in Action
- Experience working with modern data platforms like Teradata, Snowflake, and Databricks
- Passion for staying current with AI research, especially in the areas of reasoning, planning, and autonomous systems
- You are an excellent backend engineer who codes daily and owns systems end-to-end
- Strong engineering background (Python/Java/Golang, API integration, backend frameworks)
- Strong system design skills and understanding of distributed systems
- You're obsessive about reliability, debuggability, and ensuring AI systems behave deterministically when needed
- Hands-on experience with machine learning and deep learning frameworks: TensorFlow, PyTorch, Scikit-learn
- Hands-on experience with LLMs, agent frameworks (LangChain, AutoGPT, ReAct, etc.), and orchestration tools
- Experience with AI observability tools and practices (e.g., logging, monitoring, tracing, metrics for AI agents or ML models)
- Solid understanding of model performance monitoring, drift detection, and responsible AI principles

What You Bring: Passion and Potential
- A Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field – your academic foundation is key
- A genuine excitement for AI and large language models (LLMs) is a significant advantage – you'll be working at the cutting edge!
- Ability to design, develop, and deploy agentic systems integrated into the data platform
- 3+ years of experience in software architecture, backend systems, or AI infrastructure
- Experience in software development (Python, Go, or Java preferred)
- Familiarity with backend service development, APIs, and distributed systems
- Interest or experience in LLMs, autonomous agents, or AI tooling
- Familiarity with containerized environments (Docker, Kubernetes) and CI/CD pipelines
- Experience with AI observability tools and practices (e.g., logging, monitoring, tracing, metrics for AI agents or ML models)
- Ability to build dashboards and metrics pipelines that track key AI system indicators: latency, accuracy, tool invocation success, hallucination rate, and failure modes
- Ability to integrate observability tooling (e.g., OpenTelemetry, Prometheus, Grafana) with LLM-based workflows and agent pipelines
- Strong knowledge of LLMs, RL, or cognitive architectures is highly desirable
- Passion for building safe, human-aligned autonomous systems
- Bonus: Research experience or contributions to open-source agentic frameworks
- You're knowledgeable about open-source tools and technologies and know how to leverage and extend them to build innovative solutions

Why We Think You'll Love Teradata
We prioritize a people-first culture because we know our people are at the very heart of our success. We embrace a flexible work model because we trust our people to make decisions about how, when, and where they work. We focus on well-being because we care about our people and their ability to thrive both personally and professionally. We are an anti-racist company because our dedication to Diversity, Equity, and Inclusion is more than a statement. It is a deep commitment to doing the work to foster an equitable environment that celebrates people for all of who they are. Teradata invites all identities and backgrounds in the workplace. We work with deliberation and intent to ensure we are cultivating collaboration and inclusivity across our global organization. We are proud to be an equal opportunity and affirmative action employer. We do not discriminate based upon race, color, ancestry, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related conditions), national origin, sexual orientation, age, citizenship, marital status, disability, medical condition, genetic information, gender identity or expression, military and veteran status, or any other legally protected status.
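The tool-invocation metrics this posting calls out (latency, success and failure counts per agent step) can be sketched with a stdlib decorator. This is a stand-in only: a production system would export these measurements through OpenTelemetry or a Prometheus client rather than a plain dict, and the `search_tool` function here is hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps

# Toy metrics registry keyed by tool name.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})

def instrumented(name):
    """Record call count, error count, and latency for a tool/agent step."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                METRICS[name]["errors"] += 1
                raise
            finally:
                METRICS[name]["calls"] += 1
                METRICS[name]["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator

@instrumented("search_tool")
def search_tool(query):
    # Hypothetical agent tool; fails on empty input.
    if not query:
        raise ValueError("empty query")
    return f"results for {query}"

search_tool("teradata")
try:
    search_tool("")
except ValueError:
    pass

print(METRICS["search_tool"]["calls"], METRICS["search_tool"]["errors"])  # → 2 1
```

The same wrap-every-invocation pattern is how tracing instrumentation attaches spans to agent steps; only the export target changes.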

Posted 3 weeks ago

Apply

35.0 years

0 Lacs

Chennai, Tamil Nadu, India

On-site

About us
One team. Global challenges. Infinite opportunities. At Viasat, we're on a mission to deliver connections with the capacity to change the world. For more than 35 years, Viasat has helped shape how consumers, businesses, governments and militaries around the globe communicate. We're looking for people who think big, act fearlessly, and create an inclusive environment that drives positive impact to join our team.

What you'll do
The Enterprise Architecture team is focused on providing solutions to enable an effective software engineering workforce that can scale to the business needs. This includes exploring how the business needs map to the application portfolio, business processes, APIs, and data elements across the organization. As a member of this team, you will build up a vast knowledge in software development, cloud application engineering, automation, and container orchestration. Our ideal candidate values communication, learning, adaptability, creativity, and ingenuity. They also enjoy working on challenging technical issues and use creative, innovative techniques to develop and automate solutions. This team is focused on providing our executive and business leadership with visibility into how the software organization is functioning and what opportunities lie ahead to transform the business. In this position you will be an integral part of the Enterprise Architecture team and make meaningful impacts in our journey towards digital transformation.

The day-to-day
A strong data model and high-quality data are prerequisites for better insights and solid data-driven decision making. They are also key to taking advantage of various technological advances in Artificial Intelligence and Machine Learning. Your responsibilities will involve building out data models for various aspects of our enterprise in conjunction with domain experts. Examples include, but are not limited to, Network, Capacity, Finance, and Business Support Systems. Responsibilities also include working with software product teams to improve data quality across the organization.

What you'll need
- Bachelor's degree or higher in Computer Science & Applications, Computer & Systems Engineering, Computer Science & Engineering, Computer Science & Mathematics, Computer Science & Network Security, Math & Computer Science, or a related field
- Solid understanding of Data Architecture and Data Engineering principles
- Experience building out data models
- Experience performing data analysis and presenting data in an easy-to-comprehend manner
- Experience working with relational databases, NoSQL, and large-scale data technologies (Kafka, BigQuery, Snowflake, etc.)
- Experience with digital transformation across multiple cloud platforms like AWS and GCP; experience modernizing data platforms, especially in GCP, is highly preferred
- Ability to partner with members of the Data Platform team and others to build out a Data Catalog and map it to the data model
- Detail orientation, to ensure that the catalog represents quality data
- Solid communication skills and ability to work on a distributed team
- Tenacity to remain focused on the mission and overcome obstacles
- Ability to perform hands-on work with development teams and guide them in building the necessary data models
- Experience setting up governance structures and changing organizational culture through influence

What will help you on the job
- Experience with cloud technologies: AWS, GCP, and/or Azure
- Expertise in GCP data services like Cloud Pub/Sub, Dataproc, Dataflow, BigQuery, and related technologies
- Experience with Airflow, DBT, and SQL
- Experience with open-source software like Logstash, the ELK stack, Telegraf, Prometheus, and OpenTelemetry is a plus
- Passion for delivering solutions that improve developer experience and promote API-first principles and microservices architecture
- Experience with Enterprise Architecture and related principles
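The data-quality work this role describes can be illustrated with a minimal record validator. Everything here is invented for the sketch: the `CAPACITY_SCHEMA` fields are hypothetical, and a real implementation would validate against a governed data catalog rather than a hard-coded dict.

```python
# Hypothetical schema for one enterprise domain:
# field name -> (expected type, required?)
CAPACITY_SCHEMA = {
    "network_id": (str, True),
    "region": (str, True),
    "capacity_gbps": (float, True),
    "notes": (str, False),
}

def validate_record(record, schema):
    """Return a list of data-quality problems found in one record."""
    problems = []
    for field, (ftype, required) in schema.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], ftype):
            problems.append(f"wrong type for {field}: expected {ftype.__name__}")
    return problems

good = {"network_id": "net-7", "region": "apac", "capacity_gbps": 42.5}
bad = {"network_id": "net-8", "capacity_gbps": "a lot"}

print(validate_record(good, CAPACITY_SCHEMA))  # → []
print(validate_record(bad, CAPACITY_SCHEMA))
```

Running checks like this inside the ingestion pipeline, and reporting the failure counts back to the owning product teams, is one concrete way "improving data quality across the organization" gets operationalized.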

Posted 3 weeks ago

Apply

4.0 - 9.0 years

0 Lacs

Hyderabad, Telangana, India

On-site

Role: Senior Staff Engineer
Location: Hyderabad (Work From Office)
Experience: 4 to 9 years

- Backend & Frontend Expertise: Strong proficiency in Python and FastAPI for microservices; strong in TypeScript/Node.js for GraphQL/RESTful API interfaces.
- Cloud & Infra Application: Hands-on AWS experience; proficient with existing Terraform. Working knowledge of Kubernetes/Argo CD for deployment and troubleshooting.
- CI/CD & Observability: Designs and maintains GitHub Actions pipelines; implements OpenTelemetry for effective monitoring and debugging.
- System Design: Experience designing and owning specific microservices (APIs, data models, integrations).
- Quality & Testing: Drives robust unit, integration, and E2E testing; leads code reviews.
- Mentorship: Guides junior engineers and leads technical discussions for features.

Posted 3 weeks ago

Apply