Company Description

BugRaid.AI harnesses advanced AIOps and AI bots to proactively manage and respond to incidents. Our solution integrates comprehensive incident analysis with real-time response capabilities, which distinguishes us within the industry. We expedite resolution by swiftly identifying and addressing issues to minimize downtime and improve efficiency. Our platform is engineered for scalability and flexibility, providing in-depth insights through comprehensive analytics to support informed decision-making.

Role Description

This is a full-time remote position for a Senior AI Engineer. The Senior AI Engineer will be responsible for designing, developing, and optimizing AI systems focused on AWS, Large Language Models (LLMs), Generative AI (GenAI), and AIOps. Responsibilities include building and deploying AI models, analyzing their performance, and integrating AI solutions with existing systems.

Responsibilities

- Design Agentic Systems: Develop and enhance multi-step reasoning agents that operate on live infrastructure and observability data (logs, metrics, traces).
- Develop Real-World GenAI Applications: Incorporate LLMs into production environments demanding low latency and high availability.
- Prompt Engineering & Orchestration: Create tools for function calling, agent workflows, and dynamic prompt tuning.
- Conduct Experiments & Deployment: Rapidly prototype, evaluate, and refine LLMs, agent workflows, and evaluation mechanisms.
- Monitor & Assess: Establish observability and continuous evaluation pipelines to ensure the reliability of AI agents in practical scenarios.
- Foster Collaboration: Work closely with the backend, infrastructure, and product teams to embed AI agents within the core BugRaid system.

Qualifications

- 3+ years of experience in software engineering, preferably within AI/ML, GenAI, or data engineering roles.
- Proficiency in Python and its machine learning/AI ecosystem.
- Hands-on experience with AWS, LLMs, or agent frameworks such as LangChain or CrewAI.
- Familiarity with observability data and incident troubleshooting workflows.
- Practical experience in developing distributed systems, data pipelines, or real-time machine learning infrastructure.
- Demonstrated initiative and capability to transition ideas from prototypes to production environments.

Additional Considerations:

- Experience with model fine-tuning, RLHF, or contributions to open-source AI agents.
- Knowledge of AWS, Azure, Terraform, Kubernetes, or platforms for ML orchestration.
- Contribution to or development of large-scale GenAI platforms or systems ensuring AI reliability.

Perquisites & Benefits

- Culture emphasizing remote work (within India), supported by team hubs located in Hyderabad and Bangalore.
- Competitive startup compensation complemented by generous Employee Stock Ownership Plans (ESOPs), emphasizing ownership.
- Collaboration with passionate engineers, AI specialists, and DevOps leaders from leading organizations.
- Significant opportunity to create impactful AI products that transform global infrastructure operations.
Location: Hyderabad/Bangalore/Singapore

About BugRaid.AI

Incidents are the silent killers of modern enterprises. Every minute of downtime means lost revenue, lost trust, and engineers under fire. BugRaid.AI is building the world’s first enterprise-ready incident copilot: intelligent, agentic systems that can detect, diagnose, and resolve complex production incidents across massive streams of logs, metrics, traces, and alerts. Our vision: autonomous reliability for enterprise infrastructure. Think AIOps meets generative AI, but designed from day one for security, scale, and production readiness.

Role Overview

We are seeking a Senior AI Engineer or Research Scientist with practical experience in developing and deploying AI/ML systems at scale. Your contributions will drive BugRaid.AI’s agentic intelligence, allowing our platform to conduct root cause analysis (RCA) and automated remediation in demanding enterprise settings. This position bridges AI research and production engineering. You will experiment with LLMs, reinforcement learning, and synthetic data, and turn these concepts into enterprise-grade, scalable, and dependable systems.

Responsibilities

- Enterprise-Ready AI Development: Design, implement, and deploy AI agents that operate reliably under enterprise constraints, such as multi-cloud environments, zero data egress, and compliance requirements.
- LLM & Agent Research: Develop and test reasoning workflows, prompting strategies, and tool-use policies specifically tailored for large-scale observability data.
- Training & Alignment: Use techniques such as fine-tuning, RLHF, and reward modeling to align AI behavior with real SRE workflows rather than toy datasets.
- Synthetic Data Generation: Create scalable pipelines for producing synthetic incidents (logs, metrics, traces) used for training and benchmarking in scenarios with limited data.
- Cross-Team Collaboration: Collaborate closely with infrastructure engineers, SREs, and product leaders to incorporate AI research into the BugRaid.AI platform.
- Production Excellence: Design systems with built-in observability, monitoring, and CI/CD processes, ensuring failover, rollback, and resilience capabilities.
- Frontier Tracking: Keep up with the latest research in LLMs, agentic AI, and reliability, and incorporate these innovations into enterprise production environments.

Requirements

- 5+ years of production experience building and deploying ML/AI systems at scale.
- PhD or Master’s in Computer Science, Machine Learning, or related fields (or equivalent experience).
- Strong foundation in deep learning, reinforcement learning, or robotics.
- Hands-on experience with LLM fine-tuning, RLHF, or synthetic data generation.
- Proven track record of shipping AI systems into production (not just research prototypes).
- Experience with cloud-native architectures (AWS/Azure/GCP), Kubernetes, and MLOps tooling (SageMaker, MLflow, Ray, Kubeflow).
- Comfort working with large-scale data streams (Kinesis, Kafka, Flink) and vector databases (OpenSearch, Pinecone, Weaviate).
- Strong collaboration skills and ability to communicate across research, engineering, and product teams.

Nice to Have

- Background in observability (logs, metrics, traces) or large-scale production debugging.
- Experience designing secure AI systems (data privacy, zero-data egress, compliance frameworks like HIPAA/FedRAMP).
- Contributions to open-source agent frameworks (LangChain, LlamaIndex, Haystack, Ray Serve).
- Experience with real-time ML serving (low-latency inference, batch + streaming pipelines).
- Prior experience in enterprise SaaS or reliability-focused products.

Why Join BugRaid.AI

- Work on high-impact AI challenges that sit at the intersection of AI, DevOps, and enterprise reliability.
- Shape the core intelligence layer of a product designed to be enterprise-ready from day one.
- Collaborate with a world-class founding team experienced in AI, DevOps, and large-scale systems.
- Competitive compensation, equity (ESOPs), and a chance to join at an early stage with an outsized impact.
- A culture that values research rigor, engineering excellence, and execution speed.