6 AI/ML Infrastructure Jobs

JobPe aggregates results for easy application access, but you actually apply on the job portal directly.

14.0 - 18.0 years

0 Lacs

Karnataka

On-site

As the VP of Engineering, you will lead engineering, DevOps, and infrastructure teams to develop a robust and scalable cloud platform. You will collaborate closely with leadership to drive technical strategy, execute product roadmaps, and optimize cloud orchestration for GPUs, storage, and compute. Your role demands expertise in cloud-native architectures, networking, and large-scale distributed systems while effectively managing and mentoring teams. You will define and execute the engineering vision for the sovereign cloud and GPU-as-a-service platform. Architect and scale cloud infrastructure to ensure high availability, security, and performance. Keep abreast of emerging technologies in cloud orchestration, networking, and AI workloads. Seamlessly orchestrate data centers by integrating storage, compute, and networking components. Lead engineering teams proficient in Go (Golang), Python, Kubernetes, and distributed systems. Oversee the development of high-performance computing (HPC) infrastructure and GPU-based workload management. Drive efficiency in storage and compute orchestration, optimizing for performance and cost. Ensure robust network orchestration for data centers to enable multi-region cloud deployments. Build and mentor engineering, DevOps, and SRE teams to foster a high-performance culture. Collaborate with product management, operations, and business stakeholders to align technical decisions with company goals. Create an engineering roadmap focused on scalability, reliability, and automation. Ensure Kubernetes-based orchestration for compute, storage, and networking workloads. Advocate CI/CD best practices, Infrastructure-as-Code (IaC), and cloud automation. Optimize cloud-native architectures while ensuring regulatory compliance for sovereign cloud environments. Enhance monitoring, observability, and security across the cloud infrastructure. 
Your technical skills should include deep expertise in Kubernetes, container orchestration, and microservices architecture. Strong experience with data center network orchestration and software-defined networking (SDN). Proficiency in Go (Golang) and Python for building distributed systems. Understanding of storage and compute orchestration in large-scale environments. Experience with GPU-accelerated computing, AI/ML infrastructure, and HPC workloads. Hands-on experience with CI/CD, GitOps, and DevSecOps best practices. Experience managing multi-region sovereign cloud environments with regulatory compliance. Expertise in observability, logging, and monitoring tools such as Prometheus, Grafana, and ELK. Proven experience leading engineering, DevOps, and cloud infrastructure teams. Ability to scale engineering teams, set KPIs, and drive technical excellence. Experience managing high-performance distributed teams across multiple regions. Strong ability to align technical execution with business strategy. Preferred qualifications include 14+ years of experience in cloud infrastructure, distributed systems, or software engineering. Prior experience in sovereign cloud, hyperscale cloud, or GPU-based infrastructure is a plus. A Bachelor's or Master's degree in Computer Science, Engineering, or related fields is preferred.

Posted 6 days ago

4.0 - 8.0 years

0 Lacs

Karnataka

On-site

About CodeRabbit

CodeRabbit is an innovative research and development company focused on building extraordinarily productive human-machine collaboration systems. Our primary goal is to create the next generation of Gen AI-driven code reviewers: a symbiotic partnership between humans and advanced algorithms that significantly outperforms individual engineers. We combine language models with human ingenuity to push the boundaries of software development efficiency and quality.

The Role

We are seeking an experienced Site Reliability Engineer to join our Platform Engineering team in Bangalore. You'll be instrumental in ensuring the high availability, performance, and scalability of CodeRabbit's AI-powered code review platform. This role sits at the intersection of software engineering and systems operations, where you'll build the foundational platforms and automation that enable our engineering teams to deploy, monitor, and scale our services reliably.

As an SRE at CodeRabbit, you'll be responsible for enhancing the reliability of our critical services that process millions of code reviews, building sophisticated automation platforms, and owning the infrastructure that powers our AI-driven analysis engine. You'll work with cutting-edge technologies including large language models, real-time processing systems, and distributed architectures that operate at significant scale.

Key Responsibilities

Infrastructure & Platform Ownership
- Design, implement, and maintain scalable infrastructure on Google Cloud Platform to support CodeRabbit's growing user base and processing demands
- Own and operate critical platform services
- Build and maintain Infrastructure as Code using Terraform to ensure consistent, reproducible, and version-controlled infrastructure deployments

Reliability & Performance Engineering
- Establish and maintain SLI/SLO frameworks for all critical services, ensuring we meet our reliability commitments to users
- Implement comprehensive monitoring, alerting, and observability solutions using Datadog and custom instrumentation
- Conduct thorough incident response, root cause analysis, and post-mortem processes to continuously improve system reliability
- Optimize application and infrastructure performance to handle millions of pull request analyses with minimal latency
- Design and implement chaos engineering practices to proactively identify and resolve system weaknesses

Automation & Developer Experience
- Develop self-service platforms and tooling that empower engineering teams to deploy, monitor, and troubleshoot their services independently
- Automate operational tasks including scaling, backup/recovery, security patching, and routine maintenance
- Create and maintain infrastructure APIs and abstractions that simplify complex operations for development teams

Security & Compliance
- Integrate security best practices into all infrastructure and platform services
- Implement and maintain security monitoring, vulnerability scanning, and compliance reporting
- Design secure network architectures including VPC configuration, firewall rules, and access control systems
- Establish and maintain disaster recovery procedures and business continuity planning

Required Qualifications

Experience & Background
- 4-6 years of hands-on experience in Site Reliability Engineering, Platform Engineering, or DevOps Engineering roles
- Proven track record of managing production systems at scale, preferably in high-growth technology companies
- Experience with cloud platforms, particularly AWS or Google Cloud Platform (GCP), including compute, storage, networking, and managed services
- Strong background in containerization and orchestration platforms (Kubernetes, Docker)

Technical Skills
- Programming Languages: Proficiency in Node.js and TypeScript for building automation tools, monitoring solutions, and platform services
- Infrastructure as Code: Advanced experience with Terraform for infrastructure provisioning and management
- Monitoring & Observability: Hands-on experience with Datadog or similar platforms (Prometheus, Grafana, ELK stack) for observability
- Cloud Platforms: Comprehensive experience with GCP services including Compute Engine, GKE, Cloud Run, Cloud SQL, Cloud Storage, Load Balancing, and IAM

Systems & Operations
- Strong Linux/Unix systems skills
- Experience with network protocols, load balancing, and CDN technologies
- Knowledge of security principles and best practices for cloud infrastructure
- Familiarity with CI/CD tools and practices (Jenkins, GitLab CI, GitHub Actions)
- Understanding of microservices architecture and distributed systems principles

Preferred Qualifications
- Experience with AI/ML infrastructure and tools
- Background in managing high-traffic web applications and API services
- Experience with disaster recovery planning and execution
- Familiarity with compliance frameworks (SOC 2, ISO 27001)
- Contributions to open-source infrastructure or SRE tooling projects
- Experience with cost optimization and FinOps practices
- Knowledge of performance testing and capacity planning methodologies

What You'll Bring

Technical Excellence
- Strong problem-solving skills with the ability to debug complex distributed systems issues
- Systematic approach to troubleshooting with excellent attention to detail
- Passion for automation and eliminating toil through intelligent tooling and processes
- Understanding of software engineering principles and ability to write production-quality code

Collaboration & Communication
- Excellent communication skills with the ability to work effectively across engineering, product, and business teams
- Ability to translate complex technical concepts into business impact and user value
- Strong documentation skills and commitment to knowledge sharing

Growth Mindset
- Enthusiasm for continuous learning and staying current with emerging technologies
- Ability to thrive in a fast-paced, rapidly evolving startup environment
- Proactive mindset with the ability to identify and solve problems before they impact users
- Commitment to building inclusive, diverse, and collaborative team environments

Our Values
- Collaborative Humans: Prioritizing collective intelligence
- Fearless Innovators: Turning obstacles into growth opportunities
- Persistent, Passionate Developers: Thriving on complex, long-term challenges
- Impact-Driven Creators: Crafting intuitive tools for developers
- Rapid Learners and Un-learners: Adapting quickly in our fast-paced technological world

What we offer
- Work on cutting-edge technology with real-world impact
- Collaborative and innovative environment
- Competitive salary, equity, and benefits
- Professional development opportunities

To apply, submit your resume and relevant project samples or GitHub profiles. CodeRabbit is an equal-opportunity employer committed to diversity and inclusion.

Posted 1 month ago

8.0 - 12.0 years

0 Lacs

Pune, Maharashtra

On-site

The tech space is crowded, but most solutions feel like they're cut from the same cloth, missing the mark on what businesses truly need. AT Dawn Technologies is a niche company laser-focused on delivering specialized tech solutions in Gen AI, Data Engineering, and Backend Systems. What sets us apart is our commitment to the technologists: the creators, problem-solvers, and innovators who bring true value to transformation. We are seeking an experienced Lead Solutions Architect with deep expertise in AI/ML infrastructure, High Performance Computing (HPC), and container platforms to join our dynamic team. You will play a key role in architecting, deploying, and optimizing private cloud environments that leverage co-developed solutions with NVIDIA and validated reference architectures to support enterprise-grade AI workloads at scale. As the Lead Solutions Architect, you will provide delivery assurance, serve as the lead design authority, and align solution architecture with NVIDIA Enterprise AI Factory design principles. Your responsibilities will include overseeing planning, risk management, and stakeholder alignment throughout the project lifecycle to ensure successful outcomes. You will architect and optimize end-to-end solutions across container orchestration and HPC workload management domains, ensuring seamless integration of container and AI platforms with the broader software ecosystem. Additionally, you will lead technical responses to RFPs and customer inquiries, conduct proof-of-concept engagements, and assess customer infrastructure to recommend optimal configurations using validated reference architectures. To be successful in this role, you must have extensive knowledge of HPC technologies, containerization technologies, GPU technologies, Linux system administration, virtualization technologies, hybrid cloud architectures, DevOps practices, networking principles, programming languages, and automation.
You should also possess strong problem-solving, analytical thinking, and communication skills, along with the ability to lead complex technical projects and align technical solutions with client challenges and objectives. A bachelor's or master's degree in Computer Science, Information Technology, or a related field, along with professional certifications in AI Infrastructure, Containers, and Kubernetes, is highly desirable. You should have 8-10 years of hands-on experience in architecting and implementing HPC, AI/ML, and container platform solutions within hybrid or private cloud environments. Join us at AT Dawn Technologies and be part of a workplace culture centered on innovation, ownership, and excellence. We are an equal-opportunity employer that celebrates diversity and inclusion. Come redefine the future with us!

Posted 1 month ago

5.0 - 9.0 years

0 Lacs

Karnataka

On-site

As the Senior Product Manager for Platform at Enterpret, you will play a crucial role in defining and executing the strategic vision, product strategy, and roadmap for the core platform. This platform serves as the foundation of Enterpret's product, consolidating customer feedback from various sources and transforming it into valuable insights through the Knowledge Graph infrastructure and Adaptive Taxonomy engine, among others. Your key responsibilities will include driving the strategy by leading the product vision, roadmap, and overall platform development. You will be responsible for delivering a robust and performant platform that provides near real-time, high-quality, predictive insights to customers while ensuring developer productivity and customer satisfaction at scale. Collaboration with engineering and product leadership is essential to make architectural decisions that enhance performance, scalability, reliability, security, and cost efficiency. You will also work cross-functionally to understand platform needs across different product teams, align on roadmap dependencies, and ensure the platform continues to support and accelerate overall product development. Translating complex technical concepts into clear product requirements and owning key success metrics such as latency, scalability, reliability, cost, and internal developer velocity will be part of your role. Additionally, you will invest in improving developer experience through observability, documentation, and tooling to facilitate faster and higher-quality development by Enterpret teams. As a champion of platform-as-a-product, you will promote the platform's capabilities internally and externally, ensuring that shared services are well-understood, adopted, and designed with a customer-centric and metrics-driven approach. Your role will be instrumental in driving Enterpret's platform to new heights and maintaining its position as a key asset in delivering trusted insights to customers.

Posted 1 month ago

10.0 - 14.0 years

0 Lacs

Telangana

On-site

As the Vice President of Engineering at Teradata in India, you will be responsible for leading the software development organization for the AI Platform Group. This includes overseeing the execution of the product roadmap for key technologies such as Vector Store, Agent platform, Apps, user experience, and AI/ML-driven use-cases. Your success in this role will be measured by your ability to build a world-class engineering culture, attract and retain technical talent, accelerate product delivery, and drive innovation that brings tangible value to customers. In this role, you will lead a team of over 150 engineers with a focus on helping customers achieve outcomes with Data and AI. Collaboration with key functions such as Product Management, Product Operations, Security, Customer Success, and Executive Leadership will be essential to your success. You will also lead a regional team of up to 500 individuals, including software development, cloud engineering, DevOps, engineering operations, and architecture teams. Collaboration with various stakeholders at regional and global levels will be a key aspect of your role. To be considered a qualified candidate for this position, you should have at least 10 years of senior leadership experience in product development or engineering within enterprise software product companies. Additionally, you should have a minimum of 3 years of experience in a VP Product or equivalent role managing large-scale technical teams in a growth market. You must have a proven track record of leading agentic AI development and scaling AI in a hybrid cloud environment, as well as experience with Agile and DevSecOps methodologies. Your background should include expertise in cloud platforms, data harmonization, data analytics for AI, Kubernetes, containerization, and microservices-based architectures. 
Experience in delivering SaaS-based data and analytics platforms, modern data stack technologies, AI/ML infrastructure, enterprise security, and performance engineering is also crucial. A passion for open-source collaboration, building high-performing engineering cultures, and inclusive leadership is highly valued. Ideally, you should hold a Master's degree in Engineering or Computer Science, or an MBA. At Teradata, we prioritize a people-first culture, offer a flexible work model, focus on well-being, and are committed to Diversity, Equity, and Inclusion. Join us in our mission to empower our customers and drive innovation in the world of AI and data analytics.

Posted 1 month ago

10.0 - 14.0 years

0 Lacs

Telangana

On-site

As the Vice President of Engineering at Teradata, you will be responsible for leading the India-based software development organization within the AI Platform Group. Your main focus will be on executing the product roadmap for key technologies such as Vector Store, Agent platform, Apps, user experience, and AI/ML-driven use-cases at scale. Success in this role will involve building a world-class engineering culture, attracting and retaining top technical talent, accelerating hybrid cloud-first product delivery, and driving innovation that brings measurable value to customers. You will be leading a team of over 150 engineers with the goal of helping customers achieve outcomes with Data and AI. Collaboration with Product Management, Product Operations, Security, Customer Success, and Executive Leadership will be key aspects of your role. Additionally, you will work closely with a high-impact regional team of up to 500 people, including software development, cloud engineering, DevOps, engineering operations, and architecture teams. To qualify for this position, you should have over 10 years of senior leadership experience in product development, engineering, or technology leadership within enterprise software product companies. You should also have at least 3 years of experience in a VP Product or equivalent role managing large-scale technical teams in a growth market. Experience in leading the development of agentic AI and scaling AI in a hybrid cloud environment is essential. Success in implementing and scaling Agile and DevSecOps methodologies, as well as modernizing legacy architectures into service-based systems, will be key qualifications. Your background should include expertise in cloud platforms, data harmonization, data analytics for AI, Kubernetes, containerization, and microservices-based architectures. 
Experience in delivering SaaS-based data and analytics platforms, along with familiarity with modern data stack technologies, AI/ML infrastructure, enterprise security, data governance, and API-first design, will be beneficial. Additionally, a track record of building high-performing engineering cultures and inclusive leadership teams, and a passion for open-source collaboration, are desired qualities. A Master's degree in Engineering or Computer Science, or an MBA, is preferred for this role. At Teradata, we prioritize a people-first culture, embrace a flexible work model, focus on well-being, and are committed to Diversity, Equity, and Inclusion. Join us in our dedication to fostering an equitable environment that celebrates individuals for all aspects of who they are.

Posted 1 month ago
