Job Description
Provide technical leadership in the design, development, and maintenance of scalable build systems and deployment pipelines for AI/ML components, setting standards for quality, reliability, and performance.
Mentor and guide a team of engineers, promoting best practices in C++, Python, CI/CD, and infrastructure automation.
Design and implement robust build automation systems that support large, distributed AI/C++/Python codebases.
Develop tools and scripts that let developers and researchers rapidly iterate, test, and deploy across diverse environments.
Integrate C++ components with Python-based AI workflows, ensuring compatibility, performance, and maintainability.
Lead the creation of portable, reproducible development environments, ensuring parity between development and production systems.
Maintain and extend CI/CD pipelines for Linux and z/OS, applying best practices in automated testing, artifact management, and release validation.
Collaborate with cross-functional teams (including AI researchers, system architects, and mainframe engineers) to align infrastructure with strategic and technical goals.
Proactively monitor and improve build performance, automation coverage, and system reliability, identifying opportunities for innovation and optimization.
Contribute to internal documentation, process improvements, and knowledge sharing to scale impact across teams and foster a culture of continuous improvement.

Required education
Bachelor's Degree

Preferred education
Bachelor's Degree

Required technical and professional expertise
Expert-level programming skills in C++ and Python, with a strong grasp of both compiled and interpreted language paradigms; able to provide architectural guidance and code-level mentorship.
Demonstrated leadership in building and maintaining complex automation pipelines (CI/CD) using tools like Jenkins or GitLab CI, including the ability to define strategy, review team contributions, and drive implementation.
In-depth experience with build tools and systems such as CMake, Make, Meson, or Ninja, including development of custom scripts and support for cross-compilation in heterogeneous environments.
Proven experience leading multi-platform development efforts, particularly on Linux and IBM z/OS, with a deep understanding of platform-specific toolchains, constraints, and performance considerations.
Expertise in integrating native C++ code with Python using tools like pybind11 or Cython, ensuring high-performance, maintainable interoperability across language boundaries.
Strong diagnostic and debugging skills, with the ability to lead teams in resolving build-time, runtime, and integration issues in large-scale, multi-component systems.
Proficiency in shell scripting (e.g., Bash, Zsh) and system-level operations, with the ability to coach others in scripting best practices.
Familiarity with containerization technologies like Docker, and a track record of leading the adoption or optimization of container-based development and deployment workflows.
Excellent communication and collaboration skills, with the ability to coordinate across disciplines, align technical efforts with strategic goals, and foster a high-performing engineering culture.

Preferred technical and professional experience
Working knowledge of AI/ML frameworks such as PyTorch, TensorFlow, or ONNX, with an understanding of how to integrate them into scalable, production-grade environments and the ability to guide teams in best practices for deployment and optimization.
Experience developing or maintaining software on IBM z/OS mainframe systems, with the ability to mentor others in navigating legacy-modern hybrid ecosystems.
Familiarity with z/OS build and packaging workflows, including leading efforts to streamline and modernize tooling where appropriate.
Solid understanding of system performance tuning in high-throughput compute and I/O environments (e.g., large-scale model training or inference pipelines), and the ability to direct optimization strategies.
Knowledge of GPU computing and low-level profiling/debugging tools, with experience driving performance-critical initiatives.
Experience managing long-lifecycle enterprise systems, ensuring forward and backward compatibility across releases and deployments through proactive planning and versioning strategies.
Background contributing to or maintaining open-source projects in infrastructure, DevOps, or AI tooling domains, with a focus on community engagement and sustainability.
Proficiency in distributed systems, microservice architectures, and REST APIs, including guiding architectural decisions that balance performance, maintainability, and scalability.
Proven experience leading the integration of MLOps pipelines with CI/CD frameworks, ensuring seamless, secure, and automated deployment of AI/ML models into production workflows.
Exceptional communication and stakeholder management skills, capable of clearly articulating technical strategies and trade-offs to non-technical audiences.
Demonstrated ability to foster collaboration and alignment across diverse, cross-functional teams, including AI researchers, DevOps engineers, and enterprise architects.
Track record of ensuring compliance with industry standards, security policies, and best practices in enterprise-scale AI engineering.
Commitment to maintaining high standards of code quality, performance, and security, with the ability to review and enforce standards across a team or organization.