About Us
CLOUDSUFI, a Google Cloud Premier Partner, is a Data Science and Product Engineering organization building products and solutions for the technology and enterprise industries. We firmly believe in the power of data to transform businesses and help them make better decisions. We combine unmatched experience in business processes with cutting-edge infrastructure and cloud services. We partner with our customers to monetize their data and make enterprise data dance.
Our Values
We are a passionate and empathetic team that prioritizes human values. Our purpose is to elevate the quality of life for our families, customers, partners, and the community.
Equal Opportunity Statement
CLOUDSUFI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified candidates receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, or national origin. We provide equal opportunities in employment, advancement, and all other areas of our workplace. Learn more at https://www.cloudsufi.com/.
Job Summary
We are seeking a highly innovative and skilled AI Engineer to join our AI Center of Excellence (CoE) for the Data Integration Project. The ideal candidate will design, develop, and deploy intelligent assets and AI agents that automate and optimize the stages of the data ingestion and integration pipeline. This role requires expertise in machine learning, natural language processing (NLP), knowledge representation, and cloud platform services, with a strong focus on building scalable and accurate AI solutions.
Key Responsibilities
- LLM-based Auto-schematization: Develop and refine LLM-based models and techniques for automatically inferring schemas from diverse unstructured and semi-structured public datasets and mapping them to a standardized vocabulary.
- Entity Resolution & ID Generation AI: Design and implement AI models for highly accurate entity resolution, matching new entities with existing IDs and generating unique, standardized IDs for newly identified entities.
- Automated Data Profiling & Schema Detection: Develop AI/ML accelerators for automated data profiling, pattern detection, and schema detection to understand data structure and quality at scale.
- Anomaly Detection & Smart Imputation: Create AI-powered solutions for identifying outliers, inconsistencies, and corrupt records, and for intelligently filling missing values using machine learning algorithms.
- Multilingual Data Integration AI: Develop AI assets for accurately interpreting, translating (leveraging automated tools with human-in-the-loop validation), and semantically mapping data from diverse linguistic sources, preserving meaning and context.
- Validation Automation & Error Pattern Recognition: Build AI agents that run comprehensive checks with data validation tools, identify common error types, suggest fixes, and automate common error corrections.
- Knowledge Graph RAG/RIG Integration: Integrate Retrieval Augmented Generation (RAG) and Retrieval Interleaved Generation (RIG) techniques to enhance querying capabilities and facilitate consistency checks within the Knowledge Graph.
- MLOps Implementation: Implement and maintain MLOps practices for the lifecycle management of AI models, including versioning, deployment, monitoring, and retraining on a relevant AI platform.
- Code Generation & Documentation Automation: Develop AI tools for generating reusable scripts, templates, and comprehensive import documentation to streamline development.
- Continuous Improvement Systems: Design and build learning systems, feedback loops, and error analytics mechanisms to continuously improve the accuracy and efficiency of AI-powered automation over time.
Required Skills And Qualifications
- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Machine Learning, or a related quantitative field.
- Proven experience (e.g., 3+ years) as an AI/ML Engineer, with a strong portfolio of deployed AI solutions.
- Strong expertise in Natural Language Processing (NLP), including experience with Large Language Models (LLMs) and their applications in data processing.
- Proficiency in Python and relevant AI/ML libraries (e.g., TensorFlow, PyTorch, scikit-learn).
- Hands-on experience with cloud AI/ML services.
- Understanding of knowledge representation, ontologies (e.g., Schema.org, RDF), and knowledge graphs.
- Experience with data quality, validation, and anomaly detection techniques.
- Familiarity with MLOps principles and practices for model deployment and lifecycle management.
- Strong problem-solving skills and an ability to translate complex data challenges into AI solutions.
- Excellent communication and collaboration skills.
Preferred Qualifications
- Experience with data integration projects, particularly with large-scale public datasets.
- Familiarity with knowledge graph initiatives.
- Experience with multilingual data processing and AI.
- Contributions to open-source AI/ML projects.
- Experience in an Agile development environment.
Benefits
- Opportunity to work on a high-impact project at the forefront of AI and data integration.
- Help solidify a leading data initiative's role as a foundational source for grounding large models.
- Access to cutting-edge cloud AI technologies.
- Collaborative, innovative, and fast-paced work environment.
- Significant impact on data quality and operational efficiency.
Skills: Natural Language Processing (NLP), Large Language Model (LLM) tuning, Machine Learning (ML), Retrieval Augmented Generation (RAG), Python, and Generative AI