Generative AI Engineer, Vision-Language Model (VLM)

Neuronest AI Pvt Ltd

1.0 - 5.0 years

0 Lacs

Coimbatore, All India

On-site

As a Vision-Language Model Developer, you will develop, fine-tune, and evaluate vision-language models such as CLIP, Flamingo, BLIP, GPT-4V, and LLaVA. You will design and build multimodal pipelines that integrate image/video input with natural language understanding or generation. Working with large-scale image-text datasets such as LAION, COCO, and Visual Genome for training and validation will be a key part of the role. You will also implement zero-shot and few-shot multimodal inference, retrieval, captioning, visual question answering (VQA), grounding, and related tasks. Collaboration with product teams, ML engineers, and data scientists to deliver...

Posted 2 days ago

Generative AI Engineer, Vision-Language Model (VLM)

Neuronest AI Pvt Ltd

1.0 - 5.0 years

0 Lacs

Coimbatore, Tamil Nadu

On-site

As a Vision-Language Model Developer, you will develop, fine-tune, and evaluate vision-language models such as CLIP, Flamingo, BLIP, GPT-4V, and LLaVA. You will design and build multimodal pipelines that integrate image/video input with natural language understanding or generation. Your responsibilities will include working with large-scale image-text datasets such as LAION, COCO, and Visual Genome for training and validation. You will also implement zero-shot and few-shot multimodal inference, retrieval, captioning, visual question answering (VQA), grounding, and related tasks. Collaboration with product teams, ML engineers, and data scientists is essential to deliver real-world multimodal appl...

Posted 2 months ago

Generative AI Engineer

Neuronest AI Pvt Ltd

1.0 - 5.0 years

0 Lacs

Coimbatore, Tamil Nadu

On-site

You will develop, fine-tune, and evaluate vision-language models such as CLIP, Flamingo, BLIP, GPT-4V, and LLaVA. Your role will involve designing and building multimodal pipelines that fuse image/video inputs with natural language comprehension or generation. Working with extensive image-text datasets such as LAION, COCO, and Visual Genome for training and validation will be a key part of your job. You will also implement zero-shot and few-shot multimodal inference, retrieval, captioning, visual question answering (VQA), grounding, and related tasks. It is essential to collaborate closely with product teams, ML engineers, and data scientists to deliver practical multimo...

Posted 3 months ago
