Generative AI Engineer – Vision-Language Model (VLM)

Neuronest AI Pvt Ltd

Experience: 1 - 5 years

Salary: Not disclosed

Location: Coimbatore (All India)

Posted: 2 days ago | Platform: Shine

Skills Required

Python, OpenCV, computer vision, NLP, transformers, PyTorch, HuggingFace Transformers, torchvision, VLM architectures, dataset curation, image-caption pair processing, image-text embedding strategies, cross-attention mechanisms, contrastive learning

Work Mode

On-site

Job Type

Full Time

Job Description

As a Vision-Language Model Developer, you will develop, fine-tune, and evaluate vision-language models such as CLIP, Flamingo, BLIP, GPT-4V, and LLaVA. You will design and build multimodal pipelines that integrate image or video input with natural language understanding or generation. A key part of the role is working with large-scale image-text datasets such as LAION, COCO, and Visual Genome for training and validation. You will also implement zero-shot and few-shot multimodal inference, retrieval, captioning, visual question answering (VQA), and grounding. You will collaborate with product teams, ML engineers, and data scientists to deliver real-world multimodal applications, and you will optimize inference performance and resource utilization in production using tools such as ONNX and TensorRT. You will conduct error analysis and ablation studies and propose improvements to visual-language alignment. If you work in a research-driven team, you are expected to contribute to research papers, documentation, or patents.

Qualifications required for this role: a Bachelor's, Master's, or PhD in Computer Science, AI, Machine Learning, or a related field; at least 2 years of experience in computer vision or NLP, including at least 1 year in multimodal ML or VLMs; strong Python programming skills, with experience in libraries such as PyTorch, HuggingFace Transformers, OpenCV, and torchvision; familiarity with VLM architectures such as CLIP, BLIP, Flamingo, LLaVA, Kosmos, and GPT-4V; experience with dataset curation, image-caption pair processing, and image-text embedding strategies; and a solid understanding of transformers, cross-attention mechanisms, and contrastive learning.

This is a full-time, in-person position with a day-shift schedule.

More Jobs at Neuronest AI Pvt Ltd

Full Stack Developer

India

Experience: 1 year

Salary: Not disclosed

Full Stack Web Developer

India

Experience: Not specified

INR 3 - 4 Lacs

Telesales Executive

Gandhipuram, Coimbatore, Tamil Nadu

Experience: Not specified

Salary: Not disclosed

Telesales Executive

India

Experience: Not specified

Salary: Not disclosed

Generative AI Engineer – Vision-Language Model (VLM)

Coimbatore

Experience: 2 years

Salary: Not disclosed
