5.0 - 9.0 years
0 Lacs
chennai, all india
On-site
As an engineer in this role, you will be responsible for building and optimizing high-throughput, low-latency LLM inference infrastructure using open-source models such as Qwen, LLaMA, and Mixtral on multi-GPU systems like A100/H100. Your ownership will include performance tuning, model hosting, routing logic, speculative decoding, and cost-efficiency tooling. Key Responsibilities: - Deep experience with vLLM, tensor/pipeline parallelism, and KV cache management - Strong grasp of CUDA-level inference bottlenecks, FlashAttention2, and quantization - Familiarity with FP8, INT4, and speculative decoding (e.g., TwinPilots, PowerInfer) - Proven ability to scale LLMs across multi-GPU nodes using TP, D...
Posted 3 days ago
6.0 - 10.0 years
0 Lacs
chennai, tamil nadu
On-site
You are a Java Backend Developer with 7+ years of experience, skilled in Java, Spring Boot, microservices, and GCP/AWS, with a strong background in microservices development using Spring Boot. You must have excellent written and verbal communication skills to collaborate effectively with the domain and technical experts on the team. **Responsibilities:** - Maintain an active relationship with the Product Owner to understand business requirements - Lead requirement-gathering meetings and review designs with the Product Owner - Own backlog items and coordinate with other team members to develop planned features for each sprint - Perform technical design reviews and code reviews - Responsible f...
Posted 1 month ago
3.0 - 7.0 years
0 Lacs
karnataka
On-site
Role Overview: You will be responsible for building responsive and scalable web applications using ReactJS. Working with React Hooks such as useState, useEffect, and useContext, you will manage state using Redux or Context API. It will be your duty to implement advanced React patterns like HOCs, render props, lazy loading, and code-splitting. Additionally, you will develop and maintain backend services using Node.js and integrate the frontend with backend systems and third-party APIs via RESTful APIs. You will translate UI designs from Figma/Sketch into pixel-perfect React components using Material UI or similar libraries. Writing unit/integration tests using Jest, React Testing Library, or ...
Posted 2 months ago
5.0 - 9.0 years
0 Lacs
karnataka
On-site
Role Overview: You will be responsible for designing, developing, and maintaining scalable web applications using Python, Angular/React, and cloud technologies. Your role will involve collaborating with cross-functional teams to deliver new features and working with real-time data streaming and messaging platforms like Kafka or GCP Pub/Sub. Key Responsibilities: - Develop and maintain end-to-end web applications using Python and Angular/React. - Design and implement REST APIs and integrate them with frontend components. - Work with Kafka or GCP Pub/Sub for real-time data streaming and messaging. - Handle data modeling and queries in SQL and NoSQL databases. - Collaborate with cross-functiona...
Posted 2 months ago
8.0 - 12.0 years
0 - 0 Lacs
hyderabad, telangana
On-site
Two contractor positions are open in Hyderabad, India, requiring the following skill set: - 8-12 years of software development experience - Strong proficiency in Python, MongoDB, GCP/AWS, and full-stack development - Previous work experience in building SaaS products - Experience working on fast-paced projects, with a demonstrated ability to deliver independently with high accuracy and commitment. The package offered for this position ranges from 20 L to 45 L depending on your experience.
Posted 2 months ago
5.0 - 9.0 years
0 Lacs
chennai, tamil nadu
On-site
As an engineer in this role, you will be responsible for building and optimizing high-throughput, low-latency LLM inference infrastructure. This will involve using open-source models such as Qwen, LLaMA, and Mixtral on multi-GPU systems like A100/H100. Your main areas of focus will include performance tuning, model hosting, routing logic, speculative decoding, and cost-efficiency tooling. To excel in this position, you must have deep experience with vLLM, tensor/pipeline parallelism, and KV cache management. A strong understanding of CUDA-level inference bottlenecks, FlashAttention2, and quantization is essential. Additionally, familiarity with FP8, INT4, and speculative decoding (e.g., TwinPilo...
Posted 3 months ago