Senior Data Science Engineer

7.0 years

0.0 Lacs P.A.

Gurugram, Haryana, India

Posted: 1 week ago | Platform: LinkedIn

Skills Required

data, development, power, design, learning, reliability, checks, pricing, ids, processing, analyze, redis, analysis, elasticsearch, extract, model, optimization, latency, collaboration, architecture, api, service, monitoring, stats, drive, scalability, statistics, tensorflow, pytorch, algorithms, regression, clustering, forecasting, networks, apache, spark, hadoop, kafka, analytics, querying, indexing, programming, python, numpy, dask, aws, azure, gcp, scaling, sql, nosql, postgresql, mongodb, extraction, communication, reinforcement

Work Mode

On-site

Job Type

Full Time

Job Description

As a Senior Data Science Engineer in IOL’s Data Team, you will lead the development of advanced predictive models to power a smart caching layer for our B2B hospitality marketplace. Handling an unprecedented scale of data (2 billion searches, 1 billion price verifications, and 100 million bookings daily), you will design machine learning solutions to predict search patterns and prefetch data from 3P suppliers, reducing their infrastructure load and improving system reliability. This role demands deep expertise in big data, machine learning, and distributed systems, as well as the ability to architect scalable, data-driven solutions in a fast-paced environment.

The Challenge

IOL operates a high-traffic B2B marketplace that matches hotel room supply with demand. Our platform processes:

Searches: 2 billion daily queries for hotel prices based on hotel ID, room type, check-in date, length of stay, and party size.
Price Verifications: 1 billion daily checks to confirm pricing.
Bookings: 100 million daily bookings.

Key Responsibilities

Predictive Modeling: Design and implement machine learning models to predict high-demand search patterns based on historical data (e.g., hotel IDs, room types, dates, and party sizes).
Big Data Processing: Develop scalable data pipelines to process and analyze massive datasets (2 billion searches daily) using distributed computing frameworks.
Smart Caching Layer: Architect and optimize a predictive cache prefetcher that proactively populates the cache cluster (Redis) with high-value data during 3P off-peak hours.
Data Analysis: Leverage Elasticsearch and ES Searches Log to extract insights from search patterns, seasonal trends, and user behavior.
Model Optimization: Continuously refine predictive models to handle the massive permutations of search parameters, ensuring high accuracy and low latency.
Collaboration: Work with the Data Team, platform engineers, and 3P proxy teams to integrate models into the existing architecture (Load Balancer, API Gateway, Service Router, Cache Cluster).
Performance Monitoring: Monitor cache hit/miss ratios, model accuracy, and system performance, using tools like Cache Stats Collector to drive optimization.
Scalability: Ensure models and pipelines scale horizontally to handle increasing data volumes and traffic spikes.
Innovation: Stay updated on advancements in machine learning, big data, and distributed systems, proposing novel approaches to enhance predictive capabilities.

Required Skills & Qualifications

Education: Master’s or Ph.D. in Data Science, Computer Science, Statistics, or a related field.

Experience:
o 7+ years of experience in data science, with a focus on machine learning and predictive modeling.
o 5+ years of hands-on experience processing and analyzing big data sets (terabyte-scale or larger) in distributed environments.
o Proven track record of building and deploying machine learning models in production for high-traffic systems.

Technical Skills:
o Deep expertise in machine learning frameworks (e.g., TensorFlow, PyTorch, Scikit-learn) and algorithms (e.g., regression, clustering, time-series forecasting, neural networks).
o Extensive experience with big data technologies (e.g., Apache Spark, Hadoop, Kafka) for distributed data processing.
o Proficiency in Elasticsearch for search and analytics, including querying and indexing large datasets.
o Strong programming skills in Python, with experience in data science libraries (e.g., Pandas, NumPy, Dask).
o Familiarity with Redis or similar in-memory data stores for caching.
o Knowledge of cloud platforms (e.g., AWS, Azure, GCP) for deploying and scaling data pipelines.
o Experience with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB) for data extraction and transformation.
o Proficiency in designing and optimizing data pipelines for high-throughput, low-latency systems.

Problem-Solving: Exceptional ability to tackle complex problems, such as handling massive permutations of search parameters and predicting trends in dynamic datasets.
Communication: Strong written and verbal communication skills to collaborate with cross-functional teams and present insights to stakeholders.
Work Style: Self-motivated, proactive, and able to thrive in a fast-paced, innovative environment.

Preferred Skills

o Experience in the hospitality or travel industry, particularly with search or booking systems.
o Familiarity with real-time data streaming and event-driven architectures (e.g., Apache Kafka, Flink).
o Knowledge of advanced time-series forecasting techniques for seasonal and cyclical data.
o Exposure to reinforcement learning or online learning for dynamic model adaptation.
o Experience optimizing machine learning models for resource-constrained environments (e.g., edge devices or low-latency systems).

MatchLab Talent
