We are seeking a motivated and results-driven Junior Data Scientist with a passion for problem-solving, data analytics, and natural language processing (NLP). In this role, you will use your skills in Python, SQL, machine learning, and NLP to develop data-driven solutions that inform business decisions and insights. You will be part of a dynamic team and work closely with senior data scientists and business stakeholders to create and optimize models, perform data analysis, and automate processes.
As a Junior Data Scientist, you will have the opportunity to work on a variety of projects, ranging from predictive modelling to text analysis and cloud-based data solutions. You should have a solid foundation in logical thinking and problem-solving, and the ability to apply your knowledge to real-world challenges.
Roles & Responsibilities
- Work independently to manage multiple projects at once while ensuring deadlines are met and data output is accurate and appropriate for the business.
- Implement newer algorithms according to business needs.
- Collaborate with cross-functional agile teams of software engineers, product managers, and others to build new product features.
- Think strategically about data as a core enterprise asset and assist in all phases of the advanced analytics development process.
Key Technical Skills:
Programming:
- Strong proficiency in Python, with experience in libraries like pandas, numpy, scikit-learn, networkx, and TensorFlow for data manipulation and machine learning tasks.
- Proficient in building and deploying high-performance APIs using FastAPI and Python, ensuring fast response times and scalability.
- Familiarity with Linux/Unix/Shell environments.
- Experience with Data Structures and Algorithms.
Machine Learning:
- Hands-on experience with supervised learning (e.g., regression, classification), unsupervised learning (e.g., clustering, PCA), and ensemble methods (e.g., random forests, gradient boosting).
- Familiarity with model evaluation metrics (e.g., accuracy, precision, recall, F1-score, ROC-AUC) and techniques for hyperparameter tuning (e.g., GridSearchCV, RandomizedSearchCV).
Natural Language Processing (NLP):
- Understanding of key NLP tasks like text classification, sentiment analysis, named entity recognition (NER), and text summarization using traditional techniques (e.g., TF-IDF, bag-of-words) and pre-trained models.
- Experience using LLMs such as GPT, BERT, and T5 for text generation, question answering, and document classification, using libraries like Hugging Face Transformers.
- Ability to fine-tune pre-trained LLMs on domain-specific data and evaluate model performance using metrics like accuracy, F1-score, BLEU score, and perplexity, optimizing models for production use.
SQL & Database Management:
- Strong SQL skills for querying relational databases (e.g., PostgreSQL, MySQL) and managing large datasets.
- Experience in writing complex queries for data extraction, transformation, and aggregation.
Azure Cloud Services:
- Familiarity with Azure Databricks, Azure Blob Storage, and Azure SQL Database for data processing and storage.
- Basic knowledge of Azure Data Factory for building and orchestrating data pipelines.
Data Visualization:
- Experience creating visualizations with matplotlib, seaborn, and plotly for data exploration and model results presentation.
- Familiarity with dashboarding tools like Power BI or Tableau (optional).
Git:
- Proficient in creating and managing feature branches, resolving merge conflicts, and merging code into the main branch using Git merge and Git rebase.
- Experienced with pull requests (PRs), reviewing and commenting on code, and working within Git workflows (e.g., GitFlow or feature branching) to ensure smooth team collaboration.
Required Qualifications:
- A bachelor's or master's degree in computer science, statistics, mathematics, or a related field.
- 12-18 months of professional experience as a Data Scientist, working with machine learning, data analysis, NLP, and cloud technologies.
- Strong communication skills, with the ability to explain complex data science concepts to non-technical stakeholders.
- Ability to work independently and as part of a team in a fast-paced environment.
Nice-to-Have Skills:
- Experience with model deployment and CI/CD pipelines.
- Familiarity with big data technologies such as Spark or Hadoop.
- Experience with Docker for containerization and Kubernetes for model orchestration.