PySpark Developer Lead

Experience: 3 years

Work Mode: Remote

Job Type: Full Time

Job Description

Company Description

Hybrowlabs Technologies is dedicated to building software better and faster. We evaluate every tool that hits the market to assemble the best stack for software development. Our magical formula and curated toolset will accelerate your software development process. Contact us to learn more.


Role Description

Solution Architecture & Technical Leadership
  • Design and architect scalable, robust big data solutions using PySpark and related technologies
  • Lead the technical vision for data processing pipelines and analytics platforms
  • Create comprehensive solution architectures aligned with business requirements and technical constraints
  • Design integration patterns for connecting various data sources, APIs, and downstream systems
  • Establish and enforce coding standards, best practices, and design patterns across the team
  • Conduct architecture reviews and provide technical guidance on complex implementation challenges
Hands-On Development
  • Develop high-performance PySpark applications for large-scale data processing and transformation
  • Optimize existing PySpark jobs for performance, cost-efficiency, and scalability (a minimal sketch follows this list)
  • Write efficient, maintainable, and well-documented code that serves as a reference for the team
  • Troubleshoot and resolve complex technical issues in production environments
  • Implement data quality frameworks and validation mechanisms
  • Build reusable components and libraries to accelerate development
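
To give a flavour of this hands-on work, here is a minimal sketch of a typical PySpark aggregation job using the kinds of optimizations the role calls for: filtering before the join, broadcasting a small dimension table, adaptive query execution, and partitioned output. The storage paths, table names, and columns are hypothetical, for illustration only.

```python
# Hypothetical daily-aggregation job; paths, tables, and columns are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("daily-transaction-totals")
    # Let adaptive query execution coalesce shuffle partitions and handle skew.
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

transactions = spark.read.parquet("s3://example-bucket/transactions/")  # large fact table
merchants = spark.read.parquet("s3://example-bucket/merchants/")        # small dimension table

daily_totals = (
    transactions
    .filter(F.col("status") == "SETTLED")       # prune rows before the join
    .join(broadcast(merchants), "merchant_id")  # broadcast avoids shuffling the fact table
    .groupBy("merchant_id", "txn_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("txn_count"),
    )
)

# Partition the output by date so downstream readers can prune whole directories.
daily_totals.write.mode("overwrite").partitionBy("txn_date").parquet(
    "s3://example-bucket/aggregates/daily_totals/"
)
```
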
Team Leadership & Mentorship
  • Provide technical mentorship and guidance to junior and mid-level developers
  • Conduct code reviews ensuring quality, performance, and adherence to standards
  • Foster a collaborative environment that encourages knowledge sharing and innovation
  • Lead technical discussions and facilitate problem-solving sessions
  • Guide the team in adopting new technologies and methodologies
Solution Design & Delivery
  • Collaborate with business analysts and stakeholders to translate requirements into technical solutions
  • Create detailed technical specifications and design documents
  • Estimate effort, identify risks, and plan technical deliverables
  • Drive proof-of-concepts (POCs) for evaluating new technologies and approaches
  • Ensure timely delivery of high-quality solutions meeting functional and non-functional requirements
Integration & Collaboration
  • Design and implement integration solutions with various data platforms (databases, data lakes, cloud storage)
  • Work closely with DevOps teams to establish CI/CD pipelines for data applications
  • Collaborate with data engineers, data scientists, and analytics teams to build end-to-end solutions
  • Interface with enterprise architects to ensure alignment with organizational standards
🔧 Required Technical Skills

Core Expertise
  • PySpark: 2–3+ years of hands-on experience building production-grade applications
  • Python: Strong programming skills with deep understanding of Python ecosystems and libraries
  • Apache Spark: Comprehensive knowledge of Spark architecture, internals, and optimization techniques
  • Big Data Technologies: Experience with the Hadoop ecosystem, HDFS, Hive, or similar platforms
Data Processing & Engineering
  • Expertise in designing and implementing ETL/ELT pipelines at scale
  • Strong SQL skills and experience with both relational and NoSQL databases
  • Proficiency in data modeling, schema design, and data warehouse concepts
  • Experience with data partitioning, bucketing, and optimization strategies (illustrated in the sketch after this list)
  • Knowledge of data quality frameworks and testing methodologies
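
As an illustration of the partitioning and bucketing strategies listed above, the sketch below writes an events table partitioned by date and bucketed by user ID, so date filters prune directories and joins on user_id can avoid a shuffle. The database, table, paths, and columns are assumptions, not details from this posting.

```python
# Illustrative partitioned-and-bucketed write; all names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("partitioning-demo")
    .enableHiveSupport()  # persist bucketing metadata in the metastore
    .getOrCreate()
)

events = spark.read.parquet("s3://example-bucket/raw/events/")

(
    events.write
    .mode("overwrite")
    .partitionBy("event_date")  # directory-level pruning for date filters
    .bucketBy(64, "user_id")    # co-locates rows with the same user_id
    .sortBy("user_id")
    .format("parquet")
    .saveAsTable("analytics.events")  # bucketing requires saveAsTable; the
                                      # "analytics" database must already exist
)
```
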
Cloud & Infrastructure
  • Experience with cloud platforms (AWS, Azure, or GCP) and their big data services
  • Familiarity with distributed computing concepts and cluster management
  • Understanding of containerization (Docker) and orchestration (Kubernetes) is a plus
  • Knowledge of cloud-native data services (S3, Azure Data Lake, BigQuery, etc.)
Architecture & Design
  • Proven track record in designing scalable, resilient data architectures
  • Experience with microservices architecture and API design
  • Understanding of data governance, security, and compliance requirements
  • Familiarity with streaming technologies (Kafka, Spark Streaming) is advantageous (see the streaming sketch after this list)
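
For the streaming piece, below is a hedged sketch of the common Kafka-to-data-lake pattern with Spark Structured Streaming. It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic, and message schema are made up for illustration.

```python
# Hypothetical Kafka -> data lake stream; broker, topic, and schema are assumed.
# Requires the org.apache.spark:spark-sql-kafka-0-10 connector package.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

orders = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    # Kafka delivers raw bytes; decode the value and parse the JSON payload.
    .select(F.from_json(F.col("value").cast("string"), schema).alias("o"))
    .select("o.*")
)

query = (
    orders.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streams/orders/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
    .trigger(processingTime="1 minute")  # micro-batches once a minute
    .start()
)
query.awaitTermination()
```
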
Tools & Frameworks
  • Version control systems (Git, Bitbucket, GitHub)
  • CI/CD tools (Jenkins, GitLab CI, Azure DevOps)
  • Workflow orchestration tools such as Airflow and Databricks Workflows (a minimal DAG sketch follows this list)
  • Monitoring and logging tools (ELK stack, Splunk, CloudWatch)
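
For orchestration, a minimal Airflow DAG that submits a daily PySpark job might look like the sketch below. The DAG id, schedule, script path, and connection id are assumptions; the `schedule` argument applies to Airflow 2.4+ (older releases use `schedule_interval`).

```python
# Hypothetical Airflow DAG; ids, paths, and schedule are illustrative.
# Requires the apache-airflow-providers-apache-spark package.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_transaction_totals",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    run_job = SparkSubmitOperator(
        task_id="run_daily_totals",
        application="/opt/jobs/daily_totals.py",  # the PySpark script to submit
        conn_id="spark_default",                  # points at the Spark cluster
    )
```
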
🎓 Required Qualifications

Experience
  • Total IT Experience: 6+ years in software development and data engineering roles
  • PySpark Experience: Minimum 2–3 years of dedicated PySpark development
  • Leadership Experience: Demonstrated experience leading technical teams or projects
  • Solution Design: Proven experience in end-to-end solution design and architecture
Education
  • Bachelor's or Master's degree in Computer Science, Information Technology, Engineering, or related field
  • Relevant certifications (Databricks, AWS/Azure/GCP, or Spark certifications) are highly desirable
✨ Desired Skills & Attributes

Technical
  • Experience with real-time/streaming data processing
  • Knowledge of machine learning pipelines and MLOps
  • Familiarity with modern data platforms (Databricks, Snowflake, Delta Lake)
  • Understanding of data mesh or data fabric architectures
  • Experience with infrastructure as code (Terraform, CloudFormation)
Soft Skills
  • Leadership: Ability to inspire and guide technical teams toward excellence
  • Communication: Excellent verbal and written communication skills for technical and non-technical audiences
  • Problem-Solving: Strong analytical thinking and creative problem-solving abilities
  • Collaboration: Proven ability to work effectively across multiple teams and stakeholders
  • Adaptability: Comfortable working in fast-paced, evolving environments
  • Ownership: Takes accountability for technical decisions and project outcomes
🌟 What You'll Work On
  • Designing next-generation data platforms and analytics solutions
  • Building scalable data pipelines processing terabytes of data daily
  • Architecting integrations across diverse enterprise systems
  • Optimizing existing systems for performance and cost-efficiency
  • Implementing best practices for data quality, governance, and security
  • Mentoring team members and elevating overall technical capabilities
  • Driving innovation through POCs and adoption of emerging technologies
📍 Work Arrangement
  • Primary: Remote work with flexibility
  • Office Visits: Periodic visits to the Mumbai office for team collaboration, planning sessions, and stakeholder meetings (frequency to be determined based on project needs)
  • Flexibility: Results-oriented culture with a focus on delivery and collaboration
