Senior Python Developer

2 - 4 years

4 - 8 Lacs

Posted: 22 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Role Overview

We are looking for an experienced Python Developer with expertise in large-scale dataset validation and automation pipelines.

The role involves designing scalable, production-grade scripts to process and validate image, video, and audio datasets; detect anomalies (UUID mismatches, metadata issues, frequency-check failures, file corruption); and automate reporting and error-handling workflows.

The ideal candidate will have strong software engineering skills, experience with cloud platforms (AWS/GCP/Azure), and experience building robust data validation pipelines for high-volume media datasets.

Key Responsibilities

  • Build advanced validation frameworks for image, video, and audio datasets (metadata checks, file validation, resolution and format checks, corruption detection, UUID validation).
  • Develop automated pipelines for validation, error reporting, and summary dashboards.
  • Integrate cloud storage (AWS S3, Google Cloud Storage, Azure Blob Storage) for direct dataset processing.
  • Implement parallel processing and multiprocessing to handle millions of files efficiently.
  • Write modular, reusable, and production-ready code with proper logging and exception handling.
  • Build command-line tools or APIs for internal teams to trigger validation workflows.
  • Collaborate with QA, Data Engineering, and ML teams to define dataset quality standards.
  • Maintain CI/CD workflows for automated script deployment and versioning.

Required Technical Skills

  • Advanced Python programming with experience in building scalable, production-ready scripts.
  • Expertise with libraries such as OpenCV, Pillow, PyDub, librosa, mutagen, pandas, and boto3/google-cloud-storage.
  • Strong understanding of multithreading, multiprocessing, and performance optimization.
  • Experience in hashing (MD5/SHA) and UUID generation/validation for file integrity checks.
  • Hands-on experience with cloud platforms (AWS/GCP/Azure) for file I/O operations at scale.
  • Knowledge of data pipeline orchestration tools (Airflow, Prefect, etc.) is a plus.
  • Familiarity with Docker and CI/CD tools (GitHub Actions, Jenkins, GitLab CI) for script deployment.
  • Ability to design robust error reporting systems (logs, JSON/CSV reports, dashboards).

Good to Have

  • Experience with media processing frameworks (FFmpeg, GStreamer).
  • Knowledge of database integration (PostgreSQL, MongoDB, Elasticsearch) for metadata storage.
  • Exposure to ML dataset curation workflows.
  • Understanding of API development (FastAPI/Flask) to create validation endpoints.

Qualifications

  • Bachelor's/Master's degree in Computer Science, Software Engineering, or a related field.
  • 3–5 years of experience in Python-based automation and data validation at scale.
  • Prior experience in AI/ML data projects or large media dataset handling is highly preferred.

Key Deliverables

  • Develop scalable Python scripts to validate datasets across formats (image, video, audio).
  • Create automated error reports (CSV/JSON/PDF) and dashboards for stakeholders.
  • Build pipeline automation tools/APIs for dataset validation.
  • Optimize scripts for parallel execution on large datasets.
  • Ensure seamless integration with cloud storage and CI/CD workflows.
