5 - 9 years
7.0 - 11.0 Lacs P.A.
Chennai, Pune, Delhi, Mumbai, Bengaluru, Hyderabad, Kolkata
Posted: 2 months ago
Work from Office
Full Time
Senior PySpark Developer - Complex XML Data Processing

Key Responsibilities:
- Design and develop scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity.
- Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures.
- Optimize Spark jobs through partitioning, caching, and parallel processing to handle terabytes of XML data efficiently.
- Transform raw hierarchical XML data into structured DataFrames for analytics, machine learning, and reporting use cases.
- Collaborate with data architects and analysts to define data models for nested XML schemas.
- Troubleshoot performance bottlenecks and ensure reliability in distributed environments (e.g., AWS, Databricks, Hadoop).
- Document parsing logic, data lineage, and optimization strategies for maintainability.

Qualifications:
- 5+ years of hands-on experience with PySpark and Spark XML libraries (e.g., `spark-xml`) in production environments.
- Proven track record of parsing XML data with 20+ levels of nesting using recursive methods and schema inference.
- Expertise in XPath, XQuery, and DataFrame transformations (e.g., `explode`, `struct`, `selectExpr`) for hierarchical data.
- Strong understanding of Spark optimization techniques: partitioning strategies, broadcast variables, and memory management.
- Experience with distributed computing frameworks (e.g., Hadoop, YARN) and cloud platforms (AWS, Azure, GCP).
- Familiarity with big data file formats (Parquet, Avro) and orchestration tools (Airflow, Luigi).
- Bachelor's degree in Computer Science, Data Engineering, or a related field.

Preferred Skills:
- Experience with schema evolution and versioning for nested XML/JSON datasets.
- Knowledge of Scala or Java for extending Spark XML libraries.
- Exposure to Databricks, Delta Lake, or similar platforms.
- Certifications in AWS/Azure big data technologies.