Job Description
As a Spark Cluster Optimization Specialist at the company, your role will involve optimizing Spark clusters for cost, efficiency, and performance.

You will be responsible for:
- Implementing robust monitoring systems to identify bottlenecks using data and metrics
- Providing actionable recommendations for continuous improvement
- Optimizing the infrastructure required for extraction, transformation, and loading of data from various sources using SQL and AWS big data technologies
- Collaborating with data and analytics experts to enhance cost efficiencies in the data systems

To excel in this role, you should possess the following qualifications:
- Experience processing large workloads and complex code on Spark clusters
- Proven track record in setting up monitoring for Spark clusters and driving optimization based on insights
- Experience designing and implementing scalable Data Warehouse solutions to support analytical and reporting needs
- Strong analytical skills for working with unstructured datasets
- Experience building processes that support data transformation, data structures, metadata, dependency, and workload management
- Working knowledge of message queuing, stream processing, and highly scalable big data stores

Additionally, you should have experience using the following software/tools:
- Python and Jupyter notebooks
- Big data tools like Spark, Kafka, etc.
- Relational SQL and NoSQL databases such as Postgres and Cassandra
- Data pipeline and workflow management tools like Azkaban, Luigi, Airflow, etc.
- AWS cloud services like EC2, EMR, RDS, Redshift
- Stream-processing systems such as Storm, Spark Streaming, etc. (working knowledge would be a plus)

Join the dynamic team dedicated to optimizing data systems for efficiency and performance.