Cymasonic Labs

AI/ML Engineer – Music Understanding + Video Analysis
Bengaluru, Karnataka, India · Experience: 0 years · Salary not disclosed · On-site · Full Time

Company Description

Cymasonic Labs is a multidisciplinary organization where sound meets science and creativity. Inspired by the transformative nature of sound, Cymasonic operates four key divisions: Cymasonic R&D, Cymasonic Records, Cymasonic Productions, and Cymasonic EdTech Labs. These divisions focus on areas including audio artificial intelligence, sound therapy, bioacoustics, music production, immersive releases, and AI-powered music education for learners worldwide. Rooted in innovation, the company is dedicated to exploring the potential of sound across a wide range of fields.

Role Description

As an ML / Audio AI Engineer at Cymasonic Labs, you will work on applied research and product-level engineering across audio and video modalities. The role involves building and refining machine learning pipelines for tasks such as sound event detection, music feature extraction, pose estimation, and multimodal fusion. You will collaborate with the research and product teams to turn experiments into deployable systems that run reliably in real-world environments, including low-latency and edge settings.

Basic Requirements

- Strong Python programming: proficiency in writing clean, efficient, and modular Python code for machine learning applications.
- Basics of signal processing: a fundamental understanding of how to analyze and manipulate signals in the time and frequency domains.
- Audio ML fundamentals: knowledge of feature extraction techniques including MFCCs, Mel-spectrograms, onset detection, and pitch tracking (see the Librosa sketch below).
- Video ML fundamentals: a conceptual grasp of computer vision techniques, with a focus on human keypoint detection and pose estimation (see the MediaPipe sketch below).
- PyTorch or TensorFlow: practical experience building, training, and debugging neural networks in either framework.
- Audio libraries: familiarity with standard Python audio analysis and processing libraries such as Librosa and Torchaudio.

Additional / Advanced Requirements

- Music/audio ML models: demonstrated ability to implement models for pitch detection, tempo/rhythm analysis, beat tracking, chord recognition, and note-level analysis.
- Video analysis: proficiency in applying computer vision for detailed pose estimation, fine-grained hand/finger tracking, and dynamic gesture recognition.
- Transformer-based audio models: familiarity with fine-tuning or implementing state-of-the-art architectures such as Wav2Vec2, Whisper, and EnCodec (see the Wav2Vec2 sketch below).
- Large audio datasets: competence in curating, cleaning, pre-processing, and organizing large-scale audio datasets for effective model training.
- Video processing frameworks: skill with open-source video processing and computer vision tools such as OpenCV, MediaPipe, or MMPose.
- Real-time system development: background in optimizing model inference pipelines for low-latency performance in real-time applications.
- Digital signal processing (DSP): a solid grounding in signal processing theory to improve feature engineering and model accuracy.

Tools & Frameworks

- PyTorch / TensorFlow
- Librosa, Torchaudio
- OpenCV
- MediaPipe / MMPose
- HuggingFace Transformers
- NumPy / Pandas
- Jupyter / Colab
- FastAPI / Flask
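To give a concrete sense of the audio feature extraction mentioned in the basic requirements, below is a minimal Librosa sketch covering MFCCs, Mel-spectrograms, onset detection, beat tracking, and pitch tracking. The file name song.wav, the 22.05 kHz sample rate, and the n_mfcc / n_mels values are illustrative assumptions, not anything specified in this posting.

```python
# Minimal Librosa feature-extraction sketch; "song.wav" and all parameter
# values below are illustrative assumptions, not taken from the posting.
import librosa
import numpy as np

# Load mono audio at a fixed sample rate.
y, sr = librosa.load("song.wav", sr=22050, mono=True)

# MFCCs and a log-Mel spectrogram: common inputs for audio ML models.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Onset detection and beat tracking for rhythm/tempo analysis.
onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

# Pitch tracking with the probabilistic YIN (pYIN) estimator.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

print("MFCC shape:", mfcc.shape, "| log-Mel shape:", mel_db.shape)
print("Estimated tempo (BPM):", np.atleast_1d(tempo)[0])
print("Onsets detected:", len(onset_times), "| voiced frames:", int(np.sum(voiced_flag)))
```

In practice these features would feed a PyTorch or TensorFlow model; the sketch only shows the extraction step.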
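Likewise, a minimal sketch of human keypoint detection with OpenCV and MediaPipe's legacy Pose solution, as referenced in the video requirements. The file name frame.jpg is a hypothetical placeholder, and a real video pipeline would process frames in a streaming loop (with static_image_mode=False) rather than a single still image.

```python
# Minimal pose-estimation sketch using OpenCV + MediaPipe's Pose solution.
# "frame.jpg" is a hypothetical placeholder; real pipelines would read
# frames from a video stream instead of a single still image.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# static_image_mode=True treats each input as an independent image.
with mp_pose.Pose(static_image_mode=True, model_complexity=1) as pose:
    image_bgr = cv2.imread("frame.jpg")
    if image_bgr is None:
        raise FileNotFoundError("frame.jpg not found")

    # MediaPipe expects RGB input; OpenCV loads images as BGR.
    results = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))

    if results.pose_landmarks:
        # Each landmark has normalized x/y/z coordinates and a visibility score.
        for idx, lm in enumerate(results.pose_landmarks.landmark):
            print(f"landmark {idx}: x={lm.x:.3f} y={lm.y:.3f} visibility={lm.visibility:.2f}")
    else:
        print("No person detected in the frame.")
```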
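Finally, a minimal sketch of extracting frame-level embeddings from a pretrained Wav2Vec2 model with HuggingFace Transformers. The facebook/wav2vec2-base-960h checkpoint and the 16 kHz resampling are illustrative choices, not requirements of the role; fine-tuning for a downstream task would add a task head and a training loop on top of this.

```python
# Minimal sketch of extracting Wav2Vec2 embeddings with HuggingFace Transformers.
# The checkpoint name and the input file are illustrative assumptions.
import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

checkpoint = "facebook/wav2vec2-base-960h"  # public base model, chosen for illustration
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2Model.from_pretrained(checkpoint)
model.eval()

# Wav2Vec2 expects 16 kHz mono audio.
y, sr = librosa.load("song.wav", sr=16000, mono=True)
inputs = feature_extractor(y, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Frame-level embeddings usable as features for downstream audio tasks.
embeddings = outputs.last_hidden_state  # shape: (1, num_frames, hidden_size)
print("Embedding shape:", tuple(embeddings.shape))
```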