Universal Sound Separation on HIVE

Hive is a high-quality synthetic dataset (2k hours) built via an automated pipeline that mines high-purity single-event segments and synthesizes semantically consistent mixtures. Despite using only ~0.2% of the data scale of million-hour baselines, models trained on Hive achieve competitive separation accuracy and strong zero-shot generalization.

This space provides two separation models trained on Hive:

AudioSep: A foundation model for open-domain sound separation with natural language queries, based on AudioSep.
FlowSep: A flow-matching-based separation model with text conditioning, based on FlowSep.