Universal Sound Separation on Hive
Hive is a high-quality synthetic dataset (2k hours) built with an automated pipeline that mines high-purity single-event segments and synthesizes semantically consistent mixtures from them. Although Hive is only about 0.2% the size of the million-hour datasets used by baseline systems, models trained on it achieve competitive separation accuracy and strong zero-shot generalization.
This space provides two separation models trained on Hive:
- AudioSep: A foundation model for open-domain sound separation with natural-language queries, based on the original AudioSep.
- FlowSep: A flow-matching-based separation model with text conditioning, based on the original FlowSep.
How to use:
- Upload an audio file containing a mixture of sounds
- Describe what you want to separate (e.g., "piano", "speech", "dog barking")
- Select a model and click Separate
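If no recording is at hand, a small test file can be generated locally to check the upload step. The sketch below uses only the Python standard library to sum two sine tones into a mono WAV file. The 16 kHz sample rate, the helper names, and the `test_mix.wav` filename are assumptions made for illustration; nothing here comes from the Space itself. Pure tones are not realistic sound events, so this only verifies that a file uploads and runs, not separation quality.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # assumed value; the Space does not state a required rate


def tone(freq_hz, seconds, amplitude=0.5):
    """Return a sine tone as a list of floats in [-amplitude, amplitude]."""
    n = int(SAMPLE_RATE * seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def mix(a, b, headroom=0.9):
    """Sum two signals sample by sample and scale down if the peak exceeds headroom."""
    mixed = [x + y for x, y in zip(a, b)]
    peak = max(abs(s) for s in mixed) or 1.0
    scale = min(1.0, headroom / peak)
    return [s * scale for s in mixed]


def write_wav(path, samples):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SAMPLE_RATE)
        f.writeframes(frames)


def build_test_mix(path="test_mix.wav"):
    """Create a 2-second mixture of a 440 Hz and a 1200 Hz tone."""
    mixture = mix(tone(440.0, 2.0), tone(1200.0, 2.0))
    write_wav(path, mixture)
    return mixture


if __name__ == "__main__":
    build_test_mix()
```

For a meaningful separation test, replace the two tones with real recordings of distinct events (for example, speech and a dog barking) loaded at the same sample rate.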