Universal Sound Separation on Hive

Hive is a high-quality synthetic dataset (2k hours) built with an automated pipeline that mines high-purity single-event segments and synthesizes semantically consistent mixtures. Although it uses only ~0.2% of the data used by million-hour baselines (about 2k hours versus roughly 1M hours), models trained on Hive achieve competitive separation accuracy and strong zero-shot generalization.

This space provides two separation models trained on Hive:

  • AudioSep: A foundation model for open-domain sound separation with natural-language queries, based on AudioSep.
  • FlowSep: A flow-matching-based separation model with text conditioning, based on FlowSep.

How to use:

  1. Upload an audio file (a mixture of sounds)
  2. Describe what you want to separate (e.g., "piano", "speech", "dog barking")
  3. Select a model and click Separate
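
For scripted use, the three steps above can be sketched in Python. This is a minimal sketch, not the Space's documented API: `SeparationRequest`, `build_request`, and `separate_via_space` are hypothetical names, and the Space identifier, the endpoint name (`api_name`), and the argument order passed to `predict` are all assumptions that would need to be checked against the Space's own API page. Only `Client`, `Client.predict`, and `handle_file` are real `gradio_client` calls.

```python
from dataclasses import dataclass
from pathlib import Path

# The two models named on this page.
SUPPORTED_MODELS = ("AudioSep", "FlowSep")


@dataclass(frozen=True)
class SeparationRequest:
    """Hypothetical container for the three inputs the demo asks for."""
    audio_path: Path  # step 1: the mixture to upload
    query: str        # step 2: text describing the target sound
    model: str        # step 3: which model to run


def build_request(audio_path, query, model="AudioSep"):
    """Validate the inputs and bundle them into a SeparationRequest."""
    query = query.strip()
    if not query:
        raise ValueError("query must describe the sound to separate")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"model must be one of {SUPPORTED_MODELS}, got {model!r}")
    return SeparationRequest(Path(audio_path), query, model)


def separate_via_space(request, space_id, api_name="/predict"):
    """Send the request to a Gradio Space.

    Assumptions: `space_id` is supplied by the caller, and both `api_name`
    and the positional argument order below are guesses, not taken from
    this page.
    """
    # Imported lazily so the validation helpers work without gradio_client.
    from gradio_client import Client, handle_file

    client = Client(space_id)
    return client.predict(
        handle_file(str(request.audio_path)),
        request.query,
        request.model,
        api_name=api_name,
    )
```

A call such as `build_request("mix.wav", "dog barking", "FlowSep")` only validates the inputs locally; nothing is uploaded until `separate_via_space` is called with a real Space identifier.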

[Paper] | [Code] | [Hive Dataset] | [Demo Page]
