Opportunity
Creating natural-looking virtual dance to match input music is costly and challenging. Game and VR producers spend significant resources on professional choreographers and motion-capture equipment. Existing automatic methods fall into two categories. Neural network-based approaches require massive amounts of clean dance data (rare and expensive) and suffer from error accumulation, freezing, or limited generalization to different music rhythms. Feature-similarity-based methods assemble dances from segments stored in a database, but they cannot adjust segment length to fit different music rhythms and often require human intervention to select segments. Public dance datasets are small, noisy (derived from video pose estimation), or built on incompatible skeleton models. There is a need for a robust, fully automated method that can generate long, natural, rhythm-synchronized dances from arbitrary input music without manual choreography.
Technology
This patent presents a computer-implemented method that generates dance by reorganizing existing dance segments based on music feature similarity. The method first preprocesses a dance dataset (e.g., AIST++): it extracts raw music features (MFCC, chroma, envelope, peak), detects musical beats, and segments dances at beat boundaries. For each segment, it reduces feature dimensionality via PCA, clusters the resulting vectors using K-means, and assigns a music label per cluster. A probability distribution function (PDF) of labels over time is then computed for each segment. To generate a new dance from input music, the method extracts the same features, detects beats, and segments the music. It compares the input music's PDF with those in the database using Kullback-Leibler (KL) divergence, selects the n closest music pieces, and retrieves their dance segments as candidates. A dynamic programming process then minimizes a cost function that balances two terms: (1) music distance (KL divergence of PDFs over cumulative segments) and (2) pose distance between adjacent segments (using a transformation-invariant metric that aligns rotations and translations). For each selected pair of adjacent segments, the last five frames of the first segment and the first five frames of the second are used to generate a smooth three-frame transition via cubic splines. The method adjusts segment length to match different rhythms by resampling continuous motion curves (cubic Hermite spline for position, quaternion cubic spline for rotation). The final output is a continuous, music-synchronized dance sequence.
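The selection step can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`label_pdf`, `kl_divergence`, `choose_segments`), the smoothing constant `eps`, and the `weight` parameter are hypothetical, and the music term is simplified to a per-slot cost, whereas the method described above evaluates KL divergence over cumulative segments.

```python
import math
from collections import Counter


def label_pdf(labels, n_labels, eps=1e-6):
    """Smoothed distribution of cluster labels within one segment.

    eps is an assumed smoothing constant that keeps every bin non-zero,
    so the KL divergence below stays finite.
    """
    counts = Counter(labels)
    total = len(labels) + eps * n_labels
    return [(counts.get(k, 0) + eps) / total for k in range(n_labels)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def choose_segments(music_cost, pose_cost, weight=1.0):
    """Viterbi-style dynamic programming over beat-aligned slots.

    music_cost[t][c]   : music distance of candidate c in slot t
    pose_cost(t, a, b) : pose distance between candidate a in slot t-1
                         and candidate b in slot t
    weight             : assumed trade-off between the two terms
    Returns (chosen candidate index per slot, total cost).
    """
    best = list(music_cost[0])  # best cost of a path ending in each candidate
    back = []                   # back-pointers, one list per slot after the first
    for t in range(1, len(music_cost)):
        cur, ptr = [], []
        for b, m in enumerate(music_cost[t]):
            costs = [best[a] + weight * pose_cost(t, a, b)
                     for a in range(len(best))]
            a_min = min(range(len(costs)), key=costs.__getitem__)
            cur.append(costs[a_min] + m)
            ptr.append(a_min)
        best = cur
        back.append(ptr)
    last = min(range(len(best)), key=best.__getitem__)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]
```

Because both PDFs are defined over the same label set, segments with different frame counts can be compared directly. Raising `weight` favors smooth pose continuity over a close music match.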
Advantages
- Fully Automatic: No human intervention needed for segment selection or choreography.
- Handles Variable Segment Lengths: Uses PDF-based similarity via KL divergence to compare segments of different durations, broadening the selectable database.
- Rhythm-Adaptive Motion: Cubic splines allow resampling to match arbitrary input music tempos without distorting motion quality.
- Smooth Transitions: Transformation-invariant pose distance (rotation+translation alignment) plus cubic spline interpolation ensures seamless connections between segments.
- Superior Performance: Outperforms state-of-the-art methods (DanceRevolution, FACT, Bailando) on the AIST++ dataset across FID, motion diversity, and beat alignment metrics; especially robust for random pairings of input music and initial pose.
- Long-Form Generation: Maintains quality over extended durations (entire songs), unlike neural methods that accumulate errors.
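The rhythm-adaptive resampling mentioned above can be sketched for a single position channel as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name `hermite_resample` is hypothetical, input frames are assumed to be uniformly spaced, and tangents are estimated by finite differences. Rotations, which the method handles with a quaternion cubic spline, are not covered here.

```python
def hermite_resample(values, n_out):
    """Resample a uniformly sampled 1-D position track to n_out frames
    using a cubic Hermite spline with finite-difference tangents."""
    n = len(values)
    if n < 2 or n_out < 2:
        raise ValueError("need at least two input and two output frames")

    # Tangents: one-sided differences at the ends, central differences inside.
    tangents = [values[1] - values[0]]
    tangents += [(values[i + 1] - values[i - 1]) / 2.0 for i in range(1, n - 1)]
    tangents.append(values[-1] - values[-2])

    out = []
    for j in range(n_out):
        u = j * (n - 1) / (n_out - 1)  # output frame position in input-frame units
        i = min(int(u), n - 2)         # index of the spline interval
        s = u - i                      # local parameter in [0, 1]
        # Cubic Hermite basis functions.
        h00 = 2 * s**3 - 3 * s**2 + 1
        h10 = s**3 - 2 * s**2 + s
        h01 = -2 * s**3 + 3 * s**2
        h11 = s**3 - s**2
        out.append(h00 * values[i] + h10 * tangents[i]
                   + h01 * values[i + 1] + h11 * tangents[i + 1])
    return out
```

Stretching a segment to a slower beat means calling the function with a larger `n_out`, and compressing it to a faster beat means a smaller one. The first and last frames are preserved in both cases.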
Applications
- Game & VR Development: Automatically generating character dance animations from any music track.
- Social Media & Content Creation: Creating dance videos for platforms like TikTok or Instagram without manual animation.
- Robotic Choreography: Converting generated dance sequences into control instructions for humanoid or entertainment robots.
- Virtual Performances: Real-time dance generation for virtual concerts, avatars, or metaverse events.
- Dance Education & Training: Generating example dances for learners based on music they choose, without needing a live choreographer.
