Make a Tony Pro humanoid robot dance autonomously to music at a robot rave.
git clone https://github.com/YOUR_USERNAME/RobotRave.git
cd RobotRave
./setup.sh

The setup script will:
- Clone the MINT repository (Google's FACT model implementation)
- Download pre-trained model checkpoints (~500MB) from Google Drive
- Optionally install Python dependencies
Note: The model weights are not stored in this repo due to size. The setup script uses gdown to fetch them automatically.
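The download step boils down to a gdown call along these lines (a sketch; the placeholder file ID below is not the real one, which lives in setup.sh):

# Sketch of the checkpoint download step - placeholder file ID, see setup.sh
import gdown

FILE_ID = "PLACEHOLDER_DRIVE_ID"  # not the real ID
gdown.download(id=FILE_ID, output="mint/checkpoints.zip", quiet=False)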
We use Google's FACT model (Full-Attention Cross-modal Transformer) from their AI Choreographer research to generate human dance movements from music. The model was trained on the AIST++ dataset containing 1,408 dance sequences across 10 genres.
Music Audio → FACT Model → SMPL Human Motion → Retarget → Tony Pro Servos → Robot Dances!
We evaluated three approaches:
- MDLT (Music to Dance Language Translation) - No pre-trained weights, requires ~140 hours training (paper)
- DFKI Robot Dance - Physically feasible trajectories but code not publicly available
- FACT (Google) - Pre-trained checkpoints available, state-of-the-art quality (paper)
FACT was the clear winner: Google already spent months training it on professional dancers, and we can just download and use it.
If FACT inference is too slow or fails, we fall back to rule-based beat-synchronized dancing: aubio does real-time beat/onset detection and triggers pre-defined dance moves.
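At its core, the fallback loop looks something like this sketch, assuming live microphone input via sounddevice (trigger_move() stands in for beat_sync_dance.py's actual move selection):

# Sketch of the fallback: aubio tempo tracking on live mic input,
# firing a pre-defined move on each detected beat.
import aubio
import numpy as np
import sounddevice as sd

SAMPLERATE = 44100
HOP = 512
tempo = aubio.tempo("default", 1024, HOP, SAMPLERATE)

def trigger_move():
    print("beat! -> fire the next pre-defined dance move")

def callback(indata, frames, time, status):
    samples = np.ascontiguousarray(indata[:, 0], dtype=np.float32)
    if tempo(samples)[0]:  # non-zero when a beat lands in this hop
        trigger_move()

with sd.InputStream(channels=1, samplerate=SAMPLERATE,
                    blocksize=HOP, callback=callback):
    sd.sleep(60_000)  # listen (and dance) for 60 seconds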
FACT is a neural network that learned to dance by watching 1,408 videos of professional dancers. Google researchers trained it to understand the relationship between music and movement - things like:
- "When there's a drum hit, raise your arms"
- "During a bass drop, crouch down"
- "When the melody rises, extend outward"
You feed it a song, and it outputs a sequence of body poses - one for every frame (60 per second). It's like having a professional choreographer that works instantly.
┌─────────────────────────────────────────────────────────┐
│ INPUT: Music (tempo, beats, melody, energy) │
│ + 2-second "seed" motion to start from │
│ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ FACT Neural Network │ │
│ │ │ │
│ │ Learned from watching 5.2 hours of dance │ │
│ │ videos across 10 genres (pop, hip-hop, │ │
│ │ breakdance, house, krump, etc.) │ │
│ └─────────────────────────────────────────────────┘ │
│ ↓ │
│ OUTPUT: Body pose every 1/60th of a second │
│ (72 numbers describing joint angles) │
└─────────────────────────────────────────────────────────┘
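In code terms, that I/O looks roughly like this shape-level sketch (the feature dimensionality and the infer() call are illustrative, not the actual mint API):

# Shape-level sketch of FACT's I/O - fact_model.infer() is hypothetical
import numpy as np

FPS = 60
music = np.zeros((FPS * 30, 35))  # 30 s of per-frame audio features (dim is model-specific)
seed = np.zeros((FPS * 2, 72))    # 2-second seed motion: 120 frames x 72 joint angles

# generated = fact_model.infer(music, seed)   # hypothetical call
generated = np.zeros((FPS * 30, 72))          # stand-in: one 72-number pose per frame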
AIST++ is a massive dance dataset that Google created:
- 1,408 dance sequences performed by 30 professional dancers
- 10 dance genres: pop, lock, waack, jazz, house, middle hip-hop, break, krump, street jazz, LA house
- 5.2 hours of motion capture data (10.1 million frames across all camera views)
- Every dance was recorded with 9 cameras and precise motion tracking
This is what FACT learned from. The quality of the output depends on the quality of the training data, and AIST++ contains genuinely skilled dancers.
SMPL (Skinned Multi-Person Linear Model) is a standard way to describe human body poses mathematically. Instead of describing movement in words ("raise your right arm"), SMPL uses numbers.
              head (joint 15)
                     │
              neck (joint 12)
                     │
         ┌───────────┴───────────┐
     L shoulder             R shoulder
     (joint 16)             (joint 17)
         │                      │
      L elbow               R elbow
     (joint 18)             (joint 19)
         │                      │
      L wrist               R wrist
     (joint 20)             (joint 21)

                   spine
                     │
         ┌───────────┴───────────┐
       L hip                  R hip
     (joint 1)              (joint 2)
         │                      │
      L knee                R knee
     (joint 4)              (joint 5)
         │                      │
      L ankle               R ankle
     (joint 7)              (joint 8)
Each of the 24 joints has 3 rotation values (x, y, z axes). So 72 numbers completely describe any human pose. FACT outputs these 72 numbers for every frame of the dance.
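In practice that means FACT's output reshapes cleanly into per-joint rotations, as in this sketch:

# The 72-number format in practice: reshape N frames of FACT output into
# per-joint (x, y, z) rotations. Variable names are illustrative.
import numpy as np

N_FRAMES = 600                           # 10 seconds at 60 fps
poses = np.zeros((N_FRAMES, 72))         # what FACT outputs

joints = poses.reshape(N_FRAMES, 24, 3)  # 24 joints x 3 rotation values
r_shoulder = joints[:, 17, :]            # joint 17 = R shoulder (see diagram above)
print(r_shoulder.shape)                  # (600, 3)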
The challenge: SMPL describes a human body with 24 joints, but Tony Pro only has 16 servos. They don't match up perfectly.
What we keep:
| SMPL Joint | Tony Pro Servo | Why |
|---|---|---|
| head | head_pitch (8), head_yaw (16) | Direct match |
| r_shoulder | right_shoulder_pitch (7) | Take the forward/back rotation |
| r_elbow | right_elbow_pitch (5) | Take the bend angle |
| l_shoulder | left_shoulder_pitch (15) | Take the forward/back rotation |
| l_elbow | left_elbow_pitch (13) | Take the bend angle |
| r_hip | right_hip_roll/yaw/pitch (1,2,4) | 3 DOF match |
| r_knee | right_knee_pitch (3) | Direct match |
| r_ankle | right_ankle_pitch (6) | Direct match |
| (same for left leg) | ... | ... |
What we lose:
- Shoulder roll (Tony Pro arms only move forward/back, not sideways)
- Wrist rotation (no wrist servos)
- Spine/torso twist (no torso servo)
- Finger movements (no hand servos)
The conversion process:
# SMPL gives us: right_shoulder rotation = [0.5, -0.3, 0.2] radians (x, y, z)
# Tony Pro only has shoulder pitch (forward/back motion)
smpl_right_shoulder = [0.5, -0.3, 0.2]

# 1. Extract the relevant axis (x = pitch = forward/back)
angle = smpl_right_shoulder[0]  # 0.5 radians

# 2. Convert radians to servo pulse (0-1000 scale, 500 = center)
pulse = 500 + (angle * 191)  # ~191 pulse units per radian

# 3. Clamp to the safe range (don't break the servo!)
pulse = max(200, min(800, pulse))

# 4. Send to robot
servo.set_position(7, int(pulse))  # Servo 7 = right shoulder

Safety scaling:
- Arms and head: 100% of SMPL motion (safe to move freely)
- Legs: 30% of SMPL motion (robot can fall over if legs move too much)
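As a sketch, the scaling is just a per-group multiplier applied before the pulse conversion (the real factors live in tony_pro.py):

# Per-group safety scaling: shrink each joint's SMPL angle before
# converting it to a servo pulse.
SAFETY_SCALE = {
    "head": 1.0,       # full range
    "right_arm": 1.0,
    "left_arm": 1.0,
    "right_leg": 0.3,  # 30% range so the robot keeps its balance
    "left_leg": 0.3,
}

def scaled_angle(angle_rad, group):
    """Shrink the commanded angle for fall-prone joint groups."""
    return angle_rad * SAFETY_SCALE[group]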
The result isn't a perfect recreation of the human dance, but it captures the essence - when the dancer's arms go up, the robot's arms go up. When they bob their head, the robot bobs its head.
For detailed output format documentation, see MODEL_OUTPUT.md.
After running the interactive discovery tool (discover_servos.py), the correct 16-servo mapping:
| ID | Joint | Group |
|---|---|---|
| 1 | right_hip_roll | right_leg |
| 2 | right_hip_yaw | right_leg |
| 3 | right_knee_pitch | right_leg |
| 4 | right_hip_pitch | right_leg |
| 5 | right_elbow_pitch | right_arm |
| 6 | right_ankle_pitch | right_leg |
| 7 | right_shoulder_pitch | right_arm |
| 8 | head_pitch | head |
| 9 | left_hip_roll | left_leg |
| 10 | left_hip_yaw | left_leg |
| 11 | left_knee_pitch | left_leg |
| 12 | left_hip_pitch | left_leg |
| 13 | left_elbow_pitch | left_arm |
| 14 | left_ankle_pitch | left_leg |
| 15 | left_shoulder_pitch | left_arm |
| 16 | head_yaw | head |
Summary:
- Head: 2 DOF (pitch + yaw)
- Each arm: 2 DOF (shoulder pitch + elbow pitch)
- Each leg: 5 DOF (hip roll/yaw/pitch + knee + ankle)
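Spelled out as code, the table might look like this (a sketch consistent with the mapping above; tony_pro.py holds the real definitions):

# Servo ID -> (joint, group), mirroring the table above
SERVO_MAP = {
    1: ("right_hip_roll", "right_leg"),
    2: ("right_hip_yaw", "right_leg"),
    3: ("right_knee_pitch", "right_leg"),
    4: ("right_hip_pitch", "right_leg"),
    5: ("right_elbow_pitch", "right_arm"),
    6: ("right_ankle_pitch", "right_leg"),
    7: ("right_shoulder_pitch", "right_arm"),
    8: ("head_pitch", "head"),
    9: ("left_hip_roll", "left_leg"),
    10: ("left_hip_yaw", "left_leg"),
    11: ("left_knee_pitch", "left_leg"),
    12: ("left_hip_pitch", "left_leg"),
    13: ("left_elbow_pitch", "left_arm"),
    14: ("left_ankle_pitch", "left_leg"),
    15: ("left_shoulder_pitch", "left_arm"),
    16: ("head_yaw", "head"),
}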
RobotRave/
├── tony_pro.py             # Central servo config & controller module
├── tony_pro_config.json    # Servo mapping in JSON format
├── beat_sync_dance.py      # Fallback: beat-triggered dance moves
├── smart_dance.py          # Style-adaptive dancing (chill/pop/edm/hiphop)
├── play_dance.py           # Play pre-generated dance sequences
├── retarget_to_tonypi.py   # SMPL → servo command conversion
├── generate_dance.py       # Run FACT model on audio files
├── fact_server.py          # Cloud GPU server for FACT inference
├── fact_client.py          # Robot client for cloud inference
├── discover_servos.py      # Interactive servo discovery tool
├── setup.sh                # Setup script (downloads model weights)
├── RESEARCH.md             # Detailed research notes & approach analysis
├── CLOUD_SETUP.md          # GPU cloud deployment guide
├── MODEL_OUTPUT.md         # FACT model output format documentation
└── (submodules - not tracked)
    ├── mint/               # Google FACT model
    ├── TonyPi/             # Hiwonder SDK
    └── MDLT/               # Alternative approach (unused)
# Connect to robot's WiFi network: HW-67B69FFB
# SSH into the robot's Raspberry Pi
ssh pi@192.168.149.1
# Password: raspberrypi
# Copy project files to robot
scp -r *.py tony_pro_config.json pi@192.168.149.1:~/RobotRave/

# On the robot - test the servo map is correct
python tony_pro.py --map
# If servos aren't right, run discovery
python discover_servos.py

# Simulation mode (prints moves, no robot motion)
python beat_sync_dance.py --test --simulate
# Real robot - test all moves
python beat_sync_dance.py --test

# Dance to an audio file
python beat_sync_dance.py --audio your_song.wav
# Live microphone input
python beat_sync_dance.py --live

# Detects music genre and adapts style
python smart_dance.py --live
# Or with a file
python smart_dance.py --audio song.wav
# Demo all styles
python smart_dance.py --demo --simulate

# On cloud GPU server
python fact_server.py --port 8765
# On robot
python fact_client.py --server ws://<server-ip>:8765
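Under the hood this is a simple WebSocket round trip, roughly like the sketch below (the actual message schema is whatever fact_server.py defines; the JSON fields here are illustrative):

import asyncio
import json
import websockets

async def request_dance(server_uri, audio_path):
    # Send an audio reference, receive the generated poses (assumed schema)
    async with websockets.connect(server_uri) as ws:
        await ws.send(json.dumps({"audio": audio_path}))
        return json.loads(await ws.recv())

poses = asyncio.run(request_dance("ws://<server-ip>:8765", "song.mp3"))

# Generate dance from audio (requires FACT)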
python generate_dance.py --audio song.mp3 --output dance.json
# Play on robot
python play_dance.py --input dance.json

from tony_pro import SERVO, MOVES, TonyProController
# Access servo IDs
SERVO.HEAD_PITCH # 8
SERVO.R_SHOULDER # 7
SERVO.L_ELBOW # 13
# Use pre-defined moves
controller = TonyProController(simulate=False)
controller.execute_move('arms_up')
controller.execute_move('celebrate')
controller.go_neutral()

- Leg servos are limited to 30% range to prevent falls
- Start with the --simulate flag to test without robot motion
- Use --neutral-first to reset pose before playing sequences
- Head movements are limited to prevent strain
For real-time FACT inference, see CLOUD_SETUP.md.
Recommended: AWS g5.xlarge (~$1/hr) with your credits.
On Robot (Raspberry Pi):
pip install numpy aubio sounddevice soundfile
# HiwonderSDK is pre-installed

On Development Machine:
pip install numpy aubio sounddevice soundfile scipy

For FACT Model (Cloud GPU):
pip install tensorflow numpy scipy websockets soundfile

- AI Choreographer (FACT) - Google Research
- AIST++ Dataset - Dance motion data
- MINT Repository - FACT implementation
- Hiwonder TonyPi - Robot hardware
| Problem | Solution |
|---|---|
| Servo moves wrong direction | Flip sign in tony_pro.py scale factor |
| Movement too jerky | Increase time_ms parameter |
| Robot falls over | Reduce leg servo ranges in SERVO_LIMITS |
| No beat detection | Check microphone with --simulate flag |
| Can't connect to robot | Verify IP, try ping first |
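For the first and third rows, the knobs might look like this in tony_pro.py (illustrative layout; SERVO_LIMITS is the name the table uses, the rest is a sketch):

# Flip a servo's direction by negating its scale factor
SERVO_SCALE = {7: -191}          # servo 7 now moves the opposite way

# Tighten a leg servo's pulse range so it can't throw the robot off balance
SERVO_LIMITS = {3: (350, 650)}   # servo 3 = right knee (default clamp was 200-800)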
Built for a 24-hour robot rave hackathon.