立即登录

HuMo-1.7B

Play Count7
Fork Count0
Like Count0
创建: 2026-03-16更新: 2026-03-16

HuMo is a unified, human-centric video generation framework designed to produce high-quality, fine-grained, and controllable human videos from multimodal inputs—including text, images, and audio. It supports strong text prompt following, consistent subject preservation, synchronized audio-driven motion.

​​- VideoGen from Text-Image​​ - Customize character appearance, clothing, makeup, props, and scenes using text prompts combined with reference images.

  • ​​VideoGen from Text-Audio​​ - Generate audio-synchronized videos solely from text and audio inputs, removing the need for image references and enabling greater creative freedom. ​​- VideoGen from Text-Image-Audio​​ - Achieve the higher level of customization and control by combining text, image, and audio guidance.

GitHub 转载自Huggingface

返图区

暂无返图