# CogVideo **Repository Path**: tunglinwood/CogVideo ## Basic Information - **Project Name**: CogVideo - **Description**: CogVideo - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 2 - **Forks**: 2 - **Created**: 2024-08-21 - **Last Updated**: 2025-08-13 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # CogVideo && CogVideoX [Read this in English.](./README_zh) [日本語で読む](./README_ja.md)
🤗 在 CogVideoX Huggingface Space 体验视频生成模型
📚 查看 论文
📍 前往 清影 和 API平台 体验更大规模的商业版视频生成模型。
## 项目更新 - 🔥🔥**News**: ```2024/8/20```: [VEnhancer](https://github.com/Vchitect/VEnhancer) 已经支持对 CogVideoX 生成的视频进行增强,实现更高分辨率,更高质量的视频渲染。欢迎大家按照[教程](tools/venhancer/README_zh.md)体验使用。 - 🔥**News**: ```2024/8/15```: CogVideoX 依赖中`SwissArmyTransformer`依赖升级到`0.4.12`, 微调不再需要从源代码安装`SwissArmyTransformer`。同时,`Tied VAE` 技术已经被应用到 `diffusers` 库中的实现,请从源代码安装 `diffusers` 和 `accelerate` 库,推理 CogVdideoX 仅需 12GB显存。推理代码需要修改,请查看 [cli_demo](inference/cli_demo.py) - 🔥 **News**: ```2024/8/12```: CogVideoX 论文已上传到arxiv,欢迎查看[论文](https://arxiv.org/abs/2408.06072)。 - 🔥 **News**: ```2024/8/7```: CogVideoX 已经合并入 `diffusers` 0.30.0版本,单张3090可以推理,详情请见[代码](inference/cli_demo.py)。 - 🔥 **News**: ```2024/8/6```: 我们开源 **3D Causal VAE**,用于 **CogVideoX-2B**,可以几乎无损地重构视频。 - 🔥 **News**: ```2024/8/6```: 我们开源 CogVideoX 系列视频生成模型的第一个模型, **CogVideoX-2B**。 - 🌱 **Source**: ```2022/5/19```: 我们开源了 CogVideo 视频生成模型(现在你可以在 `CogVideo` 分支中看到),这是首个开源的基于 Transformer 的大型文本生成视频模型,您可以访问 [ICLR'23 论文](https://arxiv.org/abs/2205.15868) 查看技术细节。 **性能更强,参数量更大的模型正在到来的路上~,欢迎关注** ## 目录 跳转到指定部分: - [快速开始](#快速开始) - [SAT](#sat) - [Diffusers](#Diffusers) - [CogVideoX-2B 视频作品](#cogvideox-2b-视频作品) - [CogVideoX模型介绍](#模型介绍) - [完整项目代码结构](#完整项目代码结构) - [Inference](#inference) - [SAT](#sat) - [Tools](#tools) - [开源项目规划](#开源项目规划) - [模型协议](#模型协议) - [CogVideo(ICLR'23)模型介绍](#cogvideoiclr23) - [引用](#引用) ## 快速开始 ### 提示词优化 在开始运行模型之前,请参考[这里](inference/convert_demo.py) 查看我们是怎么使用GLM-4(或者同级别的其他产品,例如GPT-4) 大模型对模型进行优化的,这很重要, 由于模型是在长提示词下训练的,一个好的提示词直接影响了视频生成的质量。 ### SAT 查看sat文件夹下的[sat_demo](sat/README.md):包含了 SAT 权重的推理代码和微调代码,推荐基于此代码进行 CogVideoX 模型结构的改进,研究者使用该代码可以更好的进行快速的迭代和开发。 (18 GB 推理, 40GB lora微调) ### Diffusers ``` pip install -r requirements.txt ``` 查看[diffusers_demo](inference/cli_demo.py):包含对推理代码更详细的解释,包括各种关键的参数。(24GB 推理,微调代码正在开发) ## CogVideoX-2B 视频作品A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting.
The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from its tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
A street artist, clad in a worn-out denim jacket and a colorful bandana, stands before a vast concrete wall in the heart, holding a can of spray paint, spray-painting a colorful bird on a mottled wall.
In the haunting backdrop of a war-torn city, where ruins and crumbled walls tell a story of devastation, a poignant close-up frames a young girl. Her face is smudged with ash, a silent testament to the chaos around her. Her eyes glistening with a mix of sorrow and resilience, capturing the raw emotion of a world that has lost its innocence to the ravages of conflict.