Flash Attention 3, compatible with torch.compile. See this PR by guilhermeleobas for more details.
This repository provides builds for Torch 2.8, 2.9, 2.10, 2.11, and 2.12, across CUDA 12.6, 12.8, and 13.0.
The recommended way to use these builds is through Hugging Face's kernels library. You can see an example of this build being used in modded-nanogpt.
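As a rough illustration, loading a prebuilt FA3 kernel through the kernels library looks like the sketch below. The repo id is a placeholder (substitute the repo this wheel is actually published under), and the assumption that the loaded module exposes flash_attn_func with the usual (batch, seqlen, nheads, headdim) layout is mine, not something this card guarantees:

# Sketch: load a prebuilt FA3 kernel via Hugging Face's `kernels` library.
import torch
from kernels import get_kernel

# Placeholder repo id -- replace with the repo this wheel is published under.
fa3 = get_kernel("some-user/flash-attn3-hopper")

q = torch.randn(1, 1024, 8, 128, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Assumes the kernel module exposes flash_attn_func, as upstream FA3 does.
out = fa3.flash_attn_func(q, k, v, causal=True)
out = out[0] if isinstance(out, tuple) else out  # some FA3 revisions return (out, lse)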
Reproduce:
Torch 2.8.0 Build
Compiled from https://github.com/varunneal/flash-attention on branch guilhermeleobas/fa3-compile.
Compilation commands:
pip install -U pip wheel setuptools ninja numpy packaging psutil
pip install torch==2.8.0
git clone https://github.com/varunneal/flash-attention
cd flash-attention/hopper
git switch fa3-compile
export MAX_JOBS=32
export FLASH_ATTENTION_FORCE_BUILD=TRUE # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE # leave BF16, FP8
# Optional: build only head dim 128, for faster compilation
export FLASH_ATTENTION_DISABLE_HDIM64=TRUE
export FLASH_ATTENTION_DISABLE_HDIM96=TRUE
export FLASH_ATTENTION_DISABLE_HDIM192=TRUE
export FLASH_ATTENTION_DISABLE_HDIM256=TRUE
python setup.py bdist_wheel
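Once the build finishes, the wheel is written to dist/ and can be installed with pip install dist/*.whl. A minimal sanity check of the torch.compile path, sketched below, assumes BF16 inputs and head dim 128 (matching the flags above) and that the wheel exposes flash_attn_interface.flash_attn_func as upstream FA3 does:

# Sketch: smoke-test the freshly built wheel under torch.compile.
import torch
from flash_attn_interface import flash_attn_func  # provided by the built wheel

def attn(q, k, v):
    out = flash_attn_func(q, k, v, causal=True)
    # Some FA3 revisions return (out, lse) rather than just out.
    return out[0] if isinstance(out, tuple) else out

compiled_attn = torch.compile(attn, fullgraph=True)  # fullgraph=True surfaces graph breaks

q = torch.randn(2, 4096, 8, 128, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

out = compiled_attn(q, k, v)
ref = torch.nn.functional.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
).transpose(1, 2)
print((out - ref).abs().max())  # expect bf16-level error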
Torch Nightlies Build
Compiled from https://github.com/varunneal/flash-attention on branch stable.
This is a custom fork that combines ABI compatibility with torch.compile compatibility.
This build should be consistent with Torch nightlies from 08/30 onward.
Compilation commands:
pip install -U pip wheel setuptools ninja numpy packaging psutil
# Any Torch nightly from 08/30 onward should work
pip install --pre "torch==2.10.0.dev20250926+cu126" --index-url https://download.pytorch.org/whl/nightly/cu126
git clone https://github.com/varunneal/flash-attention
cd flash-attention/hopper
git switch stable
export MAX_JOBS=32
export FLASH_ATTENTION_FORCE_BUILD=TRUE # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE # leave BF16, FP8
python setup.py bdist_wheel
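Because this branch targets a stable ABI, a wheel built against one nightly should keep importing after upgrading to a later nightly (from 08/30 onward). A quick check, as a sketch:

# Sketch: confirm the installed wheel loads against the currently installed nightly.
import torch
import flash_attn_interface

print(torch.__version__, flash_attn_interface.__file__)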
CUDA 13.0 Build
Compiled from the official Dao-AILab/flash-attention main branch, commit 0f82fea (Feb 18, 2026).
Compilation commands:
pip install -U pip wheel setuptools ninja numpy packaging psutil
pip install torch # any torch >= 2.9
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper
export MAX_JOBS=32
export FLASH_ATTENTION_FORCE_BUILD=TRUE # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE # leave BF16, FP8
python setup.py bdist_wheel
Tips for ARM builds
On an aarch64/ARM64 system, such as a GH200 server, building requires a bit of finesse. Try:
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
export MAX_JOBS=4
Please contact me if you would like me to build wheels for any other version of Python or Torch.