arxiv:2601.21709

Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis

Published on Jan 29 · Submitted by Charlie_li on Feb 2
Abstract

AI-generated summary: Temporal Attention Pattern Predictability Analysis (TAPPA) provides a unified framework for understanding attention patterns in large language models by analyzing their mathematical formulations from a temporal perspective, distinguishing predictable from unpredictable patterns based on query self-similarity.

Attention patterns play a crucial role in both training and inference of large language models (LLMs). Prior works have identified individual patterns such as retrieval heads, sink heads, and diagonal traces, yet these observations remain fragmented and lack a unifying explanation. To bridge this gap, we introduce Temporal Attention Pattern Predictability Analysis (TAPPA), a unifying framework that explains diverse attention patterns by analyzing their underlying mathematical formulations from a temporally continuous perspective. TAPPA both deepens the understanding of attention behavior and guides inference acceleration approaches. Specifically, TAPPA characterizes attention patterns as predictable patterns with clear regularities and unpredictable patterns that appear effectively random. Our analysis further reveals that this distinction can be explained by the degree of query self-similarity along the temporal dimension. Focusing on the predictable patterns, we further provide a detailed mathematical analysis of three representative cases through the joint effect of queries, keys, and Rotary Positional Embeddings (RoPE). We validate TAPPA by applying its insights to KV cache compression and LLM pruning tasks. Across these tasks, a simple metric motivated by TAPPA consistently improves performance over baseline methods. The code is available at https://github.com/MIRALab-USTC/LLM-TAPPA.
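As a rough illustration of the kind of quantity the paper's predictability distinction rests on, the sketch below scores each attention head by the mean cosine similarity between query vectors at adjacent time steps. This is a minimal stand-in, not the metric defined in the paper; the function name, tensor shapes, and the choice of adjacent-step cosine similarity are our own assumptions.

```python
import torch

def query_self_similarity(q: torch.Tensor) -> torch.Tensor:
    """Score each head by the mean cosine similarity of query vectors
    at adjacent time steps (illustrative proxy only; not the metric
    defined in the paper).

    q: (num_heads, seq_len, head_dim) query states for one layer.
    Returns: (num_heads,) scores; values near 1 suggest a head whose
    pattern is temporally predictable, values near 0 one that appears
    effectively random.
    """
    q = torch.nn.functional.normalize(q, dim=-1)  # unit-norm queries
    sim = (q[:, 1:] * q[:, :-1]).sum(dim=-1)      # cosines of temporal neighbors
    return sim.mean(dim=1)

# Toy usage with random stand-in query states.
heads, seq_len, head_dim = 8, 128, 64
q = torch.randn(heads, seq_len, head_dim)
print(query_self_similarity(q))  # one score per head
```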

Community

Paper submitter

We systematically analyze attention patterns from a unified temporal perspective and find that the temporal self-similarity of embeddings, together with RoPE, underlies the streaming, retrieval, seasonal, and reaccess attention patterns. We further apply time-series analysis methodologies to study attention scores and their dynamics.
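Since attention scores form a series over query positions, standard time-series diagnostics apply directly. As a hedged sketch (our own construction, not code from the paper's repository), the snippet below computes the autocorrelation of the attention weight a fixed key position receives as decoding advances; a peak at lag k would hint at a "seasonal" pattern with period k.

```python
import numpy as np

def attention_autocorrelation(scores: np.ndarray, max_lag: int = 32) -> np.ndarray:
    """Autocorrelation of one head's attention-score series.

    scores: (seq_len,) attention weight a fixed key position receives
    as the query position advances. Returns values for lags 1..max_lag.
    """
    x = scores - scores.mean()
    denom = (x * x).sum()
    return np.array([(x[: len(x) - k] * x[k:]).sum() / denom
                     for k in range(1, max_lag + 1)])

# Toy series with period 16 as a stand-in for real attention scores.
t = np.arange(256)
scores = 0.5 + 0.4 * np.sin(2 * np.pi * t / 16)
acf = attention_autocorrelation(scores)
print(int(acf.argmax()) + 1)  # strongest lag; ~16 for this toy input
```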

