
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning

Chen Shani (Stanford University), Liron Soffer (Tel Aviv University), Dan Jurafsky (Stanford University), Yann LeCun (New York University; Meta - FAIR), Ravid Shwartz-Ziv (New York University; Wand.AI)

Abstract

Humans organize knowledge into compact categories through semantic compression by mapping diverse instances to abstract representations while preserving meaning (e.g., robin and blue jay are both birds; most birds can fly). These concepts reflect a trade-off between expressive fidelity and representational simplicity. Large Language Models (LLMs) demonstrate remarkable linguistic abilities, yet whether their internal representations strike a human-like trade-off between compression and semantic fidelity is unclear. We introduce a novel information-theoretic framework, drawing from Rate-Distortion Theory and the Information Bottleneck principle, to quantitatively compare these strategies. Analyzing token embeddings from a diverse suite of LLMs against seminal human categorization benchmarks, we uncover key divergences. While LLMs form broad conceptual categories that align with human judgment, they struggle to capture the fine-grained semantic distinctions crucial for human understanding. More fundamentally, LLMs demonstrate a strong bias towards aggressive statistical compression, whereas human conceptual systems appear to prioritize adaptive nuance and contextual richness, even if this results in lower compressional efficiency by our measures. These findings illuminate critical differences between current AI and human cognitive architectures, guiding pathways toward LLMs with more human-aligned conceptual representations.


Chen Shani Stanford University cshani@stanford.edu

Liron Soffer Tel Aviv University lironso@mail.tau.ac.il

Humans organize knowledge into compact conceptual categories that balance compression with semantic richness. Large Language Models (LLMs) exhibit impressive linguistic abilities, but whether they navigate this same compression-meaning trade-off remains unclear. We apply an Information Bottleneck framework to compare human conceptual structure with embeddings from 40+ LLMs using classic categorization benchmarks (Rosch, 1973a; 1975; McCloskey & Glucksberg, 1978). We find that LLMs broadly align with human category boundaries, yet fall short on fine-grained semantic distinctions. Unlike humans, who maintain 'inefficient' representations that preserve contextual nuance, LLMs aggressively compress, achieving more optimal information-theoretic compression at the cost of semantic richness. Surprisingly, encoder models outperform much larger decoder models in human alignment, suggesting that understanding and generation rely on distinct representational mechanisms. Training-dynamics analysis reveals a two-phase trajectory: rapid initial concept formation followed by architectural reorganization, during which semantic processing migrates from deep to mid-network layers as the model discovers increasingly efficient, sparser encodings. These divergent strategies, where LLMs optimize for compression and humans for adaptive utility, reveal fundamental differences between artificial and natural intelligence. This highlights the need for models that preserve the conceptual 'inefficiencies' essential for human-like understanding.

The Enigma of Meaning in Large Language Models

'The categories defined by constructions in human languages may vary from one language to the next, but they are mapped onto a common conceptual space, which represents a common cognitive heritage, indeed the geography of the human mind.' (Croft, 2001, p. 139)

Humans excel at organizing knowledge into concepts: compact categories that achieve remarkable compression while preserving essential meaning (Murphy, 2004). A single word like 'bird' compresses information about thousands of species, yet maintains critical semantic properties (can fly, has feathers, lays eggs). This hierarchical organization (robin → bird → animal; Rosch et al. 1976) represents a fundamental cognitive achievement: balancing efficiency with semantic fidelity.

Large Language Models (LLMs) demonstrate striking linguistic capabilities that suggest semantic understanding (Singh et al., 2024; Li et al., 2024). Yet a critical question remains unanswered: Do LLMs navigate the compression-meaning trade-off similarly to humans, or do they employ fundamentally different representational strategies? This question matters because true understanding, which goes beyond surface-level mimicry, requires representations that balance statistical efficiency with semantic richness (Tversky, 1977; Rosch, 1973b).

To address this question, we apply Rate-Distortion Theory (Shannon, 1948) and Information Bottleneck principles (Tishby et al., 2000) to systematically compare LLM and human conceptual structures. We digitize and release seminal cognitive psychology datasets (Rosch, 1973b; 1975; McCloskey & Glucksberg, 1978), foundational studies that shaped our understanding of human categorization but were previously unavailable in machine-readable form. These benchmarks, comprising 1,049 items across 34 categories with both membership and typicality ratings, offer unprecedented empirical grounding for evaluating whether LLMs truly understand concepts as humans do. They also offer considerably higher-quality data than typical modern crowdsourced collections.

Analyzing embeddings from 40+ diverse LLMs against these benchmarks, we uncover a fundamental divergence: LLMs and humans employ different strategies when balancing compression with meaning. While LLMs achieve broad categorical alignment with human judgment, they optimize for aggressive statistical compression at the expense of semantic nuance. Humans maintain 'inefficient' representations that preserve rich, multidimensional structure essential for flexible reasoning.

This divergence manifests across three dimensions. First, LLMs capture categorical boundaries but miss fine-grained semantic distinctions like item typicality central to human understanding. Second, our information-theoretic analysis reveals LLMs achieve mathematically 'optimal' compression-distortion trade-offs, while human categories appear suboptimal. Third, encoder models surprisingly outperform decoder models in human alignment despite smaller scales, indicating that understanding and generation may require fundamentally different representational strategies.

Through analysis of OLMo-7B across 57 training checkpoints, we further uncover how these strategies emerge during learning: conceptual structure develops via rapid initial formation followed by architectural reorganization, with semantic processing migrating from deep to mid-network layers as models discover increasingly efficient encodings.

These findings challenge the assumption that statistical optimality equals understanding. The apparent 'inefficiency' of human concepts may reflect optimization for adaptive flexibility. Our framework and newly-digitized benchmarks provide essential tools for monitoring this critical balance, guiding development toward AI systems that achieve not just compression, but comprehension.

Research Questions and Scope

Prior work has explored LLM conceptual representations through multiple lenses: relational knowledge (Shani et al., 2023; Misra et al., 2021), interpretable concept extraction (Hoang-Xuan et al., 2024; Maeda et al., 2024), sparse activation patterns (Li et al., 2024), and embedding geometry including hierarchical structures (Park et al., 2024). While insightful, these studies often lack deep, quantitative comparison of the compression-meaning trade-off using information theory against rich human cognitive benchmarks.

Separately, cognitive science has applied information theory to human concept learning (Imel & Zaslavsky, 2024; Tucker et al., 2025; Zaslavsky et al., 2018; Sorscher et al., 2022). For example, Zaslavsky et al. (2018) developed an Information Bottleneck framework for color naming efficiency, later extended to animal taxonomies (Zaslavsky et al., 2019). Yet these cognitive studies typically proceed without connecting to modern LLMs, and tend to focus on a specific domain. One notable example is Wu et al. (2025), which examined abstraction transfer in humans and LLMs using a behavioral and cognitive modeling level. Our work is different in the sense that it analyzes how information is preserved or distorted inside LLM embedding spaces under controlled clustering transformations.

These two streams, LLM conceptual analysis and cognitive information theory, rarely intersect. We bridge this gap through rigorous comparison of how LLMs and humans navigate the compression-meaning trade-off, grounding our analysis in established cognitive benchmarks. This leads to three research questions:

[RQ1] To what extent do LLM-emergent concepts align with human-defined categories?

[RQ2] Do LLMs exhibit human-like internal structure, particularly item typicality?

[RQ3] How do LLMs and humans compare in balancing compression against meaning preservation?

Our framework approaches each RQ through a unified lens. [RQ1] examines the categorical alignment, or how information is compressed into discrete groups. [RQ2] probes internal structure, which means how semantic meaning is preserved within categories. [RQ3] employs our full L objective to evaluate the integrated trade-off. This progression from compression to preservation to their balance mirrors the fundamental challenge both systems face: creating representations that are simultaneously efficient and meaningful.

Figure 1 overviews the data generation and analyses.

Figure 1: Overview of the data generation and analyses. Human data was collected by asking whether an item i (e.g., chair) is a good example of the concept C (furniture). These ratings are aggregated into ranked similarity profiles for each category. Models generate analogous scores using their embeddings. We then compute three metrics: [RQ1] MI to assess category recoverability, [RQ2] Spearman correlation to measure alignment with human internal typicality structure, and [RQ3] a rate-distortion objective capturing the trade-off between representation complexity and meaning preservation.

Benchmarking Against Human Cognition

Investigating LLM-human conceptual alignment requires robust benchmarks and diverse models. This section details both components.

Human Baselines: Empirical Data from Seminal Cognitive Science

We draw on three foundational studies that shaped our understanding of human categorization. Unlike many noisy modern crowdsourced datasets, these classic benchmarks were carefully curated by experts, capturing deep cognitive patterns. We focus on three influential works:

Rosch (1973): This foundational work (Rosch, 1973a) explored semantic categories as part of the research program leading to prototype theory (Rosch, 1973b).1 The theory posits that categories organize around 'prototypical' members rather than strict, equally shared features. The dataset includes 48 items in eight common semantic categories (e.g., furniture, bird), with prototypicality rankings (e.g., 'robin' as typical bird, 'bat' as atypical).

Rosch (1975): Building on prototype theory, Rosch (1975) further detailed how semantic categories are cognitively represented. This work provides typicality ratings for a larger set of 552 items across ten categories (e.g., 'orange' as a prototypical fruit, 'squash' as less so).

McCloskey & Glucksberg (1978): This study investigated the 'fuzzy' boundaries of natural categories, showing that membership is graded rather than absolute (McCloskey & Glucksberg, 1978). It covers 449 items in 18 categories with both typicality scores and membership certainty ratings (e.g., 'dress' is typical clothing, 'bandaid' less so).

While originating from different researchers, these datasets share rigorous experimental designs and provide data on both category assignments and item typicality. We aggregated data from these studies, creating a unified benchmark of 1,049 items across 34 categories. This data, which we have digitized and made publicly available (Appendix C), offers a high-quality empirical foundation for evaluating the human-likeness of LLMs. 2

Large Language Models Under Study

We analyze 40+ diverse LLMs spanning multiple architectures and scales (300M to 72B parameters) to understand how conceptual representation varies across model design choices. We note that our analysis requires access to LLMs' embeddings rather than just their outputs. Thus, we are unable to use closed-source frontier models such as GPT-5 and Claude.

Model Selection. Our study encompasses three architectural paradigms. Encoder models include the BERT family (Devlin et al., 2019; He et al., 2020; Zhuang et al., 2021) and CLIP ViT text encoders (Radford et al., 2021). Decoder models form the majority of our analysis: the Llama family (1B-70B; Touvron et al., 2023a;b; Grattafiori et al., 2024), Gemma variants (2B-27B; Team et al., 2024; 2025), Qwen models (0.5B-72B; Bai et al., 2023; Yang et al., 2024), Phi series (Javaheripi et al., 2023; Abdin et al., 2024; Abouelenin et al., 2025), Mistral-7B (Jiang et al., 2023), GPT-2 (Radford et al., 2019), and OLMo-7B (Groeneveld et al., 2024). We also include the classic embedding models Word2Vec (Mikolov et al., 2013a;b) and GloVe (Pennington et al., 2014) as baselines.

This diverse selection enables us to disentangle effects of architecture (encoder vs. decoder), scale (300M to 72B), and training objectives (understanding vs. generation). Note that encoder-only models are less represented (and are smaller) since recent LLM development has prioritized decoder-only architectures. Complete model specifications appear in Appendix D.

Embedding Extraction. We extract representations at two levels to capture different aspects of conceptual knowledge: (1) static embeddings from input layers (E matrix), capturing context-free lexical knowledge directly comparable to isolated words in human categorization experiments; and (2) contextual embeddings from hidden layers using controlled prompts, revealing how context shapes conceptual structure across network depth.

This dual approach allows us to trace how concepts emerge from basic lexical knowledge to contextualized understanding. Critically, our results prove robust to prompt templates and pooling strategies (Appendix E). Moreover, despite substantial vocabulary overlap between model families, neither token count nor tokenization patterns correlate with our results (Appendix J).

1 Prototype theory is only one account of how humans form concepts; exemplar theory offers an alternative based on stored instances. We do not aim to adjudicate between theories, but use this framework because it provides structured data suitable for modeling. Our computational analysis remains compatible with alternative accounts.

2 Appendix I shows that polysemy is rare in our data and cannot account for our findings.

A Framework for Comparing Compression and Meaning

To quantitatively compare how LLMs and humans navigate the fundamental tension between compact representation and semantic richness, we develop a framework that captures both aspects of conceptual organization. Our approach adapts Rate-Distortion Theory (Shannon, 1948) and the Information Bottleneck principle (Tishby et al., 2000) to measure the quality of conceptual systems.

Theoretical Foundations

Human concepts achieve remarkable efficiency: the word 'bird' compresses knowledge about thousands of species into a single category, yet preserves critical semantic information (can fly, has feathers, lays eggs). This reflects a fundamental trade-off that any conceptual system must navigate:

· Compression: Grouping diverse items into manageable categories (fewer bits needed) · Meaning Preservation: Maintaining semantic coherence within groups

Rate-Distortion Theory (RDT; Shannon, 1948) formalizes this trade-off for lossy compression. Given data X and a compressed representation X̂, RDT seeks encodings that minimize:

$$
R + \lambda D,
$$

where R is the rate (bits required), D is the distortion (information lost), and λ controls their trade-off.

Information Bottleneck (IB; Tishby et al., 2000) extends this by compressing X into Z while preserving information about a relevant variable Y:

$$
\min_{p(z|x)} \; I(X;Z) - \beta \, I(Z;Y).
$$

Our adaptation: For conceptual representation, we lack an external relevance variable Y. Instead, 'relevance' becomes internal semantic coherence: how well categories preserve within-group similarity. We thus combine RDT's geometric distortion with IB's information-theoretic compression, yielding a framework in which a clustering C represents items X by minimizing both the information needed to specify items (compression) and the semantic spread within clusters (distortion).

4.2 The L Objective: Quantifying the Trade-off

We formalize how a clustering C represents items X through an objective that combines information-theoretic compression with geometric coherence:

$$
L(C) = I(X;C) + \beta \, D(C),
$$

where β weights the relative importance of compression versus coherence.

The Complexity Term: Measuring Compression

Complexity quantifies how much information the clustering preserves about individual items through mutual information I ( X ; C ) . Intuitively, if knowing an item's cluster tells us little about which specific item it is, compression is high (low complexity).

Given |X| items partitioned into clusters of sizes {|C_c|}:

$$
I(X;C) = \log |X| \;-\; \sum_{c} \frac{|C_c|}{|X|} \log |C_c|.
$$

This equals the reduction in uncertainty about item identity when told its cluster. A single cluster containing all |X| items minimizes complexity (maximum compression), while singleton clusters maximize it (no compression).
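Because the complexity term depends only on cluster sizes, it can be computed in a few lines. The following sketch (the function name is ours, not from the paper's released code) assumes a uniform prior over items and measures information in bits:

```python
import math

def complexity_bits(cluster_sizes):
    """Mutual information I(X; C) in bits for a partition of items into
    clusters, assuming each item is equally likely:
    I(X;C) = log2|X| - sum_c (|C_c|/|X|) * log2|C_c|."""
    n = sum(cluster_sizes)
    return math.log2(n) - sum((s / n) * math.log2(s) for s in cluster_sizes)

# One all-inclusive cluster gives maximum compression (0 bits)...
print(complexity_bits([8]))       # -> 0.0
# ...while singleton clusters preserve full item identity (log2 8 = 3 bits).
print(complexity_bits([1] * 8))   # -> 3.0
# Two equal clusters sit in between.
print(complexity_bits([4, 4]))    # -> 1.0
```

The two extremes match the text: compression is maximal when the cluster reveals nothing about the item, and minimal when every item has its own cluster.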

The Distortion Term: Measuring Semantic Coherence

Distortion captures how well clusters preserve semantic relationships and meanings by measuring the average squared distance between items and their cluster centroids in embedding space (spread):

$$
D(C) = \sum_{c} \frac{|C_c|}{|X|} \, \sigma_c^2,
$$

where σ_c² = (1/|C_c|) ∑_{e_i ∈ C_c} ‖e_i − ē_c‖² is the variance within cluster c, and ē_c is its centroid.

Low distortion indicates tight, semantically coherent clusters with similar embeddings. This geometric measure directly captures what we intuitively mean by 'meaningful' categories: robins and sparrows cluster tightly as similar birds, while bats would increase distortion if grouped with them.
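Distortion, in turn, needs only the embeddings and their cluster assignments. A minimal pure-Python sketch (the toy 2-D vectors and the 'bat' example are illustrative; real embeddings are high-dimensional):

```python
def distortion(embeddings, labels):
    """Average squared distance between each item's embedding and its
    cluster centroid: D = (1/|X|) * sum_i ||e_i - centroid(c(i))||^2."""
    clusters = {}
    for e, c in zip(embeddings, labels):
        clusters.setdefault(c, []).append(e)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(v[d] for v in members) / len(members) for d in range(dim)]
        for v in members:
            total += sum((v[d] - centroid[d]) ** 2 for d in range(dim))
    return total / len(embeddings)

# A tight cluster of 'birds' has low spread; adding a distant 'bat'
# to the same cluster raises distortion.
birds = [[1.0, 0.0], [1.1, 0.0], [0.9, 0.0]]
print(distortion(birds, [0, 0, 0]))               # small
print(distortion(birds + [[0.0, 1.0]], [0] * 4))  # larger
```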

Connecting Framework to Research Questions

Our framework provides unified metrics for all three research questions:

[RQ1] Categorical Alignment: How do LLMs and humans partition semantic space? The Complexity term I ( X ; C ) directly measures this by quantifying how many bits are needed to specify individual items given their clusters. Comparing I ( X ; C Human ) with I ( X ; C LLM ) reveals whether both systems create similarly-sized groupings with comparable compression rates. Higher mutual information means finer-grained categories; lower means broader, more compressed groupings.

[RQ2] Internal Semantic Structure: Do LLMs capture human-like typicality signal? The Distortion term measures how well clusters preserve semantic coherence. In our typicality analysis, we check whether typical items cluster tightly near centroids while atypical items lie farther away. Low distortion with clear center-periphery structure indicates prototype organization that mirrors human cognitive structure.

[RQ3] Compression-Meaning Trade-off: How do different systems balance efficiency against semantic fidelity? The complete L objective reveals fundamental optimization strategies. By varying K (number of clusters) and computing L curves, we uncover system priorities: aggressive compressors rapidly achieve low L values by sacrificing nuance, while systems preserving semantic richness maintain higher L to retain meaningful distinctions. The shape and level of these curves expose whether a system optimizes for statistical efficiency or cognitive utility.

An Empirical Investigation of Representational Strategies

Building on our information-theoretic framework (Section 4) and established benchmarks (Section 3.1), we empirically investigate how LLMs and humans navigate the compression-meaning trade-off. For each analysis, we examine both static embeddings and contextual embeddings (across all hidden layers), revealing how and when context shapes conceptual organization.

5.1 [RQ1] The Big Picture: Alignment of Conceptual Categories

We first investigate whether LLMs form conceptual categories aligned with humans , which examines how information is compressed into discrete groups (the complexity term in our framework).

Figure 2: LLMs capture categorical boundaries (RQ1 AMI scores) but miss internal geometry (RQ2 Spearman correlations). Left: All 40+ models achieve above-chance AMI with human categories, with encoder architectures (squares, circles, and Xs) matching or exceeding decoder models 100× larger (stars). Results show the layer with peak AMI score per model (see static and mean scores in Figure A.8). Right: Despite categorical success, models show weak correlations ( ρ < 0 . 2 for most) with human typicality judgments, revealing divergent representational strategies. This divergence between capturing boundaries (compression) while missing internal structure (meaning) reveals how LLMs and humans fundamentally differ in their representational strategies. We note that encoder models align more than decoder models, but these correlations are still modest. Computed using the static embeddings, see full results in Tables 3-4 in Appendix N.



LLM-derived clusters significantly align with human-defined conceptual categories, suggesting they capture key aspects of human conceptual organization. Surprisingly, certain encoder models exhibit strong alignment, sometimes outperforming much larger models, highlighting that factors beyond sheer scale influence human-like categorical abstraction.

Approach: We tested whether LLMs naturally organize our 1,049 items into categories resembling human conceptual structure. Token embeddings were extracted at two levels: (i) static embeddings from input layers (E matrix), representing context-free lexical knowledge; (ii) contextual embeddings from all hidden layers, measured layer-wise to identify peak conceptual alignment. These were clustered using k-means ( K matching human category counts) and evaluated against human categories using Adjusted Mutual Information (AMI), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI) metrics. NMI quantifies how much information is shared between the model-derived clusters and the human-labeled categories; AMI refines this measure by correcting for the amount of overlap that would be expected by chance; and ARI assesses the degree of agreement between the two partitions while explicitly accounting for random assignments, providing a complementary view of clustering accuracy.
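To make the clustering-agreement metrics concrete, the sketch below implements plain NMI from scratch on a toy labeling (the labels are invented, not benchmark data; in practice one would use scikit-learn's adjusted_mutual_info_score, which additionally applies AMI's chance correction):

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings, normalized
    by the arithmetic mean of the two entropies (bits cancel out)."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    def h(counts):
        return -sum((c / n) * math.log2(c / n) for c in counts.values())
    mi = sum((c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in joint.items())
    denom = (h(pa) + h(pb)) / 2
    return mi / denom if denom > 0 else 1.0

human = ['bird', 'bird', 'furniture', 'furniture']
model = [0, 0, 1, 1]             # identical partition, different label names
print(nmi(human, model))         # -> 1.0
print(nmi(human, [0, 1, 0, 1]))  # independent partition -> 0.0
```

Because NMI compares partitions rather than label names, a model-derived clustering needs no mapping onto human category names before scoring.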

Broad Categorical Agreement: All 40+ models achieve significant above-chance alignment (Figure 2 (Left)). Even at baseline, static embeddings show substantial alignment (mean AMI ≈ 0 . 45 ), which contextual processing enhances to peak AMI ≈ 0 . 55 . This confirms that LLMs encode human-like categorical boundaries. Full NMI and ARI results in Appendices K-M.

Architecture Matters More Than Scale: Surprisingly, BERT-large-uncased (340M parameters) achieves AMI = 0 . 60 , matching or exceeding models 100× larger. Classic Word2Vec and GloVe, despite having only static embeddings and predating modern architectures by years, reach AMI scores rivaling contemporary LLMs' peak performance. This suggests that fundamental semantic structure emerges from relatively simple distributional learning, with encoder architectures particularly effective at capturing human-like categories regardless of scale.

5.2 [RQ2] Delving Deeper: Fidelity to Fine-Grained Semantics

Having established broad categorical alignment, we now examine whether LLMs capture the internal semantic structure of categories. Specifically, we check how meaning is preserved within clusters .

Key Finding: Limited Capture of Semantic Nuance

While LLMs effectively form broad conceptual categories, their internal representations demonstrate only modest alignment with human-perceived fine-grained semantic distinctions, such as item typicality or psychological distance to category prototypes. This suggests a divergence in how LLMs and humans structure information within concepts.

Approach: We test whether LLMs encode human-like typicality signals, i.e., whether robins are more 'birdy' than penguins. For each item, we compute the cosine similarity between its embedding and that of its category name (e.g., 'robin' → 'bird': cosine sim(E(robin), E(bird))). We then compare these similarities with human typicality ratings using Spearman's correlation coefficient ρ (Wissler, 1905).

We employed two analysis approaches: (i) static-layer analysis using embeddings directly from the input layer; (ii) peak AMI layer analysis using contextual embeddings from the layer that maximized AMI in RQ1. For the peak AMI approach, we extracted category embeddings by replacing items with category names in the same prompt template, ensuring consistent contextualization (see Appendix F for template and pooling robustness analysis).
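The typicality comparison reduces to two ingredients: cosine similarity in embedding space and Spearman's rank correlation. A self-contained sketch with invented 2-D embeddings and invented ratings (real analyses would use scipy.stats.spearmanr, which also handles ties):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (no tie handling)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0.0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical item embeddings compared against a 'bird' direction.
category = [1.0, 0.0]
items = {'robin': [0.95, 0.1], 'sparrow': [0.9, 0.2], 'penguin': [0.4, 0.8]}
model_sims = [cosine(e, category) for e in items.values()]
human_typicality = [0.9, 0.8, 0.2]  # invented ratings for illustration
print(spearman(model_sims, human_typicality))  # same ordering -> 1.0
```

A high ρ would mean the model's item-to-category geometry orders items the way human raters do; the paper's finding is that for most decoder models this correlation is weak.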

Weak Typicality Alignment: Correlations between LLM internal organization of concepts and human typicality are modest at best (Figure 2 (Right); Tables 3-4 in Appendix N). Static embeddings show weak correlations: BERT achieves ρ = 0 . 38 ( p < 0 . 05 ), while most decoder models fall below ρ = 0 . 15 . Even when statistically significant, these correlations indicate limited correspondence with human judgments. This shows that the internal concept geometries of models differ from those of humans, with representation-focused models aligning more closely.

Architectural Patterns: Several clear patterns emerge in how different architectures capture typicality. Representation-focused models (Word2Vec, GloVe) and most encoder models (both ViT encoders, BERT-large) demonstrate stronger static-layer performance than decoder-only models (Llama, Gemma, Qwen families). Static correlations range from ρ ≈ 0 . 25 -0 . 40 for representationfocused models versus ρ < 0 . 15 for most decoders.

This divergence likely stems from training objectives: models explicitly trained for representation learning appear more effective at capturing semantic category relationships in their embeddings, while modern decoder-only models, which are optimized primarily for next-token prediction, show consistently lower static-layer correlations. The pattern holds across model scales, suggesting architectural design matters more than size for capturing fine-grained semantic similarity.

Layer-wise Analysis Reveals a Trade-off: Comparing static and peak AMI layers exposes an architectural limitation. Peak AMI layers, which are optimal for clustering, show systematically weaker typicality correlations than static layers. This pattern holds across model families: layers that best separate categories (RQ1) poorly preserve within-category structure (RQ2). The implication is clear: current architectures encode different aspects of meaning at different depths, forcing applications to choose between broad categorization and semantic nuance.

Interpretation: The divergence between LLMs and humans reflects fundamentally different organizational principles. Humans judge typicality through rich, multidimensional criteria: robins are typical birds due to size, flight ability, song, frequency of encounter, etc. This creates graded categories with clear prototypes and cognitive structures that optimize flexible reasoning and generalization.

LLMs, in contrast, appear to encode flatter statistical associations between items and category labels. Although sufficient for categorization and fluent text generation, these representations miss the prototype structure that makes categories cognitively useful. This difference suggests that LLMs optimize for different objectives than human cognition, a hypothesis that we test directly in RQ3 by examining how each system balances compression against semantic preservation.

5.3 [RQ3] A Cognitive Intuition of the Compression-Meaning Trade-off

Having explored categorical alignment (RQ1) and internal semantic structure (RQ2), we now address our central question: How do LLM and human representational strategies compare when balancing compression against meaning preservation?

Key Finding: Superior Information-Theoretic Efficiency in LLMs

LLMs demonstrate markedly superior information-theoretic efficiency compared to human conceptual structures. Evaluated via our L objective, LLM-derived clusters consistently achieve more 'optimal' compression-meaning balance. Human conceptualizations, while richer, appear less statistically compact, suggesting optimization for cognitive flexibility over pure statistical efficiency.

Approach: We analyzed human-defined categories and LLM-derived clusters using our L objective function (Equation 3, β = 1 ) and mean cluster entropy ( S α ). For LLMs, we performed k-means clustering across various K values to trace the full compression-meaning frontier.
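Under our reading of the objective, L for a single candidate clustering combines the two terms defined earlier; sweeping k-means over K then traces the full frontier. A toy sketch (log base 2 and β = 1 are our assumptions; the data are invented):

```python
import math

def L_objective(embeddings, labels, beta=1.0):
    """L = I(X;C) + beta * Distortion for one candidate clustering
    (uniform prior over items; distortion is the mean squared distance
    to own-cluster centroids)."""
    n = len(embeddings)
    clusters = {}
    for e, c in zip(embeddings, labels):
        clusters.setdefault(c, []).append(e)
    # Complexity: I(X;C) = log2 n - sum_c (|C_c|/n) * log2 |C_c|
    complexity = math.log2(n) - sum(
        (len(m) / n) * math.log2(len(m)) for m in clusters.values())
    # Distortion: average squared distance to the cluster centroid
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        cent = [sum(v[d] for v in members) / len(members) for d in range(dim)]
        total += sum(sum((v[d] - cent[d]) ** 2 for d in range(dim))
                     for v in members)
    return complexity + beta * total / n

# Two well-separated toy groups; compare a coarse and a fine partition.
emb = [[0.0, 0.0], [0.3, 0.0], [3.0, 3.0], [3.3, 3.0]]
coarse = [0, 0, 0, 0]  # one cluster: zero complexity, high distortion
fine = [0, 0, 1, 1]    # matches the geometry: pays 1 bit, tiny distortion
print(L_objective(emb, coarse))
print(L_objective(emb, fine))  # lower L here: the better trade-off
```

With β = 1 the preferred partition depends on the embedding scale: when within-group spread is small relative to between-group distance, paying extra complexity bits for finer clusters lowers L, which is exactly the trade-off the K-sweep exposes.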

Results: Our analysis reveals three key patterns (Figure 3; full results in Appendix Q):

Higher human entropy. Human concepts consistently exhibit higher cluster entropy than LLM clusters at comparable K values, indicating less statistical compactness but greater internal diversity.

Lower LLM L scores. LLM-derived clusters achieve significantly lower L values than human categories across all tested K (Figure 3b). Since lower L signifies more optimal compression-distortion balance, LLMs are demonstrably more 'efficient' by this information-theoretic measure.

Architectural differences. The Complexity-Distortion plot (Figure 3a) reveals that encoder models (BERT, ViT, classic models) achieve superior trade-offs. Their distortion at any given complexity is lower compared to decoder models, across both static and contextual embeddings.

Statistical Optimality Versus Cognitive Utility: This divergence reveals fundamental differences in optimization pressures. LLMs, trained on massive text corpora, develop maximally efficient statistical representations that minimize redundancy and internal variance. Although human conceptual systems may look suboptimal under information-theoretic measures, prior work indicates that they are structured to support goals such as flexible generalization and causal reasoning rather than maximal compression (Murphy, 2004). Thus, the differing pressures shaping human and LLM representations help explain this apparent suboptimality.

Our analysis reveals that compression efficiency does not predict functional capability. We find no correlation between L scores and downstream performance (r = -0.20, ρ = 0.51 on MMLU; Appendix S). This suggests that the apparent human 'inefficiency' reflects optimization for cognitive flexibility rather than statistical compression. While LLMs excel at compact representation, they may sacrifice the semantic richness essential for human-like understanding.
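For reference, such a correlation check uses standard tools; the values below are placeholders, not our measured scores:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-model scores: L objective vs. a downstream benchmark.
L_scores = np.array([2.1, 1.8, 2.5, 1.6, 2.0, 2.3])
benchmark = np.array([61.0, 64.5, 58.2, 70.1, 66.3, 59.8])

r, _ = pearsonr(L_scores, benchmark)      # linear association
rho, _ = spearmanr(L_scores, benchmark)   # rank (monotonic) association
```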

The consistent architectural patterns observed raise fundamental questions: Do understanding and generation require distinct computational strategies? The superiority of representation-focused models suggests that current architectures may be conflating two fundamentally different cognitive tasks.

Emergence During Training: How Divergent Strategies Develop

Having established that LLMs and humans employ divergent representational strategies, we investigate how these strategies emerge. We analyze OLMo-7B across 57 training checkpoints (1K to 557K steps, approximately 4B to 2.5T tokens) to trace how conceptual structure develops.

Two-Phase Representational Development. Conceptual organization emerges in two phases (Figure 15). First, rapid concept formation (1K-100K steps) establishes basic categorical structure, with AMI rising from near zero to approximately 0.45, achieving 80% of final alignment within just 10% of training. Second, architectural reorganization (100K-500K steps) systematically migrates semantic processing from deeper layers toward mid-network while AMI continues to improve gradually.

Figure 3: Divergent optimization strategies: LLMs achieve superior information-theoretic efficiency while humans preserve semantic richness. (a) Complexity-distortion trade-offs: encoder models (BERT, ViT, classic embeddings) consistently achieve lower distortion than decoder models at any given complexity level. (b) L objective comparison: all LLM-derived clusters achieve lower L values than human categories (dashed line), indicating a more 'optimal' compression-distortion balance. Data from Rosch (1975).

This migration from layer 29 to layer 23 occurs without sacrificing categorical alignment, suggesting the model discovers more efficient internal representations. Moreover, the same two-phase dynamics appear when testing attention sparsity, effective rank, and L values, all of which exhibit an early rapid shift followed by a slower restructuring phase. This convergence across independent metrics indicates that the model is not merely improving categorical alignment but reorganizing its internal representations toward an increasingly efficient structure. See Appendix H.
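The layer-wise alignment trace behind this migration can be sketched as follows (hypothetical helper, assuming k-means clustering with K equal to the number of human categories and scikit-learn's AMI; applied per checkpoint, it locates the best-aligned layer):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

def peak_ami_layer(layer_embeddings, human_labels, seed=0):
    """For each layer's embedding matrix, cluster with k-means (K = number of
    human categories) and score the clusters against the human labels;
    return the index of the best-aligned layer and its AMI."""
    k = len(set(human_labels))
    scores = [adjusted_mutual_info_score(
                  human_labels,
                  KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(emb))
              for emb in layer_embeddings]
    best = int(np.argmax(scores))
    return best, scores[best]
```

Running this over checkpoints yields the peak-layer trajectory (e.g., layer 29 early in training, layer 23 late).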

Architectural Reorganization as Optimization. The upward migration of semantic processing hints that the model discovers increasingly efficient encodings. Early training relies on deep, memorization-heavy representations, but as training progresses, the model shifts to distributed mid-network encoding. This reorganization may explain apparently 'emergent' capabilities: they arise not from learning fundamentally new information but from more efficient internal organization of existing knowledge.

Implications. These dynamics demonstrate that the compression-oriented strategy observed in fully trained models develops from the earliest stages of training. The rapid initial alignment followed by efficiency-focused reorganization suggests models are inherently biased toward statistical compression rather than semantic richness. Achieving human-like representations may require not just different final objectives but fundamentally different learning dynamics that actively maintain semantic diversity throughout development.

Discussion and Conclusion

We investigated how LLMs and humans navigate the compression-meaning trade-off in conceptual representation. Using information-theoretic analysis of 40+ models against classic cognitive benchmarks, we reveal fundamental differences in their representational strategies.

Key Findings. LLMs achieve broad categorical alignment with humans (AMI ≈ 0.55), successfully partitioning semantic space into recognizable categories. However, they fail to capture the internal structure that makes these categories cognitively useful, as typicality correlations remain weak (ρ < 0.2) across model families. Most strikingly, when evaluated on the compression-meaning trade-off, LLMs consistently achieve lower (better) L scores than human categories, indicating they optimize for statistical efficiency over semantic richness. This pattern holds across architectures, though encoder models surprisingly outperform decoder models in human alignment despite being orders of magnitude smaller. Training-dynamics analysis reveals rapid category formation followed by architectural reorganization that shifts semantic processing from deep to mid-network layers, suggesting efficiency optimization continues throughout training.

Implications. These findings challenge the assumption that statistical optimality equals understanding. LLMs excel at their training objective of minimizing prediction error, but this drives them toward representations that sacrifice semantic nuance. Encoder models' superior alignment with human representations questions the current paradigm of unified, scaled decoder models, suggesting that language understanding and generation may require distinct architectures and rely on different processes. Our framework provides quantitative tools for monitoring the compression-meaning balance in future systems.

Conclusions. Our findings reveal an apparent paradox: LLMs are simultaneously better and worse than humans. This occurs because the two employ divergent strategies, statistical compression versus semantic richness, likely reflecting different optimization pressures. While LLMs compress billions of tokens efficiently, human representations support flexible reasoning and generalization. Progress toward human-like AI may therefore require preserving the apparent 'inefficiencies' that underpin cognitive flexibility. We provide theoretical understanding, practical metrics, and high-quality digitized benchmarks for developing more human-aligned representations, and we encourage the community to use these data and metrics in future research toward making AI more human-like.

Ethics statement

Our study relies exclusively on publicly available LLMs and digitized datasets from classic cognitive psychology experiments (Rosch, 1973a; 1975; McCloskey & Glucksberg, 1978). No new human subject data was collected, and all benchmark data we release have been properly attributed and curated to preserve research integrity.

We do not foresee privacy, security, or fairness risks arising from our analyses. Our contribution is methodological and theoretical, focusing on representational trade-offs between humans and LLMs. Nevertheless, we acknowledge that insights into model-human divergences could influence how future systems are designed. We caution that optimizing solely for statistical efficiency without considering semantic richness may exacerbate risks of misinterpretation or oversimplification in socially sensitive applications.

We declare no conflicts of interest or external sponsorship that could bias the reported findings.

Reproducibility statement

We have taken several steps to ensure reproducibility. All digitized human categorization datasets used in our analyses are publicly released in machine-readable form (Appendix B.1). Detailed model specifications, including architectures, scales, and hyperparameters, are provided in Appendix B.2, and we document embedding extraction procedures, pooling strategies, and prompt templates in Appendix B.3-B.4. Full experimental results, including layer-wise analyses, clustering metrics, and training dynamics across checkpoints, are reported in the appendices (B.5-B.14). Our theoretical framework and derivations are described in Section 4, with complete definitions and formulations provided to enable replication. We will release the code for dataset processing, embedding extraction, and evaluation upon acceptance (to preserve anonymity).


A Cognitive Intuition of the Compression-Meaning Tradeoff

The compression-meaning tradeoff refers to the cognitive tension between representing concepts with maximal efficiency (i.e., minimal information) and preserving the semantic richness needed for flexible generalization, inference, and communication. For instance, upon hearing the sentence 'There was a large brown Labrador barking loudly near the playground,' a person will often encode a simplified memory, such as 'big scary dog near kids.' This is not to suggest that humans cannot recall the full sentence, but rather that we typically retain the most meaningful elements to enable efficient reasoning and generalization; without such abstraction and simplification, leveraging past experience for learning and prediction would be more difficult.

We acknowledge that our metrics, while capturing the information-compression trade-off and geometric efficiency in embedding space, do not directly measure all aspects of human-style conceptual abstraction or reasoning. Our aim is not to claim full equivalence with human cognition (nor do we think anyone can or should make such claims), but rather to provide a quantitative, interpretable proxy that highlights where LLMs and humans converge or diverge in how they compress and organize semantic information. We view these metrics as one lens among many, and we are careful in the paper to frame our findings as providing insights into human-like patterns rather than definitive evidence of human-style conceptual processing.

Cognitively-Inspired Inductive Biases

Future models could more closely align with human conceptual structure by incorporating cognitively motivated inductive biases and representational mechanisms. Hierarchical and compositional structure could enable models to capture nested relationships between categories, reflecting the way humans organize knowledge from superordinate to basic-level concepts (e.g., animal → mammal → dog → Labrador). Feature- and relation-based biases could help models focus on meaningful perceptual or functional attributes and relational patterns, rather than relying solely on statistical co-occurrence.

Additionally, theory- or causally-grounded priors, which draw on humans' intuitive understanding of how objects interact and behave, could constrain learning in complex domains and support more flexible generalization. Incorporating a hybrid exemplar-rule approach, combining memory of specific examples with abstracted rules, would further approximate human category learning. Modular architectures, in which specialized sub-networks handle different aspects of conceptual representation, could enhance generalization and reduce interference between unrelated features. Finally, meta-learned priors distilled from symbolic or program-based representations offer a way to embed structured, human-like concept hypotheses directly into neural models, allowing them to generalize more like humans across novel situations.

Together, these inductive biases offer a path for models that not only compress information efficiently but also organize knowledge in a manner that mirrors human conceptual richness, capturing graded typicality, family resemblance, and hierarchical relationships. While integrating these biases may trade some compression for fidelity, they provide an exciting opportunity to reduce meaning distortion and bridge the gap between statistical efficiency and human-like conceptual understanding.

Limitations

While this study offers valuable insights, several limitations should be considered.

Future work could address these by expanding to other languages, exploring alternative cognitive models, and testing these principles on different architectures or in real-world applications.

Dataset Access Details

The aggregated and digitized human categorization datasets from Rosch (1973a; 1975) and McCloskey & Glucksberg (1978) are made available in CSV format at: [URL redacted for anonymity; data is attached as Supplementary Material].

LLM Details

Contextual Prompts and Pooling Strategies

Extracting contextual embeddings from LLMs requires feeding words into the model through a prompt. Because tokenizers often split a word into multiple tokens, and some items in our datasets consist of two or more words, we face a design choice about how to aggregate token representations. We adopt average pooling over the actual tokens, ensuring that all subword pieces contribute equally. Figures 4 and 5 show that average pooling achieves consistent performance and the tightest distribution, making it the most reliable choice for our setting.
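The pooling choice can be sketched as follows (toy vectors standing in for a model's hidden states; in practice they would come from the layer under analysis):

```python
import numpy as np

def pool_word_embedding(token_vectors, strategy="average"):
    """Aggregate the subword-token vectors of one target word.

    token_vectors: array of shape (num_subwords, hidden_dim), e.g. the
    hidden states for the pieces of 'Labrador' -> ['Lab', 'rador'].
    """
    v = np.asarray(token_vectors, dtype=float)
    if strategy == "average":   # every subword piece contributes equally
        return v.mean(axis=0)
    if strategy == "first":     # first-subword pooling
        return v[0]
    if strategy == "last":      # last-subword pooling (common for decoders)
        return v[-1]
    raise ValueError(strategy)
```

Multi-word items are handled the same way, by averaging over all of their tokens.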

For prompts, we selected a neutral template, "This is a {word}. " (with a trailing space), designed to minimize additional semantic bias on the target item. Figures 6 and 7 show that this prompt balances performance and consistency, making it ideal for baseline comparisons.

In this section, we explore alternative pooling strategies and evaluate a diverse set of prompt templates across multiple models.

Pooling Strategies. We compare four common approaches:

Figure 4: Average pooling demonstrates consistent performance across different models, making it the most reliable choice for future research. Each point corresponds to a prompt template applied to a model.


Prompt Templates. To test robustness, we design eight templates spanning different linguistic framings:

Models. We evaluate eight representative LLMs covering major architectures:

Figure 5: Average pooling demonstrates the tightest distribution, indicating the highest consistency and reliability across conditions. Box plots show the AMI distribution for each pooling strategy.


Figure 6: The neutral prompt template "This is a {word}. " demonstrates balanced performance and moderate consistency, making it ideal for baseline comparisons. Performance distribution across prompts.


Static vs. Contextual AMI Exploration

To understand how LLMs develop conceptual alignment with human categories, we examine the progression from static to contextual embeddings. Figure 8 presents three complementary views of this progression across different model scales and architectures.

The left subplot shows Static AMI scores, which represent the conceptual alignment achieved by models' input embeddings before any contextual processing (i.e., the E matrix embeddings of the target word). These scores reveal that even at the most basic level, LLMs encode semantic information

Figure 7: The neutral template "This is a {word}. " remains a stable choice across model families. Heatmap shows template performance per model.


that supports human-like categorical grouping. Remarkably, classic static models like Word2Vec and GloVe achieve static AMI scores that rival the peak contextual performance of modern LLMs, suggesting that fundamental conceptual structure is captured early in the learning process.

The middle subplot displays Average AMI across all layers, providing a measure of overall semantic representation quality throughout the network. This metric shows the typical performance a model achieves across its entire depth, offering insight into how consistently different layers maintain conceptual alignment. The improvement from static to average AMI demonstrates that contextual processing generally enhances rather than diminishes semantic understanding.

The right subplot reveals Peak AMI , representing the optimal conceptual alignment achieved by any single layer. This metric identifies where in the network conceptual understanding is maximized, typically occurring in middle-to-late layers before declining in the final layers. The progression from static to peak AMI shows that contextual processing not only preserves but significantly enhances the conceptual alignment present in static embeddings.

Several key insights emerge from this multi-metric analysis. First, all models demonstrate above-chance alignment even in their static embeddings, confirming that basic semantic structure is a fundamental property of learned representations. Second, the consistent improvement from static to peak AMI across all model types suggests that contextual processing universally enhances conceptual understanding rather than creating it de novo. Third, encoder architectures of different types (BERT, ViT encoders, and classic models) achieve comparable or superior performance to much larger decoder models, highlighting that architectural factors and pre-training objectives influence conceptual alignment quality beyond mere model scale.

This analysis complements the main text findings by showing that LLMs do not simply achieve above-chance alignment with human categories, but rather do so through a systematic progression from basic to sophisticated conceptual representations, with contextual processing serving as an amplifier rather than a generator of semantic understanding.

We also tested the robustness of our results to different clustering seeds (Figure 9). We found AMI to be highly stable across seeds, with negligible variation in the peak values and layer-wise profiles, indicating that our conclusions are not sensitive to the choice of clustering initialization.
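The seed-robustness check can be sketched as follows (hypothetical helper; K is fixed to the number of human categories, and a small spread across seeds indicates insensitivity to initialization):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

def ami_across_seeds(embeddings, human_labels, seeds=range(5)):
    """AMI between k-means clusters and human categories for several
    clustering initializations (K = number of human categories)."""
    k = len(set(human_labels))
    return [adjusted_mutual_info_score(
                human_labels,
                KMeans(n_clusters=k, n_init=10,
                       random_state=s).fit_predict(embeddings))
            for s in seeds]
```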

Lastly, we plot peak AMI against the number of FLOPs per token and find no systematic correlation, suggesting that computational cost alone does not predict human-aligned conceptual representations (Figure 10).

Figure 8: LLMs begin with basic conceptual alignment in static embeddings and achieve progressively stronger alignment through contextual processing. Three subplots showing model size (log scale) versus different AMI metrics. Left: Static AMI reveals baseline categorical structure. Middle: Average AMI across layers shows overall semantic quality. Right: Peak AMI demonstrates high conceptual alignment. The consistent improvement from static to peak AMI across all model types reveals that contextual processing enhances rather than creates conceptual understanding, with encoders (BERT, ViT encoders and classic models) achieving comparable or superior performance to much larger decoder models.


Detailed AMI Scores per Model and Dataset

Rosch 1975

Figure 9: AMI Scores are Robust Across Clustering Seeds. Peak AMI between model representations and human categories remains highly stable across multiple random clustering initializations. Peaks in each plot are consistent, indicating that the observed alignment patterns are not sensitive to the choice of clustering seed.


Figure 10: Peak AMI versus computational cost. We plot each model's peak AMI against its FLOPS per token. There is no systematic correlation, indicating that higher computational cost does not necessarily lead to better alignment with human conceptual structure.


Multilingual Analysis

To further explore conceptual understanding across languages, we translated our dataset into Spanish, German, Italian, and Russian using Google Translate's API and repeated our analyses with the same LLMs and methods. In RQ1, a clear scale effect emerges for all non-English languages, while English shows no such trend (Figures 11, 12). We interpret this as a consequence of limited non-English training data: larger models are more likely to have been exposed to sufficient multilingual data, improving their conceptual alignment. In RQ2, all models struggle to preserve the internal geometry of human concepts across languages (Figure 13). RQ3 shows that non-English languages exhibit greater compression (Figure 14), consistent with our explanation for RQ1: smaller exposure to non-English data leads to more compressed representations, reducing flexibility and interpretability.

Training Dynamics

The OLMo analysis examines how semantic structure develops during training by analyzing 57 intermediate checkpoints from the OLMo-7B model, representing evenly spaced sampling (every 10K training steps) spanning from 1K to 557K steps (covering approximately 4B to 2.5T tokens).

The analysis employs two complementary sampling strategies: representative sampling (6 checkpoints) captures major developmental phases at 1K, 101K, 201K, 301K, 401K, and 501K steps, while high-resolution sampling (57 checkpoints) reveals the inherent noise and fluctuations in training. Despite significant training noise, the overall semantic development follows a stable, predictable pattern captured by the representative sampling, as shown in Figure 16. The complete training trajectory with all 57 checkpoints is presented in Figure 17.

Moreover, the same two-phase dynamics appear when testing attention sparsity, effective rank, and L values (Figures 18, 19), all of which exhibit an early rapid shift followed by a slower restructuring phase. This convergence across independent metrics indicates that the model is not merely improving categorical alignment but reorganizing its internal representations toward an increasingly efficient structure.

Figure 11: Scale effect emerges for non-English languages. We compute peak AMI as a function of model size across different languages: English (red), Spanish (blue), Italian (purple), German (yellow), and Russian (pink). While English shows no systematic scaling effect, the other languages exhibit a clear positive relationship between model size and AMI. We interpret this as a consequence of limited non-English training data: larger models are more likely to have been exposed to sufficient multilingual data, improving their conceptual alignment. Additional analyses across other RQs using the multilingual data support this hypothesis.


Correlation between Human Typicality Judgments and LLM Internal Cluster Geometry

Figure 13: Preservation of internal conceptual geometry across languages. Models struggle to maintain the internal structure of human concepts in all tested languages, with similar patterns observed across English, Spanish, German, Italian, and Russian.


Dataset Polysemy

Scope. We quantify lexical ambiguity in our psycholinguistic stimuli by counting the distinct WordNet synsets associated with each lemma. This polysemy score lets us estimate how many alternative senses a model must implicitly conflate when it produces a single embedding for a word.

Why it matters. Consider bat , which can denote either a flying mammal or a piece of sports equipment. The same vector must account for both senses. Aggregating semantically distant senses can blur the representation and thus confound model-human comparisons, especially in tasks that rely on fine-grained semantic similarity. Explicitly tracking polysemy allows us to verify that any performance effects we observe are not artefacts of lexical ambiguity.

Results. Figure 20 shows the distribution of polysemy scores. The majority of items are unambiguous (1-2 senses), but a heavy-tailed minority (e.g., running (52 senses), saw (28), block (28)) is highly polysemous. This supports the view that our findings reflect real differences between models rather than polysemy-related artifacts. An additional 141 lemmas not found in WordNet were omitted.
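A minimal sketch of the scoring procedure (the actual analysis would use NLTK's WordNet interface, shown in the comment; the tiny inventory here is a self-contained stand-in, not our data):

```python
# With NLTK and the WordNet corpus installed, the per-lemma score is simply:
#   from nltk.corpus import wordnet as wn
#   score = len(wn.synsets(lemma))
# Here a small stand-in inventory keeps the sketch self-contained.
TOY_SYNSETS = {
    "bat": ["flying mammal", "sports equipment"],
    "robin": ["songbird"],
}

def polysemy_score(lemma, inventory):
    """Number of distinct senses for a lemma; None if the lemma is absent
    (mirroring the lemmas we omitted because they are not in WordNet)."""
    senses = inventory.get(lemma.lower())
    return len(senses) if senses is not None else None
```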

Tokenizer Analysis

Rationale. A model's tokenizer significantly influences its representations: segmentation rules (WordPiece vs. BPE), vocabulary size, and special control tokens can inflate sequence length, skew frequency statistics, and shape error patterns. To ensure fair cross-model comparisons, we therefore (i) cluster checkpoints by the tokenizer they use and (ii) quantify how much those tokenizers overlap when applied to our datasets.

Procedure. Before computing overlap, we normalize the vocabulary by stripping tokenizer-specific characters: SentencePiece prefixes (▁), GPT-style BPE space prefixes (Ġ/ġ) and newline markers (Ċ), and WordPiece continuation markers (##). After this cleanup, tokens differing only by such prefixes collapse to a shared canonical form (e.g., house, Ġhouse, and ##house all become house). We then compute pairwise Jaccard similarity on these cleaned vocabularies.
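The normalization and overlap computation can be sketched as follows (a simplified version; the exact prefix set handled per tokenizer family is an assumption):

```python
def canonical(token):
    """Strip tokenizer-specific prefixes so variants collapse to one form:
    '\u2581house' (SentencePiece), '\u0120house' (GPT-2 BPE), and
    '##house' (WordPiece) all become 'house'."""
    # U+2581 = SentencePiece space, U+0120/U+0121 = GPT-2 Ġ/ġ,
    # U+010A = GPT-2 newline marker Ċ, '##' = WordPiece continuation.
    for prefix in ("\u2581", "\u0120", "\u0121", "\u010a", "##"):
        if token.startswith(prefix):
            token = token[len(prefix):]
    return token

def vocab_jaccard(vocab_a, vocab_b):
    """Pairwise Jaccard similarity of two cleaned vocabularies."""
    a = {canonical(t) for t in vocab_a}
    b = {canonical(t) for t in vocab_b}
    return len(a & b) / len(a | b) if a | b else 0.0
```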

Table 1 summarizes the core statistics and information regarding the tokenizer types, while Figure 21 visualizes the resulting pairwise vocabulary overlap.

Findings. We find that most tokenizer families share substantial lexical overlap, often exceeding 60%, suggesting a de facto common token inventory across recent open-source models. First-generation BERT WordPiece tokenizers (bert-large-uncased and bert-base-uncased) are outliers, sharing under 16% of tokens with any other group.

Table 1: Tokenizer statistics by tokenizer family. Mean Tokens/Item refers to the average number of tokens per item in our datasets. The Vocabulary Size and Tokenizer Type columns are properties of the tokenizer.

Model Clustering By Tokenizer Family

Additional Clustering Metrics

To further validate our cluster-alignment findings (Section 5.1), in addition to Adjusted Mutual Information (AMI) and Normalized Mutual Information (NMI), we also computed the Adjusted Rand Index (ARI) for the k-means clusters derived from LLM embeddings against human-defined categories. ARI measures the similarity between two clusterings, correcting for chance; like AMI, a score of 1 indicates perfect agreement and 0 indicates chance-level agreement.
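These metrics are available directly in scikit-learn; a small sketch with hypothetical label vectors:

```python
from sklearn.metrics import (adjusted_mutual_info_score,
                             adjusted_rand_score,
                             normalized_mutual_info_score)

human = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # hypothetical human categories
model = [0, 0, 1, 1, 1, 1, 2, 2, 2]   # hypothetical k-means labels

scores = {
    "AMI": adjusted_mutual_info_score(human, model),
    "NMI": normalized_mutual_info_score(human, model),
    "ARI": adjusted_rand_score(human, model),
}
```

AMI and ARI are chance-corrected, so near-zero values for random labelings make them stricter than NMI.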

Across all tested LLMs, the ARI and NMI scores largely mirrored the trends observed with AMI, showing significantly above-chance alignment with human categories and similar relative model performances. Silhouette scores, while more variable, generally indicated reasonable cluster cohesion for both LLM-derived and human categories. Detailed tables of these scores are provided below.

These supplementary metrics reinforce the conclusion that LLMs capture broad human-like conceptual groupings.

Mini-Controlled Experiment (Matched Training Data)

To evaluate the extent to which dataset differences might account for the architectural patterns we report, we conducted matched-family analyses involving the only model families whose training data can be matched: GPT, Pythia, Cerebras, and T5. While these comparisons cannot rule out all confounds, they substantially reduce the influence of dataset variation. Across all matched settings, encoder models continue to outperform decoder models, yielding higher AMI and lower L. This indicates that the architectural effects observed throughout the paper cannot be explained by differences in training data alone.

(Footnote 3: phi-4 uses a different tokenizer than the rest of the Phi family; its tokenizer matches that of the Qwen family.)

Detailed AMI Scores per Model and Dataset

Table 2 provides a more granular view of the static AMI scores for each LLM across the three individual psychological datasets.

Table 2: Mutual information measures (normalized mutual information, adjusted mutual information, adjusted rand index) per model per dataset. Aggregated results are shown in the main paper and the Figures in the Appendix.

Correlation between Human Typicality Judgments and LLM Internal Cluster Geometry

The following tables present the Spearman correlation coefficients ( ρ ) between human typicality judgments and LLM internal representations across different analysis approaches:

Table 3: Static analysis correlations using embeddings from the E matrix. This approach captures the baseline semantic relationships between items and categories without contextual processing.

Table 4: Peak-AMI-layer analysis correlations using contextual embeddings from the layer that maximized AMI scores (as identified in RQ1). This approach leverages the optimal layer for semantic clustering to assess fine-grained semantic fidelity.

Both tables present correlations across three cognitive science datasets: Rosch (1973), Rosch (1975), and McCloskey & Glucksberg (1978), with asterisks (*) indicating statistically significant correlations (p < 0.05). The modest correlation values across most models suggest limited alignment between LLM internal representations and human-perceived semantic nuances.
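The quantity behind these tables can be sketched as follows (simplified to a single category; note that because higher human scores mean less typical, an aligned model yields a negative ρ):

```python
import numpy as np
from scipy.stats import spearmanr

def typicality_alignment(item_vecs, human_typicality):
    """Spearman rho between human typicality ratings and each item's cosine
    similarity to its category centroid (higher similarity = more central)."""
    V = np.asarray(item_vecs, dtype=float)
    centroid = V.mean(axis=0)
    cos = V @ centroid / (np.linalg.norm(V, axis=1) * np.linalg.norm(centroid))
    return spearmanr(human_typicality, cos)
```

In the full analysis this is computed per category and model, then aggregated across datasets.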

Table 3: Correlation between Human Typicality Judgments and LLM Internal Cluster Geometry. Spearman static-layer rank correlations between human-rated psychological typicality/distance (higher human scores = less typical/more distant) and item-to-centroid cosine similarity (higher similarity = more central to the LLM cluster). *p < 0.05.

Typicality and Cosine Similarity [RQ2]

Figure 25 shows representative scatter plots illustrating the relationship between human typicality scores (or psychological distances) and the LLM-derived item-centroid cosine similarities for selected categories and models. These plots visually demonstrate the often modest correlations discussed in Section 5.2.

Figure 26 shows the aggregated Spearman correlation across model families and datasets. These correlations are very weak and mostly non-significant.

Theoretical Extreme Case Exploration for ℒ

In the case where |C| = |X| (each data point is its own cluster of size 1, so |C_c| = 1 for all c ∈ C), H(X|C) = (1/|X|) Σ_{c∈C} 1 · log₂ 1 = 0. The distortion term σ_c² = 0 for each cluster, as each item is its own centroid. Thus, ℒ = I(X; C) + β · 0 = H(X) − H(X|C) = H(X) = log₂|X|. This represents the cost of encoding each item perfectly, with no compression via clustering and zero distortion.

In the case where |C| = 1 (one cluster C_X contains all |X| data points, so |C_{C_X}| = |X|), H(X|C) = (1/|X|) · |X| · log₂|X| = log₂|X|. Thus, I(X; C) = H(X) − H(X|C) = log₂|X| − log₂|X| = 0. This represents maximum compression (all items are treated as one). The distortion term becomes β · (1/|X|) · |X| · σ_X² = β · σ_X², where σ_X² is the variance of all items X with respect to the global centroid of X. So ℒ = 0 + β · σ_X² = β · σ_X². This is the scenario of maximum compression, where the cost is purely the distortion incurred by representing all items with a single prototype.
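These two limits can be sanity-checked numerically. A minimal pure-Python sketch (the toy 2-D "embeddings" and the `variance` helper are ours, for illustration only):

```python
import math

# Toy item set: |X| = 4 corners of the unit square (stand-ins for embeddings)
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
n = len(X)

def variance(members):
    # mean squared Euclidean distance of the members to their centroid
    mu = [sum(d) / len(members) for d in zip(*members)]
    return sum(sum((x - m) ** 2 for x, m in zip(p, mu))
               for p in members) / len(members)

# Case |C| = |X| (singleton clusters): I(X;C) = log2 n, distortion = 0
L_singletons = math.log2(n) + 0.0      # -> log2 4 = 2.0 bits

# Case |C| = 1 (one global cluster): I(X;C) = 0, distortion = sigma_X^2
L_one_cluster = 0.0 + variance(X)      # -> 0.5 for this toy set (beta = 1)
```

For this toy set the global centroid is (0.5, 0.5), so every point contributes a squared distance of 0.5, giving σ_X² = 0.5 exactly as the single-cluster ℒ.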

Compression Figures

Figure 28 depicts the IB-RDT objective ℒ vs. K. Lower ℒ indicates a more optimal balance between compression (I(X; C)) and semantic fidelity (distortion). Human categories (fixed K) show higher ℒ values.


“The categories defined by constructions in human languages may vary from one language to the next, but they are mapped onto a common conceptual space, which represents a common cognitive heritage, indeed the geography of the human mind.” –Croft (2001) p. 139

The human capacity for concept formation is a cornerstone of intelligence, enabling us to manage information overload by deriving meaning from complex signals. We achieve this by identifying essential features and compressing experiences into cognitively tractable summaries (Murphy, 2004). This conceptual architecture, often hierarchical (e.g., a robin is a bird, an animal (Rosch et al., 1976)), is a powerful semantic compression: diverse instances are mapped to compact representations. Crucially, this process balances representational efficiency (compression) with the preservation of essential semantic fidelity (meaning), a trade-off fundamental to learning and understanding.

Large Language Models (LLMs) exhibit striking capabilities in processing and generating human language, performing tasks that often appear to require deep semantic understanding (Singh et al., 2024; Li et al., 2024). Despite this, a fundamental enigma persists: Do LLMs truly grasp concepts and meaning analogously to humans, or is their success primarily rooted in sophisticated statistical pattern matching over vast datasets? This question is particularly salient given the human ability to effortlessly distill extensive input into compact, meaningful concepts, a process governed by the inherent trade-off between informational compression and semantic fidelity (Tversky, 1977; Rosch, 1973b).

As the mental scaffolding of human cognition, concepts enable efficient interpretation, generalization from sparse data, and rich communication. For LLMs to transcend surface-level mimicry and achieve more human-like understanding, it is critical to investigate how their internal representations navigate the crucial trade-off between information compression and the preservation of semantic meaning. Do LLMs develop conceptual structures mirroring the efficiency and richness of human thought, or do they employ fundamentally different representational strategies?

To address this, we introduce a novel quantitative methodology rooted in information theory. We develop and apply a framework drawing from Rate-Distortion Theory (Shannon, 1948) and the Information Bottleneck principle (Tishby et al., 2000) to systematically compare how LLMs and human conceptual structures balance representational complexity (compression) with semantic fidelity. As a crucial human baseline, we leverage seminal datasets from cognitive psychology detailing human categorization (Rosch, 1973a, 1975; McCloskey and Glucksberg, 1978). A contribution of this work is the digitization and public release of these classic datasets, which offer benchmarks of high empirical rigor often exceeding modern crowdsourced alternatives. Our framework is tailored to dissect how these different systems navigate the compression-meaning trade-off.

Our comparative analysis across a diverse suite of LLMs reveals divergent representational strategies. While LLMs generally form broad conceptual categories aligned with human judgment, they often fail to capture the fine-grained semantic distinctions pivotal to human understanding. More critically, we uncover a stark contrast in priorities: LLMs exhibit a strong drive towards aggressive statistical compression, whereas human conceptual systems appear to favor adaptive nuance and contextual richness, even at a potential cost to sheer compressional efficiency by our measures. This divergence underscores fundamental differences and informs pathways for developing AI with more human-aligned conceptual understanding.

Advancing AI beyond pattern matching towards deeper semantic understanding hinges on whether LLMs develop conceptual structures analogous to human cognition. Human concepts efficiently balance semantic richness with cognitive manageability, a trade-off between meaning and informational compression. This paper investigates if and how LLMs replicate this fundamental balance.

Prior work has explored the conceptual landscape of LLMs, including their grasp of relational knowledge (Shani et al., 2023), methods for extracting interpretable concepts (Hoang-Xuan et al., 2024; Maeda et al., 2024), emergent representations via sparse activations (Li et al., 2024), embedding geometry concerning hierarchies (Park et al., 2024), and autoregressive concept prediction (Barrault et al., 2024). While insightful, these studies often lack a deep, quantitative comparison of the compression-meaning trade-off using an information-theoretic lens benchmarked against rich human cognitive data, or they may not ground concept definitions in established cognitive theory. Consequently, a rigorous comparative evaluation of how LLMs and humans balance representational efficiency with semantic fidelity remains a key open area. Separately, cognitive science has applied information theory to human concept learning (Imel and Zaslavsky, 2024; Tucker et al., 2025; Wolff, 2019; Sorscher et al., 2022), yet typically without connecting to modern AI models.

This work aims to bridge this gap by integrating cognitive psychology, information theory, and modern NLP. We pose three central research questions to guide our investigation:

[RQ1]: To what extent do concepts emergent in LLMs align with human-defined conceptual categories?

[RQ2]: Do LLMs and humans exhibit similar internal geometric structures within these concepts, especially concerning item typicality?

[RQ3]: How do humans and LLMs differ in their strategies for balancing representational compression with the preservation of semantic fidelity when forming concepts?

These three questions steer our investigation, which approaches each through the unifying lens of the information-theoretic framework detailed in Section 4. RQ1 begins by examining the alignment of broad conceptual categories, a key aspect of how information is compressed. RQ2 then delves into the finer-grained internal structures of these categories, probing the preservation of semantic nuances such as item typicality. Building on these analyses, RQ3 employs the full framework to comprehensively compare how LLMs and humans may divergently optimize the overall trade-off between compression and meaning. To ground these comparisons, we consistently utilize seminal human categorization datasets (Rosch, 1973a, 1975; McCloskey and Glucksberg, 1978) as empirical benchmarks. Our overarching aim is to use this comparative, information-theoretic approach not only to evaluate current LLMs but also to advance our understanding of efficient and meaningful representation in both artificial and natural intelligence.

Empirically investigating the relationship between LLM representations and human conceptual structures requires two critical components: robust benchmarks of human categorization and a diverse selection of LLMs. This section details these components.

Our comparison is anchored by data from seminal studies in cognitive psychology that mapped human categorization processes. These studies offer rich empirical evidence of how humans form concepts, judge category membership, and perceive typicality. Critically, unlike many modern crowdsourced datasets which can be noisy, these classic benchmarks were meticulously curated by cognitive science experts, reflecting deep cognitive patterns rather than superficial associations, and were grounded in then-advancing theories of conceptual structure. We focus on three influential works:

Rosch (1973): This foundational work by Rosch (1973a) explored semantic categories as part of the research program leading to prototype theory (Rosch, 1973c). This theory posits that categories are organized around “prototypical” members rather than strict, equally shared features. The dataset includes 48 items in eight common semantic categories (e.g., furniture, bird), with prototypicality rankings (e.g., ‘robin’ as a typical bird, ‘bat’ as atypical).

Rosch (1975): Building on prototype theory, Rosch (1975) further detailed how semantic categories are cognitively represented. This work provides extensive typicality ratings for a larger set of 552 items across ten categories (e.g., ‘orange’ as a prototypical fruit, ‘squash’ as less so).

McCloskey & Glucksberg (1978): McCloskey and Glucksberg (1978) investigated the “fuzzy” boundaries of natural categories, showing that membership is often graded rather than absolute. Their data covers 449 items in 18 categories, with typicality scores and membership certainty ratings (e.g., ‘dress’ is typical clothing, ‘bandaid’ less so).

While originating from different research groups with distinct theoretical emphases, these datasets share rigorous experimental designs and provide data on both category assignments and item typicality. We aggregated the data from these studies into a unified benchmark of 1,049 items across 34 categories. This aggregated dataset, which we have digitized and made publicly available (see Appendix A.1), offers a crucial, high-fidelity empirical foundation for evaluating the human-likeness of computational models, and we encourage its use in future research.

We include a diverse array of LLMs to assess how conceptual representation might vary with computational architecture and scale. This selection covers prevalent architectural paradigms (encoder-only, decoder-only) and a wide spectrum of model sizes, from 300 million to 72 billion parameters.

Our analysis features encoder-only models from the BERT family (e.g., BERT-Large (Devlin et al., 2019; He et al., 2020; Zhuang et al., 2021)). The majority are decoder-only autoregressive models, including: six Llama family models (1B to 70B, e.g., Llama 3.1 70B (Touvron et al., 2023a, b; Grattafiori et al., 2024)); five Gemma family models (2B to 27B (Team et al., 2024, 2025)); thirteen Qwen family models (0.5B to 72B (Bai et al., 2023; Yang et al., 2024)); four Phi family models (e.g., Phi-4 (Javaheripi et al., 2023; Abdin et al., 2024; Abouelenin et al., 2025)); and a Mistral 7B model (Karamcheti et al., 2021). Appendix A.2 provides a comprehensive list of all model variants, identifiers, and architectural details.

For each LLM, we extract static, token-level embeddings from its input embedding layer (the E matrix). This choice aligns our analysis with the context-free nature of stimuli typical in human categorization experiments, ensuring a comparable representational basis. These embeddings form the foundation for deriving LLM-generated conceptual clusters in our subsequent analyses.

To understand how LLMs and human cognition grapple with the fundamental challenge of representing meaning, we introduce an information-theoretic framework. This framework is designed to analyze the critical trade-off, or tension, between compressing information into efficient representations and preserving the rich semantic fidelity essential for true understanding. Drawing upon core principles from Rate-Distortion Theory (RDT) (Shannon, 1948) and the Information Bottleneck (IB) principle (Tishby et al., 2000), our approach provides a cohesive lens for addressing all three of our research questions. Our investigation progresses by first exploring distinct facets of this trade-off related to representational compactness and semantic preservation, before synthesizing these insights to evaluate the overall efficiency of conceptual representation. Our research questions, viewed through this progressive information-theoretic perspective, are approached as follows:

[RQ1] Probing Representational Compactness via Categorical Alignment: We begin by examining how information is condensed into categorical structures. Both human categorization and LLM-derived clustering simplify diverse items X into structured groups C. For RQ1, we assess alignment between model-based clusters (C_LLM) and human categories (C_Human) by quantifying shared information (e.g., via Adjusted Mutual Information), offering an initial view on how similarly compactness is achieved. The principles of efficient input representation here relate to the “Complexity” aspect of our framework.

[RQ2] Probing Semantic Preservation via Internal Structure: Next, we assess how well meaning is preserved within these compressed representations. An effective system must retain crucial semantic nuances. For RQ2, we investigate this by correlating LLM-internal measures of item centrality with human typicality judgments, probing how faithfully fine-grained semantic information is represented; that is, can LLMs capture the internal structure of C_Human? This relates to the “Distortion” (or fidelity) aspect of our framework.

[RQ3] Evaluating the Integrated Trade-off for Total Representational Efficiency: Finally, having explored compactness and preservation, we leverage our full framework. RQ3 employs a unified objective function, ℒ (detailed below), to quantitatively assess the total efficiency with which LLMs and human systems navigate this fundamental trade-off.

The following subsections detail the theoretical underpinnings of this framework.

To rigorously formalize the balance between representational compactness and preserved meaning, we draw upon information theory. Rate-Distortion Theory (RDT) (Shannon, 1948) provides the foundational language. RDT quantifies the minimal “rate” R (representational complexity) needed to represent a source X as C, subject to a maximum “distortion” D (fidelity loss). The goal is often to optimize R + λD, offering a principled evaluation of representational efficiency.

The Information Bottleneck (IB) principle (Tishby et al., 2000) is a related approach. IB seeks a compressed representation C of an input X that maximizes information about a relevant variable Y while minimizing I(X; C), the mutual information C retains about X (the bottleneck’s “cost”). This is typically framed as minimizing I(X; C) − β · I(C; Y).

Our analytical framework directly applies RDT’s core idea of balancing rate and distortion. We formulate an objective function, ℒ, designed to explicitly balance a complexity term (analogous to RDT’s rate), which quantifies the informational cost of representing items X through their conceptual clusters C, and a distortion term (analogous to RDT’s D), which measures semantic information lost or obscured within these clusters. Our complexity term, incorporating I(X; C), resonates with the IB principle. However, our distortion term directly measures intra-cluster semantic fidelity loss (specifically, the variance of item embeddings relative to their cluster centroids), differing from canonical IB formulations, where distortion is often implicitly tied to an external relevance variable Y. This direct approach allows us to evaluate how any given clustering C, whether derived from human cognitive data or LLM embeddings, intrinsically balances its own structural compactness and the meaningfulness of its components with respect to the original data X.

Building on these information-theoretic foundations, this section formally defines the two key components of our framework: Complexity and Distortion. These components allow us to quantitatively address the aspects of representational compactness (core to [RQ1]) and semantic preservation (central to [RQ2]) introduced earlier. We then combine these into a unified objective function, ℒ, designed to evaluate the overall efficiency of the compression-meaning trade-off, which is the primary focus of [RQ3]. The ℒ function evaluates the efficiency of the conceptual clusters C derived from items X (e.g., token embeddings):

ℒ(X, C) = Complexity(X, C) + β · Distortion(X, C)    (Equation 1)

Here, β ≥ 0 is a hyperparameter that balances the relative importance of the two terms.

The Complexity (Rate) Term: The first component, Complexity(X, C), measures the informational cost or intricacy of representing the original items X through their assignments to clusters C. It is quantified by the mutual information I(X; C) between the items and their cluster labels. A lower I(X; C) signifies greater compression: the cluster labels carry less information about the identity of individual items. Defining I(X; C) = H(X) − H(X|C), and assuming |X| equiprobable unique items for the initial entropy (H(X) = log₂|X|), the conditional entropy is H(X|C) = (1/|X|) Σ_{c∈C} |C_c| log₂|C_c|. This assumes that, for this complexity calculation, items within each cluster C_c (of size |C_c|) are indistinguishable beyond their shared label c. Thus:

I(X; C) = log₂|X| − (1/|X|) Σ_{c∈C} |C_c| log₂|C_c|    (Equation 2)

This term formalizes the representational compactness aspect central to [RQ1].

The Distortion Term: The second component, Distortion(X, C), quantifies the loss of semantic fidelity incurred by grouping items into clusters. It is measured as the average intra-cluster variance of the item embeddings, reflecting how tightly items are bound to their cluster’s central tendency and thus the cluster’s semantic coherence. This directly relates to the preservation of fine-grained semantic information, an idea explored in [RQ2]. For each cluster c ∈ C, its centroid is x_c = (1/|C_c|) Σ_{x∈c} x (the mean embedding of its items), and its internal variance is σ_c² = (1/|C_c|) Σ_{x∈c} ‖x − x_c‖². The total distortion for the clustering C is the size-weighted average of these variances:

Distortion(X, C) = (1/|X|) Σ_{c∈C} |C_c| σ_c²    (Equation 3)

A lower distortion value implies that, on average, items are close to their respective cluster centroids, suggesting better preservation of shared semantic features within each cluster.

The Unified Objective Function: Substituting the formal definitions of Complexity (Equation 2) and Distortion (Equation 3) into our general formulation for ℒ (Equation 1) yields the complete objective function that underpins our comparative analysis:

ℒ(X, C) = [log₂|X| − (1/|X|) Σ_{c∈C} |C_c| log₂|C_c|] + β · (1/|X|) Σ_{c∈C} |C_c| σ_c²    (Equation 4)

This ℒ function provides a single, principled measure of how effectively a given clustering C balances the need for informational compression against the imperative to preserve semantic meaning, serving as the direct quantitative tool for addressing [RQ3].
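As a concrete reference, the full objective can be computed in a few lines; the following is a pure-Python sketch under the paper’s stated assumptions (equiprobable items, Euclidean embeddings), with function and variable names of our own choosing rather than from any released code:

```python
import math
from collections import defaultdict

def ib_rdt_objective(embeddings, assignments, beta=1.0):
    """L(X, C) = I(X; C) + beta * Distortion(X, C), per Equation 4."""
    n = len(embeddings)
    clusters = defaultdict(list)
    for emb, label in zip(embeddings, assignments):
        clusters[label].append(emb)

    # Complexity: I(X;C) = log2|X| - (1/|X|) * sum_c |C_c| * log2|C_c|
    complexity = math.log2(n) - sum(
        len(m) * math.log2(len(m)) for m in clusters.values()) / n

    # Distortion: size-weighted average intra-cluster variance, which equals
    # the sum of squared distances to each centroid divided by |X|
    distortion = 0.0
    for members in clusters.values():
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        distortion += sum(
            sum((x - mu) ** 2 for x, mu in zip(e, centroid))
            for e in members) / n

    return complexity + beta * distortion
```

On four unit-square corners, singleton clusters give ℒ = 2.0 (= log₂ 4) and a single global cluster gives ℒ = 0.5, matching the two theoretical extreme cases derived above.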

With the ℒ objective now fully specified, our information-theoretic framework provides a comprehensive toolkit. The Complexity term (Equation 2) allows us to quantify aspects of representational compactness pertinent to [RQ1], while the Distortion term (Equation 3) enables the assessment of semantic preservation, crucial for [RQ2]. The overall ℒ function (Equation 4) then directly facilitates the evaluation of the integrated compression-meaning trade-off, central to [RQ3]. Thus, this framework equips us to systematically and quantitatively investigate how LLMs and human cognition manage the balance between informational efficiency and semantic richness. We apply this framework in our empirical investigation detailed in Section 5.

Building on our information-theoretic framework (Section 4) and established benchmarks (Section 3), we now empirically investigate our research questions. This section details the specific methodologies employed to compare LLM and human conceptual strategies across the key dimensions of conceptual alignment, internal semantic structure, and overall representational efficiency.

[RQ1] Assessing Conceptual Alignment To investigate how LLM-derived conceptual categories align with human-defined ones (RQ1), probing representational compactness, we cluster LLM token embeddings using k-means (K set by the human category counts per dataset). Alignment with human categories is quantified using Adjusted Mutual Information (AMI), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI), against a random clustering baseline.
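A sketch of this protocol using scikit-learn on synthetic stand-ins for token embeddings (the toy data and all variable names below are ours; this is not the paper’s released code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (adjusted_mutual_info_score,
                             adjusted_rand_score,
                             normalized_mutual_info_score)

rng = np.random.default_rng(0)
# Two well-separated synthetic "categories" playing the role of human ones
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(10, 8)),
])
human_labels = [0] * 10 + [1] * 10

# K is fixed to the number of human categories in the dataset
km = KMeans(n_clusters=2, n_init=100, random_state=0).fit(embeddings)

ami = adjusted_mutual_info_score(human_labels, km.labels_)
nmi = normalized_mutual_info_score(human_labels, km.labels_)
ari = adjusted_rand_score(human_labels, km.labels_)
```

A random baseline would shuffle `human_labels` before scoring; the chance-corrected AMI and ARI then hover near zero, which is what the LLM clusterings are compared against.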

[RQ2] Examining Internal Cluster Geometry and Semantic Preservation To assess how LLM representations capture human-like typicality (RQ2), examining internal category geometry, we calculate the cosine similarity of each item’s token embedding to the token embedding of its human-assigned category name (e.g., ‘robin’ to ‘bird’). These LLM-derived similarities are then correlated (Spearman’s ρ) with human typicality ratings from our cognitive science datasets.
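This measure can be sketched in a few lines of pure Python (the toy 2-D embeddings and the Spearman helper are ours; in practice one would use the model’s real embeddings and `scipy.stats.spearmanr`):

```python
import math

def cosine(u, v):
    # cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return dot / (norm(u) * norm(v))

def spearman_rho(xs, ys):
    # Spearman = Pearson correlation of the ranks (toy data: no ties)
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical static embeddings: category label 'bird' plus three items
bird = [1.0, 0.0]
items = {"robin": [0.95, 0.10], "owl": [0.70, 0.50], "penguin": [0.30, 0.90]}
sims = [cosine(v, bird) for v in items.values()]

# Human typicality ranks (1 = most typical); when the embedding geometry
# mirrors the ranks perfectly, higher similarity pairs with lower rank
human_typicality_rank = [1, 2, 3]
rho = spearman_rho(sims, human_typicality_rank)
```

With this hypothetical geometry the correlation is perfectly negative (ρ = −1); the paper’s finding is that real LLM embeddings yield only modest |ρ| against human ratings.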

[RQ3] Evaluating the Efficiency of the Compression-Meaning Trade-off To evaluate the overall balance of compression and meaning (RQ3), we apply our framework by computing the ℒ objective (Equation 4, β = 1) for both human and LLM-derived conceptual structures (the latter from k-means over a range of K). This compares how each system balances Complexity I(X; C) against Distortion. Cluster entropy serves as an ancillary measure of compactness.
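The sweep over K might look as follows; the helper re-implements Equation 4 in NumPy for brevity, and the random "embeddings" are placeholders for real token embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans

def L_objective(X, labels, beta=1.0):
    # Equation 4: I(X;C) plus beta times the weighted intra-cluster variance
    n = len(X)
    value = np.log2(n)                      # H(X) for equiprobable items
    for c in np.unique(labels):
        members = X[labels == c]
        value -= len(members) * np.log2(len(members)) / n   # -H(X|C) part
        value += beta * ((members - members.mean(0)) ** 2).sum() / n
    return value

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))                # placeholder token embeddings

scores = {}
for k in (2, 5, 10, 20):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = L_objective(X, labels)
```

A human-defined clustering (fixed K per dataset) can be scored with the same function, enabling a direct ℒ comparison at matched K.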

For robustness, all k-means clustering involves one hundred random initializations with averaged results. Appendix A.3 provides details on supplementary metrics like Silhouette scores.
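Silhouette scores, mentioned above as a supplementary metric, are available in scikit-learn; a toy sketch (the synthetic blobs are ours):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two tight, well-separated blobs -> silhouette close to 1
X = np.vstack([rng.normal(0.0, 0.2, size=(15, 4)),
               rng.normal(3.0, 0.2, size=(15, 4))])

labels = KMeans(n_clusters=2, n_init=100, random_state=0).fit_predict(X)
score = silhouette_score(X, labels)   # in [-1, 1]; higher = tighter clusters
```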

We first investigate whether LLMs form conceptual categories aligned with human judgment.

Experimental Recap: LLM token embeddings from our benchmark datasets (Rosch, 1973a, 1975; McCloskey and Glucksberg, 1978) were clustered (k-means; K matching human category counts). Alignment with human categories was measured using AMI, NMI, and ARI (AMI shown in Figure 1; see Appendices A.3 and A.4 for full details).

Results and Observations: Across all tested LLMs, derived conceptual clusters aligned with human categories significantly above random chance (Figure 1, showing averaged AMI scores). This indicates their semantic spaces encode information supporting human-like grouping at a macro level. Notably, the BERT family (especially BERT-large-uncased) demonstrated robust alignment, often comparable to or exceeding that of much larger decoder-only models. This suggests that architectural or pre-training factors, not just scale, influence the formation of human-like categorical structures.

Interpretation: These findings confirm that LLMs can recover broad, human-like categories from their embeddings, validating deeper comparative analyses. This macro-level agreement necessitates examining the finer-grained internal geometry of these categories, which we address next.

Having established that LLMs broadly align with human conceptual categories (Section 5.1), we next investigate a more nuanced question: Do LLMs also capture the internal semantic structure of these categories, particularly human-like item typicality?

Experimental Recap: For this RQ, as detailed in this section’s introduction, we compared human typicality judgments from the cognitive science datasets (Rosch, 1973a, 1975; McCloskey and Glucksberg, 1978) with an LLM-based measure. Specifically, we calculated the cosine similarity between each item’s token embedding and the token embedding of its human-assigned category name (e.g., ‘robin’ vs. ‘bird’). These item-to-category-label similarities were then correlated (Spearman’s ρ (Wissler, 1905)) with human-rated typicality scores.

Results and Observations: Spearman correlations between LLM-derived item-to-category-label similarities and human typicality judgments are generally modest across most models and datasets (Table 2 in Appendix A.5; Figure 6). Although some correlations reach statistical significance (p < 0.05), their magnitudes typically indicate a limited correspondence. This pattern suggests that items humans perceive as highly typical of a category are not consistently represented by LLMs as substantially more similar to that category label’s embedding. While BERT-large-uncased occasionally exhibited slightly stronger correlations, these remained moderate (Table 2). Consequently, no tested model robustly replicated the full spectrum of human typicality gradients using this measure. Appendix A.6 provides further visualizations supporting these observations.

Interpretation: These findings suggest that while LLMs can identify features for broad categorization, their organization of semantic space around explicit category labels does not fully mirror the nuanced prototype structures evident in human typicality judgments. The factors driving an item’s embedding similarity to its category label’s embedding in LLMs may differ from the rich, multifaceted criteria (e.g., perceptual attributes, functional roles) underpinning human typicality. LLMs might instead capture a more statistically uniform association to category labels, thereby under-representing the graded, prototype-centric nature of human concepts. This divergence in capturing fine-grained semantics leads to our subsequent inquiry into overall information processing efficiency.

Having explored categorical alignment (RQ1) and internal semantic structure (RQ2), we now address our central question: How do LLM and human representational strategies compare in overall efficiency when balancing informational compression against semantic meaning preservation? Our information-theoretic framework directly probes this trade-off.

Experimental Recap: As detailed in this section’s introduction, we analyzed human-defined categories and LLM-derived clusters (from k-means across various K) using two primary information-theoretic measures: mean cluster entropy (S_α) (Giraldo et al., 2014; Wei et al., 2025) and our ℒ objective function (Equation 4, with β = 1).

Results and Observations: Illustrative results from one dataset (Rosch, 1975) are shown in Figure 2; trends were consistent across all datasets (full results in Appendix A.8).

Cluster Entropy Insights: Human concepts consistently exhibit higher mean entropy than LLM-derived clusters, even at similar K values (Figure 2, left). This suggests that, by this measure, human categories are less statistically “compact” and encompass greater internal diversity than LLM clusters.

Information-Theoretic Objective (ℒ) Insights: The ℒ objective reveals an even starker divergence (Figure 2, right). LLM-derived clusters consistently achieve significantly lower ℒ values than human conceptual categories across most tested K. Since a lower ℒ signifies a more statistically “optimal” trade-off between minimizing complexity and distortion within our framework, this implies LLMs are more “efficient” by this specific information-theoretic benchmark.

Interpretation: The combined results from entropy and the ℒ objective strongly indicate a fundamental difference in representational strategy. LLMs appear highly optimized for statistical compactness, achieving information-theoretically “efficient” representations by minimizing redundancy and internal variance. Human conceptual systems, in contrast, while appearing “suboptimal” by these statistical measures, are likely shaped by a broader array of functional imperatives. These include the demands of adaptive generalization, rich causal and functional inference, the constraints of neural embodiment, and the requirements of nuanced communication, pressures that may favor representations that are less statistically “tidy” but ultimately more flexible and powerful for navigating a complex world.

Our information-theoretic investigation reveals a fundamental divergence: LLMs and humans employ starkly different strategies in balancing informational compression with semantic meaning. While LLMs achieve broad categorical alignment with human judgment (RQ1; Section 5.1), they falter in capturing fine-grained semantic nuances such as typicality (RQ2; Section 5.2) and, critically, exhibit vastly different representational efficiency profiles (RQ3; Section 5.3). This pattern strongly suggests that LLMs and humans are optimizing for different objectives.

LLMs appear aggressively optimized for statistical compactness. They form information-theoretically efficient representations, as evidenced by their lower cluster entropy and more “optimal” ℒ scores. This suggests they minimize redundancy and maximize statistical regularity, likely a consequence of their training on immense text corpora. This intense focus on compression, however, limits their capacity to fully encode the rich, prototype-based semantic details vital for deep, human-like understanding.

Human cognition prioritizes adaptive richness, contextual flexibility, and broad functional utility, even if this incurs a cost in statistical compactness as measured by our framework. The higher entropy and ℒ scores observed for human concepts likely reflect an optimization for a wider array of complex cognitive demands. These include nuanced representations for robust generalization, potent inferential capabilities (causal, functional, goal-oriented), effective communication through learnable and shareable structures, and the grounding of concepts in rich, multimodal experiences. The brain’s neural architecture itself may inherently favor distributed, context-sensitive, and adaptable representations over statically optimal compression. Human cognition, therefore, appears to “invest” in what our statistical measures register as inefficiency in exchange for greater adaptability and versatility.

The noteworthy performance of smaller encoder models like BERT in specific alignment tasks (Section 5.1) also underscores that architectural design and pre-training objectives significantly influence a model’s ability to abstract human-like conceptual information. This observation highlights important avenues for future AI development focused on enhancing human-AI alignment.

These divergent representational strategies carry significant implications. For AI development, achieving more human-like understanding demands moving beyond current paradigms often centered on scaling and statistical pattern matching. Future efforts should explore principles that explicitly foster richer, more nuanced conceptual structures; our information-theoretic framework and $\mathcal{L}$ objective (Section 4) offer a potential class of tools for guiding and evaluating models toward this more human-like balance. For cognitive science, LLMs, with their distinct optimization biases, serve as valuable computational foils. Comparing their operational strategies against human performance can illuminate the unique constraints and multifaceted objectives that have shaped human concept formation, providing a powerful testbed for cognitive theories.

In essence, LLMs excel at statistical compressibility, treading a representational path fundamentally distinct from human cognition, which champions adaptive richness and functional utility, often above sheer statistical efficiency. This core difference is critical: it highlights current limitations in AI's pursuit of human-like understanding and charts vital directions for future research. Progressing AI "from tokens to thoughts", towards systems that genuinely comprehend and reason, will necessitate embracing principles that cultivate this richer, contextually aware conceptual structure. Our framework offers a quantitative step in this direction, encouraging further exploration of how apparent "inefficiencies" might, in fact, be hallmarks of robust, human-like intelligence.

While this study offers valuable insights, several limitations should be considered.

Our analysis primarily focuses on English; generalizability across languages with different structures is an open question.

Human categorization data as a benchmark may not fully capture cognitive complexity and could introduce biases.

Our IB-RDT objective is applied to specific LLMs; other models or representations might behave differently.

We focus on static, context-free representations. LLMs may fall short in capturing context sensitivity, as human concepts are influenced by factors beyond raw compression efficiency (experience, social interaction, cultural context).

Our analysis is limited to textual input and does not explore image-based representations.

Future work could address these by expanding to other languages, exploring alternative cognitive models, dynamic representations, and testing these principles on different architectures or in real-world applications.

The aggregated and digitized human categorization datasets from Rosch [1973a, 1975], McCloskey and Glucksberg [1978] are made available in CSV format at: [Link reduced for anonymity].

BERT family: deberta-large, bert-large-uncased, roberta-large [Devlin et al., 2019, He et al., 2020, Zhuang et al., 2021].

QWEN family: qwen2-0.5b, qwen2.5-0.5b, qwen1.5-0.5b, qwen2.5-1.5b, qwen2-1.5b, qwen1.5-1.8b, qwen1.5-4b, qwen2.5-3b, qwen2-7b, qwen1.5-14b, qwen1.5-32b, qwen1.5-72b [Bai et al., 2023, Yang et al., 2024].

Llama family: llama-3.2-1b, llama-3.1-8b, llama-3-8b, llama-3-70b, llama-3.1-70b [Touvron et al., 2023a, b, Grattafiori et al., 2024].

Mistral family: mistral-7b-v0.3 [Karamcheti et al., 2021].

To further validate our cluster alignment findings (Section 5.1), in addition to Adjusted Mutual Information (AMI) and Normalized Mutual Information (NMI), we also computed the Adjusted Rand Index (ARI) for the k-means clusters derived from LLM embeddings against human-defined categories. ARI measures the similarity between two data clusterings, correcting for chance. Like AMI, a score of 1 indicates perfect agreement and a score of 0 indicates chance-level agreement.

Across all tested LLMs, the ARI and NMI scores largely mirrored the trends observed with AMI, showing significantly above-chance alignment with human categories and similar relative model performances. Silhouette scores, while more variable, generally indicated reasonable cluster cohesion for both LLM-derived and human categories. Detailed tables of these scores are provided below.

These supplementary metrics reinforce the conclusion that LLMs capture broad human-like conceptual groupings.
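For reference, the pair-counting and information-theoretic computations behind these scores can be sketched compactly. The following is a minimal standard-library illustration (using arithmetic-mean normalization for NMI; AMI additionally subtracts the expected mutual information and is best computed with a library implementation such as scikit-learn's, so it is omitted here):

```python
from collections import Counter
from math import comb, log2

def nmi(a, b):
    """Normalized mutual information between two labelings
    (arithmetic-mean normalization of the two marginal entropies)."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    # MI = sum_ij p_ij * log2( p_ij / (p_i * p_j) )
    mi = sum((c / n) * log2(c * n / (pa[x] * pb[y])) for (x, y), c in joint.items())
    h = lambda cnt: -sum((c / n) * log2(c / n) for c in cnt.values())
    return mi / ((h(pa) + h(pb)) / 2)

def ari(a, b):
    """Adjusted Rand index: pair-counting agreement, corrected for chance."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    sum_ij = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in pa.values())
    sum_b = sum(comb(c, 2) for c in pb.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Toy example: human category labels vs. k-means cluster ids
# (cluster ids are arbitrary; both scores are permutation-invariant).
human = [0, 0, 0, 1, 1, 1, 2, 2, 2]
model = [1, 1, 1, 0, 0, 2, 2, 2, 2]   # one item misassigned
print(nmi(human, model), ari(human, model))
```

Both measures are invariant to relabeling of clusters, which is why they are appropriate for comparing k-means output against human category ids.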

Table LABEL:tab:ami_detailed provides a more granular view of the AMI scores for each LLM across the three individual psychological datasets.

Figure 5 shows representative scatter plots illustrating the relationship between human typicality scores (or psychological distances) and the LLM-derived item-centroid cosine similarities for selected categories and models. These plots visually demonstrate the often modest correlations discussed in Section 5.2.

Figure 6 shows the aggregated Spearman correlation across model families and datasets. These correlations are very weak and mostly non-significant.
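The typicality analysis behind these figures can be sketched end-to-end: compute each item's cosine similarity to its category centroid, then rank-correlate with human typicality. A minimal NumPy illustration follows; the item embeddings and ranks below are invented for exposition only, not taken from our data:

```python
import numpy as np

def spearman(x, y):
    # Spearman rho = Pearson correlation of rank-transformed values
    # (no tie correction; adequate for the tie-free toy data below).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(); ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def item_centroid_similarity(embeddings):
    # Cosine similarity of each item embedding to the cluster centroid.
    E = np.asarray(embeddings, dtype=float)
    c = E.mean(axis=0)
    return E @ c / (np.linalg.norm(E, axis=1) * np.linalg.norm(c))

# Hypothetical 3-d embeddings for a 'bird' category, plus human
# typicality ranks (1 = most typical, as in the Rosch-style norms).
emb = [[1.0, 0.1, 0.0],   # robin
       [0.9, 0.2, 0.1],   # sparrow
       [0.5, 0.6, 0.2],   # owl
       [0.1, 0.3, 0.9]]   # penguin
typicality_rank = [1, 2, 3, 4]
rho = spearman(typicality_rank, item_centroid_similarity(emb))
print(rho)  # negative rho: less typical items sit farther from the centroid
```

In the paper's convention (Table A1.T2), higher human scores mean less typical items, so a negative correlation indicates alignment between human typicality and centroid proximity.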


In the case where $|C|=|X|$ (each data point is its own cluster of size 1, so $|C_{c}|=1\ \forall c\in C$), we have $H(X|C)=\frac{1}{|X|}\sum_{c\in C}1\cdot\log_{2}1=0$. The distortion term $\sigma_{c}^{2}=0$ for each cluster, as each item is its own centroid. Thus, $\mathcal{L}=I(X;C)+\beta\cdot 0=H(X)-H(X|C)=H(X)=\log_{2}|X|$. This represents the cost of encoding each item perfectly, with no compression via clustering and zero distortion.

In the case where $|C|=1$ (a single cluster $C_{X}$ contains all $|X|$ data points, so $|C_{C_{X}}|=|X|$), we have $H(X|C)=\frac{1}{|X|}|X|\log_{2}|X|=\log_{2}|X|$, and thus $I(X;C)=H(X)-H(X|C)=\log_{2}|X|-\log_{2}|X|=0$: maximum compression, with all items treated as one. The distortion term becomes $\beta\cdot\frac{1}{|X|}|X|\cdot\sigma_{X}^{2}=\beta\cdot\sigma_{X}^{2}$, where $\sigma_{X}^{2}$ is the variance of all items in $X$ with respect to the global centroid. So $\mathcal{L}=0+\beta\cdot\sigma_{X}^{2}=\beta\cdot\sigma_{X}^{2}$: under maximum compression, the cost is purely the distortion incurred by representing all items by a single prototype.
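Both boundary cases can be verified numerically. Below is a minimal sketch of the objective, assuming $\sigma_{c}^{2}$ is computed as the mean squared Euclidean distance of cluster members to their centroid:

```python
import numpy as np

def complexity(labels):
    # log2|X| - (1/|X|) * sum_c |C_c| log2|C_c|, i.e., I(X;C) for hard clusters.
    n = len(labels)
    _, sizes = np.unique(labels, return_counts=True)
    return float(np.log2(n) - (sizes * np.log2(sizes)).sum() / n)

def distortion(X, labels):
    # (1/|X|) * sum_c |C_c| * sigma_c^2, with sigma_c^2 the mean squared
    # Euclidean distance of cluster members to their centroid.
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        total += len(pts) * ((pts - pts.mean(axis=0)) ** 2).sum(axis=1).mean()
    return total / len(X)

def L(X, labels, beta=1.0):
    # L(X, C; beta) = Complexity(X, C) + beta * Distortion(X, C)
    return complexity(labels) + beta * distortion(X, labels)

X = np.arange(8, dtype=float).reshape(-1, 1)   # 8 one-dimensional items
print(L(X, labels=list(range(8))))             # |C|=|X|: log2(8) = 3.0
print(L(X, labels=[0] * 8))                    # |C|=1: beta * global variance (5.25 here)
```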

Figure 7 shows the mean cluster entropy ($S_{\alpha}$) versus the number of clusters ($K$), aggregated across the different LLM families and compared against human-defined categories (shown as distinct points or lines at their fixed $K$ values from the datasets). Higher entropy values indicate less compressed, more diverse clusterings.

Table A1.T1: Mutual information measures (NMI, AMI, ARI) per model per dataset. Aggregated results are shown in the main paper and in the Appendix figures.

Dataset | Model | NMI | AMI | ARI
[Rosch, 1973c] | bert-large-uncased | 0.19453 | 0.2011 | 0.11336
[Rosch, 1975] | bert-large-uncased | 0.16547 | 0.27324 | 0.2216
[McCloskey and Glucksberg, 1978] | bert-large-uncased | 0.12003 | 0.15934 | 0.06306
[Rosch, 1973c] | FacebookAI/roberta-large | 0.1021 | 0.10666 | 0.03393
[Rosch, 1975] | FacebookAI/roberta-large | 0.12138 | 0.23938 | 0.14165
[McCloskey and Glucksberg, 1978] | FacebookAI/roberta-large | 0.06271 | 0.08873 | 0.03173
[Rosch, 1973c] | google-t5/t5-large | 0.16583 | 0.16855 | 0.03676
[Rosch, 1975] | google-t5/t5-large | -0.03799 | 0.04179 | 0.00758
[McCloskey and Glucksberg, 1978] | google-t5/t5-large | 0.06146 | 0.08825 | 0.0082
[Rosch, 1973c] | google/gemma-2-27b | 0.08523 | 0.09065 | 0.04158
[Rosch, 1975] | google/gemma-2-27b | 0.04276 | 0.10062 | 0.06244
[McCloskey and Glucksberg, 1978] | google/gemma-2-27b | 0.07814 | 0.10274 | 0.04364
[Rosch, 1973c] | google/gemma-2-2b | 0.04029 | 0.04107 | 0.01212
[Rosch, 1975] | google/gemma-2-2b | 0.04529 | 0.14844 | 0.07596
[McCloskey and Glucksberg, 1978] | google/gemma-2-2b | 0.09953 | 0.13593 | 0.06326
[Rosch, 1973c] | google/gemma-2-9b | 0.1222 | 0.12757 | 0.06053
[Rosch, 1975] | google/gemma-2-9b | 0.07841 | 0.16126 | 0.09617
[McCloskey and Glucksberg, 1978] | google/gemma-2-9b | 0.10879 | 0.13997 | 0.06439
[Rosch, 1973c] | google/gemma-2b | 0.04336 | 0.04616 | 0.01593
[Rosch, 1975] | google/gemma-2b | -0.00353 | 0.04483 | 0.01577
[McCloskey and Glucksberg, 1978] | google/gemma-2b | 0.03472 | 0.05484 | 0.02142
[Rosch, 1973c] | google/gemma-7b | 0.04459 | 0.04547 | 0.01052
[Rosch, 1975] | google/gemma-7b | -0.03055 | 0.02644 | 0.01506
[McCloskey and Glucksberg, 1978] | google/gemma-7b | 0.03338 | 0.05724 | 0.02176
[Rosch, 1973c] | meta-llama/Llama-3.1-70B | 0.03008 | 0.03528 | 0.01936
[Rosch, 1975] | meta-llama/Llama-3.1-70B | -0.07026 | 0.02636 | 0.00392
[McCloskey and Glucksberg, 1978] | meta-llama/Llama-3.1-70B | -0.04773 | 0.00972 | 0.00236
[Rosch, 1973c] | meta-llama/Llama-3.1-8B | 0.00473 | 0.00393 | 0.00023
[Rosch, 1975] | meta-llama/Llama-3.1-8B | -0.03928 | 0.05489 | 0.01884
[McCloskey and Glucksberg, 1978] | meta-llama/Llama-3.1-8B | -0.02671 | 0.02208 | 6.00E-05
[Rosch, 1973c] | meta-llama/Llama-3.2-1B | 0.01936 | 0.01567 | 0.00246
[Rosch, 1975] | meta-llama/Llama-3.2-1B | -0.01876 | 0.05663 | 0.00782
[McCloskey and Glucksberg, 1978] | meta-llama/Llama-3.2-1B | 0.03625 | 0.06798 | 0.01352
[Rosch, 1973c] | meta-llama/Llama-3.2-3B | 0.03757 | 0.03537 | 0.00876
[Rosch, 1975] | meta-llama/Llama-3.2-3B | 0.01893 | 0.09619 | 0.03193
[McCloskey and Glucksberg, 1978] | meta-llama/Llama-3.2-3B | 0.03914 | 0.07395 | 0.0202
[Rosch, 1973c] | meta-llama/Meta-Llama-3-70B | 0.02289 | 0.03133 | 0.01514
[Rosch, 1975] | meta-llama/Meta-Llama-3-70B | -0.06428 | 0.0185 | 0.00554
[McCloskey and Glucksberg, 1978] | meta-llama/Meta-Llama-3-70B | -0.04595 | 0.01068 | 0.00272
[Rosch, 1973c] | meta-llama/Meta-Llama-3-8B | 0.03512 | 0.02852 | 0.00225
[Rosch, 1975] | meta-llama/Meta-Llama-3-8B | -0.06011 | 0.03694 | 0.00676
[McCloskey and Glucksberg, 1978] | meta-llama/Meta-Llama-3-8B | -0.0355 | 0.0219 | 0.00676
[Rosch, 1973c] | microsoft/deberta-large | 0.03748 | 0.03909 | 0.01467
[Rosch, 1975] | microsoft/deberta-large | 0.16568 | 0.28993 | 0.20527
[McCloskey and Glucksberg, 1978] | microsoft/deberta-large | 0.03217 | 0.06175 | 0.03019
[Rosch, 1973c] | microsoft/phi-1_5 | 0.02102 | 0.01786 | 0.0075
[Rosch, 1975] | microsoft/phi-1_5 | 0.03989 | 0.13887 | 0.04305
[McCloskey and Glucksberg, 1978] | microsoft/phi-1_5 | 0.00895 | 0.05215 | 0.00639
[Rosch, 1973c] | microsoft/phi-1 | 0.0249 | 0.01698 | 0.00133
[Rosch, 1975] | microsoft/phi-1 | -0.03625 | 0.02811 | 0.00217
[McCloskey and Glucksberg, 1978] | microsoft/phi-1 | -0.01148 | 0.03085 | 0.00371
[Rosch, 1973c] | microsoft/phi-2 | 0.03703 | 0.02968 | 0.00404
[Rosch, 1975] | microsoft/phi-2 | -0.03654 | 0.04227 | 0.03942
[McCloskey and Glucksberg, 1978] | microsoft/phi-2 | -0.00254 | 0.02531 | 0.00533
[Rosch, 1973c] | microsoft/phi-4 | 0.03075 | 0.03043 | 0.01076
[Rosch, 1975] | microsoft/phi-4 | -0.06737 | 0.00092 | -0.01361
[McCloskey and Glucksberg, 1978] | microsoft/phi-4 | -0.01789 | 0.02705 | 0.00066
[Rosch, 1973c] | mistralai/Mistral-7B-v0.3 | 0.0425 | 0.03507 | 0.00357
[Rosch, 1975] | mistralai/Mistral-7B-v0.3 | -0.05018 | 0.01217 | 0.0177
[McCloskey and Glucksberg, 1978] | mistralai/Mistral-7B-v0.3 | -0.01264 | 0.03902 | 0.00931
[Rosch, 1973c] | Qwen/Qwen1.5-0.5B | 0.00148 | -0.00225 | 0.00399
[Rosch, 1975] | Qwen/Qwen1.5-0.5B | -0.01538 | 0.04833 | 0.0095
[McCloskey and Glucksberg, 1978] | Qwen/Qwen1.5-0.5B | 0.02559 | 0.06023 | 0.00771
[Rosch, 1973c] | Qwen/Qwen1.5-1.8B | 0.03397 | 0.03232 | 0.01034
[Rosch, 1975] | Qwen/Qwen1.5-1.8B | -0.01129 | 0.05803 | 0.00683
[McCloskey and Glucksberg, 1978] | Qwen/Qwen1.5-1.8B | -0.00541 | 0.03614 | 0.00538
[Rosch, 1973c] | Qwen/Qwen1.5-14B | 0.0372 | 0.02738 | 0.0028
[Rosch, 1975] | Qwen/Qwen1.5-14B | -0.02604 | 0.05153 | 0.01211
[McCloskey and Glucksberg, 1978] | Qwen/Qwen1.5-14B | 0.00124 | 0.04136 | 0.00338
[Rosch, 1973c] | Qwen/Qwen1.5-32B | 0.02638 | 0.02436 | 0.00409
[Rosch, 1975] | Qwen/Qwen1.5-32B | -0.03413 | 0.02526 | -0.00665
[McCloskey and Glucksberg, 1978] | Qwen/Qwen1.5-32B | -0.01991 | 0.02124 | -0.00059
[Rosch, 1973c] | Qwen/Qwen1.5-4B | 0.03803 | 0.04058 | 0.01742
[Rosch, 1975] | Qwen/Qwen1.5-4B | -0.03309 | 0.03988 | 0.01678
[McCloskey and Glucksberg, 1978] | Qwen/Qwen1.5-4B | -0.03997 | 0.00548 | -0.00028
[Rosch, 1973c] | Qwen/Qwen1.5-72B | 0.03697 | 0.02892 | 0.00144
[Rosch, 1975] | Qwen/Qwen1.5-72B | -0.06184 | 0.02213 | 0.0017
[McCloskey and Glucksberg, 1978] | Qwen/Qwen1.5-72B | -0.02022 | 0.02918 | 0.00297
[Rosch, 1973c] | Qwen/Qwen2-0.5B | 0.02266 | 0.01923 | 0.00662
[Rosch, 1975] | Qwen/Qwen2-0.5B | 0.0515 | 0.14571 | 0.04999
[McCloskey and Glucksberg, 1978] | Qwen/Qwen2-0.5B | 0.01508 | 0.04357 | 0.00643
[Rosch, 1973c] | Qwen/Qwen2-1.5B | 0.02956 | 0.02779 | 0.00544
[Rosch, 1975] | Qwen/Qwen2-1.5B | -0.03595 | 0.03443 | -0.01099
[McCloskey and Glucksberg, 1978] | Qwen/Qwen2-1.5B | 0.01768 | 0.05407 | 0.01604
[Rosch, 1973c] | Qwen/Qwen2-7B | 0.06424 | 0.06439 | 0.02067
[Rosch, 1975] | Qwen/Qwen2-7B | 0.0333 | 0.09155 | 0.02832
[McCloskey and Glucksberg, 1978] | Qwen/Qwen2-7B | 0.05329 | 0.07599 | 0.01977
[Rosch, 1973c] | Qwen/Qwen2.5-0.5B | 0.03165 | 0.03291 | 0.01029
[Rosch, 1975] | Qwen/Qwen2.5-0.5B | -0.06534 | -0.0196 | -0.01165
[McCloskey and Glucksberg, 1978] | Qwen/Qwen2.5-0.5B | 0.0062 | 0.04191 | 0.0054
[Rosch, 1973c] | Qwen/Qwen2.5-1.5B | 0.04838 | 0.0489 | 0.0129
[Rosch, 1975] | Qwen/Qwen2.5-1.5B | 0.03785 | 0.113 | 0.02761
[McCloskey and Glucksberg, 1978] | Qwen/Qwen2.5-1.5B | 0.06166 | 0.08675 | 0.03162
[Rosch, 1973c] | Qwen/Qwen2.5-3B | 0.03882 | 0.0348 | 0.00465
[Rosch, 1975] | Qwen/Qwen2.5-3B | 0.03977 | 0.10821 | 0.04302
[McCloskey and Glucksberg, 1978] | Qwen/Qwen2.5-3B | 0.03416 | 0.07307 | 0.02959
[Rosch, 1973c] | Qwen/Qwen2.5-7B | 0.0529 | 0.05051 | 0.01605
[Rosch, 1975] | Qwen/Qwen2.5-7B | -0.00905 | 0.03227 | 0.01044
[McCloskey and Glucksberg, 1978] | Qwen/Qwen2.5-7B | 0.00222 | 0.02759 | 0.00551

Table A1.T2: Correlation between human typicality judgments and LLM internal cluster geometry. Spearman rank correlations between human-rated psychological typicality/distance (higher human scores = less typical/more distant) and item-to-centroid cosine similarity (higher similarity = more central to the LLM cluster). Negative correlations suggest alignment. $^{**}p<0.05$.

Model | Rosch (1973) | Rosch (1975) | McCloskey (1978)
Qwen1.5-72B | -0.237 | -0.049 | -0.016
Llama-3-70B | -0.124** | -0.085 | 0.016
Llama-3.1-70B | -0.125** | -0.084 | 0.015
Qwen1.5-32B | -0.051 | -0.064** | 0.007
gemma-2-27b | -0.166 | -0.116 | 0.038
Qwen1.5-14B | -0.197 | -0.052 | -0.029
phi-4 | -0.061 | -0.044 | 0.025
gemma-2-9b | -0.282 | -0.074 | 0.117
Llama-3.1-8B | -0.184 | -0.075 | -0.058
Llama-3-8B | -0.162 | -0.073 | -0.053
Mistral-7B-v0.3 | 0.015 | -0.112 | 0.040
Qwen2-7B | -0.021 | -0.105 | -0.008
Qwen2.5-7B | 0.033 | -0.066 | -0.030
gemma-7b | -0.135 | -0.047** | 0.010
Llama-3.2-3B | -0.007 | 0.000 | 0.001
phi-2 | 0.049 | -0.108** | -0.001
gemma-2b | -0.176 | -0.055 | 0.052
gemma-2-2b | -0.283 | -0.107 | 0.117
Qwen1.5-1.8B | -0.106 | -0.085 | 0.021
Qwen2.5-1.5B | -0.003 | -0.035 | 0.015
phi-1.5 | 0.134 | -0.134 | 0.007
phi-1 | -0.219 | -0.138 | 0.013
Llama-3.2-1B | -0.062 | -0.004** | -0.003
Qwen1.5-0.5B | -0.122 | -0.004 | -0.001
Qwen2-0.5B | -0.044 | 0.009 | -0.009
Qwen2.5-0.5B | -0.018 | -0.009 | -0.007
roberta-large | 0.088 | -0.047 | -0.074
bert-large-uncased | -0.427 | -0.198** | 0.206**
deberta-large | 0.016 | -0.042 | -0.023

Figure: LLM-derived Clusters Show Above-Chance Alignment with Human Conceptual Categories. Adjusted Mutual Information (AMI) between human categories and LLM-embedding clusters versus model size. Results are averaged over three psychological datasets. All models perform significantly better than random clustering. BERT's performance is notably strong.

Figure: (a) Human conceptual categories exhibit higher mean entropy. Mean cluster entropy ($S_{\alpha}$) vs. number of clusters ($K$) for LLMs and human categories (fixed $K$). Higher entropy indicates less compression.

Figure: (b) LLMs achieve a more optimal $\mathcal{L}$ trade-off. Our information-theoretic objective ($\mathcal{L}$) vs. $K$. Lower $\mathcal{L}$ indicates a more statistically optimal compression-meaning balance.

Figure: Weak-to-No Correlation Between LLM Embedding Distance and Human Typicality Judgments. Example scatter plots of cosine similarity versus human typicality for items belonging to the category, compared to items from other categories.

Figure: Weak and Mostly Non-Significant Spearman Correlations Between Human Typicality Judgments and LLM Cosine Similarity, Indicating Differently Structured Concept Representations. Mean Spearman correlation values across models of the same family and across the three datasets.

Figure: Human Conceptual Categories Exhibit Higher Mean Entropy than LLM-Derived Clusters. Mean cluster entropy ($S_{\alpha}$) versus the number of clusters ($K$) for various LLMs, compared against human-defined categories (shown as distinct points or lines at their fixed $K$ values). Higher entropy values indicate less compressed, more diverse clusterings.

Figure: LLMs Achieve a More "Optimal" Compression-Meaning Trade-off by the $\mathcal{L}$ Measure. IB-RDT objective ($\mathcal{L}$) vs. $K$. Lower $\mathcal{L}$ indicates a more optimal balance between compression ($I(X;C)$) and semantic fidelity (distortion). Human categories (fixed $K$) show higher $\mathcal{L}$ values.

$$ \mathcal{L}(X,C;\beta)=\text{Complexity}(X,C)+\beta\cdot\text{Distortion}(X,C). $$ \tag{S4.E1}

$$ \text{Complexity}(X,C)=\log_{2}|X|-\frac{1}{|X|}\sum_{c\in C}|C_{c}|\log_{2}|C_{c}|. $$ \tag{S4.E2}

Complexity-Distortion Ratio (on the importance of $\beta$)

Figure 29 provides an additional sensitivity analysis in which we examine the ratio between the distortion and complexity components of $\mathcal{L}$ as $\beta$ varies. Across all three datasets, encoder models maintain flat profiles, indicating stable conceptual structure under compression, whereas decoder models exhibit stronger shifts, reflecting greater reallocation of representational capacity.

Static vs. Contextual AMI Exploration

Analysis of 13 instruction-tuned models across 5 families (Qwen, Llama, Gemma, Phi, Mistral; see Table 5 for results) indicates no statistically significant correlation (r = -0.202, p = 0.508). This suggests that while the $\mathcal{L}$ objective successfully identifies models that compress semantic categories more effectively, this compression ability does not directly translate to improved performance on standard NLP benchmarks. The lack of correlation implies that concept compression and benchmark accuracy represent distinct aspects of model capability, the former capturing semantic-organization efficiency and the latter measuring general knowledge and reasoning abilities. We specifically chose instruction-tuned models to ensure a fair comparison on MMLU, as base models would likely perform poorly on this instruction-following benchmark. While our analysis covers a diverse range of model families and sizes, it represents a subset of available models due to the limited availability of instruction-tuned variants.

Table 5: No correlation between $\mathcal{L}$ objective scores and MMLU scores across different sizes and families. The table displays $\mathcal{L}$ objective values vs. MMLU scores for 13 instruction-tuned models (r = -0.202, p = 0.508).

Figure: (c) $\mathcal{L}$ as a function of $K$ for McCloskey & Glucksberg (1978).

Figure 14: Compression of non-English representations. All non-English languages (Spanish, German, Italian, Russian) exhibit higher compression than English, with smaller models showing the greatest compression. This supports the interpretation that limited non-English training data leads to less flexible and less interpretable representations.

Figure 15: OLMo-7B develops conceptual structure through two-phase dynamics. Left: AMI with human categories rises rapidly in early training, then refines gradually. Right: Peak semantic processing migrates from deep (layer 28) toward mid-network layers during training, revealing architectural reorganization for efficiency. Representative checkpoints shown; full 57-checkpoint analysis in Appendix B.5.

Figure 16: Left: OLMo-7B representations steadily strengthen during training: concept representations develop rapidly at early steps, then refine more gradually over time. Right: Semantic processing shifts from deep to mid-network layers: the model undergoes a two-phase dynamic, initially moving semantic processing upward during rapid learning, then reorganizing to optimize efficiency while preserving performance. To improve readability, we present six representative checkpoints that capture the trend.

Figure 17: Complete OLMo-7B training trajectory across 57 checkpoints: this high-resolution view reveals the inherent noise and fluctuations in training, with individual checkpoint measurements varying throughout the process. Despite this variability, the overall trend aligns with the stable pattern shown in Figure 16, demonstrating that representative sampling effectively captures the underlying semantic development trajectory while filtering out training noise.

Figure 18: Network-level measures of representational structure across training. Each panel shows a different metric: (a) effective rank, (b) Gini coefficient, and (c) Hoyer sparsity computed across network layers. All three measures reveal the same two-phase developmental pattern observed in Figure 15: an early rapid change followed by slower restructuring, indicating coordinated reorganization of internal representations.
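The three structural measures in Figure 18 have standard closed forms; the sketch below uses common definitions (effective rank as the exponential of the entropy of the normalized singular-value spectrum, and Gini and Hoyer sparsity computed on absolute values). These exact definitions are assumptions for exposition, as the text does not spell them out:

```python
import numpy as np

def effective_rank(M):
    # exp of the Shannon entropy of the normalized singular-value spectrum;
    # equals the true rank when all nonzero singular values are equal.
    s = np.linalg.svd(np.asarray(M, dtype=float), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def gini(x):
    # Gini coefficient of |x|: 0 for a uniform vector, (n-1)/n for one-hot.
    x = np.sort(np.abs(np.asarray(x, dtype=float)))
    n = len(x)
    i = np.arange(1, n + 1)
    return float(((2 * i - n - 1) * x).sum() / (n * x.sum()))

def hoyer(x):
    # Hoyer sparsity: 1 for a one-hot vector, 0 for a constant vector.
    x = np.asarray(x, dtype=float)
    n = len(x)
    l1, l2 = np.abs(x).sum(), np.sqrt((x ** 2).sum())
    return float((np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1))

acts = np.array([0.0, 0.0, 0.0, 1.0])   # a maximally sparse activation vector
print(gini(acts), hoyer(acts), effective_rank(np.eye(4)))
```

Applied per layer across checkpoints, all three trace how concentrated (sparse, low-dimensional) a layer's representations become over training.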

Figure 20: Polysemy is unlikely to influence our results, as most words in our dataset are concrete nouns, which tend to be less polysemous. Histogram of WordNet sense counts for the 943 lemmas in our benchmark. The dashed lines indicate the median and mean (2 and 3.47 senses, respectively).

Figure 21: Substantial lexical overlap suggests that tokenization differences alone cannot explain the observed performance variations in our experiments. Vocabulary overlap between different tokenizers; most tokenizer families share substantial lexical overlap.

Figure 22: LLM-derived Clusters Show Above-Chance Alignment with Human Conceptual Categories. Normalized Mutual Information (NMI) between human-defined categories and clusters from static LLM embeddings. Results are averaged over three psychological datasets. All models perform significantly better than random clustering. BERT's performance is notably strong.

Figure 24: Evidence that encoder-decoder differences are not driven by dataset artifacts. We compare the only model families that can be matched on training data: GPT, Pythia, Cerebras, and T5. While these comparisons cannot eliminate all possible confounds, they show that the architectural patterns we report are robust: encoder models consistently achieve higher AMI and lower $\mathcal{L}$, suggesting that the observed differences cannot be explained by dataset variation alone.

Figure 25: Weak-to-No Correlation Between LLM Embedding Distance and Human Typicality Judgments. Scatter plot examples of the cosine similarity versus the human typicality of items belonging to the category compared to items from other categories.

Figure 26: Weak and Mostly Non-Significant Spearman Correlation Values Between Human Typicality Judgments and LLM Cosine Similarity Indicating Different Structure Representing Concepts. Mean Static Layer Spearman correlation values across the models belonging to the same family and across the three datasets.

Figure 27: Static Embeddings Achieve a More 'Optimal' Compression-Meaning Trade-off by the $\mathcal{L}$ Measure. IB-RDT objective ($\mathcal{L}$) vs. $K$ across all datasets. Lower $\mathcal{L}$ indicates a more optimal balance between compression ($I(X;C)$) and semantic fidelity (distortion). Static embeddings consistently achieve lower $\mathcal{L}$ values than both human categories and contextual embeddings. The plots correspond to the three datasets in the following order: Rosch (1973a), Rosch (1975), McCloskey & Glucksberg (1978).

(a) Complexity-distortion ratio for Rosch (1973a).

Figure 29: Complexity-Distortion Ratios Show That Encoder Models Are Less Sensitive to Variations in $\beta$. For each dataset, we compute the ratio between distortion (loss of human-aligned conceptual information) and complexity (representation size) across values of the rate-distortion trade-off parameter $\beta$. Encoder models yield consistently flatter profiles, indicating that their token embeddings preserve conceptual structure even under increasing compression. Decoder models exhibit more pronounced shifts, suggesting that they redistribute representational capacity more aggressively as $\beta$ increases, leading to higher sensitivity in this trade-off.

Model Family | Mean Tokens / Item | Vocabulary Size | Tokenizer Type
BERT | 1.65 | 30K | WordPiece
DeBERTa & RoBERTa | 2.3 | 50K | WordPiece
GPT | 2.3 | 50K | BPE
Gemma | 1.65 | 256K | SentencePiece (subword)
Llama | 2.19 | 128K | BPE (SentencePiece/Tiktoken)
Mistral | 2.35 | 32K | BPE + control tokens
Phi | 2.3 | 32K | BPE (SentencePiece/Tiktoken)
Qwen | 2.19 | 151.6K | BPE
ModelDataset Correlation (Spearman ρ ) - Static LayerDataset Correlation (Spearman ρ ) - Static LayerDataset Correlation (Spearman ρ ) - Static Layer
ModelRosch (1973)Rosch (1975)McCloskey (1978)
Deberta large (304M)0.1440.107*0.075
Bert large (340M)0.378*0.275*0.250*
Roberta large (355M)0.0050.038-0.029
Gemma (2B)0.0690.0780.007
Gemma 2 (2B)0.2360.119*0.147*
Gemma (7B)0.1310.100*0.007
Gemma 2 (9B)0.2800.135*0.199*
Gemma 2 (27B)0.1120.122*0.161*
Qwen1.5 (0.5B)0.1750.0760.096*
Qwen2 (0.5B)0.2380.0410.040
Qwen2.5 (0.5B)0.2120.0270.037
Qwen2.5 (1.5B)0.1410.086*0.078
Qwen1.5 (1.8B)0.1720.134*0.154*
Qwen2 (7B)0.0360.087*0.040
Qwen1.5 (14B)0.1540.086*0.108*
Qwen1.5 (32B)-0.0320.100*0.081
Qwen2.5 (32B)0.0350.105*0.084
Mistral v0.3 (7B)0.0760.152*-0.009
Llama 3.2 (1B)0.301*0.0560.039
Llama 3 (8B)-0.0020.099*0.080
Llama 3.1 (8B)0.0040.108*0.081
Llama 3 (70B)0.1480.161*0.155*
Llama 3.1 (70B)0.1220.161*0.155*
Phi 1 (1.42B)0.0710.0520.054
Phi 1.5 (1.42B)-0.0880.0790.018
Phi 2 (2.78B)-0.0560.0440.024
Phi 4 (14.7B)0.0790.086*0.097*
T5 Large (770M)0.2350.259*0.178*
GPT-2 Medium (355M)-0.0320.063-0.017
ViT-B/32 Text (63.1M)0.527*0.315*0.286*
ViT-B/16 Text (63.1M)0.528*0.289*0.278*
Word2Vec (300D)0.442*0.349*0.437*
Glove (300D)0.315*0.333*0.350*
ModelDataset Correlation (Spearman ρ ) - Peak AMI LayerDataset Correlation (Spearman ρ ) - Peak AMI LayerDataset Correlation (Spearman ρ ) - Peak AMI Layer
ModelRosch (1973)Rosch (1975)McCloskey (1978)
Deberta large (304M)0.2770.107*0.126*
Bert large (340M)-0.1200.148*0.026
Roberta large (355M)-0.0110.038-0.022
Gemma (2B)0.1270.092*0.039
Gemma 2 (2B)0.0340.139*0.103*
Gemma (7B)0.004-0.0500.088
Gemma 2 (9B)0.0470.110*0.098*
Gemma 2 (27B)-0.1350.090*-0.101*
Qwen1.5 (0.5B)-0.025-0.0640.122*
Qwen2 (0.5B)0.0900.064-0.077
Qwen2.5 (0.5B)-0.0120.121*0.072
Qwen2.5 (1.5B)-0.1140.084*-0.028
Qwen1.5 (1.8B)0.092-0.044-0.004
Qwen2 (7B)0.0040.049-0.087
Qwen1.5 (14B)0.0320.091*0.043
Qwen1.5 (32B)0.0260.0820.111*
Qwen2.5 (32B)0.0450.079-0.060
Mistral v0.3 (7B)0.0130.0390.107*
Llama 3.2 (1B)0.0630.094*0.070
Llama 3 (8B)-0.0450.138*0.084
Llama 3.1 (8B)-0.0960.130*0.087
Llama 3 (70B)-0.1080.0230.050
Phi 1 (1.42B)0.2160.185*-0.029
Phi 1.5 (1.42B)0.1620.098*0.015
Phi 2 (2.78B)0.325*0.142*-0.033
Phi 4 (14.7B)0.0790.129*0.007
T5 Large (770M)0.2190.226*0.282*
GPT-2 Small (117M)-0.0460.118*0.010
GPT-2 Medium (355M)0.0770.109*-0.029
ViT-B/32 Text (63.1M)0.1280.0550.152*
ViT-B/16 Text (63.1M)0.0890.086*0.127*
Word2Vec (300D)0.442*0.349*0.437*
Glove (300D)0.315*0.333*0.350*
ModelSizeL ScoreMMLUScore
Qwen2-0.5B-Instruct494M1.930.433
Qwen2.5-0.5B-Instruct494M1.9820.469
Llama-3.2-1B-Instruct1.2B1.8760.454
Qwen2.5-1.5B-Instruct1.5B2.0710.597
Gemma-2B-IT2.5B2.3820.366
Gemma-2-2B-IT2.6B2.2630.565
Phi-4-mini-instruct3.8B1.9050.678
Mistral-7B-Instruct-v0.37.2B1.7140.603
Qwen2-7B-Instruct7.6B2.3190.7
Meta-Llama-3-8B-Instruct8.0B1.3480.647
Llama-3.1-8B-Instruct8.0B1.320.679
Gemma-7B-IT8.5B2.3720.512
Gemma-2-9B-IT9.2B2.4670.723
Tokenizer statistics per model family:

| Model Family | Mean Tokens / Item | Vocabulary Size | Tokenizer Type |
|---|---|---|---|
| BERT | 1.65 | 30K | WordPiece |
| DeBERTa & RoBERTa | 2.3 | 50K | WordPiece |
| GPT | 2.3 | 50K | BPE |
| Gemma | 1.65 | 256K | SentencePiece (subword) |
| Llama | 2.19 | 128K | BPE (SentencePiece/Tiktoken) |
| Mistral | 2.35 | 32K | BPE + control tokens |
| Phi | 2.3 | 32K | BPE (SentencePiece/Tiktoken) |
| Qwen | 2.19 | 151.6K | BPE |
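The "Mean Tokens / Item" column reflects tokenizer fertility on the benchmark items. A minimal sketch of how such a figure could be computed — the toy tokenizer and word list here are illustrative stand-ins, not the paper's data; a real run would use a Hugging Face tokenizer as indicated in the trailing comment:

```python
def mean_tokens_per_item(tokenize, items):
    """Average number of subword tokens a tokenizer produces per item."""
    return sum(len(tokenize(w)) for w in items) / len(items)

# Toy stand-in tokenizer so the sketch runs offline: splits every 4 characters.
toy_tokenize = lambda w: [w[i:i + 4] for i in range(0, len(w), 4)]

items = ["robin", "sparrow", "penguin", "ostrich"]
print(mean_tokens_per_item(toy_tokenize, items))  # prints 2.0

# With a real tokenizer (requires `transformers` and network access):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# mean_tokens_per_item(tok.tokenize, items)
```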
| Dataset | Model | NMI | AMI | ARI |
|---|---|---|---|---|
| (Rosch, 1973c) | bert-large-uncased | 0.19453 | 0.2011 | 0.11336 |
| (Rosch, 1975) | bert-large-uncased | 0.16547 | 0.27324 | 0.2216 |
| (McCloskey & Glucksberg, 1978) | bert-large-uncased | 0.12003 | 0.15934 | 0.06306 |
| (Rosch, 1973c) | FacebookAI/roberta-large | 0.1021 | 0.10666 | 0.03393 |
| (Rosch, 1975) | FacebookAI/roberta-large | 0.12138 | 0.23938 | 0.14165 |
| (McCloskey & Glucksberg, 1978) | FacebookAI/roberta-large | 0.06271 | 0.08873 | 0.03173 |
| (Rosch, 1973c) | google-t5/t5-large | 0.16583 | 0.16855 | 0.03676 |
| (Rosch, 1975) | google-t5/t5-large | -0.03799 | 0.04179 | 0.00758 |
| (McCloskey & Glucksberg, 1978) | google-t5/t5-large | 0.06146 | 0.08825 | 0.0082 |
| (Rosch, 1973c) | google/gemma-2-27b | 0.08523 | 0.09065 | 0.04158 |
| (Rosch, 1975) | google/gemma-2-27b | 0.04276 | 0.10062 | 0.06244 |
| (McCloskey & Glucksberg, 1978) | google/gemma-2-27b | 0.07814 | 0.10274 | 0.04364 |
| (Rosch, 1973c) | google/gemma-2-2b | 0.04029 | 0.04107 | 0.01212 |
| (Rosch, 1975) | google/gemma-2-2b | 0.04529 | 0.14844 | 0.07596 |
| (McCloskey & Glucksberg, 1978) | google/gemma-2-2b | 0.09953 | 0.13593 | 0.06326 |
| (Rosch, 1973c) | google/gemma-2-9b | 0.1222 | 0.12757 | 0.06053 |
| (Rosch, 1975) | google/gemma-2-9b | 0.07841 | 0.16126 | 0.09617 |
| (McCloskey & Glucksberg, 1978) | google/gemma-2-9b | 0.10879 | 0.13997 | 0.06439 |
| (Rosch, 1973c) | google/gemma-2b | 0.04336 | 0.04616 | 0.01593 |
| (Rosch, 1975) | google/gemma-2b | -0.00353 | 0.04483 | 0.01577 |
| (McCloskey & Glucksberg, 1978) | google/gemma-2b | 0.03472 | 0.05484 | 0.02142 |
| (Rosch, 1973c) | google/gemma-7b | 0.04459 | 0.04547 | 0.01052 |
| (Rosch, 1975) | google/gemma-7b | -0.03055 | 0.02644 | 0.01506 |
| (McCloskey & Glucksberg, 1978) | google/gemma-7b | 0.03338 | 0.05724 | 0.02176 |
| (Rosch, 1973c) | meta-llama/Llama-3.1-70B | 0.03008 | 0.03528 | 0.01936 |
| (Rosch, 1975) | meta-llama/Llama-3.1-70B | -0.07026 | 0.02636 | 0.00392 |
| (McCloskey & Glucksberg, 1978) | meta-llama/Llama-3.1-70B | -0.04773 | 0.00972 | 0.00236 |
| (Rosch, 1973c) | meta-llama/Llama-3.1-8B | 0.00473 | 0.00393 | 0.00023 |
| (Rosch, 1975) | meta-llama/Llama-3.1-8B | -0.03928 | 0.05489 | 0.01884 |
| (McCloskey & Glucksberg, 1978) | meta-llama/Llama-3.1-8B | -0.02671 | 0.02208 | 6e-05 |
| (Rosch, 1973c) | meta-llama/Llama-3.2-1B | 0.01936 | 0.01567 | 0.00246 |
| (Rosch, 1975) | meta-llama/Llama-3.2-1B | -0.01876 | 0.05663 | 0.00782 |
| (McCloskey & Glucksberg, 1978) | meta-llama/Llama-3.2-1B | 0.03625 | 0.06798 | 0.01352 |
| (Rosch, 1973c) | meta-llama/Llama-3.2-3B | 0.03757 | 0.03537 | 0.00876 |
| (Rosch, 1975) | meta-llama/Llama-3.2-3B | 0.01893 | 0.09619 | 0.03193 |
| (McCloskey & Glucksberg, 1978) | meta-llama/Llama-3.2-3B | 0.03914 | 0.07395 | 0.0202 |
| (Rosch, 1973c) | meta-llama/Meta-Llama-3-70B | 0.02289 | 0.03133 | 0.01514 |
| (Rosch, 1975) | meta-llama/Meta-Llama-3-70B | -0.06428 | 0.0185 | 0.00554 |
| (McCloskey & Glucksberg, 1978) | meta-llama/Meta-Llama-3-70B | -0.04595 | 0.01068 | 0.00272 |
| (Rosch, 1973c) | meta-llama/Meta-Llama-3-8B | 0.03512 | 0.02852 | 0.00225 |
| (Rosch, 1975) | meta-llama/Meta-Llama-3-8B | -0.06011 | 0.03694 | 0.00676 |
| (McCloskey & Glucksberg, 1978) | meta-llama/Meta-Llama-3-8B | -0.0355 | 0.0219 | 0.00676 |
| (Rosch, 1973c) | microsoft/deberta-large | 0.03748 | 0.03909 | 0.01467 |
| (Rosch, 1975) | microsoft/deberta-large | 0.16568 | 0.28993 | 0.20527 |
| (McCloskey & Glucksberg, 1978) | microsoft/deberta-large | 0.03217 | 0.06175 | 0.03019 |
| (Rosch, 1973c) | microsoft/phi-1_5 | 0.02102 | 0.01786 | 0.0075 |
| (Rosch, 1975) | microsoft/phi-1_5 | 0.03989 | 0.13887 | 0.04305 |
| (McCloskey & Glucksberg, 1978) | microsoft/phi-1_5 | 0.00895 | 0.05215 | 0.00639 |
| (Rosch, 1973c) | microsoft/phi-1 | 0.0249 | 0.01698 | 0.00133 |
| (Rosch, 1975) | microsoft/phi-1 | -0.03625 | 0.02811 | 0.00217 |
| (McCloskey & Glucksberg, 1978) | microsoft/phi-1 | -0.01148 | 0.03085 | 0.00371 |
| (Rosch, 1973c) | microsoft/phi-2 | 0.03703 | 0.02968 | 0.00404 |
| (Rosch, 1975) | microsoft/phi-2 | -0.03654 | 0.04227 | 0.03942 |
| (McCloskey & Glucksberg, 1978) | microsoft/phi-2 | -0.00254 | 0.02531 | 0.00533 |
| (Rosch, 1973c) | microsoft/phi-4 | 0.03075 | 0.03043 | 0.01076 |
| (Rosch, 1975) | microsoft/phi-4 | -0.06737 | 0.00092 | -0.01361 |
| (McCloskey & Glucksberg, 1978) | microsoft/phi-4 | -0.01789 | 0.02705 | 0.00066 |
| (Rosch, 1973c) | mistralai/Mistral-7B-v0.3 | 0.0425 | 0.03507 | 0.00357 |
| (Rosch, 1975) | mistralai/Mistral-7B-v0.3 | -0.05018 | 0.01217 | 0.0177 |
| (McCloskey & Glucksberg, 1978) | mistralai/Mistral-7B-v0.3 | -0.01264 | 0.03902 | 0.00931 |
| (Rosch, 1973c) | Qwen/Qwen1.5-0.5B | 0.00148 | -0.00225 | 0.00399 |
| (Rosch, 1975) | Qwen/Qwen1.5-0.5B | -0.01538 | 0.04833 | 0.0095 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen1.5-0.5B | 0.02559 | 0.06023 | 0.00771 |
| (Rosch, 1973c) | Qwen/Qwen1.5-1.8B | 0.03397 | 0.03232 | 0.01034 |
| (Rosch, 1975) | Qwen/Qwen1.5-1.8B | -0.01129 | 0.05803 | 0.00683 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen1.5-1.8B | -0.00541 | 0.03614 | 0.00538 |
| (Rosch, 1973c) | Qwen/Qwen1.5-14B | 0.0372 | 0.02738 | 0.0028 |
| (Rosch, 1975) | Qwen/Qwen1.5-14B | -0.02604 | 0.05153 | 0.01211 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen1.5-14B | 0.00124 | 0.04136 | 0.00338 |
| (Rosch, 1973c) | Qwen/Qwen1.5-32B | 0.02638 | 0.02436 | 0.00409 |
| (Rosch, 1975) | Qwen/Qwen1.5-32B | -0.03413 | 0.02526 | -0.00665 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen1.5-32B | -0.01991 | 0.02124 | -0.00059 |
| (Rosch, 1973c) | Qwen/Qwen1.5-4B | 0.03803 | 0.04058 | 0.01742 |
| (Rosch, 1975) | Qwen/Qwen1.5-4B | -0.03309 | 0.03988 | 0.01678 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen1.5-4B | -0.03997 | 0.00548 | -0.00028 |
| (Rosch, 1973c) | Qwen/Qwen1.5-72B | 0.03697 | 0.02892 | 0.00144 |
| (Rosch, 1975) | Qwen/Qwen1.5-72B | -0.06184 | 0.02213 | 0.0017 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen1.5-72B | -0.02022 | 0.02918 | 0.00297 |
| (Rosch, 1973c) | Qwen/Qwen2-0.5B | 0.02266 | 0.01923 | 0.00662 |
| (Rosch, 1975) | Qwen/Qwen2-0.5B | 0.0515 | 0.14571 | 0.04999 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen2-0.5B | 0.01508 | 0.04357 | 0.00643 |
| (Rosch, 1973c) | Qwen/Qwen2-1.5B | 0.02956 | 0.02779 | 0.00544 |
| (Rosch, 1975) | Qwen/Qwen2-1.5B | -0.03595 | 0.03443 | -0.01099 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen2-1.5B | 0.01768 | 0.05407 | 0.01604 |
| (Rosch, 1973c) | Qwen/Qwen2-7B | 0.06424 | 0.06439 | 0.02067 |
| (Rosch, 1975) | Qwen/Qwen2-7B | 0.0333 | 0.09155 | 0.02832 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen2-7B | 0.05329 | 0.07599 | 0.01977 |
| (Rosch, 1973c) | Qwen/Qwen2.5-0.5B | 0.03165 | 0.03291 | 0.01029 |
| (Rosch, 1975) | Qwen/Qwen2.5-0.5B | -0.06534 | -0.0196 | -0.01165 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen2.5-0.5B | 0.0062 | 0.04191 | 0.0054 |
| (Rosch, 1973c) | Qwen/Qwen2.5-1.5B | 0.04838 | 0.0489 | 0.0129 |
| (Rosch, 1975) | Qwen/Qwen2.5-1.5B | 0.03785 | 0.113 | 0.02761 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen2.5-1.5B | 0.06166 | 0.08675 | 0.03162 |
| (Rosch, 1973c) | Qwen/Qwen2.5-3B | 0.03882 | 0.0348 | 0.00465 |
| (Rosch, 1975) | Qwen/Qwen2.5-3B | 0.03977 | 0.10821 | 0.04302 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen2.5-3B | 0.03416 | 0.07307 | 0.02959 |
| (Rosch, 1973c) | Qwen/Qwen2.5-7B | 0.0529 | 0.05051 | 0.01605 |
| (Rosch, 1975) | Qwen/Qwen2.5-7B | -0.00905 | 0.03227 | 0.01044 |
| (McCloskey & Glucksberg, 1978) | Qwen/Qwen2.5-7B | 0.00222 | 0.02759 | 0.00551 |
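The NMI column above compares model-derived clusterings against human category labels; the standard implementations are scikit-learn's `normalized_mutual_info_score` and, for the chance-corrected AMI column, `adjusted_mutual_info_score` (Vinh et al., 2010). A dependency-free sketch of plain NMI, for illustration only (not the paper's released code):

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized Mutual Information between two hard labelings:
    NMI = I(A;B) / sqrt(H(A) * H(B)), using natural logs."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))
    def H(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())
    mi = sum((c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in pab.items())
    denom = math.sqrt(H(pa) * H(pb))
    return mi / denom if denom > 0 else 0.0

# Identical partitions (up to relabeling) score 1; independent ones score 0.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # prints 1.0
```

Note that, unlike AMI, this quantity is not corrected for chance, which is why the paper reports both.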
Dataset correlation (Spearman ρ), static layer:

| Model | Rosch (1973) | Rosch (1975) | McCloskey (1978) |
|---|---|---|---|
| Deberta large (304M) | 0.144 | 0.107* | 0.075 |
| Bert large (340M) | 0.378* | 0.275* | 0.250* |
| Roberta large (355M) | 0.005 | 0.038 | -0.029 |
| Gemma (2B) | 0.069 | 0.078 | 0.007 |
| Gemma 2 (2B) | 0.236 | 0.119* | 0.147* |
| Gemma (7B) | 0.131 | 0.100* | 0.007 |
| Gemma 2 (9B) | 0.280 | 0.135* | 0.199* |
| Gemma 2 (27B) | 0.112 | 0.122* | 0.161* |
| Qwen1.5 (0.5B) | 0.175 | 0.076 | 0.096* |
| Qwen2 (0.5B) | 0.238 | 0.041 | 0.040 |
| Qwen2.5 (0.5B) | 0.212 | 0.027 | 0.037 |
| Qwen2.5 (1.5B) | 0.141 | 0.086* | 0.078 |
| Qwen1.5 (1.8B) | 0.172 | 0.134* | 0.154* |
| Qwen2 (7B) | 0.036 | 0.087* | 0.040 |
| Qwen1.5 (14B) | 0.154 | 0.086* | 0.108* |
| Qwen1.5 (32B) | -0.032 | 0.100* | 0.081 |
| Qwen2.5 (32B) | 0.035 | 0.105* | 0.084 |
| Mistral v0.3 (7B) | 0.076 | 0.152* | -0.009 |
| Llama 3.2 (1B) | 0.301* | 0.056 | 0.039 |
| Llama 3 (8B) | -0.002 | 0.099* | 0.080 |
| Llama 3.1 (8B) | 0.004 | 0.108* | 0.081 |
| Llama 3 (70B) | 0.148 | 0.161* | 0.155* |
| Llama 3.1 (70B) | 0.122 | 0.161* | 0.155* |
| Phi 1 (1.42B) | 0.071 | 0.052 | 0.054 |
| Phi 1.5 (1.42B) | -0.088 | 0.079 | 0.018 |
| Phi 2 (2.78B) | -0.056 | 0.044 | 0.024 |
| Phi 4 (14.7B) | 0.079 | 0.086* | 0.097* |
| T5 Large (770M) | 0.235 | 0.259* | 0.178* |
| GPT-2 Medium (355M) | -0.032 | 0.063 | -0.017 |
| ViT-B/32 Text (63.1M) | 0.527* | 0.315* | 0.286* |
| ViT-B/16 Text (63.1M) | 0.528* | 0.289* | 0.278* |
| Word2Vec (300D) | 0.442* | 0.349* | 0.437* |
| Glove (300D) | 0.315* | 0.333* | 0.350* |
Dataset correlation (Spearman ρ), peak AMI layer:

| Model | Rosch (1973) | Rosch (1975) | McCloskey (1978) |
|---|---|---|---|
| Deberta large (304M) | 0.277 | 0.107* | 0.126* |
| Bert large (340M) | -0.120 | 0.148* | 0.026 |
| Roberta large (355M) | -0.011 | 0.038 | -0.022 |
| Gemma (2B) | 0.127 | 0.092* | 0.039 |
| Gemma 2 (2B) | 0.034 | 0.139* | 0.103* |
| Gemma (7B) | 0.004 | -0.050 | 0.088 |
| Gemma 2 (9B) | 0.047 | 0.110* | 0.098* |
| Gemma 2 (27B) | -0.135 | 0.090* | -0.101* |
| Qwen1.5 (0.5B) | -0.025 | -0.064 | 0.122* |
| Qwen2 (0.5B) | 0.090 | 0.064 | -0.077 |
| Qwen2.5 (0.5B) | -0.012 | 0.121* | 0.072 |
| Qwen2.5 (1.5B) | -0.114 | 0.084* | -0.028 |
| Qwen1.5 (1.8B) | 0.092 | -0.044 | -0.004 |
| Qwen2 (7B) | 0.004 | 0.049 | -0.087 |
| Qwen1.5 (14B) | 0.032 | 0.091* | 0.043 |
| Qwen1.5 (32B) | 0.026 | 0.082 | 0.111* |
| Qwen2.5 (32B) | 0.045 | 0.079 | -0.060 |
| Mistral v0.3 (7B) | 0.013 | 0.039 | 0.107* |
| Llama 3.2 (1B) | 0.063 | 0.094* | 0.070 |
| Llama 3 (8B) | -0.045 | 0.138* | 0.084 |
| Llama 3.1 (8B) | -0.096 | 0.130* | 0.087 |
| Llama 3 (70B) | -0.108 | 0.023 | 0.050 |
| Phi 1 (1.42B) | 0.216 | 0.185* | -0.029 |
| Phi 1.5 (1.42B) | 0.162 | 0.098* | 0.015 |
| Phi 2 (2.78B) | 0.325* | 0.142* | -0.033 |
| Phi 4 (14.7B) | 0.079 | 0.129* | 0.007 |
| T5 Large (770M) | 0.219 | 0.226* | 0.282* |
| GPT-2 Small (117M) | -0.046 | 0.118* | 0.010 |
| GPT-2 Medium (355M) | 0.077 | 0.109* | -0.029 |
| ViT-B/32 Text (63.1M) | 0.128 | 0.055 | 0.152* |
| ViT-B/16 Text (63.1M) | 0.089 | 0.086* | 0.127* |
| Word2Vec (300D) | 0.442* | 0.349* | 0.437* |
| Glove (300D) | 0.315* | 0.333* | 0.350* |
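Spearman ρ in these tables is the rank correlation between model-derived scores and human typicality judgments; in practice it is usually computed with `scipy.stats.spearmanr`. A self-contained, tie-aware sketch for illustration (not the paper's released code):

```python
def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks,
    with average ranks assigned to tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            # Extend j over a run of tied values.
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average 1-based rank for the tied run
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

print(spearman_rho([1, 2, 3], [10, 20, 30]))  # prints 1.0
```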
| Model | Size | L Score | MMLU Score |
|---|---|---|---|
| Qwen2-0.5B-Instruct | 494M | 1.93 | 0.433 |
| Qwen2.5-0.5B-Instruct | 494M | 1.982 | 0.469 |
| Llama-3.2-1B-Instruct | 1.2B | 1.876 | 0.454 |
| Qwen2.5-1.5B-Instruct | 1.5B | 2.071 | 0.597 |
| Gemma-2B-IT | 2.5B | 2.382 | 0.366 |
| Gemma-2-2B-IT | 2.6B | 2.263 | 0.565 |
| Phi-4-mini-instruct | 3.8B | 1.905 | 0.678 |
| Mistral-7B-Instruct-v0.3 | 7.2B | 1.714 | 0.603 |
| Qwen2-7B-Instruct | 7.6B | 2.319 | 0.7 |
| Meta-Llama-3-8B-Instruct | 8.0B | 1.348 | 0.647 |
| Llama-3.1-8B-Instruct | 8.0B | 1.32 | 0.679 |
| Gemma-7B-IT | 8.5B | 2.372 | 0.512 |
| Gemma-2-9B-IT | 9.2B | 2.467 | 0.723 |


$$ R + \lambda D = I(X;\hat{X}) + \lambda \mathbb{E}[d(X,\hat{X})] $$

$$ \min I(X;Z) - \beta I(Z;Y) $$

$$ \mathcal{L}(X, C; \beta) = \underbrace{I(X; C)}_{\text{Complexity: bits needed}} + \beta \cdot \underbrace{\frac{1}{|X|}\sum_{c \in C}\sum_{e_i \in c}\|e_i - \bar{e}_c\|^2}_{\text{Distortion: semantic spread}} $$
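A minimal sketch of how this objective could be evaluated for a hard clustering of token embeddings. This is not the paper's released code; it assumes a uniform prior over items, under which the complexity term $I(X;C)$ for a deterministic assignment reduces to the entropy $H(C)$ of the cluster-size distribution:

```python
import math

def l_objective(embeddings, assignment, beta=1.0):
    """L(X, C; beta) = I(X; C) + beta * distortion, for a hard clustering.
    Complexity: H(C) in bits (uniform p(x), deterministic assignment).
    Distortion: mean squared distance of each embedding to its centroid."""
    n = len(embeddings)
    clusters = {}
    for e, c in zip(embeddings, assignment):
        clusters.setdefault(c, []).append(e)

    # Complexity term: entropy of the cluster-size distribution.
    complexity = -sum((len(m) / n) * math.log2(len(m) / n)
                      for m in clusters.values())

    # Distortion term: (1/|X|) * sum of squared distances to cluster centroids.
    distortion = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(e[d] for e in members) / len(members) for d in range(dim)]
        distortion += sum(sum((e[d] - centroid[d]) ** 2 for d in range(dim))
                          for e in members)
    distortion /= n

    return complexity + beta * distortion

# Two tight, well-separated clusters: 1 bit of complexity, 0.25 distortion.
print(l_objective([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1]))  # prints 1.25
```

Smaller values indicate a clustering that is simultaneously compact (few effective categories) and semantically tight (low within-cluster spread), which is the trade-off the paper measures.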

References


[murphy2004big] Murphy, Gregory. (2004). The big book of concepts.

[rosch1973natural] Rosch, Eleanor H. (1973). Natural categories. Cognitive psychology.

[rosch1975cognitive] Rosch, Eleanor. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General.

[jvinh10a] Nguyen Xuan Vinh, Julien Epps, James Bailey. (2010). Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. Journal of Machine Learning Research.

[rosch1978principles] Rosch, Eleanor. (1978). Principles of categorization. In Cognition and Categorization. Erlbaum.

[rosch1976basic] Rosch, Eleanor, Mervis, Carolyn B, Gray, Wayne D, Johnson, David M, Boyes-Braem, Penny. (1976). Basic objects in natural categories. Cognitive psychology.

[mervis1981categorization] Mervis, Carolyn B, Rosch, Eleanor, others. (1981). Categorization of natural objects. Annual review of psychology.

[mccloskey1978natural] McCloskey, Michael E, Glucksberg, Sam. (1978). Natural categories: Well defined or fuzzy sets? Memory & Cognition.

[medin1989concepts] Medin, Douglas L. (1989). Concepts and conceptual structure. American Psychologist.

[rosch1973internal] Rosch, E. (1973). On the internal structure of perceptual and semantic categories. In Cognitive Development and the Acquisition of Language. New York: Academic Press.

[giraldo2014measures] Giraldo, Luis Gonzalo Sanchez, Rao, Murali, Principe, Jose C. (2014). Measures of entropy from data using infinitely divisible kernels. IEEE Transactions on Information Theory.

[wei2025generalized] Wei, Lan, Wang, Dong, Wang, Yu. (2025). Generalized relative entropy: New look at R{'e. Mechanical Systems and Signal Processing.

[rosch1973prototype] Rosch, Eleanor. (1973). Prototype theory. Cognitive development and the acquisition of language.

[shannon1948mathematical] Shannon, Claude Elwood. (1948). A mathematical theory of communication. The Bell system technical journal.

[kemp2012kinship] Kemp, Charles, Regier, Terry. (2012). Kinship categories across languages reflect general communicative principles. Science.

[croft2001radical] Croft, William. (2001). Radical construction grammar: Syntactic theory in typological perspective.

[tishby2000information] Tishby, Naftali, Pereira, Fernando C, Bialek, William. (2000). The information bottleneck method. arXiv preprint physics/0004057.

[shani2023towards] Shani, Chen, Vreeken, Jilles, Shahaf, Dafna. (2023). Towards Concept-Aware Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2023.

[sajjad2022analyzing] Sajjad, Hassan, Durrani, Nadir, Dalvi, Fahim, Alam, Firoj, Khan, Abdul, Xu, Jia. (2022). Analyzing Encoded Concepts in Transformer Language Models. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

[tversky1977features] Tversky, Amos. (1977). Features of similarity.. Psychological review.

[singh2024rethinking] Singh, Chandan, Inala, Jeevana Priya, Galley, Michel, Caruana, Rich, Gao, Jianfeng. (2024). Rethinking interpretability in the era of large language models. arXiv preprint arXiv:2402.01761.

[li2024geometry] Li, Yuxiao, Michaud, Eric J, Baek, David D, Engels, Joshua, Sun, Xiaoqing, Tegmark, Max. (2024). The geometry of concepts: Sparse autoencoder feature structure. arXiv preprint arXiv:2410.19750.

[touvron2023llama] Touvron, Hugo, Lavril, Thibaut, Izacard, Gautier, Martinet, Xavier, Lachaux, Marie-Anne, Lacroix, Timothée, others. (2023). Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.

[team2024gemma] Team, Gemma, Mesnard, Thomas, Hardin, Cassidy, Dadashi, Robert, Bhupatiraju, Surya, Pathak, Shreya, Sifre, Laurent, Rivière, Morgane, others. (2024). Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295.

[devlin2018bert] Devlin, Jacob. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[yang2024qwen2] Yang, An, Yang, Baosong, Zhang, Beichen, Hui, Binyuan, Zheng, Bo, Yu, Bowen, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Wei, Haoran, others. (2024). Qwen2.5 technical report. arXiv preprint arXiv:2412.15115.

[jiang2023mistral] Jiang, Albert Q, Sablayrolles, Alexandre, Mensch, Arthur, Bamford, Chris, Chaplot, Devendra Singh, Casas, Diego de las, Bressand, Florian, Lengyel, Gianna, Lample, Guillaume, Saulnier, Lucile, others. (2023). Mistral 7B. arXiv preprint arXiv:2310.06825.

[shahapure2020cluster] Shahapure, Ketan Rajshekhar, Nicholas, Charles. (2020). Cluster quality analysis using silhouette score. 2020 IEEE 7th international conference on data science and advanced analytics (DSAA).

[grattafiori2024llama] Grattafiori, Aaron, Dubey, Abhimanyu, Jauhri, Abhinav, Pandey, Abhinav, Kadian, Abhishek, Al-Dahle, Ahmad, Letman, Aiesha, Mathur, Akhil, Schelten, Alan, Vaughan, Alex, others. (2024). The llama 3 herd of models. arXiv preprint arXiv:2407.21783.

[touvron2023llama2] Touvron, Hugo, Martin, Louis, Stone, Kevin, Albert, Peter, Almahairi, Amjad, Babaei, Yasmine, Bashlykov, Nikolay, Batra, Soumya, Bhargava, Prajjwal, Bhosale, Shruti, others. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

[team2025gemma] Team, Gemma, Kamath, Aishwarya, Ferret, Johan, Pathak, Shreya, Vieillard, Nino, Merhej, Ramona, Perrin, Sarah, Matejovicova, Tatiana, Ramé, Alexandre, others. (2025). Gemma 3 technical report. arXiv preprint arXiv:2503.19786.

[karamcheti2021mistral] Karamcheti, Siddharth, Orr, Laurel, Bolton, Jason, Zhang, Tianyi, Goel, Karan, Narayan, Avanika, Bommasani, Rishi, Narayanan, Deepak, Hashimoto, Tatsunori, Jurafsky, Dan, others. (2021). Mistral--a journey towards reproducible language model training.

[bai2023qwen] Bai, Jinze, Bai, Shuai, Chu, Yunfei, Cui, Zeyu, Dang, Kai, Deng, Xiaodong, Fan, Yang, Ge, Wenbin, Han, Yu, Huang, Fei, others. (2023). Qwen technical report. arXiv preprint arXiv:2309.16609.

[abdin2024phi] Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, others. (2024). Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219.

[abouelenin2025phi] Abouelenin, Abdelrahman, Ashfaq, Atabak, Atkinson, Adam, Awadalla, Hany, Bach, Nguyen, Bao, Jianmin, Benhaim, Alon, Cai, Martin, Chaudhary, Vishrav, Chen, Congcong, others. (2025). Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of-loras. arXiv preprint arXiv:2503.01743.

[javaheripi2023phi] Javaheripi, Mojan, Bubeck, Sébastien, others. (2023). Phi-2: The surprising power of small language models. Microsoft Research Blog.

[radford2021learning] Radford, Alec, Kim, Jong Wook, Hallacy, Chris, Ramesh, Aditya, Goh, Gabriel, Agarwal, Sandhini, Sastry, Girish, Askell, Amanda, Mishkin, Pamela, Clark, Jack, others. (2021). Learning transferable visual models from natural language supervision. International conference on machine learning.

[radford2018improving] Radford, Alec, Narasimhan, Karthik, Salimans, Tim, Sutskever, Ilya, others. (2018). Improving language understanding by generative pre-training.

[radford2019language] Radford, Alec, Wu, Jeffrey, Child, Rewon, Luan, David, Amodei, Dario, Sutskever, Ilya, others. (2019). Language models are unsupervised multitask learners. OpenAI blog.

[brown2020language] Brown, Tom, Mann, Benjamin, Ryder, Nick, Subbiah, Melanie, Kaplan, Jared D, Dhariwal, Prafulla, Neelakantan, Arvind, Shyam, Pranav, Sastry, Girish, Askell, Amanda, others. (2020). Language models are few-shot learners. Advances in neural information processing systems.

[devlin-etal-2019-bert] Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton, Toutanova, Kristina. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. doi:10.18653/v1/N19-1423.

[he2020deberta] He, Pengcheng, Liu, Xiaodong, Gao, Jianfeng, Chen, Weizhu. (2020). Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654.

[zhuang-etal-2021-robustly] Zhuang, Liu, Wayne, Lin, Ya, Shi, Jun, Zhao. (2021). A Robustly Optimized BERT Pre-training Approach with Post-training. Proceedings of the 20th Chinese National Conference on Computational Linguistics.

[JMLR:v21:20-074] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research.

[rosch1976structural] Rosch, Eleanor, Simpson, Carol, Miller, R Scott. (1976). Structural bases of typicality effects.. Journal of Experimental Psychology: Human perception and performance.

[hoang2024llm] Hoang-Xuan, Nhat, Vu, Minh, Thai, My T. (2024). LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions. arXiv preprint arXiv:2406.08572.

[maeda2024decomposing] Maeda, Akihiro, Torii, Takuma, Hidaka, Shohei. (2024). Decomposing Co-occurrence Matrices into Interpretable Components as Formal Concepts. Findings of the Association for Computational Linguistics ACL 2024.

[barrault2024large] Barrault, Loïc, Duquenne, Paul-Ambroise, Elbayad, Maha, others. (2024). Large Concept Models: Language Modeling in a Sentence Representation Space. arXiv preprint arXiv:2412.08821.

[cunningham2023sparse] Cunningham, Hoagy, Ewart, Aidan, Riggs, Logan, Huben, Robert, Sharkey, Lee. (2023). Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600.

[park2024geometry] Park, Kiho, Choe, Yo Joong, Jiang, Yibo, Veitch, Victor. (2024). The geometry of categorical and hierarchical concepts in large language models. arXiv preprint arXiv:2406.01506.

[imel2024optimal] Imel, Nathaniel, Zaslavsky, Noga. (2024). Optimal compression in human concept learning. Proceedings of the Annual Meeting of the Cognitive Science Society.

[tucker2025towards] Tucker, Mycal, Shah, Julie, Levy, Roger, Zaslavsky, Noga. (2025). Towards Human-Like Emergent Communication via Utility, Informativeness, and Complexity. Open Mind.

[wolff2019information] Wolff, J Gerard. (2019). Information compression as a unifying principle in human learning, perception, and cognition. Complexity.

[littlestone1986relating] Littlestone, Nick, Warmuth, Manfred. (1986). Relating data compression and learnability.

[wissler1905spearman] Wissler, Clark. (1905). The Spearman correlation formula. Science.

[sorscher2022neural] Sorscher, Ben, Ganguli, Surya, Sompolinsky, Haim. (2022). Neural representational geometry underlies few-shot concept learning. Proceedings of the National Academy of Sciences.

[misra2021language] Misra, Kanishka, Ettinger, Allyson, Rayz, Julia Taylor. (2021). Do language models learn typicality judgments from text?. arXiv preprint arXiv:2105.02987.

[pennington-etal-2014-glove] Pennington, Jeffrey, Socher, Richard, Manning, Christopher. (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi:10.3115/v1/D14-1162.

[mikolov-etal-2013-efficient] Mikolov, Tomas, Chen, Kai, Corrado, Greg, Dean, Jeffrey. (2013). Efficient Estimation of Word Representations in Vector Space. Proceedings of Workshop at International Conference on Learning Representations (ICLR).

[mikolov-etal-2013-distributed] Mikolov, Tomas, Sutskever, Ilya, Chen, Kai, Corrado, Greg S., Dean, Jeffrey. (2013). Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems.

[olmo2024] Groeneveld, Dirk, Merrill, William, Soldaini, Luca, Lin, Bill Yuchen, Lee, Madeleine, Vashishtha, Siddharth, Wadden, David, Schoelkopf, Hailey, Tafjord, Oyvind, West, Peter, Lambert, Nathan, Ernst, Jonathan, Kahn, Jeffry, Khanna, Raj, Ma, Kaixin, Manakul, Potsawee, Nye, Maxwell, Richardson, Kyle, Schwenk, Dustin, Shen, Zewei, Shen, Zhengxuan, Song, Xinyi, Wadden, Mark, Zettlemoyer, Luke, Peters, Matthew E., Smith, Noah A.. (2024). OLMo: Accelerating the Science of Language Models.

[raffel2020exploring] Raffel, Colin, Shazeer, Noam, Roberts, Adam, Lee, Katherine, Narang, Sharan, Matena, Michael, Zhou, Yanqi, Li, Wei, Liu, Peter J.. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research.

[zaslavsky2018efficient] Zaslavsky, Noga, Kemp, Charles, Regier, Terry, Tishby, Naftali. (2018). Efficient compression in color naming and its evolution. Proceedings of the National Academy of Sciences.

[zaslavsky2019semantic] Zaslavsky, Noga, Regier, Terry, Tishby, Naftali, Kemp, Charles. (2019). Semantic categories of artifacts and animals reflect efficient coding. arXiv preprint arXiv:1905.04562.

[tucker2022trading] Tucker, Mycal, Levy, Roger, Shah, Julie A, Zaslavsky, Noga. (2022). Trading off utility, informativeness, and complexity in emergent communication. Advances in neural information processing systems.

[wu2025building] Wu, S, Thalmann, M, Dayan, P, Akata, Z, Schulz, E. (2025). Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences. Thirteenth International Conference on Learning Representations (ICLR 2025).

[guo2025deepseek] DeepSeek-AI. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.

[bib1] Abdin et al. [2024] Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024.

[bib2] Abouelenin et al. [2025] Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, et al. Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of-loras. arXiv preprint arXiv:2503.01743, 2025.

[bib3] Bai et al. [2023] Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023.

[bib4] Barrault et al. [2024] Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, Mariano Coria, Guillaume Couairon, Marta R Costa-jussà, David Dale, et al. Large concept models: Language modeling in a sentence representation space. arXiv preprint arXiv:2412.08821, 2024.

[bib5] William Croft. Radical construction grammar: Syntactic theory in typological perspective. Oxford University Press, USA, 2001.

[bib6] Devlin et al. [2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423. URL https://aclanthology.org/N19-1423/.

[bib7] Giraldo et al. [2014] Luis Gonzalo Sanchez Giraldo, Murali Rao, and Jose C Principe. Measures of entropy from data using infinitely divisible kernels. IEEE Transactions on Information Theory, 61(1):535–548, 2014.

[bib8] Grattafiori et al. [2024] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

[bib9] He et al. [2020] Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654, 2020.

[bib10] Hoang-Xuan et al. [2024] Nhat Hoang-Xuan, Minh Vu, and My T Thai. Llm-assisted concept discovery: Automatically identifying and explaining neuron functions. arXiv preprint arXiv:2406.08572, 2024.

[bib11] Nathaniel Imel and Noga Zaslavsky. Optimal compression in human concept learning. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 46, 2024.

[bib12] Mojan Javaheripi, Sébastien Bubeck, Marah Abdin, Jyoti Aneja, Caio César Teodoro Mendes, Weizhu Chen, Allie Del Giorno, Ronen Eldan, Sivakanth Gopi, et al. Phi-2: The surprising power of small language models. Microsoft Research Blog, 1(3):3, 2023.

[bib13] Siddharth Karamcheti, Laurel Orr, Jason Bolton, Tianyi Zhang, Karan Goel, Avanika Narayan, Rishi Bommasani, Deepak Narayanan, Tatsunori Hashimoto, Dan Jurafsky, et al. Mistral–a journey towards reproducible language model training, 2021.

[bib14] Yuxiao Li, Eric J Michaud, David D Baek, Joshua Engels, Xiaoqing Sun, and Max Tegmark. The geometry of concepts: Sparse autoencoder feature structure. arXiv preprint arXiv:2410.19750, 2024.

[bib15] Akihiro Maeda, Takuma Torii, and Shohei Hidaka. Decomposing co-occurrence matrices into interpretable components as formal concepts. In Findings of the Association for Computational Linguistics ACL 2024, pages 4683–4700, 2024.

[bib16] Michael E McCloskey and Sam Glucksberg. Natural categories: Well defined or fuzzy sets? Memory & Cognition, 6(4):462–472, 1978.

[bib17] Gregory Murphy. The big book of concepts. MIT Press, 2004.

[bib18] Kiho Park, Yo Joong Choe, Yibo Jiang, and Victor Veitch. The geometry of categorical and hierarchical concepts in large language models. arXiv preprint arXiv:2406.01506, 2024.

[bib19] Eleanor Rosch. On the internal structure of perceptual and semantic categories. In Cognitive development and the acquisition of language. Academic Press, New York, 1973a.

[bib20] Eleanor Rosch. Prototype theory. Cognitive development and the acquisition of language, pages 111–144, 1973b.

[bib21] Eleanor Rosch. Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104(3):192, 1975.

[bib22] Eleanor Rosch, Carol Simpson, and R Scott Miller. Structural bases of typicality effects. Journal of Experimental Psychology: Human Perception and Performance, 2(4):491, 1976.

[bib23] Eleanor H Rosch. Natural categories. Cognitive Psychology, 4(3):328–350, 1973c.

[bib24] Chen Shani, Jilles Vreeken, and Dafna Shahaf. Towards concept-aware large language models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13158–13170, 2023.

[bib25] Claude Elwood Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948.

[bib26] Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, and Jianfeng Gao. Rethinking interpretability in the era of large language models. arXiv preprint arXiv:2402.01761, 2024.

[bib27] Ben Sorscher, Surya Ganguli, and Haim Sompolinsky. Neural representational geometry underlies few-shot concept learning. Proceedings of the National Academy of Sciences, 119(43):e2200800119, 2022.

[bib28] Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295, 2024.

[bib29] Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, et al. Gemma 3 technical report. arXiv preprint arXiv:2503.19786, 2025.

[bib30] Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000.

[bib31] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023a.

[bib32] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023b.

[bib33] Mycal Tucker, Julie Shah, Roger Levy, and Noga Zaslavsky. Towards human-like emergent communication via utility, informativeness, and complexity. Open Mind, 9:418–451, 2025.

[bib34] Amos Tversky. Features of similarity. Psychological Review, 84(4):327, 1977.

[bib35] Lan Wei, Dong Wang, and Yu Wang. Generalized relative entropy: New look at rényi entropy and its exploration from complexity measures to sparsity measures with applications in machine condition monitoring. Mechanical Systems and Signal Processing, 223:111917, 2025.

[bib36] Clark Wissler. The spearman correlation formula. Science, 22(558):309–311, 1905.

[bib37] J Gerard Wolff. Information compression as a unifying principle in human learning, perception, and cognition. Complexity, 2019(1):1879746, 2019.

[bib38] An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115, 2024.

[bib39] Liu Zhuang, Lin Wayne, Shi Ya, and Zhao Jun. A robustly optimized BERT pre-training approach with post-training. In Sheng Li, Maosong Sun, Yang Liu, Hua Wu, Kang Liu, Wanxiang Che, Shizhu He, and Gaoqi Rao, editors, Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 1218–1227, Huhhot, China, August 2021. Chinese Information Processing Society of China. URL https://aclanthology.org/2021.ccl-1.108/.