Molbap (HF Staff) committed
Commit f47d4d7 · 1 Parent(s): fc6c5e6
app/public/images/transformers/classic_encoders.png ADDED

Git LFS Details

  • SHA256: fd9a7c4300b8fcfdc8fe0aebe6f84f0131efcab8c8928783388e6c54148c4a68
  • Pointer size: 131 Bytes
  • Size of remote file: 532 kB
app/src/content/article.mdx CHANGED
@@ -68,7 +68,7 @@ These principles were not decided in a vacuum. The library _evolved_ towards the
68
  <li class="tenet">
69
  <a id="source-of-truth"></a>
70
  <strong>Source of Truth</strong>
71
- <p>We aim to be a [source of truth for all model definitions](https://huggingface.co/blog/transformers-model-definition). This is more of a goal than a tenet, but it strongly guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original implementations. If we are successful, they should become reference baselines for the ecosystem, so they'll be easily adopted by downstream libraries and projects. It's much easier for a project to _always_ refer to the transformers implementation, than to learn a different research codebase every time a new architecture is released.</p>
72
  <em>This overarching guideline ensures quality and reproducibility across all models in the library, and aspires to make the community work easier.</em>
73
  </li>
74
 
@@ -81,20 +81,20 @@ These principles were not decided in a vacuum. The library _evolved_ towards the
81
  <li class="tenet">
82
  <a id="code-is-product"></a>
83
  <strong>Code is Product</strong>
84
- <p>Optimize for reading, diffing, and tweaking, our users are power users. Variables can be explicit, full words, even several words, readability is primordial.</p>
85
  <em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
86
  </li>
87
  <li class="tenet">
88
  <a id="standardize-dont-abstract"></a>
89
  <strong>Standardize, Don't Abstract</strong>
90
- <p>If it's model behavior, keep it in the file; abstractions only for generic infra.</p>
91
  <em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
92
  </li>
93
  <li class="tenet">
94
  <a id="do-repeat-yourself"></a>
95
  <strong>DRY* (DO Repeat Yourself)</strong>
96
  <p>Copy when it helps users; keep successors in sync without centralizing behavior.</p>
97
- <p><strong>Amendment:</strong> With the introduction and global adoption of <a href="#modular">modular</a> transformers, we do not repeat any logic in the modular files, but end user files remain faithful to the original tenet.</p>
98
  <em>Strategic duplication can improve readability and maintainability when done thoughtfully.</em>
99
  </li>
100
  <li class="tenet">
@@ -160,7 +160,7 @@ Transformers is an opinionated library. The previous [philosophy](https://huggin
160
 
161
  We amended the principle of [DRY*](#do-repeat-yourself) by progressively removing all pieces of code that were "copied from" another file.
162
 
163
- It works as follows. In order to contribute a model, let us take GLM for instance, we define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_ already existing in the libary.
164
The modular file can use inheritance across models; it is then unravelled into a fully functional modeling file.
165
 
166
  <summary id="generated-modeling">Auto-generated modeling code</summary>
@@ -216,7 +216,7 @@ The _attention computation_ itself happens at a _lower_ level of abstraction tha
216
However, we were adding specific torch operations for each backend (sdpa, the several flash-attention iterations, flex attention), but it wasn't a [minimal user api](#minimal-user-api). The next section explains what we did.
217
 
218
  <div class="crumbs">
219
- Evidence: effective (i.e., maintenable) LOC growth drops ~15× when counting shards instead of expanded modeling files. Less code to read, fewer places to break.
220
 
221
  <strong>Next:</strong> how the attention interface stays standard without hiding semantics.
222
  </div>
@@ -236,8 +236,8 @@ attention_interface: Callable = eager_attention_forward
236
  if self.config._attn_implementation != "eager":
237
  attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
238
  ```
239
-
240
- A strength of the new attention interface is the possibility to enforce specific kwargs, which are needed by kernel providers and other dependencies.
241
 
242
  Backend integrations sometimes require specific kwargs.
243
 
@@ -365,23 +365,20 @@ So what do we see?
365
  Check out the [full viewer here](https://huggingface.co/spaces/Molbap/transformers-modular-refactor) (tab "dependency graph", hit "build graph") for better manipulation and exploration.
366
  <HtmlEmbed src="transformers/dependency-graph.html" />
367
 
368
- Le'ts walk through some sections of this graph together.
369
-
370
- Llama is a basis and an influence for many models, and it shows.
371
 
372
  ![Llama in the center](/images/transformers/llama_center.png)
373
 
374
- Radically different architectures such as mamba have spawned their own dependency subgraph.
375
 
376
- Audio models form sparser archipelagos, see for instance wav2vec2 which is a significant basis.
377
 
378
  ![Wav2vec2 influence](/images/transformers/cluster_wave2vec2.png)
379
 
380
- In the case of VLMs, there's far too many vision-based architectures that are not yet defined as modulars of other existing archs. In other words, there is no strong reference point in terms of software for vision models.
381
- )
382
 
383
-
384
- As you can see, there is a small DETR island:
385
  ![DETR archipelago](/images/transformers/detr_island.png)
386
 
387
  There is also a little llava pocket, and so on, but it's not comparable to the centrality observed for llama.
@@ -402,7 +399,7 @@ Llama-lineage is a hub; several VLMs remain islands — engineering opportunity
402
 
403
I looked into Jaccard similarity, which we use to measure set differences, to find similarities across models. I know that code is more than a set of characters strung together. We also tried code-embedding models that ranked candidates better in practice, but for this post we stick to the deterministic Jaccard index.
404
 
405
- It is interesting, for our comparison, to look at _when_ we deployed the modular logic and what was its rippling effect on the library. You can check the [larger space](https://huggingface.co/spaces/Molbap/transformers-modular-refactor) to play around, but the gist is: adding modular allowed to connect more and more models to solid reference points.
406
 
407
  Yet, we still have a lot of gaps to fill.
408
 
@@ -412,13 +409,21 @@ Zoom out below - it's full of models. You can click on a node to see its connect
412
 
413
Let's look at a few highly connected models, starting with the foundational work of [Llava](https://arxiv.org/abs/2304.08485).
414
 
415
- ![DETR archipelago](/images/transformers/timeline_llava.png)
416
 
417
 
418
  You see that `llava_video` is a red node, connected by a red edge to `llava`: it's a candidate, something that we can _likely_ remodularize, [not touching the actual model](#backwards-compatibility) but being much more readable with [DRY*](#do-repeat-yourself).
419
 
 
 
 
 
 
 
420
  <div class="crumbs">
421
- Similarity metrics (Jaccard index or embeddings) surfaces likely parents; the timeline shows consolidation after modular landed. Red nodes/edges = candidates (e.g., <code>llava_video</code> → <code>llava</code>) for refactors that preserve behavior. <strong>Next:</strong> concrete VLM choices that avoid leaky abstractions.
 
 
422
  </div>
423
 
424
  ### VLM improvements, avoiding abstraction
@@ -489,6 +494,8 @@ The following [Pull request to standardize placeholder masking](https://github.c
489
 
490
  But this is _within_ the modeling file, not in the `PreTrainedModel` base class. It will not move away from it, because it'd break the [self-contained logic](#one-model-one-file) of the model.
491
 
 
 
492
  <div class="crumbs">
493
  Keep VLM embedding mix in the modeling file (semantics), standardize safe helpers (e.g., placeholder masking), don't migrate behavior to <code>PreTrainedModel</code>.
494
  <strong>Next:</strong> pipeline-level wins that came from PyTorch-first choices (fast processors).
@@ -497,9 +504,9 @@ Keep VLM embedding mix in the modeling file (semantics), standardize safe helper
497
 
498
  ### On image processing and processors
499
 
500
- Deciding to become a `torch`-first library meant relieving a tremendous amount of support for `jax ` and `TensorFlow`, and it also meant that we could be more lenient into the amount of torch-dependent utilities that we were able to accept. One of these is the _fast processing_ of images. Where inputs were once minimally assumed to be ndarrays, enforcing native `torch` and `torchvision` inputs allowed us to massively improve processing speed for each model.
501
 
502
- The gains in performance are immense, up to 20x speedup for most models when using compiled torchvision ops. Furthermore, it allows to run the whole pipeline solely on GPU.
503
 
504
  ![Fast Image Processors Performance](/images/transformers/fast_image_processors.png)
505
  <p class="figure-legend">Thanks <a href="https://huggingface.co/yonigozlan">Yoni Gozlan</a> for the great work!</p>
@@ -519,19 +526,21 @@ Having a framework means forcing users into it. It restrains flexibility and cre
519
 
520
  Among the most valuable contributions to `transformers` is of course the addition of new models. Very recently, [OpenAI added GPT-OSS](https://huggingface.co/blog/welcome-openai-gpt-oss), which prompted the addition of many new features to the library in order to support [their model](https://huggingface.co/openai/gpt-oss-120b).
521
 
522
- A second one is the ability to fine-tune and pipeline these models into many other software. Check here on the hub how many finetunes are registered for [gpt-oss 120b](https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b), despite its size!
 
 
523
 
524
 
525
  <div class="crumbs">
526
  The shape of a contribution: add a model (or variant) with a small modular shard; the community and serving stacks pick it up immediately. Popularity trends (encoders/embeddings) guide where we invest.
 
527
  <strong>Next:</strong> power tools enabled by a consistent API.
528
  </div>
529
 
530
 
531
  ### <a id="encoders-ftw"></a> Models popularity
532
 
533
- Talking about dependencies, we can take a look at the number of downloads as a measure of popularity. One thing we see is the prominence of encoders, despite the apparent prevalence of decoder LLMs. The reason is that encoders are used to generate embeddings, which have multiple downstream uses. Just check out [EmbeddingGemma](https://huggingface.co/blog/embeddinggemma) for a modern recap. Hence, it is vital to keep the encoders portion of the library viable, usable, fine-tune-able.
534
-
535
 
536
  <div>
537
  <HtmlEmbed src="transformers/model-visualisation.html" />
@@ -552,6 +561,8 @@ Encoders remain critical for embeddings and retrieval; maintaining them well ben
552
 
553
  ## A surgical toolbox for model development
554
 
 
 
555
  ### Attention visualisation
556
 
557
  All models have the same API for attention computation, thanks to [the externalisation of attention classes](#external-attention-classes).
@@ -579,7 +590,9 @@ It just works with PyTorch models and is especially useful when aligning outputs
579
 
580
 
581
  <div class="crumbs">
582
- Forward interception and nested JSON logging align ports to reference implementations, reinforcing "Source of Truth." <strong>Next:</strong> CUDA warmup reduces load-time without touching modeling semantics.
 
 
583
  </div>
584
 
585
 
@@ -613,7 +626,7 @@ curl -X POST http://localhost:8000/v1/chat/completions \
613
  ```
614
 
615
 
616
- `transformers-serve` uses continuous batching (see [this PR](https://github.com/huggingface/transformers/pull/38085) and also [this one](https://github.com/huggingface/transformers/pull/40426)) for better GPU utilization, and is very much linked to the great work of vLLM with the `paged attention kernel` – a futher justification of [external kernels](#community-kernels).
617
 
618
`transformers-serve` is not meant for user-facing production services (tools like vLLM or SGLang are highly optimized for that), but it's useful for several use cases:
619
  - Quickly verify that your model is compatible with continuous batching and paged attention.
@@ -624,6 +637,7 @@ For model deployment, check [Inference Providers](https://huggingface.co/docs/in
624
 
625
  <div class="crumbs">
626
  OpenAI-compatible surface + continuous batching; kernels/backends slot in because the modeling API stayed stable.
 
627
  <strong>Next:</strong> reuse across vLLM/SGLang relies on the same consistency.
628
  </div>
629
 
@@ -635,13 +649,16 @@ The transformers-serve CLI built on transformers, for sure, but the library is m
635
  Adding a model to transformers means:
636
 
637
  - having it immediately available to the community
638
- - having it immediately usable in vLLM, [SGLang](https://huggingface.co/blog/transformers-backend-sglang), and so on without additional code. In April 2025, transformers was added as a backend to run models on vLLM, which optimizes throughput/latency on top of existing transformers architectures [as seen in this great vLLM x HF blog post.](https://blog.vllm.ai/2025/04/11/transformers-backend.html)
 
639
 
640
- This cements the need even more for a [consistent public surface](#consistent-public-surface): we are now a backend, and there's more optimized software than us to handle serving. At the time of writing, more effort is done in that direction. We already have compatible configs for VLMs for vLLM (say that three times fast), [here for GLM4 video support](https://github.com/huggingface/transformers/pull/40696/files), and here for [MoE support](https://github.com/huggingface/transformers/pull/40132) for instance.
 
641
 
642
 
643
  <div class="crumbs">
644
  Being a good backend consumer requires a consistent public surface; modular shards and configs make that stability practical.
 
645
  <strong>Next:</strong> what changes in v5 without breaking the promise of visible semantics.
646
  </div>
647
 
 
68
  <li class="tenet">
69
  <a id="source-of-truth"></a>
70
  <strong>Source of Truth</strong>
71
+ <p>We aim to be a [source of truth for all model definitions](https://huggingface.co/blog/transformers-model-definition). This is more of a goal than a tenet, but it strongly guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original implementations. If we are successful, they should become reference baselines for the ecosystem, so they'll be easily adopted by downstream libraries and projects. It's much easier for a project to always refer to the transformers implementation than to learn a different research codebase every time a new architecture is released.</p>
72
  <em>This overarching guideline ensures quality and reproducibility across all models in the library, and aspires to make the community work easier.</em>
73
  </li>
74
 
 
81
  <li class="tenet">
82
  <a id="code-is-product"></a>
83
  <strong>Code is Product</strong>
84
+ <p>Optimize for reading, diffing, and tweaking; our users are power users. Variables should be explicit, full words, even several words: readability is paramount.</p>
85
  <em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
86
  </li>
87
  <li class="tenet">
88
  <a id="standardize-dont-abstract"></a>
89
  <strong>Standardize, Don't Abstract</strong>
90
+ <p>If it's model behavior, keep it in the file; use abstractions only for generic infra.</p>
91
  <em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
92
  </li>
93
  <li class="tenet">
94
  <a id="do-repeat-yourself"></a>
95
  <strong>DRY* (DO Repeat Yourself)</strong>
96
  <p>Copy when it helps users; keep successors in sync without centralizing behavior.</p>
97
+ <p><strong>Evolution:</strong> With the introduction and global adoption of <a href="#modular">modular</a> transformers, we do not repeat any logic in the modular files, but end user files remain faithful to the original tenet.</p>
98
  <em>Strategic duplication can improve readability and maintainability when done thoughtfully.</em>
99
  </li>
100
  <li class="tenet">
 
160
 
161
  We amended the principle of [DRY*](#do-repeat-yourself) by progressively removing all pieces of code that were "copied from" another file.
162
 
163
+ It works as follows: to contribute a model, `GLM` for instance, we define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_ already existing in the library.
164
The modular file can use inheritance across models; it is then unravelled into a fully functional modeling file.
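To make this concrete, here is a hedged sketch of what such a modular shard could look like; the class names and the choice of Llama as the parent are illustrative, not the actual GLM shard in the repository.

```python
# modular_glm.py -- illustrative sketch, not the real shard.
# Model-specific classes inherit from an existing reference implementation;
# the modular converter then expands them into a self-contained modeling_glm.py.
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaDecoderLayer,
    LlamaForCausalLM,
)


class GlmAttention(LlamaAttention):
    # Only the deltas with respect to the parent would be written here.
    pass


class GlmDecoderLayer(LlamaDecoderLayer):
    pass


class GlmForCausalLM(LlamaForCausalLM):
    pass
```

The expansion step is what keeps [one model, one file](#one-model-one-file) intact for readers of the generated modeling code.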
165
 
166
  <summary id="generated-modeling">Auto-generated modeling code</summary>
 
216
However, we were adding specific torch operations for each backend (sdpa, the several flash-attention iterations, flex attention), but it wasn't a [minimal user api](#minimal-user-api). The next section explains what we did.
217
 
218
  <div class="crumbs">
219
+ Evidence: effective (i.e., maintainable) LOC growth drops ~15× when counting shards instead of expanded modeling files. Less code to read, fewer places to break.
220
 
221
  <strong>Next:</strong> how the attention interface stays standard without hiding semantics.
222
  </div>
 
236
  if self.config._attn_implementation != "eager":
237
  attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
238
  ```
239
+ Having the attention interfaces functionalized also allows dynamic switching of attention implementations, increasing their [hackability](#code-is-product).
240
+ Another strength of the new attention interface is that it can enforce specific kwargs, which kernel providers and other dependencies need.
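As a hedged sketch of what that switching looks like from user land (the registration API and the import path of the stock SDPA forward reflect recent versions of the library, and the checkpoint name is just an example):

```python
import torch
from transformers import AttentionInterface, AutoModelForCausalLM
# Stock SDPA implementation to wrap; path may differ across versions.
from transformers.integrations.sdpa_attention import sdpa_attention_forward


def logged_sdpa(module, query, key, value, attention_mask, **kwargs):
    # Extra kwargs (sliding windows, kernel-specific flags, ...) flow through untouched.
    print(f"{module.__class__.__name__}: q={tuple(query.shape)}")
    return sdpa_attention_forward(module, query, key, value, attention_mask, **kwargs)


# Register under a new name, then select it exactly like "eager" or "sdpa".
AttentionInterface.register("logged_sdpa", logged_sdpa)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", attn_implementation="logged_sdpa", torch_dtype=torch.bfloat16
)
```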
241
 
242
  Backend integrations sometimes require specific kwargs.
243
 
 
365
  Check out the [full viewer here](https://huggingface.co/spaces/Molbap/transformers-modular-refactor) (tab "dependency graph", hit "build graph") for better manipulation and exploration.
366
  <HtmlEmbed src="transformers/dependency-graph.html" />
367
 
368
+ Let's walk through some sections of this graph together.
369
+ First, Llama is a basis and an influence for many models, and it shows in the graph.
 
370
 
371
  ![Llama in the center](/images/transformers/llama_center.png)
372
 
373
+ The linked models sometimes pull components from models other than `llama`, of course. Radically different architectures such as mamba have spawned their own dependency subgraphs.
374
 
375
+ Audio models form sparser archipelagos; see for instance wav2vec2, which is a significant basis for a dozen of them.
376
 
377
  ![Wav2vec2 influence](/images/transformers/cluster_wave2vec2.png)
378
 
379
+ In the case of VLMs, which have massively grown in popularity since 2024, there are far too many vision-based architectures that are not yet defined as modulars of other existing architectures. In other words, there is no strong software reference point for vision models.
 
380
 
381
+ As you can see, there is a small `DETR` island:
 
382
  ![DETR archipelago](/images/transformers/detr_island.png)
383
 
384
  There is also a little llava pocket, and so on, but it's not comparable to the centrality observed for llama.
 
399
 
400
I looked into Jaccard similarity, which we use to measure set differences, to find similarities across models. I know that code is more than a set of characters strung together. We also tried code-embedding models that ranked candidates better in practice, but for this post we stick to the deterministic Jaccard index.
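For reference, the Jaccard index over two modeling files boils down to a few lines; this is a simplified sketch over identifier sets, not the exact script behind the space:

```python
import re
from pathlib import Path


def identifiers(path: str) -> set[str]:
    # Crude tokenization of a Python file into its set of identifiers.
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", Path(path).read_text()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Intersection over union: 1.0 means identical vocabularies, 0.0 disjoint.
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Paths assume a local checkout of the transformers repository.
sim = jaccard(
    identifiers("src/transformers/models/llama/modeling_llama.py"),
    identifiers("src/transformers/models/glm/modeling_glm.py"),
)
print(f"Jaccard similarity: {sim:.2f}")
```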
401
 
402
+ It is interesting, for our comparison, to look at _when_ we deployed the modular logic and what its ripple effect on the library was. Looking at the timeline makes it obvious: adding modular allowed us to connect more and more models to solid reference points.
403
 
404
  Yet, we still have a lot of gaps to fill.
405
 
 
409
 
410
Let's look at a few highly connected models, starting with the foundational work of [Llava](https://arxiv.org/abs/2304.08485).
411
 
412
+ ![Llava in its timeline](/images/transformers/timeline_llava.png)
413
 
414
 
415
  You see that `llava_video` is a red node, connected by a red edge to `llava`: it's a candidate, something that we can _likely_ remodularize, [not touching the actual model](#backwards-compatibility) but being much more readable with [DRY*](#do-repeat-yourself).
416
 
417
+ The same pattern can be identified in the classical encoders family, centered on `BERT`:
418
+
419
+ Here `roberta`, `xlm_roberta`, `ernie` are `modular`s of BERT, while models like `mobilebert` are likely candidates.
420
+ ![Classical encoders](/images/transformers/classic_encoders.png)
421
+
422
+
423
  <div class="crumbs">
424
+ Similarity metrics (Jaccard index or embeddings) surface likely parents; the timeline shows consolidation after modular landed. Red nodes/edges = candidates (e.g., <code>llava_video</code> → <code>llava</code>) for refactors that preserve behavior.
425
+
426
+ <strong>Next:</strong> concrete VLM choices that avoid leaky abstractions.
427
  </div>
428
 
429
  ### VLM improvements, avoiding abstraction
 
494
 
495
  But this is _within_ the modeling file, not in the `PreTrainedModel` base class. It will not move away from it, because it'd break the [self-contained logic](#one-model-one-file) of the model.
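For readers who have not seen it, the embedding mix itself is only a few tensor operations; a minimal sketch of the pattern, with generic names rather than any specific model's code:

```python
import torch


def merge_image_features(
    inputs_embeds: torch.Tensor,   # (batch, seq_len, hidden)
    image_features: torch.Tensor,  # (total_image_tokens, hidden)
    input_ids: torch.Tensor,       # (batch, seq_len)
    image_token_id: int,
) -> torch.Tensor:
    # Mark the placeholder positions, then scatter the projected vision features
    # into those slots. This lives in the modeling file, in plain sight.
    special_image_mask = (input_ids == image_token_id).unsqueeze(-1)
    special_image_mask = special_image_mask.expand_as(inputs_embeds)
    image_features = image_features.to(inputs_embeds.dtype)
    return inputs_embeds.masked_scatter(special_image_mask, image_features)
```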
496
 
497
+ What do we conclude? Going forward, we should aim for VLMs to have a form of centrality similar to that of `Llama` for text-only models. This centrality should not be achieved at the cost of abstracting and hiding away crucial inner workings of said models.
498
+
499
  <div class="crumbs">
500
  Keep VLM embedding mix in the modeling file (semantics), standardize safe helpers (e.g., placeholder masking), don't migrate behavior to <code>PreTrainedModel</code>.
501
  <strong>Next:</strong> pipeline-level wins that came from PyTorch-first choices (fast processors).
 
504
 
505
  ### On image processing and processors
506
 
507
+ Deciding to become a `torch`-first library meant shedding a tremendous amount of support code for `jax` and `TensorFlow`, and it also meant that we could be more lenient about the amount of torch-dependent utilities we were able to accept. One of these is the _fast processing_ of images. Where inputs were once minimally assumed to be ndarrays, enforcing native `torch` and `torchvision` inputs allowed us to massively improve processing speed for each model.
508
 
509
+ The performance gains are immense: up to a 20x speedup for most models when using compiled torchvision ops. Furthermore, it lets us run the whole pipeline solely on the GPU.
510
 
511
  ![Fast Image Processors Performance](/images/transformers/fast_image_processors.png)
512
  <p class="figure-legend">Thanks <a href="https://huggingface.co/yonigozlan">Yoni Gozlan</a> for the great work!</p>
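A usage sketch: the checkpoint is just an example, `use_fast=True` selects the torchvision-backed processor where one exists, and the `device` argument is what keeps preprocessing on the GPU (check your version's signature):

```python
from PIL import Image
from transformers import AutoImageProcessor

# use_fast=True picks the torchvision-backed "fast" processor when available.
processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)

image = Image.open("street.jpg")  # any local image
# Resize/rescale/normalize run as torch ops; device="cuda" keeps them on GPU.
inputs = processor(images=image, return_tensors="pt", device="cuda")
print(inputs["pixel_values"].shape, inputs["pixel_values"].device)
```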
 
526
 
527
  Among the most valuable contributions to `transformers` is of course the addition of new models. Very recently, [OpenAI added GPT-OSS](https://huggingface.co/blog/welcome-openai-gpt-oss), which prompted the addition of many new features to the library in order to support [their model](https://huggingface.co/openai/gpt-oss-120b).
528
 
529
+ These additions are immediately available for other models to use.
530
+
531
+ Another important advantage is the ability to fine-tune and pipeline these models into many other libraries and tools. Check on the Hub how many fine-tunes are registered for [gpt-oss 120b](https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b), despite its size!
532
 
533
 
534
  <div class="crumbs">
535
  The shape of a contribution: add a model (or variant) with a small modular shard; the community and serving stacks pick it up immediately. Popularity trends (encoders/embeddings) guide where we invest.
536
+
537
  <strong>Next:</strong> power tools enabled by a consistent API.
538
  </div>
539
 
540
 
541
  ### <a id="encoders-ftw"></a> Models popularity
542
 
543
+ Talking about dependencies, we can take a look at the number of downloads as a measure of popularity. One thing we see is the prominence of encoders, despite the apparent prevalence of decoder LLMs. The reason is that encoders are used to generate embeddings, which have multiple downstream uses. Just check out [EmbeddingGemma](https://huggingface.co/blog/embeddinggemma) for a modern recap. Hence, it is vital to keep the encoders portion of the library viable, usable, fine-tunable.
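A rough version of that ranking can be reproduced with `huggingface_hub`; a sketch, with the exact sort parameters depending on your version of the client:

```python
from huggingface_hub import list_models

# Ten most-downloaded transformers models; many of the top spots are encoders.
for model in list_models(library="transformers", sort="downloads", direction=-1, limit=10):
    print(model.id, model.downloads)
```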
 
544
 
545
  <div>
546
  <HtmlEmbed src="transformers/model-visualisation.html" />
 
561
 
562
  ## A surgical toolbox for model development
563
 
564
+ Transformers provides many tools that can help you add a new architecture and understand the inner workings of a model, as well as of the library itself.
565
+
566
  ### Attention visualisation
567
 
568
  All models have the same API for attention computation, thanks to [the externalisation of attention classes](#external-attention-classes).
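That shared API is what makes tooling like this possible in the first place: attention weights come out the same way for every architecture. A minimal sketch using the generic `output_attentions` path (not the dedicated visualiser), with `bert-base-uncased` as an arbitrary example:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Eager attention is required to materialize the attention weights.
model = AutoModel.from_pretrained("bert-base-uncased", attn_implementation="eager")

inputs = tokenizer("The attention API is the same everywhere.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# One (batch, num_heads, seq_len, seq_len) tensor per layer, for any model.
print(len(outputs.attentions), outputs.attentions[0].shape)
```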
 
590
 
591
 
592
  <div class="crumbs">
593
+ Forward interception and nested JSON logging align ports to reference implementations, reinforcing "Source of Truth."
594
+
595
+ <strong>Next:</strong> CUDA warmup reduces load-time without touching modeling semantics.
596
  </div>
597
 
598
 
 
626
  ```
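Since the surface is OpenAI-compatible, the stock `openai` client works against the same endpoint; a sketch, where the base URL matches the curl call above and the model name is whatever you served:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local transformers-serve endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative; use the model you launched
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```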
627
 
628
 
629
+ `transformers-serve` uses continuous batching (see [this PR](https://github.com/huggingface/transformers/pull/38085) and also [this one](https://github.com/huggingface/transformers/pull/40426)) for better GPU utilization, and is very much linked to the great work of vLLM with the `paged attention kernel` – a further justification of [external kernels](#community-kernels).
630
 
631
`transformers-serve` is not meant for user-facing production services (tools like vLLM or SGLang are highly optimized for that), but it's useful for several use cases:
632
  - Quickly verify that your model is compatible with continuous batching and paged attention.
 
637
 
638
  <div class="crumbs">
639
  OpenAI-compatible surface + continuous batching; kernels/backends slot in because the modeling API stayed stable.
640
+
641
  <strong>Next:</strong> reuse across vLLM/SGLang relies on the same consistency.
642
  </div>
643
 
 
649
  Adding a model to transformers means:
650
 
651
  - having it immediately available to the community
652
+ - having it immediately usable in vLLM, [SGLang](https://huggingface.co/blog/transformers-backend-sglang), and so on without additional code. In the case of vLLM, transformers was added as a backend: vLLM can run models directly from their transformers implementation, optimizing throughput/latency on top of _existing_ transformers architectures, [as seen in this great vLLM x HF blog post](https://blog.vllm.ai/2025/04/11/transformers-backend.html) (see the sketch after this list)
653
+ - being the reference code for implementations in MLX, llama.cpp and other libraries.
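On the vLLM side, opting into the transformers backend is a single argument; a sketch assuming a recent vLLM, with a placeholder model name:

```python
from vllm import LLM, SamplingParams

# model_impl="transformers" forces vLLM to run the transformers implementation.
llm = LLM(model="your-org/brand-new-model", model_impl="transformers")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```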
654
 
655
+
656
+ This further cements the need for a [consistent public surface](#consistent-public-surface): we are a backend and a reference, and there is software far more optimized than ours for serving. At the time of writing, more effort is going into that direction. We already have compatible configs for VLMs for vLLM (say that three times fast); check [here for GLM4 video support](https://github.com/huggingface/transformers/pull/40696/files), and here for [MoE support](https://github.com/huggingface/transformers/pull/40132), for instance.
657
 
658
 
659
  <div class="crumbs">
660
  Being a good backend consumer requires a consistent public surface; modular shards and configs make that stability practical.
661
+
662
  <strong>Next:</strong> what changes in v5 without breaking the promise of visible semantics.
663
  </div>
664
 
app/src/styles/components/_tenet.css CHANGED
@@ -5,7 +5,7 @@
5
  }
6
 
7
  .tenet-list ol {
8
- counter-reset: tenet-counter -1;
9
  list-style: none;
10
  padding-left: 0;
11
  display: grid;
 
5
  }
6
 
7
  .tenet-list ol {
8
+ counter-reset: tenet-counter 0;
9
  list-style: none;
10
  padding-left: 0;
11
  display: grid;