Level Up Coding

Coding tutorials and news. The developer homepage gitconnected.com && skilled.dev && levelup.dev


|LLM|INTERPRETABILITY|XAI|

Clear Waters: What an LLM Thinks Under the Surface

Anthropic’s Take at Decoding Abstract Features in Large Language Models

Salvatore Raieli
Published in Level Up Coding · 9 min read · Jun 4, 2024


All things are subject to interpretation; whichever interpretation prevails at a given time is a function of power and not truth. - Friedrich Nietzsche

Anthropic has recently been investing heavily in interpretability for large language models (LLMs). According to the company, neural networks learn meaningful features; the problem is being able to visualize them.

The transformer itself is difficult to interpret. Other neural networks are not much easier, but the transformer has taken the black-box concept to another level. From the very beginning, therefore, researchers looked for ways to interpret the model. Early approaches focused on visualizing attention heads.

[Figure: Anthropic’s interpretability. Image source linked in the original.]
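As a rough illustration of what an attention-head visualization actually plots, the sketch below computes one attention matrix per head for a four-token sentence. Everything in it is an assumption made for illustration: the embeddings and projection matrices are random stand-ins for learned weights, the sizes (`d_model`, `d_head`, `n_heads`) are arbitrary, and no causal mask is applied, so it does not reproduce any particular model or visualization tool.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a matrix (d_in x d_out)."""
    return [
        sum(v * mat[i][j] for i, v in enumerate(vec))
        for j in range(len(mat[0]))
    ]


def random_matrix(rows, cols, rng):
    """Random stand-in for a learned weight matrix (hypothetical values)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def attention_weights(embeddings, w_q, w_k):
    """Return the (tokens x tokens) attention matrix for a single head."""
    queries = [matvec(e, w_q) for e in embeddings]
    keys = [matvec(e, w_k) for e in embeddings]
    scale = math.sqrt(len(keys[0]))  # scaled dot-product attention
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))  # each row sums to 1
    return weights


rng = random.Random(0)
tokens = ["the", "cat", "sat", "down"]
d_model, d_head, n_heads = 8, 4, 2  # arbitrary toy sizes

# Random stand-ins for token embeddings; a real model would learn these.
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in tokens]

# One attention matrix per head, each with its own query/key projections.
heads = [
    attention_weights(
        embeddings,
        random_matrix(d_model, d_head, rng),
        random_matrix(d_model, d_head, rng),
    )
    for _ in range(n_heads)
]

for h, weights in enumerate(heads):
    print(f"head {h}")
    for token, row in zip(tokens, weights):
        print(f"  {token:>5}: " + " ".join(f"{w:.2f}" for w in row))
```

Each printed row shows how strongly one token attends to every token in the sentence, which is the quantity that heatmap-style visualizations draw. Even in this toy setting the two heads produce different patterns, and a real model stacks many such heads across many layers.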

However, this approach is a little simplistic. The various attention heads specialize in learning complicated relationships that are not easy to identify (deeper layers in particular learn sophisticated relationships that are not immediately understandable). Moreover, models now have dozens of attention heads per layer, and the heads at a given layer learn a representation that depends on the previous layers.

Opening the black box doesn’t necessarily help: the internal state of the model — what the model is “thinking” before writing its response — consists of a long list of numbers (“neuron activations”) without a clear meaning. (source)

An LLM can understand a large number of concepts, and these concepts are certainly related in some way to its internal representation. The problem is trying to discern which concepts…


Written by Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence
