The Importance of Prompt Tuning for Automated Neuron Explanations

UCSD

*indicates equal contribution.
NeurIPS ATTRIB 2023

#1 GPT 3.5 - Layer 36, Neuron 818

Highly activating inputs:

Original (1.4): words related to financial markets and economic conditions.
Summary (4.8): prepositions, specifically the preposition "to".
Highlight (4.0): phrases with the word "to" followed by a verb.
HS (4.8): prepositions, specifically the word "to".
AVHS (3.0): phrases related to causality and connection between events or actions.


#2 GPT 3.5 - Layer 28, Neuron 596

Highly activating inputs:

Original (2.6): mentions of dates or actions taking place in the news industry.
Summary (5.0): instances of the word "been" or words related to time (e.g., after, yet) in the text excerpts.
Highlight (4.6): words related to time and sequence, such as "after" and "been".
HS (4.4): words related to temporal order or sequence, particularly "after" and "been".
AVHS (2.0): punctuation marks and conjunctions that separate or connect phrases or clauses in sentences.


#3 GPT 3.5 - Layer 39, Neuron 445

Highly activating inputs:

Original (1.4): text related to coding or programming, including HTML and cryptocurrency.
Summary (4.4): pronouns and words related to referring to something previously mentioned.
Highlight (4.0): references to pronouns and possessive determiners.
HS (4.4): pronouns used in reference to objects or entities.
AVHS (3.6): phrases or sentences that contain personal pronouns, especially 'that's who we are,' 'we're Philadelphia, we get after people,' and 'that's what we do.'


#1 GPT 4 - Layer 2, Neuron 1818

Highly activating inputs:

Original (4.2): words and numbers that end with 'ck'.
Summary (4.6): elements in the text that are related to checks or checking, as well as number sequences with repeated figures.
Highlight (2.4): numeric series, coding/scripting language syntax, and U.S. historical and sport team references.
HS (4.0): parts of words that are common in language related to checks and balances, numerical sequences, and programming code.
AVHS (5.0): words related to checks, including checks as in "checks and balances" in a political context, check in programming codes, and the 'ck' pattern in other words. It also seemed to respond to some number sequences.


#2 GPT 4 - Layer 42, Neuron 674

Highly activating inputs:

Original (4.6): phrases implying successful achievement or performance.
Summary (2.6): conversational phrases or idioms commonly used in spoken English.
Highlight (3.6): verbs followed by adverbs or prepositions.
HS (4.6): phrases that indicate levels of performance or effectiveness.
AVHS (4.8): phrases that indicate an evaluation or judgement.


#3 GPT 4 - Layer 5, Neuron 541

Highly activating inputs:

Original (4.2): words ending in 'amy'.
Summary (4.2): words that end with the 'amy' sequence.
Highlight (2.6): adjectives related to texture.
HS (4.6): the partial word 'amy', regardless of the context.
AVHS (4.6): the token 'amy' often found at the end of words.


#1 Puzzles (GPT-4 Explainer) - Puzzle 4

Each prompting method generated three explanations per puzzle, and each explanation was compared to the ground truth (the "correct" explanation). Each method then received a score out of 3 based on how many of its explanations were correct.
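
The scoring described above can be sketched in a few lines of Python. This is only an illustration, not the authors' evaluation code: the function name `puzzle_score` and the per-explanation true/false judgments are assumptions, and the comparison of each explanation against the ground truth is taken as already done. The counts mirror the Puzzle 6 results listed on this page, but the position of the incorrect AVHS explanation within its list is arbitrary.

```python
from typing import Dict, List


def puzzle_score(judgments: List[bool]) -> str:
    """Format one method's score on one puzzle as 'correct/total'.

    `judgments` holds one boolean per generated explanation: True if that
    explanation was judged to match the ground truth. (Hypothetical helper;
    the judging step itself is not modeled here.)
    """
    return f"{sum(judgments)}/{len(judgments)}"


# Illustrative judgments matching the Puzzle 6 counts on this page.
# Which individual AVHS explanation was judged wrong is not stated in
# the source, so the position of the False entry is arbitrary.
puzzle_6: Dict[str, List[bool]] = {
    "Original": [False, False, False],
    "Summary": [True, True, True],
    "Highlight": [False, False, False],
    "HS": [True, True, True],
    "AVHS": [True, True, False],
}

scores = {method: puzzle_score(j) for method, j in puzzle_6.items()}

for method, score in scores.items():
    print(f"{method}: {score}")
```

Keeping one boolean per explanation, rather than only the final count, makes it easy to audit a single judgment later without recomputing the rest.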

Ground Truth: the substitution of a key word in a common idiom that isn't the usual choice for that idiom
Original - Correct (3/3): synonyms or substitutions for words typically used in idioms or common phrases.
Summary - Correct (3/3): synonyms or alternative words often used to replace more common words or phrases in idiomatic expressions.
Highlight - Incorrect (0/3): idiomatic expressions and their variations.
HS - Correct (3/3): replacements for typical idiomatic expressions.
AVHS - Correct (3/3): literal translations of idiomatic expressions or phrases.


#2 Puzzles (GPT-4 Explainer) - Puzzle 6

Ground Truth: historically inaccurate numerical years in the passage and nothing else
Original - Incorrect (0/3): numerical representations of specific years in historical events.
Summary - Correct (3/3): incorrect historical dates.
Highlight - Incorrect (0/3): numbers related to historic dates.
HS - Correct (3/3): incorrect historical dates or years.
AVHS - Correct (2/3): historical years that are inaccurately represented.


#3 Puzzles (GPT-4 Explainer) - Puzzle 11

Ground Truth: language related to something being stopped, prevented, or halted, but only when negated
Original - Incorrect (0/3): verbs related to cessation or stopping.
Summary - Correct (3/3): negative sentences using the verb 'stop' in the future or present tense.
Highlight - Incorrect (0/3): sentences containing the word 'stop' or variants of it.
HS - Correct (3/3): phrases involving the persistence or continuation of an action, often using the word 'stop' in negative sentences.
AVHS - Correct (3/3): statements where an action is continuing despite some resistance or difficulty, with the word "stop" used in a negation context.
