LLM Discussion: Enhancing Creativity via Discussion Framework and Role-Play

Core Concept

By simulating human collective intelligence, LLM Discussion uses a three-phase process (Initiation, Discussion, Convergence) and assigns distinct roles (e.g., Academic Researcher, Environmentalist, Visionary Billionaire) to generate diverse and creative ideas.

Introduction

The Promise of Creative AI

Large language models (LLMs) have become efficient tools for everyday tasks across many applications, demonstrating exceptional capabilities in natural language processing. They perform well on language comprehension tasks such as sentiment analysis and question answering, and they are widely adopted for content generation, from writing articles to composing poetry, producing coherent and contextually relevant text that closely resembles human writing.

Multi-LLM Paradigms: Promise and Limitations

Recent studies highlight the potential of multi-LLM paradigms, which aim to simulate human collective intelligence by engaging multiple LLMs in iterative discussion to arrive at more comprehensive and well-balanced solutions. This paradigm allows systems to generate a broader range of ideas beyond what a single model can reach.

In the context of creativity, these frameworks often assign diverse roles to LLMs and employ structured, multi-round interactions. For example, one LLM may act as a creative professional while another serves as an environmentalist, enabling the system to explore and integrate multiple perspectives through iterative exchanges.

⚠️ The Challenge

Despite these advantages, multi-LLM frameworks face two major challenges. First, many modern LLMs are not specifically trained or reinforced for sustained multi-turn conversations, which can lead to unengaging discussions that quickly converge. Second, the high homogeneity of LLMs makes it difficult for discussions to diverge and produce creative outcomes, as agents often generate similar responses.

LLM Discussion

To address these issues, we develop a role-play enhanced LLM discussion framework, dubbed LLM Discussion, as illustrated in Figure 1. To facilitate vigorous, diverging exchanges across rounds of discussion while still ensuring convergence to creative answers, we arrange the discussion in three phases and design prompts specialized for each: the initiation phase, the discussion phase, and the convergence phase.

To tackle LLM homogeneity, we adopt a role-playing technique, an established practice in Design Thinking. Specifically, we assign a diverse set of roles with distinct backgrounds and perspectives to LLM agents participating in a discussion, such as Visionary Millionaire and Futurist, and roles specialized for various facets of discussion inspired by methods like Six Thinking Hats.

✨ Our Contributions

Framework: We propose LLM Discussion, a role-play enhanced, three-phase discussion framework that encourages divergent exchanges and converges to creative answers.
Role-Play: We introduce a role assignment mechanism to increase diversity among agents and mitigate LLM homogeneity.
Evaluation: We evaluate on four creativity benchmarks adapted from the Wallach–Kogan Creativity Test and report both LLM and human evaluations on Originality, Elaboration, Fluency, and Flexibility.
Results: Experiments and human studies demonstrate that LLM Discussion outperforms baselines in producing creative answers across multiple metrics.

Method

💬 Three-Phase Discussion Framework

To foster meaningful interaction and avoid rapid convergence, we designed a three-phase process:

Initiation Phase: Introduce the topic and objectives, establishing a consensus for group discussion.
Discussion Phase: Over multiple rounds, LLMs receive responses from other agents and build upon their ideas.
Convergence Phase: In the final round, summarize all ideas and converge on a collective, creative conclusion.
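
The three phases above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the `LLM` callable type, the `run_discussion` function, the prompt wording, and the choice to treat round 1 as initiation and the last round as convergence are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def _others(replies: List[str], index: int) -> str:
    """Join every reply except the one written by the agent at `index`."""
    return "\n".join(r for j, r in enumerate(replies) if j != index)


def run_discussion(question: str, agents: List[LLM], rounds: int = 5) -> List[str]:
    """Run initiation, discussion, and convergence; return each agent's final answer."""
    if rounds < 2:
        raise ValueError("need at least an initiation round and a convergence round")

    # Initiation phase: every agent sees the topic and the objective.
    init_prompt = f"We are starting a group discussion.\nTopic: {question}\nShare your initial ideas."
    replies = [agent(init_prompt) for agent in agents]

    # Discussion phase: each agent reads the other agents' latest replies and builds on them.
    for _ in range(rounds - 2):
        replies = [
            agent(
                f"Topic: {question}\nOther participants said:\n{_others(replies, i)}\n"
                "Build on these ideas and add new ones."
            )
            for i, agent in enumerate(agents)
        ]

    # Convergence phase: in the final round, each agent summarizes and concludes.
    return [
        agent(
            f"Topic: {question}\nOther participants said:\n{_others(replies, i)}\n"
            "This is the final round. Summarize the discussion and give your final answers."
        )
        for i, agent in enumerate(agents)
    ]
```

Any chat-completion client can be wrapped as an `LLM` callable here; with `rounds=5` this sketch yields one initiation round, three discussion rounds, and one convergence round.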

🎭 Role-Playing Mechanism

To overcome the "homogeneity" problem where LLMs think alike, we introduce role-play. This helps generate fundamentally diverse perspectives.

Diverse Roles: We automatically generate a set of roles with diverse backgrounds and perspectives (e.g., Visionary Billionaire, Startup Founder, Environmentalist).
Role Prompt: Each LLM is assigned a role before the discussion and reminded of its identity each round to ensure its responses align with the persona.
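
One way to picture the per-round role reminder is a thin wrapper around the model call. The sketch below is illustrative only: `with_role`, the reminder wording, and the role descriptions are hypothetical and are not taken from the paper; only the role names come from the text above.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def with_role(llm: LLM, role: str, description: str) -> LLM:
    """Wrap an LLM so that every call starts with a reminder of its assigned role."""
    reminder = f"Your role: {role}. {description} Answer from this perspective."

    def agent(prompt: str) -> str:
        # The reminder is prepended on every call, so the persona is restated each round.
        return llm(f"{reminder}\n\n{prompt}")

    return agent


# Role names from the text; the descriptions are made up for illustration.
EXAMPLE_ROLES = {
    "Environmentalist": "You care about sustainability and ecological impact.",
    "Startup Founder": "You look for fast, practical, market-ready ideas.",
}
```

Because the wrapper returns a plain prompt-to-reply function, role-assigned agents can be passed to any discussion loop that expects an unwrapped model.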

Evaluation

Creativity Benchmarks

Benchmark	Description	Sample Task
AUT	Evaluates divergent thinking by requiring the generation of numerous unconventional applications for an object.	What are some creative uses for a mug?
INSTANCES	Measures the ability to produce a diverse set of examples that satisfy a given property.	Name 5 square things you can think of.
SIMILARITIES	Assesses associative creativity by challenging participants to identify non-obvious connections between two concepts.	Tell me 5 ways in which a brick and a stone are alike.
SCIENTIFIC	Probes creative problem-solving within a scientific framework.	Find different scientific uses for a spoon.

Evaluation Metrics

Originality

Assesses the novelty and unconventionality of responses.

Elaboration

Measures the amount of detail used to extend a response.

Fluency

Measures the number of relevant responses.

Flexibility

Measures the variety of categories or perspectives in responses.
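
Fluency and Flexibility are count-based, so they are easy to illustrate. The sketch below assumes each idea has already been labeled with a relevance flag and a category (for example by a judge); the `ScoredIdea` type, its fields, and the choice to count distinct categories among relevant ideas are assumptions for the example, not the paper's exact scoring procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredIdea:
    text: str
    relevant: bool  # hypothetical label: does the idea actually answer the task?
    category: str   # hypothetical label: the semantic category of the idea


def fluency(ideas: List[ScoredIdea]) -> int:
    """Number of relevant responses."""
    return sum(1 for idea in ideas if idea.relevant)


def flexibility(ideas: List[ScoredIdea]) -> int:
    """Number of distinct categories among the relevant responses."""
    return len({idea.category for idea in ideas if idea.relevant})


ideas = [
    ScoredIdea("pen holder", True, "storage"),
    ScoredIdea("plant pot", True, "gardening"),
    ScoredIdea("coin jar", True, "storage"),
    ScoredIdea("it is blue", False, "none"),
]
print(fluency(ideas), flexibility(ideas))  # prints "3 2"
```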

Evaluation Method

LLM Evaluation

Using gpt-3.5-turbo-0125 as a judge, we score all generated responses based on the defined metrics.
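
An LLM-as-judge step for a single metric might look like the following. This is a sketch under stated assumptions: the prompt wording, the 1 to 5 scale, and the `judge_score` helper are hypothetical, and `judge` stands in for whatever client calls the judge model.

```python
import re
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def judge_score(judge: LLM, task: str, response: str,
                metric: str = "Originality", low: int = 1, high: int = 5) -> int:
    """Ask the judge model for an integer score and validate it."""
    prompt = (
        f"Task: {task}\nResponse: {response}\n"
        f"Rate the {metric} of the response on a scale from {low} to {high}. "
        "Reply with the number only."
    )
    reply = judge(prompt)

    # Take the first integer in the reply; judges do not always follow the format exactly.
    match = re.search(r"-?\d+", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")

    score = int(match.group())
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the range {low}..{high}")
    return score
```

Validating the parsed score keeps malformed judge replies from silently skewing the averages.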

Human Evaluation

We conducted a human evaluation, collecting over 1,400 assessments from 42 annotators who used the same rubric as the LLM judge, based on the Torrance Tests of Creative Thinking (TTCT).

Baselines

We compare LLM Discussion with several single-agent and multi-agent frameworks.

Single Agent

A single LLM agent using a zero-shot prompt to solve the task.

Brainstorm, then Select

A single-LLM framework designed for AUT, which includes a selection phase to judge the creativity of potential solutions.

LLM Debate

A multi-LLM framework aimed at improving factuality and reasoning. Agents check others' responses for correctness rather than building on their ideas.

Results

Main Results

LLM Evaluation Results. LLM Discussion surpasses all baselines in Originality across all four benchmarks and achieves the best Elaboration scores on three benchmarks.

Qualitative Results

Figure: Qualitative comparison of generated responses.

A comparison of responses for the AUT and SCIENTIFIC benchmarks. LLM Discussion generates more innovative and detailed ideas (e.g., "Towel Storybook Quilt") compared to the more common responses from baselines.

LLM & Human Alignment

Figure: Correlation between LLM and human evaluations.

Kendall's τ shows strong alignment between LLM scores and average human judgments for Originality, Fluency, and Flexibility, which suggests that LLM evaluation is a reliable proxy for human assessment on these three metrics.
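
For readers unfamiliar with Kendall's τ, the sketch below computes the tie-adjusted τ-b variant in plain Python. Which variant the paper used is not stated here, so τ-b is an assumption (it is also the default of `scipy.stats.kendalltau`); the function name and the sample scores are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: (concordant - discordant) pairs, normalized with a tie correction."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1  # pair is tied in x
        if dy == 0:
            ties_y += 1  # pair is tied in y
        if dx * dy > 0:
            concordant += 1  # both rankings order the pair the same way
        elif dx * dy < 0:
            discordant += 1  # the rankings disagree on the pair

    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Made-up scores for five responses, for illustration only.
llm_scores = [1, 2, 3, 4, 5]
human_scores = [1, 3, 2, 5, 4]
print(kendall_tau_b(llm_scores, human_scores))  # prints 0.6
```

A value of 1 means the two raters rank every pair of responses the same way, and -1 means they disagree on every pair.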

Analysis

We conducted ablation studies to determine the optimal settings for our main experiments.

Analysis of Discussion Rounds

Originality increases with more discussion rounds, but Elaboration decreases after 5 rounds. We chose 5 rounds as a balance.

Analysis of Agent Count

With 5 discussion rounds, using 4 LLM agents achieved the best overall performance.

Conclusion

We proposed LLM Discussion, a method to enhance LLM creativity by emulating the human discussion process.

Our three-phase framework facilitates multi-round, interactive discussions, while the role-play mechanism effectively mitigates LLM homogeneity.

LLM evaluation and human studies show that our framework outperforms single-LLM and existing multi-LLM frameworks on Originality and Elaboration across four creativity benchmarks.

How to Cite

📄 Paper

[https://arxiv.org/abs/2405.06373]

💻 Code

[https://github.com/lawraa/LLM-Discussion]

BibTeX

@inproceedings{lu2024llm,
  title={LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play},
  author={Lu, Li-Chun and Chen, Shou-Jen and Pai, Tsung-Min and Yu, Chan-Hung and Lee, Hung-yi and Sun, Shao-Hua},
  booktitle={Conference on Language Modeling},
  year={2024}
}