I aspire to create AI systems that truly excel in multimodal domains and that humans can precisely steer, edit, and control.

Linn Bieske

Interdisciplinary machine learning researcher with strong expertise in computer vision & natural language processing.

Massachusetts Institute of Technology

Candidate for MS Electrical Engineering & Computer Science / MBA (Leaders for Global Operations Fellow)

Research areas

Computer vision

At MIT, I research precise human-guided editing of AI-generated images via prompt-to-prompt methods for Stable Diffusion models. Our method enables finer human control over visual features in generated images.

Natural Language Processing

At Google X, I researched how to teach code to write and rewrite itself using program synthesis methods built on transformer models. Our models are now deployed as Codey, Google's foundational code generation models powering tools such as Colab.
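
As a rough illustration of the underlying idea, the sketch below shows the propose-execute-refine loop at the heart of such self-rewriting systems. It is a hypothetical toy, not Google's pipeline: generate_code stands in for a real transformer code-model call, and the task, timeout, and round count are arbitrary.

```python
import subprocess
import sys
import tempfile

def generate_code(prompt: str) -> str:
    """Stand-in for a transformer code-model call (hypothetical here);
    returns a trivial program purely for illustration."""
    return "print(sum(range(10)))\n"

def run(source: str) -> tuple[bool, str]:
    """Execute candidate code in a subprocess and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True,
                          text=True, timeout=10)
    return proc.returncode == 0, proc.stdout + proc.stderr

def synthesize(task: str, max_rounds: int = 3) -> str:
    """Propose-execute-refine loop: failures are fed back into the prompt
    so the model can rewrite its own previous attempt."""
    prompt = task
    for _ in range(max_rounds):
        candidate = generate_code(prompt)
        ok, feedback = run(candidate)
        if ok:
            return candidate
        prompt = f"{task}\n# previous attempt failed:\n{feedback}"
    return candidate

print(synthesize("Write a Python program that prints the sum of 0..9."))
```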

Natural Language Processing

I collaborate with research groups at Harvard on knowledge distillation and prompt-tuning methods that harness the reasoning power of LLMs to build AI-assisted tools for complex adversarial negotiations.
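
For readers unfamiliar with distillation, the snippet below sketches the standard knowledge-distillation objective: a temperature-softened KL divergence between teacher and student predictions, after Hinton et al. It is a generic PyTorch illustration, not our negotiation-specific models; the temperature and tensor shapes are arbitrary.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions; the temperature value is an arbitrary illustration."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes match a hard-label loss.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

# Toy example: batch of 4, vocabulary of 10.
teacher = torch.randn(4, 10)
student = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student, teacher)
loss.backward()
```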

Our improved model enables the precise control of visual features in generated images, such as the straightness or waviness of hair in portrait images.

Are you curious how the cross-attention mechanism of the underlying Stable Diffusion model is controlled to generate these results?

Selected research findings

For a deeper look at my research, the blog article on my prompt-to-prompt image editing work offers a comprehensive overview. You can access it here.

My mission is not only to make AI systems more powerful across data domains, but also to enhance how precisely humans can control them.

In MIT’s “6.S898 Deep Learning” course I researched how to improve the explainability and controllability of image generation transformer models. Current Stable Diffusion models generate highly different output images for similar prompts, limiting the user’s ability to iteratively edit generated images. We approached this problem by investigating prompt-to-prompt editing mechanisms [1]. To explain the root causes of undesired deviations in visual features, we investigated three mechanisms: silhouette selection, cross-attention injection, and self-attention injection. Contrary to the existing literature, we found that the relative importance of self-attention injection and cross-attention injection should be reversed. Our improved prompt-to-prompt method significantly outperformed the state-of-the-art method: it enabled full completion of the desired edits. This method might fundamentally transform how precisely humans can control AI-generated images.
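
To make the injection mechanics concrete, here is a minimal PyTorch sketch of attention-map injection in the spirit of prompt-to-prompt editing. It is a conceptual toy, not our implementation: in a real Stable Diffusion pipeline the override would hook the U-Net's attention layers, and the step count and injection fraction (tau) below are hypothetical.

```python
import torch

def scaled_dot_attention(q, k, v, probs_override=None):
    """Scaled dot-product attention; optionally replace the computed
    attention probabilities with maps recorded from the source prompt."""
    scale = q.shape[-1] ** -0.5
    probs = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
    if probs_override is not None:
        probs = probs_override  # injection: reuse the source prompt's maps
    return probs @ v, probs

def inject(step, n_steps, tau):
    """Inject recorded maps only during the first tau fraction of the
    denoising trajectory; afterwards the edited prompt takes over."""
    return step < int(tau * n_steps)

# Toy shapes: batch 1, 8 image-token queries, 4 text-token keys, dim 16.
q = torch.randn(1, 8, 16)
k, v = torch.randn(1, 4, 16), torch.randn(1, 4, 16)

# Pass 1 (source prompt): run attention and record the cross-attention maps.
_, recorded_probs = scaled_dot_attention(q, k, v)

# Pass 2 (edited prompt): reuse the recorded maps early in the trajectory.
# Our finding suggests self-attention injection matters more than previously
# reported, so its tau would be set larger than the cross-attention tau.
n_steps, tau = 50, 0.6  # hypothetical values
for step in range(n_steps):
    override = recorded_probs if inject(step, n_steps, tau) else None
    out, _ = scaled_dot_attention(q, k, v, probs_override=override)
```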

[1] Hertz, A. et al. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626 (2022).

Results from the method proposed by Hertz et al. [1]

The existing prompt-to-prompt editing method shows strong instability across generated images when the attention weight of selected visual features is scaled.

Results from our improved model