Image Processing (March/April 2025)

The Dynamic World of Image Generation Models

  • Images = pixels stored in matrices and arrays; words = text broken down into tokens
  • Computer vision, image processing, and language models build on a common foundation: deep learning
  • OpenAI's DALL-E sets the stage… The name DALL-E combines “Dalí” (after the surrealist artist Salvador Dalí) and “WALL-E” (the Disney/Pixar robot character)
  • Stable Diffusion, Midjourney, Gemini 2.5 (Google), OpenAI GPT
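
The first bullet can be made concrete with a minimal sketch in plain Python. The pixel values, the caption, and the whitespace tokenizer are illustrative assumptions; production models typically use subword tokenizers and much larger arrays.

```python
# A tiny 2x3 grayscale "image": each entry is a pixel intensity
# from 0 (black) to 255 (white). Values are made up for illustration.
image = [
    [0, 128, 255],
    [64, 192, 32],
]
height, width = len(image), len(image[0])

# Naive whitespace tokenization (an assumption for this sketch;
# real language models usually split text into subword tokens).
caption = "a melting clock in the desert"
tokens = caption.lower().split()

# Toy vocabulary: map each distinct token to an integer ID.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]

print(f"image shape: {height} x {width}")
print(f"tokens: {tokens}")
print(f"token ids: {token_ids}")
```

Both the image and the caption end up as collections of numbers, which is what lets the same deep learning machinery work on either.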

How Image Generation Models Work

  • Language model foundations (embeddings)
  • Text caption/image pairs for training (matching of embeddings)
  • GPT = generative pre-trained transformer
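
The idea of matching caption and image embeddings can be sketched as follows. The three-dimensional vectors, the file names, and the helper `best_caption` are hypothetical, and cosine similarity is assumed as the matching score (a common choice, not necessarily what any particular model uses). Real embeddings are learned during training and have hundreds of dimensions or more.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up embeddings for illustration only.
caption_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a car": [0.0, 0.2, 0.9],
}
image_embeddings = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.1, 0.1, 0.95],
}


def best_caption(image_name):
    """Return the caption whose embedding is most similar to the image's."""
    img = image_embeddings[image_name]
    return max(
        caption_embeddings,
        key=lambda cap: cosine_similarity(caption_embeddings[cap], img),
    )


for name in image_embeddings:
    print(name, "->", best_caption(name))
```

Training on caption/image pairs aims to place each image's embedding close to the embedding of its own caption, so a lookup like this one pairs them correctly.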

Prompts Make the Difference

  • Use simple, plain language, be concise, be explicit
  • Name the objects, the setting, the style
  • Revise, revise, revise
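
One way to practice this advice is to assemble prompts from named parts, so each revision changes one part at a time. The helper `build_prompt` and the example wording are hypothetical; image generation tools accept free-form text and do not require this structure.

```python
def build_prompt(subject, setting, style):
    """Join the named parts of a prompt, skipping any left empty."""
    parts = [subject.strip(), setting.strip(), style.strip()]
    return ", ".join(part for part in parts if part)


# First draft: name the object, the setting, and the style.
prompt = build_prompt(
    "a red bicycle",
    "leaning against a brick wall at sunset",
    "watercolor illustration",
)
print(prompt)

# Revision: keep the object and setting, change only the style.
revised = build_prompt(
    "a red bicycle",
    "leaning against a brick wall at sunset",
    "black-and-white film photograph",
)
print(revised)
```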

Learning by Doing, Learning by Programming

  • Hugging Face: repository of machine learning and AI models
  • Python programming packages: PyPI
  • R programming packages: CRAN
  • Go programming packages: go.dev
  • AI-assisted programming

References

  • Elgendy, Mohamed. 2020. Deep Learning for Vision Systems. Shelter Island, NY: Manning. [ISBN-13: 978-1617296192]

  • Foster, David. 2023. Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play (second edition). Sebastopol, CA: O’Reilly. [ISBN-13: 978-1098134181]

  • Lane, Hobson and Maria Dyshel. 2025. Natural Language Processing in Action (second edition). Shelter Island, NY: Manning. [ISBN-13: 978-1617299445]

  • Tunstall, Lewis, Leandro von Werra, and Thomas Wolf. 2022. Natural Language Processing with Transformers: Building Language Applications with Hugging Face (revised edition). Sebastopol, CA: O’Reilly. [ISBN-13: 978-1098136796]
