Where can I find the code?

#832
by nickandbro - opened

Hello all,

AI comic blows my mind with its speech bubble support. Is the code available, so I can see how it does this?

Hello @nickandbro

The speech bubbles work by doing three things:

  1. I've modified the prompts used to generate the stories to add an extra output for dialogue, in addition to the existing caption and scene descriptions.
    This separation improves the overall quality (before this change, captions sometimes included dialogue; that should happen less often now).

    The source code for this is here

  2. For each image, a segmentation model is run using MediaPipe.
    The source code for this part is at the top of injectSpeechBubbleInTheBackground.ts

    This part could be improved to better identify multiple people, animals, etc.
    When it doesn't find anything, the bubble falls back to pointing slightly above the center of the screen.

    By the way, about using a segmentation model (precise shape detection) versus a simpler box/rectangle detection algorithm (which could be faster, lighter, and able to detect more types of entities): simple box detection could work too. I prefer segmentation because ultimately it can be used to do something smart, like making sure the bubble is positioned in the right place without hiding too much of the image (although I don't fully take advantage of this yet).

  3. Then there is the bubble drawing:

    I don't usually work with SVG drawing, so I asked Claude 3.5 to generate a bubble drawing algorithm, but it doesn't work perfectly.
    Sometimes there are bugs/artifacts, but I would say it's a start.

    The source code for this part is at the bottom of injectSpeechBubbleInTheBackground.ts

    External contributions to improve this part are welcome!
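
To illustrate the idea behind step 2, here is a minimal sketch of how a bubble target could be picked from a segmentation mask, including the "slightly above center" fallback. This is not the code from injectSpeechBubbleInTheBackground.ts: the function name `findBubbleTarget`, the `Point` type, the flat one-value-per-pixel mask layout, and the `fallbackYRatio` of 0.4 are all assumptions made for illustration.

```typescript
// Hypothetical helper, for illustration only (not the project's actual code).
interface Point {
  x: number;
  y: number;
}

/**
 * Picks the point a speech bubble tail could aim at.
 *
 * Assumption: `mask` holds one value per pixel in row-major order,
 * where any non-zero value means "this pixel belongs to a detected subject".
 */
function findBubbleTarget(
  mask: Uint8Array,
  width: number,
  height: number,
  fallbackYRatio = 0.4 // assumed value for "a bit above the center"
): Point {
  let sumX = 0;
  let count = 0;
  let top = height; // topmost row that contains a subject pixel

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (mask[y * width + x] !== 0) {
        sumX += x;
        count++;
        if (y < top) top = y;
      }
    }
  }

  // Nothing detected: fall back to a point slightly above the center.
  if (count === 0) {
    return { x: width / 2, y: height * fallbackYRatio };
  }

  // Aim at the horizontal centroid of the subject, at its topmost row
  // (a rough stand-in for "near the head").
  return { x: sumX / count, y: top };
}
```

A box detector would only provide a rectangle here, whereas a mask also makes it possible to check later which pixels a bubble would cover.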

Thank you so much! And thank you for your contributions to the open source community!
