Adding a space between the task prompt and the text input gives better/proper results

#2
by flrngel - opened

I think the code should be changed like this

    if text_input is None:
        prompt = task_prompt
    else:
-        prompt = task_prompt + text_input
+        prompt = task_prompt + " " + text_input

for example,

test image: http://farm3.staticflickr.com/2386/2532343535_41a2d3a9a0_z.jpg (from coco)
task prompt: Region to Description
text input: man on the back (without space)

output:

{'<REGION_TO_DESCRIPTION>': 'A woman with a large backpack in an airport terminal.'}

text input: man on the back (with a space prepended)
output:

{'<REGION_TO_DESCRIPTION>': "person on the back of a large green backpack with straps and buckles. \n\nThe backpack appears to be made of a durable material and has multiple pockets and compartments for storage. The straps are adjustable and the buckles are silver. The backpack is resting on a blue and white checkered floor.\n\nThere is a person's leg visible on the right side of the image, but they are not clearly visible. The background is blurred, so it is difficult to make out any other details."}

I think the example code from microsoft/Florence-2-large is wrong.

I will test with some pictures and apply your recommendations afterwards. Thanks for the feedback.

In "Region to Description" you need to give "BBOX" coordinates like 'loc_52 loc_332 loc_932 loc_774' instead of a plain "text input". That's why you are getting different results. For other tasks, the results looked identical to me.
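
To illustrate the point above, here is a minimal sketch of turning a pixel-space bounding box into a region string for "Region to Description". The helper name `bbox_to_loc_tokens` is hypothetical. The angle-bracket token form (`<loc_52>` rather than the bare `loc_52` quoted above) and the quantization into 1000 bins per axis are assumptions about how Florence-2 encodes coordinates, so check them against the model's processor before relying on this.

```python
def bbox_to_loc_tokens(box, image_width, image_height, bins=1000):
    """Convert a pixel-space (x1, y1, x2, y2) box into a location-token string.

    Assumption: coordinates are quantized into `bins` buckets per axis and
    written as <loc_N> tokens. This is a sketch, not the library's own code.
    """
    x1, y1, x2, y2 = box

    def quantize(value, size):
        # Map a pixel coordinate to an integer bin, clamped to [0, bins - 1].
        return max(0, min(bins - 1, int(value * bins / size)))

    coords = (
        quantize(x1, image_width),
        quantize(y1, image_height),
        quantize(x2, image_width),
        quantize(y2, image_height),
    )
    return "".join(f"<loc_{c}>" for c in coords)


# Hypothetical 640x480 image and box, for illustration only.
region = bbox_to_loc_tokens((64, 120, 320, 240), image_width=640, image_height=480)
prompt = "<REGION_TO_DESCRIPTION>" + region
print(prompt)
```

With a region string like this as the text input, the task token and the coordinates are concatenated directly, so the question of a separating space does not come up for this task.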

flrngel changed discussion status to closed
