If you have footage with visual elements or concepts that are difficult to find accurately (for example, if the search results are consistently confusing or performing poorly for one "class" or type of shot/frame), we could fine-tune a model, provided we are supplied the training data - i.e. the frames and what they are (label, name, whatever the variations may be). This is a fairly labor-intensive job for the user: providing all the required frames, or at least the exact timestamps where to find them, so we can extract them with a script. To clarify - we need a dataset of examples, not just a single image or two.
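
For reference, a minimal sketch of what that extraction script could look like, assuming the user supplies a CSV listing video path, timestamp, and label (the file name, column names, and output layout here are hypothetical, just to illustrate the idea):

```python
# Sketch: extract one labeled frame per (video_path, timestamp_sec, label) row.
# Assumes a hypothetical "clips.csv" with columns: video_path,timestamp_sec,label
import csv
import os

import cv2  # opencv-python


def extract_frames(csv_path: str, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            cap = cv2.VideoCapture(row["video_path"])
            # Seek to the requested timestamp (milliseconds) and grab a single frame.
            cap.set(cv2.CAP_PROP_POS_MSEC, float(row["timestamp_sec"]) * 1000.0)
            ok, frame = cap.read()
            cap.release()
            if not ok:
                print(f"skipping row {i}: could not read frame at {row['timestamp_sec']}s")
                continue
            # One sub-folder per label, the usual layout for a fine-tuning dataset.
            label_dir = os.path.join(out_dir, row["label"])
            os.makedirs(label_dir, exist_ok=True)
            cv2.imwrite(os.path.join(label_dir, f"{i:06d}.jpg"), frame)


if __name__ == "__main__":
    extract_frames("clips.csv", "dataset")
```

The point being: once we have timestamps plus labels in some structured form, turning them into a frame dataset is cheap on our side; the expensive part is the user collecting and labeling enough examples per class.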