LLaVA, Google Photo API and a broken e-book reader: Haiku digital photo frame.

I’ve made a digital photo frame that displays a random photo from my Google Photos library, enriched with a haiku that a LLaVA instance generates using the same photo as reference.

Here’s a video of the digital photo frame in action:

The frame is an old Kobo Glo, which I repurposed using this great guide I found on the MobileRead forum. On it I’m running a super simple script that, every minute or so, fetches whatever an arbitrary endpoint returns and displays it.

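#!/bin/sh
# Refresh the frame roughly once a minute.
while true; do
  # Set the framebuffer rotation before drawing.
  echo "1" > /sys/class/graphics/fb0/rotate
  # Fetch the pre-converted raw image; only redraw if the download succeeded.
  if wget http://myurlendpoint.com -O /koboframe/image.raw; then
    /usr/local/Kobo/pickel showpic < /koboframe/image.raw
  fi
  sleep 60
done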

On the endpoint, I first use the Google Photos API to fetch a picture from my library:

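// Imports, assuming the namespace layout of the google/photos-library package.
use Google\Photos\Library\V1\ContentCategory;
use Google\Photos\Library\V1\FiltersBuilder;
use Google\Photos\Library\V1\MediaTypeFilter\MediaType;
use Google\Photos\Library\V1\PhotosLibraryClient;
use Google\Type\Date;

$photosLibraryClient = new PhotosLibraryClient(['credentials' => $credentials]);

// Only photos (no videos) that Google has categorised as containing people.
$filtersBuilder = new FiltersBuilder();
$filtersBuilder->addIncludedCategory(ContentCategory::PEOPLE);
$filtersBuilder->setMediaType(MediaType::PHOTO);
// Restrict the search to one date slot; the $...Slot... values are computed elsewhere.
$filtersBuilder->addDateRange(
  (new Date())->setYear($endSlotY)->setMonth($endSlotM)->setDay($endSlotD),
  (new Date())->setYear($startSlotY)->setMonth($startSlotM)->setDay($startSlotD)
);

try {
  $response = $photosLibraryClient->searchMediaItems(
    [
      'filters' => $filtersBuilder->build()
    ]
  );
} catch (Exception $e) {
  writeLog($logdata."error in searching media items: ".$e->getMessage());
  exit();
}

// Take the first item of the first page of results.
$page = $response->iteratePages()->current();
$element = $page->getIterator()->current();
if ($element === null) {
  writeLog($logdata."no media items found for this date slot");
  exit();
}

// Format the creation time as d/m/Y so it can be printed on the frame.
$metadata = $element->getMediaMetadata();
$creationTime = $metadata->getCreationTime()->getSeconds();
$date = new DateTime();
$date->setTimestamp($creationTime);
$creationDate = $date->format('d/m/Y');

// Ask for a 1024x758 rendition, cropped (-c) to exactly that size.
$imageURL = $element->getBaseUrl().'=w1024-h758-c';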
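The `=w1024-h758-c` suffix on the last line is what sizes the picture: `w` and `h` set the maximum width and height, and `-c` crops to exactly those dimensions, which here match the Kobo Glo’s 1024x758 screen. A minimal sketch of that string building, where `sizedPhotoUrl` is a hypothetical helper and the example base URL is a placeholder, neither of which is part of the original endpoint:

```php
<?php
// Hypothetical helper: append Google Photos sizing parameters to a base URL.
function sizedPhotoUrl(string $baseUrl, int $width, int $height, bool $crop = false): string
{
    // "=w<width>-h<height>" bounds the image size.
    $suffix = sprintf('=w%d-h%d', $width, $height);
    if ($crop) {
        // "-c" crops the image to exactly width x height.
        $suffix .= '-c';
    }
    return $baseUrl . $suffix;
}

// Placeholder base URL, for illustration only.
echo sizedPhotoUrl('https://example.com/base-url', 1024, 758, true), "\n";
// https://example.com/base-url=w1024-h758-c
```

Keeping the dimensions as parameters would make it easier to reuse the endpoint for a reader with a different screen size.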

Once I have the picture, I pass it to LLaVA, a new multimodal LLM based on Vicuna, alongside the prompt “Write a profound haiku that capture the essence of the photo”. To interact with LLaVA I’m using replicate.com’s HTTP API through cURL, which I wrapped in two functions, “getPredictionID” and “getPredictionResponse”, with a forced ten-second wait in between to give the model time to generate the reply (there are better approaches, but yeah, this is good enough here).

$predictionId = getPredictionID($imageURL, $logdata);
usleep(10 * 1000 * 1000);
$haiku = getPredictionResponse($predictionId, $logdata);
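One of the “better approaches” would be polling: instead of sleeping a fixed ten seconds, ask for the prediction status repeatedly until it is finished. The sketch below is only an illustration. `waitForPrediction` and the `$fetchStatus` callable are hypothetical names, and the response shape (an array with `status` and `output` keys) is an assumption modelled on Replicate’s prediction statuses (`starting`, `processing`, `succeeded`, `failed`, `canceled`). In the real endpoint the callable would wrap the cURL request; here a fake one is used so the example runs on its own.

```php
<?php
/**
 * Poll until the prediction finishes, fails, or the attempts run out.
 * $fetchStatus is a hypothetical callable returning an array such as
 * ['status' => 'succeeded', 'output' => '...'].
 */
function waitForPrediction(callable $fetchStatus, int $maxAttempts = 30, int $delayMicros = 1000000): ?string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $prediction = $fetchStatus();
        $status = $prediction['status'] ?? '';

        if ($status === 'succeeded') {
            $output = $prediction['output'] ?? '';
            // Assumption: the output may arrive as a list of text chunks; join them.
            return is_array($output) ? implode('', $output) : (string) $output;
        }
        if ($status === 'failed' || $status === 'canceled') {
            return null; // No point in waiting any longer.
        }
        if ($attempt < $maxAttempts) {
            usleep($delayMicros); // Still starting or processing: wait and retry.
        }
    }
    return null; // Gave up after $maxAttempts.
}

// Fake status source standing in for the real HTTP call.
$responses = [
    ['status' => 'starting'],
    ['status' => 'processing'],
    ['status' => 'succeeded', 'output' => ['first line / ', 'second line']],
];
$fake = function () use (&$responses) {
    return array_shift($responses);
};

var_dump(waitForPrediction($fake, 10, 0)); // string(25) "first line / second line"
```

A `null` return lets the caller fall back to showing the photo without a haiku instead of blocking the frame.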

Finally, I load the image into an Imagick object and use some of its handy methods to add the haiku, along with the date the picture was taken, in a semi-opaque box at the bottom of the image. Before returning it to the frame, I have to perform one last step: converting the image into the proprietary raw format the Kobo supports.

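// Convert each pixel to grayscale and pack it as a 16-bit RGB565 value.
$binarydata = '';
for ($y = 0; $y < 758; $y++) {
  for ($x = 0; $x < 1024; $x++) {
    $colors = $imagick->getImagePixelColor($x, $y)->getColor();
    // Luma from 8-bit RGB using the ITU-R BT.601 weights.
    $gray_8bit = $colors['r'] * 0.299 + $colors['g'] * 0.587 + $colors['b'] * 0.114;
    $gray_8bit_int = (int) round($gray_8bit);
    // 5 bits red, 6 bits green, 5 bits blue, all carrying the same gray level.
    $gray_565 = ((($gray_8bit_int >> 3) << 11) | (($gray_8bit_int >> 2) << 5) | ($gray_8bit_int >> 3));
    // "v" forces little-endian byte order; "S" would use the server's native order.
    $binarydata .= pack("v", $gray_565);
  }
}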
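To make the bit arithmetic easier to follow, here is the per-pixel step pulled out into a standalone function with a few worked values. `grayToRgb565` is a hypothetical name, not part of the original endpoint, and the little-endian packing is an assumption about what the frame expects (it is what `pack("S", ...)` produces on a typical x86 server).

```php
<?php
// Hypothetical helper: map an 8-bit RGB pixel to a gray RGB565 value.
function grayToRgb565(int $r, int $g, int $b): int
{
    // Weighted sum of the channels (ITU-R BT.601 luma), rounded to 0..255.
    $gray = (int) round($r * 0.299 + $g * 0.587 + $b * 0.114);
    $gray = max(0, min(255, $gray));

    // Keep the top 5 bits for red and blue, the top 6 bits for green.
    return (($gray >> 3) << 11) | (($gray >> 2) << 5) | ($gray >> 3);
}

printf("%04x\n", grayToRgb565(0, 0, 0));       // 0000
printf("%04x\n", grayToRgb565(128, 128, 128)); // 8410
printf("%04x\n", grayToRgb565(255, 255, 255)); // ffff

// Packed little-endian, 0x8410 becomes the bytes 10 84.
echo bin2hex(pack('v', grayToRgb565(128, 128, 128))), "\n"; // 1084

// Two bytes per pixel, so a full 1024x758 frame is this many bytes:
echo 1024 * 758 * 2, "\n"; // 1552384
```

Since every pixel ends up gray, only 64 distinct output values are possible (the 6-bit green channel is the finest), so a small lookup table indexed by gray level could replace the per-pixel bit shifting if the loop turns out to be slow.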

And that’s it. Here are a few more pictures of the digital photo frame in action. The haikus are a bit too literal for my taste, but I expect them to improve once I’ve spent more time refining the prompt.