Wed 22 May 2024

PRBTA Presents: Crypto + AI Meetup

Guest Speaker: Christopher “Owen” Owens

Media Attestation Protocol

A novel approach using distributed computing for media provenance

Overview

• Blockchain is good for more than just tokens or currency; Decentralized Identity solutions are another killer feature.

• AI-generated media poses a disinformation / deepfake challenge. This is an area of active research, with tools like watermarking.

• I propose using the blockchain to create a “Media Attestation Protocol” for AI-generated images as well as real (non-AI) images. Think EXIF, but decentralized.

• Creating a record on the blockchain at the time of image creation provides an attestation (proof) of the image’s source.

• Questions about an image’s provenance can then be answered by looking up its record to see what the metadata says.

Use Cases

• AI image tracking to prevent deepfakes

• Metadata for non-AI images, such as for investigative journalism, or crowdsourced videos or photos of an event.

• Copyright, Legal, or Intellectual Property concerns

• DRM (access control, fingerprinted media to source leaked content)

Process

• At image creation, create a record on the blockchain with a SHA-256 identifier and JSON metadata (an example record is sketched after this list).

• This can be done by AI tools, OEM cameras, or even manually.

• Manual lookup: look up by hash to retrieve the metadata.

• Integrated lookup: when media is uploaded to social networks, the platform can automatically check for copyrights, tag it as AI-generated, etc.

• Another integrated lookup tool: right click / long press > check attestation
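For concreteness, a published record might pair the image’s SHA-256 hash with a JSON document like the one below. This is only a sketch: the field names are illustrative, not taken from any published schema.

{
  "creator": "alice@example.com",
  "tool": "SDXL 1.0 (local)",
  "createdAt": "2024-05-22T12:00:00Z",
  "aiGenerated": true,
  "license": "Unlicense"
}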

Sample Implementation

// SPDX-License-Identifier: Unlicense
pragma solidity ^0.8.0;

// smart contract creation
contract MediaAttestationProtocol {

   // our main struct
   struct Metadata {
      bytes32 hash;
      string jsonData;
   }

   // mapping – allows key/value lookup of metadata by hash
   mapping(bytes32 => Metadata) private metadataMap;

   // define event to be logged / emitted
   event MetadataPublished(bytes32 hash, string jsonData);

   // main function to publish
   function publishMetadata(bytes32 _hash, string memory _jsonData) public {
      Metadata memory metadata = Metadata(_hash, _jsonData);
      metadataMap[_hash] = metadata;
      emit MetadataPublished(_hash, _jsonData);
   }

   // lookup by hash – direct and easy, no loop
   function getMetadataByHash(bytes32 _hash) public view returns (bytes32, string memory) {
      Metadata memory metadata = metadataMap[_hash];
      return (metadata.hash, metadata.jsonData);
   }

   // lookup by metadata key-value – WIP – currently returns true/false rather than the matching images, etc.
   // note – this is probably better handled by a private geth node,
   // to provide fuzzy search, target metadata without expensive queries,
   // and offer more robust search tools
   function containsKeyValue(string memory _jsonData, string memory _key, string memory _value) private pure returns (bool) {
      // local (memory) byte views of the inputs
      bytes memory jsonDataBytes = bytes(_jsonData);
      bytes memory keyBytes = bytes(_key);
      bytes memory valueBytes = bytes(_value);

      // guard so the loop bound below cannot underflow – a hard length
      // limit (maybe 64 chars?) could also go here
      if (jsonDataBytes.length < keyBytes.length + valueBytes.length + 4) {
         return false;
      }

      // loop – bounded by data length
      for (uint256 i = 0; i < jsonDataBytes.length - keyBytes.length - valueBytes.length; i++) {
         if (jsonDataBytes[i] == '"') {
            if (checkEquals(jsonDataBytes, i + 1, keyBytes)) {
               // skip the key's closing quote, the colon, and the value's
               // opening quote – assumes plain "key":"value" formatting
               uint256 valueStart = i + keyBytes.length + 4;
               if (checkEquals(jsonDataBytes, valueStart, valueBytes)) {
                  return true;
               }
            }
         }
      }

      return false;
   }

   // helper – compare `expected` against `data` starting at `offset`
   function checkEquals(bytes memory data, uint256 offset, bytes memory expected) private pure returns (bool) {
      if (offset + expected.length > data.length) {
         return false;
      }
      for (uint256 i = 0; i < expected.length; i++) {
         if (data[offset + i] != expected[i]) {
            return false;
         }
      }
      return true;
   }
}
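As a usage sketch, here is one way a caller might interact with the contract above. The AttestationClient contract and its function names are illustrative, not part of the protocol; in practice a client would compute the SHA-256 digest off-chain and submit only the 32-byte hash.

// Hypothetical caller – e.g. the hook an AI tool or camera firmware might invoke.
// Assumes MediaAttestationProtocol (above) is already deployed and passed in.
contract AttestationClient {
   MediaAttestationProtocol public map;

   constructor(MediaAttestationProtocol _map) {
      map = _map;
   }

   // Publish at image creation. Hashing full image bytes on-chain is shown
   // only for illustration; real clients would hash off-chain.
   function attestImage(bytes calldata imageBytes, string calldata jsonData) external {
      map.publishMetadata(sha256(imageBytes), jsonData);
   }

   // Integrated lookup – e.g. what a social network's upload pipeline might call.
   function checkImage(bytes32 hash) external view returns (string memory jsonData) {
      (, jsonData) = map.getMetadataByHash(hash);
   }
}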



Issues

Binding to Image Creation

Forcing this to be done “at image creation” is important; otherwise any geek could take an existing Ansel Adams photograph and upload it under their own name, eroding trust in the platform. We don’t want to lock it down to OEM / embedded implementations only, since that defeats distributed, decentralized, open-source, and permissive use. E.g., what if I need to mark my GIMP photo or my local SDXL output? And you can’t upload a database of all the existing media in the world.

One commonly implemented solution is proof of funds: many restricted services require a payment method, for example to prevent kids without a credit card from signing up. Perhaps cost itself could be the solution, since writing to the blockchain already costs gas. This would deter large-scale abuse by most people with finite funds, but would not prevent one-off misuses, or a large actor from spending enough to defeat it.
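As a minimal sketch of the proof-of-funds idea, the publish function could require a fee on top of gas. The PUBLISH_FEE figure below is an arbitrary placeholder, not a recommendation:

// Sketch: fee-gated variant of publishMetadata, added inside the contract above.
// 0.001 ether is an arbitrary placeholder value.
uint256 public constant PUBLISH_FEE = 0.001 ether;

function publishMetadataPaid(bytes32 _hash, string memory _jsonData) public payable {
   require(msg.value >= PUBLISH_FEE, "insufficient fee");
   metadataMap[_hash] = Metadata(_hash, _jsonData);
   emit MetadataPublished(_hash, _jsonData);
}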

Another option is to only allow certain caller addresses to use it, i.e. not open to the public but restricted to OEMs and AI companies. But that cuts out small creators, and may lead to a fractured ecosystem, e.g. an Apple implementation existing while the Android one is missing.
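A sketch of that gating, assuming a simple owner-managed allowlist added to the contract above (the governance model here is purely illustrative):

// Sketch: allowlist-gated variant. `owner` and the allowlist management
// are illustrative; real governance would be more involved.
address public owner = msg.sender;
mapping(address => bool) public approvedPublishers; // e.g. OEMs, AI companies

function setApproved(address publisher, bool approved) public {
   require(msg.sender == owner, "only owner");
   approvedPublishers[publisher] = approved;
}

function publishMetadataGated(bytes32 _hash, string memory _jsonData) public {
   require(approvedPublishers[msg.sender], "caller not approved");
   metadataMap[_hash] = Metadata(_hash, _jsonData);
   emit MetadataPublished(_hash, _jsonData);
}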

Multiple Attestations: Perhaps one hash could have multiple attestations attached to it (a minimal sketch of the storage change follows). But how do you incentivize that without it devolving into a money game? The first attestation might be real, followed by 100 fakes; or the first might be fake, with 100 rebuttals. The incentives themselves can produce false attestations: if the first (fake) attestation offers 100 ETH for supporting attestations, people will gobble it up even though it’s rubbish.
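Storage-wise the change is small; the hard part is the incentive design, not the code. Assuming the same Metadata struct as above:

// Sketch: allow multiple attestations per hash.
// Nothing here addresses incentives or ranking – that is the unsolved part.
mapping(bytes32 => Metadata[]) private attestations;

function addAttestation(bytes32 _hash, string memory _jsonData) public {
   attestations[_hash].push(Metadata(_hash, _jsonData));
   emit MetadataPublished(_hash, _jsonData);
}

function attestationCount(bytes32 _hash) public view returns (uint256) {
   return attestations[_hash].length;
}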

Screenshots / Modification

Media isn’t always shared directly; it gets modified in any number of ways (re-encoding, screenshots, etc.), and you can’t verify the derivatives or the actual “content”. In a sense this is “as intended”, because the protocol verifies the file itself rather than the content of the image. But the whole point was to verify the “content” by verifying the file.

Example: someone posts a copy of your photo tagged as “AI / fake”, but you have the original from your camera. How can you dispute it? That’s why it’s important to attest “at image creation”, but that also can’t be enforced.

Sociologically, bias is real, so relying on this too heavily is risky; and if someone takes a genuinely wacky photo, people will still say “no, that has to be fake”.

Cost

Unlike local CPU time, dApps cost gas to run. Publishing your whole Instagram feed of 100 pics a day can get costly unless each write is sub-1c.
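A rough back-of-envelope, with every number illustrative rather than measured: storing a 32-byte hash plus a ~100-byte JSON string might cost on the order of 100,000 gas. At a 20 gwei gas price that is 100,000 × 20 gwei = 0.002 ETH per image, or about $6 at $3,000/ETH, roughly $600/day for that 100-pic feed. Sub-1c writes would require something like a cheap L2 or sidechain, or drastically smaller records.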

Adoption / Integration

Unless AI tools enforce this, using a lookup to determine whether something is AI-generated isn’t very useful. Adobe has been interested in this area, but can we get all the tool vendors on board? What about running Stable Diffusion on your own computer, or new implementations? With open-source code and weights there’s no enforcement.

… and more

  • This problem can be somewhat solved just by using signing (public / private keys) or watermarking (steganography), which isn’t widely adopted; although the novelty here is distributed lookup, perhaps embedding is a better way to go (see the sketch after this list).
  • Fuzzy lookup is hard without tools like a private node and a private search service, which can erode trust versus a pure-dApp implementation.
  • This approach is super simplified and can likely be defeated in a number of ways, such as bad actors writing their own copyright on your AI-generated images without you auditing the code, etc.
  • May be 
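To illustrate the signing alternative from the first bullet: a creator signs the image hash off-chain with their private key, and anyone who knows the creator’s address can verify it. The sketch below uses Solidity’s built-in ecrecover for consistency with the rest of this document, though verification could just as well happen entirely off-chain with no ledger at all.

// SPDX-License-Identifier: Unlicense
pragma solidity ^0.8.0;

// Sketch: signature-based attestation check. Expects the standard 65-byte
// Ethereum (r, s, v) signature over the image hash.
contract SignatureCheck {
   function recoverSigner(bytes32 imageHash, bytes memory signature) public pure returns (address) {
      require(signature.length == 65, "bad signature length");
      bytes32 r;
      bytes32 s;
      uint8 v;
      assembly {
         r := mload(add(signature, 32))
         s := mload(add(signature, 64))
         v := byte(0, mload(add(signature, 96)))
      }
      // returns the address whose key produced the signature;
      // compare it against the creator's known address
      return ecrecover(imageHash, v, r, s);
   }
}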


Summary

Although crypto has fallen out of trend in favor of more accessible and visible AI tools, the power behind decentralized ledgers could still be a key technology for preventing unwarranted use of AI tools.

Unfortunately, because of the issues described above, this solution is unlikely to be useful as described here.

Everything here is open-sourced under the Unlicense: open, free-to-share, permissive ideas. Please take it and run with it.