Inspiration

Deepfake attacks are on the rise, an unfortunate byproduct of the rapid proliferation and advancement of AI voice technology. Voice phishing, or 'vishing' (social engineering attacks that use the voice), is being supercharged by generative AI and is proving effective at extracting billions of dollars from the global economy.

A deepfake Elon Musk has appeared in thousands of inauthentic ads and has been dubbed the internet's biggest scammer.

And it's not only celebrities being mimicked by this tech: nefarious actors are cloning the voices of private individuals to target major corporations as well as families, extracting money through meticulously coordinated conference calls or fake ransom requests. Last May, a finance worker in Hong Kong was tricked into paying out $25 million after his CFO and other colleagues were deepfaked. And how many parents would stay calm on receiving a phone call apparently from their child in distress, along with a chance to resolve it with a quick wire transfer? The FTC has already issued a warning about this, but how many families' lives will be ruined before we find a way to combat this alarming rise in cybercrime?

With much of the technology already open source, and Huang's law (the observation that GPU performance for AI is more than doubling every two years) suggesting that AI will only keep accelerating, this threat clearly needs to be addressed urgently. Law enforcement is not well equipped to handle frequent cross-border attacks, and the level of intrusion needed to monitor such threats at scale is beyond any human workforce. In any case, mass surveillance has proved controversial enough that a move to expand it would likely face resistance. However, a suite of technologies offers an opportunity to build a credibly neutral foundation for a new, privacy-preserving internet of security, powered by AI and built on next-generation cryptography.

The innovation of blockchains has so far been largely limited to financial use cases. However, the robustness required for financial trading has stress-tested the technology and helped advance a number of cryptographic techniques that are invaluable for a new network architecture that enshrines provenance and data sovereignty. These components are needed to ensure that voice data is properly attributed within a network, and they are made possible in part by cryptographic Zero-knowledge (ZK) Proofs and Trusted Execution Environments (TEEs). ZK Proofs provide privacy: they let a claim about some information be verified without making the information itself public. For example, a bouncer needs to confirm you are over 18 before letting you into a club, but rather than handing over your entire ID, you share an indisputable proof that you are over 18 and nothing more. TEEs are secure, isolated areas within a processor that protect sensitive data and computation, for example when processing payments or storing biometrics.
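The bouncer example describes a range proof ("my birth date is more than 18 years ago"), which needs more machinery than fits here. The core idea is easier to see in the classic Schnorr identification protocol, made non-interactive with the Fiat-Shamir heuristic: the prover convinces a verifier that it knows a secret x behind a public value y = G^x mod P without revealing x. This is a toy sketch with deliberately tiny parameters chosen for illustration; it is not secure and is not part of VoiceVault, it only shows the commit, challenge, response pattern.

```python
import hashlib
import secrets

# Toy parameters, far too small to be secure: P = 2*Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q.
P = 2039
Q = 1019
G = 4


def challenge(y: int, t: int) -> int:
    """Fiat-Shamir: replace the verifier's random challenge with a hash of the transcript."""
    digest = hashlib.sha256(f"{P}|{G}|{y}|{t}".encode()).digest()
    return int.from_bytes(digest, "big") % Q


def prove(x: int) -> tuple[int, int, int]:
    """Return (y, t, s): a proof of knowing x with y = G^x mod P that does not reveal x."""
    y = pow(G, x, P)          # public value
    r = secrets.randbelow(Q)  # fresh secret nonce, never reused
    t = pow(G, r, P)          # commitment
    c = challenge(y, t)
    s = (r + c * x) % Q       # response: the random r masks x
    return y, t, s


def verify(y: int, t: int, s: int) -> bool:
    """Accept iff G^s == t * y^c (mod P); an honest proof always passes this check."""
    c = challenge(y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

For example, `verify(*prove(777))` returns `True`, while the verifier only ever sees y, t and s, never 777 itself.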

The Internet of Security incorporates the latest from Web3 and proposes ZK Browsers for privacy-preserving usage.

Once a foundational architecture is established, AI agents can get to work policing the network effectively. With personal data protected by TEEs and privacy-preserving ZK Proofs doing their job, we could operate on a network free of deepfake attacks, without governments or unaccountable corporations monitoring virtually every action. The fundamental challenge is not actually technological but cultural: getting the world to change its behaviour so that networks like this have a chance to be adopted. Internet users have remained remarkably reluctant to move off the social networks that have consistently profited from our data and filled our feeds with advertising. Could the threat of billions of dollars in further losses from deepfake attacks be the tipping point that persuades the market to adopt a new network architecture? As AI continues to ratchet up, we may have little choice.

What it does

VoiceVault enables anyone to take custody of their own voice biometric data by encrypting it and storing it securely on decentralized cloud infrastructure. This interactive proof of concept features an AI agent that supervises the user journey:

  • User visits agent at voicevault.netlify.app
  • User makes a voice recording
  • Voice audio encrypted
  • Encrypted data stored on IPFS via Pinata
  • Reference Content Identifier (CID) deployed in VoiceVault NFT contract
  • User registers IP Asset

VoiceVault PoC: The NFT is issued as a Soulbound Token (SBT) on Story, which also registers the voice recording as an IP Asset.
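The registration steps can be modelled in a few lines. This is a toy, in-memory sketch under stated assumptions: encryption has already happened inside the TEE, a Python dict stands in for IPFS via Pinata, a SHA-256 hex digest stands in for a real CID (actual CIDs encode a multihash of the chunked file differently), and the hypothetical `SoulboundRegistry` class stands in for the VoiceVault NFT contract on Story (IP Asset registration is not modelled). It shows two properties: only ciphertext is pinned and only its CID is recorded, and a soulbound token can be minted but not transferred.

```python
import hashlib
from dataclasses import dataclass, field


def content_address(data: bytes) -> str:
    """Simplified stand-in for an IPFS CID (real CIDs encode a multihash differently)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class SoulboundRegistry:
    """Hypothetical toy model of a soulbound NFT contract: mintable, never transferable."""
    owners: dict[int, str] = field(default_factory=dict)  # token_id -> wallet address
    cids: dict[int, str] = field(default_factory=dict)    # token_id -> CID of the ciphertext

    def mint(self, to: str, cid: str) -> int:
        token_id = len(self.owners) + 1
        self.owners[token_id] = to
        self.cids[token_id] = cid
        return token_id

    def transfer(self, token_id: int, to: str) -> None:
        raise PermissionError("soulbound tokens cannot be transferred")


def register_voice(encrypted_audio: bytes, wallet: str,
                   ipfs: dict[str, bytes], registry: SoulboundRegistry) -> int:
    """Pin only the ciphertext, then record only its CID against the user's wallet."""
    cid = content_address(encrypted_audio)
    ipfs[cid] = encrypted_audio  # stand-in for pinning to IPFS via Pinata
    return registry.mint(wallet, cid)
```

Binding the token to a single wallet is what makes it usable as a proof of ownership of the voice data: if it could be transferred, the right to decrypt would move with it.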

The NFT enables decryption of the voice biometric data when the NFT is held in the user's cryptographic wallet:

  • User visits agent at voicevault.netlify.app
  • If user is NFT holder, CID is extracted
  • CID sent to Phala TEE Cloud for decryption with wallet signature
  • Decrypted audio plays back
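The unlock steps amount to an access check followed by decryption. A minimal sketch, assuming the wallet signature has already been verified and the signer's address recovered; `holders` (CID to holding wallet) and `fetch` (CID to ciphertext) are hypothetical stand-ins for the NFT contract lookup and IPFS retrieval, and `unlock_voice` and `AccessDenied` are illustrative names, not VoiceVault's code.

```python
from typing import Callable, Mapping

from cryptography.fernet import Fernet


class AccessDenied(Exception):
    """Raised when the requesting wallet does not hold the token for a CID."""


def unlock_voice(cid: str, signer: str, holders: Mapping[str, str],
                 fetch: Callable[[str], bytes], key: bytes) -> bytes:
    """Decrypt the recording stored under `cid` only for the wallet holding its token.

    signer:  address already recovered from a verified wallet signature
    holders: hypothetical CID -> holder-address lookup (the NFT contract)
    fetch:   hypothetical CID -> ciphertext retrieval (IPFS)
    key:     URL-safe base64 Fernet key derived inside the TEE
    """
    holder = holders.get(cid)
    # Ethereum addresses are compared case-insensitively (checksum casing varies)
    if holder is None or holder.lower() != signer.lower():
        raise AccessDenied(f"{signer} does not hold the token for {cid}")
    return Fernet(key).decrypt(fetch(cid))
```

Keeping the check and the decryption in the same function means the key is only ever used after ownership has been confirmed.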

How we built it

The VoiceVault AI agent is built with Groq and primed with this DevPost write-up as its system prompt. We use Phala's network of TEEs to handle encryption of the voice data, which is encrypted with Fernet from the Python cryptography library and stored on IPFS. The NFTs are minted on Story Protocol and also registered as IP Assets, and the wallet signature is verified with eth_account. The front end is written in React with TypeScript and Three.js, built with the help of Replit.

Derive encryption_key from the TEE

    from dstack_sdk import AsyncTappdClient

    async def get_encryption_key() -> str:
        client = AsyncTappdClient()
        # Derive a key inside the TEE for this path string
        derived = await client.derive_key('salt for encryption key')
        key_bytes = derived.toBytes(32)  # Get 32 bytes for encryption key
        return key_bytes.hex()
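Why can the key be re-derived for decryption without ever being stored? A common pattern is hierarchical key derivation: a root secret that never leaves the enclave is combined with a path string to produce a per-purpose key. This stdlib-only sketch uses HKDF (RFC 5869) to illustrate the idea; it is an analogy, not a description of how dstack actually derives keys, and `ROOT_SECRET` and `derive_app_key` are hypothetical names.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869): extract a pseudorandom key, then expand it to `length` bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA256 can output at most 8160 bytes")
    prk = hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical placeholder for a secret that, in a real TEE, never leaves the enclave
ROOT_SECRET = bytes(32)


def derive_app_key(path: str) -> str:
    """Same path, same key: nothing needs to be stored to decrypt later."""
    return hkdf_sha256(ROOT_SECRET, salt=b"", info=path.encode()).hex()
```

Calling `derive_app_key('salt for encryption key')` twice yields the same 64-character hex string, while any other path yields an unrelated key.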

Fernet cipher for Audio Encryption

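    import base64

    from cryptography.fernet import Fernet

    # Fernet expects its 32-byte key as URL-safe base64
    fernet = Fernet(base64.urlsafe_b64encode(bytes.fromhex(encryption_key)))
    encrypted_content = fernet.encrypt(content)  # content: the raw audio bytes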
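Fernet is authenticated encryption (AES-128 in CBC mode plus an HMAC-SHA256 tag over a versioned, timestamped token), so the same key both decrypts the audio and detects tampering. A minimal round-trip sketch; `encrypt_audio` and `decrypt_audio` are hypothetical helper names, and the random key stands in for the hex key returned by the TEE derivation step.

```python
import base64
import os

from cryptography.fernet import Fernet, InvalidToken


def _cipher(encryption_key: str) -> Fernet:
    """Build a Fernet cipher from a 32-byte key given as hex."""
    return Fernet(base64.urlsafe_b64encode(bytes.fromhex(encryption_key)))


def encrypt_audio(encryption_key: str, audio: bytes) -> bytes:
    return _cipher(encryption_key).encrypt(audio)


def decrypt_audio(encryption_key: str, token: bytes) -> bytes:
    # Raises InvalidToken if the ciphertext was altered or the key is wrong
    return _cipher(encryption_key).decrypt(token)


if __name__ == "__main__":
    key = os.urandom(32).hex()  # stand-in for the TEE-derived key
    token = encrypt_audio(key, b"example audio bytes")
    assert decrypt_audio(key, token) == b"example audio bytes"

    raw = bytearray(base64.urlsafe_b64decode(token))
    raw[-1] ^= 0x01  # flip one bit of the HMAC tag
    try:
        decrypt_audio(key, base64.urlsafe_b64encode(bytes(raw)))
    except InvalidToken:
        print("tampered ciphertext rejected")
```

Since the ciphertext pinned to IPFS is public, this integrity check matters as much as confidentiality: a modified file fails to decrypt rather than playing back altered audio.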

Challenges we ran into

Phala has been significantly revamped since we last explored it, moving from Phat Contracts to AI Agent Contracts to Dstack. Our initial idea was to fork the Schrödinger's NFT beta demo from Aplion, a Phat Contract that lets an NFT signal to a wallet that an encrypted file can be decrypted, with the encryption keys stored in a Phala TEE. However, since Phat Contracts have been deprecated, this was no longer feasible. Some time was lost setting up the Schrödinger's NFT, and even finding test tokens was a pain because the PoC-6 network is essentially obsolete. For a while it looked as though we might be limited to minting an NFT on a Polkadot parachain, but once we became aware of the latest developments we went ahead and used the dstack_sdk to interact directly with Story Protocol.

Accomplishments that we're proud of

Getting the Phala implementation running with IPFS and Story, so that voice data can be encrypted and stored, is a good step towards secure, immutable storage that can serve as the backbone of future applications built on this data. The fact that the Phala network has been upgraded numerous times recently is a sign we are on the right track: we didn't set out with any specific agenda to use Phala over any other network, but as the work began it became apparent that it is the optimal solution for leveraging the power of TEEs. This hackathon project establishes a voice biometric data bedrock for future deepfake defence solutions based on cryptographic administration, and it is fully open source under the MIT license.

What we learned

Q: How can you reliably encrypt your voice data if you are accessing everything through a commercial web browser?

A: You probably can't

Even with an entirely encrypted backend, mainstream commercial web browsers are often closed-source, hard to audit and vulnerable to attack. Should we encourage users to access the backend directly? That is only realistic for the most technically savvy users (though with the support of an AI agent the technical barrier can be significantly reduced).

Q: How then can voice data be secured at scale?

A: A Zero-knowledge browser

The web browser is not just our window onto the internet; in the coming years it will become our interface with life itself. As augmented reality tech becomes more wearable and AI improves, those who choose not to wear smart glasses will be at a disadvantage in much of the modern world. This will create brand new attack vectors (not to mention the privacy threat), so a privacy-preserving web browser that lets us interact with our sensitive data and assets is a requirement for the future.

Zuck's Orion: Would you feel secure knowing Mark Zuckerberg has access to a camera feed directly into your personal life?

What's next for VoiceVault

Getting more feedback from the market on what would make VoiceVault viable as a product, and researching voice biometric data more deeply to assess whether individual voiceprints can be reliably picked out. Analyzing potential business models, from vishing defence and device/service verification to a creator economy in which people rent out their voice to AI agents.

Refining the user experience so that users can encrypt and decrypt their voice data without being exposed to a vulnerable web browser (what does a minimum viable product for a ZK Browser look like?), potentially by introducing a physical hardware component or by creating a voice data custody service. Just as many cryptocurrency users opt to keep their crypto on a centralized exchange rather than self-custody it, we cannot expect every user to take custody of their own voice data. This could be the beginning of a competitive marketplace for personal data custodians.

Demo

A non-custodial crypto wallet such as MetaMask is required to try the demo.

Fund your wallet with IP tokens on the Story Aeneid testnet via the faucet.
