Ask HN: Create embeddings efficiently for an AI notes app with E2EE

3 points by satyajeetjadhav 15 days ago

Hi,

I am building a notes application that automatically finds related notes - thinkdeli.com. I want to implement end-to-end encryption sometime soon.

To find related notes, I create embeddings, add them to an index, and find the nearest neighbors.
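For anyone unfamiliar with that step, here is a rough sketch of the lookup, assuming a brute-force scan with cosine similarity rather than whatever index the app actually uses. The `Note` type, the `nearestNotes` helper, and the 3-dimensional vectors are made up for illustration; real sentence embeddings typically have a few hundred dimensions.

```typescript
// Hypothetical shape for an indexed note: an id plus its embedding vector.
type Note = { id: string; embedding: number[] };

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest neighbors: score every note, sort descending, keep k ids.
function nearestNotes(query: number[], index: Note[], k: number): string[] {
  return index
    .map((n) => ({ id: n.id, score: cosineSimilarity(query, n.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((n) => n.id);
}

// Toy data: invented 3-d vectors standing in for real embeddings.
const index: Note[] = [
  { id: "groceries", embedding: [1, 0, 0] },
  { id: "recipes", embedding: [0.6, 0.8, 0] },
  { id: "taxes", embedding: [0, 0, 1] },
];

const related = nearestNotes([0.8, 0.6, 0], index, 2);
console.log(related); // ["recipes", "groceries"]
```

A linear scan like this is probably fine for a single user's notes; an approximate index only starts to matter at much larger collection sizes.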

I am creating embeddings of the user's notes locally using transformers.js. But setting up the pipeline takes up a lot of memory (over 400 MB in Chrome). This makes it somewhat impractical to use on older devices.

Is there a more efficient way to create embeddings locally?

Creating embeddings via an API would be more efficient, but that would mean sending users' unencrypted notes to the service. Edit - What I mean is this. The user's notes must be unencrypted and readable as plain text on the server to create embeddings. This defeats the purpose of end-to-end encryption.

I would appreciate any pointers. Thanks a lot!

Someone 13 days ago

> The user's notes must be unencrypted and readable as plain text on the server to create embeddings.

Consult a security expert before doing this, but here’s an idea: encrypt each word of the text, send the encrypted tokens over the wire, and then use an embedder trained on text encrypted with that method.

If you use an asymmetric encryption method, you could even throw away the private key.

The result would still be a substitution cipher on words, so it would not resist frequency analysis. It also won't help that, if your users manage to extract the key, they can encrypt text themselves to figure out the mapping. But it would protect against people ‘accidentally’ looking at your users' text.

Periodically switching the encryption key wouldn’t be that hard.
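To make the frequency-analysis caveat concrete, here is a small sketch. It assumes a toy stand-in for the cipher (XOR with a repeating key, hex-encoded), which is not secure and is only there because any deterministic per-word scheme behaves the same way for this purpose. All function names and keys are invented for illustration.

```typescript
// Toy deterministic per-word "encryption": XOR each UTF-16 code unit with a
// repeating key and hex-encode it. NOT secure; it only stands in for any
// scheme that maps the same word to the same token. Key must be non-empty.
function toyEncryptWord(word: string, key: string): string {
  let out = "";
  for (let i = 0; i < word.length; i++) {
    const unit = word.charCodeAt(i) ^ key.charCodeAt(i % key.length);
    out += ("0000" + unit.toString(16)).slice(-4); // fixed width keeps it invertible
  }
  return out;
}

// Lowercase and split on whitespace; a real tokenizer would do more.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

function encryptText(text: string, key: string): string[] {
  return tokenize(text).map((w) => toyEncryptWord(w, key));
}

// Occurrence counts sorted in descending order, ignoring which token is which.
function frequencyProfile(tokens: string[]): number[] {
  const counts: Record<string, number> = Object.create(null);
  for (const t of tokens) counts[t] = (counts[t] || 0) + 1;
  return Object.keys(counts)
    .map((k) => counts[k])
    .sort((a, b) => b - a);
}

const note = "the cat saw the dog and the dog ran";
const plainProfile = frequencyProfile(tokenize(note));
const profileKeyA = frequencyProfile(encryptText(note, "alpha"));
const profileKeyB = frequencyProfile(encryptText(note, "bravo"));

console.log(plainProfile); // [3, 2, 1, 1, 1, 1]
console.log(profileKeyA);  // [3, 2, 1, 1, 1, 1]
console.log(profileKeyB);  // [3, 2, 1, 1, 1, 1]
```

Switching the key changes every token, but the count profile is identical each time, so the most frequent token can still be guessed to be a common word like "the".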

pmtolk 15 days ago
  • satyajeetjadhav 15 days ago

    Sorry, I should have phrased the last part of the problem better. I already use https.

    The user's notes must be unencrypted and readable as plain text on the server to create embeddings. This defeats the purpose of end-to-end encryption.

innethread 14 days ago

I’m not sure if there are implementations for browsers, but look into embeddings with homomorphic encryption.