Ask HN: Options for distributed peer-to-peer image datasets?

8 points by lovelearning 5 years ago

I want to start an image dataset that has wide uses and can solve some unsolved problems.

Requirements:

#1 I need others to contribute personal photos and videos.

#2 Since both the number and the file sizes of relevant photos and videos are likely to be high, I think an architecture where photos/videos and their annotations stay on contributors' own machines is better than expecting people to upload GBs of data to some central location.

#3 I'd like contributors to always retain access control of their photos and videos - they can revoke subsets of their files from the dataset at any point.

#4 It'll probably also require an annotation tool that can distribute annotation tasks to volunteers. The photos and videos would still remain on local machines; only temporary copies with limited access would be uploaded centrally, and these would be deleted once annotated.
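
To make #2 and #3 concrete, here is a minimal sketch of a per-contributor manifest, assuming files are identified by a SHA-256 hash of their bytes. The `LocalManifest` class and its method names are hypothetical, not taken from any existing tool. Only the list returned by `shared()` would be advertised to the dataset; the files themselves stay where they are.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LocalManifest:
    """Hypothetical per-contributor index; the files never leave the machine."""
    contributor: str
    entries: dict = field(default_factory=dict)  # content hash -> local path
    revoked: set = field(default_factory=set)    # hashes withdrawn from the dataset

    def add(self, path: str, data: bytes) -> str:
        """Register a local file and return its content hash."""
        digest = hashlib.sha256(data).hexdigest()
        self.entries[digest] = path
        self.revoked.discard(digest)  # re-adding a file shares it again
        return digest

    def revoke(self, digest: str) -> None:
        """Withdraw a file from the dataset without deleting it locally."""
        if digest in self.entries:
            self.revoked.add(digest)

    def shared(self) -> list:
        """Hashes currently offered to the dataset, in a stable order."""
        return sorted(d for d in self.entries if d not in self.revoked)


manifest = LocalManifest("alice")
beach = manifest.add("photos/beach.jpg", b"fake beach bytes")
park = manifest.add("photos/park.jpg", b"fake park bytes")
manifest.revoke(beach)
print(manifest.shared() == [park])
```

Revoking here only changes what the contributor's own machine offers from now on; it says nothing about copies someone else may already hold.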

Questions:

Does some software to do all this - or some of it - already exist? If not, is something like IPFS a good enough storage solution for these requirements? Any other suggestions?
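
For context on the IPFS question: IPFS addresses data by a hash of its content rather than by location. The toy below illustrates that idea only. It is not the IPFS API, and real IPFS identifiers (CIDs) are not plain SHA-256 hex strings; `ContentStore` and its methods are made up for this sketch.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (an illustration, not the IPFS API)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        """Store data under the hash of its own bytes and return that address."""
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Fetch data and verify that it really matches the address."""
        data = self._blocks[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def drop(self, address: str) -> None:
        """Remove the local copy only."""
        self._blocks.pop(address, None)


alice, bob = ContentStore(), ContentStore()
addr = alice.put(b"fake photo bytes")
bob.put(alice.get(addr))   # bob fetches and keeps a copy under the same address
alice.drop(addr)           # alice removes her copy
print(bob.get(addr) == b"fake photo bytes")
```

The last lines show why this matters for #3: once another node holds a copy, dropping the original does not remove it, so content addressing on its own does not give revocation.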

I'm not concerned about issues with distributed learning/compute for now. If the dataset is good enough, eventually some solution will probably emerge.

gus_massa 5 years ago

> #3 I'd like contributors to always retain access control of their photos and videos - they can revoke subsets of their files from the dataset at any point.

I don't understand. If the dataset can be used by other people (you, the owner, and other people) then you can't force everyone to delete the photos. They can pretend that they deleted the photos and simply avoid using them publicly from then on. This is similar to the Twitter API, which makes you pinky-promise to delete deleted tweets. But there is no delete button on the Internet, only a hide button.

  • lovelearning 5 years ago

    The access control is non-negotiable. Compute solutions will have to be distributed in a way that works with it. For example, the first few layers of a neural network can be computed locally on contributor machines, which then transfer only the results of those calculations.
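
    A rough sketch of that split, in plain Python with tiny random dense layers standing in for a real network. All function names and layer sizes here are made up for illustration. Only the output of `local_forward` would cross the network; whether those intermediate values leak too much about the original image is a separate question this sketch does not address.

```python
import random


def make_layer(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def dense_relu(weights, x):
    """One fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def local_forward(pixels, local_layers):
    """Runs on the contributor's machine; raw pixels are never sent."""
    h = pixels
    for weights in local_layers:
        h = dense_relu(weights, h)
    return h


def remote_forward(activations, remote_layers):
    """Runs wherever the rest of the model lives; sees activations only."""
    h = activations
    for weights in remote_layers:
        h = dense_relu(weights, h)
    return h


rng = random.Random(0)
local_layers = [make_layer(8, 4, rng), make_layer(4, 3, rng)]
remote_layers = [make_layer(3, 2, rng)]

pixels = [0.1 * i for i in range(8)]                # stand-in for an image
activations = local_forward(pixels, local_layers)   # this is all that gets sent
output = remote_forward(activations, remote_layers)
print(len(activations), len(output))
```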

stefkors 5 years ago

Perhaps you can do something with the dat:// protocol?

  • lovelearning 5 years ago

    Hadn't heard of it before, but its goals sound like it could help me with my problem. Thank you very much for telling me about dat!