There are companies making millions of dollars off your personal information in exchange for giving you an easy way to share data with your friends. Facebook, Twitter and the rest of these networks are all centralized services. You give them your data, they keep a copy, and hopefully they share it only with the people you told them to share it with. The funny thing is that for decades we have had email, a federated service that gives us a less structured way to share data with our friends. With email we could send pictures to our friends. With Facebook we get the power of crowd-sourcing: our friends can tag and comment on our pictures. Surely there must be a way for us to do this in a federated way, without requiring that we hand our data over to a middle-man.
There have been attempts at building a Federated Social Network. Diaspora is one such attempt that drew a lot of early buzz and funding. When I saw it I thought "thank goodness someone is solving that problem". I must say that one year on it appears to me as though they are not addressing the real problem. I was thoroughly disappointed with the result of their work: a Rails-based clone of Facebook. In my opinion what is needed here is a new federated protocol that can be easily extended with new content types and that protects access to data with private keys. On top of that, new clients (web, desktop, mobile, whatever) can be built.
The following is a brain dump of one way of doing this.
Every user would have their own node or share a node with a group of people that they trust on a server of their choice. A working title for this project could be "A League of Nodes" but hopefully we'll come up with something better than that.
Basic infrastructure
Very few systems are as efficient as Git when it comes to synchronizing data, so it will be employed for sending and receiving updates.
Data will be stored in files named by UUID, similar to the way that Git stores its data in .git/objects, but we will store these objects in the working tree. The files will be either JSON strings or binary data. The one required JSON field will be type. Creation date and author can be extracted from the Git logs.
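To make the storage idea concrete, here is a minimal sketch of writing one JSON document into the working tree. The function names (write_object, read_object) and the flat directory layout are my assumptions for illustration, not part of any existing tool; the only rule taken from the description above is that every JSON document must carry a type field.

```python
import json
import tempfile
import uuid
from pathlib import Path


def write_object(tree: Path, doc: dict) -> Path:
    """Store one JSON document in the working tree under a fresh UUID filename."""
    if "type" not in doc:
        # "type" is the one field every JSON document is required to have.
        raise ValueError("document is missing the required 'type' field")
    path = tree / str(uuid.uuid4())  # flat layout: an assumption of this sketch
    path.write_text(json.dumps(doc, sort_keys=True), encoding="utf-8")
    return path


def read_object(path: Path) -> dict:
    """Load a JSON document back from its UUID-named file."""
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stored = write_object(Path(tmp), {"type": "update", "body": "First post from Fred"})
        print(stored.name, read_object(stored))
```

Committing the resulting file with Git would then supply the creation date and author, so neither needs to live in the document itself.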
A NoSQL document store such as CouchDB or MongoDB would be used to store the files and the JSON documents. At this point, if you are familiar with CouchDB and its awesome built-in synchronization capabilities, you might be questioning my sanity about implementing a new synchronization protocol. The problem with CouchDB's synchronization is that if we want to share with another user they would automatically get all of our friends' data as well. (There might be a way around this; please leave me a comment if you know of one.) When an update is received from another user, it is merged into your database: the objects it touches are updated with the latest content. To prevent tomfoolery, every object's UUID would be prefixed with the unique UUID of the user who made the update, so people could not clobber or update other users' objects in your database.
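The prefixing rule can be sketched in a few lines. This is only an illustration under my own assumptions: the database is a plain dict, keys look like "<user UUID>/<object UUID>" (the "/" separator is invented here), and the sender's identity has already been verified by some other means, such as their key.

```python
import uuid


def merge_update(db: dict, sender_id: str, incoming: dict) -> list:
    """Merge documents from one sender into db.

    Keys are expected to look like '<sender_id>/<object_id>'. Returns the
    keys that were rejected for falling outside the sender's own namespace.
    """
    prefix = sender_id + "/"
    rejected = []
    for key, doc in incoming.items():
        if not key.startswith(prefix):
            # The sender is trying to write to someone else's object.
            rejected.append(key)
            continue
        db[key] = doc  # latest content from the owner wins
    return rejected


fred, jane = str(uuid.uuid4()), str(uuid.uuid4())
fred_post = f"{fred}/{uuid.uuid4()}"
jane_post = f"{jane}/{uuid.uuid4()}"
db = {fred_post: {"type": "update", "body": "First post from Fred"}}

# Jane sends one legitimate post and one attempt to overwrite Fred's post.
rejected = merge_update(db, jane, {
    jane_post: {"type": "update", "body": "Jane says: here I am"},
    fred_post: {"type": "update", "body": "clobbered by Jane"},
})
print(len(db), rejected == [fred_post])
```

Jane's own post is merged, while her attempt to overwrite Fred's object is reported back and his original content is left alone.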
A Twitter timeline or Facebook status listing is a single query:
> db.content.find({'type': 'update'}).sort({'date': -1})
{ "_id" : ObjectId("4de3d4a4475e87b4e7ce60d1"), "type" : "update", "user" : "Dan", "body" : "Dan welcomes everyone else", "date" : "Tue May 31 2011 02:32:20 GMT+0900 (KST)" }
{ "_id" : ObjectId("4de3d3f9668d1f97b29312ad"), "type" : "update", "user" : "jane", "body" : "Jane says: here I am", "date" : "Tue May 31 2011 02:29:29 GMT+0900 (KST)" }
{ "_id" : ObjectId("4de3d3db668d1f97b29312ac"), "type" : "update", "user" : "fred", "body" : "First post from Fred", "date" : "Tue May 31 2011 02:28:59 GMT+0900 (KST)" }
Your Facebook photo albums are a little more work on the client (styling and such) but not too much:
> db.content.find({'type': {'$in': ['photo', 'photo-tag', 'photo-comment']}}).sort({'date': -1})
{ "_id" : ObjectId("4de3d746475e87b4e7ce60d4"), "type" : "photo-tag", "user" : "Dan", "photo" : ObjectId("4de3d6f1475e87b4e7ce60d2"), "date" : "Tue May 31 2011 02:43:34 GMT+0900 (KST)", "x" : 20, "y" : 20, "body" : "There I am!" }
{ "_id" : ObjectId("4de3d721475e87b4e7ce60d3"), "type" : "photo-comment", "user" : "Dan", "photo" : ObjectId("4de3d6f1475e87b4e7ce60d2"), "date" : "Tue May 31 2011 02:42:57 GMT+0900 (KST)", "body" : "Nice photo if I do say so myself" }
{ "_id" : ObjectId("4de3d6f1475e87b4e7ce60d2"), "type" : "photo", "user" : "Dan", "photo" : "pointer to file in GridFS", "date" : "Tue May 31 2011 02:42:09 GMT+0900 (KST)" }
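The "little more work on the client" is mostly grouping: tags and comments point at their photo through the photo field, so a client can fold them together before styling anything. A rough sketch in Python, using plain dicts and string ids in place of the ObjectId values above (build_albums is a made-up name for this example):

```python
def build_albums(docs):
    """Attach photo-tag and photo-comment entries to the photo they reference."""
    albums = {
        d["_id"]: {"photo": d, "tags": [], "comments": []}
        for d in docs
        if d["type"] == "photo"
    }
    buckets = {"photo-tag": "tags", "photo-comment": "comments"}
    for d in docs:
        bucket = buckets.get(d["type"])
        # In tags and comments, "photo" holds the _id of the parent photo.
        if bucket and d.get("photo") in albums:
            albums[d["photo"]][bucket].append(d)
    return list(albums.values())


docs = [
    {"_id": "4de3d746475e87b4e7ce60d4", "type": "photo-tag", "user": "Dan",
     "photo": "4de3d6f1475e87b4e7ce60d2", "x": 20, "y": 20, "body": "There I am!"},
    {"_id": "4de3d721475e87b4e7ce60d3", "type": "photo-comment", "user": "Dan",
     "photo": "4de3d6f1475e87b4e7ce60d2", "body": "Nice photo if I do say so myself"},
    {"_id": "4de3d6f1475e87b4e7ce60d2", "type": "photo", "user": "Dan",
     "photo": "pointer to file in GridFS"},
]
albums = build_albums(docs)
print(len(albums), len(albums[0]["tags"]), len(albums[0]["comments"]))  # prints: 1 1 1
```

Tags or comments whose photo has not arrived yet are simply left out of this sketch; a real client would probably hold on to them until the photo shows up in a later sync.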
Another thing that is great about this system is that it can handle new content types that nobody had imagined when the system was created. In the same way that web browsers handled unknown tags during their Cambrian Explosion, unknown content types can either be ignored or be replaced with a little blurb explaining that the client doesn't know how to handle them. Clients could even give users the option to view the raw JSON of an entry to see if there is any useful information therein.
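One way a client might do this is a lookup table of renderers keyed by type, with a fallback for anything it has never heard of. The names below (RENDERERS, render, show_raw) are invented for this sketch, and the "poll" type is just a stand-in for some future content type.

```python
import json

# Renderers this hypothetical client knows about, keyed by the "type" field.
RENDERERS = {
    "update": lambda doc: f'{doc["user"]}: {doc["body"]}',
}


def render(doc: dict, show_raw: bool = False) -> str:
    """Render a document, degrading gracefully for unknown content types."""
    renderer = RENDERERS.get(doc.get("type"))
    if renderer is not None:
        return renderer(doc)
    if show_raw:
        # Let the curious user inspect the raw JSON of the entry.
        return json.dumps(doc, indent=2, sort_keys=True)
    return f"[This client cannot display entries of type {doc.get('type')!r}]"


print(render({"type": "update", "user": "fred", "body": "First post from Fred"}))
print(render({"type": "poll", "question": "Lunch?"}))
print(render({"type": "poll", "question": "Lunch?"}, show_raw=True))
```

Supporting a new content type then means adding one entry to the table, and older clients keep working without it.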
Some problems that need addressing:
- Git is all-or-nothing sync. If I give you access to my repository, there is no way for me to limit what content you can get out of it; you have access to the whole thing. If we want finer-grained permissions, we would either have to build a repo per user or group that we want to share with and then manage the nightmare of controlling how data is exported to those repos, or come up with our own syncing protocol. I still think Git is a good tool to start with for experimentation, since it keeps the synchronization layer separate from the data storage layer, which allows us to experiment with different data stores.
- How do you search for friends if you have a federated system? I guess you would actually need to (gasp!) know the people you are sharing data with.
- How do you make a friend request? Maybe build a mobile app that exchanges public keys and repository URLs. Directly emailing someone could also work. If we end up implementing our own synchronization protocol you could allow anonymous messages (that could easily lead to SPAM-city).
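On the first problem, the per-recipient export step might look something like the sketch below. It assumes every document carries an audience list naming who may receive it. That field is purely hypothetical and is not part of the design described above; it is only there to show what "controlling how to export data" could mean in practice.

```python
def export_sets(docs: dict, recipients: list) -> dict:
    """Map each recipient to the ids of the objects they are allowed to receive."""
    exports = {r: [] for r in recipients}
    for object_id, doc in docs.items():
        # "audience" is a hypothetical field; documents without it go to no one.
        for recipient in doc.get("audience", []):
            if recipient in exports:
                exports[recipient].append(object_id)
    return exports


docs = {
    "obj-1": {"type": "update", "body": "Hello friends", "audience": ["jane", "fred"]},
    "obj-2": {"type": "photo", "audience": ["jane"]},
    "obj-3": {"type": "update", "body": "Draft"},  # no audience: shared with no one
}
exports = export_sets(docs, ["jane", "fred"])
print(exports)
```

Each list would then be copied into the repository that the corresponding friend pulls from. Defaulting to "shared with no one" when the field is missing keeps mistakes on the private side.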
This is, of course, an explanation of the technical implementation of a truly federated social network. The actual implementation would need to be much more user-friendly and hide these technical details from the user.
See part 2.