Persistent Folders: Or, why ideas don’t matter, and execution does

I’ll start off this post with a somewhat controversial claim: I invented Dropbox.

I’ll show why this claim doesn’t matter later, but for now, I’ll assure you that it’s true.

How many of you out there use Dropbox? If you don’t, you should — it’s an excellent tool. In its free version, it provides you with 2GB of storage “in the cloud”, using a new kind of folder called a “Dropbox”. What distinguishes a Dropbox from other folders on your computer? The following:

  • Every file put in your Dropbox is automatically (and securely) uploaded to Dropbox’s servers, ensuring you have an offsite backup of all data therein.
  • Multiple computers can gain access to a Dropbox, ensuring files are automatically synchronized across computers without having to use complicated version control systems.
  • All files in your Dropbox are versioned, ensuring you can always recover an older version of a file in case you accidentally overwrite a good version.

Dropbox is supported on Windows, Mac OS X, and Linux, and now even has mobile applications, as well. Further, I have a special place in my heart for this service because I started using it almost 2 years ago, and it has acted as a file sharing and project management tool for my own startup’s internal operations at Parse.ly. I was therefore more than ecstatic to discover that this excellent tool and its smart founders had also made it through all of the hurdles necessary to get an early-stage company the financing it needs: they’ve raised over $7 million in financing and have over 3 million users.

But there is another reason I absolutely love Dropbox: because it was my idea. I invented it.

In the summer of 2004, I was really itching to get into Google’s Summer of Code competition. This was the summer I had taken a job working from home as the lead web developer at the Unemployment Action Center of NY. Though the job was great experience — letting me build my first full web application for a real client — I was itching to work on a technically juicy problem, something that affected me in my daily computer use.

And so, I sat down for a day and wrote up a Google Summer of Code proposal for a new system I had invented called Persistent Folders. It wasn’t exactly like Dropbox, but damn close. Even the implementation is close: Dropbox and my system both sync files using rsync, and both use a Python daemon process. The main difference is that since my system was meant to be open source, it did not require the use of a company-maintained service; instead, I proposed that users piggyback existing storage they have via web hosting providers.

Unfortunately, my project wasn’t selected.

Why am I posting this? I recently had a discussion with another engineer after I had discussed some of the technology behind Parse.ly with him. He was surprised at how liberal I was with explaining our internal implementation, architecture, and algorithms. He asked me, “Aren’t you worried that I could steal your idea?”

I responded, “You can steal it all you want; I dare you to try and implement it!” I then explained that to me, ideas don’t matter. I had the idea for a hundred startups that now exist before they started. I know from talking to users and customers of Parse.ly that they had our idea before we implemented it. What matters in software is not an idea, but execution of that idea. Ideas are a dime a dozen.

I began this post with the statement, I invented Dropbox. And now I’m here to tell you that it doesn’t matter one bit, because I never implemented Dropbox. And you can’t own ideas…

If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it. Its peculiar character, too, is that no one possesses the less, because every other possesses the whole of it. He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.
-Thomas Jefferson

Imagine if when I had come up with this idea, I had patented it. Would it really have been fair to force Dropbox to license my technology? To sue them for “infringing” on my patent?

No way — these guys deserve the success they have. How stifling it would have been for me to put any roadblocks in their way because “I had the idea first”! The world is better because Dropbox exists. And, these guys have had — in my opinion — near-flawless execution. So, kudos to them.

Consider the following matrix:

                          Mediocre Idea   Great Idea
No Implementation         🙁              🙁
Mediocre Implementation   🙁              😕
Great Implementation      🙂              😎

A mediocre idea with a good implementation is worth infinitely more than a carefully-guarded, good idea with no implementation. Of course, the best products are both great ideas and great implementations. And I think my proposal for “Persistent Folders” — written three years before Dropbox even started — proves this to me in a very personal way.

Now, I’m certain the Dropbox guys never read my little proposal, because it was only sent to Google and otherwise sat on my hard drive for years, not viewed by anyone. I had a great idea, but no implementation. And the Dropbox guys took a great idea (one they arrived at on their own, I’m sure), and gave it the implementation it deserved.

For those of you who guard your ideas carefully, I’d suggest you stop wasting your time, get off your butt, and focus on actually building stuff. Because if you don’t, someone else will!


For the curious, below is my proposal to Google Summer of Code 2004, unaltered from its original draft. This is just a relic; for now, I’m glad to keep hacking on my own little idea, hoping one day I can look back and say I executed as well as Dropbox did…

PERSISTENT FOLDERS: A New Metaphor for Data Synchronization
May 13, 2004

The Problem

Computer users are more and more finding themselves with a serious problem: the fragmentation of personal data across multiple physical machines, and even multiple operating systems. Users who find it most comfortable to have a desktop machine at home and a mobile laptop computer “on-the-go” (for business or trips) have to deal with chaotic and often frustrating manual methods to copy that data to the needed places. Some users carry USB memory sticks — which hold their important “working set” of files — and either work directly off those disks (suffering reduced speeds) or make copies of the folders therein on their actual machines, as they become needed, spreading yet more copies and compounding the problem of synchronization. Others abuse ubiquitous technologies for accessing important files, by either e-mailing the files to themselves or by uploading them temporarily to web or FTP servers.

The end result of any of these methods is fragmentation of personal data, with error-prone manual processes for replication, and no consistent way to search, back up, or even just keep track of all the important files and folders spread among the PCs in question.

What is needed to solve this problem, or at least make the problem more manageable, is an intelligent, user-friendly, customizable cross-platform program that allows for transparent synchronization of personal data across multiple computers and systems, either via a LAN or via the Internet.

Introducing the Persistent Folder

Up to this point, most users are used to understanding folders as existing solely in one location. The only way to get a folder’s data onto another machine is to copy that folder, thus making a duplicate. The goal of this project will be to introduce a new high-level data storage mechanism known as a Persistent Folder. Persistent Folders differ from regular folders in that they are meant to be transparently persistent, or synchronized, across different computers.

To explore a hypothetical example, imagine user Joe sits down to do some personal accounting work on Friday. He creates a folder on his desktop called “Personal Accounting,” and begins working on some OpenOffice spreadsheets in that folder. A few minutes later, Joe realizes that some of his accounting work will have to be done while he is away during the weekend, on his laptop computer.

To enable this, he simply right-clicks the folder on his desktop and says “Make this folder persistent across multiple computers.” When he does this, a dialog comes up asking him to type in a description of the persistent folder, and to select computers on which he wishes to make this folder persistent. Joe is presented with a dialog of computers currently available on the LAN. He sees his laptop, “MobileJoe”, and selects it. He then presses OK.

Within seconds, Joe sees the folder “Personal Accounting” appear on his mobile computer’s desktop, even though he hasn’t even touched his mobile computer yet. When he enters that folder, he sees the same spreadsheets that are available on his main PC’s desktop.

These two folders are now treated by Joe as “one persistent folder available across two computers.” He can add files to one folder and they will automatically be propagated to the other. He can modify files and the new versions will then exist in the other. Joe no longer has to worry about pushing his data back and forth across the computers.

Internet Synchronization by way of Ubiquitous Services

One major question the reader may ask at this point is: “That is all well and good for persistence of a folder on a LAN, but what about when I leave my home/office and need to synchronize over the Internet?”

One approach would be to have a special server application with which all computers that wish to share persistent folders could synchronize. But then the user needs his own server, and needs to install my little server daemon on it. And not many users have their own servers on which they can just install any old application. So for normal users, this, in fact, is no approach at all.

Although most users don’t run their own servers, many modern users do have _server access_. That is, many users do have Web/FTP hosting providers to whom they pay subscriptions, and these servers power their blogs, personal photo stores, etc. The goal of Persistent Folders would be to allow users to piggyback their existing web services to synchronize their files, essentially turning a folder on that server into a Persistent Folder Repository.

A dialog in the properties of a persistent folder would include a checkbox that allows the user to “make this folder available over the Internet.” It would then allow a user to choose a method for making this folder available via the Internet, asking the user to provide a “persistent folder repository” via FTP by default (due to commonality), but equally possible would be scp, rsync, NFS, or even something like cvs/svn (the goal, of course, would be to make the design modular enough that it could support any server type with basic file system operations).
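To make the "modular enough" goal concrete, here is a minimal sketch of what a pluggable repository interface might look like. It is not code from the original proposal: the `Repository` class, its method names, and the `LocalDirRepository` stand-in are all hypothetical, chosen only to show that a backend needs nothing more than basic file operations. Real FTP, scp, or rsync backends would implement the same three methods.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import shutil


class Repository(ABC):
    """Hypothetical interface: the basic file operations any backend must offer."""

    @abstractmethod
    def put(self, local: Path, remote_name: str) -> None:
        """Upload a local file under the given name."""

    @abstractmethod
    def get(self, remote_name: str, local: Path) -> None:
        """Download the named file to a local path."""

    @abstractmethod
    def names(self) -> list[str]:
        """List the names of the files stored in the repository."""


class LocalDirRepository(Repository):
    """Stand-in backend that keeps files in a local directory.

    An FTP or scp backend would expose the same methods but talk to a server.
    """

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, local: Path, remote_name: str) -> None:
        shutil.copy2(local, self.root / remote_name)

    def get(self, remote_name: str, local: Path) -> None:
        shutil.copy2(self.root / remote_name, local)

    def names(self) -> list[str]:
        return sorted(p.name for p in self.root.iterdir() if p.is_file())
```

The synchronization logic would only ever call `put`, `get`, and `names`, so adding a new server type would mean writing one new subclass rather than touching the rest of the program.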

Then, every other visible machine on the LAN that shares that persistent folder would have this option enabled automatically. If the machines are offline, the option can be entered manually by the same method explained above.

From that point on, the folders become persistently available and transparently usable, just as before. If the other computers are available on the LAN, then the program utilizes LAN speeds and synchronizes directly. Otherwise, it synchronizes with the Internet Repository.
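The "LAN first, Internet otherwise" rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, and treating "can open a TCP connection on port 22" as the reachability test is one possible choice, not something the proposal specifies.

```python
import socket


def peer_reachable(host: str, port: int = 22, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the peer can be opened (assumed check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_sync_target(peers, repository_url, is_reachable=peer_reachable):
    """Prefer the first reachable LAN peer; otherwise use the Internet repository."""
    for host in peers:
        if is_reachable(host):
            return ("lan", host)
    return ("internet", repository_url)
```

For example, `choose_sync_target(["MobileJoe"], repository_url)` would return `("lan", "MobileJoe")` while the laptop is on the same network, and `("internet", repository_url)` once it is not. Passing the reachability check in as a parameter keeps the decision logic testable without a real network.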

Lucky Joe is now able to work on his files in the office, leave his laptop there (“the darn thing is so heavy to carry around!”), return home, and continue right on working on those files which are now found in the “same folder” on his desktop machine at home.

Implementation Ideas

I plan on implementing this idea in Python, since one of the major goals of the project is to allow Persistent Folders to exist not just across computers, but also across operating systems (so that a Windows desktop could have a persistent folder that also exists on a Linux laptop, for example). For a user interface, I plan to use PyGTK, since that’s what I know, and since it is also cross-platform. I need something powerful like GTK since there will be times when user intervention will be necessary, but since it is a goal of this project to reduce the number of times the user must intervene, I want to make sure that when he does, he is presented with sane, human-readable, and user-friendly dialogs.

I don’t plan to reinvent the wheel. Most of the magic of Persistent Folders is just making existing synchronization tools work relatively silently and in a way that makes sense with a user’s workflow. The main tool I am thinking of using is rsync, the relatively ubiquitous UNIX utility for incremental synchronization of directory trees. I considered using unison, as recommended by Ubuntu’s Wiki entry, but I saw two major problems: (1) unison is no longer under active development and (2) it is written in OCaml, a relatively obscure language which I don’t know. Therefore, any bugs I discover in unison would not be fixed in a timely fashion by its developers, and any features I’d like to add to support my idea would be quite difficult to implement.

As for monitoring folders for changes, I imagine the most elegant solution would be to take advantage of inotify under Linux (like Beagle does) and perhaps handle ChangeNotify events under Windows. I’d really like to avoid polling, since polling is just plain evil for something as potentially neat as this. Regardless, I’ll probably need to code Yet Another Daemon (or Windows service) to keep track of Persistent Folders on the local machine and their equivalents across the network.

Finally, for secure synchronization of files, I plan to use SSH tunneling wherever possible, which is available on Windows through the OpenSSH for Windows SourceForge project, http://sshwindows.sf.net. Linux distros like Ubuntu, of course, have all the ssh support one needs.
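As a rough idea of what "making existing tools work silently" could mean in practice, the sketch below wraps rsync in a small Python helper and runs it over SSH. It is an assumption-laden illustration, not the proposal's actual code: the function names and the choice of options are mine, though `--archive`, `--compress`, `--delete`, and `-e ssh` are standard rsync options.

```python
import subprocess


def build_rsync_command(source_dir, dest, use_ssh=True, delete=False):
    """Build an rsync argument list that mirrors source_dir into dest."""
    cmd = ["rsync", "--archive", "--compress"]
    if delete:
        # Also remove files at the destination that no longer exist locally.
        cmd.append("--delete")
    if use_ssh:
        # Use ssh as the remote shell so the transfer is encrypted.
        cmd += ["-e", "ssh"]
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", dest]
    return cmd


def sync(source_dir, dest, **options):
    """Run rsync; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_rsync_command(source_dir, dest, **options), check=True)
```

A call such as `sync("/home/joe/accounting", "joe@MobileJoe:accounting/")` (host and paths are made up) would push the folder to the laptop. Building the command as a list rather than a shell string avoids quoting problems with unusual file names.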

Conclusion

This project aims to introduce a new metaphor users may utilize to share important files across multiple computers: the Persistent Folder. A Persistent Folder is not a shared folder; rather, it is seen as a single folder that exists locally across multiple computers, and can be treated by the user as such. Changes at any one folder rapidly propagate to the others. Properly implemented, this may provide a better way for users to manage important data that might otherwise be scattered, fragmented, and even lost through the daily shuffle of file transfers across networked PCs.

Appendix A: Project Roadmap

o June 24: Begin Work, with ideas now fully developed in the form of documents. Post these documents to Ubuntu’s Wiki to encourage ideas from community.

o July 5: Have a console synchronization wrapper and some network discovery stuff in Python written, and have UI concepts designed in Glade.

o July 20: Make 0.1 (GUI and basic features working) release, so that Google can show it off at OSCON?

o August 1: Have other great features, like the Internet synchronization and multi-protocol support, in 0.2 release.

o August 10: Consider working on features to allow transparent backup and versioning, to include with bug fixes in a 0.3 release.

o August 20: Make simultaneous 0.4 releases on Linux and Windows, hammering out as many cross-platform issues as possible. Add inotify/ChangeNotify support if not already there.

o August 30: The big 0.5 release, finished for Google’s deadline. Let Google/Ubuntu make the decision if it’s worthy of being renamed a “1.0” release.

o September 1: Live a less stressful life, since my files are now neatly synchronized among my PCs! But I’ll keep making it better.

Appendix B: Hold On, Isn’t Samba Good Enough?

One common response to this project may be, “Aren’t Samba and SMB shared folders good enough?” Though Samba is good, I do not believe it is good enough. Here’s why:

(1) Samba does not aim to present the user with a metaphor of a folder existing in multiple locations at the same time. I believe this metaphor would be appreciated as powerful by longtime computer users and subconsciously acknowledged as highly usable by novices.

(2) Samba only allows users to share folders on their drives. Other Samba users may then mount those folders via the LAN. This two-step, asymmetric process already seems complicated and convoluted to end users. But more importantly, synchronization is left up to the user. In theory, users could avoid synchronization altogether and work on the files directly, via the share. In practice, LAN speeds are not adequate, and other issues are raised (such as file locking and write conflicts). This forces users to home-brew their own synchronization protocol to make sure duplicate files at different versions don’t result in an accidental loss of data.

(3) Samba does not provide any easy method to access personal files once one leaves the LAN and enters the WAN, short of opening up a bunch of ports on your router and trying to connect in from outside (a very slow and insecure method).

(4) Samba provides no recourse when a computer is no longer available. Persistent Folders, on the other hand, make data available “offline” by design.

(5) Samba does not know whether two duplicate files exist across the network. Therefore, Samba cannot be immediately utilized as a transparent form of backup.

Is the Persistent Folder meant to replace Samba? Of course not. I believe Samba has specific purposes: to allow for easy, fast, one-time transfers of files across a LAN, and to allow networked printer and device sharing. I do not believe Samba is an adequate solution to allow a user to treat a folder as if it existed on multiple computers at once.

Appendix C: Versioning’s the Thing

Upon further contemplation of this project, I realized I had left a question unanswered in this document. What is the right way to deal with version conflicts during the silent synchronization phase among Persistent Folders?

Imagine user Joe works on a file in a Persistent Folder which exists on his laptop and desktop computers. The first version (let’s call it version 0.1) was created on his desktop, and Joe had the laptop on the same network, allowing automatic synchronization to occur. But then Joe’s laptop was disconnected from his desktop, as Joe was without network or even Internet access. He works on the document a bit, which can now be called version 0.2. But then Joe forgets that he worked on the document on his laptop, and when he gets home, he works on his desktop computer, creating a version 0.2 there too. Which version 0.2 should be synchronized?

Normally, the answer would be to pick the most recent one. But we don’t want Joe’s laptop work to be lost, just because Joe forgot about his prior work, do we? Well, what needs to happen is a bit of smart conflict resolution with no data loss.

I propose that Persistent Folders should also be versioned to a sane degree. Perhaps by default 3-5 versions of files are always kept, with the most recent versions visible directly in the persistent folder, and other versions buried (in .dotfiles or hidden folders) behind there. Then, the Persistent Folder monitor should inform Joe (“whisper” to Joe via the notification area/systray) that there was a conflict, and the newest file was chosen. But at any point, Joe can choose to revert a file back to an older version. The interface should be such that Joe can even see where the file was edited, for example:

Personal Accounts.xls – Prior Versions
o version 0.2: edited on MobileJoe
o version 0.2: edited on DesktopJoe
o version 0.1: edited on DesktopJoe
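The "newest wins, but nothing is lost" policy above can be sketched as a small data structure. This is a hypothetical illustration rather than a design from the proposal: the class and field names are invented, versions are held in memory instead of hidden folders, and "newest" is assumed to mean the latest modification time.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    host: str       # machine where the edit was made
    mtime: float    # modification time, in seconds since the epoch
    content: bytes


@dataclass
class VersionedFile:
    max_versions: int = 5
    versions: list = field(default_factory=list)  # kept newest first

    def add(self, version: Version) -> None:
        """Record a version, keeping only the newest max_versions entries."""
        self.versions.append(version)
        self.versions.sort(key=lambda v: v.mtime, reverse=True)
        del self.versions[self.max_versions:]

    @property
    def current(self) -> Version:
        """The version shown directly in the persistent folder."""
        return self.versions[0]

    def history(self) -> list:
        """One line per retained version, newest first."""
        return [f"edited on {v.host}" for v in self.versions]
```

When both of Joe's conflicting edits are added, the later one becomes `current` and the other stays in `versions`, so the monitor can report the conflict and Joe can still recover the earlier work.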

Of course, versioning could be disabled (at risk of data loss to the user), or could be enabled as “smart versioning” to only version files when conflicts occur during synchronization. Some may say that versioning is dangerous because it can drastically increase disk usage, but as I mentioned above, the other benefit is that the user gets redundant network backup of files for free. In higher versions of Persistent Folders, things could get more sophisticated by allowing users to turn off versioning even at the file-level, but I don’t think that’d be necessary in the first few releases.

So, for those not keeping score, Dropbox supports:

  • Windows, Linux and Mac OS X syncing, using a scheme similar to that described above, except with a managed set of servers (Dropbox’s) acting as the “repository”
  • sync over LAN (in latest beta releases) when PCs are local
  • file-level versioning
  • Collaboration by sharing folders with other users — a feature I didn’t discuss in my proposal, but that would be a clear next step for a managed service

Go Dropbox!
