In our previous post, we identified a need for a quick web setup that would allow Jeff to send Mary a secured, 15MB PowerPoint presentation. In this post, we outline some good practices for designing and implementing a web-based tool for that job.
Design Criteria
For our design, we want to be at least as secure as email. Our first design goal is to allow multiple upload bins, so that Jeff and Mary can communicate, and Jeff and Bob can communicate, without their files crossing. Additionally, because we don't know the contents of the files in advance, the site should stay secure even if Bob's hacker son Jason figures out how it works: he still shouldn't be able to get the data being shared between Jeff and Mary. We also don't want a nosy information technology staffer to be able to open the files while they sit on the site.
We also thought about making it pretty. Since people have varying definitions of what pretty is, we made it "themeable". Those who like a blue background can choose to have one, those browsing or downloading with mobile devices get a theme with reduced images that is friendlier to their bandwidth, and so on.
Our Implementation
We started with a database that keeps track of the upload bins and files. We originally planned to store the files within the database, but changed to putting them in a file store, as the database would take a long time to store a BLOB (binary large object) of 150MB. That meant adding a database/file consistency checker, but in the end the file store was the best solution.
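To illustrate what a database/file consistency checker has to do, here is a minimal sketch in Python. It is not our actual checker: the function name and the idea of passing in a plain list of recorded file names are assumptions made for the example. It only compares the names the database knows about with the names found in the file store.

```python
from pathlib import Path


def check_consistency(db_file_names, store_dir):
    """Compare file names recorded in the database with files on disk.

    Returns (missing, orphaned):
      missing  - names the database lists but the file store lacks
      orphaned - names in the file store with no database record
    Both the argument shapes and the function name are hypothetical.
    """
    recorded = set(db_file_names)
    on_disk = {p.name for p in Path(store_dir).iterdir() if p.is_file()}
    missing = sorted(recorded - on_disk)
    orphaned = sorted(on_disk - recorded)
    return missing, orphaned
```

A checker like this would typically run on a schedule, reporting missing files to an administrator and cleaning up orphaned ones.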
Security was our next focus. We used an asymmetric (public key) algorithm and associated keys with each upload bin. Jeff creates a key, without needing to know it, when he creates the upload bin. That key is protected by a strong, system-created password, which Jeff then passes to Mary. When Mary enters the password, the key data is unlocked and provides the information needed to decode the encrypted file for download.
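The password side of this scheme can be sketched with the Python standard library alone. This is an illustration under assumptions, not our implementation: the function names, the password length, and the choice of PBKDF2-HMAC-SHA256 with 200,000 iterations are all picked for the example. The sketch stops at deriving the unlocking key; using that key to decrypt the bin's stored key data is not shown.

```python
import hashlib
import secrets


def new_bin_password(num_bytes=18):
    """Create a strong random password for a new upload bin.

    18 random bytes encode to 24 URL-safe characters.
    (Hypothetical helper; the length is an assumption.)
    """
    return secrets.token_urlsafe(num_bytes)


def derive_unlock_key(password, salt, iterations=200_000):
    """Stretch the password into a 32-byte key with PBKDF2-HMAC-SHA256.

    The result would be used to unlock the bin's stored key data.
    The algorithm and iteration count are assumptions for this sketch.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )


# The salt is random, stored alongside the bin, and is not secret.
salt = secrets.token_bytes(16)
password = new_bin_password()
unlock_key = derive_unlock_key(password, salt)
```

Because only the salt and the locked key data are stored, the site itself never needs to keep the password.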
This security system also has some interesting applications we didn't plan for, and leaves room for additional features. For example, with only a slight change, we can add a recovery key to get into the upload bin should Jeff and Mary forget the password. An administrator could also use it to ensure that company secrets weren't being published. We used the same technology as referenced in a previous article, Securing a Secret for multiple readers. In addition, because the creator and the recipient are the only ones with the key, a site built this way could be offered as a secure public service. There would be no way (other than attacking the algorithm or the password) for a service provider to see the data. With the installation of an SSL certificate, even the security of the data during transport is more-or-less guaranteed.
In terms of user features, and based on past experience, we knew that Jeff wouldn't have a lot of extra time to maintain the files in the system. We planned for him to post the files and then walk away, leaving the system to maintain itself. Each file was given an expiry date, and a separate application checks for expired files and removes them automatically.
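The clean-up job can be sketched as follows. The names and the in-memory dictionary standing in for the database are assumptions for the example; a real job would query the database and remove files from the file store.

```python
from datetime import datetime
from typing import Callable


def find_expired(files: dict[str, datetime], now: datetime) -> list[str]:
    """Return the ids of files whose expiry time has passed."""
    return [file_id for file_id, expires_at in files.items()
            if expires_at <= now]


def sweep(files: dict[str, datetime], now: datetime,
          delete: Callable[[str], None]) -> list[str]:
    """Delete each expired file and drop its record.

    `files` maps a file id to its expiry time; `delete` is a caller-supplied
    function that removes the stored file. Both are hypothetical stand-ins.
    """
    expired = find_expired(files, now)
    for file_id in expired:
        delete(file_id)      # remove the file from the file store
        del files[file_id]   # then forget its record
    return expired
```

Passing `now` in as an argument, rather than reading the clock inside the function, makes the sweep easy to test.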
We added trivial logging, where each file download is tracked, and some protection to ensure that someone couldn't attempt a dictionary attack on the login. After a certain number of attempts, the system locks out that IP address for 5 minutes and makes the upload bin unavailable to it. We could have done this a thousand different ways, and this one has an obvious drawback: it may lock out a whole company when one particular user is misbehaving. For some ISPs, it may block out a whole community or city. Regardless, we have other options to counteract that type of attack.
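A lockout of this kind can be sketched in a few lines. The class below is an illustration, not our code: the class and method names are made up, and the default of 5 attempts is an assumption (only the 5-minute lockout comes from the description above). State is kept in memory for simplicity.

```python
import time


class LoginThrottle:
    """Lock an IP address out of an upload bin after repeated failures."""

    def __init__(self, max_attempts=5, lockout_seconds=300,
                 clock=time.monotonic):
        self.max_attempts = max_attempts        # assumed value
        self.lockout_seconds = lockout_seconds  # 5 minutes
        self.clock = clock                      # injectable for testing
        self._failures = {}      # (ip, bin_id) -> failed attempt count
        self._locked_until = {}  # (ip, bin_id) -> time the lock ends

    def is_locked(self, ip, bin_id):
        key = (ip, bin_id)
        until = self._locked_until.get(key)
        if until is None:
            return False
        if self.clock() >= until:
            # The lock has expired, so forget it.
            del self._locked_until[key]
            return False
        return True

    def record_failure(self, ip, bin_id):
        key = (ip, bin_id)
        count = self._failures.get(key, 0) + 1
        if count >= self.max_attempts:
            # Too many failures: start the lockout and reset the counter.
            self._locked_until[key] = self.clock() + self.lockout_seconds
            self._failures.pop(key, None)
        else:
            self._failures[key] = count

    def record_success(self, ip, bin_id):
        # A successful login clears the failure count.
        self._failures.pop((ip, bin_id), None)
```

The login handler would call `is_locked` before checking a password, then `record_failure` or `record_success` depending on the result.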
The decoding of the file appeared to be instant relative to the time needed to actually download the data. Because we buffer the file, I suspect the time lost to decoding is absorbed completely by the time spent waiting on bandwidth. In our tests, we couldn't discern a difference between decoding encrypted files and just transferring non-encoded files. The processor on the server was utilized more during decoding, though, so there was a limit to how many downloads could be performed at any one time.
Future Improvements
One of the issues we had with HTTP transfers was timeouts. Large files (>250MB) would take a long time to push to the server, and the upload would break if network conditions weren't just right. We can improve that by using BITS (Microsoft's Background Intelligent Transfer Service), or by adding a client control that transfers the file in chunks and has the server re-assemble them. Downloading was the opposite: the browsers we tested handled it well, so the only change would be to use a technology like BITS to download files slowly and reliably instead of using all the bandwidth available for the transfer.
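The chunk-and-reassemble idea can be sketched as below. This is only an illustration of the approach, with made-up names, and it keeps everything in memory; a real server would write chunks to disk and a real client would send each chunk as its own HTTP request.

```python
def split_into_chunks(data, chunk_size):
    """Client side: cut the file contents into fixed-size pieces."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ChunkAssembler:
    """Server side: collect numbered chunks and rebuild the file.

    Hypothetical helper for this sketch, not part of the site.
    """

    def __init__(self, total_chunks):
        self.total_chunks = total_chunks
        self._chunks = {}  # chunk index -> bytes

    def receive(self, index, payload):
        if not 0 <= index < self.total_chunks:
            raise ValueError(f"chunk index {index} out of range")
        # Re-sending a chunk simply overwrites it, so retries are safe.
        self._chunks[index] = payload

    def missing(self):
        """Indexes the client still needs to send (or re-send)."""
        return [i for i in range(self.total_chunks) if i not in self._chunks]

    def assemble(self):
        if self.missing():
            raise ValueError(f"missing chunks: {self.missing()}")
        return b"".join(self._chunks[i] for i in range(self.total_chunks))
```

If the connection drops partway through, the client only has to ask which chunks are missing and send those, rather than restarting the whole upload.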
The skin is set via a configuration file at install time. If we provide this as a public service, it may be beneficial to make the theme selectable when an upload bin is created. (Perhaps even allowing customers to upload their own customization file?)
Conclusion
Companies or individuals that need to receive or send large amounts of data can be better served by a tool or tools that focus on those needs. We see a benefit for this particular technology in operations like print or copy centers, engineering firms, software companies, design companies, and even business centers in hotels or airports.
Having quick and reliable access to data you know is secured makes your data all the more profitable, too.