I will email you those files... Part 2

Posted by Darin Rousseau

In our previous post, we identified a need for a quick web setup that would allow Jeff to send Mary a secured, 15MB PowerPoint presentation.  In this post, we outline some good practices for the design and implementation of a web-based technology.

Design Criteria

For our design, we want to be as secure as email, if not more so.  Our first design goal is to allow multiple bins for upload so that Jeff and Mary can communicate, and Jeff and Bob can also communicate, without crossing files.  Additionally, because we don't initially know the contents of the files, we should make things secure so that if Bob's hacker son Jason happened to figure out how the site works, he still can't get the data being shared between Jeff and Mary.  We also don't want any nosy information technology staffer to be able to open the files while they sit on the site.

We also thought about making it pretty.  Since people have varying definitions of what pretty is, we made it "themeable".  Those that like a blue background can choose to have one, those browsing or downloading with Mobile devices get a reduced image or bandwidth-friendly theme, etc.

Our Implementation

We started with a database that keeps track of the upload bins and files.  Our original idea of storing files within the database was changed to putting them in a file store, as the database would take a long time to store a BLOB (binary large object) of 150MB.  That meant adding a database/file consistency checker, but in the end the file storage was the best solution.
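To give a feel for what a database/file consistency checker has to do, here is a minimal sketch in C++.  It is not our production code: the struct and function names are made up for illustration, and we assume the two lists of file names have already been gathered (one from a database query, one from a directory listing).

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result of comparing the database's file list with the file store.
struct ConsistencyReport {
    std::vector<std::string> missingFromStore;  // recorded in the database, absent on disk
    std::vector<std::string> orphanedInStore;   // present on disk, unknown to the database
};

// Both inputs are plain sets of stored file names; how they are gathered
// (SQL query, directory listing) is left out of this sketch.
ConsistencyReport CheckConsistency(const std::set<std::string>& inDatabase,
                                   const std::set<std::string>& inStore)
{
    ConsistencyReport report;
    for (const std::string& name : inDatabase)
        if (inStore.count(name) == 0)
            report.missingFromStore.push_back(name);
    for (const std::string& name : inStore)
        if (inDatabase.count(name) == 0)
            report.orphanedInStore.push_back(name);
    return report;
}
```

Anything in missingFromStore points at a broken download waiting to happen; anything in orphanedInStore is disk space that nothing refers to any more.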

Security was our next focus.  We used an asymmetric (public key) algorithm, with keys associated with each upload bin.  Jeff unknowingly creates a key when he creates the upload bin; that key is protected by a system-created strong password, which Jeff then passes to Mary.  When Mary enters the password, the key data can be unlocked to provide the information needed to decode the encrypted file for download.

This security system also has some interesting applications we didn't plan, and leaves room for additional features.  For example, with only a slight change, we can add a recovery key to get into the upload bin should Jeff and Mary forget the password.  It could also be used by an administrator to ensure that company secrets aren't being published, etc.  We used the same technology as referenced in a previous article, Securing a Secret for multiple readers.  In addition, because the creator and the recipient are the only ones with the key, this type of implementation means the site could be turned into a secure public service.  There would be no way (other than attacking the algorithm or the password) for a service provider to see the data.  With the installation of an SSL certificate, even the security of the data during transport is more-or-less guaranteed.

In terms of user features, and based on past experience, we knew that Jeff wouldn't have a lot of extra time to maintain the files in the system.  We planned for him to post the files, and then walk away and have the system maintain itself.  Each file was given an expiry date, and an application checks for and removes expired files automatically.
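The expiry sweep itself is simple.  The following C++ sketch shows the idea under some assumptions: the record layout and the function name are hypothetical, and deleting the actual files from the file store is left to the caller.

```cpp
#include <algorithm>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical record for one uploaded file; field names are illustrative.
struct StoredFile {
    std::string name;
    std::time_t expiresAt;  // moment after which the file may be removed
};

// Removes every record whose expiry time has passed and returns the names
// that were dropped, so the caller can delete the matching files on disk.
std::vector<std::string> PurgeExpired(std::vector<StoredFile>& files, std::time_t now)
{
    // Keep unexpired records at the front, in their original order.
    auto firstExpired = std::stable_partition(
        files.begin(), files.end(),
        [now](const StoredFile& f) { return f.expiresAt > now; });

    std::vector<std::string> removed;
    for (auto it = firstExpired; it != files.end(); ++it)
        removed.push_back(it->name);
    files.erase(firstExpired, files.end());
    return removed;
}
```

Run on a schedule, something like this is all it takes for Jeff to post a file and walk away.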

We added trivial logging, where each file download was tracked - and some spam protection to ensure that someone couldn't attempt a dictionary attack on the login.  After a certain number of failed attempts, it locks out that IP address for 5 minutes and makes the upload bin unavailable to it.  We could have done this a thousand different ways, too - as obviously this may lock out a whole company when one particular user is misbehaving.  For some ISPs, it may block out a whole community or city.  Regardless, we have other options here to counteract those types of attacks.
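As a rough illustration of the lockout, here is a small C++ sketch.  The class name and the default of 5 attempts are our assumptions for the example; only the 5-minute lockout comes from the description above.

```cpp
#include <ctime>
#include <map>
#include <string>

// Sketch of per-address lockout. The attempt limit is an assumed value.
class LoginThrottle {
public:
    explicit LoginThrottle(int maxAttempts = 5, std::time_t lockoutSeconds = 5 * 60)
        : maxAttempts_(maxAttempts), lockoutSeconds_(lockoutSeconds) {}

    // True while the address is still inside its lockout window.
    bool IsLockedOut(const std::string& ip, std::time_t now) const {
        auto it = entries_.find(ip);
        return it != entries_.end() && now < it->second.lockedUntil;
    }

    // Record a failed password attempt; start a lockout once the limit is hit.
    void RecordFailure(const std::string& ip, std::time_t now) {
        Entry& e = entries_[ip];
        if (++e.failures >= maxAttempts_) {
            e.lockedUntil = now + lockoutSeconds_;
            e.failures = 0;  // the count starts over after the lockout
        }
    }

    // A successful login clears the record for that address.
    void RecordSuccess(const std::string& ip) { entries_.erase(ip); }

private:
    struct Entry {
        int failures = 0;
        std::time_t lockedUntil = 0;
    };
    int maxAttempts_;
    std::time_t lockoutSeconds_;
    std::map<std::string, Entry> entries_;
};
```

Keying on the IP address is exactly what causes the shared-address problem mentioned above; keying on the upload bin, or on the pair of bin and address, would be one of the other options.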

The decoding of the file appeared to be instant relative to the time to actually download the data.  Because we buffer the file, I suspect the loss to decoding is absorbed within the bandwidth completely.  In our tests, we couldn't decipher the difference between decoding encrypted files and just transferring non-encoded files.  The processor on the server was utilized more during decoding though - and so there was a limit to how many downloads could be performed at any time.

Future improvements

One of the issues we had with HTTP transfers was timeouts.  Large files (>250MB) would take a long time to push to the server and would break if network conditions weren't just right.  We can improve that using BITS (Background Intelligent Transfer Service), or by adding a client control that would chunk-transfer the file in pieces, having the server re-assemble them.  Downloading was the opposite and was handled well by the browsers that we tested, so the only change would be to use a technology like BITS to have files downloaded slowly and reliably instead of using all the bandwidth we can for the transfer.
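The chunk-and-reassemble idea can be sketched in a few lines of C++.  This is only an illustration of the approach, not something we have built: the type and function names are hypothetical, the data is held in memory, and the HTTP transport and retry logic are left out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One piece of an upload. Field names are illustrative.
struct Chunk {
    std::size_t index;  // position of this piece in the original file
    std::string data;   // raw bytes of the piece
};

// Client side: cut the payload into pieces of at most chunkSize bytes.
std::vector<Chunk> SplitIntoChunks(const std::string& payload, std::size_t chunkSize)
{
    std::vector<Chunk> chunks;
    if (chunkSize == 0) return chunks;  // nothing sensible to do
    std::size_t index = 0;
    for (std::size_t offset = 0; offset < payload.size(); offset += chunkSize)
        chunks.push_back({index++, payload.substr(offset, chunkSize)});
    return chunks;
}

// Server side: accept pieces in any order and rebuild the file once every
// index from 0 to expectedCount - 1 is present. Returns false if one is missing.
bool Reassemble(const std::vector<Chunk>& received, std::size_t expectedCount,
                std::string& out)
{
    std::map<std::size_t, const std::string*> byIndex;
    for (const Chunk& c : received)
        byIndex[c.index] = &c.data;

    std::string rebuilt;
    for (std::size_t i = 0; i < expectedCount; ++i) {
        auto it = byIndex.find(i);
        if (it == byIndex.end()) return false;  // ask the client to resend piece i
        rebuilt += *it->second;
    }
    out = rebuilt;
    return true;
}
```

The benefit is that a dropped connection costs one small piece rather than the whole 250MB upload.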

The skin is also set via a configuration file at install time.  If providing this as a public service, it may be beneficial to have a theme selectable at upload bin creation time.  (Perhaps even allowing customers to upload their own customization file?)

Conclusion

Companies or individuals that end up needing to receive or send large amounts of data can be better served with a tool or tools that focus on the needs.  We see a benefit for this particular technology in operations like Print or Copy centers, engineering firms, software companies, design companies, even business centers in Hotels or airports. 

Having quick and reliable access to data you know is secured makes your data all the more profitable, too.

I will email you those files... Part 1

Posted by Darin Rousseau

There are a lot of technologies that we use in the information technology world to get our jobs done.  Most of the technologies are very solid and are as old as the internet, and... Did I mention they are old and robust and reliable and... Not being used? 

In today's world of non-technical management often doing the technical decision making for their companies, we also find that many times, these technologies go unnoticed in favour of something else that "will get the job done for now because that's what I know."  Let's look at one of them: sending data to customers via email.

Most people think email was designed for attachments.  Sure, there is a button that allows that in my email client, but there are far better means to perform that specific operation than within email.  Email (specifically SMTP, the transfer protocol that is email) was never designed to be a file-transfer protocol - no matter what those people tell you.  It is the Simple Mail Transfer Protocol, in fact.  The technology of email is advanced enough that it could, and did, adopt attachments very early on in its life - but that doesn't mean it is suitable for transferring large amounts of data.  And, as data sizes grow, there are more problems doing it that way.

An Example 

Take a 15MB PowerPoint sales presentation that Jeff will send to Mary.  Jeff and Mary only know email addresses, and Jeff happily attaches it to the email, and... *poof* it just works, right?  Well, Jeff forgot something...  This particular "file transfer system" (if we are going to call it that) has some important limitations built in that lots of people don't know about.  Jeff may only be able to send 5MB at a time, thanks to his service provider or company.  Mary may be allowed to receive 15MB, but her mailbox only has 5MB free.  Any file transfer system that has these limitations really limits its use - especially when most of the time, the limits are non-negotiable.  Jeff is no more able to convince his ISP to up their limit for a while than Mary is convincing her IT staff to open up her email limits "for now."  Especially when resources are at a premium and may not always be available for them.  (Yes, even big ISP servers have storage limits!)

Now, when we talk about this, we are going to follow the direction of a business-used protocol.  While software or protocols like Torrents may be suited for really large and fast transfers, our example is between one sender and maybe multiple recipients, but not enough to make a torrent really functional.  Certainly there wouldn't be enough seeders for Jeff's PowerPoint presentation to speed anything up in the process.

How about FTP? 

A File Transfer Protocol is what we want.  This protocol allows us to download files and is perfect - just look again at the name of the technology!  However, many of the non-technical people we come in contact with don't understand it, either to use it, or administer it.  When faced with the Windows FTP client DOS window (that's what they call it!) they sit and stare.  It isn't all that simple to use, that is for sure.  Then they may be faced with downloading an FTP client.  Even then - things aren't always just a click away like email.

Web server? 

A web server is just a file transfer server, so would it work?  To our knowledge, no ISPs block web traffic, other than for international filtering or proxying or something like that.  Your browser connects to my server and asks for something.  It sounds like Mary could go to Jeff's site and ask for the presentation...  The prerequisite is that both Jeff and Mary have to know how to browse the web.  I would suggest that if they are working with PowerPoint, they have at some time had a chance to use the internet.

Out of the box, most web servers don't have the web programming to do this, but it could be done (and it took us only a weekend from concept to fully secure, functional site!).  The problem is that some web coding is often required to do this and most companies don't know how to get started.

We will look at our simple design as an example in part 2... 

 

A defragger in a day

Posted by Darin Rousseau

As part of our general server maintenance, we deploy several tools.  One of them is a script that runs a defrag on the disks periodically, and reports back the fragmentation and other disk-related information.  Generally, this can help us troubleshoot a server outage before it happens.  This time, it failed, and in a really interesting way.

The server running the tool recently started acting extremely slow in the field, and the defrag script was reporting that it couldn't complete.  The role of the server happens to be that of a security monitor, and it creates logs periodically, encrypts and compresses them and then fires them off to us for review.  The idea sounds simple and without any long-term complication, so what happened that caused this disk fragmentation?

The problem either was related to our log, compress, send, verify receipt, and delete procedures when multiple telemetry was being collected, or some update process in combination was doing it.  Somehow it happened, and now we solve for that, first.  

The defrag screenshot, although cropped, shows the pattern: the volume was fragmented like this across the whole drive, beginning to end.  Believe it or not, there is 26% free space on the volume.

Now the Windows Defrag tool works well, until you get into this situation.  There isn't enough space to do a defrag in one pass, and I am about forty passes in.  Each time, the fragmentation gets smaller, but not enough to complete and have little blue blocks everywhere.  Frankly, I am wondering how many times it will take!

So, I decided to build a defragger tool myself, and of course in C#.  I had some experience with the defrag DeviceIoControl APIs in C++ from back in the day when I built device drivers and needed to control them, but could I do this project solely in C# with interop?  I took a look at Jeffrey Wall's blog and found some wrappers - tried them and found that they worked without problems in terms of executing the IO, but were not functional when they pulled the structures out from the API.  In this case, Marshal.PtrToStructure was pulling out the UINT64s with the wrong endianness, and so I had to fix that, wrap the API and functionality in proper classes, and speed the whole thing up by allocating the things that I needed at init time instead of before every file was processed.
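The wrapper code itself isn't reproduced here, but the general way around this kind of byte-order surprise is to assemble each 64-bit field from the raw buffer in an explicit order rather than trusting how a structure gets mapped.  A minimal C++ sketch of that idea (the function name is mine, and this is not the C# interop code described above):

```cpp
#include <cstddef>
#include <cstdint>

// Assemble a 64-bit value from 8 bytes stored least-significant byte first.
// The caller must pass a pointer to at least 8 readable bytes.
std::uint64_t ReadUInt64LE(const unsigned char* bytes)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i)
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    return value;
}
```

Because the shifts spell out the byte order, the result does not depend on the machine or on any marshalling layer in between.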

I implemented the tool to run in two passes.  I wasn't aiming for a generic solution; for this problem I decided it was best solved by finding the largest fragmented file and moving its fragments towards the end of the volume, filling back towards the middle.  I then shifted to the other files, working my way to the smaller ones in the same manner.  The center now had free space.  In the second pass, I walked from the middle towards the front of the drive, defragging each fragmented file as I went, and then started from the middle again and worked to the end.

Out of the "defragger in a day" project came a couple of tools, one mimicking contig by Mark Russinovich, where I could not only specify that the file needs to be contiguous, but favour where on the disk to place it (begin, middle, or end, etc...)  The next was a graph maker that would display the free space bitmap of the drive - in a bitmap nonetheless, which helped to see that the APIs were doing things, and finally an analysis tool that describes the fragmentation, for use in telemetry in later projects.
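To show the kind of numbers an analysis tool like this can report, here is a small C++ sketch that summarizes a free space bitmap.  It is an illustration only: the names are hypothetical, and the bitmap is assumed to be already loaded into memory with one entry per cluster.

```cpp
#include <cstddef>
#include <vector>

// Summary of free space in a volume bitmap. Names are illustrative.
struct FreeSpaceSummary {
    std::size_t freeClusters = 0;    // total clusters marked free
    std::size_t freeRuns = 0;        // number of separate free stretches
    std::size_t largestFreeRun = 0;  // length of the longest free stretch
};

// inUse[i] is true when cluster i is allocated.
FreeSpaceSummary SummarizeFreeSpace(const std::vector<bool>& inUse)
{
    FreeSpaceSummary s;
    std::size_t currentRun = 0;
    for (bool used : inUse) {
        if (used) {
            currentRun = 0;  // an allocated cluster ends any free stretch
            continue;
        }
        if (currentRun == 0) ++s.freeRuns;  // a new free stretch begins here
        ++currentRun;
        ++s.freeClusters;
        if (currentRun > s.largestFreeRun) s.largestFreeRun = currentRun;
    }
    return s;
}
```

A volume can have plenty of free clusters and still have a tiny largestFreeRun, which is the situation that keeps a defragmenter from finishing in one pass.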

When technology lets down consumers

Posted by Darin Rousseau

I was working on a project related to Bluetooth and the operating system requirement was Microsoft Windows XP.  The service was to connect with the device's built-in Bluetooth radio, find a specific headset and pair a device at the will of the user.  Think of it as speaking "Connect to my headset that I have here" and it would just work.  Simple, and quick - that was the goal.

Technical perspective 

It appeared that we could connect the headset, find the audio service and then... The code would sit, waiting for the pairing to complete, and would eventually fail.  The poorly implemented WIDCOMM tools could see it, pair with it, and Windows could play audio through it if the WIDCOMM tools did the pairing - but something was causing our code, using the API calls that we saw WIDCOMM use, to fail during pairing.  They were doing something else that we couldn't readily discover.  (The other problem was the lack of any usable error information...)

I found that there was an update to the WIDCOMM Bluetooth stack that was installed in the system, and downloaded it, installed it and...  Now another problem - the Audio service profile for the headset was suddenly not available.  I could pair the device through code or through Windows manually and send and receive text to it as a serial port, but could neither play nor receive audio over the device.

I found that even if I installed the old Bluetooth stack, the audio service had somehow been removed from the system during the install.  I put back a backup copy and was left with the error issue and now a new issue.  The project was refocused to make this a 'Nice to Have', but I was still bummed.  WIDCOMM didn't use the Microsoft stack, and Microsoft didn't know how to use the WIDCOMM Audio driver component of their stack.

From a customer's perspective

This is the type of thing that drives consumers crazy, never mind the technically minded.  When we ignore the technical issues of stack versions and manufacturers, the bottom line is that the hardware says it supports Bluetooth, but the technology is deeper than just a simple "yes/no" in terms of full support.  Technical people may know there are different communications types, from network to Audio, to Serial Port, PIM transfer, etc. - but to the client, they just want to grab a laptop or desktop with Bluetooth, grab a Bluetooth headset and plug and play. 

Other things work this way, too.  CD-R at the beginning was a problem, where you could only play your CD-Rs on some CD players.  DVD is now the same way.  Early DVD drives may not play DVD+R disks, etc.  I am also finding that different Windows Mobile 5 devices have similar problems, such as a Treo trying to use ActiveSync over Bluetooth with Vista, or sending/receiving files to an HP IPAQ 6900 from the Treo.

I wonder sometimes if technology is moving too fast for the companies producing it, because it happens all over and continues to happen.  One day, will I get the chance to blog about the problem being fixed?  I won't hold my breath.

Coding challenge

Posted by Darin Rousseau

During the hiring process, Fundamental Software Solutions has always given small tests of skill to potential developer hires.  We don't ask much - just spot some commonly made mistakes, or identify what will most likely happen with some sample code.  Now, we haven't been the only ones doing this - and for good reason.  Programming skill is highly varied.  Does it work?  So far.

Now for some fun.  We have stumbled across the results of one particular interviewee at a pretty major company looking for C++ programmers.  The interviewee was asked to create a small, efficient program that would take the input from a user and make it lowercase, without using any pre-made library functions to do the 'lowercasing.'  The text will be plain old ASCII English characters, so don't worry about Unicode or anything like that.

Here is a modified result of what was written.  (We modified it to protect the job seeker from being googled, but the code is essentially the same.)

  char * MakeTextLower(char *ptr)
  {
     while (*ptr != '\0'
     {
        if (*ptr > 'A' && *ptr < 'Z'
        {
           *ptr -= 'A' - 'a';
        }
        *ptr++;
     }
     return ptr;
  }

 There are at least two problems with this code that make it non-functional.  Can you spot them? 

 [Edit: For extra credit, the function is also going to return the wrong thing.  Can you tell us what will be returned?]
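For readers who want to check their answers afterwards, here is one way the function could be written so that it compiles, converts the whole range 'A' through 'Z', and hands back the start of the string.  This is our own sketch, not the interviewee's code and not the only acceptable answer.

```cpp
// Lowercases plain ASCII text in place and returns the start of the string.
char* MakeTextLower(char* text)
{
    // Walk with a separate cursor so the original pointer can be returned.
    for (char* p = text; *p != '\0'; ++p) {
        if (*p >= 'A' && *p <= 'Z')  // inclusive, so 'A' and 'Z' are converted too
            *p += 'a' - 'A';
    }
    return text;
}
```

Returning the original pointer lets the caller use the result directly, for example by passing it straight to a print routine.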