Development is Never Done

One of the many challenges in development is anticipating all the ways your platform will be used. Since Steve and I are both analysts, we have a solid idea of how we would use the system, but that doesn't always reflect how everyone else uses it. As our user base has grown since the release of enterprise services, usage patterns have diversified, and we've needed to adapt. This post covers some of the ideas we tried, the observations we made, and what we changed to create a better platform. We view our platform as a constantly evolving system shaped by user feedback in the form of data and opinion. If you have ideas or comments on our work, send us a message at feedback[@]passivetotal.org.

Please Wait for Administrator Approval

You have an incident going on, or you've just heard about PassiveTotal and want to check it out. You browse to the registration page, fill out the form, and click the button, only to see a message that your account requires administrator approval. When we first released PassiveTotal, it was geared toward a small set of researchers with the intention of growing slowly, so Steve and I would hand-approve every registration. When we looked at the numbers, it was clear this process wasn't working. Users were excited about the platform or had a query in mind, but the delay between registering for an account and running a first search was sometimes too much.

Knowing we were the problem made the solution fairly simple: remove the approval process, and that's exactly what we did. The flow was changed so that users registering for the site would receive a standard verification email with a link to validate their account. We went a step further and made the link in the email automatically activate the user's session, so instead of logging in again, the user is dropped straight into their account to run a search. Since pushing this change, registrations have increased by a factor of three, and engagement on first sign-up is now hovering comfortably at 79%.
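
As a rough sketch of what that verify-and-activate flow can look like, here's a hypothetical Node/Express route; the token scheme and helper names (users.findByVerificationToken, users.markVerified) are our illustration, not the actual PassiveTotal code:

    // Hypothetical verify-and-activate route; helper names are illustrative.
    const crypto = require('crypto');

    // On registration, store a one-time token alongside the pending account.
    function makeVerificationToken() {
        return crypto.randomBytes(32).toString('hex');
    }

    // Clicking the emailed link lands here.
    app.get('/verify/:token', async (req, res) => {
        const user = await users.findByVerificationToken(req.params.token);
        if (!user) {
            return res.status(400).send('Invalid or expired link');
        }
        await users.markVerified(user.id);  // activate the account
        req.session.userId = user.id;       // start the session immediately
        res.redirect('/search');            // drop the user into the app
    });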

if (typeof(WebSocket) != "function") { return false; }

Websockets: one of the greatest features to land in modern browsers. Letting the client establish a real-time connection to the server in order to exchange and act upon data is a powerful capability, and one we really wanted to take advantage of. When a search runs in our platform, we broker connections between remote entities and our own data in order to build a complete picture to return to the user. Depending on data sizes, this can take a while, which leaves the user staring at a bunch of spinning icons even though some of the data is ready to load. Using websockets, we pushed data back to the user's browser and rendered it as soon as it was ready. The result: a more gradual load and a better experience.
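
A minimal sketch of the client side of that pattern; the endpoint URL, the message shape, and the renderSection/loadResultsViaAjax helpers are assumptions for illustration:

    // Render partial results as the server streams them back.
    if (typeof(WebSocket) !== "function") {
        // No websocket support; fall back to the traditional load.
        loadResultsViaAjax(query);
    } else {
        var socket = new WebSocket("wss://www.passivetotal.org/results");

        socket.onmessage = function (event) {
            var chunk = JSON.parse(event.data);
            renderSection(chunk.section, chunk.records);  // swap a spinner for data
        };

        socket.onerror = function () {
            loadResultsViaAjax(query);  // any failure drops back to the old path
        };
    }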

The headline of this section alludes to the major problem we faced when introducing websockets. To operate, websockets need their own channel to exchange data, which can mean a new host and a new port. Due to cost, it didn't make sense to stand up an entirely new machine just to process websocket data, so we reused our primary web host and ran the service on port 8080, since 80 and 443 were already taken. This solution worked well for most home users, but in the enterprise, many companies blocked the connections, either because the port itself was filtered or because non-standard HTTP traffic was flowing over it. Testing the connection before use, as sketched below, let us work around the issue, but it meant supporting both server-side loading (the old way) and websockets (the new way) at the same time for different users.
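
The feature check in the headline isn't enough on its own, because the WebSocket constructor succeeds even when a proxy will silently kill the connection. One way to probe the channel (a sketch; the URL, timeout value, and fallback helpers are assumptions) is to race the open event against a timer:

    // Probe the websocket channel before committing to it; fall back to
    // server-side loading if the socket can't open in time, e.g. because
    // a corporate proxy blocks port 8080.
    function probeWebSocket(url, timeoutMs, onResult) {
        var settled = false;
        var socket = new WebSocket(url);

        var timer = setTimeout(function () {
            if (!settled) {
                settled = true;
                socket.close();
                onResult(false);  // blocked or too slow; use the old path
            }
        }, timeoutMs);

        socket.onopen = function () {
            if (!settled) {
                settled = true;
                clearTimeout(timer);
                socket.close();
                onResult(true);   // channel is usable
            }
        };

        socket.onerror = function () {
            if (!settled) {
                settled = true;
                clearTimeout(timer);
                onResult(false);
            }
        };
    }

    probeWebSocket("wss://www.passivetotal.org:8080/ws", 3000, function (ok) {
        ok ? loadResultsViaWebSocket() : loadResultsViaAjax(query);
    });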

One of the last unintended consequences of websockets was the lack of support from content delivery networks. We wanted to cache our data and improve the site's performance, but because our CDN didn't support websockets, we couldn't get the full benefit of sitting behind it. In the end, we decided websockets were not worth the trouble and swapped our loading back to a traditional AJAX process, which works just as well, albeit not as gradually.
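
For comparison, the traditional path is far simpler; a sketch of a single-request load, where the /api/results endpoint, response shape, and renderAllSections helper are again illustrative:

    // Single-request load that replaced the websocket stream.
    async function loadResultsViaAjax(query) {
        var response = await fetch("/api/results?query=" + encodeURIComponent(query));
        if (!response.ok) {
            throw new Error("Result load failed: " + response.status);
        }
        renderAllSections(await response.json());  // one render, once everything is ready
    }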

Load All the Things

Knowing how much to preload on behalf of the user is always a balancing act. Do you load everything in order to provide a complete snapshot of the data? Do you load only what you think the user will care about first and fetch the rest later? What if the user changes the data while it's still loading? Loading processes are based heavily on user patterns. Over the history of PassiveTotal, we have changed our loading process at least 100 times: small tweaks to processing, adjustments to the user interface, or maybe just replacing the AJAX loading icon. For the longest time, we opted to process all data at once so the user would understand the entire scope of their query. The problem? Processing all the data often meant longer wait times, and more often than not, the user would pivot before really looking at the detailed information. As our user base expanded, we also found that queries for popular websites like Google or Facebook would hang our system forever; it was simply too much to handle at once.

Understanding that not all data was needed on first load, combined with the failure of websockets, led us to our current solution: bare-minimum processing. When a query is run, we drop the user into a shell page that kicks off an AJAX call to begin processing. On the backend, we execute against all remote sources, deconflict the records, and identify which ones will appear on the first page of results. Those records are run through an enrichment process and returned to the user. Pagination is handled the same way: as a page is requested, its data is pulled from a short-term cache and run through the enrichment process before being returned. The by-product of enriching on request is being able to account for data changing even though it's not fully loaded. Overall, processing only what's needed means faster load times and less load on our system.
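
Sketched in server terms as a hypothetical Node/Express handler (queryRemoteSources, deconflict, enrich, and the cache are invented names standing in for our real components), the flow looks something like this:

    // Enrich-on-request pagination backed by a short-term cache.
    var PAGE_SIZE = 25;  // records shown per page; illustrative value

    app.get('/api/results', async (req, res) => {
        var query = req.query.query;
        var page = Number(req.query.page || 1);

        // Pull the deconflicted record set from cache, or build it once.
        var records = await cache.get(query);
        if (!records) {
            var raw = await queryRemoteSources(query);  // hit every remote source
            records = deconflict(raw);                  // merge duplicate records
            await cache.set(query, records, 600);       // short-term cache (seconds)
        }

        // Enrich only the page being viewed, so changed data is caught on request.
        var start = (page - 1) * PAGE_SIZE;
        var enriched = await enrich(records.slice(start, start + PAGE_SIZE));

        res.json({ total: records.length, page: page, results: enriched });
    });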