Monday, February 15, 2016

Types of Clouds

"The Cloud" is all the rage these days in the world of information technology. Whether it's a startup subscribing to email services in the cloud, or a global enterprise contemplating a move to put all its infrastructure in the cloud, "cloud" is a thing. But what kind of cloud are you talking about?

Technical types may already understand the following, but the distinction may be lost on the managerial and executive class. In my current line of work I run into misunderstandings about what cloud computing means, and when working with clients I start with this simple distinction.

Are you moving to cloud-based services, or cloud-hosted infrastructure?

Cloud services include offerings like Microsoft's Office 365. You get email, SharePoint, and Lync, with no servers to manage. Your IT team can manage the settings of the services themselves, but they do not have access to the servers that host them.

The consequence of a move to cloud services is that you no longer have to maintain these servers (email, SharePoint, and Lync), but there are certain things your IT team can no longer do that they could do if they still ran the servers themselves.

Alternatively, cloud infrastructure means you are still going to build servers, but they will be hosted in the cloud. Microsoft's Azure and Amazon Web Services both offer this. Your servers are virtualized and hosted in vendor data centers, and your team can build and manage services on these servers just as they did on-premises. From an operations perspective, very little changes other than the location of these servers.

Put another way, cloud services are like taking your laundry to have someone else clean it, and cloud infrastructure is like going to the laundromat and using the machines there. Not a perfect analogy, but it'll do.

So which approach is right for you?

As is usually the case, it depends. What are your goals with going to the cloud? What services do you need?

In both cases, you're no longer on the hook for buying expensive hardware that will last three to six years, monitoring that hardware to make sure it doesn't go down, and managing the life cycle of that hardware. All of that now sits with the cloud vendor. This can save capital costs, cut what you spend on floor space, electricity, and temperature control, and reduce labor costs if you don't need server engineers for other reasons.

In the case of cloud services, you're also no longer managing updates to the operating system and other software, or ensuring security updates are installed, because these things are all included in the service. Your vendor says, "I will give you email", for example. That's it. You get email, with no concern about what boxes are used to provide it. You do still need engineers to manage those services - someone to create new mailboxes, create mailing lists, and so on, in this example - but they're managing the house, not the foundation, so to speak.

In the case of cloud infrastructure, you still need engineers to build and maintain these servers, and to understand the cloud platform itself. This is more common in businesses where significant infrastructure exists to support platforms that are not commonly offered as services. Examples include a company whose primary product is an online classroom-management service, or one that runs a database used to manage mining operations across the world.

What about privacy? With the growth of the cloud, as well as the globalization of business and commerce, privacy protection has become a concern - not just for citizens but for companies as well. "Where is my data?" is the common refrain. I'll write about that in more detail later. For now I'll just say it's a valid concern and should be brought up in discussions with cloud vendors.

Hopefully this simple distinction will serve to clarify discussions with vendors, and also between engineering leads and business managers. Moving to the cloud can offer significant savings in operations and capital. Knowing what kind of cloud you need, and which kind you're moving to, helps you understand those savings in the first place.

Sunday, February 7, 2016

Dark Data

One of the challenges I've come across with several customers lately is dark data: data in the environment about which little is known. This can be content with no owner, content that is no longer required, or content with an unknown purpose. It's kind of like the big dark attic in a house you've lived in for many years: boxes and boxes of stuff that may or may not be important one day.

This problem usually presents itself in one of two ways: either a company is going to migrate content from one environment to another and realizes there is a lot of "cruft" to move, or a company is seeking to make an environment more efficient by reducing, by various means, the amount of content in the environment. Dark data can be files on file systems, in collaboration platforms such as SharePoint, or on endpoints such as PCs or mobile devices.

Dark data consumes storage and processing resources that can inflate IT costs.

There are basically two ways to address dark data: reactively and proactively. A reactive approach fixes the problem, while a proactive one prevents it.

Reactively, IT can use a discovery tool to find data of questionable utility. When was the last time it was accessed? When was it created? Is there a valid owner of this content? What kind of content is this - tax documents, or research, or is it just memos about where to order pens and paper from? The IT team can communicate this information to business managers to say, "hey, here's how we're going to find stuff that isn't needed, and then take it offline."

If there is no discovery tool, then IT has to rely on the information available about the content. Even a simple file system has file creation and last modification metadata. In some cases IT may try to figure out who it belongs to based on file name or nearby content, but without a proper discovery tool, IT can only take stabs in the dark.
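As a rough illustration of how far plain file system metadata can take you, here is a minimal Python sketch that walks a directory tree and lists files that have not been modified in a given number of days. The function name find_stale_files and the age threshold are my own placeholders, not part of any discovery product, and last-modified time is only a hint: it says nothing about ownership or business value.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Return (path, days_since_modified, size_bytes) for files whose
    last-modified time is older than max_age_days. Oldest files first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                age_days = int((now - info.st_mtime) // 86400)
                stale.append((str(path), age_days, info.st_size))
    stale.sort(key=lambda row: row[1], reverse=True)
    return stale
```

Calling find_stale_files on a share root with a threshold of, say, 365 days gives a list to review with business managers before anything is taken offline. It is a starting point for that conversation, not a decision about what to delete.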

The proactive approach is to adopt a content life cycle policy, then communicate it and enforce it. Content has to have a valid owner, and there must be a process for electing one in the case one isn't known. Content is not expected to be immortal: project files can be archived and taken offline once a project is complete, forms go out of date and are replaced with new forms, and employees (and their content) come and go.

Certain content types have specific requirements. Financial and legal documents may be required to be kept for years at a time - and once no longer required there may be a requirement to destroy them. Often, a business unit may be unsure of how long it needs content - no one knows what it might be used for, but no one wants to be responsible for throwing it away. This is how attics get stuffed with three different Christmas trees. It's the role of IT to offer a method for keeping things that are important without consuming unnecessary resources.
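To make the idea of a life cycle policy a bit more concrete, here is a small Python sketch of how such rules might be written down and applied. Everything in it is hypothetical: the content types, the retention periods, and the action names are placeholders I made up for illustration. Real retention periods have to come from your legal and finance teams, not from IT.

```python
from datetime import date

# Hypothetical rules: how many years to keep each content type, and whether
# it must be destroyed (rather than archived) once that period has passed.
# The numbers are placeholders, not legal guidance.
RETENTION_RULES = {
    "financial": {"keep_years": 7, "destroy_after": True},
    "project": {"keep_years": 2, "destroy_after": False},
    "memo": {"keep_years": 1, "destroy_after": False},
}


def lifecycle_action(content_type, owner, closed_on, today):
    """Suggest what to do with one piece of content under the rules above."""
    if not owner:
        return "assign-owner"  # content must have a valid owner first
    rule = RETENTION_RULES.get(content_type)
    if rule is None:
        return "review"  # unknown type: a person has to decide
    age_years = (today - closed_on).days / 365.25
    if age_years < rule["keep_years"]:
        return "keep"
    return "destroy" if rule["destroy_after"] else "archive"
```

For example, a financial record closed on date(2008, 1, 1) and checked on date(2016, 2, 7) comes back as "destroy" under these placeholder rules, while content with no owner always comes back as "assign-owner" before anything else is considered.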

Dark data is a problem everyone has but few address. Cleaning it up, and then taking measures to prevent its recurrence, is a low-cost "easy win" requiring just a little planning.