Hybrid IT: Making datacentre and cloud work better together in the enterprise


Enterprise datacentre infrastructure has not changed drastically in the past decade or two, but the way it is used has. Cloud services have raised expectations about how easy it should be to provision and manage resources, and have established the idea that organisations need only pay for the resources they use.

With the right tools, enterprise datacentres could become leaner and more fluid in future, as organisations weigh their use of internal infrastructure against cloud resources to find the optimal mix. To some extent, this is already happening, as previously documented by Computer Weekly.

Adoption of cloud computing has, of course, been growing for at least a decade. According to figures from IDC, worldwide spending on compute and storage for cloud infrastructure increased by 12.5% year-on-year for the first quarter of 2021 to $15.1bn. Investments in non-cloud infrastructure increased by 6.3% in the same period, to $13.5bn.

Although the first figure is spending by cloud providers on their own infrastructure, this is driven by demand for cloud services from enterprise customers. Looking ahead, IDC said it expects spending on compute and storage cloud infrastructure to reach $112.9bn in 2025, accounting for 66% of the total, while spending on non-cloud infrastructure is expected to be $57.9bn.
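The share figures can be checked with a few lines of arithmetic. The sketch below uses only the IDC numbers quoted above; the function name cloud_share is an illustrative choice, not something from IDC.

```python
def cloud_share(cloud_bn: float, non_cloud_bn: float) -> float:
    """Return cloud spending as a fraction of combined cloud and non-cloud spending."""
    return cloud_bn / (cloud_bn + non_cloud_bn)


# Figures in $bn, as quoted from IDC in the text above.
q1_2021 = cloud_share(15.1, 13.5)
forecast_2025 = cloud_share(112.9, 57.9)

print(f"Q1 2021 cloud share: {q1_2021:.1%}")
print(f"2025 forecast cloud share: {forecast_2025:.1%}")
```

On these numbers, cloud infrastructure accounts for a little over half of combined spending in the first quarter of 2021, rising to roughly two-thirds in the 2025 forecast, which matches the 66% quoted.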

This shows that demand for cloud is outpacing that for non-cloud infrastructure, but few experts now believe that cloud will entirely replace on-premise infrastructure. Instead, organisations are increasingly likely to keep a core set of mission-critical services operating on infrastructure that they control, with cloud used for less sensitive workloads or where extra resources are required.

More flexible IT and management tools are also making it possible for enterprises to treat cloud resources and on-premise IT as interchangeable, to a certain degree.

Modern IT is much more flexible

“On-site IT has evolved just as quickly as cloud services have evolved,” says Tony Lock, distinguished analyst at Freeform Dynamics. In the past, it was pretty static, with infrastructure dedicated to specific applications, he adds. “That’s changed enormously in the last 10 years, so it’s now much easier to expand many IT platforms than it was in the past.

“You don’t have to take them down for a weekend to physically install new hardware – it can be that you simply roll in new hardware to your datacentre, plug it, and it will work.”

Another change inside the datacentre is that virtualisation lets users move applications between different physical servers, so there is much more application portability. And, to a degree, software-defined networking makes that much more feasible than it was even five or 10 years ago, says Lock.

The rapid evolution of automation tools that can handle both on-site and cloud resources also means that the ability to treat both as a single resource pool has become more of a reality.

In June 2021, HashiCorp announced that its Terraform tool for managing infrastructure had reached version 1.0, which means the product’s technical architecture is mature and stable enough for production use – although the platform has already been used operationally for some time by many customers.

Terraform is an infrastructure-as-code tool that allows users to build infrastructure using declarative configuration files that describe what the infrastructure should look like. These are effectively blueprints that allow the infrastructure for a specific application or service to be provisioned by Terraform reliably, again and again.

It can also automate complex changes to the infrastructure with minimal human interaction, requiring only an update to the configuration files. The key is that Terraform is capable of managing not just an internal infrastructure, but also resources across multiple cloud providers, including Amazon Web Services (AWS), Azure and Google Cloud Platform.

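And because Terraform uses the same configuration language and workflow for every provider, the same application environment can be defined on different clouds, making it easier to move or replicate an application if required, although the resource definitions themselves are specific to each provider.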
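The declarative idea behind infrastructure as code can be illustrated with a toy sketch in Python. This is not Terraform's implementation, configuration language or API: the plan and apply functions, the resource names and the "size" attribute are all hypothetical, chosen only to show how a tool can compare a described end state with what currently exists and work out the changes needed.

```python
from typing import Dict, List, Tuple

# A resource is modelled as a plain dictionary of attributes (hypothetical shape).
Resource = Dict[str, str]


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Work out which actions turn the actual state into the desired state."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: Dict[str, Resource]) -> Dict[str, Resource]:
    """Pretend to provision: the new actual state is a copy of the desired state."""
    return {name: dict(spec) for name, spec in desired.items()}


# The "blueprint": what the infrastructure should look like.
desired = {"web": {"size": "large"}, "db": {"size": "medium"}}
# What exists today.
actual = {"web": {"size": "small"}, "cache": {"size": "small"}}

steps = plan(desired, actual)
print(steps)

# Applying the blueprint and planning again yields no further changes.
actual = apply(desired)
print(plan(desired, actual))
```

The second plan comes back empty, which is the property that lets the same blueprint be applied again and again: changing the infrastructure means editing the description, not issuing individual commands.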

“Infrastructure as code is a nice idea,” says Lock. “But again, that’s something that’s maturing, but it’s maturing from a much more juvenile state. But it’s linked into this whole question of automation, and IT is automating more and more, so IT professionals can really focus on the more important and potentially higher-value business elements, rather than some of the more mundane, routine, repetitive stuff that your software can do just as well for you.”

Storage goes cloud-native

Enterprise storage is also becoming much more flexible, at least in the case of software-defined storage systems that are designed to operate on clusters of standard servers rather than on proprietary hardware. In the past, applications were often tied to fixed storage area networks. Software-defined storage has the advantage of being able to scale out more efficiently, typically by simply adding more nodes to the storage cluster.

Because it is software-defined, this type of storage system is also easier to provision and manage through application programming interfaces (APIs), or by an infrastructure-as-code tool such as Terraform.

One example of how sophisticated and flexible software-defined storage has become is WekaIO and its Limitless Data Platform, deployed in many high-performance computing (HPC) projects. The WekaIO platform presents a unified namespace to applications, and can be deployed on dedicated storage servers or in the cloud.

This allows for bursting to the cloud, as organisations can simply push data from their on-premise cluster to the public cloud and provision a Weka cluster there. Any file-based application can be run in the cloud without modification, according to WekaIO.

One notable feature of the WekaIO system is that it allows for a snapshot to be taken of the entire environment – including all the data and metadata associated with the file system – which can then be pushed to an object store, including Amazon’s S3 cloud storage.

This makes it possible for an organisation to build and use a storage system for a particular project, then snapshot it and park that snapshot in the cloud once the project is complete, freeing up the infrastructure hosting the file system for something else. If the project needs to be restarted, the snapshot can be retrieved and the file system recreated exactly as it was, says WekaIO.

But one fly in the ointment with this scenario is the potential cost – not of storing the data in the cloud, but of accessing it if you need it again. This is because of so-called egress fees charged by major cloud providers such as AWS.

“Some of the cloud platforms look extremely cheap just in terms of their pure storage costs,” says Lock. “But many of them actually have quite high egress charges. If you want to get that data out to look at it and work on it, it costs you an awful lot of money. It doesn’t cost you much to keep it there, but if you want to look at it and use it, then that gets really expensive very quickly.

“There are some people that will offer you an active archive where there aren’t any egress charges, but you pay more for it operationally.”

One cloud storage provider that has bucked convention in this way is Wasabi Technologies, which offers customers different ways of paying for storage, including a flat monthly fee per terabyte.
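The trade-off Lock describes can be made concrete with a rough cost model. The prices below are invented purely for illustration and are not the rates of AWS, Wasabi or any other provider; the Plan class and its field names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A simplified storage pricing plan (all prices are made-up examples)."""
    name: str
    storage_per_tb_month: float  # charge per terabyte stored per month
    egress_per_tb: float         # charge per terabyte retrieved

    def total_cost(self, stored_tb: float, months: int, retrieved_tb: float) -> float:
        storage = stored_tb * months * self.storage_per_tb_month
        egress = retrieved_tb * self.egress_per_tb
        return storage + egress


# Hypothetical plans: cheap storage with egress fees versus a higher flat fee.
metered = Plan("metered", storage_per_tb_month=4.0, egress_per_tb=90.0)
flat = Plan("flat", storage_per_tb_month=6.0, egress_per_tb=0.0)

stored_tb, months = 100, 12
for retrieved_tb in (0, 50, 100):
    m = metered.total_cost(stored_tb, months, retrieved_tb)
    f = flat.total_cost(stored_tb, months, retrieved_tb)
    print(f"retrieved {retrieved_tb:>3} TB: metered={m:,.0f} flat={f:,.0f}")
```

With these assumed prices, the metered plan is cheaper only while the data stays parked; once a substantial share of the archive is pulled back out, the flat plan wins. Where the crossover falls in practice depends entirely on the real rates and how often the data is accessed.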

Managing it all

With IT infrastructure becoming more fluid and more flexible and adaptable, organisations may find they no longer need to keep expanding their datacentre capacity as they would have done in the past. With the right management and automation tools, enterprises should be able to manage their infrastructure more dynamically and efficiently, repurposing their on-premise IT for the next challenge in hand and using cloud services to extend those resources where necessary.

One area that may have to improve to make this practical is the ability to identify where the problem lies if a failure occurs or an application is operating slowly, which can be difficult in a complex distributed system. This is already a known issue for organisations adopting a microservices architecture. New techniques involving machine learning may help here, says Lock.

“Monitoring has become much better, but then the question becomes: how do you actually see what’s important in the telemetry?” he says. “And that’s something that machine learning is beginning to apply more and more to. It’s one of the holy grails of IT, root cause analysis, and machine learning makes that much simpler to do.”

Another potential issue concerns data governance: how to ensure that, as workloads are moved from place to place, the security and data governance policies associated with the data travel with it and continue to be applied.

“If you potentially can move all of this stuff around, how do you keep good data governance on it, so that you’re only running the right things in the right place with the right security?” says Lock.

Fortunately, some tools already exist to address this issue, such as the open source Apache Atlas project, described as a one-stop solution for data governance and metadata management. Atlas was developed for use with Hadoop-based data ecosystems, but can be integrated into other environments.

For enterprises, it looks as though the long-promised dream of mixing and matching their own IT with cloud resources, and dialling things in and out as they please, may be moving closer.

