|
1. White Paper: How to build datacenters for cloud
1.1. Introduction
Organisations that consider switching their IT model to cloud computing mainly do so for the following reasons:
- Simplicity: Many organisations require an automated front-end to create, manage and monitor their infrastructure. This avoids complexity such as managing drivers, creating RAID groups, and so on. Behind this single interface, the platform performs the technical aspects around the creation and management of server, storage and networking infrastructure.
- Cost optimisation: Many organisations feel that the traditional way of dealing with IT infrastructure is suboptimal. They believe that cloud computing costs half as much as buying, provisioning and maintaining racks of IT equipment. Techniques like virtualisation, moving workloads, dispersed storage, power optimisation and datacenter airflow need to be considered to obtain significant cost savings.
- Go green: More and more companies are enacting policies to reduce their energy usage and carbon footprint. To obtain significant reductions, it is important to evaluate and improve the lowest layers in the datacenter ecosystem. Green is not only about saving the planet; it also achieves significant cost savings!
- Agility: Let IT follow the needs of the business and not vice versa. Techniques like auto provisioning and scalability of servers and desktops will bring this agility.
- Following the leaders: Big players are adopting cloud computing to obtain advantages of cost and scale. This means they can offer IT services at a lower price. These new cloud and virtualisation techniques allow them to enter new market segments such as SMEs and even to sell directly to end-users. Google, Apple, Microsoft and many more are investing a lot of money to become market leaders in delivering IT infrastructure through the cloud.
1.2. Challenge to the Datacenter Best Practices
1.2.1. Power Conversion Efficiency
Today most datacenters have an inefficient power conversion methodology.
Power is converted in multiple steps, for example:
- High Voltage is converted to 380V DC
- 380V DC is used as the input voltage for the UPS (batteries)
- 380V DC is now converted to 208V AC since servers work on AC power
- Inside the server, the power is converted again from AC to DC (between 5V and 12V)
All these steps, especially the final per-server conversion, result in an energy loss of up to 30%!
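As a rough illustration of how per-stage losses compound along the chain described above, the following sketch multiplies the efficiency of each conversion step. The per-stage percentages are assumptions chosen for illustration, not measurements from this paper; only the chain itself comes from the text.

```python
from math import prod

# Hypothetical per-stage efficiencies (assumed for illustration only).
STAGES = {
    "high voltage -> 380V DC": 0.97,
    "UPS (battery) stage": 0.94,
    "380V DC -> 208V AC": 0.95,
    "server PSU: AC -> 5-12V DC": 0.82,
}

def chain_efficiency(stages):
    """Overall efficiency is the product of the per-stage efficiencies."""
    return prod(stages.values())

def input_power_needed(load_watts, stages):
    """Power drawn at the start of the chain to deliver load_watts at the end."""
    return load_watts / chain_efficiency(stages)

if __name__ == "__main__":
    eff = chain_efficiency(STAGES)
    print(f"overall efficiency: {eff:.1%}, loss: {1 - eff:.1%}")
    print(f"input for 1000 W of load: {input_power_needed(1000, STAGES):.0f} W")
```

With these assumed values the chain loses a little under 30% of the input power, and the per-server stage contributes the largest share. Removing or improving a single stage changes the product directly, which is why shortening the chain pays off.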
1.2.2. Power Usage Efficiency
Datacenter Power Usage Effectiveness (PUE) can be much better!
Datacenters have historically been built from a real estate perspective: renting out floor space in a secured and redundant environment, including power and bandwidth for production infrastructure. Those datacenters have shortcomings that generate extra costs and create inefficiencies that impact uptime. This is because they use an older design where the Power Usage Effectiveness (PUE) is poor. To power 1000 watts of production servers, most datacenters require another 1500 watts to cool the datacenter and convert the power via UPSs, for a total of 2500 watts (a PUE of 2.5)!
Datacenters that have retrofitted their design to optimise PUE average 1.8, by making improvements on the supply side of the datacenter such as air conditioning, airflow and containment.
New datacenter challengers like Google, Yahoo, Facebook, ... have significantly reduced their costs through innovative datacenter designs and new ways of dealing with energy usage. Google and Apple are the best examples of companies with existing and growing datacenter builds that are more efficient in cost and uptime. Their investment capital has brought them to a position that makes it almost impossible for others to obtain the same cost advantages. They report PUEs of 1.21 and are aiming lower.
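The PUE figures quoted above follow from a simple ratio: total facility power divided by the power that reaches the production servers. The short sketch below reproduces the 1000 W example and shows the overhead implied by each PUE mentioned in this section. The function names are our own and purely illustrative.

```python
def pue(it_watts, overhead_watts):
    """PUE = total facility power / IT equipment power."""
    if it_watts <= 0:
        raise ValueError("IT load must be positive")
    return (it_watts + overhead_watts) / it_watts

def overhead_for(it_watts, target_pue):
    """Overhead (cooling, power conversion and so on) implied by a given PUE."""
    return it_watts * (target_pue - 1.0)

if __name__ == "__main__":
    # The example from the text: 1000 W of servers plus 1500 W of overhead.
    print("PUE:", pue(1000, 1500))
    for target in (2.5, 1.8, 1.21):
        watts = round(overhead_for(1000, target))
        print(f"PUE {target}: {watts} W overhead per 1000 W of servers")
```

Read this way, moving from a PUE of 2.5 to 1.21 cuts the overhead per 1000 W of servers from 1500 W to roughly 210 W.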
1.2.3. Raised floor Efficiency
In reality, some supposed best practices, such as raised floors, are turning out to be more and more inefficient:
- Raised floors still require too much over-cooling to be able to handle all the hotspots in the datacenter.
- A raised floor also allows poor cable paths and design errors to be hidden from view, which increases the risk of cable damage and inefficiency.
- Raised floors also add a significant cost to datacenter construction and maintenance.
1.2.4. Sizing issues
To meet maximum occupancy, large-scale datacenters tend to install large-capacity equipment like huge generators and enormous chilling installations. This approach requires a large initial investment to allow for growth but delivers low usage/filling ratios at the beginning, since a datacenter fills up gradually.
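The effect on filling ratios can be sketched with a little arithmetic. The example below compares one large installation with capacity added in smaller modules as demand grows. The demand curve and module sizes are hypothetical values chosen for illustration.

```python
import math

def installed_capacity(demand_kw, module_kw):
    """Capacity installed in each period when building in modules of module_kw."""
    return [math.ceil(d / module_kw) * module_kw for d in demand_kw]

def average_utilisation(demand_kw, module_kw):
    """Mean ratio of demand to installed capacity over all periods."""
    caps = installed_capacity(demand_kw, module_kw)
    return sum(d / c for d, c in zip(demand_kw, caps)) / len(demand_kw)

if __name__ == "__main__":
    # Hypothetical demand (kW) as the datacenter fills up over six periods.
    demand = [100, 250, 400, 600, 800, 1000]
    print("one 1000 kW installation:", f"{average_utilisation(demand, 1000):.1%}")
    print("200 kW modules:          ", f"{average_utilisation(demand, 200):.1%}")
```

Under these assumptions the modular build keeps installed capacity much closer to actual demand during the fill-up phase, which is the period in which the large installation sits mostly idle.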
1.2.5. MTBF/MTTR
MTBF/MTTR: the Mean Time Between Failures versus the Mean Time To Repair.
Making a solution more redundant will reduce the number of failures. However, reducing the failures often adds complexity, which in turn can lead to a longer time to repair. Consequently, the MTTR gets worse! Entire datacenters have been down for hours, sometimes days, because nobody could figure out what went wrong.
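The trade-off between failing less often and repairing more slowly can be made concrete with the standard steady-state availability formula: uptime divided by uptime plus repair time. The sketch below compares two hypothetical designs. All numbers are assumptions for illustration, not figures from real datacenters.

```python
HOURS_PER_YEAR = 8760

def availability(mtbf_hours, repair_hours):
    """Steady-state availability: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + repair_hours)

def downtime_per_year(mtbf_hours, repair_hours):
    """Expected hours of downtime in one year."""
    return HOURS_PER_YEAR * (1.0 - availability(mtbf_hours, repair_hours))

# Hypothetical designs: (hours between failures, hours to repair).
DESIGNS = {
    "complex, highly redundant": (8760, 24.0),  # fails rarely, slow to diagnose
    "simple, less redundant": (2190, 0.5),      # fails more often, swapped quickly
}

if __name__ == "__main__":
    for name, (mtbf, repair) in DESIGNS.items():
        print(f"{name}: availability {availability(mtbf, repair):.4%}, "
              f"downtime {downtime_per_year(mtbf, repair):.1f} h/year")
```

In this assumed scenario the simpler design fails four times as often yet accumulates far less downtime per year, because each repair is short.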
1.2.6. Co-location issues
Traditional co-location is inefficient and time consuming. Everybody hosting a server is faced with the same tasks: installation, maintenance, spare parts, networking and so on. Datacenters try to help out co-location customers by providing a “remote hands” option, but this solution has its limitations: remote hands services cannot execute all tasks, and they add an additional cost.
Co-location also means renting space, and not all spaces are optimally filled, leaving part of the datacenter unused.
Finally, co-location offerings today have no optimised cooling and energy design for high-density cloud computing.
The shift to build out datacenters with standardized equipment is an ongoing challenge.
1.2.7. Monitoring Effectiveness
Monitoring at room level is ineffective. It does not provide enough per-device information to allow the datacenter operator to find the root cause of power and temperature issues. Without granular information it is difficult to handle critical situations appropriately!
1.3. Start to care, take control and become green!
- Datacenters that avoid managing the underlying equipment need to get ready for a new approach! The raw layer in a datacenter needs to be under the datacenter manager's control. This allows the changes needed to become a green datacenter and deliver a PUE equal to or better than that of the market leaders.
- Analysts generally accept that in a traditional datacenter, load-side utilisation of resources like CPU, storage and networking is about 10%. This is because most servers are over-specified and sized for occasional peaks in processing requirements. Designs should consider rebalancing so that load is spread across the physical servers to improve resource usage.
- Datacenters typically do not care (enough) about power overuse, because overuse provides extra revenue. The available power capacity is protected only at the level of a fuse in the rack, so a customer that overloads a rack is punished with a power cut. Moreover, A and B feed policies are hard to verify, which often leads to problems when one of the feeds fails and leaves the other feed with an excessive load that it is incapable of managing. Finally, after a short general power breakdown, restarting servers triggers 30% extra power consumption, which can cause another overload with additional downtime as a consequence. History teaches us that a datacenter that wants to improve its uptime and its power management needs to take control of the infrastructure layer in order to achieve both.
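The restart overload described in the last point can be checked before it happens. The sketch below uses the 30% start-up surge quoted above to test whether a simultaneous restart would exceed a feed's capacity, and builds a simple staggered restart plan when it would. It assumes, hypothetically, that each batch has settled to its steady-state draw before the next batch is started; the server wattages and feed capacity in the usage example are invented.

```python
STARTUP_SURGE = 1.30  # 30% extra draw while booting, the figure quoted above

def simultaneous_restart_peak(server_watts):
    """Peak draw if every server boots at the same moment."""
    return sum(server_watts) * STARTUP_SURGE

def staggered_batches(server_watts, capacity_watts):
    """Greedy plan: start servers in batches so the feed is never overloaded.

    Assumes a batch has settled to steady-state draw before the next starts.
    Returns a list of batches, each a list of server indices.
    """
    batches, current = [], []
    running = 0.0   # steady-state draw of batches that have already settled
    booting = 0.0   # surge draw of the batch currently being assembled
    for i, watts in enumerate(server_watts):
        surge = watts * STARTUP_SURGE
        if running + booting + surge > capacity_watts:
            if not current:
                raise ValueError(f"server {i} cannot boot within capacity")
            batches.append(current)
            running += sum(server_watts[j] for j in current)
            current, booting = [], 0.0
            if running + surge > capacity_watts:
                raise ValueError(f"server {i} cannot boot within capacity")
        current.append(i)
        booting += surge
    if current:
        batches.append(current)
    return batches

if __name__ == "__main__":
    servers = [300] * 10   # hypothetical: ten servers at 300 W steady state
    capacity = 3300        # hypothetical feed capacity in watts
    print("simultaneous peak:", simultaneous_restart_peak(servers), "W")
    print("staggered plan:", staggered_batches(servers, capacity))
```

In the usage example the ten servers fit comfortably within the feed at steady state, but a simultaneous restart exceeds it, so the plan boots them in two batches.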
1.4. The next generation Datacenter is a cloud Datacenter
We need to improve our datacenters, which requires a change in our behaviour:
- We need to start measuring the power (load side) in detail for every piece of equipment;
- We need to automatically shut down unused equipment based on power and load analyses;
- We need to put the less important equipment into a lower power state;
- We need to move virtual servers to more efficient equipment and shut down the less efficient servers at off-peak times;
- We need to optimise airflow in datacenters and avoid over-cooling, redefine rack layouts to optimise air cooling, and consider air corridors, free air and adiabatic cooling.
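Moving virtual servers to the most efficient equipment at off-peak times is essentially a packing problem. The sketch below uses a first-fit-decreasing heuristic to place virtual servers on the most efficient physical servers first and reports which servers are left empty and could be shut down. The server names, capacities and efficiency figures are hypothetical, and the heuristic is one common choice, not a method prescribed by this paper.

```python
def consolidate(vm_loads, servers):
    """Pack VMs onto the most efficient servers first (first-fit decreasing).

    vm_loads: {vm_name: load units}
    servers:  list of (server_name, capacity units, watts per load unit)
    Returns (placement, idle_servers); idle servers are shutdown candidates.
    """
    ranked = sorted(servers, key=lambda s: s[2])  # most efficient first
    free = {name: cap for name, cap, _ in ranked}
    placement = {}
    # Place the largest VMs first so they are not squeezed out later.
    for vm, load in sorted(vm_loads.items(), key=lambda kv: -kv[1]):
        for name, _, _ in ranked:
            if free[name] >= load:
                free[name] -= load
                placement[vm] = name
                break
        else:
            raise ValueError(f"no server has room for {vm}")
    used = set(placement.values())
    idle = [name for name, _, _ in ranked if name not in used]
    return placement, idle

if __name__ == "__main__":
    vms = {"web": 4, "db": 6, "batch": 3, "mail": 2}
    hosts = [("old-1", 10, 12.0), ("new-1", 10, 6.0), ("old-2", 10, 11.0)]
    placement, idle = consolidate(vms, hosts)
    print("placement:", placement)
    print("can be shut down:", idle)
```

A production scheduler would also weigh migration cost, memory and redundancy constraints; this sketch only shows the basic idea of filling efficient servers first.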
1.5. Market analyses
- When building a datacenter, a lot of components seem standardised yet still every datacenter is a new custom project.
- Companies like Yahoo and Google are sharing their general concepts, but never in a structured format that other organisations can reuse with the same effectiveness.
- Datacenter equipment vendors like APC, Liebert and Rittal have good solutions but they are badly integrated and mainly work at optimising the PUE factor.
So most people building a datacenter are stuck hiring an integrator or talking to 6 different vendors, hoping that a good solution emerges. This approach is fine, but experience teaches us that by using cloud principles we can achieve twice the power savings and up to five times the budgetary savings without sacrificing quality!
1.6. Some best practices used when Dacentec builds or helps to build a cloud datacenter.
- Avoid big, bigger, biggest: When scaling datacenters, we tend to start with heavier-duty and more powerful equipment to allow for growth. For example, it seems more logical to add 1 or 2 big generators instead of 10 smaller generators. The bigger generator may even bring some cost benefit. However, experience in managing datacenters has shown that no matter how much redundancy you build into your setup, failures still occur. The primary goal is to avoid a failure that shuts down the entire datacenter. Therefore it is wiser to spread risk evenly to reduce the possibility of a full-scale datacenter outage. The best way to design a datacenter that avoids a site-wide outage is to divide it into smaller autonomous compartments.
- Spread your investments better: By using smaller autonomous compartments, the financial burden is better spread over smaller investment increments.
- Stop using raised floors: Raised floors are often cited as a datacenter best practice. Based on our extensive experience, we have consistently proven that they are not needed. Raised floors can prevent an optimised airflow and can potentially lead to a messy network structure. Alternative methods can provide better cooling and better cable organisation. Avoiding a raised floor will also bring cost savings in the datacenter design.
- Avoid complexity: Datacenter design needs to consider two fundamental questions. Would I rather spend more on redundant equipment that is more complex and more subject to human error? Or do I prefer a less redundant, less complex alternative? In the first case, higher redundancy means it takes longer before an issue occurs, but history shows that when it does occur, the time to restore is much longer and the impact on customer satisfaction is bigger. In the second case, less redundant equipment can fail more often, but each failure is fixed quickly and at lower cost, so neither customer satisfaction nor SLAs are impacted! A typical example can be found in networking, where excessive complexity often leads to errors and longer downtimes. The simpler network might break, but when it does, datacenter operators don’t lose time looking for the issue and simply replace the failed component instantly.
- Provide an alternate infrastructure management path: Passing control and diagnostic information over the production network is a common practice. However, adding a low cost Zigbee network interface on each rack as a backup network interface offers considerable advantages for reliability and disaster recovery.
- Optimize building automation: Use building automation with the ability to turn off cooling, power, lights as needed.
- Intelligent equipment speeds: Using variable-speed motors, fans and pumps allows the system to scale up or down depending on a wider range of environmental factors, which also reduces energy consumption.
- Use Air Corridors: Use air corridors instead of installing air handlers within the racks. Using this method, all racks connect to a central air supply and return system that scales cooling on the back end without going near the racks. An additional benefit is that there are no liquids near the servers.
- Optimise power conversion: Supply low-voltage DC power to the equipment, removing the energy loss of power conversion.
- Cloud Datacenter Model: Offer customers raw capacity instead of co-location: raw storage, raw CPU capacity and raw bandwidth.
- Keep the focus: Focus on what is really important: a datacenter needs to be structured, neatly organised and secured, with audited procedures. A Datacenter does not need to look like a laboratory clean room and installation of equipment that serves little purpose should be avoided.
- Implement Hot Aisle containment: Today over-cooling is a fact in many datacenters. To deliver the best efficiency, the return air needs to be just cool enough to absorb more heat. New technologies like hot aisle containment offer an innovative solution.
- Introducing Plate Rack Design: For the last two years, Incubaid has pioneered a plate design, which changes the way CPU and storage capacity is deployed in a datacenter. It is outside the scope of this document and has not yet been formally disclosed, but this system saves an additional 50% on power and an additional 50% on the CAPEX required to build a datacenter. Moreover, this system can be retrofitted to most existing datacenter designs.
- Introducing Loadbalancing over Power Phases: Incubaid is developing a new technology to load balance servers over power phases in an automatic way, which removes complexity and improves uptime.
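To give a feel for what balancing servers over power phases involves, the sketch below assigns servers to the three phases of a feed with a simple greedy rule: take the largest consumer first and place it on the phase that currently carries the least load. This is a generic heuristic shown for illustration only; it is not the Incubaid technology mentioned above, whose design is not described in this paper. Server names and wattages are hypothetical.

```python
import heapq

def balance_phases(server_watts, phases=("L1", "L2", "L3")):
    """Assign each server to the least-loaded phase, largest draw first.

    server_watts: {server_name: measured watts}
    Returns (assignment, totals) where totals maps phase -> watts.
    """
    heap = [(0.0, phase) for phase in phases]  # (current load, phase name)
    heapq.heapify(heap)
    assignment = {}
    for name, watts in sorted(server_watts.items(), key=lambda kv: -kv[1]):
        load, phase = heapq.heappop(heap)      # least-loaded phase so far
        assignment[name] = phase
        heapq.heappush(heap, (load + watts, phase))
    totals = {phase: load for load, phase in heap}
    return assignment, totals

if __name__ == "__main__":
    draws = {"s1": 500, "s2": 400, "s3": 300, "s4": 300, "s5": 200, "s6": 100}
    assignment, totals = balance_phases(draws)
    print("assignment:", assignment)
    print("load per phase:", totals)
```

Keeping the phases evenly loaded avoids tripping a single phase while the others still have headroom, which is one of the ways an unbalanced rack causes avoidable downtime.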
1.7. Tips to optimise Datacenters TCO
- Don't build UPS rooms, there are alternatives!
- Change the way power is being distributed!
- Measure power usage at a server level, find broken power supplies, and find polluters!
- Look at free cooling; this is the only way to have a very low PUE!
- Look at the load side; most savings can be made by changing to servers that draw less power!
- Build in smaller increments: commodity-based equipment is often cheaper and draws less power. Using smaller increments means you do not always need to invest in datacenter equipment capable of higher loads!
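Measuring power at server level makes it possible to spot polluters automatically. As a minimal sketch, the function below flags servers whose measured draw is well above the median of comparable machines. The 25% tolerance and the sample readings are assumptions for illustration; a real deployment would tune the threshold per hardware model.

```python
from statistics import median

def find_polluters(readings, tolerance=1.25):
    """Return servers drawing more than tolerance x the median draw.

    readings: {server_name: measured watts}, ideally for comparable hardware.
    """
    if not readings:
        return []
    limit = median(readings.values()) * tolerance
    return sorted(name for name, watts in readings.items() if watts > limit)

if __name__ == "__main__":
    # Hypothetical per-server measurements in watts.
    measured = {"a": 200, "b": 210, "c": 190, "d": 320, "e": 205}
    print("polluters:", find_polluters(measured))
```

The median is used rather than the mean so that one badly behaving server does not shift the baseline it is being compared against.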
2. Get up to speed now!
Incubaid developed the first Green Cloud Data Center Initiative (GUCI). Incubaid is an incubator that creates or helps companies to create technology for the cloud.
Incubaid is staffed by a datacenter expert team that has helped build over a dozen large-scale datacenters during the past two decades, and has developed cloud software companies, datacenter software, and high-end and mid-market large-scale hosting companies.
From that experience Incubaid has learned that the significant improvements offered by a cloud solution can only be achieved when the entire cloud ecosystem is considered: Datacenter, Power Management, Physical infrastructure, Management Platform, User Interfaces.
There are 2 approaches followed by companies today to build clouds.
- The best of breed approach where companies try to mix and match 20+ products to create a cloud. This is a complex and not cost effective approach. These clouds are not well integrated and are not disruptive. Many big vendors try to use their products built for the enterprise in the cloud world. This does not work well.
- The disruptive build using a ground up approach. Only a few players like Google, Amazon, and Facebook are making progress but these clouds are not generic and are used for specific use cases.
Over the years Incubaid has proven that there is a way to build a cloud that is more reliable and cost effective. To achieve this aim, Incubaid created companies that each focus on one specific domain of the cloud stack. Each company challenges the best of breed philosophy by creating new technology concepts that replace existing technologies to solve many of the pain points for cloud adoption.
The Incubaid cloud concept is built from the ground up, with many elements created from scratch. The team at Incubaid started working on many of these elements 15 years ago, when no one was talking about clouds.
Incubaid does not keep its datacenter design technology secret like most of the big players. Instead, Incubaid helps other companies to build greener and more cost effective datacenters!
Contact Incubaid now: www.incubaid.com
|