Hardware Tradeoff - CustodesTechnologia/System GitHub Wiki

Bolter and Chainsword

Current setup

Current hardware is an Intel Xeon E3-1270. Its average CPU Mark is 5395 (Link to CPU Marks).

Variables

There are really three ways to host the site.

  1. Co-location. The organization buys and owns a machine, which is installed in a datacenter. The datacenter provides power and network. Billing covers the rented space, power, and network use. In-person access is granted by the landlord.

  2. Rent dedicated hardware. This involves selecting a hardware profile and renting the hardware from a vendor who already has a network datacenter. The cost of the rental includes the power, network use and the machine.

  3. Cloud based. This is similar to the rent-dedicated hardware except that the load exhibited by the machine is a factor in the cost. The higher the load on the machine, the higher the cost. The cost is also affected by network use. The higher the network use, the higher the cost.

Pro/Con

| Type | Pro | Con |
|---|---|---|
| Co-locate | Easier to tailor HW to specific needs | Very expensive up-front cost |
| Rent | Options for HW, low cost, bandwidth included | 1-2 yr contract to get best price |
| Cloud | Billing a factor of use, flexible configuration | Difficult to customize, entry-system expensive |

These days, no one in their right mind would go to the trouble of co-location. The hardware is obsolete by the time it's deployed, and the cost of buying that hardware is prohibitive compared to likewise capable rent/cloud systems. It's next to impossible to build a machine more powerful than the same HW profile rented.

The choice is really between renting a dedicated server and using a service within the cloud.

There is a lot of allure about the cloud based solutions, but there are some drawbacks to be aware of.

Cloud based computing is not a sliding scale for billing. Each type of cloud based computing system comes in discrete steps of size and performance. The best analogy is the difference between a row-boat, a yacht, and a ferry.

  1. Each is a form of transportation.
  2. Each has a maximum load capacity.
  3. Each can be partially used.

A row-boat type cloud system is cheap, but there's not a lot of capacity. I'm not billed a lot because the system cannot carry much load to begin with. Also, the network bandwidth assigned or available to the "low cost" solution is typically limited in accordance with the size of the computing system. It makes no sense to expect to push excessive bandwidth through limited hardware and still process that data.

That's why there are discrete levels of computing performance. The next larger "size" (yacht, in this analogy) is more capable. It has much more capacity and can carry a good load, but when it's not in use as much, the space and power go to waste. However, you're still billed at the "yacht" rate. Even if the use goes down to "row-boat" level, the baseline cost stays at the "yacht" level. Why bother with the row-boat, then? What developers do is characterize the computing load they have on average. They take a moving average of the load (pick a metric) and choose hardware whose median capability sits on that figure. They weigh the excess capacity the system offers against the likelihood of the nominal load reaching that limit. But regardless of how they calculate their load and capacity metrics, they are still billed at the "yacht" rate no matter how low the use is.
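The sizing argument above can be sketched in a few lines of Python. The tier names, capacities, and monthly prices are made-up placeholders, not vendor figures; the only point is that the bill is a step function of the chosen tier rather than of actual use.

```python
from statistics import mean

# Hypothetical tiers: (name, capacity in arbitrary load units, flat monthly price).
# These numbers are placeholders for illustration, not vendor quotes.
TIERS = [
    ("row-boat", 10, 20.0),
    ("yacht", 100, 150.0),
    ("ferry", 1000, 900.0),
]


def moving_average(samples, window):
    """Return the simple moving average over each full window of samples."""
    return [mean(samples[i - window:i]) for i in range(window, len(samples) + 1)]


def pick_tier(peak_load):
    """Pick the smallest tier whose capacity covers the expected peak load."""
    for name, capacity, price in TIERS:
        if peak_load <= capacity:
            return name, capacity, price
    raise ValueError("load exceeds the largest tier")


def monthly_bill(tier_price, actual_load):
    """Flat tier billing: the price does not depend on actual_load."""
    return tier_price


# Example load samples (arbitrary units); the short spike forces the larger tier.
load = [4, 6, 5, 7, 40, 60, 8, 5]
smoothed = moving_average(load, window=2)
name, capacity, price = pick_tier(max(smoothed))
print(name, monthly_bill(price, actual_load=5), monthly_bill(price, actual_load=95))
```

In this sketch most samples would fit the row-boat, but the smoothed peak does not, so the yacht price applies to every month regardless of how quiet the rest of it is.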

Last, we come to the "ferry": a huge, capable vessel that can carry all the load you can imagine. Like the "yacht", though, you're paying a premium for the capacity even though you may not use all of it.

The costs for cloud based operations will exceed the costs of dedicated hardware, because matching the baseline capability of even modest rented hardware requires a cloud equivalent that is more expensive.

So, what are people using the cloud for? Why the trouble? Why pay the expense? For some applications the distributed nature of cloud services is attractive: the service is available across the net, and replication lets it keep operating even under adverse network conditions. But some services aren't optimized for that. Our system, for example, is strongly tied to the operation of a single database, and that database is not designed or optimized to be distributed. Thus it does not really add benefit to distribute the 'web function' of the service across a cloud, since every action must still be resolved at that one database.

So, in our case, the cloud is the least favored approach. It's expensive and it's overkill for the type of service we want to support. If the database were designed to adapt to the distributed nature of the cloud, and if the cost to distribute the web-based service (think size of boat) were more affordable, it would make sense. But at our niche level, it doesn't.

That leaves rented HW. Unlike the other two options, the real benefit of the rented hardware solution lies in the bandwidth afforded to the system. Dedicated hardware rental data centers are usually on very good backbones, so a partner that has a 100Mbit/s backbone with peer networks is not difficult to find. This "fat pipe first" approach means that with dedicated rental, the network is already at the "ferry boat" level of bandwidth, without the cost of the ferry. With rental you can have the right-sized system with the ferry-sized network. That's a big plus.

The bottom line is which kind of platform gives the best bang for the buck. Choosing a cloud based solution turns any and all activity on the server to maintain the site into a cost. Making backups, moving data, and experimentation do not come for free. Every cycle costs money. For a heavily moderated system such as this, those costs will add up. Having a dedicated "ferry-boat" sized capability at a "row-boat" cost means the site is not billed for administrative and routine actions.
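A rough way to see the effect of metered billing on routine maintenance is sketched below. Every rate and workload figure is a hypothetical placeholder, not a quote from any vendor; the sketch assumes the cloud bill has a baseline fee plus per-use charges for compute and network, and that the rental is a single flat fee with bandwidth included.

```python
# All rates below are hypothetical placeholders chosen for illustration.
FLAT_RENTAL_PER_MONTH = 100.0   # dedicated server: one fixed fee
CLOUD_BASE_PER_MONTH = 150.0    # cloud: baseline fee for the chosen size
CLOUD_PER_CPU_HOUR = 0.05       # cloud: metered compute
CLOUD_PER_GB_TRANSFER = 0.09    # cloud: metered network transfer


def cloud_cost(cpu_hours, gb_transferred):
    """Baseline fee plus metered compute and transfer."""
    return (CLOUD_BASE_PER_MONTH
            + cpu_hours * CLOUD_PER_CPU_HOUR
            + gb_transferred * CLOUD_PER_GB_TRANSFER)


def rental_cost(cpu_hours, gb_transferred):
    """Flat fee: administrative work does not change the bill."""
    return FLAT_RENTAL_PER_MONTH


# Assumed routine admin work in one month: backups plus some experimentation.
admin_cpu_hours = 60
admin_gb = 500

extra_cloud = cloud_cost(admin_cpu_hours, admin_gb) - cloud_cost(0, 0)
extra_rental = rental_cost(admin_cpu_hours, admin_gb) - rental_cost(0, 0)
print(f"extra cloud cost for admin work:  {extra_cloud:.2f}")
print(f"extra rental cost for admin work: {extra_rental:.2f}")
```

Plugging in real quotes for the four constants would turn this into an actual comparison; with the placeholder values it only shows that the admin workload adds to the metered bill and leaves the flat one unchanged.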

Last, the sheer amount of customization and setup that makes a site simple to maintain is easier to do when the box is either co-located or rented. Co-location was never a real possibility, as explained above. Cloud options also limit the degree of customization that is possible. It really comes down to choosing a dedicated server rental that has the right amount of processing power and gaining the benefit of the fat-pipe network that comes with it.

Options

  1. https://us.ovhcloud.com/bare-metal/advance/adv-1/

  2. TBD. I will need to keep digging around for comparable options, but the "advance-1" scale and baseline CPU performance is what should be considered. Its benchmark is around 15000, compared to the 5500-6000 benchmark of what the site currently runs. I'd prefer Intel over AMD processors. They are machine-instruction compatible (x86-64), but their silicon design principles are different. I've seen (anecdotally) that Intel based silicon is the better performance decision over AMD.

  3. TBD. I also would advise against SSD. Today's HDDs are fast, reliable, and cheap enough for what we want to do. Our site does not benefit from SSD speed; it benefits from having large space available to store media. The speed improvement of SSD is offset by the cost of equal-sized storage in SSD compared to HDD.

  4. TBD. Last (again), the backbone the datacenter is connected to is a very important selection criterion. Geographically, we want a data center that is on a well-routed path. East coast is generally good. North-midwest can be good and has a good route to the EU (as does the East coast). West coast is obviously well routed, but the span across the NA continent adds a tiny bit of latency that would not be there if the site were hosted in the North or East. I'm ruling out hosting in the EU or Asia.
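To put the CPU target from item 2 in perspective, the headroom can be expressed as a simple ratio. This is only back-of-the-envelope arithmetic using the two figures quoted on this page (5395 for the current Xeon E3-1270 and roughly 15000 for the "advance-1" class); the variable names are illustrative.

```python
CURRENT_CPU_MARK = 5395      # Intel Xeon E3-1270, from "Current setup"
CANDIDATE_CPU_MARK = 15000   # approximate figure quoted for the "advance-1" class

ratio = CANDIDATE_CPU_MARK / CURRENT_CPU_MARK
print(f"candidate is about {ratio:.1f}x the current CPU mark")
```

Any other vendor's offering can be compared the same way by swapping in its benchmark score.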

Homework

The homework assignment is to find vendors of rented silicon that address these coarse-grained objectives for the hardware platform of the site.

I have about 6-7 years of background with OVH. There are others out there that are comparable. Let's find them.

Addendum

I did have a conversation with an AWS sales engineer about this approach of using the cloud, and the message I got from them was that billing is not a sliding scale. One has to choose the initial "size" of the computing unit up front, and the minimum billing is based on that. I will still try to track down a more definite answer from AWS on this question. At first I was not keen on their answer, but perhaps there is some product in their offering that actually serves our goals.