Interesting Statistics -- Recently I was reviewing our logs and statistics to better optimize our hardware, and a few things stood out to me. First off, our web servers are currently averaging almost 5 million hits per month. This has resulted in an average bandwidth of just over 8 Mbps. A few more interesting statistics: we have a total of 110,000 arrests and releases for the calendar year to date, and we have taken 245,000 mugshots this year. Our downtime for BOSS has been a total of 16.23 hours year to date, with 11.4 hours of that during one outage last week. FM, on the other hand, has a total of 14.18 hours of outages for the year. That number would have been higher, but FM was fortunate enough not to experience the 11.4 hour outage that BOSS experienced last week. These figures do not include the scheduled outages. Our goal from this point forward is to reduce the outages to as close to zero as possible.
Current Events -- Over the past couple of weeks we have been experiencing several problems with our servers. I am sure some of you are wondering what is going on: we have been rock solid since March, and then all of a sudden things started having issues.
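To put the downtime hours above in perspective, here is a minimal sketch in Python of how they translate into an uptime percentage. The function name and the 300-day "year to date" period are assumptions made purely for illustration; the exact reporting period is not stated above, so the printed percentages are only approximate.

```python
def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Return the percentage of the period during which the service was up."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must be between 0 and period_hours")
    return 100.0 * (1.0 - downtime_hours / period_hours)


# ASSUMPTION: "year to date" is taken as roughly 300 days.
# Adjust this to match the real reporting period.
ASSUMED_PERIOD_HOURS = 300 * 24

if __name__ == "__main__":
    # Unscheduled downtime hours as quoted in the statistics above.
    for name, hours in (("BOSS", 16.23), ("FM", 14.18)):
        uptime = availability_percent(hours, ASSUMED_PERIOD_HOURS)
        print(f"{name}: {uptime:.3f}% uptime")
```

Changing the assumed period moves the result only slightly, since the downtime is a small fraction of the year in either case.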
I realize that these problems have been a nuisance, and have been frustrating, but let me assure you that things are going to be much better once we are finished with all these changes. Let me outline some of the things that are happening, and how they have affected our service, and just why things will be better.
Two Service Outages -- There were two outages in late September and early October, and they happened within 10 days of each other. These were the catalyst for the events that have followed. The data center that I have been using for the past several years in Euless was owned by CI-Host. I have never really been happy with CI-Host, but their rates were by far the lowest in the Dallas area, and their bandwidth was decent. Their level of professional service to their customers was severely lacking, and their Data Center was plagued by power problems that caused short outages off and on.
Well, in September CI-Host announced that they were moving all of the servers in the Euless Data center to a new Data center in Dallas. I took this opportunity to get quotes from other co-location companies in the area with the intention of moving my servers out of CI-Host and over to a new company.
Thursday October 22 -- I had a single server hosted in CI-Host's family colo which supplies all of the public Web Sites used by our software. This server also acts as a sort of traffic cop for our BOSS Software, telling it where each site's databases reside. Well, CI-Host decided to move this server without notifying me that they were going to move it. The move should have taken about 2-3 hours. Due to poor planning and documentation on their part, this server was offline for almost 12 hours. Unfortunately it also took BOSS offline for the full 12 hours.
I was livid when I was finally allowed into the new data center and found that they had simply plugged the network cable into the wrong port on the computer, and that this was why we were down. While this was frustrating, it did give me the opportunity to see what the new building looked like, and see how things were built. All personal issues aside, the new building looks to be very well done, and I came away confident that it would be better than the old one by far.
Friday October 23 -- After spending some time calming down, I got to thinking about how to best handle the overall situation, and decided that it is better to stick with the devil you know than the one you don't. I decided to hedge my bets on this, though, and take precautions to cover myself.
I went ahead and signed a contract for a new Data center which will be set up and ready for use in November. In the meantime I still had to move my existing rack of servers that CI-Host was hosting, so I began preparations for this move next.
Saturday October 24 -- While at the old Data center tagging all of my servers and removing some unneeded servers, I accidentally unplugged the FM 4.5 Web Server, which caused it to reboot when I plugged it back in. This brought a very important discovery to light. The server did not fully reboot; it turned out that some of the Windows files were corrupted and the machine would not start. So I temporarily moved the customers over to the backup web server and got them back online, and took the computer to Keith to repair.
How was this a good thing? Well, by finding it before the move, we were able to rebuild the server and deploy it during the upcoming move.
Sunday October 25 (1pm) -- At this time we shut down all of our servers, packed them up, loaded them into my Jeep, and headed to the new Data center.
(1:45) -- We set all of the servers in the new rack and began connecting them and bringing them all online. At this time we also brought our two new virtual servers online.
(3:00) -- We finally had all of BOSS online, we had all of FM 4.5 online, and we had all of FM 5.4 online with the exception of the Mugshot server, which had another bad hard drive.
(4:30) -- At this point we had the full system online including the FM 5.4 Mugshot server.
We had been running both the FM 4.5 Report Server and the FM 5.4 Report Server on a temporary Virtual Server as a testing platform at the old Data Center. This experiment had been very successful, and is why we decided to deploy two dedicated new virtual servers to the new data center at this time.
We moved the two Report Servers to the new virtual servers. The original Report Server machines were VERY old, and were on their last legs. In fact one of them had already failed twice before, and we had band-aided it until we were able to get the Virtual Server set up and functioning.
I also started the process of virtualizing the FM 5.4 Security Server which was running on the oldest server in the rack. This completed and I tested it remotely to make sure that everything was going to run fine.
Monday October 26 -- WOW, what a day. It was very busy with lots of small issues that Julie and I were able to work through pretty easily, but there were sure a lot of them. I was able to order all of the replacement hardware for the server problems that we found over the weekend.
Tuesday October 27 -- I went to the data center and replaced all of the hardware that needed to be replaced, and Murphy popped up again: either one of the hard drives that I ordered was bad, or I have another problem that I haven't found yet.
I also took the time to bring the FM 5.4 Security Server up on our Virtual Server which allowed me to feel much more secure in this machine.
Wednesday October 28 -- Until this day I really did not believe that it was possible for me to go through so many messed up things in my life. We went to add a new user into FM 5.4 only to find that there was an error, and this error was with the server itself. I initially felt that this was caused by the Virtual Server, so I made plans to take the old server back to the Data Center and take the Virtual Server version offline.
Thursday October 29 -- This morning I woke up at 2:15 from a dream about this Security Server. In the dream I took the old server back to the Data center and it did not work either. This prompted me to pull the computer out of the car, set it up, and test it, and sure enough it wasn't going to work.
I then started researching the problem and narrowed it down. It turned out to be due to how the server was built in the first place back in March. When we took over FM from Securus we had to basically make a copy of the Security Server that they had, because they were using theirs for the entire corporate network. When we brought it online in our data center on March 17th, we did not properly set it up, and it finally timed out and locked down so that it would not allow adds to it.
After fixing this problem the system started working properly under the Virtual Server, so now I am confident that it will run perfectly from now on. With it on the Virtual Server we also have the ability to back it up correctly so that we won't lose information in the future.
How is all this positive? I realize that so far many of these things sound bad, but simply put, this move allowed us to see many of the weak points in our setup and correct them, as well as give our servers a good solid review and see what was finally failing. I like to look for the positive things in every bad thing that happens.
I'm not really upset that there were issues during this move; the bad hardware was expected. The downside to running your equipment in a data center is that it frequently goes months without someone personally looking over the equipment for failures. It's pretty common to show up at a data center and find that a hard drive has failed or something is flagging an alert for you to look at. That's why everything is mirrored and redundant.
For me this was the first time since March that I have been able to do a solid review of the equipment and see what needed to be replaced and what was still in good shape. I've now identified a number of things that I will be upgrading or replacing, and the parts will all be ordered within the week.
By using Virtual Servers I was able to reduce our rack of computers from 12 physical machines to 7 without harming performance at all. This also has the positive effect of reducing our power requirements and being a little better for our environment. And probably the biggest positive thing about this is that we are now able to maintain a redundant backup of all of the virtual sessions, allowing us to literally restore a crashed server in a matter of minutes by simply shutting down the bad session on one server and loading the backup on a second server. This is as simple as shutting down Microsoft Word and loading Microsoft Excel.
The new Data center -- The new Data center will be entirely run with Virtual Servers, which will allow me to do more for less. These servers will be used to bring on a number of new functions, including our "Cloud Computing".
Currently BOSS resides in two physically separate Data centers in the Dallas area. When we bring the new Data center online in November, BOSS will reside in three physically separate locations. This will not change much for BOSS, but it will allow me to spread FM across two separate data centers with its actual data residing in all three centers. This will give FM a level of redundancy that it has never had before, not with me or with Securus.
The virtual servers that I have been talking about will also be a positive thing for FM. By turning physical machines into virtual machines, I am able to take the hardware from low-volume, low-usage servers such as the Mugshot and Security servers and redeploy it into more intensive roles, while combining the lower-usage roles onto single machines to make better use of the processors and hard drives. The virtual servers can also act as redundant backups for the more intensive servers should something happen that takes them offline. While the intensive servers won't function as well on virtual servers as they do on their dedicated hardware, the virtual servers will still allow us to continue to run while we repair any failed system, and save us from keeping a completely built copy of the hardware on standby.
New items in our future -- We have a ton of new plans on the burners for the near and far future as well. We will be incorporating our SMS messaging into the BOSS and FM products, allowing you to set up alert conditions for events that will send out email or SMS messages to you when they happen in the software. We are also incorporating scheduled tasks into both systems. This will allow you to schedule repetitive tasks such as running, emailing, or faxing daily reports to other agencies.