How we moved (and fixed) a Wordpress site with 100k monthly views from BlueHost to DigitalOcean

tldr; we made a previously unusable student newpaper website usable with minimal downtime!


cheering_minions

One of the things I really like about the software world is that there are a lot of opportunities to do pro-bono work, often with some interesting technical challenges attached. This is one of those times :)

Background

Earlier this year I got an email from the Daily Free Press, Boston University's Independent Student Newspaper, requesting help with their website. Their issues were simple but severe: page load speed was really, really slow, and it appeared that once every 4-5 requests a user would get an annoying "Error establishing a database connection" page. The database connection errors made it difficult to publish articles and created a terrible user experience for the DFP's ~40k active readers. I felt that this would be good practice for my new job (insert note here) and a good learning experience for all involved, so several members of the BostonHacks tech team and I decided to take ownership of the DFP's web presence and dove in.

Forensics + Fighting BlueHost

Like any other confused sysadmin, the first thing we did was gather information by looking at the deployment configs and logs to try to diagnose what was wrong. The existing site was running on BlueHost's "Optimized Wordpress Hosting" plan (oh, the irony). We were able to get access to DFP's BlueHost account and poke around, with some difficulty - the BlueHost interface (mostly cPanel, with some extra custom configs) was very slow to load (likely because it had to query the barely-breathing server for info). After some finagling we were able to SSH into the server and poke around.

Hard Drive Investigation

The site was running on a standard linux box with 2GB of RAM and a 30GB hard drive. Using df -h, we were able to find out that an additional 30GB backup hard drive was attached, as well as two more 30GB hard drives, with various (wonky) levels of usage:

Screen-Shot-2018-01-30-at-12.40.55-AM

DFP's wordpress installation was in /home/dailyfr3/public_html, and while the drive was mostly full (25/30GB used), it wasn't close to being 100% full. The two other 30GB drives, mounted on / and /home/dailyfr3, weren't anywhere near full either. However, the /backup drive was 100% full, and upon checking the BlueHost cPanel we realized that someone had enabled four-times-daily backups on the entire /home/dailyfr3 directory, which had ~25GB of stuff in it... meaning the backup hard drive was definitely rejecting 100GB of disk writes a day. We turned off the backups and tried to delete stuff from /backup, but since BlueHost doesn't give you root access on their Optimized Wordpress plan we weren't able to delete anything from the full drive. Site performance stayed the same, though, so moved on.

CPU/Memory Investigation

Using top, we inspected the server's memory and CPU usage. Here's a screenshot:

Screen-Shot-2018-01-30-at-12.36.37-AM

Three things stand out here: the high RAM usage (1.6/1.9GB), high CPU usag by mysqld (spikes up to 95%...grabbed this screenshot at a relatively high 83%) and, most importantly, huge swap usage by mysqld - you can see in the "VIRT" column it's using 2.2GB of virtual memory (swap), which is greater than the size of server RAM. Swapping is a symptom of an OS running out of memory and deciding to use disk space as backup emergency memory. On a MYSQL DB powering Wordpress, it's especially bad, because Wordpress/PHP has no concept of DB "connection pooling" like some other languages and creates and destroys a new database connection for every query. This causes a pretty big memory initialization on every page load, which, if the OS is out of memory and falling back to swap, means that DB connection has to be paged in and out of RAM from disk, which is a horribly, horribly slow process, meaning queries take forever. After there is a long enough queue of queries to be processed Wordpress will give up waiting for a connection and kick back a "Error establishing database connection." Voila. The culprit.

MYSQL Investigation

With the swap issue in mind, we decided to poke around the MYSQL installation to see if there were any anomalies there. In doing so, we found that DFP had three active websites (dailyfreepress.com, blog.dailyfreepress.com, and hockey.dailyfreepress.com), each with its own separate database and its own separate Wordpress installation. Basically, DFP were running three separately managed Wordpress sites on one box, which was contributing to the high memory/swap usage. We were also not sure if this was a shared server instance - it's possible that there were other users using the same MYSQL DB and server, but we weren't able to find out because we didn't have root access to the server.

Sample mysql connection issue (we had this and the dreaded "Cannot create database connection")

Screen-Shot-2018-02-14-at-1.31.13-PM

We also weren't able to look at the mysql logs (or any logs in /var/logs) because of the lack of root access. We did notice that while both Apache and nginx were installed in /etc/, only nginx was running. With all this in mind, we decided to migrate to a clean install on DigitalOcean, where we would have root access, better monitoring tools and overall a much more reputable hosting service. (And we would save DFP a moderate amount of $$$ - cutting their bill from $30 to $7/month.)

The Great Migration


Bird Migration


It was important to DFP that we do the migration with minimal downtime, so we decided to setup a staging environment on DigitalOcean with a clean Wordpress install, and to copy five years worth of content to this install. After configuring everything, we would shift traffic to the DigitalOcean server and turn off the old one (bye-bye, BlueHost!).

Switching DNS to Cloudflare

There were a few access logs from the old server that showed bots attempting to pwn the Wordpress admin login page, which also didn't have TLS/SSL enabled. To protect the new server and provide some cover to the old one in the process, we imported DFP's DNS settings and started proxying traffic through CloudFlare. This gave us the ability to quickly shift traffic later on, i.e. in moving the staging server to production, as well as some basic TLS/SSL protection.

Previous site login page w/o SSL:
Screen-Shot-2018-01-30-at-12.40.24-AM

Backing up the DB and the Images

We tried to get a DB backup via phpMyAdmin but the BlueHost control panel failed us. Instead, we generated a database dump for each of DFP's sites using mysqldump. Altogether, the DB dump files were only about 1GB and were only three files, so we just scp'd those over to the new server.

The images were a bit more tricky - there were 23GB worth of images/gifs that DFP had collected and uploaded over the years. After attempting and failing to compress the images into a tarball, we decided to just rsync the images between the servers, which took a few hours.

Setting up nginx, Wordpress and LetsEncrypt

While the content was being copied over, we pushed a clean install of Wordpress to the staging server. Because nginx is a bit more performant when it comes to memory usage (and we had more experience with it) we decided not to use it in lieu of Apache for our proxy. We also setup TLS/SSL certs on staging using the awesome LetsEncrypt/Certbot project. TLS/SSL would help protect the admin login pages as well as increase DFP's SEO ranking.

Importing the Main DFP site

With a clean Wordpress install on the new server, it was time to import the content from the dump files. With a little trial and error, we were able to import the main DFP site. For everything to work, we needed to change some prefixes (the original site had a Wordpress prefix of "wp_main", which we had to change to "wp_") and we needed to change the URLs of the images, which were prefixed with "http://dailyfreepress.com" instead of the new, secure "https://dailyfreepress.com".

I had the bright idea to try to use Visual Studio's find-and-replace tool to convert these prefixes, which prompty bricked my computer for ten minutes when I tried to run find-and replace on a 0.7GB long text file. Fun fact: sed will do the job in less than 10 seconds without crashing your computer.

With the prefixes converted, we ran a MYSQL import on the modified dump file and everything worked!

Creating a wordpress network and connecting the DFP Blog and DFP Hockey sites

Instead of having three separate Wordpress installs (yes, three separate folders with the same source code) like the old server, we wanted to use Wordpress' "network" feature where we could manage multiple sites from the same admin panel. Through a little more trial and error, we were able to enable this feature and create a linked DFP Blog and Hockey site via Wordpress admin. After creating the linked sites, we had to get the content from the old server into the DB, which we were able to do after finagling with sed and Wordpress prefixes. Ta-Da!

Post-Mortem


Root cause: Wordpress/PHP had issues connecting to a shared MySQL database instance because of high RAM and swap usage.

Solution: Clean LAMP and Wordpress install. Also added logging and site backup to our new DigitalOcean setup!

If you've read this far, thank you :) and if you're a BU student interested in working on fun problems like this for other BU students, shoot us an email at dfp [at] bostonhacks.io :)

Thanks to Noah Naiman and Ken Garber for their help in troubleshooting the above.