I am currently interning as an Infrastructure Engineer at MeetUp.com, the company that helps organize meetups around the globe. Alongside this, I build a lot of projects of my own and support and maintain projects for other people and companies.
At one of my previous employers, I had a hard time migrating a site from one server to another. More context on that later. June 30 was a BFF, or Backlog-Free Friday, when engineers are allowed to work on anything they like related to the company. That morning, after I arrived, Ian asked me where I was with migrating the site, since I had discussed the issue with him earlier. He said I was free to work on it since it was a BFF and not much was going on that day. Pretty cool of him to say that.
So here is everything that went down:
It is a WordPress site. A heavy WordPress site. It gets a couple of thousand visitors per month, enough to drive the company's revenue from Google ads and keep it running the entire year. The site was hosted on Hostgator, and the new server it was to be migrated to was a different server from the same company. But they were unhelpful when I approached them about the migration, since they were losing money as we were not renewing the contract for the old server. Bummer.
The site uses an SSL certificate issued by Comodo. Everything had worked fine until now. The only issue was that the certificate was going to expire on June 30th. In fact, it had already expired that same day: the certificate was issued in India and I hadn't accounted for the time difference, me being at the headquarters of MeetUp.com. Users started seeing a big red warning that the site was not secure. The bounce rate went through the roof and the company was bleeding money as Ian and I were talking about the issue.
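The time-difference trap is easy to reproduce. A certificate's expiry (its `notAfter` field) is recorded in GMT, so the same instant falls on different calendar dates depending on where you are sitting. Here is a minimal sketch in Python. The expiry timestamp is made up for illustration (the real certificate's exact expiry time isn't known), and the two fixed offsets stand in for India (UTC+5:30) and a US East Coast summer (UTC-4):

```python
import ssl
from datetime import datetime, timedelta, timezone

# Fixed offsets used for illustration only (assumptions, not from the site).
IST = timezone(timedelta(hours=5, minutes=30), "IST")
UTC_MINUS_4 = timezone(timedelta(hours=-4), "UTC-4")

def expiry_in_zone(not_after: str, zone: timezone) -> datetime:
    """Convert a certificate 'notAfter' string (always GMT) to a local time."""
    epoch = ssl.cert_time_to_seconds(not_after)
    return datetime.fromtimestamp(epoch, tz=timezone.utc).astimezone(zone)

# Hypothetical expiry: 03:00 GMT on June 30.
not_after = "Jun 30 03:00:00 2017 GMT"
for zone in (IST, UTC_MINUS_4):
    print(zone.tzname(None), expiry_in_zone(not_after, zone).isoformat())
# IST 2017-06-30T08:30:00+05:30
# UTC-4 2017-06-29T23:00:00-04:00
```

With these made-up numbers, a certificate that "expires on June 30" in India is already dead on the evening of June 29 on the US East Coast.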
What I had tried before:
People say that migrating WordPress sites is easy. It is, but not if you have built a ton of shit on top of the platform. You have no idea what might break what. Since the developers hired by my previous employer didn't have much idea of everything I had done, it was advisable that the person who built the site should be the one to migrate it.
The issues I faced earlier:
I tried to migrate the site as any normal WordPress dev would. Take a backup of the site, database, content, etc. Redirect the nameservers to the new server. Deploy WordPress. Import the content and build the blocks. Easy peasy, right? Not quite.
Whenever I tried all this, the earlier SSL certificate caused issues. How? Well, I was using a CDN (Cloudflare) in front of the new server and getting a new SSL certificate issued through Cloudflare. What I came to know later is that the company issuing SSL certificates for Cloudflare was also Comodo.
When I did all this, 2-3 times, the site hit a 526 error (Invalid SSL certificate). I had no clue what was happening, since I don't think I am knowledgeable enough to troubleshoot these issues by myself.
Note: Users on the site were shown the big red page, but only for two days. On July 2, the old server was going to shut down because there was no way we were going to renew the contract. Keeping it afloat would just cost the company more money. Revenue, right?
So how did I solve it:
Ian and I did a lot of brainstorming when I actually sat down to do it. I admit I am reckless at times while carrying things out, but I make sure I fix and patch things back up if I break something. There was no fallback for breaking things now, since users weren't going to see anything after July 2.
We anticipated I could pull this off in 4 hours. (Or at least I thought so.)
– I started by recalling how I had built the site 2 years ago and everything I had done to make it robust to failure. Took a backup of all the minor additional files, which had to be done manually every time. Media, content, etc. were backed up again since I hadn't brought my personal laptop to the HQ that day.
– I first deleted all the SSL certificates on the old server. CSRs, keys, everything. (Just to be sure.) Flying blind. I know this shouldn't matter, but it does. Why? Because of Comodo, that's why.
– Once everything that wasn't needed or useful anyway was deleted, I changed the nameservers to the new server's first (not Cloudflare's). I created a test index page and kept refreshing to see if it was coming up. For 15 minutes I tried different browsers and cleared the cache, and still no luck. WTF? By that time I was pulling my hair out, thinking the new nameservers should have propagated by now and shouldn't be taking so much time.
– Some time back, I had used websites and tools like geopeeker, which load the website you feed them from different parts of the world. I tried it and voila! It was showing the test page. So why was I not able to see the test page? (Just to be clear, nothing had been migrated yet.)
– I thought maybe there was some issue with my own computer. The other engineers were busy with their own stuff, so I didn't want to bother them.
– I then routed the traffic through Cloudflare and requested a new SSL certificate anyway. The certificate was issued in 36 seconds. I kid you not. It took 2-3 hours earlier when I was trying this. (What I think is that the first one came directly from Comodo and this one came through Cloudflare, which might have taken some time to issue new ones earlier. Still not sure. But since there was no earlier certificate that Comodo had to check, it was quick.)
– I started pinging the site and expected the new IP to show up in the ping. I know the response times of the Hostgator servers and the Cloudflare servers, so I could tell where the traffic was coming from. For 20 minutes it was still giving me the old server. By this time I really had no clue what was happening, since I should have been getting a response from the Cloudflare server, never mind the new hosting server. (Which I should have checked.)
– There are more websites hosted on the new server, so I pinged one of them and got a response from a Cloudflare server. Wait, what?
So what was happening? Well, I was behind MeetUp's VPN and I guess some lookups were cached, because even after I restarted the machine I was still getting a response from the old server. I had even tried pinging the site from a different website and it was showing the correct server. That's what was bumming me out.
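Guessing the server from ping response times worked for me, but a less fuzzy check is to look at which IP the name resolves to on your own machine and see whether that IP belongs to Cloudflare. Here is a rough sketch in Python. The network list is a snapshot of Cloudflare's published IPv4 ranges as I remember them (treat it as an assumption and check their current list), and `describe` is just a helper name I made up:

```python
import ipaddress
import socket

# Snapshot of Cloudflare's published IPv4 ranges (assumption: may be outdated).
CLOUDFLARE_V4 = [ipaddress.ip_network(n) for n in (
    "173.245.48.0/20", "103.21.244.0/22", "103.22.200.0/22", "103.31.4.0/22",
    "141.101.64.0/18", "108.162.192.0/18", "190.93.240.0/20", "188.114.96.0/20",
    "197.234.240.0/22", "198.41.128.0/17", "162.158.0.0/15", "104.16.0.0/13",
    "104.24.0.0/14", "172.64.0.0/13", "131.0.72.0/22",
)]

def is_cloudflare_ip(ip: str) -> bool:
    """Return True if the address falls inside one of the ranges above."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_V4)

def describe(hostname: str) -> str:
    """Resolve a hostname with the local (possibly VPN/cached) resolver."""
    ip = socket.gethostbyname(hostname)
    origin = "Cloudflare" if is_cloudflare_ip(ip) else "not Cloudflare"
    return f"{hostname} -> {ip} ({origin})"
```

Calling `describe("example.com")` (substitute your own domain) once on the VPN and once off it would show directly whether the two resolvers disagree, instead of inferring it from latency.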
– I turned off the VPN, cleared the cache, restarted my laptop and pinged again, and I got a response from the Cloudflare server.
– I installed WordPress on the domain and it worked fine, along with the SSL.
– There is a plugin known as All-in-One WP Migration. It exports everything from one site so we can import it into another. I thought, wow, this is awesome. It did export everything from the old server... but... Cloudflare started throttling the upload and I couldn't upload more than 17.85% of the file. Never mind, I'll build it back myself.
– Imported the crucial stuff so that users don't see a blank page or hit a 404. I still have to edit the site for minor details, but it works! Lol
And that was it. In a matter of 1 hour. When I finally contacted Cloudflare about the issue, they told me that they didn't know exactly why the site had been showing a 526 before, but that it was mainly because of the two SSL certificates overlapping and conflicting. I have no clue what they mean by that.