About the Dec 26th Outage


On December 26th, OwnerRez experienced a database corruption issue with our cloud service provider.

We saw the issue right away and started working on it, but there was a domino effect that grew significantly beyond our initial expectations.  Our engineering staff were recalled from vacation, with some connecting in from overseas, to diagnose and resolve the problems as quickly as possible.

At present, there are no indications of any hostile activity or attacks, and no reason to believe that any previously gathered data has been lost. We are currently operating on a restored system, using up-to-date database backups taken at the time the issue started to develop.

We will be posting information and analysis (here in this same blog post), and holding a public webinar, as soon as we have a more thorough understanding of what went wrong and future preventive actions.

In the meantime, here is a list of common questions that have been coming in.  This list will grow over time as we add more.

Why did it happen? What caused it?

We are still investigating all the details, and there are puzzling issues we still don't fully understand, but what we can say is this...

This past Saturday, as part of a routine update, we upgraded some database servers with a new configuration that we had used extensively for many months.  The configuration was a very routine thing and, again, something that our engineers have used in their own environments (and elsewhere in testing/staging) for a while.  There was no reason to believe the update or new configuration would cause anything out of the ordinary.

Between Saturday and Monday, the new configuration began to corrupt data.  We're not sure exactly when or why (this is the part still being investigated), but several database areas became unresponsive.  The entire platform started to go down on Monday, and our remediation/failover processes started to kick in, but there was a domino effect: the remediation/failover processes were affected in the same way, and the system bounced from up to down and back again.

The investigation into the bad database configuration is ongoing.  By all accounts, it should not have happened.  Our cloud provider has service guarantees with us, so we are working with them on the investigation.

Were others affected, is this a widespread issue?

No, not that we know of.  This is isolated to something we did (updating a database configuration) to our database servers with our cloud provider.  But the obvious question is - what about other businesses that used the same database configuration?

Yes, indeed - we are wondering the same thing.  We purposefully watch to see if software updates by third parties are stable before upgrading our server configurations.  In fact, we purposefully waited on this database configuration (from back in March) because there were some known issues with it affecting servers.  We skipped over it and used a different configuration that was reported to be stable and in use by others, and we used that configuration in test/stage environments for several months before moving forward.  As mentioned above, we are very puzzled by why it happened and by all accounts, it should not have.

(And no, the Southwest Airlines debacle is not related to this. At least, we highly doubt it.)

Was any data lost? Doesn't restoring a backup mean some recent data was lost?

The short answer is: no lost data, but we are still investigating and there may be discrepancies with third parties.

We say "no lost data" because we have data redundancies in place that save data to multiple places at the same time.  When our primary systems go down, we still have access to other systems.  When we rebuilt and restored the underlying configurations, we took data from secondary systems that were up to date as of the time the issue happened.

After the recovery period finished, we went back and compared the new data with the corrupted data.  There were no discrepancies in terms of missing records.  In other words, no records were showing in the old corrupted version that didn't exist in the new recovered version.
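The comparison described above amounts to a set difference over record identifiers. Here is a minimal sketch in Python, assuming every record carries a unique ID. The function name and the sample IDs are hypothetical and are for illustration only; this is not OwnerRez's actual tooling.

```python
def find_missing_records(old_ids, recovered_ids):
    """Return IDs present in the old (corrupted) copy but absent from the recovered copy."""
    return sorted(set(old_ids) - set(recovered_ids))


# Hypothetical sample data for illustration only.
old_corrupted = [101, 102, 103, 104]
recovered = [101, 102, 103, 104, 105]

missing = find_missing_records(old_corrupted, recovered)
# An empty list means every record in the old copy also exists in the recovered copy.
print(missing)
```

Note that the check is one-directional: records that exist only in the recovered copy (like 105 in the sample) are not flagged, because the question is whether anything from the old version failed to make it into the new one.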

That being said, there are many channels and third parties that we connect with that also send data back and forth, and they could have "sent" or "received" some data that never got through during the outage.  So a channel or third party might show something different than OwnerRez does, which means the data is incorrect on one side or the other (or both).  Messaging could be another example of this.  Gmail and other messaging services watch volume and throttle accordingly.  When the final recovery effort fully kicked in, a large firehose of queued messaging went out (e.g. from you to guests, or from OR to you).  Gmail may have throttled or dropped some of that due to volume concerns on their side.  We have not seen signs of that, but it is possible.

Also, there were intermittent "up" periods, early in the recovery period, when you (or a third party) may have changed data that ultimately did not get saved when we did the final recovery effort.  So there too, data may be missing.

Was any sensitive data (credit card, phone number, PII) lost or taken?

No. This was not the result of a security breach nor any outside influence. Nothing was hacked or stolen.  This was a database configuration issue where a seemingly-insignificant update led to data corruption and then domino'd elsewhere as our fail-over processes started responding.

Don't you have redundancies in place to prevent this from happening?

Yes, many.  Our engineering team spends a lot of time, on an ongoing basis, managing and analyzing our infrastructure.  We have lots of tooling and logging in place that gives us visibility into errors, slow responses, spikes, partner issues, background queues, and much more.  OwnerRez is large and full of many moving parts, and in each of those moving parts, we store data where it can be replicated and archived stably as it grows.

Yesterday, we saw the issue immediately and our fail-over process kicked into gear, but the database configuration affected every remediation effort that swung into place.  We were forced to pause everything completely, get to the root of the problem, and rebuild configurations before turning the firehose back on.

The bad configuration was, itself, something we tested and used extensively for months beforehand, so we are still puzzled by (and investigating) why it happened.  But we made the decision to stop the failover/automation and roll back to a configuration that we thought would make a bigger impact.  We knew it would take a couple of hours, but we believed (and still do) that returning to stability was a better trade-off than fighting corrupt data in rolling cycles for what could be many days.

But we clearly have work to do. We do not want to shift blame or throw up our hands here.  We are committed to being your elite PMS and channel manager and that comes with certain expectations on your part and responsibilities on our part.  We are taking steps to learn from this and incorporate new processes for additional redundancies and better communication.  Some stark problems became very clear over the past 36 hours.

Does Airbnb automatically reactivate property listings, or must that be done manually?

During the outage, you may have received automatic emails from Airbnb stating that your listings were disabled or hidden because your software (i.e. OwnerRez) was offline.  Airbnb does this as a safety precaution so that guests do not double-book your property, since Airbnb has no way of confirming with you (i.e. via OwnerRez) that everything is okay.

Post-outage, Airbnb should have reactivated any hidden listings automatically, but you should check your listings and not assume this is the case.  When our system started communicating with Airbnb again, we confirmed that they were turning listings back on again by checking with specific listings and seeing that they were visible.  However, do not trust that this is the case across the board.  Check your listings so that you know for sure.  Our helpdesk has reported that some users are still seeing hidden listings.  That could be for specific reasons that don't apply to you, but everyone should still check that their listings are showing and the calendars are open and bookable.  We apologize for the inconvenience this places on you.

If you did not receive any "hidden listings" email from Airbnb, you should be fine.  Airbnb only hides listings if they can't send bookings through, not for other reasons.  However, if you're worried about it, we recommend that you still verify your listings on Airbnb to make sure they are showing and bookable.  It could be that it happened to you, but you didn't see the email or an email was never sent.

During the outage, I disconnected Airbnb. Now what?

During the outage, some users were recommending to each other to go into Airbnb and manually disconnect OwnerRez from the Airbnb side so that they could return to "platform mode" and deal with guests and bookings directly on Airbnb.  If you did this, there is no way to undo it from the Airbnb side.  You need to go into OwnerRez > Settings > API > Airbnb and reconnect the account from the OwnerRez side.  This is safe to do, and your previous connection will reactivate and merge in changes safely.

I did a full resync for Airbnb, but nothing changed.

Triggering a "full resync" for Airbnb does not happen immediately in OwnerRez, and it never has.  The request for the resync is put into the queue, along with many other sync actions, and is pulled out as the queue processes every few seconds.  Typically, in normal conditions, the queue is processed so quickly it can appear to be immediate, but it always follows the queue.

After the outage, the queues were filled with many waiting requests, so everything took time to process.  If you were clicking the "trigger full resync" option on the channel, it was put into the queue behind other things and probably took a while to process.
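To illustrate why a resync requested during a backlog does not run right away, here is a minimal sketch of a first-in, first-out queue in Python. The action names and the `process_queue` function are hypothetical; this only illustrates the ordering behavior described above, not how OwnerRez implements its channel queue.

```python
from collections import deque


def process_queue(queue):
    """Process sync actions strictly in arrival order (first in, first out)."""
    processed = []
    while queue:
        processed.append(queue.popleft())
    return processed


# Hypothetical backlog that built up during the outage.
queue = deque(["rate update", "availability update", "photo change"])

# A full resync requested now is appended behind everything already waiting.
queue.append("full resync")

order = process_queue(queue)
print(order)
```

In this sketch the full resync is handled last, after every action that was already waiting, which is why clicking the option repeatedly during a backlog does not speed anything up.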

Post-outage, we have been monitoring all of our queues to watch many types of requests process - bookings, channel syncs, messaging, and so on. If you are waiting for a message to be delivered or a channel sync to go through, make sure to give it some time to catch up.  If it's been a while (i.e. an hour), reach out to the helpdesk and we'll take a look.

Where are my Booking.com bookings? They aren't showing up on my calendar.

Unlike some of our other channel partners, Booking.com does not provide an automatic method to restore lost or missed data, including booking records. Our team does have access to logs and is manually working through that information to find affected accounts and properly update calendars. This should be completed by the end of today, but note that not all booking information can be restored. However, the calendar data will be correctly entered, preventing any double bookings.

OwnerRez says my Airbnb properties are fully synced and online, but I still can't see them on Airbnb.

We are seeing this in a relatively small number of cases - as in, a few dozen properties out of tens of thousands. It appears to affect individual properties - that is, you might have 10 Airbnb properties in one API-connected account, 9 are working fine, but 1 refuses to come back online no matter what you do. We've tried several things to get those properties working again and haven't had success either, so we've filed a Critical bug with Airbnb partner support. If you have properties in this condition and have not already reported it to us, either via the Helpdesk or in this thread, please do so. We'll add your information to the Airbnb bug report and track all affected properties together.

**UPDATE** - All but a couple of the offline Airbnb properties have been restored, and those are expected to be fixed by the end of the day.

121 Comments

CWV
Dec 27, 2022 1:40 PM
Joined Nov, 2019 61 posts

A big thank you to OR staff and especially those who were called unexpectedly to work during their time off.

Ruterra Apartmen
Dec 27, 2022 1:42 PM
Joined Mar, 2022 2 posts

We're currently receiving overbookings for New Year's Eve...

Erika F
Dec 27, 2022 1:42 PM
Joined Jul, 2022 3 posts

Oh boy. As someone who has run websites for years, I understand how frustrating and stressful this can be.

I got really lucky. A booking sneaked in and took care of their contract and security deposit during a brief interval when the site was back up. It was a last minute booking so I was really worried but it all worked out and they are safely checked in. 

Hope they are able to diagnose whatever happened for you. Happy holidays.

Holly W
Dec 27, 2022 1:47 PM
Joined Nov, 2022 4 posts

I am happy about the communication I am seeing here from Owner Rez. To all the owners who had "no access to their calendar" during this time: this is a highlight for you. You are responsible for your own system of backup for your calendar. Yes, Owner Rez is our "master calendar", but all businesses need to have backup systems in place. Just like I have a generator here for my home/business (and vacation rental) should the local power company "go down", I have a complete calendar for all properties, offline, at all times. Owner Rez will do their fact finding and possibly implement additional measures if needed, but the bottom line is that outages occur and are not always the fault of the provider (for example, a major widespread natural disaster), so it's a good time to put your own backup measures in place. For me, this is as simple as 1) maintaining a log of all guests, 2) a copy of all contracts after signature (gives me contact info, I download the PDF to a file on my computer, which yes, is backed up - in two places) and 3) an actual "calendar" of all guests in a paper or PDF version.

Angelo G
Dec 27, 2022 2:11 PM
Joined Oct, 2022 1 post

My Booking.com reservations from yesterday are still not showing in the system. What should I do?

Ken T
Dec 27, 2022 2:13 PM
Joined Aug, 2019 1707 posts

We are aware of the issue with dropped Booking.com bookings and are investigating as to the best resolution.

Lisa +
Dec 27, 2022 2:20 PM
Joined Feb, 2022 6 posts

Love the OR crew!  Thanks for working through all the myriad cascading issues and for taking time away from your vacation to set things right.  Stuff happens; it's what you do when the _____ hits the fan that matters.  Great example for all of us, OR.  Kudos!

Ruterra Apartmen
Dec 27, 2022 2:25 PM
Joined Mar, 2022 2 posts

Wish you all the best.

Please fix the problem with the Booking.com channel and be aware of it, since we've received overbookings from that channel today.

Randy H
Dec 27, 2022 2:46 PM
Joined Jun, 2019 46 posts

While I cannot deny this was a huge inconvenience, I have worked in I.T. for over 40 years, and all I can say to those who feel they were let down is: SHIT HAPPENS.

Sometimes no amount of redundancy will keep the enterprise up and running during a failure. However, restoring data back in time to the exact point of failure is not difficult, though it is time consuming.

With that said, thanks for keeping us informed throughout the day.

Paul W
Dec 27, 2022 2:52 PM
OR Team Member Joined Jun, 2009 833 posts

Hi everyone, we updated this blog post with a FAQ section.  Please refresh and give it a read.  We covered most of the questions that have come in the most, and we'll be adding more over the next 48 hours.

Ken T
Dec 27, 2022 2:57 PM
Joined Aug, 2019 1707 posts

A fix for the Booking.com async issue is under way and should be completed today.  More information will be added to the main post soon.

Angela R
Dec 27, 2022 3:29 PM
Joined Sep, 2022 8 posts

Hi Ken! Mine finally shows synced, but my Airbnb is still hidden. What can we do to get our Airbnb restored? This is really making us lose out on bookings.

Ken T
Dec 27, 2022 3:34 PM
Joined Aug, 2019 1707 posts

@Angela I see what you mean, we are investigating.

Paul W
Dec 27, 2022 3:42 PM
OR Team Member Joined Jun, 2009 833 posts

Two more questions have been added, so please refresh again and read.  Booking.com and triggering full resyncs for Airbnb.

BMH
Dec 27, 2022 3:50 PM
Joined Feb, 2020 3 posts

This is what Airbnb has said to me
"Even when the information looks correct in your software provider, changes to your account still need to be made through your software and not on Airbnb. We can only pull the information we’re shown by your software.

If you'd like assistance making updates, the best thing to do is reach out to your software provider. They're in the best position to help. Often, a simple re-sync of your software can fix the issue. I will be closing this message thread now, and I hope that is okay with you. We are here with you all throughout your Airbnb journey so if ever you need assistance or have any questions, please don't hesitate to reach out to us. We will be happy to help you."

Do we resync or just continue waiting it out?

Erika F
Dec 27, 2022 4:02 PM
Joined Jul, 2022 3 posts

Today I synced my Airbnb several times but rate changes and season changes are not taking effect on Airbnb yet.

Paul W
Dec 27, 2022 4:06 PM
OR Team Member Joined Jun, 2009 833 posts

@Brittany Any action made in OwnerRez (changing a photo, updating a rate, changing availability) is already queued to update on the Airbnb side. It may not have happened yet because the channel queues are running a good hour or more behind, but they are queued to happen and will happen when it catches up.

Now, if you're worried about that or just want some extra confirmation, you can certainly trigger a full resync and OwnerRez will add a "full sync" of everything to the channel queue.  However, the full resync will wait in line behind everything else.

So what will happen is all of a sudden all of your stuff will update, both the waiting changes from before and the full resync.

Unless there's a significant and pressing issue, I would wait a bit longer and check later in the evening (or overnight, depending on where you are) to see if the listing is still off.  Check out the Airbnb channel dashboard in OwnerRez and the Sync Actions tab.  We have a lot of transparency there on when your OwnerRez side changed and when we last updated Airbnb, including specific categories of content.

Kylie R
Dec 27, 2022 4:18 PM
Joined Jun, 2021 3 posts

Our listings are showing synched with Airbnb as of 3 hours ago, and our listings are still hidden on Airbnb.

Angela R
Dec 27, 2022 5:23 PM
Joined Sep, 2022 8 posts

Ken, it is still not showing … Anyone else? It's been almost 48 hrs since this OwnerRez maintenance snafu started and 24 hrs since this issue triggered Airbnb to hide listings … is there any new update? This is really costing us a lot of missed bookings … we have people who want to book and can’t … I know issues happen and we understand that, but the frustrating part here is how long it is persisting, considering how mission/time critical this service is to all the users.

Patricia Knight
Dec 27, 2022 5:57 PM
Joined Jan, 2020 20 posts

Thank you to the OwnerRez team for your dedication to resolving this issue and communicating updates to everyone. Greatly appreciated!  Also shows how we all become dependent on other providers for our business tech stack. This definitely highlighted the need to at least do a weekly backup of our calendars and guest information just in case. Thankfully I had template messages saved elsewhere so I was able to manually send an email to a guest that booked last minute yesterday morning during all this chaos.