Facebook service outage
Stephen Hoffman
2021-10-04 20:54:02 UTC
Facebook, Instagram, Facebook Messenger, WhatsApp, Login with FB, etc.,
are all offline.

The Facebook IP address space and all its services are inaccessible.

Reportedly, this outage is due to erroneous BGP routing announcements.

Reportedly, internal access including badges and badge readers and
building and door access controls are all offline, and remote access to
their peer routers is also all offline.

This looks to be one of the biggest service outages around, and with a
detail—the BGP routing announcements—that y'all might want to consider
within your business continuity and business recovery plans.
--
Pure Personal Opinion | HoffmanLabs LLC
Paul Anderson
2021-10-04 21:15:06 UTC
Post by Stephen Hoffman
Facebook, Instagram, Facebook Messenger, WhatsApp, Login with FB, etc.,
are all offline.
The net productivity of the US workforce has gone up 30% today!

Paul
Simon Clubley
2021-10-04 21:33:21 UTC
Post by Stephen Hoffman
Facebook, Instagram, Facebook Messenger, WhatsApp, Login with FB, etc.,
are all offline.
The Facebook IP address space and all its services are inaccessible.
I guess someone took the "move fast and break things" mantra literally. :-)

Is anyone seeing the actual DNS entries for facebook.com disappearing
as well?

Depending on which DNS server I use, I either get the entries or a
SERVFAIL from nslookup.

Does anyone else see this?

Just tried it from Eisner as well. This is what I get:

$ nslookup
Default Server: dns.google
Address: 8.8.8.8
> facebook.com
Server: dns.google
Address: 8.8.8.8

Non-authoritative answer:
Name: facebook.com
Address: 157.240.220.35
> Exit
$ nslookup facebook.com 4.2.2.2
Server: b.resolvers.Level3.net
Address: 4.2.2.2

*** b.resolvers.Level3.net can't find FACEBOOK.COM: Non-existent host/domain
Post by Stephen Hoffman
Reportedly, this outage is due to erroneous BGP routing announcements.
Let's just hope it's an internal screwup and not an external attack.
The consequences in the latter case might be rather interesting.
Post by Stephen Hoffman
Reportedly, internal access including badges and badge readers and
building and door access controls are all offline, and remote access to
their peer routers is also all offline.
Someone has some lessons to learn about not making all systems depend
on each other. Internal access should not have been affected by this
in a properly designed system IMHO.
Post by Stephen Hoffman
This looks to be one of the biggest service outages around, and with a
detail (the BGP routing announcements) that y'all might want to consider
within your business continuity and business recovery plans.
Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
chris
2021-10-04 22:08:58 UTC
Post by Simon Clubley
Post by Stephen Hoffman
Facebook, Instagram, Facebook Messenger, WhatsApp, Login with FB, etc.,
are all offline.
The Facebook IP address space and all its services are inaccessible.
I guess someone took the "move fast and break things" mantra literally. :-)
Is anyone seeing the actual DNS entries for facebook.com disappearing
as well?
Depending on which DNS server I use, I either get the entries or a
SERVFAIL from nslookup.
Does anyone else see this?
$ nslookup
Default Server: dns.google
Address: 8.8.8.8
> facebook.com
Server: dns.google
Address: 8.8.8.8
Name: facebook.com
Address: 157.240.220.35
> Exit
$ nslookup facebook.com 4.2.2.2
Server: b.resolvers.Level3.net
Address: 4.2.2.2
*** b.resolvers.Level3.net can't find FACEBOOK.COM: Non-existent host/domain
Post by Stephen Hoffman
Reportedly, this outage is due to erroneous BGP routing announcements.
Let's just hope it's an internal screwup and not an external attack.
The consequences in the latter case might be rather interesting.
Post by Stephen Hoffman
Reportedly, internal access including badges and badge readers and
building and door access controls are all offline, and remote access to
their peer routers is also all offline.
Someone has some lessons to learn about not making all systems depend
on each other. Internal access should not have been affected by this
in a properly designed system IMHO.
Post by Stephen Hoffman
This looks to be one of the biggest service outages around, and with a
detail (the BGP routing announcements) that y'all might want to consider
within your business continuity and business recovery plans.
Simon.
nslookup fails here as well, so it looks like it might be a DNS hack...

Chris
Stephen Hoffman
2021-10-04 22:11:05 UTC
Post by Simon Clubley
Is anyone seeing the actual DNS entries for facebook.com disappearing as well?
It's reportedly BGP, which has null-routed most of the Facebook IP address
space, including their DNS servers and their service status server.
Post by Simon Clubley
Depending on which DNS server I use, I either get the entries or a
SERVFAIL from nslookup.
As the DNS caches age out, no authoritative server for Facebook DNS
translations will be reachable.
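Why some resolvers still answered while others returned SERVFAIL can be illustrated with a toy caching resolver: an answer is served from cache until its TTL expires, and after that the resolver has to reach the authoritative servers, which is impossible once the routes to them are gone. This is a minimal sketch, not how any real resolver is built; the ToyResolver class, its fields, and the 300-second TTL are hypothetical, and the address is the one from Simon's nslookup output. Some real resolvers can also keep serving stale data past the TTL (RFC 8767), which this ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToyResolver:
    """Hypothetical caching resolver. Time is passed in explicitly to keep it deterministic."""
    zone: Dict[str, str]                  # what the authoritative servers would answer
    authoritative_reachable: bool = True  # becomes False once there is no route to them
    ttl: float = 300.0                    # hypothetical record TTL, in seconds
    cache: Dict[str, Tuple[str, float]] = field(default_factory=dict)  # name -> (address, expiry)

    def lookup(self, name: str, now: float) -> str:
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # unexpired cached copy, i.e. a "non-authoritative answer"
        self.cache.pop(name, None)  # expired (or absent): the cache can no longer help
        if not self.authoritative_reachable:
            return "SERVFAIL"  # nobody left to ask
        if name not in self.zone:
            return "NXDOMAIN"
        self.cache[name] = (self.zone[name], now + self.ttl)
        return self.zone[name]


resolver = ToyResolver(zone={"facebook.com": "157.240.220.35"})
print(resolver.lookup("facebook.com", now=0))    # fresh answer, now cached
resolver.authoritative_reachable = False         # routes to the name servers withdrawn
print(resolver.lookup("facebook.com", now=100))  # still answered from the cache
print(resolver.lookup("facebook.com", now=400))  # TTL expired: SERVFAIL
```

Resolvers that cached the record more recently keep answering for longer, which fits the mixed nslookup results Simon reported.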

I haven't been able to get to Eisner for a while, but haven't bothered
to dig into the details.

$ dig +short eisner.decuserve.org
216.41.237.174
$ dig +short decuserve.org
184.168.131.241
$

ssh: connect to host eisner.decuserve.org port 22: Connection refused

Local ssh here is OpenSSH_8.1p1, LibreSSL 2.7.3.
--
Pure Personal Opinion | HoffmanLabs LLC
Simon Clubley
2021-10-05 12:10:34 UTC
Post by Stephen Hoffman
I haven't been able to get to Eisner for a while, but haven't bothered
to dig into the details.
$ dig +short eisner.decuserve.org
216.41.237.174
$ dig +short decuserve.org
184.168.131.241
$
I'm using eisner.decus.org to connect to Eisner, which gives the same
IP address as your first example.
Post by Stephen Hoffman
ssh: connect to host eisner.decuserve.org port 22: Connection refused
Local ssh here is OpenSSH_8.1p1, LibreSSL 2.7.3.
Do you have any manual entries in a local host name table for that
domain name which point to a different IP address?

What happens when you try ping?

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Stephen Hoffman
2021-10-05 15:16:19 UTC
Post by Simon Clubley
Post by Stephen Hoffman
I haven't been able to get to Eisner for a while, but haven't bothered
to dig into the details.
$ dig +short eisner.decuserve.org
216.41.237.174
$ dig +short decuserve.org
184.168.131.241
$
I'm using eisner.decus.org to connect to Eisner, which gives the same
IP address as your first example.
Post by Stephen Hoffman
ssh: connect to host eisner.decuserve.org port 22: Connection refused
Local ssh here is OpenSSH_8.1p1, LibreSSL 2.7.3.
Do you have any manual entries in a local host name table for that
domain name which point to a different IP address?
While Eisner DNS has been scattershot for a while and some of the older
entries now go to a corporate web login portal, if I'm getting the same
IP address translations as you are, then I'm not hitting a difference
caused by local DNS services or by local hosts entries.
Post by Simon Clubley
What happens when you try ping?
ping pings.

traceroute:
...
8 ip4.gtt.net (208.116.129.178) 21.771 ms 21.776 ms 21.781 ms
9 216.41.236.33 (216.41.236.33) 23.490 ms 23.559 ms 23.314 ms
10 216.41.236.122 (216.41.236.122) 23.417 ms 23.516 ms 23.391 ms
11 216.41.236.122 (216.41.236.122) 24.949 ms 24.593 ms 24.554 ms
$ dig +short eisner.decus.org
216.41.237.174
$

With OpenVMS access adjustments:

$ ssh -vvvvv -o HostKeyAlgorithms=ssh-rsa,ssh-dss \
    -o KexAlgorithms=diffie-hellman-group1-sha1 -o Ciphers=aes128-cbc,3des-cbc \
    -o MACs=hmac-md5,hmac-sha1 ***@eisner.decus.org
OpenSSH_8.1p1, LibreSSL 2.7.3
debug1: Reading configuration data /Users/hoffman/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 47: Applying options for *
debug1: Connecting to eisner.decus.org port 22.
ssh: connect to host eisner.decus.org port 22: Connection refused
$

Without:

$ ssh -vvvvv ***@eisner.decus.org
OpenSSH_8.1p1, LibreSSL 2.7.3
debug1: Reading configuration data /Users/hoffman/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 47: Applying options for *
debug1: Connecting to eisner.decus.org port 22.
ssh: connect to host eisner.decus.org port 22: Connection refused
$

For giggles:

$ ssh ***@eisner.decus.org
ssh: connect to host eisner.decus.org port 22: Connection refused
$

Both config and ssh_config referenced above are currently entirely
commented out.

As I'm getting rejected by the ssh server, or seemingly by something just
ahead of the ssh server, the ssh client and the ssh server (or the
firewall) evidently don't want to play.

This all reeks of destination firewall settings or of related
port-forwarding settings, too, which is why I haven't bothered to
pursue it.
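The distinction being drawn here can be checked without ssh at all. "Connection refused" is reported when the TCP connection attempt itself is answered with a reset (or an ICMP rejection), which happens before any ssh algorithm negotiation, so the HostKeyAlgorithms and KexAlgorithms options cannot change the outcome; a firewall that silently drops packets usually produces a timeout instead. The following is a minimal sketch using only Python's standard library; the function name probe_tcp and the 5-second timeout are hypothetical choices, and a refusal alone cannot tell a missing listener apart from a firewall reject rule.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify one TCP connection attempt: 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # the handshake completed; something is listening
    except ConnectionRefusedError:
        # Reset (or ICMP reject) came back: no listener, or a firewall "reject" rule.
        return "refused"
    except socket.timeout:
        # No answer at all: typical of a firewall that silently drops packets.
        return "timeout"
    except OSError as exc:
        # Name lookup failure, no route to host, network unreachable, and so on.
        return f"error: {exc}"


# Example (needs network access): probe_tcp("eisner.decus.org", 22)
```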

VSI is porting OpenSSH, which will help with ssh support more generally.
--
Pure Personal Opinion | HoffmanLabs LLC
Dave Froble
2021-10-04 22:04:49 UTC
Post by Stephen Hoffman
Facebook, Instagram, Facebook Messenger, WhatsApp, Login with FB, etc.,
are all offline.
The Facebook IP address space and all its services are inaccessible.
Reportedly, this outage is due to erroneous BGP routing announcements.
Reportedly, internal access including badges and badge readers and
building and door access controls are all offline, and remote access to
their peer routers is also all offline.
This looks to be one of the biggest service outages around, and with a
detail—the BGP routing announcements—that y'all might want to consider
within your business continuity and business recovery plans.
For those who don't get out much, how about explaining what "BGP routing
announcements" are.

I could try looking it up, but, you used it, so you explain what it is ...

I just had to find -o Port=, that's my one search for the day ...

:-)
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Stephen Hoffman
2021-10-04 22:17:20 UTC
Post by Dave Froble
For those who don't get out much, how about explaining what "BGP
routing announcements" are.
BGP (Border Gateway Protocol) is how routing information is passed around
between the networks that make up the internet.

Just before Facebook dropped offline, a bunch of Facebook-related BGP
updates propagated, and the updates effectively expunged Facebook from
the Internet.
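That description can be reduced to a toy model: a router forwards by longest-prefix match over the prefixes its BGP neighbors have announced, and once those announcements are withdrawn (with no default route, as in the default-free core of the internet) the addresses simply have no route. This is a minimal sketch; 157.240.220.35 is from the nslookup output earlier in the thread, while the /16 covering prefix and the neighbor name "peer-A" are hypothetical. Real BGP also carries path attributes and chooses among several announcements of the same prefix, which this skips.

```python
import ipaddress
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class ToyRib:
    """Toy routing table fed by BGP-style announcements and withdrawals."""

    def __init__(self) -> None:
        self.routes: Dict[Network, str] = {}  # prefix -> next hop

    def announce(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str) -> Optional[str]:
        """Longest-prefix match; None means there is no route at all."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


rib = ToyRib()
rib.announce("157.240.0.0/16", "peer-A")  # hypothetical prefix and neighbor
print(rib.lookup("157.240.220.35"))        # peer-A
rib.withdraw("157.240.0.0/16")             # the bad update withdraws the route
print(rib.lookup("157.240.220.35"))        # None: packets have nowhere to go
```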

The whole of the Facebook IP address space, including their DNS servers,
their web servers, their Facebook status server, and everything else
Facebook, is presently inaccessible.

DNS translation requests can't get to the Facebook DNS servers to get a
DNS translation for any of the Facebook servers or services, as there's
no route to the Facebook DNS servers.

Widespread internal Facebook network service outages are also being reported.
--
Pure Personal Opinion | HoffmanLabs LLC
Arne Vajhøj
2021-10-04 22:56:58 UTC
Post by Dave Froble
Post by Stephen Hoffman
Facebook, Instagram, Facebook Messenger, WhatsApp, Login with FB, etc.,
are all offline.
The Facebook IP address space and all its services are inaccessible.
Reportedly, this outage is due to erroneous BGP routing announcements.
Reportedly, internal access including badges and badge readers and
building and door access controls are all offline, and remote access to
their peer routers is also all offline.
This looks to be one of the biggest service outages around, and with a
detail—the BGP routing announcements—that y'all might want to consider
within your business continuity and business recovery plans.
For those who don't get out much, how about explaining what "BGP routing
announcements" are.
I could try looking it up, but, you used it, so you explain what it is ...
BGP routing is how the internet figures out how to actually get from one
IP address to another IP address.

If you need to access a web site on the west coast, does it flow:

your PC--your ISP router--ISP to Texas router--Texas router--Texas to west coast router--west coast router--web site firewall

or does it flow:

your PC--your ISP router--ISP to Chicago router--Chicago router--Chicago to west coast router--west coast router--web site firewall

If there is no valid info on how to get to that web site, then that web
site is unavailable.
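The two candidate routes can be modeled as a small graph. A minimal sketch follows, assuming "fewest hops wins", which is only a stand-in: real BGP chooses between paths announced by neighboring networks (autonomous systems) using policy and attributes such as local preference and AS-path length, not by searching a map of individual routers. The node names come from the example above; the functions build_links and find_path are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def build_links(*pairs: Tuple[str, str]) -> Dict[str, Set[str]]:
    """Turn (a, b) pairs into an undirected adjacency map."""
    links: Dict[str, Set[str]] = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def find_path(links: Dict[str, Set[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search for a path with the fewest hops; None if unreachable."""
    previous: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            hop: Optional[str] = node
            while hop is not None:
                path.append(hop)
                hop = previous[hop]
            return path[::-1]
        for neighbor in sorted(links.get(node, set())):  # sorted for a stable result
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


links = build_links(
    ("your PC", "ISP router"),
    ("ISP router", "Texas router"), ("Texas router", "west coast router"),
    ("ISP router", "Chicago router"), ("Chicago router", "west coast router"),
    ("west coast router", "web site firewall"),
)
print(find_path(links, "your PC", "web site firewall"))
```

Removing every link toward the web site makes find_path return None, which is the "no valid info, site unavailable" case.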

The example may be oversimplified a bit, but I am not a network guy, so ...

:-)

Arne
Scott Dorsey
2021-10-04 23:16:05 UTC
Post by Dave Froble
For those who don't get out much, how about explaining what "BGP routing
announcements" are.
I could try looking it up, but, you used it, so you explain what it is ...
If you are old, you'll remember when there were all kinds of different ways
that backbone sites used to decide where to route packets to. RIP, GGP,
EGP, there were a bunch of different ways to decide what the "best" path
from here to there is.

That's all gone now. There is one way to transfer routing data around,
and it is BGP. There is no more picking up the phone and calling up
Jon Postel to ask if he thought it was better to route through here
or there. There are no more static routing tables that need manual updating
at inopportune moments. BGP just works, and it works well when it's fed good
data.

Unfortunately BGP was designed in an era when ISPs could trust one another,
and that's not really the case any more. It is possible to publish bad
routing data for sites you don't like and get other sites to accept them.
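The trust problem can be shown with the forwarding rule alone: a router that accepts a bogus announcement for a more specific prefix will send matching traffic toward whoever announced it, because the longest matching prefix wins. This is a minimal sketch; the prefixes and AS numbers are documentation values (RFC 5737 and RFC 5398), the function chosen_route is hypothetical, and since many networks filter IPv4 announcements longer than /24, the /25 here is purely illustrative.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (prefix, origin AS as text)


def chosen_route(table: List[Route], address: str) -> Optional[Route]:
    """Pick the most specific announced prefix that contains the address."""
    addr = ipaddress.ip_address(address)
    candidates = [r for r in table if addr in ipaddress.ip_network(r[0])]
    return max(candidates, key=lambda r: ipaddress.ip_network(r[0]).prefixlen, default=None)


table = [("203.0.113.0/24", "AS64500")]        # the legitimate owner's announcement
print(chosen_route(table, "203.0.113.200"))    # ('203.0.113.0/24', 'AS64500')
table.append(("203.0.113.128/25", "AS64511"))  # a bogus, more specific announcement
print(chosen_route(table, "203.0.113.200"))    # ('203.0.113.128/25', 'AS64511')
print(chosen_route(table, "203.0.113.10"))     # still the legitimate /24
```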
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."
chris
2021-10-05 00:22:40 UTC
Post by Scott Dorsey
Post by Dave Froble
For those who don't get out much, how about explaining what "BGP routing
announcements" are.
I could try looking it up, but, you used it, so you explain what it is ...
If you are old, you'll remember when there were all kinds of different ways
that backbone sites used to decide where to route packets to. RIP, GGP,
EGP, there were a bunch of different ways to decide what the "best" path
from here to there is.
That's all gone now. There is one way to transfer routing data around,
and it is BGP. There is no more picking up the phone and calling up
Jon Postel to ask if he thought it was better to route through here
or there. There are no more static routing tables that need manual updating
at inopportune moments. BGP just works, and it works well when it's fed good
data.
Unfortunately BGP was designed in an era when ISPs could trust one another,
and that's not really the case any more. It is possible to publish bad
routing data for sites you don't like and get other sites to accept them.
--scott
If you have routing tables that are working, wouldn't it be a good idea
to verify any new routes before admitting them to the working set of tables,
or perhaps it does that already? Some sort of failover mechanism.

Doesn't seem very robust as is...

Chris
Javier Henderson
2021-10-05 15:27:46 UTC
Post by chris
If you have routing tables that are working, wouldn't it be a good idea
to verify any new routes before admitting them to the working set of tables,
or perhaps it does that already? Some sort of failover mechanism.
Doesn't seem very robust as is…
There are some mechanisms to verify authenticity, for example RPKI, RIR and LOAs (letters of authorization). The first two allow for some automation; the last one involves some manual verification (which in many cases means glancing at the letter, then allowing the prefixes listed on it).
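The first of those mechanisms, RPKI, is commonly used for route origin validation along the lines of RFC 6811: a Route Origin Authorization (ROA) states which AS may originate a prefix and how specific the announced prefix may be, and each received route is classed as valid, invalid, or not-found. This is a minimal sketch that assumes the ROAs have already been fetched and cryptographically validated (skipped entirely here); the prefixes and AS numbers are documentation values, and Roa and validate_origin are hypothetical names. Note that origin validation checks only who originates a prefix, not the rest of the AS path, and has nothing to say about a legitimate network withdrawing its own routes, as Facebook's did.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Roa(NamedTuple):
    """Route Origin Authorization, assumed already fetched and cryptographically validated."""
    prefix: str
    max_length: int
    origin_as: int


def validate_origin(route_prefix: str, origin_as: int, roas: Iterable[Roa]) -> str:
    """RFC 6811-style origin validation: 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue  # this ROA says nothing about the route
        covered = True
        # AS 0 in a ROA means "nobody may originate this", so it never matches.
        if roa.origin_as != 0 and roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [Roa("203.0.113.0/24", 24, 64500)]
print(validate_origin("203.0.113.0/24", 64500, roas))    # valid
print(validate_origin("203.0.113.128/25", 64511, roas))  # invalid: wrong origin, too specific
print(validate_origin("198.51.100.0/24", 64500, roas))   # not-found: no ROA covers it
```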

As for robustness… since its inception the whole internet has been held together with chewing gum and gaffer’s tape and it continues to amaze me that it works so well.

-jav
John Wallace
2021-10-05 17:58:07 UTC
Post by chris
Post by Scott Dorsey
Post by Dave Froble
For those who don't get out much, how about explaining what "BGP routing
announcements" are.
I could try looking it up, but, you used it, so you explain what it is ...
If you are old, you'll remember when there were all kinds of different ways
that backbone sites used to decide where to route packets to. RIP, GGP,
EGP, there were a bunch of different ways to decide what the "best" path
from here to there is.
That's all gone now. There is one way to transfer routing data around,
and it is BGP. There is no more picking up the phone and calling up
Jon Postel to ask if he thought it was better to route through here
or there. There are no more static routing tables that need manual updating
at inopportune moments. BGP just works, and it works well when it's fed good
data.
Unfortunately BGP was designed in an era when ISPs could trust one another,
and that's not really the case any more. It is possible to publish bad
routing data for sites you don't like and get other sites to accept them.
--scott
If you have routing tables that are working, wouldn't it be a good idea
to verify any new routes before admitting them to the working set of tables,
or perhaps it does that already? Some sort of failover mechanism.
Doesn't seem very robust as is...
Chris
There was a time when wise people, including some round here, used to
advocate various radical but untrendy tactics such as keeping the
"production" network as separate as possible from the "remote
management" network, in the interests of robustness and defence in depth.

Presumably that's been a rather dated concept in some circles in recent
years?

If some network config update screws up both the "production" network
and the "remote management" network it sounds rather as though someone
forgot to check their architecture (procedures, etc.) for single points
of failure and shared failure modes.
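One mechanical way to do that check is to write down what depends on what and compute, for each component, everything that fails along with it. This is a minimal sketch; the function blast_radius and the component names and dependencies are hypothetical, loosely modeled on the symptoms reported in this thread, and are not Facebook's actual architecture.

```python
from typing import Dict, List, Set


def blast_radius(depends_on: Dict[str, List[str]], failed: str) -> Set[str]:
    """Everything that directly or indirectly depends on the failed component."""
    # Invert the map so we can walk from a component to the things that use it.
    users: Dict[str, Set[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            users.setdefault(dep, set()).add(component)
    affected: Set[str] = set()
    stack = [failed]
    while stack:
        for user in users.get(stack.pop(), set()):
            if user not in affected:
                affected.add(user)
                stack.append(user)
    return affected


# Hypothetical architecture, loosely modeled on the reported symptoms.
depends_on = {
    "public web site": ["DNS", "backbone routing"],
    "DNS": ["backbone routing"],
    "badge readers": ["DNS"],
    "remote management": ["backbone routing"],
    "out-of-band management": ["separate management circuit"],
}
for component in ["backbone routing", "separate management circuit"]:
    print(component, "->", sorted(blast_radius(depends_on, component)))
```

A component whose blast radius includes both the production services and the management path is exactly the kind of shared failure mode described above.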

Henry Crun
2021-10-05 04:11:42 UTC
Post by Stephen Hoffman
Facebook, Instagram, Facebook Messenger, WhatsApp, Login with FB, etc., are all offline.
The Facebook IP address space and all its services are inaccessible.
Reportedly, this outage is due to erroneous BGP routing announcements.
Reportedly, internal access including badges and badge readers and building and door access controls are all offline,
and remote access to their peer routers is also all offline.
This looks to be one of the biggest service outages around, and with a detail—the BGP routing announcements—that y'all
might want to consider within your business continuity and business recovery plans.
Some explanations at:
https://www.zdnet.com/article/what-took-facebook-down-major-global-outage-drags-on/
--
Mike R.
Home: http://alpha.mike-r.com/
QOTD: http://alpha.mike-r.com/qotd.php
No Micro$oft products were used in the URLs above, or in preparing this message.
Recommended reading: http://www.catb.org/~esr/faqs/smart-questions.html#before
and: http://alpha.mike-r.com/jargon/T/top-post.html
Missile address: N31.7624/E34.9691
Stephen Hoffman
2021-10-05 14:51:12 UTC
Post by Stephen Hoffman
Facebook, Instagram, Facebook Messenger, WhatsApp, Login with FB, etc.,
are all offline.
The best write-up I've encountered on the Facebook outage yesterday is
from Cloudflare, a content delivery network service with a whole lot of
network instrumentation:

https://blog.cloudflare.com/october-2021-facebook-outage/

Discusses IP routing and what happened here with BGP in some detail.
--
Pure Personal Opinion | HoffmanLabs LLC