Protocol CoAP - FILabs/homebridge-skybell GitHub Wiki
When originally launched, the SkyBell HD exchanged CoAP (Constrained Application Protocol) messages with its cloud services without any encryption, authentication or integrity checking (as reported at Purdue University's 2017 Annual CERIAS Information Security Symposium). This made it easy to sniff the exchanges and identify interesting events such as button presses and motion detection.
A subsequent firmware update added encryption, presumably utilising DTLS (Datagram Transport Layer Security). However, this does not provide complete confidentiality: encryption conceals the content of each message but not its length, so it is still possible to identify interesting events (button/motion and on-demand video) just by passively monitoring packet lengths. These events can also be detected far more quickly this way than by polling the cloud services.
Refer to the protocol overview for the wider context of these packet exchanges.
Note:
- The (encrypted) CoAP packets are all carried in UDP datagrams with the source and destination port set to 5683.
- Lengths specified below are the size of the UDP data (payload), i.e. 8 bytes less than the length field in the UDP header.
- Because UDP is used, the precise ordering of these packets may vary, and some of the datagrams may be dropped.
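A minimal Python sketch of the length convention above: given the raw 8-byte UDP header of a captured datagram, it returns the payload length used on this page, or `None` if the datagram is not on port 5683. How the header is captured (pcap, raw socket, etc.) is left out, and the name `coap_payload_length` is purely illustrative.

```python
import struct
from typing import Optional

COAP_PORT = 5683      # both ends of the SkyBell HD exchanges use this port
UDP_HEADER_LEN = 8    # source port, destination port, length, checksum (2 bytes each)


def coap_payload_length(udp_header: bytes) -> Optional[int]:
    """Return the UDP payload length of a port-5683 datagram, or None for other traffic.

    udp_header is the raw 8-byte UDP header in network byte order.
    """
    if len(udp_header) < UDP_HEADER_LEN:
        raise ValueError("a UDP header is 8 bytes long")
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", udp_header[:UDP_HEADER_LEN])
    if src_port != COAP_PORT or dst_port != COAP_PORT:
        return None
    # The UDP length field covers the 8-byte header as well as the data,
    # whereas the lengths on this page are for the data only.
    return length - UDP_HEADER_LEN
```

For example, a header whose length field is 105 corresponds to a 97-byte payload.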
When the doorbell is idle the following sequence repeats continuously.
- SkyBell HD → AWS: 321 bytes
- AWS → SkyBell HD: 97 bytes
- (~30 secs later) SkyBell HD → AWS: 113 bytes
- AWS → SkyBell HD: 97 bytes
- (~30 secs later) SkyBell HD → AWS: 321 bytes
- AWS → SkyBell HD: 97 bytes
- (~30 secs later) SkyBell HD → AWS: 113 bytes
- AWS → SkyBell HD: 97 bytes
- (~30 secs later) ...
The 97-byte packet appears to be some form of acknowledgement that is also used in other message exchanges. If it is not received then the initiator (the doorbell in this case) retransmits its packet.
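If the 97-byte packet is indeed an acknowledgement, a missing one should show up as a repeated doorbell packet. The sketch below assumes packets have already been reduced to (direction, payload length) pairs in capture order; the heuristic and the names are illustrative assumptions, and it can only count what reaches the capture point.

```python
from typing import List, Optional, Tuple

# (direction, udp_payload_length); "up" = doorbell -> AWS, "down" = AWS -> doorbell
Packet = Tuple[str, int]

ACK_LEN = 97                 # apparent acknowledgement from the cloud services
HEARTBEAT_LENS = {321, 113}  # idle packets sent by the doorbell


def count_retransmissions(packets: List[Packet]) -> int:
    """Count idle packets the doorbell resent before a 97-byte acknowledgement was seen.

    Heuristic: the same heartbeat length sent twice with no 97-byte packet in between
    is treated as a retransmission.
    """
    retransmissions = 0
    awaiting_ack: Optional[int] = None  # length of the last unacknowledged heartbeat
    for direction, length in packets:
        if direction == "up" and length in HEARTBEAT_LENS:
            if awaiting_ack == length:
                retransmissions += 1
            awaiting_ack = length
        elif direction == "down" and length == ACK_LEN:
            awaiting_ack = None
    return retransmissions
```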
While the doorbell is streaming live video to an app (via a cloud services proxy), the 321-byte packets are no longer sent and the repeating idle sequence changes to:
- SkyBell HD → AWS: 113 bytes
- AWS → SkyBell HD: 97 bytes
- (~30 secs later) SkyBell HD → AWS: 113 bytes
- AWS → SkyBell HD: 97 bytes
- (~30 secs later) ...
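Since the idle sequence alternates 321- and 113-byte packets while the streaming sequence uses only 113-byte packets, the doorbell's recent ~30-second packets hint at whether live video is being streamed. A minimal sketch, assuming a list of doorbell → AWS payload lengths in capture order; the three-packet window (roughly 90 secs) and the function name are illustrative choices, not measured values.

```python
from typing import Sequence

IDLE_ONLY_LEN = 321   # sent by the doorbell while idle, alternating with 113-byte packets
SHARED_LEN = 113      # sent every ~30 secs both while idle and while streaming


def heartbeat_state(doorbell_lengths: Sequence[int], window: int = 3) -> str:
    """Classify the doorbell as "idle" or "streaming" from its recent ~30 s packets.

    doorbell_lengths holds payload lengths of doorbell -> AWS packets in capture order.
    window is an assumed number of recent heartbeats to inspect (about window * 30 secs).
    """
    recent = [n for n in doorbell_lengths if n in (IDLE_ONLY_LEN, SHARED_LEN)][-window:]
    if len(recent) < window:
        return "unknown"  # not enough history yet
    # While idle, 321- and 113-byte packets alternate, so a window of three still
    # contains a 321 even if one of them is dropped; streaming sends only 113-byte packets.
    return "idle" if IDLE_ONLY_LEN in recent else "streaming"
```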
When the doorbell starts recording video, it sends a notification to the cloud services:
- Button pressed at the SkyBell HD
- SkyBell HD → AWS: 465 bytes
- AWS → SkyBell HD: 49 bytes
This notification is sent regardless of the cause (button pressed, motion detected, or call from the app). Presumably the content of the CoAP message specifies the type of event, but this cannot be determined purely from the packet length.
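Because this notification is sent whatever the cause, spotting it is enough to know that the doorbell has started recording. A minimal sketch, assuming packets arrive as (timestamp, direction, payload length) tuples in capture order; the tuple layout and the function name are illustrative assumptions.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (timestamp_secs, direction, udp_payload_length); "up" = doorbell -> AWS, "down" = AWS -> doorbell
Packet = Tuple[float, str, int]

NOTIFY_LEN = 465  # doorbell -> AWS: started recording (cause not visible from the length)
REPLY_LEN = 49    # AWS -> doorbell: short reply to the notification


def recording_started(packets: Iterable[Packet]) -> Iterator[float]:
    """Yield the time of each 465-byte notification once its 49-byte reply is seen."""
    pending: Optional[float] = None  # time of a notification still awaiting its reply
    for ts, direction, length in packets:
        if direction == "up" and length == NOTIFY_LEN:
            if pending is None:
                pending = ts  # keep the first copy if the doorbell retransmits
        elif direction == "down" and length == REPLY_LEN and pending is not None:
            yield pending
            pending = None
```

Waiting for the 49-byte reply trades a little latency for fewer false positives; yielding on the 465-byte packet alone would be faster.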
If the app initiates a call while the doorbell is recording video for a button press or motion event, the cloud services send a call request to the doorbell:
- Button/motion event answered from the app
- AWS → SkyBell HD: ~900 bytes
- SkyBell HD → AWS: 49 bytes
The precise length of the packet from the cloud services varies (lengths of 897 and 913 bytes have been seen).
If Watch Live is requested from the app, the cloud services send a call request to the doorbell (with the same length as for an answered button/motion event). However, in this case the doorbell then responds with a notification that it has started recording on-demand video (again the same length as for a button/motion event):
- Watch Live requested from the app
- AWS → SkyBell HD: ~900 bytes
- SkyBell HD → AWS: 49 bytes
- SkyBell HD → AWS: 465 bytes
- AWS → SkyBell HD: 49 bytes
The ordering of the message exchanges, i.e. whether the 465-byte notification from the doorbell is preceded by a ~900-byte call request, can be used to distinguish between button/motion events and Watch Live.
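As a sketch of this ordering rule (not a tested implementation), the function below labels each 465-byte notification by whether a call request arrived shortly before it. It assumes timestamped (time, direction, payload length) tuples in capture order; the 880 to 930 byte band around the observed 897- and 913-byte call requests and the 5-second window are hypothetical values, not measurements.

```python
from typing import Iterable, List, Optional, Tuple

# (timestamp_secs, direction, udp_payload_length); "up" = doorbell -> AWS, "down" = AWS -> doorbell
Packet = Tuple[float, str, int]

NOTIFY_LEN = 465                     # doorbell -> AWS: started recording video
CALL_REQUEST_LENS = range(880, 931)  # hypothetical band around the observed 897/913 bytes
WATCH_LIVE_WINDOW = 5.0              # hypothetical max gap (secs) from call request to notification


def classify_recordings(packets: Iterable[Packet]) -> List[Tuple[float, str]]:
    """Label each 465-byte notification as "watch_live" or "button_or_motion"."""
    labels: List[Tuple[float, str]] = []
    last_call_request: Optional[float] = None
    for ts, direction, length in packets:
        if direction == "down" and length in CALL_REQUEST_LENS:
            last_call_request = ts
        elif direction == "up" and length == NOTIFY_LEN:
            # Watch Live: the call request comes first and the notification follows it.
            # Button/motion: the notification comes first (any call request follows it).
            preceded = (
                last_call_request is not None
                and ts - last_call_request <= WATCH_LIVE_WINDOW
            )
            labels.append((ts, "watch_live" if preceded else "button_or_motion"))
    return labels
```

Retransmitted notifications are not de-duplicated, so a lost reply may produce two labels for one event.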
If the app ends the call, the cloud services send two requests to the doorbell:
- Call ended from the app
- AWS → SkyBell HD: 289 bytes
- SkyBell HD → AWS: 49 bytes
- AWS → SkyBell HD: 33 bytes
- SkyBell HD → AWS: 49 bytes
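A sketch that checks a capture for the app-ended exchange as an ordered subsequence, tolerating unrelated packets (such as idle traffic) in between. The (direction, payload length) representation and the names are illustrative assumptions.

```python
from typing import Sequence, Tuple

Packet = Tuple[str, int]  # ("up" = doorbell -> AWS, "down" = AWS -> doorbell), UDP payload length

# Exchange seen when the app ends a call: two requests from AWS, each answered by the doorbell.
APP_END_CALL = [("down", 289), ("up", 49), ("down", 33), ("up", 49)]


def contains_in_order(packets: Sequence[Packet], pattern: Sequence[Packet]) -> bool:
    """True if every packet in pattern appears in packets, in order (others may interleave)."""
    it = iter(packets)
    # "step in it" advances the shared iterator, so each match must come after the previous one.
    return all(step in it for step in pattern)
```

Because datagrams can be dropped, a more forgiving detector might require only the two requests from the cloud services.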
Alternatively, if the doorbell ends the call (after approximately 5 minutes):
- Call ended by the SkyBell HD
- SkyBell HD → AWS: 449 bytes
- SkyBell HD → AWS: 289 bytes
- AWS → SkyBell HD: 33 bytes
- AWS → SkyBell HD: 33 bytes
- (~2.5 secs later) AWS → SkyBell HD: 33 bytes
- AWS → SkyBell HD: 33 bytes
- (~2.5 secs later) AWS → SkyBell HD: 33 bytes
- AWS → SkyBell HD: 49 bytes
- (~1.5 secs later) SkyBell HD → AWS: 49 bytes
- SkyBell HD → AWS: 449 bytes
- SkyBell HD → AWS: 289 bytes
- SkyBell HD → AWS: 449 bytes
- SkyBell HD → AWS: 289 bytes
- SkyBell HD → AWS: 49 bytes
- AWS → SkyBell HD: 65 bytes
- AWS → SkyBell HD: 65 bytes
- AWS → SkyBell HD: 65 bytes
- AWS → SkyBell HD: 65 bytes
The precise details of this sequence (packet ordering and number of duplicates) can vary slightly, but the packet lengths appear to be consistent.
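Comparing the two end-of-call sequences, the 289-byte packet travels in opposite directions: from the cloud services when the app ends the call, and from the doorbell when the doorbell ends it. A minimal sketch, assuming (direction, payload length) pairs for packets captured after a call has started; the function name is illustrative, and since other CoAP exchanges may reuse these lengths this is only a heuristic.

```python
from typing import Iterable, Optional, Tuple

Packet = Tuple[str, int]  # ("up" = doorbell -> AWS, "down" = AWS -> doorbell), UDP payload length

END_CALL_LEN = 289  # seen in both end-of-call sequences, in opposite directions


def who_ended_call(packets: Iterable[Packet]) -> Optional[str]:
    """Guess who ended a call from the direction of the first 289-byte packet.

    Returns "app", "doorbell", or None if no 289-byte packet was seen.
    """
    for direction, length in packets:
        if length == END_CALL_LEN:
            return "app" if direction == "down" else "doorbell"
    return None
```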
CoAP messages are also exchanged at other times, e.g. when the app is used to modify the doorbell's configuration. It is quite likely that some of these use packets of the same lengths as those illustrated above, so packet length alone can occasionally produce false matches.