Web conference notes, 2019.08.08

margodawes edited this page Aug 16, 2019 · 30 revisions

Attendees

Agenda

  1. https://github.com/CityOfLosAngeles/mobility-data-specification/issues/347
  2. https://github.com/CityOfLosAngeles/mobility-data-specification/issues/268
  3. https://github.com/CityOfLosAngeles/mobility-data-specification/issues/345
  4. https://github.com/CityOfLosAngeles/mobility-data-specification/issues/334
  5. https://github.com/CityOfLosAngeles/mobility-data-specification/issues/341
  6. https://github.com/CityOfLosAngeles/mobility-data-specification/issues/315
  7. https://github.com/CityOfLosAngeles/mobility-data-specification/issues/281

Minutes

Thank you for the report, @rf-. Do we need an emergency release, or can this be part of 0.4.0?

Consensus that we can wait for 0.4.0.

Action Item: Brady Law from Lyft will propose PR of log-rotation idea, target readiness to discuss for next call 8/22.

Current major use-cases of APIs:

  • Nearish real-time (what happened last hour? last day?)
  • True historical backfill (what happened last week, last month, last 6 months, etc.)
  • Comparison (what happened last weekend vs. a couple weekends ago?)

Lime: long timescale queries (e.g. far back in time) could theoretically be cached.

Lyft:

  • difficult to predict how we will be queried, especially in the backfill case.
  • do we want to design around use-cases (current) or around the data itself, and what makes sense for storage/serving?
  • Brady will propose PR of log-rotation idea, target readiness to discuss for next call 8/22.

Generally, want to make API more performant and easier to implement over longer time ranges.

Solution options include:

  • Fix query window to some interval (e.g. UTC hour) - model like a rotating logfile

    • query for "active log" is subject to change
    • backwards queries will be for static data
  • Live vs. Historical / Hot vs. Cold feeds

    • e.g. a timepoint in the past where the data universe becomes "clipped", and more recent times behave as now
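
A minimal sketch of the fixed-window ("rotating logfile") idea, assuming UTC-hour windows as in the example above. The function names `hour_window` and `is_static` are hypothetical and not part of MDS; the point is only that a window whose end has passed can be treated as static, while the window containing "now" is the "active log".

```python
from datetime import datetime, timedelta, timezone


def hour_window(ts: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) UTC-hour window containing ts.

    ts is expected to be timezone-aware (assumption for this sketch).
    """
    ts = ts.astimezone(timezone.utc)
    start = ts.replace(minute=0, second=0, microsecond=0)
    return start, start + timedelta(hours=1)


def is_static(ts: datetime, now: datetime) -> bool:
    """True if the hour window containing ts has fully elapsed.

    Elapsed windows would be "rotated" and safe to cache; the window
    containing now is the active log and still subject to change.
    """
    _, end = hour_window(ts)
    return end <= now.astimezone(timezone.utc)
```

In practice a provider would likely add some delay before treating a window as final, since (as noted under Ride Report) the rotation itself takes time.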

Ride Report:

  • no problems with caching. But what happens if we query across a boundary? E.g. caching by hour, but want the last hour and a half?
    • Lyft: query across both hours; the assumption is a regular query occurring on a normalish schedule.
  • log-rotation model: it would take some time to do the rotation
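
To illustrate the boundary question above, here is a rough sketch of how a consumer might work out which hourly windows to request for an arbitrary range. It assumes UTC-hour windows and timezone-aware inputs; `hourly_windows` is a hypothetical helper, not an MDS endpoint or parameter.

```python
from datetime import datetime, timedelta, timezone


def hourly_windows(start: datetime, end: datetime) -> list[datetime]:
    """Return the start of every UTC-hour window overlapping [start, end)."""
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    # Snap the cursor back to the top of the hour containing start.
    cursor = start.replace(minute=0, second=0, microsecond=0)
    windows = []
    while cursor < end:
        windows.append(cursor)
        cursor += timedelta(hours=1)
    return windows
```

A "last hour and a half" query that starts on the hour touches two windows, but one that starts mid-hour can touch three. Either way the consumer fetches whole windows and filters the results down to the exact range it wanted.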

Remix:

  • Variety of different windows. Backfills usually go daily, but sometimes down to the hourly.
  • Caching semantics don't necessarily need to be exposed to client?
  • Hot vs. Cold model seems to make sense
  • Would be best to make this explicit in the query itself

Bird:

  • With log rotation model, potential issue with large trip payloads in high-traffic cities (dense route objects)
  • Probably need to consider trips separately from status_changes in these caching conversations

Shared Streets:

  • Large systems do exactly this segmenting over time (log-rotation model)

Long Beach:

  • Optional API where consumer can specify intent? E.g. "here comes a request for 12, one-month blocks"

LA: is anyone using device_id or vehicle_id query params?

  • No one on the call is using these query params
  • "city" sharding is already encoded in the token / URL, so aggregators may need to make multiple requests for a given period, one per geography.

Louisville, similar option:

  • Pass in time period parameter, like hour or day and an anchor timestamp
  • Query responds with everything in that aggregation period
  • Still allow start/end time queries, but ability to pre-cache for backfill
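
A possible reading of the period-plus-anchor option, sketched under the assumption that periods are aligned to UTC boundaries and that only "hour" and "day" are supported. The names `PERIODS` and `resolve_period` are hypothetical illustrations, not proposed MDS parameters.

```python
from datetime import datetime, timedelta, timezone

# Supported aggregation periods (assumed for this sketch).
PERIODS = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


def resolve_period(period: str, anchor: datetime) -> tuple[datetime, datetime]:
    """Map (period, anchor) to the [start, end) UTC range containing anchor."""
    if period not in PERIODS:
        raise ValueError(f"unsupported period: {period!r}")
    anchor = anchor.astimezone(timezone.utc)
    if period == "hour":
        start = anchor.replace(minute=0, second=0, microsecond=0)
    else:  # "day"
        start = anchor.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + PERIODS[period]
```

Because every anchor inside the same period resolves to the same range, a provider could pre-compute and cache one response per period, while free-form start/end queries remain available alongside it.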

3. Closing Notes

  • Agenda set from active issues/PRs since last convening - keep the comments coming!
  • Cities working toward 0.3.0, LA and Santa Monica testing and close to deployment in next couple weeks