Weekly check in 2012.08.02 - GeoSmartCity-CIP/gsc-opentripplanner GitHub Wiki
13:32 <demory> hi everyone
13:32 <novalis_dt> Hi!
13:32 <demory> mele grant_h, think we should wait for Frank?
13:33 <mele> yeah, probably. looks like he's just been pushing some commits so i imagine he'll be right in
13:34 <demory> ok
13:35 <mele> Our big release is very soon so Frank has been working on getting all of the T-s crossed and I-s dotted
13:36 -!- FrankP [1815504e@gateway/web/freenode/ip.24.21.80.78] has joined #opentripplanner
13:37 <demory> alright, why don't we get started
13:37 <demory> i think we have everyone -- kpw is off this week btw
13:37 <demory> my update -- just updated deployer to 0.7.10 and am currently rebuilding the DC bikeshare graph -- hopefully will fix the elev-related issues kpw was having earlier. also working on a blog post about recent updates. otherwise focused on general deployer stuff and the leaflet-based ui work
13:37 <novalis_dt> I've been working on optimizing and improving RAPTOR. There have been a few big wins so far: (1) reusing walking SPTs between rounds, (2) more aggressive target bounding (this reduces the number of alternate paths found but is probably worth it), some micro-optimizations in board/alight (which will even help non-raptor code), and, most significantly (architecturally), replacing the edge-based graph with a plain graph and usin
13:37 <novalis_dt> g an edge-based Dijkstra (what pgRouting calls "shooting star"). Looking at the heatmap of the search space, I'm still seeing a lot of time spent in midtown, so I'm trying another variant of Tim Merrifield's heuristic to try to improve target bounding. If target bounding were strong enough, I could attempt to build some bidirectionality into the search to find some bounding path earlier.
13:38 <novalis_dt> (I've also fixed some misc bugs in other areas)
13:39 <FrankP> just added an example base layer to config.js that shows how to limit the extent of the map (in answer to a user question).
13:39 <abyrd> novalis_dt, sounds interesting
13:40 <novalis_dt> abyrd, for what it's worth, the edge-based graph uses about 30% more time than the plain graph.
13:41 <abyrd> so it is both faster and conceptually simpler. sounds like a win.
13:42 <novalis_dt> Yeah.
13:42 <novalis_dt> The one bad thing is that it requires some changes to our state model
13:43 <abyrd> I have been looking at input data for bicycle routing in Paris, including elevation data sets. Also looking at input data for the new york study and how to prep/filter it.
13:43 <abyrd> Currently adapting the service ID refactor to the realtimeStoptimes branch
13:43 <abyrd> novalis_dt, what must change in the state model?
13:44 <novalis_dt> abyrd, we need to be able to clone/edit states outside a traversal
13:44 <novalis_dt> Because when we are done traversing an edge, we need to generate new states for each of the following edges, with different costs depending on turn angles
13:44 <abyrd> oh right, you mentioned that when I was in the office
13:46 <novalis_dt> Yeah.
13:46 <novalis_dt> So, one question that all of this Raptor stuff brings up is: what is a good path?
13:47 <novalis_dt> Raptor, by default, finds the earliest arrival
13:47 <novalis_dt> Well
13:47 <novalis_dt> It finds the earliest arrival in under N boardings
13:48 <novalis_dt> And along the way, it finds all pareto-optimal paths between (boardings, walk distance, arrival time)
13:48 <novalis_dt> I have pruned this by assuming that the worst total time allowable is no more than 1.5x the first-found (lowest transfers) path
13:49 <novalis_dt> Anyway, so I get a bunch of paths which go some distance out of the way to avoid walking
13:49 <novalis_dt> Does this make any sense?
13:49 <novalis_dt> What sorts of paths do people actually want to see?
13:49 <demory> so, the core raptor algorithm doesn't take into account user prefs for transfers vs time, walking dist, etc?
13:50 <novalis_dt> demory, it takes into account max walk distance
13:50 <demory> it just generates every logical path, and then we choose from those?
13:50 <demory> ok
13:50 <mele> I guess we just need to define how far we're willing to go out of the way to avoid walking/how much extra time on the bus for xx distance walking
13:51 <demory> but we can still have a user preference for transfers vs. travel time, and use that to inform which path(s) we return
13:51 <novalis_dt> mele, by default, presently, walking time is considered 2x as bad as all other time
13:51 <novalis_dt> demory, yes, we could do that, to prune the list of paths.
13:51 <novalis_dt> The other question is when we cut off the searches. Search proceeds in order of number of boardings. So, first we find pure-walk trips, then one-boarding trips, and so on
13:52 <novalis_dt> The number of transfers we allow is the major contributor to compute time
13:52 <mele> ok great, so it's not sticking to that, or is it?
13:52 <novalis_dt> Right now, it's finding all combinations of earliest arrival/short walk
13:52 <novalis_dt> (within some epsilon)
13:53 <demory> would it make sense to have user-specified weighted preferences (a la the bike triangle)?
13:53 <demory> a transit triangle? :)
13:53 <novalis_dt> It would only sort of make sense
13:53 <novalis_dt> That is, we could use that to sort the paths found
13:54 <novalis_dt> Also, the nontransit portion already uses the bike triangle, for what that's worth.
13:54 <demory> yeah, that's true
13:55 <demory> also, currently OTP will ignore the max walk if no results are found at first, right? would that be the same w/ raptor?
13:56 <novalis_dt> We would have to rerun the search
13:56 <novalis_dt> But I think it would tend to be fast
13:56 <demory> ok
13:57 <novalis_dt> The typical problem with short walk distances is that they don't get you out of your origin/destination space
13:57 <novalis_dt> And that will be discovered pretty immediately by raptor.
13:59 <demory> ok, so we could just double the walk radius, and if still no results, double again, etc..
13:59 <novalis_dt> Yes, which is what we do now
13:59 <demory> maybe cap that at 2 or 3 passes
14:00 <demory> so, say we stick w/ the existing inputs -- shortest trip vs fewest transfers, and max walk/bike. (plus the bike prefs, if applicable)
14:00 <novalis_dt> Well, "fewest transfers" is not a real preference
14:00 <novalis_dt> We implement that with a vast transfer penalty
14:01 <novalis_dt> I actually did see raptor find the insane min-transfer route that I love so much
14:02 <novalis_dt> We definitely would use the transfer penalty if we decide to rank returned paths by weight. We just have to make sure not to cut off the search just because some paths have been found
14:02 <novalis_dt> in the case where the found paths are all insane
14:02 <novalis_dt> What defines "insane" is an interesting question, however.
14:02 <FrankP> Dave, where did Raptor come from? Any links to a paper describing the algorithm?
14:02 <novalis_dt> FrankP, yes, there's a paper. It's from Microsoft Research
14:03 <novalis_dt> The thing that the paper does not really address is how "footpaths" (in its terminology) work.
14:03 <FrankP> thanks
14:04 <novalis_dt> But the reason that the algorithm works well for cases like wilsonville->sandy is that it goes in order of number of boardings rather than absolute weight. And since there is some path with relatively few transfers between those two places, it does find that path.
14:06 <demory> so when you say "goes in order of # of boardings" what does that mean? does it mean it considers all 0-transfer trips first, then 1-transfer, etc.?
14:06 <novalis_dt> Exactly.
14:07 <demory> yeah, this actually sounds similar in some ways to the approach i took when I wrote my 1st trip planner in Atlanta (I did a presentation on that at the last workshop in Portland)
14:08 <demory> so, if a user wants a "fewest transfer" trip, we can just stop after the first "round" that produces any results, right?
14:08 <novalis_dt> We could
14:08 <demory> and return the "best" one of those (by time)
14:08 <novalis_dt> But nobody really wants that
14:08 <demory> well, technically that is the fewest-transfer trip, right?
14:08 <novalis_dt> The "fewest transfers" route from Grand Army Plaza to 28th & lex is:
14:08 <abyrd> actually, I think people often do really want least transfers as long as the trip is not totally insane
14:08 <novalis_dt> Wait until 1:00am. Then the 4 train will be running local.
14:09 <demory> ok. isee
14:09 <novalis_dt> Whereas the reasonable path is: take the 2/3 to the 4/5 to the 6. These are all across-the-platform transfers
14:09 <abyrd> you just need to apply some sanity checks to the first result you get back. if it fails then try more transfers.
14:09 <novalis_dt> And another option is: take the Q to the 6
14:10 <abyrd> of course one criterion is how much worse the 0-transfer trip is than the 1+ transfer trips... which you'll never know unless you find them
14:10 <novalis_dt> Exactly.
14:11 <abyrd> but assuming the 0-transfer trip is not insane (which could be judged almost entirely on the wait time for boarding) the user will be happy with that result
14:12 <novalis_dt> Yeah.
14:12 <novalis_dt> And we don't have to prune out insane waits if all of our waits turn out to be insane.
14:12 <novalis_dt> Like, if you plan a trip at 1:00am almost anywhere in Portland, you're going to be waiting 4 hours until the system starts up again
14:14 <novalis_dt> So, yeah, that will definitely help
14:15 <demory> so, likewise, if someone wants the "fastest trip" result, are there cases where the literal fastest trip is not actually what makes sense to return?
14:15 <novalis_dt> demory, when it involves six transfers
14:15 <novalis_dt> To go a tiny distance
14:15 <abyrd> and I really do think it's as simple as examining wait time to decide whether to proceed with more transfers
14:16 <abyrd> yeah, I think you definitely want some kind of combined weight to order all the pareto-optimal options
14:16 <novalis_dt> What's a "reasonable" wait time?
14:16 <novalis_dt> Just to put some numbers on it: a long trip (southampton to the bronx) at 11am. 3 boardings: 1 second. 5 boardings: 3.5 seconds. Of course I expect to get both of these numbers down.
14:16 <abyrd> and I suspect that trip planner users will not want to set those parameters themselves
14:16 <abyrd> weighting transfers, walking, etc. properly is part of the trip planner design
14:17 <novalis_dt> Yep
14:17 <novalis_dt> And our current weighting system is pretty good
14:17 <abyrd> people are more personally involved with the safety and energy expenditure of biking, so enjoy setting those params, but not in transit
14:17 <novalis_dt> And since the way I have implemented raptor produces a traditional path, we have those weightings
14:18 <abyrd> cool, then the simplest thing to do is return pareto-optimal paths in order of generalized cost, just like MOA*
14:19 <novalis_dt> OK.
14:19 <abyrd> with the slight difference that in RAPTOR you don't know that you've found the lowest generalized cost until you iterate all the way up to max transfers
14:20 <abyrd> whereas MOA* will always spit out paths in that order
14:20 <novalis_dt> Right.
14:20 <novalis_dt> But probably weight can be used to bound the search
14:20 <novalis_dt> Just as we now do for walk distance and time
14:20 <demory> btw, how is max_transfers determined? that heavily influences the overall running time right?
14:20 <novalis_dt> I think I tried a variant of that and it didn't improve speed, but I will try again because I have made a bunch of changes since
14:21 <novalis_dt> demory, yes, it's huge
14:21 <abyrd> but usually you can get up to 3+ transfers very quickly right? you'd only have to bail early for really awful searches.
14:21 <novalis_dt> demory, almost all of the compute time happens in later rounds
14:21 <demory> yeah, that makes sense
14:21 <novalis_dt> abyrd, well, "very quickly" is relative, but basically, yes
14:22 <abyrd> later rounds are what, >3 or something crazy like the 8 in the paper?
14:22 <novalis_dt> I don't recall exactly how they define rounds
14:22 <novalis_dt> For us, max rounds = max transfers + 2
14:22 <abyrd> well, I can judge from your numbers above (1 vs 3.5 sec)
14:23 <novalis_dt> That's +1 for the zero-transfer case and +1 because n transfers = n+1 boardings
14:23 <abyrd> right
14:23 <novalis_dt> My numbers are in terms of transfers
14:24 <abyrd> as for what constitutes insane wait time, that's hard to define. I think it's going to be determined through experience, but I'd start with something like 1 hour.
14:25 <novalis_dt> 1 hour total?
14:25 <abyrd> or how about the lowest route frequency in the graph, plus a few minutes?
14:25 <novalis_dt> The lowest frequency will likely be > 12 hours
14:25 <novalis_dt> because routes run differently at night
14:25 <novalis_dt> That late-night 4 only runs for like 4 hours total.
14:26 <abyrd> I was thinking of during the day, not over service day boundaries, but still there could be a service that only runs twice a day
14:26 <abyrd> so yeah that idea is out
14:26 <novalis_dt> Yeah, pretty often the first/last runs of the day are special
14:27 <abyrd> I'd say anything that involves waiting an hour or more for the first boarding is a but crazy
14:27 <abyrd> and should only be returned if all other options involve waiting that much
14:28 <novalis_dt> Yeah, that's fair.
14:28 <abyrd> or paths that involve waiting more than ah hour (or whatever arbitrary value we choose) for /any one/ boarding
14:29 <novalis_dt> Yeah, that's fair.
14:29 <abyrd> those should all be a hint to try another round
14:29 <abyrd> up to your absolute max
14:30 <demory> so maybe insane wait time is defined relatively, i.e. > 1 hr more than median wait time of all trips returned?
14:30 <novalis_dt> all trips returned when?
14:31 <novalis_dt> Like, ever?
14:32 <demory> well, for a given search
14:32 <novalis_dt> Here, we're contemplating when to stop the search early, tho.
14:34 <demory> so, take the case of the 2am search in Portland, when every trip will have like a 4+ hr wait
14:34 <abyrd> hey, following discussion on transit-developers, OTP Deployer and Analyst are both mentioned in the draft TRB paper just posted by Aaron Antrim and Sean Barbeau
14:34 <novalis_dt> Awesome.
14:34 <novalis_dt> demory, ah, for a given start time
14:34 <demory> normally, seeing those waits would normally tell us to keep doing more rounds, right?
14:35 <demory> but in this case we're never going to do better than that
14:35 <novalis_dt> Normally in A*?
14:35 <novalis_dt> Or in raptor?
14:35 <demory> so, i mean in raptor
14:35 <novalis_dt> Yeah, we could definitely use time of day information for that
14:35 <novalis_dt> That's going to be hard to benchmark, but probably reasonable in practice.
14:35 <abyrd> in the case where low-transfers results have a lot of waiting, we have to do more rounds to verify that they are not insane, in the given context
14:40 <demory> so, just so i understand -- we're talking about having typical waits for, say, each hour of the day and using that as a baseline for evaluating trips returned for a given search at that time?
14:41 <novalis_dt> I think that sounds like a good idea
14:41 <novalis_dt> Just to tell you when to keep searching after you find N paths
14:42 <demory> isn't there also geographic component -- i.e. in NYC at 2am, typical waits in Manhattan (where trains still run every 30 min) are going to differ considerably compared to a more suburban area (where the next bus isn't until 6am)
14:42 <novalis_dt> Yes.
14:42 <novalis_dt> But if we ignore that, we will still probably get good results
14:43 <demory> ok.
14:44 <demory> i need to step away for an errand pretty soon
14:45 <demory> but sounds like you are making good progress on this