Weekly check in 2012.03.15 - GeoSmartCity-CIP/gsc-opentripplanner GitHub Wiki
13:32 <demory> Hey folks, ready to get started w/ the check-in?
13:32 <mattwigway> Yep.
13:32 <novalis_dt> Sure.
13:32 <kpw> it might be nice to have both eventually (I can see apps that would make use of both) but in the near-term let's focus on replicating basic timetable functionality
13:32 <andrewbyrd> Hi, sure.
13:33 <demory> ok, let's get started then
13:33 <novalis_dt> I've been working on misc issues all over the codebase. I'm waiting to hear back from Brian Ferris on the DST issue; if I don't hear from him by Monday I will pester him by email.
13:34 <kpw> novalis_dt: brian's been away at a conf this week so he's probably got an email backlog
13:34 <novalis_dt> Yeah, I'm willing to give him time -- the next DST issue won't be until November
13:34 <demory> My week has been split between system mapping and OTPsetup work, but focusing on OTPsetup now. Have upgraded to 0.5, though there are some issues to be worked out esp. w/ larger deployments. Currently working on setting up a parallel "test" version of the workflow on AWS that will allow us to test w/o disturbing the live application
13:34 <novalis_dt> Well, hopefully it will be never -- I wrote to my senators to ask them to abolish DST, and maybe this time they'll listen.
13:35 <novalis_dt> Also, I'll be out Tuesday next week for some minor surgery. I hope to be back at work on Weds, but that depends on how I feel; I won't be around for the meeting regardless because I have a follow-up appointment at that time.
13:36 <demory> ok, hope it goes well!
13:36 <novalis_dt> Thanks.
13:38 <andrewbyrd> I think the agency issues are fixed now. Those changes will be in the release, so they should be available in OTPSetup for the NYC demo.
13:38 <kpw> great! so that's all related to agency ids?
13:38 <demory> excellent. when are you planning to do the release?
13:39 <novalis_dt> andrewbyrd, so, the upshot is that now if a feed wants to share stops with another feed, it must either have the same agency id in agency.txt, or set defaultAgencyId?
13:39 <andrewbyrd> I have also been experimenting with different ways of pulling requests out of http query strings and paths. Just using the java servlet SPI, no Jersey annotations.
13:39 <andrewbyrd> And combining that with a lightweight dependency injection framework instead of spring. I'm trying it out on analyst first to see how it goes.
13:40 <FrankP> Sneak my update in (I'll only be on the chat til 11am) -- I fixed the UI bug (show / hide) from last week's chat...also added agency to the itinerary output. Was going to add maxTransfers, but need to change the API to accept null values, and use its default value.
13:40 <andrewbyrd> novalis_dt : yes
13:40 <novalis_dt> andrewbyrd, can you make a note about that on the GraphBuilder wiki page?
13:41 <mattwigway> With help from novalis, I coded up a quick annotation search tool in VizGui.
13:42 <novalis_dt> kpw, actually, one more question for 636(b) -- is this mainly intended for consumer facing things? Because there's actually a lot of complexity around service ids and stops that don't allow pickups and various other things that we could avoid by simply specifying a single date (or date/time) to do the search on.
13:42 <kpw> mattwigway: that's awesome, thanks!
13:42 <andrewbyrd> OK, will add a note about agency issues on the wiki.
13:43 <kpw> novalis_dt: let's specify a date. that's important for a lot of reasons
13:43 <novalis_dt> OK.
13:43 <andrewbyrd> kpw: I think agencies are fine for now, and if there are any other problems lurking this new york graph should reveal them.
13:43 <kpw> (default to today but specify another if desired, perhaps?)
13:44 <novalis_dt> kpw, sure.
13:45 <novalis_dt> FrankP, did you manage to figure out why you were getting different results on the stop linking than I was? Can I see the config you're using for the instance built with MapBuilder?
13:45 <FrankP> novalis_dt ... beyond date, it might be interesting to have relative dates, e.g., next Saturday, next Sunday
13:46 <mattwigway> As to the first trip/last trip discussion, SF 511 (transit.511.org) has this functionality.
13:46 <novalis_dt> FrankP, Well, this is for the API
13:46 <novalis_dt> FrankP, so handling that on the client side should be fine.
13:46 <mattwigway> FrankP: BART also has generic Weekday, Sat and Sun schedules.
13:47 <novalis_dt> (you'll notice that this is my answer to everything)
13:47 <FrankP> client...
13:47 <kpw> FrankP, novalis_dt, mattwigway: that's a common scenario but it gets tricky fast (esp. with holidays). i agree about not baking that into the api just yet
13:48 <novalis_dt> It's not that I'm lazy. It's that I really prefer minimal APIs with layering where necessary.
13:48 <FrankP> The graph yesterday didn't have mapbuilder I guess...looked today at the config, and my edits were gone. New purl=/osm graph has map builder.
13:49 <FrankP> novalis_dt -- re: FrankP, did you manage to figure out why you were getting different results on the stop linking than I was? ^^
13:49 <novalis_dt> FrankP, cool. So does 6668 look good?
13:49 <novalis_dt> And interns, when you get the chance, can you check out the linking in /osm to see if these changes made any difference?
13:50 <FrankP> Haven't found a trip that looks different (old version looks okay), but 7964 does look better.
13:51 <grant_h> just tested 6668 and it's snapping to the transit route now
13:51 <novalis_dt> Excellent.
13:53 <mele> will check more later in the day
13:53 <FrankP> 6668 is origin in both new & old urls above, and it does the same thing ... destination is 7964, and it does better in the 1st (map builder) url
13:54 <FrankP> Are there any worries, downsides to using MapBuilder? Should I go with that for our production graph?
13:54 <mattwigway> Does the new code try to snap to routes of the same agency as the stop? I've got a place where there's a light rail platform adjacent to a heavy rail, and the light rail stop is snapped to the heavy rail platform (this is two different agencies).
13:54 <novalis_dt> There are some worries about mapbuilder -- it's very undertested.
13:55 <novalis_dt> But other than that it is unlikely to break anything
13:55 <novalis_dt> mattwigway, actually, it's more specific than that
13:55 <novalis_dt> mattwigway, and less
13:55 <novalis_dt> mattwigway, for rail stops, it prefers to snap to any platform over a non-platform
13:56 <novalis_dt> mattwigway, for bus stops, it tries to snap to streets which are used by routes which serve that specific stop
13:57 <kpw> novalis_dt, demory: we fixed the issue with duplicate segments, correct?
13:57 <novalis_dt> kpw, yeah
13:57 <novalis_dt> I guess we could have platforms in OSM specify which stopid they're for
13:57 <mattwigway> I've got an issue with two parallel platforms, one for light rail and one for heavy rail. I guess there's no way to fix that though.
13:57 <novalis_dt> Well, if the stops were precisely located, that could work
13:58 <FrankP> Orthogonal to this, do we have any good automated test recommendations (beyond the default JUnit) to capture and compare API / trip output? We'd like to build a set of tests that run (automated) before we release a new graph. Currently, a lot of manual work here, and I'd like to automate it to run with new GTFS.zip productions...
13:58 <FrankP> (BTW, have to run but will have window open...chat later)
13:59 <novalis_dt> I don't off the top of my head know of one.
13:59 <kpw> FrankP, we've talked about getting batch trip setting for benchmarking and graph/algo changes. folks, how are we on that front?
13:59 <mattwigway> I had a quick Python script that I used to do batch trips everywhere to everywhere from a PostGIS table... let me find it.
14:00 <kpw> can we add that to the CI work you've been doing andrewbyrd?
14:00 <andrewbyrd> yes, we really need better integration testing and that sort of code could be reused for trying out new graphs
14:00 <novalis_dt> later, mele.
14:00 <kpw> when we discussed this last i think there were questions about the data, but i thought we could use a reference data set
14:01 <kpw> given gtfs/otp and query set (could be historical)
14:01 <andrewbyrd> I have a little script that loads a bunch of origins and destinations categorized by center, suburbs, outlying and tries all the permutations of endpoints with ranges of walk distance, modes, etc
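[Editor's sketch] The batch approach described above (endpoints grouped by zone, every ordered pair tried across ranges of walk distance and mode) might look roughly like the following. This is a minimal sketch and not the script mentioned in the chat: the endpoint names, coordinates, walk distances, mode strings and request keys are all hypothetical placeholders.

```python
from itertools import permutations, product

# Hypothetical endpoints grouped by zone; names and coordinates are placeholders.
ENDPOINTS = {
    "center": [("center_a", 45.520, -122.680)],
    "suburbs": [("suburb_a", 45.490, -122.800)],
    "outlying": [("outlying_a", 45.300, -122.980)],
}
WALK_DISTANCES_M = [400, 800, 1600]   # assumed range of maximum walk distances
MODES = ["TRANSIT,WALK", "BICYCLE"]   # assumed mode strings


def build_requests(endpoints=ENDPOINTS, walks=WALK_DISTANCES_M, modes=MODES):
    """Return one request dict per (origin, destination, walk, mode) combination."""
    # Flatten the zones into a single list of (name, lat, lon) tuples.
    places = [place for zone in endpoints.values() for place in zone]
    requests = []
    # permutations(places, 2) yields every ordered origin/destination pair.
    for (origin, dest), walk, mode in product(permutations(places, 2), walks, modes):
        requests.append({
            "origin": origin[0],
            "destination": dest[0],
            "max_walk_m": walk,
            "mode": mode,
        })
    return requests
```

With three placeholder endpoints this yields 6 ordered pairs x 3 walk distances x 2 modes = 36 requests; restricting the pairs to particular zones (for example center to outlying) would be a small change to the pair generation.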
14:02 <mattwigway> Sounds like andrew's is more sophisticated than mine.
14:02 <andrewbyrd> but we need to run that sort of thing against expected output
14:02 <kpw> i'd love to capture the time/resource requirements (via novalis_dt's memory monitoring code) plus outcomes for the same trip pairs. which routes changed/can/can't be completed now as a result
14:02 <kpw> etc.
14:03 <kpw> if the CI server could do that automatically it would be great!
14:03 <andrewbyrd> and also make sure the whole batch meets certain requirements - no uncaught exceptions, realistic trip lengths etc
14:03 <andrewbyrd> kpw: I planned to have that kind of test run at every commit on the CI server
14:03 <kpw> great!
14:04 <kpw> can you leverage novalis_dt's resource monitoring stuff too?
14:04 <kpw> just so we can log the performance impacts of changes
14:04 <novalis_dt> What I did was pretty primitive, but it was probably usable at least for testing
14:04 <andrewbyrd> sure, I need to familiarize myself with it. I guess I will translate these ideas into Java so we can have better access to internal information.
14:04 <kpw> yeah, even just a ballpark is better than not knowing
14:05 <kpw> it would be great to catch stuff that balloons memory/cpu utilization early
14:05 <kpw> esp. as folks move toward production
14:05 <andrewbyrd> and while we're at it we might as well stuff the performance results into a database or at least a text file so we can compare with past results
14:05 <kpw> yep!
14:05 <kpw> if we can log it to the db that would be great
14:06 <kpw> we can compare results
14:06 <kpw> also gives folks the ability to test these impacts automatically (no need to run the test locally, just check in your code and the CI server will run it using reference data/hardware)
14:07 <andrewbyrd> one record per trip, keyed on build number,origin,destination,traverseoptions with memory consumption and itinerary summary included
14:07 <andrewbyrd> that way we can mine out other information later if we want to compare
14:07 <kpw> perfect!
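[Editor's sketch] The "one record per trip" idea above could be stored along these lines. This is only an illustration under stated assumptions: SQLite is used for convenience, and the table name, column names and helper functions are hypothetical, not anything agreed in the chat.

```python
import sqlite3

# Hypothetical schema: one row per trip, keyed on build number, origin,
# destination and traverse options, with memory use and an itinerary summary.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trip_result (
    build_number      INTEGER NOT NULL,
    origin            TEXT    NOT NULL,
    destination       TEXT    NOT NULL,
    traverse_options  TEXT    NOT NULL,
    memory_bytes      INTEGER,
    itinerary_summary TEXT,
    PRIMARY KEY (build_number, origin, destination, traverse_options)
)
"""


def open_results_db(path=":memory:"):
    """Open (or create) the results database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_trip(conn, build_number, origin, destination,
                traverse_options, memory_bytes, itinerary_summary):
    """Store one trip result, replacing any earlier row with the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO trip_result VALUES (?, ?, ?, ?, ?, ?)",
        (build_number, origin, destination, traverse_options,
         memory_bytes, itinerary_summary),
    )
    conn.commit()
```

Keeping raw per-trip rows rather than pre-computed aggregates leaves room to compare a single trip across builds later.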
14:08 <kpw> also, let's create test query sets (1000 random trips) so they can be re-run precisely
14:08 <kpw> 1000 was just an example
14:08 <andrewbyrd> frankp: what's your testing procedure like? could you send me a list of endpoints with names?
14:08 <mattwigway> kpw: excellent, I don't have enough RAM for the current integration tests, so I never run them.
14:09 <demory> i think the TriMet folks had to step off at 10
14:09 <kpw> and for resource/performance impacts i think we need to have a better reference platform otherwise it's not possible to make comparisons
14:10 <andrewbyrd> kpw: I was going to do a pseudo-random endpoint set with a density distribution centered on the center of the metropolitan area, and falling off farther out. by specifying the seed value we can get the same set each time.
14:10 <kpw> perfect!
14:10 <andrewbyrd> or different sets as needed.
14:10 <kpw> as long as we can repeat the values that's great
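[Editor's sketch] A repeatable pseudo-random endpoint set of the kind described above could be generated as follows. This is a minimal sketch that assumes a Gaussian falloff around the metropolitan center; the function name, the `sigma_deg` parameter and its default are hypothetical.

```python
import random


def random_endpoints(center_lat, center_lon, count, seed, sigma_deg=0.05):
    """Return `count` (lat, lon) pairs clustered around the given center.

    A Gaussian gives a density that is highest at the center and falls off
    farther out. A dedicated Random instance seeded with `seed` makes the
    set repeatable: the same seed always yields the same endpoints.
    """
    rng = random.Random(seed)
    return [
        (rng.gauss(center_lat, sigma_deg), rng.gauss(center_lon, sigma_deg))
        for _ in range(count)
    ]
```

Calling it twice with the same seed returns identical lists, and changing the seed gives a different set when one is needed.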
14:10 <novalis_dt> I actually think edge-to-edge times are the more important thing to look at, as those are the ones that tend to give the worst performance
14:11 <andrewbyrd> demory: oh right, I will catch frank by email later.
14:13 <andrewbyrd> novalis_dt: true. I was thinking in terms of measuring real life workload, where there will be drastically more trips in the center. since some optimizations work by lowering the median response time rather than the worst case, I figured getting the trip mix right would be important for estimating performance.
14:13 <andrewbyrd> but of course we need to test the long trips as well.
14:13 <novalis_dt> Maybe two different trip sets
14:13 <andrewbyrd> if the endpoint sets are parametric, we can just do center x outlying or outlying x outlying as needed.
14:14 <andrewbyrd> in different tests (realistic workload vs. try to break OTP tests)
14:14 <kpw> andrewbyrd: can you record distance and time for each trip also? that will be a useful metric for improved routing
14:14 <kpw> (avg trip time changes between builds, e.g.)
14:15 <andrewbyrd> kpw: so will we be sticking with one city for these integration tests or several? I ask because I'm trying to figure out how we are
14:16 <andrewbyrd> going to identify trips.
14:16 <novalis_dt> I think we ought to just use Portland
14:16 <andrewbyrd> I suppose the full tuple of SPT request fields is fine, along with a city indication.
14:16 <kpw> yep: let's use a reference data set
14:16 <novalis_dt> With maybe SMART and Cherriots too
14:17 <kpw> can you just hash the request params to create a trip id? but for most reporting i think we'd take the aggregate for the whole query set
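[Editor's sketch] Hashing the request parameters into a trip id, as suggested above, might be done like this. It is an illustration only: the function name is hypothetical, and the choice of SHA-1 over a sorted JSON encoding is an assumption, not something decided in the chat.

```python
import hashlib
import json


def trip_id(params):
    """Return a stable hex id for a dict of request parameters.

    Keys are sorted before hashing so that the order in which parameters
    were supplied does not change the id.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

Any change to the parameter set, including adding a new parameter, produces a different id, so ids are only comparable between runs that use the same set of request fields.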
14:17 <andrewbyrd> But it would be better if instead of lat/lon we had foreign keys for the endpoints, which would all be in a table with names.
14:17 <novalis_dt> Well, the request params might change over time
14:18 <kpw> ah, the date maybe. but if we have a historical data set we can always just query using the same time/date range, no?
14:19 <kpw> we could always have multiple data sets too
14:19 <novalis_dt> Well, I meant that we might add new params
14:19 <andrewbyrd> I think it's a good idea to keep every trip result in a database and compute the aggregates from there. Disk space is cheap and we may want to do cross-run profiling on a given trip.
14:19 <kpw> got it
14:19 <kpw> for data, what about portland and nyc for e.g.?
14:20 <novalis_dt> Sure, although we should make sure we get a good build for NYC first.
14:20 <andrewbyrd> yes, one medium city and one metropolis... let's get something working on PDX, keeping multi-city option in mind when designing the schema.
14:20 <kpw> great!
14:21 <kpw> pdx makes sense
14:21 <kpw> then nyc once we have it working
14:22 <andrewbyrd> it would be great if this could communicate with OTP both via the public api and by directly calling code, so it could be reused in different test situations.
14:24 <kpw> such as?
14:26 <andrewbyrd> i mean just the pseudo-random request bit. integration tests / profiling (running in servlet container) vs. unit tests (no servlet, want to assert things about the result internals)
14:29 <demory> Sorry I have a basic question -- where will the integration test code live? Will this be separate from the main OTP repo?
14:30 <andrewbyrd> there's a module for it, maybe some of it would be pulled out into utils/common
14:30 <demory> ok
14:33 <novalis_dt> OK, so let's call this meeting over, and move into the next meeting via phone.