Road Map - michaeltelford/wgit GitHub Wiki
Raise an issue if you have a feature request.
Nothing below is set in stone (until it's been implemented) and the order doesn't matter. Primary and Secondary refer to the importance of the change, not the amount of work or even when it'll be implemented. If in doubt, reach out and ask!
Primary
- ...
Secondary
- Honour the `robots.txt` `Crawl-delay` parameter for all `index_*` methods.
- Move the `Crawler.browser_get` method's `Ferrum::Browser.new` statement into its own protected method (for user overriding).
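
As a rough illustration of the first item, the sketch below reads a `Crawl-delay` value out of the text of a `robots.txt` file. `Crawl-delay` is a non-standard directive, so the grouping rules used here are an assumption, and `crawl_delay_for` is a hypothetical helper, not part of Wgit's API.

```ruby
# Hypothetical helper (not part of Wgit): returns the Crawl-delay in seconds
# that applies to the given user agent, or nil if none is declared.
def crawl_delay_for(robots_txt, user_agent: '*')
  delays = {}          # downcased user agent => delay in seconds
  current_agents = []  # agents named by the group currently being read
  in_rules = false     # true once the current group has had a rule line

  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip # drop comments and surrounding whitespace
    next if line.empty?

    field, value = line.split(':', 2).map(&:strip)
    next if value.nil? # not a "field: value" line

    case field.downcase
    when 'user-agent'
      # A User-agent line that follows rule lines starts a new group.
      if in_rules
        current_agents = []
        in_rules = false
      end
      current_agents << value.downcase
    when 'crawl-delay'
      in_rules = true
      seconds = Float(value, exception: false) # nil for non-numeric values
      current_agents.each { |agent| delays[agent] ||= seconds } if seconds
    else
      in_rules = true
    end
  end

  # Prefer a group that names this agent; fall back to the wildcard group.
  delays[user_agent.downcase] || delays['*']
end

robots = <<~ROBOTS
  User-agent: *
  Disallow: /private
  Crawl-delay: 2

  User-agent: wgit
  Crawl-delay: 0.5
ROBOTS

delay = crawl_delay_for(robots, user_agent: 'wgit')
sleep(delay) if delay # pause between requests; nil means no delay was declared
```

An indexing method could call something like this once per host and sleep between page requests. How that would fit into the `index_*` methods is left open here.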
Refactoring
- Use `refine` instead of `core_ext`.
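
For context, a refinement scopes a core class extension to the code that opts in with `using`, rather than patching the class globally. The sketch below is only an illustration: `StringSlug`, `Slugger` and `to_slug` are hypothetical names and do not reflect what Wgit's `core_ext` actually defines.

```ruby
# Hypothetical refinement: adds String#to_slug without patching String globally.
module StringSlug
  refine String do
    def to_slug
      downcase.strip.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
    end
  end
end

module Slugger
  using StringSlug # active only until the end of this module body

  def self.slug(text)
    text.to_slug
  end
end

Slugger.slug('  Hello, World! ') # => "hello-world"
'abc'.respond_to?(:to_slug)      # => false, String itself is untouched
```

The trade-off is that every file or module relying on the extension has to call `using` explicitly, which is more verbose than a global monkey patch but keeps the change out of user code.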
Experiments
- Try improving crawl performance with the “async” gem (for both parallel HTTP requests and multiplexed HTTP calls, i.e. reusing the same HTTPS connection when crawling a site). Benchmark the results. See this article for reference: https://losangelesaiapps.com/concurrent-web-crawling-in-ruby-with-async/
- Try out lightweight alternatives to `nokogiri`, e.g. `ox`, `rexml` etc. How much lighter are they as a dependency? How much faster are they (benchmark)? How much code would need to change? Do they provide as much functionality? Do they require native C extensions?
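
A starting point for the "how much faster" question might look like the sketch below. It times only raw parsing of a small, well-formed synthetic document, skips any library that is not installed, and says nothing about functionality or dependency weight. `SAMPLE`, `CANDIDATES`, `available` and `time_parse` are hypothetical names chosen for this example.

```ruby
# Synthetic, well-formed input so that strict XML parsers accept it too.
SAMPLE = "<html><body>#{'<p class="x">hello</p>' * 1_000}</body></html>"

# Library to require => lambda that parses a String with that library.
CANDIDATES = {
  'nokogiri'       => ->(doc) { Nokogiri::HTML(doc) },
  'ox'             => ->(doc) { Ox.parse(doc) },
  'rexml/document' => ->(doc) { REXML::Document.new(doc) }
}.freeze

# Keeps only the candidates whose library can be loaded.
def available(candidates)
  candidates.select do |lib, _parser|
    require lib
    true
  rescue LoadError
    false
  end
end

# Average wall-clock seconds per parse over the given number of runs.
def time_parse(parser, doc, runs: 20)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { parser.call(doc) }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) / runs
end

available(CANDIDATES).each do |lib, parser|
  puts format('%-16s %.3f ms per parse', lib, time_parse(parser, SAMPLE) * 1_000)
end
```

Real crawled pages are rarely well-formed, so a fairer comparison would also feed each parser messy HTML and time the queries Wgit actually runs, not just the parse step.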