Release Notes - Heritrix 3.4.0-20210803
Summary of changes since Release Notes - Heritrix 3.4.0-20210803 - see the full changelog for more details.
Additions
- ExtractorChrome: reduce request duplication between browser and frontier #416 (ato)
- ExtractorChrome: Capture requests made by the browser #411 (ato)
- Add ExtractorChrome to contrib #403 (ato)
- Add basic syntax highlighting to the crawl.log viewer #408 (ato)
- JDK 16 compatibility #418 (ato)
Changes
- Upgrade httpclient to 4.5 #397 (anjackson)
- Don't extract data URIs #423 (ato)
- ToeThread: ensure currentCuri is finished before exiting #421 (ato)
- Switch from Travis CI to GitHub Actions #404 (ato)
- Speed up test suite #405 (ato)
- Fix a couple of boring Maven warnings #407 (ato)
- Fix and document the -r option which runs a named job on startup #406 (ato)
- Upgrade maven-assembly-plugin to 3.3.0 to fix file permissions #414 (ato)
- Warc writer stats fixes #410 (ato)
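As a rough illustration of the "Don't extract data URIs" change (#423), the sketch below drops links that use the data: scheme before they would be queued as outlinks. It is not Heritrix's actual implementation: the class DataUriFilter and the methods isDataUri and extractable are hypothetical names invented for this example, and the sample links are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Heritrix: filters data: URIs out of a
// list of discovered links so that only fetchable links remain.
public class DataUriFilter {

    // Returns true when the link uses the data: scheme (case-insensitive).
    static boolean isDataUri(String link) {
        if (link == null) {
            return false;
        }
        String trimmed = link.trim();
        return trimmed.regionMatches(true, 0, "data:", 0, 5);
    }

    // Returns the links that are worth handing on, preserving their order.
    static List<String> extractable(List<String> links) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (!isDataUri(link)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = List.of(
                "https://example.org/page.html",
                "data:image/png;base64,iVBORw0KGgo=",
                "DATA:text/plain,hello",
                "/relative/path.css");
        // prints [https://example.org/page.html, /relative/path.css]
        System.out.println(extractable(links));
    }
}
```

A data: URI carries its payload inline, so there is nothing further to fetch; skipping such links avoids creating candidate URIs that could never be crawled.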
Removals
none
Bugfixes
- Fix WARC-IP-Address and use a common server-ip CrawlURI attribute for all protocols #409 (ato)
- Jobs can get stuck STOPPING with "Interrupt leaving unfinished CrawlURI" #420
- Groovy version is incompatible with JDK 16+ #419
- module java.base does not export sun.security.tools.keytool to unnamed module @1ece4432 #417
- Distribution package has broken filesystem permissions #413
- Add WARC-IP-Address header to WARCWriterChainProcessor #396