Credentials - internetarchive/heritrix3 GitHub Wiki
Credentials can be added so that Heritrix can gain access to areas of
Web sites requiring authentication. Credentials are configured in the
Spring configuration file, crawler-beans.cxml. The following example
shows a configured Credential.
<bean id="credential"
      class="org.archive.modules.credential.HttpAuthenticationCredential">
  <property name="domain">
    <value>domain</value>
  </property>
  <property name="realm">
    <value>myrealm</value>
  </property>
  <property name="login">
    <value>mylogin</value>
  </property>
  <property name="password">
    <value>mypassword</value>
  </property>
</bean>
One of the settings for a credential is its domain. It is therefore possible to create all credentials at a global level. However, because this can cause excessive unneeded checking of credentials, it is recommended that credentials be added to a domain override. This way, the credential is only checked when the relevant domain is being crawled.
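A credential bean like the one above is only consulted if the crawler's credential store knows about it. The following is a minimal sketch of registering it at the global level. It assumes the `credentialStore` bean and its `credentials` map as they appear in the stock crawler-beans.cxml; the map key `exampleCredential` is an arbitrary, hypothetical name, so check the names against your own configuration.

```xml
<!-- Sketch only: assumes the default credentialStore bean from crawler-beans.cxml. -->
<bean id="credentialStore"
      class="org.archive.modules.credential.CredentialStore">
  <property name="credentials">
    <map>
      <!-- The key is a hypothetical label; value-ref points at the bean with id "credential". -->
      <entry key="exampleCredential" value-ref="credential"/>
    </map>
  </property>
</bean>
```

Registered this way, the credential is global and therefore subject to the extra checking described above.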
Heritrix supports two types of authentication: HTTP authentication as defined
in RFC 2617 (Basic and Digest), and submission of an HTML login form by POST or GET.
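For comparison with the RFC 2617 credential shown earlier, here is a hedged sketch of what an HTML form credential might look like. It assumes the `org.archive.modules.credential.HtmlFormCredential` class with `domain`, `loginUri`, and `formItems` properties. The domain, the login URL, and the form field names are placeholders that must match the target site's actual login form, and the property that selects POST or GET is not shown.

```xml
<!-- Sketch only: property names are assumptions; verify them against your Heritrix version. -->
<bean id="formCredential"
      class="org.archive.modules.credential.HtmlFormCredential">
  <!-- Placeholder domain the credential applies to. -->
  <property name="domain" value="example.com"/>
  <!-- Placeholder URL the login form is submitted to. -->
  <property name="loginUri" value="http://example.com/login"/>
  <property name="formItems">
    <map>
      <!-- Keys must equal the input names used by the site's login form. -->
      <entry key="login" value="mylogin"/>
      <entry key="password" value="mypassword"/>
    </map>
  </property>
</bean>
```

Unlike the HTTP authentication credential, this form has no realm: the crawler submits the listed form items to the login URL rather than answering an authentication challenge.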