Home - acli/w3m-debian GitHub Wiki

Upstream status

The SourceForge project is dead. Debian’s fork is the project that’s still alive.

Adding tags

This is a work in progress.

Modifying TagMAP in html.c is not enough; you need to also modify MyHashItem in tagtable.tab (not tagtable.c, which is a generated file), plus file.c, plus possibly table.c.

DEL tag

See case HTML_DEL in file.c.

Investigate using U+0338

This is similar to the problem with the Q tag: by the time we reach HTMLtagproc1(), i.e., the expected place to do these things, we have no idea what the character set is.

In theory we have DocumentCharset, but it seems to be set to WC_CES_UTF_8 even when we declare <meta charset=us-ascii>. Also, if we actually insert UTF-8 bytes, they come out as "??".
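
For reference, here is a minimal sketch of what the U+0338 idea amounts to at the byte level, assuming the text is already well-formed UTF-8. It is not w3m code: toy_strike_utf8() is a hypothetical helper, and it does nothing about the charset problem described above. It only shows that U+0338 (COMBINING LONG SOLIDUS OVERLAY) encodes as the two bytes 0xCC 0xB8 and has to follow each code point it strikes out.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of w3m.
 * Copies the UTF-8 string `in` to `out`, appending U+0338 (bytes 0xCC 0xB8)
 * after every code point. Assumes `in` is well-formed UTF-8.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or (size_t)-1 if `out` is too small.
 */
size_t toy_strike_utf8(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;

    if (outsz == 0)
        return (size_t)-1;

    while (*p != '\0') {
        /* Copy the lead byte plus any continuation bytes (10xxxxxx). */
        do {
            if (o + 1 >= outsz)
                return (size_t)-1;
            out[o++] = (char)*p++;
        } while ((*p & 0xC0) == 0x80);

        /* Append the combining overlay, keeping room for the NUL. */
        if (o + 2 >= outsz)
            return (size_t)-1;
        out[o++] = (char)0xCC;
        out[o++] = (char)0xB8;
    }
    out[o] = '\0';
    return o;
}
```

Calling toy_strike_utf8("ab", buf, sizeof buf) with a large enough buffer yields the six bytes a, 0xCC, 0xB8, b, 0xCC, 0xB8. Whether a terminal renders the overlay sensibly is a separate question.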

Tag scanning

  • tokenizer internal FSM: next_status() in etc.c, with states defined in fm.h (R_ST_* constants)
  • tokenizer public API: read_token(), next_token() in etc.c
  • pay attention to the FSM because if the FSM doesn’t recognize a tag (e.g., SGML entity) it gets blanked out
  • the ! in SGML entities gets dropped somewhere (don’t know where yet)
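
To make the point about the FSM concrete, here is a toy state machine in the same spirit. It is a sketch only: the state names and the functions toy_next() and toy_token_len() are made up for illustration and are not the R_ST_* constants or the next_status()/read_token() code in w3m. What it shows is that token boundaries are decided entirely by the states the machine knows about (here, quoted attribute values), so anything the machine does not model gets split or swallowed in unexpected ways.

```c
#include <stddef.h>

/* Hypothetical states; NOT the R_ST_* constants from fm.h. */
enum toy_state { ST_TEXT, ST_TAG, ST_SQUOTE, ST_DQUOTE };

/* Advance the toy FSM by one input character. */
static enum toy_state toy_next(enum toy_state s, char c)
{
    switch (s) {
    case ST_TEXT:
        return c == '<' ? ST_TAG : ST_TEXT;
    case ST_TAG:
        if (c == '>')
            return ST_TEXT;
        if (c == '\'')
            return ST_SQUOTE;
        if (c == '"')
            return ST_DQUOTE;
        return ST_TAG;
    case ST_SQUOTE:
        return c == '\'' ? ST_TAG : ST_SQUOTE;
    case ST_DQUOTE:
        return c == '"' ? ST_TAG : ST_DQUOTE;
    }
    return ST_TEXT;
}

/*
 * Return the length of the first token in `s`: either a run of text up to
 * the next '<', or a whole tag including its closing '>'. A '>' inside a
 * quoted attribute value does not end the tag. An unterminated tag
 * consumes the rest of the input.
 */
size_t toy_token_len(const char *s)
{
    enum toy_state st = ST_TEXT;
    size_t i = 0;

    if (s[0] != '<') {
        while (s[i] != '\0' && s[i] != '<')
            i++;
        return i;
    }
    for (i = 0; s[i] != '\0'; i++) {
        st = toy_next(st, s[i]);
        if (i > 0 && st == ST_TEXT)
            return i + 1;   /* just consumed the closing '>' */
    }
    return i;
}
```

For example, toy_token_len("<a title=\"x>y\">rest") covers the whole tag, because the quoted '>' is handled by ST_DQUOTE. Remove that state and the tag would be cut short at the first '>'.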

The big picture:

  • HTMLlineproc0 claims it’s “first pass”
  • HTMLlineproc2body (in file.c)
  • loadGeneralFile → loadHTMLBuffer → loadHTMLstream → HTMLlineproc2 → HTMLlineproc2body
  • proc_again: label
  • HTMLlineproc0 ← loadHTMLstream

Named entities

  • w3m calls them “& escapes”
  • handled in getescapecmd() in indep.c
  • internal (?) function: getescapechar() in indep.c
  • once getescapecmd() has a Unicode code point, it calls conv_entity() in entity.c, so that function is significant
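
The general shape of that flow (escape text → code point → output bytes) can be sketched as below. This is an illustration under assumptions, not w3m's implementation: the table, toy_entity_to_codepoint() and toy_utf8_encode() are all hypothetical names, the table holds only a handful of entities, and the UTF-8 encoder is just a stand-in for whatever conv_entity() actually does with the code point.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Tiny illustrative table; w3m's real table is far larger. */
struct toy_entity { const char *name; unsigned long cp; };
static const struct toy_entity toy_entities[] = {
    { "amp", 0x26 }, { "lt", 0x3C }, { "gt", 0x3E },
    { "quot", 0x22 }, { "nbsp", 0xA0 }, { "eacute", 0xE9 },
};

/*
 * Decode the body of an "& escape" (the text between '&' and ';'),
 * e.g. "amp", "#233" or "#xE9". Returns the code point, or -1 if the
 * escape is unknown or malformed.
 */
long toy_entity_to_codepoint(const char *body)
{
    size_t i;

    if (body[0] == '#') {
        int hex = (body[1] == 'x' || body[1] == 'X');
        const char *digits = body + (hex ? 2 : 1);
        unsigned char first = (unsigned char)*digits;
        unsigned long v;
        char *end;

        /* Reject empty input, signs and whitespace before strtoul sees them. */
        if (hex ? !isxdigit(first) : !isdigit(first))
            return -1;
        v = strtoul(digits, &end, hex ? 16 : 10);
        if (*end != '\0' || v > 0x10FFFF)
            return -1;
        return (long)v;
    }
    for (i = 0; i < sizeof toy_entities / sizeof toy_entities[0]; i++)
        if (strcmp(body, toy_entities[i].name) == 0)
            return (long)toy_entities[i].cp;
    return -1;
}

/* Encode a code point as UTF-8. Returns the byte count, or 0 if out of range. */
size_t toy_utf8_encode(unsigned long cp, char out[4])
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, "eacute", "#233" and "#xE9" all decode to the same code point, which then encodes as the two bytes 0xC3 0xA9. In w3m the second half of this pipeline is where the document charset matters, which is why conv_entity() is worth reading closely.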

Interactive content