CSV Conventions - GaiaViz/GaiaViz GitHub Wiki

There is no 'standard' CSV format, hence we define ours here!


CSV Raw Text Example

CSV_raw-text.png

[!NOTE] HINT: Verify the raw text formatting with a text editor such as 'notepad.exe' (MSW) or 'textedit' (OSX).

We use CSV (Comma-Separated Values) files to load and save our datascapes. You can edit these files with a spreadsheet program (LibreOffice/Excel), a database (MySQL/PostgreSQL), or programmatically (Python/PHP), but be careful to export them as described below.


CSV Formatting

  • Field delimiter is a comma: ,

  • String delimiter is double quotes: "

    • Strings can contain a line-break, double-quote pairs and/or commas.
    • Use (solo) double quotes " around simple string fields:
      • "Example String"
    • Use pairs of nested double quotes "" within string fields:
      • Tag 'title' with a quote inside:
        • "and I ""quote!"" within a string"
      • URL string:
        • "<a href=""www.example.com"">pairs of double quotes</a>"
      • Filepath to an app and passing a command line argument:
        • "<a href=""""GaiaViz.exe"" -f User/Tutorials/Tags/U-key-Open-App_npe.bat"">GaiaViz app</a>"
    • Note that an empty string also appears as a pair of double quotes "".
  • Numbers (float, int, uchar, etc.)

    • Number columns should ONLY have numbers (or nothing at all).
      • 42,3.14,0.0,0,,,,,88
        • Empty values ,, are treated as zero.
        • Non-decimal characters are NOT allowed (e.g. nan or inf).
          • NO scientific notation (e.g. 2e-06).
    • Numbers should NOT be quoted.
    • Most of our numbers are 32-bit: either a signed int or a single-precision float.
    • Exceptions include color components (8-bit) and record_id (64-bit).
  • UTF-8 encoding:

    • Can also use plain ASCII, which is directly compatible.
    • Our Text Tags have partial support for other languages.
  • First row is the field names.

  • First cell (upper-left) is the 'id' column (and table name), e.g. np_node_id

  • The first record (data) starts on the 2nd row.

    • For convenience, we often start with a null record (unused by the scene) to show default values.
  • Each row is terminated with a CRLF line-ending*, including the last row!

*Reading is tolerant of different line endings: CRLF (MSW), CR (classic Mac OS) and LF (Linux, OSX). We write using CRLF.
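
As an illustration of the rules above, here is a minimal Python sketch that reads text in this format with the standard csv module. It is not the GaiaViz loader. The "title" and "scale" column names, the helper names, and the exact grammar accepted for numbers are assumptions made for the example.

```python
import csv
import io
import re

# Sample text in the format described above. Only np_node_id comes from this
# page; the "title" and "scale" columns are invented for the example.
RAW = (
    'np_node_id,title,scale\r\n'
    '1,"and I ""quote!"" within a string",3.14\r\n'
    '2,"",\r\n'
)

# Assumed grammar for a plain decimal number: optional minus sign, digits,
# optional fraction. This rejects nan, inf, and scientific notation.
PLAIN_NUMBER = re.compile(r"-?[0-9]+(\.[0-9]*)?")


def parse_number(text):
    """Parse a number field; an empty field is treated as zero."""
    if text == "":
        return 0.0
    if not PLAIN_NUMBER.fullmatch(text):
        raise ValueError(f"not a plain decimal number: {text!r}")
    return float(text)


def read_rows(raw):
    """Return (header, records) from CSV text."""
    # newline="" leaves line endings untouched so the csv module can
    # interpret them itself, including line breaks inside quoted strings.
    reader = csv.reader(io.StringIO(raw, newline=""))
    header = next(reader)
    return header, list(reader)


header, records = read_rows(RAW)
print(header)
print(records)
```

The csv module undoes the doubled quotes, so the first record's title comes back with single inner quotes. After parsing, a quoted empty string and an empty number field both appear as an empty Python string, so the column type has to decide how to interpret them.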


Python - CSV Writing

  • Explicitly format the field types:
    • Assert double quotes around strings.
    • Assert floats are floats and that integers are integers (42 is an integer, 42.0 is a float).
      • Do NOT allow non-numeric values in a numbers column (e.g. nan, inf or 2e-06).
      • Empty value (nothing at all) in a field is okay.
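
The points above could be sketched as follows. This is one possible approach, not code from GaiaViz: the function names are hypothetical, None is used to represent an empty field, the header names are assumed to be written unquoted, and the choice of nine decimal places for floats is an assumption.

```python
import math

CRLF = "\r\n"


def format_field(value):
    """Format one value according to its Python type."""
    if value is None:
        return ""  # empty field (nothing at all)
    if isinstance(value, str):
        # Wrap in double quotes and double any quotes inside the string.
        return '"' + value.replace('"', '""') + '"'
    if isinstance(value, bool):
        # bool is a subclass of int, so reject it before the int check.
        raise TypeError("bool is not a valid field type")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("nan and inf are not allowed")
        # Fixed-point formatting never produces scientific notation.
        text = format(value, ".9f").rstrip("0")
        # Keep one digit after the point so a float stays a float (42.0).
        return text + "0" if text.endswith(".") else text
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def format_row(values):
    """Join formatted fields with commas and terminate with CRLF."""
    return ",".join(format_field(v) for v in values) + CRLF


def write_csv(path, header, records):
    # newline="" stops Python from translating the CRLF written here.
    with open(path, "w", encoding="utf-8", newline="") as f:
        # Assumption: header names are written unquoted.
        f.write(",".join(header) + CRLF)
        for record in records:
            f.write(format_row(record))


print(format_row([1, 'and I "quote!" within a string', 42.0, None, 2e-06]), end="")
```

Formatting each field by hand keeps the type decisions visible. The csv module's quoting modes would also work, but they decide what to quote for you.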

LibreOffice - Export CSV

  1. File -> Save As
  • Save as type: Text CSV (*.csv)

CSV_LibreOffice_Save-As_1.png

  2. Check the '[x] Edit filter settings' box to open the settings dialog:

CSV_LibreOffice_Save-As_2.png

Settings:

  • 'Character set:' Unicode (UTF-8)
  • 'Field delimiter' is a comma ,
  • 'String delimiter' is a double quote "
  • '[x] Save cell content as shown'
  • '[x] Quote all text cells'

Note that LibreOffice will automatically add the nested pairs of double quotes "" for quotes that are within a string.


For a general discussion of CSV formats, see "Towards standardization" on the Wikipedia CSV page.