technical_documentation - Kungbib/openapc-se GitHub Wiki
Technical documentation for Open APC Sweden
Ulf Kronman, 2017-04-03
Swedish pre-processing
Script: /python/se/clean_and_merge_apc_files.py -l sv_SE.UTF-8
Uses: ../data/apc_file_list.txt (the list of files to process)
Function:
- Merges APC files
- Converts the Swedish Excel booleans SANT/FALSKT to TRUE/FALSE
- Changes the decimal separator from comma (,) to period (.)
- Removes the whitespace that Excel inserts as a thousands separator in large numbers
- Checks for duplicate DOIs
Result: Tab-delimited file /data/apc_se_merged.tsv
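The pre-processing steps listed above can be illustrated with a minimal Python sketch. It is not the code of clean_and_merge_apc_files.py. The column names (`euro`, `is_hybrid`, `doi`) are assumptions borrowed from the OpenAPC data schema, all function names are hypothetical, and rows are represented as plain dictionaries rather than read from the listed files.

```python
from collections import Counter

# Assumed column names (OpenAPC-style); the real script may use others.
AMOUNT_COLUMN = "euro"
BOOLEAN_COLUMN = "is_hybrid"
DOI_COLUMN = "doi"

SWEDISH_BOOLEANS = {"SANT": "TRUE", "FALSKT": "FALSE"}


def clean_boolean(value):
    """Map Swedish Excel booleans to TRUE/FALSE; leave other values alone."""
    return SWEDISH_BOOLEANS.get(value.strip().upper(), value)


def clean_amount(value):
    """Turn an Excel-formatted amount such as '1 234,50' into '1234.50'."""
    # str.split() without arguments splits on any Unicode whitespace,
    # including the non-breaking spaces Excel may use for digit grouping.
    return "".join(value.split()).replace(",", ".")


def clean_row(row):
    """Return a copy of the row with the boolean and amount columns cleaned."""
    cleaned = dict(row)
    cleaned[BOOLEAN_COLUMN] = clean_boolean(row[BOOLEAN_COLUMN])
    cleaned[AMOUNT_COLUMN] = clean_amount(row[AMOUNT_COLUMN])
    return cleaned


def merge_and_clean(files):
    """Concatenate the rows of several files (lists of dicts) and clean them."""
    return [clean_row(row) for rows in files for row in rows]


def duplicate_dois(rows):
    """Return DOIs that occur more than once, compared case-insensitively."""
    counts = Counter(
        row[DOI_COLUMN].strip().lower()
        for row in rows
        if row[DOI_COLUMN].strip()
    )
    return sorted(doi for doi, n in counts.items() if n > 1)
```

In this sketch the duplicate check only reports repeated DOIs and leaves the decision about which row to keep to the person running the pipeline. Rows without a DOI are ignored by the check.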
Main enrichment process
Script: /python/se/apc_csv_processing.py -l sv_SE.UTF-8 ../data/apc_se_merged.tsv
Uses: /data/apc_se_merged.tsv
Result: /python/out.csv
Swedish post-processing
Script: /python/se/normalise_and_copy.py
Uses: /python/out.csv
Result: /data/apc_se.csv
Analysis
Script: statistics.Rmd
Uses: /data/apc_se.csv
Result: statistics.md
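The four steps form a chain in which each step after the first reads the file written by the step before it. The following Python sketch records the scripts and paths given above and checks that the hand-offs line up. The `Stage` class and the `broken_links` function are hypothetical helpers for illustration and are not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    script: str
    uses: str
    result: str


# Scripts and paths as documented on this page.
PIPELINE = [
    Stage("Swedish pre-processing",
          "/python/se/clean_and_merge_apc_files.py",
          "../data/apc_file_list.txt",
          "/data/apc_se_merged.tsv"),
    Stage("Main enrichment process",
          "/python/se/apc_csv_processing.py",
          "/data/apc_se_merged.tsv",
          "/python/out.csv"),
    Stage("Swedish post-processing",
          "/python/se/normalise_and_copy.py",
          "/python/out.csv",
          "/data/apc_se.csv"),
    Stage("Analysis",
          "statistics.Rmd",
          "/data/apc_se.csv",
          "statistics.md"),
]


def broken_links(stages):
    """Return (producer, consumer) name pairs whose result and input differ."""
    return [
        (prev.name, nxt.name)
        for prev, nxt in zip(stages, stages[1:])
        if prev.result != nxt.uses
    ]


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.name}: {stage.uses} -> {stage.result}")
    print("Broken hand-offs:", broken_links(PIPELINE))
```

A check of this kind is useful mainly when a path changes in one step, because it shows which later step needs the same change.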