dicomtocsv - dgobbi/vtk-dicom GitHub Wiki
Scan a directory tree for DICOM files, and print the metadata in a format usable for a spreadsheet or database.
dicomtocsv [options] <directory>
-k tag=value Provide a key to be queried and matched.
-q <query.txt> Provide a file to describe the find query.
-u <uids.txt> Provide a file that contains a list of UIDs.
-o <data.csv> Provide a file for the query results.
--first-nonzero Search series for first nonzero value of each key.
--all-unique Report all unique values within each series.
--min-value Report the minimum value within each series.
--max-value Report the maximum value within each series.
--directory-only Use directory scan only, do not re-scan files.
--ignore-dicomdir Ignore the DICOMDIR file even if it is present.
--charset <cs> Charset to use if SpecificCharacterSet is missing.
--images-only Only list files that have PixelData or equivalent.
--noheader Do not print the csv header.
--study Print one record for each study.
--series Print one record for each series (default).
--image Print one record for each image.
--silent Do not report any progress information.
--help Print a brief help message.
--version Print the software version.
This program will create a .csv file in accordance with the supplied query information. If no query is specified, then a default query will be used. The .csv file will list the attributes of each DICOM series that is found within a directory tree.
For each attribute to be extracted, the tag can be given with "-k", for example "-k PatientName". The attributes can also be specified with "-q query.txt" where each line of "query.txt" gives one attribute. For detailed information on how to specify a query, see Command Line Tools.
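As a minimal sketch of the "-q" route, the Python snippet below writes a query file with one attribute per line, as described above. The helper name write_query_file and the temporary file location are illustrative and not part of dicomtocsv; the full query syntax is covered in Command Line Tools and is not reproduced here.

```python
import os
import tempfile
from pathlib import Path


def write_query_file(path, attributes):
    """Write one attribute name per line, the layout described for "-q query.txt"."""
    text = "".join(f"{name}\n" for name in attributes)
    Path(path).write_text(text, encoding="utf-8")
    return Path(path)


# Hypothetical attribute selection, written to a scratch directory.
scratch_dir = tempfile.mkdtemp()
query_path = write_query_file(
    os.path.join(scratch_dir, "query.txt"),
    ["PatientName", "StudyDate", "SeriesDescription"],
)
print(query_path.read_text(encoding="utf-8"), end="")
```

The resulting file can then be passed on the command line as "-q query.txt" in place of the equivalent series of "-k" options.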
The output file is formatted as follows, with one header line followed by the comma-separated, quote-enclosed values.
PatientName,PatientBirthDate,PatientSex,StudyDate,StudyTime,SeriesNumber,SeriesDescription
"TEST^PATIENT","19360703","M","20140603","105200","2","Cerebral 4.0 H31s"
The following command, which searches the current directory for DICOM files, produces this output:
dicomtocsv . -k PatientName -k PatientBirthDate -k PatientSex -k StudyDate -k StudyTime \
-k SeriesNumber -k SeriesDescription
The order of the fields in the .csv file is the same as given on the command line. If the command line repeats an attribute, then that field will still only be listed once, with its first appearance on the command line dictating the order in the .csv file.
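The ordering rule can be mirrored in a few lines of Python, which may help when predicting the header for a long list of "-k" options. This is a sketch of the documented behaviour, not dicomtocsv's own code, and the function name expected_header is an assumption made for illustration.

```python
def expected_header(keys):
    """Return the field order implied by a sequence of "-k" attribute names.

    A repeated attribute is listed once, at the position of its first
    appearance, following the rule described in the text.
    """
    # dict preserves insertion order, so each name keeps its first position.
    return list(dict.fromkeys(keys))


keys = ["PatientName", "StudyDate", "PatientName", "SeriesNumber"]
print(",".join(expected_header(keys)))
# PatientName,StudyDate,SeriesNumber
```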
All output is given in UTF-8, with conversion from the original character set if necessary. If any value contains a quotation mark, the quotation mark will be doubled as per RFC 4180. The file will use <CR><LF> to end each line.
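Because the output follows RFC 4180, it can be read with a standard CSV parser. The sketch below uses Python's csv module on the sample record shown earlier; the second snippet of input, with a doubled quotation mark, is made up purely to show the unescaping and does not come from real dicomtocsv output.

```python
import csv
import io

# Header and record copied from the example output above, with <CR><LF> endings.
SAMPLE = (
    "PatientName,PatientBirthDate,PatientSex,StudyDate,StudyTime,"
    "SeriesNumber,SeriesDescription\r\n"
    '"TEST^PATIENT","19360703","M","20140603","105200","2",'
    '"Cerebral 4.0 H31s"\r\n'
)


def read_records(stream):
    """Parse dicomtocsv-style CSV text into a list of dicts keyed by the header."""
    return list(csv.DictReader(stream))


# newline="" leaves the <CR><LF> endings for the csv module to handle.
records = read_records(io.StringIO(SAMPLE, newline=""))
print(records[0]["PatientName"])        # TEST^PATIENT
print(records[0]["SeriesDescription"])  # Cerebral 4.0 H31s

# Hypothetical value containing a doubled quotation mark.
fields = next(csv.reader(['"a ""quoted"" word","x"']))
print(fields[0])                        # a "quoted" word
```

When reading a file written with "-o", opening it with open("data.csv", newline="", encoding="utf-8") gives the parser the same view of the data.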
By default, the .csv file provides one record per series. With the "--image" option it provides one record for every file, and with the "--study" option it provides only one record per study. The records are sorted first by patient, then by date, and finally by instance number.
The "-o" option, which names the output file, is optional. If no output file is given, the output is written to stdout. One advantage of "-o" is that it allows dicomtocsv to print progress information to the terminal, which is useful during a long scan. This progress information can be turned off with the "--silent" option.
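When "-o" is omitted, the stdout stream can be captured directly by a calling script. The following Python sketch builds the argument list from options documented above and parses the captured output. The function names build_command and scan are hypothetical, and the sketch assumes that dicomtocsv is on the PATH when scan is called.

```python
import csv
import io
import subprocess


def build_command(directory, keys, level=None):
    """Build an argument list using only the documented "-k" and record-level options."""
    argv = ["dicomtocsv"]
    for key in keys:
        argv += ["-k", key]
    if level is not None:
        if level not in ("study", "series", "image"):
            raise ValueError("level must be 'study', 'series' or 'image'")
        argv.append("--" + level)
    argv.append(directory)
    return argv


def scan(directory, keys, level=None):
    """Run dicomtocsv without "-o" and parse the CSV it writes to stdout."""
    result = subprocess.run(
        build_command(directory, keys, level), capture_output=True, check=True
    )
    text = result.stdout.decode("utf-8")  # output is documented as UTF-8
    return list(csv.DictReader(io.StringIO(text, newline="")))


print(build_command(".", ["PatientName", "StudyDate"], level="image"))
```

For a long scan, writing to a file with "-o" remains the better choice, since that is what allows progress information to be shown.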
The default series-level query is as follows:
# patient-level information
PatientName
PatientID
PatientBirthDate
PatientSex
# study-level information
StudyDate
StudyTime
StudyID
AccessionNumber
StudyDescription
StudyInstanceUID
# series-level information
Modality
SeriesNumber
SeriesDescription
SeriesInstanceUID
Rows
Columns
NumberOfReferences
The "NumberOfReferences" attribute will add a field for the number of files in the series.
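As one possible use of this field, the sketch below totals the file counts of all series in each study. It assumes that the column headers match the attribute names, as in the example output earlier, and that the records were already parsed into dictionaries. The UIDs and counts are invented for illustration.

```python
from collections import Counter


def files_per_study(records):
    """Sum NumberOfReferences over the series records of each study."""
    totals = Counter()
    for record in records:
        # An empty field is counted as zero files (an assumption of this sketch).
        count = int(record["NumberOfReferences"] or 0)
        totals[record["StudyInstanceUID"]] += count
    return dict(totals)


# Hypothetical series-level records, one dict per line of the .csv file.
records = [
    {"StudyInstanceUID": "1.2.3", "NumberOfReferences": "120"},
    {"StudyInstanceUID": "1.2.3", "NumberOfReferences": "30"},
    {"StudyInstanceUID": "1.2.4", "NumberOfReferences": "64"},
]
print(files_per_study(records))
# {'1.2.3': 150, '1.2.4': 64}
```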
When using the "--image" option, the above query will be expanded to include the following fields:
# image-level information
InstanceNumber
SOPClassUID
SOPInstanceUID
ReferencedFileID
The "ReferencedFileID" attribute will add a field for the file name.
If "--directory-only" is given, then the results will be limited to what is present in the DICOMDIR file for the directory (or, in the absence of a DICOMDIR file, to the information that a DICOMDIR typically contains). This option is useful when scanning a CD, since it allows a summary to be provided by scanning just a single file on the CD. The information in the default queries listed above will be provided by most DICOMDIR files.
The "--first-nonzero" option can be used when writing one record per series. For each attribute, it causes dicomtocsv to scan the entire series for that attribute and print the first nonzero value. It has no effect on non-numeric values. If dicomtocsv is writing one record per series and this option is not used, then the record will show the values from the first file in the series.
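The selection rule can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not the tool's implementation. The function name series_value is hypothetical, and two details are assumptions: a series whose values are all zero falls back to the first file's value, and the scan stops at the first non-numeric value.

```python
def series_value(values, first_nonzero=False):
    """Pick the value reported for one attribute of one series.

    values: the attribute's value in each file, as strings, in series order.
    """
    if not values:
        return ""
    if first_nonzero:
        for value in values:
            try:
                if float(value) != 0:
                    return value
            except ValueError:
                # Non-numeric value: the option has no effect (assumed to
                # mean that the first file's value is reported).
                break
    # Default behaviour: the value from the first file in the series.
    return values[0]


print(series_value(["0", "0", "1.5", "2"], first_nonzero=True))  # 1.5
print(series_value(["0", "0", "1.5", "2"]))                      # 0
print(series_value(["HEAD", "NECK"], first_nonzero=True))        # HEAD
```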