parser repertoires - jacquesfauquex/DCKV GitHub Wiki

parser repertoires

DICOM repertoires fall into two groups: the 14 that do NOT allow the code extension technique and the 16 that do. This technique lets the main charset be complemented by one or two more, which make it possible to refine the first group (alphabetic) with a second (ideographic) and a third (phonetic).

One more repertoire is used in attributes of type UR (escaped UTF-8), although it is not catalogued in the DICOM lists.

It is also possible not to indicate any repertoire, in which case latin 1 ("ISO_IR 100") applies. Latin 1 is a superset of the DICOM default repertoire (ISO-IR 6).

In total we count 32 repertoires, a number that fits in 5 bits. When 3 repertoires are used (code extension technique), we need three such indexes, that is 3 × 5 = 15 bits. This fits perfectly within the unsigned short (16 bits) that we reserve after the value representation to indicate the repertoire of the attribute value.
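The bit arithmetic above can be sketched as follows. This is a minimal illustration, not the parser's actual code: the function names and the bit layout (first index in bits 0-4, second in bits 5-9, third in bits 10-14) are assumptions, since the page only states that three 5-bit indexes fit in 15 bits.

```python
REPERTOIRE_BITS = 5      # 32 repertoire indexes fit in 5 bits
REPERTOIRE_MASK = 0x1F   # low 5 bits


def pack_repertoires(first, second=0, third=0):
    """Pack up to three repertoire indexes into the low 15 bits of a uint16.

    Assumed layout (hypothetical): first -> bits 0-4, second -> bits 5-9,
    third -> bits 10-14. Index 0x00 means "empty".
    """
    for idx in (first, second, third):
        if not 0 <= idx <= REPERTOIRE_MASK:
            raise ValueError(f"repertoire index out of range: {idx}")
    return first | (second << REPERTOIRE_BITS) | (third << (2 * REPERTOIRE_BITS))


def unpack_repertoires(packed):
    """Return the (first, second, third) indexes stored in a packed value."""
    return (
        packed & REPERTOIRE_MASK,
        (packed >> REPERTOIRE_BITS) & REPERTOIRE_MASK,
        (packed >> (2 * REPERTOIRE_BITS)) & REPERTOIRE_MASK,
    )
```

With this layout, `pack_repertoires(0x1A, 0x1C)` (ISO 2022 IR 13 followed by ISO 2022 IR 87) gives `0x039A`, and the largest possible value, `pack_repertoires(0x1F, 0x1F, 0x1F)`, is `0x7FFF`, which leaves the top bit of the unsigned short free.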

charsets apply exclusively to attributes of vr vl (16-bit value length) LO, LT, SH, ST and of vr vll (32-bit value length) UC and UT

list of repertoires

no repertoire defined

idx	code	description
0x00		empty

Single-Byte Character Sets Without Code Extensions

http://dicom.nema.org/medical/dicom/current/output/html/part03.html#table_C.12-2

idx	code	description
0x01	ISO_IR 100	latin 1
0x02	ISO_IR 101	latin 2
0x03	ISO_IR 109	latin 3
0x04	ISO_IR 110	latin 4
0x05	ISO_IR 148	latin 5
0x06	ISO_IR 126	greek
0x07	ISO_IR 127	arabic

0x09	ISO_IR 144	cyrillic
0x0A	ISO_IR 138	hebrew
0x0B	ISO_IR 13	japanese
0x0C	ISO_IR 166	thai

Multi-Byte Character Sets Without Code Extensions

http://dicom.nema.org/medical/dicom/current/output/html/part03.html#table_C.12-5

idx	code	description
0x08	ISO_IR 192	Unicode in UTF-8

0x0D	GB18030	GB18030
0x0E	GBK	GBK
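As an illustration of how the idx values of the repertoires without code extensions might be used, the sketch below maps each one to a Python codec and decodes a raw value. The mapping and the names `PYTHON_CODEC_BY_IDX` and `decode_value` are assumptions made for this example and are not taken from the parser. In particular, `shift_jis` is only an approximation of ISO_IR 13 (JIS X 0201).

```python
# Hypothetical mapping from this page's idx values to Python codec names.
PYTHON_CODEC_BY_IDX = {
    0x00: "latin_1",    # no repertoire declared: latin 1 applies
    0x01: "latin_1",    # ISO_IR 100
    0x02: "iso8859_2",  # ISO_IR 101
    0x03: "iso8859_3",  # ISO_IR 109
    0x04: "iso8859_4",  # ISO_IR 110
    0x05: "iso8859_9",  # ISO_IR 148
    0x06: "iso8859_7",  # ISO_IR 126
    0x07: "iso8859_6",  # ISO_IR 127
    0x08: "utf_8",      # ISO_IR 192
    0x09: "iso8859_5",  # ISO_IR 144
    0x0A: "iso8859_8",  # ISO_IR 138
    0x0B: "shift_jis",  # ISO_IR 13 (approximation, see lead-in)
    0x0C: "tis_620",    # ISO_IR 166
    0x0D: "gb18030",    # GB18030
    0x0E: "gbk",        # GBK
}


def decode_value(idx, raw):
    """Decode raw attribute bytes using the codec assumed for a repertoire idx."""
    try:
        codec = PYTHON_CODEC_BY_IDX[idx]
    except KeyError:
        raise ValueError(
            f"idx 0x{idx:02X} is not a repertoire without code extension"
        ) from None
    return raw.decode(codec)
```

For example, `decode_value(0x01, b"Jos\xe9")` and `decode_value(0x08, "José".encode("utf-8"))` both return `"José"`.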

attributes with vr UR (not listed in the DICOM repertoires)

idx	code	description
0x0F	RFC3986	url-encoded utf-8

The URI/URL (UR) VR uses a subset of the Default Character Repertoire as defined in [RFC3986], and shall not use any code extension or replacement techniques. URI/URL domain name components that in their original form use characters outside the permitted character set shall use the Internationalized Domain Names for Applications encoding in accordance with IETF RFC5890 and RFC5891. Other URI/URL content that uses characters outside the permitted character set shall use the Internationalized Resource Identifiers encoding mechanism of IETF RFC 3987, representing the content string in UTF-8 and percent encoding characters as required.
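The percent-encoding step described in the quoted text can be sketched with Python's standard `urllib.parse` module. The helper names are hypothetical, and the sketch covers only the UTF-8 percent-encoding of path-like content, not the Internationalized Domain Names encoding of host names.

```python
from urllib.parse import quote, unquote


def escape_ur_component(text):
    """Percent-encode text as UTF-8, keeping unreserved characters and '/'."""
    return quote(text, safe="/", encoding="utf-8")


def unescape_ur_component(escaped):
    """Reverse escape_ur_component: turn %XX sequences back into UTF-8 text."""
    return unquote(escaped, encoding="utf-8")
```

For example, `escape_ur_component("/estudios/José")` yields `"/estudios/Jos%C3%A9"`, and `unescape_ur_component` restores the original string.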

Single-Byte Character Sets with Code Extensions

http://dicom.nema.org/medical/dicom/current/output/html/part03.html#table_C.12-3

idx	code	description
0x10	ISO 2022 IR 6	ascii
0x11	ISO 2022 IR 100	latin 1
0x12	ISO 2022 IR 101	latin 2
0x13	ISO 2022 IR 109	latin 3
0x14	ISO 2022 IR 110	latin 4
0x15	ISO 2022 IR 144	cyrillic
0x16	ISO 2022 IR 127	arabic
0x17	ISO 2022 IR 126	greek
0x18	ISO 2022 IR 138	hebrew
0x19	ISO 2022 IR 148	latin 5
0x1A	ISO 2022 IR 13	japanese
0x1B	ISO 2022 IR 166	thai

Multi-Byte Character Sets with Code Extensions

http://dicom.nema.org/medical/dicom/current/output/html/part03.html#table_C.12-4

idx	code	description
0x1C	ISO 2022 IR 87	japanese
0x1D	ISO 2022 IR 159	japanese
0x1E	ISO 2022 IR 149	korean
0x1F	ISO 2022 IR 58	simplified chinese
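One consequence of the numbering in the tables above is that the group of a repertoire can be read directly from its idx: 0x00 is empty, 0x01 to 0x0E do not allow code extensions, 0x0F is the UR repertoire, and 0x10 to 0x1F allow code extensions. A minimal sketch, with a hypothetical function name and group labels:

```python
def repertoire_group(idx):
    """Classify a repertoire idx according to the tables on this page."""
    if not 0x00 <= idx <= 0x1F:
        raise ValueError(f"repertoire index out of range: {idx}")
    if idx == 0x00:
        return "empty"
    if idx == 0x0F:
        return "UR"
    if idx >= 0x10:
        return "with code extension"
    return "without code extension"


# Count the members of each group over the 32 possible indexes.
counts = {}
for idx in range(32):
    group = repertoire_group(idx)
    counts[group] = counts.get(group, 0) + 1
```

The resulting `counts` reproduce the figures given at the top of the page: 14 repertoires without code extension and 16 with it, plus the empty and UR entries.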

bit 16

We reserve it to indicate whether the value, originally in its own charset, has been transformed into UTF-8 during parsing. Keeping the indication of the original charset makes it possible to convert back to it if necessary (for example latin 1).
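A minimal sketch of how such a flag could be handled, assuming that "bit 16" is the most significant bit of the unsigned short (mask 0x8000) and that the low 15 bits hold the repertoire indexes. The constant and function names are hypothetical.

```python
UTF8_CONVERTED_FLAG = 0x8000  # assumed position of "bit 16" (top bit of the uint16)
REPERTOIRES_MASK = 0x7FFF     # low 15 bits: repertoire indexes


def mark_converted(field):
    """Return the field with the 'transformed into UTF-8' flag set."""
    return (field | UTF8_CONVERTED_FLAG) & 0xFFFF


def was_converted(field):
    """True if the value was transformed into UTF-8 during parsing."""
    return bool(field & UTF8_CONVERTED_FLAG)


def original_repertoires(field):
    """Return the repertoire bits, which still name the original charset."""
    return field & REPERTOIRES_MASK
```

For a latin 1 value (idx 0x01) converted during parsing, `mark_converted(0x0001)` gives `0x8001`. `original_repertoires(0x8001)` still returns `0x0001`, so the value can be re-encoded to latin 1 when needed.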