linux sort - beyondnlp/nlp GitHub Wiki

man sort๋ฅผ ์‹คํ–‰์‹œํ‚ค๋ฉด ์‚ฌ์šฉ๋ฒ• ์ค‘ ์•„๋ž˜์™€ ๊ฐ™์€ ์˜ต์…˜์ด ๋‚˜์˜จ๋‹ค.

    -S, --buffer-size=SIZE
          use SIZE for main memory buffer
  • ํ•˜์ง€๋งŒ ์ •ํ™•ํžˆ ์‚ฌ์šฉ์ž๊ฐ€ ์ž…๋ ฅํ•œ ์ •๋„๋กœ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ํ• ๋‹นํ•˜์ง€ ์•Š๋Š” ๊ฒƒ ๊ฐ™๋‹ค. ์ถฉ๋ถ„ํžˆ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํฐ ์„œ๋ฒ„์ž„์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ 
  • ๊ทธ๋ž˜์„œ ์ง์ ‘ ์†Œ์Šค๋ฅผ ์—ด์–ด ํ™•์ธํ•ด ๋ณด๊ธฐ๋กœ ํ•œ๋‹ค.
  • http://ftp.gnu.org/gnu/coreutils/coreutils-7.1.tar.gz ์ฝ”๋“œ๋ฅผ ๋‹ค์šด
  • sort.cํŒŒ์ผ์—์„œ specify_sort_size()ํ•จ์ˆ˜๋ฅผ ๋ถ„์„
struct line{
  char *text;           /* Text of the line. */
  size_t length;        /* Length including final newline. */
  char *keybeg;         /* Start of first key. */
  char *keylim;         /* Limit of first key. */
};
* #define NMERGE_DEFAULT 16
* #define MIN_MERGE_BUFFER_SIZE (2 + sizeof (struct line))
* #define MIN_SORT_SIZE (nmerge * MIN_MERGE_BUFFER_SIZE)
  • ์‚ฌ์šฉ์ž๊ฐ€ ๋„ฃ์€ ๋ฌธ์ž์—ด์„ ์ˆซ์ž๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค.

    • ex> 10k = 10 * 1024 = 10240
    • enum strtol_error e = xstrtoumax (s, &suffix, 10, &n, "EgGkKmMPtTYZ");
  • ๋งŒ์•ฝ ๋ฐฑ๋ถ„์œจ๋กœ ์ฃผ์—ˆ์„ ๊ฒฝ์šฐ๋„ ์ฒ˜๋ฆฌ( 10% = ๋ฌผ๋ฆฌ์  ์‚ฌ์šฉ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์–ป์–ด์™€ * 0.1 ์˜ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ํ• ๋‹น )

    • double mem = physmem_total () * n / 100;
  • ์œ„์—์„œ ๊ณ„์‚ฐํ•œ ๋ฉ”๋ชจ๋ฆฌ์™€ MIN_SORT_SIZE์ค‘ ํฐ ๊ฐ’์„ ์‚ฌ์šฉํ•ด์„œ ํ• ๋‹น( ์ตœ์ข… ๊ฒฐ๊ณผ๋Š” sort_size ๋ณ€์ˆ˜์— ๋‹ด๊ธด๋‹ค )

  • initbuf()

    • ์•„๋ž˜ ํŒŒ๋ผ๋ฏธํ„ฐ ์ค‘ alloc๋Š” ์œ„์—์„œ ์„ค๋ช…ํ•œ sort_size์™€ ๊ฐ™์€ ๊ฐ’์ด๋‹ค.
    • ์›ํ•˜๋Š” ํฌ๊ธฐ๋งŒํผ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ํ• ๋‹นํ•  ์ˆ˜ ์—†์œผ๋ฉด ๋ฐ˜์œผ๋กœ ์ค„์—ฌ๊ฐ€๋ฉด์„œ ์‹œ๋„ํ•œ๋‹ค.
    • ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ž‘์œผ๋ฉด ์„ฑ๋Šฅ์— ๋‚˜์œ์˜ํ–ฅ์„ ์ฃผ๊ฒ ์ง€๋งŒ ์•ˆ๋˜๋Š” ๊ฒƒ๋ณด๋‹ค๋Š” ์ข‹์œผ๋‹ˆ...

static void initbuf (struct buffer *buf, size_t line_bytes, size_t alloc){
    for (;;){
    
        alloc += sizeof (struct line) - alloc % sizeof (struct line);  <- ํ• ๋‹นํ•  ๋ฉ”๋ชจ๋ฆฌ ์—ฐ์‚ฐ 
        buf->buf = malloc (alloc);                                     <- ๋ฉ”๋ชจ๋ฆฌ ํ• ๋‹น( alloc๋งŒํผ )
        if (buf->buf)                                                  <- ๋ฉ”๋ชจ๋ฆฌํ• ๋‹น์ด ์„ฑ๊ณตํ•˜๋ฉด
            break;                                                     <- ๋ฃจํ”„ ํƒˆ์ถœ
        alloc /= 2;                                                    <- ํ• ๋‹นํ•œ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋ฐ˜์œผ๋กœ ์ค„์—ฌ๋ณธ๋‹ค.
        if (alloc <= line_bytes + 1)                                   <- line ๊ตฌ์กฐ์ฒด์˜ ํฌ๊ธฐ๋ณด๋‹ค ํ• ๋‹นํ•  ์–‘์ด ์ž‘๋‹ค๋ฉด ์ข…๋ฃŒ(์˜๋ฏธ์—†๋‹ค)
            xalloc_die ();
    }

    buf->line_bytes = line_bytes;
    buf->alloc = alloc;
    buf->used = buf->left = buf->nlines = 0;
    buf->eof = false;
}
This option can improve the performance of sort by causing it to start with a larger or smaller sort buffer than the default.
However, this option affects only the initial buffer size. 
The buffer grows beyond size if sort encounters input lines larger than size.
  • -S์˜ต์…˜์€ ์ดˆ๊ธฐ์— ์„ค์ •ํ•˜๋Š” ๋ฒ„ํผ ํฌ๊ธฐ์—๋งŒ ์˜ํ–ฅ์„ ์ค€๋‹ค.

๋‚˜๋ผ๋ฉด ์ด๋ ‡๊ฒŒ ํ•˜๊ฒ ๋‹ค. (ํ–ˆ๋‹ค)

  • ์•„๋ž˜ ๋‚ด์šฉ์€ sort.cํŒŒ์ผ์— ์žˆ๋Š” ์ฃผ์„์ค‘ ์ผ๋ถ€๋ฅผ ํ•ด์„ํ•œ ๋‚ด์šฉ์ด๋‹ค.
  * ์›ํ•˜๋Š” ํฌ๊ธฐ๋งŒํผ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ํ• ๋‹นํ•  ์ˆ˜ ์—†์œผ๋ฉด ๋ฐ˜์œผ๋กœ ์ค„์—ฌ๊ฐ€๋ฉด์„œ ์‹œ๋„ํ•œ๋‹ค. 
  * ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ž‘์œผ๋ฉด ์„ฑ๋Šฅ์— ๋‚˜์œ ์˜ํ–ฅ์„ ์ฃผ๊ฒ ์ง€๋งŒ ์•ˆ๋˜๋Š” ๊ฒƒ๋ณด๋‹ค๋Š” ์ข‹์œผ๋‹ˆ...
  • ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ถฉ๋ถ„ํžˆ ์‚ฌ์šฉํ•ด์„œ ์ •๋ ฌํ•˜์ง€ ๋ชปํ• ๊ฑฐ๋ฉด ์•ˆํ•˜๋Š”๊ฒŒ ๋‚ซ๋‹ค.
  • ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ถฉ๋ถ„ํžˆ ํฌ๊ฒŒ ์žก์•„ mmap์œผ๋กœ ์ฝ์€ ํ›„ ๋ผ์ธ๋งˆ๋‹ค ํฌ์ธํ„ฐ๋ฅผ ๋‹ฌ์•„ ์ด ํฌ์ธํ„ฐ๋ฅผ ์ •๋ ฌํ•˜๋Š” ๊ฒƒ์œผ๋กœ ํ•ด๊ฒฐํ–ˆ๋‹ค.
  • ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ถฉ๋ถ„ํžˆ ํฐ ๊ฒฝ์šฐ sort๋ณด๋‹ค ์ตœ๋Œ€ ๋ช‡๋ฐฐ ๋น ๋ฅธ ์†๋„๋ฅผ ๋ณด์ธ๋‹ค.
  • ์‹คํŒจํ•˜๋”๋ผ๋„ ํŒŒ์ผ์„ splitํ•˜์—ฌ ๊ฐ๊ฐ ๋‚˜๋ˆ„๊ณ  ๋‹ค์‹œ merge sortingํ•˜๋Š” ๊ฒƒ์ด ๊ฒฝ์ œ์ ์ด๋‹ค.