c_basics - RicoJia/notes GitHub Wiki

========================================================================

Comp Sci Basics

========================================================================

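  1. Endianness

    • Origin: Gulliver's Travels. The question is whether the most significant byte of a multi-byte value is stored at the lowest address (big endian) or the least significant byte is (little endian):
    the bytes 1, 2, 3 of a value are stored as 1-2-3 (big endian) or 3-2-1 (little endian).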
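A minimal sketch of how byte order can be checked and reversed at run time. The function names `is_little_endian` and `swap_u32` are illustrative, not from any library.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one. */
int is_little_endian(void) {
    uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* byte stored at the lowest address */
    return first_byte == 1;
}

/* Reverses the byte order of a 32-bit value (big endian <-> little endian). */
uint32_t swap_u32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Swapping twice gives back the original value, which is a quick sanity check for this kind of helper.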
  2. Style

    • contains: Author, Purpose, Usage, References,
    • You spend more than half of your time maintaining your code, so be clear
    • OOP, structured programming, goto-less programming.... These are "cults"
      • goto is useful for breaking out of nested for loops, without using too many flags
    • avoid side effects, use ++ or -- on separate lines
    • the law of least astonishment: no surprise = the best
    • surround multi-statement macros with {}
        #define DIE(msg) { printf("%s\n", (msg)); }
    • use #ifdef, #undef at the top of the program
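As a sketch of the goto point above: leaving two nested loops at once without extra flag variables. `first_negative` is a hypothetical helper written for this illustration.

```c
#include <stddef.h>

/* Finds the first negative value in a rows x cols grid stored row by row.
   Returns its flat index, or -1 if there is none. */
int first_negative(const int *grid, size_t rows, size_t cols) {
    int found = -1;
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            if (grid[r * cols + c] < 0) {
                found = (int)(r * cols + c);
                goto done;            /* leaves both loops at once */
            }
        }
    }
done:
    return found;
}
```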
  3. A program can launch multiple processes

    • In Unix systems, each program has an initial process
    • Then you can launch "clone" of the original process
  4. GCC (the GNU C compiler) comes from the Free Software Foundation, which distributes it freely; C itself was created at Bell Labs

  5. software has a life cycle, it dies, and gets replaced by a younger product

    1. write a project description, then show it to your boss and colleague (like a google doc)
    2. build a minimal prototype
    3. Understanding someone else's work is the hard part.
      • Initial meeting: which files, what they do on a high level, how to build and launch, how to test (if there's a recommendation)
      • cross reference, indenters are great tools.
    4. taking notes is very important.
    5. need to learn some debugger
    6. When you work on new functions, start from the new functions, understand how it works, then work your way up
    7. actually starts processor commands, can be used to edit something other than c.
  6. Floating point

    • floating-point addition has to shift the operands to align their exponents; floating-point multiplication does not. So the two take about the same amount of time
    • storage and calculations may have different precision.
    • K&R C required floating-point arithmetic to be done in double; since C89 the compiler may compute in float
    • sine values might have bad precision at some angles.
  7. Files

    • printer, terminals are files too.
    • each file is treated as a series of bytes
  8. expose: add an interface to manipulate objects.

  9. optimization might be expensive: and can bring only 10-20% speed up.

    • register: for fast access. Unit has 12 registers.
        register int i;     
    • shift instead of multiply
        a = a << 2;     // same as a * 4
    • for nested loops, have the simple function in the inner loop
    • the ultimate optimizations: floating-point accelerator -> high speed RISC computer -> macros -> assembly directly.
    • instead of having ascii files, have binary files
      • Ascii is like web pages. binary files: images
  10. Compiler

    • compiler first has a lexical analyzer, spitting out tokens and operators for words.
  11. The OS will free all memory used by a program when it exits, even if there's a memory leak

  12. To see files in hex, use xxd, or hexedit

Development Philosophy:

Before you start

  1. ALWAYS KNOW WHAT THE REQUIREMENTS ARE

    • ask boss what the final form is.
    • Ask him what other potential features they'd like to add, to reduce the amount of "oh could you please add this? could you please change this" etc.
    • Know what matters to development, and what doesn't matter
  2. Design a minimalist development track

    • Know how to benchmark it?
    • Make your input & output exactly like the benchmarks, very easy to benchmark
      • like the histogram count cuda project, libjpeg doesn't yield the same result as np.histogram(), and we can just output the same array as libjpeg.
  3. Always fast prototype for something small:

    1. Try using gcc, nvcc for compilation (instead of cmake)
    2. Use existing library to avoid complex libs (may install them)
  4. Compilation:

    1. Compilation: process
    2. setbuf(stdout, NULL): no buffering printing
    3. pause() puts a thread to sleep until signal arrives, sleep(1) will sleep for a while
    4. Signals: good read
      • SIGUSR1 doesn't have a pre-assigned keybinding to it, so can be sent by another user process
      • example

Development

  1. Experience: for large project, it's important to have a struct with all params!

  2. Copying and pasting can be detrimental: if something that should have been changed is left unchanged, it can take long to debug. E.g. a copied cmake copy command whose target has already been copied will yield another empty copy, which is strange to debug.

DEBUGGING

  1. reproduce the bug + locate it (90% of the time)

  2. talking to other ppl will help you find the bug

========================================================================

Basic Functions

========================================================================

IO

  1. printf

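    • %g prints a floating-point number in either %f or %e (scientific) form, depending on its exponent
    • the order in which printf's args are evaluated is unspecified, so never modify a variable that another arg reads
      int i = 1; 
      printf("%d, %d", i, i++);       // undefined behavior; some compilers happen to print 2,1
    • printf does not check that args are supplied; reading a missing arg is undefined behavior
      printf("%d, %d");       // typically see garbage values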
    • printf("%p", ptr), print address stored in a pointer
  2. argc (num of args), argv (args themselves, the first arg is the name of the program);

    while ((argc > 1) && (argv[1][0] == '-')){
      switch(argv[1][1]){
        case 'v': 
          //DO_STUFF
          break; 
        case '0': 
          //do stuff
          break; 
        default: 
            break; 
      }
      --argc; 
      ++argv;     // move onto the next arg
    }
  3. fgets + sscanf

    • read a line into an array with fgets, then parse the array using sscanf
      char arr[15]; 
      fgets(arr, 15, stdin);    //stdin is the keyboard 
      int count, total; 
      sscanf(arr, "%d %d", &count, &total);     //sscanf just parses the arr
      • don't use scanf cuz it's fussy about '\n'
      • Initialize char arr:
        #include <string.h>
        char arr[] = "123";     //method 1
        char arr2[12]; 
        strcpy(arr2, "lol");    //method 2: copy into an existing buffer
      • fgets keeps the '\n' at the end of the line, so STRIP OFF \n if you need to
      • sscanf has a sister function: fscanf, just scans thru files.
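A small sketch of the fgets + sscanf pattern, written as two helpers so the parsing can be checked on its own. The names `strip_newline` and `parse_count_total` are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Removes the trailing '\n' that fgets keeps, if there is one. */
void strip_newline(char *line) {
    line[strcspn(line, "\n")] = '\0';
}

/* Parses "count total" from a line; returns 1 on success, 0 otherwise. */
int parse_count_total(const char *line, int *count, int *total) {
    return sscanf(line, "%d %d", count, total) == 2;
}
```

In a real program the line would come from `fgets(buf, sizeof buf, stdin)`; checking the return value of sscanf is what tells you whether both numbers were actually read.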
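  4. '\n' vs '\r\n'

    • UNIX uses '\n' (called <line feed>), whereas MS-DOS uses '\r\n' (<carriage return><line feed>), and classic Mac OS uses '\r' (<carriage return>)
    • Before computers, there was the teletypewriter. It took two units of time to move to a new line, where 1 unit of time is what typing a char takes, so two characters were sent
    • With computers the extra character was no longer needed, so UNIX chose to keep just one <line feed>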
  5. logging

    • fprintf usage, note you DON'T NEED to concatenate strings yourself: adjacent string literals are joined by the compiler!
       FILE * pFile;
       pFile = fopen ("myfile.txt","w");
       fprintf (pFile, "Name %d [%-10.10s]\n",666,"lol");   // Name 666 [lol       ]
       fprintf (stdout, "Name %d [%s]""ll""asdfa",666,"lol");    //Name 666 [lol]llasdfa
       fclose (pFile);
    • a simple macro for logging: note the `, ##arg` form is a GNU extension that drops the comma when no extra args are passed
      #define CYN   "\x1B[96m"
      #define LOG(fmt, arg...) fprintf(stderr, CYN "haha "fmt,##arg);        
      LOG("status : %d", 123);
    • Other ways for variadic
    • macro:
      #define BYTE_TO_BINARY(byte) ((byte & 0x02) ? '1' : '0'), ((byte & 0x01) ? '1' : '0')
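One other way to write a variadic logging macro is the standard C99 `__VA_ARGS__` form. The sketch below is an assumption about what that alternative could look like; `FORMAT_LOG` is a hypothetical name, and it writes into a buffer so the result can be inspected.

```c
#include <stdio.h>

#define CYN "\x1B[96m"

/* C99 form: the format string is part of __VA_ARGS__, so the macros also
   work when no extra arguments are passed. */
#define FORMAT_LOG(buf, size, ...) snprintf((buf), (size), "haha " __VA_ARGS__)
#define LOG(...) fprintf(stderr, CYN "haha " __VA_ARGS__)
```

Because the prefix is glued on by string-literal concatenation, the first argument must be a string literal, same as in the GNU-style macro above.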
  6. File IO

    1. fopen() flags:
      • w: create the file if it's not there; will overwrite an existing one
      • b: open the file as a binary file

  7. format a string using sprintf / snprintf:

    #include <stdio.h>
    int main(){
      // snprintf for string formatting. Restriction: preallocate arr!
      char arr[10]; 
      float num = 1.2345f; 
      snprintf(arr, sizeof arr, "he:%06.3f", num);   // never writes past the end of arr
      printf("%s", arr); 
    }

========================================================================

Pointers

========================================================================

  1. Misc
    • function pointers are often used as callbacks
    • sizeof
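      • sizeof(ptr) gives the size in bytes of the pointer itself; sizeof(*ptr) gives the size of the type it points to
      • pointer arithmetic
        int *p = arr + 2; //points to the third element, arr[2]
        p - arr		//returns 2, the number of elements between them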
  2. Null pointer (C and C++)
    • (int*)0 is a constant expression whose value is a null pointer
    • 0 or (void*)0 in C are null pointer constant.
      • A null pointer constant can be assigned to a pointer of any type, but a constant expression that already is a typed null pointer can only be assigned to compatible pointer types.
    • NULL is a macro
      long *a = 0;           // ok, 0 is a null pointer constant
      long *b = (long *)0;   // ok, (long *)0 is a null pointer with appropriate type
      long *c = (void *)0;   // ok in C, invalid conversion in C++
      long *d = (int *)0;    // invalid in C++, and a constraint violation in C (compilers at least warn): long* and int* are different types!!
  3. Pointer Casting:
char i = 1; 
int *p = (int *)&i;   // an explicit cast is required
printf("%d", *p);     // We're forcing the compiler to read i as an int! This reads bytes past i, so it's undefined behavior
  1. Void Pointers

    • Motivation: void pointer can hold address of any type of pointer. However, they must be used after type-casting
      int a = 1; 
      void* ptr = &a; 
      printf("this is an integer: %d", *ptr);  // this doesn't compile
      printf("this is an integer: %d", *((int* )ptr));  // this does work
      double b = 1.0; 
      ptr = &b; 
      printf("this is a double: %f", *((double* )ptr)); 
    • Malloc, Calloc
      #include <stdlib.h>
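      int* x = malloc(sizeof(int)*5); //array of 5 elements; fine in C, but won't compile in C++, since malloc returns void*
      int* y = (int*) malloc(sizeof(int)*5); //the cast makes it compile in C++ too
    • good practice for malloc:
      b.ptr_ = NULL;
      b.ptr_ = malloc(sz);
      if(!b.ptr_) { 
        /* allocation failed: handle the error before using ptr_ */
      }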
    • Limitations
      • pointer arithmetic is not possible because we don't know the size of what it points to.
      • Cannot be dereferenced
    • Uses
      • You can't dereference void* or access members through it directly; cast it back to the real pointer type first
      typedef struct student {
      int id; 
      }student;
      student s = {1};
      student* student_1 = &s;
        
      void* data = student_1;  
      ((student*)data) -> id;   // plain data -> id does not compile
  2. Function pointers

    • Basics
      void (*fun_ptr)(int) = &FUNC; //we need () for *fun_ptr, else it will become a function
      void (*fun_ptr)(int) = FUNC;  //also works
      void (*fun_ptr[])(int) = {func_1, func2};
    • No need to allocate/deallocate function pointer.
    • void* is used for generic programming in C. A library can publish a callback signature that takes void*, so upper level applications plug in their own functions and code redundancy is avoided. This is called "responsibility delegation"
      • The key is that library functions take in void* and return void*, and the application casts to its own concrete types
    //app.c
    #include <stdlib.h>   //qsort
    int compare(const void* a, const void* b){
        return (*(int*) a - *(int*) b);   //of course you need type casting
    }
    int main(){
        int arr[] = {1,2,3};
        int n = sizeof(arr)/sizeof(arr[0]); 
        qsort(arr, n, sizeof(int), compare);  //generic programming
    }
  3. Double Pointers

    int i = 1;
    int * const p = &i; 
    int * const *pp = &p;
    • int* const* int_double_ptr
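      • const here applies to the inner pointer: the address it stores cannot be changed through the double pointer (*int_double_ptr = ... is an error), but the int it points to can still be modified.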
  4. Cautions

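    1. making a local variable with the same name as a global one might cause a segfault: the local shadows the global inside its own initializer
      Node* node = node->next();   //node on the right is the new, uninitialized local, not the global variable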

========================================================================

Keywords

========================================================================

  • Static

    • static has three meanings:
      • static functions can only be used in the current translation unit
        • their symbols are not exported from the .o file
      • in a function, a static variable is allocated from static memory and keeps its value between calls
        • it is private to the function, but shared by all threads.
      • a static global variable is accessible only within that translation unit
        • So two files can each have a static global with the same name without clashing; without static you'd have to change names.
      • A single translation unit is a preprocessed source file with all its included headers, i.e. what gets compiled into one object file.
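A short sketch of the first two meanings of static, using made-up names (`square`, `next_id`, `sum_of_squares`):

```c
/* static function: callable only from this translation unit */
static int square(int x) { return x * x; }

/* static local: initialized once, keeps its value between calls */
int next_id(void) {
    static int counter = 0;
    return ++counter;
}

/* non-static function: visible to other translation units */
int sum_of_squares(int a, int b) { return square(a) + square(b); }
```

Calling `next_id()` repeatedly yields 1, 2, 3, and so on, because `counter` lives for the whole program rather than for one call.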
  • volatile: the value can change at any time without any of your code changing it (e.g. a hardware register, or a variable written by a signal handler).

    • So the compiler must not optimize accesses to it away.
      • Normally the compiler may copy the value from memory to a register, and keep accessing the register. Volatile prevents this.
      • e.g. if two values are stored in a row, the compiler must perform both stores, instead of just one.
        i = 2; 
        i = i; // must load and store again
  • define

    • #define vs const
      1. const is usually better than #define (it has a type and a scope), but old code uses a lot of #define
      2. const can work with any type, like struct. A #define is plain text substitution and has no type
      3. Except #define is essential for conditional compilation.
    • Conditional Compilation: debug using #define
      #define DEBUG
      #ifdef DEBUG
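        printf("debug info\n"); 
      #endif
      • or define DEBUG from the compiler command line, e.g. gcc -DDEBUG (in CMake: add_compile_definitions(DEBUG))
    • Comments with /* */ don't nest: the first */ ends the comment, even inside another /* */
      • So a better alternative for disabling code is
        #if 0
        #endif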
        
    • Force remove a #define with #undef, so later #ifdef DEBUG checks will not be effective
      #undef DEBUG
    • [design pattern]: create const.h to store all consts.
      #ifndef __CONST_H__   // checks if const.h has been included somewhere
      #define __CONST_H__   // mark it as included
        #define ...
      #endif
    • technically you can do
      #define count int
      count flag;
  • Auto (TODO)

  • pragma

    • What is it? TODO
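      • #pragma is for compiler-dependent commands; a compiler ignores (or warns about) pragmas it doesn't know.
      • #pragma startup FUNC1: specifies a function FUNC1 that needs to run before program start up (before control is passed to main). Only some compilers support it.
      • #pragma exit FUNC2: specifies a function FUNC2 that runs just before the program exits (after control has returned from main). Only some compilers support it.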
    • visibility
      • use it to declare functions only within the scope of a shared library
        #pragma GCC visibility push (hidden)
        some_code_hidden from the callers
        #pragma GCC visibility pop
      • So the dynamic symbol table will become smaller. It's an optimization
      • Don't use it with exceptions
      • Don't put #include in it
      • http://hirntier.blogspot.com/2008/09/gcc-visibility.html

Operators

  • comma:
    c = (a,b); //equivalent to a; c = b;
    ptr = &(a, b)   //a; ptr = &b;  (C++ only: in C the result of a comma expression is not an lvalue)
  • assignment =
    • returns the lhs value after evaluation
      (a = b); //return a; 
      
      // Expert Example 
      while ('\n' != (*p++ = *q++)); 
      // equivalent to 
      char c;
      do {
        c = *q; 
        *p = c;       // the value of the assignment is the char just copied
        ++p; 
        ++q; 
      } while ('\n' != c);
    • You can assign to a variable inside the while condition:
    while (node = (node->next_).get()){}
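The copy idiom above can be wrapped into a small function to see the value of an assignment in action. `copy_line` is a hypothetical helper; it assumes the source contains a '\n' and the destination is large enough.

```c
/* Copies characters from src to dst up to and including the first '\n',
   then terminates dst. Returns the number of characters copied. */
int copy_line(char *dst, const char *src) {
    char *p = dst;
    const char *q = src;
    while ('\n' != (*p++ = *q++))
        ;                      /* the assignment's value is the char just copied */
    *p = '\0';
    return (int)(p - dst);
}
```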

========================================================================

Data types

========================================================================

char

  • Basics
    • a char array used as a string needs to end with '\0'
    • those created from string literals will automatically have '\0'
  • Operations
    1. print string:
    char string[] = "1 233"; 
    printf("%s", string);
    • undefined behavior (you may see 0.0) if you do
      char str[2] = "hh";       // no room left for the '\0' either
      printf("%.1f", str);      // %f expects a double, not a char*
    2. string functions:
    • strcpy: have to use this to copy a string literal over.
      char arr[15];
      strcpy(arr, "lolol");
    • strcmp(char*, char*)
      • <string.h>
      • return 0 if 2 strings are identical, else, return > 0 or < 0 depending on the first chars that differ
    • strncmp(char*, char*, size_t)
      • pretty much the same as strcmp, but we just look at a portion of the string.
      • strncmp caution: returns 0 if the first n chars of the two strings are the same
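A brief sketch of how the zero-means-equal convention of strcmp and strncmp is usually wrapped. The helper names `has_prefix` and `same_string` are illustrative only.

```c
#include <string.h>

/* Returns 1 if s starts with prefix, 0 otherwise. */
int has_prefix(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Returns 1 if the two strings are identical, 0 otherwise. */
int same_string(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

Writing `if (strcmp(a, b))` reads as "if equal" but means the opposite, which is why the explicit `== 0` is worth keeping.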

array

  1. array initialization

    • Designated Initialization: C99 and later; array designators like this are not part of standard C++
      int arr[2] = {[0]=1, [1]=2};
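  2. 2D array initialization

    const char* str_arr[] = {"lol", "hehe"};   // an array of pointers to string literals
    • or, you can write it this way:
      int aaa[3][3] = {[0] = {6,2,3}, [1]={2,3,4}};
      This is designated initialization applied per row; the row that isn't listed ([2]) is zero-initialized.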
  3. a string literal is allocated in static (usually read-only) memory, not on the stack,

    • char [] has the copy of the whole string literal as a char array (on the stack, if it's a local variable)
      • char arr[] = "SOMETHING" is modifiable, because it's a copy
  4. Change a string: char* str="something" stores the address of the first char of the string literal (so read-only) in char*.

    • so you can't directly modify the string literal str points to.
      • so *str = 'a'; is undefined behavior and usually crashes, since it points to read-only memory.
    • But you can make str point to a different string literal.
    • The solution:
      //void getStr(char* str_copy); //gets a copy of the pointer, so assigning to it would not change the caller's str!
      #include <stdio.h>
      #include <stdlib.h>
      void getStr(char** str){
            *str = "GFG"; 
      }
        
      int main()
      {
        char* str = malloc(5);  
        str = "lol";  // This is a string literal, but you have a memory leak here, as str now points to something completely different. 
        getStr(&str); 
        printf("%s", str); 
      }
    • You will succeed with
      char str[] = "something"; 
      str[1] = 'a';		// fine: str is a writable copy of the literal
    • The best way to fill a string buffer is to use strcpy, as long as the buffer is large enough
      char buf[32];
      strcpy(buf, "this is a test"); 
  5. Returning a char* that points to a string literal is safe, but returning a local char[] is not, since it's a copy of the string on the stack that dies when the function returns

    char* getStr(){
        char* str = "GFG"; 
        return str;		//Safe
      }
    char* getStr2(){
        char str[] = "GFG"; 
        return str;		//not safe.
      }

int

  • printf("%u") is to print unsigned int
    • For int types,
      printf("%8u", i);    //if i is shorter than 8 chars, the front will be padded with blanks; else, it won't be truncated
      printf("%08u", i);    // pad with leading zeros if i is shorter than 8 chars
    • For float types,
      printf("%4.2f", 32.145);    //get 32.15
      printf("%4.10f", 32.145);    //10 digits after decimal point will be printed no matter what and no spacing at front. 0 will be padded. Nothing will be truncated, get 32.145000..0
    • combine two bytes into one
      uint16_t value = (highByte << 8) | lowByte; 
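    • int is 32 bits on modern Windows and POSIX systems; the standard only guarantees at least 16 bits, and some 16-bit platforms (old DOS, small microcontrollers) really use 16.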
  • reinterpreting the bits of a float as an unsigned int (this is not a numeric conversion):
    union {
      float f;
      uint32_t u;
    } f2u;
    f2u.f = 1.1f;
    uint32_t out = f2u.u; 
    • Reason is: copying the bits of a non-negative float to a uint is "order-preserving": a larger float gives a larger uint.
    • if you work on x86, ARM, you need to swap the bits as well
    • floats are stored as sign bit and magnitude, int is two's complement. So for negative floats you need to flip the non-sign bits.
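As an alternative to the union, the same bit reinterpretation can be sketched with memcpy, which is also valid in C++. This assumes float is a 32-bit IEEE 754 value; `float_bits` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Returns the raw bits of a float. Assumes float is 32-bit IEEE 754. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copy the bytes, no numeric conversion */
    return u;
}
```

For non-negative inputs the results compare in the same order as the floats themselves, which is the order-preserving property mentioned above.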
  • converting float to string:
    #include <stdio.h>
    int main()
    {
       float f = 1.123456789f;
       char c[50]; //buffer for the formatted number
        snprintf(c, sizeof c, "%g", f);
        printf("%s", c);   //never pass a non-literal as the format string
        printf("\n");
    }

bool

  • _Bool On = 0; // plain 0: the name false only exists after including <stdbool.h>
    • _Bool is part of C99.
    • bool, true, false are defined in <stdbool.h>, so may need to explicitly include it.

Union

  • Basics
    • can be thought of as a "multi-purpose variable". A union can have multiple members, but only one member can have a value at the same time.
      • The other ones don't really have values (garbage value),
      • But vars of the same type will be altered too
    • These variables have the same memory location.
    • The size of the variable is the largest member.
    • E.g,
       union Test{
       		int x,y;	//the size is 4 bytes, since int is the largest type here. 
       		char z; 
       	}; 
       int main(){
       	union Test t; 
       	t.x = 4;		//This will make y 4 as well; z now overlaps one byte of x
       	t.z = '3';	//overwrites one byte of x and y, z='3'. 
       	t.y = 100;	//now x will be 100 as well. 
       	union Test t2  = {2};			//an initializer list only initializes the first member, here to 2. 
       	}
    • Uses:
      • Use union when you just need one var at a time.
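Since only one member is valid at a time, a common companion to a union is a tag that records which one. A minimal sketch, with all names (`Kind`, `Value`, `make_int`, `make_char`, `value_as_int`) invented for the example:

```c
typedef enum { KIND_INT, KIND_CHAR } Kind;

typedef struct {
    Kind kind;            /* records which union member is currently valid */
    union {
        int  i;
        char c;
    } as;
} Value;

Value make_int(int i)   { Value v; v.kind = KIND_INT;  v.as.i = i; return v; }
Value make_char(char c) { Value v; v.kind = KIND_CHAR; v.as.c = c; return v; }

/* Reads the value back as an int, looking only at the valid member. */
int value_as_int(Value v) {
    return v.kind == KIND_INT ? v.as.i : (int)v.as.c;
}
```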

Struct

  • Creation

    1. basic structure
      #include <stdio.h>
      
      // 1. use typedef, - you're just adding a name here, not to replace the original one!
      typedef struct Foo{
        int i; 
        double j;
      } Foo;
      
      // 2. use struct
      struct Bar{
          int i; 
      };
      
      int main()
      {
          // 1. use typedef, and designated initialization
          Foo f = {.i=1,.j=2};
          // 2. use struct, and list initialization
          struct Bar b = {3};
      
          // designated initialization
           int arr2[6] = {[4] = 29, [2] = 15 };
          // standard initialization
          int a[6] = { 0, 0, 15, 0, 29, 0 };
      
          printf("%f", f.j);
          return 0;
      }
    2. caution about typedef:
      typedef struct node{
        struct node* prev_; // You need "struct node" here: the typedef name Foo_node is not declared until the end of the definition.
      }Foo_node;
      Foo_node fn; 
      struct node fn2;    // also valid

    1. Anonymous structure: no structure name, so there's only one struct object
      struct {

      } VAR_NAME;

    2. designated initialization

      • Applied on aggregates (arrays, unions, structs)
      • e.g.
        typedef struct Foo_{
          int a;
          int b;
        }Foo;
        Foo f = {.a = 1, .b = 100};
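  • bit fields: the code generated to extract bit fields is larger and slower than plain member access, so don't use this unless storage is a problem

    typedef struct{
      unsigned int i : 1; 
      unsigned int parity: 1; 
      unsigned int error: 1;     //1 bit each, packed into the same storage unit; unsigned, since a signed 1-bit field can only hold 0 and -1
    }hee; 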
  • Cautions

    • the typedef name and the struct tag live in separate namespaces, so they can be the same or different; giving them different names is an old convention.
    • _t means type, _s means structure

========================================================================

Memory Management

========================================================================

  • Malloc vs Calloc

    • Difference:
      • Malloc only allocates memory block, no initialization. Calloc allocates memory block with initialization
      • Interface is different
            int* arr1 = (int*) malloc(5 * sizeof(int));     //faster than calloc when you don't need zeroed memory
            memset(arr1, 0, 5 * sizeof(int));      //initialize using memset; the size is in bytes, not elements
            int* arr2 = (int*) calloc(5, sizeof(int));      // interface is different, also, it initializes 0 to it
            free(arr1); 
            free(arr2); 
    • Similarities:
      • Both return void* (NULL on failure); in C you don't need to cast the result explicitly, in C++ you do.
      • Need free()
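The two routes to zeroed memory can be sketched side by side. The function names `zeroed_with_malloc` and `zeroed_with_calloc` are made up for this example; the caller is responsible for calling free().

```c
#include <stdlib.h>
#include <string.h>

/* malloc + memset: returns n zeroed ints, or NULL on failure. */
int *zeroed_with_malloc(size_t n) {
    int *arr = malloc(n * sizeof *arr);   /* contents are indeterminate here */
    if (arr != NULL)
        memset(arr, 0, n * sizeof *arr);  /* size is in bytes, not elements */
    return arr;
}

/* calloc: takes element count and element size, zero-initializes. */
int *zeroed_with_calloc(size_t n) {
    return calloc(n, sizeof(int));
}
```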
  • Objects created by malloc are in consecutive memory

     typedef struct Obj{
     	char name[80];
     	unsigned int salary;
     	}Obj; 
  • Free

    • used on malloc, calloc.
    • Good practice is to use it in the same function as malloc, calloc. In C++, RAII is a great standard to follow
    • You can't tell whether a pointer has been freed, so ppl usually set ptr=NULL right after free
  • memory manipulation

    #include <string.h>
    void* memset(void* dest, int c, size_t n);
    void* memcpy(void* to, const void* from, size_t size); 
    • memset
    • sets memory byte by byte to a value. So only values whose bytes are all the same work for multi-byte types: two bytes of -1 (0xFF) make one uint16_t 0xFFFF, and 0 works too
      char str[20];
      memset(str, -1, sizeof(str));
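A tiny sketch that makes the byte-by-byte behaviour visible on a uint16_t. `fill_u16` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Fills one uint16_t by setting each of its two bytes to byte_value. */
uint16_t fill_u16(int byte_value) {
    uint16_t v;
    memset(&v, byte_value, sizeof v);
    return v;
}
```

With -1 every byte becomes 0xFF and the result is 0xFFFF, but with 1 the result is 0x0101 rather than 1, which is the usual surprise when memset is used on non-char arrays.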

========================================================================

Functions

========================================================================

Weak Functions

  • Basics:
    • It allows a user-defined function with the same name to replace the default one, similar to overriding.
    • You define your function without __weak and your function will always be taken
    // E.g 1
    void USART1_IRQHandler (void) __attribute__ ((weak, alias("Default_Handler")));
    void Default_Handler(void) { while(1); }
    void USART1_IRQHandler(){ ...  }
    // E.g 2
    __weak void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
    {
      /* NOTE: This function Should not be modified, when the callback is needed,
           the HAL_UART_RxCpltCallback could be implemented in the user  file
      */
    }
    void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart){}

Macros

  1. Basics
  • The preprocessor language is not c, so don't use = or ;!
  • single line
  • multi line: every line but the last must end with \, and nothing (not even a comment) may follow the \
    #define Bar(x) {  \
      x + 1;          \
      x + 1;          \
    }
    // need {} to enclose the macro! This is simply text substitution
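A common refinement of the brace-wrapped macro is the do { ... } while (0) form, which behaves like a single statement even after an if without braces. The macros and the `sort_two` function below are illustrative names, not from the notes.

```c
/* Multi-statement macro: every line but the last ends with a backslash,
   and do { ... } while (0) makes the expansion act like one statement. */
#define SWAP_INT(a, b)  \
    do {                \
        int tmp_ = (a); \
        (a) = (b);      \
        (b) = tmp_;     \
    } while (0)

/* Expression macro: parenthesize the arguments and the whole body. */
#define SQUARE(x) ((x) * (x))

/* Example use inside a function. */
void sort_two(int *lo, int *hi) {
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);   /* safe without braces thanks to do/while(0) */
}
```

Without the parentheses, `SQUARE(1 + 2)` would expand to `1 + 2 * 1 + 2`, since the macro is pure text substitution.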

========================================================================

Time

========================================================================

  1. print time in timestamp (uint64_t)
    #include <time.h>
    (uint64_t)time(NULL);       // time in seconds elapsed from 1-1-1970

========================================================================

Low Level firmware

========================================================================

  1. How malloc works

    • You may have around 6GB physical RAM
    • If you just do malloc and not using it, OS will allow you to ask for as much as possible
    • But if you use it, memset, then you can only use around 6GB, (or a bit more as virtual memory can use some disk space)
  2. Memory Mapped IO

    • On uC, there're two ways to interface with registers:
      1. Port mapped (use special CPU instructions)
      2. Memory Mapped
      • More convenient
      • each device register is assigned an address in the memory address space
    • Illustration
      1. Memory Map of STM32
  3. Memory Map of Linux

    • Heap starts near the bottom (above the program's data) and extends up to the program break
    • When you do malloc, top of heap may go up.
    • When you do free, top of heap may go down (the allocator often keeps the memory for reuse)
    • Two functions that change the program break:
      • not recommended for personal use, cuz they're for implementing malloc
        #include <unistd.h>
        int brk(void *addr);   // set the program break to a location
        void *sbrk(intptr_t increment);    // specify by how much you want program break to move up, returns the address of the previous break.
    • sbrk may effectively move by 4096 bytes, cuz virtual memory uses paging, and the size of one page is typically 4096

      Linux Memory Map
  4. mmap

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>
    int main()
    {
        #define PAGESIZE 4096
        unsigned int* first = mmap((void*)0xFEEDBEEF, PAGESIZE, PROT_READ | PROT_WRITE, 
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);    
        //the first arg is only a hint; change 0xFEEDBEEF to 0 or NULL to say "I don't care where you put this memory"
        // PAGESIZE is how much memory you're asking for
        // we're allowing read and write, PROT_READ | PROT_WRITE
        // We're not sharing, but private to the process, and not backed by a file: MAP_PRIVATE | MAP_ANONYMOUS
        // The block has to start on a page boundary, so the kernel picks a page-aligned
        // address near the one we requested. 
        if (first == MAP_FAILED) { perror("mmap"); return 1; }
        
        printf("Address %p\n", (void*)first);
    }
  5. open a file

    fd_ = open(loc_.c_str(), O_RDWR | O_CREAT, (mode_t)0600);   // needs <fcntl.h>; loc_ is a std::string here
    • If fails, make sure you have the directory created already

Signal

  1. ctrl-c sends the SIGINT signal to the foreground process in Unix
  2. signal(SIGPIPE, SIG_IGN): ignore SIGPIPE; needs #include <signal.h>
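A minimal sketch of installing a handler with signal() and triggering it with raise(), both from standard `<signal.h>`. The names `on_sigint` and `catch_sigint_once` are hypothetical; a real program would normally wait for Ctrl-C instead of raising the signal itself.

```c
#include <signal.h>

static volatile sig_atomic_t got_sigint = 0;

/* Handler: only sets a flag, which is about all that is safe to do here. */
static void on_sigint(int signo) {
    (void)signo;
    got_sigint = 1;
}

/* Installs the handler, sends SIGINT to this process, and reports whether
   the handler ran. Returns 1 if it did, 0 otherwise. */
int catch_sigint_once(void) {
    got_sigint = 0;
    if (signal(SIGINT, on_sigint) == SIG_ERR)
        return 0;
    raise(SIGINT);             /* same signal Ctrl-C would deliver */
    signal(SIGINT, SIG_DFL);   /* restore the default action */
    return got_sigint == 1;
}
```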

========================================================================

Common Errors

========================================================================

  1. types don't match (with decltype...): (no return type, ooops)
  2. this pointer: not available in constructor? (yes, use it on intialized items. derived class members, nope.)
  3. std::function needs this pointer? No, we can simply do if (!function_ptr)
  4. std::bad_alloc
    • For emplace_back, If reallocation fails bad_alloc exception is thrown.
    • check if the list itself is valid. if there's type casting, vec.size() doesn't really solve the problem
  5. segfault
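    • printf(1); //the first arg must be a format string; passing a number makes printf read from address 1, so unlike std::cout << 1 in C++, it gives you a segfault
  6. !2 == 0, because any nonzero value counts as true
  7. you can't put statements such as assignments outside functions: at file scope you can only declare (and initialize) variables, because no code runs there.
    int i = 3;
    i = 4;      // error: statement outside a function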
    
    int main(){
    }