60. HuggingFaces - yojulab/learn_deeplearning GitHub Wiki

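A provider of open-source machine learning tools and a hosting platform; it considerably shortens the time needed to train and test models.
Offers pretrained language models as well as models for images, audio, and other modalities.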
official
YouTube - HuggingFaces

Dataset

Load from anything

Tokenizers

(1) Character tokenization
(2) Word tokenization
(3) Subword tokenization
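
The three granularities can be compared on a single string. This is a minimal sketch in plain Python: the vocabulary `TOY_VOCAB` and the greedy longest-match rule (WordPiece-style, with `##` marking a continuation piece) are assumptions for illustration and do not reproduce any specific Hugging Face tokenizer.

```python
def char_tokenize(text):
    """Character tokenization: every character becomes a token."""
    return list(text)

def word_tokenize(text):
    """Word tokenization: split on whitespace."""
    return text.split()

# Hypothetical toy vocabulary; real tokenizers learn theirs from a corpus.
TOY_VOCAB = {"token", "##ization", "##izer", "is", "fun", "[UNK]"}

def subword_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of the same word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

text = "tokenization is fun"
char_tokens = char_tokenize(text)
word_tokens = word_tokenize(text)
subword_tokens = [p for w in word_tokens for p in subword_tokenize(w)]

print(char_tokens[:5])   # ['t', 'o', 'k', 'e', 'n']
print(word_tokens)       # ['tokenization', 'is', 'fun']
print(subword_tokens)    # ['token', '##ization', 'is', 'fun']
```

Character tokens keep the vocabulary tiny but make sequences long, word tokens do the opposite, and subword tokens sit in between by splitting rare words into frequent pieces.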

Gradio

Build and share machine learning demos and web applications using the core Gradio Python library.

Example: hands-on code (with classification): https://dreamfactory100.tistory.com/49

Models

Transformer

Text classification (sentiment analysis)

Named entity recognition(NER)

Uses the XTREME dataset: multilingual tagging of persons, locations, and organizations
Custom modeling

Question answering, summarization, translation, text generation, etc.

Performance measures: the seqeval library
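
seqeval scores NER at the entity level: a predicted entity counts as correct only when both its span and its type match the reference. The idea can be sketched in plain Python as below. The helper names `extract_entities` and `entity_f1` are hypothetical and are not seqeval's API, the tags are assumed to follow IOB2, and an `I-` tag without a preceding `B-` is simply ignored here, which is a simplification.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in one IOB2 tag sequence."""
    entities, start, ent_type = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        begins = tag.startswith("B-")
        continues = tag.startswith("I-") and ent_type == tag[2:]
        if start is not None and not continues:
            entities.add((ent_type, start, i - 1))  # close the open entity
            start, ent_type = None, None
        if begins:
            start, ent_type = i, tag[2:]
    return entities

def entity_f1(y_true, y_pred):
    """Micro-averaged entity-level precision, recall, and F1 over sentences."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        true_ents = extract_entities(true_tags)
        pred_ents = extract_entities(pred_tags)
        tp += len(true_ents & pred_ents)  # exact span and type match
        n_true += len(true_ents)
        n_pred += len(pred_ents)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "B-ORG"]]  # span is right, type is wrong
precision, recall, f1 = entity_f1(y_true, y_pred)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

The mistyped last entity counts as both a false positive and a false negative, which is why token-level accuracy (3 of 4 tags correct) would overstate the result.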

Text generation: Greedy Search Decoding, Beam Search Decoding
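
The difference between the two decoding strategies shows up even on a tiny example. The sketch below uses a hypothetical lookup table `TOY_MODEL` in place of a real language model (next-token probabilities depend only on the previous token), so the numbers are invented for illustration. Scores are summed log-probabilities.

```python
import math

# Hypothetical toy "language model": next-token probabilities given the last token.
TOY_MODEL = {
    None: {"A": 0.5, "B": 0.4, "C": 0.1},
    "A": {"A": 0.4, "B": 0.3, "C": 0.3},
    "B": {"A": 0.9, "B": 0.05, "C": 0.05},
    "C": {"A": 0.3, "B": 0.3, "C": 0.4},
}

def next_probs(sequence):
    """Look up the next-token distribution for the current sequence."""
    last = sequence[-1] if sequence else None
    return TOY_MODEL[last]

def greedy_search(steps):
    """Pick the single most probable token at every step."""
    seq, score = [], 0.0
    for _ in range(steps):
        probs = next_probs(seq)
        token = max(probs, key=probs.get)
        seq.append(token)
        score += math.log(probs[token])
    return seq, score

def beam_search(steps, num_beams):
    """Keep the num_beams best partial sequences at every step."""
    beams = [([], 0.0)]
    for _ in range(steps):
        candidates = [
            (seq + [tok], score + math.log(p))
            for seq, score in beams
            for tok, p in next_probs(seq).items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0]

greedy_seq, greedy_score = greedy_search(steps=2)
beam_seq, beam_score = beam_search(steps=2, num_beams=2)
print(greedy_seq, round(math.exp(greedy_score), 2))  # ['A', 'A'] 0.2
print(beam_seq, round(math.exp(beam_score), 2))      # ['B', 'A'] 0.36
```

Greedy search commits to `A` because it is the best first token, while beam search keeps `B` alive and finds the more probable overall sequence.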

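Multiplying many small token probabilities can underflow, which is a numerical instability; using log probabilities solves the problem
With log probabilities vs. without log probabilities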
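
The comparison can be reproduced with a few lines of plain Python. The per-token probability and the sequence length below are assumed values chosen only to trigger the underflow; they do not come from a real model.

```python
import math

token_prob = 0.01   # assumed per-token probability, for illustration only
num_tokens = 400    # assumed sequence length

# Without log probabilities: the product of many small numbers underflows.
product = 1.0
for _ in range(num_tokens):
    product *= token_prob

# With log probabilities: the product becomes a sum, which stays representable.
log_sum = sum(math.log(token_prob) for _ in range(num_tokens))

print(product)  # 0.0
print(log_sum)  # a finite negative number, about -1842.07
```

Because the logarithm is monotonic, ranking sequences by summed log-probability gives the same order as ranking by the product, without the underflow.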

Pretrained model

Model training options: feature extraction, fine-tuning

Stable Diffusion

Fast Stable Diffusion XL on TPU v5e : quality text-to-image model from Stability AI