Crawling - couragesuper/couragesuper-ds GitHub Wiki
Crawling
Frameworks
- Common Libraries
  - crawler base
  - txt file writer
  - txt file reader
  - txt preprocessor
  - selenium driver
Supported Sites
- bookcosmos: obtains PDFs
- joins keywords
- ytn
- naver ranking news
Crawler details
APIs
- init : creates the selenium web driver, the logger, and the txt file writer
- createlogger : initializes the logger
- createTxt : initializes the txt file
- setTxtColumn : sets the list of columns
- run : starts crawling
- close : closes the crawling module
- openpage : opens a page with the web driver
- login : logs in to the site if needed
- makeCateLinks : builds the list of category URLs from the site map
- naviSites : navigates the site using the URL list built from the site map
- navigate : opens a category, iterates over its pages, and opens the individual articles on each page
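
The API list above can be sketched as a small base class. This is only an illustration under stated assumptions, not the repository's actual code: the class names `CrawlerBase`, `RecordingDriver`, and `DemoCrawler`, the constructor parameters, the tab-separated output format, and the `example.com` URLs are all hypothetical. The driver is passed in rather than created inside `init`, so the sketch runs without a browser; a real Selenium web driver exposes the same `get` and `quit` methods used here.

```python
import csv
import logging
import os
import tempfile


class CrawlerBase:
    """Hypothetical base class using the method names from the list above."""

    def __init__(self, driver, txt_path, columns):
        # Assumption: the driver is injected instead of being created here.
        self.driver = driver
        self.logger = self.createlogger()
        self.createTxt(txt_path)
        self.setTxtColumn(columns)

    def createlogger(self):
        logging.basicConfig(level=logging.INFO)
        return logging.getLogger(type(self).__name__)

    def createTxt(self, path):
        # Assumption: the txt file is written as tab-separated rows.
        self.txt_file = open(path, "w", encoding="utf-8", newline="")
        self.writer = csv.writer(self.txt_file, delimiter="\t", lineterminator="\n")

    def setTxtColumn(self, columns):
        self.columns = list(columns)
        self.writer.writerow(self.columns)  # header row

    def openpage(self, url):
        self.logger.info("open %s", url)
        self.driver.get(url)

    def login(self):
        """No-op by default; override for sites that need a login."""

    def makeCateLinks(self):
        """Return the category URLs; site-specific crawlers override this."""
        return []

    def naviSites(self, links):
        for url in links:
            self.navigate(url)

    def navigate(self, category_url):
        """Open one category; subclasses iterate its pages and articles."""
        self.openpage(category_url)

    def run(self):
        self.login()
        self.naviSites(self.makeCateLinks())

    def close(self):
        self.txt_file.close()
        self.driver.quit()


class RecordingDriver:
    """Stand-in for a Selenium driver: records URLs instead of loading them."""

    def __init__(self):
        self.visited = []
        self.closed = False

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        self.closed = True


class DemoCrawler(CrawlerBase):
    """Hypothetical site crawler with two categories of two pages each."""

    def makeCateLinks(self):
        return ["https://example.com/cat/a", "https://example.com/cat/b"]

    def navigate(self, category_url):
        for page in (1, 2):
            page_url = f"{category_url}?page={page}"
            self.openpage(page_url)
            # A real crawler would open each article here; we write a placeholder row.
            self.writer.writerow([page_url, f"placeholder title {page}"])


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    crawler = DemoCrawler(RecordingDriver(), out_path, ["url", "title"])
    crawler.run()
    crawler.close()
    print(open(out_path, encoding="utf-8").read())
```

To use a real browser, an instance such as `selenium.webdriver.Chrome()` could be passed in place of `RecordingDriver()`. Keeping the site-specific logic in `makeCateLinks` and `navigate` lets each supported site reuse the same `run` and `close` lifecycle.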