Home - UTMediaCAT/mediacat-docs GitHub Wiki

About

What MediaCat is and what it is trying to achieve (use cases)
A document with evolving notes about the next release of MediaCat is available here on Google Docs
A diagram with evolving architecture is available

Contact info

The Slack for this project is teammediacat.slack.com

Installation and Use

MediaCat code is stored in three repositories. Each repository contains information about how to run and manage its component of the MediaCat stack.

mediacat-twitter-API-crawler

mediacat-twitter-API-crawler takes in a scope document in a prescribed format and crawls Twitter handles, bringing back the contents of tweets. The end result is one or more .csv files containing all the tweets for the target Twitter users. Detailed information on how to run and troubleshoot this application is available in the repository at:

https://github.com/UTMediaCAT/mediacat-twitter-API-crawler/blob/main/twitter_api_demo.ipynb
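
Because the crawler may write more than one .csv file, it can be convenient to read them all into a single collection before inspecting the tweets. The following is a minimal sketch in Python, not part of the MediaCat code. It assumes only that the output files sit in one directory and have a header row; the function name load_tweet_rows and the "source_file" key are hypothetical, and the actual column names should be taken from the repository documentation.

```python
import csv
from pathlib import Path


def load_tweet_rows(output_dir):
    """Read every .csv file in output_dir and return all rows as dicts.

    Each row is tagged with the name of the file it came from under the
    hypothetical key "source_file" (an illustration, not a MediaCat field).
    """
    rows = []
    # Sort the paths so the result order is stable across runs.
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # DictReader uses the header row of each file as the keys.
            for row in csv.DictReader(handle):
                row["source_file"] = csv_path.name
                rows.append(row)
    return rows
```

Calling load_tweet_rows on the directory that holds the crawler output returns one list of dictionaries, which can then be filtered or counted per handle.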

mediacat-domain-crawler

mediacat-domain-crawler takes in a scope document in a prescribed format and crawls domains, bringing back the HTML contents of each domain. Detailed information on how to run and troubleshoot this application is available in the repository at:

https://github.com/UTMediaCAT/mediacat-domain-crawler

Post-processor

Post-processor takes in the results from both the Twitter and domain crawlers and produces a .csv file in a prescribed format, from which a user can determine citational practices and approaches between the Twitter and news media sources in the scope. Detailed information on how to run and troubleshoot this application is available in the repository at:

https://github.com/UTMediaCAT/Post-processor/tree/main/Post_processor

Troubleshooting

Additional developer documentation is linked in the README.

Credit

All students who worked on the project and the grant funding that supported it