Home - NCBI-Codeathons/Use-UMLS-and-Python-to-classify-website-visitor-queries-into-measurable-categories GitHub Wiki
How to implement an MVP version of this project
This is a codeathon repo: we spent three days coding what we thought would be a cool app. Three days of discussing, designing, blocking out, coding, and testing goes faster than you might think, and the repo is unfinished. Some revisions were therefore needed afterward to pull out and enhance a minimum viable product (MVP) that trades complexity for an easy-to-use set of scripts.
This wiki highlights a VERY SMALL part of the work: the part that a person with basic Python knowledge, working with a web person who has information-architecture (IA) knowledge, can apply to out-of-the-box Google Analytics exports to classify your health/medical search logs to:
- The Unified Medical Language System (UMLS)
- Your own site-specific additions such as branding, person names, organizational-piece names, etc.
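As a rough illustration of the second kind of matching, the sketch below looks up a raw search query in a small table of site-specific terms. It is not the repo's implementation: the table contents, the category label, and the names `SITE_SPECIFIC_MATCHES`, `normalize`, and `classify` are all hypothetical, and a real match file would be built from your own logs.

```python
# Hypothetical match table: normalized search term -> (preferred term, category).
# Every entry here is invented for illustration; in practice the table would be
# loaded from match files customized to your site.
SITE_SPECIFIC_MATCHES = {
    "acme health library": ("Acme Health Library", "Branding"),
    "acme library": ("Acme Health Library", "Branding"),
    "jane doe": ("Jane Doe", "Person name"),
}


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries match."""
    return " ".join(query.lower().split())


def classify(query: str, matches=SITE_SPECIFIC_MATCHES):
    """Return (preferred_term, category), or None when there is no exact match."""
    return matches.get(normalize(query))


if __name__ == "__main__":
    for q in ["  ACME   Library ", "Jane Doe", "heart attack"]:
        print(repr(q), "->", classify(q))
```

Exact lookup after light normalization is the simplest possible approach; queries that return `None` here are the ones that would still need UMLS matching or manual review.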
See the README on the repo home page and inside the scripts for more about WHY you would benefit from doing this; here you will learn more about the HOW.
If your site is similar to the pilot site, an intern who knows basic Python, working with an IA person who knows the organization, should be able to tag roughly 70 percent of your search volume within one week and render the results in two Tableau interfaces. (This is only an estimate.) The process will be much faster if one person can handle both roles, because you will need to customize the match files to your site. For reporting, we recommend a drill-down discovery interface at four levels of specificity, plus interactive charts; Tableau was used for the pilot project.
On the pilot site, after working with three months of logs and building the match files, in the fourth month we were able to run the scripts from the command line and tag 70 percent of the search volume with no intervention.
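To make "percent of search volume tagged" concrete, here is a minimal sketch of how that figure could be computed. It assumes each row holds a query, the number of times it was searched, and a category (or `None` if untagged). The row layout, the sample data, and the function name `tagged_share` are assumptions for illustration, not the repo's actual data model.

```python
def tagged_share(rows):
    """Return the fraction of total search volume that has a category assigned.

    rows: iterable of (query, search_count, category_or_None) tuples.
    Volume is weighted by search_count, so one frequent query counts for more
    than many rare ones.
    """
    rows = list(rows)
    total = sum(count for _, count, _ in rows)
    tagged = sum(count for _, count, category in rows if category is not None)
    return tagged / total if total else 0.0


if __name__ == "__main__":
    # Invented sample data: two tagged queries and one untagged query.
    sample = [
        ("diabetes", 50, "Disease or Syndrome"),
        ("acme library hours", 20, "Site-specific"),
        ("asdfgh", 30, None),
    ]
    print(f"{tagged_share(sample):.0%}")  # prints 70%
```

Weighting by search count rather than by distinct query is what lets a modest set of match entries cover a large share of volume.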
To understand the project file structure, see https://drivendata.github.io/cookiecutter-data-science/.
Click links on the right to learn more.
You may post issues in the issues section, but support may not be available.
A word on complete search analysis
This project represents only 50 percent of search analysis; it focuses only on the search terms used by PEOPLE WHO HAVE FOUND YOU. Another aspect of search analysis, which is not supported here, is analyzing the terms people use in search engines that ARE NOT leading them to your site. A complete analysis includes both.