Research Assistant Notes and Milestone features - doraithodla/notes GitHub Wiki
Milestone-1
Questions
- What is the method of eliminating duplicates in search results?
- How many searches are done per company?
- Are any searches done for the competition?
- How do we make sure that the answers given during chat are correct?
- What information is logged (for later analysis)?
- How do we extract power terms? Can we use term extraction from the LLM or a library?
- What is the quality of the results of the "alternate" Google search?
- How do we make sure that the competition derived from the alternate search is good?
- How can we minimize the number of search results?
- Can we use Common Crawl for gathering website data instead of Google searches?
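
One possible answer to the duplicate-elimination question above, as a minimal sketch only: treat two results as duplicates when their URLs normalize to the same key. The result shape (a dict with a `url` field), the function names `normalize_url` and `dedupe_results`, and the list of tracking parameters are all assumptions made for illustration, not something the project already defines.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumption: these query parameters carry only tracking data and can be ignored.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content", "gclid", "fbclid"}


def normalize_url(url: str) -> str:
    """Build a comparison key: host + path + sorted non-tracking query."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/about" and "/about/" as the same page.
    path = parts.path.rstrip("/") or "/"
    kept = sorted((k, v) for k, v in parse_qsl(parts.query)
                  if k.lower() not in TRACKING_PARAMS)
    query = urlencode(kept)
    # Scheme and fragment are dropped so http/https and #anchors collapse.
    return host + path + ("?" + query if query else "")


def dedupe_results(results):
    """Keep the first result seen for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for item in results:
        key = normalize_url(item["url"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


if __name__ == "__main__":
    sample = [
        {"url": "https://www.example.com/about/", "title": "About"},
        {"url": "http://example.com/about?utm_source=x", "title": "About us"},
        {"url": "https://example.com/products", "title": "Products"},
    ]
    for item in dedupe_results(sample):
        print(item["url"])
```

This only catches duplicates that share a URL; the same content served from different URLs would need a content-based comparison (for example, hashing the page text), which is left open here.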
Dashboard
- Company Name and address(es)
- Alternative companies (after checking alignment)
- Key people
- Industry category
- Size of the company
- What does the company do? (from the primary pages; we can use Common Crawl for this)
- List of products/services
- Social media handles including RSS
- Recent blog posts
- Mentions of the company from outside using incoming links
- Outbound links (potential partners or customers)
- Company category (commercial, non-profit, government, educational, other)
- Stage of the company (seed funded, series A funded, etc.)
- Number of employees on LinkedIn
- Hiring? Whom are they hiring?
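
The dashboard fields above could be held in a single record per company. The following is a rough sketch under stated assumptions: the class name `CompanyProfile`, every field name, and the helper `missing_fields` are hypothetical and chosen only to mirror the list; the notes do not prescribe a schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional

# Category values taken from the dashboard list above.
CATEGORIES = ("commercial", "non-profit", "government", "educational", "other")


@dataclass
class CompanyProfile:
    name: str
    addresses: List[str] = field(default_factory=list)
    alternative_companies: List[str] = field(default_factory=list)
    key_people: List[str] = field(default_factory=list)
    industry: Optional[str] = None
    size: Optional[str] = None
    description: Optional[str] = None
    products_services: List[str] = field(default_factory=list)
    social_handles: List[str] = field(default_factory=list)   # includes RSS feeds
    recent_blog_posts: List[str] = field(default_factory=list)
    incoming_mentions: List[str] = field(default_factory=list)
    outbound_links: List[str] = field(default_factory=list)
    category: Optional[str] = None                             # one of CATEGORIES
    funding_stage: Optional[str] = None
    linkedin_employee_count: Optional[int] = None
    open_roles: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.category is not None and self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def missing_fields(profile: CompanyProfile) -> List[str]:
    """Return the names of fields that have not been filled in yet."""
    empty = []
    for key, value in asdict(profile).items():
        if value is None or (isinstance(value, (str, list)) and len(value) == 0):
            empty.append(key)
    return empty


if __name__ == "__main__":
    profile = CompanyProfile(name="Example Co", category="commercial")
    print(missing_fields(profile))
```

A helper like `missing_fields` gives a quick view of how complete each dashboard entry is after a research run.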
Miscellanea
- Companies for the initial test (a tech company, AI app companies, a couple of small colleges, an incubator, and a non-profit)
- AI companies - Hugging Face, spaCy
- Miruna, top design, hayt
- Chennai Institute of Technology
- Kongu Engineering College
- Forge
- Villgro