Research Assistant Notes and Milestone features - doraithodla/notes GitHub Wiki

Milestone-1

Questions

  • What is the method of eliminating duplicates in search results?
  • How many searches are done per company?
  • Are any searches done for the company's competitors?
  • How do we make sure that the answers given during chat are correct?
  • What information is logged (for later analysis)?
  • How do we extract power terms? Can we use term extraction from the LLM or a library?
  • What is the quality of the results of the "alternate" Google search?
  • How do we make sure that the competition derived from the alternate search is good?
  • How can we minimize the number of search results?
  • Can we use Common Crawl for gathering website data instead of Google searches?
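
One possible answer to the duplicate-elimination question is to normalize each result URL and keep only the first result per normalized URL. The sketch below is an assumption, not the project's chosen method. The `url` field name, the `TRACKING_PARAMS` set, and both function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Query parameters treated as tracking noise (an assumption, not a project rule).
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid",
}


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key: host + path + cleaned query."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/about" and "/about/" as the same page.
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so ordering does not matter.
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k.lower() not in TRACKING_PARAMS
    )
    query = urlencode(kept)
    # Scheme, port, and fragment are ignored on purpose.
    return f"{host}{path}" + (f"?{query}" if query else "")


def dedupe_results(results: list[dict]) -> list[dict]:
    """Keep the first result seen for each normalized URL, preserving order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for result in results:
        key = normalize_url(result["url"])  # "url" is a hypothetical field name
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

This only catches duplicates that share a URL after normalization. Pages with identical content under different URLs would need a content-based check such as comparing text hashes.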

Dashboard

  1. Company Name and address(es)
  2. Alternative companies (after checking alignment)
  3. Key people
  4. Industry category
  5. Size of the company
  6. What does the company do? (From the primary pages; we can use Common Crawl for this.)
  7. List of products/services
  8. Social media handles including RSS
  9. Recent blog posts
  10. Mentions of the company from outside using incoming links
  11. Outbound links (potential partners or customers)
  12. Company category (commercial, non-profit, gov, educational, other)
  13. Stage of the company (seed funded, series A funded, etc.)
  14. Number of employees on LinkedIn
  15. Hiring? Whom are they hiring?
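
The dashboard items above could be held in a single record per company. The sketch below is one way to model that, assuming Python dataclasses. All class, field, and method names are hypothetical, and the field types are guesses where the list does not specify them.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Optional


class CompanyCategory(str, Enum):
    """Categories taken from dashboard item 12."""
    COMMERCIAL = "commercial"
    NON_PROFIT = "non-profit"
    GOV = "gov"
    EDUCATIONAL = "educational"
    OTHER = "other"


@dataclass
class CompanyDashboard:
    """One record per company; field order follows the dashboard list."""
    name: str
    addresses: list[str] = field(default_factory=list)
    alternative_companies: list[str] = field(default_factory=list)
    key_people: list[str] = field(default_factory=list)
    industry_category: Optional[str] = None
    size: Optional[str] = None
    description: Optional[str] = None
    products_services: list[str] = field(default_factory=list)
    social_handles: dict[str, str] = field(default_factory=dict)  # e.g. {"rss": "..."}
    recent_blog_posts: list[str] = field(default_factory=list)
    inbound_mentions: list[str] = field(default_factory=list)
    outbound_links: list[str] = field(default_factory=list)
    category: Optional[CompanyCategory] = None
    funding_stage: Optional[str] = None
    linkedin_employee_count: Optional[int] = None
    hiring_roles: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that have not been filled in yet."""
        empty_values = (None, "", [], {})
        return [f.name for f in fields(self) if getattr(self, f.name) in empty_values]
```

A helper like `missing_fields` could show which dashboard items still need a search for a given company, which may help keep the number of searches down.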

Miscellanea

Companies for the initial test (a tech company, AI app companies, a couple of small colleges, an incubator, and a non-profit):

  • AI companies - Hugging Face, spaCy
  • Miruna, top design, hayt
  • Chennai Institute of Technology
  • Kongu Engineering College
  • Forge
  • Villgro