I made an AI Chatbot powered by wiki.fosscell.org, using AI

From WIKI FOSSCELL NITC

About the Author

Other recent contributors

Ibilees


Backstory:

Three months of sem-break doing next to nothing had driven me close to insanity. One random night I decided to crawl NITC's website and collect data in bulk, because I was bored. I ran a crawling script written by Cursor AI overnight and through the next day. It turned out that after scraping 40k pages there were still around 2 lakh (200,000) pages in the queue waiting to be scraped. Then I asked myself (and later my AI) why I was doing this. Cursor told me I could make a search engine out of it.

After a lot of waiting for responses and tweaking the code, I made an okay-ish search engine. But the pages it showed weren't exactly relevant. About 40% (I think) of the scraped pages were from opus.nitc.ac.in, and they were a shitload of books and their info. That's when I thought of another source of meaningful and relevant data that I could use - wiki.fosscell.org 🔥

The Chatbot:

I began scraping the wiki and extracted exactly 2606 pages of content. Through hours of tweaking scripts and re-prompting Cursor, I made a working search engine. But the results still weren't really relevant to what I asked. The reason was that the search used the exact words of my question as keywords. For clarity: say I asked "What all different hostels are there in the campus?" - every word except 'hostels' is mostly irrelevant, yet each one was given the same priority as 'hostels'. Then another idea came to mind...
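To make the problem concrete, here is a minimal sketch of that kind of equal-priority search. It is not the actual script: `tokenize`, `naive_score` and the two tiny sample pages are made up for illustration.

```python
import re

def tokenize(text):
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_score(question, page_text):
    # Every word of the question counts equally: +1 if it appears on the page.
    page_words = set(tokenize(page_text))
    return sum(1 for word in tokenize(question) if word in page_words)

# Hypothetical sample pages, not real wiki content.
pages = {
    "Hostels": "List of hostels on campus: hostel blocks, mess and rooms.",
    "Library": "What all the different books are there in the library, and what the timings are.",
}

question = "What all different hostels are there in the campus?"
scores = {title: naive_score(question, text) for title, text in pages.items()}
print(scores)
```

With these sample pages the Library page outscores the Hostels page, purely because it shares filler words like "what", "are" and "the" with the question.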

A few months ago I was obsessed with getting a free AI for text, image and voice generation that could be used programmatically. In the process I managed to get access to Originality's text generation API. They didn't care about security or rate limiting - which was perfect for me. Coming back to the chatbot's keyword problem, I thought: what if I use the AI to generate relevant keywords, each with a score from 1-10, to use in the search? I did exactly that. I gave Cursor the Python script I had made for the AI's API and asked it to integrate it into the previous search engine.
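The weighted-keyword idea can be sketched roughly like this. The keyword dictionary below is hard-coded as a stand-in for what the AI might return (the real API call and prompt are not shown here), and `weighted_score` and the sample pages are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def weighted_score(weighted_keywords, page_text):
    # Each occurrence of a keyword adds that keyword's 1-10 weight.
    # Only single-word keywords are matched in this sketch.
    counts = Counter(tokenize(page_text))
    return sum(weight * counts[keyword.lower()]
               for keyword, weight in weighted_keywords.items())

# Assumed AI output for "What all different hostels are there in the campus?"
weighted_keywords = {"hostels": 10, "hostel": 9, "accommodation": 6, "campus": 4}

# Hypothetical sample pages, not real wiki content.
pages = {
    "Hostels": "List of hostels on campus: hostel blocks, mess and rooms.",
    "Library": "What all the different books are there in the library, and what the timings are.",
}

scores = {title: weighted_score(weighted_keywords, text) for title, text in pages.items()}
print(scores)
```

Filler words from the question never reach the scorer, so a page that only shares "what" and "the" with the question scores nothing.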

After manually tweaking the prompt for keyword generation, I got it working really well! Now it gave weighted keywords to the search engine. At this point I thought: I've put in all this effort, so why not make a chatbot out of it? Again I asked my trusted friend Cursor to modify the script and make it into a chatbot.

And everything was beautiful. I made a working chatbot for NITC with the wiki, an unprotected text-generation API and, most important of them all, Cursor.

Final Product:

I scraped 2606 pages of wiki.fosscell.org, generated keywords from the user's question with AI, took the top 10 matching pages, gave them back to the AI as context and made it answer the question. It was definitely a rewarding experience 🙂. Check out the GitHub page, PRs are always welcome :)
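For anyone curious how the last two steps fit together, here is a rough sketch of picking the top matching pages and packing them into a prompt. The function names, the prompt wording and the sample pages are all assumptions for illustration; the code in the repo may look quite different, and the actual call to the text-generation API is left out.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def score_page(weighted_keywords, page_text):
    # Sum of keyword weight times number of occurrences on the page.
    counts = Counter(tokenize(page_text))
    return sum(weight * counts[keyword.lower()]
               for keyword, weight in weighted_keywords.items())

def top_pages(weighted_keywords, pages, k=10):
    # Keep only pages that match at least one keyword, best score first.
    scored = [(score_page(weighted_keywords, text), title)
              for title, text in pages.items()]
    matching = [(score, title) for score, title in scored if score > 0]
    matching.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in matching[:k]]

def build_prompt(question, titles, pages):
    # Put the selected pages in front of the question as context.
    context = "\n\n".join(f"## {title}\n{pages[title]}" for title in titles)
    return ("Answer the question using only the wiki pages below.\n\n"
            f"{context}\n\nQuestion: {question}\nAnswer:")

# Hypothetical sample data; the keyword weights stand in for the AI's output.
pages = {
    "Hostels": "The hostels on campus include several blocks.",
    "Campus Map": "A map of the campus and its gates.",
    "FOSSCell": "A student community for free software.",
}
weighted_keywords = {"hostels": 10, "campus": 3}
question = "What all different hostels are there in the campus?"

selected = top_pages(weighted_keywords, pages, k=10)
prompt = build_prompt(question, selected, pages)
print(prompt)
```

The string in `prompt` is what would then be sent to the text-generation API to get the final answer.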
