Dialog Engine for Product Information

The implementation of this project has been divided into the following modules:

Live Application

You can find the live application hosted here.

Crawling, Scraping & Processing

Scrapy and BeautifulSoup were used to crawl and scrape product data from Flipkart's website. The scraped categories are mobiles, televisions, laptops, air conditioners, refrigerators and cameras. Around 3000 products were extracted across these categories.
BeautifulSoup is a Python library for extracting data from HTML and XML pages.
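
As a rough illustration of the scraping step, the sketch below parses one product page with BeautifulSoup. The HTML snippet, its class names (`model`, `specs`) and the `parse_product` helper are all hypothetical, made up for this example; they do not reflect Flipkart's real markup or the project's actual parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: these tags and class names are illustrative only
# and are NOT Flipkart's real page structure.
SAMPLE_HTML = """
<div class="product">
  <h1 class="model">Example Phone X1</h1>
  <table class="specs">
    <tr><td>RAM</td><td>4 GB</td></tr>
    <tr><td>Battery</td><td>4000 mAh</td></tr>
  </table>
</div>
"""


def parse_product(html, category):
    """Turn one product page into a plain dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    model = soup.find("h1", class_="model").get_text(strip=True)

    specs = {}
    table = soup.find("table", class_="specs")
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Keep only well-formed "property, value" rows.
        if len(cells) == 2:
            specs[cells[0].lower()] = cells[1]

    return {"model": model, "category": category, "specs": specs}


product = parse_product(SAMPLE_HTML, "mobiles")
```

In a Scrapy-based setup, a function like this would typically be called on each downloaded response body, with the spider responsible for following links between pages.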

Data Representation

MongoDB is a NoSQL database that stores data as flexible, JSON-like documents.
We maintain a separate collection for each product category. For example, mobiles and TVs are stored in different collections. This helps at query time: once the category is known, we only need to search the corresponding collection.
The primary key of each document in MongoDB is the model name.
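
The layout described above could be sketched as follows. This is a minimal sketch under stated assumptions: the collection names, the choice of MongoDB's `_id` field to hold the model name, and the helpers `to_document`, `store_product` and `find_by_model` are all hypothetical and not taken from the project's code. The `db` argument is expected to behave like a pymongo `Database`.

```python
# Categories listed in this README; the collection names are assumptions.
CATEGORY_COLLECTIONS = {
    "mobiles": "mobiles",
    "televisions": "televisions",
    "laptops": "laptops",
    "air conditioners": "air_conditioners",
    "refrigerators": "refrigerators",
    "cameras": "cameras",
}


def to_document(product):
    """Copy the product and use its model name as the primary key.

    Mapping the model name to `_id` is an assumption about how the
    primary key is realised.
    """
    doc = dict(product)
    doc["_id"] = doc.pop("model")
    return doc


def store_product(db, product):
    """Upsert the product into the collection for its category."""
    collection = db[CATEGORY_COLLECTIONS[product["category"]]]
    doc = to_document(product)
    # replace_one with upsert=True inserts the document if the key is new
    # and overwrites the stored copy if the model was scraped before.
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return collection.name


def find_by_model(db, category, model):
    """Look up a model in its category's collection only."""
    return db[CATEGORY_COLLECTIONS[category]].find_one({"_id": model})
```

With pymongo installed and a server running, `db` could be obtained as `MongoClient("mongodb://localhost:27017")["products"]`, where both the URI and the database name are placeholders. Because the category picks the collection up front, a lookup never has to scan the other categories.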

Query Processing

We have handled two types of property-based queries:



#IIIT-H #IRE #Major_Project #Information_Retrieval_and_Extraction_Course #Dialog_Engine #Flipkart #NLP #StopWordDetection #Tokeniser #Tokenisation #Keywords_Identification #Crawling #Scraping #Python #MongoDB #BeautifulSoup #Scrapy