CSIS 541 Project: Rev 3.0
Don't forget to COMPILE the *.java files!
- Save these files in C:\jakarta-tomcat-5.0.19\webapps\searchengine
- Save this file in C:\jakarta-tomcat-5.0.19\webapps\searchengine\WEB-INF
- Save these files in C:\jakarta-tomcat-5.0.19\webapps\searchengine\WEB-INF\classes\searchengine
There are severe limitations to this code at this time: the function that strips off the HTML tags works correctly only if the begin and end tags are on the same line.
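One possible way around this limitation is to read the whole page into a single string instead of processing it line by line, and then drop everything from each `<` to the next `>`. The following is a minimal sketch of that idea, not the project's WordCounter code. The class `TagStripper` and method `stripTags` are hypothetical names. The sketch assumes Java 7 or later and does not handle `>` inside quoted attribute values, HTML comments, or `<script>` contents.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TagStripper {

    /**
     * Hypothetical helper: removes everything from '<' through the next '>',
     * even when the two characters fall on different lines.
     */
    public static String stripTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;      // start of a tag; it may span several lines
            } else if (c == '>' && insideTag) {
                insideTag = false;     // end of the tag
                out.append(' ');       // keep words on either side of the tag apart
            } else if (!insideTag) {
                out.append(c);         // ordinary text (including newlines) is kept
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Read the whole page at once so a tag split across lines is seen as one unit.
        String path = args.length > 0 ? args[0] : "inputpage.txt";
        String html = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        System.out.println(stripTags(html));
    }
}
```

Because the state (`insideTag`) carries over from one character to the next regardless of line breaks, a tag such as `<a\nhref="x">` is removed as a whole.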
1. The program takes a URL as input.
2. It then checks whether the URL is valid.
3. If the URL is invalid, an error message is displayed and the user can try again.
4. If the URL is valid, the program saves the page to a text file named inputpage.txt (stored in C:\jakarta-tomcat-5.0.19\bin).
5. It then connects to the database and checks whether the URL is already in the database.
6. If the URL already exists in the database, go to step 8.
7. If the URL does not exist in the database, the program calls the WordCounter program, which first strips off the HTML tags (output to pagestrip.txt), then counts the keyword occurrences on that page and saves the results into the database.
8. It then calculates the weights of the URLs in the database.
9. Finally, it extracts all the pages (URLs) stored in the database that have non-zero weights and displays them on the results page.
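The URL check, the keyword count, and the non-zero-weight filter described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the project's implementation. The class and method names are hypothetical, an in-memory `Map` stands in for the database, and the weight is assumed to be the number of times the search keyword appears on the page, because these notes do not say how weights are actually calculated. A URL counts as valid here if `java.net.URL` can parse it, which checks syntax only, not whether the page exists. Java 8 or later is assumed.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class SearchPipelineSketch {

    /** URL check: valid if java.net.URL can parse it (syntax only, no network access). */
    public static boolean isValidUrl(String candidate) {
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /** Keyword count: occurrences of each lower-cased word in already-stripped text. */
    public static Map<String, Integer> countKeywords(String strippedText) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : strippedText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {            // split can yield a leading empty string
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /**
     * Weight and filter: ASSUMED weight = keyword count on the page.
     * Only pages with a non-zero weight are returned. The map stands in for the database.
     */
    public static Map<String, Integer> nonZeroWeights(Map<String, Map<String, Integer>> index,
                                                      String keyword) {
        Map<String, Integer> results = new LinkedHashMap<>();
        String key = keyword.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Map<String, Integer>> page : index.entrySet()) {
            int weight = page.getValue().getOrDefault(key, 0);
            if (weight != 0) {
                results.put(page.getKey(), weight);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> index = new LinkedHashMap<>();
        String url = "http://example.com/a";
        // Only count words for a valid URL that is not already "in the database".
        if (isValidUrl(url) && !index.containsKey(url)) {
            index.put(url, countKeywords("Java servlets and Java pages"));
        }
        index.put("http://example.com/b", countKeywords("Nothing relevant here"));
        System.out.println(isValidUrl("not a url"));        // false
        System.out.println(nonZeroWeights(index, "java"));  // {http://example.com/a=2}
    }
}
```

Replacing the in-memory map with JDBC queries against the project's database would follow the same flow; the database schema is not described in these notes, so it is left out here.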