CSIS 541 Project: Rev 3.0

  1. Save these files in C:\jakarta-tomcat-5.0.19\webapps\searchengine
       index.html   searchengine.jsp

  2. Save this file in C:\jakarta-tomcat-5.0.19\webapps\searchengine\WEB-INF
       web.xml

  3. Save these files in C:\jakarta-tomcat-5.0.19\webapps\searchengine\WEB-INF\classes\searchengine
       SearchResultBean.java   DataSourceBean.java    WordCounter.java

Don't forget to COMPILE the *.java files!

Program Description

  1. The program takes in as input a URL.

  2. It then checks to see if it is a valid URL.

  3. If the URL is invalid, an error message is displayed and the user can try again.

  4. If the URL is valid, the page is saved to a text file named inputpage.txt
    (stored in C:\jakarta-tomcat-5.0.19\bin).

  5. It then connects to the database and checks whether the URL is already in the database.

  6. If the URL already exists in the database, go to step 8.

  7. If the URL does not exist in the database, the program calls the WordCounter program, which first strips off the HTML tags (output to pagestrip.txt), then counts the keyword occurrences on that page and saves the results to the database.

  8. It then calculates the weights of the URLs in the database.

  9. It then extracts all the pages (URLs) stored in the database that have nonzero weights and displays them on the results page.
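
Steps 1 to 3 above could be sketched roughly as follows. This is an illustration only, not the code in searchengine.jsp. The class name UrlCheck and the method isValidUrl are hypothetical, and the sketch assumes that "valid" means a well-formed absolute http or https URL with a host. It does not check that the page can actually be fetched.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the project sources.
class UrlCheck {

    // Returns true if the input parses as an absolute http/https URL with a host.
    // Assumption: this is what "valid URL" means; the project may check differently.
    static boolean isValidUrl(String input) {
        if (input == null) {
            return false;
        }
        try {
            URI uri = new URI(input.trim());
            String scheme = uri.getScheme();
            boolean webScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return webScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Malformed input: the caller would show the error message and let the user retry.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUrl("http://www.example.com/index.html")); // prints true
        System.out.println(isValidUrl("not a url"));                         // prints false
    }
}
```

A JSP page could call such a check on the submitted form value and return to index.html with an error message when it fails.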

The program currently has a severe limitation: the function that strips off the HTML tags does so successfully only if the begin and end tags are on the same line.
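
One possible way around this limitation is to remember, from one line to the next, whether the scanner is currently inside a tag. The sketch below illustrates that idea together with a simple keyword count. It is not the project's WordCounter.java: the class TagStripSketch and its methods are hypothetical, and it assumes that every '<' starts a tag and every '>' ends one, so it does not handle comments, scripts, or a '>' inside an attribute value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the project's WordCounter.java.
class TagStripSketch {

    // Removes everything between '<' and '>', even when a tag spans several lines.
    static List<String> stripTags(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inTag = false; // state carried over from line to line
        for (String line : lines) {
            StringBuilder text = new StringBuilder();
            for (char c : line.toCharArray()) {
                if (c == '<') {
                    inTag = true;
                } else if (c == '>') {
                    inTag = false;
                    text.append(' '); // keep the words on either side of a tag apart
                } else if (!inTag) {
                    text.append(c);
                }
            }
            out.add(text.toString());
        }
        return out;
    }

    // Counts case-insensitive whole-word occurrences of keyword in the stripped lines.
    static int countKeyword(List<String> lines, String keyword) {
        int count = 0;
        for (String line : lines) {
            for (String word : line.split("[^A-Za-z0-9]+")) {
                if (word.equalsIgnoreCase(keyword)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The <a ...> tag below is split across two lines on purpose.
        List<String> page = List.of(
                "<a href=\"index.html\"",
                "   class=\"nav\">Search</a> the web",
                "<p>Search engines count words</p>");
        List<String> text = stripTags(page);
        System.out.println(countKeyword(text, "search")); // prints 2
    }
}
```

Keeping the inTag flag outside the per-line loop is the whole point of the sketch: a tag opened on one line stays "open" until its closing '>' is seen, on whatever line that happens.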