IP19 Report template

Open as TemplateView PDF
Author:
Troels Bjerre Lund and Martin Aumüller
Last Updated:
6 years ago
License:
Creative Commons CC BY 4.0
Abstract:
Template for exam project in Introductory Programming at the IT University of Copenhagen for the fall of 2019
Tags:
Reports
\begin{now}
Discover why over 20 million people worldwide trust Overleaf with their work.
\documentclass[11pt]{article}

% Character set for input and output
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

% Fonts
\usepackage{libertine}
\usepackage[scaled=0.83]{beramono}

% AMS math-packages
\usepackage{amssymb}
\usepackage{amsmath,amsthm}

% TODO in text
\usepackage{todonotes}

% URL (clickable), references within document, etc./
\usepackage{hyperref}

% Code formatting
\usepackage{listings}
\lstset{
   language=java,
   extendedchars=true,
   basicstyle=\footnotesize\ttfamily,
   showstringspaces=false,
   showspaces=false,
   numbers=left,
   numberstyle=\footnotesize,
   numbersep=9pt,
   tabsize=2,
   breaklines=true,
   showtabs=false,
   frame=single,
   extendedchars=false,
   inputencoding=utf8,
   captionpos=b
}

% Text / Paragraph space
\addtolength{\textwidth}{1.5cm}
\addtolength{\hoffset}{-0.5cm}
\setlength{\parindent}{0pt}
\setlength{\parskip}{1.5ex plus 1ex minus 1ex}

\title{Project: Search Engine}
\author{{\Large Group Z}\\Alice Liddell \quad Bob Ross \quad Cruella de Vil \quad David Attenborough\\\tt{\{alid,bros,cdev,datt\}@itu.dk}}
\date{\today{}}
%\date{December 14, 2019}

\begin{document}

\maketitle

\section*{Introduction}
%
This document reports on the search engine project that we developed
during the Introductory Programming course at the IT University of
Copenhagen.
%
In Sections~1--3 we will report on our solutions to the
mandatory tasks posed in the project description.
%
We also solved some of the challenges posed for these mandatory
tasks.
%
Their solution is described in the respective section along with the
solution to the mandatory task.
%
\todo{This is just an example template; you do not need to do these extensions.}
%
Furthermore, we developed the following extensions which are described
in detail in the following sections:
%
\begin{itemize}
\item
  Section 4 describes our changes to the client GUI.
\item
  Section 5 describes how we built a webcrawler. 
\item
  Section 6 describes how we supported the ``fuzzy search'' feature. 
\end{itemize}

The description for each solution is roughly split up into the
following parts:
%
\begin{itemize}
\item
  \textbf{Task:} A short review on the task that we had to solve.
\item
  \textbf{Approach:} An informal, high-level description of how we solved the
  task.
\item
  \textbf{Solution:} A detailed technical description of our solution to the task.
\item
  \textbf{Test:} Description of considerations for testing the
  correctness of our solution.
\end{itemize}

%
Other parts (e.g. design, reflection, \ldots{}) will be included where
deemed appropriate. For instance, Task 1 includes a part on the
benchmarking results.

%
The source code of our project is handed in as a single
zip file called \texttt{<filename>.zip}.
%
\texttt{<insert directory>} contains the source code that solves the
mandatory tasks.
%
Our code is also available on ITU's Github: 
\url{https://github.itu.dk/...}.
%

\subsection*{Statement of Contribution}
%
All authors contributed equally to all parts of the solution of the
mandatory tasks.
%
Bob Ross made the graphic design for the client GUI. 
%
Bob Ross and Cruella de Vil conceived the idea of the web crawler,
Cruella de Vil implemented it, and Bob Ross documented it.
%
Alice Liddell is solely responsible for the ``fuzzy search'' feature.

\section{Inverted Indices}
\paragraph{Task}
%
This task involves reducing the amount of time the search engine takes
to respond to a query, by means of an inverted index.
%

%
An inverted index provides a fast way to access the webpages that
contain a certain word by storing a mapping from words to the list
of webpages that contain the word.
%
Figure~\ref{fig:index:uml} provides an example for such a data structure.

%
We are to implement two variants of an inverted index: one based on a
{\tt TreeMap}, and the other on a {\tt HashMap}. We are then to
evaluate the performance of these implementations, and to pick the
best one to use in our search engine.


%
\paragraph{Approach}
%
As per the code handout, we generalized the notion of an index into what we call an
{\tt Index}, that has the following two methods:
%
\begin{itemize}
\item {\tt build}: add description from your javadoc.
\item {\tt lookup}: add description from your javadoc.

\end{itemize}
%
We then implemented the inverted index by using Java's Map data structure.
%
Here, Java provides different implementations and we tested the
following variants in our code: ..., ....

\paragraph{Solution} We build the index data structures as described in Figure~\ref{fig:index:uml}. 
We allowed to switch the implementation of the Map data structure by ... . Building the inverted index was accomplished by the following code: 
\begin{lstlisting}[language=Java]
for each webpage page in list of webpages {
    for each word p on page {
        ...
    }
}
\end{lstlisting}

\begin{figure}[t]
\centering
\includegraphics[width=\textwidth]{figures/uml}
\caption{UML Diagram for the Software Architecture of our webserver.}
\label{fig:index:uml}
\end{figure}

\paragraph{Test} We verified the correctness of the {\tt build} and {\tt lookup} method with unit tests. These tests can be found in the file .... Here, the {\tt build} method of the inverted index was tested by .... (May add Java code here.) 

The {\tt lookup} method of all three indexes was checked by carefully creating a list of test webpages and checking whether the lookup methods returns the expected results by looking at the following properties of the returned list: (i) its size and (ii) ... ? (Add Java code?) 

\paragraph{Benchmark}
%
We evaluated which implementation provides the fastest running time for
lookup operation by ... (describe your benchmarking setup / approach).

%
We choose the following query words for the benchmark:
%
``itu, is, the, best''.
%
The running times of different index implementations can be found in
Table~\ref{tab:benchmark:indices} (or a plot).
%

\begin{table}[t]
\begin{tabular}{l|l|l}
Index  & File & Avg. lookup time (ms)\\ \hline \hline
SimpleIndex & enwiki-tiny.txt & ... \\ 
SimpleIndex & enwiki-small.txt & ... \\ 
SimpleIndex & enwiki-medium.txt & ... \\ 
InvertedIndex w/ HashMap & enwiki-tiny.txt & ... \\
...  & ... & ... \\
InvertedIndex w/ TreeMap & enwiki-tiny.txt & ... \\
...  & ... & ... 
\end{tabular}
\caption{Running times for different index implementations.}
\label{tab:benchmark:indices}
\end{table}

%
These benchmark results indicate that ... .
%
Based on these results, we decided that we use ... as the index in our
data structure.

\section{Refined Queries}

\section{Ranking Algorithms}

\section{Extension: Improving the Client GUI}
\section{Extension: Webcrawler}
\section{Extension: Fuzzy-Search}

\end{document}
IP19 Report template

Get in touch