Scrum Notes Documentation: Readable Version - Morssel/CS_478_SemesterProject GitHub Wiki

This section hosts cleaned-up, more readable notes from our Scrum meetings for anyone who wants to follow the progression of our project. The notes will be included in the final project report for our Book and Document Distribution Tool (BDDT). Joseph has also provided a Lucidchart diagram that explains how BDDT should operate before we add our code.

September 3, 2019

Our team has come to an agreement on what project we would like to develop. Joseph introduced the idea of taking certain kinds of documents, parsing them, and displaying them in a neat format. The purpose of this tool is to help users who live in rural areas or areas with poor Internet reception. The user will be able to upload their document, and the code should parse out all the text and remove any images from the file. The parsed document will then be stored in our database, where it will be available for the public to view in the directory.

Front end implementation

At this time, our front end developers will be working in HTML/CSS with the possibility of moving into Angular.

Back end implementation

At the time of this writing, we will be looking at using Python and/or Tesseract to handle the parsing script.

Implementation Questions

Once we agreed on Joseph's idea, we had a number of questions to figure out as we started our analysis phase.

The first question was what we will use as a database. We have the advantage that all we are storing is documents, which will not require a large amount of storage. One solution we proposed was to keep the documents stored on a Raspberry Pi server.

Our next question was how long we want this website to be available. Do we want it to exist after the current semester? It would be a great example to keep on our resumes.

The most difficult question, we figured, would be how to invoke the computation. Joseph has experience with AWS Lambda functions. If we decide to use AWS Lambda and it can call the parsing script successfully, we would not have to worry about the function going down for maintenance.

We then asked where we would like to host our website. One suggestion, made by Hans, was GitHub Pages (gh-pages) with Jekyll. Another was hosting the website in an Amazon S3 bucket. Since the website is for educational purposes, we may be able to get a student account so that we would not incur any costs.

September 6, 2019

Because our website will be lightweight, the end goal on the front end is a basic layout. The basic layout may take the form of WordPress, but with our own custom CSS. Jace and Kaitlyn will have the challenge of creating a CSS style that is easy to implement but does not look like a website from the early '90s. Because of this, we may avoid using images or icons that can cause the site to load slowly on poor Internet connections.

With constant uploads, we will want a page that can serve as a directory. The page in the directory could possibly list our most recent PDFs that have been uploaded. These uploads can also have sections or tags based on education, manual, literature, or any kind of genre we believe will be added in.

The team has also been discussing at length having two different versions of our site. Obviously, we want one that has just the basics for our users on poor Internet connections. The other, more dynamic version of the site could be made available through a link at the top of the page for users wanting a more aesthetically pleasing experience.

One last note regarding how we want to present our PowerPoint. We have it summarized as the following: Summary, How BDDT Works, Front End, Back End, Documentation, Roadblocks, and FAQ. This should cover everything we need for the presentation since we are still in the concept phase. Each team will be using their preferred development environment but have their branches assigned to the GitHub page.

September 16, 2019

We're now focusing on how we want our user stories to develop and reflect our code design. We have a list of the general audience the tool is targeted at. The list is as follows:
  • Educators
  • Students
  • Professionals working in fields with poor Internet access

This list will build as we gradually develop our code design.

As we start to analyze what we would like to incorporate into the backend, we still need to work out how to read the PDF. We considered using Tesseract OCR to read and scan PDFs too. Our architecture issue may come down to passing strings or depending on common/global variables. This may result in a "hand off" or fetch location when the upload button is pressed.

Another problem concerning the team is that the code could save the original uploaded file and not produce the output correctly. This also ties into a storage problem. The solution we have been looking at is using an AWS Lambda function to take care of this.

September 18, 2019

Our code is going to reflect our user stories as clearly as possible. Once we started making a DFD in Lucidchart of what we expect the code implementation to be, we gained a better idea of, and more clarity on, our backend implementation. This also brought up the question of whether we will allow different file formats to be uploaded. For example, are we only going to allow .docx files, or can we handle multiple file extensions? This will be decided down the line, along with the question of whether we need to establish SFTP or another protocol for file transfers.
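The file-format question above could be handled with a simple allow-list check at upload time. This is only a minimal sketch: the set of extensions and the name `is_allowed` are hypothetical, since the notes have not yet settled which formats BDDT will accept.

```python
from pathlib import Path

# Hypothetical allow-list; the team has not decided which formats to accept.
ALLOWED_EXTENSIONS = {".docx", ".pdf", ".txt"}


def is_allowed(filename: str) -> bool:
    """Return True when the file name ends in an extension on the allow-list."""
    # Lower-casing the suffix means "Report.DOCX" is treated like "report.docx".
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

Supporting another format would then mean adding one entry to the set, provided the parser can actually read that format.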

For file uploading, do we need just an upload button, or can we give the user a space to drag and drop the file into? In the drag-and-drop case, is anything else needed for the implementation?

September 20, 2019

To begin our project, we would like to have a domain name, possibly resembling BDDT, but it can be something else. We've agreed to come up with a better name by next Friday. Now we will be laying down the documentation on what everybody has to cover, depending on the team they choose.

September 23, 2019

We are running into some authentication errors with the AWS setup. We think we may have to make a student account for this project. Judging from what we have gathered, that should not present a problem or a cost. The account can also be used for general security.

The team has also begun working on the wireframe for what we would like on the front end. By Wednesday, we should start putting something together for the user story. This also goes along with the story template we have for the front end and backend.

September 27, 2019

After deliberation, Joseph has concluded that setting up Jira is becoming too much of a pain. So for keeping our tasks on track, we may end up using Trello. Joseph has said that he will do the initial Trello setup. From what we have seen, Trello provides the tracking features we need. This will also require an admin, which may be either Joseph or Hans, since Hans is taking care of all the documentation. All ideas will be planned out as tasks.

We also need to come up with a DFD and user story plotline that we can use to help us develop our code. This needs to be reviewed by everyone after completion so that we can all agree on what needs to be designed. Hans will be working on this and sending the link for everyone to comment on and change.

September 30, 2019

Now the team is coming to the question of finding a balance. Although the website is supposed to look as simple as possible, at least for users on rural or poor connections, we still want to add some style. We may want to choose a readable font in the CSS instead of letting the user choose.

From our understanding, the backend Python script will grab the text, while the front end will choose how to display it. We just have to connect the two when we finally bring them together.

For documentation purposes, Joseph set up Trello in the previous meeting. This was a good suggestion so that everyone will be able to keep track of what they are supposed to be doing. Hans is also using it on his side in combination with GitHub. This provides an extra safeguard so we don't forget what tasks we have to finish. Hans will make sure each area is up to date.

At the end of our meeting today, we also discussed how we want our pages to look. This includes the view page, the upload page, and the page for checking recently uploaded documents. The style may change again, but we've agreed on what we think will be important from the user's perspective.

We may also have to make the website backwards compatible with different operating systems. This is in part because users may be on unsupported or out-of-date versions of Windows.

October 2, 2019

Hans has added the additional tasks to Trello. This includes everything for documentation, front end, and back end. The beginning tasks have been added to start, with more coming as the project progresses.

Kaitlyn has also begun work on the directory page. She has a mock representation of what we would like the directory to appear as. This may change over time, but it will help her get started on the design.

Jace currently has his first draft of the upload page on his GitHub branch. The team will look over the design and give our input on what's good and what can be changed.

As far as we know, Al has the parsing method figured out. Now we just have to decide what we would like for formatting. This will go along with the stylesheet we want for the website.

October 10, 2019

We learned today that our first demo for the class will be due in about two weeks. From what we gather, we assume that we just have to demo how far we have progressed. This may include the front end, back end, and documentation.

For the first part of the demo, we should have the home page and possibly the upload page. Hans has also asked to add a small documentation section on the website to help anyone wanting information about the website.

Joseph is still in the process of figuring out how to trigger the Lambda functions. We have some outside resources that may be able to point us in the right direction. Joseph and Hans will be discussing this later today.

Our next primary question is how we will approach the directory and the filtering of results. We may also have to deal with copyright issues. This may include having a moderator approve documents being uploaded.

October 14, 2019

A demo of our site is currently being hosted in an Amazon S3 bucket for testing. This is only for the home page and upload page. Currently, the upload page is only static, though we have the parser that will sit behind the button ready to go.

We learned that Jace had to use HTML/CSS for the styling of the page rather than Angular. Writing the code in Angular was causing more problems than it was worth. In this case, it was just easier to use HTML/CSS.

The parsing script that was developed by Al using Python is complete. Al demonstrated to us how it works by taking the first chapter from our textbook for Software Development. The next step for this will be having the parsing script take in different file formats.

We have found a possible problem if we include JavaScript with the S3 bucket. Depending on how the Lambda works, this may conflict with running any other scripts on our website. This might also lead us to host our site on a Pi server instead of an S3 bucket. Research on this is still in progress.

October 16, 2019

We are confident that our first demo next Friday should be a success. What we have working is the following:

  • Jace's home page works, and we have the file upload page.
  • Al's parser is working correctly on his machine.
  • Design of the directory page has begun.

In short, we have a beginning and an end, and now we are coming to the process in the middle. We are still working out how to link the directory to everything on the site.

The parser that Al has developed stores the HTML, generates an HTML file, and sends that HTML file back to the user. What we have found is that different file formats will have a different default parse. The formatting, for the time being, is not ideal but is at least consistent. In the next phase, we will see how the parser can center the text and choose a legible font.
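To illustrate the formatting step, here is a minimal sketch of wrapping already-extracted text in a small HTML page with a centered column and a legible font. It is not Al's parser: the function name `text_to_html`, the inline style, and the font choice are all assumptions made for illustration.

```python
import html


def text_to_html(title: str, text: str) -> str:
    """Wrap extracted plain text in a minimal, image-free HTML page."""
    # Blank lines in the extracted text are treated as paragraph breaks.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Escape the text so characters like < and & display literally.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        # Assumed styling: a centered column and a serif font.
        "<style>body{max-width:40em;margin:0 auto;font-family:Georgia,serif;}</style>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )
```

Keeping the style inline means the page needs no extra requests, which suits the slow connections the site is aimed at.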

As a stylistic choice, we may also add a border around the page. This can help with user eye strain and keep their attention focused on the information.

October 18, 2019

A rather short meeting for everybody in the group today. From this point, we will begin trying to merge our back end code with the front end.

Our first test is to get Jace's index file to work with Al's parser. When a user clicks the button to upload the document, we should get output in our desired format.

Joseph, on his end, will begin working on the Lambda side to put Al's parser in so that it runs each time it is invoked for a different user.
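As a rough sketch of what the Lambda side could look like, the handler below assumes the function is triggered by S3 upload notifications, which the notes do not confirm. The helper name `uploaded_objects` and the shape of the return value are hypothetical, and the call into Al's parser is left as a comment.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    """Entry point that Lambda calls once per invocation."""
    objects = uploaded_objects(event)
    # A real handler would fetch each object and hand it to the parser here.
    return {"parsed": [key for _, key in objects]}
```

Because each upload produces its own invocation, the parser would run separately for each user's document.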

Kaitlyn is designing the directory, which will possibly be ready for the second demo. Questions for her side will come up after our presentation.

Hans will begin the heavier portion of documentation. This includes putting everything together for the wiki and report, transcribing the notes into readability for all users, and documenting both the front and back end code.

October 30, 2019

After our first meeting regarding our demo, we may have hit our first roadblock, with the database. There are also questions we must think about regarding how we want to approach finishing the front end.

The homepage will probably go through its first revision once we have some time to dedicate to the front end. We will add some elements to the homepage that are better suited to mobile users. We may put in place a hamburger menu that will let the user go back to the directory, uploads, and documentation sections of the site.

Another question we have asked is whether we can make SQL compatible with Amazon's S3 bucket. Al and Hans both agree that there has to be some research or documentation on how to implement this with an SQL database correctly. It just seems too preposterous for it not to have been done before. This also led us to the question of whether we want to keep an SQL database at all or use a different method.

For users wanting secure access or wanting to keep a document private, Al raised the question of how we would keep track of the users who are allowed to see the PDF. Technically, the users are whoever has the shared link. We don't want to have users in the sense of accounts with logins and passwords. Instead, the link would take users to a PDF that does not show up in the database. Otherwise, we would have to keep the documents available to all who visit the site. This also raises the question of whether we want to offer a private or secure document setting. We may still add characters to the link, possibly expanding it to twelve mixed characters. This at least gives users a sense of security without going through the hassle of registering and creating a login for the website.
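The twelve-character link idea can be sketched with Python's standard `secrets` module, which is designed for generating unguessable values. The name `make_link_token` and the choice of letters plus digits as the "mixed characters" are assumptions, not something the team has decided.

```python
import secrets
import string

# Assumed alphabet: upper- and lower-case letters plus digits (62 symbols).
ALPHABET = string.ascii_letters + string.digits


def make_link_token(length: int = 12) -> str:
    """Return a random token of mixed letters and digits for a share link."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

With 62 symbols and twelve positions there are roughly 3 × 10^21 possible tokens, so guessing a link is impractical, although anyone who is given the link still gets full access.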

For searching the directory, how do we want to limit the scope of the user? We had thought about giving them a multitude of options when it came to searching the directory. We now may only give them a search by document name. We could give them a textbox to enter the name of the document. The only drawback is that if we use an SQL database, they may have to enter the name of the document exactly as it is stored. In other words, one accidental misspelling could keep them from locating the document.
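The exact-match drawback can be demonstrated with a small query sketch. SQLite is used here only because it ships with Python; the table name, column, and sample titles are hypothetical and do not reflect the team's actual schema.

```python
import sqlite3

# In-memory stand-in for the directory database, with made-up sample titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT)")
conn.executemany(
    "INSERT INTO documents (title) VALUES (?)",
    [("Software Development Chapter 1",), ("Tractor Repair Manual",)],
)


def search_exact(term: str) -> list[str]:
    """Match only when the whole title is typed exactly as stored."""
    rows = conn.execute("SELECT title FROM documents WHERE title = ?", (term,))
    return [r[0] for r in rows]


def search_partial(term: str) -> list[str]:
    """Match any title containing the term."""
    # '%' matches any run of characters, so a fragment of the title is enough.
    rows = conn.execute(
        "SELECT title FROM documents WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return [r[0] for r in rows]
```

A partial match lets the user type only the part of the title they are sure of, though a misspelling inside that fragment would still miss.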

With the users being whoever knows the unique link, we leave it to the users to police who knows and has access to the link instead of managing it on our end. For the time being, this can be our default option unless we have time to do something different.

November 1, 2019

Our group had a lengthy discussion today about the next roadblock up ahead. As Al explained, if we are going to want users to access the public directory while looking for a private document, we would have to go the way of creating a registration access, complete with login ids and passwords for users wanting to access the site. At minimum, this would have us working with four databases: one for public documents, private documents, user ids, and passwords. Although commendable, in our time frame we do not believe this will be possible.

As a substitute security measure, if a user wants to make their documents private, we give them that option along with a link to where their PDF is located, containing a random string of mixed characters. The URL will act as the username and the random character string will be their password. This lets the users decide who has access to the information instead of just storing it in the public directory. We may make private the default option, where the user is given a checkbox to make their information public. If they check it, the document will be added to the public directory where anybody can search for it.

For the public documents, we will also let the user fill in the categories for what this kind of document is. This includes author, title, genre, and other tags. All this information we can then put into categories that allow the user to list or search by any way they prefer.

What we have to understand is how, if we put in an SQL database, we can get it working with AWS Lambda. We have begun looking at some research and are still testing whether the approach will work. Once we are past this hurdle, we should be able to add extra components to the website.

November 4, 2019

We discovered that we may have a solution for the SQL database in Amazon Relational Database Service, or RDS for short. We know it supports MySQL, which is what Kaitlyn is experimenting with at the moment. Hans has found some documentation and tutorials on how to keep the database up to date.

On front end, Jace will be adding in a hamburger menu for easier access on mobile devices, thereby making it easier on end users to navigate through the menu.

November 6, 2019

Kaitlyn has done some cleanup on the database end for the directory. She is clearing out the private filters and adding in the drop down menu for searching. We had some miscommunication, where Hans believed all the documents would be displayed on the directory page, categorized by upload date or author. Hans produced a diagram that he thought would be the end result, with Kaitlyn adding in her perspective. We think we know how the directory will link up and how the directory would be displayed.

Jace is working on the hamburger menu, and is stuck on the format on how he would like to add in his design. Hans has suggested some code he's done before that can make the format look more correct instead of lingering in an absent space.

Al has also given his input on how the database will be queried with specific wildcards. RDS does have something on this, and we need to look into it more.

November 8, 2019

At this point, we have two weeks before our second demo. Hans has begun rebuilding the database that Kaitlyn built in ASP.NET. He is not sure whether the database can just be added to the S3 bucket and reached with a connection from there, or whether he will have to rebuild the database from scratch on his computer.

Joseph is also having problems with Flask and the S3 bucket. By his report, he has spent close to 8 hours trying to discover what the problem is on AWS. One of the common problems both he and Hans have been running into is permission errors. Even if only one policy is being used, AWS assumes that other permissions must be set on the S3 bucket or container. Until this is resolved, we know Al's parser is working, but it can only be run on localhost.

Al has suggested some milestones we should set as the deadline for the second demo draws near. By 11/15, find some way to connect to the database from S3. By 11/18, merge Jace's front end development with Al's backend code. Also by 11/18, see whether we can get everything hosted on AWS or not.

November 11, 2019

Over the weekend, Hans took on the task of rebuilding Kaitlyn's database to see if anything needed to be added for connectivity. As Hans rebuilt the database, he added a couple of features for testing purposes. On his first attempt, he was able to connect the testing database to AWS using the Relational Database Service (RDS). Although the connection was successful, he also ran into permission errors. Even setting up a policy solely for the database still caused forbidden errors for anyone trying to access the website. He concluded this may be due in part to AWS security measures for databases. Joseph has also looked into it, but is still figuring out the Flask errors associated with uploading the parsed document.

Al has begun another portion of the project that we think will be neat for a user. He has started wondering how a person can search within a text document trying to find certain keywords. This may require him building a different database than we used previously.

November 13, 2019

On the appearance side of development, the team has asked Jace to go through a couple of revisions of the file upload and home pages. We want the website to have a simple, static, but pleasant appearance. This will include new features for the buttons and give the homepage an uplift. This is also to correspond with Al's backend code when a user wants to look at the directory.

After testing both Kaitlyn's original database and his own, Hans still could not get past the errors that he has been running into so often. Due to time constraints, the group has decided that we may just have to build on localhost. Incorporating the functionality of ASP.NET, Flask, and searching has only become more aggravating with all the other layers thrown on.

Al is continuing his research on searching within the database or a document. He believes it will be using Apache Solr.

November 15, 2019

Jace has finished the requested changes to the upload page and the homepage. Some of the changes are to correspond with Al's backend code. The new buttons look better and give the homepage an uplift. This is also for testing on localhost and goes with Al's backend code.

Joseph is making more progress on his roadblock. He's beginning to see a pattern in his problems and in some of the naming conventions used with Flask. He's knocking away some of the roadblocks, but from his perspective it's just taking a bit more time than he anticipated. He's getting some outside perspective to see if there is anything he is missing.

November 18, 2019

Al has finished merging his code with Jace's upload page. Jace has added a new button over the old buttons, since you cannot change the template of the buttons used for uploading a document. This was something the team did not know about and became a quirky fact for us. When the user clicks the upload button, the document gets parsed and is routed to a directory that serves the parsed document. In addition, the new parsed document gets a unique URL that will act as the username and password for anyone who knows the link. It's too unique for others to memorize, so they can only get the link if it was shared with them.

Joseph made a breakthrough and can get the parsed document to move into the S3 bucket. He has suggested that we might make the S3 bucket public so anyone could go through there to find their text document. We could put an HTML wrapper around the S3 bucket and pretty up the areas around it.

November 20, 2019

Al was able to integrate the Solr searching for both the documents in the directory and inside the documents. This required using two new search boxes depending on how the user wanted to search.

Joseph can officially send the parsed files into the directory of our S3 bucket. Users can now find the parsed document on our localhost testing computer. We are also going to report how much space is saved when the user is only given the text. On slow Internet connections, it's a great time saver, as we expected.
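The space savings could be reported as a simple percentage of the original file size. The function name `percent_saved` and the byte counts in the usage note are hypothetical, not measurements from the project.

```python
def percent_saved(original_bytes: int, parsed_bytes: int) -> float:
    """Return the percentage by which the parsed file is smaller than the original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return round(100 * (original_bytes - parsed_bytes) / original_bytes, 1)
```

For example, a made-up 2,000,000-byte upload that parses down to 150,000 bytes of text would give `percent_saved(2_000_000, 150_000) == 92.5`.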
