Research - kstrack-grose/Git-Project GitHub Wiki

Virtual Machines

VM:
Conclusion: if we have to use one of these, we’re fucked, because there’s no way we’ll be able to fully understand it in time, let alone create one. That would be an entire semester in itself. However, the research was useful in that it seemed to indicate that virtual machines, at least in relation to git-based software, are used to host a code repository on a local machine that is accessible only through a local network. Maybe. I’m not entirely sure. However, I see no reason why it would be more efficient to use a VM for this particular project, so I think we’ll be okay.

Hypervisors (virtual machine monitors, which create and run VMs) are very important
http://en.wikipedia.org/wiki/Hypervisor

http://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html
http://en.wikipedia.org/wiki/Virtual_machine
http://www.griffincaprio.com/blog/2006/08/virtual-machines-virtualization-vs-emulation.html
http://www.oracle.com/technetwork/java/whitepaper-135217.html#overview
http://en.wikipedia.org/wiki/Temporal_isolation_among_virtual_machines
http://en.wikipedia.org/wiki/Abstraction_layer
http://en.wikipedia.org/wiki/Computer_cluster
messaging system: http://en.wikipedia.org/wiki/Message_Passing_Interface???

Repos!

This is what I've gathered so far, let's hope I didn't completely misunderstand it. So what is a repository? Basically, it's a file server, a storage space that allows multiple parties to access the same information and edit it. But since repos are a part of a Version Control System (http://git-scm.com/book/en/Getting-Started-About-Version-Control), clients accessing a repository get the latest version of whatever was edited, and can then also access the history of different versions. Basically, when you commit something, you are sending the latest version of a file (or whatever) to the repo, which then makes it available to other users. From the latest commit, you can go back along a line (or branch) of commits all the way to the root commit, which is the first thing ever added to the repository. (HEAD, by contrast, is Git's name for the commit you currently have checked out, usually the latest one on your branch.) Each commit records a snapshot of the project's filesystem tree, and the chain of commits forms the history. If you click on the "Network" icon on the right of our repo, GitHub provides a nice visual of what this looks like. Also, more about the structure of a repository here (http://svnbook.red-bean.com/en/1.7/svn.basic.version-control-basics.html#ftn.idp7110832), I found this super helpful.
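To make the idea of walking back through commits concrete, here is a minimal sketch in Python. It is only a toy model, not how Git stores anything: the `Commit` class, the `history` function, and the commit messages are all made up for illustration. Each commit just remembers its parent, and the root commit is the one with no parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Toy commit: a message plus a link to the previous commit."""
    message: str
    parent: Optional["Commit"] = None  # None marks the root commit


def history(head):
    """Walk from the given commit back to the root, newest first."""
    messages = []
    current = head
    while current is not None:
        messages.append(current.message)
        current = current.parent
    return messages


# Hypothetical three-commit branch.
root = Commit("initial commit")
second = Commit("add README", parent=root)
head = Commit("fix typo", parent=second)

print(history(head))
```

Starting the walk from any commit gives that commit's full ancestry, which is roughly what the "Network" view draws.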

Now before we create a repository we first have to ask: are we using Git or Subversion (two version control systems)? If so, it's pretty simple. Both Git and Subversion can be installed on our network, and then there are established commands that do what we want. Literally, if we use Git, we just type 'git init' in whichever directory we want the repo to exist (http://git-scm.com/book/en/Git-Basics-Getting-a-Git-Repository). Once we've done that, we'd want to make the repo available to others, which is also pretty simple, but making it run efficiently requires some more work. The 'chapters' link at the top of that page leads to more information on that. There's also an index of commands there, and GitHub has a glossary of git commands as well. (http://git-scm.com/book)

Helpful code for saving/copying files and directories using a Python module. (http://www.pythonforbeginners.com/os/python-the-shutil-module)

High-level file operations in Python (https://docs.python.org/2/library/shutil.html#shutil.copy)

Here's great information about how to use the os module to create a file/directory infrastructure in Python (http://www.tutorialspoint.com/python/os_file_methods.htm)
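As a rough sketch of how the os and shutil modules from the links above could fit together (assuming Python 3), the example below creates a small directory layout and copies a file into a backup folder. The function names and the `objects`/`refs` folder names are placeholders I made up for illustration, and everything happens inside a throwaway temporary directory.

```python
import os
import shutil
import tempfile


def make_repo_skeleton(base_dir, name):
    """Create <base_dir>/<name>/ with two placeholder subdirectories."""
    repo_path = os.path.join(base_dir, name)
    for sub in ("objects", "refs"):  # hypothetical layout, not a real Git repo
        os.makedirs(os.path.join(repo_path, sub), exist_ok=True)
    return repo_path


def backup_file(src_path, backup_dir):
    """Copy one file into backup_dir and return the path of the copy."""
    os.makedirs(backup_dir, exist_ok=True)
    return shutil.copy(src_path, backup_dir)


# Work in a temporary directory so nothing real is touched.
base = tempfile.mkdtemp()
repo = make_repo_skeleton(base, "demo_repo")

notes = os.path.join(repo, "notes.txt")
with open(notes, "w") as f:
    f.write("first draft\n")

copied = backup_file(notes, os.path.join(base, "backup"))
print(sorted(os.listdir(repo)))
print(copied)
```

`shutil.copy` copies the file contents and permissions; `shutil.copytree` would be the one to look at for copying a whole directory.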


CHERNOH
Git Repo and User interaction (how Git returns repo data to users)
Basically, there are three main states in which any file can reside in Git: committed, modified, or staged. The state determines where the file's data lives. The Git directory (the repository itself) is where Git stores the metadata and the object database for the project, so when a user with authenticated access clones a repository from GitHub, this is what gets copied. However, users don't edit the Git directory directly; they work in the working directory, which is a single checkout of one version of the project. Those files are pulled out of the compressed database in the Git directory and placed on disk for users to use and modify. Note that Git also employs a staging area, where changes are packaged before a commit. Git does not track each file's history as a series of edits; instead, every commit records a snapshot of everything that was staged. This means the working directory is effectively an editable copy of one version from the Git directory, while the full history stays in the Git directory. Once committed, always committed (Git makes it hard to lose data that has been committed), and that's why the Git repo is so important in collaborative projects!

[ps. this is based on my remote understanding, and I could be wrong]
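Here is a small sketch of the three states, under heavy simplification. It assumes we can compare a file's content in three places (the working directory, the staging area, and the last commit); the `file_state` function and the "v1"/"v2" strings are invented for illustration and are not Git's actual logic. In real Git a file can also be partly staged and partly modified at the same time.

```python
def file_state(working, staged, committed):
    """Classify a file by comparing its content in the three places."""
    if working != staged:
        return "modified"   # edited, but the edit is not staged yet
    if staged != committed:
        return "staged"     # marked to go into the next commit
    return "committed"      # safely stored in the Git directory


print(file_state("v2", "v1", "v1"))
print(file_state("v2", "v2", "v1"))
print(file_state("v2", "v2", "v2"))
```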

Sources:
http://web.mit.edu/cluedumps/slides/understanding-git-2008.pdf
http://ftp.newartisans.com/pub/git.from.bottom.up.pdf (pages 6-14)
https://developer.github.com/guides/getting-started/#repositories
http://git-scm.com/book/en/Getting-Started-Git-Basics

# Interfacing with the Server

Let’s talk a little about how the Internet works. When you open a browser and type in a URL, the domain name in that URL is sent as a request to a DNS server, which looks up the IP address for the site you are after. The DNS server returns that IP address, and your computer then sends a request to it (requests travel as little packets of data). Once your request reaches the server where the website data is held, the server sends back a response, and your browser interprets it and renders it in readable form. For example, at the end of a URL you may see .php; this corresponds to the type of file you’re addressing. In that case the server runs the PHP script and sends its output (usually HTML) back to your browser. These names are not all the same kind of thing: HTTP is the protocol used for the requests and responses, HTML is a markup language, JavaScript is a programming language that the browser runs, PHP is a programming language that runs on the server, and TXT is just plain text.
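To see the pieces of a URL that matter in the description above, here is a short sketch using Python's standard `urllib.parse`. The `describe_url` function and the example.com address are placeholders for illustration; no network request is made.

```python
import os
from urllib.parse import urlparse


def describe_url(url):
    """Split a URL into the parts the browser and server care about."""
    parts = urlparse(url)
    extension = os.path.splitext(parts.path)[1]
    return {
        "scheme": parts.scheme,   # protocol used for the request
        "host": parts.hostname,   # the name that DNS turns into an IP address
        "path": parts.path,       # which file or page is being asked for
        "extension": extension,   # hints at the type of file
    }


info = describe_url("http://example.com/pages/index.php")
print(info)
```

The DNS step itself could be tried with `socket.gethostbyname(info["host"])`, which needs a working network connection and is left out here for that reason.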

GitHub is written with Ruby and Ruby on Rails, but Python and PHP (PHP: Hypertext Preprocessor) are also very useful server-side languages for adding functionality to a website. All of these languages can allow a user to access and manipulate data on the server. There are many built-in PHP and Python functions that can help us easily create the web-to-user interface. Note that GitHub's own source code is not public, so we can't work off of it directly; open-source projects such as GitLab (mentioned below) are the closest thing we could study.

(Also note, that Ruby on Rails is fun though Andrew finds it inferior and I trust Andrew)

For more information see:
http://www.w3schools.com/php/php_intro.asp
http://www.w3.org/wiki/How_does_the_Internet_work
https://headway101.com/html-css-and-php-explained-an-introduction-to-internet-coding-basics/
http://tryruby.org/levels/1/challenges/0
http://railsforzombies.org/

## How Github Searches:

So, a lot of this information is available on GitHub itself, but what I have found so far is that GitHub's Search API (API stands for application programming interface) "is optimized to help you find the specific item you’re looking for (e.g., a specific user, a specific file in a repository, etc.). Think of it the way you think of performing a search on Google. It’s designed to help you find the one result you’re looking for (or maybe the few results you’re looking for). Just like searching on Google, you sometimes want to see a few pages of search results so that you can find the item that best meets your needs. To satisfy that need, the GitHub Search API provides up to 1,000 results for each search."

Also, in the meeting people were interested in how GitHub searches repositories specifically. For a repository search, the q search term can contain any combination of the supported repository search qualifiers:

- in: Qualifies which fields are searched. With this qualifier you can restrict the search to just the repository name, description, readme, or any combination of these.
- size: Finds repositories that match a certain size (in kilobytes).
- forks: Filters repositories based on the number of forks, and/or whether forked repositories should be included in the results at all.
- created or pushed: Filters repositories based on times of creation, or when they were last updated.
- user or repo: Limits searches to a specific user or repository.
- language: Searches repositories based on the language they’re written in.
- stars: Searches repositories based on the number of stars.

Information found: https://developer.github.com/v3/search/#search-repositories
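As a rough illustration of how the qualifiers combine into the q term, the sketch below builds a repository search URL with Python's standard library. The `build_repo_search_url` helper and the example keywords are made up for illustration; the endpoint is the one from the documentation linked above. The code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Repository search endpoint from the GitHub API documentation linked above.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_repo_search_url(keywords, qualifiers):
    """Join keywords and qualifier:value pairs into a single q parameter."""
    terms = [keywords] + [f"{name}:{value}" for name, value in qualifiers.items()]
    query = urlencode({"q": " ".join(terms)})  # handles spaces, ':' and '>'
    return f"{SEARCH_URL}?{query}"


# Hypothetical search: Python repos with "git project" in the name and over 10 stars.
url = build_repo_search_url(
    "git project",
    {"in": "name", "language": "python", "stars": ">10"},
)
print(url)
```

The qualifiers go in a dict rather than keyword arguments because `in` is a reserved word in Python.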

## Git Web Stuff (Robin)

I am not sure if we will need this, but an open-source program known as GitLab allows you to set up repos on your own server (rather than using the GitHub server).

Here are the steps for setting it up:

Setting up Private Github Server Using GitLab

We may be able to use this and then create our own interface for communicating with it? (Still unsure whether the bulk of the work will be in creating the functionality of Github or in setting up a website/interface to communicate with it)

Apparently you can host websites on GitHub. It would be funny to host our GitHub version on GitHub, although that's probably unnecessary.

Here is a customizable PHP web interface for git known as git-php

Very basic web interface for Git that can be further customized

## Metadata, Issues, and Checksums:

"Git Has Integrity

Everything in Git is check-summed before it is stored and is then referred to by that checksum. This means it’s impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy. You can’t lose information in transit or get file corruption without Git being able to detect it.

The mechanism that Git uses for this checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0–9 and a–f) and calculated based on the contents of a file or directory structure in Git. A SHA-1 hash looks something like this:

24b9da6552252987aa493b52f8696cd6d3b00373

You will see these hash values all over the place in Git because it uses them so much. In fact, Git stores everything not by file name but in the Git database addressable by the hash value of its contents."
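To get a feel for the checksumming described in the quote, here is a minimal sketch that computes the SHA-1 of a file's contents the way Git does for a single file (a "blob"): Git hashes a short header, `blob <size>` followed by a null byte, and then the content. The `git_blob_hash` name is my own, and this covers single files only, not directories or commits.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git would assign to a file with this content."""
    # Git prepends "blob <size in bytes>\0" before hashing the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_hash(b"hello world\n")
changed = git_blob_hash(b"hello world!\n")

print(original, len(original))
print(original == changed)  # any change to the content changes the hash
```

If Git is installed, the result can be compared against `git hash-object <file>` for a file with the same contents.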
