Miscellaneous - barialim/architecture GitHub Wiki

Jupyter

Project Jupyter is a nonprofit organization created to "develop open-source software, open-standards, and services for interactive computing across dozens of programming languages". Spun off from IPython in 2014 by Fernando Pérez, Project Jupyter supports execution environments in several dozen languages.

Jupyter Notebook ICE

Jupyter Notebook (previously known as IPython Notebook) is a web-based interactive computational environment ("ICE") used by researchers, data scientists, and developers to create and share Jupyter notebook documents, which contain live code, equations, figures, tables, visualizations (charts and graphs), and text.

It is essentially a client/server application: you install it on your machine and access its IDE-like interface in a browser.

Jupyter Notebook Kernel

A notebook kernel is a "computational engine" that executes the code contained in a Notebook document.

When you open a notebook document, the associated kernel is automatically launched. When the notebook is executed (either cell by cell or with the menu Cell -> Run All), the kernel performs the computation and produces the results. ⭐ Depending on the type of computation, the kernel may consume significant CPU and RAM. Note that the RAM is not released until the kernel is shut down.
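The point about memory being held until shutdown can be illustrated with a toy model. The sketch below is not how a real Jupyter kernel is implemented (a real kernel is a separate process that talks to the server over a messaging protocol); `ToyKernel` is a hypothetical class that only mimics one property: all cells run in a single long-lived namespace, so objects created by one cell stay alive until the kernel is shut down.

```python
import contextlib
import io


class ToyKernel:
    """Hypothetical stand-in for a kernel: one namespace shared by all cells."""

    def __init__(self):
        self.namespace = {}

    def execute(self, code):
        """Run one cell of code and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()

    def shutdown(self):
        """Drop every object the cells created, releasing their memory."""
        self.namespace.clear()


kernel = ToyKernel()
kernel.execute("data = list(range(1_000_000))")  # cell 1 allocates memory
output = kernel.execute("print(len(data))")      # cell 2 still sees `data`
print(output)

kernel.shutdown()  # only now is `data` released
```

Running cells one at a time or all at once makes no difference to this model: each cell is just another `execute` call against the same namespace.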

JupyterLab IDE

JupyterLab is the next-generation IDE for Jupyter notebooks (documents), code, and data. It has a modular structure in which you can open several notebooks or files (HTML, text, Markdown, etc.) as tabs in the same window.

It offers more of an IDE-like experience. ⭐ It is richer in features, with an enhanced interface that can be extended through extensions, much as VS Code is extended through its plugins.

What is Jupyter notebook used for

It is largely used for data analysis, data visualization, and other interactive, exploratory computing.

Languages Jupyter supports

Jupyter supports several languages through language-specific kernels, such as Python (IPython), Julia, and R.

How it works in action

When you save a notebook, it is sent from your browser to the "notebook server", which saves it on disk (a local or shared drive) as a JSON file with a .ipynb extension.
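Because a saved notebook is plain JSON, it can be written and read with nothing but the standard library. The sketch below builds a minimal notebook by hand, assuming the version 4 notebook format (top-level `cells`, `metadata`, `nbformat`, `nbformat_minor`); the file name `demo.ipynb` and the cell contents are made up for illustration. In practice the notebook server writes this file for you.

```python
import json
import os
import tempfile

# A minimal notebook: one markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo notebook"]},
        {
            "cell_type": "code",
            "execution_count": None,  # not run yet
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Save it the way the server would: as JSON with a .ipynb extension.
path = os.path.join(tempfile.mkdtemp(), "demo.ipynb")
with open(path, "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Loading it back is ordinary JSON parsing.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```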

Figure: notebook high-level architecture

The notebook server, not the kernel, is responsible for saving and loading notebooks, ⭐ so you can edit notebooks even if you don’t have the kernel for that language—you just won’t be able to run code. The kernel doesn’t know anything about the notebook document: it just gets sent cells of code to execute when the user runs them.
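This division of labour can be sketched in a few lines. The example is a simplified illustration, not the server's actual code: the notebook is a plain dictionary in the version 4 layout, `code_cells` is a hypothetical helper, and the `sent_to_kernel` list stands in for the messages a real server would pass to a kernel.

```python
def code_cells(notebook):
    """Yield the source text of each code cell, in document order."""
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # The format allows a single string or a list of line strings.
            yield source if isinstance(source, str) else "".join(source)


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["x = 1\n", "x + 1"]},
    ],
}

# Editing the document involves no kernel at all: it is just data.
notebook["cells"][0]["source"] = ["# Renamed title"]

# Running it means handing over code strings only; the markdown cell,
# the metadata, and the file name never reach the kernel.
sent_to_kernel = list(code_cells(notebook))
print(sent_to_kernel)
```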

How to share your notebook or code

With Jupyter, you write your code inside the web-based IDE, and you can then share it (as a notebook document) with others, for example via GitHub.

Summary

  • Jupyter notebook: a document that contains program code. It is a JSON-based file with a .ipynb extension (it can also be exported to other formats, e.g. HTML).
  • Jupyter notebook server: a web-based, IDE-like app responsible for writing, saving, and loading notebooks.
  • Jupyter kernel: responsible for executing the code in a notebook (.ipynb) file.

BLEU Score

One of the challenges of machine translation is that, given a French sentence, there can be multiple English translations that are equally good translations of that sentence.

So how do you evaluate a machine translation system when there are multiple equally good answers? In image recognition there is one right answer against which to measure accuracy, but with multiple valid answers, how do you measure accuracy?

Conventionally, this is done with the BLEU (bilingual evaluation understudy) score, a string-matching algorithm for evaluating the quality of text that has been machine-translated from one natural language to another.

Understand how the BLEU score works

Let's define a few terms:

  • Reference translation is Human translation (HT/RT)
  • Candidate translation is Machine translation (MT)

Let's say you're given a French sentence:

French: Le chat est sur le tapis

And you're given references, i.e. human-generated translations of this French sentence:

  • Reference 1 (Human 1 translated version): The cat is on the mat.
  • Reference 2 (Human 2 translated version): There is a cat on the mat.

They're both perfectly fine human translations of the French sentence.

Given a machine-generated translation, the BLEU score measures how good that translation is by comparing it with the references.
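The core of that comparison in the standard BLEU definition is the modified (clipped) n-gram precision: each n-gram in the candidate is credited at most as many times as it appears in any single reference. The sketch below computes it for the two references above. The candidate sentence is a made-up machine output, and tokenization is simplified to lowercasing and splitting on spaces, with punctuation dropped.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the most times it appears in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Credit each candidate n-gram no more than that maximum.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
candidate = "the cat the cat on the mat".split()  # hypothetical MT output

p1 = modified_precision(candidate, references, 1)  # unigrams: 5 of 7 credited
p2 = modified_precision(candidate, references, 2)  # bigrams: 4 of 6 credited
print(p1, p2)
```

Clipping is what stops a degenerate output such as "the the the the the the the" from scoring well: "the" appears at most twice in a reference, so only 2 of its 7 unigrams are credited.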

Figure: BLEU score diagram

Conclusion:

The BLEU algorithm measures the closeness of a machine translation to human reference translations, taking translation length, word choice, and word order into consideration. It is used for machine translation, abstractive text summarization, image captioning, and speech recognition.
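To make the three ingredients concrete, here is a minimal sketch of how the standard BLEU definition combines them, assuming the modified n-gram precisions have already been computed. Word choice is captured by unigram precision, word order by the higher-order n-gram precisions, and length by a brevity penalty that punishes candidates shorter than the reference. The numbers are illustrative: for a hypothetical candidate "the cat the cat on the mat" scored against the two references above, the unigram and bigram precisions work out to 5/7 and 4/6. Standard BLEU uses n-grams up to length 4; only two are used here for brevity.

```python
import math


def brevity_penalty(candidate_len, reference_lens):
    """Penalty that shrinks the score when the candidate is too short."""
    if candidate_len == 0:
        return 0.0
    # Effective reference length: the reference length closest to the
    # candidate length (ties go to the shorter reference).
    r = min(reference_lens,
            key=lambda ref_len: (abs(ref_len - candidate_len), ref_len))
    if candidate_len >= r:
        return 1.0
    return math.exp(1 - r / candidate_len)


def bleu(precisions, candidate_len, reference_lens):
    """Brevity penalty times the geometric mean of the n-gram precisions."""
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; the score is zero by convention
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(candidate_len, reference_lens) * math.exp(log_mean)


# Candidate of 7 tokens; references of 6 and 7 tokens.
score = bleu([5 / 7, 4 / 6], candidate_len=7, reference_lens=[6, 7])

# A 2-token candidate such as "the cat" is heavily penalized for length.
short_bp = brevity_penalty(2, [6, 7])
print(round(score, 3), round(short_bp, 3))
```

Without the brevity penalty, a very short candidate made only of safe words could reach a high precision while leaving most of the sentence untranslated.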