Machine Learning

Algorithms

C4.5

Constructs a decision tree classifier. Chooses splits using information gain (the reduction in entropy). Prunes the tree in a single pass. Handles both continuous and discrete attributes. The resulting tree is human-readable.
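
A minimal sketch of the entropy and information gain calculation, to make the split criterion concrete. The function names and the toy labels are illustrative assumptions; C4.5 itself normalizes this quantity into a gain ratio, which is left out here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of `groups`."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Hypothetical toy data: four examples, two candidate splits.
labels = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

gain_perfect = information_gain(labels, perfect_split)
gain_useless = information_gain(labels, useless_split)
print(gain_perfect, gain_useless)
```

A split that separates the classes completely recovers the full entropy of the parent node, while a split that leaves each group as mixed as the parent gains nothing.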

K-means

Operates on continuous data. Weaknesses: sensitive to outliers and to the initial choice of centroids.
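
A rough sketch of the assign/update loop, assuming Euclidean distance, a fixed number of iterations, and centroids initialised from randomly chosen data points. The function name, parameters, and toy points are illustrative, not taken from any library.

```python
import numpy as np

def k_means(points, k, iterations=10, seed=0):
    """Alternate between assigning points to centroids and moving centroids."""
    rng = np.random.default_rng(seed)
    # Initial centroids are k distinct data points (a common but fragile choice).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Distance from every point to every centroid, shape (n, k).
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid in place
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Hypothetical toy data: two well separated groups of three points.
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = k_means(points, k=2)
print(labels)
```

Because each centroid is a plain mean, a single distant outlier drags it away from the bulk of its cluster, which is the sensitivity noted above.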

Support Vector Machines

Learns a hyperplane that divides data into 2 classes. Margin: the distance between the hyperplane and the closest data point from each class. Attempts to maximize the margin.
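
To illustrate the margin only (not the optimisation that finds the hyperplane), here is a small sketch that measures the margin of two hand-picked candidate hyperplanes. The `margin` helper, the points, and the candidate weights are assumptions made for the example.

```python
import numpy as np

def margin(w, b, points):
    """Smallest distance from any point to the hyperplane w.x + b = 0."""
    return np.min(np.abs(points @ w + b) / np.linalg.norm(w))

# Hypothetical toy data: two points per class, all on the x axis.
points = np.array([[0.0, 0.0], [1.0, 0.0],   # class -1
                   [3.0, 0.0], [4.0, 0.0]])  # class +1

# Two candidate separators: the vertical lines x = 2 and x = 1.5.
margin_centered = margin(np.array([1.0, 0.0]), -2.0, points)
margin_offset = margin(np.array([1.0, 0.0]), -1.5, points)
print(margin_centered, margin_offset)
```

Both lines separate the classes, but the one midway between the closest points of each class has the larger margin, so it is the one a maximum-margin learner would prefer.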

Apriori

Learns association rules from a database of transactions. Works with three quantities. Itemset size: the number of items in an association (2, 3, ..., n). Support: number of transactions containing the itemset / total transactions. Confidence: the conditional probability of an item given the other items in the itemset.

Approach: join -> prune -> repeat (So a bottom up approach)
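
The join -> prune -> repeat loop can be sketched as follows. The transactions and function names are made up for illustration, and the prune step here is only the support filter; a full Apriori also discards candidates that have an infrequent subset before counting them.

```python
# Hypothetical toy database of four transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) from the supports."""
    return support(antecedent | consequent) / support(antecedent)

def apriori(min_support):
    """Grow frequent itemsets one item at a time, bottom up."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]
    frequent = list(current)
    size = 2
    while current:
        # Join: combine frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune: keep only the candidates that are themselves frequent.
        current = [c for c in candidates if support(c) >= min_support]
        frequent.extend(current)
        size += 1
    return frequent

frequent = apriori(min_support=0.5)
print(frequent)
print(confidence(frozenset({"butter"}), frozenset({"bread"})))
```

In this toy data every transaction containing butter also contains bread, so the rule butter -> bread has the highest possible confidence even though butter itself only just meets the support threshold.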

see Asaini’s Apriori and Aturhoo’s apriori

EM

Expectation-maximization. Process: E-step -> M-step -> repeat. The E-step calculates the probability of assigning each data point to each cluster. The M-step updates the model parameters based on those cluster assignments. Weaknesses: slows down in later iterations, gets stuck in local optima.
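
As a concrete instance, here is a sketch of EM for a mixture of two one-dimensional Gaussians. The choice of model, the initial guesses, the fixed iteration count, and all names are assumptions for the example; EM applies to many other models.

```python
import numpy as np

def em_two_gaussians(x, iterations=50):
    """Fit a two-component 1D Gaussian mixture with plain EM."""
    # Crude initial guesses: means at the extremes, shared variance, equal weights.
    means = np.array([x.min(), x.max()], dtype=float)
    variances = np.array([x.var(), x.var()], dtype=float)
    weights = np.array([0.5, 0.5])
    for _ in range(iterations):
        # E-step: probability that each point belongs to each component.
        densities = (np.exp(-(x[:, None] - means) ** 2 / (2 * variances))
                     / np.sqrt(2 * np.pi * variances))
        weighted = weights * densities
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from those soft assignments.
        totals = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / totals
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / totals + 1e-9
        weights = totals / len(x)
    return means, variances, weights

# Hypothetical data: 200 samples around 0 and 200 samples around 10.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(10, 1, 200)])
means, variances, weights = em_two_gaussians(x)
print(means, weights)
```

The result depends on the starting guesses: with poorly chosen initial means the same loop can settle on a worse local optimum, which is the weakness noted above.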

PageRank

Determines relative importance of some object within a network of objects. Has a networkx implementation.
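
One common way to compute it is power iteration over the link matrix, sketched below with NumPy. The damping factor of 0.85, the uniform treatment of nodes without outgoing links, and the three-node graph are assumptions for the example.

```python
import numpy as np

def pagerank(adjacency, damping=0.85, iterations=100):
    """Power iteration, where adjacency[i, j] = 1 means node i links to node j."""
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1, keepdims=True)
    # Each row becomes a probability distribution over link targets;
    # nodes with no outgoing links are treated as linking to every node.
    transition = np.where(out_degree > 0,
                          adjacency / np.maximum(out_degree, 1),
                          1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * (rank @ transition)
    return rank

# Hypothetical graph: 0 -> 2, 1 -> 2, 2 -> 0.
adjacency = np.array([[0, 0, 1],
                      [0, 0, 1],
                      [1, 0, 0]], dtype=float)
rank = pagerank(adjacency)
print(rank)
```

Node 2 ends up with the highest rank because two nodes link to it, and node 0 inherits most of that importance through the single link from node 2.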

AdaBoost

Learns multiple classifiers over multiple rounds. Each round trains a classifier on the training data, giving more weight to the data points that were hard to classify in the previous round. Implemented in scikit-learn.
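
The reweighting between rounds can be sketched like this, assuming the classic two-class form with labels in {-1, +1} and a weighted error strictly between 0 and 0.5. The function name and toy predictions are illustrative; this is not the scikit-learn implementation.

```python
import numpy as np

def adaboost_round(weights, predictions, labels):
    """Return the classifier's vote weight and the updated data weights."""
    misclassified = predictions != labels
    # Weighted error of this round's classifier (assumed to be in (0, 0.5)).
    error = weights[misclassified].sum()
    alpha = 0.5 * np.log((1 - error) / error)
    # Misclassified points grow in weight, correct ones shrink.
    new_weights = weights * np.exp(-alpha * labels * predictions)
    return alpha, new_weights / new_weights.sum()

# Hypothetical round: four points, the last one is misclassified.
labels = np.array([1, 1, -1, -1])
predictions = np.array([1, 1, -1, 1])
weights = np.full(4, 0.25)

alpha, new_weights = adaboost_round(weights, predictions, labels)
print(alpha, new_weights)
```

After the update the single misclassified point carries half of the total weight, so the next round's classifier is pushed to get it right.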

kNN

K-Nearest Neighbours. Lazy: does no work at training time beyond storing the data, and only labels new data when queried. Uses distance metrics, like Euclidean distance. Discrete data needs a suitable metric (such as Hamming distance) or a transformation into continuous form (such as binary features). Weaknesses: expensive on large datasets, weak on noisy data, features with large ranges can dominate the distance metric, high storage requirements, depends heavily on a good distance metric.

Implemented in scikit-learn.
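
A minimal sketch of the lookup, assuming Euclidean distance and a simple majority vote. The function name and toy data are illustrative, and this is not the scikit-learn implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # "Training" is just keeping train_x and train_y; the work happens here.
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two groups of three labelled points.
train_x = np.array([[0, 0], [0, 1], [1, 0],
                    [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(train_x, train_y, np.array([0.5, 0.5]))
near_five = knn_predict(train_x, train_y, np.array([5.5, 5.5]))
print(near_origin, near_five)
```

Every query computes a distance to every stored point, which is where the cost on large datasets comes from.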

Naive Bayes

Implemented in scikit-learn.

CART

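Uses Gini impurity (a measure of how often a randomly chosen element would be incorrectly labelled if it were labelled at random according to the label distribution). Cost-complexity pruning. Decision nodes can only be binary. Uses surrogates (backup splits on other features that mimic the primary split, so data is still sent to the left or right node appropriately when the primary feature's value is missing).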

Implemented in scikit-learn.
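
A small sketch of Gini impurity and of choosing a binary split on one numeric feature. The helper names and toy values are assumptions for the example; pruning and surrogates are not covered.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini impurity of the two sides of a binary split."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

def best_threshold(values, labels):
    """Try each candidate threshold and keep the one with the lowest impurity."""
    best = None
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = split_gini(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy feature with two cleanly separated classes.
values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)
```

Each candidate produces exactly two children (value <= threshold and value > threshold), which matches the binary decision nodes described above.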

Principal Component Analysis

Independent Component Analysis

SKLearn

Linear Regression

import numpy as np
import matplotlib.pyplot as plt

def linear_regression(data):
    """ Least-squares fit of y = intercept + coefficient * x, acting on a ndarray of shape (n, 2) """
    means = data.mean(axis=0)
    errors = data - means

    # Sum of squared x deviations, and sum of the x*y deviation products
    error_sq = (errors[:,0] ** 2).sum()
    error_xy = (errors[:,0] * errors[:,1]).sum()

    coefficient = error_xy / error_sq
    y_intercept = means[1] - (coefficient * means[0])
    return (y_intercept, coefficient)

data1 = np.random.random((20,2))
data2 = np.random.random((20,2)) * 10
data = np.vstack((data1, data2))
reg1 = linear_regression(data1)
reg2 = linear_regression(data2)
print("Regression 1: {},{}".format(*reg1))
print("Regression 2: {},{}".format(*reg2))
plt.style.use('classic')
plt.figure()
# Each fitted line runs from x=0 to x=10, so its end point is intercept + 10 * coefficient
plt.plot([0, 10], [reg1[0], reg1[0] + 10 * reg1[1]])
plt.plot([0, 10], [reg2[0], reg2[0] + 10 * reg2[1]])
plt.plot(data[:,0], data[:,1], 'ro')
plt.show()

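import numpy as np
a = np.random.random((5,2))
# Population variance over all elements: the mean squared deviation from the mean
variance = a.var()
assert np.isclose(variance, pow((a - a.mean()),2).mean())

import numpy as np
a = np.random.random((5,2))
# Population covariance of the two columns, using the per-column means
m = a.mean(axis=0)
covar = ((a[:,0]-m[0])*(a[:,1]-m[1])).mean()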

Links

http://efavdb.com/gaussian-processes/

https://github.com/edublancas/sklearn-evaluation

http://billchambers.me/tutorials/2015/01/14/python-nlp-cheatsheet-nltk-scikit-learn.html

http://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html#stock-market

http://scikit-learn.org/stable/documentation.html

http://scikit-learn.org/stable/modules/naive_bayes.html

http://scikit-learn.org/stable/modules/preprocessing.html#binarization

http://scikit-learn.org/stable/user_guide.html

https://dashee87.github.io/data%20science/general/Clustering-with-Scikit-with-GIFs/

https://egghead.io/courses/introductory-machine-learning-algorithms-in-python-with-scikit-learn

https://github.com/aigamedev/scikit-neuralnetwork

https://pypi.python.org/pypi/scikit-neuralnetwork/0.3

https://scikit-learn.org/stable/modules/classes.html

http://adataanalyst.com/machine-learning/apriori-algorithm-python-3-0/

http://blog.christianperone.com/2011/09/machine-learning-text-feature-extraction-tf-idf-part-i/

http://blog.webkid.io/datasets-for-machine-learning/

http://blog.yhat.com/posts/harry-potter-classification.html

http://cironline.org/blog/post/using-machine-learning-extract-quotes-text-3687

http://dalelane.co.uk/blog/?p=3381

http://en.wikipedia.org/wiki/Information_extraction

http://en.wikipedia.org/wiki/Reinforcement_learning

http://en.wikipedia.org/wiki/Restricted_Boltzmann_machine

http://humancompatible.ai/bibliography

http://inversed.ru/AIS.htm

http://johanneskopf.de/publications/pixelart/

http://kevintechnology.com/post/71621133663/using-machine-learning-to-recommend-heroes-for

http://machinelearning.wustl.edu/mlpapers/paper_files/LT17.pdf

http://machinelearningmastery.com/

http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

http://matt.eifelle.com/2013/05/02/comparison-of-optimization-algorithms/

http://mccormickml.com/2013/12/13/adaboost-tutorial/

http://michaeljflynn.net/2017/02/06/a-tutorial-on-principal-component-analysis/

http://ml4a.github.io/classes/itp-F18/

http://ml4a.github.io/classes/itp-S19/

http://ml4a.github.io/demos/itpf18_viewer.html

http://pandas.pydata.org/

http://paradise.caltech.edu/~cook/papers/TwoNeurons.pdf

http://pybrain.org/docs/

http://pyml.sourceforge.net/tutorial.html

http://radimrehurek.com/gensim/tutorial.html

http://rare-technologies.com/word2vec-tutorial/

http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/

http://rhizome.org/editorial/2016/nov/21/simulating-enron/

http://science.sciencemag.org/content/356/6334/183

http://seaborn.pydata.org/index.html

http://seat.massey.ac.nz/personal/s.r.marsland/MLbook.html

http://sebastianruder.com/optimizing-gradient-descent/

http://selfdrivingcars.mit.edu/deeptrafficjs/

http://synaptic.juancazala.com/#/

http://timdettmers.com/2015/03/26/convolution-deep-learning/

http://vertex.ai/blog/announcing-plaidml

http://visual.cs.ucl.ac.uk/pubs/handwriting/

http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html

http://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-1.html

http://www.cs.ucf.edu/courses/cap6412/fall2009/papers/Berwick2003.pdf

http://www.datasciencecentral.com/m/blogpost

http://www.datasciencecentral.com/profiles/blogs/17-short-tutorials-all-data-scientists-should-read-and-practice

http://www.deeplearningbook.org/

http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/

http://www.joelsimon.net/dimensions-of-dialogue.html

http://www.johndcook.com/blog/2016/07/14/kalman-filters-and-functional-programming/

http://www.johnwittenauer.net/machine-learning-exercises-in-python-part-1/

http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf

http://www.learndatasci.com/k-means-clustering-algorithms-python-intro/

http://www.marioai.org/LearningTrack/getting-started

http://www.mattkenney.me/

http://www.public.asu.edu/~cbaral/papers/aaai2016-sub.pdf

http://www.reddit.com/r/MachineLearning/comments/3az4qj/large_scale_deep_neural_net_falling_down_the/

http://www.statsblogs.com/2017/03/19/ml-and-metrics-viii-the-new-predictive-econometric-modeling/

http://www.technologyreview.com/view/535451/data-mining-indian-recipes-reveals-new-food-pairing-phenomenon/

http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

http://yerevann.com/a-guide-to-deep-learning/

https://abebabirhane.github.io/

https://ai.stanford.edu/~kdtang/papers/cmj10-jazzgrammar.pdf

https://applyingml.com/

https://applyingml.com/resources/discovery-system-design/

https://applyingml.com/resources/ml-production-guide/

https://arxiv.org/abs/1611.04135

https://arxiv.org/abs/1706.09520

https://arxiv.org/abs/1707.05589

https://arxiv.org/abs/1708.05866

https://arxiv.org/abs/1709.02755

https://arxiv.org/abs/1801.04016

https://arxiv.org/pdf/1706.10199.pdf

https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-science-pdf-f22dc900d2d7

https://blog.acolyer.org/2016/12/16/tensorflow-a-system-for-large-scale-machine-learning/amp/

https://blog.acolyer.org/2017/01/04/learning-to-learn-by-gradient-descent-by-gradient-descent/

https://blog.jle.im/entry/purely-functional-typed-models-1.html

https://blog.openai.com/evolution-strategies/

https://blog.openai.com/science-of-ai/

https://blog.sicara.com/07-2017-best-big-data-new-articles-this-month-acb58d4bb15d

https://blog.slavv.com/the-1700-great-deep-learning-box-assembly-setup-and-benchmarks-148c5ebe6415

https://boringml.com/

https://crfm.stanford.edu/2021/10/18/reflections.html

https://cs.brown.edu/~dabel/blog/posts/misc/nips_2017.pdf

https://deepmind.com/blog/population-based-training-neural-networks/

https://developers.google.com/machine-learning/glossary/

https://distill.pub/

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice

https://en.wikipedia.org/wiki/Association_rule_learning

https://en.wikipedia.org/wiki/Confusion_matrix

https://en.wikipedia.org/wiki/Machine_learning

https://eng.uber.com/accelerated-neuroevolution/

https://experiments.withgoogle.com/living-archive-wayne-mcgregor

https://gab41.lab41.org/the-10-algorithms-machine-learning-engineers-need-to-know-f4bb63f5b2fa?gi=4d857d2d5018#.zhgvlskgn

https://github.com/OpenMined/PySyft/tree/master/examples/tutorials

https://github.com/anvaka/sayit

https://github.com/asaini/Apriori

https://github.com/carpedm20

https://github.com/chartbeat-labs/textacy

https://github.com/ctgk/PRML

https://github.com/ethanfetaya/NRI

https://github.com/facebook/MemNN

https://github.com/fchollet/keras-resources

https://github.com/iesl/institution_hierarchies

https://github.com/jakevdp/PythonDataScienceHandbook

https://github.com/jphall663/awesome-machine-learning-interpretability

https://github.com/markriedl/easygen

https://github.com/oliviaguest

https://github.com/pymc-devs/pymc

https://github.com/pytorch/pytorch

https://github.com/taolei87/sru

https://github.com/uber/causalml

https://github.com/vahidk/EffectiveTensorflow

https://github.com/vvanirudh/Pixel-Art

https://goelhardik.github.io/2016/10/04/fishers-lda/

https://gregorygundersen.com/blog/2020/02/09/log-sum-exp/

https://hackernoon.com/finding-magic-the-gathering-archetypes-with-latent-dirichlet-allocation-729112d324a6

https://hbr.org/2016/12/a-guide-to-solving-social-problems-with-machine-learning

https://homes.cs.washington.edu/~msap/atomic/

https://howwegettonext.com/silicon-valley-thinks-everyone-feels-the-same-six-emotions-38354a0ef3d7

https://huggingface.co/

https://idc9.github.io/stor390/notes/clustering/clustering.html

https://jeremykun.com/2017/02/27/the-reasonable-effectiveness-of-the-multiplicative-weights-update-algorithm/

https://karpathy.github.io/2015/05/21/rnn-effectiveness/

https://keras.io/#installation

https://koaning.io/posts/goodheart-bad-metric/

https://lifehacker.com/find-specialty-subreddits-with-this-tool-1831773643

https://lilianweng.github.io/posts/2022-02-20-active-learning/

https://magenta.tensorflow.org/music-transformer

https://magenta.tensorflow.org/studio

https://makingnoiseandhearingthings.com/2018/08/31/what-you-can-cant-and-shouldnt-do-with-social-media-data/

https://medium.com/@Francesco_AI/ai-knowledge-map-how-to-classify-ai-technologies-6c073b969020

https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471

https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a

https://medium.com/@gk_/text-classification-using-algorithms-e4d50dcba45#.ge2p15jwp

https://medium.com/@james_aka_yale/the-8-neural-network-architectures-machine-learning-researchers-need-to-learn-2f5c4e61aeeb

https://medium.com/@samim/musical-novelty-search-2177c2a249cc

https://medium.com/analytics-vidhya/building-a-powerful-dqn-in-tensorflow-2-0-explanation-tutorial-d48ea8f3177a

https://medium.com/syncedreview/the-staggering-cost-of-training-sota-ai-models-e329e80fa82

https://medium.com/thoughts-and-reflections/racial-bias-and-gender-bias-examples-in-ai-systems-7211e4c166a1

https://medium.freecodecamp.org/explained-simply-how-deepmind-taught-ai-to-play-video-games-9eb5f38c89ee

https://medium.freecodecamp.org/the-hitchhikers-guide-to-machine-learning-algorithms-in-python-bfad66adb378

https://mml-book.github.io/

https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/

https://mubaris.com/2017-09-28/linear-regression-from-scratch

https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb

https://observablehq.com/@tophtucker/inferring-chart-type-from-autocorrelation-and-other-evils

https://openai.com/blog/faulty-reward-functions/

https://quality-diversity.github.io/papers

https://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/

https://rbharath.github.io/what-cant-deep-learning-do/

https://research.googleblog.com/2017/08/transformer-novel-neural-network.html

https://rockt.github.io/pdf/rocktaschel2017end-slides.pdf

https://sadanand-singh.github.io/posts/treebasedmodels/#.WXT8Kli2pUw.hackernews

https://scholarship.law.duke.edu/cgi/viewcontent.cgi?article=1315&context=dltr

https://setosa.io/ev/principal-component-analysis/

https://spinningup.openai.com/en/latest/

https://srconstantin.wordpress.com/2017/01/28/performance-trends-in-ai/

https://stackoverflow.com/questions/10059594/a-simple-explanation-of-naive-bayes-classification/20556654#20556654

https://stackoverflow.com/questions/34518656/how-to-interpret-loss-and-accuracy-for-a-machine-learning-model#34519264

https://techcrunch.com/2012/12/14/ray-kurzweil-joins-google-as-engineering-director-focusing-on-machine-learning-and-language-tech/

https://thestackcanary.com/from-python-pytorch-to-elixir-nx/

https://towardsdatascience.com/ai-architecture-f9d78c6958e0?gi=ba57e7504245

https://towardsdatascience.com/the-advent-of-architectural-ai-706046960140?gi=7ffeaec03907

https://towardsdatascience.com/the-most-underrated-python-packages-e22bf6049b5e

https://towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e

https://towardsdatascience.com/use-kaggle-to-start-and-guide-your-ml-data-science-journey-f09154baba35?gi=67279a870d21

https://utkuufuk.github.io/2018/05/04/learning-curves/

https://web.archive.org/web/20030903185326/http://www.aisb.org.uk/news/mljresign.html

https://web.archive.org/web/20160729170700/http://numenta.com/

https://www.alexirpan.com/2018/02/14/rl-hard.html

https://www.cc.gatech.edu/~riedl/pubs/aaai-keg17.pdf

https://www.cs.ox.ac.uk/people/yarin.gal/website/blog_3d801aa532c1ce.html

https://www.cs.princeton.edu/news/bias-machine-internet-algorithms-reinforce-harmful-stereotypes

https://www.nature.com/articles/s41598-017-08028-4

https://www.oreilly.com/ideas/visualizing-convolutional-neural-networks

https://www.polygon.com/2018/10/25/18010142/machine-learning-president-2020-election-larp

https://www.sciencenews.org/article/machines-are-getting-schooled-fairness

https://www.techdirt.com/articles/20130110/14542221635/ibm-researcher-feeds-watson-supercomputer-urban-dictionary-very-quickly-regrets-it.shtml

https://www.technologyreview.com/s/608380/machines-are-developing-language-skills-inside-virtual-worlds/

https://www.technologyreview.com/s/613630/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/?utm_campaign=the_download.unpaid.engagement&utm_source=hs_email&utm_medium=email&utm_content=73419330&_hsenc=p2ANqtz-_bEFSwiNCaX2ewrkLJMvV6uqgEPuDv9EaDkl2ulug1XcyiDfE6ni0TOY6OWvbNpExPMpxFIHKWB8UZ8zA-hi55UyLMLQ&_hsmi=73419330

https://www.tensorflow.org/tutorials/mnist/beginners/

https://www.wired.com/story/machines-shouldnt-have-to-spy-on-us-to-learn/

https://www.wired.com/story/sobering-message-future-ai-party/

https://www.youtube.com/playlist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs

https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/

https://zefsguides.com/

ref.machine_learning - jgrey4296/jgrey4296.github.io GitHub Wiki
