====== Notes in Progress ======

==== Resources to consume ====

  * 3blue1brown YouTube channel, with numerous animated videos on math
  * Neural Net Deep Learning 1
  * Neural Net Deep Learning 2, done
  * Neural Net Deep Learning 3, Back Propagation, not public yet
  * online HTML book
  * website on Google DeepMind bootcamp on Reinforcement Learning
  * YouTube videos from the RL bootcamp
  * Christopher Olah's blog: multiple articles, visualizations, article on MNIST
  * Distill blog, co-edited by Christopher Olah
  * https://www.tensorflow.org/get_started/get_started TensorFlow getting started, 1/3 of the way through the first tutorial

==== great thoughts ====

TensorFlow and Caffe

Theano: Python package for machine learning, supported by the Montreal Institute for Learning Algorithms (MILA) from 2010 to 2017. Eclipsed by TensorFlow and Caffe.

Caffe: deep learning framework (C++ with a Python interface), supported by Berkeley AI Research (BAIR).

Yoshua Bengio: head of the Montreal Institute for Learning Algorithms (MILA).

Yangqing Jia (贾扬清): created Caffe at Berkeley, went to Google Brain and worked on TensorFlow, now at Facebook working on Caffe2 and ONNX. [ONNX is the attempt by Facebook and Microsoft to keep themselves relevant under the onslaught of TensorFlow.]

equation drawing with rmarkdown, latex: http://www.statpower.net/Content/310/R%20Stuff/SampleMarkdown.html

python is installed here: C:\Users\John\AppData\Local\Programs\Python\Python36

language imprecision\\
In week 1, Andrew Ng uses the term "dimensional" incorrectly.\\
In week 2, Andrew Ng uses the term "multivariate linear regression" incorrectly.

m = number of rows in the training data\\
n = number of independent variables in each row\\
independent variable, aka predictor, aka feature

There is no such thing as a 4-dimensional vector. A vector is a 1-dimensional matrix and a 1-dimensional tensor. This vector has 4 thingies occurring in one dimension.

  * elements of an array
  * members of a set
  * vector in physics vs vector in matrix math
  * vector = array = a list of things, members, elements, numbers

With only two features, the cost function can be represented by a surface or contour plot.

feature scaling = normalization: get every feature into a similar range
  * -1 < x < +1
  * or -3 < x < +3
  * or 0 < x < +1; this last is called normalization
  * -0.5 < x < +0.5; this is called mean normalization, where the mean value is zero

Feature scaling improves the efficiency of gradient descent. This is demonstrated by the 2-variable case: if one variable has a range of 1 to 2000 and the other has a range of 1 to 5, the resulting contours will be long and skinny, and gradient descent works much better when the contours are close to circular, so that the squared errors of each variable are roughly similar in scale to one another.

If feature scaling is not done to the data beforehand, it will be implicitly done by the weights. So by feature scaling up front, we give more meaning to the resulting weights.

alpha = learning rate\\
try .003, .03, .3, 3\\
Plot the cost function (y) against the number of iterations (x). It should be a curve that keeps decreasing and flattens out toward an asymptote, never quite reaching zero.

polynomial regression\\
using a curve instead of a line: quadratic (square), cubic, etc\\
same use of least squares and gradient descent\\
feature scaling becomes important, because the squares and cubes in the equation magnify the range, and so the artificial weight, of each feature

what is the difference between regression and neural net?
in a neural net, each neuron implements a regression analysis

the normal equation

In the beginning Andrew Ng uses the notation X_{ij}, and later X^{(i)}_j, for an element in a matrix.

regression analysis
  * linear regression
  * non-linear regression
  * fitting data to a line or curve
  * least squares
  * normal equation
  * gradient descent

1809: Carl Friedrich Gauss formalized the least squares method. Many predecessors had worked through the 1700s refining the concept of minimizing the total error in observations vs predictions of planetary orbits.

1880s: Francis Galton coined the term "regression" (regression toward the mean). Only later, by about the 1920s, did the word come to refer generally to the process of using the least squares method to fit data to a line or curve.

https://en.wikipedia.org/wiki/Gallery_of_curves

A parabola has two x's for every y. What about a curve that has two y's for every x? The function f(x) would have two results for every x, like a circle for example. How do you graph a circle? x^2 + y^2 = r^2

If the data matches an algebraic curve, non-linear data can sometimes be fit with either linear or nonlinear regression. For the former, we transform ("fudge") the data until it fits a straight line.

linear, nonlinear, curvilinear
  * http://blog.minitab.com/blog/adventures-in-statistics-2/what-is-the-difference-between-linear-and-nonlinear-equations-in-regression-analysis
  * http://blog.minitab.com/blog/adventures-in-statistics-2/curve-fitting-with-linear-and-nonlinear-regression
  * https://en.wikipedia.org/wiki/Curvilinear_coordinates

----

x = 5184\\
range = max - min = 8836 - 4761 = 4075\\
mean = (7921 + 5184 + 8836 + 4761) / 4 = 6675.5\\
scaled = x / range = 5184 / (8836 - 4761) = 1.2721472392638036\\
normalized = (x - mean) / range = (5184 - 6675.5) / 4075 ≈ -0.366\\
scaled min = 4761 / 4075\\
scaled max = 8836 / 4075

----

This is the Octave command to calculate theta using the normal equation:\\
theta = pinv(X'*X)*X'*y

feature scaling is NOT necessary in the normal equation method

Sigmoid - old school, replaced by ReLU\\
ReLU (Rectified Linear Unit)
function: flat horizontal line up to a point, then a straight line angling up\\
source: minute 17:00 of https://www.youtube.com/watch?v=aircAruvnKk

==== Notes ====

"Moreover, the holders of this much data remain in the hands of the private sector in the big six: Amazon, Facebook, Google, Microsoft, Apple and Baidu."

----

GAN: Generative Adversarial Network, see Wikipedia\\
used to generate additional data. More data helps the training. "augmented data"

----

Datasets\\
News\\
The web\\
A crawler\\
Output of a crawler, which links proved fruitful\\
Pandora music

Build an AI that can create data sets

Datasets to build\\
Map shapes\\
Treasures\\
Crawler - predict which links to follow

Teach an AI to play plunder\\
Let the AI identify boring vs addictive game play

How does OCR work? Pattern matching

How to train a human\\
Human gets tired\\
Machine does not\\
Need efficient training techniques

How is a glossary done in Wikipedia?

Look at everything humans do and figure out how to do it by machine.

Massage data into a training set.\\
Decide what to do, vs how to do it.

Technology\\
Logic\\
Prayer

Matrix\\
Variables × weights

Vector, matrix. Now what is a tensor, as in TensorFlow?

----

https://www.technocracy.news/index.php/2017/08/27/147-transnational-companies-run-world/

Find the Illuminati

Draw the hierarchy of humans\\
Hold them to account

Keep the population under control\\
Keep the environment conducive to human life

Really? But what about the future life forms?\\
Attempts to freeze a moment of happiness, to create a Utopia, backfire by repressing "natural" development.

Nothing needs to be done. Everything is perfect.
\\
Enjoy.\\
Be.

----

Use AI to geocode and chronocode (timestamp) articles,\\
to find historical maps, georeference, break into shapes, chronocode, timestamp, label,\\
so that AGI can put human population movements into historical perspective

AGI needs historical perspective.\\
Therefore, all data must be timestamped and geocoded.

Perhaps the AGI will be able to figure out why people tweet

----

Game-playing AI.\\
Create multiple clones of itself. Have the clones play against each other in leagues and tournaments. Observe the different personalities and styles of play that develop. Bring the learning of all the clones back into itself, delete the clones, and go defeat the humans.

Meta learning. Use one neural network to create and test multiple neural networks to pick the best one. https://arxiv.org/abs/1703.01041

Sense of self\\
Ego\\
Metaphysics

Tools\\
Tools become machines\\
Machines become intelligent\\
Child supersedes the father

Mimic the brain

Mimic natural selection

As of 2016, AlphaGo's algorithm uses a combination of machine learning and tree search techniques, combined with extensive training, both from human and computer play. It uses Monte Carlo tree search, guided by a "value network" and a "policy network," both implemented using deep neural network technology. A limited amount of game-specific feature detection pre-processing (for example, to highlight whether a move matches a nakade pattern) is applied to the input before it is sent to the neural networks. (from Wikipedia: AlphaGo)

In uncharted territory, where one would expect learning to be most beneficial, an agent must be able to learn from its own experience.
An endless proliferation of 3D environments: In the past couple of years a number of new large-scale AI-training environments have been released, ranging from Microsoft's Minecraft-based Malmo to DeepMind's Quake-based "DeepMind Lab" to the Doom-based VizDoom.

three types of machine learning:
  * supervised
  * unsupervised
  * reinforcement

applications:
  * robotics
  * game playing
  * operations management in a factory

agent in environment
  * reward
  * state
  * action
  * explore (new territory or strategy)
  * exploit (what it already knows)

Markov Decision Process
  * goal: maximize reward over time
  * policy: the agent's behavior function, chosen to optimize reward over time
  * value function: how good is each state and/or action
  * model: the agent's representation of the environment

q-learning
  * r(s,a): immediate reward
  * Q(s,a): values
  * features defined by an expert, vs deep learning to discover features on its own

https://deepmind.com/blog/deep-reinforcement-learning/

https://en.wikipedia.org/wiki/Markov_decision_process#Reinforcement_learning

Kurzweil, Ray (2005). The Singularity is Near. New York: Penguin Group. ISBN 9780715635612.

Is there some kind of thinking that is not programmed?

----

evolutionary algorithm, aka genetic algorithm (strictly, a genetic algorithm is one kind of evolutionary algorithm)

3 parts; loop through these three steps:
  * selection
  * crossover
  * mutation

population breeds, fittest survive

genome - something we want to breed and improve over time; it has multiple genes\\
gene = parameter

  * selection: a fitness function decides which genomes can breed
  * crossover: breeding offspring, a new population
  * mutation: mutate the new population; the mutation function randomly modifies parameters

since the 80s\\
alternative to back propagation\\
alternative to reinforcement learning

neural network zoo

"evolution strategies as an alternative to reinforcement learning"

----

classification of environments

Deterministicness (deterministic, stochastic, or non-deterministic): An environment is deterministic if the next state is perfectly predictable given knowledge of the previous state and the agent's action.
An environment is "strategic" if it is deterministic except for the actions of one other agent, as in a chess game.

Staticness (static or dynamic): Static environments do not change while the agent deliberates.

Observability (full or partial): A fully observable environment is one in which the agent has access to all information in the environment relevant to its task.

Agency (single or multiple): If there is at least one other agent in the environment, it is a multi-agent environment. Other agents might be apathetic, cooperative, or competitive.

Knowledge (known or unknown): An environment is considered to be "known" if the agent understands the laws that govern the environment's behavior. For example, in chess, the agent would know that when a piece is "taken" it is removed from the game. On a street, the agent might know that when it rains, the streets get slippery.

Episodicness (episodic or sequential): Sequential environments require memory of past actions to determine the next best action. Episodic environments are a series of one-shot actions, and only the current (or recent) percept is relevant. An AI that looks at radiology images to determine if there is a sickness is an example of an agent in an episodic environment. One image has nothing to do with the next.

Discreteness (discrete or continuous): A discrete environment has fixed locations or time intervals. A continuous environment could be measured quantitatively to any level of precision.

Simulated: a separate program is used to simulate an environment, feed percepts to agents, evaluate performance, etc.
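
The selection / crossover / mutation loop from the evolutionary algorithm notes earlier can be sketched roughly as below. This is only a toy illustration, not any particular library's method: the fitness function, the population sizes, the mutation rate, and the choice to carry the surviving parents into the next generation unchanged (elitism) are all assumptions made for the example.

```python
import random


def fitness(genome):
    """Hypothetical fitness function: higher is better, best when every gene is 0.5."""
    return -sum((gene - 0.5) ** 2 for gene in genome)


def select(population, keep):
    """Selection: the fittest genomes survive and are allowed to breed."""
    return sorted(population, key=fitness, reverse=True)[:keep]


def crossover(parent_a, parent_b, rng):
    """Crossover: each gene of the child is copied from one of the two parents."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]


def mutate(genome, rng, rate=0.1, scale=0.1):
    """Mutation: randomly nudge some of the parameters (genes)."""
    return [
        gene + rng.gauss(0.0, scale) if rng.random() < rate else gene
        for gene in genome
    ]


def evolve(generations=50, pop_size=30, genes=4, keep=10, seed=0):
    """Loop through selection, crossover, mutation for a number of generations."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genes)] for _ in range(pop_size)]
    initial_best = max(fitness(genome) for genome in population)
    for _ in range(generations):
        parents = select(population, keep)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - keep)
        ]
        # Assumption: parents are kept as-is, so the best fitness never gets worse.
        population = parents + children
    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
print("best fitness at start:", initial_best)
print("best fitness at end:  ", fitness(best))
```

Here a genome is just a list of 4 numbers (genes = parameters). In the neural network setting the genome would instead hold the network's weights or hyperparameters, and the fitness function would score how well that network performs.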
----

feed forward NN - input, hidden, and output layers, with two weight matrices between them\\
input times weight, plus bias, activate

RNN - for sequential data, like video or audio\\
a third weight matrix connects the hidden layer back to itself\\
vanishing gradient problem\\
exploding gradient problem\\
an RNN cell can be replaced with an LSTM cell\\
x * weight plus bias, activate

common activation functions:
  * sigmoid
  * tanh
  * ReLU

RNNs are used for translation and question-answer chatbots\\
an RNN cell remembers its previous state

blog comments: classification by sentiment: positive, negative, neutral
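
The "input times weight, plus bias, activate" recipe for the feed forward NN, and the extra hidden-to-hidden weight matrix of the RNN, can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the layer sizes, the toy weight values, and the function names (`dense`, `feed_forward`, `rnn_step`) are all made up for illustration, and nothing here is trained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One layer: for each neuron, sum(input * weight) + bias, then activate."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input -> hidden -> output, using the two weight matrices in between."""
    hidden = dense(x, w_hidden, b_hidden, relu)
    return dense(hidden, w_out, b_out, sigmoid)


def rnn_step(x, h_prev, w_in, w_rec, biases):
    """One RNN time step: w_rec (the third weight matrix) feeds the previous
    hidden state back into the hidden layer."""
    return [
        math.tanh(
            sum(xi * w for xi, w in zip(x, w_in_row))
            + sum(hi * w for hi, w in zip(h_prev, w_rec_row))
            + b
        )
        for w_in_row, w_rec_row, b in zip(w_in, w_rec, biases)
    ]


# Toy, hand-picked weights (assumptions, not trained values).
W_HIDDEN = [[0.5, -0.5], [1.0, 1.0]]  # 2 inputs -> 2 hidden neurons
B_HIDDEN = [0.0, -1.0]
W_OUT = [[1.0, -1.0]]                 # 2 hidden neurons -> 1 output
B_OUT = [0.0]

output = feed_forward([1.0, 2.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print("feed forward output:", output)

W_IN = [[1.0], [0.5]]                 # 1 input -> 2 hidden neurons
W_REC = [[0.5, 0.0], [0.0, 0.5]]      # hidden layer back to itself
B_REC = [0.0, 0.0]

h = [0.0, 0.0]
for x_t in [[1.0], [0.0], [0.0]]:     # one non-zero input, then silence
    h = rnn_step(x_t, h, W_IN, W_REC, B_REC)
print("hidden state after the sequence:", h)
```

The hidden state stays non-zero after the inputs go to zero, which is the sense in which an RNN cell "remembers" its previous state. It also shrinks at every step, a small hint of why gradients can vanish over long sequences.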