Episode Summary:
This particular podcast is the initial episode in a new special series of episodes designed to provide commentary on a new book that I am in the process of writing. In this episode we discuss books, software, courses, and podcasts designed to help you become a machine learning expert!
Show Notes:
Hello everyone! Welcome to the 78th podcast in the podcast series Learning Machines 101. In this series of podcasts my goal is to discuss important concepts of artificial intelligence and machine learning in a hopefully entertaining and educational manner. This particular podcast is the initial episode in a new special series of episodes designed to provide commentary on a new book that I am in the process of writing. To identify this special series of episodes about my new book, I am introducing a prefix to the title of each episode which indexes a particular book chapter. This episode is a commentary on the book’s preface, so I am using the notation LM101-078: Ch0 as a prefix. In this first episode, we discuss possible ways one can become a machine learning expert.
First, it should be emphasized that the ability to perfectly implement a machine learning algorithm in software is not enough to make you an expert in machine learning. Unlike the typical type of algorithm taught in Computer Science courses, there is an experimental component to machine learning. If you implement a sorting algorithm such as the Bubble Sort Algorithm, debug it, and get it correctly running in computer software, then the algorithm will immediately work. It can be used to sort any collection of comparable objects regardless of the content of those objects, and its efficiency will not vary regardless of whether you are sorting magazine article titles, recipe titles, book titles, or song titles.
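To make the contrast concrete, here is a minimal Python sketch of the Bubble Sort Algorithm. Once the code is correct, its behavior is fully determined: it works identically on any list of comparable items, with no training data and no experimental component.

```python
def bubble_sort(items):
    """Sort a list in place by repeatedly swapping adjacent
    out-of-order elements. The content of the items is irrelevant:
    any comparable objects (titles, numbers, dates) will do."""
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # no swaps means the list is already sorted
            break
    return items

print(bubble_sort(["Thriller", "Abbey Road", "Nevermind"]))
# ['Abbey Road', 'Nevermind', 'Thriller']
```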
This is entirely different from the typical machine learning algorithm. The typical machine learning algorithm looks for statistical regularities in the data. This means that you might correctly write a computer program to implement a machine learning algorithm: the software code is flawless and runs without bugs. Now you are going to try it out on a collection of recipe titles. Perhaps you will train the machine learning algorithm on a collection of recipes and then give it a new recipe that it has never seen before, asking it to find the recipes which are most similar to that new recipe. It would not be surprising if your perfectly programmed machine learning algorithm totally fails this task.
On the other hand, if you take the exact same software program which failed on the recipe task and apply it to a different data set, you may be surprised. For example, suppose you train that same machine learning algorithm on a collection of songs and then give it a new song that it has never seen before, asking the algorithm to find the songs which are most similar to that new song. You run this experiment using the exact same machine learning software which failed on the recipe task, and to your great surprise the machine learning algorithm works perfectly!
How could this be? The exact same software works perfectly for one data set and fails miserably on another. The reason is that machine learning algorithms extract statistical regularities from data sets, and different machine learning algorithms focus on different statistical regularities.
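Here is a minimal sketch of the kind of similarity search just described, using scikit-learn (one of the toolkits listed below). The recipe titles are hypothetical, chosen only for illustration. Whether such a system succeeds depends entirely on whether the statistical regularity it exploits, in this case shared word usage, is actually informative for the data set at hand.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical training collection of recipe titles.
recipes = [
    "spicy chicken tacos",
    "chocolate chip cookies",
    "grilled chicken salad",
    "vanilla ice cream sundae",
]

# Represent each title by its word-usage statistics (TF-IDF weights).
vectorizer = TfidfVectorizer()
recipe_vectors = vectorizer.fit_transform(recipes)

# A new recipe title the algorithm has never seen before.
query = vectorizer.transform(["baked chicken wings"])

# Rank the stored recipes by cosine similarity to the new one.
scores = cosine_similarity(query, recipe_vectors).ravel()
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.2f}  {recipes[idx]}")
```

The code is bug-free, yet it can still fail: if similar recipes rarely share words, then word overlap is the wrong regularity to exploit.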
And so the bottom line is that learning to use machine learning software, or even learning to implement machine learning software, is not sufficient to become a machine learning expert. Having excellent computer programming skills and the ability to debug computer programs is just one of the many basic skills required to become a machine learning expert. In addition, you must have skills for selecting, analyzing, designing, testing, and evaluating machine learning algorithms. Furthermore, as an expert it is your responsibility to clearly communicate to the other members of your team how to implement the algorithms and to explain how the algorithms work. So communication skills are also fundamentally important.
As you obtain more experience with actually using machine learning algorithms and writing software to implement them, it is necessary for you to also obtain a deeper understanding of the machine learning algorithms which you are using. In particular, it is important to understand, for each machine learning algorithm architecture, what types of statistical regularities in the data that architecture is designed to detect. A good understanding of the type of data set you are using, combined with a good understanding of the statistical regularities a machine learning algorithm is designed to detect, helps enormously in making good decisions regarding which machine learning algorithms to employ for a particular task, how to modify existing algorithms or create novel ones to improve performance, and how to evaluate those algorithms. Your communication skills will also improve, since it is easier to communicate complex ideas when you understand them better.
The first step of your journey towards becoming a machine learning expert involves checking out some of the excellent machine learning software development environments and toolkits that are readily available. Many of these tools are available for free. A list of some of the more important software development environments and toolkits is provided in the show notes of this episode. Just to whet your appetite, I will briefly list them now; a small example of what using one of these toolkits looks like follows the lists.
Software Development Environments for Machine Learning include:
- MATLAB Software Development Environment (www.mathworks.com)
- R Studio Software Development Environment (www.rstudio.com)
- Julia Programming Language (www.julialang.org)
- Python Programming Language (www.python.org)
- H2O (https://www.h2o.ai)
Machine Learning Software Frameworks include:
- TensorFlow (www.tensorflow.org)
- Microsoft Cognitive Toolkit (https://docs.microsoft.com/en-us/cognitive-toolkit/)
- Scikit-Learn Python Toolkit (https://scikit-learn.org/stable/)
- PyTorch (https://pytorch.org)
- Keras: The Python Deep Learning Library (https://keras.io/)
- Gym (https://gym.openai.com) [reinforcement learning algorithms]
- Theano (Python Toolkit) (http://deeplearning.net/software/theano/)
- FastAi V1 for PyTorch (https://www.fast.ai)
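As promised above, here is a minimal sketch of the train-then-evaluate workflow that all of these toolkits support in one form or another, shown here with scikit-learn on one of its small built-in data sets.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in data set and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a classifier on the training data only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on data the algorithm has never seen before.
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```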
In addition, there are many useful software-oriented machine learning textbooks which support the development and evaluation of a wide range of machine learning architectures in computer software. Such textbooks can greatly accelerate your ability to learn specific machine learning software development environments for the purposes of developing and evaluating machine learning algorithms.
Here is a short list of books which might be helpful. This is not intended to be an all-inclusive list of great resources but rather just a few examples of the types of resources that someone interested in getting into machine learning should consider. I may have unintentionally omitted other great books and resources. The book list is as follows.
Software-Oriented Machine Learning Books:
- An Introduction to Statistical Learning: with Applications in R by James, Witten, Hastie, and Tibshirani
- Hands-on Machine Learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems by Geron
- Introduction to Machine Learning with Python: A guide for data scientists by Muller and Guido
- Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2 by Raschka and Mirjalili
Books such as these are a good place to start. They introduce the basic concepts of machine learning and discuss practical computer software that can be immediately applied to real-world machine learning problems. In addition, they provide important insights into the strengths and weaknesses of various algorithms. These insights help you not only select the correct machine learning algorithm for a particular problem but also properly evaluate it. Some of these books also provide insights into data cleaning, which is important for preparing your data sets for processing by both classical learning machines and deep learning machines. Some of these books also provide insights into the machine learning development and deployment process. If you are listening to this podcast and have some computer programming experience, then you should have no difficulty mastering the material in any of these books. Virtually no math background is required to read them, although they do contain some mathematics and vary in how much they contain.
Getting your hands dirty and obtaining practical experience with applying machine learning algorithms to real-world data sets is essential. You can’t move forward in machine learning until you have actually spent a lot of time experimenting with lots of machine learning algorithms. While you are experimenting with machine learning software, it will be helpful if you check out the free series of lectures by Stanford Professor Andrew Ng. A hyperlink to his machine learning lecture series can be found in the show notes. Also, don’t forget the Learning Machines 101 podcasts! You can listen to the podcasts or simply go to my website www.learningmachines101.com and read the podcasts for free. One advantage to going to my website is that I provide additional supplemental resources in the show notes of each podcast. Finally, check out the free MOOCS at fast.ai which include courses such as “Practical Deep Learning for Coders”, “Introduction to Machine Learning for Coders”, and “Computational Linear Algebra”.
Note that acquiring expertise in applying machine learning algorithms to real data sets does not occur overnight. I would estimate that to get beyond the novice level you will probably require at least six months to a year of experience in machine learning algorithm development and evaluation, in addition to checking out Professor Ng’s lectures and my podcast lectures and working through many parts of at least one of the four software-oriented books I previously mentioned. At that point, you should be ready to apply for entry-level positions in data science and machine learning.
Ultimately, you will want to acquire a deeper knowledge of various types of machine learning algorithms. This requires, at a minimum, knowledge of lower-division undergraduate linear algebra and upper-division undergraduate calculus-based probability theory. These courses are usually required in most computer science, engineering, or mathematics programs, so it’s quite possible that you have already taken them. However, if you have not taken these courses before, or you took them a long time ago and forgot their contents, do not despair!
Free MIT Online Courses
MIT OpenCourseWare offers the FREE upper-division undergraduate calculus-based probability theory course “Probabilistic Systems Analysis and Applied Probability” taught by Dr. Tsitsiklis.
MIT OpenCourseWare also offers the FREE lower-division undergraduate linear algebra course “Linear Algebra” taught by Dr. Strang.
If you haven’t had the equivalent of these free courses and you want to go beyond the novice level in machine learning, then it is absolutely necessary that you acquire the knowledge in these two courses in order to move forward. I provide hyperlinks to these courses and the textbooks used in these courses in the show notes of this episode. You don’t have to watch these videos; another strategy is to simply study the textbooks used in these courses or any equivalent undergraduate texts covering the topics of linear algebra and calculus-based probability theory.
Intermediate Level Machine Learning Books
Once you have obtained some background in linear algebra and calculus-based probability theory you should be ready to move into the following intermediate-level machine learning textbooks.
My three current favorite machine learning textbooks, which I frequently use for reference purposes, are the following.
- Pattern Recognition and Machine Learning by Bishop
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
- Deep Learning by Goodfellow, Bengio, and Courville
All three of these books are available for free on the web, and I provide hyperlinks in the show notes to help you find the free versions. Note that these books do require some mathematics background, but that background is relatively minor. Assuming you have taken the equivalent of lower-division linear algebra and upper-division calculus-based probability theory, you should have no difficulty understanding the basic ideas of the topics covered in these books, although you may struggle with some sections and you may not perceive nuances and technical details which are tersely mentioned. This is fine: if you can get the basic ideas of approximately 60% of the material in one of these intermediate-level books, you will be in good shape! Don’t try for 100%!
Mastery of about 60% of the material in the three books by Bishop, Hastie et al., and Goodfellow et al., plus at least one or two years of machine learning algorithm development and evaluation experience using real-world data, moves you to the intermediate level.
If you understand most of the basic ideas in these intermediate level machine learning books and have at least 5 years of experience in machine learning software algorithm development and evaluation, you may consider yourself to be a machine learning expert. Note that the field of machine learning is so broad that these days every machine learning expert will have deep knowledge in some areas yet shallow knowledge in other areas. To become a machine learning expert you should acquire deep knowledge in at least several areas of machine learning.
Suppose, however, you want to become an elite machine learning expert. Then you will need to further increase your expertise in machine learning. There are many ways to do this: for example, obtaining even more experience in machine learning software development and evaluation with a much broader range of algorithms, or obtaining greater in-depth theoretical knowledge of why various machine learning algorithms work.
The remainder of this podcast will focus upon just one of many possible ways to increase your expertise in machine learning. This path will provide you with powerful tools and an exceptionally deep understanding of machine learning algorithms that is not readily available to others. It is based upon the following observation: the books by Bishop, Hastie et al., and Goodfellow et al. contain some mathematics, but they omit many important specific mathematical details.
The deliberate omission of mathematical details in intermediate-level machine learning textbooks has important advantages. Mathematical details can be distracting and can make it very difficult to absorb the really important concepts. The books by Bishop, Hastie et al., and Goodfellow et al. focus on the really important concepts which are necessary to immediately move beyond the novice machine learning level. Also, if you have a PhD in mathematical statistics, computer science, electrical engineering, or econometrics, then you really don’t need those mathematical details, since you can pick them up by reading research papers and graduate textbooks in the areas of nonlinear optimization theory and mathematical statistics which go through the mathematical theory in detail.
Still, if you want to become an elite machine learning expert, then details matter and can add considerable value to your work. Mastery of technical details supports the communication, analysis, design, and evaluation of machine learning algorithms. In fact, it can also provide insights into software development, because the mathematical details can reveal how two algorithms that appear on the surface to be quite different are actually special cases of a larger class of algorithms.
Technical details are best presented as theorems: given that certain conditions are true, the algorithm will have a particular behavior. Statements such as “under very general conditions the algorithm will work most of the time” are vague and not acceptable. They are less useful for supporting the analysis and design of entirely novel machine learning algorithms, and less useful for specifying robust methods for algorithm development and evaluation.
How the theorems are written is also extremely important. If the conditions of a theorem cannot be verified in practice or are impossible to understand, then the theorem is not very useful. The conclusion of the theorem also needs to be useful and helpful in practical machine learning algorithm analysis and design problems.
Let me provide you with some examples.
Batch learning and adaptive gradient descent learning methods are widely used in machine learning. Discussions of batch and adaptive gradient descent learning methods may be found in Episode 16, Episode 63, and Episode 68 of Learning Machines 101. However, there are virtually no textbooks available that provide mathematical theorems, cookbook procedures, and explicit examples of how to apply those theorems to determine whether a specific batch or adaptive machine learning algorithm will converge to a designated set of solutions on a multimodal nonconvex objective function.
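To ground the terminology, here is a minimal sketch of batch gradient descent on a least-squares empirical risk with a crude gradient-norm stopping rule. The data is synthetic and the stopping rule is purely heuristic; the convergence theorems discussed here are precisely what tell you when such a stopping rule is actually justified, especially on nonconvex objective functions where this simple picture breaks down.

```python
import numpy as np

# Synthetic regression data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

def batch_gradient_descent(X, y, lr=0.1, tol=1e-6, max_iters=10_000):
    """Minimize the least-squares empirical risk by batch gradient
    descent, stopping when the gradient norm falls below tol."""
    w = np.zeros(X.shape[1])
    for t in range(max_iters):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the risk
        if np.linalg.norm(grad) < tol:      # heuristic convergence check
            break
        w -= lr * grad                      # batch update: uses all data
    return w, t

w_hat, iterations = batch_gradient_descent(X, y)
print(f"stopped after {iterations} iterations: {w_hat}")
```

An adaptive (stochastic) variant would instead update w using one observation at a time with a decreasing step size; the conditions under which that variant converges are exactly the kind of detail the theorems make precise.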
In addition, gradient descent methods require that matrix derivatives be computed, but most machine learning books do not explain how to take the derivative of a complicated objective function with respect to its parameters. That is, matrix calculus methods, theorems, and cookbook procedures are not discussed. True, many software development environments such as TensorFlow contain automatic matrix differentiation routines, and one might ask why anyone should learn matrix calculus when such routines are readily available. But these routines are limited and can only be applied to specific network architectures. There are many network architectures which have not yet been invented for which these routines are not applicable.
In addition, in order to use automatic matrix differentiation routines properly, it is necessary to have expertise in matrix differentiation. It’s true that we use calculators to add numbers together, but that doesn’t mean the ability to add two numbers should become an arcane skill. If an addition calculation is important, it is always helpful to check that you are using the calculator correctly by comparing the calculator’s results to your hand calculations.
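The same checking habit applies to derivatives. Below is a minimal sketch of a standard gradient check: compare a hand-derived matrix derivative against a central finite-difference approximation. The least-squares risk used here is just a stand-in objective function.

```python
import numpy as np

def risk(w, X, y):
    """Least-squares empirical risk (stand-in objective function)."""
    return 0.5 * np.mean((X @ w - y) ** 2)

def analytic_grad(w, X, y):
    """Hand-derived matrix derivative of the risk with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def numerical_grad(w, X, y, eps=1e-6):
    """Central finite differences: the 'hand calculation' used to
    check a derived (or automatic) gradient."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (risk(w + e, X, y) - risk(w - e, X, y)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
X, y, w = rng.normal(size=(50, 3)), rng.normal(size=50), rng.normal(size=3)
error = np.max(np.abs(analytic_grad(w, X, y) - numerical_grad(w, X, y)))
print(f"max gradient discrepancy: {error:.2e}")  # should be tiny
```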
Another important topic is Monte Carlo Markov Chain methods. These methods are discussed in Episode 39, Episode 42, and Episode 43 of Learning Machines 101. Although such methods are widely used, explicit theorems for checking the convergence of Monte Carlo Markov Chain algorithms are not provided in introductory machine learning books. Most books will mention many of the conditions in an informal way but don’t explicitly emphasize when those conditions are sufficient, and when they are not sufficient, to ensure convergence. For example, the conditions for convergence of a Monte Carlo Markov chain on a finite state space are much simpler than the conditions for convergence on an uncountably infinite state space. Careful discussions of these convergence conditions are rarely provided in intermediate-level machine learning textbooks.
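For concreteness, here is a minimal sketch of a random-walk Metropolis sampler, one of the simplest Monte Carlo Markov Chain algorithms, together with an informal multi-chain sanity check. Running several chains from dispersed starting points and comparing their summary statistics is only a heuristic diagnostic; the explicit convergence theorems discussed above are what turn this kind of check into a guarantee.

```python
import numpy as np

def metropolis(log_density, n_steps, x0, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        proposal = x + step * rng.normal()
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < log_density(proposal) - log_density(x):
            x = proposal
        chain.append(x)
    return np.array(chain)

# Standard normal target density (up to an additive constant).
log_p = lambda x: -0.5 * x ** 2

# Run three chains from dispersed starting points; if their post-burn-in
# means disagree substantially, the chains have not converged.
chains = [metropolis(log_p, 5000, x0, seed=s)
          for s, x0 in enumerate([-5.0, 0.0, 5.0])]
print([f"{c[1000:].mean():+.2f}" for c in chains])  # all should be near 0
```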
And finally, most machine learning books do not provide explicit theorems for investigating generalization performance. For example, how does the prediction error of a machine learning algorithm on novel test data change as more data is collected? It would be very desirable to have explicit theorems which allow one to estimate the generalization performance of your model on a test data set using only training data. Cross-validation type simulation methods, which estimate performance on novel data sets from training data alone, are widely used in machine learning. But despite their wide use, simple formulas which accomplish the same estimation are rarely used. Given the heavy computational burden associated with cross-validation type simulation methods, it would be very helpful to have simple formulas which give similar results using a tiny fraction of the computational resources.
Such formulas could be used to design procedures for comparing competing models, calculating confidence levels on predictions, and determining what aspects of a network architecture are relevant. These ideas are widely used in classical statistics for models such as linear regression and logistic regression but there does not exist a thoughtful discussion in the existing scientific literature regarding the explicit conditions for these methods to work for multimodal empirical risk functions which have multiple local and multiple global minimizers.
For example, what are the explicit conditions that ensure the validity of the model selection criteria AIC and BIC, which were informally discussed in Episode 76 and Episode 77 of Learning Machines 101?
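As a reminder of what such simple formulas look like, here is a sketch of the classical AIC and BIC computations. The maximized log-likelihoods and parameter counts below are hypothetical numbers, included only to show the mechanics; the open question raised above is when formulas of this kind remain valid for multimodal empirical risk functions.

```python
import numpy as np

def aic_bic(log_likelihood, k, n):
    """Classical model selection criteria: penalized estimates of
    out-of-sample fit computed from training data alone.
    k = number of free parameters, n = number of observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    bic = -2.0 * log_likelihood + k * np.log(n)
    return aic, bic

# Hypothetical fitted models: (name, maximized log-likelihood, k).
n = 500
for name, ll, k in [("small model", -712.4, 3), ("big model", -705.9, 12)]:
    aic, bic = aic_bic(ll, k, n)
    print(f"{name}: AIC = {aic:.1f}, BIC = {bic:.1f}")  # lower is better
```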
So, for this reason, I have been working on an advanced machine learning textbook that addresses all of the above issues. In particular, this book explicitly provides theorems and cookbook procedures for checking if your adaptive or batch learning gradient descent algorithm converges, theorems and cookbook procedures for computing derivatives for the purpose of deriving novel adaptive or batch learning gradient descent algorithms, theorems and cookbook procedures for checking convergence of Monte Carlo Markov Chain algorithms, and theorems and cookbook procedures for characterizing the generalization performance of highly nonlinear machine learning network architectures.
In my book, I have invested considerable effort into showing how a small number of theorems can be applied to a very broad range of widely used machine learning algorithm architectures. These theorems have been custom-designed to support machine learning applications. They possess easily verifiable and interpretable assumptions together with practically relevant conclusions.
I have tried to keep the number of theorems small and spend a lot of time providing explicit examples of how to apply those theorems in practical applications. I’ve actually been working on this book in my spare time for more than a decade, and the concepts in the book represent the culmination of my research and teaching activities in the area of deep learning for more than thirty years. I’ve been working in this area for decades, long before the recent great increase in interest in machine learning and deep learning methods.
The book’s title is “Statistical Machine Learning: A Unified Framework” and here is an excerpt from the preface that provides a general overview of the book. “The recent rapid growth in the variety and complexity of new machine learning architectures requires the development of improved methods for designing, analyzing, evaluating, and communicating machine learning technologies. This mathematics textbook provides students, engineers, and scientists with tools from mathematical statistics and nonlinear optimization theory to become experts in the field of machine learning. In particular, the material in this text directly supports the mathematical analysis and design of old, new, and not-yet-invented nonlinear high-dimensional machine learning algorithms.”
I designed this mathematics textbook for students pursuing their masters degree or doctorate degree in statistics, computer science, electrical engineering, or applied mathematics. However, the unique feature of the book is that it should be accessible to highly motivated undergraduate students, professional engineers, and multidisciplinary scientists in fields such as cognitive science, computational neuroscience, econometrics, and mathematical psychology. The text is very self-contained and includes short sections on the relevant real analysis, linear algebra, matrix calculus, random vectors and random functions from a measure theory perspective, and stochastic convergence. These short sections were specifically written and designed to support the later chapters which involve the statement of key machine learning theorems.
The actual mathematical prerequisites for my book are very minimal. Only knowledge of lower-division linear algebra and upper-division calculus-based probability theory is required. This background can be obtained by studying the two free MIT online courses which I previously recommended. The hyperlinks to those courses are provided in the show notes.
In my textbook, I teach you strategically selected aspects of relevant elementary real analysis, measure-theory probability for representing random vectors consisting of discrete and continuous random variables, and stochastic convergence concepts. If you have the minimal prerequisites of linear algebra and calculus-based probability theory, you will find this book challenging yet quite straightforward and accessible to read. Again, I have tried to provide numerous examples and applications in machine learning so that the mathematical theory is directly connected to commonly encountered machine learning algorithms.
In subsequent episodes of this special book series nested within the standard Learning Machines 101 series, we will go through different parts of the book. The book which I am writing consists of 16 chapters so I’m planning on having 16 episodes following this episode.
My current plan for the next 16 months is as follows. Each podcast episode will begin with a 5-10 minute Chapter Overview. Then I plan to spend about 5-10 minutes explaining why you should be interested in that chapter, about 5 minutes providing guidance for students who are using the book for self-study on how to study that chapter productively, and a final 5 minutes providing instructors who are using the book in a course with some guidance on how to teach that chapter.
To help you begin your journey towards becoming a machine learning expert, I would like to share some commentary on quotes located at the end of the preface of my book “Statistical Machine Learning”. These quotes provide a helpful summary of principles and advice which I have found valuable in my multi-decade quest for understanding the nature of machine learning algorithms.
The first quote is by Thomas Alva Edison, the famous American inventor. Edison states: “Genius is 1% inspiration and 99% perspiration.” My paraphrase of this quote is that bringing novel creative ideas to fruition takes a lot of work. A corollary is that it is not easy to become an expert in machine learning. It will take a lot of work.
The second quote is by Miyamoto Musashi, the 17th Century Samurai warrior. Miyamoto states: “From one thing, know ten thousand things.” My paraphrase of this quote, in the context of machine learning algorithm analysis and design, is that the novice will see 10,000 entirely different machine learning algorithms, while the expert will see 10,000 special cases of one general machine learning algorithm. So, if you obtain mastery and complete understanding of just that one general machine learning algorithm, you instantly have that same mastery and understanding of thousands of specific machine learning algorithms!
The third quote is by Lao Tzu, the 6th Century BC Chinese Philosopher. Lao Tzu states: “There are many paths to enlightenment. Be sure to take one with a heart.” I attribute this to Lao Tzu, but I really got this from the late Professor David E. Rumelhart, who was one of the co-inventors of deep learning in the mid-1980s. When I was a postdoctoral scholar at Stanford University in David Rumelhart’s lab in the mid-1990s, he was fond of saying that “there are many paths to understanding.” When we go to school, knowledge is taught to us in a particular order. For example, first you learn algebra as a high school student, then analytic geometry, then Riemann calculus, and later, as a graduate student, you might learn Lebesgue integration and measure theory. This order of learning is partially hierarchical and partially arbitrary. Analytic geometry is not fundamentally easier than measure theory. Measure theory and Lebesgue integration are not more advanced than high school calculus. To make our educational system efficient, we create an arbitrary ordering of knowledge. This improves efficiency so that an instructor in Electrical Engineering can safely assume every graduate student has been exposed to the concept of a linear system, an instructor in Computer Science can safely assume that every graduate student is familiar with the concept of a finite state machine, and an instructor in Mathematics can assume that every graduate student has taken an undergraduate course in real analysis.
In summary, although the acquisition of expertise in a particular area is, of course, fundamentally sequential and hierarchical, some of this hierarchical structure is natural while other components are artificially created to facilitate teaching efficiency for large numbers of students. There are many paths to understanding! Follow either the traditional path or the non-traditional path, whichever gives you greater personal satisfaction and meaning in support of your own personal quests.
The fourth quote is from Gustave Flaubert, the 19th Century French Novelist, who states: “The good God is in the details.” In the context of mathematics, this means that when we get everything right, when we perfectly state the explicit assumptions that ensure an explicit conclusion holds, then at that moment we have reached Nirvana! This is actually an amazing moment of revelation! You have created a “true statement” which holds for hundreds of thousands of years into the future and ultimately lasts forever within the fictional universe you have created for that statement. Hopefully your fictional universe has something to do with reality!
The fifth and final quote is from Friedrich Nietzsche, 19th Century German Philosopher, who states that: “The devil is in the details.” This means, in the context of mathematics, that if we attempt to state explicit assumptions that ensure an explicit conclusion holds and we get everything perfect EXCEPT for one tiny detail then we basically have nothing. Our conclusion does not follow from our assumptions. Mathematics is an all-or-nothing deal: It’s either Nirvana or the Devil!! You can’t wave your hands. You’ve got to get everything right.
As I said before, my book is not yet published. That’s because I am still writing the book and fixing the remaining tiny details! Or, in other words, I’m continually looking for the Devil! The original plan was to publish the book sometime in the year 2021. UPDATE: the expected publication date is now May 2020! If you are a member of the Learning Machines 101 community, make sure to watch your email, because when the book is released I want to provide special discounts to members of the Learning Machines 101 community so you won’t have to pay full price for my book. In addition, I will send out more details about the availability of my book as we move closer to the publication date!
If you are not a member of the Learning Machines 101 community, you can join the community by visiting our website at: www.learningmachines101.com and you will have the opportunity to update your user profile at that time. This will allow me to keep in touch with you and keep you posted on the release of new podcasts and the availability of the book which I am currently writing. You can update your user profile when you receive the email newsletter by simply clicking on the: “Let us know what you want to hear” link!
In addition, I encourage you to join the Statistical Machine Learning Forum on LinkedIn. When the book is available, I will be posting information about the book, as well as information about book discounts, in that forum.
And don’t forget to follow us on TWITTER. The twitter handle for Learning Machines 101 is “lm101talk”! Also please visit us on ITUNES and leave a review. You can do this by going to the website: www.learningmachines101.com and then clicking on the ITUNES icon. This will be very helpful to this podcast! Thank you again so much for listening to this podcast and participating as a member of the Learning Machines 101 community!
Keywords: machine learning software, machine learning books, machine learning mathematics, teaching machine learning, statistical machine learning, tensorflow, pytorch, machine learning development environment
Software Oriented Machine Learning Books
- An Introduction to Statistical Learning: with Applications in R by James, Witten, Hastie, and Tibshirani
- Hands-on Machine Learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems by Geron
- Introduction to Machine Learning with Python: A guide for data scientists by Muller and Guido
- Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2 by Raschka and Mirjalili
Intermediate Level Books for Machine Learning
- Pattern Recognition and Machine Learning by Bishop
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
- Deep Learning by Goodfellow, Bengio, and Courville
Software Development Environments for Machine Learning
- MATLAB Software Development Environment (www.mathworks.com)
- R Studio Software Development Environment (www.rstudio.com)
- Julia Programming Language (www.julialang.org)
- Python Programming Language (www.python.org)
- H2O (https://www.h2o.ai)
Math Background to Progress to Intermediate Level
- Strang, Gilbert. Introduction to Linear Algebra. 5th ed. Wellesley-Cambridge Press, 2016. ISBN: 9780980232776.
- Bertsekas, Dimitri, and John Tsitsiklis. Introduction to Probability. 2nd ed. Athena Scientific, 2008. ISBN: 9781886529236.
Machine Learning Software Frameworks
- TensorFlow (www.tensorflow.org)
- Microsoft Cognitive Toolkit (https://docs.microsoft.com/en-us/cognitive-toolkit/)
- Scikit-Learn Python Toolkit (https://scikit-learn.org/stable/)
- PyTorch (https://pytorch.org)
- Keras: The Python Deep Learning Library (https://keras.io/)
- Gym (https://gym.openai.com) [reinforcement learning algorithms]
- Theano (Python Toolkit) (http://deeplearning.net/software/theano/)
- FastAi V1 for PyTorch (https://www.fast.ai)
Related Learning Machines 101 Podcasts:
- Episode 16, Episode 63, Episode 68: Batch and Adaptive Gradient Descent Learning
- Episode 39, Episode 42, Episode 43: Monte Carlo Markov Chains