Tools for Practitioners of Machine Learning and A.I.
Jupyter Notebooks Are Essential
Jupyter Notebooks come in many forms. Anaconda installs a local version you can run on your PC, and several cloud-based versions, both freemium and entirely free, are widely used. The table below outlines the main options.
If you work on A.I. and Machine Learning projects in Data Science and you program in Python, you have almost certainly run into or used Jupyter Notebooks. Code examples in the Jupyter notebook format (‘.ipynb’) are everywhere on the web: in communities like Kaggle and GitHub, and in ‘Towards Data Science’ articles on Medium. You just can’t miss it.
Jupyter notebooks grew out of the IPython project and were spun out by Fernando Pérez into Project Jupyter. And yes, we are grateful. Jupyter supports numerous languages, such as Julia, R, Haskell, Ruby and my favorite, Python.
Execution environments, or ‘kernels’, exist for all the languages mentioned above, and notebooks are rendered directly on GitHub. That very popular source-code-management site has had some notebook-rendering issues of late, but they promise a resolution soon. Kaggle also makes heavy use of notebooks on its popular data science competition site.
What is absolutely amazing about Jupyter Notebooks is that a whole project can live inside the notebook: the code, its documentation, inline visualizations, O/S directives and code-based package installation. In most cases, if you post a notebook in your GitHub repo and build directories for the data, forking the entire project becomes easy and repeatable for others.
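Part of why whole projects travel so well is that a ‘.ipynb’ file is plain JSON: code, Markdown documentation and cell outputs all sit side by side as "cells" in one file. A minimal sketch of that structure (the file name `demo.ipynb` is just an example):

```python
import json

# A minimal, valid notebook: one Markdown cell of docs, one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# My project\n", "The documentation lives next to the code.\n"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print('hello')\n"]},
    ],
}

# Write it out -- this file opens in Jupyter like any other notebook.
with open("demo.ipynb", "w") as f:
    json.dump(notebook, f)

# Read it back and inspect the cell types.
with open("demo.ipynb") as f:
    cells = json.load(f)["cells"]

print([c["cell_type"] for c in cells])  # ['markdown', 'code']
```

Everything else a notebook carries, including inline plots, is stored the same way, as output entries inside those cells.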
It does not take long to learn the nomenclature and user experience of Jupyter Notebooks. You will find yourself palming your forehead when you think of all the clunky IDEs you once used just to build a simple machine-learning project with visuals and share it with someone else.
You can now post live notebooks on several cloud sites:
| Site | Description |
| --- | --- |
| Binder | Binds and runs notebooks linked back to GitHub. You can interact with and change a notebook this way, but the changes are not written back to the originating gist on GitHub. |
| Kaggle | A data science competition community, now owned by Google. Members post ‘Kernels’, notebooks containing their solution to a data science problem, and strive to reach #1 on the competition leaderboard. It is also a dataset repository and a great place to learn from others’ code. |
| Colaboratory | A notebook-centric cloud IDE for coding and testing data science projects. It connects to your Google Drive, which can be mounted inside a currently active Jupyter notebook, and a lot more. Numerous packages come pre-installed and are accessible from your notebook. |
| MS Azure Notebooks | Part of MS Azure Cloud – Machine Learning Studio. Freemium, with a low threshold before it converts to a paid account. |
| CoCalc | A complete rewrite of the Jupyter interface, hosted in the cloud. Freemium plans, with access to large Data Science libraries and packages. |
| Datalore | From JetBrains, maker of PyCharm (which I use). Seems to be in beta and not yet ready for prime time. |
Sampling of the Jupyter Notebook Feature Set
The animated demos in the ‘Features’ tab cover:

- Top-nav commands
- Launching from the command line (bash shell)
- Code vs. Markdown cells
- Running cells
- Clearing a cell
- Clearing all cells
- Auto-complete
Jupyter is pretty easy to install. You have a couple of choices:
- Anaconda comes with Jupyter Notebooks
- Install from Jupyter.org
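For the non-Anaconda route, a typical pip-based install looks like this sketch (assuming Python and pip are already on your PATH):

```shell
# Install the classic Jupyter Notebook via pip, per jupyter.org
pip install notebook

# Confirm the install worked
jupyter --version
```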
The gifs in the ‘Features’ tab show how this is done.
You launch from either the Anaconda Navigator or command line terminal.
When launching from the command line, if you have already installed Jupyter, just run `jupyter notebook`.
It’s worth mentioning that file paths are a little tricky. The directory you launch Jupyter from determines the hierarchy you can navigate inside the base HTML page. It helps to change to your user’s base directory (or ‘folder’ in OS X) and then run the command-line invocation. When the web page launches, you are in position to navigate down from there; it’s hard to go up in the tree.
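In practice the launch sequence looks like this sketch; the notebook file browser will be rooted wherever you ran the command:

```shell
# Change to your home directory first so your whole tree is reachable,
# then start the server (by default it serves on http://localhost:8888)
cd "$HOME"
jupyter notebook
```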
Also, by using the Anaconda distribution, you get a ‘base’ virtual environment with everything in the distribution that you need to start doing Data Science. The packages and libraries are all there, and there is very little to install. You can add packages by hand, but it is good practice to do it from inside the ‘base’ virtual environment that Anaconda sets up. Everything you install there will then be visible inside the Anaconda Navigator and can be managed from there (if you like visual interfaces).
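A sketch of that workflow on the conda command line (the package name here, seaborn, is just an example):

```shell
# Install a package directly into the "base" environment that Anaconda
# creates; it will then also be visible and manageable in Anaconda Navigator
conda install -n base -y seaborn

# List environments; the currently active one is marked with "*"
conda env list
```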
In future tool posts, we will discuss how all of this can be done with containers, in case you are working in the cloud on your VM instances, or if you just like to keep things tidy for replicating whole environments.
This one is not brand new, but it has 150k views on YouTube, so check it out.
Video Demos | ‘In-Depth’