{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Visualization and Learning in Julia\n",
"\n",
"Tom Breloff\n",
"\n",
"https://github.com/tbreloff"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Outline\n",
"- Background\n",
"- Julia packages\n",
"- Plots.jl\n",
"- Fun with data"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## My background\n",
"- BA Mathematics and Economics (U. of Rochester)\n",
"- MS Mathematics (NYU Courant Institute)\n",
"- Trader, researcher, quant, developer at several big banks and hedge funds, including one that I founded\n",
"- High-speed algorithmic arbitrage trading and market making\n",
"- Machine learning and visualization enthusiast\n",
"- Lifelong programmer (since learning BASIC in 4th grade)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Before Julia\n",
"- Python and C/C++\n",
"- MATLAB and Java (so many files!!)\n",
"- Throughout the years: Mathematica, Go, R, C#, JavaScript, Visual Basic/Excel, Lisp, Erlang, ..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Things I like\n",
"- Python\n",
"  - Solid packages\n",
"  - Easy to get stuff done\n",
"- C/C++\n",
"  - Fast (when you put in the effort)\n",
"- MATLAB\n",
"  - Great matrix operations\n",
"  - Easy visualizations\n",
"- Java\n",
"  - Hmmm...\n",
"  ```\n",
"  public static boolean DoTheFunctionNamesReallyNeedToBeLongerThanThatMaryPoppinsSong() {\n",
"      return true;\n",
"  }\n",
"  ```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Why Julia?\n",
"- Easy to code\n",
"- Fast with little effort\n",
"- Solid vector/matrix support, but more flexible\n",
"- Macros and staged functions\n",
"- So much more!\n",
"\n",
"(Slow clap...)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Julia's Package Ecosystem"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Top packages by stars\n",
"Package | GitHub Stars | 2-week change | Type\n",
"------ | -------- | ------------- | --------\n",
"Gadfly | 732 | 14 | Plotting\n",
"IJulia | 732 | 11 | Workflow\n",
"Mocha | 496 | 36 | Learning\n",
"DataFrames | 230 | 12 | Data Structures\n",
"PyCall | 204 | 4 | Language Wrapper\n",
"JuMP | 182 | 5 | Optimization\n",
"Escher | 135 | 10 | GUIs\n",
"Optim | 131 | 4 | Optimization\n",
"Morsel | 128 | -1 | Web (deprecated)\n",
"Distributions | 125 | 7 | Statistics\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Recent trends\n",
"Package | GitHub Stars | 2-week change | Type\n",
"------ | -------- | ------------- | --------\n",
"Mocha | 496 | 36 | Learning\n",
"Gadfly | 732 | 14 | Plotting\n",
"DataFrames | 230 | 12 | Data Structures\n",
"IJulia | 732 | 11 | Workflow\n",
"Escher | 135 | 10 | GUIs\n",
"Interact | 102 | 8 | GUIs\n",
"Distributions | 125 | 7 | Statistics\n",
"Plots | 23 | 6 | Plotting\n",
"Seismic | 7 | 6 | Plotting\n",
"Immerse | 23 | 5 | Plotting\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Statistics and Learning in Julia\n",
"- Stats (mostly in JuliaStats)\n",
"  - StatsBase\n",
"  - Distributions\n",
"  - DataFrames, DataArrays, NullableArrays\n",
"  - MultivariateStats, GLM\n",
"  - OnlineStats\n",
"  - many more..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Statistics and Learning in Julia\n",
"- Optimization (mostly in JuliaOpt)\n",
"  - MathProgBase\n",
"  - JuMP\n",
"  - Optim\n",
"  - Convex\n",
"  - NLopt"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Statistics and Learning in Julia\n",
"- Machine learning\n",
"  - Mocha\n",
"  - GeneticAlgorithms\n",
"  - Orchestra\n",
"  - TextAnalysis\n",
"  - Clustering\n",
"  - OnlineAI\n",
"  - many more..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Visualization in Julia\n",
"\n",
"Lots of packages: Gadfly, PyPlot, Vega, Winston, UnicodePlots, Qwt, Bokeh, Immerse, GLPlot, ...\n",
"\n",
"Strengths:\n",
"- Interactive: Immerse, PyPlot, Qwt\n",
"- Fast: GLPlot\n",
"- Easy/concise: UnicodePlots, Winston, Qwt\n",
"- Pretty: Gadfly, Vega, Bokeh\n",
"- Native: Gadfly, Winston, UnicodePlots\n",
"- Features: PyPlot\n",
"\n",
"Learning more than one or two packages is time-consuming and impractical...\n",
"\n",
"### Why do I have to choose one?!?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# What makes good code design?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Good design: AbstractArray\n",
"Many concrete array types:\n",
"- Dense arrays\n",
"- Sparse arrays\n",
"- Ranges\n",
"- Distributed arrays\n",
"- Shared arrays\n",
"- GPU arrays\n",
"- Custom data structures\n",
"\n",
"Common code is implemented once for AbstractArray, and all concrete types get the benefit."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"5-element ScaryVec:\n",
" 1      \n",
"  \"BOO!\"\n",
" 3      \n",
" 4      \n",
" 5      "
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a custom vector type: only `size` and `getindex` are defined by hand\n",
"# (the element type is declared as Int, yet index `boo` returns a String; hence \"scary\")\n",
"type ScaryVec <: AbstractArray{Int,1}\n",
"    boo::Int\n",
"    n::Int\n",
"    ScaryVec(n::Integer) = new(rand(1:n), n)\n",
"end\n",
"Base.size(sv::ScaryVec) = (sv.n,)\n",
"Base.getindex(sv::ScaryVec, i::Integer) = (i == sv.boo ? \"BOO!\" : i)\n",
"\n",
"sv = ScaryVec(5)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"4-element Array{Int64,1}:\n",
" 1\n",
" 3\n",
" 4\n",
" 5"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"filter(x -> isa(x, Number), sv)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Good design: AbstractArray\n",
"- Inheriting from AbstractArray gives you a lot \"for free\":\n",
"  - Iteration (`map`, `for x in ...`, `filter`, ...)\n",
"  - Operations\n",
"  - Printing\n",
"  - etc\n",
"- Few methods to implement... only what's needed.\n",
"- Abstractions put overlapping functionality in one place\n",
"  - Easy to code\n",
"  - Easy to maintain\n",
"\n",
"\n",
"### Imagine if there were no AbstractArray..."
]
},
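{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A minimal sketch of the \"for free\" point, not taken from the talk: `Squares` is a hypothetical type, and the snippet assumes Julia 1.x syntax (`struct`, `IndexStyle`) rather than the 0.4 kernel this notebook was run on. Only `size` and `getindex` (plus an indexing-style hint) are written by hand; `sum`, `map`, and `filter` come from generic AbstractArray code.\n",
"\n",
"```julia\n",
"# Hypothetical example type; Julia 1.x syntax (0.4 would use `immutable` or `type`).\n",
"struct Squares <: AbstractVector{Int}\n",
"    n::Int\n",
"end\n",
"\n",
"# The only hand-written methods: length information and scalar indexing.\n",
"Base.size(s::Squares) = (s.n,)\n",
"Base.IndexStyle(::Type{Squares}) = IndexLinear()\n",
"Base.getindex(s::Squares, i::Int) = i^2\n",
"\n",
"s = Squares(4)\n",
"\n",
"# Everything below is inherited from generic AbstractArray methods.\n",
"total = sum(s)                 # 1 + 4 + 9 + 16 = 30\n",
"shifted = map(x -> x + 1, s)   # [2, 5, 10, 17]\n",
"evens = filter(iseven, s)      # [4, 16]\n",
"```\n",
"\n",
"No storage is allocated for the elements themselves; the generic code only ever asks for `size` and `getindex`."
]
},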
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Gadfly : `____________` :: ScaryVec : AbstractArray\n",
"\n",
"Thinking of graphics packages as concrete types, we see that we have many different types, but no abstraction linking them together."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Plots.jl\n",
"### The AbstractArray of plotting..."
]
},
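{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The analogy can be sketched as one plotting call rendered by two different backends. This is an illustration under assumptions, not code from the talk: it uses Julia 1.x syntax and the `gr()` and `unicodeplots()` backend selectors from current Plots.jl (this notebook itself selects `gadfly()`), and it assumes the corresponding backend packages are installed.\n",
"\n",
"```julia\n",
"using Plots   # assumes Plots.jl plus the GR and UnicodePlots backends are installed\n",
"\n",
"x = 0:0.1:2π\n",
"y = sin.(x)\n",
"\n",
"gr()                             # select one backend...\n",
"p1 = plot(x, y, legend=false)\n",
"\n",
"unicodeplots()                   # ...then another; the plot call itself is unchanged\n",
"p2 = plot(x, y, legend=false)\n",
"```\n",
"\n",
"Only the backend selection line differs, which is the role an abstract interface is meant to play."
]
},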
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Plots.jl] Initializing backend: gadfly"
]
}
],
"source": [
"# setup... choose Gadfly as the backend, set some session defaults\n",
"using Plots\n",
"gadfly()\n",
"default(size=(600,500), legend=false)\n",
"\n",
"# create parametric functions\n",
"fx(u) = 1.6sin(u)^3\n",
"fy(u) = 0.1 + 1.5cos(u) - 0.6cos(2u) - 0.25cos(3u) - cos(4u)/8\n",
"\n",
"# plot and annotate\n",
"p = plot(fx, fy, 0, 2π, line=(5,:darkred), xlim=(-2,2), ylim=(-2,2))\n",
"annotate!(0, -0.15, text(\" I ♡\\nPlots\", 45, -0.1π, :darkred));"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"image/png": "",
"text/plain": [
"Plot{Plots.GadflyPackage() n=1}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"p"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# use the same parametric functions to create a custom marker shape\n",
"us = linspace(0, 2π, 100)\n",
"heart = Shape([(fx(u), -fy(u)) for u in us])\n",
"\n",
"# generate some data\n",
"n = 50\n",
"xy() = 4rand(2) - 2\n",
"\n",
"# add a title\n",
"title!(\"Let me count the ways...\")\n",
"\n",
"# add a new series\n",
"scatter!(zeros(0),zeros(0), z=0:n, marker=(heart,15,:reds))\n",
"\n",
"# animations!\n",
"anim = Animation()\n",
"for i in 1:n\n",
"    x, y = xy()\n",
"    \n",
"    # add to a series after creation\n",
"    push!(p, 2, x, y)\n",
"    \n",
"    # easy annotations\n",
"    annotate!(x, y, text(i))\n",
"    \n",
"    # save an animation frame\n",
"    frame(anim)\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO: Saved animation to /home/tom/.julia/v0.4/Plots/examples/meetup/iheartplots.gif\n"
]
},
{
"data": {
"text/html": [
"<img src=\"iheartplots.gif?0.35511351462964247>\" />"
],
"text/plain": [
"Plots.AnimatedGif(\"/home/tom/.julia/v0.4/Plots/examples/meetup/iheartplots.gif\")"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gif(anim, \"iheartplots.gif\", fps=3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# One problem...\n",
"\n",
"<img src=\"cart-before-the-horse.jpg\" style=\"width: 800px; height: 500px;\">\n",
"\n",
"When the abstract comes after the concrete, it's a lot more work. Oops. Better late than never!!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Fun with data - UCI Wine Quality Dataset\n",
"<img src=\"wine-toast.jpg\" style=\"width: 800px; height: 500px;\">\n",
"\n",
"P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.\n",
"Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Come collaborate:\n",
"- Plots.jl\n",
"- OnlineStats.jl\n",
"- OnlineAI.jl\n",
"- LearnBase.jl\n",
"- Unums.jl\n",
"\n",
"# or get in touch:\n",
"- tom@breloff.com\n",
"- https://github.com/tbreloff"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Julia 0.4.0-rc4",
"language": "julia",
"name": "julia-0.4"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "0.4.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}