{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Visualization and Learning in Julia\n", "\n", "Tom Breloff\n", "\n", "https://github.com/tbreloff" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Outline\n", "- Background\n", "- Julia packages\n", "- Plots.jl\n", "- Fun with data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## My background\n", "- BA Mathematics and Economics (U. of Rochester)\n", "- MS Mathematics (NYU Courant Institute)\n", "- Trader, researcher, quant, developer at several big banks and hedge funds, including one which I founded\n", "- High-speed algorithmic arbitrage trading and market making\n", "- Machine learning and visualization enthusiast\n", "- Lifelong programmer (since learning BASIC in 4th grade)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Before Julia\n", "- Python and C/C++\n", "- MATLAB and Java (so many files!!)\n", "- Throughout the years: Mathematica, Go, R, C#, JavaScript, Visual Basic/Excel, Lisp, Erlang, ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Things I like\n", "- Python\n", " - Solid packages\n", " - Easy to get stuff done\n", "- C/C++\n", " - Fast (when you put in the effort)\n", "- MATLAB\n", " - Great matrix operations\n", " - Easy visualizations\n", "- Java\n", " - Hmmm... 
\n", " ```\n", " public static boolean DoTheFunctionNamesReallyNeedToBeLongerThanThatMaryPoppinsSong() {\n", " return true;\n", " }\n", " ```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Why Julia?\n", "- Easy to code\n", "- Fast with little effort\n", "- Solid vector/matrix support, but more flexible\n", "- Macros and staged functions\n", "- So much more!\n", "\n", "(Slow clap...)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Julia's Package Ecosystem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Top packages by stars\n", "Package | GitHub Stars | 2-week change | Type\n", "------ | -------- | ------------- | --------\n", "Gadfly | 732 | 14 | Plotting\n", "IJulia | 732 | 11 | Workflow\n", "Mocha | 496 | 36 | Learning\n", "DataFrames | 230 | 12 | Data Structures\n", "PyCall | 204 | 4 | Language Wrapper\n", "JuMP | 182 | 5 | Optimization\n", "Escher | 135 | 10 | GUIs\n", "Optim | 131 | 4 | Optimization\n", "Morsel | 128 | -1 | Web (deprecated)\n", "Distributions | 125 | 7 | Statistics\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Recent trends\n", "Package | GitHub Stars | 2-week change | Type\n", "------ | -------- | ------------- | --------\n", "Mocha | 496 | 36 | Learning\n", "Gadfly | 732 | 14 | Plotting\n", "DataFrames | 230 | 12 | Data Structures\n", "IJulia | 732 | 11 | Workflow\n", "Escher | 135 | 10 | GUIs\n", "Interact | 102 | 8 | GUIs\n", "Distributions | 125 | 7 | Statistics\n", "Plots | 23 | 6 | Plotting\n", "Seismic | 7 | 6 | Plotting\n", "Immerse | 23 | 5 | Plotting\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Statistics and Learning in Julia\n", "- Stats (mostly in JuliaStats)\n", " - StatsBase\n", " - Distributions\n", " - DataFrames, DataArrays, 
NullableArrays\n", " - MultivariateStats, GLM\n", " - OnlineStats\n", " - many more...\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Statistics and Learning in Julia\n", "- Optimization (mostly in JuliaOpt)\n", " - MathProgBase\n", " - JuMP\n", " - Optim\n", " - Convex\n", " - NLopt\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Statistics and Learning in Julia\n", "- Machine learning\n", " - Mocha\n", " - GeneticAlgorithms\n", " - Orchestra\n", " - TextAnalysis\n", " - Clustering\n", " - OnlineAI\n", " - many more..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Visualization in Julia\n", "\n", "Lots of packages: Gadfly, PyPlot, Vega, Winston, UnicodePlots, Qwt, Bokeh, Immerse, GLPlot, ...\n", "\n", "Strengths:\n", "- Interactive: Immerse, PyPlot, Qwt\n", "- Fast: GLPlot\n", "- Easy/concise: UnicodePlots, Winston, Qwt\n", "- Pretty: Gadfly, Vega, Bokeh\n", "- Native: Gadfly, Winston, UnicodePlots\n", "- Features: PyPlot\n", "\n", "Learning more than one or two packages is time-consuming and impractical...\n", "\n", "### Why do I have to choose one?!?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What makes good code design?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Good design: AbstractArray\n", "Many concrete array types:\n", "- Dense arrays\n", "- Sparse arrays\n", "- Ranges\n", "- Distributed arrays\n", "- Shared arrays\n", "- GPU arrays\n", "- Custom data structures\n", "\n", "Common code is implemented once for AbstractArray, and all concrete types get the benefit." 
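, "\n", "\n", "A minimal sketch of that idea, assuming a hypothetical helper `spread` (not part of Base): one method written against `AbstractArray` should serve a dense array and a range alike.\n", "\n",
"```julia\n",
"# `spread` is a hypothetical helper, written once against AbstractArray\n",
"spread(xs::AbstractArray) = maximum(xs) - minimum(xs)\n",
"\n",
"spread([3, 1, 4, 1, 5])  # dense Array\n",
"spread(1:10)             # range: no elements are stored\n",
"```"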
] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "5-element ScaryVec:\n", " 1 \n", " \"BOO!\"\n", " 3 \n", " 4 \n", " 5 " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a vector 1:n with one randomly chosen index that says \"BOO!\" instead\n", "type ScaryVec <: AbstractArray{Int,1}\n", "    boo::Int\n", "    n::Int\n", "    ScaryVec(n::Integer) = new(rand(1:n), n)\n", "end\n", "\n", "# the only two methods we implement ourselves\n", "Base.size(sv::ScaryVec) = (sv.n,)\n", "# returns a String at the hidden index, despite the declared Int element type\n", "Base.getindex(sv::ScaryVec, i::Integer) = (i == sv.boo ? \"BOO!\" : i)\n", "\n", "sv = ScaryVec(5)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4-element Array{Int64,1}:\n", " 1\n", " 3\n", " 4\n", " 5" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filter(x -> isa(x, Number), sv)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Good design: AbstractArray\n", "- Inheriting from AbstractArray gives you a lot \"for free\":\n", " - Iteration (`map`, `for x in ...`, `filter`, ...)\n", " - Operations\n", " - Printing\n", " - etc.\n", "- Few methods to implement... only what's needed.\n", "- Abstractions put overlapping functionality in one place\n", " - Easy to code\n", " - Easy to maintain\n", "\n", "\n", "### Imagine if there were no AbstractArray..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Gadfly : `____________` :: ScaryVec : AbstractArray\n", "\n", "If we think of graphics packages as concrete types, we have many different types but no abstraction linking them together." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Plots.jl\n", "### The AbstractArray of plotting..." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# setup... choose Gadfly as the backend, set some session defaults\n", "using Plots\n", "gadfly()\n", "default(size=(600,500), legend=false)\n", "\n", "# create parametric functions\n", "fx(u) = 1.6sin(u)^3\n", "fy(u) = 0.3 + 1.5cos(u) - 0.6cos(2u) - 0.25cos(3u) - cos(4u)/8\n", "\n", "# plot and annotate\n", "p = plot(fx, fy, 0, 2π, line=(5,:darkred), xlim=(-2,2), ylim=(-2,2))\n", "annotate!(0, -0.15, text(\" I ♡\\nPlots\", 45, -0.1π, :darkred));" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "[Figure: heart-shaped parametric curve in dark red, annotated \"I ♡ Plots\"; both axes run from -2 to 2]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "Plot{Plots.GadflyPackage() n=1}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# use the same parametric functions to create a custom marker shape\n", "us = linspace(0, 2π, 100)\n", "heart = Shape([(fx(u), fy(u)) for u in us])\n", "\n", "# generate some data\n", "n = 50\n", "xy() = 4rand(2) - 2\n", "\n", "# add a title\n", "title!(\"Let me count the ways...\")\n", "\n", "# add a new series\n", "scatter!(1, z=1:n, marker=(heart,15,:reds))\n", "\n", "# animations!\n", "anim = Animation()\n", "for i in 1:n\n", "    x, y = xy()\n", "\n", "    # add to a series after creation\n", "    push!(p, 2, x, y)\n", "\n", "    # easy annotations\n", "    annotate!(x, y, text(i))\n", "\n", "    # save an animation frame\n", "    frame(anim)\n", "end" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO: Saved animation to /home/tom/.julia/v0.4/Plots/examples/meetup/iheartplots.gif\n" ] }, { "data": { "text/plain": [ "Plots.AnimatedGif(\"/home/tom/.julia/v0.4/Plots/examples/meetup/iheartplots.gif\")" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gif(anim, 
\"iheartplots.gif\", fps=3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# One problem...\n", "\n", "\n", "\n", "When the abstract comes after the concrete, it's a lot more work. Oops. Better late than never!!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Fun with data - UCI Wine Quality Dataset\n", "\n", "\n", "P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. \n", "Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "# Come collaborate:\n", "- Plots.jl\n", "- OnlineStats.jl\n", "- OnlineAI.jl\n", "- LearnBase.jl\n", "- Unums.jl\n", "\n", "# or get in touch:\n", "- tom@breloff.com\n", "- https://github.com/tbreloff" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Julia 0.4.0", "language": "julia", "name": "julia-0.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.4.0" } }, "nbformat": 4, "nbformat_minor": 0 }