• Home

  • Custom Ecommerce
  • Application Development
  • Database Consulting
  • Cloud Hosting
  • Systems Integration
  • Legacy Business Systems
  • Security & Compliance
  • GIS

  • Expertise

  • About Us
  • Our Team
  • Clients
  • Blog
  • Careers

  • CasePointer

  • VisionPort

  • Contact
  • Our Blog

    Ongoing observations by End Point Dev people

    Rapid Test-Driven Development in Julia

    Kamil Ciemniewski

    By Kamil Ciemniewski
    November 9, 2020

    Automation

    The Julia programming language has been rising in the ranks among the science-oriented programming languages lately. It has proven to be revolutionary in many ways. I’ve been watching its development for years now. It’s one of the most innovative of all the modern programming languages.

    Julia’s design seems to be driven by two goals: to appeal to the scientific community and to achieve the best performance possible. This is an attempt to solve the “two languages problem” where data analysis and model building is performed using a slower interpreted language (like R or Python) while performance-critical parts are written in a faster language like C or C++.

    The type-system is what allows Julia to meet its goals. The mix of strong and dynamic typing enables Python-like productivity with C++ or Rust-like performance. Julia is not an interpreted language. It compiles its code to native binary just like C, C++, Go, or Rust. The compilation and execution, though, are what sets it 1000 feet apart from all those other languages.

    Here’s a simplified, brief outline of the steps in Julia’s code execution model:

    1. Julia process starts up.
    2. Code is parsed.
    3. For each code chunk:
      • If it hasn’t yet been compiled, decide whether to interpret or JIT compile it and then execute:
        • If compile then infer the types and use LLVM to produce native code.
        • Execute the newly-created native code.
      • If it has been compiled, execute it.
    4. Repeat until the program ends or the user closes the REPL.

    It’s quite apparent that the compilation and type inference happen at a very different time compared to other compiled languages. Using Rust, you compile your code just once. Execution isn’t taxed by consecutive recompilation.

    The result is quite a big negative surprise to Julia’s newcomers. Each time you run your app, there’s a significant slowdown before you see anything. It’s called the “time to first plot” issue. This is because, for example, a data scientist may want to generate some plots during her “exploratory data analysis”. Doing it in languages that are slower on paper — like R — makes those plots appear way quicker than in Julia.

    There are more “time to first X” issues in Julia

    Julia’s execution model makes more aspects trickier than just seeing the first plot. If you’re a software engineer who’s used to following the test-driven development (TDD) approach, you’re in for a big surprise.

    In languages like Ruby or Rust, it’s easy to have a tool watch for any file changes and respond by running the project’s testing suite. I often use the watchexec tool which works with virtually any language, interpreter, or compiler. I run watchexec -cw . "bundle exec rspec --fail-fast" when working on a Ruby project, or watchexec -cw . "cargo test" with Rust.

    With Julia this approach is not an option though — the “time to first test” is dramatically long. The wastefulness of continuous re-compilation steals my precious time, making me extremely unproductive.

    Making it work in Julia

    The “time to first X” issue is only a problem if we’re closing the session in which our code has already been compiled. If we could move the file-watching and test-running steps all into the same session, the testing suite would run slowly only the first time. Julia’s standard library has built-in file watching functions that we could use to reproduce the watchexec in our code:

    julia> using FileWatching
    
    julia> watch_file
    watch_file (generic function with 2 methods)
    
    julia> watch_folder
    watch_folder (generic function with 4 methods)
    

    We can use those to get notified about the changes in our project’s files whenever they happen. Let’s imagine the following simple project’s structure:

    $ tree
    .
    └── src
        ├── App.jl
        └── nested
            └── Other.jl
    
    2 directories, 2 files
    

    How do we watch for file changes in Julia? Let’s start up the REPL and see:

    julia> using FileWatching
    
    help?> watch_file
    search: watch_file watch_folder unwatch_folder
    
      watch_file(path::AbstractString, timeout_s::Real=-1)
    
    
      Watch file or directory path for changes until a change occurs or timeout_s seconds have elapsed.
    
      The returned value is an object with boolean fields changed, renamed, and timedout, giving the result of watching the file.
    
      This behavior of this function varies slightly across platforms. See https://nodejs.org/api/fs.html#fs_caveats (https://nodejs.org/api/fs.html#fs_caveats) for more detailed information.
    
    julia> watch_file("src")
    

    The REPL didn’t return from the watch_file function. We can now change the “src/App.jl” file and see what happens:

    julia> watch_file("src")
    FileWatching.FileEvent(true, true, false)
    
    julia>
    

    Good! The function returned a FileEvent struct. We can ask Julia for its definition:

    help?> FileWatching.FileEvent
      No documentation found.
    
      Summary
      ≡≡≡≡≡≡≡≡≡
    
      struct FileWatching.FileEvent <: Any
    
    
      Fields
      ≡≡≡≡≡≡≡≡
    
      renamed  :: Bool
      changed  :: Bool
      timedout :: Bool
    

    We can see it tells us whether the file’s been renamed, changed, or if the timeout happened.

    So far so good, can we get it to notify us when the nested file changes too?

    julia> watch_file("src")
    

    Now changing the “src/nested/Other.jl”:

    julia> watch_file("src")
    

    Nothing happened. We’ll need to be specific about the nested directory to make it work:

    julia> watch_file("src/nested")
    FileWatching.FileEvent(true, true, false)
    

    With those experiments we can now conclude that:

    1. We’ll need to watch on all possible nested directories at the same time.
    2. Watching blocks the current thread so for each folder to watch we need a separate thread.

    We’ll need a list of folders. My first idea was to use the Glob package:

    julia> using Glob
    
    julia> glob("**/*")
    2-element Array{String,1}:
     "src/App.jl"
     "src/nested"
    

    This seems legit but let’s nest another folder. Here’s how the project’s structure would look now:

    $ tree .
    .
    └── src
        ├── App.jl
        └── nested
            ├── Other.jl
            └── nested2
                └── YetAnother.jl
    
    3 directories, 3 files
    

    Trying the Glob package again:

    julia> glob("**/*")
    2-element Array{String,1}:
     "src/App.jl"
     "src/nested"
    
    julia> glob("**/**/*")
    2-element Array{String,1}:
     "src/nested/Other.jl"
     "src/nested/nested2"
    

    Turns out that Julia’s Glob package doesn’t support extensions that allow “recursive” globbing. We’ll need to roll our own code to return all the possible nested folders:

    julia> function subdirs(base="src")
             ret = [base]
    
             for (root, dirs, _) in walkdir(base)
               fulldirs = map(d -> joinpath(root, d), dirs)
    
               ret = vcat(vcat(vcat(map(subdirs, fulldirs)...), fulldirs), ret)
             end
    
             return ret |> unique
           end
    subdirs (generic function with 2 methods)
    
    julia> subdirs("src")
    3-element Array{Any,1}:
     "src/nested/nested2"
     "src/nested"
     "src"
    

    Being able to list all the nested directories, we can now work on the file-watching function. Here’s the plan of attack:

    1. Create a “channel” to receive the file changes from other threads watching each of those directories.
    2. Spin up a new thread for working through the stream from the channel specifically.
    3. Spin up threads for every nested directory found and watch for changes at the same time.
    4. When the file change is detected, queue it into the channel.
    function onchange(f, basedirs=["src"])
      channel = Channel()
    
      function handle()
        should_continue = true
    
        for file in channel
          try
            f(file)
          catch err
            should_continue = typeof(err) != InterruptException
    
            @warn("Error in the hanlder:\n$err")
          end
        end
      end
    
      function schedule(file)
        put!(channel, file)
      end
    
      Threads.@spawn handle()
    
      subs = vcat(map(basedir -> subdirs(basedir), basedirs)...)
    
      @threads for dir in subs
        should_continue = true
    
        while true
          (file, event) = watch_folder(dir, 1)
    
          if event.changed
            try
              schedule(file)
            catch err
              should_continue = typeof(err) != InterruptException
    
              @warn("Error in the scheduler:\n$err")
            end
          end
        end
      end
    
      for dir in subs
        unwatch_folder(dir)
      end
    end
    

    Before we can run this code though, we need to mention one of other of Julia’s quirks. The @threads macro is cool and all, but it’s not going to work unless you start Julia with some predefined number of threads first:

    $ JULIA_NUM_THREADS=4 julia
    

    Let’s give it a go now:

    julia> onchange(f -> println(f))
    

    While the REPL is still “inside” the onchange function, let’s change some of those files in the dummy project and see what happens:

    julia> onchange(f -> println(f))
    4913
    App.jl
    App.jl
    4913
    Other.jl
    Other.jl
    

    It works! The output is weird but we do get something here. For each file change, we get three messages here. After being puzzled for hours with how Julia implements this file watching I decided to just not mind it and add the throttling to make it work for my testing needs. The idea is that the throttling will only run the suite once per each of those triples.

    Fortunately, the Flux package comes with the throttle function that we can reuse:

    function throttle(f, timeout; leading=true, trailing=false)
      cooldown = true
      later = nothing
      result = nothing
    
      function throttled(args...; kwargs...)
        yield()
    
        if cooldown
          if leading
            result = f(args...; kwargs...)
          else
            later = () -> f(args...; kwargs...)
          end
    
          cooldown = false
          @async try
          while (sleep(timeout); later != nothing)
              later()
              later = nothing
            end
          finally
            cooldown = true
          end
        elseif trailing
          later = () -> (result = f(args...; kwargs...))
        end
    
        return result
      end
    end
    

    And the final version of our function:

    function onchange(f, basedirs=["src", "test"], timeout=1)
      channel = Channel()
    
      function handle()
        should_continue = true
    
        for file in channel
          try
            f(file)
          catch err
            should_continue = typeof(err) != InterruptException
    
            @warn("Error in the hanlder:\n$err")
          end
        end
      end
    
      function schedule(file)
        put!(channel, file)
      end
    
      throttled_schedule = throttle(schedule, timeout)
    
      Threads.@spawn handle()
    
      subs = vcat(map(basedir -> subdirs(basedir), basedirs)...)
    
      @threads for dir in subs
        should_continue = true
    
        while true
          (file, event) = watch_folder(dir, 1)
    
          if event.changed
            try
              throttled_schedule(file)
            catch err
              should_continue = typeof(err) != InterruptException
    
              @warn("Error in the scheduler:\n$err")
            end
          end
        end
      end
    
      for dir in subs
        unwatch_folder(dir)
      end
    end
    

    Putting it all together

    Armed with the helper onchange function we can now set up our nice auto-test runner. Let’s add the “test/runtests.jl” file:

    using Test
    
    function runtests()
      @testset "the project" begin
        include("test/test_one.jl")
        include("test/test_two.jl")
      end
    
      nothing
    end
    
    function watchtest()
      onchange(_ -> runtests())
    end
    

    Now, with the watchtest function running the whole testing suite re-runs whenever any of the project’s files changes.

    Final words

    I found it easy to have a love-hate relationship with Julia. I have all the respect for its creators. They’re doing an amazing job and are very bold with bringing in innovation. Once the code is compiled, it’s amazingly fast. The language’s ecosystem, along with amazing packages is one of its strongest points.

    However, it’s awfully slow when you run your functions for the first time in the current session. Also, developers often have to rethink the workflows they’re so used to. This article touches on one of those issues.

    The way the file watching is implemented in the standard library leaves a lot of room for improvement. As an example, I’m getting the “renamed” flag instead of “changed” in the FileWatching.FileEvent when I’m changing the file. That’s why in code I’m just checking for the absence of the timeout. It feels like a dirty hack but what can you do? The watcher was also not working consistently when the timeout was not given.

    The immaturity of the standard library isn’t a show-stopper for many. Julia is developing rapidly and we can expect it to get better and better over time. Other issues will need engineers themselves to rethink their paradigms. I think it’s good though — challenges are what’s making us evolve after all and radical innovation doesn’t happen that often.

    julia automation development testing


    Comments