
EditorConfig: Ending the Spaces vs. Tabs Confusion

By Jon Jensen
April 30, 2022

Photo by Garrett Skinner

Varieties of text formatting

Most everyone who has worked on a software development project with a group of other people has encountered the problem of source code being formatted in different ways by different text editors, IDEs, and operating systems.

The main variations go back to the 1970s or earlier, and include the questions:

  • Will indentation be done by tabs (an ASCII control character) or spaces?
    • If indentation is done by spaces, how many spaces are used for each indentation level?
  • What will indicate the end of each line (EOL)? The choices are:
    • a line feed (LF), used by the Unix family including Linux and modern macOS
    • a carriage return (CR), used by old pre-Unix Macintosh and some now-obscure operating systems
    • both together (CRLF), used by Windows and most Internet protocols
  • Which character set encoding will be used? Common choices are:
    • Unicode UTF-8 encoding, used by Linux, macOS, and most other Unixes, and standard on the Internet
    • Unicode UTF-16 encoding (with either little-endian or big-endian encoding), used by modern Windows
    • legacy ISO-8859 and Windows “code page” encodings in older documents and codebases

Editor configurations in conflict

By default, text editors and IDEs are generally each configured differently, causing widespread frustration, and once set, the choices apply broadly from then on. But each developer can simply configure their editor to follow their team’s standards, right? Well, maybe.

First, getting that to happen for every developer and every different editor being used isn’t straightforward. It typically requires a document showing instructions and/or screenshots of how to configure each editor. It may have to be redone after a major upgrade or move to a new computer.

Second, and often a more persistent problem, standards may vary across different projects and even for different types of files within a given project. Ruby code is typically indented with 2 spaces, while perhaps in your project JavaScript uses 4 spaces and HTML uses tabs.

If you start a new project from scratch you can probably settle on a single standard, but in existing large codebases, it can make a lot of version control change “noise” to mess with that.

Computers are good at keeping track of lots of little details, so isn’t there some way to have the computer deal with this?

Storing configuration in the project

What if we store the text editor’s or IDE’s configuration in the project instead of per user, so it can go with the project to each new developer and tell their editor how to behave?

For many years that has been possible with some editors, but the configuration had to be set up separately for each editor, and often the feature is disabled by default.

Let’s consider the two most popular terminal-based editors on Unix, partisans in a long-running editor war:

Vim

Vim has a feature called a “modeline” that allows for configuration settings to appear within the top or bottom 5 lines of the file.

For example, to instruct Vim to use spaces instead of tabs and 4-space tab stops, we can add to the top or bottom of our C source code file:

/* vim: tabstop=4 shiftwidth=4 expandtab
 */

Since it gets tedious putting those special configuration comments in each file, Vim has an option to read a .vimrc file from the current directory, which applies to all files there and can be committed to version control.

This feature is disabled by default because Vim has in the past been vulnerable to files with malicious settings running arbitrary code.

You can :set exrc secure to enable this local configuration file feature in a codebase you trust, and also to restrict what it can do.
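
For example, a couple of lines like these in your own ~/.vimrc (a minimal sketch) opt in to project-local configuration while keeping it sandboxed:

" in ~/.vimrc: read a project-local .vimrc/.exrc, but in restricted mode
set exrc
set secure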

Emacs

In Emacs the same thing can be done on the first or second line of the file. (Of course its setting names differ from Vim’s.) For example consider this configuration in C source code:

/* -*- mode: c; indent-tabs-mode: nil; c-basic-offset: 4; tab-width: 4 -*- */

Alternatively, you can use “Local Variables” set at the end of the file in as many lines as needed:

/* Local Variables:      */
/* mode: c               */
/* indent-tabs-mode: nil */
/* c-basic-offset: 4     */
/* tab-width: 4          */
/* End:                  */

Emacs also has “Directory Variables” that can be set in the file .dir-locals.el for a directory and its subdirectories.
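
For example, a .dir-locals.el at the project root might look like this minimal sketch, using the same hypothetical C settings as above:

;; .dir-locals.el
((c-mode . ((indent-tabs-mode . nil)
            (c-basic-offset . 4)
            (tab-width . 4))))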

Others

Even if someone has gone to the trouble to set up such editor configuration files and add them to the project code repository, how often has that been done for your editor or IDE?

And how often is one out of sync with the others?

This is not the way to success.

EditorConfig to the rescue

About 10 years ago Trey Hunner and Hong Xu shared with the world EditorConfig, their creation to solve this problem across ideally all editors.

They intentionally kept EditorConfig limited in scope. It covers only the most important editor options, so that the standard is simple enough to implement for every editor, either internally or as a plugin, and so that no arbitrary code execution is possible to cause security problems.

In EditorConfig the configuration for our examples and hypotheticals above lives in a .editorconfig file in the root of the project that looks like this:

# top-most EditorConfig file
root = true

# basics for all files in our project
[*]
charset = utf-8
end_of_line = lf

# C and JavaScript source get 4-space indents
[*.{c,js}]
indent_style = space
indent_size = 4

# Ruby gets 2-space indents
[*.rb]
indent_style = space
indent_size = 2

# HTML gets tab indents
[*.html]
indent_style = tab

In a big project you may want to have separate, smaller .editorconfig files in different directories. You can omit the root = true setting in subdirectories to inherit settings from the top-level .editorconfig file.
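
For example, a hypothetical docs/ subdirectory could carry its own small file that overrides only what differs:

# docs/.editorconfig — no "root = true", so the top-level settings still apply
[*.md]
indent_style = space
indent_size = 2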

There are a couple of other options that are nice to specify.

This one removes any tabs or spaces from the end of lines:

trim_trailing_whitespace = true

Those are rarely needed or semantically meaningful, so it’s nice to remove them. But there are a few cases where they can matter such as in Markdown.

This one determines whether the last line in the file will end with a newline:

insert_final_newline = true

By default some editors (such as Vim) add a newline to the last line and some (such as Emacs) don’t, leading to needless changes as various developers change files.

Typically every line should end with a newline, so that’s a good editor feature to enable. But you could have some text template that should not end with a newline, so might need to specify false for that type of file.
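
For example, a hypothetical template file type could opt out while everything else keeps the final newline:

# hypothetical example: keep these templates free of a final newline
[*.tmpl]
insert_final_newline = false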

And those are most of the features of EditorConfig! The file format details are easy to digest.

Editor & IDE support

EditorConfig is now widely supported. These popular editors & IDEs recognize .editorconfig files with no extra work:

  • IntelliJ IDEA and most of its language-specific variants
  • GitHub
  • GitLab
  • Visual Studio
  • BBEdit
  • and others

And these support it with a plugin:

  • VS Code
  • Vim
  • Emacs
  • Sublime Text
  • TextMate
  • Eclipse
  • Atom
  • Notepad++
  • Geany
  • and others

The plugins are typically easy to install system-wide from your operating system’s package manager, or else locally for your user only.

Do you need it?

Yes, I think you do.

I know of no reason for any developer not to use EditorConfig, in every editor, for every project. It’s simple and at long last solves this small set of problems well.

One possible counterargument: If, before every version control commit, you run an automatic code formatter such as Prettier (in Node.js, for many languages) or a language-specific one such as gofmt, rustfmt, etc., you could perhaps live without your editor knowing how your files should be saved.

But isn’t it better if your editor knows what kind of line endings and indents to use, rather than waiting for a code formatter to correct such fundamental things after you save? It is easy to start with a single .editorconfig file long before you have continuous integration set up for the project.

And many projects don’t format code automatically, and instead just “lint” it to report on deviations from the project standards. But that requires work to correct, and can be ignored if not enforced.

Many open source projects large and small use EditorConfig, including this blog itself. But in recent months I have found several developers who had not yet heard of EditorConfig, so I want to spread awareness of it. I hope you’ll use EditorConfig too!


development tips

Formatting SQL code with pgFormatter within Vim

By Josh Tolley
April 26, 2022

Outdoor view of a creek bank with dry trees and old wooden buildings against a blue sky Photo by Garrett Skinner

Sometimes a little, seemingly simple tip can make a world of difference. I’ve got enough gray hair these days that it would be pretty easy for me to start thinking I’d seen an awful lot, yet quite frequently when I watch a colleague working in a meeting or a tmux session or somewhere, I learn some new and simple thing that makes my life demonstrably easier.

Luca Ferrari recently authored a post about using pgFormatter in Emacs; essentially the same thing works in Vim, my editor of choice, and it’s one of my favorite quick tips when working with complicated queries. I don’t especially want to get involved in an editor war, and offer the following only in the spirit of friendly cooperation for the Vim users out there.

As Luca mentioned, pgFormatter is a convenient way to make SQL queries readable, automatically. It’s easy enough to feed it some SQL, and get a nice-looking result as output:

$ pg_format < create_outbreaks.sql
INSERT INTO outbreak                      
SELECT                              
    nextval('outbreak_id'::regclass),
    extract('year' FROM now())::text || '-' || nextval('outbreak_number_seq')::text, --number
    (                        
        SELECT  
            first_name
        FROM person TABLESAMPLE BERNOULLI (10)
LIMIT 1), -- name
    NOW() - interval '1 day' * random() * 100, (
        SELECT
            id
        FROM "user" TABLESAMPLE BERNOULLI (10)
...

In my perfect world I might quibble with some of its formatting decisions, such as the lack of indent on the LIMIT 1 line above. But in practice the results are good enough for my tastes that I haven’t bothered to investigate whether I can improve them. I just use it, and it’s good enough for me.

And because Vim lets me highlight a region, pipe it through an external program, and replace the region with that program’s output, it’s easy to use it simply by selecting a section of code and typing :!pg_format like this:

pgformatter example animation of terminal
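
If you do this often, a small mapping in your ~/.vimrc can save a few keystrokes. A minimal sketch, assuming pg_format is in your PATH:

" format the current visual selection with pg_format
vnoremap <Leader>pf :!pg_format<CR>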


tips open-source tools postgres

Visualizing Data with Pair-Plot Using Matplotlib

By Kürşat Kutlu Aydemir
April 25, 2022

Photo of dark blue glass with lines and right angles, perhaps windows of a modern skyscraper Photo by Sebastian

Pair Plot

A pair plot plots “pairwise relationships in a dataset” (seaborn.pairplot). A few well-known visualization modules for Python are widely used by data scientists and analysts: Matplotlib and Seaborn. There are many others as well, but these are de facto standards. In terms of abstraction level, Matplotlib is the more primitive library, while Seaborn builds upon Matplotlib and “provides a high-level interface for drawing attractive and informative statistical graphics” (Seaborn project).

Seaborn’s higher-level pre-built plot functions give us good features, and the pair plot is one of them. With Matplotlib you can plot many plot types: line, scatter, bar, histograms, and so on. A pair plot is a plotting model rather than an individual plot type. Here is a pair-plot example depicted on the Seaborn site:

Seaborn pairplot

Using a pair plot we aim to visualize the correlation of each feature pair in a dataset against the class distribution. The diagonal of the pair plot is different from the other pairwise plots, as you can see above. That is because each diagonal cell pairs a feature with itself, so there is no correlation to plot there. Instead we can just plot the class distribution for that feature using one kind of plot type.

The off-diagonal feature pair plots can be scatter plots or heatmaps, so that the class distribution makes sense in terms of correlation. The plot type of the diagonal can be chosen from among the most commonly used plot types, such as a histogram or a KDE (kernel density estimate), which essentially plots the density distribution of the classes.

Since a pair plot visually gives an idea of correlation of each feature pair, it helps us to understand and quickly analyse the correlation matrix (Pearson) of the dataset as well.
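
If you want the numbers behind that intuition, pandas can compute the Pearson correlation matrix directly. A quick sketch, assuming the features are already in a DataFrame named df:

# pairwise Pearson correlation of all numeric feature columns
corr_matrix = df.corr(method="pearson")
print(corr_matrix)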

Custom Pair-Plot using Matplotlib

Since Matplotlib is relatively primitive and doesn’t provide a ready-to-use pair-plot function, we can do it ourselves in a similar way to how Seaborn does. You normally wouldn’t create such home-made functions if they are already available in modules like Seaborn, but implementing your visualization methods in a custom way gives you a chance to know exactly what you are plotting, and sometimes lets you do things quite differently from the existing functions. I am not going to introduce an exceptional case here, just create our own pair-plot grid using Matplotlib.

Plot Grid Area

Initially we need to create a grid plot area using the subplots function of matplotlib like below.

import matplotlib.pyplot as plt

fig, axis = plt.subplots(nrows=3, ncols=3)

For a pair-plot grid you should use the same number of rows and columns, because we are going to plot pairwise. Now we can prepare a plot function for the plot grid area we created. If we have 3 features in our dataset, as in this example, we can loop over every feature pair like this:

for i in range(0, 3):
    for j in range(0, 3):
        plotPair()

For cleaner code it is better to move the single pair plotting to another function.

Below is a function I created for one of my master’s degree coursework assignments in December 2021 at the University of London. Plotting a single pair in the grid means getting the axis for the current grid cell and putting the current features’ data values on that axis. Another thing to consider is where to render the axis labels. If we were plotting a single chart it would be easy to render the labels on each axis of the chart. But in a pair plot it is better to render the labels only on the left-most and bottom-most plots of the grid, so that we don’t clutter the inner subplots with labels.

def plot_single_pair(ax, feature_ind1, feature_ind2, _X, _y, _features, colormap):
    """Plots single pair of features.

    Parameters
    ----------
    ax : Axes
        matplotlib axis to be plotted
    feature_ind1 : int
        index of first feature to be plotted
    feature_ind2 : int
        index of second feature to be plotted
    _X : numpy.ndarray
        Feature dataset of shape m x n
    _y : numpy.ndarray
        Target list of shape 1 x n
    _features : list of str
        List of n feature titles
    colormap : dict
        Color map of classes existing in target

    Returns
    -------
    None
    """

    # Plot distribution histogram if the features are the same (diagonal of the pair-plot).
    if feature_ind1 == feature_ind2:
        tdf = pd.DataFrame(_X[:, [feature_ind1]], columns = [_features[feature_ind1]])
        tdf['target'] = _y
        for c in colormap.keys():
            tdf_filtered = tdf.loc[tdf['target']==c]
            ax[feature_ind1, feature_ind2].hist(tdf_filtered[_features[feature_ind1]], color = colormap[c], bins = 30)
    else:
        # otherwise plot the pairwise scatter plot
        tdf = pd.DataFrame(_X[:, [feature_ind1, feature_ind2]], columns = [_features[feature_ind1], _features[feature_ind2]])
        tdf['target'] = _y
        for c in colormap.keys():
            tdf_filtered = tdf.loc[tdf['target']==c]
            ax[feature_ind1, feature_ind2].scatter(x = tdf_filtered[_features[feature_ind2]], y = tdf_filtered[_features[feature_ind1]], color=colormap[c])

    # Print the feature labels only on the left side of the pair-plot figure
    # and bottom side of the pair-plot figure. 
    # Here avoiding printing the labels for inner axis plots.
    if feature_ind1 == len(_features) - 1:
        ax[feature_ind1, feature_ind2].set(xlabel=_features[feature_ind2], ylabel='')
    if feature_ind2 == 0:
        if feature_ind1 == len(_features) - 1:
            ax[feature_ind1, feature_ind2].set(xlabel=_features[feature_ind2], ylabel=_features[feature_ind1])
        else:
            ax[feature_ind1, feature_ind2].set(xlabel='', ylabel=_features[feature_ind1])

Let’s go back to the initial plotting of the grid area and adjust the call to the plot_single_pair function. We can adjust the figure size of the grid area using fig.set_size_inches depending on the feature count, so that we get a well-scaled area.

colormap={0: "red", 1: "green", 2: "blue"}

fig.set_size_inches(feature_count * 4, feature_count * 4)

# Iterate through features to plot pairwise.
for i in range(0, feature_count):
    for j in range(0, feature_count):
        plot_single_pair(axis, i, j, X, y, features, colormap)

plt.show()

In my plot_single_pair function, notice that I also used a colormap dictionary. This dictionary is used to color the classes (labels) of the dataset so they can be distinguished in a scatter plot or a histogram, and it makes the plot easier to read.

Here is my final grid plot function for pair-plot:

def myplotGrid(X, y, features, colormap={0: "red", 1: "green", 2: "blue"}):
    """Plots a pair grid of the given features.

    Parameters
    ----------
    X : numpy.ndarray
        Dataset of shape m x n
    y : numpy.ndarray
        Target list of shape 1 x n
    features : list of str
        List of n feature titles
    colormap : dict
        Color map of classes existing in target

    Returns
    -------
    None
    """

    feature_count = len(features)
    # Create a matplot subplot area with the size of [feature count x feature count]
    fig, axis = plt.subplots(nrows=feature_count, ncols=feature_count)
    # Setting figure size helps to optimize the figure size according to the feature count.
    fig.set_size_inches(feature_count * 4, feature_count * 4)

    # Iterate through features to plot pairwise.
    for i in range(0, feature_count):
        for j in range(0, feature_count):
            plot_single_pair(axis, i, j, X, y, features, colormap)

    plt.show()

Pair-Plot a Dataset

Now let’s prepare a dataset and plot it using our custom pair-plot implementation. Notice that in my plot_single_pair function I passed the feature and target values as the numpy.ndarray type.

Let’s get the iris dataset from the SciKit-Learn dataset collection and do a quick exploratory data analysis.

from sklearn import datasets
iris = datasets.load_iris()

Here are the targets (classes) of the iris dataset:

iris.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

And here we can see the feature names and a few lines of the dataset values.

import pandas as pd

iris_df = pd.DataFrame(iris.data, columns = iris.feature_names)
iris_df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2

Since iris.data and iris.target are already of type numpy.ndarray, which is how I implemented my function, I don’t need any further dataset manipulation here. Now let’s finally call the myplotGrid function and render the pair plot for the iris dataset.

Note that you can change color per target in colormap as you wish.

myplotGrid(iris.data, iris.target, iris.feature_names, colormap={0: "red", 1: "green", 2: "blue"})

And here is my custom pair-plot output:

Pair-Plot output

For further research, I encourage you to do your own exploratory data analysis and take a look at correlation coefficient analysis to get more insight into pairwise relationships.


python matplotlib visualization data-science

Perl Web Frameworks

By Marco Pessotto
April 19, 2022

Spider webs and spiders

CGI

When I started programming, back in the day, CGI (the Common Gateway Interface) was still widely used. Usually the Apache webserver would just execute a script or a binary with some environment variables set and serve whatever the executable sent to the standard output, while keeping the standard error in the logs.

This simple and straightforward mechanism can still be used for small programs, but larger applications usually want to save the start-up time and live longer than just a single request.

At that time Perl was used far more often than now, and it had (and still has) the CGI.pm module to help the programmer to get the job done.

#!/usr/bin/env perl

use utf8;
use strict;
use warnings;
use CGI;

my $q = CGI->new;
print $q->header;
my $name = 'Marco';
print $q->p("Hello $name");
print "\n";

And it will output:

./cgi.pl
Content-Type: text/html; charset=ISO-8859-1

<p>Hello Marco</p>

Here the script mixes logic and formatting and the encoding it produces by default tells us that this comes from another age. But if you want something which is seldom used and gets executed on demand without persisting in the machine’s memory, this is still an option.

Please note that there are frameworks which can work in CGI mode, so there is no reason to use CGI.pm, besides having to maintain legacy programs.

Mojolicious

Fast-forward to 2022.

Nowadays Perl is just another language among dozens of them. But it still gets the job done and lets you write nice, maintainable code like any other modern language.

Mojolicious is currently the top choice if you want to do web development in Perl. It is an amazing framework, with a large and active community, and appears to have collected the best concepts that other web frameworks from other languages have to offer.

Let’s hack an app in a couple of minutes in a single file, like during the CGI days:

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

use Mojolicious::Lite;

get '/' => sub {
    my ($c) = @_;
    $c->stash(name => "Marco");
    $c->render(template => 'index');
};

app->start;

__DATA__
@@ index.html.ep
Hello <%= $name %>

Here the structure is a bit different.

First, there’s a Domain Specific Language (DSL) to give you some sugar. This is the “Lite” version, while in a well-structured Mojolicious app one prefers to write class methods. We declare that the root (/) URL path of the application is going to execute some code. It populates the “stash” with some variables, and finally renders a template which can access the stashed variables. If you execute the script, you get:

./mojo.pl cgi 2> /dev/null
Status: 200 OK
Content-Length: 12
Date: Fri, 08 Apr 2022 12:33:52 GMT
Content-Type: text/html;charset=UTF-8

Hello Marco

The logging to the standard error stream is:

[2022-04-08 14:33:52.92508] [163133] [debug] [82ae3iV2] GET "/"
[2022-04-08 14:33:52.92532] [163133] [debug] [82ae3iV2] Routing to a callback
[2022-04-08 14:33:52.92565] [163133] [debug] [82ae3iV2] Rendering template "index.html.ep" from DATA section
[2022-04-08 14:33:52.92610] [163133] [debug] [82ae3iV2] 200 OK (0.001021s, 979.432/s)

This is basically what a modern framework is supposed to do.
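
As mentioned above, a well-structured (non-Lite) Mojolicious app prefers class methods over the DSL. Here is a rough sketch of what the same route could look like there; the package, action, and template names are only illustrative:

package MyApp::Controller::Greetings;
use Mojo::Base 'Mojolicious::Controller', -signatures;

# the router (in the app's startup method) would point "/" here with:
#   $r->get('/')->to('greetings#index');
sub index ($c) {
    $c->stash(name => 'Marco');
    $c->render(template => 'index');
}

1;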

The nice thing in this example is that we created a single-file prototype and launched it as a CGI. But we can also launch it as a daemon and visit the given address with a browser, which is how you should normally deploy it, usually behind a reverse proxy like nginx.

./mojo.pl daemon
[2022-04-08 14:48:42.01827] [163409] [info] Listening at "http://*:3000"
Web application available at http://127.0.0.1:3000
[2022-04-08 14:48:48.53687] [163409] [debug] [CwM6zoUQ] GET "/"
[2022-04-08 14:48:48.53715] [163409] [debug] [CwM6zoUQ] Routing to a callback
[2022-04-08 14:48:48.53752] [163409] [debug] [CwM6zoUQ] Rendering template "index.html.ep" from DATA section
[2022-04-08 14:48:48.53808] [163409] [debug] [CwM6zoUQ] 200 OK (0.001209s, 827.130/s)

If you want, you can even launch it with HTTPS (please note the syntax to pass the certificates):

./mojo.pl daemon -l 'https://[::]:8080?cert=./ssl/fullchain.pem&key=./ssl/privkey.pem' -m production

For a small application listening on a high port this is already enough and the whole deployment problem goes away.

Speaking of deployment, Mojolicious has basically no dependencies other than the core modules, and it comes with a lot of goodies, for example a non-blocking user agent.

Recently a legacy application needed to make some API calls. To speed up the process, we wanted to make the requests in parallel. And here’s the gist of the code:

package MyApp::Async;

# ... more modules here

use Mojo::UserAgent;
use Mojo::Promise;

# .... other methods here

sub example {
    my ($self) = @_;
    my $email = 'test@example.com';
    my $ua = Mojo::UserAgent->new;
    my @promises;
    foreach my $list ($self->get_lists) {
        my $promise = $ua->post_p($self->_url("/api/v2/endpoint/$list->{code}"),
                                  json => { email => $email })
          ->then(sub {
                     my $tx = shift;
                     my $res = $tx->result;
                     if ($res->code =~ m/^2/) {
                         # $data stands for whatever the full application builds
                         # from the successful response before saving it
                         $self->_update_db($data);
                     }
                     else {
                         die $tx->req->url . ' ' . $res->code;
                     }
                 });
        push @promises, $promise;
    }
    my $return = 0;
    Mojo::Promise->all(@promises)->then(sub { $return = 1 }, sub { $return = 0 })->wait;
    return $return;
}

So a bunch of requests are run in parallel and then synced before returning. Does it remind you of JavaScript? Of course. A lot of common paradigms taken from other languages and frameworks were implemented here, and you can find the best of them in this nice package.

But the point here is that it doesn’t need dozens of new modules installed or upgraded. It’s just a single module in pure Perl that you can even install in your application tree. This is a huge advantage if you’re dealing with a legacy application which uses an old Perl tree and you want to play safe.

So, if you’re starting from scratch, go with Mojolicious. It lets you prototype fast and doesn’t let you down later.

However, starting from scratch is not always an option. Actually, it’s a rare opportunity. There’s a whole world of legacy applications and they generate real money every day. It’s simply not possible or even desirable to throw away something that works for something that would do the same thing but in a “cooler” way. In ten years, the way we’re coding will look old anyway.

Interchange

Wait. Isn’t Interchange an old e-commerce framework? Yes, it’s not exactly a generic web framework but rather a specialized one. Still, it’s a framework, and you can still do things in a maintainable fashion. The key is using the so-called action maps:

ActionMap jump <<EOR
sub {
    # get the path parameters
    my ($action, @args) = split(/\//, shift);

    # get the query/body parameters
    my $param = $CGI->{param};

    # redirect to another page
    $Tag->deliver({ location => $final });

    # or serve JSON
    $Tag->deliver({ type => 'application/json', body => $json_string });

    # or serve a file
    $Tag->deliver({ type => 'text/plain', body => $bigfile });

    # or populate the "stash" and serve a template page
    $Tag->tmp(stash_variable => "Marco");
    $CGI->{mv_nextpage} = "test.html";
}
EOR

In pages/test.html you would put this template:

<p>Hello [scratch stash_variable]</p>

Now, I can’t show you a simple script which demonstrates this and you’ll have to take my word for it since we can’t go through the installation process here for a demo.

Interchange is old, and it shows its years, but it is actively maintained. It lacks many of Mojo’s goodies, but you can still do things in a reasonable way.

In the example the code will execute when a path starting with /jump/ is requested. The whole path is passed to the routine, so you can split at /, apply your logic, and finally either set $CGI->{mv_nextpage} to a file in the pages directory or output the response body directly with deliver. This way you can easily build, as a classical example, an API.

It’s a bit of a poor man’s MVC but it works. That’s basically the core of what a framework like Dancer does.

Dancer (1 & 2)

Dancer is basically Ruby’s Sinatra ported to Perl. As already mentioned, ideas developed in other languages and frameworks are often ported to Perl, and this is no exception.

Let’s see it in action:

#!/usr/bin/env perl
use strict;
use warnings;
use Dancer2;

get '/' => sub {
    my $name = "Marco";
    return "Hello $name\n";
};

start;

Start the script:

Dancer2 v0.400000 server 22969 listening on http://0.0.0.0:3000

Try it with curl:

$ curl -D - http://0.0.0.0:3000
HTTP/1.0 200 OK
Date: Mon, 11 Apr 2022 07:22:18 GMT
Server: Perl Dancer2 0.400000
Server: Perl Dancer2 0.400000
Content-Length: 12
Content-Type: text/html; charset=UTF-8

Hello Marco

If in the script you say use Dancer; instead of use Dancer2, you get:

$ curl -D - http://0.0.0.0:3000
HTTP/1.0 200 OK
Server: Perl Dancer 1.3513
Content-Length: 12
Content-Type: text/html
X-Powered-By: Perl Dancer 1.3513

Hello Marco

Dancer’s core doesn’t do much more than routing. And you’ll also notice that the syntax is very similar to Mojolicious::Lite. So to get something done you need to start installing plugins which will provide the needed glue to interact with a database, work with your template system of choice, and more.
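
For instance, database access typically comes from a plugin. Here is a rough sketch using Dancer2::Plugin::Database, with made-up table, template, and route names:

use Dancer2;
use Dancer2::Plugin::Database;   # provides the database() keyword

get '/users/:id' => sub {
    # look up one row and hand it to a template
    my $user = database->quick_select('users', { id => route_parameters->get('id') });
    template 'user', { user => $user };
};

start;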

Today you might wonder why you would use Dancer and not Mojolicious, but when Dancer was at the peak of its popularity the field was still wide open. There were plenty of plugins being written and published on CPAN.

Around 2013 Dancer’s development team decided to rewrite it to make it better. The problem was that plugins and templates needed to be adapted as well. I’m under the impression that the energy got divided and the momentum was lost. Now there are two codebases and two plugin namespaces which do basically the same thing, because for the end user there is not much difference.

Catalyst

So what was attracting people to Dancer? When Dancer came out, Perl already had a great MVC framework, Catalyst, which is still around. (And note that the main Mojolicious developer was on the Catalyst team.)

Now, the problem is that to get started with Catalyst, even if it has plenty of documentation, you need to be already acquainted with a lot of concepts and technologies. For example, the tutorial starts to talk about Template Toolkit and the DBIx::Class ORM very early.

These two modules are great and powerful and they deserve to be studied, but for someone new to modern web development, or even to Perl, it feels (and actually is) overwhelming.

So, why would you choose Catalyst today? Catalyst has the stability which Mojo, at least at the beginning, lacked, while backward compatibility is a priority for Catalyst. The other way to look at this is that Catalyst doesn’t see much current development, but someone could see this as a feature.

Even if Catalyst predates all the hyper-modern features that Mojo has, it’s still a modern framework, and a good one. I can’t show you a self-contained script (you need a tree of files), but I’d like to show you what makes it very nice and powerful:

package MyApp::Controller::Root;

use Moose;
use namespace::autoclean;

BEGIN { extends 'Catalyst::Controller'; }

# start the chain with /foo/XX
sub foo :Chained('/') CaptureArgs(1) {
    my ($self, $c, $arg) = @_;
    $c->stash(name => "$arg");
}

# /foo/XX/bar/YY
sub bar :Chained('foo') Args(1) {
    my ($self, $c, $arg) = @_;
    $c->detach($c->view('JSON'));
}

# /foo/XX/another/YY
sub another :Chained('foo') Args(1) {
    my ($self, $c, $arg) = @_;
    $c->detach($c->view('HTML'));
}

So, if you hit /foo/marco/bar/test the second path fragment will be processed by the first method (CaptureArgs(1)) and saved in the stash. Then the second bar method will be chained to it and the name will be available in the stash. The last method will be hit with /foo/marco/another/test2. (Incidentally, please note that Mojolicious has nested routes as well.)

Now, I think it’s clear that in this way you can build deep hierarchies of paths with reusable components. This works really great with the DBIx::Class ORM, where you can chain queries as well. As you can imagine, this is far from a simple setup. On the contrary, this is an advanced setup for people who already know their way around web frameworks.
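
To make that comparison concrete, here is a rough sketch of chaining DBIx::Class resultset searches, with made-up table and column names; each search() returns a new, further-refined resultset:

my $rs = $schema->resultset('Ticket')
                ->search({ status   => 'open' })          # first refinement
                ->search({ priority => 'critical' })      # chained onto the previous one
                ->search(undef, { order_by => 'created_at' });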

Conclusion

So, to sum up this excursion in the amazing land of Perl web frameworks: If you build something from scratch, go with Mojolicious. It’s your best bet. If nothing else, it’s super-easy to install, with basically no dependencies.

However, there’s no need to make a religion out of it. Rewriting code without a clear gain is a waste of time and money. A good developer should still be able to write maintainable code with the existing tools.


cgi perl mojolicious catalyst dancer interchange

Quartz Scheduler as a Service

By Kürşat Kutlu Aydemir
April 18, 2022

Close-up view of mechanical watch with roman numerals and day of month and month pointers

Photo by Mat Brown from Pexels

Quartz Job Scheduler

“Quartz is a richly featured, open source job scheduling library that can be integrated within virtually any Java application — from the smallest stand-alone application to the largest e-commerce system.” (Quartz Scheduler overview)

Besides its advanced features, its most basic and most frequently used features are job scheduling and job execution. Some frameworks, like Spring Scheduler, have their own integration with Quartz Scheduler, which allows using its default scheduling method.

In this post I am going to describe a different approach to using Quartz Scheduler to schedule our jobs. We will actually still use Quartz’s existing scheduling mechanism, but I am going to show how we can manage the scheduled and unscheduled jobs online. This way you can manage all the available jobs or create new ones on the fly.

Quartz Scheduler as a Service

Previously I led development of enterprise “Business Service Management” software to replace IBM’s TBSM product at a major telco company in Turkey. This was a challenging project and found a solid place in the customer environment after a successful release.

Scheduled key performance indicator (KPI) retrieval and background reporting jobs were a significant part of this project. KPIs were either internal business service availability and health metrics, or measured metrics calculated and stored in external data sources. Reports are another type of schedulable job, as many organizations need data to be reported at certain intervals.

In an enterprise web application with such needs you would need to allow your customer to create their own customized scheduled jobs (KPIs, reports, etc.) in an easily manageable way. For this I came up with a simple solution by blending the existing Quartz Scheduler scheduling mechanism with some spice.

So here is the model we used:

  • A database table for creating/​updating scheduler job definitions
  • Observer Scheduler Job for observing the scheduler job table to watch for any updates in the scheduled jobs: new job, updated job, or disabled job, etc.
  • Business Job: You might define several schedulable business job types. KPI is one of those and I am going to give an example of it.

Simplicity should be a design goal; however, the details can have their complexities.

This design doesn’t replace or provide an alternative to how Quartz Scheduler schedules its jobs. That is a matter of job persistence and is out of this article’s scope. I am assuming we are scheduling all the jobs in Quartz Scheduler’s own job store, for example the default in-memory RAMJobStore.

Read and Manage Job Data

Ideally you would store and manage the jobs as services in a database, and then connect to this job storage either via a DB connection or an API. For security reasons, even if you think your application and services are internal and fully authenticated and authorized, you should still perform DB operations via APIs. But from a capability perspective, yes, you can read and manage a data store in many ways.

For this simple project I am not going to use a database, but instead a JSON file as the job service definitions repository. You can easily convert this method to use a database or an API.

I am going to use a JSON file named kpi.json in my project and define a simple set of attributes for each KPI item. Any service or scheduled job can have more or fewer attributes according to the requirements of the business use case.

Spring Application

You can use any framework, or none at all, to create your application from scratch and build a JAR. For this project I chose to go with the Spring framework. You can also simply initialize a Spring application using Spring Initializr.

Design

Based on the model I suggested above as a scheduling service solution, here is a high-level design.

Quartz Scheduler service model diagram

The overall solution would have a data storage for holding scheduled job service definitions and a UI for managing their attributes like enabling/​disabling or changing scheduling dates etc.

In this solution we have two different Quartz Scheduler job types: observer job and business job. Observer job is a single job triggered frequently, say, every 5 seconds or every 1 minute, and checks the existing job definitions in the job storage. If it sees any update on the job definitions or new jobs it behaves accordingly. Business jobs are the job definitions found in job storage and designed to perform certain business actions. The business jobs can be notification jobs, KPI measuring jobs, and any other scheduled business jobs which should have their own scheduling interval.

In this example project I specifically used the term KPI for the business case just to make it more relevant.

Scheduler

The KPIJobWatcher class is responsible for scheduling the observer job. At Spring application startup this is going to be our entry point into the scheduling service management.

Spring Application Startup

@SpringBootApplication
public class QSchedulerApplication {

	public static void main(String[] args) {
		SpringApplication.run(QSchedulerApplication.class, args);
	}

	Scheduler kpiScheduler;

	@EventListener(ApplicationReadyEvent.class)
	public void onAppStartUp() {
		try {
			// initializing KPI Trigger
			SchedulerFactory sf = new StdSchedulerFactory();
			kpiScheduler = sf.getScheduler();

			// watcher runs an observer job which monitors and manages KPI jobs
			KPIJobWatcher watcher = new KPIJobWatcher(kpiScheduler);
			watcher.run();
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

KPIJobWatcher schedules the Observer job:

public class KPIJobWatcher {
    private static final Logger logger = LoggerFactory.getLogger(KPIJobWatcher.class);

    Scheduler kpiScheduler;
    public KPIJobWatcher(Scheduler s) {
        kpiScheduler = s;
    }

    /**
     * run KPIJobWatcher
     * @throws Exception
     */
    public void run() throws Exception {

        try {
            // Setting the KPI Job factory of observerScheduler
            KPIJobFactory jf = new KPIJobFactory((StdScheduler)kpiScheduler);
            kpiScheduler.setJobFactory(jf);

            // Scheduling KPI Observer Job
            JobDetail observerJob = newJob(KPIObserverJob.class)
                    .withIdentity("observerJob", "observergroup")
                    .build();

            SimpleTrigger trigger = newTrigger()
                    .withIdentity(observerJob.getKey() + "_trigger", "observergroup")
                    .withSchedule(org.quartz.SimpleScheduleBuilder.simpleSchedule()
                            .withIntervalInSeconds(10)
                            .repeatForever())
                    .build();

            Date ft = kpiScheduler.scheduleJob(observerJob, trigger);
            logger.info(observerJob.getKey() + " has been scheduled to run at: " + ft);

            // Starting KPI Observer Scheduler
            kpiScheduler.start();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Before moving on to the observer job, I want to note that you can use a custom JobFactory and attach it to the current scheduler object, so that custom jobs with custom constructors are created within this custom JobFactory, following the factory design pattern.

JobFactory

public class KPIJobFactory implements JobFactory {
    Scheduler kpiScheduler;
    public KPIJobFactory(Scheduler s) {
        kpiScheduler = s;
    }

    public KPIObserverJob newJob(TriggerFiredBundle bundle, Scheduler scheduler) throws SchedulerException {

        JobDetail jobDetail = bundle.getJobDetail();
        Class<KPIObserverJob> jobClass = (Class<KPIObserverJob>) jobDetail.getJobClass();
        try {
            // this is how we construct our custom job with our custom factory
            return jobClass.getConstructor(scheduler.getClass()).newInstance(kpiScheduler);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}

Observer Job

As suggested in the model above the observer job is triggered frequently and manages the overall scheduling job service in the background. I created the KPIObserverJob class as the observer job in this project and as you can see in the previous section KPIJobFactory creates instances of this observer job.

KPIObserverJob

Observer Job has some specific methods like ScheduleJob and UnscheduleJob to manage scheduling jobs.

public class KPIObserverJob implements Job {

    private static final Logger logger = LoggerFactory.getLogger(KPIObserverJob.class);

    List<JobDetail> jobList;
    Scheduler kpiScheduler;

    public KPIObserverJob(StdScheduler s) {
        kpiScheduler = s;
    }

    List<String> scheduledJobList;
    HashMap<String, JobDetail> alreadyScheduledJobList;

    String cronFormat = "SECOND MINUTE HOUR DAY_OF_MON MONTH DAY_OF_WEEK";

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        scheduledJobList = new ArrayList<String>();
        alreadyScheduledJobList = new HashMap<String, JobDetail>();
        //JobKey jobKey = context.getJobDetail().getKey();

        jobList = new ArrayList<JobDetail>();

        // Get all KPIs and create their jobs
        CreateJobs();

        // Get the list of currently scheduled KPI jobs
        try {
            for (String groupName : kpiScheduler.getJobGroupNames()) {
                for (JobKey jk : kpiScheduler.getJobKeys(GroupMatcher.jobGroupEquals(groupName))) {
                    String jobName = jk.getName();
                    String jobGroup = jk.getGroup();

                    scheduledJobList.add(jobName);

                    JobDetail jd = kpiScheduler.getJobDetail(jk);
                    alreadyScheduledJobList.put(jobName, jd);

                    logger.info("already scheduled jobName {}", jobName);
                }
            }
        } catch (SchedulerException e) {
            e.printStackTrace();
        }

        // Schedule or unschedule KPI jobs if not done yet
        for (JobDetail job : jobList) {
            try {
                if (!scheduledJobList.contains(job.getKey().getName()))
                {
                    if (job.getJobDataMap().getInt("isRunning") == 1) {
                        logger.info("scheduling job: kpiJobName_{}", job.getJobDataMap().getString("kpiName"));
                        ScheduleJob(job);
                    }
                } else {
                    // Check any changes in the KPI job definition
                    JobDetail sJD = alreadyScheduledJobList.get("kpiJobName_" + job.getJobDataMap().getString("kpiName"));
                    if (!job.getJobDataMap().getString("cron").equals(sJD.getJobDataMap().getString("cron"))) {
                        logger.info("rescheduling job: kpiJobName {} , new cron: {}",
                                job.getJobDataMap().getString("kpiName"), job.getJobDataMap().getString("cron"));
                        UnscheduleJob(job.getJobDataMap().getString("kpiName"));
                        ScheduleJob(job);
                    }

                    if (job.getJobDataMap().getInt("isRunning") == 0) {
                        logger.info("Unscheduling: kpiJobName {}" + job.getJobDataMap().getString("kpiName"));
                        UnscheduleJob(job.getJobDataMap().getString("kpiName"));
                    }
                }
            } catch (SchedulerException e) {
                e.printStackTrace();
            }
        }

        // Finally unschedule deleted jobs if they are not listed anymore
        for (String kpiName : scheduledJobList) {
            boolean unschedule = true;
            if (!kpiName.equals("observerJob")) {
                JobDetail toBeRemovedJob = null;
                for (JobDetail jdetail : jobList) {
                    if (jdetail.getKey().getName().equals(kpiName)) {
                        unschedule = false;
                    }
                }

                if (unschedule) {
                    logger.info("Unscheduling: " + "kpiJobId" + kpiName.split("_")[1]);
                    UnscheduleJob(kpiName.split("_")[1]);
                }
            }
        }
    }

    private static final Type KPI_JSON_TYPE = new TypeToken<List<KPI_JSON>>() {}.getType();

    /**
     * Create Quartz Scheduler jobs from the job records read from a data source
     */
    private void CreateJobs() {

        Gson gson = new Gson();
        try {
            // kpi.json as a service data storage where we get KPI job data to be scheduled
            JsonReader reader = new JsonReader(new FileReader("kpi.json"));
            List<KPI_JSON> kpiList = gson.fromJson(reader, KPI_JSON_TYPE);

            for (KPI_JSON kpiItem : kpiList) {
                logger.info("Found KPI in kpi.json: {} , enabled: {}", kpiItem.getName(), kpiItem.getIsRunning());

                JobDetail job = newJob(KPIJSONJob.class)
                        .withIdentity("kpiJobName_" + kpiItem.getName(), "kpigroup")
                        .usingJobData("kpiName", kpiItem.getName())
                        .usingJobData("cron", kpiItem.getCron())
                        .usingJobData("lastRan", kpiItem.getLastRan())
                        .usingJobData("kpiDescription", kpiItem.getKpiDescription())
                        .usingJobData("lastMeasuredValue", kpiItem.getLastMeasuredValue())
                        .usingJobData("filename", kpiItem.getFilename())
                        .usingJobData("type", kpiItem.getType())
                        .usingJobData("isRunning", kpiItem.getIsRunning())
                        .build();

                jobList.add(job);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Schedule a job
     * @param job
     * @throws SchedulerException
     */
    private void ScheduleJob(JobDetail job) throws SchedulerException {
        String cron = job.getJobDataMap().getString("cron");
        CronTrigger trigger = newTrigger()
                .withIdentity(job.getKey().getName() + "_trigger", "kpigroup")
                .withSchedule(cronSchedule(cron))
                .startNow()
                .build();
        Date ft = kpiScheduler.scheduleJob(job, trigger);
    }

    /**
     * Unschedule a job
     * @param kpiName
     */
    private void UnscheduleJob(String kpiName) {
        TriggerKey tk = new TriggerKey("kpiJobName_" + kpiName + "_trigger", "kpigroup");
        try {
            kpiScheduler.unscheduleJob(tk);
            kpiScheduler.deleteJob(new JobKey("kpiJobName_" + kpiName, "kpigroup"));
        } catch (SchedulerException e) {
            e.printStackTrace();
        }
    }
}

Business Jobs

Business jobs, as suggested in the model, can be any schedulable jobs. Being able to manage and update the business jobs frequently is a key point here. As enterprise demands grow and change continuously, KPIs are generated at intervals (daily, weekly, monthly, etc.), and for frequent notification needs this kind of scheduling job management can be an important part of a solution.

Here I created KPIJSONJob as my business job:

public class KPIJSONJob implements Job {
    private static final Logger logger = LoggerFactory.getLogger(KPIJSONJob.class);
    private KPI_JSON kpi;

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        JobDataMap dataMap = context.getJobDetail().getJobDataMap();

        // build the KPI object from the job's data map
        kpi = new KPI_JSON();
        kpi.setName(dataMap.getString("kpiName"));
        kpi.setKpiDescription(dataMap.getString("kpiDescription"));
        kpi.setIsRunning(dataMap.getInt("isRunning"));
        kpi.setFilename(dataMap.getString("filename"));
        kpi.setCron(dataMap.getString("cron"));
        kpi.setLastRan(dataMap.getString("lastRan"));
        kpi.setType(dataMap.getString("type"));
        kpi.setLastMeasuredValue(dataMap.getString("lastMeasuredValue"));

        this.processKPI();
    }

    public class KPIMeasured {
        public String name;
        public String value;
    }

    private static final Type KPIMEASURED_TYPE = new TypeToken<List<KPIMeasured>>() {}.getType();
    protected void processKPI() {
        // processKPI is supposed to get the KPI measured value from an external datasource and updates kpi.json
        // ...
    }

}
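
For reference, the KPI_JSON class used above is essentially a plain data holder that Gson fills from kpi.json. Here is a minimal sketch, with the fields inferred from the getters and setters called in the code above; the real class in the project may differ:

public class KPI_JSON {
    // field names match the attributes in kpi.json
    private String name;
    private String type;
    private String cron;
    private int isRunning;
    private String lastRan;
    private String kpiDescription;
    private String lastMeasuredValue;
    private String filename;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getType() { return type; }
    public void setType(String type) { this.type = type; }
    public String getCron() { return cron; }
    public void setCron(String cron) { this.cron = cron; }
    public int getIsRunning() { return isRunning; }
    public void setIsRunning(int isRunning) { this.isRunning = isRunning; }
    public String getLastRan() { return lastRan; }
    public void setLastRan(String lastRan) { this.lastRan = lastRan; }
    public String getKpiDescription() { return kpiDescription; }
    public void setKpiDescription(String kpiDescription) { this.kpiDescription = kpiDescription; }
    public String getLastMeasuredValue() { return lastMeasuredValue; }
    public void setLastMeasuredValue(String lastMeasuredValue) { this.lastMeasuredValue = lastMeasuredValue; }
    public String getFilename() { return filename; }
    public void setFilename(String filename) { this.filename = filename; }
}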

Running this Solution

Let’s give it a try and see it in action. Say we have the kpi.json as our job storage, with the following KPI jobs defined:

[
  {
    "name": "critical_ticket_count",
    "type": "JSON_FILE",
    "cron": "0 0 1 ? * * *",
    "isRunning": 0,
    "lastRan": "2022-04-01 01:00:00",
    "kpiDescription": "Open critical ticket count",
    "lastMeasuredValue": "7",
    "filename": "kpi_measured.json"
  },
  {
    "name": "failed_customer_api_call",
    "type": "JSON_FILE",
    "cron": "0 0 2 ? * * *",
    "isRunning": 0,
    "lastRan": "2022-04-01 02:00:00",
    "kpiDescription": "Last 24-Hour failed API call count",
    "lastMeasuredValue": "23",
    "filename": "kpi_measured.json"
  }
]

When we run the Spring application it starts logging like below:

2022-04-11 13:48:12.354  INFO 13033 --- [eduler_Worker-1] com.example.qscheduler.KPIObserverJob    : Found KPI in kpi.json: critical_ticket_count , enabled: 0
2022-04-11 13:48:12.355  INFO 13033 --- [eduler_Worker-1] com.example.qscheduler.KPIObserverJob    : Found KPI in kpi.json: failed_customer_api_call , enabled: 0
2022-04-11 13:48:12.355  INFO 13033 --- [eduler_Worker-1] com.example.qscheduler.KPIObserverJob    : already scheduled jobName observerJob

Initially I set the isRunning attribute of those KPI jobs to 0 and my scheduler service is not scheduling them. My KPIObserverJob triggers every 10 seconds because I set it to trigger that way in KPIJobWatcher.

Now let’s see if I update critical_ticket_count KPI’s isRunning value to 1:

2022-04-11 13:52:32.347  INFO 13033 --- [eduler_Worker-7] com.example.qscheduler.KPIObserverJob    : Found KPI in kpi.json: critical_ticket_count , enabled: 1
2022-04-11 13:52:32.348  INFO 13033 --- [eduler_Worker-7] com.example.qscheduler.KPIObserverJob    : Found KPI in kpi.json: failed_customer_api_call , enabled: 0
2022-04-11 13:52:32.348  INFO 13033 --- [eduler_Worker-7] com.example.qscheduler.KPIObserverJob    : already scheduled jobName observerJob
2022-04-11 13:52:32.349  INFO 13033 --- [eduler_Worker-7] com.example.qscheduler.KPIObserverJob    : scheduling job: kpiJobName_critical_ticket_count

As you can see from the logs ObserverJob noticed the enabled job and scheduled it.

Let’s change the cron scheduling rule of critical_ticket_count job to 0 0 3 ? * * * and see the logs again:

2022-04-11 13:55:12.354  INFO 13033 --- [eduler_Worker-3] com.example.qscheduler.KPIObserverJob    : Found KPI in kpi.json: critical_ticket_count , enabled: 1
2022-04-11 13:55:12.355  INFO 13033 --- [eduler_Worker-3] com.example.qscheduler.KPIObserverJob    : Found KPI in kpi.json: failed_customer_api_call , enabled: 0
2022-04-11 13:55:12.356  INFO 13033 --- [eduler_Worker-3] com.example.qscheduler.KPIObserverJob    : already scheduled jobName observerJob
2022-04-11 13:55:12.356  INFO 13033 --- [eduler_Worker-3] com.example.qscheduler.KPIObserverJob    : already scheduled jobName kpiJobName_critical_ticket_count
2022-04-11 13:55:12.356  INFO 13033 --- [eduler_Worker-3] com.example.qscheduler.KPIObserverJob    : rescheduling job: kpiJobName critical_ticket_count , new cron: 0 0 3 ? * * *

The observer job has now rescheduled the job since we changed its cron rule. This is how the observer job manages the KPI business jobs. If you have more attributes, or want your observer job to reschedule or perform different operations when business job definitions are updated, you should extend your ObserverJob accordingly.

Extend by Creating a Management UI

Managing the scheduler jobs through a UI is not in the scope of this post. But that is not much different from managing any other data in a web application. I encourage you to do your own implementation if this solution sounds useful to you.

Conclusion

This approach helps you build your own scheduling job management on the fly and lets you create, update, or delete Quartz Scheduler jobs dynamically.

The complete implementation can be found in the GitHub project.


java development automation

Job opening: VisionPort support engineer (western U.S.)

By Alejandro Ramon
April 14, 2022

VisionPort cabinet with 7 HDTV screens in portrait orientation

We are looking for an engineer to join the End Point Immersive and Geospatial Support (I+G) Team—​a small, multidisciplinary team that supports our company’s clients with their VisionPort systems incorporating Liquid Galaxy technology. VisionPort hardware consists of large-panel HD TVs within a curved panoramic environment, supported by a server stack with power, video, audio, and network connections and equipment.

The candidate will be based out of a home office in the western United States (Washington, Oregon, California, Idaho, Utah, Nevada, Arizona, Montana, Wyoming, New Mexico, and Texas). The engineer will be asked to travel to, perform, and supervise system installations, in addition to day-to-day remote support work from a home office.

Occasional evenings and weekend on-call shifts are shared amongst the team.

This is a great entry-level opportunity for people already familiar with light audiovisual, server room, and/or installation handiwork to get experience with all aspects of production computer systems and their deployment. More experienced individuals will have the opportunity to work directly in feature development on production systems and possibly assist with other ongoing consulting projects the I+G team takes on.

Overview

  • Job Level: Entry-level or experienced, full-time or part-time.
  • Location: Remote work with occasional on-site.
  • Environment/​Culture: Casual, remote management, lots of video meetings.
  • Benefits: For full-time employees, paid vacation and holidays, 401(k), health insurance.

Core responsibilities

  • Support clients over email and video calls, troubleshooting system features or content.
  • Build server stacks, troubleshoot hardware, and fix software issues during installation.
  • Track, monitor, and resolve system issues on production computer systems.
  • Report and document issues and fixes.
  • Adapt to differing needs day to day!

Preferred skills and experience

  • A driver’s license is required for travel.
  • Comfortable with basic installation tools: hand drill, cable dressing, cable pulls, crimping, etc.
  • Experience on an active construction or office renovation site.
  • Working familiarity with computers, general exposure to internal components and software.
  • Self-driven, results-, detail-, and team-oriented, follows through on problems.
  • Familiarity with Linux basics, or willingness to learn.

What work here offers:

  • Flexible work hours
  • Annual bonus opportunity
  • Freedom from being tied to an office location
  • Collaboration with knowledgeable, friendly, helpful, and diligent co-workers around the world

About End Point

End Point is the leading global agency for developing, deploying, and supporting VisionPort systems. With over 200 installations and events worldwide, our team of engineers and content developers has a deep knowledge of the hardware, software, and aesthetics that can make a VisionPort come to life as a dazzling data, business, education, or presentation tool.

End Point was founded in 1995 and now has over 60 engineers based throughout North America, Europe, and Asia. This team brings decades of experience in systems management, database programming, web platform development, application development, and more specifically, VisionPort development and support.

Get in touch with us

Please email us an introduction to jobs@visionport.com to apply!

Include your location, resume, LinkedIn URL (if any), and whatever else may help us get to know you. You can expect to interview with the Support Lead and Team Manager.

We look forward to hearing from you! Direct work seekers only, please—​this role is not for agencies or subcontractors.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of sex/​gender, race, religion, color, national origin, sexual orientation, age, marital status, veteran status, or disability status.


jobs visionport

On Shapefiles and PostGIS

Josh Tolley

By Josh Tolley
April 2, 2022

Partial map of the voyage of the Endurance, from “South”, by Ernest Shackleton

The shapefile format is commonly used in geospatial vector data interchange, but as it’s managed by a commercial entity, Esri, and as GIS is a fairly specialized field, and perhaps because the format specification is only “mostly open”, these files can sometimes be confusing to the newcomer. Perhaps these notes can help clarify things.

Though the name “shapefile” would suggest a single file in filesystem parlance, a shapefile in fact requires at least three different files, with the filename extensions .shp, .shx, and .dbf, stored in the same directory. The term “shapefile” often refers to that directory, or to an archive such as a zipfile or tarball containing that directory.
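
For example, a hypothetical parcels shapefile might look like this on disk (the .prj and .cpg files are optional companions that carry projection and character-encoding information):

ls parcels/
parcels.dbf  parcels.prj  parcels.shp  parcels.shx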

QGIS

QGIS is an open-source package to create, view, and process GIS data. One good first step with any shapefile, or indeed any GIS data, is often to take a look at it. Simply tell QGIS to open the shapefile directory. It may help to add other layers, such as one of the world map layers QGIS provides by default, to see the shapefile data in context.

GDAL

Though QGIS can convert between GIS formats itself, I prefer working in a command-line environment. The GDAL software suite aims to translate GIS data between many available formats, including shapefiles. I most commonly use its ogr2ogr command-line utility, along with the excellent accompanying manpage.

In short, a typical ogr2ogr command tells the utility where to find the input data and where to put the converted output, optionally with various reformatting and processing options. You’ll find some examples below.

PostGIS

Much of our (ok, my) GIS work has involved PostGIS, an extension to the PostgreSQL database for handling GIS data. It’s been convenient for me to process GIS data using the same language and tools I use to process other data. It uses GDAL’s libraries internally.

Examples

Import Shapefile data into PostGIS

The example below comes from a customer’s project we recently worked on. They provided us a set of several shapefiles, which I first arranged in a directory structure. This code imports each of them into a PostGIS database, in the shapefiles schema.

The other arguments to ogr2ogr specify the output format (“PostgreSQL”), the destination database name, and the directory which stores the shapefile. ogr2ogr expects the destination and source arguments in that order, as two positional arguments, so here the destination is PG:dbname=destdb, and the source file name comes from the $i script variable.

for i in $(find . -name "*.shp"); do
    j=$(basename "$i")
    k=${j%.shp}
    ogr2ogr -f PostgreSQL -nln "shapefiles.${k}" PG:dbname=destdb "$i"
done
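
If the import worked, each shapefile should show up as a table in the shapefiles schema. A quick way to check, assuming you can connect to destdb locally, is:

psql destdb -c '\dt shapefiles.*'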

Export PostGIS data as KML

This example creates a KML file from PostGIS query results. The arguments provide the query to use to fetch the data, the output format (“KML”), the output file name, and the source database. This will create a KML file containing a set of unstyled placemarks, with names from the property_code column, and geometry data from the outline_geom column in the properties table of our database.

In this project, outline_geom contained GIS “linestrings”, data types consisting of a series of lines, which ogr2ogr translated into KML polygons. Had outline_geom contained points, for instance, the KML result would also have been points. In other words, ogr2ogr automatically chooses the correct KML object type based on the GIS object type in the input data.

ogr2ogr -sql "select property_code, outline_geom from properties" -f KML outlines.kml PG:dbname=properties

Note that though the examples above use PostGIS, ogr2ogr can take shapefile input and produce KML output directly without the PostGIS intermediary. We used PostGIS in these cases for other purposes, such as to filter the output and limit the attributes stored in the KML result.

By default, ogr2ogr puts all the attributes from the shapefile into ExtendedData elements in the KML, but in our case we didn’t want those. We also didn’t want all the entries in the shapefile in our resulting KML. To skip the PostGIS step, we might do something like this:

ogr2ogr -f kml output.kml shapefile_directory/
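
If you also want to drop the ExtendedData attributes or limit which features end up in the KML without the PostGIS step, ogr2ogr’s -select and -where options can do some of that work. The field name below is hypothetical and would need to match an attribute in your shapefile:

ogr2ogr -f KML -select property_code -where "property_code IS NOT NULL" output.kml shapefile_directory/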

What tools do you use for shapefile processing? Please let us know!


tips open-source tools gis maps postgres

Extending Your Jetty Distribution’s Capabilities

Kürşat Kutlu Aydemir

By Kürşat Kutlu Aydemir
March 31, 2022

Jetty Logo

What is Jetty?

“Jetty is a lightweight highly scalable Java-based web server and servlet engine.” (Jetty Project)

Jetty can run standalone or embedded in a Java application; details about running a Jetty web server can be found in the Jetty project Git repository and in its documentation. The Jetty project has been hosted at the Eclipse Foundation since 2009 (Jetty, Eclipse).

Know Your Jetty

Many legacy environments run an older version of Jetty. If you know which version of the Jetty distribution your environment uses, you can find its source code in the Jetty project GitHub repo. Some distributions appear under the project releases, but most can be found among the Git tags.

For instance, the jetty-9.4.15.v20190215 distribution can be found in the Jetty project tags at this URL: https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.15.v20190215

When you clone the jetty.project Git repo, you can then easily switch to any specific release tag:

$ git clone git@github.com:eclipse/jetty.project.git
$ git checkout jetty-9.4.15.v20190215

Then you can build that version or add your custom code to it.

Extending Your Jetty Capabilities

You might want to build Jetty yourself because your environment runs a specific Jetty version and you want to add custom handlers or wrappers that give it additional capabilities.

Jetty is written in Java and you can add new features or patch your own fork like other open-source Java projects.

Build

Once you have the code base for your target version, you can work on it directly. This is one way to add new features to your Jetty distribution.

After you add your custom code you’ll need to build. You can find the build instructions on the Jetty project’s GitHub home page; the build is simply:

$ mvn clean install

If you want to skip the tests, the option below is your friend:

$ mvn clean install -DskipTests

Compile Classes Individually

This is a trickier way to inject newly created custom classes into your Jetty distribution: instead of building the whole Jetty project, you create individual custom Java classes that consume the Jetty libraries and compile them manually. You don’t need to build the whole project this way.

Coming back to the question: what new features would you want to add to your new or ancient local Jetty distribution? Well, that really depends on the issues you face or the improvements you need.

For one of our customers, we once needed to log request and response headers in Jetty. We couldn’t find an existing way to do that, so I decided to create a custom RequestLog handler class and inject it into the Jetty deployment we already had rather than building the whole project.

Even if you don’t build the whole project, it is still handy to have the full project code available so you can refer to the existing code and write your own by learning how things are already done in the project.

I found the RequestLog interface in the jetty-server sub-project, under the org.eclipse.jetty.server package. There is also a class RequestLogCollection at the same level implementing RequestLog, which may give you some idea of what implementations look like.
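
If you have the matching source tree checked out, a quick search from the repository root will locate it; the path below assumes the standard Maven layout used by the jetty-server module:

$ find . -name RequestLog.java
./jetty-server/src/main/java/org/eclipse/jetty/server/RequestLog.java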

So I followed the structure and created my custom handler in the same level and implemented RequestLog. Below is a part of my CustomRequestLog class:

package org.eclipse.jetty.server;

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import org.eclipse.jetty.http.pathmap.PathMappings;
import org.eclipse.jetty.util.component.ContainerLifeCycle;
import org.eclipse.jetty.util.log.Log;
import org.eclipse.jetty.util.log.Logger;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.*;

public class CustomRequestLog extends ContainerLifeCycle implements RequestLog
{
    protected static final Logger LOG = Log.getLogger(CustomRequestLog.class);

    private static ThreadLocal<StringBuilder> _buffers = ThreadLocal.withInitial(() -> new StringBuilder(256));

    protected final Writer _requestLogWriter;

    private String[] _ignorePaths;
    private transient PathMappings<String> _ignorePathMap;

    public CustomRequestLog(Writer requestLogWriter)
    {
        this._requestLogWriter = requestLogWriter;
        addBean(_requestLogWriter);
    }

    /**
     * Is logging enabled
     *
     * @return true if logging is enabled
     */
    protected boolean isEnabled()
    {
        return true;
    }

    /**
     * Write requestEntry out. (to disk or slf4j log)
     *
     * @param requestEntry the request entry
     * @throws IOException if unable to write the entry
     */
    public void write(String requestEntry) throws IOException
    {
        _requestLogWriter.write(requestEntry);
    }

    private void append(StringBuilder buf, String s)
    {
        if (s == null || s.length() == 0)
            buf.append('-');
        else
            buf.append(s);
    }

    /**
     * Writes the request and response information to the output stream.
     *
     * @see RequestLog#log(Request, Response)
     */
    @Override
    public void log(Request request, Response response)
    {
        try
        {
            if (_ignorePathMap != null && _ignorePathMap.getMatch(request.getRequestURI()) != null)
                return;

            if (!isEnabled())
                return;

            StringBuilder buf = _buffers.get();
            buf.setLength(0);

            Gson gsonObj = new GsonBuilder().disableHtmlEscaping().create();
            Map<String, Object> reqLogMap = new HashMap<String, Object>();
            Map<String, String> reqHeaderMap = new HashMap<String, String>();
            // epoch timestamp
            reqLogMap.put("timestamp_epoch", System.currentTimeMillis());

            // timestamp
            DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");
            String nowAsString = df.format(new Date());
            reqLogMap.put("timestamp", nowAsString);

            // request headers
            List<String> reqHeaderList = Collections.list(request.getHeaderNames());
            for(String headerName : reqHeaderList) {
                reqHeaderMap.put(headerName.toLowerCase(), request.getHeader(headerName));
            }
            reqLogMap.put("request_headers", reqHeaderMap);

            // response headers
            Map<String, String> resHeaderMap = new HashMap<String, String>();
            for(String headerName : response.getHeaderNames()) {
                resHeaderMap.put(headerName.toLowerCase(), response.getHeader(headerName));
            }
            reqLogMap.put("response_headers", resHeaderMap);

            // http method
            reqLogMap.put("http_method", request.getMethod());

            // original URI
            reqLogMap.put("original_uri", request.getOriginalURI());

            // protocol
            reqLogMap.put("protocol", request.getProtocol());

            // http status
            reqLogMap.put("http_status", response.getStatus());

            // query string
            reqLogMap.put("query_string", request.getQueryString());

            String reqJSONStr = gsonObj.toJson(reqLogMap);
            buf.append(reqJSONStr);

            String log = buf.toString();
            write(log);
        }
        catch (IOException e)
        {
            LOG.warn(e);
        }
    }
}

In this custom RequestLog class, the most interesting part is the public void log(Request request, Response response) method, where the logging actually happens. You can simply override the existing logging behaviour and record anything you want; here I added the raw request and response headers coming and going through the Jetty server.

Now it is time to compile this class. You can find many tutorials about compiling a single Java class with an explicit classpath. Here’s how I did it:

$ javac -cp ".:$JETTY_HOME/lib/jetty-server-9.4.15.v20190215.jar:$JETTY_HOME/lib/jetty-http-9.4.15.v20190215.jar:$JETTY_HOME/lib/jetty-util-9.4.15.v20190215.jar:$JETTY_HOME/lib/servlet-api-3.1.jar:$JETTY_HOME/lib/gson-2.8.2.jar" CustomRequestLog.java

If you look at my classpath, you’ll see I even added a third-party library, gson-2.8.2.jar, since I also used it in my custom code. Remember to put this JAR under your $JETTY_HOME/lib directory as well, where the classpath above expects it.
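
For example, assuming you downloaded the Gson JAR to your current directory (a hypothetical location), copying it alongside the other Jetty libraries might look like this:

$ cp gson-2.8.2.jar "$JETTY_HOME/lib/"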

The javac command generates the CustomRequestLog.class file, which is now ready to be injected. So where does it need to go?

Since I followed the location and package of the RequestLog interface, the compiled class belongs in the same project JAR, jetty-server.jar; in my environment that is jetty-server-9.4.15.v20190215.jar. I also added the other required dependencies to the classpath to compile this code.

Now, I want to inject CustomRequestLog.class into jetty-server-9.4.15.v20190215.jar. I copied the JAR into a temporary directory and extracted its contents there using this command:

$ jar xf jetty-server-9.4.15.v20190215.jar

This command extracts the entire contents of the JAR file, including resource files and the classes in their corresponding directory structure, org/eclipse/jetty/server. You will see RequestLog.class extracted into this directory as well.

Now we simply copy our CustomRequestLog.class into the extracted org/eclipse/jetty/server directory and pack up the JAR file again by running this command:

$ jar cvf jetty-server-9.4.15.v20190215.jar org/ META-INF/

This command re-bundles the compiled code along with the other extracted resources (in this case only the META-INF/ directory) and creates our injected JAR file. It’s best to create the injected Jetty JAR in the temp directory so you can keep a backup of the original JAR before replacing it.
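
Before swapping the new JAR into place, it’s worth confirming the class actually made it into the archive:

$ jar tf jetty-server-9.4.15.v20190215.jar | grep CustomRequestLog

This should list org/eclipse/jetty/server/CustomRequestLog.class.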

For this specific case I added the custom RequestLog handler to my Jetty config file, jetty.xml. That step may not apply to every custom change or extension you add to your Jetty instance.

Here is an example RequestLog config entry for this custom handler:

<Set name="RequestLog">
  <New id="RequestLog" class="org.eclipse.jetty.server.CustomRequestLog">
    <!-- Writer -->
    <Arg>
      <New class="org.eclipse.jetty.server.AsyncRequestLogWriter">
        <Arg>
          <Property name="jetty.base" default="." />/
          <Property>
            <Name>jetty.requestlog.filePath</Name>
            <Default>
              <Property name="jetty.requestlog.dir" default="logs"/>/yyyy_mm_dd.request.log
            </Default>
          </Property>
        </Arg>
        <Arg/>
        <Set name="filenameDateFormat">
          <Property name="jetty.requestlog.filenameDateFormat" default="yyyy_MM_dd"/>
        </Set>
        <Set name="retainDays">
          <Property name="jetty.requestlog.retainDays" default="90"/>
        </Set>
        <Set name="append">
          <Property name="jetty.requestlog.append" default="false"/>
        </Set>
        <Set name="timeZone">
          <Property name="jetty.requestlog.timezone" default="GMT"/>
        </Set>
      </New>
    </Arg>
  </New>
</Set>

That’s all.


java jetty development