Creating Telegram bots with Google Apps Script

Empty clothes in the shape of a human sitting on a bench with fall leaves

In a previous post on this blog, Afif wrote about how to use Google Apps Script with Google Forms. Coincidentally, last year I learned a bit about how to use Google Apps Script with Telegram Bot as a personal ledger tool, as outlined in this post by Mars Escobin.

The Telegram Bot I created from Mars’s code looks like this:

In this post I will share how to adapt Mars’s code so that a Telegram bot takes input from the user, has Google Apps Script call Google’s cloud-based services (translation and finance), and returns the output to the user.

The initial process of creating a Telegram bot is outlined on Telegram’s website. After receiving the bot’s API token from Telegram, we can use it inside our Google Apps Script editor.
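
One step worth highlighting: after deploying the Apps Script project as a web app, we also need to register the web app URL as the bot’s webhook so that Telegram forwards incoming messages to our script. The setWebhook helper below (the same one that appears in the currency bot script later in this post) takes care of that; run it once from the Apps Script editor after deployment:

function setWebhook() {
  var url = telegramUrl + "/setWebhook?url=" + webAppUrl;
  var response = UrlFetchApp.fetch(url);
  Logger.log(response.getContentText());
}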

Translation Bot

One way to combine a Telegram bot with Google’s cloud services that came to mind was a translation bot. Although there are undoubtedly tons of mobile apps out there which do the same thing, I wanted to learn about it by using Telegram. So I looked for a class that I could use in Apps Script to make that happen.

Google Apps Script can tap into Google Translate through the LanguageApp class. The translation is invoked with LanguageApp.translate(text, sourceLanguage, targetLanguage).
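
For example, a hypothetical call translating an English word to Spanish (using ISO language codes, just as the bot will later receive them from the user) looks like this:

var translated = LanguageApp.translate("hello", "en", "es"); // e.g. "hola"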

In addition to fetching the input from the users, I also wanted to store the searched word inside a Google Sheets spreadsheet.

So by using this:

sheet.appendRow([formattedDate, item[0], item[1], item[2], myTranslationOutput]);

we can append the searched word to the Google spreadsheet, along with the rest of the user’s input and the translation output.

In my case, I removed the inline keyboard function from Mars’s code since I didn’t plan to use it in my bot. Instead, I use the inputs sent through the item variable and let LanguageApp translate them. The translation can take a few seconds, so if we do not put an interval in our code, sendMessage will reply with an empty translation.

In order to handle this, I call the Utilities.sleep() function before returning sendMessage() so that the answer is ready before the output goes back to the requester.

The following are my changes for creating the translation bot:

var token = "<insert your Telegram API token here>";
var telegramUrl = "https://api.telegram.org/bot" + token;
var webAppUrl = "<insert your webAppURL which is generated from Google Apps Script UI here>";

function sendMessage(id, text) {
  var data = {
    method: "post",
    payload: {
      method: "sendMessage",
      chat_id: String(id),
      text: text,
      parse_mode: "HTML"
    }
  };
  UrlFetchApp.fetch('https://api.telegram.org/bot' + token + '/', data);
}

function doPost(e) {
  var contents = JSON.parse(e.postData.contents);
  var ssId = "<insert the spreadsheet ID here, you can get it from the browser URL bar>";
  var sheet = SpreadsheetApp.openById(ssId).getSheetByName("<sheet name here>");

  if (contents.message) {
    var id = contents.message.from.id;
    var text = contents.message.text;

    if (text.indexOf(",") !== -1) {
      var dateNow = new Date;
      var formattedDate = dateNow.getDate() + "/" + (dateNow.getMonth() + 1);
      var item = text.split(",");
      var myTranslationOutput = LanguageApp.translate(item[0], item[1], item[2]);

      sheet.appendRow([formattedDate, item[0], item[1], item[2], myTranslationOutput]);

      Utilities.sleep(200);
      return sendMessage(id, "The translation of " + item[0] + " is " + myTranslationOutput);
    } else {
      return sendMessage(id, "The word that you key in will be kept for our analysis purpose\nPlease use this format : word, source language code, target language code \nRefer cloud.google.com/translate/docs/languages");
    }
  }
}

Currency Converter Bot

There are at least two possible ways to create a currency converter bot with Telegram and Google Apps Script. I could either make a curl-like call to an external API (not related to Google) or just use whatever Google Finance offers. I did some searching but I could not find any built-in class to do the conversion within the code. However, I remembered that Google Sheets can call Google Finance from within a cell. So I decided to let Sheets do the conversion, then fetch the result and return it to the requester.

This is shown in the following snippet:

sheet.getRange('a2').setValue(item[0]);
sheet.getRange('b2').setValue(item[1]);
sheet.getRange('c2').setValue(item[2]);
sheet.getRange('d2').setValue('=GOOGLEFINANCE("currency:"&b2&c2)*a2');

And later we fetch the value from the d2 cell with:

var value = SpreadsheetApp.getActiveSheet().getRange('d2').getValue();

As the default result carries many decimal places, I fixed it to two:

value = value.toFixed(2);

I then returned the value, converting the currency codes to uppercase with .toUpperCase():

return sendMessage(id, item[1].toUpperCase() + " " + item[0] + " = " + item[2].toUpperCase() + " " + value);

You can see my changes in the following scripts:

var token = "<insert your Telegram API token here>";
var telegramUrl = "https://api.telegram.org/bot" + token;
var webAppUrl = "<insert your webAppURL which is generated from Google Apps Script UI here>";

function setWebhook() {
  var url = telegramUrl + "/setWebhook?url=" + webAppUrl;
  var response = UrlFetchApp.fetch(url);
  Logger.log(response.getContentText());
}


function sendMessage(id, text) {
  var data = {
    method: "post",
    payload: {
      method: "sendMessage",
      chat_id: String(id),
      text: text,
      parse_mode: "HTML"
    }
  };
  UrlFetchApp.fetch('https://api.telegram.org/bot' + token + '/', data);
}

function doPost(e) {
  var contents = JSON.parse(e.postData.contents);
  var ssId = "<insert the spreadsheet ID here, you can get it from the browser URL bar>";
  var sheet = SpreadsheetApp.openById(ssId).getSheetByName("<sheet name here>");

  if (contents.message) {
    var id = contents.message.from.id;
    var text = contents.message.text;

    if (text.indexOf(",") !== -1) {
      var dateNow = new Date;
      var formattedDate = dateNow.getDate() + "/" + (dateNow.getMonth() + 1);
      var item = text.split(",");

      sheet.getRange('a2').setValue(item[0]);
      sheet.getRange('b2').setValue(item[1]);
      sheet.getRange('c2').setValue(item[2]);
      sheet.getRange('d2').setValue('=GOOGLEFINANCE("currency:"&b2&c2)*a2');

      // Give the GOOGLEFINANCE formula a moment to resolve before reading it back.
      Utilities.sleep(10);

      var value = SpreadsheetApp.getActiveSheet().getRange('d2').getValue();

      sheet.appendRow([formattedDate, item[0], item[1], item[2], value]);

      value = value.toFixed(2);

      return sendMessage(id, item[1].toUpperCase() + " " + item[0] + " = " + item[2].toUpperCase() + " " + value)
    } else {
      return sendMessage(id, "The word that you key in will be kept for our analysis purpose\nPlease use this format : amount, source currency code, target currency code")
    }
  }
}

Special Notes

Throughout the process, I found several ways to refer to the spreadsheet we want from inside the code. For example, we can use:

var ssId = "<spreadsheet's ID>";
var sheet = SpreadsheetApp.openById(ssId).getSheetByName("<the sheet's name>");

This selects a specific sheet, which is useful if your spreadsheet file contains multiple sheets (tabs).

Or we could use SpreadsheetApp.getActiveSpreadsheet() but this method depends on the active sheet inside the spreadsheet’s UI, as described here.
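
For example, if the script is bound to the spreadsheet, something like this minimal sketch reads whichever sheet is currently active:

var activeSheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();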

Nevertheless, both of the methods above are part of the SpreadsheetApp class.

There are many more things that can be done with Google Apps Script. It is really helpful for automating anything that we routinely do across many files. In the examples I gave above, the spreadsheets are only used as placeholders so that I could get the results to return to my Telegram bot.


google-apps-script chat integration

Comparison of Lightweight CSS Frameworks

By Seth Jensen
January 13, 2022

The frame of a house in front of a mountain range

Several months ago I was building a new website and needed a CSS framework. There are many new CSS frameworks around now, so I decided to do several small trial runs and compare them to find the most fitting for our site.

Initially I tried using Foundation, which this site is built on, but I ran into a problem: Foundation and other popular frameworks are powerful, but they have more features than I needed, and they take up too much disk space for a modest website. Foundation is about 6000 lines long and 200kB unminified. It can be shrunk by removing components you don’t need, but the time it takes to slim a large CSS framework down to your needs can be more than it’s worth, especially if you want to change any styling.

Instead of adapting a larger framework to my needs, I looked into lightweight CSS frameworks. These aim to provide boilerplate components and better styling for buttons, forms, tables, and the like, all at the cost of very little disk space.

I like to modify the source CSS as needed, so all of the sizes I list will be unzipped and unminified. They are also from my brief real-world testing on one machine, so your mileage may vary.

Bulma

Bulma looked very attractive at first. It has a sleek, modern design and is quite popular, with a good-sized community for support. But for our site, it was still too big. I only used a few components and layout helpers, a fraction of its 260kB, which is around 50kB bigger than Foundation in my tests!

mini.css

mini.css had some great things going for it, but got pushed out by even smaller options which I’ll cover shortly. It took up about 46kB with all the bells and whistles.

The use of flex is a bonus, but it’s a little too style agnostic for the website I was working on; I would have overridden a lot of its styling, making the overhead size even larger.

It has options for using SCSS or CSS variables, which I always like to see. I prefer SCSS’s added features and variables, but if you would rather use vanilla CSS, mini.css has a plain option using CSS variables, which I don’t see too often.

Pure.css

Pure.css offers similar features to the other frameworks on this list, and weighing in at around 17kB, it competes well in size. However, I wasn’t a big fan of its look, and with some odd omissions (notably, the absence of a container class for horizontally centering content), it was edged out by the competition.

Skeleton

Skeleton has very clean styling and a good selection of minimal components. It’s tiny, weighing in at 12kB. Unfortunately, it hasn’t had a release since 2014, and uses float instead of Flexbox, which took it out of the running for building a more modern site.

Milligram

Milligram feels like a spiritual successor to Skeleton: It covers a similar scope, and has a simple, clean design reminiscent of Skeleton’s. In lieu of using classes, it usually applies styling directly to HTML tags. This is a choice I like, since I don’t follow the Tailwind CSS approach of creating websites purely in HTML, with agnostic classes applied to the layout.

It also has an SCSS offering, so it’s easy to drop components you don’t want, making it even tinier!

The size and simplicity of Milligram made it win the bid. It has worked quite well, providing just enough framework to be useful for small websites, but getting out of the way so that you can do your own styling.

Making the most of your lightweight framework

Hopefully this post will give you an idea of what each of these frameworks is like, but the best way to test them is by trying them yourself! These five were fairly quick to spin up and test out on the website I was working on, and doing the same will give you the best idea of what your website needs.

With frameworks that are only a few kilobytes, you can also read through the entire source. I tried to look through enough of each framework’s CSS files to become familiar with its way of doing things. This eased the whole website-building process, and will help you find a framework that matches your style of web development.

Also see my colleague Afif’s article Building responsive websites with Tailwind CSS for an in-depth look at Tailwind CSS.


css design

Database integration testing with .NET

By Kevin Campusano
January 12, 2022

Sunset over lake in mountains

Ruby on Rails is great. We use it at End Point for many projects with great success. One of Rails’ cool features is how easy it is to write database integration tests. Out of the box, Rails projects come with all the configuration necessary to set up a database that’s exclusive for the automated test suite. This database includes all the tables and other objects that exist within the regular database that the app uses during its normal execution. So, it is very easy to write automated tests that cover the application components that interact with the database.

ASP.NET Core is also great! However, it doesn’t have this feature out of the box. Let’s see if we can’t do it ourselves.

The sample project

As a sample project we will use a REST API that I wrote for another article. Check it out if you want to learn more about the ins and outs of developing REST APIs with .NET. You can find the source code on GitHub.

The API is very straightforward. It provides a few endpoints for CRUDing some database tables. It also provides an endpoint which, when given some vehicle information, will calculate a monetary value for that vehicle. That’s a feature that would be interesting for us to cover with some tests.

The logic for that feature is backed by a specific class and it depends heavily on database interactions. As such, that class is a great candidate for writing a few automated integration tests against. The class in question is QuoteService, which is defined in Services/QuoteService.cs. The class provides features for fetching records from the database (the GetAllQuotes method) as well as creating new records based on data from the incoming request and a set of rules stored in the database itself (the CalculateQuote method).

In order to add automated tests, the first step is to organize our project so that it supports them. Let’s do that next.

Organizing the source code to allow for automated testing

In general, the source code of most real world .NET applications is organized as one or more “projects” under one “solution”. A solution is a collection of related projects, and a project is something that produces a deployment artifact. An artifact is a library (i.e. a *.dll file) or something that can be executed like a console or web app.

Our sample app is a stand-alone “webapi” project, meaning that it’s not within a solution. For automated tests, however, we need to create a new project for tests, parallel to our main one. Now that we have two projects instead of one, we need to reorganize the sample app’s source code to comply with the “projects in a solution” structure I mentioned earlier.

Let’s start by moving all the files in the root directory into a new VehicleQuotes directory. That’s one project. Then, we create a new automated tests project by running the following, still from the root directory:

dotnet new xunit -o VehicleQuotes.Tests

That creates a new automated tests project named VehicleQuotes.Tests (under a new aptly-named VehicleQuotes.Tests directory) which uses the xUnit.net test framework. There are other options when it comes to test frameworks in .NET, such as MSTest and NUnit. We’re going to use xUnit.net, but the others should work just as well for our purposes.

Now, we need to create a new solution to contain those two projects. Solutions come in the form of *.sln files and we can create ours like so:

dotnet new sln -o vehicle-quotes

That should’ve created a new vehicle-quotes.sln file for us. We should now have a file structure like this:

.
├── vehicle-quotes.sln
├── VehicleQuotes
│   ├── VehicleQuotes.csproj
│   └── ...
└── VehicleQuotes.Tests
    ├── VehicleQuotes.Tests.csproj
    └── ...

Like I said, the *.sln file indicates that this is a solution. The *.csproj files identify the individual projects that make up the solution.

Now, we need to tell dotnet that those two projects belong in the same solution. These commands do that:

dotnet sln add ./VehicleQuotes/VehicleQuotes.csproj
dotnet sln add ./VehicleQuotes.Tests/VehicleQuotes.Tests.csproj

Finally, we update the VehicleQuotes.Tests project so that it references the VehicleQuotes project. That way, the test suite will have access to all the classes defined in the REST API. Here’s the command for that:

dotnet add ./VehicleQuotes.Tests/VehicleQuotes.Tests.csproj reference ./VehicleQuotes/VehicleQuotes.csproj
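
Optionally, to confirm that both projects ended up in the solution, we can ask dotnet to list them:

dotnet sln list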

With all that setup out of the way, we can now start writing some tests.

You can learn more about project organization in the official online documentation.

Creating a DbContext instance to talk to the database

The VehicleQuotes.Tests automated tests project got created with a default test file named UnitTest1.cs. You can delete it or ignore it, since we will not use it.

In general, it’s a good idea for the test project to mimic the directory structure of the project that it will be testing. Also, we already decided that we would focus our test efforts on the QuoteService class from the VehicleQuotes project. That class is defined in VehicleQuotes/Services/QuoteService.cs, so let’s create a similarly located file within the test project which will contain the test cases for that class. Here: VehicleQuotes.Tests/Services/QuoteServiceTests.cs. These would be the contents:

// VehicleQuotes.Tests/Services/QuoteServiceTests.cs

using System;
using Xunit;

namespace VehicleQuotes.Tests.Services
{
    public class QuoteServiceTests
    {
        [Fact]
        public void GetAllQuotesReturnsEmptyWhenThereIsNoDataStored()
        {
            // Given

            // When

            // Then
        }
    }
}

This is the basic structure for tests using xUnit.net. Any method annotated with a [Fact] attribute will be picked up and run by the test framework. In this case, I’ve created one such method called GetAllQuotesReturnsEmptyWhenThereIsNoDataStored which should give away its intention. This test case will validate that QuoteService’s GetAllQuotes method returns an empty set when called with no data in the database.

Before we can write this test case, though, the suite needs access to the test database. Our app uses Entity Framework Core for database interaction, which means that the database is accessed via a DbContext class. Looking at the source code of our sample app, we can see that the DbContext being used is VehicleQuotesContext, defined in VehicleQuotes/Data/VehicleQuotesContext.cs. Let’s add a utility method to the QuoteServiceTests class which can be used to create new instances of VehicleQuotesContext:

// VehicleQuotes.Tests/Services/QuoteServiceTests.cs

// ...
using Microsoft.EntityFrameworkCore;
using VehicleQuotes.Services;

namespace VehicleQuotes.Tests.Services
{
    public class QuoteServiceTests
    {
        private VehicleQuotesContext CreateDbContext()
        {
            var options = new DbContextOptionsBuilder<VehicleQuotesContext>()
                .UseNpgsql("Host=db;Database=vehicle_quotes_test;Username=vehicle_quotes;Password=password")
                .UseSnakeCaseNamingConvention()
                .Options;

            var context = new VehicleQuotesContext(options);

            context.Database.EnsureCreated();

            return context;
        }

        // ...
    }
}

As you can see, we need to go through three steps to create the VehicleQuotesContext instance and get a database that’s ready for testing:

First, we create a DbContextOptionsBuilder and use that to obtain the options object that the VehicleQuotesContext needs as a constructor parameter. We needed to include the Microsoft.EntityFrameworkCore namespace in order to have access to the DbContextOptionsBuilder. For this, I just copied and slightly modified this statement from the ConfigureServices method in the REST API’s VehicleQuotes/Startup.cs file:

// VehicleQuotes/Startup.cs

public void ConfigureServices(IServiceCollection services)
{
    // ...

    services.AddDbContext<VehicleQuotesContext>(options =>
        options
            .UseNpgsql(Configuration.GetConnectionString("VehicleQuotesContext"))
            .UseSnakeCaseNamingConvention()
            .UseLoggerFactory(LoggerFactory.Create(builder => builder.AddConsole()))
            .EnableSensitiveDataLogging()
    );

    // ...
}

This is a method that runs when the application is starting up, to set up all the services that the app needs in order to work. Here, it’s setting up the DbContext to enable database interaction. For the test suite, I took this statement as a starting point, removed the logging configuration, and specified a hardcoded connection string that points to a new vehicle_quotes_test database that will be used for testing.

If you’re following along, then you need a PostgreSQL instance that you can use to run the tests. In my case, I have one running that is reachable via the connection string I specified: Host=db;Database=vehicle_quotes_test;Username=vehicle_quotes;Password=password.

If you have Docker, a quick way to get a Postgres database up and running is with this command:

docker run -d \
    --name vehicle-quotes-db \
    -p 5432:5432 \
    --network host \
    -e POSTGRES_DB=vehicle_quotes \
    -e POSTGRES_USER=vehicle_quotes \
    -e POSTGRES_PASSWORD=password \
    postgres

That’ll spin up a new Postgres instance that’s reachable via localhost.

Secondly, now that we have the options parameter ready, we can quite simply instantiate a new VehicleQuotesContext:

var context = new VehicleQuotesContext(options);

Finally, we call the EnsureCreated method so that the database that we specified in the connection string is actually created.

context.Database.EnsureCreated();

This is the database that our test suite will use.

Defining the test database connection string in the appsettings.json file

One quick improvement that we can make to the code we’ve written so far is to move the connection string for the test database into a separate configuration file, instead of having it hardcoded. Let’s do that next.

We need to create a new appsettings.json file under the VehicleQuotes.Tests directory. Then we have to add the connection string like so:

{
  "ConnectionStrings": {
    "VehicleQuotesContext": "Host=db;Database=vehicle_quotes_test;Username=vehicle_quotes;Password=password"
  }
}

This is the standard way of configuring connection strings in .NET. Now, to actually fetch this value from within our test suite code, we make the following changes:

// ...
+using Microsoft.Extensions.Hosting;
+using Microsoft.Extensions.Configuration;
+using Microsoft.Extensions.DependencyInjection;

namespace VehicleQuotes.Tests.Services
{
    public class QuoteServiceTests
    {
        private VehicleQuotesContext CreateDbContext()
        {
+           var host = Host.CreateDefaultBuilder().Build();
+           var config = host.Services.GetRequiredService<IConfiguration>();

            var options = new DbContextOptionsBuilder<VehicleQuotesContext>()
-               .UseNpgsql("Host=db;Database=vehicle_quotes_test;Username=vehicle_quotes;Password=password")
+               .UseNpgsql(config.GetConnectionString("VehicleQuotesContext"))
                .UseSnakeCaseNamingConvention()
                .Options;

            var context = new VehicleQuotesContext(options);

            context.Database.EnsureCreated();

            return context;
        }

        // ...
    }
}

First we add a few using statements. Microsoft.Extensions.Hosting gives us access to the Host class, through which we reach the application’s execution context and its built-in configuration service. Microsoft.Extensions.Configuration gives us the IConfiguration interface, which is how we reference the configuration service that reads the appsettings.json config file. Finally, Microsoft.Extensions.DependencyInjection lets us tap into the built-in dependency injection mechanism, through which we obtain the default configuration service I mentioned before; specifically, that namespace is where the GetRequiredService extension method lives.

All this translates into the few code changes that you see in the previous diff: first getting the app’s host, then getting the configuration service, then using that to fetch our connection string.

You can refer to the official documentation to learn more about configuration in .NET.

Writing a simple test case that fetches data

Now that we have a way to access the database from within the test suite, we can finally write an actual test case. Here’s the GetAllQuotesReturnsEmptyWhenThereIsNoDataStored one that I alluded to earlier:

// ...

namespace VehicleQuotes.Tests.Services
{
    public class QuoteServiceTests
    {
        // ...

        [Fact]
        public async void GetAllQuotesReturnsEmptyWhenThereIsNoDataStored()
        {
            // Given
            var dbContext = CreateDbContext();
            var service = new QuoteService(dbContext, null);

            // When
            var result = await service.GetAllQuotes();

            // Then
            Assert.Empty(result);
        }
    }
}

This one is a very simple test. We obtain a new VehicleQuotesContext instance that we can use to pass as a parameter when instantiating the component that we want to test: the QuoteService. We then call the GetAllQuotes method and assert that it returned an empty set. The test database was just created, so there should be no data in it, hence the empty resource set.

To run this test, we do dotnet test. I personally like a more verbose output so I like to use this variant of the command: dotnet test --logger "console;verbosity=detailed". Here’s what the output looks like.

$ dotnet test --logger "console;verbosity=detailed"
  Determining projects to restore...
  All projects are up-to-date for restore.
  VehicleQuotes -> /app/VehicleQuotes/bin/Debug/net5.0/VehicleQuotes.dll
  VehicleQuotes.Tests -> /app/VehicleQuotes.Tests/bin/Debug/net5.0/VehicleQuotes.Tests.dll
Test run for /app/VehicleQuotes.Tests/bin/Debug/net5.0/VehicleQuotes.Tests.dll (.NETCoreApp,Version=v5.0)
Microsoft (R) Test Execution Command Line Tool Version 16.11.0
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
/app/VehicleQuotes.Tests/bin/Debug/net5.0/VehicleQuotes.Tests.dll
[xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.3+1b45f5407b (64-bit .NET 5.0.12)
[xUnit.net 00:00:01.03]   Discovering: VehicleQuotes.Tests
[xUnit.net 00:00:01.06]   Discovered:  VehicleQuotes.Tests
[xUnit.net 00:00:01.06]   Starting:    VehicleQuotes.Tests
[xUnit.net 00:00:03.25]   Finished:    VehicleQuotes.Tests
  Passed VehicleQuotes.Tests.Services.QuoteServiceTests.GetAllQuotesReturnsEmptyWhenThereIsNoDataStored [209 ms]

Test Run Successful.
Total tests: 1
     Passed: 1
 Total time: 3.7762 Seconds

Resetting the state of the database after each test

Now we need to write a test that actually writes data into the database. However, every test case needs to start with the database in its original state. In other words, the changes that one test case makes to the test database should not be visible to, affect, or be expected by any subsequent test. That keeps our test cases isolated and repeatable. That’s not possible with our current implementation, though.

You can read more about the FIRST principles of testing here.

Luckily, that’s a problem that’s easily solved with Entity Framework Core. All we need to do is call a method that ensures that the database is deleted just before it ensures that it is created. Here’s what it looks like:

 private VehicleQuotesContext CreateDbContext()
 {
     var host = Host.CreateDefaultBuilder().Build();
     var config = host.Services.GetRequiredService<IConfiguration>();

     var options = new DbContextOptionsBuilder<VehicleQuotesContext>()
         .UseNpgsql(config.GetConnectionString("VehicleQuotesContext"))
         .UseSnakeCaseNamingConvention()
         .Options;

     var context = new VehicleQuotesContext(options);

+    context.Database.EnsureDeleted();
     context.Database.EnsureCreated();

     return context;
 }

And that’s all. Now every test case that calls CreateDbContext in order to obtain a DbContext instance will effectively trigger a database reset. Feel free to dotnet test again to validate that the test suite is still working.

Now, depending on the size of the database, this can be quite expensive. For integration tests, performance is not as big of a concern as for unit tests. This is because integration tests should be fewer in number and less frequently run.

We can make it better though. Instead of deleting and recreating the database before each test case, we’ll take a page out of Ruby on Rails’ book and run each test case within a database transaction which gets rolled back after the test is done. For now though, let’s write another test case: this time, one where we insert new records into the database.

If you want to hear a more in-depth discussion about automated testing in general, I go into further detail on the topic in this article: An introduction to automated testing for web applications with Symfony.

Writing another simple test case that stores data

Now let’s write another test that exercises QuoteService’s GetAllQuotes method. This time though, let’s add a new record to the database before calling it so that the method’s result is not empty. Here’s what the test looks like:

// ...
using VehicleQuotes.Models;
using System.Linq;

namespace VehicleQuotes.Tests.Services
{
    public class QuoteServiceTests
    {
        // ...

        [Fact]
        public async void GetAllQuotesReturnsTheStoredData()
        {
            // Given
            var dbContext = CreateDbContext();

            var quote = new Quote
            {
                OfferedQuote = 100,
                Message = "test_quote_message",

                Year = "2000",
                Make = "Toyota",
                Model = "Corolla",
                BodyTypeID = dbContext.BodyTypes.Single(bt => bt.Name == "Sedan").ID,
                SizeID = dbContext.Sizes.Single(s => s.Name == "Compact").ID,

                ItMoves = true,
                HasAllWheels = true,
                HasAlloyWheels = true,
                HasAllTires = true,
                HasKey = true,
                HasTitle = true,
                RequiresPickup = true,
                HasEngine = true,
                HasTransmission = true,
                HasCompleteInterior = true,

                CreatedAt = DateTime.Now
            };

            dbContext.Quotes.Add(quote);

            dbContext.SaveChanges();

            var service = new QuoteService(dbContext, null);

            // When
            var result = await service.GetAllQuotes();

            // Then
            Assert.NotEmpty(result);
            Assert.Single(result);
            Assert.Equal(quote.ID, result.First().ID);
            Assert.Equal(quote.OfferedQuote, result.First().OfferedQuote);
            Assert.Equal(quote.Message, result.First().Message);
        }
    }
}

First we include the VehicleQuotes.Models namespace so that we can use the Quote model class. In our REST API, this is the class that represents the data from the quotes table. This is the main table that GetAllQuotes queries. We also include the System.Linq namespace, which allows us to use various collection extension methods (like Single and First) which we leverage throughout the test case to query lookup tables and assert on the test results.

Other than that, the test case itself is pretty self-explanatory. We start by obtaining an instance of VehicleQuotesContext via the CreateDbContext method. Remember that this also resets the whole database so that the test case can run over a clean slate. Then, we create a new Quote object and use our VehicleQuotesContext to insert it as a record into the database. We do this so that the later call to QuoteService’s GetAllQuotes method actually finds some data to return this time. Finally, the test case validates that the result contains a record and that its data is identical to what we set manually.

Neat! At this point we have what I think is the bare minimum infrastructure when it comes to serviceable and effective database integration tests, namely, access to a test database. We can take it one step further, though, and make things more reusable and a little bit better performing.

Refactoring into a fixture for reusability

We can use the test fixture functionality offered by xUnit.net in order to make the database interactivity aspect of our test suite into a reusable component. That way, if we had other test classes focused on other components that interact with the database, we could just plug that code in. We can define a fixture by creating a new file called, for example, VehicleQuotes.Tests/Fixtures/DatabaseFixture.cs with these contents:

using System;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

namespace VehicleQuotes.Tests.Fixtures
{
    public class DatabaseFixture : IDisposable
    {
        public VehicleQuotesContext DbContext { get; private set; }

        public DatabaseFixture()
        {
            DbContext = CreateDbContext();
        }

        public void Dispose()
        {
            DbContext.Dispose();
        }

        private VehicleQuotesContext CreateDbContext()
        {
            var host = Host.CreateDefaultBuilder().Build();
            var config = host.Services.GetRequiredService<IConfiguration>();

            var options = new DbContextOptionsBuilder<VehicleQuotesContext>()
                .UseNpgsql(config.GetConnectionString("VehicleQuotesContext"))
                .UseSnakeCaseNamingConvention()
                .Options;

            var context = new VehicleQuotesContext(options);

            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();

            return context;
        }
    }
}

All this class does is define the CreateDbContext method that we’re already familiar with but puts it in a nice reusable package. Upon instantiation, as seen in the constructor, it stores a reference to the VehicleQuotesContext in its DbContext property.

With that, our QuoteServiceTests test class can use it if we make the following changes to it:

 using System;
 using Xunit;
-using Microsoft.EntityFrameworkCore;
 using VehicleQuotes.Services;
-using Microsoft.Extensions.Hosting;
-using Microsoft.Extensions.Configuration;
-using Microsoft.Extensions.DependencyInjection;
 using VehicleQuotes.Models;
 using System.Linq;
+using VehicleQuotes.Tests.Fixtures;

 namespace VehicleQuotes.Tests.Services
 {
-    public class QuoteServiceTests
+    public class QuoteServiceTests : IClassFixture<DatabaseFixture>
     {
+        private VehicleQuotesContext dbContext;

+        public QuoteServiceTests(DatabaseFixture fixture)
+        {
+            dbContext = fixture.DbContext;
+        }

-        private VehicleQuotesContext CreateDbContext()
-        {
-            var host = Host.CreateDefaultBuilder().Build();
-            var config = host.Services.GetRequiredService<IConfiguration>();

-            var options = new DbContextOptionsBuilder<VehicleQuotesContext>()
-                .UseNpgsql(config.GetConnectionString("VehicleQuotesContext"))
-                .UseSnakeCaseNamingConvention()
-                .Options;

-            var context = new VehicleQuotesContext(options);

-            context.Database.EnsureDeleted();
-            context.Database.EnsureCreated();

-            return context;
-        }

         [Fact]
         public async void GetAllQuotesReturnsEmptyWhenThereIsNoDataStored()
         {
             // Given
-            var dbContext = CreateDbContext();

             // ...
         }

         [Fact]
         public async void GetAllQuotesReturnsTheStoredData()
         {
             // Given
-            var dbContext = CreateDbContext();

             // ...
         }
     }
 }

Here we’ve updated the QuoteServiceTests class definition so that it inherits from IClassFixture<DatabaseFixture>. This is how we tell xUnit.net that our tests use the new fixture that we created. Next, we define a constructor that receives a DatabaseFixture object as a parameter. That’s how xUnit.net allows our test class to access the capabilities provided by the fixture. In this case, we take the fixture’s DbContext instance and store it for later use in all of the test cases that need database interaction. We also removed the CreateDbContext method, since that’s now defined within the fixture, along with a few using statements that became unnecessary.

One important aspect to note about this fixture is that it is initialized once per test suite run, not once per test case. Specifically, the code within the DatabaseFixture’s constructor gets executed once, before all of the test cases. Similarly, the code in DatabaseFixture’s Dispose method gets executed once at the end, after all test cases have been run.

This means that our test database deletion and recreation step now happens only once for the entire test suite. This is not good with our current implementation because that means that individual test cases no longer run with a fresh, empty database. This can be good for performance though, as long as we update our test cases to run within database transactions. Let’s do just that.

Using transactions to reset the state of the database

Here’s how we update our test class so that each test case runs within a transaction:

 // ...

 namespace VehicleQuotes.Tests.Services
 {
-    public class QuoteServiceTests : IClassFixture<DatabaseFixture>
+    public class QuoteServiceTests : IClassFixture<DatabaseFixture>, IDisposable
     {
         private VehicleQuotesContext dbContext;

         public QuoteServiceTests(DatabaseFixture fixture)
         {
             dbContext = fixture.DbContext;

+            dbContext.Database.BeginTransaction();
         }

+        public void Dispose()
+        {
+            dbContext.Database.RollbackTransaction();
+        }

         // ...
     }
 }

The first thing to note here is that we added a call to BeginTransaction in the test class constructor. xUnit.net creates a new instance of the test class for each test case. This means that this constructor is run before each and every test case. We use that opportunity to begin a database transaction.

The other interesting point is that we’ve updated the class to implement the IDisposable interface’s Dispose method. xUnit.net runs this code after each test case, and that’s where we roll back the transaction.

Put those two together and we’ve updated our test suite so that every test case runs within the context of its own database transaction. Try it out with dotnet test and see what happens.

To learn more about database transactions with Entity Framework Core, you can look at the official docs.

You can learn more about xUnit.net’s test class fixtures in the samples repository.

Alright, that’s all for now. It is great to see that implementing automated database integration tests is actually fairly straightforward using .NET, xUnit.net, and Entity Framework. Even if it isn’t quite as easy as it is in Rails, it is perfectly doable.


dotnet integration database testing

On the Importance of Explicitly Converting Strings to Numbers

By Jeff Laughlin
January 11, 2022

Wall with tiles of 4 colors in a pattern

Recently a valued colleague approached me with a JavaScript problem. This individual is new to programming and is working on a self-taught course.

The assignment was fairly simple: Take a list of space-delimited integers and find the maximum and minimum values. If you are an experienced developer you can probably already guess where this is going. His code:

function highAndLow(numbers) {
  const myArr = numbers.split(" ");
  let lowNum = myArr[0];
  let highNum = myArr[0];
  for (let i = 0; i < myArr.length; i++) {
    if (myArr[i] > highNum) {
      highNum = myArr[i];
    } else if (myArr[i] < lowNum) {
      lowNum = myArr[i];
    }
  }
  return highNum + ' ' + lowNum;
}

console.log(highAndLow("8 3 -5 42 -1 0 0 -9 4 7 4 -4"));

This produced the output:

"8 -1"

These are clearly not the maximum or minimum values.

After looking at it for a few moments I recognized a classic JavaScript pitfall: failure to explicitly convert stringy numbers to actual number types.

You see, JavaScript tries to be clever. JavaScript tries to get it right. JavaScript tries to say “the thing you are doing looks like something that you would do with numbers so I’m going to automatically convert these stringy numbers to number-numbers for you.”

The problem is that JavaScript is not clever; it is in fact very dumb about this. A further problem arises when developers come to trust and rely on automatic conversion. Careers have been ruined that way.

In this case the naive programmer would say “Well, I’m comparing the things with a mathematical operator (< and >) so JavaScript should treat the values as numbers, right?” Wrong. JavaScript compares them alphabetically, not numerically. Except that even the “alphabetical” comparison kind of sucks, but that’s another topic. JavaScript doesn’t even attempt to convert to numbers in this case.
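
That is exactly what happened here: string comparison works character by character, which is how the program ended up printing “8 -1”. A couple of quick illustrations (hypothetical snippets you can try in any JavaScript console):

"8" > "42"   // true:  '8' > '4', so the rest of "42" is never even looked at
"-1" < "-9"  // true:  '1' < '9', even though -1 > -9 numerically
8 > 42       // false: with actual numbers, the comparison is numeric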

Repeat after me:

Always explicitly convert stringy numbers to actual numbers even if the language claims to do it automatically.

I don’t care if it’s JavaScript, Perl, some fancy Python package, it doesn’t matter.

Do not trust automatic type conversion.

You will get it wrong. It will get it wrong. There will be tears.

Fixing this program is as simple as changing one line to explicitly convert the numbers from strings.

const myArr = numbers.split(" ").map(n => Number.parseInt(n, 10));

Number.parseInt(n, 10) is the “one true way” to turn a string-number into a number-number in JavaScript. Never omit the 10; it is technically optional but you will regret it if you omit it, trust me. If you are reading base 10 numbers, tell JavaScript so explicitly. Otherwise it will again try to be clever but be not-clever and probably screw up the conversion by guessing the wrong radix.
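
To see why the radix matters, consider a few hypothetical examples:

Number.parseInt("0x1A")      // 26: with no radix, the leading "0x" makes JavaScript guess base 16
Number.parseInt("0x1A", 10)  // 0:  with radix 10, parsing stops at the "x"
Number.parseInt("08", 10)    // 8:  old engines guessed octal for a leading "0"; radix 10 removes the doubt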

It’s also good that the developer caught this error visually, because they did not include a unit test. Errors like this slip through the cracks all. the. time.

Even TypeScript would not catch this. This function is perfectly legal TypeScript. There’s nothing illegal about comparing strings with < or >. TypeScript could only catch this if the developer provided additional type information up front, for example:

TypeScript example of mismatched string and number types
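
That screenshot isn’t reproduced here, but the idea is roughly the following sketch with explicit type annotations added (a hypothetical reconstruction; the exact code in the image may differ):

function highAndLow(numbers: string): string {
  // Declaring the intended element type exposes the mistake immediately:
  // error: Type 'string[]' is not assignable to type 'number[]'.
  const myArr: number[] = numbers.split(" ");

  let lowNum: number = myArr[0];
  let highNum: number = myArr[0];
  // ...
  return highNum + " " + lowNum;
}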

Now that we’ve told the compiler “This is a string” and “This is a number”, it can helpfully tell us “Hey, you’re trying to mix strings and numbers in a not-good way”.

So it all comes back to the mantra of “Always explicitly convert strings to numbers. Always.” And if you’re bothering to use TypeScript, go the extra step and actually tell it what the types are. Don’t make it guess; it might guess wrong. Explicit is better than implicit.

Some things never change.


programming javascript typescript

Kubernetes 101: Deploying a web application and database

By Kevin Campusano
January 8, 2022

Groups of birds on a telephone pole

The DevOps world seems to have been taken over by Kubernetes during the past few years. And rightfully so, I believe, as it is a great piece of software that promises and delivers when it comes to managing deployments of complex systems.

Kubernetes is hard though. But it’s all good, I’m not a DevOps engineer. As a software developer, I shouldn’t care about any of that. Or should I? Well… Yes. I know that very well after being thrown head first into a project that heavily involves Kubernetes, without knowing the first thing about it.

Even if I wasn’t in the role of a DevOps engineer, as a software developer, I had to work with it in order to set up dev environments, troubleshoot system issues, and make sound design and architectural decisions.

After a healthy amount of struggle, I eventually gained some understanding on the subject. In this blog post I’ll share what I learned. My hope is to put out there the things I wish I knew when I first encountered and had to work with Kubernetes.

So, I’m going to introduce the basic concepts and building blocks of Kubernetes. Then, I’m going to walk you through the process of containerizing a sample application, developing all the Kubernetes configuration files necessary for deploying it into a Kubernetes cluster, and actually deploying it into a local development cluster. We will end up with an application and its associated database running completely on and being managed by Kubernetes.

In short: If you know nothing about Kubernetes, and are interested in learning, read on. This post is for you.

What is Kubernetes?

Simply put, Kubernetes (or K8s) is software for managing computer clusters. That is, groups of computers that are working together in order to process some workload or offer a service. Kubernetes does this by leveraging application containers. Kubernetes will help you out in automating the deployment, scaling, and management of containerized applications.

Once you’ve designed an application’s complete execution environment and associated components, using Kubernetes you can specify all that declaratively via configuration files. Then, you’ll be able to deploy that application with a single command. Once deployed, Kubernetes will give you tools to check on the health of your application, recover from issues, keep it running, scale it, etc.

There are a few basic concepts that we need to be familiar with in order to effectively work with Kubernetes. I think the official documentation does a great job in explaining this, but I’ll try to summarize.

Nodes, pods, and containers

First up are containers. If you’re interested in Kubernetes, chances are that you’ve already been exposed to some sort of container technology like Docker. If not, no worries. For our purposes here, we can think of a container as an isolated process with its own resources and file system in which an application can run.

A container has all the software dependencies that an application needs to run, including the application itself. From the application’s perspective, the container is its execution environment: the “machine” in which it’s running. In more practical terms, a container is a form of packaging, delivering, and executing an application. What’s the advantage? Instead of installing the application and its dependencies directly into the machine that’s going to run it, having it containerized allows for a container runtime (like Docker) to just run it as a self-contained unit. This makes it possible for the application to run anywhere that has the container runtime installed, with minimal configuration.

Something very closely related to containers is the concept of images. You can think of images as the blueprint for containers. An image is the spec, and the container is the instance that’s actually running.

When deploying applications into Kubernetes, this is how it runs them: via containers. In other words, for Kubernetes to be able to run an application, it needs to be delivered within a container.

Next is the concept of a node. This is very straightforward and not even specific to Kubernetes. A node is a computer within the cluster. That’s it. Like I said before, Kubernetes is built to manage computer clusters. A node is just one computer, either virtual or physical, within that cluster.

Then there are pods. Pods are the main executable units in Kubernetes. When we deploy an application or service into a Kubernetes cluster, it runs within a pod. Kubernetes works with containerized applications though, so it is the pods that take care of running said containers within them.

These three work very closely together within Kubernetes. To summarize: containers run within pods which in turn exist within nodes in the cluster.

There are other key components to talk about like deployments, services, replica sets, and persistent volumes. But I think that’s enough theory for now. We’ll learn more about all these as we get our hands dirty working through our example. So let’s get started with our demo and we’ll be discovering and discussing them organically as we go through it.

Installing and setting up Kubernetes

The first thing we need is a Kubernetes environment. There are many Kubernetes implementations out there. Google, Microsoft, and Amazon offer Kubernetes solutions on their respective cloud platforms, for example. There are also implementations that one can install and run on their own, like kind, minikube, and MicroK8s. We are going to use MicroK8s for our demo, for no particular reason other than “this is the one I know”.

When done installing, MicroK8s will have set up a whole Kubernetes cluster, with your machine as its one and only node.

Installing MicroK8s

So, if you’re in Ubuntu and have snapd, installing MicroK8s is easy. The official documentation explains it best. You install it with a command like this:

$ sudo snap install microk8s --classic --channel=1.21

MicroK8s will create a user group; it’s best to add your user account to it so you can execute commands that would otherwise require admin privileges. You can do so with:

$ sudo usermod -a -G microk8s $USER
$ sudo chown -f -R $USER ~/.kube

With that, our very own Kubernetes cluster, courtesy of MicroK8s, should be ready to go. Check its status with:

$ microk8s status --wait-ready

You should see a “MicroK8s is running” message along with some specifications on your cluster, including the available add-ons and which of them are enabled or disabled.

You can also shut down your cluster anytime with microk8s stop. Use microk8s start to bring it back up.

Introducing kubectl

MicroK8s also comes with kubectl. This is our gateway into Kubernetes, as this is the command line tool that we use to interact with it. By default, MicroK8s makes it so we can call it using microk8s kubectl .... That is, namespaced. This is useful if you have multiple Kubernetes implementations running at the same time, or another, separate kubectl. I don’t, so I like to create an alias for it, so that I can call it without having to use the microk8s prefix. You can do it like so:

$ sudo snap alias microk8s.kubectl kubectl

Now that all that’s done, we can start talking to our Kubernetes cluster. We can ask it for example to tell us which are the nodes in the cluster with this command:

$ kubectl get nodes

That will result in something like:

NAME     STATUS   ROLES    AGE   VERSION
pop-os   Ready    <none>   67d   v1.21.4-3+e5758f73ed2a04

The only node in the cluster is your own machine. In my case, my machine is called “pop-os” so that’s what shows up. You can get more information out of this command by using kubectl get nodes -o wide.

Installing add-ons

MicroK8s supports many add-ons that we can use to enhance our Kubernetes installation. We are going to need a few of them so let’s install them now. They are:

  1. The dashboard, which gives us a nice web GUI that serves as a window into our cluster. In there we can see everything that’s running, read logs, run commands, etc.
  2. dns, which sets up DNS for within the cluster. In general it’s a good idea to enable this one because other add-ons use it.
  3. storage, which allows the cluster to access the host machine’s disk for storage. The application that we will deploy needs a persistent database so we need this plugin to make it happen.
  4. registry, which sets up a container image registry that Kubernetes can access. Kubernetes runs containerized applications, containers are based on images. So, having this add-on allows us to define an image for our application and make it available to Kubernetes.

To install these, just run the following commands:

$ microk8s enable dashboard
$ microk8s enable dns
$ microk8s enable storage
$ microk8s enable registry

Those are all the add-ons that we’ll use.

Introducing the Dashboard

The dashboard is one we can play with right now. In order to access it, first run this:

$ microk8s dashboard-proxy

That will start up a proxy into the dashboard. The command will give you a URL and login token that you can use to access the dashboard. It results in an output like this:

Checking if Dashboard is running.
Dashboard will be available at https://127.0.0.1:10443
Use the following token to login:
<YOUR LOGIN TOKEN>

Now you can navigate to that URL in your browser and you’ll find a screen like this:

Dashboard login

Make sure the “Token” option is selected, then take the login token generated by the microk8s dashboard-proxy command from before and paste it in the field in the page. Click the “Sign In” button and you’ll be able to see the dashboard, allowing you access to many aspects of your cluster. It should look like this:

Dashboard home

Feel free to play around with it a little bit. You don’t have to understand everything yet. As we work through our example, we’ll see how the dashboard and the other add-ons come into play.

There’s also a very useful command line tool called K9s, which helps in interacting with our cluster. We will not be discussing it further in this article but feel free to explore it if you need or want a command line alternative to the built-in dashboard.

Deploying applications into a Kubernetes cluster

With all that setup out of the way, we can start using our K8s cluster for what it was designed: running applications.

Deployments

Pods are very much the stars of the show when it comes to Kubernetes. However, most of the time we don’t create them directly. We usually do so through “deployments”. Deployments are a more abstract concept in Kubernetes. They basically control pods and make sure they behave as specified. You can think of them as wrappers for pods which make our lives easier than if we had to handle pods directly. Let’s go ahead and create a deployment so things will be clearer.

In Kubernetes, there are various ways of managing objects like deployments. For this post, I’m going to focus exclusively on the configuration-file-driven declarative approach as that’s the one better suited for real world scenarios.

You can learn more about the different ways of interacting with Kubernetes objects in the official documentation.

So, simply put, if we want to create a deployment, then we need to author a file that defines it. A simple deployment specification looks like this:

# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

This example is taken straight from the official documentation.

Don’t worry if most of that doesn’t make sense at this point. I’ll explain it in detail later. First, let’s actually do something with it.

Save that in a new file. You can call it nginx-deployment.yaml. Once that’s done, you can actually create the deployment (and its associated objects) in your K8s cluster with this command:

$ kubectl apply -f nginx-deployment.yaml

Which should result in the following message:

deployment.apps/nginx-deployment created

And that’s it for creating deployments! (Or any other type of object in Kubernetes for that matter.) We define the object in a file and then invoke kubectl’s apply command. Pretty simple.

If you want to delete the deployment, then this command will do it:

$ kubectl delete -f nginx-deployment.yaml
deployment.apps "nginx-deployment" deleted

Using kubectl to explore a deployment

Now, let’s inspect our cluster to see what this command has done for us.

First, we can ask it directly for the deployment with:

$ kubectl get deployments

Which outputs:

NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3/3     3            3           2m54s

Now you can see that the deployment that we just created is right there with the name that we gave it.

As I said earlier, deployments are used to manage pods, and that’s just what the READY, UP-TO-DATE, and AVAILABLE columns refer to with those values of 3. This deployment has three pods because, in our YAML file, we asked for three replicas with the replicas: 3 line. Each “replica” is a pod. For our example, that means we will have three instances of NGINX running side by side.
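
If you later want a different number of replicas, the declarative way to get there is to edit the replicas value in the file and apply it again. As a quick, hypothetical example, say we wanted five instead of three:

# nginx-deployment.yaml (excerpt)
spec:
  replicas: 5

$ kubectl apply -f nginx-deployment.yaml
deployment.apps/nginx-deployment configured

Kubernetes notices the difference and adjusts the number of pods to match.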

We can see the pods that have been created for us with this command:

$ kubectl get pods

Which gives us something like this:

NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-66b6c48dd5-fs5rq   1/1     Running   0          55m
nginx-deployment-66b6c48dd5-xmnl2   1/1     Running   0          55m
nginx-deployment-66b6c48dd5-sfzxm   1/1     Running   0          55m

The exact names will vary, as the IDs are auto-generated. But as you can see, this command gives us some basic information about our pods. Remember that pods are the ones that actually run our workloads via containers. The READY field is particularly interesting in this sense because it tells us how many containers are ready in the pod versus how many are supposed to run. So 1/1 means the pod has one container ready out of one; in other words, the pod is fully ready.
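
If you ever need more detail than kubectl get provides, kubectl describe is worth knowing about. For example, using one of the pod names from above (yours will be different):

$ kubectl describe pod nginx-deployment-66b6c48dd5-fs5rq

That prints the pod’s labels, the containers and images it runs, and a list of recent events, which comes in handy when a pod refuses to start.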

Using the dashboard to explore a deployment

Like I said before, the dashboard offers us a window into our cluster. Let’s see how we can use it to see the information that we just saw via kubectl. Navigate into the dashboard via your browser and you should now see that some new things have appeared:

Dashboard home with deployment

We now have new “CPU Usage” and “Memory Usage” sections that give us insight into the utilization of our machine’s resources.

There’s also “Workload status” that has some nice graphs giving us a glance at the status of our deployments, pods, and replica sets.

Don’t worry too much about replica sets right now, as we seldom interact with them directly. Suffice it to say, replica sets are objects that deployments rely on to make sure that the number of specified replica pods is maintained. As always, there’s more info in the official documentation.
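
If you’re curious, though, you can list them like any other object. Something like this should show the replica set backing our deployment (the exact name and age will differ on your machine):

$ kubectl get replicasets
NAME                          DESIRED   CURRENT   READY   AGE
nginx-deployment-66b6c48dd5   3         3         3       55m

Note how its name matches the prefix of the pod names we saw before.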

Scroll down a little bit more and you’ll find the “Deployments” and “Pods” sections, which contain the information that we’ve already seen via kubectl before.

Dashboard home: deployments and pods

Feel free to click around and explore the capabilities of the dashboard.

Dissecting the deployment configuration file

Now that we have a basic understanding of deployments and pods and how to create them, let’s take a closer look at the configuration file that defines the deployment. This is what we had:

# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

This example is very simple, but it touches on the key aspects of deployment configuration. We will be building more complex deployments as we work through this article, but this is a great start. Let’s start at the top:

  • apiVersion: Under the hood, a Kubernetes cluster exposes its functionality via a REST API. We seldom interact with this API directly because kubectl takes care of it for us: it takes our commands, translates them into HTTP requests that the K8s REST API can understand, sends them, and gives us back the results. So, this apiVersion field specifies which version of the K8s REST API we expect to talk to. (We’ll take a quick peek at this API just below.)
  • kind: It represents the type of object that the configuration file defines. All objects in Kubernetes can be managed via YAML configuration files and kubectl apply. So, this field specifies which one we are managing at any given time.
  • metadata.name: Quite simply, the name of the object. It’s how we and Kubernetes refer to it.
  • metadata.labels: These help us further categorize cluster objects. These particular labels have no functional effect, so they’re useful for helping us humans organize and find things more than anything else.
  • spec: This contains the actual functional specification for the behavior of the deployment. More details below.
  • spec.replicas: The number of replica pods that the deployment should create. We already talked a bit about this before.
  • spec.selector.matchLabels: This is one case where labels are actually important. Remember that when we create a deployment, a replica set and pods are created along with it. Within the K8s cluster, each of them is its own individual object, though. This field is the mechanism that K8s uses to associate a given deployment with its replica set and pods. In practice, that means that whatever labels are in this field need to match the labels in spec.template.metadata.labels. More on that one below.
  • spec.template: Specifies the configuration of the pods that will be part of the deployment.
  • spec.template.metadata.labels: Very similar to metadata.labels. The only difference is that those labels are added to the deployment while these ones are added to the pods. The notable thing is that these labels are key for the deployment to know which pods it should care about (as explained above under spec.selector.matchLabels).
  • spec.template.spec: This section specifies the actual functional configuration of the pods.
  • spec.template.spec.containers: This section specifies the configuration of the containers that will be running inside the pods. It’s an array so there can be many. In our example we have only one.
  • spec.template.spec.containers[0].name: The name of the container.
  • spec.template.spec.containers[0].image: The image that will be used to build the container.
  • spec.template.spec.containers[0].ports[0].containerPort: A port through which the container will accept traffic from the outside. In this case, 80.

You can find a detailed description of all the fields supported by deployment configuration files in the official API reference documentation. And much more!
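
Speaking of the REST API mentioned in the apiVersion bullet, kubectl can also show us exactly what our deployment looks like as an API object:

$ kubectl get deployment nginx-deployment -o yaml

That prints the full object as the API server stores it, including many fields that Kubernetes filled in with defaults, which is a nice way to get a feel for everything the apps/v1 Deployment schema covers.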

Connecting to the containers in the pods

Kubernetes allows us to connect to the containers running inside a pod. This is pretty easy to do with kubectl. All we need to know is the name of the pod and the container that we want to connect to. If the pod is running only one container (like our NGINX one does) then we don’t need the container name. We can find out the names of our pods with:

$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-66b6c48dd5-85nwq   1/1     Running   0          25s
nginx-deployment-66b6c48dd5-x5b4x   1/1     Running   0          25s
nginx-deployment-66b6c48dd5-wvkhc   1/1     Running   0          25s

Pick one of those and we can open a bash session in it with:

$ kubectl exec -it nginx-deployment-66b6c48dd5-85nwq -- bash

Which results in a prompt like this:

root@nginx-deployment-66b6c48dd5-85nwq:/#

We’re now connected to the container in one of our NGINX pods. There isn’t a lot to do with this right now, but feel free to explore it. It’s got its own processes and file system which are isolated from the other replica pods and your actual machine.

We can also connect to containers via the dashboard. Go back to the dashboard in your browser, log in again if the session expired, and scroll down to the “Pods” section. Each pod in the list has an action menu with an “Exec” command. See it here:

Dashboard pod exec

Click it and you’ll be taken to a screen with a console just like the one we obtained via kubectl exec:

Dashboard pod bash

The dashboard is quite useful, right?

Services

So far, we’ve learned quite a bit about deployments. How to specify and create them, how to explore them via command line and the dashboard, how to interact with the pods, etc. We haven’t seen a very important part yet, though: actually accessing the application that has been deployed. That’s where services come in. We use services to expose an application running in a set of pods to the world outside the cluster.

Here’s what a configuration file for a service that exposes access to our NGINX deployment could look like:

# nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - name: "http"
      port: 80
      targetPort: 80
      nodePort: 30080

Same as with the deployment’s configuration file, this one has a kind field that specifies what it is, as well as a name given to it via the metadata.name field. The spec section is where things get interesting.

  • spec.type specifies, well… the type of the service. Kubernetes supports many types of services. For now, we want a NodePort. This type of service exposes itself on a static port (given by spec.ports[0].nodePort) on every node in the cluster. In our setup, we only have one node, which is our own machine.
  • spec.ports defines which ports of the pods’ containers the service will expose.
  • spec.ports[0].name: The name of the port. To be used elsewhere to reference the specific port.
  • spec.ports[0].port: The port that will be exposed by the service.
  • spec.ports[0].targetPort: The port that the service will target in the container.
  • spec.ports[0].nodePort: The port that the service will expose in all the nodes of the cluster.

Same as with deployments, we can create such a service with the kubectl apply command. If you save the contents of the YAML above into an nginx-service.yaml file, you can run the following to create it:

$ kubectl apply -f nginx-service.yaml

And to inspect it and validate that it was in fact created:

$ kubectl get services
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes      ClusterIP   10.152.183.1    <none>        443/TCP        68d
nginx-service   NodePort    10.152.183.22   <none>        80:30080/TCP   27s

The dashboard also has a section for services. It looks like this:

Dashboard: services

Accessing an application via a service

We can access our service in a few different ways. We can use its “cluster IP” which we obtain from the output of the kubectl get services command. As given by the example above, that would be 10.152.183.22 in my case. Browsing to that IP gives us the familiar NGINX default welcome page:

NGINX via Cluster IP

Another way is by using the “NodePort”. Remember that the “NodePort” specifies the port in which the service will be available on every node of the cluster. With our current MicroK8s setup, our own machine is a node in the cluster, so we can also access the NGINX that’s running in our Kubernetes cluster using localhost:30080. 30080 is given by the spec.ports[0].nodePort field in the service configuration file from before. Try it out:

NGINX via NodePort

How cool is that? We have identical, replicated NGINX instances running in a Kubernetes cluster that’s installed locally in our machine.
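
If you prefer the command line over the browser, curl works too. Using the cluster IP from my kubectl get services output (yours will likely differ), or the NodePort on localhost:

$ curl http://10.152.183.22
$ curl http://localhost:30080

Both should return the HTML of that same NGINX welcome page.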

Deploying our own custom application

Alright, by deploying NGINX, we’ve learned a lot about nodes, pods, deployments, services, and how they all work together to run and serve an application from a Kubernetes cluster. Now, let’s take all that knowledge and try to do the same for a completely custom application of our own.

What are we building

The application that we are going to deploy into our cluster is a simple one with only two components: a REST API written with .NET 5 and a Postgres database. You can find the source code on GitHub. It’s an API for supporting a hypothetical front end application that captures used vehicle information and calculates the vehicles’ value in dollars.

If you’re interested in learning more about the process of actually writing that app, it’s all documented in another blog post: Building REST APIs with .NET 5, ASP.NET Core, and PostgreSQL.

If you’re following along, now would be a good time to download the source code of the web application that we’re going to be playing with. You can find it on GitHub. From now on, we’ll use that as the root directory of all the files we create and modify.

Also, be sure to delete or put aside the k8s directory. We’ll be building that throughout the rest of this post.

Deploying the database

Let’s begin with the Postgres database. As before, we start by setting up a deployment with one pod and one container. We can do so with a deployment configuration YAML file like this:

# db-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vehicle-quotes-db
spec:
  selector:
    matchLabels:
      app: vehicle-quotes-db
  replicas: 1
  template:
    metadata:
      labels:
        app: vehicle-quotes-db
    spec:
      containers:
        - name: vehicle-quotes-db
          image: postgres:13
          ports:
            - containerPort: 5432
              name: "postgres"
          env:
            - name: POSTGRES_DB
              value: vehicle_quotes
            - name: POSTGRES_USER
              value: vehicle_quotes
            - name: POSTGRES_PASSWORD
              value: password
          resources:
            limits:
              memory: 4Gi
              cpu: "2"

This deployment configuration YAML file is similar to the one we used for NGINX before, but it introduces a few new elements:

  • spec.template.spec.containers[0].ports[0].name: We can give specific names to ports which we can reference later, elsewhere in the K8s configurations, which is what this field is for.
  • spec.template.spec.containers[0].env: This is a list of environment variables that will be defined in the container inside the pod. In this case, we’ve specified a few variables that are necessary to configure the Postgres instance that will be running. We’re using the official Postgres image from Dockerhub, and it calls for these variables. Their purpose is straightforward: they specify database name, username, and password.
  • spec.template.spec.containers[0].resources: This field defines the hardware resources that the container needs in order to function. We can specify upper limits with limits and lower ones with requests. You can learn more about resource management in the official documentation. In our case, we’ve kept it simple and used limits to prevent the container from using more than 4Gi of memory and 2 CPU cores. There’s a small illustrative sketch right after this list.
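
Here’s a purely illustrative sketch of what a fuller resources section could look like if we also wanted to reserve a minimum with requests; these particular numbers are made up and not part of our actual deployment:

resources:
  requests:
    memory: 1Gi
    cpu: "500m"
  limits:
    memory: 4Gi
    cpu: "2"

With a spec like that, the scheduler only places the pod on a node that can set aside at least 1Gi of memory and half a CPU core for it, while limits still cap usage at 4Gi and 2 cores.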

Now, let’s save that YAML into a new file called db-deployment.yaml and run the following:

$ kubectl apply -f db-deployment.yaml

Which should output:

deployment.apps/vehicle-quotes-db created

After a few seconds, you should be able to see the new deployment and pod via kubectl:

$ kubectl get deployment -A
NAMESPACE            NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
...
default              vehicle-quotes-db           1/1     1            1           9m20s
$ kubectl get pods -A
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE
...
default              vehicle-quotes-db-5fb576778-gx7j6            1/1     Running   0          9m22s

Remember you can also see them in the dashboard:

Dashboard DB deployment and pod

Connecting to the database

Let’s try connecting to the Postgres instance that we just deployed. Take note of the pod’s name and try:

$ kubectl exec -it <DB_POD_NAME> -- bash

You’ll get a bash session on the container that’s running the database. For me, given the pod’s auto-generated name, it looks like this:

root@vehicle-quotes-db-5fb576778-gx7j6:/#

From here, you can connect to the database using the psql command line client. Remember that we told the Postgres instance to create a vehicle_quotes user. We set it up via the container environment variables on our deployment configuration. As a result, we can do psql -U vehicle_quotes to connect to the database. Put together, it all looks like this:

$ kubectl exec -it vehicle-quotes-db-5fb576778-gx7j6 -- bash
root@vehicle-quotes-db-5fb576778-gx7j6:/# psql -U vehicle_quotes
psql (13.3 (Debian 13.3-1.pgdg100+1))
Type "help" for help.

vehicle_quotes=# \l
                                            List of databases
      Name      |     Owner      | Encoding |  Collate   |   Ctype    |         Access privileges
----------------+----------------+----------+------------+------------+-----------------------------------
 postgres       | vehicle_quotes | UTF8     | en_US.utf8 | en_US.utf8 |
 template0      | vehicle_quotes | UTF8     | en_US.utf8 | en_US.utf8 | =c/vehicle_quotes                +
                |                |          |            |            | vehicle_quotes=CTc/vehicle_quotes
 template1      | vehicle_quotes | UTF8     | en_US.utf8 | en_US.utf8 | =c/vehicle_quotes                +
                |                |          |            |            | vehicle_quotes=CTc/vehicle_quotes
 vehicle_quotes | vehicle_quotes | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

Pretty cool, don’t you think? We have a database running on our cluster now with minimal effort. There’s a slight problem though…

Persistent volumes and claims

The problem with our database is that any changes are lost if the pod or container shuts down or resets for some reason. This is because all the database files live inside the container’s file system, so when the container is gone, the data is gone with it.

In Kubernetes, pods are supposed to be treated as ephemeral entities. The idea is that pods should be easy to bring down and replace with new ones, without users and clients even noticing. This is all Kubernetes working as expected. That is to say, pods should be as stateless as possible to work well with this behavior.

However, a database is, by definition, not stateless. So what we need to do to solve this problem is have some available disk space from outside the cluster that can be used by our database to store its files. Something persistent that won’t go away if the pod or container goes away. That’s where persistent volumes and persistent volume claims come in.

We will use a persistent volume (PV) to define a directory in our host machine that we will allow our Postgres container to use to store data files. Then, a persistent volume claim (PVC) is used to define a “request” for some of that available disk space that a specific container can make. In short, a persistent volume says to K8s “here’s some storage that the cluster can use” and a persistent volume claim says “here’s a portion of that storage that’s available for containers to use”.

Configuration files for the PV and PVC

Start by tearing down our currently broken Postgres deployment:

$ kubectl delete -f db-deployment.yaml

Now let’s add two new YAML configuration files. One is for the persistent volume:

# db-persistent-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vehicle-quotes-postgres-data-persisent-volume
  labels:
    type: local
spec:
  claimRef:
    namespace: default
    name: vehicle-quotes-postgres-data-persisent-volume-claim
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home/kevin/projects/vehicle-quotes-postgres-data"

In this config file, we already know about the kind and metadata fields. A few of the other elements are interesting though:

  • spec.claimRef: Contains identifying information about the claim that’s associated with the PV. It’s used to bind the PV to a specific PVC. Notice how it matches the name defined in the PVC config file below.
  • spec.capacity.storage: Is pretty straightforward in that it specifies the size of the persistent volume.
  • spec.accessModes: Defines how the PV can be accessed. In this case, we’re using ReadWriteOnce so that it can only be used by a single node in the cluster which is allowed to read from and write into the PV.
  • spec.hostPath.path: Specifies the directory in the host machine’s file system where the PV will be mounted. Simply put, the containers in the cluster will have access to the specific directory defined here. I’ve used /home/kevin/projects/vehicle-quotes-postgres-data because that makes sense on my own machine. If you’re following along, make sure to set it to something that makes sense in your environment.

hostPath is just one type of persistent volume which works well for development deployments. Managed Kubernetes implementations like the ones from Google or Amazon have their own types which are more appropriate for production.

We also need another config file for the persistent volume claim:

# db-persistent-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vehicle-quotes-postgres-data-persisent-volume-claim
spec:
  volumeName: vehicle-quotes-postgres-data-persisent-volume
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Like I said, PVCs are essentially usage requests for PVs. So, the config file is simple in that it’s mostly specified to match the PV.

  • spec.volumeName: The name of the PV that this PVC is going to access. Notice how it matches the name that we defined in the PV’s config file.
  • spec.resources.requests: Defines how much space this PVC requests from the PV. In this case, we’re just requesting all the space that the PV has available to it, as given by its config file: 5Gi.

Configuring the deployment to use the PVC

After saving those files, all that’s left is to update the database deployment configuration to use the PVC. Here’s what the updated config file would look like:

 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: vehicle-quotes-db
 spec:
   selector:
     matchLabels:
       app: vehicle-quotes-db
   replicas: 1
   template:
     metadata:
       labels:
         app: vehicle-quotes-db
     spec:
       containers:
         - name: vehicle-quotes-db
           image: postgres:13
           ports:
             - containerPort: 5432
               name: "postgres"
+          volumeMounts:
+            - mountPath: "/var/lib/postgresql/data"
+              name: vehicle-quotes-postgres-data-storage
           env:
             - name: POSTGRES_DB
               value: vehicle_quotes
             - name: POSTGRES_USER
               value: vehicle_quotes
             - name: POSTGRES_PASSWORD
               value: password
           resources:
             limits:
               memory: 4Gi
               cpu: "2"
+      volumes:
+        - name: vehicle-quotes-postgres-data-storage
+          persistentVolumeClaim:
+            claimName: vehicle-quotes-postgres-data-persisent-volume-claim

First, notice the volumes section at the bottom of the file. Here’s where we define the volume that will be available to the container, give it a name and specify which PVC it will use. The spec.template.volumes[0].persistentVolumeClaim.claimName needs to match the name of the PVC that we defined in db-persistent-volume-claim.yaml.

Then, up in the containers section, we define a volumeMounts element. We use that to specify which directory within the container will map to our PV. In this case, we’ve set the container’s /var/lib/postgresql/data directory to use the volume that we defined at the bottom of the file. That volume is backed by our persistent volume claim, which is in turn backed by our persistent volume. The significance of the /var/lib/postgresql/data directory is that this is where Postgres stores database files by default.

In summary: We created a persistent volume that defines some disk space in our machine that’s available to the cluster; then we defined a persistent volume claim that represents a request of some of that space that a container can have access to; after that we defined a volume within our pod configuration in our deployment to point to that persistent volume claim; and finally we defined a volume mount in our container that uses that volume to store the Postgres database files.

By setting it up this way, we’ve made it so that regardless of how many Postgres pods come and go, the database files will always be persisted, because the files now live outside of the container. They are stored in our host machine instead.

There’s another limitation that’s important to note. With just the approach that we’ve discussed, it’s not possible to deploy multiple replicas of Postgres which work in tandem and operate on the same data. Even though the data files can be defined outside of the cluster and persisted that way, only a single Postgres instance can run against them at any given time.

In production, the high availability problem is better solved by leveraging the features provided by the database software itself. Postgres offers various options in that area. Or, if you are deploying to the cloud, the best strategy may be to use a relational database service managed by your cloud provider, such as Amazon’s RDS or Microsoft’s Azure SQL Database.

Applying changes

Now let’s see it in action. Run the following three commands to create the objects:

$ kubectl apply -f db-persistent-volume.yaml
$ kubectl apply -f db-persistent-volume-claim.yaml
$ kubectl apply -f db-deployment.yaml

After a while, they will show up in the dashboard. You already know how to look for deployments and pods. For persistent volumes, click the “Persistent Volumes” option under the “Cluster” section in the sidebar:

Dashboard persistent volume

Persistent volume claims can be found in the “Persistent Volume Claims” option under the “Config and Storage” section in the sidebar:

Dashboard persistent volume claim

Now, try connecting to the database (using kubectl exec -it <VEHICLE_QUOTES_DB_POD_NAME> -- bash and then psql -U vehicle_quotes) and creating some tables. Something simple like this would work:

CREATE TABLE test (test_field varchar);

Now, close psql and the bash in the pod and delete the objects:

$ kubectl delete -f db-deployment.yaml
$ kubectl delete -f db-persistent-volume-claim.yaml
$ kubectl delete -f db-persistent-volume.yaml

Create them again:

$ kubectl apply -f db-persistent-volume.yaml
$ kubectl apply -f db-persistent-volume-claim.yaml
$ kubectl apply -f db-deployment.yaml

Connect to the database again and you should see that the table is still there:

vehicle_quotes=# \c vehicle_quotes
You are now connected to database "vehicle_quotes" as user "vehicle_quotes".
vehicle_quotes=# \dt
           List of relations
 Schema | Name | Type  |     Owner
--------+------+-------+----------------
 public | test | table | vehicle_quotes
(1 row)

That’s just what we wanted: the database is persisting independently of what happens to the pods and containers.

Exposing the database as a service

Lastly, we need to expose the database as a service so that the rest of the cluster can access it without having to use explicit pod names. We don’t need this for our testing, but we do need it for later when we deploy our web app, so that it can reach the database. As you’ve seen, services are easy to create. Here’s the YAML config file:

# db-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: vehicle-quotes-db-service
spec:
  type: NodePort
  selector:
    app: vehicle-quotes-db
  ports:
    - name: "postgres"
      protocol: TCP
      port: 5432
      targetPort: 5432
      nodePort: 30432

Save that into a new db-service.yaml file and don’t forget to kubectl apply -f db-service.yaml.
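
Since this is a NodePort service, the database also becomes reachable from the host machine on port 30432. If you happen to have the psql client installed locally, this optional sanity check should connect (the password is password, as set in the deployment’s environment variables):

$ psql -h localhost -p 30432 -U vehicle_quotes vehicle_quotes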

Deploying the web application

Now that we’ve got the database sorted out, let’s turn our attention to the app itself. As you’ve seen, Kubernetes runs apps as containers. That means that we need images to build those containers. A custom web application is no exception. We need to build a custom image that contains our application so that it can be deployed into Kubernetes.

Building the web application image

The first step for building a container image is writing a Dockerfile. Since our application is a Web API built using .NET 5, I’m going to use a slightly modified version of the Dockerfile used by Visual Studio Code’s development container demo for .NET. These development containers are excellent for, well… development. You can see the original in the link above, but here’s mine:

# [Choice] .NET version: 5.0, 3.1, 2.1
ARG VARIANT="5.0"
FROM mcr.microsoft.com/vscode/devcontainers/dotnet:0-${VARIANT}

# [Option] Install Node.js
ARG INSTALL_NODE="false"

# [Option] Install Azure CLI
ARG INSTALL_AZURE_CLI="false"

# Install additional OS packages.
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
    && apt-get -y install --no-install-recommends postgresql-client-common postgresql-client

# Run the remaining commands as the "vscode" user
USER vscode

# Install EF and code generator development tools
RUN dotnet tool install --global dotnet-ef
RUN dotnet tool install --global dotnet-aspnet-codegenerator
RUN echo 'export PATH="$PATH:/home/vscode/.dotnet/tools"' >> /home/vscode/.bashrc

WORKDIR /app

# Prevent the container from closing automatically
ENTRYPOINT ["tail", "-f", "/dev/null"]

There are many other development containers for other languages and frameworks. Take a look at the microsoft/vscode-dev-containers GitHub repo to learn more.

An interesting thing about this Dockerfile is that we install the psql command line client so that we can connect to our Postgres database from within the web application container. The rest is stuff specific to .NET and the particular image we’re basing this Dockerfile on, so don’t sweat it too much.

If you’ve downloaded the source code, this Dockerfile should already be there as Dockerfile.dev.

Making the image accessible to Kubernetes

Now that we have a Dockerfile, we can use it to build an image that Kubernetes can turn into a running container. For Kubernetes to see that image, we need to tag it in a specific way and push it to a registry that’s accessible to the cluster. Remember how we ran microk8s enable registry to install the registry add-on when we were setting up MicroK8s? That will pay off now, as that’s the registry to which we’ll push our image.

First, we build the image:

$ docker build . -f Dockerfile.dev -t localhost:32000/vehicle-quotes-dev:registry

That will take some time to download and set up everything. Once that’s done, we push the image to the registry:

$ docker push localhost:32000/vehicle-quotes-dev:registry

That will also take a little while.

Deploying the web application

The next step is to create a deployment for the web app. Like usual, we start with a deployment YAML configuration file. Let’s call it web-deployment.yaml:

# web-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vehicle-quotes-web
spec:
  selector:
    matchLabels:
      app: vehicle-quotes-web
  replicas: 1
  template:
    metadata:
      labels:
        app: vehicle-quotes-web
    spec:
      containers:
        - name: vehicle-quotes-web
          image: localhost:32000/vehicle-quotes-dev:registry
          ports:
            - containerPort: 5000
              name: "http"
            - containerPort: 5001
              name: "https"
          volumeMounts:
            - mountPath: "/app"
              name: vehicle-quotes-source-code-storage
          env:
            - name: POSTGRES_DB
              value: vehicle_quotes
            - name: POSTGRES_USER
              value: vehicle_quotes
            - name: POSTGRES_PASSWORD
              value: password
            - name: CUSTOMCONNSTR_VehicleQuotesContext
              value: Host=$(VEHICLE_QUOTES_DB_SERVICE_SERVICE_HOST);Database=$(POSTGRES_DB);Username=$(POSTGRES_USER);Password=$(POSTGRES_PASSWORD)
          resources:
            limits:
              memory: 2Gi
              cpu: "1"
      volumes:
        - name: vehicle-quotes-source-code-storage
          persistentVolumeClaim:
            claimName: vehicle-quotes-source-code-persisent-volume-claim

This deployment configuration should look very familiar to you by now as it is very similar to the ones we’ve already seen. There are a few notable elements though:

  • Notice how we specified localhost:32000/vehicle-quotes-dev:registry as the container image. This is the exact same name of the image that we built and pushed into the registry before.
  • In the environment variables section, the one named CUSTOMCONNSTR_VehicleQuotesContext is interesting for a couple of reasons:
    • First, the value is a Postgres connection string being built off of other environment variables using the following format: $(ENV_VAR_NAME). That’s a neat feature of Kubernetes config files that allows us to reference variables to build other ones.
    • Second, the VEHICLE_QUOTES_DB_SERVICE_SERVICE_HOST environment variable used within that connection string is not defined anywhere in our configuration files. That’s an automatic environment variable that Kubernetes injects into all containers when there are services available. In this case, it contains the hostname of the vehicle-quotes-db-service that we created a few sections ago. The automatic injection of this *_SERVICE_HOST variable always happens as long as the service is already created by the time that the pod gets created. We have already created the service so we should be fine using the variable here. As usual, there’s more info in the official documentation. There’s a quick way to check these injected variables right after this list.
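
If you want to confirm that those injected variables are actually there, you can print the container’s environment once the web pod is up. The pod name below is a placeholder; use your own:

$ kubectl exec <WEB_POD_NAME> -- env | grep VEHICLE_QUOTES_DB

You should see VEHICLE_QUOTES_DB_SERVICE_SERVICE_HOST set to the cluster IP of the database service, along with a few other related variables.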

As you may have noticed, this deployment has a persistent volume. That’s to store the application’s source code. Or, more accurately, to make the source code, which lives in our machine, available to the container. This is a development setup after all, so we want to be able to edit the code from the comfort of our own file system, and have the container inside the cluster be aware of that.

Anyway, let’s create the associated persistent volume and persistent volume claim. Here’s the PV (save it as web-persistent-volume.yaml):

# web-persistent-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vehicle-quotes-source-code-persisent-volume
  labels:
    type: local
spec:
  claimRef:
    namespace: default
    name: vehicle-quotes-source-code-persisent-volume-claim
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home/kevin/projects/vehicle-quotes"

And here’s the PVC (save it as web-persistent-volume-claim.yaml):

# web-persistent-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vehicle-quotes-source-code-persisent-volume-claim
spec:
  volumeName: vehicle-quotes-source-code-persisent-volume
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

The only notable element here is the PV’s hostPath. I have it pointing to the path where I downloaded the app’s source code from GitHub. Make sure to do the same on your end.

Finally, tie it all up with a service that will expose the development build of our REST API. Here’s the config file:

# web-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: vehicle-quotes-web-service
spec:
  type: NodePort
  selector:
    app: vehicle-quotes-web
  ports:
    - name: "http"
      protocol: TCP
      port: 5000
      targetPort: 5000
      nodePort: 30000
    - name: "https"
      protocol: TCP
      port: 5001
      targetPort: 5001
      nodePort: 30001

Should be pretty self-explanatory at this point. In this case, we expose two ports, one for HTTP and another for HTTPS. Our .NET 5 Web API works with both so that’s why we specify them here. This configuration says that the service should expose port 30000 and send traffic that comes into that port from the outside world into port 5000 on the container. Likewise, outside traffic coming to port 30001 will be sent to port 5001 in the container.

Save that file as web-service.yaml and we’re ready to apply the changes:

$ kubectl apply -f web-persistent-volume.yaml
$ kubectl apply -f web-persistent-volume-claim.yaml
$ kubectl apply -f web-deployment.yaml
$ kubectl apply -f web-service.yaml

Feel free to explore the dashboard’s “Deployments”, “Pods”, “Services”, “Persistent Volumes”, and “Persistent Volume Claims” sections to see the fruits of our labor.

Starting the application

Let’s now do some final setup and start up our application. Start by connecting to the web application pod:

$ kubectl exec -it vehicle-quotes-web-86cbc65c7f-5cpg8 -- bash

Remember that the pod name will be different for you, so copy it from the dashboard or kubectl get pods -A.

You’ll get a prompt like this:

vscode ➜ /app (master βœ—) $

Try ls to see all of the app’s source code files courtesy of the PV that we set up before:

vscode ➜ /app (master βœ—) $ ls
Controllers     Dockerfile.prod  Models      README.md       Startup.cs            appsettings.Development.json  k8s      queries.sql
Data            K8S_README.md    Program.cs  ResourceModels  Validations           appsettings.json              k8s_wip
Dockerfile.dev  Migrations       Properties  Services        VehicleQuotes.csproj  database.dbml                 obj

Now it’s just a few .NET commands to get the app up and running. First, restore packages and compile:

$ dotnet build

That will take a while. Once done, let’s build the database schema:

$ dotnet ef database update

And finally, run the development web server:

$ dotnet run

If you get the error message “System.InvalidOperationException: Unable to configure HTTPS endpoint.” while trying dotnet run, follow the error message’s instructions and run dotnet dev-certs https --trust. This will generate a development certificate so that the dev server can serve HTTPS.

As a result, you should see this:

vscode ➜ /app (master βœ—) $ dotnet run
Building...
info: Microsoft.Hosting.Lifetime[0]
      Now listening on: https://0.0.0.0:5001
info: Microsoft.Hosting.Lifetime[0]
      Now listening on: http://0.0.0.0:5000
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Development
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /app

It indicates that the application is up and running. Now, navigate to http://localhost:30000 in your browser of choice and you should see our REST API’s Swagger UI:

Swagger!

Notice that 30000 is the port we specified in the web-service.yaml’s nodePort for the http port. That’s the port that the service exposes to the world outside the cluster. Notice also how our .NET web app’s development server listens to traffic coming from ports 5000 and 5001 for HTTP and HTTPS respectively. That’s why we configured web-service.yaml like we did.

Outstanding! All our hard work has paid off and we have a full-fledged web application running in our Kubernetes cluster. This is quite a momentous occasion. We’ve built a custom image that can be used to create containers to run a .NET web application, pushed that image into our local registry so that K8s could use it, and deployed a functioning application. As a cherry on top, we made it so the source code is super easy to edit, as it lives within our own machine’s file system and the container in the cluster accesses it directly from there. Quite an accomplishment.

Now it’s time to go the extra mile and organize things a bit. Let’s talk about Kustomize next.

Putting it all together with Kustomize

Kustomize is a tool that helps us improve Kubernetes’ declarative object management with configuration files (which is what we’ve been doing throughout this post). Kustomize has useful features for better organizing configuration files, managing configuration variables, and supporting deployment variants (for things like dev vs. test vs. prod environments). Let’s explore what Kustomize has to offer.

First, be sure to tear down all the objects that we have created so far, since we’ll recreate them later once we have a setup with Kustomize. These commands will do it:

$ kubectl delete -f db-service.yaml
$ kubectl delete -f db-deployment.yaml
$ kubectl delete -f db-persistent-volume-claim.yaml
$ kubectl delete -f db-persistent-volume.yaml

$ kubectl delete -f web-service.yaml
$ kubectl delete -f web-deployment.yaml
$ kubectl delete -f web-persistent-volume-claim.yaml
$ kubectl delete -f web-persistent-volume.yaml

Next, let’s reorganize our db-* and web-* YAML files like this:

k8s
β”œβ”€β”€ db
β”‚   β”œβ”€β”€ db-deployment.yaml
β”‚   β”œβ”€β”€ db-persistent-volume-claim.yaml
β”‚   β”œβ”€β”€ db-persistent-volume.yaml
β”‚   └── db-service.yaml
└── web
    β”œβ”€β”€ web-deployment.yaml
    β”œβ”€β”€ web-persistent-volume-claim.yaml
    β”œβ”€β”€ web-persistent-volume.yaml
    └── web-service.yaml

As you can see, we’ve put them all inside a new k8s directory, and further divided them into db and web sub-directories. web-* files went into the web directory and db-* files went into db. At this point, the prefixes on the files are a bit redundant so we can remove them. After all, we know what component they belong to because of the name of their respective sub-directories.

There’s already a k8s directory in the repo. Feel free to get rid of it as we will build it back up from scratch now.

So it should end up looking like this:

k8s
β”œβ”€β”€ db
β”‚   β”œβ”€β”€ deployment.yaml
β”‚   β”œβ”€β”€ persistent-volume-claim.yaml
β”‚   β”œβ”€β”€ persistent-volume.yaml
β”‚   └── service.yaml
└── web
    β”œβ”€β”€ deployment.yaml
    β”œβ”€β”€ persistent-volume-claim.yaml
    β”œβ”€β”€ persistent-volume.yaml
    └── service.yaml

kubectl’s apply and delete commands support directories as well, not only individual files. That means that, at this point, to build up all of our objects you could simply do kubectl apply -f k8s/db and kubectl apply -f k8s/web. This is much better than what we’ve been doing until now where we had to specify every single file. Still, with Kustomize, we can do better than that…
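
In other words, at this point the whole setup could be brought up with just these two commands:

$ kubectl apply -f k8s/db
$ kubectl apply -f k8s/web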

The Kustomization file

We can bring everything together with a kustomization.yaml file. For our setup, here’s what it could look like:

# k8s/kustomization.yaml
kind: Kustomization

resources:
  - db/persistent-volume.yaml
  - db/persistent-volume-claim.yaml
  - db/service.yaml
  - db/deployment.yaml
  - web/persistent-volume.yaml
  - web/persistent-volume-claim.yaml
  - web/service.yaml
  - web/deployment.yaml

This first iteration of the Kustomization file is simple. It just lists all of our other config files in the resources section in their relative locations. Save that as k8s/kustomization.yaml and you can apply it with the following:

$ kubectl apply -k k8s

The -k option tells kubectl apply to look for a Kustomization within the given directory and use that to build the cluster objects. After running it, you should see familiar output:

service/vehicle-quotes-db-service created
service/vehicle-quotes-web-service created
persistentvolume/vehicle-quotes-postgres-data-persisent-volume created
persistentvolume/vehicle-quotes-source-code-persisent-volume created
persistentvolumeclaim/vehicle-quotes-postgres-data-persisent-volume-claim created
persistentvolumeclaim/vehicle-quotes-source-code-persisent-volume-claim created
deployment.apps/vehicle-quotes-db created
deployment.apps/vehicle-quotes-web created

Feel free to explore the dashboard or kubectl get commands to see the objects that got created. You can connect to pods, run the app, query the database, everything. Just like we did before. The only difference is that now everything is neatly organized and there’s a single file that serves as bootstrap for the whole setup. All thanks to Kustomize and the -k option.

kubectl delete -k k8s can be used to tear everything down.

Defining reusable configuration values with ConfigMaps

Another useful feature of Kustomize is its ability to generate ConfigMaps. These allow us to specify configuration variables in the Kustomization and use them throughout the rest of the resource config files. Good candidates to demonstrate their use are the environment variables that configure our Postgres database and the connection string in our web application.

We’re going to make changes to the config so be sure to tear everything down with kubectl delete -k k8s.

We can start by adding the following to the kustomization.yaml file:

 # k8s/kustomization.yaml
 kind: Kustomization

 resources:
   - db/persistent-volume.yaml
   - db/persistent-volume-claim.yaml
   - db/service.yaml
   - db/deployment.yaml
   - web/persistent-volume.yaml
   - web/persistent-volume-claim.yaml
   - web/service.yaml
   - web/deployment.yaml

+configMapGenerator:
+  - name: postgres-config
+    literals:
+      - POSTGRES_DB=vehicle_quotes
+      - POSTGRES_USER=vehicle_quotes
+      - POSTGRES_PASSWORD=password

The configMapGenerator section is where the magic happens. We’ve kept it simple and defined the variables as literals. configMapGenerator is much more flexible than that though, accepting external configuration files. The official documentation has more details.
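
For instance, if we wanted to keep the values out of the Kustomization itself, the same generator could read them from a separate env file. This is just a sketch of that alternative (the postgres.env file name is made up), not a change we’re making here:

# k8s/postgres.env
POSTGRES_DB=vehicle_quotes
POSTGRES_USER=vehicle_quotes
POSTGRES_PASSWORD=password

# k8s/kustomization.yaml (alternative configMapGenerator)
configMapGenerator:
  - name: postgres-config
    envs:
      - postgres.env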

Now, let’s see what we have to do to actually use those values in our configuration.

First up is the database deployment configuration file, k8s/db/deployment.yaml. Update its env section like so:

# k8s/db/deployment.yaml
# ...
env:
-  - name: POSTGRES_DB
-    value: vehicle_quotes
-  - name: POSTGRES_USER
-    value: vehicle_quotes
-  - name: POSTGRES_PASSWORD
-    value: password
+  - name: POSTGRES_DB
+    valueFrom:
+      configMapKeyRef:
+        name: postgres-config
+        key: POSTGRES_DB
+  - name: POSTGRES_USER
+    valueFrom:
+      configMapKeyRef:
+        name: postgres-config
+        key: POSTGRES_USER
+  - name: POSTGRES_PASSWORD
+    valueFrom:
+      configMapKeyRef:
+        name: postgres-config
+        key: POSTGRES_PASSWORD
# ...

Notice how we’ve replaced the simple key-value pairs with new, more complex objects. Their names are still the same; they have to be, because that’s what the Postgres container expects. But instead of literal, hard-coded values, we have changed them to these valueFrom.configMapKeyRef objects. Their name matches the name of the configMapGenerator we configured in the Kustomization, and their keys match the keys of the literal values that we specified in the configMapGenerator’s literals field. That’s how it all ties together.

Similarly, we can update the web application deployment configuration file, k8s/web/deployment.yaml. Its env section would look like this:

# k8s/web/deployment.yaml
# ...
env:
-  - name: POSTGRES_DB
-    value: vehicle_quotes
-  - name: POSTGRES_USER
-    value: vehicle_quotes
-  - name: POSTGRES_PASSWORD
-    value: password
+  - name: POSTGRES_DB
+    valueFrom:
+      configMapKeyRef:
+        name: postgres-config
+        key: POSTGRES_DB
+  - name: POSTGRES_USER
+    valueFrom:
+      configMapKeyRef:
+        name: postgres-config
+        key: POSTGRES_USER
+  - name: POSTGRES_PASSWORD
+    valueFrom:
+      configMapKeyRef:
+        name: postgres-config
+        key: POSTGRES_PASSWORD
  - name: CUSTOMCONNSTR_VehicleQuotesContext
    value: Host=$(VEHICLE_QUOTES_DB_SERVICE_SERVICE_HOST);Database=$(POSTGRES_DB);Username=$(POSTGRES_USER);Password=$(POSTGRES_PASSWORD)
# ...

This is the exact same change as with the database deployment. Out with the hard coded values and in with the new ConfigMap-driven ones.

Try kubectl apply -k k8s and you’ll see that things are still working well. Try to connect to the web application pod and build and run the app.

For data that needs to be kept secure, like passwords, tokens, and keys, Kubernetes and Kustomize also offer Secrets and secretGenerator. Secrets are very similar to ConfigMaps in how they work, but are tailored specifically for handling sensitive data. You can learn more about them in the official documentation.
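
As a rough sketch of what that could look like for our password (not something we’ll actually wire up in this post), the Kustomization would get a secretGenerator and the deployments would reference it with secretKeyRef instead of configMapKeyRef:

# kustomization.yaml (sketch)
secretGenerator:
  - name: postgres-secret
    literals:
      - POSTGRES_PASSWORD=password

# deployment env section (sketch)
- name: POSTGRES_PASSWORD
  valueFrom:
    secretKeyRef:
      name: postgres-secret
      key: POSTGRES_PASSWORD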

Creating variants for production and development environments

The crowning achievement of Kustomize is its ability to facilitate multiple deployment variants. Variants, as the name suggests, are variations of deployment configurations that are ideal for setting up various execution environments for an application. Think development, staging, production, etc., all based on a common set of reusable configurations to avoid superfluous repetition.

Kustomize does this by introducing the concepts of bases and overlays. A base is a set of configs that can be reused but not deployed on its own, and overlays are the actual configurations that use and extend the base and can be deployed.

To demonstrate this, let’s build two variants: one for development and another for production. Let’s consider the one we’ve already built to be the development variant, work towards properly specifying it as such, and then build a new production variant.

Note that the so-called “production” variant we’ll build is not actually meant to be production worthy. It’s just an example to illustrate the concepts and process of building bases and overlays. It does not meet the rigors of a proper production system.

Creating the base and overlays

The strategy I like to use is to just copy everything over from one variant to another, implement the differences, identify the common elements, and extract them into a base that both use.

Let’s begin by creating a new k8s/dev directory and moving all of our YAML files into it. That will be our “development overlay”. Then, make a copy of the k8s/dev directory and all of its contents and call it k8s/prod. That will be our “production overlay”. Let’s also create a k8s/base directory to store the common files. That will be our “base”. It should end up like this:

k8s
β”œβ”€β”€ base
β”œβ”€β”€ dev
β”‚   β”œβ”€β”€ kustomization.yaml
β”‚   β”œβ”€β”€ db
β”‚   β”‚   β”œβ”€β”€ deployment.yaml
β”‚   β”‚   β”œβ”€β”€ persistent-volume-claim.yaml
β”‚   β”‚   β”œβ”€β”€ persistent-volume.yaml
β”‚   β”‚   └── service.yaml
β”‚   └── web
β”‚       β”œβ”€β”€ deployment.yaml
β”‚       β”œβ”€β”€ persistent-volume-claim.yaml
β”‚       β”œβ”€β”€ persistent-volume.yaml
β”‚       └── service.yaml
└── prod
    β”œβ”€β”€ kustomization.yaml
    β”œβ”€β”€ db
    β”‚   β”œβ”€β”€ deployment.yaml
    β”‚   β”œβ”€β”€ persistent-volume-claim.yaml
    β”‚   β”œβ”€β”€ persistent-volume.yaml
    β”‚   └── service.yaml
    └── web
        β”œβ”€β”€ deployment.yaml
        β”œβ”€β”€ persistent-volume-claim.yaml
        β”œβ”€β”€ persistent-volume.yaml
        └── service.yaml

Now we have two variants, but they don’t do us any good because they aren’t any different. We’ll now go through each file one by one and identify which aspects need to be the same and which need to be different between our two variants:

  1. db/deployment.yaml: I want the same database instance configuration for both our variants. So we copy the file into base/db/deployment.yaml and delete dev/db/deployment.yaml and prod/db/deployment.yaml.
  2. db/persistent-volume-claim.yaml: This one is also the same for both variants. So we copy the file into base/db/persistent-volume-claim.yaml and delete dev/db/persistent-volume-claim.yaml and prod/db/persistent-volume-claim.yaml.
  3. db/persistent-volume.yaml: This file defines the location in the host machine that will be available for the Postgres instance that’s running in the cluster to store its data files. I do want this path to be different between variants. So let’s leave them where they are and do the following changes to them: For dev/db/persistent-volume.yaml, change its spec.hostPath.path to "/path/to/vehicle-quotes-postgres-data-dev". For prod/db/persistent-volume.yaml, change its spec.hostPath.path to "/path/to/vehicle-quotes-postgres-data-prod". Of course, adjust the paths to something that makes sense in your environment.
  4. db/service.yaml: There doesn’t need to be any difference in this file between the variants so we copy it into base/db/service.yaml and delete dev/db/service.yaml and prod/db/service.yaml.
  5. web/deployment.yaml: There are going to be quite a few differences between the dev and prod deployments of the web application. So we leave them as they are. Later we’ll see the differences in detail.
  6. web/persistent-volume-claim.yaml: This is also going to be different. Let’s leave it be now and we’ll come back to it later.
  7. web/persistent-volume.yaml: Same as web/persistent-volume-claim.yaml. Leave it be for now.
  8. web/service.yaml: This one is going to be the same for both dev and prod, so let’s do the usual: copy it into base/web/service.yaml and remove dev/web/service.yaml and prod/web/service.yaml.

The decisions made when designing these overlays and the base may seem arbitrary. That’s because they totally are. The purpose of this article is to demonstrate Kustomize’s features, not produce a real-world, production-worthy setup.

Once all those changes are done, you should have the following file structure:

k8s
β”œβ”€β”€ base
β”‚   β”œβ”€β”€ db
β”‚   β”‚   β”œβ”€β”€ deployment.yaml
β”‚   β”‚   β”œβ”€β”€ persistent-volume-claim.yaml
β”‚   β”‚   └── service.yaml
β”‚   └── web
β”‚       └── service.yaml
β”œβ”€β”€ dev
β”‚   β”œβ”€β”€ kustomization.yaml
β”‚   β”œβ”€β”€ db
β”‚   β”‚   └── persistent-volume.yaml
β”‚   └── web
β”‚       β”œβ”€β”€ deployment.yaml
β”‚       β”œβ”€β”€ persistent-volume-claim.yaml
β”‚       └── persistent-volume.yaml
└── prod
    β”œβ”€β”€ kustomization.yaml
    β”œβ”€β”€ db
    β”‚   └── persistent-volume.yaml
    └── web
        β”œβ”€β”€ deployment.yaml
        β”œβ”€β”€ persistent-volume-claim.yaml
        └── persistent-volume.yaml

Much better, huh? We’ve gotten rid of quite a bit of repetition. But we’re not done just yet. The base also needs a Kustomization file. Let’s create it as k8s/base/kustomization.yaml and add these contents:

# k8s/base/kustomization.yaml
kind: Kustomization

resources:
  - db/persistent-volume-claim.yaml
  - db/service.yaml
  - db/deployment.yaml
  - web/service.yaml

configMapGenerator:
  - name: postgres-config
    literals:
      - POSTGRES_DB=vehicle_quotes
      - POSTGRES_USER=vehicle_quotes
      - POSTGRES_PASSWORD=password

As you can see, the file is very similar to the other one we created. We just list the resources that we moved into the base directory and define the database environment variables via the configMapGenerator. The configMapGenerator needs to be defined here because the files that use those variables now live in the base.

Now that we have the base defined, we need to update the kustomization.yaml files of the overlays to use it. We also need to update them so that they only point to the resources that they need to.

Here’s how the changes to the “dev” overlay’s kustomization.yaml file look:

 # dev/kustomization.yaml
 kind: Kustomization

+bases:
+  - ../base

 resources:
   - db/persistent-volume.yaml
-  - db/persistent-volume-claim.yaml
-  - db/service.yaml
-  - db/deployment.yaml
   - web/persistent-volume.yaml
   - web/persistent-volume-claim.yaml
-  - web/service.yaml
   - web/deployment.yaml

-configMapGenerator:
-  - name: postgres-config
-    literals:
-      - POSTGRES_DB=vehicle_quotes
-      - POSTGRES_USER=vehicle_quotes
-      - POSTGRES_PASSWORD=password

As you can see, we removed the configMapGenerator and the individual resources that were already defined in the base. Most importantly, we’ve added a bases element that indicates that the Kustomization over in the base directory is this overlay’s base.

The changes to the “prod” overlay’s kustomization.yaml file are identical. Go ahead and make them.

At this point, you can run kubectl apply -k k8s/dev or kubectl apply -k k8s/prod and things should work just like before.

Don’t forget to also run kubectl delete -k k8s/dev or kubectl delete -k k8s/prod when you’re done testing the previous commands, as we’ll continue making changes to the configs. Keep in mind also that both variants can’t be deployed at the same time, so be sure to delete one before applying the other.

Developing the production variant

I want our production variant to use a different image for the web application. That means a new Dockerfile. If you downloaded the source code from the GitHub repo, you should see the production Dockerfile in the root directory of the repo. It’s called Dockerfile.prod.

Here’s what it looks like:

# Dockerfile.prod
ARG VARIANT="5.0"
FROM mcr.microsoft.com/dotnet/sdk:${VARIANT}

RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
    && apt-get -y install --no-install-recommends postgresql-client-common postgresql-client

RUN dotnet tool install --global dotnet-ef
ENV PATH $PATH:/root/.dotnet/tools

RUN dotnet dev-certs https

WORKDIR /source

COPY . .

ENTRYPOINT ["tail", "-f", "/dev/null"]

The first takeaway from this production Dockerfile is that it is simpler than the development one. The image here is based on the official dotnet/sdk instead of the dev-ready one from vscode/devcontainers/dotnet. Also, this Dockerfile copies all of the source code into a /source directory within the image, because we want to "ship" the image with everything it needs to work without much manual intervention. And unlike the dev variant, we won't be editing code live on the container, so we just copy the files in instead of leaving them out to provision later via volumes. We'll see how that pans out later.

Now that we have our production Dockerfile, we can build an image with it and push it to the registry so that Kubernetes can use it. So, save that file as Dockerfile.prod (or just use the one that’s already in the repo), and run the following commands:

Build the image with:

$ docker build . -f Dockerfile.prod -t localhost:32000/vehicle-quotes-prod:registry

And push it to the registry with:

$ docker push localhost:32000/vehicle-quotes-prod:registry
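
If you want to make sure the push actually landed, the MicroK8s registry add-on speaks the standard Docker Registry HTTP API, so a quick curl against its catalog endpoint should list the repositories. The call below and its output are based on that standard API and on what we've pushed so far, so treat it as a sketch rather than something this setup depends on:

$ curl http://localhost:32000/v2/_catalog
{"repositories":["vehicle-quotes-dev","vehicle-quotes-prod"]}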

Now, we need to modify our prod variant’s deployment configuration so that it can work well with this new prod image. Here’s how the new k8s/prod/web/deployment.yaml should look:

# k8s/prod/web/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vehicle-quotes-web
spec:
  selector:
    matchLabels:
      app: vehicle-quotes-web
  replicas: 1
  template:
    metadata:
      labels:
        app: vehicle-quotes-web
    spec:
      initContainers:
        - name: await-db-ready
          image: postgres:13
          command: ["/bin/sh"]
          args: ["-c", "until pg_isready -h $(VEHICLE_QUOTES_DB_SERVICE_SERVICE_HOST) -p 5432; do echo waiting for database; sleep 2; done;"]
        - name: build
          image: localhost:32000/vehicle-quotes-prod:registry
          workingDir: "/source"
          command: ["/bin/sh"]
          args: ["-c", "dotnet restore -v n && dotnet ef database update && dotnet publish -c release -o /app --no-restore"]
          volumeMounts:
            - mountPath: "/app"
              name: vehicle-quotes-source-code-storage
          env:
            - name: POSTGRES_DB
              valueFrom:
                configMapKeyRef:
                  name: postgres-config
                  key: POSTGRES_DB
            - name: POSTGRES_USER
              valueFrom:
                configMapKeyRef:
                  name: postgres-config
                  key: POSTGRES_USER
            - name: POSTGRES_PASSWORD
              valueFrom:
                configMapKeyRef:
                  name: postgres-config
                  key: POSTGRES_PASSWORD
            - name: CUSTOMCONNSTR_VehicleQuotesContext
              value: Host=$(VEHICLE_QUOTES_DB_SERVICE_SERVICE_HOST);Database=$(POSTGRES_DB);Username=$(POSTGRES_USER);Password=$(POSTGRES_PASSWORD)
      containers:
        - name: vehicle-quotes-web
          image: localhost:32000/vehicle-quotes-prod:registry
          workingDir: "/app"
          command: ["/bin/sh"]
          args: ["-c", "dotnet VehicleQuotes.dll --urls=https://0.0.0.0:5001/"]
          ports:
            - containerPort: 5001
              name: "https"
          volumeMounts:
            - mountPath: "/app"
              name: vehicle-quotes-source-code-storage
          env:
            - name: POSTGRES_DB
              valueFrom:
                configMapKeyRef:
                  name: postgres-config
                  key: POSTGRES_DB
            - name: POSTGRES_USER
              valueFrom:
                configMapKeyRef:
                  name: postgres-config
                  key: POSTGRES_USER
            - name: POSTGRES_PASSWORD
              valueFrom:
                configMapKeyRef:
                  name: postgres-config
                  key: POSTGRES_PASSWORD
            - name: CUSTOMCONNSTR_VehicleQuotesContext
              value: Host=$(VEHICLE_QUOTES_DB_SERVICE_SERVICE_HOST);Database=$(POSTGRES_DB);Username=$(POSTGRES_USER);Password=$(POSTGRES_PASSWORD)
          resources:
            limits:
              memory: 2Gi
              cpu: "1"
      volumes:
        - name: vehicle-quotes-source-code-storage
          emptyDir: {}

This deployment config is similar to the one from the dev variant, but we’ve changed a few elements on it.

Init containers

The most notable change is that we added an initContainers section. Init Containers are one-and-done containers that run specific processes during pod initialization. They are good for doing any sort of initialization tasks that need to be run once, before a pod is ready to work. After they’ve done their task, they go away, and the pod is left with the containers specified in the containers section, like usual. In this case, we’ve added two init containers.

First is the await-db-ready one. This is a simple container based on the postgres:13 image that just sits there waiting for the database to become available. This is thanks to its command and args, which make up a simple shell script that leverages the pg_isready tool to continuously check if connections can be made to our database:

command: ["/bin/sh"]
args: ["-c", "until pg_isready -h $(VEHICLE_QUOTES_DB_SERVICE_SERVICE_HOST) -p 5432; do echo waiting for database; sleep 2; done;"]

This will cause pod initialization to stop until the database is ready.

Thanks to this blog post for the very useful recipe.

We need to wait for the database to be ready before continuing because of what the second init container does. Among other things, the build init container sets up the database, so the database needs to be available for it to be able to do that. The init container also downloads dependencies, builds the app, produces the deployable artifacts, and copies them over to the directory from which the app will run: /app. All of that is specified in the command and args elements, which define a few shell commands to do those tasks.

command: ["/bin/sh"]
args: ["-c", "dotnet restore -v n && dotnet ef database update && dotnet publish -c release -o /app --no-restore"]

Another interesting aspect of this deployment is the volume that we've defined. It's at the bottom of the file; take a quick look:

volumes:
  - name: vehicle-quotes-source-code-storage
    emptyDir: {}

This one is different from the ones we've seen before, which relied on persistent volumes and persistent volume claims. This one uses emptyDir, which means the volume provides storage that persists for the lifetime of the pod rather than being tied to any specific container. In other words, even when a container goes away, the files in this volume stay. This mechanism is useful when we want one container to produce files that another container will use. In our case, the build init container produces the artifacts/binaries that the main vehicle-quotes-web container will use to actually run the web app.

The only other notable difference of this deployment is how its containers use the new prod image that we built before, instead of the dev one. That is, it uses localhost:32000/vehicle-quotes-prod:registry instead of localhost:32000/vehicle-quotes-dev:registry.

The rest of the deployment doesn’t have anything we haven’t already seen. Feel free to explore it.

As you saw, this prod variant doesn't need to access the source code via a persistent volume, so we don't need PV and PVC definitions for it. Feel free to delete k8s/prod/web/persistent-volume.yaml and k8s/prod/web/persistent-volume-claim.yaml. Remember to also remove them from the resources section in k8s/prod/kustomization.yaml.

With those changes done, we can fire up our prod variants with:

$ kubectl apply -k k8s/prod

The web pod will take quite a while to properly start up because it's downloading a lot of dependencies. Remember that you can use kubectl get pods -A to see their current status. Take note of the pod names so you can also look at container-specific logs:

  • Use kubectl logs -f <vehicle-quotes-web-pod-name> await-db-ready to see the logs from the await-db-ready init container.
  • Use kubectl logs -f <vehicle-quotes-web-pod-name> build to see the logs from the build init container.

If this were an actual production setup and we were worried about pod startup time, there's one way we could make it faster: perform the "download dependencies" step when building the production image instead of when deploying the pods. That is, we could have our Dockerfile.prod call dotnet restore -v n instead of the build init container. Building the image would then take more time, but it would have all dependencies baked in by the time Kubernetes uses it to create containers, so the web pod would start up much faster.
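
For illustration, here's roughly what that tweak to Dockerfile.prod could look like. This is just a sketch; the matching change would be to drop the dotnet restore step from the build init container's args:

 # Dockerfile.prod
 # ...
 WORKDIR /source

 COPY . .

+# Bake the NuGet packages into the image so pods don't download them at startup
+RUN dotnet restore -v n
+
 ENTRYPOINT ["tail", "-f", "/dev/null"]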

This deployment automatically starts the web app, so after the pods are in the “Running” status (or green in the dashboard!), we can just navigate to the app via a web browser. We’ve configured the deployment to only work over HTTPS (as given by the ports section in the vehicle-quotes-web container), so this is the only URL that’s available to us: https://localhost:30001. We can navigate to it and see the familiar screen:

SwaggerUI on prod

At this point, we finally have fully working, distinct variants. However, we can take our configuration a few steps further by leveraging some additional Kustomize features.

Using patches for small, precise changes

Right now, the persistent volume configurations for the databases of both variants are pretty much identical. The only difference is the hostPath. With patches, we can focus in on that property and vary it specifically.

To do it, we first copy either variant's db/persistent-volume.yaml into k8s/base/db/persistent-volume.yaml. We also need to add it under resources in k8s/base/kustomization.yaml:

 # k8s/base/kustomization.yaml
 # ...
 resources:
+  - db/persistent-volume.yaml
   - db/persistent-volume-claim.yaml
 # ...

That will serve as the common ground for the overlays to "patch over". Now we can create the patches. First, the one for the dev variant:

# k8s/dev/db/persistent-volume-host-path.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vehicle-quotes-postgres-data-persisent-volume
spec:
  hostPath:
    path: "/home/kevin/projects/vehicle-quotes-postgres-data-dev"

And then the one for the prod variant:

# k8s/prod/db/persistent-volume-host-path.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vehicle-quotes-postgres-data-persisent-volume
spec:
  hostPath:
    path: "/home/kevin/projects/vehicle-quotes-postgres-data-prod"

As you can see, these patches are sort of truncated persistent volume configs which only include the apiVersion, kind, metadata.name, and the value that actually changes: the hostPath.

Once those are saved, we need to include them in their respective kustomization.yaml. It’s the same modification to both k8s/dev/kustomization.yaml and k8s/prod/kustomization.yaml. Just remove the db/persistent-volume.yaml item from their resources sections and add the following to both of them:

patches:
  - db/persistent-volume-host-path.yaml

Right now, k8s/dev/kustomization.yaml should be:

# k8s/dev/kustomization.yaml
kind: Kustomization

bases:
  - ../base

resources:
  - web/persistent-volume.yaml
  - web/persistent-volume-claim.yaml
  - web/deployment.yaml

patches:
  - db/persistent-volume-host-path.yaml

And k8s/prod/kustomization.yaml should be:

# k8s/prod/kustomization.yaml
kind: Kustomization

bases:
  - ../base

resources:
  - web/deployment.yaml

patches:
  - db/persistent-volume-host-path.yaml
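
To double check that the patches are doing their job, you can render each variant and look for the hostPath that ends up in the output. Something like this works as a quick spot check (output abridged, and the grep is just one way to do it):

$ kubectl kustomize k8s/dev | grep -A 1 hostPath
  hostPath:
    path: /home/kevin/projects/vehicle-quotes-postgres-data-dev
$ kubectl kustomize k8s/prod | grep -A 1 hostPath
  hostPath:
    path: /home/kevin/projects/vehicle-quotes-postgres-data-prod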

Overriding container images

Another improvement we can make is to use the images element in the kustomization.yaml files to control which web app image each variant's deployments use. This makes maintenance easier, since the image is defined in a single, expected place, and the full image name no longer has to be repeated throughout the configs.

To put it in practice, add the following at the end of the k8s/dev/kustomization.yaml file:

images:
  - name: vehicle-quotes-web
    newName: localhost:32000/vehicle-quotes-dev
    newTag: registry

Do the same with k8s/prod/kustomization.yaml, only using the prod image for this one:

images:
  - name: vehicle-quotes-web
    newName: localhost:32000/vehicle-quotes-prod
    newTag: registry

Now we can replace any mention of localhost:32000/vehicle-quotes-dev in the dev variant, and any mention of localhost:32000/vehicle-quotes-prod in the prod variant, with vehicle-quotes-web, which is simpler.

In k8s/dev/web/deployment.yaml:

 # ...
     spec:
       containers:
         - name: vehicle-quotes-web
-          image: localhost:32000/vehicle-quotes-dev:registry
+          image: vehicle-quotes-web
           ports:
 # ...

And in k8s/prod/web/deployment.yaml:

 # ...

         - name: build
-          image: localhost:32000/vehicle-quotes-prod:registry
+          image: vehicle-quotes-web
           workingDir: "/source"
           command: ["/bin/sh"]
 # ...

       containers:
         - name: vehicle-quotes-web
-          image: localhost:32000/vehicle-quotes-prod:registry
+          image: vehicle-quotes-web
           workingDir: "/app"
 # ...

Once that's all done, you should be able to kubectl apply -k k8s/dev or kubectl apply -k k8s/prod and everything should work fine. Be sure to kubectl delete before kubectl applying a different variant, though, as the two variants can't coexist in the same cluster because many of their objects share the same names.
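
If you want to confirm that the images element is swapping in the right image for each variant, rendering the configs and grepping for the image fields is a quick way to do it:

$ kubectl kustomize k8s/dev | grep "image:"
$ kubectl kustomize k8s/prod | grep "image:"

You should see localhost:32000/vehicle-quotes-dev:registry in the dev output and localhost:32000/vehicle-quotes-prod:registry in the prod output, alongside the postgres:13 image that the await-db-ready init container uses.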

Closing thoughts

Wow! That was a good one. In this post I’ve captured all the knowledge that I wish I had when I first encountered Kubernetes. We went from knowing nothing to being able to put together a competent environment. We figured out how to install Kubernetes locally via MicroK8s, along with a few useful add-ons. We learned about the main concepts in Kubernetes like nodes, pods, images, containers, deployments, services, and persistent volumes. Most importantly, we learned how to define and create them using a declarative configuration file approach.

Then, we learned about Kustomize and how to use it to implement variants of our configurations. And we did all that by actually getting our hands dirty and, step by step, deploying a real web application and its backing database system. When all was said and done, a simple kubectl apply -k <kustomization> was all it took to get the app fully up and running. Not bad, eh?

Useful commands

  • Start up MicroK8s: microk8s start
  • Shut down MicroK8s: microk8s stop
  • Check MicroK8s info: microk8s status
  • Start up the K8s dashboard: microk8s dashboard-proxy
  • Get available pods on all namespaces: kubectl get pods -A
  • Watch and follow the logs on a specific container in a pod: kubectl logs -f <POD_NAME> <CONTAINER_NAME>
  • Open a shell into the default container in a pod: kubectl exec -it <POD_NAME> -- bash
  • Create a K8s resource given a YAML file or directory: kubectl apply -f <FILE_OR_DIRECTORY_NAME>
  • Delete a K8s resource given a YAML file or directory: kubectl delete -f <FILE_OR_DIRECTORY_NAME>
  • Create K8s resources with Kustomize: kubectl apply -k <KUSTOMIZATION_DIR>
  • Delete K8s resources with Kustomize: kubectl delete -k <KUSTOMIZATION_DIR>
  • Build custom images for the K8s registry: docker build . -f <DOCKERFILE> -t localhost:32000/<IMAGE_NAME>:registry
  • Push custom images to the K8s registry: docker push localhost:32000/<IMAGE_NAME>:registry

kubernetes docker dotnet postgres

Fixing a PostgreSQL cluster that has no superuser

Jon Jensen

By Jon Jensen
January 7, 2022

Stone building with arched windows, a tower, steps leading up, and lush lawn, flowers, and trees

Normally in a newly-created PostgreSQL database cluster there is a single all-powerful administrative role (user) with “superuser” privileges, conventionally named postgres, though it can have any name.

After the initial cluster setup you can create other roles as needed. You may optionally grant one or more of your new roles the superuser privilege, but it is best to avoid granting superuser to any other roles because you and your applications should generally connect as roles with lower privilege to reduce the risk of accidental or malicious damage to your database.

Let’s break something 😎

Imagine you have a cluster with two or more superuser roles. If you accidentally remove superuser privilege from one role, you can simply connect as the other superuser and re-grant it.
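
For example, with a hypothetical second superuser named admin2, the repair is a one-liner:

$ psql -U admin2 postgres
postgres=# ALTER ROLE postgres SUPERUSER;
ALTER ROLE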

But if you have a cluster where only the single postgres role is a superuser, what happens if you connect as that role and try to remove its superuser privilege?

$ psql -U postgres postgres
psql (14.1)
Type "help" for help.

postgres=# \du
                       List of roles
 Role name |            Attributes             | Member of
-----------+-----------------------------------+-----------
 postgres  | Superuser, Create role, Create DB | {}
 somebody  | Create DB                         | {}

postgres=> \conninfo
You are connected to database "postgres" as user "postgres" via socket in "/tmp" at port "5432".
postgres=# ALTER ROLE postgres NOSUPERUSER;
ALTER ROLE
postgres=# \du
                 List of roles
 Role name |       Attributes       | Member of
-----------+------------------------+-----------
 postgres  | Create role, Create DB | {}
 somebody  | Create DB              | {}

postgres=# ALTER ROLE postgres SUPERUSER;
ERROR:  must be superuser to alter superuser roles or change superuser attribute
postgres=# \q

PostgreSQL happily lets us do that, and now we have no superuser, and so we cannot re-grant the privilege to that role or any other!

Homebrew PostgreSQL problem

Aside from such a severe operator error, there are other situations where you may find no superuser exists. One happened to me recently while experimenting with PostgreSQL installed by Homebrew on macOS.

I used Homebrew to install postgresql@14 and later noticed that it left me with a single role named after my OS user, and it was not a superuser. It couldn’t even create other roles. I’m not sure how that happened, perhaps somehow caused by an earlier installation of postgresql on the same computer, but so it was.

Since there wasn’t any data in there yet, I could have simply deleted the existing PostgreSQL cluster and created a new one. But in other circumstances there could have been data in there that I needed to preserve, which wasn’t accessible to my one less-privileged user, and which caused errors in pg_dumpall.

So how can we solve this problem the right way?

First, stop the server

We need to get lower-level access to our database. To do that, first we stop the running database server.

On a typical modern Linux server running systemd, that looks like:

# systemctl stop postgresql-14

On macOS using Homebrew services, that could be:

$ brew services stop postgresql

Or in my experimental case with Homebrew using a temporary Postgres server which I’m showing here:

$ pg_ctl -D /opt/homebrew/var/postgresql@14 stop
waiting for server to shut down.... done
server stopped

Next, start the PostgreSQL stand-alone backend

Next we start the "stand-alone backend", which a single user can interact with directly, without using a separate client.

No privilege checks are done here, so we can re-grant the superuser privilege to our postgres role.

Interestingly, SQL statements entered here end with a newline, no ; needed, though adding one doesn’t hurt. And statements here can’t span multiple lines without being continued with \ at the end of each intermediate line.

In the command below, note that the --single option must come first, and the postgres at the end of the command is the name of the database we want to connect to:

$ postgres --single -D /opt/homebrew/var/postgresql@14 postgres

PostgreSQL stand-alone backend 14.1
backend> ALTER ROLE postgres SUPERUSER
2022-01-07 20:32:51.321 MST [27246] LOG:  statement: ALTER ROLE postgres SUPERUSER

2022-01-07 20:32:51.322 MST [27246] LOG:  duration: 1.242 ms
backend>

The postgres stand-alone backend prompt does not have all the niceties of psql, such as line editing, history, and backslash metacommands like \q to quit, so we type control-D there to mark "end of file" on our input stream and exit the program.

Back to normal

Now we can again start the normal multi-user client/server PostgreSQL service:

$ pg_ctl -D /opt/homebrew/var/postgresql@14 start
waiting for server to start.... done
server started

And finally we can connect to the server and confirm our change persisted:

$ psql -U postgres postgres
psql (14.1)
Type "help" for help.

postgres=# \du
                       List of roles
 Role name |            Attributes             | Member of
-----------+-----------------------------------+-----------
 postgres  | Superuser, Create role, Create DB | {}
 somebody  | Create DB                         | {}

postgres=#

postgres security tips

Setting up SSH in Visual Studio Code

Couragyn Chretien

By Couragyn Chretien
January 6, 2022

View of Grand Canyon

Visual Studio Code is a powerful code editor that you can shape into a customized IDE for your development. VS Code's default configuration works great for local development, but it lacks the functionality to give you the same experience over a remote SSH connection. Enter the Remote - SSH extension.

Installation

Remote SSH in Visual Studio Code Marketplace

Installing the Remote - SSH extension is really easy! Open the Extension Marketplace with Ctrl+Shift+X or by clicking View > Extensions in the menu, then search for and install Remote - SSH.

Setting up your SSH config file

To configure your connection, you’ll need to add a few lines to your SSH config. Click the green Open a Remote Window icon on the bottom left corner:

Open Remote Window

Select Open SSH Configuration File... and choose the config file you want to use. I use the Linux default, /home/$USER/.ssh/config. Add the Host, HostName, and User as required and save:

Host MySite
  HostName site.endpointdev.com
  User couragyn

Connecting

Click the green Open a Remote Window icon on the bottom left corner, select Connect to Host..., and pick your desired host, in this case MySite. If your public SSH key isn’t on the remote server, you will be prompted to enter a password. If your key is on the server, it will state it has your fingerprint and prompt you to continue.
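
If you'd rather not type a password every time, you can install your public key on the server before connecting. ssh-copy-id works with the host aliases from your SSH config, so assuming you already have (or generate) a key pair, something like this should do it:

$ ssh-keygen -t ed25519        # only needed if you don't have a key pair yet
$ ssh-copy-id MySite           # installs your public key for user couragyn on site.endpointdev.com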

You’re now connected and can use VS Code’s features like Terminal and Debug Console just like you would locally.

Opening the working directory

Wouldn’t it be nice to have VS Code automatically open to the correct folder once your SSH connection is established? Unfortunately there isn’t a way to set a folder location in the settings yet; you’d need to click Open Folder and navigate to the project root every time you connect.

There is, however, a workaround to make this a bit less tedious:

  • Click Open Folder
  • Navigate to the project root
  • Click File > Save Workspace As...
  • Save your .code-workspace file somewhere it won’t be picked up by Git

Now open your workspace again with a new connection. If the workspace was recently used, you can use File > Open Recent > $Workspace.code-workspace; otherwise go to File > Open Workspace... and select your .code-workspace file. This should get you set up right in the correct directory after you’ve connected.
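
For reference, the saved workspace file is just a small JSON document. A minimal one pointing at a hypothetical remote project path looks something like this:

{
  "folders": [
    {
      "path": "/home/couragyn/projects/mysite"
    }
  ]
}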

SSH with multiple hops

Sometimes you will need to SSH into one location before tunneling into another. To connect to a remote host through an intermediate jump host, you will need to add ForwardAgent and ProxyJump to the config file, like this:

Host MySite
  HostName site.endpointdev.com
  User couragyn
  ForwardAgent yes

Host SiteThatNeedsToGoThroughMySite
  HostName completely.different.com
  User couragyn
  ProxyJump MySite

Happy remote development!


ssh tips visual-studio-code

Word diff: Git as wdiff alternative

Jon Jensen

By Jon Jensen
January 5, 2022

5 drinking fountains mounted on a wall at varying levels

The diff utility to show differences between two files was created in 1974 as part of Unix. It has been incredibly useful and popular ever since, and hasn’t changed much since 1991 when it gained the ability to output the now standard “unified context diff” format.

The comparison diff makes is per line, so if anything on a given line changes, in unified context format the previous version of that line is shown as removed, marked with - at the beginning, and the following line, starting with +, shows the new version.

For example see this Dockerfile that had two lines changed:

$ diff -u Dockerfile.old Dockerfile
--- Dockerfile.old	2022-01-05 22:16:21 -0700
+++ Dockerfile	2022-01-05 23:08:55 -0700
@@ -2,7 +2,7 @@
 
 WORKDIR /usr/src/app
 
-# Bundle app source
+# Bundle entire source
 COPY . .
 
-RUN /usr/src/app/test.sh
+RUN /usr/src/app/start.sh

That works well for visually reviewing changes to many types of files that developers typically work with.

It can also serve as input to the patch program, which dates to 1985 and is still in wide use as a counterpart to diff. With patch we can apply changes to a file and avoid the need to send an entire new file or apply changes by hand (which is very prone to error).
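
For example, sticking with the Dockerfile from earlier, a typical round trip with diff and patch might look like this (the file names are just for illustration):

$ diff -u Dockerfile.old Dockerfile > dockerfile.patch    # capture the changes
$ patch Dockerfile.old dockerfile.patch                   # apply them to the old copy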

But let’s leave that aside and focus on humans reading diff output.

Diffing paragraphs

For a file containing paragraphs of prose each on their own long lines, it can look like the lines change completely when we change only a few words. This is often the case with HTML destined to be displayed in a web browser, email text, or the Markdown source for this blog post itself.

Consider this file with one long line of sample text gathered from pangrams and typing exercises:

$ cat -n paragraph.txt 
     1	The quick brown fox jumped over the lazy dog's back 1234567890 times. Now is the time for all good men to come to the aid of the party. Waltz, bad nymph, for quick jigs vex. Glib jocks quiz nymph to vex dwarf. Sphinx of black quartz, judge my vow. How vexingly quick daft zebras jump! The five boxing wizards jump quickly. Jackdaws love my big sphinx of quartz. Pack my box with five dozen liquor jugs.

If we change even one character of that, diff will show that its one line changed, but we have to painstakingly visually scan the entire long line to determine what exactly changed and where. That is not very useful.

We can wrap, or split up, the lines into multiple lines of a maximum of 75 characters with the classic Unix tool fmt:

$ fmt paragraph.txt | tee wrapped.txt
The quick brown fox jumped over the lazy dog's back 1234567890
times. Now is the time for all good men to come to the aid of the
party. Waltz, bad nymph, for quick jigs vex. Glib jocks quiz nymph
to vex dwarf. Sphinx of black quartz, judge my vow. How vexingly
quick daft zebras jump! The five boxing wizards jump quickly.
Jackdaws love my big sphinx of quartz. Pack my box with five dozen
liquor jugs.

With those shorter lines, changes will be easier to see than with one long line, but it is still hard to pick out small changes:

$ diff -u wrapped.txt wrapped2.txt         
--- wrapped.txt	2022-01-05 23:22:46 -0700
+++ wrapped2.txt	2022-01-05 23:38:14 -0700
@@ -1,7 +1,7 @@
 The quick brown fox jumped over the lazy dog's back 1234567890
-times. Now is the time for all good men to come to the aid of the
+times. Now is thy time for all good men to come to the aid of the
 party. Waltz, bad nymph, for quick jigs vex. Glib jocks quiz nymph
 to vex dwarf. Sphinx of black quartz, judge my vow. How vexingly
-quick daft zebras jump! The five boxing wizards jump quickly.
+quick daft zebras jump! Tho five boxing wizards jump quickly.
 Jackdaws love my big sphinx of quartz. Pack my box with five dozen
 liquor jugs.

And much worse, every line changes when we make a significant edit early in the text and reflow the paragraph to fit our maximum line length. Here is what happens after adding “an amazing count of” in the first line and re-wrapping the lines:

$ diff -u wrapped.txt wrapped3.txt
--- wrapped.txt	2022-01-05 23:22:46 -0700
+++ wrapped3.txt	2022-01-06 08:05:33 -0700
@@ -1,7 +1,7 @@
-The quick brown fox jumped over the lazy dog's back 1234567890
-times. Now is the time for all good men to come to the aid of the
-party. Waltz, bad nymph, for quick jigs vex. Glib jocks quiz nymph
-to vex dwarf. Sphinx of black quartz, judge my vow. How vexingly
-quick daft zebras jump! The five boxing wizards jump quickly.
-Jackdaws love my big sphinx of quartz. Pack my box with five dozen
-liquor jugs.
+The quick brown fox jumped over the lazy dog's back an amazing count
+of 1234567890 times. Now is thy time for all good men to come to
+the aid of the party. Waltz, bad nymph, for quick jigs vex. Glib
+jocks quiz nymph to vex dwarf. Sphinx of black quartz, judge my
+vow. How vexingly quick daft zebras jump! Tho five boxing wizards
+jump quickly.  Jackdaws love my big sphinx of quartz. Pack my box
+with five dozen liquor jugs.

That gives no aid to a human proofreader!

Word diff to the rescue

In 1992 François Pinard wrote the word-based diff program wdiff which is now part of the GNU project. It solves this problem.

Here is how it shows us our example changing two words:

$ wdiff wrapped.txt wrapped2.txt   
The quick brown fox jumped over the lazy dog's back 1234567890
times. Now is [-the-] {+thy+} time for all good men to come to the aid of the
party. Waltz, bad nymph, for quick jigs vex. Glib jocks quiz nymph
to vex dwarf. Sphinx of black quartz, judge my vow. How vexingly
quick daft zebras jump! [-The-] {+Tho+} five boxing wizards jump quickly.
Jackdaws love my big sphinx of quartz. Pack my box with five dozen
liquor jugs.

Words removed are by default marked with [-…-] and words added with {+…+}.

It even knows how to accommodate word changes appearing on different lines! Trying it out on our example with the reflowed paragraph:

$ wdiff wrapped.txt wrapped3.txt
The quick brown fox jumped over the lazy dog's back {+an amazing count
of+} 1234567890 times. Now is [-the-] {+thy+} time for all good men to come to
the aid of the party. Waltz, bad nymph, for quick jigs vex. Glib
jocks quiz nymph to vex dwarf. Sphinx of black quartz, judge my
vow. How vexingly quick daft zebras jump! [-The-] {+Tho+} five boxing wizards
jump quickly.  Jackdaws love my big sphinx of quartz. Pack my box
with five dozen liquor jugs.

So this is very nice, although wdiff often isn’t available by default on the various systems we find ourselves on, and it is perhaps a bit worrisome that the wdiff software has not been updated since 2014.

Too bad this word-diffing feature is not part of standard diff!

A familiar friend

That’s ok because you probably already have a wdiff alternative available on your computer: Git! More specifically, git diff --word-diff.

Maybe you already use that feature when working with your local clones of Git repositories, to look at what changed in the commit history or local edits. Did you know that git diff can act as a complete replacement of the standalone diff tool? Yes, git diff can also compare two arbitrary files that are not part of a Git repository when given the --no-index option!

And Git can usually tell that you mean --no-index without you typing it explicitly, because you're comparing at least one file that is not tracked in a Git clone, so you can just type:

$ git diff --word-diff <path1> <path2>

for any two file paths and it will work.

Trying this out with our sample paragraph:

$ git diff --word-diff wrapped.txt wrapped2.txt
diff --git a/wrapped.txt b/wrapped2.txt
index b1c5775..59ff315 100644
--- a/wrapped.txt
+++ b/wrapped2.txt
@@ -1,7 +1,7 @@
The quick brown fox jumped over the lazy dog's back 1234567890
times. Now is [-the-]{+thy+} time for all good men to come to the aid of the
party. Waltz, bad nymph, for quick jigs vex. Glib jocks quiz nymph
to vex dwarf. Sphinx of black quartz, judge my vow. How vexingly
quick daft zebras jump! [-The-]{+Tho+} five boxing wizards jump quickly.
Jackdaws love my big sphinx of quartz. Pack my box with five dozen
liquor jugs.

It uses the same word deletion and insertion markers as wdiff, but to make them easier for our eyes to spot, by default git diff also shows them in different colors when output is going to an interactive terminal. You can disable the coloring with the additional option --color=never.

Use git diff --word-diff=color for a pretty view using only color to show the changes, without the [-…-] and {+…+} markers. This may be more readable when your input text is full of punctuation confusingly similar to the markers, and is useful if you want to copy from the terminal without any extra surrounding characters:

$ git diff --word-diff=color wrapped.txt wrapped2.txt
diff --git a/wrapped.txt b/wrapped2.txt
index b1c5775..59ff315 100644
--- a/wrapped.txt
+++ b/wrapped2.txt
@@ -1,7 +1,7 @@
The quick brown fox jumped over the lazy dog's back 1234567890
times. Now is thethy time for all good men to come to the aid of the
party. Waltz, bad nymph, for quick jigs vex. Glib jocks quiz nymph
to vex dwarf. Sphinx of black quartz, judge my vow. How vexingly
quick daft zebras jump! TheTho five boxing wizards jump quickly.
Jackdaws love my big sphinx of quartz. Pack my box with five dozen
liquor jugs.

There is also the option git diff --word-diff=porcelain for an ugly but more easily machine-parseable format, useful when the output will be fed into scripts:

$ git diff --word-diff=porcelain wrapped.txt wrapped2.txt
diff --git a/wrapped.txt b/wrapped2.txt
index b1c5775..59ff315 100644
--- a/wrapped.txt
+++ b/wrapped2.txt
@@ -1,7 +1,7 @@
 The quick brown fox jumped over the lazy dog's back 1234567890
~
 times. Now is 
-the
+thy
  time for all good men to come to the aid of the
~
 party. Waltz, bad nymph, for quick jigs vex. Glib jocks quiz nymph
~
 to vex dwarf. Sphinx of black quartz, judge my vow. How vexingly
~
 quick daft zebras jump! 
-The
+Tho
  five boxing wizards jump quickly.
~
 Jackdaws love my big sphinx of quartz. Pack my box with five dozen
~
 liquor jugs.
~

I have never needed that yet, but it is good to be aware of in case I ever do need to parse word diff output, to make it easier and more reliable.
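
As a rough illustration of how parseable it is, here is one possible sketch that counts added and removed words. It relies on the porcelain rules that added words start with +, removed words start with -, and only the +++/--- file header lines have a doubled marker:

$ git diff --word-diff=porcelain wrapped.txt wrapped2.txt \
    | awk '/^\+[^+]/ { added++ } /^-[^-]/ { removed++ } END { printf "%d added, %d removed\n", added, removed }'
2 added, 2 removed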

Customize word break definition

Other kinds of files can present challenges for readability in diff output.

For example consider trying to see small changes in the classic Unix /etc/passwd text “database” which has one user record per line, and within each record line uses : to delimit fields.

First we’ll try traditional line diff:

$ git diff passwd passwd.mangled
diff --git a/passwd b/passwd.mangled
index 981736c..6531f10 100644
--- a/passwd
+++ b/passwd.mangled
@@ -24,22 +24,22 @@ polkitd:x:996:991:User for polkitd:/:/sbin/nologin
 rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
 pulse:x:171:171:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
 chrony:x:995:988::/var/lib/chrony:/sbin/nologin
-abrt:x:173:173::/etc/abrt:/sbin/nologin
+abrt:x:173:1730::/etc/abrt:/sbin/nologin
 colord:x:994:987:User for colord:/var/lib/colord:/sbin/nologin
 rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
 sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
 vboxadd:x:993:1::/var/run/vboxadd:/sbin/nologin
 dnsmasq:x:985:985:Dnsmasq DHCP and DNS server:/var/lib/dnsmasq:/sbin/nologin
-tcpdump:x:72:72::/:/sbin/nologin
+tcpdump:x:72:72::/:/bin/bash
 systemd-timesync:x:984:984:systemd Time Synchronization:/:/sbin/nologin
 pipewire:x:983:983:PipeWire System Daemon:/var/run/pipewire:/sbin/nologin
 gluster:x:982:982:GlusterFS daemons:/run/gluster:/sbin/nologin
-radvd:x:75:75:radvd user:/:/sbin/nologin
-saslauth:x:981:76:Saslauthd user:/run/saslauthd:/sbin/nologin
+radvd:x:76:75:radvd user:/:/sbin/nologin
+saslauth:x:981:76:Saslauthd user:/ran/saslauthd:/sbin/nologin
 usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
 setroubleshoot:x:980:979::/var/lib/setroubleshoot:/sbin/nologin
 openvpn:x:979:978:OpenVPN:/etc/openvpn:/sbin/nologin
-nm-openvpn:x:978:977:Default user for running openvpn spawned by NetworkManager:/:/sbin/nologin
+mm-openvpn:x:978:977:Default user for running openvpn spawned by NetworkManager:/:/sbin/nologin
 qemu:x:107:107:qemu user:/:/sbin/nologin
 gdm:x:42:42::/var/lib/gdm:/sbin/nologin
 apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

It's not too hard to "eyeball" changes there if they add or remove characters and thus affect the line lengths. But a line where only a single character changed isn't as easy to spot.

Since blank space is not the relevant separator in this file, standard word diff doesn’t help and in some cases is worse than line diff:

$ git diff --word-diff passwd passwd.mangled
diff --git a/passwd b/passwd.mangled
index 981736c..6531f10 100644
--- a/passwd
+++ b/passwd.mangled
@@ -24,22 +24,22 @@ polkitd:x:996:991:User for polkitd:/:/sbin/nologin
rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
pulse:x:171:171:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
chrony:x:995:988::/var/lib/chrony:/sbin/nologin
[-abrt:x:173:173::/etc/abrt:/sbin/nologin-]{+abrt:x:173:1730::/etc/abrt:/sbin/nologin+}
colord:x:994:987:User for colord:/var/lib/colord:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
vboxadd:x:993:1::/var/run/vboxadd:/sbin/nologin
dnsmasq:x:985:985:Dnsmasq DHCP and DNS server:/var/lib/dnsmasq:/sbin/nologin
[-tcpdump:x:72:72::/:/sbin/nologin-]{+tcpdump:x:72:72::/:/bin/bash+}
systemd-timesync:x:984:984:systemd Time Synchronization:/:/sbin/nologin
pipewire:x:983:983:PipeWire System Daemon:/var/run/pipewire:/sbin/nologin
gluster:x:982:982:GlusterFS daemons:/run/gluster:/sbin/nologin
[-radvd:x:75:75:radvd-]{+radvd:x:76:75:radvd+} user:/:/sbin/nologin
saslauth:x:981:76:Saslauthd [-user:/run/saslauthd:/sbin/nologin-]{+user:/ran/saslauthd:/sbin/nologin+}
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
setroubleshoot:x:980:979::/var/lib/setroubleshoot:/sbin/nologin
openvpn:x:979:978:OpenVPN:/etc/openvpn:/sbin/nologin
[-nm-openvpn:x:978:977:Default-]{+mm-openvpn:x:978:977:Default+} user for running openvpn spawned by NetworkManager:/:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

Another venerable program similar to wdiff that is still maintained is dwdiff. In its self-description we read something intriguing:

It is different from wdiff in that it allows the user to specify what should be considered whitespace …

That sounds useful. But dwdiff is still a separate program and is even less common than wdiff. Can the versatile git diff help us here too?

Yes! git diff has the option --word-diff-regex to specify a regular expression to use instead of whitespace as a delimiter, like dwdiff does. The man page explanation notes:

For example, --word-diff-regex=. will treat each character as a word and, correspondingly, show differences character by character.

It also notes that --word-diff is assumed and can be omitted when using --word-diff-regex.

So let’s try that:

$ git diff --word-diff-regex=. passwd passwd.mangled
diff --git a/passwd b/passwd.mangled
index 981736c..6531f10 100644
--- a/passwd
+++ b/passwd.mangled
@@ -24,22 +24,22 @@ polkitd:x:996:991:User for polkitd:/:/sbin/nologin
rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
pulse:x:171:171:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
chrony:x:995:988::/var/lib/chrony:/sbin/nologin
abrt:x:173:173{+0+}::/etc/abrt:/sbin/nologin
colord:x:994:987:User for colord:/var/lib/colord:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
vboxadd:x:993:1::/var/run/vboxadd:/sbin/nologin
dnsmasq:x:985:985:Dnsmasq DHCP and DNS server:/var/lib/dnsmasq:/sbin/nologin
tcpdump:x:72:72::/:/[-s-]bin/[-nologin-]{+bash+}
systemd-timesync:x:984:984:systemd Time Synchronization:/:/sbin/nologin
pipewire:x:983:983:PipeWire System Daemon:/var/run/pipewire:/sbin/nologin
gluster:x:982:982:GlusterFS daemons:/run/gluster:/sbin/nologin
radvd:x:7[-5-]{+6+}:75:radvd user:/:/sbin/nologin
saslauth:x:981:76:Saslauthd user:/r[-u-]{+a+}n/saslauthd:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
setroubleshoot:x:980:979::/var/lib/setroubleshoot:/sbin/nologin
openvpn:x:979:978:OpenVPN:/etc/openvpn:/sbin/nologin
[-n-]{+m+}m-openvpn:x:978:977:Default user for running openvpn spawned by NetworkManager:/:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

That’s quite good, at least to my eyes.

On the web

GitHub, GitLab, and Bitbucket do a good job of showing readable diffs for most common cases: line-oriented, but with word or character differences within each line highlighted via color. Viewing a few of our earlier examples as commit diffs on each of them shows this in action.

But GitHub and GitLab both break down on a reflowed paragraph, while Bitbucket shows a sensible diff of what logically changed, including spaces becoming newlines and vice versa.

It appears that GitLab may soon gain proper cross-line word diff ability as seen in the project’s issue Add word-diff option to commits view, which states its “Problem to solve” as:

When working with markdown (or any type of prose/text in general), the “classic” git-diff (intended for code) is of limited use.

Exactly right.

IDEs

Visual Studio Code (VS Code) handles the above cases well out of the box for uncommitted changes in the current Git clone, and the GitLens extension helps it do the same for showing past commit diffs.

IntelliJ IDEA handles both cases well by default.

For those left behind

To make the most of Git, you’ll want a fairly recent version, since new features are being added all the time. If you’re working on a server using the popular but aging CentOS 7 which comes with the ancient Git 1.8.3, you can follow our simple tutorial to upgrade to Git 2.34.1 or newer on CentOS 7.

Enjoy!

Reference

  • diff on Wikipedia, including history and samples of original, context, and unified context diff output
  • patch on Wikipedia
  • git-diff man page
  • wdiff
  • dwdiff
  • Pangrams on Wikipedia, the source of our sample prose here

git terminal visual-studio-code tips