TIL: (Pre-)Commit(ment) Issues

In a personal repo, my latest attempt to commit was failing with a very generic-looking message:

/usr/bin/env: python3.9: No such file or directory

What was really strange is that this repo wasn’t based on Python 3.9 (at least not in recent memory); it was running on 3.11!

Come to find out, the problem actually had to do with pre-commit! And that made sense: the last time I’d worked on this repo on my personal laptop, I was probably running Python 3.9, & pre-commit’s cached hook environments were still pointing at that old interpreter. Running a couple of commands to reset the pre-commit hook environments was all I needed to do to get my commit to work properly:

pre-commit clean
pre-commit autoupdate
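For a bit of context: pre-commit builds & caches a virtualenv for each hook, pinned to whatever interpreter it resolved at install time. A pinned language_version in .pre-commit-config.yaml can have the same effect; this hypothetical snippet (not necessarily my actual config) shows the kind of pin that would leave hooks chasing a Python that no longer exists:

# Hypothetical .pre-commit-config.yaml excerpt
default_language_version:
  python: python3.9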

SSL Certs Are The WORST (To Generate & Install)

Having worked various IT-related jobs for nearly 30(!) years now, I consider myself to be pretty technically savvy. But every once in a while, I run into some bit of technical wizardry that is so prevalent yet so complex that I can’t believe how hard it is to wrap my head around. And holy cow, generating & installing SSL certs isn’t just one of those things… SSL certs might be the quintessential worst bit of technology that actually does amazingly good things.

Now I’ll admit that part of this is probably me just being cheap: instead of paying my hosting service a little more every year to keep this website alive & automatically renew the site’s SSL cert, I went with a cheaper plan that means I have to manage the SSL cert myself.

And just like anything else that’s technical & intricate, the details on how to do such things often get lost… especially when you only have to do the task once a year.

I want to rectify that by documenting what I (finally!) did to get it working so that I never have to go through so much painful effort again…

What’s perhaps so bad about this is that it’s really not that hard; the hosting company’s documentation just makes it look harder than the actual execution turns out to be.

Easy Part: Generating a Private Key & the CSR

As long as you have a Linux (or maybe a Mac) machine handy, this should be trivial to do.

1. Run the following command to generate a private key:

sudo openssl genrsa -out [YOUR_DOMAIN.TLD].key 2048

You’ll want to replace “[YOUR_DOMAIN.TLD]” with your actual domain name. So I would do:

sudo openssl genrsa -out learningtotest.com.key 2048

2. With the private key created, we need to generate the certificate signing request (CSR) with this command:

sudo openssl req -new -key [YOUR_DOMAIN.TLD].key -out [YOUR_DOMAIN.TLD].csr

Again, replace [YOUR_DOMAIN.TLD] with your actual domain name; at the very least, it’s imperative that it matches the private key file that you created with the first command! So mine would become:

sudo openssl req -new -key learningtotest.com.key -out learningtotest.com.csr

3. Fill in all of the prompts (but it’s OK to leave the passphrase blank) to finish generating the CSR.

4. View the contents of the CSR file so you can copy’n’paste it into your web host’s site. (Hold on to both files because we will need the key file again later!)
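Dumping the CSR to the terminal makes the copy’n’paste easy, & a second (optional) command lets you double-check the details it contains; both are standard openssl usage, shown here with my file name from above:

cat learningtotest.com.csr
openssl req -in learningtotest.com.csr -noout -text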

The Tricky Part: Validating the Domain

Part of this process requires proving that you own the domain that you are generating the SSL cert for. There are a few ways this can be done. Two of them are:

  • Add a CNAME record to your DNS entries (this is most likely the default).
  • Receive a code via email.

You really want to do the email validation because the DNS entry can be very hit-or-miss (mostly miss). Usually the email is sent to admin@[YOUR_DOMAIN.TLD], so obviously this account (or email alias 😉 ) needs to already exist.

From there you follow the instructions in the email (usually click a link & provide a code contained in the email). This will most likely result in a second email being sent which contains a ZIP file with certificate & CA bundle text files.

Coming Down The Home Stretch

You now have everything you need to complete the certificate installation. Log in to your WordPress host & navigate to the SSL certificate page.

Select Custom SSL certificate, then in each of the 3 fields, either copy’n’paste or upload the following 3 files:

  • Private key (learningtotest.com.key)
  • SSL certificate (learningtotest.com.crt)
  • CA Bundle (learningtotest.com.ca_bundle)

And assuming that you do not get any errors here (the most likely being that your private key is incorrect), you can click on the Change or Submit button.

Congratulations, your site is good to go for another year!

TIL: Simplified(?) Python Development Environment with pyenv & pipenv

Creating & maintaining a stable local Python development environment has often felt like more trouble than it ought to be, especially when you haven’t created one in a while but need to. The main goals/challenges of creating one are:

  • Installing the desired version of Python with the least amount of trouble.
  • Installing any necessary Python packages for the project in the right place.
  • Maintaining the environment configuration as a part of the project.

While it would be ideal to have a single tool that will do all of this, I will admit that I have long since become accustomed to using pipenv. However, pipenv’s weakness is that it doesn’t have the ability to install different versions of python–it is just a (very, very good) package manager. Meanwhile, pyenv just happens to be a very good Python installer that lacks the ability to create & manage virtual environments.

Installing pyenv & pipenv on Mac OSX

It’s as easy as this:

brew install pyenv pipenv

Installing Python with pyenv

In the directory of your new project, you only need to run 2 commands:

pyenv install 3.10.4
pyenv local 3.10.4

This will install Python 3.10.4 & then create a simple .python-version file in the current directory (which should be the root directory of your project).
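That .python-version file is nothing more than the pinned version string:

cat .python-version
3.10.4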

Creating the Virtual Environment with pipenv

Again, just 2 simple commands:

pipenv --python 3.10
pipenv install

The first command will use the most recent version of Python 3.10 that pyenv has installed; however, to be sure, you can specify the full patch version (3.10.4) if you feel the need. The second command assumes that your project already has a Pipfile in it & will install all of the packages found in that file.
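If the project doesn’t have a Pipfile yet, the first command will generate one for you. For reference, a minimal Pipfile looks something like this (the packages listed are just examples, not part of any particular project):

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.10"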

That’s it! The only thing left is to run pipenv shell to drop into your new virtual environment & get to work.

Today I Learned: BigQuery Scheduled Queries Don’t Fail on Missing Columns

I’m continuing to enjoy the relaxed way of working that BigQuery brings compared to other common databases. However, yesterday I was bitten by just how relaxed it can be.

I had been refactoring an existing scheduled query &, as part of it, I had been (manually) converting many of the reserved words to ALL CAPS. Well, I had used a search & replace at some point that accidentally changed the gender column to gENDer in the query’s text. I always make sure to run these queries manually to verify that they work… and no error came back. I saved the changes, updated the schedule, & went home.

When I got to work this morning, there was an email thread from someone who noticed that the gender column had changed its name & that all the values were NULL. My reaction: “Huh? How can that be?” I was disappointed that I’d overlooked the accidental change of the column name reference, but there hadn’t been any error! However, it was the all NULL values that really made my head swim…!

I’m not sure how/why BigQuery is so laissez-faire about this (yet still can be case-sensitive about the column name!), but I sure learned how important it is to be consistent with column names!
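For illustration, here’s a contrived sketch of the kind of statement that slipped through (the table & other column names here are hypothetical):

-- Ran without error when executed manually,
-- despite the mangled column reference
SELECT
  user_id,
  gENDer  -- was `gender` before the search & replace
FROM
  `my-project.my_dataset.users`;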

Today I Learned: BigQuery Scheduled Queries Edition

Today I learned to never, ever assume that a scheduled query written & deployed by someone else won’t be contradicted by that same person in another scheduled query.

If you find a scheduled query (don’t use them, just don’t…) that populates a table but produces a result different than what you expect (read: has significantly fewer or more rows), examine every single scheduled query that runs after the first one. You may find there is a second scheduled query that overwrites the first one. No, really.

Managing Alembic Migrations with a single alembic.ini & env.py

The current project that I am working on has multiple databases, but we wanted to streamline the migrations while keeping them tracked separately with alembic. Previously we had a migration configuration for each database & it just felt unwieldy & unpythonic having multiple alembic.ini & env.py files that all pretty much contained the same information (other than references to the unique database names in the connection strings).

The multidb template in alembic is interesting, but it tracks all of your migrations in a single alembic_version table. (We found this to be undesirable: if we wanted to downgrade a database, we wanted those changes isolated to that particular database, which doesn’t seem possible with a single alembic_version table shared by multiple databases.) However, there didn’t seem to be a single source of information to guide you through this scenario.

I figured it out eventually & it felt a little Frankenstein-ish at first since my solution was cobbled together from multiple sources… In short, it takes a single-database env.py that pulls in some environment variables (to handle local development vs server deployment) & mixes it with a multi-database alembic.ini. However, the solution is exactly what we were looking for & perhaps turned out better than we originally envisioned (particularly the directory structure).

alembic.ini

There is some magic going on here as the sqlalchemy.url strings are mostly “{}” for all the elements (user, password, host, & port) that might be different depending on where the migration is being run. Those lines are referenced & populated in the env.py file in the next section. And version_locations is the relative path (from alembic.ini) where alembic will create–and expect to find–the migrations for the respective database.

# **************************************************
# ** All settings for ALL databases should only   **
# ** go in [DEFAULT] section!                     **
# ** Do NOT add settings in [alembic] section!    **
# **************************************************

[alembic]
databases = database1, database2, database3, database4

[DEFAULT]
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .

# path to migration scripts
script_location = ./schemas/

# template used to generate migration files
file_template = %%(year)d-%%(month).2d-%%(day).2d_%%(hour).2d-%%(minute).2d_%%(rev)s_%%(slug)s

# max length of characters to apply to the "slug" field
truncate_slug_length = 60

[database1]
sqlalchemy.url = mysql+mysqlconnector://{}:{}@{}:{}/database1
version_locations = ./schemas/database1

[database2]
sqlalchemy.url = mysql+mysqlconnector://{}:{}@{}:{}/database2
version_locations = ./schemas/database2

[database3]
sqlalchemy.url = mysql+mysqlconnector://{}:{}@{}:{}/database3
version_locations = ./schemas/database3

[database4]
sqlalchemy.url = mysql+mysqlconnector://{}:{}@{}:{}/database4
version_locations = ./schemas/database4

[post_write_hooks]  # From here to end of file is default alembic.ini contents
...

env.py

Since the migrations will be deployed in docker containers, where the migration container can only run if the database container is active & available, I was able to dispose of the offline migrations function, leaving just this:

from logging.config import fileConfig
from os import environ

from sqlalchemy import engine_from_config
from sqlalchemy import pool

from alembic import context

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = None

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
DB_USER = environ.get("FOM_DB_USER")
DB_PASS = environ.get("FOM_DB_PASSWORD")
DB_HOST = environ.get("FOM_DB_HOST")
DB_PORT = environ.get("FOM_DB_PORT")


def run_migrations_online():
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    section = config.get_section(config.config_ini_section)
    # Inject environment variable values into connection string
    url = section["sqlalchemy.url"].format(DB_USER, DB_PASS, DB_HOST, DB_PORT)
    section["sqlalchemy.url"] = url
    connectable = engine_from_config(
        section,
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata)

        with context.begin_transaction():
            context.run_migrations()


run_migrations_online()

So when it comes to running any alembic commands, you preface the action with --name DATABASE so that your requests become:

  • alembic --name database1 history
  • alembic --name database3 current
  • alembic --name database2 revision -m "create a new index"
  • alembic --name database4 upgrade head
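And a quick sketch of a local run, using the environment variable names from the env.py above (the values here are placeholders):

export FOM_DB_USER=dev
export FOM_DB_PASSWORD=secret
export FOM_DB_HOST=127.0.0.1
export FOM_DB_PORT=3306

alembic --name database1 upgrade head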

Final Directory Structure

The final directory structure is much easier to navigate than 4 separate configurations & contains perhaps even less nesting of directories than the default alembic configuration for a single database.

migrations/
|--> schemas/
|   |--> database1/
|   |   |--> first_migration_for_database1.py
|   |--> database2/
|   |   |--> migrations_for_database2.py
|   |--> database3/
|   |--> database4/
|   |--> env.py
|--> alembic.ini

Using the Starship Prompt in pipenv Virtual Environment on Mac OSX

While attending PyCon 2021 online, I learned about the Starship prompt & was impressed. It isn’t all that impressive when you are just navigating the file system, but as soon as you enter a project directory… that is when it springs to life. There is some magic that looks at the files & suddenly will tell you some useful language information (python, nodeJS, etc.) as well as what git branch you are on. And it makes generous use of color to do that. For me, Starship just ticks a lot of the right boxes, not to mention it is much, much easier to install & configure than my previous favorite prompt enhancer, powerline.

The only complaint I had (note the verb tense) about Starship was that on my Mac, I lost it whenever I entered a pipenv shell. What helped me realize an easy solution was that I had installed Starship on my Ubuntu laptop & it didn’t exhibit this behavior.

So what’s the difference between the Mac & Ubuntu?

It seems to be where they get their bash settings from. Mac uses .bash_profile while Ubuntu uses .bashrc.

I created a simple .bashrc in my Mac user’s home directory with the following (admittedly redundant) contents:

# Load my shell aliases, if present
if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

# Supercharge -r with fzf (requires apt|brew install fzf)
[ -f ~/.fzf.bash ] && source ~/.fzf.bash

# Activate starship prompt
eval "$(starship init bash)"

I did find that my favorite command history fuzzy finder tool, fzf, wasn’t available in my virtual env either, so it came along too, & I added my aliases file as extra insurance. So if you go “reaching” for one of your favorite CLI commands & it doesn’t appear to be available in your pipenv virtual environment, just add it to your .bashrc.
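If maintaining duplicate settings bugs you, one common (& entirely optional) alternative is to keep everything in .bashrc & have .bash_profile simply source it:

# In ~/.bash_profile: delegate to ~/.bashrc so that login &
# non-login shells share a single configuration
[ -f ~/.bashrc ] && source ~/.bashrc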

Upgrading Python in an Existing pipenv Virtual Environment

While I have a love-hate relationship with pipenv–make no mistake, it’s definitely more love than hate–the things that I hate seem to be more due to a lack of (quality) documentation than a lack of anything else.

A simple example is upgrading the version of python running in an existing virtual environment. My “google-fu” was returning results that seemed to contradict each other, offer vague advice, or be old enough to make me question the validity of the answer.

So I had no choice but to try some of them out. At least I tried it on a project that I was just taking for a test drive, so no real loss if things went wrong… First, though, I had to pick from a small number of different commands, and I decided to go with the one that was mentioned the most:

pipenv --python 3.8.10 install

(As I was running 3.8.6 & wanted to get the latest patches for 3.8.)

Other than being prompted if I wanted to download 3.8.10–and waiting for the download to complete–it was just like most other pipenv actions. And it did exactly what I was attempting to do. Success!

And now I just have to figure out how to resolve my biggest gripe: how to get my prompt & aliases working inside the virtual environment…

UPDATE: I highly recommend using pipenv --help and pipenv COMMAND --help (where COMMAND is check, clean, graph, install, etc.), as pipenv install --help doesn’t explicitly say “this is how you upgrade python in your virtual environment,” but it is strongly implied. (Doing this sooner would have saved me some time.)

UPDATE 2: It seems that maybe some subtle changes have been made to pipenv. Running the pipenv --python... command above by itself doesn’t seem to work anymore. You need to remove the existing virtual environment & rebuild it. Therefore, you need to do the following two commands to bump up the python version:

pipenv --rm
pipenv --python X.Y.Z install

(Just don’t forget to replace X.Y.Z with the desired python version!)
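A quick sanity check afterwards confirms that the bump took:

pipenv run python --version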

Cleaning Up Your Very First (Bad) Alembic Migration

As with many things, the first time you try something, it often doesn’t go well. And recently we’ve adopted alembic for database migrations in a python-based project. Conceptually, alembic is a lot like git: each migration is a commit-like object that you can apply (upgrade) or remove (downgrade). And an autogenerated hash is assigned to each migration with a pointer back to the hash for the previous commit.

Except when it is the first one.

For that first migration, it points back to an empty string. So after trying various pythonic options (<nothing>, an empty string, None, -1), all of which failed, I read a reference in the alembic docs where it called the initial state (or parent) “base”.

At last, that works!

alembic downgrade base

Mess averted, problem solved, & we’re back to the beginning again. Now let’s give that database migration another try…
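For reference, you can see this in the very first migration script itself, where the parent revision is empty (the hash below is made up):

# Near the top of the very first migration script
revision = "a1b2c3d4e5f6"  # hypothetical autogenerated hash
down_revision = None       # no parent revision: this is what alembic calls "base"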

One Way to Deal With a Failed Parametrized Test

You have a test that’s parametrized, but only one of the examples fails. While it seems you may have run into an edge case, you don’t have time to fix the code but need to get the test to stop failing. (And it goes without saying that you can’t remove that example either!) One way to do this is to mark that specific example as xfail. But how do you do that for only the one example?

First, let’s see the original test:

from pytest import mark

@mark.parametrize(
    "add_value, expected",
    [
        (3, 6),
        (4, 7),
        (-1, -4),  # This one (obviously) fails!
    ],
    ids=["add 3", "add 4", "add -1"])
def test_add_3(add_value, expected):
    assert add_value + 3 == expected

Let’s set aside that the third example above is just plain wrong–we just needed an example that fails. Now let’s modify the test so that the entire test will stop failing.

from pytest import mark, param

@mark.parametrize(
    "add_value, expected",
    [
        param(3, 6),
        param(4, 7),
        param(-1, -4, marks=mark.xfail(raises=AssertionError, reason="Doesn't add up!")),
    ],
    ids=["add 3", "add 4", "add -1"])
def test_add_3(add_value, expected):
    assert add_value + 3 == expected

When running the new test, it will report 2 passed & 1 xfailed. It’s also worth mentioning that the raises= & reason= keywords are both entirely optional; however, I’d make the case that they leave a valuable “crumb trail” as to why/how the test is failing. That way, when you come back to figure out why the test fails with that example, you at least have an idea as to what the cause might be!