Ecommerce customer names with interesting Unicode characters

By Jon Jensen
December 29, 2021

Photo of a small garden in front of a house basement window with two cats looking out

One of our clients with a busy ecommerce site sees a lot of orders, and among those, sometimes there are unusual customer name and address submissions.

We first noticed in 2015 that they had a customer order come in with an emoji in the name field of the order. The emoji was 😏 and we half-jokingly wondered if that was a new sign of fraud.

Over the following 3 years only a few more orders came in with various emoji in customer’s names, but in mid-2018 emoji started to appear increasingly frequently until now one now appears on average every day or two.

Why the sudden appearance of emoji in 2015? It correlates with the rapid shift to browsing the web and shopping on mobile devices. Mobile visits now represent more than half of this client’s ecommerce traffic.

Most people in 2015 didn’t even know how to type emoji on a desktop or laptop computer, but mobile touchscreen keyboards began showing emoji choices around that time, so the mobile explanation makes sense. And mobile keyboard autocorrect also sometimes offers emoji in addition to words, making them even more common in the past few years.

Just for fun I wanted to automatically find all such “interesting” names, so I wrote a simple report that uses SQL to query their PostgreSQL ecommerce database.

To preserve customer privacy, names shown here have been changed and limited to a few names that are common in the United States.

Real names with bonuses

First let’s look at the apparently real names with emoji and other non-alphabetical Unicode characters mixed in:

Amy 🩺
Amy💕
Amy👑
👸Amy
Anna 💫
Anna 😎
Anna ❤️
Anna💊💉
Bob☯️
Bob😍😘😜👑💞
Bob🐞
Brenda 🌙
Brenda 🥳🤪
Brenda☠️
B R E N D A 💕💅🏾💃🏾👜
Cameron 🌻
Cameron 😎
Cameron 💁🏻‍♀️
Doug 💕
Doug 🅱️
Doug 🌸🤩
Doug 👦🏻❤️
Emily 🌷
Emily 😊🇨🇺
Emily😚
Emily 🎰👑❤️
Frank 🥰
Frank ⚗️
Frank🔆
Frank💞💰
Jane 💕
Jane⁸
Jane ❤️❤️
Jane🔆
Jill 👑🎀
Jill 👼🏽✨🌫
Jill🍓🍒🍒
Jill👸🏽💖
Jim 🐰
Jim, 💕
Jim’s iPhone✨
Joe 🐯
Joe 🇦🇺
Joe😏
Joe🏀🎸‼️
John 👪
John⁵
John🎭
Karen 🌻🌹
Karen 👑✨
Karen 🔒❤️
Karen🎀👑
Kate💙💚
Kate🤪🤞🏽💙
Kate.💘
K$ate💉
Liz🌺
Liz❣️
Liz❤️🙃
Liz Mama💙💙
Mary 🎀
Mary💘
Mary👼🏼💓
Mary⁹
Mike 👑
Mike 👑🌸
Mike 👩🏻‍🌾
Mike⁶
Sarah 👑
Sarah👑A.
✨Sarah✨
💛🌻 Sarah
Steve🐑
Steve 🦁
Steve♓️💓
Victoria 🥴
Victoria🌻
V𝚒𝚌𝚝𝚘𝚛𝚒𝚊
V I C T O R I A 🤍

Would you have expected all that in ecommerce orders? I didn’t!

Fake names

Next let’s look at placeholder names with people’s role or self-description or similar:

Amor ⚽️
Babe ❤️
C𝚒𝚝𝚒𝚣𝚎𝚗
Daddy🥴😏
Daddy😘👴👨🙏👨👩👧🙇🏾
Fly High 🕊
Forever 💍💜
Granny 👵🏽
HOME ❤️
Home🏠
Home🏠💜
Hubby🥰
💦Juicy🍑
me!! 💛
mi amor ❤️
Mom💗
Mom♥️
Mom 🐥💛
MOMMY 💗
Myself 😘
Princess👑
princess❤️
Queen 😍💖🔓
Queen💘
The Husband💍❤️
Wifey 😈✌🏽👅

Perhaps the occurrence of “home” several times reflects a mobile address book auto-fill function for billing or shipping address fields?

Not names at all!

Then there are those customers who didn’t provide any kind of name at all, just emoji and other special characters:

🦋
💙
🤡🎪
💗☁️
🍌
♥️
🌙
⁵
∅

I guess only one or two of those per year doesn’t amount to much, but they’re interesting to see.

Strange addresses

In addition to the name fields we also checked the address fields for unusual characters and found (again, details changed to preserve privacy):

125 E 27😎
227 W 24 Circle ⭕️
3 Blvd. George Washington™

Simply odd

The prizewinner for oddity, which seems like some kind of copy-and-paste mistake, is this in the city field of an address:

Indian® Roadmaster™ Classic

Maybe at least one motorcycle has achieved sentience and needed to do some online shopping!

“Interesting” Unicode ranges

When searching for interesting Unicode ranges, we could just look for characters in the Unicode emoji ranges. That would be fairly straightforward since there are just a few ranges to match.

But we were curious what other unusual characters were getting used aside from emoji, so we wanted to include other classes of characters. So perhaps we should include everything to start and then exclude the entire class of Unicode “word characters”? That covers the world’s standard characters used for names and addresses, including not just Roman/Latin with optional diacritics, but also other character sets such as Cyrillic, Arabic, Hebrew, Korean, Chinese, Japanese, Devanagari, Thai, and many others.

PostgreSQL POSIX regular expressions include the class “word character” represented by either [[:word:]] or the Perl shorthand \w. I started with that, but found it covered too many things I did want to see, such as the visually double-width Latin characters that are part of the Chinese word character range, and some special numbers.

So I switched back to matching what I want, rather than excluding what I don’t want, and I manually went through the Unicode code charts and noted the ranges to include.

The list of Unicode code ranges I came up with, in hexadecimal, is:

250-2ba
2bc-2c5
2cc-2dc
2de-2ff
58d-58e
fd5-fd8
1d00-1dbf
2070-2079
207b-209f
20d0-2104
2106-2115
2117-215f
2163-218b
2190-2211
2213-266e
2670-2bff
2e00-2e7f
2ff0-2fff
3004
3012-3013
3020
3200-33ff
4dc0-4dff
a000-abf9
fff0-fffc
fffe-1d35f
1d360-1d37f
1d400-1d7ff
1ec70-1ecbf
1ed00-1ed4f
1ee00-1eeff
1f000-10ffff

Those ranges exclude several fairly common characters that people (or their software’s autocorrect) used in their address fields, which we wanted to ignore, such as:

Music sharp sign: ♯ (instead of # before a number)
numero: №
care of: ℅
replacement character: � (though this could be interesting if it reveals unknown encoding errors)

The SQL query

PostgreSQL allows us to represent Unicode characters in hexadecimal numbers as either \u plus 4 digits or \U plus 8 digits. So the character 2ba is written in a Postgres string as \u02ba and the range fffe-1d35f is written in a Postgres regex range as [\ufffe-\U0001d35f].

With a little scripting to put it all together, I came up with:

SELECT order_number, order_timestamp::date AS order_date,
    fname, lname, company, address1, address2, city, state, b_fname, b_lname, b_company, b_address1, b_address2, b_city, b_state, phone
FROM orders
WHERE concat(fname, lname, company, address1, address2, city, state, b_fname, b_lname, b_company, b_address1, b_address2, b_city, b_state, phone)
    ~ '[\u0250-\u02ba\u02bc-\u02c5\u02cc-\u02dc\u02de-\u02ff\u058d-\u058e\u0fd5-\u0fd8\u1d00-\u1dbf\u2070-\u2079\u207b-\u209f\u20d0-\u2104\u2106-\u2115\u2117-\u215f\u2163-\u218b\u2190-\u2211\u2213-\u266e\u2670-\u2bff\u2e00-\u2e7f\u2ff0-\u2fff\u3004\u3012-\u3013\u3020\u3200-\u33ff\u4dc0-\u4dff\ua000-\uabf9\ufff0-\ufffc\ufffe-\U0001d35f\U0001d360-\U0001d37f\U0001d400-\U0001d7ff\U0001ec70-\U0001ecbf\U0001ed00-\U0001ed4f\U0001ee00-\U0001eeff\U0001f000-\U0010ffff]'
    -- limit how many years back to go
    AND order_timestamp > (SELECT CURRENT_TIMESTAMP - interval '3 years')
    -- exclude any order that had PII expunged for GDPR
    AND expunged_at IS NULL
ORDER BY order_timestamp DESC

Try a similar query on databases you have access to, and see what interesting user submissions you discover. I always find surprises and in addition to being fun, sometimes we find things that help us improve our input validation and user guidance so that more mistakes are caught when they’re easy for the customer to correct.

Happy holidays!

ecommerce unicode sql postgres

Custom Ecommerce

Application Development

Database Consulting

Cloud Hosting

Systems Integration

Legacy Business Systems

Security & Compliance

GIS

DevOps / Cloud

Databases

Frameworks

Ecommerce

About Us

Our Team

Clients

Blog

Careers

Our Blog