Regular expressions (regexes) are a powerful way to search text strings. They can be used to replace text, sanitize user input and much more.
An Introduction to REGEX
A regex, or regular expression, is a pattern used to match text, and it is supported in many programming languages. Let’s look at the same example in JavaScript and PHP.
JavaScript:
const cleanString = inputString.split(/[^a-zA-Z0-9]/).join('');
PHP:
$cleanString = preg_replace("/[^a-zA-Z0-9]/","",$inputString);
This code removes any characters that aren’t alphanumeric from a user-supplied string, which is useful for indexing and filenames.
The regex part is /[^a-zA-Z0-9]/. Notice how it is the same in both languages.
The regex is contained between two / delimiters. We can add modifier letters after the closing delimiter, such as i in /regex/i, which makes the search case-insensitive.
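As a quick illustration of modifiers, the sketch below runs the same pattern with and without the i modifier, then combines it with g to collect every match. The sample string and variable names are made up for this example.

```javascript
// Sample text is an assumption made up for this illustration.
const sample = 'Regex, REGEX and regex';

// Without a modifier the match is case-sensitive.
const caseSensitive = /regex/.test('REGEX'); // false

// The i modifier makes the match case-insensitive.
const caseInsensitive = /regex/i.test('REGEX'); // true

// Modifiers can be combined: g finds every match, not just the first.
const allMatches = sample.match(/regex/gi); // ['Regex', 'REGEX', 'regex']

console.log(caseSensitive, caseInsensitive, allMatches);
```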
Regexes can be used to build complex if statements, search and replace functions, data extraction and more.
One word of caution: regex performance often isn’t great, so if you are doing something performance-sensitive at really high volumes, consider a more direct search such as a plain substring match.
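When the thing being searched for is a fixed substring, a direct search avoids the regex engine altogether. A minimal sketch, in which the log line and both function names are hypothetical:

```javascript
// Hypothetical log line used only for illustration.
const line = '2024-01-01 ERROR disk full';

// Regex version: flexible, but more than is needed for a fixed substring.
const hasErrorRegex = (text) => /ERROR/.test(text);

// Direct version: String.prototype.includes does a plain substring search.
const hasErrorDirect = (text) => text.includes('ERROR');

console.log(hasErrorRegex(line), hasErrorDirect(line));
```

Both functions return the same answer here. Whether the difference in speed matters depends on the engine, the pattern and the volume, so it is worth measuring before rewriting anything.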
Let’s look at some of the ways we compose regexes using a cheatsheet.
Download the regex cheat sheet here:
https://jamesbachini.com/resources/REGEXCheatSheet.pdf
REGEX Cheatsheet
Characters | |
---|---|
. | match any character |
\w \d \s | word, digit, whitespace |
\W \D \S | not word, digit, whitespace |
[abc] | any of a, b, or c |
[^abc] | not a, b, or c |
[a-g] [0-9] | character between a & g or 0-9 |
Anchors | |
^abc$ | Anchor ^start and end$ of the string |
\b \B | word, not-word boundary |
Escaped characters | |
\. \* \\ | escape special characters with backslash |
\t \n \r | tab, linefeed, carriage return |
Groups | |
(abc) | capture group |
\1 | backreference to group #1 |
(?:abc) | non-capturing group |
(?=abc) | positive lookahead |
(?!abc) | negative lookahead |
Quantifiers | |
a* a+ a? | 0 or more, 1 or more, 0 or 1 |
a{5} a{2,} | exactly five, two or more |
a{1,3} | between one & three |
a+? a{2,}? | match as few as possible |
(ab|cd) | match ab or cd |
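To make a few of the cheatsheet entries concrete, here is a small sketch that exercises a backreference, greedy and lazy quantifiers, a positive lookahead and a non-capturing group. The input strings and variable names are invented for the example.

```javascript
// Capture group plus backreference: find a word that is repeated twice in a row.
const doubled = 'this is is a test'.match(/\b(\w+) \1\b/);
const doubledWord = doubled[1]; // 'is'

// Greedy vs lazy quantifier on the same input.
const html = '<b>one</b><b>two</b>';
const greedy = html.match(/<b>.+<\/b>/)[0]; // '<b>one</b><b>two</b>'
const lazy = html.match(/<b>.+?<\/b>/)[0];  // '<b>one</b>'

// Positive lookahead: digits followed by "px", without including "px" in the match.
const pixels = 'width: 100px; height: 50%'.match(/\d+(?=px)/)[0]; // '100'

// Non-capturing group with alternation and a quantifier.
const twoPairs = /^(?:ab|cd){2}$/.test('abcd'); // true

console.log(doubledWord, greedy, lazy, pixels, twoPairs);
```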
Favorite REGEX Examples
/[ -~]/
This matches any printable ASCII character (space through tilde)
/^\S+@\S+\.\S+$/
Match and loosely validate email addresses (something@something.something)
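A minimal sketch of how this kind of loose email check behaves, with the dot written as \. so that it only matches a literal dot. The function name looksLikeEmail and the sample addresses are hypothetical.

```javascript
// Loose shape check only: something@something.something with no whitespace.
const looksLikeEmail = (value) => /^\S+@\S+\.\S+$/.test(value);

console.log(looksLikeEmail('user@example.com'));      // true
console.log(looksLikeEmail('user@example'));          // false, no dot after the @
console.log(looksLikeEmail('user name@example.com')); // false, contains whitespace
console.log(looksLikeEmail('a@b@c.d'));               // true, so this is not strict validation
```

The last case shows why a pattern like this is best treated as a quick sanity check on the shape of the input rather than proof that an address is valid.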
/^[1-9][0-9]*$/
Match positive whole numbers with no leading zeros
/^(?!.*Edge).*Chrome/
Regex to match the user agent for just Chrome (contains Chrome but not Edge)
/^(?!.*(?:Chrome|Edge)).*Safari/
Regex to match the user agent for just Safari (contains Safari but neither Chrome nor Edge)
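The sketch below tries both user agent patterns, written with .* so the tokens can appear anywhere in the string. The user agent strings are illustrative samples rather than an authoritative list, and the variable names are invented for the example.

```javascript
const chromeOnly = /^(?!.*Edge).*Chrome/;
const safariOnly = /^(?!.*(?:Chrome|Edge)).*Safari/;

// Illustrative sample user agent strings (assumptions, not an exhaustive list).
const chromeUA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
const legacyEdgeUA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19041';
const chromiumEdgeUA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0';
const safariUA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15';

console.log(chromeOnly.test(chromeUA));       // true
console.log(chromeOnly.test(legacyEdgeUA));   // false, the lookahead sees "Edge"
console.log(chromeOnly.test(chromiumEdgeUA)); // true, "Edg/" does not contain "Edge"
console.log(safariOnly.test(safariUA));       // true
console.log(safariOnly.test(chromeUA));       // false, Chrome also sends "Safari"
```

Note that Chromium-based Edge identifies itself with Edg/ rather than Edge, so these patterns treat it as Chrome. If that distinction matters, the lookahead would need to exclude Edg as well.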
/(\r\n|\n|\r)/
Match any newline: Windows (\r\n), Unix (\n) or classic Mac (\r)
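A short sketch of two common uses of the newline pattern: splitting text into lines and normalizing line endings. The sample text is made up. The capture group is dropped here because String.prototype.split includes captured separators in its result.

```javascript
// Mixed line endings: Windows (\r\n), Unix (\n) and a bare carriage return (\r).
const mixed = 'one\r\ntwo\nthree\rfour';

// \r\n is listed first so a Windows line ending is treated as one separator.
const lines = mixed.split(/\r\n|\n|\r/); // ['one', 'two', 'three', 'four']

// Normalize every line ending to \n.
const normalized = mixed.replace(/\r\n|\n|\r/g, '\n'); // 'one\ntwo\nthree\nfour'

console.log(lines, JSON.stringify(normalized));
```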
Favorite JavaScript REGEX Code Examples
Test if a string matches a regex
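// Requires 5 or more lowercase letters or digits.
// 'abc1' has only 4 characters, so nothing is logged; 'abc12' would log 'Yep'.
if (/^([a-z0-9]{5,})$/.test('abc1')) {
  console.log('Yep');
}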
Count the occurrences of a regex
const count = (string.match(/test/g) || []).length;
Search and replace using a regex capture group
const newString = oldString.replace(/\[code:([0-9]+)\]/g, '<a href="$1">click</a>');
This example parses all the URL query string parameters into the get object.
const getQueryParams = () => {
  // Treat + as an encoded space before decoding
  const qs = document.location.search.split("+").join(" ");
  const params = {};
  let tokens;
  // Group 1 is the parameter name, group 2 is its value
  const regex = /[?&]?([^=]+)=([^&]*)/g;
  // With the g flag, exec() returns the next match on each call, then null
  while ((tokens = regex.exec(qs))) {
    params[decodeURIComponent(tokens[1])] = decodeURIComponent(tokens[2]);
  }
  return params;
}
const get = getQueryParams();
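For comparison, the same result can usually be had without a regex by using the built-in URLSearchParams class. This is a sketch of an alternative and not the approach used above. The function name parseQueryString is hypothetical, and the query string is passed in as an argument so the code also runs outside a browser.

```javascript
// queryString would typically be document.location.search in a page.
const parseQueryString = (queryString) => {
  const params = {};
  // URLSearchParams ignores a leading "?" and decodes "+" and %XX sequences.
  for (const [key, value] of new URLSearchParams(queryString)) {
    params[key] = value;
  }
  return params;
};

const parsed = parseQueryString('?q=regex+cheatsheet&page=2');
console.log(parsed); // { q: 'regex cheatsheet', page: '2' }
```

As with the regex version, a repeated parameter name keeps only its last value in this sketch.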
A clean string function for user input that allows emojis and limits input to 256 characters
const cleanString = (stringRaw) => {
let stringClean = String(stringRaw);
stringClean = stringClean.split(/[^ .$*+?\\\-_:/&=,{}@a-zA-Z0-9😀😁😂🤣😃😄😅😆😉😊😋😎😍🥰😘😗😙😚☺️🙂🤗🤩🤔🤨😐😑😶🙄😏😣😥😮🤐😯😪😫😴😌😛😜😝🤤😒😓😔😕🙃🤑😲☹️🙁😖🥵😞😟🥶🥴😤😢😭😦😧🥳😨😩🤯😬😰😱😳🤪😵😡🥺😠🤬😷🤒🤕🤢🤮🤧😇🤠🤥🤫🤭🧐🤓😈👿🤡👹👺💀☠️👻👽👾🤖💩🙊💋💘💝💖💗💓💞💕💌❣️💔❤️🧡💛💚💙💜🖤💟💍💎💐💒👶🧒👦👧🧑👨👱♂️🧔👩👱♀️🧓👴👵👨⚕️👩⚕️👨🎓👩🎓👨🏫👩🏫👨⚖️👩⚖️👨🌾👩🌾👨🍳👩🍳👨🔧👩🔧👨🏭👩🏭👨💼👩💼👨🔬👩🔬👨💻👩💻👨🎤👩🎤👨🎨👩🎨👨✈️👩✈️👨🚀👩🚀👨🚒👩🚒👮♂️👮♀️🕵️♂️🕵️♀️💂♂️💂♀️👷♂️👷♀️🤴👸👳♂️👳♀️👲🧕🤵👰🤰🤱👼🎅🤶🧙♂️🧙♀️🧚♂️🧚♀️👨🦰🧛♂️🧛♀️👨🦱👨🦳👨🦲🧜♂️🧜♀️🧝♂️👩🦰👩🦱🧝♀️👩🦳🧞♂️👩🦲🧞♀️🧟♂️🧟♀️🙍♂️🙍♀️🙎♂️🙎♀️🙅♂️🙅♀️🙆♂️🙆♀️💁♂️💁♀️🙋♂️🙋♀️🙇♂️🙇♀️🤦🤦♂️🤦♀️🤷🤷♂️🤷♀️💆♂️💆♀️💇♂️💇♀️👤👥🦸♂️🦸♀️🦹♂️🦹♀️👫👬👭👩❤️💋👨👨❤️💋👨👩❤️💋👩👩❤️👨👨❤️👨👩❤️👩👨👩👦👨👩👧👨👩👧👦👨👩👦👦👨👩👧👧👨👨👦👨👨👧👨👨👧👦👨👨👦👦👨👨👧👧👩👩👦👩👩👧👩👩👧👦👩👩👦👦👩👩👧👧👨👦👨👦👦👨👧👨👧👦👨👧👧👩👦👩👦👦👩👧👩👧👦👩👧👧🦵🦶🤳💪👈👉☝️👆🖕👇✌️🤞🖖🤘🤙🖐️✋👌👍👎✊👊🤛🤜🤚👋🤟✍️👏👐🙌🤲🙏🤝👂👃👀👁️🧠👅👄🚶♂️🚶♀️🏃♂️🏃♀️💃🕺👯♂️👯♀️🧖♂️🧖♀️🧗♂️🧗♀️🧘♂️🧘♀️🛌🕴️🗣️🤺🏇⛷️🏂🏌️♂️🏌️♀️🏄♂️🏄♀️🚣♂️🚣♀️🏊♂️🏊♀️⛹️♂️⛹️♀️🏋️♂️🏋️♀️🚴♂️🚴♀️🚵♂️🚵♀️🏎️🏍️🤸🤸♂️🤸♀️🤼🤼♂️🤼♀️🤽🤽♂️🤽♀️🤾🤾♂️🤾♀️🤹🤹♂️🤹♀️🎖️🏆🏅🥇🥈🥉⚽⚾🏀🏐🏈🏉🎾🎳🏏🏑🥎🏒🏓🏸🥊🥋🥏🥅⛸️🎣🥍🎿🛷🥌🎯🎱🧿🧩🧸🧵🧶📢📣📯🔔🎼🎵🎶🎙️🎚️🎛️🎧📻🎷🎸🎹🎺🎻🥁💽💿📀🎥🎞️📽️🎬📺📷📸📹📼🥭🍇🍈🍉🍊🍋🍌🍍🍎🍏🍐🍑🍒🥬🍓🥝🍅🥥🥑🍆🥔🥕🌽🌶️🥯🥒🥦🥜🌰🍞🥐🥖🥨🥞🧀🍖🍗🥩🥓🍔🍟🍕🌭🥪🌮🌯🥙🥚🧂🍳🥘🍲🥣🥗🍿🥫🍱🍘🍙🍚🍛🍜🥮🍝🍠🍢🍣🍤🍥🍡🥟🥠🥡🍦🍧🍨🍩🍪🧁🎂🍰🥧🍫🍬🍭🍮🍯🍼🥛☕🍵🍶🍾🍷🍸🍹🍺🍻🥂🥃🥤🥢🍽️🍴🥄🏺😺😸😹😻😼😽🙀😿😾🙈🙉🦝🐵🐒🦍🐶🐕🐩🐺🦊🐱🐈🦁🐯🐅🐆🐴🐎🦄🦓🦌🐮🦙🐂🐃🐄🐷🦛🐖🐗🐽🐏🐑🐐🐪🐫🦒🐘🦏🐭🐁🐀🦘🐹🦡🐰🐇🐿️🦔🦇🐻🐨🐼🐾🦃🐔🦢🐓🐣🐤🦚🐥🐦🦜🐧🕊️🦅🦆🦉🐸🐊🐢🦎🐍🐲🐉🦕🦖🐳🐋🐬🐟🐠🐡🦈🐙🐚🦀🦟🦐🦑🦠🐌🦋🐛🐜🐝🐞🦗🕷️🕸️🦂🦞🌸💮🏵️🌹🥀🌺🌻🌼🌷🌱🌲🌳🌴🌵🌾🌿☘️🍀🍁🍂🍃🍄⌛⏳⚡🎆🎇🔇🔈🔉🔊🔕🔒🔓🔏🔐🚮🚰♿🚹🚺🚻🚼🚾🛂🛃🛄🛅⚠️🚸⛔🚫🚳🚭🚯🚱🚷📵🔞⏭️⏯️⏮️⏸️⏹️⏺️⏏️🎦🔅🔆📶📳📴🔱ℹ️Ⓜ️🅿️🧭⬆️↗️➡️↘️⬇️↙️⬅️↖️↕️↔️↩️↪️⤴️⤵️🔃🔄🔀🔁🔂▶️⏩◀️⏪🔼⏫🔽⏬🦷🦴🛀👣💣🔪🧱🛢️⛽🛹🚥🚦🚧🛎️🧳⛱️🔥🧨🎗️🎟️🎫🧧🔮🎲🎴🎭🖼️🎨🎤🔍🔎🕯️💡🔦🏮📜🧮🔑🗝️🔨⛏️⚒️🛠️🗡️⚔️🔫🏹🛡️🔧🔩⚙️🗜️⚖️⛓️⚗️🔬🔭📡💉💊🚪🛏️🛋️🚽🚿🛁🛒🚬⚰️⚱️🧰🧲🧪🧴🧷🧹🧻🧼🧽🧯💠♟️💺🎮🕹️🎰📱📲☎️📞📟📠💻🖥️🖨️⌨️🖱️🖲️💾📔📕📖📗📘📙📚📓📒📃📄📰🗞️📑🔖🏷️💰💴💵💶💷💸💳💹💱✉️📧📨📩📤📥📦📫📪📬📭📮🗳️✏️✒️🖋️🖊️🖌️🖍️📝💼📁📂🗂️📅📆🗒️🗓️📇📈📉📊📋📌📍📎🖇️📏📐✂️🗃️🗄️🗑️🧾💅👓🕶️👔👕👖🧣🧤🧥🧦👗👘👙👚👛👜👝🛍️🎒👞👟👠👡👢👑👒🎩🎓🧢⛑️📿💄🌂☂️🎽🥽🥼🥾🥿🧺🚂🚃🚄🚅🚆🚇🚈🚉🚊🚝🚞🚋🚌🚍🚎🚐🚑🚒🚓🚔🚕🚖🚗🚘🚙🚚🚛🚜🚲🛴🛵🚏🛣️🛤️⛵🛶🚤🛳️⛴️🛥️🚢✈️🛩️🛫🛬🚁🚟🚠🚡🛰️🚀🛸🌍🌎🌏🌐🗺️🗾🏔️⛰️🗻🏕️🏖️🏜️🏝️🏞️🏟️🏛️🏗️🏘️🏚️🏠🏡🏢🏣🏤🏥🏦🏨🏩🏪🏫🏬🏭🏯🏰🗼🗽⛪🕌🕍⛩️🕋⛲⛺🏙️🎠🎡🎢🎪⛳🗿💦🌋🌁🌃🌄🌅🌆🌇🌉🌌🌑🌒🌓🌔🌕🌖🌗🌘🌙🌚🌛🌜🌡️☀️🌝🌞🌟🌠☁️⛅⛈️🌤️🌥️🌦️🌧️🌨️🌩️🌪️🌫️🌬️🌀🌈☔❄️☃️⛄☄️💧🌊🎑⌚⏰⏱️⏲️🕰️🕛🕧🕐🕜🕑🕝🕒🕞🕓🕟🕔🕠🕕🕡🕖🕢🕗🕣🕘🕤🕙🕥🕚🕦🏧🔙🔚🔛🔜🔝🔰‼️⁉️❓❔❕❗™️#️⃣*️⃣0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣🔟💯🔠🔡🔢🔣🔤🅰️🆎🅱️🆑🆒🆓🆔🆕🆖🅾️🆗🆘🆙🆚🈁🈂️🈷️🈶🈯🉐🈹🈚🈲🉑🈸🈴🈳㊗️㊙️🈺🈵🇦🇧🇨🇩🇪🇫🇬🇭🇮🇯🇰🇱🇲🇳🇴🇵🇶🇷🇸🇹🇺🇻🇼🇽🇾🇿💢♨️💈⚓♠️♥️♦️♣️💲☢️☣️🛐⚛️🕉️✡️☸️☯️✝️☦️☪️☮️🕎🔯♈♉♊♋♌♍♎♏♐♑♒♓⛎♀️♂️⚕️♻️⚜️©️®️♾️👁️🗨️💤💥💨💫💬🗨️🗯️💭🕳️🚨🛑⭐🎃🎄✨🎈🎉🎊🎋🎍🎎🎏🎐🎀🎁🃏🀄🔋🔌🔗🧫🧬📛⭕✅☑️✔️✖️❌❎➕➖➗➰➿〽️✳️✴️❇️〰️🔴🔵⚪⚫⬜⬛◼️◻️◽◾▫️▪️🔶🔷🔸🔹🔺🔻🔘🔲🔳\s]/).join('');
stringClean = stringClean.substring(0, 256);
return stringClean;
}
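Listing every emoji by hand is hard to maintain. One possible alternative, sketched below, is to use Unicode property escapes with the u modifier. This assumes a JavaScript engine with ES2018 property escape support, and the set of characters it allows is similar to, but not identical to, the hand-written list above. The names disallowed and cleanStringUnicode are hypothetical.

```javascript
// Anything NOT in this class is removed. Requires the u modifier.
// \p{Extended_Pictographic} covers most emoji symbols; the other entries
// allow skin tone modifiers, flag letters, the zero-width joiner,
// the emoji variation selector and the keycap mark.
const disallowed = /[^ .$*+?\\\-_:/&=,{}@a-zA-Z0-9\s\p{Extended_Pictographic}\p{Emoji_Modifier}\p{Regional_Indicator}\u200D\uFE0F\u20E3]/gu;

const cleanStringUnicode = (stringRaw) =>
  String(stringRaw).replace(disallowed, '').substring(0, 256);

console.log(cleanStringUnicode('hi 😀 <script>')); // 'hi 😀 script'
```

Note that substring counts UTF-16 code units, so cutting at 256 can split an emoji that happens to straddle the limit.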