TLS fingerprinting: from ClientHello bytes to JA4
Every HTTPS request starts the same way. The client opens a TCP connection, the server accepts, and before a single byte of encrypted application data flows, the client sends a TLS ClientHello in the clear. That packet is the most opinionated thing your software does on the network. The cipher list, the extensions, their order, what gets included, what gets omitted: all of it leaks the identity of the underlying TLS stack to anyone watching the wire.
This is the whole story behind TLS fingerprinting. Anti-bot systems have been mining ClientHello packets for years, first as JA3, more recently as JA4. The rest of this piece walks the journey end to end: what’s in a ClientHello, how the original 2017 hash worked, why it broke, what replaced it, and how to compute either against a real capture in a few lines of Python.
The ClientHello, in plaintext
TLS 1.3 encrypts more of the handshake than TLS 1.2 did. But the very first message, the ClientHello, still has to be in plaintext, because the two parties have not agreed on a key yet. Inside the record-layer wrapper, a ClientHello carries a fixed set of fields:
- A two-byte protocol version, frozen at 0x0303 (TLS 1.2) for compatibility, with the real version negotiated through an extension
- 32 bytes of client random
- A variable-length session ID, almost always 32 bytes of random in modern stacks
- The list of cipher suites the client is willing to use, two bytes each
- A list of compression methods, which after CRIME has been a single null byte for over a decade
- A length-prefixed extensions block
The extensions block is where the personality lives. Server Name Indication tells the server which virtual host you want. Supported groups lists which elliptic curves you’ll do key exchange over. Signature algorithms enumerates which signing primitives you’re willing to verify. ALPN advertises the application protocols you speak. Supported versions in TLS 1.3 carries the actual version number that the legacy field cannot. Then there are key shares, PSK modes, signed certificate timestamps, and a long tail of optional negotiations.
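The field layout above maps directly onto a short parser. The sketch below is a minimal illustration, not a production parser: it assumes the input is the ClientHello body alone (record header and 4-byte handshake header already stripped), does no bounds checking, and the names parse_client_hello and build_sample, along with the hand-built sample bytes, are invented for the example.

```python
import struct


def parse_client_hello(body: bytes) -> dict:
    """Walk the fixed fields of a ClientHello body, then its extensions."""
    pos = 0
    (legacy_version,) = struct.unpack_from("!H", body, pos)
    pos += 2
    client_random = body[pos:pos + 32]
    pos += 32
    sid_len = body[pos]                      # 1-byte length prefix
    pos += 1
    session_id = body[pos:pos + sid_len]
    pos += sid_len
    (cs_len,) = struct.unpack_from("!H", body, pos)   # length in bytes
    pos += 2
    cipher_suites = list(struct.unpack_from(f"!{cs_len // 2}H", body, pos))
    pos += cs_len
    comp_len = body[pos]
    pos += 1
    compression = list(body[pos:pos + comp_len])
    pos += comp_len
    extensions = []                          # (type, raw body) in wire order
    if pos < len(body):
        (ext_total,) = struct.unpack_from("!H", body, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", body, pos)
            pos += 4
            extensions.append((ext_type, body[pos:pos + ext_len]))
            pos += ext_len
    return {
        "legacy_version": legacy_version,
        "random": client_random,
        "session_id": session_id,
        "cipher_suites": cipher_suites,
        "compression": compression,
        "extensions": extensions,
    }


def build_sample() -> bytes:
    """Hand-assembled ClientHello body, for illustration only."""
    ciphers = struct.pack("!3H", 0x1301, 0x1302, 0xC02F)
    exts = (
        struct.pack("!HH", 0x000A, 4) + b"\x00\x02\x00\x1d"   # supported_groups
        + struct.pack("!HH", 0x000B, 2) + b"\x01\x00"         # ec_point_formats
    )
    return (
        struct.pack("!H", 0x0303) + bytes(32) + b"\x00"       # version, random, empty session ID
        + struct.pack("!H", len(ciphers)) + ciphers
        + b"\x01\x00"                                          # one compression method: null
        + struct.pack("!H", len(exts)) + exts
    )


hello = parse_client_hello(build_sample())
print(hex(hello["legacy_version"]), [hex(c) for c in hello["cipher_suites"]])
```

Keeping the extensions as ordered (type, body) pairs matters later: JA3 depends on that order, and the per-extension bodies are where ALPN, supported_versions, and signature algorithms have to be decoded from.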
Two stacks built against the same RFC will not produce identical ClientHellos. They differ in which cipher suites they include, what order those suites appear in, which extensions they advertise, the values inside each extension, and how those values are ordered inside the extension. Curl, Python’s ssl module, OpenSSL’s s_client, Chromium’s BoringSSL, Firefox’s NSS, Go’s crypto/tls, and Node’s tls all produce subtly different packets. Most of the time those differences are uninteresting trivia. To anyone trying to tell a browser from a script, they are gold.
2017: JA3 and the five-field hash
In January 2017, three Salesforce engineers (John Althouse, Jeff Atkinson, and Josh Atkins) published a small Python script and a blog post that turned that observation into a one-line identifier. They called it JA3, after their initials.
The recipe is exactly five fields, pulled out of the ClientHello in this order:
SSLVersion, Cipher, SSLExtension, EllipticCurve, EllipticCurvePointFormat
Each field is the decimal value of the relevant byte sequence. Inside a field, multiple values are joined with -. Between fields, the separator is ,. The whole concatenated string is then MD5-hashed to produce a 32-character fingerprint.
A real example from the original tooling, computed against a TLS 1.0 client:
769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0
MD5 that and you get a deterministic identifier for the stack that produced it. The MD5 choice is not a security claim. JA3 is not a hash in the cryptographic sense, it’s a database key. Collisions don’t matter, repeatability does.
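The recipe is small enough to write out. This is a minimal sketch of the string-building and hashing step only, assuming the five fields have already been extracted as integers and GREASE values removed; the function names ja3_string and ja3_hash are chosen for this example and are not taken from the original tooling.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join values with '-' inside a field and ',' between the five fields."""
    fields = ([version], ciphers, extensions, curves, point_formats)
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(ja3: str) -> str:
    """MD5 of the JA3 string, used as a lookup key rather than for security."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# The TLS 1.0 example string from the text, rebuilt from its fields.
example = ja3_string(
    769,
    [47, 53, 5, 10, 49161, 49162, 49171, 49172, 50, 56, 19, 4],
    [0, 10, 11],
    [23, 24, 25],
    [0],
)
print(example)
print(ja3_hash(example))
```

An empty field simply becomes an empty string between its commas, so a client that sends no extensions still produces a well-formed five-field string.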
What made JA3 useful was that the same stack tended to produce the same string from session to session, day to day, and across IP addresses. So the same Tor browser hashed to e7d705a3286e19ea42f587b344ee6865 whether it was running in Berlin or in Buenos Aires. Trickbot’s TLS client gave you 6734f37431670b3ab4292b8f60f29984. Emotet, 4d7a28d6f2263ed61de88ca66eb011e3. Build a list of those, drop it into the WAF, and you can identify malware families by how they speak TLS, not by what they say over it.
JA3S extended the same trick to the ServerHello, with three fields rather than five:
TLSVersion, Cipher, Extensions
The pairing was the real value. Servers respond to different clients differently: the same server can negotiate one cipher with Chrome and a different one with curl. But they respond to the same client the same way. So the JA3+JA3S pair, taken together, identifies a specific interaction, not just a specific client. That made it a much sharper detector for command-and-control beacons, where the malware’s TLS client and its server are configured in lockstep.
For about six years, this worked.
What broke it
Two things broke JA3, in roughly this order.
The first was GREASE. Google introduced GREASE (Generate Random Extensions And Sustain Extensibility) in 2016 as a way to keep the TLS ecosystem from ossifying around whatever the deployed clients happened to advertise. The idea is that browsers inject deliberately fake values into their cipher lists and extensions: cipher 0x0a0a, extension 0x1a1a, and so on, drawn from a reserved range. Servers that aren’t ready for unknown values will choke. Servers that are ready will ignore them. Run that for a few years and you’ve selected for an internet that gracefully tolerates new extensions, which is what you want when you’re trying to ship TLS 1.3.
JA3 dealt with this by ignoring GREASE values during fingerprinting. The IETF had reserved sixteen specific values for the purpose, and the canonical reference implementation maintained a lookup table for them. Every fingerprint algorithm that wants to survive GREASE has to do the same.
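The lookup table can also be expressed as a bit pattern, because the sixteen reserved values all have the form 0x?A?A with both bytes equal. The sketch below shows both approaches side by side; the names GREASE_VALUES, is_grease, and strip_grease are assumptions for this example rather than anything from the reference implementation.

```python
# The sixteen reserved GREASE values: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = frozenset(0x0A0A + 0x1010 * i for i in range(16))


def is_grease(value: int) -> bool:
    """True when both bytes are equal and each has a low nibble of 0xA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def strip_grease(values):
    """Drop GREASE entries while preserving the order of everything else."""
    return [v for v in values if not is_grease(v)]


print(sorted(hex(v) for v in GREASE_VALUES))
print([hex(v) for v in strip_grease([0x0A0A, 0x1301, 0x1302, 0xFAFA])])
```

Either form works. The set is easier to audit against the reserved list, while the bit test avoids carrying a table around.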
The second thing was extension permutation. In Chromium 110, shipped in February 2023, Google began randomizing the order of TLS extensions in every ClientHello. Firefox followed in version 114. Same browser, same machine, same destination, and the extension order changed every time you reloaded the page.
JA3 reads extensions in the order they appear in the packet. Permutation broke that. The same Chrome browser now produced a different JA3 hash on every connection. A WAF that had been blocking, say, a known scraper’s JA3 was suddenly seeing thousands of unique hashes per minute, because the ordering inside the extension list was now noise rather than signal. A fingerprint that changes every connection isn’t a fingerprint, it’s a session ID.
You can see this happen on a real network. Fastly’s security team published their first measurements on February 8, 2023, watching the canonical pre-110 Chrome JA3 fall off their network. The hash they were tracking, cd08e31494f9531f560d64c695473da9, started declining on January 20, the day the Chrome 110 rollout reached enough of the user base to show up in passive traffic. Within a few weeks it had collapsed to a long tail. Chrome’s ClientHello sends roughly fifteen extensions, so the number of possible orderings is on the order of 15! ≈ 1.3 × 10¹². For practical purposes, every connection from a recent Chrome now produces a unique JA3.
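The effect is easy to reproduce without a browser. The sketch below simulates permutation by shuffling a fixed list of fifteen extension IDs and hashing each ordering the JA3 way. The cipher, extension, and curve values are illustrative placeholders, not a capture of any real Chrome build, and the shuffle is a plain uniform shuffle rather than Chrome’s actual algorithm.

```python
import hashlib
import math
import random


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """JA3: order-preserving join of five fields, then MD5."""
    fields = ([version], ciphers, extensions, curves, point_formats)
    text = ",".join("-".join(str(v) for v in f) for f in fields)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Illustrative values only (assumed for the demo).
CIPHERS = [4865, 4866, 4867, 49195, 49199]
EXTENSIONS = [0, 5, 10, 11, 13, 16, 18, 23, 27, 35, 43, 45, 51, 17513, 65281]
CURVES = [29, 23, 24]

rng = random.Random(0)          # seeded so the run is repeatable
hashes = set()
for _ in range(100):            # 100 simulated connections
    shuffled = EXTENSIONS[:]
    rng.shuffle(shuffled)
    hashes.add(ja3_hash(771, CIPHERS, shuffled, CURVES, [0]))

print(len(hashes), "distinct JA3 hashes from 100 connections")
print("possible orderings of 15 extensions:", math.factorial(15))
```

Nothing about the client changed between iterations except the order of one list, which is exactly the situation a JA3 block list cannot cope with.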
The Salesforce JA3 repo on GitHub was archived on May 1, 2025, with the README pointing at FoxIO and JA4 as the maintained successor. In the eight years between, JA3 had become one of the most cited identifiers in network security. By any reasonable measure, it was a successful piece of open infrastructure. By 2023, it was also structurally broken.
2023: JA4
JA4 launched in September 2023, from a company called FoxIO that John Althouse founded after leaving Salesforce. The new fingerprint is not just a different hash function over the same fields. The whole shape of the identifier changed.
JA3 was an opaque MD5. JA4 is segmented:
t13d1516h2_8daaf6152771_e5627efa2ab1
The piece before the first underscore is human-readable. t says the connection is TLS over TCP, distinguishing it from q (QUIC) and d (DTLS). That’s the protocol slot JA3 didn’t have. 13 is the TLS version, taken as the highest non-GREASE value in the supported_versions extension if present, falling back to the legacy version field otherwise. d says the SNI extension was present. i would mean absent. 15 and 16 are the cipher count and extension count, with GREASE values stripped before counting. h2 is the first ALPN value, condensed to its first and last alphanumeric characters.
You can read that prefix without computing anything. Two clients with the same prefix are doing the same kind of thing, even before you compare the rest of the fingerprint.
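Reading the prefix can be mechanized in a few lines. This is a sketch of a decoder for the readable segment as described above; the Ja4Prefix type, the parse_ja4_prefix function, and the exact regular expression are assumptions made for this example, and the pattern is deliberately loose about which two-character version and ALPN codes it accepts.

```python
import re
from typing import NamedTuple


class Ja4Prefix(NamedTuple):
    protocol: str         # "t" TCP, "q" QUIC, "d" DTLS
    tls_version: str      # e.g. "13", "12"
    sni: str              # "d" = SNI present, "i" = absent
    cipher_count: int     # GREASE already excluded
    extension_count: int  # GREASE already excluded
    alpn: str             # first and last character of the first ALPN value


_JA4_RE = re.compile(
    r"^([tqd])([0-9a-z]{2})([di])(\d{2})(\d{2})([0-9a-z]{2})"
    r"_[0-9a-f]{12}_[0-9a-f]{12}$"
)


def parse_ja4_prefix(ja4: str) -> Ja4Prefix:
    """Split the human-readable first segment of a JA4 string into fields."""
    m = _JA4_RE.match(ja4)
    if m is None:
        raise ValueError(f"not a JA4 fingerprint: {ja4!r}")
    proto, version, sni, n_ciphers, n_exts, alpn = m.groups()
    return Ja4Prefix(proto, version, sni, int(n_ciphers), int(n_exts), alpn)


print(parse_ja4_prefix("t13d1516h2_8daaf6152771_e5627efa2ab1"))
```

Because the fields are positional and fixed-width, the same decoding works as a cheap pre-filter, for example grouping traffic by protocol and ALPN before looking at either hash.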
The two twelve-character pieces after the underscores are SHA-256 hashes, truncated. The first hashes the cipher list. The second hashes the extension list, with one twist: signature algorithms get appended to the extension list before hashing, and SNI and ALPN are excluded from it. Their presence already lives in the readable prefix, and their contents would change between sites and protocols, so they’re not useful inside the hash. The cipher and extension values get sorted into canonical hexadecimal order before they’re concatenated and hashed. The signature algorithms are the exception: they keep the order in which the client sent them.
That sort is the change that fixed permutation. JA3’s extension list was ordered by appearance, so reordering the packet reordered the string. JA4 sorts the extension list by hex value before hashing, so any browser that randomizes its extension order produces the same JA4 hash on every connection. Permutation stops being a problem.
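The invariance is straightforward to check. The sketch below hashes only the extension segment, following the rules described above (drop SNI and ALPN, sort, append signature algorithms in wire order, SHA-256, keep twelve hex characters). The helper name sorted_ext_hash and the sample values are assumptions for the demo, and the sample contains no GREASE values, so GREASE stripping is left out for brevity.

```python
import hashlib
import random


def sorted_ext_hash(extensions, signature_algorithms):
    """Extension segment of a JA4-style fingerprint: sort before hashing."""
    kept = sorted(e for e in extensions if e not in (0x0000, 0x0010))
    ext_part = ",".join(f"{e:04x}" for e in kept)
    # Signature algorithms are appended unsorted, in the order sent.
    sig_part = ",".join(f"{s:04x}" for s in signature_algorithms)
    return hashlib.sha256(f"{ext_part}_{sig_part}".encode()).hexdigest()[:12]


# Illustrative values only (assumed for the demo).
EXTENSIONS = [0, 5, 10, 11, 13, 16, 18, 23, 27, 35, 43, 45, 51, 17513, 65281]
SIG_ALGS = [0x0403, 0x0804, 0x0401, 0x0503, 0x0805, 0x0501]

rng = random.Random(0)
hashes = set()
for _ in range(100):
    shuffled = EXTENSIONS[:]
    rng.shuffle(shuffled)       # simulate per-connection permutation
    hashes.add(sorted_ext_hash(shuffled, SIG_ALGS))

print(len(hashes), "distinct extension hashes from 100 shuffled connections")
```

The sort throws away ordering information on purpose. A client that reorders its signature algorithms, on the other hand, still changes the hash, since that list is not sorted.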
JA4 also doesn’t ship alone. It’s the TLS-client method inside a wider suite called JA4+, which adds JA4S for the server side of the TLS handshake, JA4H for HTTP request fingerprints, JA4L for measuring client-to-server latency, JA4X for X.509 certificate fingerprints, JA4SSH for SSH traffic, JA4T for TCP option fingerprinting, JA4TS for TCP server response, plus a few more. The point is that no single layer is enough. A bot that spoofs its TLS stack might still leak its identity through HTTP headers or TCP window sizes.
The licensing of the suite is split intentionally. JA4 itself, the TLS client method, is BSD 3-Clause, same as JA3. The rest of JA4+ is under a custom FoxIO license that’s free for academic and internal business use but requires an OEM agreement to bake into a commercial product. The logic: keep the core fingerprint open so vendors and researchers adopt it, and monetize the surrounding suite. So far, vendors have adopted it.
ja4db.com
A fingerprint by itself tells you what a client looks like. To know what it is, you need a corpus. JA3 had a few community lists scattered across blog posts and security vendor knowledge bases. JA4 has ja4db.com, a community database that maps fingerprints to the application, library, OS, or threat actor that produces them.
The model is similar to VirusTotal in spirit, but narrower. Paste in a JA4 hash, get back what’s known about the client that emits it. Contributors submit mappings with their packet captures attached, and the database grows. As of mid-2025 it had thousands of mapped fingerprints across browsers, cloud services, malware families, and infrastructure tooling.
The database matters more than people initially thought it would. Without it, a fingerprint is a string. With it, the same string carries reputation, behavioral context, and historical association. A WAF rule that says “block this JA4” is operationally cheap. The work happens upstream, in the database that decided the JA4 was worth blocking.
How Cloudflare and Akamai actually use it
Cloudflare added JA4 to its bot management product in August 2024, alongside the existing JA3 surface. Their announcement included a number worth pausing on: in a typical hour they observe more than fifteen million unique JA4 fingerprints, derived from over five hundred million user agent strings and billions of source IPs. That ratio tells you something about the shape of the modern internet. There are not fifteen million distinct TLS stacks in active use. There are dozens. The fifteen million is a long tail of mutated and adversarial clients, plus the natural variance from version drift across major browsers.
The interesting move Cloudflare made wasn’t adding the hash. It was layering signals on top of it. JA4 by itself answers “what does this client look like?” Their JA4 Signals layer answers “what does that fingerprint typically do?” They publish features like browser_ratio_1h, the percentage of requests with that JA4 that look like browser traffic over the last hour, cache_ratio_1h, the share of cacheable responses, and h2h3_ratio_1h, the share using HTTP/2 or HTTP/3. A single Chrome-shaped JA4 with 94% browser-ratio behavior gets a high human score. A fresh JA4 that’s never been seen before, with zero cache hits and pure HTTP/1.1, gets the opposite.
This is the pattern. Vendors do not use JA4 as a binary block list. They use it as one categorical feature in a bigger score. The hash identifies the cohort. The surrounding signals describe what that cohort typically does. A scraper that copies Chrome’s exact ClientHello bytes, which is what curl-cffi and similar libraries do, will land in the Chrome cohort. But its request volume, cache behavior, and protocol mix will sit in a different distribution from real Chrome users, and that distribution drives the score. The bytes match. The behavior doesn’t.
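To make the "cohort plus behavior" idea concrete, here is a deliberately toy scoring function. It is not how Cloudflare or any other vendor computes a bot score: the weights, the 0 to 100 scale, and the names Ja4Signals and toy_human_score are all invented for illustration. Only the three feature names come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Ja4Signals:
    """Per-fingerprint behavioral ratios, each in the range 0.0 to 1.0."""
    browser_ratio_1h: float
    cache_ratio_1h: float
    h2h3_ratio_1h: float


def toy_human_score(signals: Ja4Signals) -> int:
    """Blend the ratios into a 0-100 score using made-up weights."""
    blended = (
        0.6 * signals.browser_ratio_1h
        + 0.2 * signals.cache_ratio_1h
        + 0.2 * signals.h2h3_ratio_1h
    )
    return round(100 * blended)


# Hypothetical cohorts: an established browser fingerprint and a never-seen one.
chrome_like = Ja4Signals(browser_ratio_1h=0.94, cache_ratio_1h=0.50, h2h3_ratio_1h=0.90)
fresh_client = Ja4Signals(browser_ratio_1h=0.0, cache_ratio_1h=0.0, h2h3_ratio_1h=0.0)

print(toy_human_score(chrome_like), toy_human_score(fresh_client))
```

The point of the sketch is the shape of the computation: the JA4 string only selects which row of ratios to look up, and the decision comes from the ratios.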
Akamai shipped JA4 support across its Bot Manager and Application Security Manager products on a similar timeline, plus a community contribution worth knowing about: an EdgeWorker (Akamai’s edge-compute primitive) that computes JA4 client fingerprints in JavaScript, on the edge, against the live ClientHello. The full JA4+ suite is increasingly available as a first-class field in their rule language, alongside the older JA3 fields they’ve kept for backward compatibility.
The pattern across both vendors is the same. JA3 hasn’t been turned off, it’s been demoted. New rules are written against JA4 and behavioral signals. Old rules against JA3 still fire, but their accuracy is steadily decaying as Chrome’s permutation rolls through more of the user base and as more malware authors notice that randomizing extension order is a free way to evade JA3-only detections.
Computing it yourself
The cleanest way to get a JA4 hash is to capture a real session and let a parser do the heavy lifting. Wireshark 4.0.6 and later ship with a JA4 plugin in the main distribution, and tshark will print the field directly:
tshark -r capture.pcap -Y 'tls.handshake.type == 1' \
  -T fields -e tls.handshake.ja4
That gives you JA4 hashes for every ClientHello in the capture, one per line. To produce a capture against a live site, the simplest invocation is:
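sudo tshark -i eth0 -f "host blog.crawlex.net and port 443" \
  -w /tmp/handshake.pcap -c 10
curl -s https://blog.crawlex.net/ > /dev/null
Run the curl command from a second terminal while the capture is waiting. The packet count has to be larger than one: the first packets that match the filter belong to the TCP handshake, and the ClientHello only arrives after it. Then run the JA4 query above against /tmp/handshake.pcap.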
For the same value computed by hand, the FoxIO Python reference is short enough to read in one sitting. The core of it is roughly this. Extract the relevant fields from the parsed ClientHello, drop GREASE, sort, hash:
import hashlib

# The sixteen reserved GREASE values
GREASE = {0x0a0a, 0x1a1a, 0x2a2a, 0x3a3a, 0x4a4a, 0x5a5a, 0x6a6a, 0x7a7a,
          0x8a8a, 0x9a9a, 0xaaaa, 0xbaba, 0xcaca, 0xdada, 0xeaea, 0xfafa}

def ja4(client_hello):
    # client_hello is an already-parsed object; the attribute names are illustrative
    proto = "t"  # TLS over TCP for this example

    # 13 = TLS 1.3, derived from supported_versions if present
    version = client_hello.negotiated_version_str  # "13", "12", ...

    sni = "d" if client_hello.has_sni else "i"

    # Drop GREASE before counting or hashing anything
    ciphers = [c for c in client_hello.cipher_suites if c not in GREASE]
    extensions = [e for e in client_hello.extensions if e not in GREASE]

    # Two-digit counts, capped at 99
    cnt_c = f"{min(len(ciphers), 99):02d}"
    cnt_e = f"{min(len(extensions), 99):02d}"

    # First and last character of the first ALPN value, "00" if there is none
    alpn = client_hello.first_alpn or ""
    alpn_chars = (alpn[0] + alpn[-1]) if alpn else "00"

    prefix = f"{proto}{version}{sni}{cnt_c}{cnt_e}{alpn_chars}"

    cipher_list = ",".join(f"{c:04x}" for c in sorted(ciphers))
    cipher_hash = hashlib.sha256(cipher_list.encode()).hexdigest()[:12]

    # SNI (0x0000) and ALPN (0x0010) excluded from the extension hash
    ext_for_hash = [e for e in extensions if e not in (0x0000, 0x0010)]
    ext_list = ",".join(f"{e:04x}" for e in sorted(ext_for_hash))
    # Signature algorithms stay in wire order; no underscore if there are none
    sig_algs = ",".join(f"{s:04x}" for s in client_hello.signature_algorithms)
    ext_input = f"{ext_list}_{sig_algs}" if sig_algs else ext_list
    ext_hash = hashlib.sha256(ext_input.encode()).hexdigest()[:12]

    return f"{prefix}_{cipher_hash}_{ext_hash}"

What this skips is the actual ClientHello parsing. Pulling the cipher list, extension list, SNI presence, ALPN, signature algorithms, and supported_versions out of the binary record is mechanical but length-prefixed everywhere. The FoxIO repository has a clean reference implementation in around 200 lines of Python plus a dpkt-based PCAP reader. Running it against a fresh tshark capture and watching the same hash fall out of the same client across multiple connections is the fastest way to convince yourself JA4 is stable where JA3 isn’t.
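For a sense of what "length-prefixed everywhere" means in practice, the sketch below decodes three of the extension bodies that the fingerprint depends on: ALPN, supported_versions, and signature_algorithms. It assumes each function receives the raw body of a single extension (the bytes after the 4-byte type and length header), does no bounds checking, and uses function names invented for this example. The version-code table covers only the common cases and falls back to "00".

```python
import struct


def _is_grease(value: int) -> bool:
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def parse_alpn(data: bytes) -> list:
    """ALPN body: 2-byte list length, then 1-byte-length-prefixed names."""
    (total,) = struct.unpack_from("!H", data, 0)
    pos, end, names = 2, 2 + total, []
    while pos < end:
        n = data[pos]
        pos += 1
        names.append(data[pos:pos + n].decode("ascii", "replace"))
        pos += n
    return names


def parse_supported_versions(data: bytes) -> list:
    """Client form: 1-byte list length, then 2-byte version codes."""
    n = data[0]
    return list(struct.unpack_from(f"!{n // 2}H", data, 1))


def parse_signature_algorithms(data: bytes) -> list:
    """2-byte list length, then 2-byte algorithm codes in wire order."""
    (n,) = struct.unpack_from("!H", data, 0)
    return list(struct.unpack_from(f"!{n // 2}H", data, 2))


_VERSION_CODES = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10", 0x0300: "s3"}


def ja4_version_code(supported_versions, legacy_version: int) -> str:
    """Highest non-GREASE offered version, else the legacy version field."""
    real = [v for v in supported_versions if not _is_grease(v)]
    highest = max(real) if real else legacy_version
    return _VERSION_CODES.get(highest, "00")


alpn = parse_alpn(b"\x00\x0c\x02h2\x08http/1.1")
versions = parse_supported_versions(b"\x06\x7a\x7a\x03\x04\x03\x03")
print(alpn, [hex(v) for v in versions], ja4_version_code(versions, 0x0303))
```

The first ALPN name and the version code feed the readable prefix, while the signature algorithm list goes into the second hash unsorted.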
The first time you do this against your own browser is informative. Run a tcpdump while you reload the same page in Chrome ten times. Compute JA3 for each ClientHello. The MD5 will change every reload, because Chrome’s extension order is now randomized. Compute JA4 against the same captures. The hash is identical every time.
What this all means
The fingerprint world is moving toward structured, layered identifiers. JA4’s design, with its readable prefix and sorted hashes, is a template. The same pattern is showing up in HTTP/2 fingerprinting (Akamai’s framing fingerprint), TCP option fingerprinting (the JA4T family), and the experimental QUIC work that surfaced through 2025. Opaque hashes are out. Segmented identifiers are in. The metadata around the identifier is becoming as important as the identifier itself.
For anyone building scraping infrastructure or anti-bot defenses, the operational story is fairly settled. JA3 alone is no longer a usable detector against modern browsers. JA4 is, but it stops working the moment you treat it as a binary signal. The fingerprint is the cohort. The behavior is the score. Detection lives in the gap between them, and that gap is where vendors compete now.
The TLS handshake remains the clearest window into a client’s identity that the wire offers. As long as the first packet of every connection has to negotiate ciphers in plaintext, that window stays open. The fingerprint just gets sharper.
Frequently asked questions
What five fields go into a JA3 fingerprint and how are they combined?
JA3 pulls five fields from the ClientHello in a fixed order: SSL version, the cipher suite list, the extension list, supported elliptic curves, and elliptic curve point formats. Each field holds the decimal values of the relevant bytes, with multiple values inside a field joined by hyphens and the fields themselves separated by commas. The whole concatenated string is then MD5-hashed into a 32-character identifier. The MD5 is used as a database key for repeatability, not as a security claim.
Why did Chrome's extension order randomization break JA3 fingerprinting?
Starting in Chromium 110, shipped February 2023, Google randomized the order of TLS extensions in every ClientHello, and Firefox followed in version 114. JA3 reads extensions in the order they appear in the packet, so the same browser produced a different JA3 hash on every connection. A WAF tracking a known client's JA3 suddenly saw thousands of unique hashes per minute. With roughly fifteen extensions, the number of orderings is around 15 factorial, so each connection effectively became unique.
How is a JA4 fingerprint structured differently from a JA3 hash?
JA3 is a single opaque MD5, while JA4 is segmented into three underscore-separated parts. The first part is human-readable and encodes the transport protocol, the highest TLS version the client offers, whether SNI is present, the cipher and extension counts with GREASE stripped, and the first ALPN value. The two twelve-character parts that follow are truncated SHA-256 hashes, one over the cipher list and one over the extension list. You can read the prefix without computing anything.
How does JA4 stay stable when a browser randomizes its extension order?
JA4 sorts the extension list into canonical hexadecimal order before hashing it, rather than reading extensions in the order they appear in the packet the way JA3 does. Because the sort removes any dependence on packet ordering, a browser that randomizes its extension order still produces the same JA4 hash on every connection. Reloading the same page in Chrome ten times yields a changing JA3 but an identical JA4.
Why do Cloudflare and Akamai avoid using JA4 as a simple block list?
Both treat JA4 as one categorical feature in a larger score rather than a binary block. The hash identifies a cohort of clients that look alike, and behavioral signals describe what that cohort typically does, such as Cloudflare's browser ratio, cache ratio, and HTTP/2 or HTTP/3 share over the past hour. A scraper copying Chrome's exact ClientHello lands in the Chrome cohort, but its request volume, cache behavior, and protocol mix sit in a different distribution that drives the score down.