{
.date=
.title=
Quickjs Jsatom And `jsruntime->atom_array` Part 2
It's now 2021, and I'm still struggling to understand some of the nuance behind the hashing/bucket finding algorithm. I understand how/what hash tables are, but I think with QuickJS's atom hash table, there's some optimizations going on that I haven't quite figured out.
Let's look through it, continuing from last time. The atom hash table is made of two primary data structures: atom_hash
and atom_array
. atom_hash
is the list of indexes into atom_array
for a given hash for a given table size. atom_hash
also always has a length that is a power of two so that for each "resize" of the table, the hash can be masked. The basic usage seems to be:
- we have key string K, and atom value V
- we produce a hash of K
hash(K)
producingH
which is a 32bit integer - since the initial hash size is say 256 (mask of
0xff
) entries, we maskH
to produceH1
by usingH & 0ff
. (H
is then stored onV
so thatatom->hash
isH
and we can reuse it later). H1
is used as the index intoatom_hash
so thatatom_hash[H1]
gives usI
.I
is the index intoatom_array
, such thatatom_array[I]
gives us a containerC
(linked list like object that containsV
).
The difficulties I'm having, is that there is some pointer compression going on with C
so that the lowest(?) bit of &C
says if the entry is empty or not. C
also has the field next_hash
which returns an I
for the next entry, which is where I start to get confused. Does this mean there are multiple I
that map to the same hash? I think so, but it's not stored as a linked list like I would expect, instead the values all seem to be stored linearly in atom_array
, whereas I expected atom_array
to be buckets of linked lists.
What I want to answer for myself is if the whole datastructure in QuickJS is needed in Rust, or if I could just use some standard library datastructures that Rust provides and still maintain all the memory/speed benefits of the handwritten C version. At the moment, I feel like there's still too much escaping my understanding to be able to evaluate it.
I'm still planning/working on doing a port of the algorithm to Rust so I can understand it better, but all the indirection and bit/address manipulation is definitely pushing my abilities in Rust. It's a good thing in the long run, as it helps me udnerstand Rust's type/data system better, but it's quite the steep curve.
}