MDEV-31669: use XXH3 as a digest instead of md5#4573
MDEV-31669: use XXH3 as a digest instead of md5#4573MohamedM216 wants to merge 1 commit intoMariaDB:mainfrom
Conversation
gkodinov
left a comment
There was a problem hiding this comment.
This is a preliminary review. Right now there's no strong opposition against removing MD5 from this. But the choice of a replacement hash function is yet to be determined. I can see the logic in Sergei's argument that using a crypto hash for a non-crypto purpose is kind of a mismatch. But I will leave that discussion for the final review now.
First of all: please make sure the change compiles and runs all buildbot test in a satisfactory way.
899f3b3 to
3e7b460
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Please switch this to XXH3, as suggested by Sergei. I will then do a full review of the result once this is done.
3e7b460 to
241cb0c
Compare
241cb0c to
ea765b3
Compare
|
Apologies, I accidentally closed this PR. |
gkodinov
left a comment
There was a problem hiding this comment.
Thanks for switching to XXhash.
OK to push from me, given that you add a commit message to the commit that follows CODING_STANDARDS.md.
Please wait for the final review.
ea765b3 to
233436e
Compare
|
Hi @gkodinov , I've updated the commit message. Thanks! |
vuvova
left a comment
There was a problem hiding this comment.
the main comment here: I've looked at xxhash.h that you're using and noticed that it has 128-bit hash. This is the same width as MD5 so less changes and also same chance of collison, 64-bit hash has it much higher.
could you try with XXH3_128bit please?
mysql-test/include/check_digest.inc
Outdated
|
|
||
| --disable_query_log | ||
| create table test._digests(d varchar(32) primary key); | ||
| create table test._digests(d varchar(16) primary key); |
There was a problem hiding this comment.
let's keep it at 32 here. It'd be good to know that existing applications (that query and, perhaps, temporarily store these digests) will continue to work
sql/sql_digest.cc
Outdated
| compute_md5_hash(md5, | ||
| (const char *) digest_storage->m_token_array, | ||
| digest_storage->m_byte_count); | ||
| XXH64_hash_t res = XXH3_64bits(digest_storage->m_token_array, |
There was a problem hiding this comment.
I wonder, perhaps we should use XXH3_128bits? it exists in the same file, so as easy to use. but has the same width as md5, so less changes and more importantly same chance of collisions as before.
|
OK, I'll work on that. Thanks for the feedback! |
bc14604 to
8ad7bd6
Compare
vuvova
left a comment
There was a problem hiding this comment.
Looks good! Changes are small and from the user point of view nothing changes whatsoever. Except that the digest is faster and it no longer fails under strict FIPS mode.
just a couple of very minimal changes and I'll approve.
mysql-test/include/check_digest.inc
Outdated
| begin | ||
| declare digest_exists tinyint; | ||
| if length(digest) != 32 or conv(digest, 16, 10) = 0 then | ||
| if length(digest) != 32 or digest is null then |
There was a problem hiding this comment.
this change probably can be reverted completely
storage/perfschema/table_helper.h
Outdated
| uint m_schema_name_length; | ||
| /** Column DIGEST. */ | ||
| char m_digest[COL_DIGEST_SIZE]; | ||
| char m_digest[DIGEST_HASH_TO_STRING_LENGTH + 1]; |
8ad7bd6 to
1a0ae81
Compare
|
@MohamedM216, thanks, looks good. This is what happens now: MDEV-31669 is moved to the "In-Testing" status, if everything is fine, a tester will approve it and it'll be pushed into main in time for the 13.0.1 release which is planned for the beginning of May. There is nothing you need to to anymore, unless testing finds bugs. |
|
👍 |
|
Small typo (s/128/32/): Should read, for example: Another small typo: Should read: Also, in the commit message, Does not seem applicable. |
|
@MohamedM216 Hi! Testing is ongoing, especially in the area of performance, ref MDEV-31669. Please see my last comment from above & please also add a MTR testcase. For example, you could show how before the patch a given MD5 hash is expected while after the patch a XXH3 hash is present. For an example/some ideas see this comment (it's a long testcase; there is no need for that, a simple sanity level check will do!). Thank you |
|
@MohamedM216 Hi! Please see this message. Thank you! |
|
Hi @mariadb-RoelVandePaar , I’ve seen the latest updates, that’s great news! Thank you for taking the time to test the PR, I really appreciate it. I’ve been a bit busy over the past few days, but I’ll resume working on the remaining tasks shortly. |
|
Thank you! |
1a0ae81 to
1520fc7
Compare
|
Changes applied. Looking forward to your review. |
- XXH is significantly faster than both MD5 and SHA-256 - Here we use the 128-bit XXH3 - Keep DIGEST_HASH_SIZE at 16 bytes (same as MD5) - Reduce COL_DIGEST_SIZE to 32 chars - Keep DIGEST columns as VARCHAR(32)
1520fc7 to
437fa81
Compare
Jira Issue number for this PR: MDEV-31669
I followed mysql implementation as suggested in jira.