List isn't the quickest operation. Listing here will be marginally better than listing in the data directory (assuming the number of list files is smaller), but are we still going to take an MTTR hit if we have to list every time we open a region?
张铎
Jun 10, 2021
Approver
In general, I guess S3 stores its metadata in something like HBase, where the object names are sorted and listing a directory is just a prefix match. So if we only list the tail elements of a path, it will be much faster than listing the head elements: we scan only the things we want, whereas when listing the head elements we need to scan all the objects...
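A minimal sketch of the prefix-match idea in the comment above, assuming object keys live in one sorted index. This illustrates the commenter's guess, not how S3 is actually implemented; the key names and the `list_prefix` helper are hypothetical.

```python
from bisect import bisect_left


def list_prefix(sorted_keys, prefix):
    """Scan a sorted key index for one prefix.

    Returns (matches, examined), where examined counts the keys the scan
    had to look at, including the first non-matching key that stops it.
    """
    start = bisect_left(sorted_keys, prefix)  # jump to the first candidate
    matches = []
    examined = 0
    for key in sorted_keys[start:]:
        examined += 1
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing later can match
        matches.append(key)
    return matches, examined


# Hypothetical layout: a small file-list directory next to the data files.
keys = sorted([
    "data/ns/t1/r1/cf/.filelist/f1.1",
    "data/ns/t1/r1/cf/.filelist/f2.2",
    "data/ns/t1/r1/cf/hfile-a",
    "data/ns/t1/r1/cf/hfile-b",
    "data/ns/t1/r2/cf/hfile-c",
])

narrow, narrow_cost = list_prefix(keys, "data/ns/t1/r1/cf/.filelist/")
broad, broad_cost = list_prefix(keys, "data/ns/t1/")
```

In this toy index the narrow prefix touches three keys while the broad prefix touches all five, which is the cost difference the comment is pointing at.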
Zach York
Jul 9, 2021
Approver
I guess I should more clearly state my concern with this approach. The main reason to remove rename from the critical processes is to reduce the burden on the filesystem/store and improve performance. If as a part of writing, we need to read and write another metadata file, I don't see how we are going to improve performance by that much or reduce the burden on the filesystem.
Do you have any perf numbers from a prototype that show this is beneficial? Why choose a file rather than using something we already guarantee is fast, such as being backed by a WAL?
I think regardless, this is an optional implementation, not the default, so I think it's okay to implement, but I would have these concerns if we ever were considering changing the default.
Michael Stack
Jun 9, 2021
Approver
Code in FSTableDescriptors which does something similar... if of use.
Michael Stack
Jun 9, 2021
Approver
Sorry. I don't get this bit. Where is the renaming happening? I was thinking we would always write a new storefilelist... cleanup of the old would be an after-task. No rename?
张铎
Jun 10, 2021
Approver
This is for the store file. We still need to support writing a store file somewhere else (a tmp directory, or just another location when bulk loading) and then committing it by renaming it into the store file directory.
Michael Stack
Jun 9, 2021
Approver
Nice
Michael Stack
Jun 9, 2021
Approver
A file is coherent/non-corrupt if it can be pb parsed? I suppose this is good enough for a list, since pb will complain if the list gets cut off. No need for an EOF marker...
张铎
Jun 10, 2021
Approver
Yes, if it can be parsed to a pb object then it is fine. If not, we just skip to the next one.
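The "parse it or skip to the next one" rule could look roughly like the sketch below. It uses JSON as a stand-in for the protobuf encoding discussed in the thread, and the function name, file names, and newest-first ordering are all assumptions made for illustration.

```python
import json


def load_latest_valid(candidates):
    """Return (name, files) for the first candidate that parses cleanly.

    candidates: iterable of (file_name, raw_bytes), ordered newest first.
    A candidate that fails to parse (for example, a truncated write) is
    skipped and the next, older one is tried.
    """
    for name, raw in candidates:
        try:
            files = json.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # corrupt or partial list file: skip to the next one
        if isinstance(files, list):
            return name, files
    return None, []


# Hypothetical case: the newest list file was cut off mid-write.
picked_name, picked_files = load_latest_valid([
    ("f.3", b'["hfile-a", "hfile-b"'),  # truncated, missing closing bracket
    ("f.2", b'["hfile-a"]'),            # older but intact
])
```

Here the truncated `f.3` is rejected and the older, intact `f.2` is used.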
Zach York
Jun 9, 2021
Approver
User configured or HBase is toggling this?
张铎
Jun 11, 2021
Approver
Sorry, what do you mean by toggling?
Zach York
Jul 9, 2021
Approver
I meant: does HBase control this setting itself (because it knows it should fall back to loading again), or does the user need to configure it?
Michael Stack
Jun 9, 2021
Approver
It would be sweet.
Zach York
Jun 9, 2021
Approver
You mean remove the storefile list file when the feature is not enabled?
张铎
Jun 10, 2021
Approver
Remove the broken store files. Since we now write to the store file directory directly, a region server crash in the middle of a flush or compaction could leave partial store files in the store file directory, and the old code cannot deal with this situation.
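One way to picture this cleanup, as a sketch only: if the store file list is treated as authoritative, any file present in the store directory but absent from the list is a leftover from an interrupted flush or compaction. The helper name and file names are hypothetical.

```python
def find_untracked(store_dir_files, tracked_files):
    """Return files in the store directory that the file list does not track.

    Assumes the tracked list is authoritative, so anything else is a
    partial or abandoned store file that is safe to remove.
    """
    tracked = set(tracked_files)
    return sorted(f for f in store_dir_files if f not in tracked)


# Hypothetical state after a crash in the middle of a flush.
leftovers = find_untracked(
    ["hfile-a", "hfile-b", "hfile-partial"],
    ["hfile-a", "hfile-b"],
)
```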
Zach York
Jun 9, 2021
Approver
Won't writing a new file be significantly slower than updating an HBase row (the original proposal)? How much faster is it than an actual rename (copy + delete)?
张铎
Jun 10, 2021
Approver
We want the storage system to be self-maintained. Since we rely on this storage system to store our region data, making it depend on a region would introduce a cyclic dependency. I will not say this is impossible, but I just do not like this style of architecture. The performance comparison between writing a new metadata file and writing an HBase row is not the most important thing here, as both are much faster than renaming the actual data file on S3, right? That rename is the problem we need to solve here.
Michael Stack
Jun 9, 2021
Approver
Would it help if, on bulk load, after moving the file under the store dir, we also added an empty file named for the bulk load under the .storefilelist dir? The Region deletes this file after the bulk-loaded file has been successfully added to the list of storefiles and that list has been successfully written. If we crash during the write, then on reopen the Region retries adding the bulk-loaded file. Bulk load does not complete successfully until the new store file and the store file pointer get written.
张铎
Jun 11, 2021
Approver
What if we just crash before writing the empty file but after we move the file to store directory? Still the same problem?
Michael Stack
Jun 12, 2021
Approver
The bulk load is not successful until the write of the empty file.
张铎
Jun 14, 2021
Approver
But we will still remove the file when opening the region again? Or maybe we need to redesign the bulk load process to make it transactional? Like a procedure?
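The marker-file proposal in this thread can be simulated in memory to see how each crash point would be handled on region open. This is a sketch under assumed semantics, not the design that was adopted: `recover_on_open` and the file names are hypothetical, and a marker is taken to mean "the bulk load was acknowledged but the file list may not include it yet".

```python
def recover_on_open(store_files, tracked, markers):
    """Replay pending bulk loads and report orphaned files on region open.

    store_files: names present in the store directory
    tracked:     names recorded in the store file list
    markers:     names of pending bulk-load marker files
    Returns (new_tracked, orphans).
    """
    new_tracked = list(tracked)
    for name in sorted(markers):
        # Marker exists: the bulk load was acknowledged, so retry the add.
        if name in store_files and name not in new_tracked:
            new_tracked.append(name)
    # No marker and not tracked: the bulk load never reported success,
    # so the file is treated like any other partial store file.
    orphans = sorted(set(store_files) - set(new_tracked))
    return new_tracked, orphans


# h2 crashed after its marker was written; h3 crashed before its marker.
new_tracked, orphans = recover_on_open(
    {"h1", "h2", "h3"}, ["h1"], {"h2"}
)
```

Under these assumptions `h2` is re-added to the list, while `h3` is removed on reopen and the client, which never received a success response, has to retry the bulk load.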
HBASE-25988 Store the store file list by a file