-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to copy separate tables into table vector? #49
Comments
Hi Andrew, thanks for asking. Unfortunately there is no such function although it would be good to have. It does not exist because it requires code for recursively iterating over tables - a copied table can have a vector of other tables etc. and there could be two references to the same table during copy - if this is not handled, data could explode in size. So it is non-trival to implement, but definitely possible - it would approximately follow the same logic as the generated verifier code and I am open to contributions if anyone feels up to the task. |
@mikkelfj Thanks for the quick reply! Taking on such a task is a bit more than I can manage at the moment, so I'll just rethink my overall approach and try avoid this problem altogether for now. |
I would suggest you implement your own type specific table clone function. It would return a reference to the table and you can push that reference onto a vector or store it as a member of another table. In monster_test you will find examples of how to juggle table creation with references in different ways. You still need to adapt the function, but you only have to do it in one place per schema change. You can also avoid copying all fields, if some are irrelevant and not flagged as required. If you then feel like you need a function as you suggest, you could easily create a non-changing push_clone on top of that. This is what I had in mind when leaving out the table clone feature in the early design. |
Thanks. As a side note, is this even something that the flatbuffers C++ project supports to your knowledge? Just curious. |
I'm not really sure - I believe the C++ version does have some advanced reflection logic that allows you to modify a buffer in temporary storage and rewrite it, and possibly also to copy from an external buffer - but it is probably not the fastest way to go about things. |
You could also write your own purpose specific clone function generator in C using reflection without supporting all sorts of special cases. It could be based on https://github.com/dvidelabs/flatcc/tree/master/samples/reflection |
Another quick, slightly related question. It feels like it should be possible to clone an existing
But I've tried all different combinations of things and can't seem to get anything to work. Is this possible? |
It appears not - though it could be implemented with a simple loop since the string_vec has a push_clone method. If you look at flatbuffers_common_builder.h you'll find there is a vec_clone but no string_vec_clone: #define __flatbuffers_build_vector(NS, N, T, S, A) #define __flatbuffers_build_string_vector_ops(NS, N) |
Note: strictly speaking a string_vec_clone and string_vec_slice should handle multiple references to the same string, so a robust implementation is not that trivial to do. It would require a hash table, and probably an option to not check for duplicates for performance and memory concerns. |
Thanks again for your pointers. I'd like to move back to the original question a bit... Let's say I want to embed a bunch of separately saved flatbuffers each containing a SomeTable_table_t into a brand new flatbuffer that I'm building (literally, I just memcpy them in there, end-to-end), retrieve references to those tables each time I memcpy them in, then populate the SomeTables_tables list with references to each of the SomeTables that i memcpy'd in earlier. Given each flatbuffer containing the SomeTable_table_t is self contained and will by definition contain everything else required by that SomeTable_table_t, why is what I describe not possible? Basically, I don't entirely understand the need for the table recursion as you described it, why can't I just blindly copy in these flatbuffers as "sub flatbuffers", then refer to them from later in the flatbuffer? Is this some limitation of the flatbuffer format that it doesn't allow references of this kind? Thanks! |
As a side note, I actually tried to do something like what I just described using flatcc_builder_embed_buffer, but I didn't get far with that. I believe I'm misunderstanding something fundamental about the flatbuffer format, here. |
This is complex question, but I'll try: The short answer you probably seek is that if a table does not contain any offset types, you can copy the table into a new buffer iff you make sure to also copy the vtable. A string, a vector, and a table are all offset types meaning that these exists it different memory blocks in the existing buffer. You would then have to chase each offset and copy the data, then copy the table referencing those data and update the tables references to the copies. Note that these references are known as uoffset_t types and are NOT the reference type the builder works with. If the offset types do not contain further offset types, the recursion stops there, meaning a table does contain subtables or vectors of tables. But even here, you need to consider the possibility of the same string being referenced in two places and decide if you want to copy the string once, or just duplicate it naively. Updating offsets and vtable references and ensuring proper alignment in the resulting buffer is non-trivial - this is what the builder does for you. If you want to get creative, you can use the untyped low-level flatcc_builder.h interface to build you own data with proper alignment iff you respect the schema the buffer is expected to have. You also need to understand that the ref type in the builder is an relative offset to specific location in the new buffer being built. Even if the buffer does not (yet) physically exist as a single coherent block, the reference will be translated to a relative distance in the final resulting buffer. This final distance is an offset type. You (or the builder) cannot know the offset before you both know the location of the source location and destination location in the new buffer. References exists to work around this temporarily. When a vector is being built piecemeal, the references are pushed, and when the vector closes, all references are converted. Anything added to an open vector must already have been added to the buffer. (Edit: unlike C++ you can add things, such as a subtable or a string, while the vector is open, and then push the reference, but you can't have the reference to push before the object has been created. C++ requires you to create the object even before opening the vector, and then remember all references before creating the vector. The same applies to Go and other languages). You cannot get cannot get a reference into another existing buffer, because it makes no sense as a concept. You also cannot easily access an offset from an existing buffer because it is not exposed - it is always converted into a pointer for easy access. Read only bavigation works internally by adding offsets to pointers (roughly). So, you could theoretically copy a very simple table type, but then it is almost the same as struct type, except for schema evolution. So you might consider using structs instead, and structs can be cloned by the existing builder API. None of this prevents copying tables from one buffer to another, there is just a lot of detail to manage. |
Note on shared strings - which also apply to vectors and tables: If a string is referenced in two different places in the same buffer, it will be stored as two different uoffset_t values that happen to reach the same location. When building a new buffer, a string will get a single string_ref_t value. This reference can be used in several places and will later by converted to offsets, where the offset value depends on the location from which the string is being referenced. |
I don't think that I suggested adding source tables as nested buffers, but if you wish to do that, that is a possibility. But then you must add the entire buffer, not a specific table. Also the generated API should have type specific support for this - you don't have to use the low-level builder API for that. However, I don't think this is what you want to do. You can read and copy a table from a source buffer, as follows: There are two ways to do this, either bottom up: first create all offset types and store the reference returned for each, then open the table and add / clone all scalar and struct fields, and also add the references previously stored. You can also open the table first, then copy or create each field top-down using the old tables field as a source to the new field. A source must be a string pointer, a struct pointer, or a scalar value. You cannot use a pointer to a table or vector, except for scalar/struct vectors which you can use as source because a scalar/struct vector is a self-contained memory block. As discussed, other vectors could theoretically be cloned automatically as well, but there is no support for it. |
@mikkelfj Thanks for all your detailed help in this thread! I'm still absorbing/using flatbuffers and how I can best use them. I may end up implementing this at some point after all -- I'm not sure yet. |
You are welcome - as I said, it is a complex topic you picked :) |
For reference, the limitation is/was actually documented here: https://github.com/dvidelabs/flatcc/blob/master/doc/builder.md#limitations |
Just for the record, I did end up using nested flatbuffers for this situation and it is doing the job. Using the _nest() call as well as the inline call method of creating nested flatbuffers in flatcc is quite convenient to use. |
Also for the record: Copying tables is (even) more difficult than I originally thought: A table start is only aligned to a uoffset size (4 bytes) while individual members can be aligned to a higher value. Thus copying a table might skew alignment of a member field. To handle this each field must be checked to see if they are aligned properly in the target location or the table must be rewritten. For most common cases there are 3 situations:
And two abnormal situation: the field has members forced to a higher alignment and
In praxis this would be a |
@AndrewJDR Cloning of tables, vectors, unions and union vectors is now supported on master branch. |
@mikkelfj This is great. Thanks! I had one question. If I'm using a table clone operation such as:
Does that do the equivalent of calling *_Payload_start() / *_Payload_end() on the newly cloned table, or is the table left open for me to add some fields that perhaps were not added by the clone operation (for lack of those fields being set in the table being cloned from)? In other words, would it ever make sense to do this?
My current assumption is that this will not work because the cloned table is already closed by the _clone operation. So I do something like this instead:
Reason I'm asking: But this leads me to a side question... if you set a field twice, what happens? I.e.
Does 43 overwrite 42 here? And so with this situation (assuming it was even possible -- i'm going to postfix the call with 'start_clone' just for clarity):
(Questions inline with the code above) |
(Just updated my last post with more questions/details) |
It basically boils down to: |
Hi Andrew, these are good questions. I don't think the operation you request is currently avaiable. I chose to focus on pick and clone which starts and ends the target table. I did think a fair bit about the leaving table open, but did not prioritize it thinking you can easily pick subfields manually if you need to manipulate the table anyway, and also that you could easily copy/paste from the generated clone source - have a look if you haven't already - only the refmap thing is a bit intriguing. I also needed to cut down on the work effort to implement this - there are many small intertwined macros. However, it should be fairly straightforward to do now that the ground work has been done, and the names can followed in the source, so if you are brave, you could attempt a PR on this. On the overwriting fields: this is another reason I chose ot not address this problem. The behavior is the that the low-level flatcc_builder.h interface will reject the conflict on purpose. It is quite possible to avoid this, but you a) need to design an interface for this, and b) accept that you might be dumping garbage into the buffer that you are building. But, if you accept that conflicts are errors, then it is not difficult to manage. What would be a good name for such an operation? |
If such a new operation were to be added, it would probably not include a start. When you reject conficts it doesn't matter, but if want to add if not already present, you want to do it on an open table. This kind of conflicting handling would actually be relatively easy to handle - it can test before picking and avoid leaving garbage. flatcc_builder.h has a test to check if a field was already set while a table is still open. |
On you conflict example with create calls: note that it first creates, then adds. This means garbage in the buffer before the high level operation returns an error (either non zero int, or a null ref - I don't recall). It is safe to do if you ignore the error and accept garbage in the buffer - but debug builds will likely assert if you do not flag out those assertions (some tests do that). |
I may give it a shot if I'm feeling brave some day :) Another way of thinking about the syntax:
This maybe fits in better with the current _clone and _pick concepts you have in place, but I'm not sure. Thoughts?
For my use-case, I'm not interested in overwriting existing fields... only defining fields that were not already defined. For example:
In fact, for my purposes, I would probably prefer something like:
Because I consider otherMsg defining the theAnswer field to be an error condition, and would like to know about that if it happens... does anything like |
In summary, I'm mostly looking to define fields that aren't already defined on the open table. Ability to actually overwrite defined fields on the open table is maybe interesting, but goes above and beyond what I'm really looking for here, so unless it's easy to implement I probably wouldn't go that far with it. It also feels like it goes against the design of flatbuffers/flatcc, which is that you're working with the output buffer in a more-or-less in-place fashion. With a field ovewriting feature, we've either got to rewind and deal with the previously written field in some way (probably by rewriting the buffer which I imagine gets messy), or be okay with garbage in the buffer. Eh, I'd rather not fight that part of the design. |
Just focusing on this one question here:
In the generated _common_reader.h there is an _is_present() method, but at the typed (generated) level there is no such thing for the builder. You generally do want to use the untyped builder interface for adding new features. If you go ahead with a PR you could also add an _is_present for the builder. (Maybe I wrote earlier that the low-level builder has a method for this, but I couldn't actually find it - it is in a private branch for an unpublished feature
Implemented as
it is quite similar to Also note the builder flag that hardcodes to ignore duplicate added fields - it is likely not the solution but worth knowing about: |
A agree with this. As to being able to overwrite a field, this is extremely easy a the the flacc_builder_ api level, you just copy the table_add method and enable the flagged out FLATCC_BUILDER_ALLOW_REPEAT_TABLE_ADD But then you have to deal with buffer garbage - cleanup is absolute not possible becuase the content might be cross-atlantic by the time we get to that point - the emitter is streaming. Post cleanup is easy with the new clone facility. You just clone the entire buffer - and if you don't care about identity, you can the refmap to have at most one copy. Worse, the user facing api needs to duplicate add methods - which is quite messy - or alternatively set a temporary runtime flag - which is costly to check and easy to get wrong. Actually, the refmap used with cloning does something similar - you can decide if you want to reuse a clone or not - but this is also marketed as an advanced don't do this at home feature. |
Now, to address you main questions: It would be greate if would consider a PR - I believe we are on the same page. quoting
Yes, this is what I in mind, except for the name pick_all. The problem with that name is that pick takes a table type T and returns another table of type T, so in a sense it already picks all, but you using pick at the wrong semantic level. You can't a member method like
I'm not sure I really have the overview to give the right answer here right now - but the point is that you need to think about the different ways the operation can be used - as a member in a table or standalone (clone can be standalone returning a ref, but your pick_all likely can't).
The actual behavior of the method, OTOH, is quite clear:
You also need to consider what happens at the root level - e.g. you can clone a buffer with table, but does clone_into make any sense here? Likely not. What about nested buffers - likely not. What about structs - there is no presence check - should it just overwrite any null fields? (This problem was also a reason to leave out this feature). Structs already have assign methods and these somewhat map to clone - but it depends on whether the struct is standalone is in roots and unions, or inlined as in tables and vector elements. There is also the separate operation of As to your suggested assertion:
The builder call |
Actually, you have such an operation assuming you had called |
Reopening issue. |
This "known to not be present in the target table" is a good point. This way, if one wants to override a value that might be present in the source table, one need only define that field prior to the clone_into operation (rather than afterward). |
Let's say I have:
I have a whole bunch of these
SomeTable
flatbuffers saved out into separate binary files somewhere on my filesystem as root objects.Now let's say that in my code, I want to create a new table comprised of a vector of those SomeTables flatbuffers:
How can I build this SomeTables buffer as concisely as possible? That is, I'd very much prefer to avoid code like this (pseudocode):
In such an arrangement, I have to update the SomeTables_tables_push_create line when the SomeTable schema changes. I'd prefer a "Take this SomeTable_table_t, copy its contents into the flatbuffer, and push it onto this table vector" function.
Does such a thing exist? e.g.
SomeTables_tables_push_create_from_table_t
in this example?The text was updated successfully, but these errors were encountered: