-
-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix accuracy of sourcepos for inline elements #452
Comments
Including the comment from @kivikakk in the PR, as a starting point of discussion: How awkward. In all these cases: - A
B - A
B - A
B the inline parser is initialised with the subject This will require something of a rearchitecting — sourcepos info needs to be associated and carried through from much earlier in the block processing stage, with the possibility of multiple spans per content block[1] — and I have no real inclination to do such work at present. In light of all this, for now I'm going to adjust to match upstream (#301 (comment)) and not output inline-level sourcepos in HTML in the same cases as cmark. Block-level sourcepos will be unaffected, as it's not known to have issues. [1] In the examples above, we still need to be passing through the same inline This changes a lot of the bookkeeping on the block parser side, and a complete rework of sourcepos handling on the inline parser side, so I can't even say for sure that I won't hit an even bigger roadblock. Also worth noting in conjunction is this comment about autolink sourcepos unsoundness. |
I've started researching this a little, trying to get familiar again with how this works, and play around with a few things. When I take simple text, such as
I get
So it doesn't count any leading spaces. So it seems like we would want the same behavior for a following line in the same paragraph. Meaning
should yield
But right now it seems like we're always adding the |
As an additional researching angle, I've been looking at the output from https://github.com/jgm/commonmark-hs. For example
|
I know you said this
but it wasn't until I stepped through the code that it clicked into place. Any leading spaces have been stripped off, so we have no idea how many spaces on each line were stripped off. |
Yes, indeed; the necessary bookkeeping must start at the block parser, since it's the one doing the normalising. Both parsers would need to be adjusted to communicate that information. What's likely to be simplest would be having the block parser give the inline parser a list of tuples like This would probably remove some work from the inline parser, such as the bookkeeping done in But the changes in the block parser itself to enable this would be significant, in a way that's hard to estimate without first looking carefully (and probably trying a speculative approach or two). It's not set up for this kind of thing at all — presumably why it wasn't added to |
My initial thought was maybe having a vector of offsets, say |
If it works, it works! 😸 |
I spun up #453 to play with some ideas |
I think we can close this with #453 merged. Will open another issue if any other problems are found. There may come a day when we can move inline sourcepos out of experimental status, but it certainly needs more battle testing before that can happen. @kivikakk would you have time to release a |
Will do. |
This issue is for discussing how we might fix the accuracy of
sourcepos
for inline elements.Simple recreation
This should generate an AST that looks like this
Instead, the last
text
hassourcepos="2:1-2:1"
Related issue Unexpected sourcepos and PR Inline sourcepos fixes which moved outputting of inline sourcepos into an option,
experimental_inline_sourcepos
Two other examples captured in tests are
comrak/src/tests/core.rs
Lines 580 to 603 in 9f4d391
comrak/src/tests/core.rs
Lines 532 to 555 in 9f4d391
The text was updated successfully, but these errors were encountered: