Add tree-sitter based highlighter #5099
{ create_tree_sitter_highlighter, &tree_sitter_desc } });
registry.insert({
    "tree-sitter-injection",
    { create_tree_sitter_injection_highlighter, &tree_sitter_injection_desc } });
Personally, I don't care about highlighting, but I'm interested in commands to select syntax tree nodes etc.
LSP provides some of that but it's not optimized for it.
In general, integrations with external tools live in scripts in rc/.
I wonder if that could work for this feature too?
We can add any missing generic highlighter types like the InjectionHighlighterApplier to support these cases.
I think there is great value in having an obvious boundary between C++ core and scripts. It keeps us honest.
With the shared library approach, tree sitter can do something that other integrations cannot.
I wonder how this differs from https://github.com/phaazon/kak-tree-sitter ?
As a user, I think it would be great if we concentrate most effort on one approach.
In either case, I think tree sitter integration is highly valuable and I'd probably follow whichever approach gains traction.
Thanks
Hello, I think it’s an interesting approach, but at the same time, it bothers me a bit. Not because I’m the author of However, I do think that what I made with KTS is super complex and that Kakoune needs more tooling to make it easier to integrate (and I think @krobelus would also benefit from that for
This is indeed an impressive PR, but I do not intend to merge it, for a few reasons. First, I do not want to have multiple competing built-in highlighting implementations to maintain, and I do not want to rely solely on tree-sitter for highlighting. Having both will likely lead to one bitrotting over time. Second (and most importantly), as noted by @phaazon, this goes against Kakoune's design principles: it introduces a dependency directly in core for functionality that could be implemented externally. I do agree that there are some limitations at the moment. I have not looked at kak-tree-sitter, but I suspect it is a far more complex codebase than what you did in that PR. I hope we can find a way to simplify how external plugins that rely on buffer content work.
Motivation
I just spent some time with
About the partial updates / edit in place, I have this still open about that topic. It’s not something I have started working on because I want to stabilize the performance and features first (and I think it’s more important to have semantic text-objects before going for full optimization), but clearly yes, it can have a negative impact on “how fast you see highlighting”. Also, the speed at which
if the slow part is parsing you can probably work around it by computing a diff so you can use the incremental API.
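To make the diff suggestion concrete, here is a minimal sketch of deriving one edit from two buffer snapshots by trimming their common prefix and suffix. The `ByteEdit` struct and `compute_edit` function are hypothetical names for illustration; the three fields mirror the byte offsets of tree-sitter's `TSInputEdit`, and a real integration would also have to fill in the row/column points before calling `ts_tree_edit` and reparsing with the old tree. It is not taken from this PR or from kak-tree-sitter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical edit description. The three fields correspond to the byte
// offsets of tree-sitter's TSInputEdit; the row/column points are omitted.
struct ByteEdit
{
    std::size_t start_byte;   // first byte that differs
    std::size_t old_end_byte; // end of the replaced range in the old text
    std::size_t new_end_byte; // end of the replacement in the new text
};

// Derive a single edit from two snapshots by trimming the common prefix
// and the common suffix. Everything in between is treated as replaced.
ByteEdit compute_edit(std::string_view old_text, std::string_view new_text)
{
    const std::size_t max_common = std::min(old_text.size(), new_text.size());

    std::size_t prefix = 0;
    while (prefix < max_common && old_text[prefix] == new_text[prefix])
        ++prefix;

    // The suffix must not overlap the prefix in the shorter text.
    std::size_t suffix = 0;
    while (suffix < max_common - prefix
           && old_text[old_text.size() - 1 - suffix]
              == new_text[new_text.size() - 1 - suffix])
        ++suffix;

    assert(prefix + suffix <= max_common);
    return { prefix, old_text.size() - suffix, new_text.size() - suffix };
}
```

This produces one edit per update even when the user changed several distant places, so the reparsed range can be larger than strictly necessary. It still lets the parser reuse the untouched parts of the old tree, which is the point of the incremental API.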
not necessarily; you can write to Kakoune's socket directly, see https://github.com/tomKPZ/pykak
That's an interesting problem indeed.
Can we have something like Vim's text properties?
So, I wanted to share some thoughts as a relatively new kakoune user that finds the kakoune design very interesting.
I'm not sure this is entirely fair. tree-sitter as it's used here is not an external binary or tool. It is a third-party dependency, beyond the standard C & C++ libraries and the POSIX APIs ... but it's a third-party dependency that's very well regarded for solving a very hard problem that kakoune already tries to solve as part of its core offering. This seems less like breaking the kakoune ideology (as I understand it) and more like outsourcing a hard problem to the broader community (which arguably makes kakoune easier to maintain in the long term). The tree-sitter dependency isn't particularly onerous either, as it depends only on the standard C library. So, it's not a language or platform level standard library/API, but it's increasingly the standard library for this domain.
This also seems like an area where pragmatism may be necessary. There are great command line tools for finding files, listing files, source control, etc. However, that sort of tool just does not exist at an edit level. Also, as someone who works in a codebase with large C++ source files, I would be concerned about the performance implications of marshaling syntax highlighting annotations in a way that doesn't have a lot of overhead or the potential for a litany of escape-sequence related bugs with arbitrary input.
A daemon is potentially viable (and that seems to be the approach the community has taken), however a tree-sitter daemon is (while impressive!) particularly exotic (AFAIK). This project is also written in Rust, which is a significant increase in complexity (in terms of the size of the dependency chain) required to get this functionality vs directly integrating tree-sitter. I've tried to look more into exactly how the kak-tree-sitter project works while writing this up, but unfortunately the project's website seems to be having issues that make browsing the source very difficult (so I've worked off the old GitHub sources).
I found this old comment when investigating the state of tree-sitter as it relates to kakoune: #50 (comment) It really seems like tree-sitter may be the answer to that research problem; I think it would be unfortunate to not integrate tree-sitter since the CST parser and parse rules are handled by a third party and the existing functionality provided via regex highlighters is upgraded/outclassed. My thought on this would be (even if not via this specific PR's implementation):
That, to me, seems like it would provide a nice compromise between ideology and pragmatism. Kakoune would still stay far, far away from the "kitchen sink" that's emacs, but would pick up an easy to use, mature, and highly competent syntax highlighting engine. Thanks for reading; I hope these are some helpful thoughts.
My unsolicited two cents: I honestly don't see the point in tree-sitter as more and more LSP servers start adding semantic highlighting. A classic example that I think is unsolvable: in Zig you can do:
const debug = @import("std").debug;
const print = @import("std").debug.print;
Where However, as more and more LSPs are adding semantic highlighting, what's the point of getting wrong highlighting, no matter how fast you go? I am surprised this is not brought up more often, but maybe I am missing something here.
Thanks for the read! So just to respond to this... I think language servers are really interesting, but they're ultimately different. Using a language server is the ideal because it understands the semantics of the language and can do "the best" that can be done. However, language servers have a lot of overhead associated with them (both technical and human) and there's wide variance in the quality of various language server projects. They can be hard to configure correctly in projects with abnormal build graphs. Personally, my attempts to use clangd (in emacs, Zed, and Helix) on the code base I work on professionally have been very frustrating. When I worked on clang itself I did get it working, but the clangd overhead was just absurd (I forget specifics as it's been a few years) since clang is such a large code base. The CSTs used by tree-sitter are pretty much as close as you can get to semantic-aware highlighting without having a true compiler running (and as someone who works on compilers professionally ... those can be expensive). So, I think from a sort of "technology" standpoint tree-sitter is "the" solution for something that can actually be reasonably integrated into the binary. It's a good baseline balance of crowdsourced language parsing, performance, competency, and relatively low dependency overhead. It's a big step up from just using regex.
@DarkArc Yes, I entirely agree with you on clangd: whenever I work on large C++ projects, all my CPUs go to 100% when I modify heavily templated code. ZLS is really fast and lightweight, though. Also, go-pls and js/ts language servers work fine with semantic highlighting with no noticeable overhead. It's a shame Python doesn't have an open-source LSP with semantic highlighting, though.
Description
This introduces a tree-sitter highlighter that maps tree-sitter captures to kakoune faces.
This allows kakoune to highlight some recursive grammars that the regions-based highlighting is not able to parse (e.g. Nix, Shell, Python).
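As a rough illustration of why recursion matters here, the sketch below contrasts a flat "scan to the next delimiter" search with a recursive scan that follows `${ ... }` interpolation. It is a toy under loud assumptions: the function names are hypothetical, escapes and nested braces are ignored, and it is neither kakoune's regions implementation nor how tree-sitter parses.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Flat scan: assume a string ends at the next '"' after the opening one.
std::size_t naive_string_end(std::string_view src, std::size_t open_quote)
{
    return src.find('"', open_quote + 1);
}

std::size_t string_end(std::string_view src, std::size_t open_quote);

// Scan an interpolated expression starting just after "${".
// Returns the index of the closing '}' or npos if it is unterminated.
// Simplification: nested braces inside the expression are not handled.
std::size_t expression_end(std::string_view src, std::size_t pos)
{
    while (pos < src.size())
    {
        if (src[pos] == '}')
            return pos;
        if (src[pos] == '"')
        {
            // A string inside the expression: skip it recursively.
            pos = string_end(src, pos);
            if (pos == std::string_view::npos)
                return pos;
        }
        ++pos;
    }
    return std::string_view::npos;
}

// Recursive scan: returns the index of the '"' that really closes the
// string opened at open_quote, or npos if it is unterminated.
// Simplification: backslash escapes are not handled.
std::size_t string_end(std::string_view src, std::size_t open_quote)
{
    assert(open_quote < src.size() && src[open_quote] == '"');
    std::size_t pos = open_quote + 1;
    while (pos < src.size())
    {
        if (src[pos] == '"')
            return pos;
        if (src[pos] == '$' && pos + 1 < src.size() && src[pos + 1] == '{')
        {
            // Jump to the '}' that closes the interpolation.
            pos = expression_end(src, pos + 2);
            if (pos == std::string_view::npos)
                return pos;
        }
        ++pos;
    }
    return std::string_view::npos;
}
```

On an input such as `"outer string ${ let var = "inner string"; in var }"`, the flat scan stops at the quote that opens the inner string, while the recursive scan reaches the final quote.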
Example - Nix
This is an example from a Nix codebase I recently visited.
The problem here is that a Nix string can contain interpolated Nix expressions, which in turn can have nested strings with the same delimiter (e.g. "outer string ${ let var = "inner string"; in var }").
Building
This highlighter is optional and can be excluded by passing tree_sitter=no to make.
Related Issues
TODO
%val{runtime}/grammars
)combined
injections this would allow us to e.g. highlight the embedded Bash in the example above)