Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc-gen: migrate scalar functions (encoding & regex) documentation #13919

Merged

Conversation

Chen-Yuan-Lai
Copy link
Contributor

@Chen-Yuan-Lai Chen-Yuan-Lai commented Dec 27, 2024

Which issue does this PR close?

Part of #13671 .

Rationale for this change

What changes are included in this PR?

As discussed in #13671, this PR will migrate the builtin binary string and regular expression functions documentation that currently support migration.

Are these changes tested?

Are there any user-facing changes?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Chen-Yuan-Lai 🎉

I noticed the CI is failing on this PR: https://github.com/apache/datafusion/actions/runs/12517861740/job/34921789845?pr=13919

It looks like some of the content is getting lost - when I ran ./dev/update_function_docs.sh locally, the results seem to show the regexp functions having lost their documentation 🤔

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git diff
diff --git a/docs/source/user-guide/sql/scalar_functions.md b/docs/source/user-guide/sql/scalar_functions.md
index be4f5e56b..f9585ce12 100644
--- a/docs/source/user-guide/sql/scalar_functions.md
+++ b/docs/source/user-guide/sql/scalar_functions.md
@@ -1729,12 +1729,12 @@ uuid()
 Decode binary data from textual representation in string.

-decode(expression, format)
+decode(e xpression, format)


#### Arguments

-- **expression**: Expression containing encoded string data
+- **expression**: Expression containing string or binary data
- **format**: Same arguments as [encode](#encode)

**Related functions**:
@@ -1758,167 +1758,6 @@ encode(expression, format)

- [decode](#decode)

-## Regular Expression Functions
-
-Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
-regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
-(minus support for several features including look-around and backreferences).
-The following regular expression functions are supported:
-
-- [regexp_count](#regexp_count)
-- [regexp_like](#regexp_like)
-- [regexp_match](#regexp_match)
-- [regexp_replace](#regexp_replace)
....

#[user_doc(
doc_section(label = "Binary String Functions"),
description = "Decode binary data from textual representation in string.",
syntax_example = "decode(e xpression, format)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syntax_example = "decode(e xpression, format)",
syntax_example = "decode(expression, format)",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb thanks for the correction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alamb for the correction, I have fixed the typo

@goldmedal
Copy link
Contributor

I think the PR is just a part of #13671. Modifying the close keyword to part of in the PR description is better.

@Chen-Yuan-Lai
Copy link
Contributor Author

Sure! I'll modfy all the other PRs. Thanks @goldmedal

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 3, 2025
- [regexp_like](#regexp_like)
- [regexp_match](#regexp_match)
- [regexp_replace](#regexp_replace)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be the case the entire doc for 4 functions is vanished, checking

Copy link
Contributor Author

@Chen-Yuan-Lai Chen-Yuan-Lai Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have no idea why the documentation of regular expression functions was lost. Thanks @comphead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason being the document printer checks first the doc section if its doesn't match with predefined enum the printer skips such function. The reason why it skips we need add a description to doc_section to be as

    doc_section(
        label = "Regular Expression Functions",
        description = r#"Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
(minus support for several features including look-around and backreferences).
The following regular expression functions are supported:"#,
    ),

The description doesn't match as in attribute the description is not set

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file a ticket to make user_doc to work with predefined consts.

Currently the doc_section attribute must match fully the predefined DocSection consts, for example

    pub const DOC_SECTION_REGEX: DocSection = DocSection {
        include: true,
        label: "Regular Expression Functions",
        description: Some(
            r#"Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
(minus support for several features including look-around and backreferences).
The following regular expression functions are supported:"#,
        ),
    };

In case of doc_section attribute contains any mismatches such function will be silently ignored. I believe we can make doc macros more smart like:

  • the doc_section will be just a string
  • using the string find correspondent const from scalar_doc_sections.doc_sections()
  • in datafusion/macros/src/user_doc.rs when constructing the builder use the const instead of building DocSection manually

@alamb
Copy link
Contributor

alamb commented Jan 8, 2025

Is this PR ready to go? Or are we waiting for something else to finisih it up?

@comphead
Copy link
Contributor

comphead commented Jan 8, 2025

Its not ready yet, it can be fixed by #14001 (preferrable) or alternatively I can do manual correction. I'm sending this to draft for now

@comphead comphead marked this pull request as draft January 8, 2025 22:15
@Chen-Yuan-Lai Chen-Yuan-Lai force-pushed the migrate_scalar_functions_encoding_regex branch from 072d756 to c832494 Compare January 15, 2025 17:40
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Jan 15, 2025
@Chen-Yuan-Lai Chen-Yuan-Lai marked this pull request as ready for review January 16, 2025 01:13
@Chen-Yuan-Lai
Copy link
Contributor Author

@alamb @comphead I think this PR is ready

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @Chen-Yuan-Lai

@comphead comphead merged commit 2159ba3 into apache:main Jan 16, 2025
25 checks passed
@Chen-Yuan-Lai Chen-Yuan-Lai deleted the migrate_scalar_functions_encoding_regex branch January 24, 2025 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants