Allow downloading non periodic filings / non financial reports #32

Open
wants to merge 3 commits into
base: main

@jsertx jsertx commented Dec 27, 2024

Description

This adds support for downloading non-financial reports, which do not have a period of report field.

Considerations

  • The full list of submission types is not included because it is huge.
  • I thought about hard-coding the non_periodic_filing_types list in item_lists.py, but with the current approach there is no need to create a pull request to support other filings.
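
As a rough sketch of the idea only (the function name, the `header_fields` dictionary, and its `period_of_report` key are hypothetical and are not the code in this PR), a filing whose type appears in a user-supplied `non_periodic_filing_types` list could simply skip the period-of-report lookup:

```python
from typing import Optional


def resolve_period_of_report(
    filing_type: str,
    header_fields: dict,
    non_periodic_filing_types: list,
) -> Optional[str]:
    """Return the period of report, or None for non-periodic filing types.

    Hypothetical helper: `header_fields` stands in for whatever the crawler
    parses from a filing header; the key name is an assumption.
    """
    # Non-periodic filings (for example Form 144) carry no period of report,
    # so there is nothing to look up.
    if filing_type in non_periodic_filing_types:
        return None

    # Periodic filings are still expected to provide the field.
    period = header_fields.get("period_of_report")
    if period is None:
        raise ValueError(f"{filing_type} filing is missing a period of report")
    return period
```

Because the list is passed in rather than hard-coded, supporting another filing type would only mean extending the list the user supplies.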

Disclaimer

I am aware that the repository is focused on financial reports, but it might be worth adding support for other kinds of filings, as many people (like me) reach this repo looking to crawl any kind of EDGAR report.

@jsertx jsertx marked this pull request as ready for review December 27, 2024 13:19
@jsertx jsertx marked this pull request as draft December 27, 2024 13:20
@jsertx jsertx marked this pull request as ready for review December 27, 2024 13:26
@jsertx jsertx changed the title from "Allow downloading non periodic filings" to "Allow downloading non periodic filings / non financial reports" Dec 27, 2024
@eloukas
Collaborator

eloukas commented Dec 28, 2024

Thanks @jsertx for your PR.

Yes, while the toolkit started by focusing on financial-heavy reports with specific dates (like 10-Ks), I do see that there are more and more reports that could be interesting. And, of course, they are related to the repo.

Out of curiosity, could you tell us a bit more about how you use Form 144? Most of the research I've seen is directed towards 10-Ks, 10-Qs, 8-Ks and S-1 filings.

@eloukas
Collaborator

eloukas commented Dec 28, 2024

Couple things more, specifically about the PR:

  1. Can you maybe link us to two to three filings of Form 144 (actual SEC/EDGAR URLs) to see how the website is structured and what kind of fields are there?
  2. Note: The website does have some rare inconsistencies (I remember some "state of incorporation" fields were empty while crawling 10-K/10-Q data). Have you checked multiple filings to verify that your code works in batch? I've never worked with such forms btw, sorry for "double-checking". Just wanna make sure that your code applies to the majority of the Form 144 filings, and not just a few of them (for example, only those of one company!)
  3. Could you also post some screenshots of the output you have with the current proposed PR? Is anything changed in the metadata file btw?

Many thanks once again for your interest!

@jsertx
Author

jsertx commented Dec 29, 2024

Hey!

> Out of curiosity, could you tell us a bit more about how you use Form 144? Most of the research I've seen is directed towards 10-Ks, 10-Qs, 8-Ks and S-1 filings.

This form, also called the Notice of Proposed Sale of Securities, is filed to announce an insider's intention to sell securities.

I will answer the other comments later.

@jsertx
Author

jsertx commented Dec 29, 2024

Hey, thanks to you (and the rest of the contributors) for this great project. I’m happy to do my bit :)

> Couple things more, specifically about the PR:

> 1. Can you maybe link us to two to three filings of Form 144 (actual SEC/EDGAR URLs) to see how the website is structured and what kind of fields are there?

Re: GigaCloud, Berkshire Hathaway, Alibaba

> 2. Note: The website does have some rare inconsistencies (I remember some "state of incorporation" fields were empty while crawling 10-K/10-Q data). Have you checked multiple filings to verify that your code works in batch? I've never worked with such forms btw, sorry for "double-checking". Just wanna make sure that your code applies to the majority of the Form 144 filings, and not just a few of them (for example, only those of one company!)

Re: I was not aware of those inconsistencies, though I have tested with the tickers below and it downloaded the txt files:
["AAPL", "GCT", "GE", "BRK-A", "OXY", "TWLO", "BABA"]

> 3. Could you also post some screenshots of the output you have with the current proposed PR? Is anything changed in the metadata file btw?

Re: The crawler properly downloads the txt files for each report, but I found two issues:

  • The htm_file_link column is not filled.
  • The State of Inc column is filled with weird values.
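
A check along these lines could flag both problems across a batch. This is only a sketch: the column names come from the metadata file, but the function name is made up for illustration, and the rule that a valid State of Inc value is a two-character uppercase alphanumeric code is an assumption, not something the crawler enforces.

```python
import csv
import io
import re

# Assumption: a plausible "State of Inc" value is a two-character code such as "DE".
STATE_CODE = re.compile(r"^[A-Z0-9]{2}$")


def find_metadata_issues(csv_text):
    """Return (row_number, column, problem) tuples for suspicious metadata rows."""
    issues = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        # Flag rows where the link column is missing or blank.
        if not (row.get("htm_file_link") or "").strip():
            issues.append((row_number, "htm_file_link", "empty"))

        # Flag non-empty state values that do not look like a two-character code.
        state = (row.get("State of Inc") or "").strip()
        if state and not STATE_CODE.match(state):
            issues.append((row_number, "State of Inc", state))
    return issues
```

Running it over the metadata produced for several tickers would show whether the issues affect every Form 144 filing or only some of them.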

Let me get back to you in a few days, after I fix those issues and test more thoroughly.
By the way, if this PR gets merged, I plan to work on the extractor as well!

Best!
