The Best Personalized Fundamental Stocks Scanner Setup (Part 2)
A few weeks ago, I shared with you where to find 10 years of financial statement data and how to use AWS Athena to query the files stored in AWS S3. In this post, I am going to show you how to write a program to download the data and store it in S3.
Downloading the files from the SEC website using Python is not that difficult:
- Find the pattern in the download links
- Use the `requests` package to get the files (see the sketch right after this list)
- Unzip the files and upload them to S3
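For the download step itself, a small helper along these lines should work. This is just a sketch: `download_zip` is an illustrative name (not part of the original script), and note that SEC.gov expects automated clients to identify themselves with a descriptive User-Agent header.

```python
import requests

# SEC.gov asks automated clients to identify themselves; use your own details here
HEADERS = {"User-Agent": "Your Name your.email@example.com"}

def download_zip(url, dest_path):
    """Download one zip archive from the SEC site and save it to dest_path."""
    resp = requests.get(url, headers=HEADERS, timeout=60)
    resp.raise_for_status()
    with open(dest_path, "wb") as f:
        f.write(resp.content)
    return dest_path
```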
Those are the basic steps, but there are some details we need to handle. For example, how do we keep track of the files we have already processed? We don't want to keep duplicate copies of the data in S3. And how can we select only certain file types to be uploaded?
Without further ado, let me show you the code:
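To keep things readable here, what follows is a minimal sketch of the overall flow rather than the exact script: `download_zip` (from the sketch above) and `get_zip_urls` (sketched further down) are illustrative helper names, the bucket and prefix are placeholders, and `find_url_in_file` and `upload_dir_s3` are the functions described later in this post.

```python
import logging
import os
import zipfile

logging.basicConfig(filename="processed_urls.log",
                    level=logging.INFO,
                    format="%(message)s")

# URLs recorded in the log file on previous runs (see find_url_in_file below)
processed = find_url_in_file("processed_urls.log")

# Zip links scraped from the SEC index page (see get_zip_urls below);
# the symmetric difference leaves only the links not processed yet
todo = set(get_zip_urls()) ^ processed

for url in sorted(todo):
    # File names look like 2020q1.zip, so the quarter becomes the folder name
    zipdate = os.path.basename(url).replace(".zip", "")
    if not os.path.exists(zipdate):
        os.makedirs(zipdate)

    # Download the archive and extract its contents into the zipdate folder
    local_zip = download_zip(url, zipdate + ".zip")
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(zipdate)

    # Upload the extracted files to S3 (placeholder bucket/prefix),
    # then record the URL as processed
    upload_dir_s3(zipdate, "my-bucket", "financial-statements/" + zipdate)
    logging.info(url)
```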
It may look complicated, but the main part is the `for` loop: I basically loop through all the URLs found, create a `zipdate` folder if it does not exist yet, extract the downloaded zip files into that folder, and finally upload them to the S3 path.
To get the URLs for all the financial statement data, I used `BeautifulSoup` to find all `href` links on the webpage, then used a regex to keep only the links with a `.zip` extension.
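A sketch of that scraping step might look like the following; the index page URL here is an assumption about the current SEC site layout, and `get_zip_urls` is the illustrative name used in the sketch above.

```python
import re
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Index page listing the quarterly data sets -- an assumption about the SEC site layout
INDEX_URL = "https://www.sec.gov/dera/data/financial-statement-data-sets.html"
HEADERS = {"User-Agent": "Your Name your.email@example.com"}

def get_zip_urls():
    """Return absolute URLs for every link on the index page ending in .zip."""
    resp = requests.get(INDEX_URL, headers=HEADERS, timeout=60)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    zip_links = []
    for a in soup.find_all("a", href=True):
        if re.search(r"\.zip$", a["href"]):
            # Links on the page are relative, so resolve them against the index URL
            zip_links.append(urljoin(INDEX_URL, a["href"]))
    return zip_links
```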
Next, to maintain a record of processed URLs and prevent duplication, I used the `logging` package to create a log file containing the URLs already processed. Then I defined a function called `find_url_in_file` to retrieve those processed URLs and used a `set` operation to get the symmetric difference. Finally, I also check whether the files already exist in S3 inside the `upload_dir_s3` function.
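Here is roughly what those two helpers could look like. These are sketches under the assumptions above, not the original implementations: `find_url_in_file` reads the log back into a set, and the `upload_dir_s3` body below skips keys that already exist in the bucket using `list_objects_v2`, which may differ from how the original version does its check.

```python
import os
import boto3

def find_url_in_file(log_file="processed_urls.log"):
    """Read back the URLs recorded in the log file; return an empty set if it is new."""
    if not os.path.exists(log_file):
        return set()
    with open(log_file) as f:
        return {line.strip() for line in f if line.strip()}

def upload_dir_s3(local_dir, bucket, prefix):
    """Upload every file in local_dir to s3://bucket/prefix/, skipping existing keys."""
    s3 = boto3.client("s3")
    for name in os.listdir(local_dir):
        key = prefix + "/" + name
        # If an object with this key is already in the bucket, do not upload it again
        if s3.list_objects_v2(Bucket=bucket, Prefix=key).get("KeyCount", 0) > 0:
            continue
        s3.upload_file(os.path.join(local_dir, name), bucket, key)
```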