
Posted: Nov 25, 2024

Updated: Nov 14, 2025


Prophet is an open-source library, available in Python and R, for predictive data analysis. It was developed by Facebook's Core Data Science Team and released in 2017, and eight years later it is still evolving through research and community contributions. Ongoing development includes further multivariate support, enhanced scalability for large datasets, and community-driven extensions. Active research involves integrating new statistical methods and improving performance on complex, high-frequency datasets.

 

Facebook Prophet’s Core Modeling Framework

Prophet is built on a statistical modeling framework, not a standard machine learning one. It uses an additive model to capture key components of time series data: trend, seasonality, and holidays. The approach relies on curve-fitting rather than optimization of parameters through iterative learning. Although Prophet doesn't inherently use a separate training and test dataset, you can manually split your data into historical data for fitting the model and future data for evaluating the model's performance.
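To make this concrete, below is a minimal sketch of the fit-and-predict workflow. The file name and the 12-week horizon are illustrative; Prophet only requires a DataFrame with a ds (datestamp) column and a y (value) column.

import pandas as pd
from prophet import Prophet

# Hypothetical weekly dataset with Prophet's required 'ds' and 'y' columns
df = pd.read_csv('weekly_revenue.csv')

model = Prophet()
model.fit(df)

# Extend the frame 12 weeks beyond the history, then forecast
future = model.make_future_dataframe(periods=12, freq='W')
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())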


Compared to models like ARIMA, which assume a linear relationship between past and future values, Prophet's strength is that it automatically detects "changepoints" and adjusts its forecasts accordingly, making it robust to outliers and flexible in handling seasonality and non-linear trends.

 

Oddly enough, one of the main draws of Prophet is also one of its core weaknesses. Its approach to handling changepoints can result in both underfitting and overfitting. In this post, I will share the methods that I previously used when implementing Prophet for weekly revenue forecasting.


The first step is not specific to Prophet. Follow best practice and do a simple plot of your historical data before jumping into the forecast. The visual may help you notice unexpected trends or gaps that could skew the forecast. Although Prophet is inherently robust to outliers, you should still manually exclude data that is obviously "wrong" or unhelpful, such as internal test campaigns or an unwanted spike in reseller activity. It's usually easier to do the data cleansing before putting data into Prophet.
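As a quick illustration, assuming the same ds/y DataFrame as above, a one-line pandas plot is enough for this sanity check:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('weekly_revenue.csv')  # hypothetical 'ds'/'y' dataset
df['ds'] = pd.to_datetime(df['ds'])
df.plot(x='ds', y='y', figsize=(10, 4), title='Weekly revenue')
plt.show()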


External Regressors

Prophet is designed for univariate time series analysis, but additional variables can be incorporated as external regressors.


The business that I was looking at ran sales at seemingly random times throughout the year, so I couldn't rely on holiday or yearly seasonality alone. Instead, I relied almost entirely on binary variables ("flags") indicating whether a sale occurred that week, and whether a sale started or ended that week.

model.add_regressor('sale_y_n', prior_scale=40.0)
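To show how this fits together, here is a hedged sketch of the full regressor workflow. The flag column names follow the description above, but the dataset and future values are illustrative; note that regressor values must also be supplied for the forecast horizon, for example from a planned promotion calendar.

import pandas as pd
from prophet import Prophet

# Hypothetical dataset with 'ds', 'y', and weekly sale flag columns
df = pd.read_csv('weekly_revenue.csv')

model = Prophet()
model.add_regressor('sale_y_n', prior_scale=40.0)  # 1 if a sale ran that week
model.add_regressor('sale_start_y_n')              # 1 if a sale started that week
model.add_regressor('sale_end_y_n')                # 1 if a sale ended that week
model.fit(df)

future = model.make_future_dataframe(periods=12, freq='W')
# Placeholder values: fill these from the known sale schedule
future['sale_y_n'] = 0
future['sale_start_y_n'] = 0
future['sale_end_y_n'] = 0
forecast = model.predict(future)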

For multiple related output variables, model each one separately as a univariate time series, or use the forecast values of one variable as inputs to the next. You can also consider more advanced models like VAR or LSTM.


Hyperparameter Tuning

In the previous code snippet, I set prior_scale=40.0, indicating that this regressor should be weighted heavily.


Prophet sets the hyperparameters below by default, so you may need to tune them to balance underfitting and overfitting.

changepoint_range=0.8
changepoint_prior_scale=0.05
seasonality_prior_scale=10
holidays_prior_scale=10
fourier_order=10
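One way to tune these is a small grid search over Prophet's built-in cross-validation. Below is a hedged sketch for changepoint_prior_scale; the grid, windows, and horizon are illustrative and should match your data's length and frequency.

import pandas as pd
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

df = pd.read_csv('weekly_revenue.csv')  # hypothetical 'ds'/'y' dataset

results = []
for cps in [0.01, 0.05, 0.5]:
    m = Prophet(changepoint_prior_scale=cps).fit(df)
    df_cv = cross_validation(m, initial='730 days', period='90 days', horizon='90 days')
    df_p = performance_metrics(df_cv)
    results.append((cps, df_p['rmse'].mean()))

print(sorted(results, key=lambda t: t[1])[0])  # lowest-RMSE setting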

Holidays

Prophet has built-in holidays for some countries, but its coverage is quite limited. If the country that you are looking for is not included in Prophet, use python-holidays to add the dates manually. Below is an example where I extracted Japan's public holiday dates from 2023 to 2025.

import holidays
import pandas as pd
from prophet import Prophet

# Build a holidays DataFrame with Prophet's expected 'ds' and 'holiday' columns
years = [2023, 2024, 2025]
jp_holidays = holidays.Japan(years=years)
japan_holidays = pd.DataFrame({
    'ds': list(jp_holidays.keys()),
    'holiday': list(jp_holidays.values())
})
model = Prophet(holidays=japan_holidays)

Data Normalization

Since Prophet is a statistical model, it does not require the same level of feature preprocessing as machine learning models. However, some data normalization may be beneficial depending on what you're working with. In my original dataset, the sale_y_n variable was an integer between 0 and 7, indicating how many days of that week fell within a sale period. The raw scale of 0 to 7 may introduce bias into the forecast, so I first normalized the sale_y_n variable using sklearn's MinMaxScaler.
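Here is a hedged sketch of that scaling step; the column name comes from this post, while the dataset is illustrative.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('weekly_revenue.csv')  # hypothetical; includes 'sale_y_n' in 0-7

scaler = MinMaxScaler()
# Rescale 0-7 to 0.0-1.0 before passing the column to add_regressor
df['sale_y_n'] = scaler.fit_transform(df[['sale_y_n']]).ravel()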

 

Performance Metrics

By default, Prophet uses an 80% uncertainty interval, meaning there's an 80% chance that the actual value will fall between yhat_lower and yhat_upper. However, Prophet's performance can be hit or miss depending on the use case. If it's still not working, try multiple approaches and keep the model that performs best on cross-validation, using the following metrics to evaluate accuracy (a sketch for computing them follows the list).

  • Mean absolute error (MAE): measures the average magnitude of errors, in the same units as the data.

  • Root mean squared error (RMSE): measures the square root of the average of squared differences. Squaring puts more weight on larger errors, so RMSE is a useful metric when larger errors are especially costly. What constitutes a "good" RMSE depends on the scale of your data.

  • Mean absolute percentage error (MAPE): measures the average of the absolute percentage errors, expressed as a percentage. Generally, MAPE under 10% is considered very good, 10-20% is considered good, and even up to 50% can be acceptable in some use cases.
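As promised above, here is a hedged sketch for computing all three metrics with scikit-learn and NumPy; the actuals and forecasts are illustrative.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([120.0, 135.0, 150.0, 160.0])  # illustrative actuals
y_pred = np.array([118.0, 140.0, 149.0, 155.0])  # illustrative forecasts

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(f'MAE: {mae:.2f}, RMSE: {rmse:.2f}, MAPE: {mape:.1f}%')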


When evaluating the accuracy of your model's predictions, follow the parsimony gradient: start with the simplest model and add complexity only as needed. The simplest option is a naive model, or a seasonal naive model if you have significant seasonality or repetitive sales patterns. Another straightforward method is a simple average or exponential smoothing with seasonal adjustments. For greater flexibility, look into machine learning models such as XGBoost, LightGBM, or Random Forest Regressor.
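For reference, a seasonal naive baseline takes only a couple of lines of pandas. This sketch assumes weekly data with a 52-week cycle; each forecast is simply the value from the same week one year earlier.

import pandas as pd

df = pd.read_csv('weekly_revenue.csv')  # hypothetical 'ds'/'y' dataset
df['seasonal_naive'] = df['y'].shift(52)  # value from 52 weeks prior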


Related Resources for Facebook Prophet



To learn more about Facebook Prophet, visit its website for tutorials and further documentation about forecasting. Also, check out the GitHub repo for examples and runnable notebooks.


For further reading, I recommend Forecasting Time Series Data with Facebook Prophet by Greg Rafferty. This is a beginner-friendly book that starts off with a brief introduction to time series methods that are used across the forecasting domain, then goes on to concepts and hands-on code examples that you can use right away.


From B2B tech to affiliate marketing, I’ve helped a wide range of businesses hit the ground running with Google Ads. In this post, I will explain how to set up a new Google search ads campaign, covering best practices across all of the necessary steps including campaign structure, assets, bid strategy, and conversion tracking. 


Review Historical Data (if available)

If your brand has never run any Google Ads campaigns before, skip to the next section. Otherwise, I recommend checking the historical data in GA4 before relaunching, because you may uncover valuable insights for optimizing campaign performance.


Google Ads traffic will automatically show up in GA4 as medium=cpc. In the Acquisition overview report, the card that shows sessions by session campaign is dedicated to Google Ads campaigns. Apart from the standard reports, you can also build custom exploration reports. For example, you could build a report that combines the keywords that you bid on, with the search term that a user entered on Google before clicking on the ad.


Conversion Tracking in GA4

In this section, I will go over conversion tracking in GA4. For conversion tracking in Google Tag Manager, please see my previous blog post.


Conversion tracking using GA4 is easy, but it requires several steps. First, link the Google Ads and GA4 accounts. To check whether the accounts are linked properly, in GA4, the Google Ads account should be listed under “Google Ads Links.” In Google Ads, the GA4 property should appear under Linked accounts. 


Next, import Key Events from GA4 to use as conversion actions in Google Ads. Here’s a helpful tutorial video on how to do this.



When creating your ads in Google Ads, make sure that the final URL has UTM parameters.


For example:

https://datachai.com/?utm_source=google&utm_medium=cpc&utm_term={keyword}&utm_campaign=datachai&utm_content=none&utm_campaign_id={campaignid}

The above UTM template includes two dynamic values. {keyword} and {campaignid} automatically insert the actual search query that triggered the ad, and the Google Ads campaign ID, respectively. Google Ads replaces these placeholders with real-time values when someone clicks your ad.
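If you manage many final URLs, it may be easier to build the tagged URL programmatically. Below is a hedged sketch using Python's standard library; the base URL and parameter values are illustrative.

from urllib.parse import urlencode

base_url = 'https://datachai.com/'
utm_params = {
    'utm_source': 'google',
    'utm_medium': 'cpc',
    'utm_term': '{keyword}',            # ValueTrack placeholder, filled by Google Ads
    'utm_campaign': 'datachai',
    'utm_content': 'none',
    'utm_campaign_id': '{campaignid}',  # ValueTrack placeholder
}
# safe='{}' keeps the curly braces unescaped so Google Ads can substitute them
final_url = base_url + '?' + urlencode(utm_params, safe='{}')
print(final_url)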


To ensure that your conversion tracking was set up correctly, do a conversion test by clicking on your own ad and generating a conversion, such as submitting a contact form. You should see your conversion in GA4 within a few seconds. Note that GA4 attribution takes 24-48 hours, so conversions imported into Google Ads will be delayed by that much.


Campaign Structure

A well-organized Google Ads account is built on a foundation of purpose-driven campaigns. I recommend creating separate campaigns for:

  • Non-Brand: Themed campaigns grouped by core products, services, or target industries.

  • Competitor: To target users considering your competitors.

  • Brand: To capture high-intent users already searching for your brand. Aim for 70%+ impression share on brand terms. To better control the CPC, further split your brand campaign into two.

    • Core brand: Target only the pure brand name (ex. "Data Chai")

    • Brand + modifier: Target keywords that combine your brand name plus a modifier, such as a service or product (ex. "Data Chai digital marketing", "Data Chai pricing")


A quick word on competitor campaigns: they can be expensive, so it's completely acceptable to pause this if it doesn't align with your current budget. However, a critical best practice is to add competitor names as negative keywords across all your other campaigns. This prevents your non-competitor ads from triggering on these costly searches, ensuring your budget is spent efficiently.


Google’s algorithm needs options to optimize performance, so aim for at least 2 ads per ad group to enable testing. If you’re unsure of what to test, minimal variation is fine. For example, use straightforward copy in one version (ex. “Marketing Automation Solutions”), versus urgent or price-sensitive copy in another (ex. “15% Off, Limited Time Offer”). To save time, draft ad copy in a spreadsheet for bulk uploads.


Keyword Match Types

For maximum control and cost-efficiency, I recommend starting with exact match keywords. Google's phrase match has become increasingly expensive and less precise in recent years. Instead, use phrase match strategically as a research tool to discover new, high-performing keywords to later add as exact match keywords.


Search Partner Network Placements

When setting up your Google search ad campaigns, you'll encounter the option to include Google search partners. Search partners include other Google properties and websites, such as the Google Play Store, Google Maps, and YouTube. I recommend turning off search partners: in my experience it generates mostly junk traffic and leads, and I've never seen or heard of any clients or colleagues getting positive results with it.


Bidding Strategy

In this section, I'll describe how to choose the appropriate bidding strategy for your business goal. Google offers several different manual and automated bidding methods: maximize clicks, maximize conversions, maximize conversion value, target impression share, and manual CPC.


If your business goal is lead generation or conversions, follow this basic strategy: start with manual CPC bidding to maintain strict budget control while the algorithm gathers data. Then, once your campaign consistently generates 30-50 conversions per month, consider switching to maximize conversions.


However, if your business goal is brand awareness, then you might try target impression share (TIS) bidding. TIS is an automated bidding strategy that lets marketers set a target for the percentage of impressions they'd like their ads to receive, relative to the estimated number of impressions available. I'd recommend targeting the "Top of results page" placement: the absolute top can be extremely expensive, while visibility drops significantly further down the page.


Note that target impression share is not a smart bidding strategy, but an automated bidding strategy. Google's algorithm will ignore your bid adjustment settings, with the one exception being a -100% device bid adjustment, which you can use to opt out of showing your ads on a specific device type.


Once you've set up and launched the ad campaigns, monitor them especially closely over the first 1-2 weeks. Avoid over-optimizing too soon, because Google's algorithm needs conversion data. I'd wait for a minimum of 50 clicks before acting on any early trends, such as underperforming ads or non-converting keywords.


---

Launching a new Google Search Ads campaign requires careful planning, from ad copywriting to technical implementation. Whether you’re setting up a new account or relaunching an existing one, these best practices will help you hit the ground running and maximize ROAS. Remember to start with manual CPC to control spend, give the algorithm enough data before making major adjustments, and leverage GA4 for reporting.


If you found this post helpful, share it with a colleague, or bookmark it for your next Google Ads campaign setup.


Ready to take the next step? If you're looking to launch a new Google Ads campaign and need a digital marketing specialist, feel free to reach out to me. I offer full service, including campaign setup, conversion tracking, and ongoing optimization and reporting.

I was working on a project where I needed to download a file from a remote SFTP server and upload it to Databricks. Python has several libraries for interacting with SFTP servers, such as netmiko, pysftp, and paramiko's scp, but the solution that I ended up using is a Python script with the paramiko client.


What is SFTP?

SFTP (SSH File Transfer Protocol) is a secure file transfer protocol that operates over the SSH (Secure Shell) protocol. It enables users to access, transfer, and manage files securely over a network, and can be accessed via free tools such as Cyberduck, FileZilla, and WinSCP. While newer protocols like HTTP/3 and WebDAV have come out, SFTP continues to be a relevant tool for secure file transfers.


What is paramiko?

According to the paramiko.org documentation, paramiko is a Python implementation of the SSHv2 protocol, providing both client and server-side functionality. To connect to the remote server and transfer files or perform any operations, we first need to create a paramiko client, as shown below.

import paramiko

ssh_client = paramiko.SSHClient()

Connecting to the Remote Server

Once we have created a paramiko client, we can then use the username and password for authentication to connect to the server.

ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh_client.connect(hostname=host, port=port, username=username, password=password)

The best practice is to store your login credentials in an encrypted format, such as in Azure Key Vault, rather than hardcoding them.

AZURE_KEY_VAULT = "abc123"
password = dbutils.secrets.get(scope = AZURE_KEY_VAULT, key='supersecretpassword')

This not only enhances security, but also eliminates the need to update credentials across multiple code files whenever the username or password changes.


Transferring Files

To transfer files, we need to first create an SFTP session between the client and the server, using the SSHClient that we have created earlier.

ftp = ssh_client.open_sftp()

Then, call ftp.get() to download the file from the SFTP server to DBFS or another local filesystem.

dbfs_file_path = "/dbfs/xxxyyy.csv"   # update: destination path on DBFS
ftp_file_path = "/Import/xxxyyy.csv"  # update: source path on the SFTP server
ftp.get(ftp_file_path, dbfs_file_path)

If you need to upload a file from the local system to the SFTP server, use ftp.put:

ftp.put(dbfs_file_path, ftp_file_path)

Once the file transfer is done, close the session by calling the close() function on both the SFTP session and the SSH client.

ftp.close()
ssh_client.close()
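Putting it all together, here is a hedged end-to-end sketch. The host, paths, and secret names are placeholders, and dbutils is only available inside Databricks; the try/finally blocks ensure the connections are closed even if the transfer fails.

import paramiko

host = 'sftp.example.com'  # placeholder
port = 22
username = 'sftp_user'     # placeholder
password = dbutils.secrets.get(scope='abc123', key='supersecretpassword')

ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh_client.connect(hostname=host, port=port, username=username, password=password)
try:
    ftp = ssh_client.open_sftp()
    try:
        ftp.get('/Import/xxxyyy.csv', '/dbfs/xxxyyy.csv')  # SFTP -> DBFS
    finally:
        ftp.close()
finally:
    ssh_client.close()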

For more on SSH sessions with paramiko, check out this tutorial video by DevOps Journey, which also covers additional settings such as look_for_keys and AutoAddPolicy.

