VectorBT Pro - MultiAsset Data Acquisition
In this tutorial, we will acquire M1 (1-minute) data for several forex currency pairs from Dukascopy, a free data provider. The acquired data will be saved to an HDF5 (.h5) file for use in a VectorBT Pro backtesting project. We will use a Node.js package called dukascopy-node to download the M1 historical data for the following currency pairs.

You can find the installation instructions and other details for this node package here: https://github.com/Leo4815162342/dukascopy-node

Multi-Asset Market Data Acquisition with Dukascopy

npx dukascopy-node -i gbpaud -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i gbpaud -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv

npx dukascopy-node -i eurgbp -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i eurgbp -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv

npx dukascopy-node -i gbpjpy -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i gbpjpy -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv

npx dukascopy-node -i usdjpy -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i usdjpy -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv

npx dukascopy-node -i usdcad -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i usdcad -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv

npx dukascopy-node -i eurusd -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i eurusd -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv

npx dukascopy-node -i audusd -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i audusd -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv

npx dukascopy-node -i gbpusd -p ask -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
npx dukascopy-node -i gbpusd -p bid -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
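Typing sixteen near-identical commands is error-prone; the same downloads can be scripted with a small shell loop (a sketch; the `echo` keeps it a dry run that only prints the commands, remove it to actually start the downloads):

```shell
# Dry-run loop over every pair and price side; remove `echo` to download for real
pairs="gbpaud eurgbp gbpjpy usdjpy usdcad eurusd audusd gbpusd"
for pair in $pairs; do
  for side in ask bid; do
    echo npx dukascopy-node -i "$pair" -p "$side" -from 2019-01-01 -to 2022-12-31 -t m1 -v true -f csv
  done
done
```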

The acquired bid and ask files need to be averaged to produce a single mid-price OHLCV series per pair, which is then saved into an HDF5 (.h5) file. The code for these steps is as follows:

import pandas as pd

def read_bid_ask_data(ask_file: str, bid_file: str, set_time_index: bool = False) -> pd.DataFrame:
    """Read and combine the bid & ask CSV files of Dukascopy historical market data into a single OHLCV dataframe."""
    df_ask = pd.read_csv(ask_file)
    df_bid = pd.read_csv(bid_file)
    merged_df = pd.merge(df_bid, df_ask, on='timestamp', suffixes=('_bid', '_ask'))
    ## Mid prices: average bid and ask for open/close; take the widest extremes for high/low
    merged_df['open'] = (merged_df['open_ask'] + merged_df['open_bid']) / 2.0
    merged_df['close'] = (merged_df['close_ask'] + merged_df['close_bid']) / 2.0
    merged_df['high'] = merged_df[['high_ask', 'high_bid']].max(axis=1)
    merged_df['low'] = merged_df[['low_ask', 'low_bid']].min(axis=1)
    merged_df['volume'] = merged_df['volume_bid'] + merged_df['volume_ask']

    ## Drop empty bars (e.g. weekends) and convert the millisecond timestamps
    ## produced by the dukascopy-node package into datetimes
    merged_df = merged_df[merged_df["volume"] > 0.0].reset_index(drop=True)
    merged_df['time'] = pd.to_datetime(merged_df['timestamp'], unit='ms')
    merged_df.drop(columns=["timestamp"], inplace=True)

    final_cols = ['time', 'open', 'high', 'low', 'close', 'volume', 'volume_bid', 'volume_ask']

    if set_time_index:
        merged_df = merged_df.set_index("time")
        return merged_df[final_cols[1:]]
    return merged_df[final_cols].reset_index(drop=True)

## Specify FileNames of Bid / Ask data downloaded from DukaScopy
bid_ask_files = {
    "GBPUSD" : {"Bid": "gbpusd-m1-bid-2019-01-01-2023-01-13.csv",
                "Ask": "gbpusd-m1-ask-2019-01-01-2023-01-13.csv"},
    "EURUSD" : {"Bid": "eurusd-m1-bid-2019-01-01-2023-01-13.csv",
                "Ask": "eurusd-m1-ask-2019-01-01-2023-01-13.csv"},
    "AUDUSD" : {"Bid": "audusd-m1-bid-2019-01-01-2023-01-13.csv",
                "Ask": "audusd-m1-ask-2019-01-01-2023-01-13.csv"},
    "USDCAD" : {"Bid": "usdcad-m1-bid-2019-01-01-2023-01-13.csv",
                "Ask": "usdcad-m1-ask-2019-01-01-2023-01-13.csv"},
    "USDJPY" : {"Bid": "usdjpy-m1-bid-2019-01-01-2023-01-13.csv",
                "Ask": "usdjpy-m1-ask-2019-01-01-2023-01-13.csv"},
    "GBPJPY" : {"Bid": "gbpjpy-m1-bid-2019-01-01-2023-01-13.csv",
                "Ask": "gbpjpy-m1-ask-2019-01-01-2023-01-13.csv"},
    "EURGBP" : {"Bid": "eurgbp-m1-bid-2019-01-01-2023-01-16.csv",
                "Ask": "eurgbp-m1-ask-2019-01-01-2023-01-16.csv"},
    "GBPAUD" : {"Bid": "gbpaud-m1-bid-2019-01-01-2023-01-16.csv",
                "Ask": "gbpaud-m1-ask-2019-01-01-2023-01-16.csv"}                                                                           
}

## Write everything into one single HDF5 file indexed by keys for the various symbols
source_folder_path = "/Users/John.Doe/Documents/Dukascopy_Historical_Data/"
output_file_path = "/Users/John.Doe/Documents/qqblog_vbt_pro_tutorials/data/MultiAsset_OHLCV_3Y_m1.h5"

for symbol in bid_ask_files.keys():
    print(f'\n{symbol}')
    ask_csv_file = source_folder_path + bid_ask_files[symbol]["Ask"]
    bid_csv_file = source_folder_path + bid_ask_files[symbol]["Bid"]
    print("ASK File PATH:",ask_csv_file,'\nBID File PATH:',bid_csv_file)
    df = read_bid_ask_data(ask_csv_file, bid_csv_file, set_time_index = True)
    df.to_hdf(output_file_path, key=symbol)

Output

GBPUSD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpusd-m1-ask-2019-01-01-2023-01-13.csv 
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpusd-m1-bid-2019-01-01-2023-01-13.csv

EURUSD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/eurusd-m1-ask-2019-01-01-2023-01-13.csv 
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/eurusd-m1-bid-2019-01-01-2023-01-13.csv

AUDUSD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/audusd-m1-ask-2019-01-01-2023-01-13.csv 
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/audusd-m1-bid-2019-01-01-2023-01-13.csv

USDCAD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/usdcad-m1-ask-2019-01-01-2023-01-13.csv 
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/usdcad-m1-bid-2019-01-01-2023-01-13.csv

USDJPY
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/usdjpy-m1-ask-2019-01-01-2023-01-13.csv 
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/usdjpy-m1-bid-2019-01-01-2023-01-13.csv

GBPJPY
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpjpy-m1-ask-2019-01-01-2023-01-13.csv 
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpjpy-m1-bid-2019-01-01-2023-01-13.csv

EURGBP
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/eurgbp-m1-ask-2019-01-01-2023-01-16.csv 
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/eurgbp-m1-bid-2019-01-01-2023-01-16.csv

GBPAUD
ASK File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpaud-m1-ask-2019-01-01-2023-01-16.csv 
BID File PATH: /Users/john.doe/Documents/Dukascopy_Historical_Data/gbpaud-m1-bid-2019-01-01-2023-01-16.csv
💡
Note: The free M1 data provided by Dukascopy has some gaps, and one should validate data quality by comparing it against other, preferably paid, data sources.
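A quick way to quantify those gaps (a minimal sketch, assuming a time-indexed OHLCV frame like the one returned by read_bid_ask_data with set_time_index=True; the helper name count_missing_minutes is our own) is to reindex against a complete 1-minute calendar and count the missing bars:

```python
import pandas as pd

def count_missing_minutes(df: pd.DataFrame) -> int:
    """Count the 1-minute bars absent from a time-indexed OHLCV frame."""
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="1min")
    return len(full_index.difference(df.index))

# Synthetic demo: a 10-minute frame with two bars removed
idx = pd.date_range("2022-01-03 09:00", periods=10, freq="1min")
demo = pd.DataFrame({"close": [float(i) for i in range(10)]}, index=idx).drop(idx[[3, 7]])
print(count_missing_minutes(demo))  # 2
```

Note that forex markets close over the weekend, so a realistic check would exclude weekend minutes before counting.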

Binance Crypto Data

For crypto fans, VectorBT Pro directly provides a wrapper to fetch data from Binance:

## Acquire multi-asset 1m crypto data from Binance
import vectorbtpro as vbt

data = vbt.BinanceData.fetch(
    ["BTCUSDT", "ETHUSDT", "BNBUSDT", "XRPUSDT", "ADAUSDT"],
    start="2019-01-01 UTC",
    end="2022-12-01 UTC",
    timeframe="1m"
)

## Save acquired data locally for persistence
data.to_hdf("/Users/john.doe/Documents/vbtpro_tuts_private/data/Binance_MultiAsset_OHLCV_3Y_m1.h5")
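Either store can later be reloaded one symbol at a time with plain pandas, since each symbol sits under its own HDF5 key (a minimal round-trip sketch; the short file name here is a stand-in for the full output paths above, and writing a tiny frame first just makes the example self-contained):

```python
import pandas as pd

# Stand-in for the full output paths used above
store_path = "MultiAsset_OHLCV_3Y_m1.h5"

# Write a tiny frame under a symbol key, then read it back by that key
pd.DataFrame({"close": [1.2345, 1.2350]}).to_hdf(store_path, key="GBPUSD")
df_gbpusd = pd.read_hdf(store_path, key="GBPUSD")

# List every symbol key stored in the file
with pd.HDFStore(store_path, mode="r") as store:
    print(store.keys())  # ['/GBPUSD']
```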