Research

Apifiny Algo supports the full life cycle of trading strategy research: market data recording, data generation, model fitting, and simulation (aka back-testing).

Market data

You can use self-recorded or third-party market data for simulation and data generation.

Record market data

Some pre-recorded sample data can be downloaded here. To use it for simulation or data generation, unzip it into a directory and add players to your JSON configuration file like this (make sure the path matches your data directory):

"players": [
    ["BTCUSDT.HUOBI_Player", ["CobJsonPlayer", {"port": ["BTCUSDT", "HUOBI"], "path": "/data/cob_data"}]], 
    ["ETHUSDT.HUOBI_Player", ["CobJsonPlayer", {"port": ["ETHUSDT", "HUOBI"], "path": "/data/cob_data"}]]
]
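When many symbols are involved, the players list can be generated rather than typed by hand. The following is a minimal sketch that reproduces the layout shown above; the helper name make_players is hypothetical and not part of Algo.

```python
import json

def make_players(ports, player_type, path):
    """Build a "players" list in the same layout as the sample above.

    ports is a list of (symbol, exchange) pairs. make_players is a
    hypothetical helper, not an Algo API.
    """
    return [
        [f"{symbol}.{exchange}_Player",
         [player_type, {"port": [symbol, exchange], "path": path}]]
        for symbol, exchange in ports
    ]

players = make_players(
    [("BTCUSDT", "HUOBI"), ("ETHUSDT", "HUOBI")],
    "CobJsonPlayer",
    "/data/cob_data",  # change to match your data directory
)
print(json.dumps({"players": players}, indent=4))
```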
We also offer two applications for recording market data. One receives the market data and the other writes it to file:

  • ccc_mktpub: Receives market data and publishes it to the specified Redis server
  • ccc_record: Reads the corresponding data from Redis and saves it locally

Configuration files determine which products on which platforms are recorded in real time. For concrete configuration files, please refer to the ${ALGO_HOME}/examples/ccc_mkput example.

ccc_mktpub.json:

{
    "instance": {
        "log_path": "/local/cchome/devlogs",
        "levels": 20  
    },
     "servers": {
        "redis_server": "127.0.0.1"
    },
    "symbols": [        
        {"port": ["BTCUSDT", "HUOBI"]},
        {"port": ["ETHUSDT", "HUOBI"]},
        {"port": ["DOGEUSDT", "HUOBI"]},
        {"port": ["SOLUSDT", "HUOBI"]},
        {"port": ["ADAUSDT", "HUOBI"]}
    ]
}

ccc_record.json:

{
    "data_path": "/data/cc/prod/cc_record2",
    "instance": {
        "tradeDate":1,
        "is_redis": true,
        "is_optimization":true // true: only save BBO data.   false: save all data
    },  
    "servers": {
        "redis_server": "127.0.0.1"
    },
    "symbols": [        
        {"port": ["BTCUSDT", "HUOBI"]},
        {"port": ["ETHUSDT", "HUOBI"]},
        {"port": ["DOGEUSDT", "HUOBI"]},
        {"port": ["SOLUSDT", "HUOBI"]},
        {"port": ["ADAUSDT", "HUOBI"]}
    ]
}   
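Both files list the same symbols and the same Redis server, so it can help to generate them from one source to keep the publisher and the recorder in sync. The sketch below only mirrors the keys shown in the two samples above; the function names are hypothetical, and any other keys the applications may accept are not covered.

```python
import json

# Single source of truth for the symbols to publish and record.
SYMBOLS = [("BTCUSDT", "HUOBI"), ("ETHUSDT", "HUOBI")]

def symbol_entries(symbols):
    return [{"port": [symbol, exchange]} for symbol, exchange in symbols]

def mktpub_config(symbols, redis_server="127.0.0.1"):
    # Mirrors the ccc_mktpub.json sample above.
    return {
        "instance": {"log_path": "/local/cchome/devlogs", "levels": 20},
        "servers": {"redis_server": redis_server},
        "symbols": symbol_entries(symbols),
    }

def record_config(symbols, redis_server="127.0.0.1", bbo_only=True):
    # Mirrors the ccc_record.json sample above.
    # is_optimization: true saves only BBO data, false saves all data.
    return {
        "data_path": "/data/cc/prod/cc_record2",
        "instance": {"tradeDate": 1, "is_redis": True, "is_optimization": bbo_only},
        "servers": {"redis_server": redis_server},
        "symbols": symbol_entries(symbols),
    }

mktpub = mktpub_config(SYMBOLS)
record = record_config(SYMBOLS)
print(json.dumps(mktpub, indent=4))
print(json.dumps(record, indent=4))
```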

3rd party market data

Algo also supports 3rd party market data.

tardis.dev

tardis.dev offers a historical data subscription with a 15-day free trial. To use their data for simulation, download it with Python code like the example below, then set up your players with type TardisPlayer and the corresponding data path:

# pip3 install tardis-dev

from tardis_dev import datasets, get_exchange_details
import logging
import os
import pandas as pd

TARDIS_API = "Please get an API key from tardis.dev"


EXCHANGE_MAP = {
    "binance":"BINANCE",
    "binance-futures":"BINANCE_FUTURES",
    "huobi":"HUOBI",
    "okex-swap":"OKEX_SWAP",
    "okex":"OKEX",
    "okcoin":"OKCOIN",
    "binance-us":"BINANCEUS",
    "ftx":"FTX"
}

SYMBOL_MAP = {
    "binance": ["btcusdt", "ethusdt"],    
    "binance-us": ['btcusd', 'ethusd'],
    "binance-futures":["btcusdt", "ethusdt"],
    "huobi":["btcusdt", "ethusdt"],
    "okex": ["BTC-USDT", "ETH-USDT"],
    "okex-swap":["BTC-USD-SWAP", "ETH-USD-SWAP"],
    "okcoin": ["BTC-USD", "ETH-USD"],
    "ftx": ["BTC-USDT", "ETH-USDT"]
}

def default_file_name(exchange, data_type, date, symbol, format):
    # Flat file name of the form EXCHANGE_SYMBOL_datatype.format.gz
    exchange = EXCHANGE_MAP[exchange]
    sym = symbol.upper()
    if exchange=="FTX":
        # FTX perpetuals are saved under the FTX_SWAP market
        if sym.endswith('-PERP'):
            sym = sym.replace('-PERP', 'USDSWAP')
            exchange = "FTX_SWAP"
            return f"{exchange}_{sym}_{data_type}.{format}.gz"
    # Strip dashes so that e.g. BTC-USDT becomes BTCUSDT
    sym = sym.replace('-', '')
    return f"{exchange}_{sym}_{data_type}.{format}.gz"

def file_name_nested(exchange, data_type, date, symbol, format):
    return f"{exchange}/{data_type}/{date.strftime('%Y-%m-%d')}_{symbol}.{format}.gz" 

def download_tardis_data(exchange_name, file_types, symbol_list, start_date, end_date, save_path):
    for s_date,e_date in zip(start_date, end_date):
        ss_date = s_date.replace('-', "")
        s_path = f'{save_path}{ss_date}/'
        if not os.path.exists(s_path):
            os.makedirs(s_path)
        try:
            datasets.download(
                exchange = exchange_name,
                data_types = file_types,
                symbols = symbol_list,
                api_key=TARDIS_API,
                from_date = s_date,
                to_date = e_date,
                download_dir=s_path,
                get_filename=default_file_name
            )
        except Exception as e:
            print(e)

def run(save_path, exchange_list, date_range, file_types):
    date_list = [d.strftime('%Y-%m-%d') for d in pd.date_range(date_range[0], date_range[1])] 
    start_date = date_list[0:-1]
    end_date = date_list[1:]
    for exchange in exchange_list:
        download_tardis_data(exchange, file_types, SYMBOL_MAP[exchange], start_date, end_date, save_path)

if __name__=="__main__":    
    exchange_list = ["okcoin", "okex", "ftx", "huobi", "binance", "binance-us"]

    #Please change these to your preferred dates and path
    date_range = ('20220515', '20220530')
    save_path = "/data/ccc_tardis/"

    file_types = ["trades", "book_snapshot_25"]
    run(save_path, exchange_list, date_range, file_types)
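The run function above splits the date range into consecutive one-day windows and saves each window into its own YYYYMMDD directory. The sketch below reproduces only that splitting, using the standard library instead of pandas, so you can check which directories a given range will create. If tardis-dev treats to_date as exclusive (an assumption worth verifying in their documentation), the final date of the range is not downloaded.

```python
from datetime import date, timedelta

def day_windows(first, last):
    """Return (from_date, to_date) ISO string pairs, one per day.

    Mirrors the start_date/end_date pairing in run(): every date in
    [first, last] except the last one starts a one-day window.
    """
    n_days = (last - first).days
    days = [first + timedelta(days=i) for i in range(n_days + 1)]
    return [(a.isoformat(), b.isoformat()) for a, b in zip(days[:-1], days[1:])]

save_path = "/data/ccc_tardis/"
windows = day_windows(date(2022, 5, 15), date(2022, 5, 30))

# One directory per window, named after the window's start date.
dirs = [f"{save_path}{start.replace('-', '')}/" for start, _ in windows]

for (start, end), d in zip(windows, dirs):
    print(start, end, d)
```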

You can use Tardis data in your simulation by adding players into json configuration files like this:

"players": [
    ["BCHUSDTSWAP.OKEX_SWAP_Player", ["TardisPlayer", {"port": ["BCHUSDTSWAP", "OKEX_SWAP"], "path": "/data/ccc_tardis"}]], 
    ["BTCUSDTSWAP.OKEX_SWAP_Player", ["TardisPlayer", {"port": ["BTCUSDTSWAP", "OKEX_SWAP"], "path": "/data/ccc_tardis"}]]
]

Security master data

Security master data, such as tick size, is stored under the algo_sdk/config/symbol_info/ directory. You can retrieve all data related to a symbol with the SymbolInfo class in C++ code.

Simulation

To run a simulation (back-test), first create a JSON configuration file that defines your strategies and data players, then run ccc_sim_trader. It generates order files and trade files, which you can analyze with Python scripts or other tools.

Here is an example:

Configuration file: my_sim_cfg.json

{
    "instance": {
        "license_id":"",
        "license_key":"",        
        "log_path": "/data/ccclogs/sim1",
        "name": "sim1"
    },
    "sim": {
        "ioc_only": true,
        "use_tbbo": true,
        "delay_o2a_us": 0,
        "delay_a2m_us": 0
    },
    "fees": {
        "OKEX_SPOT": {
            "make": 0.0002,
            "take": 0.0006
        },
        "OKEX_SWAP": {
            "make": 0.0001,
            "take": 0.0003
        }
    },
    "players": [
        ["BTCUSDTSWAP.OKEX_SWAP_Player", ["TardisPlayer", {"port": ["BTCUSDTSWAP", "OKEX_SWAP"], "path": "/data/ccc_tardis"}]], 
        ["ETHUSDTSWAP.OKEX_SWAP_Player", ["TardisPlayer", {"port": ["ETHUSDTSWAP", "OKEX_SWAP"], "path": "/data/ccc_tardis"}]]
    ],
    "risk_formulas": [
        ["Port_Risk", ["RiskFormula", {"components": [[["BTCUSDTSWAP", "OKEX_SWAP"], 1.0], [["ETHUSDTSWAP", "OKEX_SWAP"], 1.0]]}]]
    ],
    "accounts": [
        [10001, ["Account", {"risk_formulas": ["Port_Risk"], "id": 10001}]]
    ],
    "symbols": [
        {"port": ["BTCUSDTSWAP", "OKEX_SWAP"], "cid": 10001}, 
        {"port": ["ETHUSDTSWAP", "OKEX_SWAP"], "cid": 10002}
    ],
    "samplers": [
        ["ts_bt", ["TimeSampler", {"halflife": 1800, "msecs": 1000}]]
    ],
    "pricing_models": [
        ["BTCUSDTSWAP.OKEX_SWAP_midpx", ["MidPx", {"port": ["BTCUSDTSWAP", "OKEX_SWAP"]}]], 
        ["ETHUSDTSWAP.OKEX_SWAP_midpx", ["MidPx", {"port": ["ETHUSDTSWAP", "OKEX_SWAP"]}]]
    ],
    "variables": [
        ["BTCUSDTSWAP.OKEX_SWAP_br", ["BookReturn", {"port": ["BTCUSDTSWAP", "OKEX_SWAP"], "ref_pm": "BTCUSDTSWAP.OKEX_SWAP_midpx"}]], 
        ["ETHUSDTSWAP.OKEX_SWAP_br", ["BookReturn", {"port": ["ETHUSDTSWAP", "OKEX_SWAP"], "ref_pm": "ETHUSDTSWAP.OKEX_SWAP_midpx"}]]
    ],
    "models": [
        ["BTCUSDTSWAP.OKEX_SWAP_m", ["SimpleModel", {"variable": "BTCUSDTSWAP.OKEX_SWAP_br"}]], 
        ["ETHUSDTSWAP.OKEX_SWAP_m", ["SimpleModel", {"variable": "ETHUSDTSWAP.OKEX_SWAP_br"}]]
    ],
    "strategies": [
        ["BTCUSDTSWAP.OKEX_SWAP", ["CCTakerStrategy", {"symbol": "BTCUSDTSWAP", "trade_market": "OKEX_SWAP", "use_margin": true, "account": 10001, "use_separate_logs": true, "model": "BTCUSDTSWAP.OKEX_SWAP_m", "dep_pm": "BTCUSDTSWAP.OKEX_SWAP_midpx", "use_bps_thold": true, "take_thold": 4.0, "low_take_thold": 3.0, "max_sweep_bps": 2.0, "pos_expanding_cooloff": 1000, "cooloff": 1000, "ioc_notional": 1000, "max_notional": 4000, "max_risk": 8000, "start_time": "00:30:00", "end_time": "23:59:59"}]],
        ["ETHUSDTSWAP.OKEX_SWAP", ["CCTakerStrategy", {"symbol": "ETHUSDTSWAP", "trade_market": "OKEX_SWAP", "use_margin": true, "account": 10001, "use_separate_logs": true, "model": "ETHUSDTSWAP.OKEX_SWAP_m", "dep_pm": "ETHUSDTSWAP.OKEX_SWAP_midpx", "use_bps_thold": true, "take_thold": 4.0, "low_take_thold": 3.0, "max_sweep_bps": 2.0, "pos_expanding_cooloff": 1000, "cooloff": 1000, "ioc_notional": 1000, "max_notional": 4000, "max_risk": 8000, "start_time": "00:30:00", "end_time": "23:59:59"}]]
    ]
}

To run simulation for one day:

ccc_sim_trader YOUR_DIR/my_sim_cfg.json 20220520

The simulation process records orders and trades in CSV log files. The log path is configured in instance->log_path; look in that directory for the details.

You can use your preferred infrastructure to run multi-day simulations and analyze the results easily. Below is a simple simulation process:

Run a multi-day simulation on a multi-core server:

gen_dates.py -sd 20220515 -ed 20220530 | parallel -j 100 ccc_sim_trader YOUR_DIR/my_sim_cfg.json 
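The pipeline above relies on gen_dates.py writing one date per line so that parallel can append each date as the last argument of ccc_sim_trader. The following is a hypothetical stand-in for such a script, assuming YYYYMMDD output and an inclusive end date; the actual gen_dates.py may behave differently.

```python
import argparse
from datetime import datetime, timedelta

FMT = "%Y%m%d"

def gen_dates(start, end):
    """Return every calendar date from start to end (inclusive) as YYYYMMDD."""
    d0 = datetime.strptime(start, FMT).date()
    d1 = datetime.strptime(end, FMT).date()
    return [(d0 + timedelta(days=i)).strftime(FMT)
            for i in range((d1 - d0).days + 1)]

def main(argv=None):
    parser = argparse.ArgumentParser(description="Print one date per line")
    parser.add_argument("-sd", required=True, help="start date, YYYYMMDD")
    parser.add_argument("-ed", required=True, help="end date, YYYYMMDD")
    args = parser.parse_args(argv)
    for d in gen_dates(args.sd, args.ed):
        print(d)

# Demonstration call; a real script would call main() with no arguments
# so that the options are read from the command line.
main(["-sd", "20220515", "-ed", "20220517"])
```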

After the simulation is done, let's analyze the generated trade logs:

sim_ana.py -p /data/ccclogs/sim1 -sd 20220515 -ed 20220530 

You will see a trading summary like this (the result is an excerpt from a full simulation result, so please ignore the inconsistencies in the table):

dates: 302
+---------------+-----------+-----------------+---------------+--------------+---------------+---------------+---------------+------------+
| symbol        |      edge |   netUSD_Sharpe |   netUSD_mean |   netUSD_std |   pnlUSD_mean |   trdCnt_mean |   volUSD_mean |   vol_mean |
|---------------+-----------+-----------------+---------------+--------------+---------------+---------------+---------------+------------|
| BTCUSDTSWAP.S |  -1.7859  |      -0.0107418 |      -2.12864 |      198.165 |       1.44714 |       13.6954 |      11919.2  |    27.3311 |
| ETHUSDTSWAP.S |   1.62199 |       0.0143254 |       2.18322 |      152.402 |       6.22124 |       18.8477 |      13460.1  |    45.6026 |
| ALL           |   5.05387 |       0.136787  |      41.6943  |      304.812 |      66.4443  |      134.5    |      82499.8  |   788.076  |
+---------------+-----------+-----------------+---------------+--------------+---------------+---------------+---------------+------------+
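Judging from the numbers in the table, the netUSD_Sharpe column appears to be the plain ratio netUSD_mean / netUSD_std, that is, a per-day Sharpe ratio without annualization. This reading is inferred from the values shown and not from the sim_ana.py source. The sketch below recomputes the column from the other two.

```python
# Values copied from the summary table above.
rows = {
    "BTCUSDTSWAP.S": {"netUSD_mean": -2.12864, "netUSD_std": 198.165, "netUSD_Sharpe": -0.0107418},
    "ETHUSDTSWAP.S": {"netUSD_mean": 2.18322, "netUSD_std": 152.402, "netUSD_Sharpe": 0.0143254},
    "ALL": {"netUSD_mean": 41.6943, "netUSD_std": 304.812, "netUSD_Sharpe": 0.136787},
}

def daily_sharpe(mean, std):
    """Per-day Sharpe ratio: mean daily net PnL divided by its standard deviation."""
    return mean / std

recomputed = {
    name: daily_sharpe(r["netUSD_mean"], r["netUSD_std"])
    for name, r in rows.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed={value:.7f} table={rows[name]['netUSD_Sharpe']}")
```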

Data generation

You can use Apifiny Algo’s sim engine to generate data and fit models. The data generation functionality creates a data frame that contains two types of data: variable values and dependent values. The variable value gives the current value of a certain feature, and the dependent value gives the forward return or another future value. So the first one can be used as X and the latter one as Y in model fitting algorithms.

To generate data, you first create a configuration file that defines the variables and dependents that you want to generate. The configuration also needs to contain a printer node to specify how to sample and save the values.

Below is an example of the data generation config file:

{
    "instance": {
        "license_id": "",
        "license_key": "",
        "log_path": "~/tmp"
    },
    "players": [
        ["BTCUSDT.BINANCE_Player", ["TardisPlayer", {"path": "/data/cc/prod/ccc_tardis_2/ccc_tardis", "port": ["BTCUSDT", "BINANCE"]}]]
    ],
    "symbols": [
        {"cid": 20001, "port": ["BTCUSDT", "BINANCE"]}
    ],
    "samplers": [
        ["ts_basis", ["TimeSampler", {"halflife": 1200, "msecs": 500}]], 
        ["ts_basis_2", ["TimeSampler", {"msecs": 100, "halflife": 60}]]
    ],
    "pricing_models": [
        ["BTCUSDT.BINANCE_midpx", ["MidPx", {"port": ["BTCUSDT", "BINANCE"]}]], 
        ["BTCUSDT.BINANCE_askpx", ["AskPx", {"port": ["BTCUSDT", "BINANCE"]}]], 
        ["BTCUSDT.BINANCE_bidpx", ["BidPx", {"port": ["BTCUSDT", "BINANCE"]}]]
    ],
    "variables": [
        ["BTCUSDT.BINANCE_midpx", ["PriceVar", {"pm": "BTCUSDT.BINANCE_midpx"}]], 
        ["BTCUSDT.BINANCE_askpx", ["PriceVar", {"pm": "BTCUSDT.BINANCE_askpx"}]], 
        ["BTCUSDT.BINANCE_bidpx", ["PriceVar", {"pm": "BTCUSDT.BINANCE_bidpx"}]], 
        ["BTCUSDT.BINANCE_bboSpread", ["Sub", {"v1": "BTCUSDT.BINANCE_askpx", "v2": "BTCUSDT.BINANCE_bidpx"}]], 
        ["BTCUSDT.BINANCE_trend", ["Trend", {"pm": "BTCUSDT.BINANCE_midpx", "sampler": "ts_basis_2"}]], 
        ["BTCUSDT.BINANCE_trend_ema", ["VarEma", {"variable": "BTCUSDT.BINANCE_trend", "sampler": "ts_basis"}]]
    ],
    "dependants": [
        ["BTCUSDT.BINANCE_dep_0", ["TwapDependant", {"port": ["BTCUSDT", "BINANCE"], "pm": "BTCUSDT.BINANCE_midpx", "tgt_pm": "BTCUSDT.BINANCE_midpx", "base_gap": 0, "dur": 1000}]]
    ],
    "printers": [
        ["BTCUSDT.BINANCE_printer", ["CsvPrinter", {"path": "YOUR PATH", "dependants": ["BTCUSDT.BINANCE_dep_0"], "variables": ["BTCUSDT.BINANCE_midpx", "BTCUSDT.BINANCE_askpx", "BTCUSDT.BINANCE_bidpx", "BTCUSDT.BINANCE_bboSpread", "BTCUSDT.BINANCE_trend", "BTCUSDT.BINANCE_trend_ema"], "printing_sampler": "ts_basis_2"}]]
    ]
}

Compared with a simulation config file, a data generation config file does not need to define model and strategy components. Instead, the printers node must specify the path where the data is stored, the dependants (y) and variables (x) to print, and the printing_sampler.
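Once the printer has written its output, the rows can be split into X (variables) and y (dependant) for model fitting. The sketch below assumes the CSV has a header row whose column names match the variable and dependant names in the config; that layout is an assumption, so check a generated file first. The sample rows are made-up values used only to make the example runnable.

```python
import csv
import io

# Made-up sample in the ASSUMED layout: header = variable/dependant names.
SAMPLE_CSV = """BTCUSDT.BINANCE_bboSpread,BTCUSDT.BINANCE_trend,BTCUSDT.BINANCE_dep_0
0.5,0.0001,0.0002
1.0,-0.0003,-0.0001
0.5,0.0002,0.0004
"""

def load_xy(fileobj, variables, dependant):
    """Split CSV rows into feature rows X and target values y."""
    X, y = [], []
    for row in csv.DictReader(fileobj):
        X.append([float(row[name]) for name in variables])
        y.append(float(row[dependant]))
    return X, y

variables = ["BTCUSDT.BINANCE_bboSpread", "BTCUSDT.BINANCE_trend"]
X, y = load_xy(io.StringIO(SAMPLE_CSV), variables, "BTCUSDT.BINANCE_dep_0")
print(len(X), "rows;", len(variables), "features")
```

For a real file, replace io.StringIO(SAMPLE_CSV) with open(path, newline="").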

Inputting a Model

If you already have a fitted model, you can hold it in a LinearSum variable, which lists each input variable together with its weight.

{
    "variables": [
        [
            "My_Model",
            [
                "LinearSum",
                {
                    "variables": [
                        {
                            "weight": 0.01343993278740841,
                            "variable": "BTCUSDT.HUOBI_bid_spread"
                        },
                        {
                            "weight": 0.028607602977861364,
                            "variable": "BTCUSDT.HUOBI_ask_depth"
                        },
                        {
                            "weight": 0.004402537428023949,
                            "variable": "BTCUSDT.HUOBI_ask_spread"
                        },
                        {
                            "weight": 0.05897431517277036,
                            "variable": "BTCUSDT.HUOBI_bid_depth"
                        }
                    ]
                }
            ]
        ]
    ]
}
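If the weights come from a fitting script, the LinearSum node can be emitted programmatically, which also avoids JSON syntax slips such as trailing commas. This is a minimal sketch; linear_sum_config is a hypothetical helper that only reproduces the structure shown above. The example above has no intercept term, so none is emitted here.

```python
import json

def linear_sum_config(name, weights):
    """Build a "variables" node holding one LinearSum variable.

    weights maps variable names to fitted coefficients.
    linear_sum_config is a hypothetical helper, not an Algo API.
    """
    terms = [{"weight": w, "variable": v} for v, w in weights.items()]
    return {"variables": [[name, ["LinearSum", {"variables": terms}]]]}

weights = {
    "BTCUSDT.HUOBI_bid_spread": 0.01343993278740841,
    "BTCUSDT.HUOBI_ask_depth": 0.028607602977861364,
}
cfg = linear_sum_config("My_Model", weights)
text = json.dumps(cfg, indent=4)
print(text)
```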

Model Fitting / Machine Learning

Apifiny Algo is designed with a data-driven trading process in mind. You can use sklearn or other tools to fit models and plug them into Algo applications through JSON configuration. Below are some fitting algorithms you can try:

  • Linear Models: OLS, Lasso, Ridge, ElasticNet
  • Non-linear Models: simple decision tree, GBRT, Random forest