
          Feathr: LinkedIn's Open-Source Enterprise-Grade Feature Store

          Co-authored · 2023-09-26 06:28

          Feathr is LinkedIn's open-source, enterprise-grade, high-performance feature store.

          Features:

          • Define features based on raw data sources, using simple APIs
          • Get those features by name, during model training and model inference
          • Share features across your team and company

          Feathr automatically computes feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use in production.
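          To make the point-in-time semantics concrete, here is a minimal sketch of an "as-of" join using pandas (this is an illustration of the idea, not Feathr's implementation): each observation row picks up the latest feature value computed at or before its event timestamp, so feature values from the future can never leak into training data.

          ```python
          import pandas as pd

          # Observation (label) data: one row per event.
          observations = pd.DataFrame({
              "location_id": [265, 265],
              "event_time": pd.to_datetime(["2020-04-10", "2020-04-20"]),
          })

          # Feature values, stamped with the time at which they became available.
          features = pd.DataFrame({
              "location_id": [265, 265, 265],
              "feature_time": pd.to_datetime(["2020-04-05", "2020-04-15", "2020-04-25"]),
              "f_location_avg_fare": [12.0, 14.0, 16.0],
          })

          # Backward as-of join: each observation gets the most recent feature
          # value whose feature_time <= event_time, per location_id key.
          joined = pd.merge_asof(
              observations.sort_values("event_time"),
              features.sort_values("feature_time"),
              left_on="event_time",
              right_on="feature_time",
              by="location_id",
          )
          print(joined["f_location_avg_fare"].tolist())  # [12.0, 14.0]
          ```

          Note that the 2020-04-20 observation receives the 2020-04-15 value (14.0), not the later 2020-04-25 value: that is exactly the leakage the point-in-time join prevents.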

          Install the Feathr client locally

          If you are not using a Jupyter Notebook and want to install the Feathr client locally, use:

          pip install -U feathr

          Or use the latest code from GitHub:

          pip install git+https://github.com/linkedin/feathr.git#subdirectory=feathr_project 

          Highlights

          Define features with transformations

          features = [
              Feature(name="f_trip_distance",                         # Ingest feature data as-is
                      feature_type=FLOAT),
              Feature(name="f_is_long_trip_distance",
                      feature_type=BOOLEAN,
                      transform="cast_float(trip_distance)>30"),      # SQL-like syntax to transform raw data into feature
              Feature(name="f_day_of_week",
                      feature_type=INT32,
                      transform="dayofweek(lpep_dropoff_datetime)")   # Provides built-in transformation
          ]
          
          anchor = FeatureAnchor(name="request_features",             # Features anchored on same source
                                 source=batch_source,
                                 features=features)

          Rich UDF support

          Feathr supports highly customizable UDFs with native PySpark and Spark SQL integration, lowering the learning curve for data scientists:

          from pyspark.sql import DataFrame
          from pyspark.sql.functions import dayofweek

          def add_new_dropoff_and_fare_amount_column(df: DataFrame):
              df = df.withColumn("f_day_of_week", dayofweek("lpep_dropoff_datetime"))
              df = df.withColumn("fare_amount_cents", df.fare_amount.cast('double') * 100)
              return df
          
          batch_source = HdfsSource(name="nycTaxiBatchSource",
                                    path="abfss://[email protected]/demo_data/green_tripdata_2020-04.csv",
                                    preprocessing=add_new_dropoff_and_fare_amount_column,
                                    event_timestamp_column="new_lpep_dropoff_datetime",
                                    timestamp_format="yyyy-MM-dd HH:mm:ss")

          Access features

          # Requested features to be joined
          # Define the key for your feature
          location_id = TypedKey(key_column="DOLocationID",
                                 key_column_type=ValueType.INT32,
                                 description="location id in NYC",
                                 full_name="nyc_taxi.location_id")
          feature_query = FeatureQuery(feature_list=["f_location_avg_fare"], key=[location_id])
          
          # Observation dataset settings
          settings = ObservationSettings(
            observation_path="abfss://green_tripdata_2020-04.csv",    # Path to your observation data
            event_timestamp_column="lpep_dropoff_datetime",           # Event timestamp field for your data, optional
            timestamp_format="yyyy-MM-dd HH:mm:ss")                   # Event timestamp format, optional
          
          # Prepare training data by joining features to the input (observation) data.
          # feature-join.conf and features.conf are detected and used automatically.
          feathr_client.get_offline_features(observation_settings=settings,
                                             output_path="abfss://output.avro",
                                             feature_query=feature_query)

          Deployment

          client = FeathrClient()
          redisSink = RedisSink(table_name="nycTaxiDemoFeature")
          # Materialize two features into a redis table.
          settings = MaterializationSettings("nycTaxiMaterializationJob",
                                             sinks=[redisSink],
                                             feature_names=["f_location_avg_fare", "f_location_max_fare"])
          client.materialize_features(settings)

          And fetch features from the online store:

          # Get features for a locationId (key)
          client.get_online_features(feature_table = "agg_features",
                                     key = "265",
                                     feature_names = ['f_location_avg_fare', 'f_location_max_fare'])
          # Batch get for multiple locationIds (keys)
          client.multi_get_online_features(feature_table = "agg_features",
                                           key = ["239", "265"],
                                           feature_names = ['f_location_avg_fare', 'f_location_max_fare'])