Academic paper | Journal of the Korea Knowledge Information Technology Society (한국지식정보기술학회 논문지) | Published February 2023

Real-Time Processing System of E-Commerce User Data Based on Spark Streaming

장도 (Pai Chai University); 가문초 (Pai Chai University); 김은성 (Pai Chai University); 정회경 (Pai Chai University)

Vol. 18, No. 1, pp. 15-24

Abstract

The advent of the e-commerce era has changed the way people shop, and users generate large amounts of data as they shop. These data can be analyzed by offline computation, but offline results are not available in real time. This paper processes the log data and business data of e-commerce users in real time so that the processing results can be fed back quickly. The Spark big data computing framework offers real-time computing capability and high throughput, and Spark Streaming, an extension of Spark Core, is the real-time stream processing component of the Spark platform. Maxwell monitors changes to business data in the MySQL database in real time and sends the captured changes to Kafka, while log data is sent to Kafka directly. Spark Streaming consumes the data from Kafka, processes it according to the requirements, and writes the results to Elasticsearch. To achieve exactly-once consumption, the system first achieves at-least-once consumption by committing offsets manually; the manually committed offsets are stored in Redis. Because Elasticsearch supports idempotent writes, records that are delivered more than once do not produce duplicate results downstream, so the pipeline as a whole achieves exactly-once consumption. Finally, the processing results can be queried according to business requirements.
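The offset handling described in the abstract can be illustrated with a minimal sketch. It is not the paper's code: plain Python containers stand in for a Kafka partition, the Redis offset store, and an Elasticsearch index, and the names `process_batch`, `run_demo`, and the key `"orders:0"` are hypothetical. The sketch assumes that each record carries a stable document ID, which is what makes a repeated write overwrite rather than duplicate.

```python
# In-memory stand-ins (assumptions for illustration only):
#   log     -> one Kafka topic partition: a list of (doc_id, payload) records
#   offsets -> the Redis store holding manually committed offsets
#   index   -> an Elasticsearch index, keyed by document ID


def process_batch(log, offsets, index, key, batch_size, crash_before_commit=False):
    """Consume one micro-batch and return how many records were delivered.

    The offset is saved only after the batch has been written, so a crash
    between the write and the commit causes a replay (at-least-once).
    """
    start = offsets.get(key, 0)            # resume from the last committed offset
    batch = log[start:start + batch_size]
    for doc_id, payload in batch:
        index[doc_id] = payload            # idempotent write: same ID overwrites
    if not crash_before_commit:
        offsets[key] = start + len(batch)  # manual commit after a successful write
    return len(batch)


def run_demo():
    log = [(f"order-{i}", {"amount": i * 10}) for i in range(6)]
    offsets, index = {}, {}
    key = "orders:0"                       # hypothetical "topic:partition" key
    delivered = 0
    delivered += process_batch(log, offsets, index, key, 3)
    # Simulated failure: records 3..5 are written, but the offset is not saved.
    delivered += process_batch(log, offsets, index, key, 3, crash_before_commit=True)
    # After restart the same records are replayed from the committed offset.
    delivered += process_batch(log, offsets, index, key, 3)
    return delivered, offsets[key], len(index)


if __name__ == "__main__":
    print(run_demo())
```

In this run nine records are delivered for six distinct inputs, yet the index ends with six documents: the replayed batch overwrites its earlier copy. In a real deployment the stand-ins would be replaced by a Kafka consumer, a Redis client, and Elasticsearch writes that set the document ID explicitly; those calls are deliberately not shown here.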

Publisher: Korea Knowledge Information Technology Society (한국지식정보기술학회)
DOI: http://dx.doi.org/10.34163/jkits.2023.18.1.002
Classification: Interdisciplinary research
