
Streaming Service

Document created: 2020-03-15 08:54:32 +0900 KST

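OCI Streaming Limits

OCI Streaming has the following limits.

- Maximum retention period for messages in a stream: 7 days
- Maximum size of a single message: 1 MB
- Each partition can handle up to 1 MB of data per second
- Each partition handles up to 5 Read API calls per second and 1 MB of writes per second
- Each partition handles up to 2 MB of reads per second and 1 MB of writes per second
- Each tenancy is limited to 5 partitions by default; request a resource increase through a service request (SR) if you need more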

In the API, partitions are represented as strings. For a stream created with 5 partitions, the strings are:
- “0”
- “1”
- “2”
- “3”
- “4”

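Offsets are not dense. Offsets always increase, but not always by 1, so do not try to calculate future offsets. For example, when two messages are published to the same partition, the first message may get offset 42 and the second offset 45; offsets 43 and 44 may not exist.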
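Because offsets can have gaps, a consumer that wants to resume should remember the last offset it actually saw rather than compute "last + 1". The sketch below uses simulated messages as plain dictionaries; the `resume_position` helper and the choice to fall back to `TRIM_HORIZON` for an empty batch are assumptions made for illustration, not part of any SDK.

```python
# Simulated messages from a single partition; offsets increase but have gaps.
messages = [
    {"offset": 42, "value": "first"},
    {"offset": 45, "value": "second"},
]


def resume_position(batch):
    """Return a (cursor_type, offset) pair describing where to resume reading.

    Hypothetical helper: it never guesses the next offset, it only refers to
    an offset that was really observed.
    """
    if not batch:
        # Nothing seen yet: start from the oldest available message (assumption).
        return ("TRIM_HORIZON", None)
    last_seen = max(m["offset"] for m in batch)
    # AFTER_OFFSET means "start after this known offset", so gaps do not matter.
    return ("AFTER_OFFSET", last_seen)


print(resume_position(messages))
```

Asking for the position after a known offset stays correct whether the next message lands at 46 or at 50.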

Ways to create a stream
- OCI Console
- API

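A stream is created in a specific region and tenancy. Stream data is replicated across the region, so it tolerates the failure of an availability domain (AD) and is highly available.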

Specify the number of partitions when provisioning the stream. Partition creation time: 10 seconds.

The number of partitions for your stream depends on the throughput expectations of your application (expected throughput = average record size x maximum number of records written per second).
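The throughput formula can be turned into a quick partition estimate by dividing the expected throughput by the per-partition write limit of 1 MB/s stated in the limits. This is a minimal sketch: the function name is hypothetical, 1 MB is treated as 1,000,000 bytes as an assumption, and the separate limit on requests per second is not considered.

```python
import math

# Per-partition write limit from the limits list: 1 MB/s.
# Treating 1 MB as 10**6 bytes is an assumption made for this sketch.
WRITE_LIMIT_BYTES_PER_SEC = 1_000_000


def partitions_needed(avg_record_bytes, max_records_per_sec):
    """Estimate partitions from expected throughput = record size x records/s."""
    expected_throughput = avg_record_bytes * max_records_per_sec
    return max(1, math.ceil(expected_throughput / WRITE_LIMIT_BYTES_PER_SEC))


print(partitions_needed(2_000, 500))    # 1.0 MB/s of writes
print(partitions_needed(2_000, 1_800))  # 3.6 MB/s of writes
```

Rounding up matters here: 3.6 MB/s of writes does not fit in three partitions.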

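Stream throughput
- The throughput of an OCI stream is defined by its partitions
- Each partition supports 1 MB/s of writes and 2 MB/s of reads

Requests per second per partition
- Up to 1,000 requests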

SDK
- https://github.com/oracle/oci-java-sdk/releases
- https://github.com/oracle/oci-python-sdk/releases
- https://github.com/oracle/oci-ruby-sdk/releases
- https://github.com/oracle/oci-go-sdk/releases


How do I emit data into a stream? Once a stream is created and active you can publish messages. For publishing, you can use the Write API (putMessages). The message will be published to a partition in the stream. If there is more than one partition, the partition where the message will be published is calculated using the message’s key.

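Publishing messages to a stream
- Once a stream is created and active, you can publish messages
- How to publish: the Write API (putMessages)
- Each message is stored in a single partition
- The partition is calculated from the message key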

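If a message's key is null, the partition is calculated from a subset of the value. Do not expect messages with a null key to land in the same partition, because the partitioning scheme can change. Passing a null key effectively puts the message in a random partition.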

How to guarantee ordering in OCI Streaming
- Messages that belong together must go to the same partition
- To achieve that, give those messages the same key
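The reason a shared key preserves ordering is that the partition is derived deterministically from the key. The sketch below illustrates that idea only: the hash used here (SHA-256 modulo the partition count) is a stand-in chosen for the example, and the function name is hypothetical. The service's real partitioning scheme is internal and, as noted above, can change.

```python
import hashlib


def pick_partition(key: bytes, partition_count: int) -> str:
    """Map a key to a partition id string ("0" .. "N-1").

    Illustrative stand-in only; this is NOT the hashing used by the service.
    """
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % partition_count
    return str(index)


# The same key always maps to the same partition, so per-key order is kept.
first = pick_partition(b"order-17", 5)
second = pick_partition(b"order-17", 5)
print(first == second)
```

Messages with different keys may or may not share a partition, so ordering across keys should not be relied on.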

How message durability is guaranteed
- If the OCI Streaming API acknowledges putMessages without an error, the message has been durably stored

What if a request exceeds the limits?
- Requests that exceed the limits are rejected and an error exception is raised
- Messages larger than 1 MB….
- Try sending more than 5 requests per second…

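Throttle mechanism: triggered when the following thresholds are exceeded
- GetMessages: 5 calls per second, or 2 MB/s per partition

Handling messages larger than 1 MB
- Go through Object Storage
- Split the message into chunks

Supported date format
- ISO-8601

How to get the partition and offset assigned by the Streaming service when publishing a message
- In the PutMessagesResultEntry class
  - Use getPartition
  - Use getOffset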
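For the chunking option for messages larger than 1 MB, one possible approach is to split the payload into numbered pieces and reassemble them on the consumer side. This is a minimal sketch: the chunk envelope (`index`, `total`, `data`), the 900,000-byte chunk size, and both function names are assumptions for illustration. Giving every chunk of one payload the same key would keep the chunks in one partition and in order.

```python
import math


def split_into_chunks(payload: bytes, chunk_size: int = 900_000):
    """Split a payload into chunks that each stay under the 1 MB message limit."""
    total = max(1, math.ceil(len(payload) / chunk_size))
    return [
        {
            "index": i,
            "total": total,
            "data": payload[i * chunk_size:(i + 1) * chunk_size],
        }
        for i in range(total)
    ]


def reassemble(chunks):
    """Rebuild the original payload once every chunk has arrived."""
    ordered = sorted(chunks, key=lambda c: c["index"])
    if not ordered or len(ordered) != ordered[0]["total"]:
        raise ValueError("missing chunks")
    return b"".join(c["data"] for c in ordered)


payload = bytes(2_500_000)           # 2.5 MB of zero bytes as sample data
chunks = split_into_chunks(payload)
print(len(chunks))                   # number of messages to publish
print(reassemble(chunks) == payload)
```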

Partial failure?
- Throttling can cause a partial failure
- The service returns
  - Status Code 200
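Since a response with Status Code 200 can still contain failed messages, a producer has to inspect each result entry instead of trusting the status code. The sketch below works on simulated data: the entry fields (`partition`, `offset`, `error`, `errorMessage`), the sample error values, and the assumption that results come back in the same order as the messages sent are all labeled assumptions, and `failed_messages` is a hypothetical helper.

```python
def failed_messages(sent, result_entries):
    """Return the sent messages whose matching result entry carries an error.

    Assumes result entries are in the same order as the messages sent.
    """
    return [msg for msg, res in zip(sent, result_entries) if res.get("error")]


sent = [
    {"key": "a", "value": "1"},
    {"key": "b", "value": "2"},
    {"key": "c", "value": "3"},
]

# Simulated body of a response that came back with Status Code 200.
results = [
    {"partition": "0", "offset": 42},
    {"error": "429", "errorMessage": "throttled"},  # sample values, assumed
    {"partition": "0", "offset": 45},
]

retry = failed_messages(sent, results)
print(retry)  # only these messages need to be published again
```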

OSS (the Streaming service) is not offered in the Free Trial.

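How to detect consumer lag
- Compare the current time with the message timestamp
- If the difference keeps growing, the consumer is falling behind
- That is, the producer is faster than the consumer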
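The lag check described above is just a subtraction between the current time and the timestamp carried by the most recently consumed message. A minimal sketch, assuming the timestamp is an ISO-8601 string in UTC; the function name is hypothetical.

```python
from datetime import datetime, timezone


def consumer_lag_seconds(message_timestamp: str, now=None) -> float:
    """Seconds between a message's timestamp and the current time."""
    # fromisoformat() in older Python versions does not accept a trailing "Z".
    published = datetime.fromisoformat(message_timestamp.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - published).total_seconds()


# Fixed "now" so the example is reproducible.
fixed_now = datetime(2020, 3, 15, 0, 0, 30, tzinfo=timezone.utc)
print(consumer_lag_seconds("2020-03-15T00:00:00Z", now=fixed_now))
```

Sampling this value periodically shows the trend: a steadily growing number means the consumer is falling behind the producer.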

Cursor types
- TRIM_HORIZON, AT_OFFSET, AFTER_OFFSET, AT_TIME, and LATEST

A cursor is never null. A cursor expires after 5 minutes, but as long as you keep consuming you do not need to create it again.
- GetMessages, Commit, and Heartbeat each return a cursor for the next call

How many messages does GetMessages return?
- getLimit of GetMessagesRequest returns the maximum number of messages
- You can specify up to 10,000
- The service returns as many messages as it can

Consider your average message size to avoid exceeding throughput on the stream.

Streaming service getMessage batch sizes are based on the average message size published to the particular stream.

How long does an instance have to heartbeat before timing out?
- An instance must send a heartbeat before the 30-second timeout. For example, if a message is taking too long to process, we recommend that the instance send a heartbeat.

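Consumer groups provide the following advantages:
- Each instance receives messages from one or more partitions
- Partitions are assigned to instances automatically
- The same message is not delivered to more than one instance
- The group can scale out; a new instance simply joins the group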

=========

Example - re-reading by specifying an offset
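A minimal sketch of preparing a cursor request for re-reading from a known offset. It only builds and validates a plain dictionary; the field names (`partition`, `type`, `offset`, `time`) are assumed to mirror a cursor-creation request, and `cursor_request` is a hypothetical helper rather than an SDK call.

```python
CURSOR_TYPES = {"TRIM_HORIZON", "AT_OFFSET", "AFTER_OFFSET", "AT_TIME", "LATEST"}
OFFSET_TYPES = {"AT_OFFSET", "AFTER_OFFSET"}


def cursor_request(partition: str, cursor_type: str, offset=None, time=None) -> dict:
    """Build a cursor request body (assumed shape) for one partition."""
    if cursor_type not in CURSOR_TYPES:
        raise ValueError(f"unknown cursor type: {cursor_type}")
    body = {"partition": partition, "type": cursor_type}
    if cursor_type in OFFSET_TYPES:
        if offset is None:
            raise ValueError(f"{cursor_type} requires an offset")
        body["offset"] = offset
    elif cursor_type == "AT_TIME":
        if time is None:
            raise ValueError("AT_TIME requires an ISO-8601 time")
        body["time"] = time
    return body


# Re-read partition "0" starting at the message with offset 42.
print(cursor_request("0", "AT_OFFSET", offset=42))
```

Using `AT_OFFSET` includes the message at the given offset, while `AFTER_OFFSET` starts with the message that follows it.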

- https://blogs.oracle.com/cloud-infrastructure/extending-oracle-streaming-with-kafka-compatibility
- https://blogs.oracle.com/cloud-infrastructure/announcing-oracle-cloud-infrastructure-streaming

The pricing model of the OCI Streaming service follows the PAYG (Pay-as-You-Go, usage-based billing) principle. There are no upfront fees or minimum charges for the Streaming service. You are billed only for the resources you use.

GET/PUT request price (gigabytes of data transferred). Please refer to the pricing guide for actual pricing of OCI Streaming.

Let’s consider a scenario where a data producer puts 500 records per second in aggregate and each record is 2kB. The customer wants to egress/retrieve data at a rate twice that of ingress. Also the customer wants to store this data for 7 days.

Price calculation/day (just as an example)

Billable record size = 4 kB (any record smaller than 4 kB is rounded up to 4 kB)

In this scenario:
- Total amount of data produced per day = 500 * 4 * 24 * 60 * 60 kB = 172.8 GB
- Total amount of data retrieved per day = twice the amount produced = 2 * 172.8 GB = 345.6 GB
- PUT request price/day = 172.8 * $xx = $A
- GET request price/day = 345.6 * $xx = $B
- Data storage cost = 172.8 * 24 * 7 * $yy = $C
- Total bill/day = $(A + B + C)

Optional:

Extended data retention is an optional cost determined by the amount of additional days of retention beyond the default 24-hour retention (GigaBytes of storage per hour)
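The daily calculation above can be reproduced with a few lines of arithmetic. This sketch computes only the billable quantities and leaves the unit prices as parameters, because the actual prices ($xx and $yy) are not given here. Reading the storage term as 172.8 GB x 24 hours x 7 days is an assumption, and the function names are hypothetical.

```python
def daily_quantities(records_per_sec=500, billable_kb_per_record=4,
                     egress_factor=2, retention_days=7):
    """Return (put_gb, get_gb, storage_gb_hours) for one day of traffic."""
    seconds_per_day = 24 * 60 * 60
    put_kb = records_per_sec * billable_kb_per_record * seconds_per_day
    put_gb = put_kb / 1_000_000                 # decimal units: 1 GB = 10**6 kB
    get_gb = egress_factor * put_kb / 1_000_000
    # Assumed reading of the storage term: GB x 24 hours x retention days.
    storage_gb_hours = put_kb * 24 * retention_days / 1_000_000
    return put_gb, get_gb, storage_gb_hours


def daily_bill(price_per_gb_transferred, price_per_gb_hour_stored):
    """Total bill/day = A + B + C, with prices supplied by the caller."""
    put_gb, get_gb, storage_gb_hours = daily_quantities()
    return (put_gb * price_per_gb_transferred
            + get_gb * price_per_gb_transferred
            + storage_gb_hours * price_per_gb_hour_stored)


print(daily_quantities())
```

Passing the actual prices from the pricing guide to `daily_bill` gives the total for this scenario.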

References

https://www.oracle.com/kr/big-data/streaming/faq.html

Author: 김태완

I focus on my beloved 민수, data management, data analytics & cloud.

E-mail: taewanme@gmail.com

Disclaimer

This post was written by the author as an individual, on personal time, independently of Oracle. Its content, positions, and predictions do not represent Oracle's official views in any way.