Intra

Stella981
• 阅读 545

Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. Kafka differs from traditional messaging systems in that:

  1. It's designed as a distributed system that's easy to scale out.

  2. It persists messages on disk and thus can be used for batched consumption .

  3. It offers high throughput for both publishing and subscribing.

  4. It supports multi-subscribers and automatically balances the consumers during failure.

Replication

With replication, Kafka clients will get the following benefits:

  1. A producer can continue to publish messages during failure and it can choose between latency and durability, depending on the application.

  2. A consumer continues to receive the correct messages in real time, even when there is failure.

All distributed systems must make trade-offs between guaranteeing consistency, availability, and partition tolerance (CAP Theorem). 

Our goal was to support replication in a Kafka cluster within a single datacenter, where network partitioning is rare, so our design focuses on maintaining highly available and strongly consistent replicas. 

Strong consistency means that all replicas are byte-to-byte identical, which simplifies the job of an application developer.

Strongly consistent replicas

There are two typical approaches of maintaining strongly consistent replicas. 

Both require one of the replicas to be designated as the leader, to which all writes are issued. 

The leader is responsible for ordering all incoming writes, and for propagating those writes to other replicas (followers), in the same order.

The first approach is quorum-based. The leader waits until a majority of replicas have received the data before it is considered safe (i.e., committed). On leader failure, a new leader is elected through the coordination of a majority of the followers. This approach is used in Apache Zookeeper and Google'sSpanner.

The second approach is for the leader to wait for "all" (to be clarified later) replicas to receive the data. When the leader fails, any other replica can then take over as the new leader.

We selected the second approach for Kafka replication for two primary reasons:

  1. The second approach can tolerate more failures with the same number of replicas. That is, it can tolerate f failures with f+1 replicas, while the first approach often only tolerates f failures with 2f +1 replicas. For example, if there are only 2 replicas, the first approach can't tolerate any failures.

  2. While the first approach generally has better latency, as it hides the delay from a slow replica, our replication is designed for a cluster within the same datacenter, so variance due to network delay is small.

To understand how replication is implemented in Kafka, we need to first introduce some basic concepts.

In Kafka, a message stream is defined by a topic, divided into one or more partitions.

Replication happens at the partition level and each partition has one or more replicas.

The replicas are assigned evenly to different servers (called brokers) in a Kafka cluster. Each replica maintains a log on disk. Published messages are appended sequentially in the log and each message is identified by a monotonically increasing offset within the log.

The offset is logical concept within a partition. Given an offset, the same message can be identified in each replica of the partition. When a consumer subscribes to a topic, it keeps track of an offset in each partition for consumption and uses it to issue fetch requests to the broker.

点赞
收藏
评论区
推荐文章
blmius blmius
4年前
MySQL:[Err] 1292 - Incorrect datetime value: ‘0000-00-00 00:00:00‘ for column ‘CREATE_TIME‘ at row 1
文章目录问题用navicat导入数据时,报错:原因这是因为当前的MySQL不支持datetime为0的情况。解决修改sql\mode:sql\mode:SQLMode定义了MySQL应支持的SQL语法、数据校验等,这样可以更容易地在不同的环境中使用MySQL。全局s
美凌格栋栋酱 美凌格栋栋酱
9个月前
Oracle 分组与拼接字符串同时使用
SELECTT.,ROWNUMIDFROM(SELECTT.EMPLID,T.NAME,T.BU,T.REALDEPART,T.FORMATDATE,SUM(T.S0)S0,MAX(UPDATETIME)CREATETIME,LISTAGG(TOCHAR(
Wesley13 Wesley13
4年前
MySQL部分从库上面因为大量的临时表tmp_table造成慢查询
背景描述Time:20190124T00:08:14.70572408:00User@Host:@Id:Schema:sentrymetaLast_errno:0Killed:0Query_time:0.315758Lock_
皕杰报表之UUID
​在我们用皕杰报表工具设计填报报表时,如何在新增行里自动增加id呢?能新增整数排序id吗?目前可以在新增行里自动增加id,但只能用uuid函数增加UUID编码,不能新增整数排序id。uuid函数说明:获取一个UUID,可以在填报表中用来创建数据ID语法:uuid()或uuid(sep)参数说明:sep布尔值,生成的uuid中是否包含分隔符'',缺省为
待兔 待兔
1年前
手写Java HashMap源码
HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程22
Wesley13 Wesley13
4年前
java将前端的json数组字符串转换为列表
记录下在前端通过ajax提交了一个json数组的字符串,在后端如何转换为列表。前端数据转化与请求varcontracts{id:'1',name:'yanggb合同1'},{id:'2',name:'yanggb合同2'},{id:'3',name:'yang
Jacquelyn38 Jacquelyn38
4年前
2020年前端实用代码段,为你的工作保驾护航
有空的时候,自己总结了几个代码段,在开发中也经常使用,谢谢。1、使用解构获取json数据let jsonData  id: 1,status: "OK",data: 'a', 'b';let  id, status, data: number   jsonData;console.log(id, status, number )
Stella981 Stella981
4年前
Android So动态加载 优雅实现与原理分析
背景:漫品Android客户端集成适配转换功能(基于目标识别(So库35M)和人脸识别库(5M)),导致apk体积50M左右,为优化客户端体验,决定实现So文件动态加载.!(https://oscimg.oschina.net/oscnet/00d1ff90e4b34869664fef59e3ec3fdd20b.png)点击上方“蓝字”关注我
Wesley13 Wesley13
4年前
00:Java简单了解
浅谈Java之概述Java是SUN(StanfordUniversityNetwork),斯坦福大学网络公司)1995年推出的一门高级编程语言。Java是一种面向Internet的编程语言。随着Java技术在web方面的不断成熟,已经成为Web应用程序的首选开发语言。Java是简单易学,完全面向对象,安全可靠,与平台无关的编程语言。
Wesley13 Wesley13
4年前
35岁是技术人的天花板吗?
35岁是技术人的天花板吗?我非常不认同“35岁现象”,人类没有那么脆弱,人类的智力不会说是35岁之后就停止发展,更不是说35岁之后就没有机会了。马云35岁还在教书,任正非35岁还在工厂上班。为什么技术人员到35岁就应该退役了呢?所以35岁根本就不是一个问题,我今年已经37岁了,我发现我才刚刚找到自己的节奏,刚刚上路。
Python进阶者 Python进阶者
1年前
Excel中这日期老是出来00:00:00,怎么用Pandas把这个去除
大家好,我是皮皮。一、前言前几天在Python白银交流群【上海新年人】问了一个Pandas数据筛选的问题。问题如下:这日期老是出来00:00:00,怎么把这个去除。二、实现过程后来【论草莓如何成为冻干莓】给了一个思路和代码如下:pd.toexcel之前把这