GS lab

1) When to use Kafka:

Use Kafka for real-time data streaming, message queuing, and event-driven architectures.

Key features of Kafka include:

Distributed messaging system
High throughput and low latency
Fault-tolerance and scalability
Persistent storage
Stream processing capabilities
Integration with various programming languages and frameworks

Message queuing refers to sending, storing, and managing messages in a queue, where they are held until a consumer processes them. In Kafka, producers publish messages to topics, and consumers subscribe to those topics and process the messages asynchronously. Unlike a classic queue, a topic is an append-only log: messages are not removed when they are read, so several consumer groups can each read the same messages at their own pace. This decouples producers of data from consumers, enabling efficient communication and scalability in distributed systems.

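
To make the publish/subscribe idea concrete, here is a minimal in-memory sketch. It is not the Kafka client API: `TopicLog`, `publish`, and `poll` are hypothetical names chosen for illustration, and the class only models a single append-only log with one read offset per consumer group.

```typescript
// A toy, in-memory stand-in for a single topic partition (NOT the Kafka API).
class TopicLog<T> {
  // Records are only ever appended; reading does not remove them.
  private readonly records: T[] = [];
  // Each consumer group keeps its own read position (offset).
  private readonly offsets = new Map<string, number>();

  // Append a record and return the offset it was stored at.
  publish(record: T): number {
    this.records.push(record);
    return this.records.length - 1;
  }

  // Return the next unread records for this group and advance its offset.
  poll(groupId: string, maxRecords = 10): T[] {
    const start = this.offsets.get(groupId) ?? 0;
    const batch = this.records.slice(start, start + maxRecords);
    this.offsets.set(groupId, start + batch.length);
    return batch;
  }
}

const orders = new TopicLog<string>();
orders.publish("order-1");
orders.publish("order-2");

// Two independent groups each see every record.
const billingBatch = orders.poll("billing");
const shippingBatch = orders.poll("shipping");
// A second poll by the same group returns nothing new.
const billingAgain = orders.poll("billing");
```

The producer never needs to know which groups exist, which is the decoupling described above.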
Brokers:

In Kafka, a broker is a core component responsible for handling message storage, distribution, and replication. It serves as a server instance that stores and manages Kafka topics, which are essentially categorized feeds of messages. Brokers receive messages from producers, store them on disk, and then serve them to consumers when requested.

Key points about brokers:

Message Storage: Brokers store messages persistently on disk, so consumers can still retrieve them after they have been processed, until the configured retention limit is reached.

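Distribution: Each topic is split into partitions, and the partitions are spread across the brokers in the cluster. Every partition has one leader broker that handles its reads and writes. Which partition a message goes to is decided by the producer, typically by hashing the message key, or by spreading keyless messages across the partitions.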

Replication: Each partition is replicated across multiple brokers to ensure fault tolerance and high availability. The replication factor is configurable per topic, so that data is not lost even if a broker fails.

Networking: Brokers communicate with producers and consumers over the network using the Kafka protocol. They handle incoming requests, such as producing messages or fetching messages, and respond accordingly.

Scaling: Kafka clusters typically consist of multiple brokers working together to handle large volumes of data and provide horizontal scalability. Adding more brokers to a cluster can increase throughput and fault tolerance.

In short, a broker in Kafka acts as a storage and distribution node within a Kafka cluster, facilitating the reliable and efficient exchange of messages between producers and consumers.
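
One common way messages end up spread across partitions is key hashing on the producer side. The sketch below illustrates the idea only: `hashKey` and `choosePartition` are hypothetical helpers, and the simple multiplicative hash is a stand-in (the Java client's default partitioner uses murmur2 for keyed messages).

```typescript
// Illustrative string hash; a stand-in, not the hash a real Kafka client uses.
function hashKey(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    // ">>> 0" keeps the running value an unsigned 32-bit integer.
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Map a message key to one of the topic's partitions.
function choosePartition(key: string, partitionCount: number): number {
  return hashKey(key) % partitionCount;
}

const partitionCount = 6;
const first = choosePartition("customer-42", partitionCount);
const second = choosePartition("customer-42", partitionCount);
console.log(first === second); // the same key always maps to the same partition
```

Because the same key always lands on the same partition, messages for one key keep their relative order, while different keys can be processed in parallel on different partitions.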

2) How do we handle duplicates in Kafka?

Handling duplicates in Kafka typically involves a combination of configuration settings and application-level logic. Here are some common approaches:

Producer-side Deduplication: Implement logic within the producer application to generate a unique identifier for each message. This identifier can be based on the content of the message or a sequence number. Before producing a message, the producer checks if the identifier already exists in a storage system (like a database or an in-memory cache). If the identifier exists, the producer can choose to discard the message or take appropriate action.

Idempotent Producers: Kafka provides an idempotent producer feature, which ensures that a message the producer retries is written to its partition log only once, so retries do not create duplicates. This feature is enabled by setting enable.idempotence=true in the producer configuration. The producer attaches a producer ID and per-partition sequence numbers to what it sends, and the broker uses them to discard writes it has already stored. It does not deduplicate messages that the application itself sends twice.

Consumer-side Deduplication: Consumers can maintain a record of processed message identifiers (e.g., in a database or a distributed cache). Before processing a message, the consumer checks if its identifier has already been processed. If so, the consumer can discard the duplicate message.

Message Timestamps: Include a timestamp in each message and use it to identify and discard duplicates. Consumers can maintain a record of the latest processed timestamp and ignore messages with earlier timestamps. This is only safe when messages are guaranteed to arrive in timestamp order, since a late but legitimate message would otherwise be dropped.

Windowed Deduplication: Group messages into time-based windows (e.g., by using Kafka Streams or other stream processing frameworks) and deduplicate messages within each window based on a unique identifier or message content.

Exactly-Once Semantics: Kafka offers exactly-once semantics through transactional producers and consumers that read with isolation.level=read_committed. Within a consume-process-produce pipeline that stays inside Kafka, this ensures that each message takes effect exactly once, even in the presence of failures or retries.

The choice of approach depends on the specific requirements of your application, such as the volume of data, the desired level of consistency, and the trade-offs between complexity and performance.

TypeScript:
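
As a TypeScript illustration of the consumer-side deduplication approach described above, here is a minimal sketch. The names `IncomingMessage` and `DeduplicatingHandler` are hypothetical, and the in-memory `Set` is an assumption made for brevity; a real consumer would more likely keep processed IDs in a database or distributed cache so they survive restarts.

```typescript
// Hypothetical message shape: each message carries a unique identifier.
interface IncomingMessage {
  id: string;
  payload: string;
}

class DeduplicatingHandler {
  // IDs of messages that were already processed (in memory only, for illustration).
  private readonly seen = new Set<string>();
  // Payloads that were actually processed, in order.
  readonly processed: string[] = [];

  // Returns true if the message was processed, false if it was a duplicate.
  handle(message: IncomingMessage): boolean {
    if (this.seen.has(message.id)) {
      return false; // already processed: discard the duplicate
    }
    this.seen.add(message.id);
    this.processed.push(message.payload);
    return true;
  }
}

const handler = new DeduplicatingHandler();
const results = [
  handler.handle({ id: "m-1", payload: "create order" }),
  handler.handle({ id: "m-2", payload: "charge card" }),
  handler.handle({ id: "m-1", payload: "create order" }), // redelivered message
];
```

Here the third call is recognised as a redelivery of `m-1` and skipped, so `handler.processed` contains each payload only once.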

    // Define an interface
    interface User {
      id: number;
      username: string;
      email: string;
    }

    // Define a type
    type Task = {
      id: number;
      description: string;
      completed: boolean;
    };

    // Example async function returning the interface
    async function getUserById(userId: number): Promise<User> {
      // Simulating fetching user data asynchronously from a database or API
      return new Promise<User>((resolve) => {
        setTimeout(() => {
          const user: User = { id: userId, username: 'example_user', email: 'example@example.com' };
          resolve(user);
        }, 1000); // Simulating a delay of 1 second
      });
    }

    // Example async function returning the type
    async function getTaskById(taskId: number): Promise<Task> {
      // Simulating fetching task data asynchronously from a database or API
      return new Promise<Task>((resolve) => {
        setTimeout(() => {
          const task: Task = { id: taskId, description: 'Example task', completed: false };
          resolve(task);
        }, 1500); // Simulating a delay of 1.5 seconds
      });
    }

    // Example usage
    async function fetchData() {
      const user = await getUserById(1);
      console.log('User:', user);
      const task = await getTaskById(1);
      console.log('Task:', task);
    }

    fetchData();

Promises in event loop:

Solving the mystery: where are the promises in the Node.js event loop?


When I started learning Node.js and became familiar with the event loop, I asked myself the question: In what phase is the promise fulfilled? I could not find an explicit answer to this question in the documentation.

The event loop in Node.js consists of several phases, each responsible for handling different types of asynchronous operations. Promises are a fundamental part of asynchronous programming in JavaScript, yet none of these phases is dedicated to them, which is why the question is less obvious than it looks.

The diagram below shows the phases of the event loop:

Timers: In this phase, callbacks to scheduled timers are executed. These timers can be created with functions like setTimeout() or setInterval().

Pending callbacks: This phase executes callbacks for some system operations that were deferred to the next loop iteration, such as certain TCP errors. Most I/O callbacks are not handled here but in the poll phase.

Idle, prepare: These phases are used internally by Node.js (libuv in particular) and are not managed directly.

Poll: The poll phase retrieves new I/O events and executes their callbacks. If there are no pending I/O events, it may block and wait for new ones to arrive, but no longer than the nearest scheduled timer allows.

Check: This phase handles callbacks scheduled with setImmediate(). They run right after the poll phase completes, including when the poll phase has become idle.

Close callbacks: In this phase, callbacks associated with "close" events are executed. For example, when a socket or file is closed, the corresponding close event callback is executed in this phase.
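
The phase order can be observed directly. The sketch below is an illustration, with `timersVersusImmediate` as a hypothetical helper name: it schedules a zero-delay timer and an immediate from inside an I/O callback (using `stat` on the current directory only as a convenient I/O operation). Since that callback runs in the poll phase, the check phase comes next and the timers phase only in the following iteration.

```typescript
import { stat } from "node:fs";

// Records the order in which callbacks run when both are scheduled
// from inside an I/O callback, that is, during the poll phase.
function timersVersusImmediate(): Promise<string[]> {
  return new Promise((resolve) => {
    const order: string[] = [];
    // The stat result is ignored; only the callback timing matters here.
    stat(".", () => {
      setTimeout(() => {
        order.push("timeout"); // timers phase of the next loop iteration
        resolve(order);
      }, 0);
      setImmediate(() => {
        order.push("immediate"); // check phase, right after poll
      });
    });
  });
}

timersVersusImmediate().then((order) => console.log(order));
// prints [ 'immediate', 'timeout' ]
```

When the same two calls are made from the main module instead of an I/O callback, their relative order is not guaranteed, which is why the sketch starts from an I/O callback.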

So, what about promises?

While the main phases were mentioned earlier, it is important to note that there are other tasks that occur between each of these phases. These tasks include process.nextTick() and the microtask queue (which is where promises appear).

Our schema now looks like this:

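The microtask queue is processed as soon as the currently running callback finishes, and therefore always before the event loop moves on to the next phase. Since Node.js 11 this happens after every individual callback, not only at phase boundaries. The process.nextTick() queue is drained first, followed by promise callbacks. This ensures that promise callbacks are completed as quickly as possible without waiting for the next iteration of the event loop.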
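
A small experiment makes this ordering visible. This is a sketch rather than a definitive test, and `microtaskOrder` is a hypothetical helper name. Everything is scheduled from inside a timer callback so that the result does not depend on how the main script itself was loaded.

```typescript
// Collects labels in the order the callbacks actually run.
function microtaskOrder(): Promise<string[]> {
  return new Promise((resolve) => {
    const order: string[] = [];
    setTimeout(() => {
      // Check phase: runs only after the current callback's queues are drained.
      setImmediate(() => {
        order.push("setImmediate");
        resolve(order);
      });
      // Microtask queue: runs once the current callback has finished.
      Promise.resolve().then(() => order.push("promise"));
      // nextTick queue: drained before the promise microtasks.
      process.nextTick(() => order.push("nextTick"));
      // Synchronous code in the callback always runs first.
      order.push("sync");
    }, 0);
  });
}

microtaskOrder().then((order) => console.log(order));
// prints [ 'sync', 'nextTick', 'promise', 'setImmediate' ]
```

Although the promise callback is registered before the `process.nextTick()` call and after `setImmediate()`, it runs between the two: after the nextTick queue and before the loop reaches the check phase.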

If you, like me, have been looking for an answer to this question, I hope this article has given you the clarity you were looking for. Understanding when promises are executed in the Node.js event loop is essential to effective asynchronous programming. I hope this explanation was helpful to you, allowing you to use promises confidently in your Node.js applications.

