Processes & Threads (Q1–Q12)
A process is an independent program in execution with its own address space, code, data, stack, heap, file descriptors, and PCB. Heavy — creating a process is expensive, and context switching is slow.
A thread is a unit of execution within a process. Threads in the same process share code, heap, and global data — but each has its own stack and registers. Lighter than processes — creation and context switching are faster.
| Aspect | Process | Thread |
|---|---|---|
| Memory | Own address space | Shared with siblings |
| Communication | IPC (pipes, sockets) | Shared memory (direct, faster) |
| Isolation | Yes — crash doesn't affect others | No — crash kills all threads |
| Creation cost | Expensive (fork/exec) | Cheap (pthread_create) |
5-state model: New → Ready (admitted to ready queue) → Running (CPU allocated) → Waiting/Blocked (waiting for I/O) → Terminated. Additionally: Suspended states (swapped to disk to free memory).
Key transitions: Ready → Running (scheduler dispatch), Running → Waiting (I/O request), Running → Ready (preemption/time quantum expiry), Waiting → Ready (I/O completion).
A context switch is the process of saving the state of a running process/thread so another can run. The OS saves the PCB (Process Control Block) which includes: program counter, CPU registers, process state, memory management info (page table base address), accounting info, and I/O status. Context switching itself consumes CPU time — it's pure overhead. TLB flushes on some architectures add to this cost.
User-level threads (ULTs): managed by a user-space thread library (e.g., GNU Pth). Kernel sees only one process. Fast to create and switch. Problem: one blocking syscall blocks all threads in the process (no true parallelism on multi-core).
Kernel-level threads (KLTs): managed by OS. True parallelism on multi-core. Blocking syscall in one thread doesn't block others. Higher creation and switching overhead. Most modern systems (Linux POSIX threads, Windows threads) use KLTs.
Modern hybrid: M:N model — M user threads mapped to N kernel threads (Go's goroutines use an M:N model via the Go runtime scheduler).
fork() creates a new process (child) as an exact copy of the parent. Returns 0 to child, child's PID to parent. Uses copy-on-write (COW) — memory pages are shared until either process writes, then the page is copied. Child inherits: address space, file descriptors, signal handlers, environment variables. Child does NOT inherit: parent's PID, pending signals, file locks.
Zombie process: A process that has finished execution but whose entry is still in the process table because the parent hasn't called wait() to read its exit status. It holds no resources except the PCB entry. Fixed by parent calling wait().
Orphan process: A process whose parent has terminated before it. Adopted by init (PID 1) / systemd, which periodically calls wait(). Not a problem in itself — init takes care of it.
CPU Scheduling (Q7–Q15)
| Algorithm | Preemptive? | Advantage | Disadvantage |
|---|---|---|---|
| FCFS | No | Simple | Convoy effect — long jobs block short ones |
| SJF (Shortest Job First) | No / Yes (SRTF) | Optimal average wait time | Starvation of long jobs; need to predict burst time |
| Round Robin | Yes (time quantum) | Fair, good for time-sharing | High context switch overhead for small quanta |
| Priority | Yes/No | Allows important tasks to run first | Starvation of low-priority jobs (fix: aging) |
| Multilevel Feedback Queue | Yes | Adapts to process behavior | Most complex |
Non-preemptive: Once a process gets the CPU, it runs until it voluntarily yields (completes or blocks for I/O). Simple to implement. Problem: one long process starves others.
Preemptive: OS can forcibly take the CPU from a running process (e.g., higher-priority process arrives, time quantum expires). Better for interactive and real-time systems. Requires careful synchronization — a process can be interrupted mid-critical-section.
Starvation: A low-priority process never gets CPU time because higher-priority processes keep arriving. Can happen in SJF (long processes) and Priority scheduling.
Aging: The solution — gradually increase the priority of waiting processes over time. A process that has been waiting for 10 minutes gets higher priority than it had initially. Prevents indefinite starvation.
Deadlocks (Q10–Q18)
- Mutual Exclusion: At least one resource held in non-shareable mode.
- Hold and Wait: A process holds one resource while waiting for another.
- No Preemption: Resources can't be forcibly taken; only voluntarily released.
- Circular Wait: P1 waits for resource held by P2, P2 waits for resource held by P3, P3 waits for P1.
Prevention: Break any one condition. Most practical: break circular wait by assigning resource ordering (all processes acquire resources in a fixed order).
Dijkstra's Banker's Algorithm is a deadlock avoidance algorithm. Before granting a resource request, it simulates allocation and checks if the resulting state is safe — i.e., there exists an order in which ALL processes can complete. If safe, grant. If not, block the process until safe. Works by finding a safe sequence: a process that can finish with available resources, freeing its resources, then using those to let the next process finish, etc.
Limitation: Requires knowing maximum resource needs in advance — not practical for most real systems.
- Prevention: Design system so one Coffman condition can never hold. E.g., require all resources upfront (breaks Hold-and-Wait). Conservative, reduces concurrency.
- Avoidance: Dynamically check each request (Banker's Algorithm). Allows more concurrency than prevention but requires advance information.
- Detection: Allow deadlocks, detect them via Resource Allocation Graph (RAG) cycle detection, then recover. Practical for many systems.
- Recovery: After detection: (1) kill all deadlocked processes, (2) kill one at a time until deadlock broken, (3) preempt resources from victims.
Memory Management (Q13–Q25)
Paging: divides logical address space into fixed-size pages and physical memory into same-size frames. Pages map to frames via a page table. Eliminates external fragmentation. Internal fragmentation possible (last page partially filled).
Page table: per-process mapping of page number → frame number. Address translation: virtual address = (page number, offset) → page table lookup → frame number + offset = physical address.
TLB (Translation Lookaside Buffer): hardware cache for recent page table entries. TLB hit: O(1) lookup. TLB miss: walk page table, load entry into TLB, retry. TLB is flushed on context switch (or tagged with ASID to avoid full flush).
A page fault occurs when the CPU accesses a virtual page not currently in physical memory (valid bit is 0 in page table entry). The OS handles it:
- Find the page on disk (in swap space)
- If memory is full, select a victim frame (page replacement algorithm)
- Swap victim to disk if dirty
- Load the needed page into the freed frame
- Update page table, restart the faulting instruction
Types: Minor (page in memory but not mapped in process — e.g., shared library already loaded). Major (page must be read from disk — expensive). Invalid (illegal access → segfault / SIGSEGV).
| Algorithm | Policy | Page Faults | Drawback |
|---|---|---|---|
| FIFO | Evict oldest loaded page | Higher | Belady's anomaly — more frames can cause more faults |
| LRU | Evict least recently used page | Good approximation | Expensive to implement (need timestamp or stack) |
| Optimal (OPT) | Evict page used farthest in future | Minimum (theoretical) | Not implementable — requires future knowledge |
| Clock (Second Chance) | FIFO + reference bit | Close to LRU | Approximation only |
Linux uses a variant of the Clock algorithm with active/inactive LRU lists for practical page replacement.
Thrashing occurs when a process spends more time handling page faults than doing actual computation. It happens when too many processes are in memory, each with insufficient frames. The CPU utilization drops drastically — the OS may respond by adding more processes (to fill CPU), making it worse.
Solution: Working Set Model — track the set of pages a process actively uses (working set). Ensure each process has at least its working set size in frames. If total working set > available frames, suspend some processes.
Internal fragmentation: Allocated memory is larger than requested. Waste inside the allocated block. Happens with fixed-size allocation (paging). Example: a 4KB page used to store 3KB of data wastes 1KB.
External fragmentation: Sufficient total free memory, but fragmented into small non-contiguous blocks. No single allocation can satisfy a large request. Happens with variable-size allocation (segmentation, malloc without compaction). Solutions: compaction (expensive), paging (eliminates external fragmentation).
IPC & Synchronization (Q18–Q35)
A race condition occurs when the output of a program depends on the non-deterministic interleaving of multiple threads/processes accessing shared resources without proper synchronization. Classic example: two threads both read-increment-write a counter — final value is unpredictable.
Prevention: protect shared state with synchronization primitives — mutex locks, semaphores, monitors, atomic operations (compare-and-swap). Design to minimize shared mutable state (immutability, actor model, message passing).
Mutex (Mutual Exclusion lock): Binary lock — only the thread that acquired it can release it. Used for mutual exclusion. Supports priority inheritance to prevent priority inversion. Ownership concept: strictly one-to-one.
Semaphore: Counter-based. Counting semaphore (value N): up to N threads can acquire simultaneously. Binary semaphore (value 0/1): similar to mutex but no ownership — any thread can signal. Used for resource counting and producer-consumer signaling. Semaphore can be used across processes; mutex is typically within one process.
The critical section is a code segment that accesses shared resources and must not be executed by more than one thread at a time. A correct solution must satisfy 3 conditions:
- Mutual Exclusion: Only one process in the critical section at a time.
- Progress: If no process is in the CS and some want to enter, the decision can't be postponed indefinitely.
- Bounded Waiting: A limit exists on how many times other processes can enter the CS before a waiting process is granted access (no starvation).
Classic synchronization problem: producers add items to a fixed-size buffer, consumers remove items. Problems to avoid: producer adding to a full buffer, consumer removing from an empty buffer, simultaneous access.
Solution using semaphores: mutex (binary, protects buffer access), empty (counting, initialized to buffer size — tracks free slots), full (counting, initialized to 0 — tracks filled slots). Producer: wait(empty) → wait(mutex) → add item → signal(mutex) → signal(full). Consumer: wait(full) → wait(mutex) → remove item → signal(mutex) → signal(empty).
A monitor is a high-level synchronization construct — a class/module where only one method executes at a time. Mutual exclusion is implicit. Condition variables (wait()/signal()/notifyAll()) allow threads to wait for specific conditions. Java's synchronized blocks and Python's threading.Condition implement monitors.
Advantage over semaphores: safer and less error-prone. Semaphores require careful ordering of wait/signal; monitors enforce mutual exclusion structurally.
- Pipes: unidirectional byte stream. Anonymous pipes: between parent-child. Named pipes (FIFOs): between unrelated processes.
- Message Queues: kernel-maintained message lists. Processes send/receive typed messages. Asynchronous.
- Shared Memory: fastest IPC — processes map same physical pages. Requires explicit synchronization (semaphores/mutex).
- Semaphores (System V/POSIX): for synchronization between processes.
- Sockets: both local (Unix domain sockets) and network communication. Used by most real-world services.
- Signals: asynchronous notifications (e.g., SIGTERM, SIGSEGV, SIGKILL).
Priority inversion occurs when a high-priority task is blocked by a low-priority task holding a shared lock, while medium-priority tasks preempt the low-priority task. The high-priority task is effectively blocked by medium-priority work.
Mars Pathfinder (1997) experienced exactly this: a low-priority meteorological thread held a mutex, was preempted by medium-priority tasks, and the high-priority bus management thread starved — causing software resets. Fixed by enabling priority inheritance: when a low-priority task holds a lock needed by a high-priority task, it temporarily inherits the high priority until it releases the lock.
Spin lock: Thread actively polls ("spins") in a loop checking if the lock is free. No context switch. Efficient only if lock is held for a very short time and on multi-core systems. Wastes CPU on a single-core system (spinning prevents lock holder from running). Used in kernel interrupt handlers.
Mutex (blocking lock): If lock unavailable, thread blocks (sleeps) and is descheduled. CPU given to other work. Context switch overhead on lock/unlock. Preferred when lock is held for more than a few hundred nanoseconds. Most user-space locks are mutexes.
File Systems & I/O (Q26–Q35)
An inode is a data structure on disk that stores metadata about a file: file size, owner (UID/GID), permissions, timestamps (access, modification, change), number of hard links, and pointers to the data blocks where the file's content is stored. Importantly: the filename is NOT stored in the inode — it's in the directory entry (which maps filename → inode number). A file can have multiple hard links (multiple directory entries pointing to the same inode).
Hard link: Another directory entry pointing to the same inode. Deleting the original file doesn't remove data — only when link count drops to 0. Can't span filesystems. Can't link to directories (usually).
Symbolic (soft) link: A file containing the path to another file. Deletion of the target makes the symlink dangling (broken). Can span filesystems and link to directories. Most ls -l shortcuts are symlinks.
mmap() maps a file or device into the virtual address space of a process. Once mapped, reading/writing the memory region directly reads/writes the file — no explicit read()/write() syscalls needed. The OS handles file caching and writeback automatically.
Benefits: zero-copy data access, shared memory between processes (MAP_SHARED), efficient large file access. Used by: JVM class loading, database buffer pools, video players, shared libraries.
Buffered I/O (user-space buffering): C's fread/fwrite/printf. Data is first accumulated in a user-space buffer. When the buffer is full (or flushed), a single syscall moves the bulk data. Fewer syscalls → efficient for small frequent writes.
Unbuffered I/O (direct syscall): read()/write() syscalls. Each call transfers directly. Better for random-access patterns or when you need to know exact write timing (e.g., network protocols).
Logical (virtual) address: generated by the CPU, relative to the process's address space. Starts at 0 in most systems. What your program sees.
Physical address: actual location in RAM. Translated from logical address by the MMU (Memory Management Unit) using the page table. The key insight: every process can use the same logical address range (e.g., 0x400000) but they map to different physical addresses — this is virtual memory isolation.