
Friday, April 5, 2019

Measuring Processes of Pipelining

Sakshi Dua

Abstract

Discuss the methods to measure the performance of pipelining. Give a space-time diagram for visualizing the behavior of a four-stage pipeline. Also discuss some ways to control a pipeline for collision-free operations.

Introduction

Pipelining: A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages. It is a technique used in advanced microprocessors, where the microprocessor begins executing a second instruction before the first has been completed.

Three performance measures of a pipeline are provided: speed-up S(n), throughput U(n) and efficiency E(n).

Speed-up S(n)

Consider the execution of m tasks (instructions) using an n-stage (n-unit) pipeline. n + m - 1 time units are required to complete the m tasks, and it is assumed that the unit time is T = t.

Speed-up S(n) = time using sequential processing / time using pipeline processing
             = (m * n * t) / ((n + m - 1) * t)
             = (m * n) / (n + m - 1)

As m tends to infinity, S(n) tends to n, i.e. an n-fold increase in speed is theoretically possible.

Throughput U(n)

Throughput U(n) = number of tasks executed per unit time
               = m / ((n + m - 1) * t)

As m tends to infinity, U(n) tends to 1/t.

Efficiency E(n)

Efficiency E(n) = ratio of the actual speed-up to the maximum speed-up
               = speed-up / n
               = m / (n + m - 1)

As m tends to infinity, E(n) tends to 1.

Space-Time Diagram for a Four-Stage Pipeline

The behavior of a pipeline can be illustrated with a space-time diagram that shows the segment (stage) utilization as a function of time. The horizontal axis displays the time in clock cycles and the vertical axis gives the segment number. The diagram shows 6 tasks T1 through T6 executed in 4 segments. Task T1 is handled by segment 1. After the first clock, segment 2 is busy with T1, while segment 1 is busy with task T2. Continuing in this manner, the first task T1 is completed after the fourth clock cycle. From then on, the pipe completes a task every clock cycle.

Diagram: four-stage pipeline (input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4).

Space-time diagram for the pipeline (clock cycles on the horizontal axis, stages 1 to 4 on the vertical axis).

For example, consider the case where an n-stage pipeline with a clock cycle time tp is used to execute m tasks. The first task T1 requires a time equal to n*tp to complete its operation, since there are n stages in the pipe. The remaining m - 1 tasks emerge from the pipe at the rate of one task per clock cycle, and they will be completed after a time equal to (m - 1)*tp. Therefore, to complete m tasks using an n-stage pipeline requires n + (m - 1) clock cycles. For example, the diagram above shows four stages and 6 tasks; the time required to complete all the operations is 4 + (6 - 1) = 9 clock cycles.

Consider a non-pipeline unit that performs the same operation and takes a time equal to tn to complete each task. The total time required for m tasks is m*tn. The speedup of pipeline processing over equivalent non-pipeline processing is defined by the ratio

S(n) = (m * tn) / ((n + m - 1) * tp)

As the number of tasks increases, m becomes much larger than n - 1, and n + m - 1 approaches the value of m. Under this condition, the speedup becomes

S(n) = tn / tp

If we assume that the time it takes to process a task is the same in the pipeline and non-pipeline circuits, we have tn = n*tp. Including this assumption, the speedup reduces to

S(n) = (n * tp) / tp = n

This shows that the theoretical maximum speedup that a pipeline can provide is n, where n is the number of stages (segments) in the pipeline.
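As a rough illustration, the short Python sketch below evaluates S(n), U(n) and E(n) from the formulas above and prints the space-time table for the four-stage, six-task case, confirming the 9-cycle completion time. The function and variable names are my own, chosen only for this example.

# Illustrative sketch (assumed names): pipeline performance measures and a
# space-time table, following the formulas given in the text.

def speedup(n, m):
    """S(n) = m*n / (n + m - 1)."""
    return (m * n) / (n + m - 1)

def throughput(n, m, t):
    """U(n) = m / ((n + m - 1) * t) tasks per unit time."""
    return m / ((n + m - 1) * t)

def efficiency(n, m):
    """E(n) = m / (n + m - 1), i.e. speed-up divided by n."""
    return m / (n + m - 1)

def space_time_table(n, m):
    """Return rows of (clock cycle, task occupying each of the n stages)."""
    rows = []
    for clock in range(1, n + m):          # n + m - 1 clock cycles in total
        stages = []
        for s in range(1, n + 1):          # stage s holds task T(clock - s + 1)
            task = clock - s + 1
            stages.append(f"T{task}" if 1 <= task <= m else "-")
        rows.append((clock, stages))
    return rows

if __name__ == "__main__":
    n, m, t = 4, 6, 1
    print(f"S({n}) = {speedup(n, m):.2f}, U({n}) = {throughput(n, m, t):.2f}, "
          f"E({n}) = {efficiency(n, m):.2f}")
    for clock, stages in space_time_table(n, m):
        print(f"clock {clock}: " + "  ".join(stages))
    # 4 + 6 - 1 = 9 rows are printed: T1 leaves after cycle 4, then one task
    # completes per clock cycle, matching the space-time diagram above.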
To clarify the meaning of the speedup ratio, let the time it takes to process a suboperation in each segment be equal to tp = 20 ns. Assume that the pipeline has n = 4 stages and executes m = 100 tasks in sequence. The pipeline system will take (n + m - 1)*tp = (4 + 99)*20 = 2060 ns to complete. Assuming that tn = n*tp = 4*20 = 80 ns, a non-pipeline system requires m*tn = 100*80 = 8000 ns to complete the 100 tasks. The speedup ratio is therefore 8000/2060 = 3.88. As the number of tasks increases, the speedup will approach 4, which is equal to the number of stages in the pipeline. If we assume instead that tn = 60 ns, the speedup becomes 60/20 = 3.

Some Ways to Control a Pipeline for Collision-Free Operations

To avoid collisions caused by data-dependency operations, the following techniques are used (a small sketch of hazard detection with stalls and forwarding follows these descriptions).

Hardware Interlocks: An interlock is a circuit that detects instructions whose source operands are destinations of instructions farther up in the pipeline. Detection of this situation causes the instruction whose source is not yet available to be delayed by enough clock cycles to resolve the collision. In this way the hardware maintains the program sequence by inserting the required delays.

Operand Forwarding: This uses special hardware to detect a collision and then avoid it by routing the data through special paths between pipeline stages. The method requires additional hardware paths through multiplexers, as well as the circuit that detects the collision.

Delayed Load: This delegates the data collision problem to the compiler that translates the high-level language into a machine language program. The compiler for such computers is designed to detect a data collision and reorder the instructions as necessary to delay the loading of the conflicting data, inserting no-operation instructions where needed. This approach is referred to as delayed load.
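The Python sketch below is a toy model of these ideas, not a real processor: it scans a short instruction sequence for read-after-write dependencies and counts the stall cycles an interlock would insert, with and without operand forwarding. The register names, the instruction encoding and the result-delay values are assumptions made for this illustration.

# Toy illustration (assumed model, not a real ISA): detect read-after-write
# hazards in a straight-line instruction sequence and count the stall cycles
# an interlock would insert, with and without operand forwarding.

# Each instruction: (destination register, tuple of source registers)
program = [
    ("r1", ()),            # produces r1
    ("r2", ("r1",)),       # uses r1 immediately -> RAW hazard on r1
    ("r3", ("r1", "r2")),  # uses r1 and r2
    ("r4", ("r5",)),       # independent of the instructions above
]

def count_stalls(program, result_delay):
    """result_delay = cycles after issue before a result can be read.
    Without forwarding this delay is larger (write back, then read); with
    forwarding the result is routed between stages, so the delay shrinks."""
    issue_cycle = {}           # register -> cycle of its producing instruction
    cycle = 0
    stalls = 0
    for dest, sources in program:
        ready = 0
        for src in sources:
            if src in issue_cycle:
                ready = max(ready, issue_cycle[src] + result_delay)
        if ready > cycle:
            stalls += ready - cycle   # interlock: hold the instruction back
            cycle = ready
        issue_cycle[dest] = cycle
        cycle += 1                    # otherwise one instruction issues per cycle
    return stalls

print("stalls without forwarding:", count_stalls(program, result_delay=3))
print("stalls with forwarding:   ", count_stalls(program, result_delay=1))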
To avoid collisions caused by branch instructions, the following techniques are used (a small sketch of a branch target buffer is given after these descriptions).

Prefetch Target Instruction: One way to handle a conditional branch is to prefetch the target instruction in addition to the instruction following the branch. Both are saved until the branch is executed. If the branch condition is successful, the pipeline continues from the branch target instruction. An extension of this procedure is to continue fetching instructions from both places until the branch decision is made, at which point only the stream belonging to the correct program flow is kept.

Branch Target Buffer: The BTB is an associative memory included in the fetch segment of the pipeline. Each entry in the BTB consists of the address of a previously executed branch instruction and the target instruction for that branch, and it also stores the next few instructions after the branch target instruction. When the pipeline decodes a branch instruction, it searches the associative memory BTB for the address of the instruction. If it is in the BTB, the instruction is available directly and prefetch continues from the new path. If the instruction is not in the BTB, the pipeline shifts to the new instruction stream and stores the target instruction in the BTB. The advantage is that branch instructions that have occurred previously are readily available in the pipeline without interruption.

Loop Buffer: A variation of the BTB is the loop buffer. This is a small, very high speed register file maintained by the instruction fetch segment of the pipeline. When a program loop is detected in the program, it is stored in the loop buffer in its entirety, including all branches. The program loop can then be executed directly without having to access memory, until the loop mode is removed by the final branching out.

Branch Prediction: A pipeline with branch prediction uses some additional logic to guess the outcome of a conditional branch instruction before it is executed. The pipeline then begins prefetching the instruction stream from the predicted path. A correct prediction eliminates the wasted time caused by branch penalties.

Delayed Branch: This is a procedure employed in RISC processors. The compiler detects the branch instructions and rearranges the machine language code sequence by inserting instructions that keep the pipeline operating without interruptions. An example of delayed branch is the insertion of a no-operation instruction after a branch instruction. This causes the computer to fetch the target instruction during the execution of the no-operation instruction, allowing a continuous flow of the pipeline.
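As a rough sketch of the branch target buffer idea (the class, names and capacity below are my own assumptions for this example, and the entry is simplified to hold just the target address rather than the target instructions), a BTB can be modeled as a small table keyed by the address of a previously seen branch, so that the fetch stage can redirect itself immediately on a hit.

# Minimal sketch (assumed model) of a branch target buffer: a small table
# mapping the address of a previously executed branch to its target address.

class BranchTargetBuffer:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = {}                 # branch address -> target address

    def lookup(self, branch_addr):
        """Return the stored target on a hit, or None on a miss."""
        return self.entries.get(branch_addr)

    def update(self, branch_addr, target_addr):
        """Record the resolved target; evict the oldest entry when full."""
        if branch_addr not in self.entries and len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))
        self.entries[branch_addr] = target_addr

btb = BranchTargetBuffer()

# Branches as (address of branch, actual target) pairs; the same loop branch
# repeats, so later occurrences hit in the BTB and fetch can redirect at once.
branch_trace = [(0x40, 0x10), (0x40, 0x10), (0x80, 0x90), (0x40, 0x10)]

for addr, target in branch_trace:
    predicted = btb.lookup(addr)
    if predicted is not None:
        print(f"branch at {addr:#x}: BTB hit, prefetch continues from {predicted:#x}")
    else:
        print(f"branch at {addr:#x}: BTB miss, target {target:#x} stored for next time")
    btb.update(addr, target)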
